IBM A2 User Manual

Page 1

A2 Processor

User’s Manual

for Blue Gene/Q

Note: This document and the information it contains are provided on an as-is basis. There is no plan to provide future updates or corrections to this document.

Title Page

October 23, 2012 Version 1.3

Page 2

Copyright and Disclaimer

Printed in the United States of America October 2012

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml

Intel is a trademark or registered trademark of Intel Corporation or its subsidiaries in the United States and other countries.

Other company, product, and service names may be trademarks or service marks of others.

All information contained in this document is subject to change without notice. The products described in this document are NOT intended for use in applications such as implantation, life support, or other hazardous uses where malfunction could result in death, bodily injury, or catastrophic property damage. The information contained in this document does not affect or change IBM product specifications or warranties. Nothing in this document shall operate as an express or implied license or indemnity under the intellectual property rights of IBM or third parties. All information contained in this document was obtained in specific environments, and is presented as an illustration. The results obtained in other operating environments may vary.

THE INFORMATION CONTAINED IN THIS DOCUMENT IS PROVIDED ON AN “AS IS” BASIS. In no event will IBM be liable for damages arising directly or indirectly from any use of the information contained in this document.

IBM Systems and Technology Group 2070 Route 52, Bldg. 330 Hopewell Junction, NY 12533-6351

The IBM home page can be found at ibm.com®. The IBM semiconductor solutions home page can be found at ibm.com/chips.

Version 1.3 October 23, 2012

Page 3


List of Figures ............................................................................................................... 21

List of Tables ................................................................................................................. 23

Revision Log ................................................................................................................. 29

About This Book .......................................................................................................... 31

Who Should Use This Book .................................................................................................................. 31

How to Use This Book ........................................................................................................................... 31

Notation ................................................................................................................................................. 32

Related Publications ............................................................................................................................. 33

List of Acronyms and Abbreviations .......................................................................... 35

1. Overview .................................................................................................................... 45

1.1 A2 Core Key Design Fundamentals ................................................................................................ 45

1.2 A2 Core Features ............................................................................................................................ 46

1.3 The A2 Core as a Power ISA Implementation ................................................................................ 49

1.3.1 Embedded Hypervisor ........................................................................................................... 49

1.4 A2 Core Organization ...................................................................................................................... 49

1.4.1 Instruction Unit ....................................................................................................................... 50

1.4.2 Execution Unit ....................................................................................................................... 51

1.4.3 Instruction and Data Cache Controllers ................................................................................. 51

1.4.3.1 Instruction Cache Controller ........................................................................................... 51

1.4.3.2 Data Cache Controller .................................................................................................... 51

1.4.4 Memory Management Unit (MMU) ........................................................................................ 52

1.4.5 Timers .................................................................................................................................... 54

1.4.6 Debug Facilities ..................................................................................................................... 54

1.4.6.1 Debug Modes ................................................................................................................. 54

1.4.6.2 Development Tool Support ............................................................................................. 55

1.4.7 Floating-Point Unit Organization ............................................................................................ 55

1.4.7.1 Arithmetic and Load/Store Pipelines .............................................................................. 56

1.4.8 IEEE 754 and Architectural Compliance ............................................................................... 56

1.4.8.1 IEEE 754 Compliance .................................................................................................... 57

1.4.9 Floating-Point Unit Implementation ....................................................................................... 57

1.4.9.1 Reciprocal Estimates ...................................................................................................... 57

1.4.9.2 Denormalized B Operands ............................................................................................. 57

1.4.9.3 Non-IEEE mode ............................................................................................................. 57

1.4.10 Floating-Point Unit Interfaces .............................................................................................. 57

1.4.10.1 A2 Processor Core Interface ........................................................................................ 57

1.4.10.2 Clock and Power Management Interface ..................................................................... 58

1.5 Core Interfaces ................................................................................................................................ 58

1.5.1 System Interface .................................................................................................................... 58

1.5.2 Auxiliary Execution Unit (AXU) Port ...................................................................................... 59

1.5.3 JTAG Port .............................................................................................................................. 59

Contents

Page 3 of 864

2. CPU Programming Model ......................................................................................... 61

2.1 Logical Partitioning .......................................................................................................................... 61

2.1.1 Overview ................................................................................................................................ 61

2.2 Storage Addressing ......................................................................................................................... 62

2.2.1 Storage Operands .................................................................................................................. 62

2.2.2 Effective Address Calculation ................................................................................................ 64

2.2.2.1 Data Storage Addressing Modes .................................................................................... 65

2.2.2.2 Instruction Storage Addressing Modes ........................................................................... 65

2.2.3 Byte Ordering ......................................................................................................................... 66

2.2.3.1 Structure Mapping Examples ......................................................................................... 66

2.2.3.2 Instruction Byte Ordering ................................................................................................ 67

2.2.3.3 Data Byte Ordering ......................................................................................................... 68

2.2.3.4 Byte-Reverse Instructions .............................................................................................. 69

2.3 Multithreading .................................................................................................................................. 70

2.3.1 Thread Identification .............................................................................................................. 70

2.3.1.1 Thread Identification Register (TIR) ............................................................................... 70

2.3.1.2 Processor Identification Register (PIR) .......................................................................... 70

2.3.1.3 Guest Processor Identification Register (GPIR) ............................................................. 71

2.3.2 Thread Run State ................................................................................................................... 71

2.3.2.1 Thread Stop I/O Pin ........................................................................................................ 71

2.3.2.2 Thread Control and Status Register (THRCTL) ............................................................. 71

2.3.2.3 Core Configuration Register 0 (CCR0) ........................................................................... 72

2.3.2.4 Thread Enable Register (TENS, TENC) ......................................................................... 72

2.3.2.5 Thread Enable Status Register (TENSR) ....................................................................... 73

2.3.3 Wake On Interrupt .................................................................................................................. 74

2.3.3.1 Core Configuration Register 1 (CCR1) ........................................................................... 74

2.3.4 Thread Priority ....................................................................................................................... 75

2.3.4.1 Program Priority Register (PPR32) ................................................................................ 75

2.3.4.2 Instruction Unit Configuration Register 1 (IUCR1) .......................................................... 77

2.3.5 Resources Shared between Threads .................................................................................... 77

2.3.6 Shared Resources ................................................................................................................. 77

2.3.6.1 Accessing Shared Resources ........................................................................................ 78

2.3.7 Duplicated Resources ............................................................................................................ 78

2.3.8 Pipeline Sharing ..................................................................................................................... 79

2.3.8.1 Instruction Cache ............................................................................................................ 80

2.3.8.2 Instruction Buffer and Decode Dependency ................................................................... 80

2.3.8.3 Instruction Issue ............................................................................................................. 80

2.3.8.4 Ram Unit ......................................................................................................................... 81

2.3.8.5 Microcode Unit ................................................................................................................ 82

2.3.8.6 Integer Unit ..................................................................................................................... 82

2.4 Registers ......................................................................................................................................... 82

2.4.1 Register Mapping ................................................................................................................... 84

2.4.2 Register Types ....................................................................................................................... 84

2.4.2.1 General Purpose Registers ............................................................................................ 84

2.4.2.2 Special Purpose Registers ............................................................................................. 84

2.4.2.3 Condition Register .......................................................................................................... 85

2.4.2.4 Machine State Register .................................................................................................. 85

2.5 32-Bit Mode ..................................................................................................................................... 85

2.5.1 64-Bit Specific Instructions ..................................................................................................... 85

2.5.2 32-Bit Instruction Selection .................................................................................................... 85


2.6 Instruction Categories ..................................................................................................................... 86

2.7 Instruction Classes .......................................................................................................................... 87

2.7.1 Defined Instruction Class ....................................................................................................... 87

2.7.2 Illegal Instruction Class .......................................................................................................... 88

2.7.3 Reserved Instruction Class .................................................................................................... 88

2.8 Implemented Instruction Set Summary ........................................................................................... 88

2.8.1 Integer Instructions ................................................................................................................ 89

2.8.1.1 Integer Storage Access Instructions ............................................................................... 89

2.8.1.2 Integer Arithmetic Instructions ........................................................................................ 91

2.8.1.3 Integer Logical Instructions ............................................................................................ 92

2.8.1.4 Integer Compare Instructions ......................................................................................... 92

2.8.1.5 Integer Trap Instructions ................................................................................................ 92

2.8.1.6 Integer Rotate Instructions ............................................................................................. 92

2.8.1.7 Integer Shift Instructions ................................................................................................. 93

2.8.1.8 Integer Population Count Instructions ............................................................................ 93

2.8.1.9 Integer Select Instruction ................................................................................................ 93

2.8.2 Branch Instructions ................................................................................................................ 94

2.8.3 Processor Control Instructions .............................................................................................. 94

2.8.3.1 Condition Register Logical Instructions .......................................................................... 94

2.8.3.2 Register Management Instructions ................................................................................. 95

2.8.3.3 System Linkage Instructions .......................................................................................... 95

2.8.3.4 Processor Control Instructions ....................................................................................... 95

2.8.4 Storage Control Instructions .................................................................................................. 95

2.8.4.1 Cache Management Instructions .................................................................................... 96

2.8.4.2 TLB Management Instructions ....................................................................................... 96

2.8.4.3 Processor Synchronization Instruction ........................................................................... 97

2.8.4.4 Load and Reserve and Store Conditional Instructions ................................................... 97

2.8.4.5 Storage Synchronization Instructions ............................................................................. 97

2.8.4.6 Wait Instruction ............................................................................................................... 98

2.8.5 Initiate Coprocessor Instructions ........................................................................................... 98

2.8.5.1 Cache Initialization Instructions ...................................................................................... 98

2.9 Branch Processing .......................................................................................................................... 99

2.9.1 Branch Addressing ................................................................................................................ 99

2.9.2 Branch Instruction BI Field .................................................................................................... 99

2.9.3 Branch Instruction BO Field ................................................................................................... 99

2.9.4 Branch Prediction ................................................................................................................ 100

2.9.4.1 Branch Decoder ........................................................................................................... 100

2.9.4.2 Branch Direction Prediction .......................................................................................... 101

2.9.4.3 Branch Prioritization ..................................................................................................... 104

2.9.4.4 Branch Target Prediction .............................................................................................. 104

2.9.4.5 Redirection ................................................................................................................... 105

2.9.5 Branch Control Registers .................................................................................................... 105

2.9.5.1 Link Register (LR) ........................................................................................................ 105

2.9.5.2 Count Register (CTR) ................................................................................................... 106

2.9.5.3 Condition Register (CR) ............................................................................................... 107

2.10 Integer Processing ...................................................................................................................... 110

2.10.1 General Purpose Registers (GPRs) .................................................................................. 110

2.10.2 Integer Exception Register (XER) ..................................................................................... 110

2.10.2.1 Summary Overflow (SO) Field ................................................................................... 112

2.10.2.2 Overflow (OV) Field .................................................................................................... 112


2.10.2.3 Carry (CA) Field .......................................................................................................... 112

2.10.2.4 Transfer Byte Count (TBC) Field ................................................................................ 113

2.11 Processor Control ........................................................................................................................ 113

2.11.1 Special Purpose Registers General (SPRG0–SPRG8) ..................................................... 114

2.11.2 External Process ID Load Context (EPLC) Register .......................................................... 119

2.11.3 External Process ID Store Context (EPSC) Register ......................................................... 119

2.12 Privileged Modes ......................................................................................................................... 120

2.12.1 Privileged Instructions ........................................................................................................ 121

2.12.1.1 Cache Locking Instructions ........................................................................................ 121

2.12.2 Privileged SPRs ................................................................................................................. 122

2.13 Speculative Accesses ................................................................................................................. 122

2.14 Synchronization ........................................................................................................................... 122

2.14.1 Context Synchronization .................................................................................................... 122

2.14.2 Execution Synchronization ................................................................................................. 124

2.14.3 Storage Ordering and Synchronization .............................................................................. 124

2.15 Software Transactional Memory Acceleration ............................................................................. 125

2.15.1 Summary ............................................................................................................................ 125

2.15.2 Implementation .................................................................................................................. 125

2.15.2.1 L1 D-Cache ................................................................................................................ 126

2.15.3 Watch Operation Ordering Requirements .......................................................................... 126

2.15.4 Impact on Existing Software .............................................................................................. 126

3. FU Programming Model .......................................................................................... 127

3.1 Storage Addressing ....................................................................................................................... 127

3.1.1 Storage Operands ................................................................................................................ 127

3.1.2 Effective Address Calculation .............................................................................................. 128

3.1.3 Data Storage Addressing Modes ......................................................................................... 128

3.2 Floating-Point Exceptions .............................................................................................................. 129

3.3 Floating-Point Registers ................................................................................................................ 129

3.3.1 Register Types ..................................................................................................................... 130

3.3.1.1 Floating-Point Registers (FPR0–FPR31) ..................................................................... 130

3.3.1.2 Floating-Point Status and Control Register (FPSCR) .................................................. 131

3.4 Floating-Point Data Formats ......................................................................................................... 133

3.4.1 Value Representation .......................................................................................................... 134

3.4.2 Binary Floating-Point Numbers ............................................................................................ 135

3.4.2.1 Normalized Numbers .................................................................................................... 135

3.4.2.2 Denormalized Numbers ................................................................................................ 136

3.4.2.3 Zero Values .................................................................................................................. 136

3.4.3 Infinities ................................................................................................................................ 136

3.4.3.1 Not a Numbers ............................................................................................................. 136

3.4.4 Sign of Result ....................................................................................................................... 137

3.4.5 Normalization and Denormalization ..................................................................................... 138

3.4.6 Data Handling and Precision ............................................................................................... 138

3.4.7 Rounding .............................................................................................................................. 139

3.5 Floating-Point Execution Models ................................................................................................... 140

3.5.1 Execution Model for IEEE Operations ................................................................................. 141

3.5.2 Execution Model for Multiply-Add Type Instructions ............................................................ 143

3.6 Floating-Point Instructions ............................................................................................................. 143

3.6.1 Instructions by Category ...................................................................................................... 144


3.6.2 Load and Store Instructions ................................................................................................. 145

3.6.3 Floating-Point Store Instructions ......................................................................................... 146

3.6.4 Floating-Point Move Instructions ......................................................................................... 148

3.6.5 Floating-Point Arithmetic Instructions .................................................................................. 148

3.6.5.1 Floating-Point Multiply-Add Instructions ....................................................................... 149

3.6.6 Floating-Point Rounding and Conversion Instructions ........................................................ 149

3.6.7 Floating-Point Compare Instructions ................................................................................... 150

3.6.8 Floating-Point Status and Control Register Instructions ...................................................... 151

4. Initialization ............................................................................................................. 153

4.1 Core Reset .................................................................................................................................... 153

4.2 A2 Core State After Reset ............................................................................................................. 154

4.3 Software Initiated Reset Requests ................................................................................................ 160

4.3.1 Software Reset Requests .................................................................................................... 160

4.3.1.1 From Debug ................................................................................................................. 161

4.3.1.2 From Watchdog Timer .................................................................................................. 161

4.3.2 Reset Request Status .......................................................................................................... 161

4.3.2.1 Debug Facility Reset Status ......................................................................................... 162

4.3.2.2 Timer Facility Reset Status .......................................................................................... 162

4.4 Initialization Software Requirements ............................................................................................. 163

5. Instruction and Data Caches ................................................................................. 169

5.1 Data Cache Array Organization and Operation ............................................................................ 169

5.2 Instruction Cache Array Organization and Operation ................................................................... 170

5.3 Cache Line Replacement Policy ................................................................................................... 170

5.4 Instruction Cache Controller .......................................................................................................... 170

5.4.1 ICC Operations .................................................................................................................... 171

5.4.2 Instruction Cache Coherency .............................................................................................. 171

5.4.2.1 Self-Modifying Code ..................................................................................................... 172

5.4.2.2 Instruction Cache Synonyms ........................................................................................ 172

5.4.3 Instruction Cache Control and Debug ................................................................................. 172

5.4.3.1 Instruction Cache Management and Debug Instruction Summary ............................... 172

5.4.3.2 Instruction Cache Parity Operations ............................................................................. 173

5.4.3.3 Simulating Instruction Cache Parity Errors for Software Testing ................................. 173

5.5 Data Cache Controller ................................................................................................................... 173

5.5.1 DCC Operations .................................................................................................................. 174

5.5.1.1 Load and Store Alignment ............................................................................................ 175

5.5.1.2 Load Operations ........................................................................................................... 175

5.5.1.3 Store Operations .......................................................................................................... 176

5.5.1.4 Data Read and Instruction Fetch Interface Requests .................................................. 176

5.5.1.5 Data Write Interface Requests ..................................................................................... 176

5.5.1.6 Storage Access Ordering ............................................................................................. 177

5.5.2 Data Cache Coherency ....................................................................................................... 177

5.5.3 Data Cache Control ............................................................................................................. 177

5.5.3.1 Data Cache Management Instruction Summary .......................................................... 177

5.5.3.2 dcbt and dcbtst Operation ............................................................................................ 178

5.5.3.3 Cache Locking Mechanisms ........................................................................................ 179

5.5.3.4 Data Cache Parity Operations ...................................................................................... 183

5.5.3.5 Simulating Data Cache Parity Errors for Software Testing .......................................... 183

5.5.3.6 Data Cache Disable ...................................................................................................... 183

6. Memory Management .............................................................................................. 185

6.1 MMU Overview .............................................................................................................................. 185

6.1.1 Support for Power ISA MMU Architecture ........................................................................... 186

6.2 Page Identification ......................................................................................................................... 186

6.2.1 Virtual Address Formation ................................................................................................... 187

6.2.2 Address Space Identifier Convention ................................................................................... 187

6.2.3 Exclusion Range (X-bit) Operation ...................................................................................... 188

6.2.4 TLB Match Process .............................................................................................................. 189

6.3 Address Translation ...................................................................................................................... 191

6.4 Access Control .............................................................................................................................. 193

6.4.1 Execute Access ................................................................................................................... 193

6.4.2 Write Access ........................................................................................................................ 193

6.4.3 Read Access ........................................................................................................................ 194

6.4.4 Access Control Applied to Cache Management Instructions ............................................... 194

6.5 Storage Attributes .......................................................................................................................... 195

6.5.1 Write-Through (W) ............................................................................................................... 196

6.5.2 Caching Inhibited (I) ............................................................................................................. 196

6.5.3 Memory Coherence Required (M) ....................................................................................... 196

6.5.4 Guarded (G) ......................................................................................................................... 196

6.5.5 Endian (E) ............................................................................................................................ 197

6.5.6 User-Definable (U0–U3) ...................................................................................................... 197

6.5.7 Supported Storage Attribute Combinations ......................................................................... 197

6.5.8 Aliasing ................................................................................................................................ 197

6.6 Translation Lookaside Buffer ......................................................................................................... 198

6.7 Effective to Real Address Translation Arrays ................................................................................ 203

6.7.1 ERAT Context Synchronization ........................................................................................... 204

6.7.2 ERAT Reset Behavior .......................................................................................................... 205

6.7.3 Atomic Update of ERAT Entries ........................................................................................... 205

6.7.4 ERAT LRU Round-Robin Replacement Mode ..................................................................... 205

6.7.5 ERAT LRU Replacement Watermark .................................................................................. 206

6.7.6 ERAT (TLB Lookaside Information) Coherency and Back-Invalidation ............................... 206

6.7.7 ERAT External PID (EPID) Context and Instruction Dependencies .................................... 208

6.8 Logical to Real Address Translation Array (Category E.HV.LRAT) .............................................. 209

6.9 TLB Management Instructions (Architected) ................................................................................. 212

6.9.1 TLB Read and Write Instructions (tlbre and tlbwe) ............................................................. 213

6.9.2 TLB Search Instruction (tlbsx[.]) ........................................................................................ 215

6.9.3 TLB Search and Reserve Instruction (tlbsrx.) .................................................................... 215

6.9.4 TLB Invalidate Virtual Address (Indexed) Instruction (tlbivax) ............................................ 216

6.9.5 TLB Invalidate Local (Indexed) Instruction (tlbilx) ............................................................... 218

6.9.6 TLB Sync Instruction (tlbsync) ............................................................................................ 218

6.10 ERAT Management Instructions (Non-Architected) .................................................................... 219

6.10.1 ERAT Read and Write Instructions (eratre and eratwe) ................................................... 219

6.10.2 ERAT Search Instruction (eratsx[.]) ................................................................................. 220

6.10.3 ERAT Invalidate Virtual Address (Indexed) Instruction (erativax) ..................................... 221

6.10.4 ERAT Invalidate Local (Indexed) Instruction (eratilx) ........................................................ 224

6.11 32-Bit Mode Memory Management Behavior .............................................................................. 224

6.11.1 32-Bit Mode TLB Read and Write Instructions (tlbre and tlbwe) ...................................... 225

6.11.2 32-Bit Mode TLB Search Instruction (tlbsx[.]) ................................................................. 225

6.11.3 32-Bit Mode TLB Search and Reserve Instruction (tlbsrx.) ............................................. 225

6.11.4 32-Bit Mode TLB Invalidate Virtual Address (Indexed) Instruction (tlbivax) ..................... 226

6.11.5 32-Bit Mode TLB Invalidate Local (Indexed) Instruction (tlbilx) ........................................ 226

6.11.6 32-Bit Mode TLB Sync Instruction (tlbsync) ..................................................................... 226

6.11.7 32-Bit Mode ERAT Read and Write Instructions (eratre and eratwe) .............................. 226

6.11.8 32-Bit Mode ERAT Search Instruction (eratsx[.]) ............................................................ 227

6.11.9 32-Bit Mode ERAT Invalidate Virtual Address (Indexed) Instruction (erativax) ................ 227

6.11.10 32-Bit Mode ERAT Invalidate Local (Indexed) Instruction (eratilx) ................................. 228

6.12 Page Reference and Change Status Management .................................................................... 228

6.13 TLB and ERAT Parity Operations ............................................................................................... 229

6.13.1 Parity Errors Generated from tlbre or eratre .................................................................... 230

6.13.2 Simulating TLB and ERAT Parity Errors for Software Testing .......................................... 231

6.14 ERAT-Only Mode Operation ....................................................................................................... 232

6.15 TLB Reservations and TLB Write Conditional (Category E.TWC) .............................................. 232

6.16 Hardware Page Table Walking (Category E.PT) ........................................................................ 237

6.16.1 Searching the TLB for Direct and Indirect Entries ............................................................. 237

6.16.2 Indirect TLB Entry Page and Sub-Page Sizes ................................................................... 238

6.16.3 Hardware Page Table Entry Format .................................................................................. 239

6.16.4 Calculation of Hardware Page Table Entry Real Address ................................................. 240

6.16.5 Hardware Page Table Errors and Exceptions ................................................................... 241

6.16.6 Hardware Page Table Storage Control Attributes ............................................................. 241

6.16.7 TLB Update After Hardware Page Table Translation ........................................................ 242

6.17 Storage Control Registers (Architected) ..................................................................................... 244

6.17.1 Process ID Register (PID) ................................................................................................. 244

6.17.2 Logical Partition ID Register (LPIDR) ................................................................................ 245

6.17.3 External PID Load Context (EPLC) Register ..................................................................... 246

6.17.4 External PID Store Context (EPSC) Register .................................................................... 247

6.17.5 MMU Assist Register 0 (MAS0) ......................................................................................... 248

6.17.6 MMU Assist Register 1 (MAS1) ......................................................................................... 249

6.17.7 MMU Assist Register 2 (MAS2) ......................................................................................... 251

6.17.8 MMU Assist Register 2 Upper (MAS2U) ........................................................................... 252

6.17.9 MMU Assist Register 3 (MAS3) ......................................................................................... 253

6.17.10 MMU Assist Register 4 (MAS4) ....................................................................................... 255

6.17.11 MMU Assist Register 5 (MAS5) ....................................................................................... 256

6.17.12 MMU Assist Register 6 (MAS6) ....................................................................................... 257

6.17.13 MMU Assist Register 7 (MAS7) ....................................................................................... 258

6.17.14 MMU Assist Register 8 (MAS8) ....................................................................................... 259

6.17.15 MAS0_MAS1 Register ..................................................................................................... 260

6.17.16 MAS5_MAS6 Register ..................................................................................................... 261

6.17.17 MAS7_MAS3 Register ..................................................................................................... 262

6.17.18 MAS8_MAS1 Register ..................................................................................................... 263

6.17.19 MMU Configuration Register (MMUCFG) ........................................................................ 264

6.17.20 MMU Control and Status Register 0 (MMUCSR0) .......................................................... 265

6.17.21 TLB 0 Configuration Register (TLB0CFG) ....................................................................... 266

6.17.22 TLB 0 Page Size Register (TLB0PS) .............................................................................. 268

6.17.23 LRAT Configuration Register (LRATCFG) ...................................................................... 269

6.17.24 LRAT Page Size Register (LRATPS) .............................................................................. 270

6.17.25 Embedded Page Table Configuration Register (EPTCFG) ............................................. 272

6.17.26 Logical Page Exception Register (LPER) ........................................................................ 273

6.17.27 Logical Page Exception Register Upper (LPERU) ........................................................... 274

6.17.28 MAS Register Update Summary ...................................................................................... 275

6.18 Storage Control Registers (Non-Architected) .............................................................................. 277

6.18.1 Memory Management Unit Control Register 0 (MMUCR0) ............................................... 277

6.18.2 Memory Management Unit Control Register 1 (MMUCR1) ............................................... 280

6.18.3 Memory Management Unit Control Register 2 (MMUCR2) ............................................... 287

6.18.4 Memory Management Unit Control Register 3 (MMUCR3) ............................................... 290

7. CPU Interrupts and Exceptions .............................................................................. 293

7.1 Overview ....................................................................................................................................... 293

7.2 Directed Interrupts ......................................................................................................................... 293

7.3 Interrupt Classes ........................................................................................................................... 294

7.3.1 Asynchronous Interrupts ...................................................................................................... 294

7.3.2 Synchronous Interrupts ........................................................................................................ 294

7.3.2.1 Synchronous, Precise Interrupts .................................................................................. 294

7.3.2.2 Synchronous, Imprecise Interrupts ............................................................................... 295

7.3.3 Critical and Noncritical Interrupts ......................................................................................... 296

7.3.4 Machine Check Interrupts .................................................................................................... 296

7.4 Interrupt Processing ...................................................................................................................... 297

7.4.1 Partially Executed Instructions ............................................................................................. 299

7.5 Interrupt Processing Registers ...................................................................................................... 300

7.5.1 Register Mapping ................................................................................................................. 301

7.5.2 Machine State Register (MSR) ............................................................................................ 301

7.5.3 Machine State Register Protect (MSRP) ............................................................................. 303

7.5.4 Embedded Processor Control Register (EPCR) .................................................................. 304

7.5.5 Save/Restore Register 0 (SRR0) ......................................................................................... 305

7.5.6 Save/Restore Register 1 (SRR1) ......................................................................................... 306

7.5.7 Guest Save/Restore Register 0 (GSRR0) ........................................................................... 308

7.5.8 Guest Save/Restore Register 1 (GSRR1) ........................................................................... 308

7.5.9 Critical Save/Restore Register 0 (CSRR0) .......................................................................... 310

7.5.10 Critical Save/Restore Register 1 (CSRR1) ........................................................................ 311

7.5.11 Machine Check Save/Restore Register 0 (MCSRR0) ....................................................... 313

7.5.12 Machine Check Save/Restore Register 1 (MCSRR1) ....................................................... 313

7.5.13 Data Exception Address Register (DEAR) ......................................................................... 315

7.5.14 Guest Data Exception Address Register (GDEAR) ........................................................... 316

7.5.15 Interrupt Vector Prefix Register (IVPR) .............................................................................. 318

7.5.16 Guest Interrupt Vector Prefix Register (GIVPR) ................................................................ 318

7.5.17 Exception Syndrome Register (ESR) ................................................................................. 318

7.5.18 Guest Exception Syndrome Register (GESR) ................................................................... 320

7.5.19 Machine Check Status Register (MCSR) ........................................................................... 322

7.6 Interrupt Definitions ....................................................................................................................... 323

7.6.1 Critical Input Interrupt ........................................................................................................... 326

7.6.2 Machine Check Interrupt ...................................................................................................... 327

7.6.2.1 Machine Check Status Register (MCSR) ..................................................................... 329

7.6.3 Data Storage Interrupt ......................................................................................................... 330

7.6.4 Instruction Storage Interrupt ................................................................................................ 334

7.6.5 External Input Interrupt ........................................................................................................ 336

7.6.6 Alignment Interrupt ............................................................................................................... 337

7.6.7 Program Interrupt ................................................................................................................. 338

7.6.8 Floating-Point Unavailable Interrupt .................................................................................... 342

7.6.9 System Call Interrupt ........................................................................................................... 342

7.6.10 Auxiliary Processor Unavailable Interrupt .......................................................................... 343

7.6.11 Decrementer Interrupt ....................................................................................................... 343

7.6.12 Fixed-Interval Timer Interrupt ............................................................................................ 344

7.6.13 Watchdog Timer Interrupt .................................................................................................. 344

7.6.14 Data TLB Error Interrupt .................................................................................................... 345

7.6.15 Instruction TLB Error Interrupt ........................................................................................... 346

7.6.16 Vector Unavailable Interrupt .............................................................................................. 347

7.6.17 Debug Interrupt .................................................................................................................. 347

7.6.18 Processor Doorbell Interrupt .............................................................................................. 351

7.6.19 Processor Doorbell Critical Interrupt .................................................................................. 352

7.6.20 Guest Processor Doorbell Interrupt ................................................................................... 352

7.6.21 Guest Processor Doorbell Critical Interrupt ....................................................................... 353

7.6.22 Guest Processor Doorbell Machine Check Interrupt ......................................................... 353

7.6.23 Embedded Hypervisor System Call Interrupt .................................................................... 354

7.6.24 Embedded Hypervisor Privilege Interrupt .......................................................................... 354

7.6.25 LRAT Error Interrupt .......................................................................................................... 355

7.6.26 User Decrementer Interrupt ............................................................................................... 356

7.6.27 Performance Monitor Interrupt ........................................................................................... 356

7.7 Processor Messages ..................................................................................................................... 357

7.7.1 Processor Message Handling and Filtering ......................................................................... 357

7.7.2 Doorbell Message Filtering .................................................................................................. 358

7.7.3 Doorbell Critical Message Filtering ...................................................................................... 359

7.7.4 Guest Doorbell Message Filtering ....................................................................................... 360

7.7.5 Guest Doorbell Critical Message Filtering ........................................................................... 360

7.7.6 Guest Doorbell Machine Check Message Filtering ............................................................. 361

7.8 Interrupt Ordering and Masking .................................................................................................... 362

7.8.1 Interrupt Ordering Software Requirements .......................................................................... 363

7.8.2 Interrupt Order ..................................................................................................................... 364

7.9 Exception Priorities ....................................................................................................................... 365

7.9.1 Exception Priorities for Integer Load, Store, and Cache Management Instructions ............ 366

7.9.2 Exception Priorities for Floating-Point Load and Store Instructions .................................... 367

7.9.3 Exception Priorities for Floating-Point Instructions (Other) .................................................. 367

7.9.4 Exception Priorities for Privileged Instructions .................................................................... 368

7.9.5 Exception Priorities for Trap Instructions ............................................................................. 368

7.9.6 Exception Priorities for System Call Instruction ................................................................... 368

7.9.7 Exception Priorities for Branch Instructions ......................................................................... 369

7.9.8 Exception Priorities for Return From Interrupt Instructions .................................................. 369

7.9.9 Exception Priorities for Reserved Instructions ..................................................................... 369

7.9.10 Exception Priorities for All Other Instructions .................................................................... 370

8. FU Interrupts and Exceptions ................................................................................ 371

8.1 Floating-Point Exceptions ............................................................................................................. 371

8.2 Exceptions List .............................................................................................................................. 372

8.3 Floating-Point Interrupts ................................................................................................................ 375

8.3.1 Floating-Point Unavailable Interrupt .................................................................................... 375

8.3.2 Floating-Point Assist Interrupt ............................................................................................. 375

8.4 Floating-Point Exception Behavior ................................................................................................ 375

8.4.1 Invalid Operation Exception ................................................................................................. 375

8.4.1.1 Action ............................................................................................................................ 376

8.4.2 Zero Divide Exception .......................................................................................................... 377

8.4.2.1 Action ............................................................................................................................ 377

8.4.3 Overflow Exception .............................................................................................................. 378

8.4.3.1 Action ............................................................................................................................ 378

8.4.4 Underflow Exception ............................................................................................................ 379

8.4.4.1 Action ............................................................................................................................ 379

8.4.5 Inexact Exception ................................................................................................................. 380

8.4.5.1 Action ............................................................................................................................ 380

8.5 Exception Priorities for Floating-Point Load and Store Instructions .............................................. 380

8.6 Exception Priorities for Other Floating-Point Instructions .............................................................. 381

8.7 QNaN ............................................................................................................................................ 381

8.8 Updating FPRs on Exceptions ...................................................................................................... 382

8.9 Floating-Point Status and Control Register (FPSCR) ................................................................... 382

8.10 Updating the Condition Register ................................................................................................. 385

8.10.1 Condition Register (CR) ..................................................................................................... 385

8.10.2 Updating CR Fields ............................................................................................................ 386

8.10.3 Generation of QNaN Results ............................................................................................. 386

9. Timer Facilities ........................................................................................................ 387

9.1 Time Base ..................................................................................................................................... 388

9.1.1 Reading the Time Base ....................................................................................................... 389

9.1.2 Writing the Time Base .......................................................................................................... 389

9.2 Decrementer (DEC) ....................................................................................................................... 389

9.3 User Decrementer (UDEC) ........................................................................................................... 391

9.4 Fixed Interval Timer (FIT) .............................................................................................................. 392

9.5 Watchdog Timer ............................................................................................................................ 393

9.6 Timer Control Register (TCR) ....................................................................................................... 395

9.7 Timer Status Register (TSR) ......................................................................................................... 397

9.8 Freezing the Timer Facilities ......................................................................................................... 397

9.9 Selection of the Timer Clock Source ............................................................................................. 398

9.10 Synchronizing Timers Across Multiple Cores .............................................................................. 398

10. Debug Facilities ..................................................................................................... 399

10.1 Implications of Hypervisor on Debug Controls ............................................................................ 399

10.2 Support for Development Tools ................................................................................................... 399

10.3 Debug Modes .............................................................................................................................. 399

10.3.1 Internal Debug Mode ......................................................................................................... 400

10.3.2 External Debug Mode ........................................................................................................ 400

10.3.3 Trace Debug Mode ............................................................................................................ 401

10.4 Debug Events .............................................................................................................................. 402

10.4.1 Instruction Address Compare (IAC) Debug Event ............................................................. 402

10.4.1.1 IAC Debug Event Fields ............................................................................................. 403

10.4.1.2 IAC Debug Event Processing ..................................................................................... 404

10.4.2 Data Address Compare (DAC) Debug Event ..................................................................... 405

10.4.2.1 DAC Debug Event Fields ............................................................................................ 405

10.4.2.2 DAC Debug Event Processing ................................................................................... 407

10.4.2.3 DAC Debug Events Applied to Instructions that Result in Multiple Storage Accesses ..... 407

10.4.2.4 DAC Debug Events Applied to Various Instruction Types ......................................... 408

10.4.3 Data Value Compare (DVC) Debug Event ........................................................................ 409

10.4.3.1 DVC Debug Event Fields ........................................................................................... 409

10.4.3.2 DVC Debug Event Processing ................................................................................... 410

10.4.3.3 DVC Debug Events Applied to Instructions that Result in Multiple Storage Accesses ..... 410

10.4.3.4 DVC Debug Events Applied to Various Instruction Types ......................................... 411

10.4.3.5 DVC Debug Events Applied to Floating-Point Loads and Stores ............................... 411

10.4.4 Instruction Complete (ICMP) Debug Event ....................................................................... 411

10.4.5 Branch Taken (BRT) Debug Event .................................................................................... 412

10.4.6 Trap (TRAP) Debug Event ................................................................................................ 412

10.4.7 Return (RET) Debug Event ............................................................................................... 412

10.4.8 Interrupt (IRPT) Debug Event ............................................................................................ 413

10.4.9 Unconditional Debug Event (UDE) .................................................................................... 414

10.4.10 Instruction Value Compare (IVC) Debug Event ............................................................... 414

10.4.11 Debug Event Summary ................................................................................................... 415

10.5 Debug Reset ............................................................................................................................... 415

10.6 Debug Timer Freeze ................................................................................................................... 415

10.7 Debug Registers ......................................................................................................................... 415

10.7.1 Debug Control Register 0 (DBCR0) .................................................................................. 416

10.7.2 Debug Control Register 1 (DBCR1) .................................................................................. 418

10.7.3 Debug Control Register 2 (DBCR2) .................................................................................. 419

10.7.4 Debug Control Register 3 (DBCR3) .................................................................................. 421

10.7.5 Debug Status Register (DBSR) ........................................................................................ 422

10.7.6 Debug Status Register Write Register (DBSRWR) ........................................................... 423

10.7.7 Instruction Address Compare Registers (IAC1–IAC4) ...................................................... 425

10.7.8 Data Address Compare Registers (DAC1–DAC2) ............................................................ 426

10.7.9 Data Value Compare Registers (DVC1–DVC2) ................................................................ 427

10.7.10 Instruction Address Register (IAR) .................................................................................. 428

10.7.11 Instruction Match Mask Registers (IMMR) ...................................................................... 429

10.7.12 Instruction Match Registers (IMR) ................................................................................... 429

10.8 Instruction Stuffing ...................................................................................................................... 429

10.8.1 Ram Mode Overview ......................................................................................................... 430

10.8.2 Ram Register Descriptions ................................................................................................ 431

10.8.3 Example Ram Mode Procedures ....................................................................................... 434

10.8.3.1 SPR Read/Write Using GPR as Temporary Storage ................................................. 434

10.8.3.2 Using Microcode Scratch Registers as Temporary Storage ...................................... 435

10.8.4 Supported Ram Instructions .............................................................................................. 436

10.9 Direct Access to I-Cache and D-Cache Directories .................................................................... 437

10.9.1 General Read D-Cache Directory Sequence for L1 D-Cache ........................................... 437

10.9.2 Instruction Unit Debug Register 0 (IUDBG0) ..................................................................... 438

10.9.3 Instruction Unit Debug Register 1 (IUDBG1) ..................................................................... 439

10.9.4 Instruction Unit Debug Register 2 (IUDBG2) ..................................................................... 439

10.9.5 Execution Unit Debug Register 0 (XUDBG0) .................................................................... 440

10.9.6 Execution Unit Debug Register 1 (XUDBG1) .................................................................... 440

10.9.7 Execution Unit Debug Register 2 (XUDBG2) .................................................................... 441

10.10 Thread Control and Status ........................................................................................................ 441

10.10.1 Using THRCTL Register to Stop Thread 0 ...................................................................... 443

10.10.2 Using THRCTL Register to Start Thread 0 ...................................................................... 443

10.10.3 Using THRCTL Register to Instruction Step Thread 0 .................................................... 443

10.11 PC Configuration Register 0 (PCCR0) ...................................................................................... 444

10.12 Trace and Trigger Bus ............................................................................................................... 445

10.12.1 Trace and Trigger Bus Overview ..................................................................................... 445

10.12.2 Unit Level Trace and Trigger Bus Implementation ........................................................... 446

10.12.3 Debug Select Registers ................................................................................................... 447

11. Performance Events and Event Selection ........................................................... 449

11.1 Event Bus Overview .................................................................................................................... 449

11.2 A2 Core Event Bus and PC Unit Controls ................................................................................... 450

11.2.1 Enabling Performance Event and Trace Bus Latches ....................................................... 450

11.2.2 Performance Analysis Operating Modes ........................................................................... 450

11.2.3 Core Performance Event Selection to External Event Bus ................................................ 450

11.2.4 Core Event Select Register (CESR) .................................................................................. 452

11.3 Unit Level Performance Event Selection ..................................................................................... 454

11.3.1 Unit Event Multiplexer Component .................................................................................... 454

11.3.2 Performance Monitor Event Tags and Count Modes ......................................................... 456

11.3.3 Unit Performance Event Tables ......................................................................................... 457

11.4 Unit Performance Event Tables .................................................................................................. 458

11.4.1 FU Performance Events Table ........................................................................................... 458

11.4.2 IU Performance Events Table ............................................................................................ 458

11.4.3 XU Performance Events Table .......................................................................................... 460

11.4.4 LSU Performance Events Table ........................................................................................ 462

11.4.5 MMU Performance Events Table ....................................................................................... 465

11.5 Unit Event Select Registers ......................................................................................................... 466

11.5.1 FU Event Select Register (AESR) ..................................................................................... 466

11.5.2 IU Event Select Registers .................................................................................................. 468

11.5.3 XU Event Select Registers ................................................................................................. 470

11.5.4 LSU Event Select Registers ............................................................................................... 472

11.5.5 MMU Event Select Registers ............................................................................................. 474

11.6 A2 Support for Core Instruction Trace ......................................................................................... 476

11.6.1 Instruction Trace Mode Setup ............................................................................................ 476

11.6.2 Instruction Trace Record Data ........................................................................................... 476

11.6.3 Instruction Trace Record Formats and Ordering ............................................................... 477

11.6.4 Debug Bus Control When in Instruction Trace Mode ......................................................... 478

11.6.4.1 FU Trace Records ...................................................................................................... 479

11.6.4.2 XU Debug Bus Control ............................................................................................... 479

11.7 A2 Support for Instruction Sampling ............................................................................................ 479

12. Implementation Dependent Instructions ............................................................. 481

12.1 Miscellaneous .............................................................................................................................. 481

12.1.1 Attention (attn) ................................................................................................................... 481

12.2 TLB Management Instructions .................................................................................................... 482

12.2.1 TLB Read Entry (tlbre) ...................................................................................................... 482

12.2.2 TLB Write Entry (tlbwe) ..................................................................................................... 484

12.2.3 TLB Search Indexed (tlbsx[.]) ........................................................................................... 486

12.2.4 TLB Search and Reserve Indexed (tlbsrx.) ....................................................................... 488

12.2.5 TLB Invalidate Virtual Address Indexed (tlbivax) .............................................................. 490

12.2.6 TLB Invalidate Local Indexed (tlbilx) ................................................................................. 493

12.3 ERAT Management Instructions ................................................................................................. 496

12.3.1 ERAT Read Entry (eratre) ................................................................................................. 496

12.3.2 ERAT Write Entry (eratwe) ............................................................................................... 499

12.3.3 ERAT Search Indexed (eratsx[.]) ..................................................................................... 502

12.3.4 ERAT Invalidate Virtual Address Indexed (erativax) ........................................................ 504

12.3.5 ERAT Invalidate Local Indexed (eratilx) ........................................................................... 507

12.4 Software Transactional Memory Instructions .............................................................................. 509

12.4.1 Load Doubleword and Watch Indexed X-Form (ldawx.) ................................................... 510

12.4.2 Watch Check All X-Form (wchkall) ................................................................................... 511

12.4.3 Watch Clear X-Form (wclr) ............................................................................................... 512

12.5 Coprocessor Instructions ............................................................................................................ 513

12.5.1 Initiate Coprocessor Store Word Indexed (icswx[.]) ......................................................... 515

12.5.1.1 General Registers ...................................................................................................... 516

12.5.1.2 Initial Execution .......................................................................................................... 517

12.5.2 Initiate Coprocessor Store Word External Process ID Indexed (icswepx[.]) .................... 518

12.5.3 Execution ........................................................................................................................... 518

12.5.3.1 Condition Register 0 ................................................................................................... 519

12.5.4 Coprocessor-Request Block .............................................................................................. 520

12.5.4.1 Available Coprocessor Register (ACOP) ................................................................... 520

12.5.4.2 Hypervisor Available Coprocessor Register (HACOP) ............................................... 521

12.6 Data Cache Block Flush .............................................................................................................. 523

12.6.1 Data Cache Block Flush (dcbf) ......................................................................................... 523

12.7 Data Cache Block Flush by External PID .................................................................................... 524

12.7.1 Data Cache Block Flush by External PID (dcbfep) ........................................................... 524

13. Power Management Methods .............................................................................. 525

13.1 Chip Power Management Controls ............................................................................................. 525

13.2 Power-Saving Instructions .......................................................................................................... 525

13.2.1 Power-Saving Instruction Sequence ................................................................................. 526

14. Register Summary ................................................................................................ 529

14.1 Register Categories .................................................................................................................... 529

14.2 Reserved Fields .......................................................................................................................... 535

14.3 Unimplemented SPRs ................................................................................................................. 535

14.4 Device Control Registers ............................................................................................................ 535

14.5 Alphabetical Register Listing ....................................................................................................... 537

14.5.1 ACOP - Available Coprocessor ......................................................................................... 538

14.5.2 AESR - AXU Event Select Register ................................................................................... 539

14.5.3 CCR0 - Core Configuration Register 0 .............................................................................. 541

14.5.4 CCR1 - Core Configuration Register 1 .............................................................................. 542

14.5.5 CCR2 - Core Configuration Register 2 .............................................................................. 543

14.5.6 CCR3 - Core Configuration Register 3 .............................................................................. 545

14.5.7 CESR - Core Event Select Register .................................................................................. 546

14.5.8 CR - Condition Register ..................................................................................................... 549

14.5.9 CSRR0 - Critical Save/Restore Register 0 ........................................................................ 550

14.5.10 CSRR1 - Critical Save/Restore Register 1 ...................................................................... 551

14.5.11 CTR - Count Register ...................................................................................................... 553

14.5.12 DAC1 - Data Address Compare 1 ................................................................................... 554

14.5.13 DAC2 - Data Address Compare 2 ................................................................................... 555

14.5.14 DAC3 - Data Address Compare 3 ................................................................................... 556

14.5.15 DAC4 - Data Address Compare 4 .................................................................................... 557

14.5.16 DBCR0 - Debug Control Register 0 ................................................................................. 558

14.5.17 DBCR1 - Debug Control Register 1 ................................................................................. 560

14.5.18 DBCR2 - Debug Control Register 2 ................................................................................. 562

14.5.19 DBCR3 - Debug Control Register 3 ................................................................................. 564

14.5.20 DBSR - Debug Status Register ........................................................................................ 565

14.5.21 DBSRWR - Debug Status Register Write Register .......................................................... 567

14.5.22 DEAR - Data Exception Address Register ....................................................................... 569

14.5.23 DEC - Decrementer ......................................................................................................... 570

14.5.24 DECAR - Decrementer Auto-Reload ............................................................................... 571

14.5.25 DVC1 - Data Value Compare 1 ........................................................................................ 572

14.5.26 DVC2 - Data Value Compare 2 ........................................................................................ 573

14.5.27 EPCR - Embedded Processor Control Register .............................................................. 574

14.5.28 EPLC - External Process ID Load Context ...................................................................... 576

14.5.29 EPSC - External Process ID Store Context ..................................................................... 577

14.5.30 EPTCFG - Embedded Page Table Configuration Register .............................................. 578

14.5.31 ESR - Exception Syndrome Register ............................................................................... 579

14.5.32 GDEAR - Guest Data Exception Address Register ......................................................... 581

14.5.33 GESR - Guest Exception Syndrome Register ................................................................. 582

14.5.34 GIVPR - Guest Interrupt Vector Prefix Register ............................................................... 584

14.5.35 GPIR - Guest Processor ID Register ............................................................................... 585

14.5.36 GSPRG0 - Guest Software Special Purpose Register 0 ................................................. 586

14.5.37 GSPRG1 - Guest Software Special Purpose Register 1 ................................................. 587

14.5.38 GSPRG2 - Guest Software Special Purpose Register 2 ................................................. 588

14.5.39 GSPRG3 - Guest Software Special Purpose Register 3 ................................................. 589

14.5.40 GSRR0 - Guest Save/Restore Register 0 ........................................................................ 590

14.5.41 GSRR1 - Guest Save/Restore Register 1 ........................................................................ 591

14.5.42 HACOP - Hypervisor Available Coprocessor ................................................................. 593

14.5.43 IAC1 - Instruction Address Compare 1 ............................................................................ 594

14.5.44 IAC2 - Instruction Address Compare 2 ............................................................................ 595

14.5.45 IAC3 - Instruction Address Compare 3 ............................................................................ 596

14.5.46 IAC4 - Instruction Address Compare 4 ............................................................................ 597

14.5.47 IAR - Instruction Address Register ................................................................................... 598

14.5.48 IESR1 - IU Event Select Register 1 ................................................................................. 599

14.5.49 IESR2 - IU Event Select Register 2 ................................................................................. 600

14.5.50 IMMR - Instruction Match Mask Register ......................................................................... 601

14.5.51 IMPDEP0 - Implementation Dependent Region 0 ........................................................... 602

14.5.52 IMPDEP1 - Implementation Dependent Region 1 ........................................................... 603

14.5.53 IMR - Instruction Match Register ..................................................................................... 604

14.5.54 IUCR0 - Instruction Unit Configuration Register 0 ........................................................... 605

14.5.55 IUCR1 - Instruction Unit Configuration Register 1 ........................................................... 606

14.5.56 IUCR2 - Instruction Unit Configuration Register 2 ........................................................... 607

14.5.57 IUDBG0 - Instruction Unit Debug Register 0 ................................................................... 608

14.5.58 IUDBG1 - Instruction Unit Debug Register 1 ................................................................... 609

14.5.59 IUDBG2 - Instruction Unit Debug Register 2 ................................................................... 610

14.5.60 IULFSR - Instruction Unit LFSR ....................................................................................... 611

14.5.61 IULLCR - Instruction Unit Live Lock Control Register ...................................................... 612

14.5.62 IVPR - Interrupt Vector Prefix Register ............................................................................ 613

14.5.63 LPER - Logical Page Exception Register ........................................................................ 614

14.5.64 LPERU - Logical Page Exception Register (Upper) ......................................................... 615

14.5.65 LPIDR - Logical Partition ID Register .............................................................................. 616

14.5.66 LR - Link Register ............................................................................................................ 617

14.5.67 LRATCFG - LRAT Configuration Register ....................................................................... 618

14.5.68 LRATPS - LRAT Page Size Register .............................................................................. 619

14.5.69 MAS0 - MMU Assist Register 0 ....................................................................................... 620

14.5.70 MAS0_MAS1 - MMU Assist Registers 0 and 1 ............................................................... 621

14.5.71 MAS1 - MMU Assist Register 1 ....................................................................................... 622

14.5.72 MAS2 - MMU Assist Register 2 ....................................................................................... 624

14.5.73 MAS2U - MMU Assist Register 2 (Upper) ....................................................................... 625

14.5.74 MAS3 - MMU Assist Register 3 ....................................................................................... 626

14.5.75 MAS4 - MMU Assist Register 4 ....................................................................................... 628

14.5.76 MAS5 - MMU Assist Register 5 ....................................................................................... 629

14.5.77 MAS5_MAS6 - MMU Assist Registers 5 and 6 ............................................................... 630

14.5.78 MAS6 - MMU Assist Register 6 ....................................................................................... 631

14.5.79 MAS7 - MMU Assist Register 7 ....................................................................................... 632

14.5.80 MAS7_MAS3 - MMU Assist Registers 7 and 3 ............................................................... 633

14.5.81 MAS8 - MMU Assist Register 8 ....................................................................................... 634

14.5.82 MAS8_MAS1 - MMU Assist Registers 8 and 1 ............................................................... 635

14.5.83 MCSR - Machine Check Syndrome Register .................................................................. 636

14.5.84 MCSRR0 - Machine Check Save/Restore Register 0 ..................................................... 638

14.5.85 MCSRR1 - Machine Check Save/Restore Register 1 ..................................................... 639

14.5.86 MESR1 - MMU Event Select Register 1 .......................................................................... 641

14.5.87 MESR2 - MMU Event Select Register 2 .......................................................................... 642

14.5.88 MMUCFG - MMU Configuration Register ........................................................................ 643

14.5.89 MMUCR0 - Memory Management Unit Control Register 0 ............................................. 644

14.5.90 MMUCR1 - Memory Management Unit Control Register 1 ............................................. 645

14.5.91 MMUCR2 - Memory Management Unit Control Register 2 ............................................. 647

14.5.92 MMUCR3 - Memory Management Unit Control Register 3 ............................................. 649

14.5.93 MMUCSR0 - MMU Control and Status Register 0 .......................................................... 650

14.5.94 MSR - Machine State Register ........................................................................................ 651

14.5.95 MSRP - Machine State Register Protect ......................................................................... 653

14.5.96 PID - Process ID .............................................................................................................. 654

14.5.97 PIR - Processor ID Register ............................................................................................ 655

14.5.98 PPR32 - Program Priority Register .................................................................................. 656

14.5.99 PVR - Processor Version Register .................................................................................. 657

14.5.100 SPRG0 - Software Special Purpose Register 0 ............................................................ 658

14.5.101 SPRG1 - Software Special Purpose Register 1 ............................................................ 659

14.5.102 SPRG2 - Software Special Purpose Register 2 ............................................................ 660

14.5.103 SPRG3 - Software Special Purpose Register 3 ............................................................ 661

14.5.104 SPRG4 - Software Special Purpose Register 4 ............................................................ 662

14.5.105 SPRG5 - Software Special Purpose Register 5 ............................................................ 663

14.5.106 SPRG6 - Software Special Purpose Register 6 ............................................................ 664

14.5.107 SPRG7 - Software Special Purpose Register 7 ............................................................ 665

14.5.108 SPRG8 - Software Special Purpose Register 8 ............................................................ 666

14.5.109 SRR0 - Save/Restore Register 0 ................................................................................... 667

14.5.110 SRR1 - Save/Restore Register 1 ................................................................................... 668

14.5.111 TB - Timebase ............................................................................................................... 670

14.5.112 TBL - Timebase Lower .................................................................................................. 671

14.5.113 TBU - Timebase Upper .................................................................................................. 672

14.5.114 TCR - Timer Control Register ........................................................................................ 673

14.5.115 TENC - Thread Enable Clear Register .......................................................................... 675

14.5.116 TENS - Thread Enable Set Register .............................................................................. 676

14.5.117 TENSR - Thread Enable Status Register ...................................................................... 677

14.5.118 TIR - Thread Identification Register ............................................................................... 678

14.5.119 TLB0CFG - TLB 0 Configuration Register ..................................................................... 679

14.5.120 TLB0PS - TLB 0 Page Size Register ............................................................................. 680

14.5.121 TRACE - Hardware Trace Macro Control Register ........................................................ 681

14.5.122 TSR - Timer Status Register .......................................................................................... 682

14.5.123 UDEC - User Decrementer ............................................................................................ 683

14.5.124 VRSAVE - Vector Register Save ................................................................................... 684

14.5.125 XER - Fixed Point Exception Register ........................................................................... 685

14.5.126 XESR1 - XU Event Select Register 1 ............................................................................ 686

14.5.127 XESR2 - XU Event Select Register 2 ............................................................................ 687

14.5.128 XESR3 - XU Event Select Register 3 ............................................................................ 688

14.5.129 XESR4 - XU Event Select Register 4 ............................................................................ 689

14.5.130 XUCR0 - Execution Unit Configuration Register 0 ......................................................... 690

14.5.131 XUCR1 - Execution Unit Configuration Register 1 ......................................................... 693

14.5.132 XUCR2 - Execution Unit Configuration Register 2 ......................................................... 694

14.5.133 XUCR3 - Execution Unit Configuration Register 3 ......................................................... 695

14.5.134 XUCR4 - Execution Unit Configuration Register 4 ......................................................... 696

14.5.135 XUDBG0 - Execution Unit Debug Register 0 ................................................................. 697

14.5.136 XUDBG1 - Execution Unit Debug Register 1 ................................................................. 698

14.5.137 XUDBG2 - Execution Unit Debug Register 2 ................................................................. 699

15. SCOM Accessible Registers ................................................................................. 701

15.1 Serial Communications (SCOM) Description .............................................................................. 701

15.2 SCOM Register Summary ........................................................................................................... 703

15.2.1 Read and Write Access Methods ....................................................................................... 703

15.2.1.1 Reset with AND Mask ................................................................................................. 703

15.2.1.2 Set with OR Mask ....................................................................................................... 703

15.2.2 SCOM Register Summary Table ....................................................................................... 703

15.3 Alphabetical Register Listing ....................................................................................................... 705

15.3.1 AXU Debug Select Register (ABDSR) ............................................................................... 705

15.3.2 Error Injection Register (ERRINJ) ...................................................................................... 706

15.3.3 Fault Isolation Register 0 and Associated Registers ......................................................... 707

15.3.4 Fault Isolation Register 1 and Associated Registers ......................................................... 711

15.3.5 Fault Isolation Register 2 and Associated Registers ......................................................... 716

15.3.6 IU Debug Select Register (IDSR) ...................................................................................... 720

15.3.7 MMU/PC Debug Select Register (MPDSR) ....................................................................... 723

15.3.8 PC Configuration Register 0 (PCCR0) ............................................................................... 725

15.3.9 Ram Data Registers (RAMD, RAMDH, RAMDL) ............................................................... 726

15.3.10 Ram Instruction and Command Registers (RAMC, RAMI, RAMIC) ................................ 727

15.3.11 Special Attention Register (SPATTN) .............................................................................. 729

15.3.12 Thread Control and Status Register (THRCTL) ............................................................... 730

15.3.13 XU Debug Select Register1 (XDSR1) .............................................................................. 731

15.3.14 XU Debug Select Register2 (XDSR2) .............................................................................. 734

Appendix A. Processor Instruction Summary ......................................................... 737

A.1 Instruction Formats ....................................................................................................................... 737

A.2 Implemented Instructions Sorted by Mnemonic ............................................................................ 737

Appendix B. FU Instruction Summary ...................................................................... 756

B.1 FU Instructions Sorted by Opcode ................................................................................................ 756

Appendix C. Debug and Trigger Groups .................................................................. 761

C.1 Unit Debug Multiplexer Component .............................................................................................. 761

C.2 Debug Multiplexer Component Ordering on the Ramp Bus ......................................................... 761

C.3 Example Debug Multiplexer Configuration Settings ..................................................................... 762

C.3.1 Multiplexer Configuration for Trace/Trigger Signals from a Single Unit .............................. 762

C.3.2 Multiplexer Configuration for Trace/Trigger Signals from Multiple Units ............................. 762

C.4 AXU Debug Select Register and Debug Group Tables ................................................................ 763

C.5 IU Debug Select Register and Debug Group Tables .................................................................... 766

C.6 MMU and PC Debug Select Register and Debug Group Tables .................................................. 778

C.7 XU Debug Select Register1 and Debug Group Tables ................................................................ 798

C.8 XU Debug Select Register2 and Debug Group Tables ................................................................ 817

Appendix D. Instruction Execution Performance and Code Optimizations .......... 833

D.1 A2 Pipeline Overview ................................................................................................................... 833

D.1.1 Arbitration Stages ............................................................................................................... 834

D.1.2 Stall Stages ......................................................................................................................... 835

D.1.3 Flush Stages ....................................................................................................................... 835

D.2 Fetch ............................................................................................................................................. 835

D.2.1 Fetch Arbitration .................................................................................................................. 837

D.2.2 Next Instruction Fetch Address Computation ..................................................................... 837

D.2.3 Instruction Cache Access and Alignment ........................................................................... 837

D.2.4 Instruction Cache Misses .................................................................................................... 837

D.2.5 I-ERAT Misses .................................................................................................................... 838

D.2.6 Instruction Buffer Operation ................................................................................................ 838

D.2.7 Branches and Branch Prediction ........................................................................................ 838

D.2.7.1 Branch Direction Prediction and the Branch History Table (BHT) ............................... 840

D.2.7.2 Taken-Branch Redirection ........................................................................................... 840

D.2.7.3 Branch Target Prediction ............................................................................................. 840

D.2.7.4 Branch Resolution and Mispredictions ........................................................................ 841

D.3 Instruction Issue Operation ........................................................................................................... 841

D.4 Instruction Pair Execution Performance Rules ............................................................................. 841

D.4.1 Defining Latency, Penalty, and Execution Time ................................................................. 841

D.4.2 Unified CR Dependency ..................................................................................................... 842

D.4.3 General CR Operand Dependency ..................................................................................... 842

D.4.4 Move To Condition Register Fields (mtcrf) Instruction Dependency ................................... 843

D.4.5 Move From Condition Register (mfcr) Instruction Dependency .......................................... 843

D.4.6 Move From and Move To Special Purpose Register (mfspr) Dependency ......................... 843

D.4.7 Move From Machine State Register (mfmsr) Dependency ................................................. 843

D.4.8 Multiply Dependency ........................................................................................................... 843

D.4.9 Divide Dependency ............................................................................................................. 844

D.4.10 Store Word Conditional Indexed (stwcx.) Instruction Dependency ................................... 844

D.4.11 TLB Management Instruction Dependencies .................................................................... 845

D.4.12 Processor Control Instruction Operation ........................................................................... 845

D.4.13 Load Instruction Dependency ............................................................................................ 846

D.4.14 String/Multiple Operations ................................................................................................. 846

D.4.15 Load-and-Reserve and Store-Conditional Instructions ..................................................... 846

D.4.16 Storage Synchronization Operations ................................................................................. 847

D.5 Loads, Stores, and Data Cache Organization .............................................................................. 847

D.5.1 Overview ............................................................................................................................. 847

D.5.2 Loads ................................................................................................................................... 848

D.5.3 Stores .................................................................................................................................. 848

D.5.4 Load Miss Queue ................................................................................................................ 849

D.5.5 L2 Command Arbitration ..................................................................................................... 849

D.5.6 D-ERAT Misses ................................................................................................................... 849

D.5.7 Back Invalidations ............................................................................................................... 849

D.5.8 Address Alignment .............................................................................................................. 849

D.6 Interrupt Effects ............................................................................................................................ 850

D.7 Floating-Point Instruction Handling ............................................................................................... 850

D.7.1 General FPR Operand Dependency ................................................................................... 852

D.7.2 Denormalized Results ......................................................................................................... 852

D.7.3 Denormalized Operands ..................................................................................................... 852

D.7.4 Not a Number (NaN) Cases ................................................................................................ 852

D.7.5 Floating-Point Load Dependency ........................................................................................ 852

D.7.6 Floating-Point Store Data Dependency ............................................................................... 852

D.7.7 General CR Operand Dependency ..................................................................................... 853

D.7.8 Floating-Point Divide Dependency ...................................................................................... 853

D.7.9 Floating-Point Square Root Dependency ............................................................................ 853

D.7.10 Move to Condition Register from Floating-Point Status and Control Register Dependency .... 853

D.7.11 Move to FPSCR Fields and FPSCR Dependencies .......................................................... 854

D.7.12 Floating-Point Record Forms ............................................................................................ 854

D.8 Interrupt Conditions ...................................................................................................................... 854

D.9 Flush Conditions ........................................................................................................................... 858

Appendix E. Programming Examples ........................................................................ 861

E.1 Wait Instruction with Fast Wakeup for Power Savings ................................................................. 861

E.2 Floating-Point Conversions ........................................................................................................... 861

E.2.1 Conversion from Floating-Point Number to Signed Integer Word ....................................... 861

E.2.2 Conversion from Floating-Point Number to Unsigned Integer Word ................................... 862

E.3 Floating-Point Selection ................................................................................................................ 862

E.3.1 Comparison to Zero ............................................................................................................. 863

E.3.2 Minimum and Maximum ...................................................................................................... 863

E.3.3 Simple If-Then-Else Constructions ...................................................................................... 863

E.4 Notes ............................................................................................................................................. 863

List of Figures

Figure 1-1. A2 Core Organization ............................................................................................................. 50

Figure 1-2. A2 Processor Block Diagram ................................................................................................. 56

Figure 2-1. A2 Core Instruction Unit ......................................................................................................... 79

Figure 2-2. Instruction Issue Timing Diagram 1 ........................................................................................ 80

Figure 2-3. Instruction Issue Timing Diagram 2 ........................................................................................ 81

Figure 2-4. Instruction Issue Timing Diagram 3 ........................................................................................ 81

Figure 2-5. User Programming Model Registers ...................................................................................... 83

Figure 3-1. Approximation to Real Numbers .......................................................................................... 135

Figure 3-2. Selection of z1 and z2 .......................................................................................................... 140

Figure 4-1. Software-Initiated Reset Request Overview ........................................................................ 163

Figure 6-1. Virtual Address to TLB Entry Match Process ....................................................................... 190

Figure 6-2. Effective-to-Real Address Translation Flow ......................................................................... 192

Figure 6-3. ERAT Entry Word Definitions ............................................................................................... 220

Figure 6-4. ERAT Entry Word Definitions for 32-Bit Mode ..................................................................... 227

Figure 6-5. Indirect Entry to Page Table Size Calculation ...................................................................... 238

Figure 6-6. Page Table Entry Format ..................................................................................................... 239

Figure 9-1. Relationship of Timer Facilities to the Time Base ................................................................ 387

Figure 9-2. Watchdog State Machine ..................................................................................................... 395

Figure 10-1. Pass-Through Trace and Trigger Bus Overview .................................................................. 446

Figure 10-2. Trace and Trigger Bus Unit Description ............................................................................... 447

Figure 11-1. Performance Event Selection Overview ............................................................................... 449

Figure 11-2. Core Event Multiplexer Description ...................................................................................... 451

Figure 11-3. A2 Common Unit Event Multiplexer Component .................................................................. 456

Figure 12-1. ICSWX (RS32:63) Coprocessor-Command Word ................................................................. 517

Figure 12-2. Coprocessor Command Word (CCW) .................................................................................. 518

Figure 12-3. Generic Coprocessor-Request Block ................................................................................... 520

Figure 15-1. Chip Level Infrastructure Example to Access SCOM Registers in the A2 Core .................. 702

Figure 15-2. Principle Timing of Information Carried on CCH and DCH .................................................. 702

Figure C-1. Debug Multiplexer Component ............................................................................................. 761

Figure D-1. A2 Pipeline Structure ........................................................................................................... 833

Figure D-2. Instruction Cache ................................................................................................................. 836

Figure D-3. Branch Prediction ................................................................................................................. 839

Figure D-4. FU Dataflow ......................................................................................................................... 851

List of Tables

Table 2-1. Data Operand Definitions ....................................................................................................... 63

Table 2-2. Alignment Effects for Storage Access Instructions ................................................................ 63

Table 2-3. Priority Levels ......................................................................................................................... 76

Table 2-4. Other “or” Instruction Hints ..................................................................................................... 76

Table 2-5. Program Priority Register (PPR32) ........................................................................................ 76

Table 2-6. Register Mapping ................................................................................................................... 84

Table 2-7. Category Listing ..................................................................................................................... 86

Table 2-8. Instruction Categories ............................................................................................................ 89

Table 2-9. Integer Storage Access Instructions ...................................................................................... 90

Table 2-10. Integer Storage Access Instructions by External Process ID ................................................. 90

Table 2-11. Operand Handling Dependent on Alignment ......................................................................... 90

Table 2-12. Integer Arithmetic Instructions ................................................................................................ 91

Table 2-13. Integer Logical Instructions .................................................................................................... 92

Table 2-14. Integer Compare Instructions ................................................................................................. 92

Table 2-15. Integer Trap Instructions ........................................................................................................ 92

Table 2-16. Integer Rotate Instructions ..................................................................................................... 93

Table 2-17. Integer Shift Instructions ........................................................................................................ 93

Table 2-18. Integer Population Count Instructions .................................................................................... 93

Table 2-19. Integer Select Instruction ....................................................................................................... 93

Table 2-20. Branch Instructions ................................................................................................................ 94

Table 2-21. Condition Register Logical Instructions .................................................................................. 94

Table 2-22. Register Management Instructions ........................................................................................ 95

Table 2-23. System Linkage Instructions .................................................................................................. 95

Table 2-24. Processor Control Instruction ................................................................................................. 95

Table 2-25. Cache Management Instructions ........................................................................................... 96

Table 2-26. Cache Management Instructions by External Process ID ...................................................... 96

Table 2-27. TLB Management Instructions ............................................................................................... 96

Table 2-28. Processor Synchronization Instruction ................................................................................... 97

Table 2-29. Load and Reserve and Store Conditional Instructions ........................................................... 97

Table 2-30. Storage Synchronization Instructions ..................................................................................... 97

Table 2-31. Wait Instruction ...................................................................................................................... 98

Table 2-32. Initiate Coprocessor Instructions ............................................................................................ 98

Table 2-33. Cache Initialization Instructions .............................................................................................. 98

Table 2-34. BO Field Encodings ............................................................................................................. 100

Table 2-35. ‘at’ Bit Encodings .................................................................................................................. 100

Table 2-36. CR Updating Instructions ..................................................................................................... 108

Table 2-37. GPR Registers ..................................................................................................................... 110

Table 2-38. XER[SO,OV] Updating Instructions ...................................................................................... 111

Table 2-39. XER[CA] Updating Instructions ............................................................................................ 111

Table 2-40. SPRG0 Register ................................................................................................................... 114

Table 2-41. SPRG1 Register ................................................................................................................... 114

Table 2-42. SPRG2 Register ................................................................................................................... 115

Table 2-43. SPRG3 Register ................................................................................................................... 115

Table 2-44. SPRG4 Register ................................................................................................................... 115

Table 2-45. SPRG5 Register ................................................................................................................... 116

Table 2-46. SPRG6 Register ................................................................................................................... 116

Table 2-47. SPRG7 Register ................................................................................................................... 116

Table 2-48. SPRG8 Register ................................................................................................................... 117

Table 2-49. GSPRG0 Register ................................................................................................................ 117

Table 2-50. GSPRG1 Register ................................................................................................................ 117

Table 2-51. GSPRG2 Register ................................................................................................................ 118

Table 2-52. GSPRG3 Register ................................................................................................................ 118

Table 2-53. Privileged Instructions .......................................................................................................... 121

Table 3-1. Data Operand Definitions ..................................................................................................... 128

Table 3-2. Invalid Operation Exception Categories ............................................................................... 129

Table 3-3. Floating-Point Registers (FPR0–FPR31) ............................................................................. 130

Table 3-4. Floating-Point Status and Control Register (FPSCR) ........................................................... 131

Table 3-5. Floating-Point Single Format ................................................................................................ 134

Table 3-6. Floating-Point Double Format ............................................................................................... 134

Table 3-7. Format Fields ........................................................................................................................ 134

Table 3-8. IEEE 754 Floating-Point Fields ............................................................................................. 134

Table 3-9. Rounding Modes .................................................................................................................. 140

Table 3-10. IEEE 64-Bit Execution Model ............................................................................................... 141

Table 3-11. Interpretation of the G, R, and X Bits .................................................................................... 141

Table 3-12. Location of the Guard, Round, and Sticky Bits in the IEEE Execution Model ...................... 142

Table 3-13. Multiply-Add 64-Bit Execution Model .................................................................................... 143

Table 3-14. Location of Guard, Round, and Sticky Bits in the Multiply-Add Execution Model ................. 143

Table 3-15. Floating-Point Load Instructions ........................................................................................... 146

Table 3-16. Floating-Point Store Instructions .......................................................................................... 147

Table 3-17. Floating-Point Move Instructions .......................................................................................... 148

Table 3-18. Floating-Point Elementary Arithmetic Instructions ................................................................ 148

Table 3-19. Floating-Point Multiply-Add Instructions ............................................................................... 149

Table 3-20. Floating-Point Rounding and Conversion Instructions ......................................................... 150

Table 3-21. Comparison Sets ..................................................................................................................150

Table 3-22. Floating-Point Compare and Select Instructions .................................................................. 151

Table 3-23. Floating-Point Status and Control Register Instructions .......................................................151

Table 4-1. Register Reset Values ..........................................................................................................155

Table 4-2. Shadow TLB Array Entry Initialization .................................................................................. 158

Table 5-1. Data Cache Array Organization ........................................................................................... 169

Table 5-2. Cache Size and Parameters ................................................................................................ 169

Table 5-3. Instruction Cache Array Organization .................................................................................. 170

Table 5-4. Cache Size and Parameters ................................................................................................ 170

Table 5-5. XUCR Bits ............................................................................................................................ 183

Table 6-1. Page Size and Effective Address to EPN Comparison ........................................................ 191

Table 6-2. Page Size and Real Address Formation .............................................................................. 192

Table 6-3. Access Control Applied to Cache Management Instructions ............................................... 194

Table 6-4. TLB Entry Fields ................................................................................................................... 199

Table 6-5. ERAT Class Field Reload Value For UTLB Hits .................................................................. 208

Table 6-6. LRAT Entry Fields ................................................................................................................ 211

Table 6-7. TLB Management Instruction Privilege Levels ..................................................................... 212

Table 6-8. TLB Congruence Class Hashing Function (of EPN Address Bits) ....................................... 214

Table 6-9. Supported EPN[27:51] Field Values in Downbound TLBIVAX Request .............................. 218

Table 6-10. ERAT Management Instruction Privilege Levels .................................................................. 219

Table 6-11. Summary of Supported IS Field Values in ERATIVAX ........................................................ 222

Table 6-12. Supported EPN[27:51] Field Values in Downbound erativax Request ............................... 224

Table 6-13. TLB Reservation Fields ........................................................................................................ 233

Table 6-14. TLB Update After Page Table Translation ........................................................................... 242

Table 6-15. MAS Register Update Summary .......................................................................................... 275

Table 7-1. Register Mapping in Guest State ......................................................................................... 301

Table 7-2. Interrupt Types and Associated Offsets ............................................................................... 316

Table 7-3. Interrupt and Exception Types ............................................................................................. 323

Table 8-1. Invalid Operation Exception Categories ............................................................................... 372

Table 8-2. MSR[FE0, FE1] Modes ........................................................................................................ 374

Table 8-3. Invalid Operation Exceptions ............................................................................................... 376

Table 8-4. QNaN Result ........................................................................................................................ 381

Table 8-5. FPSCR[FPRF] Result Flags ................................................................................................. 382

Table 8-6. Floating-Point Status and Control Register (FPSCR) .......................................................... 383

Table 8-7. Bit Encodings for a CR Field ................................................................................................ 386

Table 9-1. Timebase Register (TB) ....................................................................................................... 388

Table 9-2. Timebase Lower Register (TBL) .......................................................................................... 388

Table 9-3. Timebase Upper Register (TBU) .......................................................................................... 389

Table 9-4. Decrementer Register (DEC) ............................................................................................... 390

Table 9-5. Decrementer Auto-Reload Register (DECAR) ..................................................................... 390

Table 9-6. Fixed Interval Timer Period Selection .................................................................................. 392

Table 9-7. Watchdog Timer Period Selection ........................................................................................ 393

Table 9-8. Watchdog Timer Exception Behavior ................................................................................... 394

Table 10-1. PCCR0[DBA] (Debug Action) Definition per Thread ............................................................ 400

Table 10-2. Debug Events .......................................................................................................................402

Table 10-3. Debug Event Summary ........................................................................................................415

Table 10-4. Ram Instruction and Command Register (RAMIC) ..............................................................431

Table 10-5. Ram Instruction Register (RAMI) ..........................................................................................431

Table 10-6. Ram Command Register (RAMC) ........................................................................................431

Table 10-7. Ram Data Register (RAMD) .................................................................................................433

Table 10-8. Ram Data Register High (RAMDH) ......................................................................................433

Table 10-9. Ram Data Register Low (RAMDL) ....................................................................................... 434

Table 10-10. Thread Control and Status Register (THRCTL) ...................................................................442

Table 10-11. PC Configuration Register 0 (PCCR0) .................................................................................444

Table 11-1. Core Event Multiplexer to External Event Bus ......................................................................451

Table 11-2. Performance Monitor Event Tags .........................................................................................457

Table 11-3. FU Performance Events Table .............................................................................................458

Table 11-4. IU Performance Events Table ..............................................................................................458

Table 11-5. XU Performance Events Table .............................................................................................460

Table 11-6. LSU Performance Events Table ...........................................................................................462

Table 11-7. MMU Performance Events Table .........................................................................................465

Table 11-8. Core Instruction Trace Data and Control Signals .................................................................477

Table 11-9. First Instruction Trace Record Format ..................................................................................477

Table 11-10. Format of Subsequent Instruction Trace Records ................................................................478

Table 11-11. Trace Record Type Decode and Instruction Trace Record Ordering ...................................478

Table 14-1. Register Summary ................................................................................................................530

Table 15-1. SCOM Register Summary ....................................................................................................703

Table 15-2. Error Injection Register .........................................................................................................706

Table 15-3. Fault Isolation Register 0 (FIR0) ........................................................................................... 708

Table 15-4. FIR0 Action1 Register (FIR0A1) ...........................................................................................709

Table 15-5. FIR0 Mask Register (FIR0M) ................................................................................................710

Table 15-6. FIR0 and FIR1 Registers (Read Only) ................................................................................. 711

Table 15-7. Fault Isolation Register 1 ......................................................................................................711

Table 15-8. FIR1 Action0 Register (FIR1A0) ...........................................................................................713

Table 15-9. FIR1 Action1 Register (FIR1A1) ...........................................................................................714

Table 15-10. FIR1 Mask Register (FIR1M) ................................................................................................714

Table 15-11. Fault Isolation Register 2 (FIR2) ........................................................................................... 716

Table 15-12. FIR2 Action0 Register (FIR2A0) ...........................................................................................717

Table 15-13. FIR2 Action1 Register (FIR2A1) ...........................................................................................718

Table 15-14. FIR2 Mask Register (FIR2M) ................................................................................................720

Table 15-15. PC Configuration Register 0 (PCCR0) .................................................................................725

Table 15-16. Ram Data Register (RAMD) .................................................................................................726

Table 15-17. Ram Data Register High (RAMDH) ...................................................................................... 726

Table 15-18. Ram Data Register Low (RAMDL) ....................................................................................... 727

Table 15-19. Ram Command Register (RAMC) ........................................................................................ 727

Table 15-20. Ram Instruction Register (RAMI) ......................................................................................... 729

Table 15-21. Ram Instruction and Command Register (RAMIC) .............................................................. 729

Table 15-22. Special Attention Register .................................................................................................... 729

Table 15-23. Thread Control and Status Register (THRCTL) ................................................................... 730

Table A-1. A2 Core Instructions by Mnemonic ...................................................................................... 738

Table B-1. FU Instructions by Opcode ................................................................................................... 756

Table C-1. AXU Debug Select Register (ADBSR) ................................................................................. 763

Table C-2. AXU Debug Multiplexer Debug and Trigger Groups ............................................................ 764

Table C-3. IU Debug Select Register (IDSR) ......................................................................................... 766

Table C-4. IU Debug Mux1 Debug and Trigger Groups ........................................................................ 768

Table C-5. IU Debug Mux2 Debug and Trigger Groups ........................................................................ 774

Table C-6. MMU and PC Debug Select Register (MPDSR) .................................................................. 778

Table C-7. MMU Debug Multiplexer Debug and Trigger Groups ........................................................... 781

Table C-8. PC Debug Multiplexer Debug and Trigger Groups .............................................................. 796

Table C-9. XU Debug Select Register1 (XDSR1) .................................................................................. 798

Table C-10. XU Debug Mux1 Debug and Trigger Groups ....................................................................... 800

Table C-11. XU Debug Mux2 Debug and Trigger Groups ....................................................................... 807

Table C-12. XU Debug Select Register2 (XDSR2) .................................................................................. 817

Table C-13. XU Debug Mux3 Debug and Trigger Groups ....................................................................... 819

Table C-14. XU Debug Mux4 Debug and Trigger Groups ....................................................................... 830

Table D-1. Multiply Instructions and Their Associated Latency ............................................................. 844

Table D-2. Divide Instructions and Their Associated Latency ............................................................... 844

Table D-3. SRAM Operations ................................................................................................................ 847

Table D-4. Interrupt Conditions .............................................................................................................. 854

Table D-5. Flush Conditions .................................................................................................................. 858

Revision Log

Each release of this document supersedes all previously released versions. The revision log lists all significant changes made to the document since its initial release. In the rest of the document, change bars in the margin indicate that the adjacent text was modified from the previous release of this document.

Revision Date       Pages   Description
October 23, 2012    657     Version 1.3. Updated Section 14.5.99 PVR - Processor Version Register.
                            Removed “IBM Confidential.”
May 25, 2011        —       Version 1.2.
April 1, 2011               Version 1.1.
                    518     Added a programming note to Section 12.5.3 Execution.
                    90      Revised Table 2-11 Operand Handling Dependent on Alignment.
December 15, 2010   —       Version 1.0. Initial release.

About This Book

This user’s manual provides the architectural overview, programming model, and detailed information about the instruction set, registers, and other facilities of the IBM® Power ISA A2 64-bit embedded processor core.

The A2 embedded controller core features:

• Power ISA Architecture

• Concurrent-issue pipeline with dynamic branch prediction

• Separate instruction and data caches, 16 KB each

• Memory management unit (MMU) with a 512-entry translation lookaside buffer (TLB)

• 4 TB (42-bit) physical address capability

• 128-bit reload interface and 128-bit store interface

• ANSI/IEEE 754-1985 compliant floating-point¹

• Single-precision and double-precision operation in hardware

• Auxiliary execution unit (AXU) that executes the Power ISA floating-point instruction set

• Super-pipelined: Single cycle throughput for most instructions

• In-order execution and completion

¹ Power ISA FUs require software support for IEEE compliance.

Who Should Use This Book

This book is for system hardware and software developers and for application developers who need to understand the A2 core. The audience should understand embedded system design, operating systems, RISC microprocessing, and computer organization and architecture.

How to Use This Book

This book describes the A2 core device architecture, programming model, registers, and instruction set. This book contains the following chapters:

• Overview on page 45

• CPU Programming Model on page 61

• FU Programming Model on page 127

• Initialization on page 153

• Instruction and Data Caches on page 169

• Memory Management on page 185

• CPU Interrupts and Exceptions on page 293

• FU Interrupts and Exceptions on page 371

• Timer Facilities on page 387

• Debug Facilities on page 399

• Performance Events and Event Selection on page 449

• Implementation Dependent Instructions on page 481

• Power Management Methods on page 525

• Register Summary on page 529

• SCOM Accessible Registers on page 701

This book contains the following appendixes:

• Processor Instruction Summary on page 737

• FU Instruction Summary on page 756

• Debug and Trigger Groups on page 761

• Instruction Execution Performance and Code Optimizations on page 833

• Programming Examples on page 861

Notation

The manual uses the following notational conventions:

• Active low signals are shown with an overbar (Active_Low).

• All numbers are decimal unless specified in some special way.

• 0bnnnn means a number expressed in binary format.

• 0xnnnn means a number expressed in hexadecimal format.

Underscores might be used between digits.

• RA refers to General Purpose Register (GPR) RA.

• (RA) refers to the contents of GPR RA.

• (RA|0) refers to the contents of GPR RA or to the value 0 if the RA field is 0.

• Bits in registers, instructions, and fields are specified as follows.

• Bits are numbered most-significant bit to least-significant bit, starting with bit 0.

• X_p means bit p of register, instruction, or field X (the subscript is written here after an underscore).

• X_p:q means bits p through q of a register, instruction, or field X.

• X_p,q,... means bits p, q,... of a register, instruction, or field X.

• X[p] means a named field p of register X.

• X[p:q] means named fields p through q of register X.

• X[p,q,...] means named fields p, q,... of register X.

• ¬X means the ones complement of the contents of X.

• A period (.) as the last character of an instruction mnemonic means that the instruction records status information in certain fields of the Condition Register as a side effect of execution, as described in Section 12 Implementation Dependent Instructions on page 481.

• The symbol || is used to describe the concatenation of two values. For example, 0b010 || 0b111 is the same as 0b010111.

• xⁿ means x raised to the n power.

• ⁿx means the replication of x, n times (that is, x concatenated to itself n – 1 times). ⁿ0 and ⁿ1 are special cases:

  – ⁿ0 means a field of n bits with each bit equal to 0. Thus ⁵0 is equivalent to 0b00000.

  – ⁿ1 means a field of n bits with each bit equal to 1. Thus ⁵1 is equivalent to 0b11111.

• /, //, ///,... denotes a reserved field in an instruction or in a register.

• ? denotes an allocated bit in a register.

• A shaded field denotes a field that is reserved or allocated in an instruction or in a register.
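
The bit-numbering, concatenation, and replication conventions listed above can be illustrated with a short Python sketch. The helper names (bit, bits, concat, replicate) and the 64-bit default width are assumptions made for this illustration only; they are not defined by this manual or by any Power ISA tooling.

```python
def bit(x, p, width=64):
    """Return bit p of a width-bit value x; bit 0 is the most-significant bit."""
    return (x >> (width - 1 - p)) & 1


def bits(x, p, q, width=64):
    """Return bits p through q (inclusive, p <= q) of x, using MSB-0 numbering."""
    length = q - p + 1
    return (x >> (width - 1 - q)) & ((1 << length) - 1)


def concat(a, a_width, b, b_width):
    """Concatenate a || b; returns a (value, width) pair."""
    return ((a << b_width) | b, a_width + b_width)


def replicate(x, x_width, n):
    """Replicate an x_width-bit value n times; returns a (value, width) pair."""
    value = 0
    for _ in range(n):
        value = (value << x_width) | x
    return (value, x_width * n)


if __name__ == "__main__":
    reg = 0x8000_0000_0000_0001      # underscores between digits, as the notation allows
    print(bit(reg, 0))               # most-significant bit of a 64-bit value
    print(bit(reg, 63))              # least-significant bit of a 64-bit value
    print(bits(reg, 60, 63))         # the four low-order bits
    print(concat(0b010, 3, 0b111, 3) == (0b010111, 6))
    print(replicate(1, 1, 5) == (0b11111, 5))
```

Widths are carried alongside values because leading zero bits are significant in a concatenation such as 0b010 || 0b111, and a bare Python integer does not record them.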

Related Publications

• Power ISA User Set Architecture (Book I, Version 2.06)

• Power ISA Virtual Environment Architecture (Book II, Version 2.06)

• Power ISA Operating Environment Architecture (Book III-E, Version 2.06)

The Power ISA specifications are available at www.power.org

List of Acronyms and Abbreviations

ABIST automatic built-in self test

ALU arithmetic logic unit

ANSI American National Standards Institute

ARE auto-reload enable

AS address space

ATB alternate time base category

attn attention

AXU auxiliary execution unit

B base category

BCLR branch conditional to Link Register

BE big endian

BHT branch history table

BP branch prediction

BRDCAST broadcast

BRT branch taken

BTA branch target address

CA carry

CAM content addressable memory

CC congruence class

CCH control channel

CCW coprocessor command word

CD coprocessor directive

CEE change exception enable

CI coprocessor instance

CIA current instruction address

CPU central processing unit

CRB coprocessor-request block

CS cache specification category

CSB control status block

CSI context synchronizing instruction

DAC data address compare

DBA debug action

DBELL doorbell interrupt

DCC data cache controller

DCH data channel

DCI data cache invalidate instruction

DCR device control register

DEA data effective address

DEC decrementer

D-ERAT data ERAT

DERRDET D-ERAT error detect

DFP decimal floating-point category

DL downlink

DRA data real address

DSI data storage interrupt

DSP digital signal processor

DVC data value compare

E endian or embedded category

E.CD embedded.cache debug category

E.CI embedded.cache initialization category

E.DC embedded.device control category

E.ED embedded.enhanced debug category

E.HV embedded.hypervisor category

E.LE embedded.little-endian category

E.PC embedded.processor control category

E.PD embedded.external PID category

E.PM embedded.performance monitor category

E.PT embedded.page table category

E.TWC embedded.tlb write conditional category

EA effective address

ECC error-correcting code

ECL embedded cache locking category

EDM external debug mode

EEN error entry number

EH exclusive access hint

EM embedded multithreading category

EM.TM embedded multithreading.thread management category

EPID external PID

EPLC external process ID load context

EPN effective page number

EPR external problem state bit

EPSC external process ID store context

ERAT effective to real address translation

ESID effective segment ID

EVPR Exception Vector Prefix Register

EXC external control category

EXP external proxy category

FE floating-point equal

FG floating-point greater than

FIFO first-in, first-out

FIR fault isolation register

FIT fixed interval timer

FL floating-point less than

FP floating-point category

FP.R floating-point.record category

FPR floating-point register

FU floating-point unit

FXU fixed-point unit

G guarded

GB gigabyte

GB/sec gigabytes per second

GHz gigahertz

GPR general purpose register

GS guest state

HTM hardware trace macro

HWT hardware table walker

I caching inhibited

I/O input/output

IAC instruction address compare

IBUFF instruction buffer

ICC instruction cache controller

ICI instruction cache invalidate instruction

ICMP instruction complete

IDE imprecise debug event

IEA instruction effective address

IEEE Institute of Electrical and Electronics Engineers

I-ERAT instruction ERAT

IERRDET I-ERAT error detect

IFAR Instruction Fetch Address Register

IND indirect

INSTTRACE instruction trace mode

IR intermediate result

IRPT interrupt

IS instruction fetch address space OR invalidation select

ISA instruction set architecture

ISI instruction storage interrupt

IU instruction unit

IU0 - IU6 instruction unit pipeline stage

IVC instruction value compare

JTAG Joint Test Action Group

KB kilobyte

L1 level 1

L2 level 2

LA logical address

LBIST logic built-in self-test

LE little endian

LIFO last-in, first-out

LMA legacy integer multiply-accumulate category

LMQ load miss queue

LMV legacy move assist category

LPID logical partition identifier

LPIDTAG LPID tag

LPN logical page number

LRAT logical to real address translation

LRU least recently used

LSb least significant bit

LSB least significant byte

LSQ load/store quadword category

LSU load/store unit

M memory coherence required

MA move assist category

MAS MMU assist

MAV MMU Architecture version

MB megabyte

MESI modified, exclusive, shared, invalid

MHz megahertz

MMC memory coherence category

MMU memory management unit

MSB most significant byte

MSRP Machine State Register protect

MT multithread

NaN Not a Number

NAND not AND

NH next higher in magnitude

NIA next instruction address

NL next lower in magnitude

NOR not OR

OV overflow

OX overflow exception

PC processor control

PCB pervasive control bus

PCR processor compatibility category

PIB pervasive interconnect bus

PID process ID

PIRTAG PIR tag

PME power-management

PMU performance monitor unit

POR power-on reset

PS page size specified by PTE

PTE page table entry

QNaN quiet NaN

RA real address

RAW read-after-write

REE reference exception enable

RET return

RISC reduced instruction set computing

RMT replacement management table

RO read only

ROM read-only memory

RPN real page number

S server category

S.PM server.performance monitor category

S.RPTA server.relaxed page table alignment category

SAO strong access order category

SCOM serial communications

SCPM store conditional page mobility category

SEM sequential execution model

SER soft error rate

SIMD single instruction, multiple data

SLB segment lookaside buffer

SNaN signalling NaN

SO summary overflow

SOC system-on-a-chip

SP signal processing engine category

SPE signal processing engine

SP.FD SPE.embedded float scalar double category

SP.FS SPE.embedded float scalar single category

SP.FV SPE.embedded float vector category

SPR Special Purpose Register

SPRN special purpose register number

SPRG Special Purpose Registers General

SR supervisor mode read access

SRAM static random access memory

STM stream category

SW supervisor mode write access

SX supervisor mode execution access

TB terabyte

TBC transfer byte count

TBL time base lower

TBU time base upper

TERRDET TLB error detect

TGS translation guest space identifier

TID translation ID

TLB translation lookaside buffer

TLPID translation logical partition identifier

TRC trace category

TS translation space identifier

UC microcode unit or uncorrectable error

uCode microcode

UCT unavailable coprocessor type

UDE unconditional debug event

UDEC user decrementer

UE underflow exception

UL uplink

UND undefined

UR user mode read access

UTLB unified translation lookaside buffer

UW user mode write access

UX underflow exception or user mode execution access

V vector category

V.LE little-endian category

VA virtual addresses

VF virtualization fault

VHDL very-high-speed integrated circuit (VHSIC) hardware description language

VLE variable length encoding category

VLPT virtual linear page table

VPN virtual page number

VSID virtual segment ID

VSX vector-scalar extension category

VX invalid operation exception

W write-through

WAW write-after-write

WC wake control or write to clear

WDT watchdog timer

WIMGE write-through, caching-inhibited, memory coherency required, guarded, and endianness attributes

WP watchdog timer period

WS write to set

WT wait category

XOR exclusive OR

XU execution unit

ZX zero divide exception

1. Overview

The IBM Power ISA A2 64-bit embedded processor core is an implementation of the scalable and flexible Power ISA architecture. The A2 core implements four simultaneous threads of execution within the core. Each thread of execution can be viewed as a processor within a 4-way multiprocessor with shared dataflow. This gives the effective appearance of four independent processing units to software. The performance of the four threads is limited because they share some resources such as the L1 and L2 caches.

The floating-point unit interfaces to the A2 processor core and incorporates a 6-stage arithmetic pipeline. The pipeline enables one arithmetic instruction to be issued during each cycle. Floating-point instructions execute with 6-cycle latency and 1-cycle throughput, except for operations on denormalized operands, division, and square root.
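The latency and throughput figures above can be turned into a rough cycle estimate. The following is a minimal sketch of an idealized model only: it assumes a single thread, no stalls, and none of the excepted cases (denormalized operands, division, square root). The function names are illustrative and are not part of any A2 tool or API.

```python
# Idealized timing model; the two constants come from the paragraph above.
FP_LATENCY_CYCLES = 6   # cycles from issue to result
FP_ISSUE_INTERVAL = 1   # one arithmetic instruction can issue per cycle


def independent_ops_cycles(n: int) -> int:
    """Cycles to finish n back-to-back FP operations with no data dependencies."""
    if n <= 0:
        return 0
    return FP_LATENCY_CYCLES + (n - 1) * FP_ISSUE_INTERVAL


def dependent_chain_cycles(n: int) -> int:
    """Cycles to finish n FP operations where each one needs the previous result."""
    return max(n, 0) * FP_LATENCY_CYCLES


print(independent_ops_cycles(8))  # 13
print(dependent_chain_cycles(8))  # 48
```

Under this model, a dependent chain pays the full latency on every operation, which is one reason interleaving independent work (for example, from the other threads) helps keep the pipeline busy.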

1.1 A2 Core Key Design Fundamentals

The key design fundamentals of the A2 core are the following:

• 64-bit implementation of the Power ISA Version 2.06 Book III-E - Embedded Platform Environment.

– The A2 core provides binary compatibility for IBM PowerPC® application level code (problem state).

– The A2 core implements the Embedded Hypervisor Architecture to provide secure compute domains and operating system virtualization.

• The A2 core is optimized for aggregate throughput.

– 4-way, fine-grained simultaneous multithreaded.

– 2-way concurrent issue. One branch/integer/load/store + one AXU (FP/vector).

– In-order dispatch and execution.

– 27 FO4 design.

• The A2 core is a modular design to support reuse.

– The A2 core provides a general purpose coprocessor (AXU) port to attach unique AXUs.

  • AXUs have full ISA flexibility.

  • AXUs currently include:

    – FU - Power ISA V2.06 scalar double-precision floating-point unit.

  • The AXU is an optional unit.

– The A2 core provides for an optional MMU unit.

  • The MMU unit supports Power ISA V2.06 Book III-E Memory Management (MAV 2.0).

  • Without the MMU, the A2 core supports the software-managed ERATs defined in this document.

– The A2 core provides for an optional microcode engine and ROM.

  • Power ISA V2.06 Book I and II instructions are supported with a combination of microcoded instructions and hardware implemented instructions.

1.2 A2 Core Features

The A2 core is a high-performance, low-power engine that implements the flexible and powerful Power ISA Architecture.

The A2 core contains a single-issue, in-order, pipelined processing unit, along with other functional elements required by embedded product specifications. These other functions include memory management, cache control, timers, and debug facilities. Interfaces for custom coprocessors and floating-point functions are provided. The processor interface is 128 bits for reads and 128 bits for writes (256 bits for writes in an optional version of the A2) and provides the framework to efficiently support system-on-a-chip (SOC) designs.

A2 core features include:

• High-performance, concurrent-issue, 64-bit RISC CPU

• 4-way, fine-grained simultaneous multithreaded implementation of the full 64-bit Power ISA Architecture

– One outstanding I-fetch request to the L2 cache per thread

– One 8-entry instruction fetch buffer per thread

– Up to four instructions can be placed in the instruction buffer per cycle

– Up to one instruction can be taken out of the instruction buffer per cycle per thread

– Instruction decode and dependency per thread

• Two-way concurrent instruction decode and issue

• In-order dispatch, execution, and completion

• High-accuracy dynamic branch prediction

–81024 entry branch history table with 2 bits of history

– Four-entry link stack per thread

• Highly-pipelined microarchitecture

– Full GPR bypass

– Full CR bypass

– Link Register bypass

• Single unified pipeline

• Complex integer, system, branch, simple integer, and load/store pipelines

• 5-port (3-read, 2-write) 32 × 4 × 64-bit General Purpose Register (GPR) file

• Hardware support for all CPU misaligned accesses (except for lmw and stmw)

• Full support for both big- and little-endian byte ordering

• Primary caches

• Separate instruction and data cache arrays

• Array size offerings: 16 KB

• Single-cycle access

• 64-byte line size

• 8-way set-associative D-cache, 4-way set-associative I-cache

• Write-through operation

• Unified (for all threads) nonblocking with up to eight outstanding load misses

• Cache line locking supported

• Caches can be partitioned to provide separate regions for transient instructions and data

• Critical-word-first data access and forwarding

• Pseudo LRU replacement policy

• Cache tags and data are parity protected. Errors are recoverable.

• Memory Management Unit (MMU)

• Support for Power ISA categories Embedded.Hypervisor (E.HV), Embedded.Hypervisor.LRAT (E.HV.LRAT), Embedded.TLB Write Conditional (E.TWC), and Embedded.Page Table (E.PT)

• Support for Power ISA Book III-E MMU Architecture Version 2.0 (MAV 2.0)

• Separate instruction and data ERAT

– Fully associative 16-entry I-ERAT shared by all threads

– Fully associative 32-entry D-ERAT shared by all threads

– Entries can be shared by two or more threads via 4-bit thread ID mask field

– Exclusion range function to allow address “holes” at base of page entries

– ERATs operate in one of two modes: MMU mode or ERAT-only mode

1. MMU mode; ERAT with backing MMU

– Software-managed page tables and indirect (IND = 1) TLB entries

– Hardware handles ERAT miss with TLB hit

– Hardware handles direct (IND = 0) TLB miss via hardware page table walking

– Software handles indirect (IND = 1) TLB miss via instruction and data TLB miss exceptions

– Software can also install direct (IND = 0) TLB entries as required

2. ERAT-only mode; effective-to-real address translation with ERATs only

– MMU removed, no backing TLB

– Software-managed ERAT entries

– I/D TLB miss exceptions

• 512-entry, 4-way set-associative unified TLB array

• Variable page sizes for direct (IND = 0) entries (4 KB, 64 KB, 1 MB, 16 MB, 1 GB), simultaneously resident in TLB and/or ERAT, and indirect (IND = 1) entries (1 MB and 256 MB) in TLB

• 88-bit virtual address (contains 64-bit effective address)

• 42-bit (4 TB) real addressability

• Flexible TLB management via software management, or via hardware page table search

• Flexible storage attribute controls for write-through, caching inhibited, coherent, guarded, and byte order (endianness)

• Four user-definable storage attribute controls

• TLB tags and data are parity protected against soft errors.

• Debug facilities

• Extensive hardware debug facilities

• Multiple instruction and data address breakpoints

• Data value compare

• Instruction value compare

• Single-step, branch, trap, and other debug events

• Noninvasive real-time software trace interface

• Timer facilities

• 64-bit time base

• Decrementer with auto-reload capability

• Fixed interval timer (FIT)

• Watchdog timer with critical interrupt and/or auto-reset

• Multiple core interfaces operating at core frequency

• System interface

• A command interface for instruction reads, data reads, and data writes

• A 256-bit interface for data writes (XUCR0[L2SIW] selects 128-bit mode)

• A 128-bit interface for instruction reads and data reads

• An invalidate interface to the core for the system to maintain L1 cache coherency

• Auxiliary execution unit (AXU) port

– Allows full ISA flexibility

• AXU includes support for separate decode and dependency

• Full support to stall and flush the processor

– A2 core pipeline exposed to allow high-performance, tightly coupled coprocessors

– Four-thread issue selection

– Provides functional extensions to the processor pipelines

– 256-bit load/store interface (direct access between AXU and the primary data cache)

– Interface can support AXU execution of all Power ISA floating-point instructions

– Attachment capability for DSP coprocessing such as accumulators and SIMD computation

– Enables customer-specific instruction enhancements for unique applications

• Clock and power management interface

• Debug interface

• Performance monitor event interface

Floating-point unit features include:

• IEEE 754-1985 compliance (see note 1 below)

• Single-precision and double-precision operation in hardware

• Executes Power ISA floating-point instruction set

• Masked exceptions handled in hardware

• Super-pipelined; single-cycle throughput for most instructions

• In-order dispatch, execution, and completion

• Single instruction decode and issue

• Thirty-two 64-bit Floating-Point Registers (FPRs)

• 64-bit load/store interface

1. The A2 FPU requires software support for IEEE 754 compliance. See IEEE 754 and Architectural Compliance on page 56 for details.

1.3 The A2 Core as a Power ISA Implementation

The A2 core implements the full, 64-bit fixed-point Power ISA Architecture. The A2 core fully complies with these architectural specifications. The core does not implement the floating-point operations, although a floating-point unit (FU) can be attached (using the AXU interface).

1.3.1 Embedded Hypervisor

The A2 core implements the Embedded Hypervisor Architecture to provide secure compute domains and operating system virtualization. The Embedded Hypervisor Architecture introduces the concept of partitions through two main architectural changes. The first extends the virtual address with a logical partition identifier (LPID). The identifier serves a purpose analogous to that of the process ID (PID) and is used to distinguish partitions. The second introduces a new privilege level above supervisor and reallocates ownership of resources between the two levels. Moving the ownership of certain resources beyond the supervisor helps software to provide secure compute domains.

In addition to providing logical partitions, the following requirements are set forth:

• Ensure a secure environment. An operating system in one logical partition is not allowed to affect the resources of an operating system in another partition.

• Maintain compatibility with the existing programming model. An existing operating system today should require only minor initialization changes to run.

• An operating system running in a logical partition should not be able to deny service to any shared resources.

• Clean and secure communication channels between supervisor and embedded hypervisor states (in both directions).

• The ability to run guest operating systems efficiently and provide real-time response to interrupts.

1.4 A2 Core Organization

The A2 core includes a concurrent-issue instruction fetch and decode unit with an attached branch unit, together with a pipeline for complex integer, simple integer, and load/store operations. The A2 core also includes a memory management unit (MMU); separate instruction and data cache units; pervasive and debug logic; and timer facilities.

Figure 1-1. A2 Core Organization

1.4.1 Instruction Unit

The instruction unit of the A2 core fetches, decodes, and issues two instructions from different threads per cycle to any combination of the one execution pipeline and the AXU interface (see Section 1.4.2 Execution Unit on page 51 and Section 1.5.2 Auxiliary Execution Unit (AXU) Port on page 59). The instruction unit includes a branch unit that provides dynamic branch prediction using a branch history table (BHT). This mechanism greatly improves the branch prediction accuracy and reduces the latency of taken branches, such that the target of a branch can usually be executed immediately after the branch itself with no penalty.

1.4.2 Execution Unit

The A2 core contains a single execution pipeline. The pipeline consists of seven stages and can access the 5-ported (three read, two write) GPR file.

The pipeline handles all arithmetic, logical, branch, and system management instructions (such as interrupt and TLB management, move to/from system registers, and so on), as well as all loads, stores, and cache management operations. The pipelined multiply unit can perform 32-bit × 32-bit multiply operations with single-cycle throughput and single-cycle latency. The width of the divider is 64 bits. Divide instructions with 64-bit operands recirculate for 65 cycles, and divide instructions with 32-bit operands recirculate for 32 cycles. No divide instructions are pipelined; they all require some recirculation.

All misaligned operations are handled in hardware with no penalty on any operation that is contained within an aligned 32-byte region. The load/store pipeline supports all operations to both big-endian and little-endian data regions.

Appendix D Instruction Execution Performance and Code Optimizations on page 833 provides detailed information about instruction timings and performance implications in the A2 core.

1.4.3 Instruction and Data Cache Controllers

The A2 core provides separate instruction and data cache controllers and arrays, which allow concurrent access and minimize pipeline stalls. The storage capacity of the cache arrays is 16 KB each. Both caches have 64-byte lines; the I-cache is 4-way set-associative and the D-cache is 8-way set-associative. Both caches support parity checking on the tags and data in the memory arrays to protect against soft errors. If a parity error is detected, the CPU forces an L1 miss and reloads from the system bus. The A2 core can be configured to cause a machine check exception on a D-cache parity error.
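The capacity, line size, and associativity given above determine the number of sets in each array. The following is a minimal sketch of that arithmetic. The helper name is illustrative, and which address bits the hardware actually uses to index the arrays is not stated here, so only the field widths are computed.

```python
def cache_geometry(capacity_bytes: int, line_bytes: int, ways: int) -> dict:
    """Derive the set count and field widths of a set-associative cache."""
    sets = capacity_bytes // (line_bytes * ways)
    return {
        "sets": sets,
        "offset_bits": line_bytes.bit_length() - 1,  # byte-within-line field
        "index_bits": sets.bit_length() - 1,         # set-select field
    }


# Parameters from the text: 16 KB arrays, 64-byte lines.
ICACHE = cache_geometry(16 * 1024, 64, ways=4)
DCACHE = cache_geometry(16 * 1024, 64, ways=8)

print(ICACHE)  # {'sets': 64, 'offset_bits': 6, 'index_bits': 6}
print(DCACHE)  # {'sets': 32, 'offset_bits': 6, 'index_bits': 5}
```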

The Power ISA instruction set provides a rich set of cache management instructions for software-enforced coherency. See Instruction and Data Caches on page 169 for detailed information about the instruction and data cache controllers.

1.4.3.1 Instruction Cache Controller

The instruction cache controller (ICC) delivers up to four instructions per cycle to the instruction unit of the A2 core. The ICC also handles the execution of the Power ISA instruction cache management instructions for coherency.

1.4.3.2 Data Cache Controller

The data cache controller (DCC) handles all load and store data accesses, as well as the Power ISA data cache management instructions. All misaligned accesses are handled in hardware. Cacheable load accesses that are contained within a double quadword (32 bytes) are handled as a single request. Cacheable stores, and caching-inhibited loads or stores, that are contained within a quadword (16 bytes) are handled as a single request. Load and store accesses that cross these boundaries are broken into separate byte accesses in hardware by the microcode engine. In 32-byte store mode (XUCR0[L2SIW] = 1), all misaligned store or load accesses contained within a double quadword (32 bytes) are handled as a single request. This includes cacheable and caching-inhibited stores and loads.
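The boundary rules above can be expressed as a small predicate. This is a sketch under stated assumptions: "contained within" is taken to mean contained within a naturally aligned 16-byte or 32-byte block, and the function and parameter names are hypothetical, chosen only for illustration.

```python
def within_aligned_block(addr: int, size: int, block: int) -> bool:
    """True if bytes [addr, addr + size) all fall in one aligned block."""
    return addr // block == (addr + size - 1) // block


def is_single_request(addr: int, size: int, is_load: bool,
                      cacheable: bool, store32_mode: bool = False) -> bool:
    """Apply the DCC rules from the text to one access.

    store32_mode models XUCR0[L2SIW] = 1 (32-byte store mode).
    """
    if store32_mode or (is_load and cacheable):
        block = 32  # double quadword
    else:
        block = 16  # quadword
    return within_aligned_block(addr, size, block)


# A 4-byte cacheable load at 0x0E stays inside one 32-byte block.
print(is_single_request(0x0E, 4, is_load=True, cacheable=True))    # True
# The same access as a store crosses a 16-byte boundary.
print(is_single_request(0x0E, 4, is_load=False, cacheable=True))   # False
```

Accesses for which the predicate returns False are the ones the text describes as being split into separate byte accesses.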

The DCC interfaces to the AXU port to provide direct load/store access to the data cache for AXU load and store operations. Such AXU load and store instructions can access up to 32 bytes (a double quadword) in a single cycle for cacheable accesses and can access up to 16 bytes (a quadword) in a single cycle for caching inhibited accesses.

The data cache always operates in a write-through manner.

The DCC also supports cache line locking and “transient” data via way locking.

The DCC provides for up to eight outstanding load misses, and the DCC can continue servicing subsequent load and store hits in an out-of-order fashion. Store-gathering is not performed within the A2 core.

1.4.4 Memory Management Unit (MMU)

The A2 core supports a flat, 42-bit (4 TB) real (physical) address space. This 42-bit real address is generated by the MMU as part of the translation process from the 64-bit effective address, which is calculated by the processor core as an instruction fetch or load/store address.

Note: In 32-bit mode, the A2 core forces bits 0:31 of the calculated 64-bit effective address to zeros. Therefore, to have a translation hit in 32-bit mode, software needs to set the effective address upper bits to zero in the ERATs and TLB.
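The effect of 32-bit mode on the effective address can be sketched as follows. Power ISA numbers bits from the most-significant end, so bits 0:31 are the upper 32 bits of the 64-bit value. The function name and the `mode_64bit` flag are illustrative stand-ins for the computation mode; they are not register or field names from this manual.

```python
MASK64 = (1 << 64) - 1
LOW32 = 0xFFFF_FFFF


def effective_address(base: int, offset: int, mode_64bit: bool) -> int:
    """Compute a 64-bit effective address; clear bits 0:31 in 32-bit mode."""
    ea = (base + offset) & MASK64
    if not mode_64bit:
        ea &= LOW32  # bits 0:31 (the upper half) are forced to zero
    return ea


print(hex(effective_address(0xFFFF_FFFF, 1, mode_64bit=True)))   # 0x100000000
print(hex(effective_address(0xFFFF_FFFF, 1, mode_64bit=False)))  # 0x0
```

This is why ERAT and TLB entries intended to hit in 32-bit mode need zeros in the upper effective address bits: the address presented for translation never has those bits set.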

The MMU provides address translation, access protection, and storage attribute control for embedded applications. The MMU supports demand paged virtual memory and other management schemes that require precise control of logical to physical address mapping and flexible memory protection. Working with appropriate system level software, the MMU provides the following functions:

• Translation of the 88-bit virtual address (1-bit guest state (GS), 8-bit logical partition ID (LPID), 1-bit address space (AS) identifier, 14-bit process ID (PID), and 64-bit effective address) into the 42-bit real address. The 1-bit indirect entry (IND) bit is not considered part of the virtual address.

• Page-level read, write, and execute access control

• Storage attributes for cache policy, byte order (endianness), and speculative memory access

• Software control of page replacement strategy

The translation lookaside buffer (TLB) is the primary hardware resource involved in the control of translation, protection, and storage attributes. It consists of 512 entries, each specifying the various attributes of a given page of the address space. The TLB is 4-way set associative. The TLB entries can be of type direct (IND = 0), in which case the virtual address is translated immediately by a matching entry, or of type indirect (IND = 1), in which case the hardware page table walker is invoked to fetch and install an entry from the hardware page table.

The TLB tag and data memory arrays are parity protected against soft errors; if a parity error is detected during an address translation, the TLB and ERAT caches treat the parity error like a miss and proceed to either reload the entry with correct parity (in the case of an ERAT miss, TLB hit) and set the parity error bit in the appropriate fault isolation register (FIR), or generate a TLB exception where software can take appropriate action (in the case of a TLB miss).

An operating system can choose to implement hardware page tables in memory that contain virtual to logical translation page table entries (PTEs) per Category E.PT. These PTEs are loaded into the TLB by the hardware page table walker logic after the logical address is converted to a real address via the logical to real address translation (LRAT) per Category E.HV.LRAT. Software must install indirect (IND = 1) type TLB entries for each page table that is to be traversed by the hardware walker. Alternatively, software can manage the establishment and replacement of TLB entries by simply not using indirect entries (that is, by using only direct IND = 0 entries). This gives system software significant flexibility in implementing a custom page replacement strategy. For example, to reduce TLB thrashing or translation delays, software can reserve several TLB entries for globally accessible static mappings. The instruction set provides several instructions for managing TLB entries. These instructions are privileged, and the processor must be in supervisor state for them to be executed.

The first step in the address translation process is to expand the effective address into a virtual address. This is done by taking the 64-bit effective address and prepending to it a 1-bit guest state (GS) identifier, an 8-bit logical partition ID (LPID), a 1-bit address space (AS) identifier, and the 14-bit process identifier (PID). The 1-bit indirect entry (IND) identifier is not considered part of the virtual address. The LPID value is provided by the LPIDR register, and the PID value is provided by the PID register (see Memory Management on page 185). The GS and AS identifiers are provided by the Machine State Register (MSR, see CPU Interrupts and Exceptions on page 293), which contains separate bits for the instruction fetch address space (MSR[IS]) and the data access address space (MSR[DS]). Together, the 64-bit effective address and the other identifiers form an 88-bit virtual address. This 88-bit virtual address is then translated into the 42-bit real address using the TLB.
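The expansion described above can be checked with a short sketch that concatenates the fields in the order the text lists them (GS, LPID, AS, PID, effective address). Packing them into one integer is an illustrative choice for showing the 88-bit width; the hardware compares these fields against TLB tags rather than building such a value, and the function name is hypothetical.

```python
FIELD_WIDTHS = (("gs", 1), ("lpid", 8), ("as_bit", 1), ("pid", 14), ("ea", 64))


def make_virtual_address(gs: int, lpid: int, as_bit: int, pid: int, ea: int) -> int:
    """Concatenate GS | LPID | AS | PID | EA into one 88-bit integer."""
    values = {"gs": gs, "lpid": lpid, "as_bit": as_bit, "pid": pid, "ea": ea}
    va = 0
    for name, width in FIELD_WIDTHS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        va = (va << width) | value
    return va


total_bits = sum(width for _, width in FIELD_WIDTHS)
print(total_bits)  # 88

va = make_virtual_address(gs=0, lpid=0, as_bit=0, pid=1, ea=0x1000)
print(hex(va))  # 0x10000000000001000
```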

The MMU divides the address space (whether effective, virtual, or real) into pages. Five direct (IND = 0) page sizes (4 KB, 64 KB, 1 MB, 16 MB, 1 GB) are simultaneously supported, such that at any given time the TLB can contain entries for any combination of page sizes. The MMU also supports two indirect (IND = 1) page sizes (1 MB and 256 MB) with associated sub-page sizes (see Section 6.16 Hardware Page Table Walking (Category E.PT)). For an address translation to occur, a valid direct entry for the page containing the virtual address must be in the TLB. An attempt to access an address for which no direct TLB entry exists results in a search for an indirect TLB entry to be used by the hardware page table walker. If neither a direct nor an indirect entry exists, an instruction (for fetches) or data (for load/store accesses) TLB miss exception occurs.
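For a direct entry, the page size fixes how many low-order address bits pass through translation unchanged as the page offset. The following minimal sketch illustrates that split for the five direct page sizes and the 42-bit real address. The helper names and the idea of a bare "real page number" integer are assumptions for illustration; the actual TLB entry format is described in the memory management section.

```python
DIRECT_PAGE_SIZES = {
    "4KB": 4 << 10, "64KB": 64 << 10, "1MB": 1 << 20,
    "16MB": 16 << 20, "1GB": 1 << 30,
}
REAL_ADDRESS_BITS = 42


def split_effective_address(ea: int, page_size: int) -> tuple:
    """Return (effective page number, page offset) for a given page size."""
    offset_bits = page_size.bit_length() - 1
    return ea >> offset_bits, ea & (page_size - 1)


def real_address(real_page_number: int, offset: int, page_size: int) -> int:
    """Combine a real page number with the untranslated page offset."""
    offset_bits = page_size.bit_length() - 1
    ra = (real_page_number << offset_bits) | offset
    if ra >= (1 << REAL_ADDRESS_BITS):
        raise ValueError("real address exceeds 42 bits")
    return ra


epn, off = split_effective_address(0x1234_5678, DIRECT_PAGE_SIZES["4KB"])
print(hex(epn), hex(off))                                      # 0x12345 0x678
print(hex(real_address(0x3, off, DIRECT_PAGE_SIZES["4KB"])))   # 0x3678
```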

To improve performance, both the instruction cache and the data cache maintain separate shadow TLBs called ERATs. The ERATs contain only direct (IND = 0) type entries. The instruction ERAT (I-ERAT) contains 16 entries, while the data ERAT (D-ERAT) contains 32 entries. These ERAT arrays minimize TLB contention between instruction fetch and data load/store operations. The instruction fetch and data access mechanisms only access the main unified TLB when a miss occurs in the respective ERAT. Hardware manages the replacement and invalidation of both the I-ERAT and D-ERAT; no system software action is required in MMU mode. In ERAT-only mode, an attempt to access an address for which no ERAT entry exists causes an instruction (for fetches) or data (for load/store accesses) TLB miss exception.

Each TLB entry provides separate user state and supervisor state read, write, and execute permission controls for the memory page associated with the entry. If software attempts to access a page for which it does not have the necessary permission, an instruction (for fetches) or data (for load/store accesses) storage exception occurs.

Each TLB entry also provides a collection of storage attributes for the associated page. These attributes control cache policy (such as cacheability and write-through as opposed to copy-back behavior), byte order (big-endian as opposed to little-endian), and enabling of speculative access for the page. In addition, a set of four user-definable storage attributes is provided. These attributes can be used to control various system-level behaviors.

Section 6 Memory Management describes the A2 core MMU functions in greater detail.

1.4.5 Timers

The A2 core contains a time base and three timers: a decrementer (DEC), a fixed interval timer (FIT), and a watchdog timer. The time base is a 64-bit counter that gets incremented at a frequency either equal to the processor core clock rate or as controlled by a separate asynchronous timer clock input to the core. No interrupt is generated as a result of the time base wrapping back to zero.

The DEC is a 32-bit register that is decremented at the same rate at which the time base is incremented. The user loads the DEC register with a value to create the desired interval. When the register is decremented to zero, a number of actions occur: the DEC stops decrementing, a status bit is set in the Timer Status Register (TSR), and a decrementer exception is reported to the interrupt mechanism of the A2 core. Optionally, the DEC can be programmed to reload automatically the value contained in the Decrementer Auto-Reload Register (DECAR), after which the DEC resumes decrementing. The Timer Control Register (TCR) contains the interrupt enable for the decrementer interrupt.
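The decrementer behavior described above can be modeled in a few lines. This is a behavioral sketch only: one call to `tick()` stands for one time base increment, the `status` attribute stands in for the TSR status bit, and the exact cycle in which hardware performs the auto-reload is an assumption. The class and attribute names are hypothetical.

```python
class Decrementer:
    """Behavioral model of the 32-bit DEC with optional auto-reload."""

    def __init__(self, value: int, auto_reload: bool = False, decar: int = 0):
        self.value = value & 0xFFFF_FFFF
        self.auto_reload = auto_reload
        self.decar = decar & 0xFFFF_FFFF   # models the DECAR contents
        self.status = False                # models the TSR status bit

    def tick(self) -> bool:
        """Advance one time base step; return True if an exception is reported."""
        if self.value == 0:
            return False                   # DEC has stopped decrementing
        self.value -= 1
        if self.value != 0:
            return False
        self.status = True                 # reached zero: record and report
        if self.auto_reload:
            self.value = self.decar        # resume counting from DECAR
        return True


one_shot = Decrementer(3)
print([one_shot.tick() for _ in range(4)])   # [False, False, True, False]

periodic = Decrementer(2, auto_reload=True, decar=2)
print([periodic.tick() for _ in range(4)])   # [False, True, False, True]
```

Whether the reported exception actually interrupts the processor depends on the enable in the TCR, which this sketch does not model.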

The FIT generates periodic interrupts based on the transition of a selected bit from the time base. Users can select one of four intervals for the FIT period by setting a control field in the TCR to select the appropriate bit from the time base. When the selected time base bit transitions from 0 to 1, a status bit is set in the TSR and a fixed interval timer exception is reported to the interrupt mechanism of the A2 core. The FIT interrupt enable is contained in the TCR.
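Because the FIT fires on each 0-to-1 transition of one time base bit, its period is twice the weight of that bit. The sketch below works in terms of the bit's weight exponent k (weight 2^k). Power ISA numbers the 64-bit time base from the most-significant end, so bit n has k = 63 - n. The mapping from the TCR field to specific bits is not given here and is not modeled, and the 1.6 GHz frequency in the example is an assumed value.

```python
MASK64 = (1 << 64) - 1


def weight_exponent(power_bit_number: int) -> int:
    """Convert a Power ISA bit number (0 = MSB) of a 64-bit register to k."""
    return 63 - power_bit_number


def fit_period_ticks(k: int) -> int:
    """Time base increments between 0-to-1 transitions of the bit of weight 2**k."""
    return 1 << (k + 1)


def count_rising_edges(k: int, ticks: int, start: int = 0) -> int:
    """Brute-force check: count 0-to-1 transitions over a number of increments."""
    mask = 1 << k
    edges, prev = 0, start & MASK64
    for _ in range(ticks):
        cur = (prev + 1) & MASK64
        if not (prev & mask) and (cur & mask):
            edges += 1
        prev = cur
    return edges


print(fit_period_ticks(2))        # 8
print(count_rising_edges(2, 32))  # 4

ASSUMED_TIMEBASE_HZ = 1.6e9       # assumption for illustration only
period_s = fit_period_ticks(20) / ASSUMED_TIMEBASE_HZ
```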

Similar to the FIT, the watchdog timer also generates a periodic interrupt based on the transition of a selected bit from the time base. Users can select one of four intervals for the watchdog period, again by setting a control field in the TCR to select the appropriate bit from the time base. Upon the first transition from 0 to 1 of the selected time base bit, a status bit is set in the TSR and a watchdog timer exception is reported to the interrupt mechanism of the A2 core. The watchdog timer can also be configured to initiate a hardware reset if a second transition of the selected time base bit occurs before the first watchdog exception is serviced. This capability provides an extra measure of recoverability from potential system lock-ups.
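The two-stage watchdog behavior described above can be sketched as a small state machine. This is a simplified model of the paragraph only, not of the architected TSR and TCR fields, which are covered in the timer facilities section. The class, method names, and returned strings are hypothetical.

```python
class Watchdog:
    """Simplified two-stage watchdog: first expiry interrupts, second may reset."""

    def __init__(self, reset_enabled: bool):
        self.reset_enabled = reset_enabled
        self.pending = False          # first expiry seen and not yet serviced
        self.reset_requested = False

    def on_selected_bit_rising_edge(self) -> str:
        """Call on each 0-to-1 transition of the selected time base bit."""
        if not self.pending:
            self.pending = True
            return "exception"        # status bit set, exception reported
        if self.reset_enabled:
            self.reset_requested = True
            return "reset"            # second expiry before service
        return "none"

    def service(self) -> None:
        """Software handled the watchdog exception and cleared the status."""
        self.pending = False


wd = Watchdog(reset_enabled=True)
print(wd.on_selected_bit_rising_edge())  # exception
wd.service()
print(wd.on_selected_bit_rising_edge())  # exception
print(wd.on_selected_bit_rising_edge())  # reset
```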

The timer functions of the A2 core are more fully described in Timer Facilities on page 387.

1.4.6 Debug Facilities

The A2 core debug facilities include debug modes for the various types of debugging used during hardware and software development. Also included are debug events that allow developers to control the debug process. Debug modes and debug events are controlled using debug registers in the chip. The debug registers are accessed either through software running on the processor, or through the serial communications (SCOM) port.

The debug modes, events, controls, and interfaces provide a powerful combination of debug facilities for hardware development tools such as the RISCWatch debugger from IBM.

A brief overview of the debug modes and development tool support is provided below. Debug Facilities on page 399 provides detailed information about each debug mode and other debug resources.

1.4.6.1 Debug Modes

The A2 core supports two debug modes: internal and external. Each mode supports a different type of debug tool used in embedded systems development. Internal debug mode supports software-based ROM monitors, and external debug mode supports a hardware emulator type of debug. The debug modes are controlled by Debug Control Register 0 (DBCR0) and the setting of bits in the Machine State Register (MSR).

Internal debug mode supports accessing architected processor resources, setting hardware and software breakpoints, and monitoring processor status. In internal debug mode, debug events can generate debug exceptions, which can interrupt normal program flow so that monitor software can collect processor status and alter processor resources.

Internal debug mode relies on exception-handling software—running on the processor—along with an external communications path to debug software problems. This mode is used while the processor continues executing instructions and enables debugging of problems in application or operating system code. Access to debugger software executing in the processor while in internal debug mode is through a communications port on the processor board, such as a serial port or Ethernet connection.

External debug mode supports stopping, starting, and single-stepping the processor, accessing architected processor resources, setting hardware and software breakpoints, and monitoring processor status. In external debug mode, debug events can architecturally “freeze” the processor. While the processor is frozen, normal instruction execution stops, and the architected processor resources can be accessed and altered using a debug tool (such as RISCWatch) attached through the SCOM port. This mode is useful for debugging hardware and low-level control software problems.

1.4.6.2 Development Tool Support

The A2 core provides powerful debug support for a wide range of hardware and software development tools.

RISCWatch is an example of a development tool that uses the external debug mode, debug events, and the SCOM port to support hardware and software development and debugging.

1.4.7 Floating-Point Unit Organization

The floating-point unit incorporates a single-issue instruction decode and issue unit and a 6-stage arithmetic pipeline working in parallel with a 4-stage load/store pipeline. The floating-point unit contains a Floating-Point Register (FPR) file that interfaces to both pipelines. There are thirty-two 64-bit FPRs.

Figure 1-2 illustrates the logical organization of the floating-point unit and its relationship to the A2 processor core.

[Figure 1-2: block diagram. Labeled blocks: A2 core, AXU interface, data cache, floating-point AXU, instruction decode/issue unit, arithmetic pipe, load/store pipe, FPSCR, and FPR0 through FPR31 for each of threads 0 through 3.]

Figure 1-2. A2 Processor Block Diagram

1.4.7.1 Arithmetic and Load/Store Pipelines

The A2 core has a single execution pipeline. The pipeline handles all computational instructions and reads from and writes to the FPRs, Floating-Point Status and Control Register (FPSCR), and the Condition Register (CR).

1.4.8 IEEE 754 and Architectural Compliance

The A2 core is IEEE 754 and Power ISA compliant and implements single-precision and double-precision instructions.

1.4.8.1 IEEE 754 Compliance

IEEE 754 requires a certain set of operations to be included in any implementation that claims to be compliant. Such operations can be implemented in hardware, software, or a combination of the two. The Power ISA floating-point architecture includes most of the required operations, but some are missing. The missing operations are: floating-point remainder, format conversion between binary and decimal, and format conversion from integer to floating-point. It is necessary to provide a software library to support these missing functions. In other words, the Power ISA Architecture requires software support to be fully compliant with the IEEE standard.

1.4.9 Floating-Point Unit Implementation

Certain aspects of the behavior of the floating-point unit are implementation-specific.

1.4.9.1 Reciprocal Estimates

While the Power ISA Architecture defines single-precision reciprocal estimates and reciprocal square root estimates to have relative errors of $2^{-5}$ and $2^{-8}$ respectively, both are implemented in the A2 core to have a relative error of $2^{-14}$.

Programmers are encouraged to take advantage of this increased accuracy, but must be aware that code that relies on this increased accuracy might not work on any other Power ISA FU.
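The portability caveat above can be illustrated with one Newton-Raphson refinement step for a reciprocal, which roughly squares the relative error of the starting estimate. The sketch below does not model the estimate instructions themselves. It fabricates estimates with a chosen relative error (2^-14, and a coarser 2^-8 for comparison) purely to show how the result of a fixed number of refinement steps depends on the starting accuracy.

```python
def refine_reciprocal(a: float, estimate: float) -> float:
    """One Newton-Raphson step toward 1/a: x1 = x0 * (2 - a * x0)."""
    return estimate * (2.0 - a * estimate)


def relative_error_after_one_step(a: float, initial_rel_error: float) -> float:
    """Start from an estimate with the given relative error and refine once."""
    estimate = (1.0 / a) * (1.0 + initial_rel_error)  # synthetic estimate
    refined = refine_reciprocal(a, estimate)
    return abs(refined * a - 1.0)


fine = relative_error_after_one_step(3.0, 2.0 ** -14)    # about 2**-28
coarse = relative_error_after_one_step(3.0, 2.0 ** -8)   # about 2**-16

print(fine < 2.0 ** -27)    # True
print(coarse > 2.0 ** -17)  # True
```

Code tuned to reach a target precision in one step from a $2^{-14}$ estimate would therefore fall well short of that precision on an implementation that only meets a coarser bound.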

1.4.9.2 Denormalized B Operands

The floating-point unit supports all denormal numbers in the dataflow with no additional latency except the following cases:

1. B is a double-precision denorm AND NOT (move{fabs/fnabs/fneg} OR fsel OR fcfid OR mv_to_fpscr).

2. B is a single-precision denorm AND NOT (move{fabs/fnabs/fneg} OR fsel)

If any of the above cases are detected, the A2 core flushes to the microcode engine, which in turn issues a prenormalization instruction, followed by the original instruction. The latency for these operations increases by 20 cycles when this occurs.
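The two conditions above amount to a predicate on the instruction and its B operand. The sketch below encodes them as written. It assumes that "move{fabs/fnabs/fneg}" refers to the fabs, fnabs, and fneg instructions, keeps the source's "mv_to_fpscr" token as an opaque name, and takes the 6-cycle base latency from the overview; the function names are illustrative.

```python
# Instructions exempt from the penalty, per cases 1 and 2 in the text.
DOUBLE_EXEMPT = {"fabs", "fnabs", "fneg", "fsel", "fcfid", "mv_to_fpscr"}
SINGLE_EXEMPT = {"fabs", "fnabs", "fneg", "fsel"}

BASE_LATENCY_CYCLES = 6       # from the overview: 6-cycle FP latency
DENORM_PENALTY_CYCLES = 20    # from the text: latency increases by 20 cycles


def needs_prenormalization(op: str, b_is_denorm: bool, b_precision: str) -> bool:
    """True if the B operand triggers the flush to the microcode engine."""
    if not b_is_denorm:
        return False
    if b_precision == "double":
        return op not in DOUBLE_EXEMPT
    if b_precision == "single":
        return op not in SINGLE_EXEMPT
    raise ValueError("b_precision must be 'single' or 'double'")


def estimated_latency(op: str, b_is_denorm: bool, b_precision: str) -> int:
    """Base latency plus the denormal penalty when it applies."""
    extra = DENORM_PENALTY_CYCLES if needs_prenormalization(
        op, b_is_denorm, b_precision) else 0
    return BASE_LATENCY_CYCLES + extra


print(needs_prenormalization("fadd", True, "double"))   # True
print(needs_prenormalization("fcfid", True, "double"))  # False
print(estimated_latency("fadd", True, "double"))        # 26
```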

1.4.9.3 Non-IEEE mode

Non-IEEE mode, controlled by the NI bit in the FPSCR, is intended to eliminate data-dependent overhead cycles caused by exceptional operands or results. The result is faster, deterministic performance with reasonable results. This mode is not supported by the A2 core. The value of the NI bit is ignored.

1.4.10 Floating-Point Unit Interfaces

The floating-point unit interfaces to the A2 processor core.

1.4.10.1 A2 Processor Core Interface

This interface enables the floating-point unit to interact with the A2 processor core. Interactions include resets and updating the CR.

1.4.10.2 Clock and Power Management Interface

The CPM interface supports clock distribution and power management to reduce power consumption below the normal operational level. External logic is necessary for the sleep mode to function.

1.5 Core Interfaces

The core includes the following interfaces:

• System interface

• Auxiliary execution unit (AXU) port

• SCOM, debug, trace, and performance monitor event ports

• Interrupt interface

• Clock and power management interface

Several of these interfaces are described briefly in the sections below.

1.5.1 System Interface

The A2 core interface has one command interface for instruction reads, data reads, and data writes, and uses a 42-bit address bus. A full 64-byte cache line is implied for cacheable data reads and cacheable instruction fetches. The transfer length is used to indicate 1 byte, 2 byte, 4 byte, 8 byte, 16 byte, and 32 byte for noncacheable reads and 16 bytes for noncacheable instruction fetches. There is a 256-bit data interface for data writes with 32 byte enables indicating which bytes should be written.

Data writes can be 1 byte, 2 byte, 4 byte, 8 byte, or 16 byte for noncacheable or cacheable writes. There is a 128-bit data reload interface for instruction reads and data reads. When the reload data is less than 16 bytes (due to the transfer length indicating 1 byte, 2 byte, 4 byte, or 8 byte), the data should be aligned within the 16-byte reload bus based on the associated command interface address. There is a back invalidate interface for systems with an entity outside the A2 core (such as an L2 cache controller) that provides hardware cache coherency.

A2 supports a mode that enables a 32-byte write bus to the A2 core/L2 interface. Only the AXU can produce 32-byte writes.

The command interface is a credit-based interface. The A2 core can handle up to eight load-type credits. The actual number of load-type credits (L) that it will handle is initialized in the A2 core configuration ring. In the A2 core, there is a 12-entry load command queue that includes eight entries for data loads and four entries for instruction fetches. An entity outside the A2 core is expected to have a near queue of L entries for load-type operations and to give a pop indication to the A2 core as each is sent to the far queue that contains 8 to 12 entries. The specific command is indicated in the transaction type.

Examples of transaction types that expect data to be returned on the reload bus are instruction fetch, load, and dcbt. Examples of transaction types that do not expect data to be returned on the reload bus are store, dcbz and dcbf. The A2 core can handle up to 32 store-type credits. The actual number of credits (S) that it will handle is initialized in the A2 core configuration ring.

An entity outside the A2 core is expected to be able to queue the S store-type operations and give a pop indication to the A2 core for each as it is processed and the queue entry is available. An entity outside the A2 core that also supports store gathering should give a gather indication to the A2 core when the store is gathered with an existing queue entry, to let the A2 core know that an additional queue entry is available.
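The credit scheme described above can be pictured as a simple counter. The following is a minimal sketch only, not the A2 hardware logic: the type `credit_pool` and the functions `credit_try_send` and `credit_return` are hypothetical names introduced for illustration, and the model assumes that each pop (or gather) indication returns exactly one credit.

```c
#include <stdbool.h>

/* Hypothetical model of one credit pool (load-type or store-type). */
typedef struct {
    unsigned limit;      /* L or S, as initialized in the configuration ring */
    unsigned in_flight;  /* commands sent whose credit has not come back yet */
} credit_pool;

/* Try to send one command; fails when every credit is in use. */
static bool credit_try_send(credit_pool *p)
{
    if (p->in_flight >= p->limit)
        return false;    /* no credit left: the sender must hold the command */
    p->in_flight++;
    return true;
}

/* Model a pop indication (or, for stores, a gather indication):
   one queue entry is available again, so one credit is returned. */
static void credit_return(credit_pool *p)
{
    if (p->in_flight > 0)
        p->in_flight--;
}
```

With a pool whose limit is 2, two sends succeed, a third is refused, and a send succeeds again after one call to `credit_return`.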

1.5.2 Auxiliary Execution Unit (AXU) Port

This interface provides the A2 core with the flexibility to attach a tightly-coupled coprocessor-type macro incorporating instructions that go beyond those provided within the processor core itself. The AXU port provides sufficient functionality for attachment of various coprocessor functions such as a fully-compliant Power ISA floating-point unit (single- or double-precision), multimedia engine, DSP, or other custom function implementing algorithms appropriate for specific system applications. The AXU interface can be used with macros that contain their own register files. AXU load and store instructions can directly access the A2 core data cache, with operands of up to a double quadword (32 bytes) in length.

The AXU interface provides the capability for a coprocessor to execute instructions that are not part of the Power ISA instruction set at the same time that the A2 core is executing Power ISA instructions. Areas within the architected instruction space allow for these customer-specific or application-specific AXU instruction set extensions. Further description is beyond the scope of this document.

1.5.3 JTAG Port

The A2 core SCOM port supports the indirect attachment of a debug tool such as the RISCWatch product from IBM. A logic block outside the A2 core must provide JTAG to SCOM port translation. Through the SCOM port, and using the debug facilities designed into the A2 core, a debug workstation can single-step the processor and interrogate the internal processor state to facilitate hardware and software debugging.

2. CPU Programming Model

The programming model of the A2 core describes how the following features and operations of the core appear to programmers:

• Logical Partitioning

• Storage Addressing

• Multithreading

• Registers

• 32-Bit Mode

• Instruction Categories

• Instruction Classes

• Implemented Instruction Set Summary

• Wait Instruction

• Branch Processing

• Integer Processing

• Processor Control

• Privileged Modes

• Speculative Accesses

• Synchronization

• Software Transactional Memory Acceleration

2.1 Logical Partitioning

2.1.1 Overview

Logical partitioning defines instructions, resources, and methods for establishing an additional attribute of processor privilege called a guest state.

The Embedded.Hypervisor category permits processors and portions of real storage to be assigned to logical collections called partitions such that a program executing on a processor in one partition cannot interfere with any program executing on a processor in a different partition. This isolation can be provided for both problem state and privileged state programs by using a layer of trusted software called a hypervisor program (or simply a “hypervisor”) and the resources provided by this category to manage system resources. The collection of software that runs in a given partition and its associated resources is called a guest. The guest normally includes an operating system (or other system software) running in privileged state and its associated processes running in the problem state under the management of the hypervisor. The processor is in the guest state when a guest is executing, and it is in the hypervisor state when the hypervisor is executing. The processor is executing in the guest state when MSR[GS] = 1.

A2 implements 2

partitions. See Section 6.17.2 Logical Partition ID Register (LPIDR) on page 245. All threads of a single A2 core must be assigned to the same logical partition.

A processor is assigned to one partition at any given time. A processor can be assigned to any given partition without consideration of the physical configuration of the system (for example, shared registers, caches, organization of the storage hierarchy), except that processors that share certain hypervisor resources might need to be assigned to the same partition. Additionally, certain resources can be used by the guest at the discretion of the hypervisor. Such usage might cause interference between partitions, and the hypervisor should allocate those resources accordingly. The primary registers and facilities used to control logical partitioning are described in the following subsections. Other facilities associated with logical partitioning are described within the appropriate sections within this book.

Category Embedded.Hypervisor changes the operating system programming model to allow for easier virtualization, while retaining a default backwards compatible mode where an operating system written for processors not containing this category will still operate as before without using the logical partitioning facilities.

2.2 Storage Addressing

As a 64-bit implementation of the Power ISA Architecture, the A2 core implements a uniform 64-bit effective address (EA) space. Effective addresses are expanded into virtual addresses and then translated to 42-bit (4 TB) real addresses by the memory management unit (see Memory Management on page 185 for more information about the translation process). The organization of the real address space into a physical address space is system-dependent, and is described in the user’s manuals for chip-level products that incorporate an A2 core.

The A2 core generates an effective address whenever it executes a storage access, branch, cache management, or translation lookaside buffer (TLB) management instruction, or when it fetches the next sequential instruction.

2.2.1 Storage Operands

Bytes in storage are numbered consecutively starting with 0. Each number is the address of the corresponding byte.

Data storage operands accessed by the integer load/store instructions can be bytes, halfwords, words, doublewords or—for load/store multiple and string instructions—a sequence of words or bytes, respectively. Data storage operands accessed by auxiliary execution unit (AXU) load/store instructions can be bytes, halfwords, words, doublewords, quadwords or double quadwords. The address of a storage operand is the address of its first byte (that is, of its lowest-numbered byte). Byte ordering can be either big endian or little endian, as controlled by the endian storage attribute (see Byte Ordering on page 66; also see Endian (E) on page 197 for more information about the endian storage attribute).

Operand length is implicit for each scalar storage access instruction type (that is, each storage access instruction type other than the load/store multiple and string instructions). The operand of such a scalar storage access instruction has a “natural” alignment boundary equal to the operand length. In other words, the natural address of an operand is an integral multiple of the operand length. A storage operand is said to be aligned if it is aligned at its natural boundary; otherwise, it is said to be unaligned.

Data storage operands for storage access instructions have the characteristics shown in Table 2-1 on page 63.

Table 2-1. Data Operand Definitions

Storage Access Instruction Type   Operand Length   Addr[59:63] if Aligned
Byte (or String)                  8 bits           0bxxxxx
Halfword                          2 bytes          0bxxxx0
Word (or Multiple)                4 bytes          0bxxx00
Doubleword                        8 bytes          0bxx000
Quadword (AXU only)               16 bytes         0bx0000
Double Quadword (AXU only)        32 bytes         0b00000

Note: An “x” in an address bit position indicates that the bit can be 0 or 1 independent of the state of other bits in the address.
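The aligned address patterns in Table 2-1 amount to requiring that the low-order address bits covered by the operand length are zero. A minimal sketch of that check follows; the function name `is_naturally_aligned` is hypothetical, and the operand length is assumed to be one of the power-of-two sizes listed in the table.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when ea is a multiple of len.
   len is assumed to be a power of two: 1, 2, 4, 8, 16, or 32 bytes. */
static bool is_naturally_aligned(uint64_t ea, unsigned len)
{
    return (ea & (uint64_t)(len - 1)) == 0;
}
```

For example, address 0x1010 is aligned for a quadword (16 bytes) but not for a double quadword (32 bytes).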

The alignment of the operand effective address of some storage access instructions might affect performance; in some cases, it might cause an alignment exception to occur. For such storage access instructions, the best performance is obtained when the storage operands are aligned. Table 2-2 summarizes the effects of alignment on those storage access instruction types for which such effects exist. If an instruction type is not shown in the table, there are no alignment effects for that instruction type.

Table 2-2. Alignment Effects for Storage Access Instructions

Storage Access Instruction Type: Alignment Effects

Integer cacheable load halfword: Broken into byte accesses if crosses 32-byte boundary (EA[59:63] = 0b11111); otherwise no effect. (See notes.)

Integer cacheable store or caching inhibited load/store halfword: Broken into byte accesses if crosses 16-byte boundary (EA[60:63] = 0b1111); otherwise no effect. (See notes.)

Integer cacheable load word: Broken into byte accesses if crosses 32-byte boundary (EA[59:63] > 0b11100); otherwise no effect. (See notes.)

Integer cacheable store or caching inhibited load/store word: Broken into byte accesses if crosses 16-byte boundary (EA[60:63] > 0b1100); otherwise no effect. (See notes.)

Integer cacheable load doubleword: Broken into byte accesses if crosses 32-byte boundary (EA[59:63] > 0b11000); otherwise no effect. (See notes.)

Integer cacheable store or caching inhibited load/store doubleword: Broken into byte accesses if crosses 16-byte boundary (EA[60:63] > 0b1000); otherwise no effect. (See notes.)

Integer load/store multiple: Broken into a series of word (4-byte) accesses until the last word is accessed. The load/store multiple address must be word aligned. (See notes.)

Integer load/store string: Broken into a series of byte accesses until the last byte is accessed. (See notes.)

AXU cacheable load halfword: Broken into byte accesses if crosses 32-byte boundary (EA[59:63] = 0b11111); otherwise no effect. (See notes.)

AXU cacheable store or caching inhibited load/store halfword: Broken into byte accesses if crosses 16-byte boundary (EA[60:63] = 0b1111); otherwise no effect. (See notes.)

AXU cacheable load word: Broken into byte accesses if crosses 32-byte boundary (EA[59:63] > 0b11100); otherwise no effect. (See notes.)

AXU cacheable store or caching inhibited load/store word: Broken into byte accesses if crosses 16-byte boundary (EA[60:63] > 0b1100); otherwise no effect. (See notes.)

AXU cacheable load doubleword: Broken into byte accesses if crosses 32-byte boundary (EA[59:63] > 0b11000); otherwise no effect. (See notes.)

AXU cacheable store or caching inhibited load/store doubleword: Broken into byte accesses if crosses 16-byte boundary (EA[60:63] > 0b1000); otherwise no effect. (See notes.)

AXU cacheable load quadword: Broken into byte accesses if crosses 32-byte boundary (EA[59:63] > 0b10000); otherwise no effect. (See notes.)

AXU cacheable store or caching inhibited load/store quadword: Broken into byte accesses if crosses 16-byte boundary (EA[60:63] > 0b0000); otherwise no effect. (See notes.)

AXU cacheable load double quadword: Broken into byte accesses if crosses 32-byte boundary (EA[59:63] > 0b00000); otherwise no effect. (See notes.)

AXU cacheable store or caching inhibited load/store double quadword: Broken into byte accesses if crosses 16-byte boundary (EA[60:63] > 0b0000); otherwise no effect. (See notes.)

Notes:

• Any unaligned access that also crosses a 4 K page boundary causes an alignment exception.

• An auxiliary processor can specify that the EA for a given AXU load/store instruction must be aligned at the operand-size boundary or, alternatively, at a word boundary. If the AXU so indicates this requirement and the calculated EA fails to meet it, the A2 core generates an alignment exception. Alternatively, an auxiliary processor can specify that the EA for a given AXU load/store instruction should be “forced” to be aligned by ignoring the appropriate number of low-order EA bits and processing the AXU load/store as if those bits were 0. Byte, halfword, word, doubleword, and quadword AXU load/store instructions ignore 0, 1, 2, 3, and 4 low-order EA bits, respectively.
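The boundary conditions in Table 2-2 all follow one pattern: an access crosses a boundary when its offset within the boundary-sized block plus its length exceeds the block size. The sketch below illustrates that arithmetic under stated assumptions; the function names are hypothetical, the boundary is assumed to be a power of two, and the operand is assumed to be no longer than the boundary (so the double quadword store row of the table is not modeled).

```c
#include <stdbool.h>
#include <stdint.h>

/* True when an access of len bytes at ea spans two boundary-sized blocks.
   boundary must be a power of two and len must not exceed boundary. */
static bool crosses_boundary(uint64_t ea, unsigned len, unsigned boundary)
{
    uint64_t offset = ea & (uint64_t)(boundary - 1);
    return offset + len > boundary;
}

/* The notes to Table 2-2 single out the 4 K page boundary. */
static bool crosses_4k_page(uint64_t ea, unsigned len)
{
    return crosses_boundary(ea, len, 4096u);
}
```

For a word access checked against a 32-byte boundary, this returns true only for offsets 29, 30, and 31, which matches the table condition EA[59:63] > 0b11100.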

Cache management instructions access cache block operands; for the A2 core, the cache block size is 64 bytes. However, the effective addresses calculated by cache management instructions are not required to be aligned on cache block boundaries. Instead, the architecture specifies that the associated low-order effective address bits (bits 58:63 for the A2 core) are ignored during the execution of these instructions.
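Ignoring effective address bits 58:63 is equivalent to clearing the six low-order bits of the address. A minimal sketch, assuming the 64-byte block size stated above; the macro and function names are hypothetical:

```c
#include <stdint.h>

#define A2_CACHE_BLOCK_BYTES 64u   /* cache block size stated for the A2 core */

/* Address of the first byte of the cache block that contains ea. */
static uint64_t cache_block_base(uint64_t ea)
{
    return ea & ~(uint64_t)(A2_CACHE_BLOCK_BYTES - 1);
}
```

Any two effective addresses that yield the same `cache_block_base` value name the same cache block operand.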

Similarly, the TLB management instructions access page operands, and—as determined by the page size— the associated low-order effective address bits are ignored during the execution of these instructions.

Instruction storage operands, on the other hand, are always 4 bytes long, and the effective addresses calculated by branch instructions are therefore always word-aligned.

2.2.2 Effective Address Calculation

For a storage access instruction, if the sum of the effective address and the operand length exceeds the maximum effective address of 2^64 – 1 for 64-bit mode or 2^32 – 1 in 32-bit mode (that is, the storage operand itself crosses the maximum address boundary), the result of the operation is undefined, as specified by the architecture. The A2 core performs the operation as if the storage operand wrapped around from the maximum effective address to effective address 0. Software, however, should not depend upon this behavior, so that it can be ported to other implementations that do not handle this scenario in the same fashion. Accordingly, software should ensure that no data storage operands cross the maximum address boundary.

Note: Because instructions are words and because the effective addresses of instructions are always implicitly on word boundaries, it is not possible for an instruction storage operand to cross any word boundary, including the maximum address boundary.

Effective address arithmetic, which calculates the starting address for storage operands, wraps around from the maximum address to address 0 for all effective address computations except next sequential instruction fetching. See Instruction Storage Addressing Modes on page 65 for more information about next sequential instruction fetching at the maximum address boundary.

2.2.2.1 Data Storage Addressing Modes

There are two data storage addressing modes supported by the A2 core:

• Base + displacement (D-mode) addressing mode:

The 16-bit D field is sign-extended and added to the contents of the GPR designated by RA or to zero if RA = 0.

• Base + index (X-mode) addressing mode:

The contents of the GPR designated by RB (or the value 0 for lswi and stswi) are added to the contents of the GPR designated by RA or to 0 if RA = 0.
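The two data addressing modes can be written out as plain arithmetic. The sketch below assumes 64-bit mode and models the register file as an array of 32 values; the function names `ea_d_mode` and `ea_x_mode` are hypothetical, and the lswi/stswi case (where 0 is used in place of RB) is not modeled. Unsigned 64-bit addition wraps modulo 2^64, which mirrors the wraparound of effective address arithmetic described earlier.

```c
#include <stdint.h>

/* Base + displacement: sign-extended 16-bit D added to (RA|0). */
static uint64_t ea_d_mode(const uint64_t gpr[32], unsigned ra, int16_t d)
{
    uint64_t base = (ra == 0) ? 0 : gpr[ra];   /* RA = 0 means the value 0 */
    return base + (uint64_t)(int64_t)d;        /* sign-extend, then add */
}

/* Base + index: contents of RB added to (RA|0). */
static uint64_t ea_x_mode(const uint64_t gpr[32], unsigned ra, unsigned rb)
{
    uint64_t base = (ra == 0) ? 0 : gpr[ra];
    return base + gpr[rb];
}
```

Note that RA = 0 selects the constant 0, not the contents of GPR 0, so a nonzero value in GPR 0 does not affect the result.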

2.2.2.2 Instruction Storage Addressing Modes

There are four instruction storage addressing modes supported by the A2 core:

• I-form branch instructions (unconditional):

The 24-bit LI field is concatenated on the right with 0b00, sign-extended, and then added to either the address of the branch instruction if AA = 0 or to 0 if AA = 1.

• Taken B-form branch instructions:

The 14-bit BD field is concatenated on the right with 0b00, sign-extended, and then added to either the address of the branch instruction if AA = 0 or to 0 if AA = 1.

• Taken XL-form branch instructions:

The contents of bits 0:61 of the Link Register (LR) or the Count Register (CTR) are concatenated on the right with 0b00 to form the 64-bit effective address of the next instruction.

Note: In 32-bit mode, the A2 core forces bits 0:31 of the calculated 64-bit effective address to zeros.

• Next sequential instruction fetching (including nontaken branch instructions):

The value 4 is added to the address of the current instruction to form the 64-bit effective address of the next instruction. If the address of the current instruction is 0xFFFF_FFFF_FFFF_FFFC in 64-bit mode or 0x0000_0000_FFFF_FFFC in 32-bit mode, the A2 core wraps the next sequential instruction address back to address 0. This behavior is not required by the architecture, which specifies that the next sequential instruction address is undefined under these circumstances. Therefore, software should not depend upon this behavior, so that it can be ported to other implementations that do not handle this scenario in the same fashion. Accordingly, if software wants to execute across this maximum address boundary and wrap back to address 0, it should place an unconditional branch at the boundary with a displacement of 4.

In addition to the above four instruction storage addressing modes, the following behavior applies to branch instructions:

• Any branch instruction with LK = 1:

The value 4 is added to the address of the current instruction and the low-order 64 bits of the result are placed into the LR. As for the similar scenario for next sequential instruction fetching, if the address of the branch instruction is 0xFFFF_FFFF_FFFF_FFFC in 64-bit mode or 0x0000_0000_FFFF_FFFC in 32-bit mode, the result placed into the LR is architecturally undefined, although once again the A2 core wraps the LR update value back to address 0. Again, however, software should not depend on this behavior so that it can be ported to implementations that do not handle this scenario in the same fashion.
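The branch target calculations above can be sketched as follows. This is an illustration only, assuming 64-bit mode (the forcing of bits 0:31 to zeros in 32-bit mode is not modeled); all function names are hypothetical, and `cia` stands for the address of the current instruction. The wrap to address 0 in `next_sequential` reflects the A2 behavior described above, which the architecture leaves undefined.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of v to 64 bits. */
static int64_t sign_extend(uint64_t v, unsigned bits)
{
    uint64_t sign = 1ull << (bits - 1);
    v &= (sign << 1) - 1;                 /* keep only the low `bits` bits */
    return (int64_t)((v ^ sign) - sign);
}

/* I-form: 24-bit LI || 0b00, sign-extended, added to cia (AA = 0) or 0 (AA = 1). */
static uint64_t target_i_form(uint64_t cia, uint32_t li, bool aa)
{
    int64_t disp = sign_extend((uint64_t)li << 2, 26);
    return (aa ? 0 : cia) + (uint64_t)disp;
}

/* B-form: 14-bit BD || 0b00, sign-extended, added to cia (AA = 0) or 0 (AA = 1). */
static uint64_t target_b_form(uint64_t cia, uint32_t bd, bool aa)
{
    int64_t disp = sign_extend((uint64_t)bd << 2, 16);
    return (aa ? 0 : cia) + (uint64_t)disp;
}

/* XL-form: bits 0:61 of LR or CTR || 0b00. */
static uint64_t target_xl_form(uint64_t lr_or_ctr)
{
    return lr_or_ctr & ~(uint64_t)3;
}

/* Next sequential instruction; also the value placed in LR when LK = 1. */
static uint64_t next_sequential(uint64_t cia)
{
    return cia + 4;                       /* wraps modulo 2^64 */
}
```

An I-form branch with LI = 1 (a displacement of 4) placed at 0xFFFF_FFFF_FFFF_FFFC yields target address 0, which is the portable way to cross the maximum address boundary that the text recommends.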

2.2.3 Byte Ordering

If scalars (individual data items and instructions) were indivisible, there would be no such concept as “byte ordering.” It is meaningless to consider the order of bits or groups of bits within the smallest addressable unit of storage, because nothing can be observed about such order. Only when scalars, which the programmer and processor regard as indivisible quantities, can comprise more than one addressable unit of storage does the question of order arise.

For a machine in which the smallest addressable unit of storage is the 64-bit doubleword, there is no question of the ordering of bytes within doublewords. All transfers of individual scalars between registers and storage are of doublewords, and the address of the byte containing the high-order 8 bits of a scalar is no different from the address of a byte containing any other part of the scalar.

For the Power ISA Architecture, as for most current computer architectures, the smallest addressable unit of storage is the 8-bit byte. Many scalars are halfwords, words, or doublewords that consist of groups of bytes. When a word-length scalar is moved from a register to storage, the scalar occupies 4 consecutive byte addresses. It thus becomes meaningful to discuss the order of the byte addresses with respect to the value of the scalar: which byte contains the highest-order 8 bits of the scalar, which byte contains the next-highest-order 8 bits, and so on.

Given a scalar that contains multiple bytes, the choice of byte ordering is essentially arbitrary. There are 24 ways to specify the ordering of 4 bytes within a word, but only two of these orderings are sensible:

• The ordering that assigns the lowest address to the highest-order (left-most) 8 bits of the scalar, the next sequential address to the next-highest-order 8 bits, and so on.

This ordering is called big endian because the “big end” (most-significant end) of the scalar, considered as a binary number, comes first in storage. IBM RISC System/6000, IBM System/390®, and Motorola 680x0 are examples of computer architectures using this byte ordering.

• The ordering that assigns the lowest address to the lowest-order (“right-most”) 8 bits of the scalar, the next sequential address to the next-lowest-order 8 bits, and so on.

This ordering is called little endian because the “little end” (least-significant end) of the scalar, considered as a binary number, comes first in storage. The Intel x86 is an example of a processor architecture using this byte ordering.
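The two orderings can be made concrete by writing a word into a byte array one byte at a time. The sketch below is independent of the byte ordering of the machine it runs on, because it uses shifts rather than pointer casts; the function names `store_be32` and `store_le32` are hypothetical.

```c
#include <stdint.h>

/* Big endian: highest-order 8 bits go to the lowest address. */
static void store_be32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Little endian: lowest-order 8 bits go to the lowest address. */
static void store_le32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)v;
    out[1] = (uint8_t)(v >> 8);
    out[2] = (uint8_t)(v >> 16);
    out[3] = (uint8_t)(v >> 24);
}
```

Storing the value 0x11121314 produces the byte sequence 11 12 13 14 with `store_be32` and 14 13 12 11 with `store_le32`.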

Power ISA supports both big-endian and little-endian byte ordering, for both instruction and data storage accesses. Which byte ordering is used is controlled on a memory page basis by the endian (E) storage attribute, which is a field within the TLB entry for the page. The endian storage attribute is set to 0 for a big-endian page and is set to 1 for a little-endian page. See Memory Management on page 185 for more information about memory pages, the TLB, and storage attributes, including the endian storage attribute.

2.2.3.1 Structure Mapping Examples

The following C language structure, s, contains an assortment of scalars and a character string. The comments show the value assumed to be in each structure element; these values show how the bytes comprising each structure element are mapped into storage.

struct {
    int a;          /* 0x1112_1314 word */
    long long b;    /* 0x2122_2324_2526_2728 doubleword */
    int c;          /* 0x3132_3334 word */
    char d[7];      /* 'A','B','C','D','E','F','G' array of bytes */
    short e;        /* 0x5152 halfword */
    int f;          /* 0x6162_6364 word */
} s;

C structure mapping rules permit the use of padding (skipped bytes) to align scalars on desirable boundaries. The following structure mapping examples show each scalar aligned at its natural boundary. This alignment introduces padding of 4 bytes between a and b, one byte between d and e, and two bytes between e and f. The same amount of padding is present in both big-endian and little-endian mappings.
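The padding described above can be checked with `offsetof`. The sketch below is an assumption-laden illustration: it substitutes fixed-width types (`int32_t`, `int64_t`, `int16_t`) for the `int`, `long long`, and `short` of the original structure so that the operand sizes are unambiguous, it uses the hypothetical tag `s_layout`, and the offsets in the comments hold only on an ABI that aligns each scalar at its natural boundary (typical of 64-bit targets).

```c
#include <stddef.h>
#include <stdint.h>

/* Same member order as structure s, with explicit sizes. */
struct s_layout {
    int32_t a;      /* offset 0x00, followed by 4 bytes of padding */
    int64_t b;      /* offset 0x08 */
    int32_t c;      /* offset 0x10 */
    char    d[7];   /* offset 0x14, followed by 1 byte of padding */
    int16_t e;      /* offset 0x1C, followed by 2 bytes of padding */
    int32_t f;      /* offset 0x20 */
};

/* Number of padding bytes between the end of d and the start of e. */
static size_t padding_before_e(void)
{
    return offsetof(struct s_layout, e)
         - (offsetof(struct s_layout, d) + sizeof(((struct s_layout *)0)->d));
}
```

The offsets do not depend on byte ordering, which is why the same amount of padding appears in both mappings.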

Big-Endian Mapping

The big-endian mapping of structure s follows. Addresses, in hexadecimal, are below the data stored at the address. The contents of each byte, as defined in structure s, are shown as a (hexadecimal) number or character (for the string elements). Padding bytes are shown as "--".

 11   12   13   14   --   --   --   --
0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07

 21   22   23   24   25   26   27   28
0x08 0x09 0x0A 0x0B 0x0C 0x0D 0x0E 0x0F

 31   32   33   34  'A'  'B'  'C'  'D'
0x10 0x11 0x12 0x13 0x14 0x15 0x16 0x17

'E'  'F'  'G'   --   51   52   --   --
0x18 0x19 0x1A 0x1B 0x1C 0x1D 0x1E 0x1F

 61   62   63   64
0x20 0x21 0x22 0x23 0x24 0x25 0x26 0x27

Little-Endian Mapping

Structure s is shown mapped little endian. Padding bytes are shown as "--".

 14   13   12   11   --   --   --   --
0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07

 28   27   26   25   24   23   22   21
0x08 0x09 0x0A 0x0B 0x0C 0x0D 0x0E 0x0F

 34   33   32   31  'A'  'B'  'C'  'D'
0x10 0x11 0x12 0x13 0x14 0x15 0x16 0x17

'E'  'F'  'G'   --   52   51   --   --
0x18 0x19 0x1A 0x1B 0x1C 0x1D 0x1E 0x1F

 64   63   62   61
0x20 0x21 0x22 0x23 0x24 0x25 0x26 0x27

2.2.3.2 Instruction Byte Ordering

Power ISA defines instructions as aligned words (4 bytes) in memory. As such, instructions in a big-endian program image are arranged with the most-significant byte (MSB) of the instruction word at the lowest-numbered address.

Consider the big-endian mapping of instruction p at address 0x00, where, for example, p = add r7, r7, r4:

MSB            LSB
0x00 0x01 0x02 0x03

On the other hand, in a little-endian mapping the same instruction is arranged with the least-significant byte (LSB) of the instruction word at the lowest-numbered address:

LSB            MSB
0x00 0x01 0x02 0x03

By the definition of Power ISA bit numbering, the most-significant byte of an instruction is the byte containing bits 0:7 of the instruction. As depicted in the instruction format diagrams (see Instruction Formats in the Power ISA specification), this most-significant byte is the one that contains the primary opcode field (bits 0:5). Due to this difference in byte orderings, the processor must perform whatever byte reversal is required (depending on the particular byte ordering in use) to correctly deliver the opcode field to the instruction decoder. In the A2 core, this reversal is performed between the memory interface and the instruction cache, according to the value of the endian storage attribute for each memory page, such that the bytes in the instruction cache are always correctly arranged for delivery directly to the instruction decoder.

If the endian storage attribute for a memory page is reprogrammed from one byte ordering to the other, the contents of the memory page must be reloaded with program and data structures that are in the appropriate byte ordering. Furthermore, anytime the contents of instruction memory change, the instruction cache must be made coherent with the updates by invalidating the instruction cache and refetching the updated memory contents with the new byte ordering.

2.2.3.3 Data Byte Ordering

Unlike instruction fetches, data accesses cannot be byte-reversed between memory and the data cache. Data byte ordering in memory depends upon the data type (byte, halfword, word, and so on) of a specific data item. It is only when moving a data item of a specific type from or to an architected register (as directed by the execution of a particular storage access instruction) that it becomes known what kind of byte reversal might be required due to the byte ordering of the memory page containing the data item. Therefore, byte reversal during load or store accesses is performed between the data cache (or memory, on a data cache miss, for example) and the load register target or store register source, depending on the specific type of load or store instruction (that is, byte, halfword, word, and so on).

Comparing the big-endian and little-endian mappings of structure s, as shown in Structure Mapping Examples on page 66, the difference between the byte locations of any data item in the structure depends upon the size of the particular data item. For example (again referring to the big-endian and little-endian mappings of structure s):

• The word a has its 4 bytes reversed within the word spanning addresses 0x00 – 0x03.

• The halfword e has its 2 bytes reversed within the halfword spanning addresses 0x1C – 0x1D.

Note: The array of bytes d, where each data item is a byte, is not reversed when the big-endian and little-endian mappings are compared. For example, the character 'A' is located at address 0x14 in both the big-endian and little-endian mappings.

The size of the data item being loaded or stored must be known before the processor can decide whether, and if so, how, to reorder the bytes when moving them between a register and the data cache (or memory).

• For byte loads and stores, including strings, no reordering of bytes occurs regardless of byte ordering.

• For halfword loads and stores, bytes are reversed within the halfword for one byte order with respect to the other.

• For word loads and stores (including load/store multiple), bytes are reversed within the word for one byte order with respect to the other.

• For doubleword loads and stores, bytes are reversed within the doubleword for one byte order with respect to the other.

• For quadword loads and stores (AXU loads/stores only), bytes are reversed within the quadword for one byte order with respect to the other.

Note: This mechanism applies independent of the alignment of data. In other words, when loading a multibyte data operand with a scalar load instruction, bytes are accessed from the data cache (or memory) starting with the byte at the calculated effective address and continuing with consecutively higher-numbered bytes until the required number of bytes have been retrieved. Then, the bytes are arranged such that either the byte from the highest-numbered address (for big-endian storage regions) or the lowest-numbered address (for little-endian storage regions) is placed into the least-significant byte of the register. The rest of the register is filled in corresponding order with the rest of the accessed bytes. An analogous procedure is followed for scalar store instructions.
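The procedure in the note above can be sketched as a software model of a scalar load. This is an illustration of the byte arrangement only, not of the hardware datapath; the function name `load_scalar` is hypothetical, `mem` points at the byte at the calculated effective address, and the operand length is assumed to be at most 8 bytes so that the result fits in a 64-bit register image.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assemble a len-byte scalar starting at mem into a register value.
   Big endian: the highest-numbered address lands in the least-significant byte.
   Little endian: the lowest-numbered address lands in the least-significant byte. */
static uint64_t load_scalar(const uint8_t *mem, unsigned len, bool little_endian)
{
    uint64_t value = 0;
    for (unsigned i = 0; i < len; i++) {
        unsigned idx = little_endian ? (len - 1 - i) : i;
        value = (value << 8) | mem[idx];
    }
    return value;
}
```

Because the function reads byte by byte from the starting address, it behaves the same for aligned and unaligned addresses, as the note states. Applied to the bytes 14 13 12 11 of the little-endian mapping of word a, it returns 0x11121314 when `little_endian` is true.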

For load/store multiple instructions, each group of 4 bytes is transferred between memory and the register according to the procedure for a scalar load word instruction.

For load/store string instructions, the most-significant byte of the first register is transferred to or from memory at the starting (lowest-numbered) effective address, regardless of byte ordering. Subsequent register bytes (from most-significant to least-significant, and then moving into the next register, starting with the most-significant byte, and so on) are transferred to or from memory at sequentially higher-numbered addresses. This behavior for byte strings ensures that if two strings are loaded into registers and then compared, the first bytes of the strings are treated as most significant with respect to the comparison.
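The byte-placement rule for scalar loads described in the note above can be modeled in a few lines of C. This is a minimal sketch, not A2 code: the function name `load_scalar` and the `big_endian` flag are illustrative assumptions, and the model handles operands of up to 8 bytes.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical model of a scalar load: bytes are read starting at the
 * effective address and continuing upward, then placed in the register
 * according to the byte ordering of the storage region. */
uint64_t load_scalar(const uint8_t *ea, size_t size, int big_endian)
{
    uint64_t reg = 0;
    for (size_t i = 0; i < size; i++) {
        if (big_endian) {
            /* The highest-numbered address ends up in the least-significant byte. */
            reg = (reg << 8) | ea[i];
        } else {
            /* The lowest-numbered address ends up in the least-significant byte. */
            reg |= (uint64_t)ea[i] << (8 * i);
        }
    }
    return reg;
}
```

For a single byte the two orderings give the same result, which matches the rule that byte loads are never reordered.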

2.2.3.4 Byte-Reverse Instructions

The Power ISA defines load/store byte-reverse instructions, which can access storage that is specified as being of one byte ordering in the same manner that a regular (that is, non-byte-reverse) load/store instruction would access storage that is specified as being of the opposite byte ordering. In other words, a load/store byte-reverse instruction to a big-endian memory page transfers data between the data cache (or memory) and the register in the same manner that a normal load/store would transfer the data to or from a little-endian memory page. Similarly, a load/store byte-reverse instruction to a little-endian memory page transfers data between the data cache (or memory) and the register in the same manner that a normal load/store would transfer the data to or from a big-endian memory page.

The function of the load/store byte-reverse instructions is useful when a particular memory page contains a combination of data with both big-endian and little-endian byte ordering. In such an environment, the endian storage attribute for the memory page would be set according to the predominant byte ordering for the page, and the normal load/store instructions would be used to access data operands that used this predominant byte ordering. Conversely, the load/store byte-reverse instructions would be used to access the data operands that were of the other (less prevalent) byte ordering.

Software compilers cannot typically make general use of the load/store byte-reverse instructions, so they are ordinarily used only in special, hand-coded device drivers.
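The effect of a byte-reverse access on the register value can be illustrated with plain C. The helper names below are assumptions made for this sketch; they model only the data transformation and are not the instructions themselves.

```c
#include <stdint.h>

/* Reverse the 4 bytes of a word: the value a byte-reverse word load would
 * deliver, relative to a normal word load from the same storage. */
uint32_t byte_reverse32(uint32_t w)
{
    return ((w & 0x000000FFu) << 24) |
           ((w & 0x0000FF00u) << 8)  |
           ((w & 0x00FF0000u) >> 8)  |
           ((w & 0xFF000000u) >> 24);
}

/* Reverse the 2 bytes of a halfword. */
uint16_t byte_reverse16(uint16_t h)
{
    return (uint16_t)((h << 8) | (h >> 8));
}
```

Applying the reversal twice returns the original value, which is why the same instruction serves both big-endian and little-endian pages.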


2.3 Multithreading

The A2 core has four threads that allow simultaneous execution within the processor and can be viewed as a 4-way multiprocessor with shared dataflow. To software, the core appears as four independent processing units. Because the threads share resources, the performance of each thread can be limited by the activity of the others.

2.3.1 Thread Identification

2.3.1.1 Thread Identification Register (TIR)

The TIR is a read-only register that can be used to distinguish a thread from other threads on the A2 core. The TIR returns a value n, where n is referred to as “thread n.”

Decimal SPR Number: 446
Write Access: None
Initial Value: 0x0000000000000000
Duplicated for Multithread: N
Slow SPR: N
Notes:
Guest Supervisor Mapping:
Scan Ring: func

Bits    Field Name  Initial Value  Description
0:31    ///         0x0            Reserved
32:61   ///         0x0            Reserved
62:63   TID         0b00           Processor Thread ID. This field can be used to distinguish the thread from other threads on the processor. Threads are numbered sequentially, with valid values ranging from 0 to 3.

2.3.1.2 Processor Identification Register (PIR)

The PIR is a read-only register that uniquely identifies a specific instance of a processor thread, within a multiprocessor configuration, enabling software to determine exactly which thread it is running on. This capability is important for operating system software within multiprocessor configurations.

Decimal SPR Number: 286
Write Access: None
Initial Value: 0x0000000000000000
Duplicated for Multithread: N
Slow SPR: N
Notes:
Guest Supervisor Mapping: GPIR
Scan Ring: func

Bits    Field Name  Initial Value  Description
32:53   ///         0x0            Reserved
54:61   CID         0x0            Processor Core ID. Returns the value of the I/O pin an_ac_coreid. This can be used to distinguish a processor core from other processor cores in the system.
62:63   TID         0b00           Processor Thread ID. This field can be used to distinguish the thread from other threads on the processor. Threads are numbered sequentially, with valid values ranging from 0 to 3.
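Register fields in this manual are numbered with bit 0 as the most-significant bit of the 64-bit register and bit 63 as the least-significant bit. A minimal C sketch of decoding the PIR under that convention follows. The helper names are assumptions for illustration, and the code operates on a value that has already been read from the register.

```c
#include <stdint.h>

/* Extract a field given in the manual's bit numbering, where bit 0 is the
 * most-significant bit of the 64-bit register and bit 63 the least. */
uint64_t spr_field(uint64_t value, unsigned first_bit, unsigned last_bit)
{
    unsigned width = last_bit - first_bit + 1;
    uint64_t mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);
    return (value >> (63 - last_bit)) & mask;
}

/* PIR[CID], bits 54:61: processor core ID. */
unsigned pir_core_id(uint64_t pir)
{
    return (unsigned)spr_field(pir, 54, 61);
}

/* PIR[TID], bits 62:63: thread ID within the core (0 to 3). */
unsigned pir_thread_id(uint64_t pir)
{
    return (unsigned)spr_field(pir, 62, 63);
}
```

The same `spr_field` helper applies to TIR[TID], which occupies the same bit positions (62:63).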

2.3.1.3 Guest Processor Identification Register (GPIR)

The GPIR is a register that identifies a specific instance of a processor thread for the guest operating system. The GPIR is used to filter incoming processor messages. See Processor Messages on page 357.

Decimal SPR Number: 382
Write Access: Hypv
Initial Value: 0x0000000000000000
Duplicated for Multithread: Y
Slow SPR: N
Notes: HM
Guest Supervisor Mapping: Y
Scan Ring: func

Bits    Field Name  Initial Value  Description
32:49   VPTAG       0x0            Virtual Processor Tag. Storage used by the guest operating system to identify the virtual processor on which the operating system is running.
50:63   DBTAG       0x0            Doorbell Tag. Used to match guest doorbell messages that are sent to all the processors and virtual processors in a coherence domain. If a sent guest doorbell message tag matches the DBTAG field, a guest doorbell is said to be accepted on the (virtual) processor.

2.3.2 Thread Run State

The A2 core provides several methods for controlling a thread’s run state. For a thread to fetch instructions, all methods outlined below must be properly configured. If any one I/O or register is configured to stop a thread, the affected thread will not fetch instructions.

2.3.2.1 Thread Stop I/O Pin

The I/O pin, an_ac_pm_thread_stop, can be used to stop the A2 core from fetching instructions. Stopping a thread causes all instructions that have begun executing to be completed and all prefetched instructions to be discarded.

2.3.2.2 Thread Control and Status Register (THRCTL)

The SCOM-accessible THRCTL register can control the thread run state to allow an external debugger control of the processor. See Direct Access to I-Cache and D-Cache Directories on page 437. Stopping a thread via THRCTL causes all instructions that have begun executing to be completed and all prefetched instructions to be discarded.


2.3.2.3 Core Configuration Register 0 (CCR0)

The CCR0 is used to disable or enable threads. When a thread is disabled by setting the CCR0[WE] bit corresponding to the thread to 1, all instructions that have begun executing are completed and all prefetched instructions are discarded. Subsequent instructions are not prefetched or initiated. Asynchronous interrupts or other conditions that are unmasked and enabled in CCR1 for the thread will cause the thread to be re-enabled. Executing a wait instruction on a thread will cause that thread’s CCR0[WE] to be set to 1. CCR0 also contains controls for allowing the processor to enter a power managed state. See Section 13 Power Management Methods on page 525 for information about power savings modes.

Programming Note: When using mtccr0 to put other threads to sleep, using an external interrupt or any asynchronous interrupt as the wake-up method is not reliable. The thread being put to sleep might have just taken an interrupt, leaving MSR[EE] at zero and preventing wake-up. In this case, mtccr0 should be used to wake up the sleeping threads. A thread can reliably put itself to sleep using mtccr0 or the wait instruction and wake up using an external interrupt or any asynchronous interrupt.

Decimal SPR Number: 1008
Write Access: Hypv
Initial Value: 0x0000000000000000
Duplicated for Multithread: N
Slow SPR: N
Notes:
Guest Supervisor Mapping:
Scan Ring: bcfg

Bits    Field Name  Initial Value  Description
32:33   PME         0b00           Power Management Enable
                                   00 Disabled: No power savings mode entered.
                                   01 PM_Sleep_enable: PM_Sleep state entered when all threads are stopped.
                                   10 PM_RVW_enable: PM_RVW state entered when all threads are stopped.
                                   11 Disabled2: No power savings mode entered.
                                   Note: See the A2 User Manual, Power Management Methods section.
34:51   ///         0x0            Reserved
52:55   WEM         0b0000         Wait Enable Mask
                                   0 No effect to CCR0[WE].
                                   1 Allows writing of the corresponding bit in the CCR0[WE] field.
                                   These bits are nonpersistent. A read always returns zeros.
56:59   ///         0b0000         Reserved
60:63   WE          0b0000         Wait Enable
                                   For t < 4, bit 63-t corresponds to thread t:
                                   0 Indicates that the thread is enabled.
                                   1 Indicates that the thread is disabled.
                                   Note: This field can also be set by a wait instruction.
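As a hedged sketch of how the WEM and WE fields combine, the following C function builds the value a hypervisor might write with mtccr0 to enable or disable selected threads. It assumes, by analogy with the WE layout, that WEM bit 55-t gates WE bit 63-t; the field description above does not state this mapping explicitly. The function name `ccr0_set_we` is hypothetical.

```c
#include <stdint.h>

#define CCR0_WEM_MASK  0xF00ULL  /* WEM, bits 52:55 */

/* Build a CCR0 value that changes WE only for the threads in thread_mask
 * (bit t = thread t), starting from the current CCR0 value so that PME and
 * the other fields are preserved.
 * ASSUMPTION: WEM bit 55-t enables writing of WE bit 63-t. */
uint64_t ccr0_set_we(uint64_t ccr0, unsigned thread_mask, int disable)
{
    uint64_t m = thread_mask & 0xFu;
    uint64_t v = ccr0 & ~(CCR0_WEM_MASK | m);  /* clear WEM and the selected WE bits */
    v |= m << 8;                               /* open the write mask for those threads */
    if (disable)
        v |= m;                                /* WE = 1 disables the thread */
    return v;
}
```

Because WEM reads back as zeros, the mask must be supplied on every write. WE bits whose mask bit is 0 are ignored by the write, so they are passed through as read.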

2.3.2.4 Thread Enable Register (TENS, TENC)

The Thread Enable Register is used to disable or enable threads and is provided as a means to access shared resources (see Accessing Shared Resources on page 78). When a thread is disabled by setting the TEN bit corresponding to the thread to 0, all instructions that have begun executing are completed and all prefetched instructions are discarded. Subsequent instructions are not prefetched or initiated. All asynchronous interrupts for the thread are delayed until the thread is re-enabled.


The TEN is accessed by using two registers: TENS and TENC. When TENS is written, threads for which the corresponding bit in TENS is 1 are enabled; threads for which the corresponding bit in TENS is 0 are unaffected. When TENC is written, threads for which the corresponding bit in TENC is 1 are disabled; threads for which the corresponding bit in TENC is 0 are unaffected. When either SPR is read, the current value of the TEN is returned.

Decimal SPR Number: 438
Write Access: Hypv
Initial Value: 0x0000000000000001
Duplicated for Multithread: N
Slow SPR: N
Notes: WS
Guest Supervisor Mapping:
Scan Ring: bcfg

Bits    Field Name  Initial Value  Description
0:31    ///
32:59   ///         0x0            Reserved
60:63   TEN         0b0001         Thread Enable Set
                                   For t < 4, bit 63-t corresponds to thread t. When bit 63-t is set to 1, thread t is enabled, if it is not already. When bit 63-t is set to 0, thread t is unaffected.
                                   When bit 63-t is read, the current value of the thread enable is returned.

Decimal SPR Number: 439
Write Access: Hypv
Initial Value: 0x0000000000000001
Duplicated for Multithread: N
Slow SPR: N
Notes: WC
Guest Supervisor Mapping:
Scan Ring: bcfg

Bits    Field Name  Initial Value  Description
0:31    ///
32:59   ///         0x0            Reserved
60:63   TEN         0b0001         Thread Enable Clear
                                   For t < 4, bit 63-t corresponds to thread t. When bit 63-t is set to 1, thread t is disabled, if it is not already. When bit 63-t is set to 0, thread t is unaffected.
                                   When bit 63-t is read, the current value of the thread enable is returned.
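The write-one-to-set and write-one-to-clear behavior of TENS and TENC can be summarized with a small software model. This is a sketch of the semantics described above, not an access routine; the function names are assumptions, and on hardware the writes would be mtspr operations to SPR 438 and SPR 439.

```c
#include <stdint.h>

#define TEN_MASK 0xFULL  /* TEN occupies bits 60:63; bit 63-t is thread t */

/* Model of a write to TENS: 1 bits enable threads, 0 bits have no effect. */
uint64_t tens_write(uint64_t ten, uint64_t value)
{
    return (ten | value) & TEN_MASK;
}

/* Model of a write to TENC: 1 bits disable threads, 0 bits have no effect. */
uint64_t tenc_write(uint64_t ten, uint64_t value)
{
    return ten & ~value & TEN_MASK;
}
```

Splitting the register this way lets one thread change another thread's enable bit without a read-modify-write sequence on the shared TEN value.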

2.3.2.5 Thread Enable Status Register (TENSR)

The TENSR indicates which threads are quiesced.

Programming Note: The TENSR is only valid after a context synchronizing instruction or an event that precisely stops a thread, such as a write to TEN.

Programming Note: When thread T1 disables other threads, Tn, it sets the TEN bits corresponding to Tn to zeros. To ensure that all operations being performed by threads Tn have been performed with respect to all threads on the processor, thread T1 reads the TENSR until all the bits corresponding to the disabled threads, Tn, are zeros.


Decimal SPR Number: 437
Write Access: None
Initial Value: 0x0000000000000000
Duplicated for Multithread: N
Slow SPR: N
Notes:
Guest Supervisor Mapping:
Scan Ring: func

Bits    Field Name  Initial Value  Description
0:31    ///
32:59   ///         0x0            Reserved
60:63   TENSR       0b0000         Thread Enable Status Register. Bit 63-t of the TENSR corresponds to thread t.
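The polling step in the programming note above can be sketched as a bounded loop. Everything here is illustrative: the `tensr_reader` callback stands in for an mfspr of SPR 437, the `max_reads` bound is an addition for safety, and the simulated reader exists only so the loop can be exercised without privileged instructions.

```c
#include <stdint.h>

/* Hypothetical accessor type: returns the current TENSR value. */
typedef uint64_t (*tensr_reader)(void);

/* Poll until every TENSR bit selected by disabled_mask reads as zero.
 * Returns the number of reads performed, or 0 if max_reads was exhausted. */
unsigned wait_for_quiesce(tensr_reader read_tensr, uint64_t disabled_mask,
                          unsigned max_reads)
{
    for (unsigned n = 1; n <= max_reads; n++) {
        if ((read_tensr() & disabled_mask) == 0)
            return n;
    }
    return 0;
}

/* Simulated TENSR for demonstration: the bits for threads 1 to 3 clear on
 * the third read. */
static uint64_t sim_tensr = 0xE;
static unsigned sim_reads = 0;

uint64_t sim_read_tensr(void)
{
    if (++sim_reads >= 3)
        sim_tensr &= ~0xEULL;
    return sim_tensr;
}
```

A caller running on thread 0 would first write 0b1110 to TENC and then call `wait_for_quiesce` with the same mask.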

2.3.3 Wake On Interrupt

The A2 core can be configured to wake on interrupts or other conditions, if the thread was disabled by a write to CCR0 or by executing a wait instruction.

2.3.3.1 Core Configuration Register 1 (CCR1)

CCR1 provides additional masking on what conditions can cause the processor to resume execution. The conditions or interrupts specified must be appropriately unmasked and must also be enabled in CCR1 to exit the stopped state.

Decimal SPR Number: 1009
Write Access: Hypv
Initial Value: 0x000000000F0F0F0F
Duplicated for Multithread: N
Slow SPR: N
Notes:
Guest Supervisor Mapping:
Scan Ring: func

Bits    Field Name  Initial Value  Description
32:33   ///         0b00           Reserved
34:39   WC3         0xF            Thread 3 Wake Control
40:41   ///         0b00           Reserved
42:47   WC2         0xF            Thread 2 Wake Control
48:49   ///         0b00           Reserved
50:55   WC1         0xF            Thread 1 Wake Control
56:57   ///         0b00           Reserved
58:63   WC0         0xF            Thread 0 Wake Control

Each Thread n Wake Control field (WC3, WC2, WC1, and WC0) has the same bit assignments:

(0) 1 Disables sleep on waitrsv.
(1) 1 Disables sleep on waitimpl.
(2) 1 Enables wake on critical input, watchdog, critical doorbell, guest critical doorbell, or guest machine check doorbell interrupts.
(3) 1 Enables wake on external input, performance monitor, doorbell, or guest doorbell interrupts.
(4) 1 Enables wake on decrementer or user decrementer interrupts.
(5) 1 Enables wake on fixed interval timer interrupts.
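A short C sketch of decoding the per-thread wake control fields follows. The names are hypothetical, and the code assumes that sub-bit (0) of each WCn field is the field's most-significant bit, consistent with the manual's numbering of register bits. Under that assumption, the initial value 0xF enables all four wake classes and leaves sleep on waitrsv and waitimpl enabled.

```c
#include <stdint.h>

/* Bit positions inside a 6-bit WCn field, as numbered in the field description. */
enum wc_bit {
    WC_NO_SLEEP_WAITRSV  = 0,
    WC_NO_SLEEP_WAITIMPL = 1,
    WC_WAKE_CRITICAL     = 2,
    WC_WAKE_EXTERNAL     = 3,
    WC_WAKE_DECREMENTER  = 4,
    WC_WAKE_FIT          = 5
};

/* WCn for thread t: WC0 is bits 58:63, WC1 50:55, WC2 42:47, WC3 34:39,
 * so each field sits 8 bit positions above the previous one. */
unsigned ccr1_wc(uint64_t ccr1, unsigned thread)
{
    return (unsigned)((ccr1 >> (8u * thread)) & 0x3Fu);
}

/* ASSUMPTION: sub-bit (0) is the most-significant bit of the 6-bit field. */
int ccr1_wc_bit(uint64_t ccr1, unsigned thread, enum wc_bit bit)
{
    return (int)((ccr1_wc(ccr1, thread) >> (5u - (unsigned)bit)) & 1u);
}
```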

2.3.4 Thread Priority

Thread priority can be changed by writing the PPR32 register, executing an or Rx,Rx,Rx instruction, or by causing an interrupt.

2.3.4.1 Program Priority Register (PPR32)

The program priority register controls thread priority. A2 hardware supports three physical priorities. In A2’s lowest hardware priority, the number of cycles between two instructions being issued is determined by IUCR1[THRES]. See Instruction Unit Configuration Register 1 (IUCR1) on page 77.

The mapping of the three hardware priorities to the architected priorities in the PPR32 register is shown in Table 2-3. An or Rx,Rx,Rx is used to set PPR32[PRI]; these are also shown in Table 2-3. Other defined or Rx,Rx,Rx hints shown in Table 2-4 are ignored. PPR32[PRI] remains unchanged if the privilege state of the processor executing the instruction is lower than the privilege indicated in Table 2-3. PPR32[PRI] also remains unchanged if “000” is written to the field.

If MSR[EE] is 0 and PPR32 = low then thread priority is increased to medium; PPR32 is unchanged. When MSR[EE] is 1, thread priority is determined by PPR32[PRI]. This function is provided to reduce delay in the processing of interrupts.


Table 2-3. Priority Levels

Rx  PPR32[PRI]  ISA Priority  A2 Hardware Priority with IUCR1[HIPRI] Setting (00 01 10 11)  Privileged
31  001         very low      a2low a2low a2low a2low                                       yes
1   010         low                                                                         no
6   011         medium low    a2medium a2medium a2medium a2medium                           no
2   100         medium        a2high                                                        no
5   101         medium high   a2high                                                        yes
3   110         high          a2high                                                        yes
7   111         very high     a2high                                                        hypv

Table 2-4. Other “or” Instruction Hints

Rx  Mnemonic  Reserved
27  yield     Yes
29  mdoio     Yes
30  mdoom     Yes

Table 2-5. Program Priority Register (PPR32)

Decimal SPR Number: 898
Write Access: Any
Initial Value: 0x00000000000C0000
Duplicated for Multithread: Y
Slow SPR: Y
Notes:
Guest Supervisor Mapping:
Scan Ring: ccfg

Bits    Field Name  Initial Value  Description
32:42   ///         0x0            Reserved
43:45   PRI         0b011          Thread Priority
                                   001 Very low (privileged).
                                   010 Low.
                                   011 Medium low.
                                   100 Medium.
                                   101 Medium high (privileged).
                                   110 High (privileged).
                                   111 Very high (hypervisor).
                                   Access violations or writing a value of zero will result in a nop.
46:63   ///         0x0            Reserved
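The PRI field layout and the or Rx,Rx,Rx encodings from Table 2-3 can be captured in a few helpers. This is a minimal sketch with hypothetical function names; it models the field update and the rule that writing 000 leaves PRI unchanged, and it omits the privilege checks.

```c
#include <stdint.h>

/* PPR32[PRI] occupies bits 43:45, so its least-significant bit sits
 * 63 - 45 = 18 positions above bit 63. */
#define PPR32_PRI_SHIFT 18u
#define PPR32_PRI_MASK  (0x7ULL << PPR32_PRI_SHIFT)

unsigned ppr32_get_pri(uint64_t ppr32)
{
    return (unsigned)((ppr32 & PPR32_PRI_MASK) >> PPR32_PRI_SHIFT);
}

/* Model of a PPR32 write: a PRI value of 000 leaves the field unchanged.
 * Privilege checks are omitted from this sketch. */
uint64_t ppr32_set_pri(uint64_t ppr32, unsigned pri)
{
    if ((pri & 0x7u) == 0)
        return ppr32;
    return (ppr32 & ~PPR32_PRI_MASK) | ((uint64_t)(pri & 0x7u) << PPR32_PRI_SHIFT);
}

/* PRI value selected by "or Rx,Rx,Rx" per Table 2-3; returns 0 if Rx is
 * not a priority hint. */
unsigned or_hint_to_pri(unsigned rx)
{
    switch (rx) {
    case 31: return 1; /* very low */
    case 1:  return 2; /* low */
    case 6:  return 3; /* medium low */
    case 2:  return 4; /* medium */
    case 5:  return 5; /* medium high */
    case 3:  return 6; /* high */
    case 7:  return 7; /* very high */
    default: return 0;
    }
}
```

Decoding the initial value 0x00000000000C0000 with `ppr32_get_pri` gives 0b011, the medium low priority listed in Table 2-5.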


2.3.4.2 Instruction Unit Configuration Register 1 (IUCR1)

Decimal SPR Number: 883
Write Access: Hypv
Initial Value: 0x0000000000001000
Duplicated for Multithread: Y
Slow SPR: Y
Notes:
Guest Supervisor Mapping:
Scan Ring: ccfg

Bits    Field Name  Initial Value  Description
32:49   ///         0x0            Reserved
50:51   HIPRI       0b01           High Priority Privilege Level
                                   The A2 core has three priority values implemented in hardware. This field configures which value in PPR32[PRI] corresponds to the implementation's highest priority.
                                   00 Medium normal.
                                   01 Medium high.
                                   10 High.
                                   11 Very high.
52:57   ///         0x0            Reserved
58:63   THRES       0x0            Low Priority Minimum Issue Count
                                   Sets the number of cycles between low priority issues, which is set by PPR32[PRI]. The number of cycles is equal to THRES 4. This field is not used when a thread is set to high or medium priority.

2.3.5 Resources Shared between Threads

All architected states are duplicated for each thread except for logical partitioning and memory. This allows each thread to look independent from a software standpoint. Some nonarchitected resources are shared between threads to save on the overall area for the core. Section 2.3.6 provides more information about shared resources. Section 2.3.7 on page 78 provides more information about duplicated resources.

2.3.6 Shared Resources

• Instruction ERAT array: Entries can be used as shared or thread specific.

• L1 instruction cache array

• Data ERAT array: Entries can be used as shared or thread specific.

• L1 data cache array

• Load miss queue

• Store queue

• Microcode ROM array

• Branch history table: This is a configurable resource and can be set up to be shared or duplicated.

• SPR registers: Not all SPRs are shared. See Table 14-1 Register Summary on page 530 for more information.

• Instruction fetch pipeline

• Instruction issue

• Integer execution pipeline

• TLB

• LRAT

2.3.6.1 Accessing Shared Resources

When software executing in thread Tn writes a new value in an SPR (mtspr) that is shared with other threads, either of the following sequences of operations can be performed to ensure that the write operation has been performed with respect to other threads.

Sequence 1

• Disable all other threads (see Thread Enable Register (TENS, TENC) on page 72).

• Write to the shared SPR (mtspr).

• Perform a context synchronizing operation.

• Enable the previously disabled threads.

In the above sequence, the context synchronizing operation ensures that the write operation has been performed with respect to all other threads that share the SPR. The enabling of other threads ensures that subsequent instructions of the enabled threads use the new SPR value because enabling a thread is a context synchronizing operation.

Sequence 2

• All threads are put in hypervisor state and begin polling a storage flag.

• The thread updating the SPR does the following:

• Writes to the SPR (mtspr).

• Sets a storage flag indicating that the write operation was done.

• Performs a context synchronizing operation.

• When other threads see the updated storage flag, they perform context synchronizing operations.

In the above sequence, the context synchronizing operation by the thread that writes to the SPR ensures that the write operation has been performed with respect to all other threads that share the SPR; the context synchronizing operation by the other threads ensures that subsequent instructions for these threads use the updated value.
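Sequence 1 can be outlined against a simulated core. The sketch below is an assumption-laden model, not driver code: the `core_sim` structure, its fields, and `context_sync` are hypothetical stand-ins for the TENC/TENS writes, the mtspr to the shared register, and a context synchronizing instruction such as isync.

```c
#include <stdint.h>

/* Simulated core state; on hardware these would be SPR accesses and a
 * context synchronizing instruction. */
struct core_sim {
    uint64_t ten;         /* thread enable bits, bit t = thread t */
    uint64_t shared_spr;  /* stand-in for any SPR shared between threads */
    unsigned sync_count;  /* number of context synchronizing operations */
};

static void context_sync(struct core_sim *c)
{
    c->sync_count++;
}

/* Sequence 1, executed by thread `self`. */
void write_shared_spr(struct core_sim *c, unsigned self, uint64_t value)
{
    uint64_t others = c->ten & ~(1ULL << self);

    c->ten &= ~others;      /* 1. disable all other threads (TENC write)  */
    c->shared_spr = value;  /* 2. write the shared SPR (mtspr)            */
    context_sync(c);        /* 3. context synchronizing operation         */
    c->ten |= others;       /* 4. re-enable the threads disabled in step 1 */
}
```

Recording `others` before step 1 ensures that step 4 re-enables only the threads that were running beforehand, rather than every thread on the core.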

2.3.7 Duplicated Resources

• Link stack queue

• Instruction buffer

• Thread dependency

• GPR register file: This includes extra registers for microcode instruction use.

• SPR registers: Not all SPRs are duplicated. See Table 14-1 Register Summary on page 530 for more information.

• Branch history table: This is a configurable resource and can be set up to be shared or duplicated.


2.3.8 Pipeline Sharing

Figure 2-1 shows the instruction flow for the A2 core.

Figure 2-1. A2 Core Instruction Unit


2.3.8.1 Instruction Cache

The instruction cache is shared between all threads. A single thread is selected each cycle, depending on the number of instructions currently held in that thread’s instruction buffer. Two watermarks within the instruction buffer, empty and half-empty, determine a thread’s priority level for fetches. The empty watermark gives the corresponding thread a high-priority fetch request, and the half-empty watermark gives the thread a low-priority fetch request. The high-priority and low-priority fetches use two separate round-robin queues to give each thread an even chance at getting the next command. A low-priority fetch is issued only when none of the high-priority watermarks are active. The instruction cache and instruction directories are 4-way associative. The branch prediction unit that is part of the instruction cache in Figure 2-1 on page 79 contains a branch history table and link stack to allow proper branch resolution. The link stack is a 4-deep queue per thread, whereas the branch history table is a 2-bit history that can be configured as either 1 k per thread or a 4 k history shared between all four threads.
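The two-level fetch selection can be modeled as two round-robin pointers, one per priority class. The following C sketch is an interpretation of the description, not the hardware algorithm: the `fetch_arbiter` structure, the way each pointer advances, and the request bit masks are all assumptions made for illustration.

```c
#include <stdint.h>

#define NUM_THREADS 4

/* Hypothetical arbiter state: last thread granted in each round-robin queue. */
struct fetch_arbiter {
    unsigned last_high;
    unsigned last_low;
};

/* Pick the next thread after `last` whose bit is set in `requests`
 * (bit t = thread t); returns -1 if no bit is set. */
static int round_robin(unsigned requests, unsigned last)
{
    for (unsigned i = 1; i <= NUM_THREADS; i++) {
        unsigned t = (last + i) % NUM_THREADS;
        if (requests & (1u << t))
            return (int)t;
    }
    return -1;
}

/* empty: threads whose instruction buffer is empty (high-priority request).
 * half_empty: threads at the half-empty watermark (low-priority request).
 * Returns the thread selected for this cycle, or -1 if none is requesting. */
int select_fetch_thread(struct fetch_arbiter *a, unsigned empty, unsigned half_empty)
{
    int t = round_robin(empty, a->last_high);
    if (t >= 0) {
        a->last_high = (unsigned)t;
        return t;
    }
    t = round_robin(half_empty, a->last_low);
    if (t >= 0)
        a->last_low = (unsigned)t;
    return t;
}
```

A low-priority request is considered only when the high-priority search finds no candidate, which mirrors the rule that low-priority fetches are issued only when no high-priority watermark is active.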

2.3.8.2 Instruction Buffer and Decode Dependency

The colored portion of Figure 1-1 on page 50 contains all of the instruction buffer, decode, and dependency logic for each of the threads. This logic is duplicated for each thread to allow other threads with nondependent commands to be issued to maximize usage for the integer and floating-point pipelines.

2.3.8.3 Instruction Issue

Instruction issue is a shared resource within the core, and the logic is a 1+1 concurrent issue machine. This allows two commands to be issued per cycle; however, each of the commands issued must be from separate threads with one to the XU and another to the AXU units. The selection logic for the issue logic is a simple round-robin scheme with three levels of priority to allow software more flexibility.

See Figure 2-2, Figure 2-3, and Figure 2-4 for examples of round-robin logic.

Figure 2-2. Instruction Issue Timing Diagram 1

(Thread 0, high priority; threads 1, 2, 3 low priority; timeout set to 3.)


Figure 2-3. Instruction Issue Timing Diagram 2

(All threads set to high priority; timeout set to 3.)

Figure 2-4. Instruction Issue Timing Diagram 3

(Threads 0 and 1, high priority; threads 2 and 3, medium priority; timeout set to 3.)

2.3.8.4 Ram Unit

The Ram unit allows an external command to be issued within a given thread’s instruction stream. This unit is a shared resource within a core in that only one thread can issue a Ram command at a time. It is software’s responsibility to only allow one outstanding command per core, and it is necessary to poll the core until this command has completed before issuing any new commands.


2.3.8.5 Microcode Unit

The microcode unit (uCode) is partially shared and partially duplicated logic. The ROM that contains the actual stream of instructions to be issued is a shared unit; however, each thread contains its own microcode engine so that all four threads can be within a uCode stream at the same time. One of the engines will read a single command from the ROM each cycle based upon a fair round-robin scheme (not based upon the thread priority level for the issue logic), and issue that command to the appropriate thread’s instruction buffer. If the instruction buffer is over halfway filled, the uCode will stop issuing new commands. In addition, it will not include this thread for ROM reads until the instruction buffer has drained below this point.

2.3.8.6 Integer Unit

The integer execution unit is shared between threads because there is a unified execution, load/store, and branch pipeline. Exceptions and flushes from one thread usually will not affect another thread.

However, a flush that will affect all threads when encountered by one of the threads is caused by a data cache invalidate (DCI) or instruction cache invalidate (ICI) that reaches completion. A DCI or ICI will flush all threads for one cycle to allow the L1 caches to be invalidated. Software is required to guarantee that the load miss queue is empty for all threads before execution of a DCI.

Another flush condition caused by one thread that can affect another thread occurs when reload data returning for an outstanding load collides with a load or store at the data cache array pins.

For a comprehensive list of flush conditions, see Interrupt Conditions on page 854.

Some multiply operations and all divide operations require recirculation within the multiply/divide unit, therefore blocking all other threads from executing multiplies and divides. This does not prevent other threads from executing any instructions other than multiplies and divides. If any multiply or divide instructions are issued and collide with a recirculating multiply or divide, the younger instructions are flushed. In the case of the multiplier, the size of the operands determines how many cycles are needed for recirculation. The width of the multiplier is 32 bits by 32 bits, so any operations that require multiplying 64-bit operands will require recirculation. If both operands are 32 bits, no recirculation is needed (in other words, the instruction is pipelined as normal). The width of the divider is 64 bits. Divide instructions dealing with 64-bit operands recirculate for 65 cycles, and operations with 32-bit operands recirculate for 32 cycles. No divide instructions are pipelined; they all require some recirculation.

A forward progress timer monitors that each thread is making forward progress. If the thread appears to be hung, thread priorities are adjusted to break out of a potential live-lock condition.

2.4 Registers

This section provides an overview of the register categories and types provided by the A2 core. Detailed descriptions of each of the registers are provided within the chapters covering the functions with which they are associated (for example, the cache control and cache debug registers are described in Instruction and Data Caches on page 169). An alphabetical summary of all registers, including bit definitions, is provided in Register Summary on page 529.

All registers in the A2 core are architected as 64 bits wide, although certain bits in some registers are reserved and thus not necessarily implemented. For all registers with fields marked as reserved, these reserved fields should be written as 0 and read as undefined. The recommended coding practice is to perform the initial write to a register with reserved fields set to 0, and to perform all subsequent writes to the register using a read-modify-write strategy: read the register; use logical instructions to alter defined fields, leaving reserved fields unmodified; and write the register.

All of the registers are grouped into categories according to the processor functions with which they are associated. In addition, each register is classified as being of a particular type, as characterized by the specific instructions that are used to read and write registers of that type. Finally, most of the registers contained within the A2 core are defined by the Power ISA Architecture, although some registers are implementation-specific and unique to the A2 core.

Figure 2-5 illustrates the A2 core registers contained in the user programming model; that is, those registers to which access is nonprivileged and that are available to both user and supervisor programs.

Figure 2-5. User Programming Model Registers

(Figure groups: Integer Processing, Branch Control, Timer, Processor Control. Registers shown, replicated per thread: General Purpose (GPR0, GPR1, GPR2, ..., GPR31), Condition Register, Integer Exception Register (XER), Link Register, Count Register (CTR), Time Base (TBU), User Decrementer Register (UDEC), SPR General 3–7 (SPRG3, SPRG4, SPRG5, SPRG6, SPRG7), VR Save Register (VRSAVE).)

Table 14-1 on page 530 lists the A2 core registers contained in the supervisor or hypervisor programming model, to which access is privileged.
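The read-modify-write practice recommended earlier in this section for registers with reserved fields reduces to a mask-and-merge step between the read and the write. A minimal sketch follows, with a hypothetical function name; on hardware the `current` value would come from a register read and the result would be written back.

```c
#include <stdint.h>

/* Update only the bits selected by field_mask, leaving every other bit
 * (including reserved fields) exactly as it was read. */
uint64_t rmw_field(uint64_t current, uint64_t field_mask, uint64_t new_bits)
{
    return (current & ~field_mask) | (new_bits & field_mask);
}
```

Masking `new_bits` as well as `current` guards against a caller passing a value that spills outside the intended field.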


2.4.1 Register Mapping

Some special purpose register (SPR) accesses in guest state are mapped to analogous registers for the guest state. This removes the requirement for the hypervisor software to handle embedded hypervisor privilege interrupts for these accesses and to emulate the required changes for these high-use registers.

Accesses to the registers listed in Table 2-6 are changed by the processor to the registers given in the table when the processor is in guest state (MSR[GS] = 1). Accesses to these registers are not mapped when not in guest state.

Table 2-6. Register Mapping

SPR Accessed SPR Mapped to Type of Access

SRR0 GSRR0 mtspr, mfspr

SRR1 GSRR1 mtspr, mfspr

ESR GESR mtspr, mfspr

DEAR GDEAR mtspr, mfspr

PIR GPIR mtspr, mfspr

SPRG0 GSPRG0 mtspr, mfspr

SPRG1 GSPRG1 mtspr, mfspr

SPRG2 GSPRG2 mtspr, mfspr

SPRG3 GSPRG3 mtspr, mfspr

USPRG3 GSPRG3 mtspr, mfspr

2.4.2 Register Types

There are five register types contained within and/or supported by the A2 core. Each register type is characterized by the instructions that are used to read and write the registers of that type. The following subsections provide an overview of each of the register types and the instructions associated with them.

2.4.2.1 General Purpose Registers

The A2 core contains 32 integer general purpose registers (GPRs); each contains 64 bits. All instructions that operate on GPRs produce the same GPR results in 32-bit mode as in 64-bit mode.

Integer Processing on page 110 provides more information about integer operations and the use of GPRs.

2.4.2.2 Special Purpose Registers

Special Purpose Registers (SPRs) are directly accessed using the mtspr and mfspr instructions. In addition, certain SPRs might be updated as a side-effect of the execution of various instructions. For example, the Integer Exception Register (XER) (see Integer Exception Register (XER) on page 110) is an SPR that is updated with arithmetic status (such as carry and overflow) upon execution of certain forms of integer arithmetic instructions.

SPRs control the use of the debug facilities, timers, interrupts, memory management, caches, and other architected processor resources. Table 14-1 on page 530 shows the mnemonic, name, and number for each SPR, in alphabetical order. Each of the SPRs is described in more detail within the section or chapter covering the function with which it is associated.

2.4.2.3 Condition Register

The Condition Register (CR) is a 32-bit register of its own unique type and is divided into eight independent 4-bit fields (CR0–CR7). The CR can be used to record certain conditional results of various arithmetic and logical operations. Subsequently, conditional branch instructions can designate a bit of the CR as one of the branch conditions (see Branch Processing on page 99). Instructions are also provided for performing logical bit operations and for moving fields within the CR.

See Condition Register (CR) on page 107 for more information about the various instructions that can update the CR.

2.4.2.4 Machine State Register

The Machine State Register (MSR) is a register of its own unique type that controls important chip functions, such as the enabling or disabling of various interrupt types.

The MSR can be written from a GPR using the mtmsr instruction. The contents of the MSR can be read into a GPR using the mfmsr instruction. The MSR[EE] bit can be set or cleared atomically using the wrtee or wrteei instructions. The MSR contents are also automatically saved, altered, and restored by the interrupt-handling mechanism. See Machine State Register (MSR) on page 301 for more detailed information about the MSR and the function of each of its bits.

2.5 32-Bit Mode

2.5.1 64-Bit Specific Instructions

Instructions or registers that are categorized as 64-bit are only available in 64-bit implementations of the A2 core. In a 64-bit implementation in 32-bit mode, all instructions that operate on GPRs produce the same GPR results in 32-bit mode as in 64-bit mode. Instructions that set condition bits do so based on the 32-bit result computed. Effective addresses and all SPRs operate on the low-order 32 bits only unless otherwise stated.

2.5.2 32-Bit Instruction Selection

Any software that uses any of the instructions listed in the 64-bit category is considered 64-bit software. Generally speaking, 32-bit software should avoid using any instruction or instructions that depend on any particular setting of bits 0:31 of any 64-bit application-accessible system register, including General Purpose Registers, for producing the correct 32-bit results. Context switching might or might not preserve the upper 32 bits of application-accessible 64-bit system registers, and insertion of arbitrary settings of those upper 32 bits at arbitrary times during the execution of the 32-bit application must not affect the final result.

2.6 Instruction Categories

The Power ISA defines that each facility (including registers and fields therein) and instruction is in exactly one category. Table 2-7 indicates the categories that are implemented by the A2 processor core.

Table 2-7. Category Listing

Implemented by A2 Core  Category                                   Abbreviation  Notes
Yes                     Base                                       B             Required for all implementations.
No                      Server                                     S             Required for server implementations.
Yes                     Embedded                                   E             Required for embedded implementations.
No                      Alternate Time Base                        ATB           An additional time base; see Book II.
Yes                     Cache Specification                        CS            Specify a specific cache for some instructions; see Book II.
No                      Decimal Floating-Point                     DFP           Decimal floating-point facilities.
No                      Decorated Storage                          DS            Decorated storage facilities.
No                      Embedded.Cache Debug                       E.CD          Provides direct access to cache data and directory content.
Yes                     Embedded.Cache Initialization              E.CI          Instructions that invalidate the entire cache.
No                      Embedded.Device Control                    E.DC          Embedded device control bus support.
No                      Embedded.Enhanced Debug                    E.ED          Embedded enhanced debug facility; see Book III-E.
Yes                     Embedded.External PID                      E.PD          Embedded external PID facility; see Book III-E.
Yes                     Embedded.Hypervisor                        E.HV          Embedded logical partitioning and hypervisor facilities.
Yes                     Embedded.Hypervisor.LRAT                   E.HV.LRAT     Embedded hypervisor logical to real address translation.
Yes                     Embedded.Little-Endian                     E.LE          Embedded little-endian page attribute; see Book III-E.
Yes                     Embedded.Page Table                        E.PT          Embedded page table facility; see Book III-E.
Yes                     Embedded.TLB Write Conditional             E.TWC         Embedded TLB write conditional facility; see Book III-E.
No                      Embedded.Performance Monitor               E.PM          Embedded performance monitor example; see Book III-E.
Yes                     Embedded.Processor Control                 E.PC          Processor control facility; see Book III-E.
Yes                     Embedded Cache Locking                     ECL           Embedded cache locking facility; see Book III-E.
Yes                     Embedded Multithreading                    EM            Embedded multithreading; see Book III-E.
Yes                     Embedded Multithreading.Thread Management  EM.TM         Embedded multithreading thread management facility.
No                      External Control                           EXC           External control facility; see Book II.
No                      External Proxy                             EXP           External proxy facility; see Book III-E.
Yes                     Floating-Point                             FP            Floating-point facilities.
Yes                     Floating-Point.Record                      FP.R          Floating-point instructions with Rc = 1.
No                      Legacy Move Assist                         LMV           Determine left most zero byte instruction.
No                      Legacy Integer Multiply-Accumulate (1)     LMA           Legacy integer multiply-accumulate instructions.
No                      Load/Store Quadword                        LSQ           Load/store quadword instructions; see Book III-S.
Yes                     Memory Coherence                           MMC           Requirement for memory coherence; see Book II.
No                      Move Assist                                MA            Move assist instructions.
No                      Processor Compatibility                    PCR           Processor compatibility register.
No                      Server.Performance Monitor                 S.PM          Performance monitor example for servers; see Book III-S.
No                      Server.Relaxed Page Table Alignment        S.RPTA        HTAB alignment on a 256 KB boundary; see Book III-S.
No                      Signal Processing Engine                   SP            Facility for signal processing.
No                      SPE.Embedded Float Scalar Double           SP.FD         GPR-based floating-point double-precision instruction set.
No                      SPE.Embedded Float Scalar Single           SP.FS         GPR-based floating-point single-precision instruction set.
No                      SPE.Embedded Float Vector                  SP.FV         GPR-based floating-point vector instruction set.
Yes                     Store Conditional Page Mobility            SCPM          Store conditional accounting for page movement; see Book II.
No                      Stream                                     STM           Stream variant of dcbt instruction; see Book II.
No                      Strong Access Order                        SAO           Assist for X86 emulation; see Book II.
No                      Trace                                      TRC           Trace facility example; see Book III-S.
No                      Variable Length Encoding                   VLE           Variable length encoding facility; see Book VLE.
determined by AXU       Vector                                     V             Vector facilities.
determined by AXU       Vector Little-Endian                       V.LE          Little-endian support for vector storage operations.
determined by AXU       Vector-Scalar Extension                    VSX           Vector-scalar extension.
Yes                     Wait                                       WT            Wait instruction; see Book II.
Yes                     64-Bit                                     64            Required for 64-bit implementations; not defined for 32-bit implementations.

2.7 Instruction Classes

The Power ISA defines all instructions as falling into exactly one of the following three classes, as determined by the primary opcode (and the extended opcode, if any):

1. Defined

2. Illegal

3. Reserved

2.7.1 Defined Instruction Class

This class of instructions consists of all the instructions defined in Power ISA. In general, defined instructions are guaranteed to be supported within a Power ISA system as specified by the architecture, either within the processor implementation itself or within emulation software supported by the system operating software.

As defined by Power ISA, any attempt to execute a defined instruction will:

• Cause an illegal instruction exception type of program interrupt, if the instruction is not recognized by the implementation; or

• Cause a floating-point unavailable interrupt if the instruction is recognized as a floating-point instruction, but floating-point processing is disabled; or

• Perform the actions described in the rest of this document, if the instruction is recognized and supported by the implementation. The architected behavior might cause other exceptions.

The A2 core recognizes and fully supports all of the instructions in the defined class and in the categories supported, with a few exceptions. First, instructions that are defined for floating-point processing are not supported within the A2 core, but can be implemented within an auxiliary processor and attached to the core using the AXU interface. If no such auxiliary processor is attached, attempting to execute any floating-point instructions causes an illegal instruction exception type of program interrupt. If an auxiliary processor that supports the floating-point instructions is attached, the behavior of these instructions is as defined above and as determined by the implementation details of the floating-point auxiliary processor.

2.7.2 Illegal Instruction Class

This class of instructions contains the set of instructions described in Power ISA Appendix D of Book Appendices. Illegal instructions are available for future extensions of the Power ISA; that is, some future version of the Power ISA might define any of these instructions to perform new functions.

Any attempt to execute an illegal instruction causes the system illegal instruction error handler to be invoked and will have no other effect.

An instruction consisting entirely of binary zeros is guaranteed always to be an illegal instruction. This increases the probability that an attempt to execute data or uninitialized storage will result in the invocation of the system illegal instruction error handler.

2.7.3 Reserved Instruction Class

This class of instructions contains the set of instructions described in Power ISA Appendix E of Book Appendices.

Reserved instructions are allocated to specific purposes that are outside the scope of the Power ISA.

Any attempt to execute a reserved instruction causes the system illegal instruction error handler to be invoked if the instruction is not implemented.

Because implementations are typically expected to treat reserved-nop instructions as true no-ops, these instruction opcodes are available for future extensions to Power ISA that have no effect on the architected state. Such extensions might include performance-enhancing hints, such as new forms of cache touch instructions. Software would be able to take advantage of the functionality offered by the new instructions and still remain backwards-compatible with implementations of previous versions of Power ISA.

The A2 core implements all of the reserved-nop instruction opcodes as true no-ops. The specific reserved-nop opcodes are the following extended opcodes under primary opcode 31: 530, 562, 594, 626, 658, 690, 722, and 754.

2.8 Implemented Instruction Set Summary

This section provides an overview of the various types and categories of instructions implemented within the A2 core. Appendix A Processor Instruction Summary on page 737 lists each implemented instruction alphabetically (and by opcode) along with a short-form description and its extended mnemonics.

Table 2-8 summarizes the A2 core instruction set by category. Instructions within each category are described in subsequent sections.

Table 2-8. Instruction Categories

Category           Subcategory                 Instruction Types
Integer            Integer Storage Access      load, store
Integer            Integer Arithmetic          add, subtract, multiply, divide, negate
Integer            Integer Logical             and, andc, or, orc, xor, nand, nor, xnor, extend sign, count leading zeros
Integer            Integer Compare             compare, compare logical
Integer            Integer Select              select operand
Integer            Integer Trap                trap
Integer            Integer Rotate              rotate and insert, rotate and mask
Integer            Integer Shift               shift left, shift right, shift right algebraic
Branch                                         branch, branch conditional, branch to link, branch to count
Processor Control  Condition Register Logical  crand, crandc, cror, crorc, crnand, crnor, crxor, crxnor
Processor Control  Register Management         move to/from SPR, move to/from MSR, write to external interrupt enable bit, move to/from CR
Processor Control  System Linkage              system call, return from interrupt, return from critical interrupt, return from machine check interrupt
Processor Control  Processor Synchronization   instruction synchronize
Storage Control    Cache Management            data allocate, data invalidate, data touch, data zero, data flush, data store, instruction invalidate, instruction touch
Storage Control    TLB Management              read, write, search, synchronize
Storage Control    Storage Synchronization     memory synchronize, memory barrier

Note: The A2 core does not implement any device control registers (DCRs). Move to and move from DCR instructions are dropped silently. They are no-ops and do not cause an exception.

2.8.1 Integer Instructions

Integer instructions transfer data between memory and the GPRs and perform various operations on the GPRs. This category of instructions is further divided into the subcategories described in the following sections.

2.8.1.1 Integer Storage Access Instructions

Integer storage access instructions load and store data between memory and the GPRs. These instructions operate on bytes, halfwords, words, and doublewords. Integer storage access instructions also support loading and storing multiple registers, character strings, and byte-reversed data, and loading data with sign-extension.

Table 2-9 shows the integer storage access instructions in the A2 core. In the table, the syntax “[u]” indicates that the instruction has both an “update” form (in which the RA addressing register is updated with the calculated address) and a “nonupdate” form. Similarly, the syntax “[x]” indicates that the instruction has both an “indexed” form (in which the address is formed by adding the contents of the RA and RB GPRs) and a “base + displacement” form (in which the address is formed by adding a 16-bit signed immediate value, specified as part of the instruction, to the contents of GPR RA).
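
The two address forms and the update option can be sketched as follows. This is a minimal Python illustration, not a description of the hardware: the function and parameter names are assumptions, and the sketch does not model the case in which the RA field is 0 or the truncation of effective addresses in 32-bit mode.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def effective_address(gpr, ra, rb=None, d=0, update=False):
    """Compute the address for an indexed (rb given) or base + displacement access."""
    if rb is not None:
        # Indexed form: contents of RA plus contents of RB.
        ea = (gpr[ra] + gpr[rb]) & MASK64
    else:
        # Base + displacement form: contents of RA plus a 16-bit signed immediate.
        ea = (gpr[ra] + sign_extend(d, 16)) & MASK64
    if update:
        # Update form: RA receives the calculated address.
        gpr[ra] = ea
    return ea
```

With GPR 3 holding 0x1000 and GPR 4 holding 0x20, the indexed form gives 0x1020, and the displacement form with d = 0xFFF8 (that is, -8) gives 0xFF8.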

Table 2-9. Integer Storage Access Instructions

Operand          Loads                        Stores
Byte             lbz[u][x]                    stb[u][x]
Halfword         lha[u][x], lhbrx, lhz[u][x]  sth[u][x], sthbrx
Word             lwbrx, lwz[u][x], lwa[u][x]  stw[u][x], stwbrx
Double           ld[u][x], ldbrx              std[u][x], stdbrx
Multiple/String  lmw, lswi, lswx              stmw, stswi, stswx

Table 2-10. Integer Storage Access Instructions by External Process ID

Loads Stores

Byte Halfword Word Double Byte Halfword Word Double

lbepx lhepx lwepx ldepx stbepx sthepx stwepx stdepx

Table 2-11 shows how operands are handled depending on alignment. Optimal performance and configuration is achieved when operands are aligned.
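
The table distinguishes operands by their alignment and by the boundaries they cross. A small Python sketch of that classification follows. It is an illustration under stated assumptions: the function names are hypothetical, the 4 KB page size is only an example value, and the sketch does not predict which handling (pipeline, uCode, or alignment exception) the core applies.

```python
def is_aligned(addr, size):
    """True if addr is a multiple of the operand size."""
    return addr % size == 0


def crosses(addr, size, boundary):
    """True if the bytes addr .. addr+size-1 span a multiple of `boundary`."""
    return addr // boundary != (addr + size - 1) // boundary


def classify(addr, size, page_size=4096):
    """Summarize alignment and boundary crossing; page_size is an assumed example."""
    return {
        "aligned": is_aligned(addr, size),
        "crosses_16B": crosses(addr, size, 16),
        "crosses_32B": crosses(addr, size, 32),
        "crosses_page": crosses(addr, size, page_size),
    }
```

For example, a 4-byte operand at 0x100E is unaligned and crosses a 16-byte block boundary but not a 32-byte block boundary.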

Table 2-11. Operand Handling Dependent on Alignment (Sheet 1 of 2)

Operand Big Endian - Boundary Crossing Little Endian - Boundary Crossing

Size Byte Align None 32B Block 16B Block (2) Virtual Page None 32B Block 16B Block (2) Virtual Page

8 Byte 8 Pipeline N/A N/A N/A Pipeline N/A N/A N/A

<8 Pipeline uCode uCode uCode Pipeline uCode uCode uCode

4 Byte 4 Pipeline N/A N/A N/A Pipeline N/A N/A N/A

<4 Pipeline uCode uCode uCode Pipeline uCode uCode uCode

2 Byte 2 Pipeline N/A N/A N/A Pipeline N/A N/A N/A

<2 Pipeline uCode uCode uCode Pipeline uCode uCode uCode

1 Byte 1 Pipeline N/A N/A N/A Pipeline N/A N/A N/A

lmw, stmw 4 uCode uCode uCode uCode uCode uCode uCode uCode

<4 Alignment Exception

string uCode uCode uCode uCode uCode uCode uCode uCode

8 Byte 8 Pipeline N/A N/A N/A Pipeline N/A N/A N/A

<8 Pipeline uCode uCode uCode Pipeline uCode uCode uCode

4 Byte 4 Pipeline N/A N/A N/A Pipeline N/A N/A N/A

<4 Pipeline uCode uCode uCode Pipeline uCode uCode uCode

32 Byte 32 Pipeline N/A N/A N/A Pipeline N/A N/A N/A

<32 uCode uCode uCode uCode Pipeline uCode uCode uCode

Notes:

1. If the storage operand spans two virtual pages that have different storage control attributes, an alignment exception occurs.

2. Only valid if the request is a cache-inhibited load or a store request with the L2 interface in 16-byte mode.

Alignment Exception

Alignment

Exception

Any General Purpose AXU

Virtual Page None 32B Block 16B Block2Virtual Page

Integer

Alignment Exception

Float

Alignment Exception

Alignment

Exception

Alignment Exception

Table 2-11. Operand Handling Dependent on Alignment (Sheet 2 of 2)

Operand Big Endian - Boundary Crossing Little Endian - Boundary Crossing

Size Byte Align None 32B Block 16B Block (2) Virtual Page None 32B Block 16B Block (2) Virtual Page

16 Byte 16 Pipeline N/A N/A N/A Pipeline N/A N/A N/A

<16 Pipeline uCode uCode uCode Pipeline uCode uCode uCode

8 Byte 8 Pipeline N/A N/A N/A Pipeline N/A N/A N/A

<8 Pipeline uCode uCode uCode Pipeline uCode uCode uCode

4 Byte 4 Pipeline N/A N/A N/A Pipeline N/A N/A N/A

<4 Pipeline uCode uCode uCode Pipeline uCode uCode uCode

2 Byte 2 Pipeline N/A N/A N/A Pipeline N/A N/A N/A

<2 Pipeline uCode uCode uCode Pipeline uCode uCode uCode

1 Byte 1 Pipeline N/A N/A N/A Pipeline N/A N/A N/A

Notes:

1. If the storage operand spans two virtual pages that have different storage control attributes, an alignment exception occurs.

2. Only valid if the request is a cache-inhibited load or a store request with the L2 interface in 16-byte mode.

Virtual Page None 32B Block 16B Block2Virtual Page

2.8.1.2 Integer Arithmetic Instructions

Arithmetic operations are performed on integer or ordinal operands stored in registers. Instructions that perform operations on two operands are defined in a 3-operand format; an operation is performed on the operands, which are stored in two registers. The result is placed in a third register. Instructions that perform operations on one operand are defined in a 2-operand format; the operation is performed on the operand in a register, and the result is placed in another register. Several instructions also have immediate formats in which one of the source operands is a field in the instruction.

Most integer arithmetic instructions have versions that can update CR[CR0] and/or XER[SO, OV] (Summary Overflow, Overflow), based on the result of the instruction. Some integer arithmetic instructions also update XER[CA] (Carry) implicitly. See Integer Processing on page 110 for more information about how these instructions update the CR and/or the XER.

Table 2-12 lists the integer arithmetic instructions in the A2 core. In the table, the syntax “[o]” indicates that the instruction has both an “o” form (which updates the XER[SO,OV] fields) and a “non-o” form. Similarly, the syntax “[.]” indicates that the instruction has both a “record” form (which updates CR[CR0]) and a “nonrecord” form.
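
The bracket notation used in these tables can be expanded mechanically into the individual mnemonics. The following Python sketch shows one way to do so; the function name expand is hypothetical, and the sketch assumes that every bracketed suffix is a single optional character appended in the order shown.

```python
import re
from itertools import product


def expand(pattern):
    """Expand a table entry such as 'add[o][.]' into its mnemonic forms."""
    base = pattern.split("[", 1)[0]
    suffixes = re.findall(r"\[(.)\]", pattern)
    forms = []
    # Each optional suffix is either omitted or appended.
    for choice in product([False, True], repeat=len(suffixes)):
        forms.append(base + "".join(s for s, on in zip(suffixes, choice) if on))
    return forms
```

For example, expand("add[o][.]") produces add, add., addo, and addo., and an entry without brackets, such as addi, is returned unchanged.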

Table 2-12. Integer Arithmetic Instructions

Operation  Instructions
Add        add[o][.], addc[o][.], adde[o][.], addi, addic[.], addis, addme[o][.], addze[o][.]
Subtract   subf[o][.], subfc[o][.], subfe[o][.], subfic, subfme[o][.], subfze[o][.]
Multiply   mulhw[.], mulhwu[.], mulli, mullw[o][.], mulhd[.], mulhdu[.], mulld[o][.]
Divide     divw[o][.], divwu[o][.], divwe[o][.], divweu[o][.], divd[o][.], divdu[o][.], divde[o][.], divdeu[o][.]
Negate     neg[o][.]

2.8.1.3 Integer Logical Instructions

Table 2-13 lists the integer logical instructions in the A2 core. See Integer Arithmetic Instructions on page 91 for an explanation of the “[.]” syntax.

Table 2-13. Integer Logical Instructions

Operation            Instructions
And                  and[.], andi., andis.
And with Complement  andc[.]
Nand                 nand[.]
Or                   or[.], ori, oris
Or with Complement   orc[.]
Nor                  nor[.]
Xor                  xor[.], xori, xoris
Equivalence          eqv[.]
Extend Sign          extsb[.], extsh[.], extsw[.]
Count Leading Zeros  cntlzw[.], cntlzd[.]
Permute              bpermd
Parity               prtyw, prtyd

2.8.1.4 Integer Compare Instructions

These instructions perform arithmetic or logical comparisons between two operands and update the CR with the result of the comparison.

Table 2-14 lists the integer compare instructions in the A2 core.

Table 2-14. Integer Compare Instructions

Arithmetic  cmp, cmpi, cmpb
Logical     cmpl, cmpli

2.8.1.5 Integer Trap Instructions

Table 2-15 lists the integer trap instructions in the A2 core.

Table 2-15. Integer Trap Instructions

Trap

tw, twi, td, tdi

2.8.1.6 Integer Rotate Instructions

These instructions rotate operands stored in the GPRs. Rotate instructions can also mask rotated operands.

Table 2-16 lists the rotate instructions in the A2 core. See Integer Arithmetic Instructions on page 91 for an explanation of the “[.]” syntax.

Table 2-16. Integer Rotate Instructions

Rotate and Insert  rlwimi[.], rldimi[.]
Rotate and Mask    rlwinm[.], rlwnm[.]
Rotate and Clear   rldcl[.], rldcr[.], rldic[.], rldicl[.], rldicr[.]
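
As an illustration of rotating and masking, the following Python sketch models the low-order 32 bits of the result of rlwinm as that instruction is defined by the Power ISA (rotate left by SH, then AND with a mask of ones from bit MB through bit ME, with bit 0 as the most significant bit of the word). The helper names are hypothetical, and the sketch does not model the high-order 32 bits of the target GPR or the record form.

```python
MASK32 = 0xFFFFFFFF


def rotl32(value, shift):
    """Rotate a 32-bit value left by shift bit positions."""
    value &= MASK32
    shift %= 32
    return ((value << shift) | (value >> (32 - shift))) & MASK32


def mask32(mb, me):
    """Ones from bit mb through bit me (bit 0 is the most significant bit).

    If mb > me, the mask wraps around from bit 31 to bit 0.
    """
    from_mb = MASK32 >> mb
    to_me = (MASK32 << (31 - me)) & MASK32
    return (from_mb & to_me) if mb <= me else (from_mb | to_me)


def rlwinm(rs, sh, mb, me):
    """Low-order 32 bits of a rotate-left-word-immediate-then-AND-with-mask."""
    return rotl32(rs, sh) & mask32(mb, me)
```

For example, rlwinm(0x12345678, 8, 24, 31) rotates the most significant byte into the low-order byte position and masks off the rest, giving 0x12.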

2.8.1.7 Integer Shift Instructions

Table 2-17 lists the integer shift instructions in the A2 core. Note that the shift right algebraic instructions implicitly update the XER[CA] field. See Integer Arithmetic Instructions on page 91 for an explanation of the “[.]” syntax.

Table 2-17. Integer Shift Instructions

Shift Left             slw[.], sld[.]
Shift Right            srw[.], srd[.]
Shift Right Algebraic  sraw[.], srawi[.], srad[.], sradi[.]

2.8.1.8 Integer Population Count Instructions

Table 2-18 lists the integer population count instructions in the A2 core.

Table 2-18. Integer Population Count Instructions

Pop Count

popcntb popcntw popcntd

2.8.1.9 Integer Select Instruction

Table 2-19 lists the integer select instruction in the A2 core. The RA operand is 0 if the RA field of the instruction is 0; it is the contents of GPR(RA) otherwise.

Table 2-19. Integer Select Instruction

Integer Select

isel
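
The treatment of the RA operand can be illustrated with a short Python sketch. The selection on a CR bit follows the Power ISA definition of isel and is not restated in this section; the function name, the list-based register model, and the CR indexing (index 0 for CR bit 32) are assumptions made for the example.

```python
def isel(gpr, cr_bits, rt, ra, rb, bc):
    """Model isel RT,RA,RB,BC on a list of GPR values and a list of 32 CR bits."""
    # The RA operand is 0 if the RA field is 0; otherwise it is GPR(RA).
    a = 0 if ra == 0 else gpr[ra]
    # If the selected CR bit is 1, the RA operand is chosen; otherwise GPR(RB).
    gpr[rt] = a if cr_bits[bc] else gpr[rb]
```

With the selected CR bit set, isel copies GPR(RA) to RT; with RA = 0 it writes zero regardless of the contents of GPR 0.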

2.8.2 Branch Instructions

These instructions unconditionally or conditionally branch to an address. Conditional branch instructions can test condition codes set in the CR by a previous instruction and branch accordingly. Conditional branch instructions can also decrement and test the Count Register (CTR) as part of branch determination and can save the return address in the Link Register (LR). The target address for a branch can be a displacement from the current instruction address or an absolute address or contained in the LR or CTR.

See Branch Processing on page 99 for more information about branch operations.

Table 2-20 lists the branch instructions in the A2 core. In the table, the syntax “[l]” indicates that the instruction has both a “link update” form (which updates LR with the address of the instruction after the branch) and a “nonlink update” form. Similarly, the syntax “[a]” indicates that the instruction has both an “absolute address” form (in which the target address is formed directly using the immediate field specified as part of the instruction) and a “relative” form (in which the target address is formed by adding the specified immediate field to the address of the branch instruction).

Table 2-20. Branch Instructions

Branch

b[l][a] bc[l][a] bcctr[l] bclr[l]

2.8.3 Processor Control Instructions

Processor control instructions manipulate system registers, perform system software linkage, and synchronize processor operations. The instructions in these three subcategories of processor control instructions are described below.

2.8.3.1 Condition Register Logical Instructions

These instructions perform logical operations on a specified pair of bits in the CR, placing the result in another specified bit. The benefit of these instructions is that they can logically combine the results of several comparison operations without incurring the overhead of conditional branching between each one. Software performance can significantly improve if multiple conditions are tested at once as part of a branch decision.

Table 2-21 lists the condition register logical instructions in the A2 core.

Table 2-21. Condition Register Logical Instructions

crand crandc creqv crnand

crnor cror crorc crxor
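
The following Python sketch illustrates how these instructions combine two CR bits into a third. It is a simplified model: the CR is represented as a list of 32 bits (index 0 for the leftmost CR bit), and the names CR_OPS and cr_logical are hypothetical.

```python
# One-bit definitions of the eight operations listed in Table 2-21.
CR_OPS = {
    "crand":  lambda a, b: a & b,
    "crandc": lambda a, b: a & (b ^ 1),
    "creqv":  lambda a, b: (a ^ b) ^ 1,
    "crnand": lambda a, b: (a & b) ^ 1,
    "crnor":  lambda a, b: (a | b) ^ 1,
    "cror":   lambda a, b: a | b,
    "crorc":  lambda a, b: a | (b ^ 1),
    "crxor":  lambda a, b: a ^ b,
}


def cr_logical(cr, op, bt, ba, bb):
    """Apply `op` to CR bits ba and bb and place the result in CR bit bt."""
    cr[bt] = CR_OPS[op](cr[ba], cr[bb])
```

For example, if two earlier compares set the EQ bits of CR0 and CR1 (indexes 2 and 6 in this model), a single crand can merge them into one bit that one conditional branch then tests.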

2.8.3.2 Register Management Instructions

These instructions move data between the GPRs and control registers in the A2 core.

Table 2-22 lists the register management instructions in the A2 core.

Table 2-22. Register Management Instructions

Register  Instructions
CR        mcrf, mcrxr, mfcr, mfocrf, mtcrf, mtocrf
DCR (1)   mfdcr, mfdcrx, mfdcrux, mtdcr, mtdcrx, mtdcrux
MSR       mfmsr, mtmsr, wrtee, wrteei
SPR       mfspr, mtspr
TB        mttb

1. When CCR2[EN_DCR] is zero, DCR instructions are dropped silently. They are no-ops and do not cause an exception.

2.8.3.3 System Linkage Instructions

These instructions invoke supervisor-level software for system services and return from interrupts.

When executing in the guest state (MSR[GS,PR] = 0b10), execution of an rfi instruction is mapped to rfgi and the rfgi instruction is executed in place of the rfi.

Table 2-23 lists the system linkage instructions in the A2 core.

Table 2-23. System Linkage Instructions

ehpriv rfi rfci rfgi rfmci sc

2.8.3.4 Processor Control Instructions

The msgsnd and msgclr instructions are provided for sending and clearing messages to processors and other devices in the coherence domain. These instructions are hypervisor privileged.

Table 2-24 shows the processor control instructions in the A2 core.

Table 2-24. Processor Control Instruction

msgsnd msgclr

2.8.4 Storage Control Instructions

These instructions manage the instruction and data caches and the TLB of the A2 core. Instructions are also provided to synchronize and order storage accesses. The instructions in these three subcategories of storage control instructions are described in the following sections.

2.8.4.1 Cache Management Instructions

These instructions control the operation of the data and instruction caches. Instructions are provided to fill, flush, invalidate, or zero data cache blocks, where a block is defined as a 64-byte cache line. Instructions are also provided to fill or invalidate instruction cache blocks.

Table 2-25 lists the cache management instructions in the A2 core.

Table 2-25. Cache Management Instructions

Data Cache         dcba, dcbf, dcbi, dcbst, dcbt, dcbtst, dcbz, dcbtls, dcbtstls, dcblc
Instruction Cache  icbi, icbt, icbtls, icblc

Table 2-26. Cache Management Instructions by External Process ID

Data Cache         dcbstep, dcbtep, dcbfep, dcbtstep, dcbzep
Instruction Cache  icbiep

2.8.4.2 TLB Management Instructions

The TLB management instructions read and write entries of the TLB array and search the TLB array for an entry that will translate a given virtual address.

Table 2-27 lists the TLB management instructions in the A2 core. See Integer Arithmetic Instructions on page 91 for an explanation of the “[.]” syntax.

Table 2-27. TLB Management Instructions

tlbre tlbsx[.] tlbsync tlbwe tlbivax

2.8.4.3 Processor Synchronization Instruction

The processor synchronization instruction, isync, forces the processor to complete all instructions preceding the isync before allowing any context changes as a result of any instructions that follow the isync. Additionally, all instructions that follow the isync will execute within the context established by the completion of all the instructions that precede the isync. See Synchronization on page 122 for more information about the synchronizing effect of isync.

Table 2-28 shows the processor synchronization instructions in the A2 core.

Table 2-28. Processor Synchronization Instruction

isync sync

2.8.4.4 Load and Reserve and Store Conditional Instructions

The load and reserve and store conditional instructions can be used to construct a sequence of instructions that appears to perform an atomic update operation on an aligned storage location.

The A2 core implements the exclusive access hint (EH) included in load and reserve instructions.

Table 2-29. Load and Reserve and Store Conditional Instructions

Loads Stores

Word Double Word Double

lwarx ldarx stwcx. stdcx.
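
The retry pattern that these instructions support can be sketched with a highly simplified Python model. The class and function names are hypothetical, only a single reserving thread is modeled, and details such as the reservation granule and the full set of events that clear a reservation are omitted.

```python
class ReservationMemory:
    """Toy word-addressed memory with one reservation (illustrative only)."""

    def __init__(self):
        self.words = {}
        self.reservation = None

    def lwarx(self, addr):
        # Load the word and establish a reservation on its address.
        self.reservation = addr
        return self.words.get(addr, 0)

    def stwcx(self, addr, value):
        # Store only if the reservation is still held; report success.
        ok = self.reservation == addr
        if ok:
            self.words[addr] = value & 0xFFFFFFFF
        self.reservation = None
        return ok

    def store_from_other_thread(self, addr, value):
        # A store by another agent to the reserved location clears the reservation.
        self.words[addr] = value & 0xFFFFFFFF
        if self.reservation == addr:
            self.reservation = None


def atomic_add(mem, addr, delta, interfere=None):
    """Retry load-and-reserve / store-conditional until the update succeeds."""
    while True:
        old = mem.lwarx(addr)
        if interfere is not None:
            interfere()          # simulate one conflicting store, once
            interfere = None
        if mem.stwcx(addr, old + delta):
            return old
```

If another store intervenes between the load and the store conditional, the store conditional fails and the sequence is repeated with the newly loaded value, so the update appears atomic.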

2.8.4.5 Storage Synchronization Instructions

The storage synchronization instructions allow software to enforce ordering amongst the storage accesses caused by load and store instructions, which by default are weakly-ordered by the processor. “Weakly-ordered” means that the processor is architecturally permitted to perform loads and stores generally out-of-order with respect to their sequence within the instruction stream, with some exceptions. However, if a storage synchronization instruction is executed, then all storage accesses prompted by instructions preceding the synchronizing instruction must be performed before any storage accesses prompted by instructions that come after the synchronizing instruction. See Synchronization on page 122 for more information about storage synchronization.

msync is an extended mnemonic for the synchronize instruction so that it can be coded with the L value as part of the mnemonic rather than as a numeric operand.

Table 2-30 shows the storage synchronization instructions in the A2 core.

Table 2-30. Storage Synchronization Instructions

msync mbar

2.8.4.6 Wait Instruction

The wait instruction allows instruction fetching and execution to be suspended under certain conditions, depending on the value of the WC field. WC = 11 is treated as a no-op instruction. WC = 10 specifies a wake condition determined by an A2 input signal called an_ac_sleep_en.

Table 2-31 shows the wait instructions in the A2 core.

Table 2-31. Wait Instruction

wait

2.8.5 Initiate Coprocessor Instructions

Initiation of a coprocessor is requested by issuing the Initiate Coprocessor Store Word Indexed (icswx) instruction. A coprocessor is not a standard processor, but instead is a specialized processor that is capable of one or more particular tasks with the intent to provide acceleration of each task that might have otherwise been done by the program. See Section 12.5 Coprocessor Instructions on page 513.

Table 2-32 shows the icswx instructions in the A2 core.

Table 2-32. Initiate Coprocessor Instructions

icswx[.] icswepx[.]

2.8.5.1 Cache Initialization Instructions

The dci and ici instructions are privileged instructions, and if executed in supervisor mode they will flash invalidate the entire associated cache. They do not generate an address, nor are they affected by the access control mechanism.

Table 2-33 shows the cache initialization instructions in the A2 core.

Table 2-33. Cache Initialization Instructions

dci ici

The dci and ici instructions have a CT field. The following describes the effects of the CT field.

• CT = 0 indicates L1 only. The L1 cache is invalidated and the request is not sent to the L2.

• CT = 2 indicates L1 and L2. The L1 cache is invalidated and the request is sent to the L2.

• Any other CT value indicates a no-op. No L1 caches are invalidated and the request is not sent to the L2.
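
The CT field behavior listed above can be summarized as a small decision function. This Python sketch is only a restatement of the list; the function name and the dictionary keys are hypothetical.

```python
def cache_invalidate_scope(ct):
    """Return what a dci or ici with the given CT value does, per the list above."""
    if ct == 0:
        # L1 only.
        return {"invalidate_l1": True, "send_to_l2": False}
    if ct == 2:
        # L1 and L2.
        return {"invalidate_l1": True, "send_to_l2": True}
    # Any other CT value is a no-op.
    return {"invalidate_l1": False, "send_to_l2": False}
```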

2.9 Branch Processing

The four branch instructions provided by the A2 core are summarized in Section 2.8.2 on page 94. The following sections provide additional information about branch addressing, instruction fields, prediction, and registers.

2.9.1 Branch Addressing

The branch instruction (b[l][a]) specifies the displacement of the branch target address as a 26-bit value (the 24-bit LI field right-extended with 0b00). This displacement is regarded as a signed 26-bit number covering an address range of ±32 MB. Similarly, the branch conditional instruction (bc[l][a]) specifies the displacement as a 16-bit value (the 14-bit BD field right-extended with 0b00). This signed displacement covers an address range of ±32 KB.

For the relative form of the branch and branch conditional instructions (b[l] and bc[l], with instruction field AA = 0), the target address is the address of the branch instruction itself (the current instruction address, or CIA) plus the signed displacement. This address calculation is defined to “wrap around” from the maximum effective address (0xFFFF_FFFF_FFFF_FFFF) to 0x0000_0000_0000_0000 and vice-versa.

For the absolute form of the branch and branch conditional instructions (b[l]a and bc[l]a, with instruction field AA = 1), the target address is the sign-extended displacement. This means that with absolute forms of the branch and branch conditional instructions, the branch target can be within the first or last 32 MB or 32 KB of the address space, respectively.
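
The relative and absolute target calculations described above can be sketched as follows. This is a minimal Python model that assumes 64-bit mode only; the names sign_extend and branch_target, and the way the raw LI or BD field is passed in, are assumptions made for illustration.

```python
MASK64 = (1 << 64) - 1  # effective addresses wrap modulo 2**64


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def branch_target(cia, field, field_bits, aa):
    """Compute a branch target address.

    cia        -- address of the branch instruction itself
    field      -- raw LI (24-bit) or BD (14-bit) instruction field
    field_bits -- 24 for b[l][a], 14 for bc[l][a]
    aa         -- AA instruction field: 0 = relative, 1 = absolute
    """
    # Right-extend the field with 0b00, then treat it as a signed number.
    displacement = sign_extend(field << 2, field_bits + 2)
    base = 0 if aa else cia
    # Masking models the wrap-around between the top and bottom of memory.
    return (base + displacement) & MASK64
```

With these definitions, a relative branch at address 0 with LI = 0xFFFFFF (a displacement of -4) wraps to 0xFFFF_FFFF_FFFF_FFFC, and an absolute branch with LI = 0x800000 lands at 0xFFFF_FFFF_FE00_0000, the start of the last 32 MB of the address space.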

The other two branch instructions, bclr (branch conditional to LR) and bcctr (branch conditional to CTR), do not use absolute or relative addressing. Instead, they use indirect addressing, in which the target of the branch is specified indirectly as the contents of the LR or CTR.

2.9.2 Branch Instruction BI Field

Conditional branch instructions can optionally test one bit of the CR, as indicated by instruction field BO[0] (see Section 2.9.3). The value of instruction field BI specifies the CR bit to be tested (32-63). The BI field is ignored if BO[0] = 1. The branch (b[l][a]) instruction is by definition unconditional; hence, it does not have a BI instruction field. Instead, the position of this field is part of the LI displacement field.
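
As an illustration of how BI selects a CR bit, the following Python sketch assumes that the 5-bit BI field holds a value from 0 to 31 that selects CR bit BI + 32, and that the CR is held as a 32-bit integer whose leftmost bit is CR bit 32. Both the function name cr_bit and this representation are assumptions made for the example.

```python
def cr_bit(cr, bi):
    """Return the CR bit selected by the BI field.

    cr -- 32-bit CR value; its most significant bit is CR bit 32
    bi -- BI field value (0..31), assumed to select CR bit bi + 32
    """
    if not 0 <= bi <= 31:
        raise ValueError("BI is a 5-bit field")
    # Bit 32 is the leftmost bit, so shift right by (31 - bi).
    return (cr >> (31 - bi)) & 1
```

For instance, with cr = 0x2000_0000 only CR bit 34 is set, so cr_bit(cr, 2) is 1 while cr_bit(cr, 0) is 0.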

2.9.3 Branch Instruction BO Field

The BO field specifies the condition under which a conditional branch is taken and whether the branch decrements the CTR as shown in Table 2-34. In the table, M = 0 in 64-bit mode and M = 32 in 32-bit mode. The branch (b[l][a]) instruction is by definition unconditional; hence, it does not have a BO instruction field. Instead, the position of this field is part of the LI displacement field.

Conditional branch instructions can optionally test one bit in the CR. This option is selected when BO[0] = 0. If BO[0] = 1, the CR does not participate in the branch condition test. If the CR condition option is selected, the condition is satisfied (branch can occur) if the CR bit selected by the BI instruction field matches BO[1].

Conditional branch instructions can also optionally decrement the CTR by one and test whether the decremented value is 0. This option is selected when BO[2] = 0. If BO[2] = 1, the CTR is not decremented and does not participate in the branch condition test. If the CTR decrement option is selected, BO[3] specifies the condition that must be satisfied to allow the branch to be taken. If BO[3] = 0, CTR ≠ 0 is required for the branch to occur. If BO[3] = 1, CTR = 0 is required for the branch to occur.
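
The CR and CTR tests described above combine as in the following Python sketch. It is a behavioral model only; the names bo_bit and evaluate_bo are assumptions, and BO[0] is taken to be the leftmost of the five BO bits, matching the way the encodings are written in this section.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def bo_bit(bo, n):
    """Return BO[n], where n = 0 is the leftmost of the five BO bits."""
    return (bo >> (4 - n)) & 1


def evaluate_bo(bo, cr_bit_value, ctr, mode64=True):
    """Return (taken, new_ctr) for a conditional branch.

    bo           -- 5-bit BO field
    cr_bit_value -- value (0 or 1) of the CR bit selected by BI
    ctr          -- CTR contents before the branch executes
    mode64       -- True tests CTR[0:63] (M = 0), False tests CTR[32:63] (M = 32)
    """
    if bo_bit(bo, 2) == 0:
        # CTR option selected: decrement first, then test CTR[M:63].
        ctr = (ctr - 1) & MASK64
        tested = ctr & (MASK64 if mode64 else MASK32)
        ctr_ok = (tested == 0) == (bo_bit(bo, 3) == 1)
    else:
        ctr_ok = True  # CTR is neither decremented nor tested
    # CR option selected when BO[0] = 0: the CR bit must match BO[1].
    cond_ok = bo_bit(bo, 0) == 1 or cr_bit_value == bo_bit(bo, 1)
    return (ctr_ok and cond_ok, ctr)
```

For example, evaluate_bo(0b10000, 0, 5) returns (True, 4): the CTR is decremented to 4, which is nonzero, and the CR is ignored, so the branch is taken.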

Table 2-34. BO Field Encodings

BO Description

0000z Decrement the CTR, then branch if the decremented CTR[M:63] ≠ 0 and CR[BI] = 0.

0001z Decrement the CTR, then branch if the decremented CTR[M:63] = 0 and CR[BI] = 0.

001at Branch if CR[BI] = 0.

0100z Decrement the CTR, then branch if the decremented CTR[M:63] ≠ 0 and CR[BI] = 1.

0101z Decrement the CTR, then branch if the decremented CTR[M:63] = 0 and CR[BI] = 1.

011at Branch if CR[BI] = 1.

1a00t Decrement the CTR, then branch if the decremented CTR[M:63] ≠ 0.

1a01t Decrement the CTR, then branch if the decremented CTR[M:63] = 0.

1z1zz Branch always.

Notes:

1. ‘z’ denotes a bit that is ignored.

2. The ‘a’ and ‘t’ bits are used as described in Table 2-35 on page 100.

The “a” and “t” bits of the BO field can be used by software to provide a hint about whether the branch is likely to be taken or is likely not to be taken, as shown in Table 2-35.

Table 2-35. ‘at’ Bit Encodings

at Hint

00 No hint is given.

01 Reserved.

10 The branch is very likely not to be taken.

11 The branch is very likely to be taken.

This implementation has dynamic mechanisms for predicting whether a branch will be taken. Because the dynamic prediction is likely to be very accurate and is likely to be overridden by any hint provided by the “at” bits, the “at” bits should be set to 0b00 unless the static prediction implied by at = 0b10 or at = 0b11 is highly likely to be correct.
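
The positions of the "a" and "t" bits depend on the BO encoding, which the following Python sketch makes explicit. It is an illustrative decoder based only on Table 2-34 and Table 2-35; the function name branch_hint and the returned strings are assumptions made for this example.

```python
def branch_hint(bo):
    """Decode the software hint carried by a 5-bit BO field.

    Returns "none", "reserved", "not taken", or "taken".
    """
    # bits[n] is BO[n], with BO[0] the leftmost of the five bits.
    bits = [(bo >> (4 - n)) & 1 for n in range(5)]
    if bits[0] == 0 and bits[2] == 1:
        a, t = bits[3], bits[4]      # encodings 001at and 011at
    elif bits[0] == 1 and bits[2] == 0:
        a, t = bits[1], bits[4]      # encodings 1a00t and 1a01t
    else:
        return "none"                # remaining encodings have only 'z' bits
    hints = {
        (0, 0): "none",
        (0, 1): "reserved",
        (1, 0): "not taken",
        (1, 1): "taken",
    }
    return hints[(a, t)]
```

For example, BO = 0b01111 (branch if CR[BI] = 1 with at = 0b11) decodes as "taken", while BO = 0b01100 carries no hint and leaves the decision to the dynamic predictor.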

2.9.4 Branch Prediction

The following sections detail the methods by which the branch predictor decodes incoming branches, generates predictions for both the direction and target of these branches, and guides instruction flow based on these predictions.

2.9.4.1 Branch Decoder

Before reaching the branch predictor itself, every instruction cache line passes through the branch decoder. The primary purpose of the branch decoder is to identify any valid branch instructions contained within the cache line. Valid branches include b, bc, bclr, bcctr, and their derivatives.

The branch decoder also decodes any hints contained within the branch instructions. Hints can be specified for any branch conditional instruction (bc, bclr, bcctr, and their derivatives). Hints are encoded in the branch instruction's BO field.
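
The identification step can be illustrated in software. The Python sketch below scans a list of 32-bit instruction words and reports which ones are branches. It relies on the standard Power ISA encodings (primary opcode 18 for b, 16 for bc, and primary opcode 19 with extended opcode 16 for bclr or 528 for bcctr), which are not stated in this section; the function names are assumptions, and the sketch does not describe how the hardware decoder is actually built.

```python
def classify_branch(insn):
    """Return the base branch mnemonic for a 32-bit instruction word, or None."""
    opcode = (insn >> 26) & 0x3F          # primary opcode, instruction bits 0:5
    if opcode == 18:
        return "b"
    if opcode == 16:
        return "bc"
    if opcode == 19:
        xo = (insn >> 1) & 0x3FF          # extended opcode, instruction bits 21:30
        if xo == 16:
            return "bclr"
        if xo == 528:
            return "bcctr"
    return None


def branches_in_line(words):
    """Return (index, mnemonic) for every branch found in a cache line.

    words -- the instruction words of one cache line, in address order
    """
    found = []
    for index, word in enumerate(words):
        kind = classify_branch(word)
        if kind is not None:
            found.append((index, kind))
    return found
```

For example, branches_in_line([0x60000000, 0x4E800020]) returns [(1, "bclr")], because the first word is a no-op (ori 0,0,0) and the second is blr, a derivative of bclr.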
