Digital Equipment Alpha 21164PC Hardware Reference Manual

Digital Semiconductor Alpha 21164PC Microprocessor
Hardware Reference Manual
Order Number: EC–R2W0A–TE
Revision/Update Information: This is a preliminary document.
Digital Equipment Corporation Maynard, Massachusetts
http://www.digital.com/semiconductor
September 1997
While DIGITAL believes the information included in this publication is correct as of the date of publication, it is subject to change without notice.
Digital Equipment Corporation makes no representations that the use of its products in the manner described in this publication will not infringe on existing or future patent rights, nor do the descriptions contained in this publication imply the granting of licenses to make, use, or sell equipment or software in accordance with the description.
©Digital Equipment Corporation 1997. All rights reserved. Printed in U.S.A.
DIGITAL, Digital Semiconductor, OpenVMS, VAX, the AlphaGeneration design mark, and the DIGITAL logo are trademarks of Digital Equipment Corporation.
Digital Semiconductor is a Digital Equipment Corporation business.
GRAFOIL is a registered trademark of Union Carbide Corporation. IEEE is a registered trademark of The Institute of Electrical and Electronics Engineers, Inc. Windows NT is a trademark of Microsoft Corporation.
All other trademarks and registered trademarks are the property of their respective owners.
29 September 1997 – Subject to Change

Contents

Preface
1 Introduction
1.1 The Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-1
1.1.1 Addressing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-2
1.1.2 Integer Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-3
1.1.3 Floating-Point Data Types. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-3
1.2 21164PC Microprocessor Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1-4
2 Internal Architecture
2.1 21164PC Microarchitecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-2
2.1.1 Instruction Fetch/Decode Unit and Branch Unit. . . . . . . . . . . . . . . . . . . . . . . 2-3
2.1.1.1 Instruction Decode and Issue. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-4
2.1.1.2 Instruction Prefetch. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-4
2.1.1.3 Branch Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-5
2.1.1.4 Instruction Translation Buffer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-7
2.1.1.5 Interrupts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-8
2.1.2 Integer Execution Unit. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9
2.1.3 Floating-Point Execution Unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-9
2.1.4 Memory Address Translation Unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-10
2.1.4.1 Data Translation Buffer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-10
2.1.4.2 Load Instruction and the Miss Address File . . . . . . . . . . . . . . . . . . . . . . 2-11
2.1.4.3 Dcache Control and Store Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . 2-11
2.1.4.4 Write Buffer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-12
2.1.5 Cache Control and Bus Interface Unit. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-12
2.1.6 Cache Organization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-12
2.1.6.1 Data Cache. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-13
2.1.6.2 Instruction Cache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-13
2.1.6.3 External Cache. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-13
2.1.7 Serial Read-Only Memory Interface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-13
2.2 Pipeline Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-13
2.2.1 Pipeline Stages and Instruction Issue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-17
2.2.2 Aborts and Exceptions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-17
2.2.3 Nonissue Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-19
2.3 Scheduling and Issuing Rules. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-19
2.3.1 Instruction Class Definition and Instruction Slotting. . . . . . . . . . . . . . . . . . . . 2-19
2.3.2 Coding Guidelines. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-22
2.3.3 Instruction Latencies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-23
2.3.3.1 Producer–Producer Latency. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-26
2.3.4 Issue Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-27
2.4 Replay Traps. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-28
2.5 Miss Address File and Load-Merging Rules. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-29
2.5.1 Merging Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-29
2.5.1.1 Cacheable Space Load-Merge Rules. . . . . . . . . . . . . . . . . . . . . . . . . . . 2-29
2.5.1.2 Noncacheable Space Load-Merge Rules. . . . . . . . . . . . . . . . . . . . . . . . 2-30
2.5.2 Read Requests to the CBU. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-30
2.5.3 MAF Entries and MAF Full Conditions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-31
2.5.4 Fill Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-31
2.6 MTU Store Instruction Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-32
2.7 Write Buffer and the WMB Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-33
2.7.1 The Write Buffer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-34
2.7.2 The Write Memory Barrier (WMB) Instruction . . . . . . . . . . . . . . . . . . . . . . . . 2-34
2.7.3 Entry-Pointer Queues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-34
2.7.4 Write Buffer Entry Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-35
2.7.5 Ordering of Noncacheable Space Write Instructions. . . . . . . . . . . . . . . . . . . 2-36
2.8 Performance Measurement Support–Performance Counters . . . . . . . . . . . . . . . 2-36
2.8.1 CBU Performance Counters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-37
2.9 Floating-Point Control Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-40
2.10 Design Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-42
3 Hardware Interface
3.1 21164PC Microprocessor Logic Symbol. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-1
3.2 21164PC Signal Names and Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-3
4 Clocks, Cache, and External Interface
4.1 Introduction to the External Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2
4.1.1 System Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-2
4.1.1.1 Commands and Addresses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-3
4.1.2 Bcache Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-4
4.1.2.1 Bcache Interface Enhancements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-4
4.1.2.2 Pipelined Bcache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-4
4.1.2.3 Write Interleaving . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-5
4.2 Clocks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-6
4.2.1 CPU Clock. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-7
4.2.2 System Clock. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-8
4.2.3 Delayed System Clock. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-9
4.3 Physical Address Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-10
4.3.1 Physical Address Regions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-10
4.3.2 Data Wrapping. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-11
4.3.3 Noncached Read Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-12
4.3.4 Noncached Write Operations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-12
4.4 Bcache Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-12
4.4.1 Bcache Victim Buffers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-13
4.5 Cache Coherency. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-13
4.5.1 Flush Cache Coherency Protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-14
4.6 21164PC-to-Bcache Transactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-16
4.6.1 Synchronous Burst-Mode Cache Support. . . . . . . . . . . . . . . . . . . . . . . . . . . 4-16
4.6.2 Bcache Timing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-18
4.6.3 Bcache Private Read Transaction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-20
4.6.4 Bcache st_clk Timing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-21
4.6.5 Bcache Private Write Transactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-22
4.6.5.1 Bcache Private Write-Probe Operation . . . . . . . . . . . . . . . . . . . . . . . . . 4-22
4.6.5.2 Bcache Private Data-Write Operation . . . . . . . . . . . . . . . . . . . . . . . . . . 4-23
4.6.5.3 Interleaving Write-Probes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-26
4.6.6 Selecting Bcache Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-27
4.7 21164PC-Initiated System Transactions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-27
4.7.1 READ MISS Clean - No Victim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-30
4.7.2 FILL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-32
4.7.3 READ MISS with Victim. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-33
4.7.4 WRITE BLOCK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-37
4.8 System-Initiated Transactions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-38
4.8.1 Sending Commands to the 21164PC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-38
4.8.2 Write Invalidate Protocol Commands. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-40
4.8.2.1 21164PC Responses to Flush-Based Protocol Commands. . . . . . . . . . 4-41
4.8.2.2 FLUSH . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-41
4.8.2.3 INVALIDATE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-43
4.8.2.4 READ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-43
4.9 Data Bus and Command/Address Bus Contention. . . . . . . . . . . . . . . . . . . . . . . . 4-45
4.9.1 Command/Address Bus. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-45
4.9.2 Read/Write Spacing—Data Bus Contention . . . . . . . . . . . . . . . . . . . . . . . . . 4-46
4.9.3 Using idle_bc_h and fill_h . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-46
4.9.4 Using data_bus_req_h. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-47
4.9.5 Tristate Overlap. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-48
4.9.5.1 Private READ or WRITE to FILL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-48
4.9.5.2 System READ to FILL (System WRITE) Spacing. . . . . . . . . . . . . . . . . . 4-49
4.9.5.3 FILL to Private READ or WRITE Operation . . . . . . . . . . . . . . . . . . . . . . 4-50
4.10 21164PC Interface Restrictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-50
4.10.1 Fill Operations After Other Transactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-50
4.10.2 Command Acknowledge for WRITE BLOCK Commands . . . . . . . . . . . . . . . 4-51
4.11 21164PC/System Race Conditions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-51
4.11.1 Rules for 21164PC and System Use of External Interface . . . . . . . . . . . . . . 4-51
4.11.2 READ MISS with Victim Aborted by FILL Example. . . . . . . . . . . . . . . . . . . . 4-53
4.11.3 idle_bc_h and cack_h Race Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-54
4.11.4 READ MISS with idle_bc_h Asserted Example. . . . . . . . . . . . . . . . . . . . . . . 4-55
4.11.5 READ MISS with Victim Aborted by System Command Example. . . . . . . . . 4-56
4.11.6 Bcache Hit Under READ MISS Example. . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-57
4.12 Data Integrity and Bcache Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-57
4.12.1 Data Parity. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-57
4.12.2 Bcache Tag Data Parity. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-58
4.12.3 Fill Error. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-58
4.13 Interrupts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-58
4.13.1 Interrupt Signals During Initialization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-59
4.13.2 Interrupt Signals During Normal Operation . . . . . . . . . . . . . . . . . . . . . . . . . . 4-59
4.13.3 Interrupt Priority Level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-59
5 Internal Processor Registers
5.1 Instruction Fetch/Decode Unit and Branch Unit (IDU) IPRs . . . . . . . . . . . . . . . . . 5-5
5.1.1 Istream Translation Buffer Tag (ITB_TAG) Register (101) . . . . . . . . . . . . . . 5-5
5.1.2 Instruction Translation Buffer Page Table Entry (ITB_PTE) Register (102) . 5-5
5.1.3 Instruction Translation Buffer Address Space Number (ITB_ASN) Register (103) . . . . . . . . . . 5-7
5.1.4 Instruction Translation Buffer Page Table Entry Temporary (ITB_PTE_TEMP) Register (104) . . . . . . . . . . 5-7
5.1.5 Instruction Translation Buffer Invalidate All Process (ITB_IAP) Register (106) . . . . . . . . . . 5-7
5.1.6 Instruction Translation Buffer Invalidate All (ITB_IA) Register (105). . . . . . . 5-8
5.1.7 Instruction Translation Buffer IS (ITB_IS) Register (107) . . . . . . . . . . . . . . . 5-8
5.1.8 Formatted Faulting Virtual Address (IFAULT_VA_FORM) Register (112) . . 5-9
5.1.9 Virtual Page Table Base (IVPTBR) Register (113) . . . . . . . . . . . . . . . . . . . . 5-10
5.1.10 Icache Parity Error Status (ICPERR_STAT) Register (11A) . . . . . . . . . . . . . 5-11
5.1.11 Icache Flush Control (IC_FLUSH_CTL) Register (119). . . . . . . . . . . . . . . . . 5-11
5.1.12 Exception Address (EXC_ADDR) Register (10B) . . . . . . . . . . . . . . . . . . . . . 5-12
5.1.13 Exception Summary (EXC_SUM) Register (10C) . . . . . . . . . . . . . . . . . . . . . 5-12
5.1.14 Exception Mask (EXC_MASK) Register (10D) . . . . . . . . . . . . . . . . . . . . . . . 5-14
5.1.15 PAL Base Address (PAL_BASE) Register (10E). . . . . . . . . . . . . . . . . . . . . . 5-15
5.1.16 IDU Current Mode (ICM) Register (10F) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-15
5.1.17 IDU Control and Status (ICSR) Register (118) . . . . . . . . . . . . . . . . . . . . . . . 5-16
5.1.18 Interrupt Priority Level (IPLR) Register (110). . . . . . . . . . . . . . . . . . . . . . . . . 5-18
5.1.19 Interrupt ID (INTID) Register (111) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-19
5.1.20 Asynchronous System Trap Request (ASTRR) Register (109). . . . . . . . . . . 5-20
5.1.21 Asynchronous System Trap Enable (ASTER) Register (10A). . . . . . . . . . . . 5-20
5.1.22 Software Interrupt Request (SIRR) Register (108) . . . . . . . . . . . . . . . . . . . . 5-21
5.1.23 Hardware Interrupt Clear (HWINT_CLR) Register (115). . . . . . . . . . . . . . . . 5-22
5.1.24 Interrupt Summary (ISR) Register (100) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-23
5.1.25 Serial Line Transmit (SL_XMIT) Register (116). . . . . . . . . . . . . . . . . . . . . . . 5-25
5.1.26 Serial Line Receive (SL_RCV) Register (117). . . . . . . . . . . . . . . . . . . . . . . . 5-26
5.1.27 Performance Counter (PMCTR) Register (11C) . . . . . . . . . . . . . . . . . . . . . . 5-27
5.2 Memory Address Translation Unit (MTU) IPRs. . . . . . . . . . . . . . . . . . . . . . . . . . . 5-31
5.2.1 Dstream Translation Buffer Address Space Number (DTB_ASN) Register (200) . . . . . . . . . . 5-31
5.2.2 Dstream Translation Buffer Current Mode (DTB_CM) Register (201). . . . . . 5-31
5.2.3 Dstream Translation Buffer Tag (DTB_TAG) Register (202). . . . . . . . . . . . . 5-32
5.2.4 Dstream Translation Buffer Page Table Entry (DTB_PTE) Register (203) . . 5-32
5.2.5 Dstream Translation Buffer Page Table Entry Temporary (DTB_PTE_TEMP) Register (204) . . . . . . . . . . 5-34
5.2.6 Dstream Memory Management Fault Status (MM_STAT) Register (205). . . 5-35
5.2.7 Faulting Virtual Address (VA) Register (206). . . . . . . . . . . . . . . . . . . . . . . . . 5-36
5.2.8 Formatted Virtual Address (VA_FORM) Register (207). . . . . . . . . . . . . . . . . 5-37
5.2.9 MTU Virtual Page Table Base (MVPTBR) Register (208). . . . . . . . . . . . . . . 5-38
5.2.10 Dcache Parity Error Status (DC_PERR_STAT) Register (212). . . . . . . . . . . 5-39
5.2.11 Dstream Translation Buffer Invalidate All Process (DTB_IAP) Register (209) . . . . . . . . . . 5-40
5.2.12 Dstream Translation Buffer Invalidate All (DTB_IA) Register (20A) . . . . . . . 5-40
5.2.13 Dstream Translation Buffer Invalidate Single (DTB_IS) Register (20B) . . . . 5-41
5.2.14 MTU Control (MCSR) Register (20F). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-42
5.2.15 Dcache Mode (DC_MODE) Register (216) . . . . . . . . . . . . . . . . . . . . . . . . . . 5-44
5.2.16 Miss Address File Mode (MAF_MODE) Register (217). . . . . . . . . . . . . . . . . 5-46
5.2.17 Dcache Flush (DC_FLUSH) Register (210). . . . . . . . . . . . . . . . . . . . . . . . . . 5-49
5.2.18 Alternate Mode (ALT_MODE) Register (20C). . . . . . . . . . . . . . . . . . . . . . . . 5-49
5.2.19 Cycle Counter (CC) Register (20D). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-50
5.2.20 Cycle Counter Control (CC_CTL) Register (20E) . . . . . . . . . . . . . . . . . . . . . 5-51
5.2.21 Dcache Test Tag Control (DC_TEST_CTL) Register (213). . . . . . . . . . . . . . 5-52
5.2.22 Dcache Test Tag (DC_TEST_TAG) Register (214). . . . . . . . . . . . . . . . . . . . 5-54
5.2.23 Dcache Test Tag Temporary (DC_TEST_TAG_TEMP) Register (215) . . . . 5-56
5.3 External Interface Control (CBU) IPRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-58
5.3.1 CBU Configuration (CBOX_CONFIG) Register (FF FFF0 0008). . . . . . . . . . 5-59
5.3.2 CBU Address (CBOX_ADDR) Register (FF FFF0 0088). . . . . . . . . . . . . . . . 5-62
5.3.3 CBU Status (CBOX_STATUS) Register (FF FFF0 0108) . . . . . . . . . . . . . . . 5-63
5.3.4 CBU Configuration #2 (CBOX_CONFIG2) Register (FF FFF0 0188) . . . . . . 5-65
5.4 PALcode Storage Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-68
5.5 Restrictions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-68
5.5.1 CBU IPR PALcode Restrictions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-68
5.5.2 PALcode Restrictions—Instruction Definitions. . . . . . . . . . . . . . . . . . . . . . . . 5-69
6 Privileged Architecture Library Code
6.1 PALcode Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-1
6.2 PALmode Environment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-2
6.3 Invoking PALcode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-3
6.4 PALcode Entry Points. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-5
6.4.1 CALL_PAL Entry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-5
6.4.2 PALcode Trap Entry Points. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-6
6.5 Required PALcode Function Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-7
6.6 21164PC Implementation of the Architecturally Reserved Opcodes . . . . . . . . . . 6-7
6.6.1 HW_LD Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-8
6.6.2 HW_ST Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-10
6.6.3 HW_REI Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-11
6.6.4 HW_MFPR and HW_MTPR Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-12
7 Initialization and Configuration
7.1 Input Signals sys_reset_l and dc_ok_h and Booting . . . . . . . . . . . . . . . . . . . . . . 7-1
7.1.1 Pin State with dc_ok_h Not Asserted . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-5
7.2 sysclk Ratio and Delay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-6
7.3 Built-In Self-Test (BiSt). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-6
7.4 Serial Read-Only Memory Interface Port . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-6
7.4.1 Serial Instruction Cache Load Operation. . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-7
7.5 Serial Terminal Port . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-8
7.6 Cache Initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-8
7.6.1 Icache Initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-9
7.6.2 Flushing Dirty Blocks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-9
7.7 External Interface Initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-9
7.8 Internal Processor Register Reset State. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-10
7.9 Timeout Reset. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-12
7.10 IEEE 1149.1 Test Port Reset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-13
8 Error Detection and Error Handling
8.1 Error Flows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-1
8.1.1 Icache Data or Tag Parity Error. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-1
8.1.2 Dcache Data Parity Error. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-2
8.1.3 Dcache Tag Parity Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-2
8.1.4 Istream Data Parity Errors (Bcache or Memory) . . . . . . . . . . . . . . . . . . . . . . 8-3
8.1.5 Dstream Data Parity Errors (Bcache or Memory) . . . . . . . . . . . . . . . . . . . . . 8-3
8.1.6 Bcache Tag Parity Errors—Istream. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-4
8.1.7 Bcache Tag Parity Errors—Dstream . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-4
8.1.8 System Read Operations of the Bcache . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-5
8.1.9 Fill Timeout (FILL_ERROR_H) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-5
8.1.10 System Machine Check. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-5
8.1.11 IDU Timeout. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-5
8.2 MCHK Flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-6
8.3 MCK_INTERRUPT Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8-7
9 Electrical Data
9.1 Electrical Characteristics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-1
9.2 DC Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-2
9.2.1 Power Supply. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-2
9.2.2 Input Signal Pins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-2
9.2.3 Output Signal Pins. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-3
9.3 Clocking Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-5
9.3.1 Input Clocks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-5
9.3.2 Clock Termination and Impedance Levels. . . . . . . . . . . . . . . . . . . . . . . . . . . 9-6
9.3.3 AC Coupling. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-8
9.4 AC Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-8
9.4.1 Test Configuration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-8
9.4.2 Pin Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-10
9.4.2.1 Backup Cache Loop Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-10
9.4.2.2 sys_clk-Based Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-13
9.4.3 Timing—Additional Signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-15
9.4.4 Timing of Test Features. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-17
9.4.4.1 Icache BiSt Operation Timing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-17
9.4.4.2 Automatic SROM Load Timing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-19
9.4.5 Clock Test Modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-20
9.4.5.1 Normal (1× Clock) Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-20
9.4.5.2 Clock Test Reset Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-20
9.4.6 IEEE 1149.1 (JTAG) Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-21
9.5 Power Supply Considerations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-21
9.5.1 Decoupling. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-22
9.5.1.1 Vdd Decoupling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-22
9.5.1.2 Vddi Decoupling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-22
9.5.2 Power Supply Sequencing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-23
10 Thermal Management
10.1 Operating Temperature. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-1
10.2 Heat-Sink Specifications. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-3
10.3 Thermal Design Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-4
11 Mechanical Packaging Information
11.1 Mechanical Specifications. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-1
11.2 Signal Descriptions and Pin Assignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-3
11.2.1 Signal Pin Lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-3
11.2.2 Pin Assignment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-8
12 Testability and Diagnostics
12.1 Test Port Pins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-1
12.2 Test Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-2
12.2.1 IEEE 1149.1 Test Access Port . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-2
12.2.2 Test Status Pin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-6
12.3 Boundary-Scan Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-6
A Alpha Instruction Set
A.1 Alpha Instruction Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-1
A.1.1 Opcodes Reserved for DIGITAL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-9
A.1.2 Opcodes Reserved for PALcode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-10
A.2 IEEE Floating-Point Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-10
A.3 VAX Floating-Point Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-12
A.4 Opcode Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-13
A.5 Required PALcode Function Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-15
A.6 21164PC Microprocessor IEEE Floating-Point Conformance. . . . . . . . . . . . . . . . A-15
B 21164PC Microprocessor Specifications
C Serial Icache Load Predecode Values
D Errata Sheet
E Support, Products, and Documentation
Glossary
Index

Figures

2–1 21164PC Microprocessor Block/Pipe Flow Diagram . . . . . . . . . . . . . . . . . . . . . . 2-2
2–2 Instruction Pipeline Stages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-14
2–3 Floating-Point Control Register (FPCR) Format. . . . . . . . . . . . . . . . . . . . . . . . . . 2-40
2–4 Typical Uniprocessor Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-42
3–1 21164PC Microprocessor Logic Symbol. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-2
4–1 21164PC System/Bcache Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-3
4–2 Merits of a Multiprobes In Flight – Pipelined Cache . . . . . . . . . . . . . . . . . . . . . . . 4-5
4–3 Tag/Data Store Interleaving . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-6
4–4 Clock Signals and Functions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-8
4–5 21164PC Uniprocessor Clock. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-9
4–6 Flush-Based Protocol 21164PC States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-15
4–7 Flush-Based Protocol System/Bus States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-16
4–8 SSRAM/Bcache Interface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-17
4–9 Bcache Private Read Transaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-21
4–10 Bcache Private Write Probe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-23
4–11 Bcache Private Data – Write Hit Clean. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-24
4–12 Bcache Private Data – Write Hit Dirty. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-25
4–13 Bcache Interleaving . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-26
4–14 READ MISS Clean – Bcache Timing Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-31
4–15 READ MISS with Victim Timing Diagram, Pipelined Mode. . . . . . . . . . . . . . . . . . 4-35
4–16 READ MISS with Victim Timing Diagram, Flow-Through Mode . . . . . . . . . . . . . . 4-36
4–17 WRITE BLOCK Timing Diagram. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-38
4–18 Algorithm for System Sending Commands to the 21164PC. . . . . . . . . . . . . . . . . 4-39
4–19 FLUSH Timing Diagram (Bcache Hit) Flow-Through SSRAM . . . . . . . . . . . . . . . 4-42
4–20 INVALIDATE Timing Diagram – Bcache Hit. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-43
4–21 READ Timing Diagram (Bcache Hit) Flow-Through SSRAM . . . . . . . . . . . . . . . . 4-44
4–22 Driving the Command/Address Bus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-45
4–23 Using data_bus_req_h . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-48
4–24 System READ to FILL Spacing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-49
4–25 FILL to Private READ or WRITE Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-50
4–26 READ MISS with Victim Aborted by FILL Example. . . . . . . . . . . . . . . . . . . . . . . . 4-53
4–27 idle_bc_h and cack_h Race Examples. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-54
4–28 READ MISS with idle_bc_h Asserted Example . . . . . . . . . . . . . . . . . . . . . . . . . . 4-55
4–29 READ MISS with Victim Abort Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-56
4–30 Bcache Hit Under READ MISS Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-57
4–31 21164PC Interrupt Signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-58
5–1 Istream Translation Buffer Tag (ITB_TAG) Register. . . . . . . . . . . . . . . . . . . . . . . 5-5
5–2 Instruction Translation Buffer Page Table Entry (ITB_PTE) Register Write Format . . . . . . . 5-6
5–3 Instruction Translation Buffer Page Table Entry (ITB_PTE) Register Read Format . . . . . . . . 5-6
5–4 Instruction Translation Buffer Address Space Number (ITB_ASN) Register . . . . 5-7
5–5 Instruction Translation Buffer IS (ITB_IS) Register. . . . . . . . . . . . . . . . . . . . . . . . 5-8
5–6 Formatted Faulting Virtual Address (IFAULT_VA_FORM) Register (NT_Mode=0) . . . . . . . 5-9
5–7 Formatted Faulting Virtual Address (IFAULT_VA_FORM) Register (NT_Mode=1) . . . . . . . 5-9
5–8 Virtual Page Table Base (IVPTBR) Register (NT_Mode=0). . . . . . . . . . . . . . . . . 5-10
5–9 Virtual Page Table Base (IVPTBR) Register (NT_Mode=1). . . . . . . . . . . . . . . . . 5-10
5–10 Icache Parity Error Status (ICPERR_STAT) Register. . . . . . . . . . . . . . . . . . . . . . 5-11
5–11 Exception Address (EXC_ADDR) Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-12
5–12 Exception Summary (EXC_SUM) Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-13
5–13 Exception Mask (EXC_MASK) Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-14
5–14 PAL Base Address (PAL_BASE) Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-15
5–15 IDU Current Mode (ICM) Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-15
5–16 IDU Control and Status (ICSR) Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-16
5–17 Interrupt Priority Level (IPLR) Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-18
5–18 Interrupt ID (INTID) Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-19
5–19 Asynchronous System Trap Request (ASTRR) Register . . . . . . . . . . . . . . . . . . . 5-20
5–20 Asynchronous System Trap Enable (ASTER) Register . . . . . . . . . . . . . . . . . . . . 5-20
5–21 Software Interrupt Request (SIRR) Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-21
5–22 Hardware Interrupt Clear (HWINT_CLR) Register . . . . . . . . . . . . . . . . . . . . . . . . 5-22
5–23 Interrupt Summary (ISR) Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-23
5–24 Serial Line Transmit (SL_XMIT) Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-25
5–25 Serial Line Receive (SL_RCV) Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-26
5–26 Performance Counter (PMCTR) Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-27
5–27 Dstream Translation Buffer Address Space Number (DTB_ASN) Register . . . . . 5-31
5–28 Dstream Translation Buffer Current Mode (DTB_CM) Register . . . . . . . . . . . . . . 5-31
5–29 Dstream Translation Buffer Tag (DTB_TAG) Register . . . . . . . . . . . . . . . . . . . . . 5-32
5–30 Dstream Translation Buffer Page Table Entry (DTB_PTE) Register—Write Format . . . . . . 5-33
5–31 Dstream Translation Buffer Page Table Entry Temporary (DTB_PTE_TEMP) Register . . . 5-34
5–32 Dstream Memory Management Fault Status (MM_STAT) Register. . . . . . . . . . . 5-35
5–33 Faulting Virtual Address (VA) Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-36
5–34 Formatted Virtual Address (VA_FORM) Register (NT_Mode=1) . . . . . . . . . . . . . 5-37
5–35 Formatted Virtual Address (VA_FORM) Register (NT_Mode=0) . . . . . . . . . . . . . 5-37
5–36 MTU Virtual Page Table Base (MVPTBR) Register . . . . . . . . . . . . . . . . . . . . . . . 5-38
5–37 Dcache Parity Error Status (DC_PERR_STAT) Register. . . . . . . . . . . . . . . . . . . 5-39
5–38 Dstream Translation Buffer Invalidate Single (DTB_IS) Register. . . . . . . . . . . . . 5-41
5–39 MTU Control (MCSR) Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-42
5–40 Dcache Mode (DC_MODE) Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-44
5–41 Miss Address File Mode (MAF_MODE) Register . . . . . . . . . . . . . . . . . . . . . . . . . 5-46
5–42 Alternate Mode (ALT_MODE) Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-49
5–43 Cycle Counter (CC) Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-50
5–44 Cycle Counter Control (CC_CTL) Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-51
5–45 Dcache Test Tag Control (DC_TEST_CTL) Register. . . . . . . . . . . . . . . . . . . . . . 5-52
5–46 Dcache Test Tag (DC_TEST_TAG) Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-54
5–47 Dcache Test Tag Temporary (DC_TEST_TAG_TEMP) Register. . . . . . . . . . . . . 5-56
5–48 CBU Configuration (CBOX_CONFIG) Register . . . . . . . . . . . . . . . . . . . . . . . . . . 5-59
5–49 CBU Address (CBOX_ADDR) Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-62
5–50 CBU Status (CBOX_STATUS) Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-63
5–51 CBU Configuration #2 (CBOX_CONFIG2) Register. . . . . . . . . . . . . . . . . . . . . . . 5-65
6–1 HW_LD Instruction Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-9
6–2 HW_ST Instruction Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-10
6–3 HW_REI Instruction Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-11
6–4 HW_MFPR and HW_MTPR Instruction Format . . . . . . . . . . . . . . . . . . . . . . . . . . 6-12
9–1 osc_clk_in_h,l Input Network and Terminations . . . . . . . . . . . . . . . . . . . . . . . . . . 9-6
9–2 Impedance vs Clock Input Frequency. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-7
9–3 Input/Output Pin Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-9
9–4 Bcache Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-12
9–5 sys_clk System Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-14
9–6 BiSt Timing Event—Timeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-18
9–7 SROM Load Timing Event—Timeline. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-19
9–8 Serial ROM Load Timing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-20
10–1 Heat Sink 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-3
11–1 Package Dimensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-2
11–2 21164PC Top View (Pin Down) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-8
11–3 21164PC Bottom View (Pin Up) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-9
12–1 IEEE 1149.1 Test Access Port . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-3
12–2 TAP Controller State Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-4

Tables

2–1 Effect of Branching Instructions on the Branch-Prediction Stack . . . . . . . . . . . . 2-6
2–2 Pipeline Examples—All Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-15
2–3 Pipeline Examples—Integer Add . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-15
2–4 Pipeline Examples—Floating Add. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-15
2–5 Pipeline Examples—Load (Dcache Hit) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-16
2–6 Pipeline Examples—Load (Dcache Miss) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-16
2–7 Pipeline Examples—Store (Dcache Hit). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-17
2–8 Instruction Classes and Slotting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-19
2–9 Instruction Latencies. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2-24
2–10 Floating-Point Control Register Bit Descriptions. . . . . . . . . . . . . . . . . . . . . . . . . . 2-40
3–1 21164PC Signal Descriptions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-4
3–2 21164PC Signal Descriptions by Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3-14
4–1 CPU Clock Generation Control. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-7
4–2 System Clock Divisor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-8
4–3 System Clock Delay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-10
4–4 Physical Memory Regions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-11
4–5 Bcache States for Cache Coherency Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . 4-14
4–6 Bcache Transactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-19
4–7 Bcache Options. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-27
4–8 21164PC-Initiated Interface Commands. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-28
4–9 System-Initiated Interface Commands (Write Invalidate Protocol) . . . . . . . . . . . . 4-40
4–10 21164PC Responses to Flush-Based Protocol Commands. . . . . . . . . . . . . . . . . 4-41
4–11 Interrupt Priority Level Effect. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4-59
5–1 IDU, MTU, Dcache, and PALtemp IPR Encodings. . . . . . . . . . . . . . . . . . . . . . . . 5-1
5–2 Granularity Hint Bits in ITB_PTE_TEMP Read Format. . . . . . . . . . . . . . . . . . . . . 5-7
5–3 Icache Parity Error Status Register Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-11
5–4 Exception Summary Register Fields. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-13
5–5 IDU Control and Status Register Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-16
5–6 Software Interrupt Request Register Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-21
5–7 Hardware Interrupt Clear Register Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-22
5–8 Interrupt Summary Register Fields. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-23
5–9 Serial Line Transmit Register Fields. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-25
5–10 Serial Line Receive Register Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-26
5–11 Performance Counter Register Fields. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-28
5–12 PMCTR Counter Select Options. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-29
5–13 Measurement Mode Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-30
5–14 Dstream Memory Management Fault Status Register Fields. . . . . . . . . . . . . . . . 5-35
5–15 Formatted Virtual Address Register Fields. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-38
5–16 Dcache Parity Error Status Register Fields. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-40
5–17 MTU Control Register Fields. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-43
5–18 Dcache Mode Register Fields. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-45
5–19 Miss Address File Mode Register Fields. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-47
5–20 Alternate Mode Register Settings. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-49
5–21 Cycle Counter Control Register Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-51
5–22 Dcache Test Tag Control Register Fields. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-52
5–23 Dcache Test Tag Register Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-55
5–24 Dcache Test Tag Temporary Register Fields. . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-57
5–25 CBU Internal Processor Register Descriptions. . . . . . . . . . . . . . . . . . . . . . . . . . . 5-58
5–26 CBU Configuration Register Fields. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-59
5–27 CBU Address Register Fields. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-62
5–28 CBU Status Register Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-63
5–29 CBU Configuration #2 Register Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-65
5–30 CBU IPR PALcode Restrictions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-68
5–31 PALcode Restrictions Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5-69
6–1 PALcode Trap Entry Points. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-6
6–2 Required PALcode Function Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-7
6–3 Opcodes Reserved for PALcode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-8
6–4 HW_LD Format Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-9
6–5 HW_ST Format Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-10
6–6 HW_REI Format Description. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6-11
6–7 HW_MFPR and HW_MTPR Format Description . . . . . . . . . . . . . . . . . . . . . . . . . 6-12
7–1 21164PC Signal Pin Reset State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-3
7–2 Internal Processor Register Reset State. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7-10
9–1 21164PC Absolute Maximum Ratings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-1
9–2 Operating Voltages. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-2
9–3 CMOS DC Input/Output Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-3
9–4 Input Clock Specification. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-8
9–5 Bcache Loop Timing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-10
9–6 Normal Output Driver Characteristics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-11
9–7 Big Output Driver Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-11
9–8 21164PC System Clock Output Timing (sysclk=Tø) . . . . . . . . . . . . . . . . . . . . . . . 9-13
9–9 Input Timing for sys_clk_out-Based Systems. . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-15
9–10 Output Timing for sys_clk_out-Based Systems . . . . . . . . . . . . . . . . . . . . . . . . . . 9-16
9–11 Bcache Control Signal Timing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-17
9–12 BiSt Timing for Some System Clock Ratios, Port Mode=Normal (System Cycles) . . . . . . 9-18
9–13 BiSt Timing for Some System Clock Ratios, Port Mode=Normal (CPU Cycles) . . . . . . . . 9-18
9–14 SROM Load Timing for Some System Clock Ratios (System Cycles) . . . . . . . . . 9-19
9–15 SROM Load Timing for Some System Clock Ratios (CPU Cycles) . . . . . . . . . . . 9-19
9–16 Clock Test Modes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9-21
9–17 IEEE 1149.1 Circuit Performance Specifications . . . . . . . . . . . . . . . . . . . . . . . . . 9-21
10–1 Θca at Various Airflows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-1
10–2 Maximum Ta at Various Airflows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10-2
11–1 Alphabetic Signal Pin List . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-3
11–2 Voltage Reference, Power, and Ground Pins. . . . . . . . . . . . . . . . . . . . . . . . . . . . 11-7
12–1 21164PC Test Port Pins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-1
12–2 Compliance Enable Inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-2
12–3 Instruction Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-5
12–4 Boundary-Scan Register Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12-7
A–1 Instruction Format and Opcode Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-1
A–2 Architecture Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-2
A–3 Opcodes Reserved for DIGITAL. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-9
A–4 Opcodes Reserved for PALcode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-10
A–5 IEEE Floating-Point Instruction Function Codes. . . . . . . . . . . . . . . . . . . . . . . . . . A-10
A–6 VAX Floating-Point Instruction Function Codes . . . . . . . . . . . . . . . . . . . . . . . . . . A-12
A–7 Opcode Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-14
A–8 Required PALcode Function Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A-15
B–1 21164PC Microprocessor Specifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B-1
D–1 Document Revision History. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D-1

Preface

This manual provides information about the architecture, internal design, external interface, and specifications of the Digital Semiconductor Alpha 21164PC microprocessor (referred to as the 21164PC) and its associated software.
Audience
This reference manual is for system designers and programmers who use the 21164PC.
Manual Organization
This manual includes the following chapters and appendixes, and an index.
Chapter 1, Introduction, introduces the 21164PC and provides an overview of the Alpha architecture.
Chapter 2, Internal Architecture, describes the major hardware functions and the internal chip architecture. It describes performance measurement facilities, coding rules, and design examples.
Chapter 3, Hardware Interface, lists and describes the external hardware interface signals.
Chapter 4, Clocks, Cache, and External Interface, describes the external bus functions and transactions, lists bus commands, and describes the clock functions.
Chapter 5, Internal Processor Registers, lists and describes the 21164PC internal processor register set.
Chapter 6, Privileged Architecture Library Code, describes the privileged architecture library code (PALcode).
Chapter 7, Initialization and Configuration, describes the initialization and configuration sequence.
Chapter 8, Error Detection and Error Handling, describes error detection and error handling.
Chapter 9, Electrical Data, provides electrical data and describes signal integrity issues.
Chapter 10, Thermal Management, provides information about thermal management.
Chapter 11, Mechanical Packaging Information, provides mechanical data and packaging information, including signal pin lists.
Chapter 12, Testability and Diagnostics, describes chip and system testability features.
Appendix A, Alpha Instruction Set, summarizes the Alpha instruction set.
Appendix B, 21164PC Microprocessor Specifications, summarizes the 21164PC specifications.
Appendix C, Serial Icache Load Predecode Values, provides a C code example that calculates the predecode values of a serial Icache load.
Appendix D, Errata Sheet, lists changes and revisions to this manual.
Appendix E, Support, Products, and Documentation, provides phone numbers for support and lists related DIGITAL and third-party publications with order information.
The Glossary lists and defines terms associated with the 21164PC.
The companion volume to this manual, the Alpha AXP Architecture Reference Manual, contains the Alpha architecture information.
Conventions
This section defines product-specific terminology, abbreviations, and other conventions used throughout this manual.
Abbreviations
Binary Multiples
The abbreviations K, M, and G (kilo, mega, and giga) represent binary multiples and have the following values.
K = 2^10 (1024)
M = 2^20 (1,048,576)
G = 2^30 (1,073,741,824)
For example:
2KB = 2 kilobytes = 2 × 2^10 bytes
4MB = 4 megabytes = 4 × 2^20 bytes
8GB = 8 gigabytes = 8 × 2^30 bytes
Register Access
The abbreviations used to indicate the type of access to register fields and bits have the following definitions:
IGN — Ignore
Register bits specified as IGN are ignored when written and are UNPREDICTABLE when read if not otherwise specified.
MBZ — Must Be Zero
Software must never place a nonzero value in bits and fields specified as MBZ. Reads return unpredictable values. Such fields are reserved for future use.
RAO — Read As One
Register bits specified as RAO return a 1 when read.
RAZ — Read As Zero
Register bits specified as RAZ return a 0 when read.
RC — Read To Clear
A register field specified as RC is written by hardware and remains unchanged until read. The value may be read by software, at which point, hardware may write a new value into the field.
RES — Reserved
Bits and fields specified as RES are reserved by Digital Semiconductor and should not be used; however, zeros can be written to reserved fields that cannot be masked.
RO — Read Only
Bits and fields specified as RO can be read and are ignored (not written) on writes.
RW — Read/Write
Bits and fields specified as RW can be read and written.
W0C — Write Zero to Clear
Bits and fields specified as W0C can be read. Writing a zero clears these bits for the duration of the write; writing a one has no effect.
W1C — Write One to Clear
Bits and fields specified as W1C can be read. Writing a one clears these bits for the duration of the write; writing a zero has no effect.
WO — Write Only
Bits and fields specified as WO can be written but not read.
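The software-visible write rules above can be illustrated with a small model. This is only a sketch under stated assumptions: the enum tags, the function name model_write, and the treatment of a field as a plain 64-bit value are hypothetical names chosen for this example and are not 21164PC definitions. Hardware-side behavior (for example, hardware rewriting an RC field after a read, or the "for the duration of the write" qualification on W0C and W1C) is not modeled.

```c
#include <stdint.h>

/* Hypothetical access-type tags, named after the abbreviations above. */
typedef enum { ACC_RW, ACC_RO, ACC_WO, ACC_W1C, ACC_W0C, ACC_IGN } access_t;

/*
 * Return the value a field holds after software writes `wr` to it.
 * `cur` is the current field contents. Only the software-visible write
 * rules are modeled; this is not a description of the chip's logic.
 */
static uint64_t model_write(access_t type, uint64_t cur, uint64_t wr)
{
    switch (type) {
    case ACC_RW:
    case ACC_WO:
        return wr;          /* the written value is stored        */
    case ACC_RO:
    case ACC_IGN:
        return cur;         /* writes are ignored                 */
    case ACC_W1C:
        return cur & ~wr;   /* each 1 written clears that bit     */
    case ACC_W0C:
        return cur & wr;    /* each 0 written clears that bit     */
    }
    return cur;             /* unknown tag: leave field unchanged */
}
```

For instance, writing 0x30 to a W1C field that currently holds 0xF0 leaves 0xC0 in this model, while the same write to an RO field leaves 0xF0.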
Addresses
Unless otherwise noted, all addresses and offsets are hexadecimal.
Aligned and Unaligned
The terms aligned and naturally aligned are interchangeable and refer to data objects that are powers of two in size. An aligned datum of size 2^n is stored in memory at a byte address that is a multiple of 2^n; that is, one that has n low-order zeros. For example, an aligned 64-byte stack frame has a memory address that is a multiple of 64. A datum of size 2^n is unaligned if it is stored in a byte address that is not a multiple of 2^n.
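The "n low-order zeros" test can be written directly as a mask check. The following is a minimal sketch; the function name is_aligned and the use of a 64-bit address type are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if `addr` is a multiple of 2^n, that is, its n low-order bits are zero. */
static bool is_aligned(uint64_t addr, unsigned n)
{
    /* Mask covering the n low-order bits; guard the n >= 64 shift. */
    uint64_t low_mask = (n >= 64) ? UINT64_MAX : ((UINT64_C(1) << n) - 1);
    return (addr & low_mask) == 0;
}
```

With n = 6 (a 64-byte object), address 0x1000 passes the check and address 0x1020 does not.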
Bit Notation
Multiple-bit fields can include contiguous and noncontiguous bits contained in angle brackets (<>). Multiple contiguous bits are indicated by a pair of numbers separated by a colon (:). For example, <9:7,5,2:0> specifies bits 9,8,7,5,2,1, and 0. Similarly, single bits are frequently indicated with angle brackets. For example, <27> specifies bit 27.
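As an illustration of the notation, the field <9:7,5,2:0> can be turned into a bit mask by OR-ing together one mask per contiguous run. This is a sketch only; bit_range and example_field_mask are hypothetical helper names, not part of any DIGITAL-supplied software.

```c
#include <stdint.h>

/* Mask with bits hi..lo (inclusive) set; requires 63 >= hi >= lo. */
static uint64_t bit_range(unsigned hi, unsigned lo)
{
    unsigned width = hi - lo + 1;
    uint64_t width_mask = (width == 64) ? UINT64_MAX
                                        : ((UINT64_C(1) << width) - 1);
    return width_mask << lo;
}

/* Mask for the example field <9:7,5,2:0>: bits 9, 8, 7, 5, 2, 1, and 0. */
static uint64_t example_field_mask(void)
{
    return bit_range(9, 7) | bit_range(5, 5) | bit_range(2, 0);
}
```

Here example_field_mask() evaluates to 0x3A7, and the single-bit case <27> corresponds to bit_range(27, 27), which is 0x8000000.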
Caution
Cautions indicate potential damage to equipment or loss of data.
Data Units
The following data-unit terminology is used throughout this manual.
Term        Words   Bytes   Bits   Other
Byte        ½       1       8      —
Word        1       2       16     —
Dword       2       4       32     Longword
Quadword    4       8       64     2 Dwords
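For readers working in C, the data units map naturally onto the fixed-width integer types. The type names below (byte_t, word_t, dword_t, quadword_t) and the helper make_quadword are assumptions for this sketch; they are not names used elsewhere in this manual.

```c
#include <stdint.h>

/* Hypothetical C names for the data units in the table above. */
typedef uint8_t  byte_t;      /*  8 bits                      */
typedef uint16_t word_t;      /* 16 bits                      */
typedef uint32_t dword_t;     /* 32 bits, also called longword */
typedef uint64_t quadword_t;  /* 64 bits, equal to two Dwords  */

/* Combine two Dwords into one Quadword (high Dword in bits <63:32>). */
static quadword_t make_quadword(dword_t high, dword_t low)
{
    return ((quadword_t)high << 32) | (quadword_t)low;
}
```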
External
Unless otherwise stated, external means not contained in the 21164PC.
Numbering
All numbers are decimal or hexadecimal unless otherwise indicated. The prefix 0x indicates a hexadecimal number. For example, 19 is decimal, but 0x19 and 0x19A are hexadecimal (also see Addresses). Otherwise, the base is indicated by a subscript; for example, 100₂ is a binary number.
Ranges and Extents
Ranges are specified by a pair of numbers separated by two periods (..) and are inclusive. For example, a range of integers 0..4 includes the integers 0, 1, 2, 3, and 4.
Extents are specified by a pair of numbers in angle brackets (<>) separated by a colon (:) and are inclusive. Bit fields are often specified as extents. For example, bits <7:3> specifies bits 7, 6, 5, 4, and 3.
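Because extents are inclusive at both ends, the extent <hi:lo> is hi - lo + 1 bits wide. The sketch below reads such a field out of a 64-bit value; the function name extract_extent is an assumption made for illustration.

```c
#include <stdint.h>

/* Extract the extent <hi:lo> (inclusive) from `value`; requires 63 >= hi >= lo. */
static uint64_t extract_extent(uint64_t value, unsigned hi, unsigned lo)
{
    unsigned width = hi - lo + 1;   /* inclusive: <7:3> is 5 bits wide */
    uint64_t mask = (width == 64) ? UINT64_MAX
                                  : ((UINT64_C(1) << width) - 1);
    return (value >> lo) & mask;
}
```

For example, extract_extent(0xF8, 7, 3) yields 0x1F, since bits 7, 6, 5, 4, and 3 of 0xF8 are all set.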
Security Holes
Security holes exist when unprivileged software (that is, software that is running outside of kernel mode) can:
Affect the operation of another process without authorization from the operating system.
Amplify its privilege without authorization from the operating system.
Communicate with another process, either overtly or covertly, without authorization from the operating system.
Signal Names
Signal names are printed in lowercase, boldface type. Low-asserted signals are indicated by the _l suffix, while high-asserted signals have the _h suffix. For example, osc_clk_in_h is a high-asserted signal, and osc_clk_in_l is a low-asserted signal.
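A tool that processes pin lists could apply the suffix convention mechanically. The following sketch assumes signal names are given as plain C strings; the enum and the function name signal_polarity are hypothetical, and combined forms such as osc_clk_in_h,l are not handled.

```c
#include <string.h>

/* Hypothetical result type for the suffix check. */
typedef enum { ASSERT_HIGH, ASSERT_LOW, ASSERT_UNKNOWN } polarity_t;

/* Classify a signal name by its _h or _l suffix. */
static polarity_t signal_polarity(const char *name)
{
    size_t len = strlen(name);
    if (len >= 2) {
        const char *suffix = name + len - 2;   /* last two characters */
        if (strcmp(suffix, "_h") == 0)
            return ASSERT_HIGH;
        if (strcmp(suffix, "_l") == 0)
            return ASSERT_LOW;
    }
    return ASSERT_UNKNOWN;   /* no recognized suffix */
}
```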
Unpredictable and Undefined
Throughout this manual, the terms UNPREDICTABLE and UNDEFINED are used. Their meanings are quite different and must be carefully distinguished.
In particular, only privileged software (that is, software running in kernel mode) can trigger UNDEFINED operations. Unprivileged software cannot trigger UNDEFINED operations. However, either privileged or unprivileged software can trigger UNPREDICTABLE results or occurrences.
UNPREDICTABLE results or occurrences do not disrupt the basic operation of the processor. The processor continues to execute instructions in its normal manner. In contrast, UNDEFINED operations can halt the processor or cause it to lose information.
The terms UNPREDICTABLE and UNDEFINED can be further described as follows:
Unpredictable
Results or occurrences specified as UNPREDICTABLE may vary from moment to moment, implementation to implementation, and instruction to instruction within implementations. Software can never depend on results specified as UNPREDICTABLE.
An UNPREDICTABLE result may acquire an arbitrary value subject to a few constraints. Such a result may be an arbitrary function of the input operands or of any state information that is accessible to the process in its current access mode. UNPREDICTABLE results may be unchanged from their previous values.
Operations that produce UNPREDICTABLE results may also produce exceptions.
An occurrence specified as UNPREDICTABLE may happen or not based on an arbitrary choice function. The choice function is subject to the same constraints as are UNPREDICTABLE results and, in particular, must not constitute a security hole.
Specifically, UNPREDICTABLE results must not depend upon, or be a function of the contents of memory locations or registers that are inaccessible to the current process in the current access mode.
Also, operations that may produce UNPREDICTABLE results must not:
Write or modify the contents of memory locations or registers to which the current process in the current access mode does not have access.
Halt or hang the system or any of its components.
For example, a security hole would exist if some UNPREDICTABLE result depended on the value of a register in another process, on the contents of processor temporary registers left behind by some previously running process, or on a sequence of actions of different processes.
Undefined
Operations specified as UNDEFINED may vary from moment to moment, implementation to implementation, and instruction to instruction within implementations. The operation may vary in effect from nothing, to stopping system operation.
UNDEFINED operations may halt the processor or cause it to lose information. However, UNDEFINED operations must not cause the processor to hang, that is, reach an unhalted state from which there is no transition to a normal state in which the machine executes instructions. Only privileged software (that is, software running in kernel mode) may trigger UNDEFINED operations.
1

Introduction

This chapter provides a brief introduction to the Alpha architecture, Digital Equipment Corporation’s RISC (reduced instruction set computing) architecture designed for high performance. The chapter then summarizes the specific features of the Digital Semiconductor Alpha 21164PC microprocessor (hereafter called the 21164PC) that implements the Alpha architecture. Appendix A provides a list of Alpha instructions.
For a complete definition of the Alpha architecture, refer to the companion volume, the Alpha AXP Architecture Reference Manual.

1.1 The Architecture

The Alpha architecture is a 64-bit load and store RISC architecture designed with particular emphasis on speed, multiple instruction issue, multiple processors, and software migration from many operating systems.
All registers are 64 bits long and all operations are performed between 64-bit registers. All instructions are 32 bits long. Memory operations are either load or store operations. All data manipulation is done between registers.
The Alpha architecture supports the following data types:
8-, 16-, 32-, and 64-bit integers
IEEE 32-bit and 64-bit floating-point formats
VAX architecture 32-bit and 64-bit floating-point formats
In the Alpha architecture, instructions interact with each other only by one instruction writing to a register or memory location and another instruction reading from that register or memory location. This use of resources makes it easy to build implementations that issue multiple instructions every CPU cycle.
The 21164PC uses a set of subroutines, called privileged architecture library code (PALcode), that is specific to a particular Alpha operating system implementation and hardware platform. These subroutines provide operating system primitives for context switching, interrupts, exceptions, and memory management. These subroutines can be invoked by hardware or CALL_PAL instructions. CALL_PAL instructions use the function field of the instruction to vector to a specified subroutine. PALcode is written in standard machine code with some implementation-specific extensions to provide direct access to low-level hardware functions. PALcode supports optimizations for multiple operating systems, flexible memory-management implementations, and multi-instruction atomic sequences.
The Alpha architecture performs byte shifting and masking with normal 64-bit, register-to-register instructions; it does not include single-byte load and store instructions.

1.1.1 Addressing

The basic addressable unit in the Alpha architecture is the 8-bit byte. The 21164PC supports a 43-bit virtual address.
Virtual addresses as seen by the program are translated into physical memory addresses by the memory-management mechanism. The 21164PC supports a 40-bit uncached and a 33-bit cached physical address space.

1.1.2 Integer Data Types

Alpha architecture supports four integer data types.
Data Type    Description
Byte         A byte is eight contiguous bits that start at an addressable byte boundary. A byte is an 8-bit value. A byte is supported in Alpha architecture by the EXTRACT, INSERT, LDBU, MASK, SEXTB, STB, ZAP, PACK, UNPACK, MIN, MAX, and PERR instructions.
Word         A word is two contiguous bytes that start at an arbitrary byte boundary. A word is a 16-bit value. A word is supported in Alpha architecture by the EXTRACT, INSERT, LDWU, MASK, SEXTW, STW, PACK, UNPACK, MIN, and MAX instructions.
Longword     A longword is four contiguous bytes that start at an arbitrary byte boundary. A longword is a 32-bit value. A longword is supported in Alpha architecture by sign-extended load and store instructions and by longword arithmetic instructions.
Quadword     A quadword is eight contiguous bytes that start at an arbitrary byte boundary. A quadword is supported in Alpha architecture by load and store instructions and quadword integer operate instructions.
Note: Alpha implementations may impose a significant performance penalty
when accessing operands that are not NATURALLY ALIGNED. Refer to the Alpha AXP Architecture Reference Manual for details.

1.1.3 Floating-Point Data Types

The 21164PC supports the following floating-point data types:
Longword integer format in floating-point unit
Quadword integer format in floating-point unit
IEEE floating-point formats
– S_floating
– T_floating
VAX floating-point formats
– F_floating
– G_floating
– D_floating (limited support)

1.2 21164PC Microprocessor Features

The 21164PC is a superscalar pipelined processor manufactured using 0.35-µm CMOS technology. It is packaged in a 413-pin IPGA carrier and has removable application-specific heat sinks. The 21164PC has been optimized for uniprocessor systems with very high cache and memory bandwidth. The 21164PC supports the new motion video instructions (MVI) added to the Alpha instruction set.
The 21164PC can issue four Alpha instructions in a single cycle, thereby minimizing the average cycles per instruction (CPI). A number of low-latency and/or high-throughput features in the instruction issue unit and the onchip components of the memory subsystem further reduce the average CPI.
The 21164PC and associated PALcode implement IEEE single-precision and double-precision, VAX F_floating and G_floating data types, and support longword (32-bit) and quadword (64-bit) integers. Byte (8-bit) and word (16-bit) support is provided by byte-manipulation instructions. Limited hardware support is provided for the VAX D_floating data type.
Other 21164PC features include:
A peak instruction execution rate of four times the CPU clock frequency.
The ability to issue up to four instructions during each clock cycle.
An onchip, demand-paged memory-management unit with translation buffer, which, when used with PALcode, can implement a variety of page table structures and translation algorithms. The unit consists of a 64-entry data translation buffer (DTB) and a 48-entry instruction translation buffer (ITB), with each entry able to map a single 8KB page or a group of 8, 64, or 512 8KB pages. The size of each translation buffer entry’s group is specified by hint bits stored in the entry. The DTB and ITB implement 7-bit address space numbers (ASNs) (MAX_ASN=127).
Two onchip, high-throughput pipelined floating-point units, capable of executing both DIGITAL and IEEE floating-point data types.
An onchip, 16KB virtual instruction cache with 7-bit ASNs (MAX_ASN=127).
An onchip, dual-read-ported, 8KB data cache.
An onchip write buffer with six 32-byte entries.
A 128-bit data bus with onchip parity and offchip longword parity.
Support for an external second-level cache. The size and access time of the
external second-level cache is programmable.
An internal clock generator providing a high-speed clock used by the 21164PC,
and a pair of programmable system clocks for use by the CPU module.
Onchip performance counters to measure and analyze CPU and system performance.
Chip and module level test support, including an instruction cache test interface to support chip and module level testing.
A 3.3-V external interface and 2.5-V internal interface.
Refer to Chapter 9 for 21164PC dc and ac electrical characteristics. Refer to the Alpha AXP Architecture Reference Manual for a description of address space numbers (ASNs).
2

Internal Architecture

This chapter provides both an overview of the 21164PC microarchitecture and a system designer’s view of the 21164PC implementation of the Alpha architecture. The combination of the 21164PC microarchitecture and privileged architecture library code (PALcode) defines the chip’s implementation of the Alpha architecture. If a certain piece of hardware seems to be “architecturally incomplete,” the missing functionality is implemented in PALcode. Chapter 6 provides more information on PALcode.
This chapter describes the major functional hardware units and is not intended to be a detailed hardware description of the chip. It is organized as follows:
21164PC microarchitecture
Pipeline organization
Scheduling and issuing rules
Replay traps
Miss address file (MAF) and load-merging rules
MTU store instruction execution
Write buffer and the WMB instruction
Performance measurement support
Floating-point control register
Design examples
2.1 21164PC Microarchitecture
The 21164PC microprocessor is a high-performance implementation of Digital Equipment Corporation’s Alpha architecture. Figure 2–1 is a block diagram of the 21164PC that shows the major functional blocks relative to pipeline stage flow. The following paragraphs provide an overview of the chip’s architecture and major functional units.
Figure 2–1 21164PC Microprocessor Block/Pipe Flow Diagram
[Block diagram not reproduced. It shows pipe stages S-1 and S0 through S8 and the following labeled blocks:
Instruction fetch/decode unit: instruction cache (16KB, 64-byte block, direct-mapped), refill buffer, next index logic, program counter logic, instruction buffer, instruction translation buffer (48-entry, associative), instruction slot logic, and issue scoreboard logic.
Integer execution unit: integer register file, integer multiplier, integer pipe 0 (ADD, LOG, SHIFT, LD, ST, IMUL, CMP, SEXT, CMOV, BYTE, WORD), and integer pipe 1 (ADD, LOG, LD, BR, CMP, CMOV).
Floating-point execution unit: floating-point register file, floating-point add pipe and divider, and floating-point multiply pipe.
Memory address translation unit: dual-read translation buffer (64-entry, associative, dual-ported), miss address file (6 data misses, 4 Istream misses), and write buffer (six 32-byte entries).
Data cache (Dcache): 8KB, 32-byte block, direct-mapped, dual read-ported.
Cache control and bus interface unit, with a 3-entry bus address file.
Backup cache (Bcache): 512KB to 4MB, direct-mapped, offchip.]
The 21164PC microprocessor consists of the following internal sections:
Clock generation logic (Section 4.2)
Instruction fetch/decode unit and branch unit (IDU) (Section 2.1.1), which includes:
– Instruction prefetcher and instruction decoder
– Instruction translation buffer
– Branch prediction
– Instruction slotting/issue
– Interrupt support
Integer execution unit (IEU) (Section 2.1.2)
Floating-point execution unit (FPU) (Section 2.1.3)
Memory address translation unit (MTU) (Section 2.1.4), which includes:
– Data translation buffer (DTB)
– Miss address file (MAF)
– Write buffer
– Dcache control
Cache control and bus interface unit (CBU) with interface to external cache
(Section 2.1.5)
Data cache (Dcache) (Section 2.1.6.1)
Instruction cache (Icache) (Section 2.1.6.2)
Serial read-only memory (SROM) interface (Section 2.1.7)
2.1.1 Instruction Fetch/Decode Unit and Branch Unit
The primary function of the instruction fetch/decode unit and branch unit (IDU) is to manage and issue instructions to the IEU, MTU, and FPU. It also manages the instruction cache. The IDU contains:
Prefetcher and instruction buffer
Instruction slot and issue logic
Program counter (PC) and branch prediction logic
48-entry instruction translation buffer (ITB)
Abort logic
Register conflict logic
Interrupt and exception logic
2.1.1.1 Instruction Decode and Issue
The IDU decodes up to four instructions in parallel and checks that the required resources are available for each instruction. The IDU issues only the instructions for which all required resources are available. The IDU does not issue instructions out of order, even if the resources are available for a later instruction and not for an earlier one.
In other words:
If resources are available, and multiple issue is possible, then all four instructions are issued.
If resources are available only for a later instruction and not for an earlier one, then only the instructions that precede the earliest instruction lacking resources are issued.
The IDU handles only NATURALLY ALIGNED groups of four instructions (INT16). The IDU does not advance to a new group of four instructions until all instructions in a group are issued. If a branch to the middle of an INT16 group occurs, then the IDU attempts to issue the instructions from the branch target to the end of the current INT16; the IDU then proceeds to the next INT16 of instructions after all the instructions in the target INT16 are issued. Thus, achieving maximum issue rate and optimal performance requires that code be scheduled properly and that floating or integer NOP instructions be used to fill empty slots in the scheduled instruction stream.
For more information on instruction scheduling and issuing, including detailed rules governing multiple instruction issue, refer to Section 2.3.
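The effect of branching into the middle of an INT16 group can be estimated with simple address arithmetic. The following Python sketch is illustrative only; the function name is hypothetical, and it considers only INT16 alignment, not the issue rules of Section 2.3.

```python
INSTRUCTION_BYTES = 4    # all Alpha instructions are 32 bits long
INT16_BYTES = 16         # one naturally aligned group of four instructions


def instructions_left_in_int16(target_pc: int) -> int:
    """Instructions from target_pc to the end of its aligned INT16 group."""
    if target_pc % INSTRUCTION_BYTES:
        raise ValueError("instruction addresses are longword aligned")
    offset = target_pc % INT16_BYTES
    return (INT16_BYTES - offset) // INSTRUCTION_BYTES


# A branch target at the start of a group leaves all four issue slots usable;
# a target in the last slot leaves only one.
aligned = instructions_left_in_int16(0x1000)
last_slot = instructions_left_in_int16(0x100C)
```

Under these assumptions, aligning frequently taken branch targets to 16-byte boundaries keeps all four slots of the target group available.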
2.1.1.2 Instruction Prefetch
The IDU contains an instruction prefetcher and a four-entry, 32-byte-per-entry, prefetch buffer called the refill buffer. Each instruction cache (Icache) miss is checked in the refill buffer. If the refill buffer contains the instruction data, it fills the Icache and instruction buffer simultaneously. If the refill buffer does not contain the necessary data, a fetch and a number of prefetches are sent to the MTU. One prefetch is sent per cycle until each of the four entries in the refill buffer is filled or has a pending fill. The refill buffer holds all returned fill data until the data is required by the IDU pipeline or until it is overwritten by a subsequent fetch/prefetch sequence caused by a future Icache miss.
Prefetching does not begin until there is a “true” miss. A true miss is a reference that misses in the Icache and then also misses in the refill buffer. If an Icache miss results in a refill buffer hit, prefetching is not started until all the data has been moved from the refill buffer entry into the pipeline.
Each fill of the Icache by the refill buffer occurs when the instruction buffer stage in the IDU pipeline requires a new INT16. The INT16 is written into the Icache and the instruction buffer simultaneously. This can occur at a maximum rate of one Icache fill per cycle. The actual rate depends on how frequently the instruction buffer stage requires a new INT16, and on availability of data in the refill buffer.
Once an Icache miss occurs, the Icache enters fill mode. When the Icache is in fill mode, the refill buffer is checked each cycle to see if it contains the next INT16 required by the instruction buffer.
When the required data is not available in the refill buffer (also a miss), the Icache is checked for a hit while it awaits the arrival of the data from the Bcache or main memory. The IDU sends a read request to the CBU by means of the MTU. The CBU checks the Bcache, and if the request misses, the CBU drives a main memory request.
If there is an Icache hit at this time, the Icache returns to access mode and the prefetcher stops sending fetches to the MTU. When a new program counter (PC) is loaded (that is, taken branches), the Icache returns to access mode until the first miss. The refill buffer receives and holds instruction data from fetches initiated before the Icache returned to access mode.
The Icache has a 64-byte block size, whereas the refill buffer is able to load the Icache with only one INT16 (16 bytes) per cycle. Therefore, each Icache block has four valid bits, one for each 16-byte subblock.
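The relationship between an address, its Icache block, and the valid bit for its 16-byte subblock can be sketched as follows. This Python fragment is a simplified model, not a description of the hardware: it assumes the 16KB direct-mapped Icache is indexed by the low-order address bits, and all names are hypothetical.

```python
ICACHE_BYTES = 16 * 1024   # 16KB direct-mapped Icache
BLOCK_BYTES = 64           # Icache block size
SUBBLOCK_BYTES = 16        # one INT16; one valid bit per subblock


def icache_location(address: int) -> tuple[int, int]:
    """Return (block index, subblock number) for an instruction address."""
    block_index = (address % ICACHE_BYTES) // BLOCK_BYTES      # 0..255
    subblock = (address % BLOCK_BYTES) // SUBBLOCK_BYTES       # 0..3
    return block_index, subblock


def mark_filled(valid_bits: int, address: int) -> int:
    """Set the valid bit for the INT16 that was just written to the Icache."""
    _, subblock = icache_location(address)
    return valid_bits | (1 << subblock)


# Filling the INT16s at offsets 0x00 and 0x30 of a block sets valid bits 0 and 3.
valid = mark_filled(mark_filled(0, 0x2000), 0x2030)
```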
2.1.1.3 Branch Execution
When a branch or jump instruction is fetched from the Icache by the prefetcher, the IDU needs one cycle to calculate the target PC before it is ready to fetch the target instruction stream. In the second cycle after the fetch, the Icache is accessed at the target address. Branch and PC prediction are necessary to predict and begin fetching the target instruction stream before the branch or jump instruction is issued.
The Icache records the outcome of branch instructions in a 2048-entry, 2-bit per entry branch history table. The table is indexed by the instruction’s virtual address bits <13:03>. This information is used as the prediction for the next execution of the branch instruction. The 2-bit history state is a saturating counter that increments on taken branches and decrements on not-taken branches. The branch is predicted taken on the top two count values and is predicted not-taken on the bottom two count values. The history status is not initialized on Icache fill, therefore it may “remember” a branch that was evicted from the Icache and subsequently reloaded.
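The behavior of the 2-bit saturating counters can be modeled in a few lines. The following Python sketch is illustrative only; the class and method names are hypothetical, and the initial counter value of 0 is an assumption (the hardware state after reset is not described here).

```python
class BranchHistoryTable:
    """Simplified model of a 2048-entry table of 2-bit saturating counters."""

    ENTRIES = 2048

    def __init__(self, initial: int = 0):   # initial value is an assumption
        self.counters = [initial] * self.ENTRIES

    @staticmethod
    def index(pc: int) -> int:
        """Index with virtual address bits <13:3> (11 bits, 2048 entries)."""
        return (pc >> 3) & 0x7FF

    def predict_taken(self, pc: int) -> bool:
        """Predict taken on the top two count values (2 and 3)."""
        return self.counters[self.index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        """Increment on taken, decrement on not-taken, saturating at 0 and 3."""
        i = self.index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


bht = BranchHistoryTable()
for _ in range(5):
    bht.update(0x12348, taken=True)      # counter saturates at 3
bht.update(0x12348, taken=False)         # one not-taken outcome: counter is 2
still_taken = bht.predict_taken(0x12348)
```

Because only address bits <13:3> select an entry, branches whose addresses differ only in the higher-order bits share a counter in this model.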
The 21164PC does not limit the number of branch predictions outstanding to one. It predicts branches even while waiting to confirm the prediction of previously predicted branches. There can be one branch prediction pending for each of pipeline stages 3 and 4, plus up to four in pipeline stage 2. Refer to Section 2.2 for a description of pipeline stages.
When a predicted branch is issued, the IEU or FPU checks the prediction. The branch history table is updated accordingly. On branch mispredict, a mispredict trap occurs and the IDU restarts execution from the correct PC.
The 21164PC provides a 12-entry subroutine return stack that is controlled by decoding the opcode (BSR, HW_REI, and JMP/JSR/RET/JSR_COROUTINE), and DISP<15:14> in JMP/JSR/RET/JSR_COROUTINE. The stack stores an Icache index in each entry. The stack is implemented as a circular queue that wraps around in the overflow and underflow cases.
Table 2–1 lists the effect each of these instructions has on the state of the branch-prediction stack.
Table 2–1 Effect of Branching Instructions on the Branch-Prediction Stack
Instruction        Stack Used for Prediction?    Effect on Stack
BSR, JSR           No                            Push PC+4
RET                Yes                           Pop
JMP, BR, BRxx      No                            No effect
JSR_COROUTINE      Yes                           Pop, then push PC+4
PAL entry          No                            Push PC+4
HW_REI             Yes                           Pop
The 21164PC uses the Icache index hint in the JMP and JSR instructions to predict the target PC. The Icache index hint in the instruction’s displacement field is used to access the direct-mapped Icache. The upper bits of the PC are formed from the data in the Icache tag store at that index. Later in the pipeline, the PC prediction is checked against the actual PC generated by the IEU. A mismatch causes a PC mispredict trap and restart from the correct PC. This is similar to branch prediction.
The RET, JSR_COROUTINE, and HW_REI instructions predict the next PC by using the index from the subroutine return stack. The upper bits of the PC are formed from the data in the Icache tag at that index. These predictions are checked against the actual PC in exactly the same way that JMP and JSR predictions are checked.
The branch-prediction stack never predicts a target address in PALmode. This prevents the possibility of nonprivileged code accessing privileged modes through incorrect stack predictions (for example, by underflow/overflow of the stack). This implies that PALcode libraries should avoid using instructions such as RET and JSR_COROUTINE for internal jumps with PALmode targets, as the 21164PC will always mispredict the target address.
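The stack effects listed in Table 2–1 and the circular-queue behavior of the 12-entry return stack can be sketched as follows. This Python model is illustrative only: the names are hypothetical, it stores PC+4 directly where the hardware stores an Icache index, and it does not model the PALmode restriction.

```python
class ReturnStack:
    """12-entry circular queue; wraps on overflow and underflow."""

    DEPTH = 12

    def __init__(self):
        self.entries = [0] * self.DEPTH
        self.top = 0                         # next slot to be written

    def push(self, value: int) -> None:
        self.entries[self.top] = value
        self.top = (self.top + 1) % self.DEPTH

    def pop(self) -> int:
        self.top = (self.top - 1) % self.DEPTH
        return self.entries[self.top]


# Instruction class -> (pop for prediction?, push PC+4?), following Table 2-1.
EFFECTS = {
    "BSR": (False, True), "JSR": (False, True), "PAL entry": (False, True),
    "RET": (True, False), "HW_REI": (True, False),
    "JSR_COROUTINE": (True, True),
    "JMP": (False, False), "BR": (False, False), "BRxx": (False, False),
}


def apply_instruction(stack: ReturnStack, instruction: str, pc: int):
    """Update the stack; return the popped prediction, or None if unused."""
    pops, pushes = EFFECTS[instruction]
    prediction = stack.pop() if pops else None
    if pushes:
        stack.push(pc + 4)
    return prediction


stack = ReturnStack()
apply_instruction(stack, "BSR", 0x100)               # pushes 0x104
predicted = apply_instruction(stack, "RET", 0x200)   # pops 0x104
```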
2.1.1.4 Instruction Translation Buffer
The IDU includes a 48-entry, fully associative instruction translation buffer (ITB). The buffer stores recently used Istream address translations and protection information for pages ranging from 8KB to 4MB and uses a not-last-used replacement algorithm.
PALcode fills and maintains the ITB. Each entry supports all four granularity hint bit combinations, so that any single ITB entry can provide translation for up to 512 contiguously mapped 8KB pages. The operating system, using PALcode, must ensure that virtual addresses can only be mapped through a single ITB entry or superpage mapping at one time. Multiple simultaneous mapping can cause UNDEFINED results.
While not executing in PALmode, the 43-bit virtual PC is routed to the ITB each cycle. If the page table entry (PTE) associated with the PC is cached in the ITB, the protection bits for the page that contains the PC are used by the IDU to do the necessary access checks. If there is an Icache miss and the PC is cached in the ITB, the page frame number (PFN) and protection bits for the page that contains the PC are used by the IDU to do the address translation and access checks.
The 21164PC’s ITB supports 128 address space numbers (ASNs) (MAX_ASN=127) by means of a 7-bit ASN field in each ITB entry. PALcode uses the hardware-specific HW_MTPR instruction to write to the architecturally defined ITB_IAP register. This has the effect of invalidating ITB entries that do not have their ASM bit set.
The 21164PC provides two optional translation extensions called superpages. Access to superpages is enabled using ICSR<SPE> and is allowed only while executing in privileged mode.
One superpage maps virtual address bits <39:13> to physical address bits <39:13>, on a one-to-one basis, when virtual address bits <42:41> equal 2. This maps the entire physical address space four times over to the quadrant of the virtual address space.
The other superpage maps virtual address bits <29:13> to physical address bits <29:13>, on a one-to-one basis, and forces physical address bits <39:30> to 0 when virtual address bits <42:30> equal 1FFE₁₆. This effectively maps a 30-bit region of physical address space to a single region of the virtual address space defined by virtual address bits <42:30> = 1FFE₁₆.
Access to either superpage mapping is allowed only while executing in kernel mode. Superpage mapping allows the operating system to map all physical memory to a privileged virtual memory region.
2.1.1.5 Interrupts
The IDU exception logic supports three sources of interrupts:
Hardware interrupts
There are 7 level-sensitive hardware interrupt sources supplied by the following signals:
irq_h<3:0>
mch_hlt_irq_h
pwr_fail_irq_h
sys_mch_chk_irq_h
Software interrupts
There are 15 prioritized software interrupts sourced by the software interrupt request register (SIRR) (see Section 5.1.22).
Asynchronous system traps (ASTs)
There are 4 ASTs sourced by the asynchronous system trap request (ASTRR) register.
The serial interrupt, the performance counter interrupts, and irq_h<3:0> are all maskable by bits in the ICSR (see Section 5.1.17). The four AST traps are maskable by bits in the ASTER (see Section 5.1.21). In addition, the AST traps are qualified by the current processor mode. All interrupts are disabled when the processor is executing PALcode.
Each interrupt source, or group of sources, is assigned an interrupt priority level (IPL), as shown in Table 4–11. The current IPL is set using the IPLR register (see Section 5.1.18). Any interrupts that have an equal or lower IPL are masked. When an interrupt occurs that has an IPL greater than the value in the IPLR register, program control passes to the INTERRUPT PALcode entry point. PALcode processes the interrupt by reading the ISR (see Section 5.1.24) and the INTID register (see Section 5.1.19).
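The IPL comparison described above reduces to a single test. The following Python sketch is illustrative only; the function and parameter names are hypothetical, and it does not model the individual mask bits in the ICSR or ASTER.

```python
def interrupt_is_taken(source_ipl: int, current_ipl: int,
                       in_palmode: bool = False) -> bool:
    """Return True if an interrupt at source_ipl passes the IPL check."""
    if in_palmode:
        return False                  # all interrupts are disabled in PALcode
    return source_ipl > current_ipl   # equal or lower IPLs are masked


taken = interrupt_is_taken(source_ipl=5, current_ipl=4)
masked = interrupt_is_taken(source_ipl=4, current_ipl=4)
```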

2.1.2 Integer Execution Unit

The integer execution unit (IEU) contains two 64-bit integer execution pipelines, E0 and E1, which include the following:
Two adders
Two logic boxes
A barrel shifter
Byte-manipulation logic
An integer multiplier
A motion video instruction unit
The IEU also includes the 40-entry, 64-bit integer register file (IRF) that contains the 32 integer registers defined by the Alpha architecture and 8 PAL shadow registers. The register file has four read ports and two write ports that provide operands to both integer execution pipelines and accept results from both pipes. The register file also accepts load instruction results (memory data) on the same two write ports.

2.1.3 Floating-Point Execution Unit

The onchip, pipelined floating-point unit (FPU) can execute both IEEE and VAX floating-point instructions. The 21164PC supports IEEE S_floating and T_floating data types, and all rounding modes. It also supports VAX F_floating and G_floating data types, and provides limited support for the D_floating format. The FPU contains:
A 32-entry, 64-bit floating-point register file
A user-accessible control register
A floating-point multiply pipeline
A floating-point add pipeline
The floating-point divide unit is associated with the floating-point add pipeline but is not pipelined.
The FPU can accept two instructions every cycle, with the exception of floating-point divide instructions. The result latency for nondivide, floating-point instructions is four cycles.
The floating-point register file (FRF) has five read ports and four write ports. Four of the read ports are used by the two pipelines to source operands. The remaining read port is used by floating-point stores. Two of the write ports are used to write results from the two pipelines. The other two write ports are used to write fills from floating-point loads.

2.1.4 Memory Address Translation Unit

The memory address translation unit (MTU) contains three major sections:
Data translation buffer (dual ported)
Miss address file
Write buffer address file
The MTU receives up to two virtual addresses every cycle from the IEU. The translation buffer generates the corresponding physical addresses and access control information for each virtual address. The 21164PC implements a 43-bit virtual address, a 40-bit noncacheable physical address, and a 33-bit cacheable physical address. Cacheable addresses consist of bits <32:0> when bit <39> = 0. Physical addresses that set bits <38:33> are not supported by the 21164PC. These addresses are not checked by the 21164PC and could result in erroneous data.
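Because the 21164PC does not check for unsupported physical addresses, system software may want to validate addresses itself. The following Python sketch shows one way to classify a 40-bit physical address according to the rules above. It is illustrative only; the function name is hypothetical, and treating any address with bits <38:33> set as unsupported, regardless of bit <39>, is an interpretation of the text.

```python
def classify_physical_address(pa: int) -> str:
    """Classify a 40-bit physical address as described in Section 2.1.4."""
    if pa < 0 or pa >> 40:
        raise ValueError("physical addresses are 40 bits wide")
    if (pa >> 33) & 0x3F:         # any of bits <38:33> set
        return "unsupported"
    if (pa >> 39) & 1:            # bit <39> = 1
        return "noncacheable"
    return "cacheable"            # bit <39> = 0, address in bits <32:0>


kinds = [classify_physical_address(pa)
         for pa in (0x1000, 1 << 39, 1 << 33)]
```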
2.1.4.1 Data Translation Buffer
The 64-entry, fully associative, dual-read-ported data translation buffer (DTB) stores recently used data stream (Dstream) page table entries (PTEs). Each entry supports all four granularity hint-bit combinations, so that a single DTB entry can provide translation for up to 512 contiguously mapped, 8KB pages. The translation buffer uses a not-last-used replacement algorithm.
For load and store instructions, and other MTU instructions requiring address translation, the effective 43-bit virtual address is presented to the DTB. If the PTE of the supplied virtual address is cached in the DTB, the page frame number (PFN) and protection bits for the page that contains the address are used by the MTU to complete the address translation and access checks.
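The effect of the granularity hint on the range covered by one translation buffer entry can be sketched as follows. This Python fragment is illustrative only: it assumes a 2-bit hint value gh in the range 0 to 3 that selects a group of 8**gh pages (1, 8, 64, or 512), and the function names are hypothetical.

```python
PAGE_BYTES = 8 * 1024    # 8KB page


def pages_mapped(gh: int) -> int:
    """Number of contiguous 8KB pages covered by one entry."""
    if gh not in (0, 1, 2, 3):
        raise ValueError("granularity hint is a 2-bit value")
    return 8 ** gh


def entry_covers(entry_va: int, gh: int, va: int) -> bool:
    """True if va lies in the naturally aligned group mapped by the entry."""
    span = PAGE_BYTES * pages_mapped(gh)
    return entry_va // span == va // span


largest_span = PAGE_BYTES * pages_mapped(3)   # 512 pages of 8KB
```

With a hint value of 3, a single entry spans 4MB, which matches the largest page size quoted for the ITB.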
The DTB also supports the optional superpage extensions that are enabled using ICSR<SPE>. The DTB superpage maps provide virtual-to-physical address translation for two regions of the virtual address space, as described in Section 2.1.1.4.
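The two superpage mappings described in Section 2.1.1.4 can be sketched as a pure function of the virtual address. The following Python model is illustrative only; the function and parameter names are hypothetical, and it returns None whenever the address would be translated through the translation buffer instead.

```python
def superpage_translate(va: int, spe_enabled: bool = True,
                        kernel_mode: bool = True):
    """Return the physical address for a superpage hit, else None."""
    if not (spe_enabled and kernel_mode):
        return None                       # superpages disabled or not allowed
    va &= (1 << 43) - 1                   # 43-bit virtual address
    if (va >> 41) & 0x3 == 2:             # VA<42:41> = 2
        return va & ((1 << 40) - 1)       # VA<39:0> passes through unchanged
    if (va >> 30) == 0x1FFE:              # VA<42:30> = 1FFE (hexadecimal)
        return va & ((1 << 30) - 1)       # PA<39:30> forced to 0
    return None


pa_large = superpage_translate((2 << 41) | 0x12345678)
pa_small = superpage_translate((0x1FFE << 30) | 0x2345678)
```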
PALcode fills and maintains the DTB. The operating system, using PALcode, must ensure that virtual addresses be mapped either through a single DTB entry or through superpage mapping. Multiple simultaneous mapping can cause UNDEFINED results. The only exception to this rule is that any given virtual page may be mapped twice with identical data in two different DTB entries. This occurs in operating systems, such as OpenVMS, which utilize virtually accessible page tables. If the level 1 page table is accessed virtually, PALcode loads the translation information twice; once in the double-miss handler, and once in the primary handler. The PTE mapping the level 1 page table must remain constant during accesses to this page to meet this requirement.
2.1.4.2 Load Instruction and the Miss Address File
The MTU begins the execution of each load instruction by translating the virtual address and by accessing the data cache (Dcache). Translation and Dcache tag read operations occur in parallel. If the addressed location is found in the Dcache (a hit), then the data from the Dcache is formatted and written to either the integer register file (IRF) or floating-point register file (FRF). The formatting required depends on the particu l ar load inst ruction executed. If the data is not found in the Dcache (a miss), then the address, target register number, and formatting information are entered in the miss address file (MAF).
21164PC Microarchitecture
The MAF performs a load-merging function. When a load miss occurs, each MAF entry is checked to see if it contains a load miss that addresses the same Dcache (32-byte) block. If it does, and certain merging rules are satisfied, then the new load miss is merged with an existing MAF entry. This allows the MTU to service two or more load misses with one data fill from the CBU.
There are six MAF entries for load misses and four more for IDU instruction fetches and prefetches. Load misses are usually the highest MTU priority.
Refer to Section 2.5 for information on load-merging rules.
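The block-matching step of load merging can be sketched as follows. This is a minimal Python model under stated assumptions, not the hardware logic: the class and method names are hypothetical, and the merging rules of Section 2.5 are reduced to a caller-supplied predicate because they are not given here.

```python
BLOCK_BYTES = 32     # Dcache block size, from the text
LOAD_ENTRIES = 6     # MAF entries available to load misses, from the text


class MissAddressFile:
    """Toy model of MAF load merging (all names are hypothetical)."""

    def __init__(self):
        # Each entry records the missed block and the registers waiting on it.
        self.entries = []

    def load_miss(self, address, target_register, rules_ok=lambda entry: True):
        """Record a load miss and report how the MAF handled it.

        `rules_ok` stands in for the Section 2.5 merging rules (assumption).
        """
        block = address // BLOCK_BYTES
        for entry in self.entries:
            if entry["block"] == block and rules_ok(entry):
                entry["targets"].append(target_register)
                return "merged"       # one fill will serve both misses
        if len(self.entries) < LOAD_ENTRIES:
            self.entries.append({"block": block, "targets": [target_register]})
            return "allocated"
        return "full"                 # no entry available for this miss


maf = MissAddressFile()
print(maf.load_miss(0x1000, "R1"))   # first miss to the block
print(maf.load_miss(0x101F, "R2"))   # same 32-byte block
print(maf.load_miss(0x1020, "R3"))   # next block
```

In this sketch the misses at 0x1000 and 0x101F fall in the same 32-byte block, so they share one entry and one fill; the miss at 0x1020 starts a new entry.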
2.1.4.3 Dcache Control and Store Instructions
The Dcache follows a write-through protocol. During the execution of a store instruction, the MTU probes the Dcache to determine whether the location to be overwritten is currently cached. If so (a Dcache hit), the Dcache is updated. Regardless of the Dcache state, the MTU forwards the data to the CBU.
A load instruction that is issued one cycle after a store instruction in the pipeline creates a conflict if both the load and store operations access the same memory location. (The store instruction has not yet updated the location when the load instruction reads it.) This conflict is handled by forcing the load instruction to take a replay trap; that is, the IDU flushes the pipeline and restarts execution from the load instruction. By the time the load instruction arrives at the Dcache the second time, the conflicting store instruction has written the Dcache and the load instruction is executed normally.
Replay traps can be avoided by scheduling the load instruction to issue three cycles after the store instruction. If the load instruction is scheduled to issue two cycles after the store instruction, then it will be issue-stalled for one cycle.
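The scheduling consequence of the two paragraphs above can be summarized in a small lookup. This is an illustrative Python sketch only: the function name is hypothetical, and it covers just the case described in the text, a load that reads the location written by the preceding store.

```python
def load_after_store(distance):
    """Outcome for a load issued `distance` cycles after a store to the
    same memory location, as described in the text (hypothetical helper)."""
    if distance < 1:
        raise ValueError("the load must issue after the store")
    if distance == 1:
        return "replay trap"              # store has not yet written the Dcache
    if distance == 2:
        return "issue-stalled one cycle"  # load is held for one cycle
    return "no penalty"                   # three or more cycles apart


for d in (1, 2, 3):
    print(d, load_after_store(d))
```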
2.1.4.4 Write Buffer
The MTU contains a write buffer that has six 32-byte entries, each of which holds the data from one or more store instructions that access the same 32-byte block in memory until the data is written into the Bcache. The write buffer provides a finite, high-bandwidth resource for receiving store data to minimize the number of CPU stall cycles. The write buffer and associated WMB instruction are described in Section 2.7.

2.1.5 Cache Control and Bus Interface Unit

The cache control and bus interface unit (CBU) processes all accesses sent by the MTU and implements all memory-related external interface functions, particularly the coherence protocol functions for write-back caching. It controls the board-level backup cache (Bcache). The CBU handles all instruction and primary Dcache read misses and performs the function of writing data from the write buffer into the shared coherent memory subsystem. The CBU also controls the 128-bit bidirectional data bus, address bus, and I/O control. Chapter 4 describes the external interface.

2.1.6 Cache Organization

The 21164PC has two onchip caches: a primary data cache (Dcache) and a primary instruction cache (Icache). All memory cells in the onchip caches are fully static, six-transistor, CMOS structures.
The 21164PC also provides control for the external cache (Bcache).
2.1.6.1 Data Cache
The data cache (Dcache) is a dual-read-ported, single-write-ported, 8KB cache. It is a write-through, read-allocate, direct-mapped, byte-accessible, physical cache with 32-byte blocks and data parity at the byte level.
2.1.6.2 Instruction Cache
The instruction cache (Icache) is a 16KB, virtual, direct-mapped cache with 64-byte blocks and 32-byte fills. Each block tag contains:
A 7-bit address space number (ASN) field as defined by the Alpha architecture
A 1-bit address space match (ASM) field as defined by the Alpha architecture
A 1-bit PALcode (physically addressed) indicator
Software, rather than Icache hardware, maintains Icache coherence with memory.
2.1.6.3 External Cache
The CBU implements control for an external, direct-mapped, physical, write-back, write-allocate cache with 64-byte blocks. The 21164PC supports board-level cache sizes of 512KB, 1MB, 2MB, and 4MB.
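The cache parameters above determine how many blocks each cache holds and how many address bits select a byte within a block and a block within the cache. The following Python sketch works the arithmetic for the three direct-mapped caches; the function name is hypothetical, and it assumes sizes that are powers of two, as all the listed sizes are.

```python
KB = 1024
MB = 1024 * KB


def geometry(size_bytes, block_bytes):
    """Return (blocks, offset_bits, index_bits) for a direct-mapped cache.

    Assumes both arguments are powers of two.
    """
    blocks = size_bytes // block_bytes
    offset_bits = block_bytes.bit_length() - 1   # log2(block size)
    index_bits = blocks.bit_length() - 1         # log2(number of blocks)
    return blocks, offset_bits, index_bits


print("Dcache", geometry(8 * KB, 32))
print("Icache", geometry(16 * KB, 64))
for size in (512 * KB, 1 * MB, 2 * MB, 4 * MB):
    print("Bcache", size // KB, "KB", geometry(size, 64))
```

Both onchip caches hold 256 blocks, so each uses an 8-bit index; the Bcache index grows from 13 bits at 512KB to 16 bits at 4MB.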


2.1.7 Serial Read-Only Memory Interface

The serial read-only memory (SROM) interface provides the initialization data load path from a system SROM to the Icache. Chapter 7 provides information about the SROM interface.
2.2 Pipeline Organization
The 21164PC has a 7-stage (or 7-cycle) pipeline for integer operate and memory reference instructions, and a 9-stage pipeline for floating-point operate instructions. The IDU maintains state for all pipeline stages to track outstanding register write operations.
Figure 2–2 shows the integer operate, memory reference, and floating-point operate pipelines for the IDU, FEU, IEU, and MTU. The first four stages are executed in the IDU. Remaining stages are executed by the IEU, FEU, MTU, and CBU. There are bypass paths that allow the result of one instruction to be used as a source operand of a following instruction before it is written to the register file.
Tables 2–2, 2–3, 2–4, 2–5, 2–6, and 2–7 provide examples of events at various stages of pipelining during instruction execution.
Figure 2–2 Instruction Pipeline Stages
[Figure not reproduced; the labels below are recovered from the original artwork.]
Stages common to all pipelines: stage 0 (IC), instruction cache read; stage 1 (IB), instruction buffer, branch decode, determine next PC; stage 2 (SL), slot by function unit; stage 3 (AC), register file access checks, integer register file access.
Integer operate pipeline: stage 4, first integer operate stage; stage 5, second integer operate stage (if needed); stage 6, write integer register file.
Arithmetic, logical, shift, and compare instructions complete in pipeline stage 4 (1-cycle latency). CMOV completes in stage 5 (2-cycle latency). IMULL has an 8-cycle or 9-cycle latency. CMOV or BR can issue in parallel (0-cycle latency) with a dependent CMP instruction.
Floating-point pipeline: floating-point register file access; first floating-point operate stage; last floating-point operate stage and write floating-point register file.
Memory reference pipeline: Dcache read begins; Dcache read ends; use Dcache data, store writes Dcache.
Bcache access: Bcache tag/data access begins; Bcache tag access ends, 1st datum returned; fill Dcache/Icache (1st OW); use Bcache data; 2nd datum returned, fill Dcache/Icache (2nd OW). Bcache read latency is 5 to 20 CPU cycles. Bcache cycle time is 2 to 10 CPU cycles.
Table 2–2 Pipeline Examples—All Cases
Pipeline Stage  Events
0               Access Icache tag and data.
1               Buffer four instructions, check for branches, calculate branch displacements, and check for Icache hit.
2               Slot-swap instructions around so they are headed for pipelines capable of executing them. Stall preceding stages if all instructions in this stage cannot issue simultaneously because of function unit conflicts.
3               Check the operands of each instruction to see that the source is valid and available and that no write-write hazards exist. Read the IRF. Stall preceding stages if any instruction cannot be issued. All source operands must be available at the end of this stage for the instruction to issue.
Table 2–3 Pipeline Examples—Integer Add
Pipeline Stage  Events
4               Perform the add operation.
5               Result is available for use by an operate function in this cycle.
6               Write the IRF. Result is available for use by an operate function in this cycle.
Table 2–4 Pipeline Examples—Floating Add
Pipeline Stage  Events
4               Read the FRF.
5               First stage of FEU add pipeline.
6               Second stage of FEU add pipeline.
7               Third stage of FEU add pipeline.
8               Fourth stage of FEU add pipeline. Write the FRF.
9               Result is available for use by an operate function in this cycle. For instance, pipeline stage 5 of the user instruction can coincide with pipeline stage 9 of the producer (latency of 4).
Table 2–5 Pipeline Examples—Load (Dcache Hit)
Pipeline Stage¹ Events
4               Calculate the effective address. Begin the Dcache data and tag store access.
5               Finish the Dcache data and tag store access. Detect Dcache hit. Format the data as required. Bcache arbitration defaults to pipe E0 in anticipation of a possible miss.
6               Write the IRF or FRF. Data is available for use by an operate function in this cycle.
¹ Pipe E0 has not been defined at this point.
Table 2–6 Pipeline Examples—Load (Dcache Miss)
Pipeline Stage¹ Events
4               Calculate the effective address. Begin the Dcache data and tag store access.
5               Finish the Dcache data and tag store access. Detect Dcache miss. Bcache arbitration defaults to pipe E0 in anticipation of a possible miss. If there are load instructions in both E0 and E1, the load instruction in E1 would be delayed at least one more cycle because default arbitration speculatively assumes the load in E0 will miss.
6               Forward physical address to pins.
7               Begin Bcache access, cycle 1.
8               N more CPU cycles waiting for Bcache data.
9               Receive Bcache data at the pins, send data to the Dcache.
10              Begin Dcache fill. Format the data as required.
11              Finish the Dcache fill. Write the integer or floating-point register file. Data is available for use by an operate function in this cycle.
¹ Pipes E0 and E1 have not been defined at this point.
Table 2–7 Pipeline Examples—Store (Dcache Hit)
Pipeline Stage  Events
4               Calculate the effective address. Begin the Dcache tag store access.
5               Finish the Dcache tag store access. Detect Dcache hit. Send store to the write buffer simultaneously.
6               Write the Dcache data store if hit (write begins this cycle).

2.2.1 Pipeline Stages and Instruction Issue

The 21164PC pipeline divides instruction processing into four static and a number of dynamic stages of execution. The first four stages consist of the instruction fetch, buffer and decode, slotting, and issue-check logic. These stages are static in that instructions may remain valid in the same pipeline stage for multiple cycles while waiting for a resource or stalling for other reasons. Dynamic stages (IEU and FEU) always advance state and are unaffected by any stall in the pipeline. A pipeline stall may occur while zero instructions issue, or while some instructions of a set of four issue and the others are held at the issue stage. A pipeline stall implies that a valid instruction is (or instructions are) presented to be issued but cannot proceed.
Upon satisfying all issue requirements, instructions are issued into their slotted pipeline. After issuing, instructions cannot stall in a subsequent pipeline stage. The issue stage is responsible for ensuring that all resource conflicts are resolved before an instruction is allowed to continue. The only means of stopping instructions after the issue stage is an abort condition. (The term abort as used here is different from its use in the Alpha AXP Architecture Reference Manual.)

2.2.2 Aborts and Exceptions

Aborts result from a number of causes. In general, they can be grouped into two classes, exceptions (including interrupts) and nonexceptions. The difference between the two is that exceptions require that the pipeline be drained of all outstanding instructions before restarting the pipeline at a redirected address. In either case, the pipeline must be flushed of all instructions that were fetched subsequent to the instruction that caused the abort condition (arithmetic exceptions are an exception to this rule). This includes aborting some instructions of a multiple-issued set in the case of an abort condition on one instruction in the set.
The nonexception case does not need to drain the pipeline of all outstanding instructions ahead of the aborting instruction. The pipeline can be restarted immediately at a redirected address. Examples of nonexception abort conditions are branch mispredictions, subroutine call/return mispredictions, and replay traps. Data cache misses can cause aborts or issue stalls depending on the cycle-by-cycle timing.
In the event of an exception other than an arithmetic exception, the processor aborts all instructions issued after the exceptional instruction, as described in the preceding paragraphs. Due to the nature of some exception conditions, this may occur as late as the integer register file (IRF) write cycle. In the case of an arithmetic exception, the processor may execute instructions issued after the exceptional instruction.
After aborting, the address of the exceptional instruction or the immediately subsequent instruction is latched in the EXC_ADDR internal processor register (IPR). In the case of an arithmetic exception, EXC_ADDR contains the address of the instruction immediately after the last instruction executed. (Every instruction prior to the last instruction executed was also executed.) For machine checks and interrupts, EXC_ADDR points to the instruction immediately following the last instruction executed. For the remaining cases, EXC_ADDR points to the exceptional instruction, where, in all cases, execution should naturally restart.
When the pipeline is fully drained, the processor begins instruction execution at the address given by the PALcode dispatch. The pipeline is drained when all outstanding write operations to both the IRF and FRF have completed and all outstanding instructions have passed the point in the pipeline such that they are guaranteed to complete without an exception in the absence of a machine check.
Replay traps are aborts that occur when an instruction requires a resource that is not available at some point in the pipeline. These are usually MTU resources whose availability could not be anticipated accurately at issue time (refer to Section 2.4). If the necessary resource is not available when the instruction requires it, the instruction is aborted and the IDU begins fetching at exactly that instruction, thereby replaying the instruction in the pipeline. A slight variation on this is the load-miss-and-use replay trap in which an operate instruction is issued just as a Dcache hit is being evaluated to determine if one of the instruction's operands is valid. If the result is a Dcache miss, then the operate instruction is aborted and replayed.

2.2.3 Nonissue Conditions

There are two reasons for nonissue conditions. The first is a pipeline stall wherein a valid instruction or set of instructions is prepared to issue but cannot because of a resource conflict (register conflict or function unit conflict). These types of nonissue cycles can be minimized through code scheduling.
The second type of nonissue condition consists of pipeline bubbles where there is no valid instruction in the pipeline to issue. Pipeline bubbles result from the abort conditions described in the previous section. In addition, a single pipeline bubble is produced whenever a branch type instruction is predicted to be taken, including subroutine calls and returns.
Pipeline bubbles are reduced directly by the instruction buffer hardware and through bubble squashing, but can also be effectively minimized through careful coding practices. Bubble squashing involves the ability of the first four pipeline stages to advance whenever a bubble or buffer slot is detected in the pipeline stage immediately ahead of it while the pipeline is otherwise stalled.

2.3 Scheduling and Issuing Rules

The following sections define the classes of instructions and provide rules for instruction slotting, instruction issuing, and latency.
2.3.1 Instruction Class Definition and Instruction Slotting
The scheduling and multiple issue rules presented here are performance related only; that is, there are no functional dependencies related to scheduling or multiple issuing. The rules are defined in terms of instruction classes. Table 2–8 specifies all of the instruction classes and the pipeline that executes the particular class. With a few additional rules, the table provides the information necessary to determine the functional resource conflicts that determine which instructions can issue in a given cycle.
Table 2–8 Instruction Classes and Slotting

Class Name  Pipeline      Instruction List
LD          E0¹ or E1²    All loads except LDx_L
ST          E0            All stores except STx_C
MBX         E0            LDx_L, MB, WMB, STx_C, HW_LD-lock, HW_ST-cond, FETCH
RX          E0            RS, RC
MXPR        E0 or E1      HW_MFPR, HW_MTPR
            (depends on
            the IPR)
IBR         E1            Integer conditional branches
FBR         FA³           Floating-point conditional branches
JSR         E1            Jump-to-subroutine instructions: JMP, JSR, RET, or
                          JSR_COROUTINE, BSR, BR, HW_REI, CALLPAL
IADD        E0 or E1      ADDL, ADDL/V, ADDQ, ADDQ/V, SUBL, SUBL/V, SUBQ,
                          SUBQ/V, S4ADDL, S4ADDQ, S8ADDL, S8ADDQ, S4SUBL,
                          S4SUBQ, S8SUBL, S8SUBQ, LDA, LDAH
ILOG        E0 or E1      AND, BIS, XOR, BIC, ORNOT, EQV
SEXT        E0            SEXTB, SEXTW
SHIFT       E0            SLL, SRL, SRA, EXTQL, EXTLL, EXTWL, EXTBL,
                          EXTQH, EXTLH, EXTWH, MSKQL, MSKLL, MSKWL,
                          MSKBL, MSKQH, MSKLH, MSKWH, INSQL, INSLL,
                          INSWL, INSBL, INSQH, INSLH, INSWH, ZAP, ZAPNOT
CMOV        E0 or E1      CMOVEQ, CMOVNE, CMOVLT, CMOVLE, CMOVGT,
                          CMOVGE, CMOVLBS, CMOVLBC
ICMP        E0 or E1      CMPEQ, CMPLT, CMPLE, CMPULT, CMPULE, CMPBGE
IMULL       E0            MULL, MULL/V
IMULQ       E0            MULQ, MULQ/V
IMULH       E0            UMULH
MVI         E0            PERR, UNPKBW, UNPKBL, PKWB, PKLB, MINSB8,
                          MINSW4, MINUB8, MINUW4, MAXUB8, MAXUW4,
                          MAXSB8, MAXSW4
FADD        FA            Floating-point operates, including CPYSN and CPYSE,
                          except multiply, divide, and CPYS
FDIV        FA            Floating-point divide
FMUL        FM⁴           Floating-point multiply
FCPYS       FM or FA      CPYS, not including CPYSN or CPYSE
MISC        E0            RPCC, TRAPB
UNOP⁵       None          UNOP

¹ IEU pipeline 0.
² IEU pipeline 1.
³ FEU add pipeline.
⁴ FEU multiply pipeline.
⁵ UNOP is LDQ_U R31,0(Rx).
Slotting
The slotting function in the IDU determines which instructions will be sent forward to attempt to issue. The slotting function detects and removes all static functional resource conflicts. The set of instructions output by the slotting function will issue if no register or other dynamic resource conflict is detected in stage 3 of the pipeline. The slotting algorithm follows:
Starting from the first (lowest addressed) valid instruction in the INT16 in stage 2 of the 21164PC IDU pipeline, attempt to assign that instruction to one of the four pipelines (E0, E1, FA, FM). If it is an instruction that can issue in either E0 or E1, assign it to E0. However, if one of the following is true, assign it to E1:
E0 is not free and E1 is free.
The next integer instruction¹ in this INT16 can issue only in E0.
If the current instruction is one that can issue in either FA or FM, assign it to FA unless FA is not free. If it is an FA-only instruction, it must be assigned to FA. If it is an FM-only instruction, it must be assigned to FM. Mark the pipeline selected by this process as taken and resume with the next sequential instruction. Stop when an instruction cannot be allocated in an execution pipeline because any pipeline it can use is already taken.
¹ In this context, an integer instruction is one that can issue in one or both of E0 or E1, but not FA or FM.
The slotting logic does not send instructions forward out of logical instruction order because the 21164PC always issues instructions in order. The slotting logic also enforces the special rules in the following list, stopping the slotting process when a rule would be violated by allocating the next instruction an execution pipeline:
An instruction of class LD cannot be issued simultaneously with an instruction of class ST.
All instructions are discarded at the slotting stage after a predicted-taken IBR or FBR class instruction, or a JSR class instruction.
After a predicted not-taken IBR or FBR, no other IBR, FBR, or JSR class can be slotted together.
The following cases are detected by the slotting logic:
    From lowest address to highest within an INT16, with the following arrangement:
        I-instruction, F-instruction, I-instruction, I-instruction
    I-instruction is any instruction that can issue in one or both of E0 or E1. F-instruction is any instruction that can issue in one or both of FA or FM.
    From lowest address to highest within an INT16, with the following arrangement:
        F-instruction, I-instruction, I-instruction, I-instruction
When this type of case is detected, the first two instructions are forwarded to the issue point in one cycle. The second two are sent only when the first two have both issued, provided no other slotting rule would prevent the second two from being slotted in the same cycle.
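The basic pipeline-assignment loop of the slotting algorithm can be sketched in Python as follows. This is a simplified model under stated assumptions, not the IDU logic: the function name and the PIPES table (a subset of Table 2–8) are illustrative, the special rules listed above (LD with ST, branches, and the I/F arrangement cases) are omitted, and the look-ahead rule is assumed to apply only while E1 is still free.

```python
# Pipelines each class may use: a subset of Table 2-8, keyed by class name.
PIPES = {
    "LD": {"E0", "E1"}, "ST": {"E0"}, "IADD": {"E0", "E1"},
    "ILOG": {"E0", "E1"}, "SHIFT": {"E0"}, "IBR": {"E1"},
    "FADD": {"FA"}, "FMUL": {"FM"}, "FCPYS": {"FA", "FM"},
}
INTEGER_PIPES = {"E0", "E1"}


def slot(int16):
    """Return [(class, pipeline), ...] for the instructions slotted together.

    `int16` lists the classes of the valid instructions in address order.
    """
    taken = set()
    slotted = []
    for position, cls in enumerate(int16):
        allowed = PIPES[cls]
        if allowed == INTEGER_PIPES:
            # Look ahead to the next integer instruction in this INT16.
            following = [c for c in int16[position + 1:]
                         if PIPES[c] <= INTEGER_PIPES]
            next_needs_e0 = bool(following) and PIPES[following[0]] == {"E0"}
            e0_free, e1_free = "E0" not in taken, "E1" not in taken
            # Assumption: the look-ahead rule only applies while E1 is free.
            use_e1 = e1_free and (not e0_free or next_needs_e0)
            pipe = "E1" if use_e1 else "E0"
        elif allowed == {"FA", "FM"}:
            pipe = "FA" if "FA" not in taken else "FM"
        else:
            (pipe,) = allowed          # single-pipeline class
        if pipe in taken:
            break                      # in-order slotting stops here
        taken.add(pipe)
        slotted.append((cls, pipe))
    return slotted


print(slot(["LD", "IADD", "FADD", "FMUL"]))
print(slot(["IADD", "SHIFT", "IADD", "IADD"]))
```

In the second call the first IADD is steered to E1 because the next integer instruction (a SHIFT) can issue only in E0; slotting then stops at the third instruction because both integer pipelines are taken.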

2.3.2 Coding Guidelines

Code should be scheduled according to latency and function unit availability. This is good practice in most RISC architectures. Code alignment and the effects of split-issue² should be considered.
² Split-issue is the situation in which not all instructions sent from the slotting stage to the issue stage issue. One or more stalls result.
Instructions [a] (the LDL) and [b] (the first ADDL) in the following example are slotted together. Instruction [b] stalls (split-issue), thus preventing instruction [c] from advancing to the issue stage:

Code example showing            Code example showing
incorrect ordering              correct ordering

(1) [a] LDL  R2, 0(R1)          (1) [d] LDL  R2, 0(R1)
(3) [b] ADDL R2, R3, R4         (1) [e] NOP
(4) [c] ADDL R2, R5, R6         (3) [f] ADDL R2, R3, R4
                                (3) [g] ADDL R2, R5, R6

NOTES: The instruction examples are assumed to begin on an INT16 alignment. (n) = Expected execute cycle.

Eventually [b] issues when the result of [a] is returned from a presumed Dcache hit. Instruction [c] is delayed because it cannot advance to the issue stage until [b] issues.
In the improved sequence, the LDL [d] is slotted with the NOP [e]. Then the first ADDL [f] is slotted with the second ADDL [g] and those two instructions dual-issue. This sequence takes one less cycle to complete than the first sequence.
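The expected execute cycles in the example can be reproduced with a small model of in-order issue. The Python sketch below is illustrative only: the function name and data layout are hypothetical, the slotted groups are supplied by hand rather than computed, NOP is given a latency of 0 as a modeling convenience, and only two effects are modeled: a consumer waits for its producer's latency (2 for a load that hits in the Dcache, 1 for ADDL), and a group cannot reach the issue stage until every instruction of the previous group has issued.

```python
LATENCY = {"LDL": 2, "ADDL": 1, "NOP": 0}   # LDL assumes a Dcache hit


def issue_cycles(groups):
    """Return {label: issue cycle} for instructions given as slotted groups.

    Each instruction is (label, opcode, destination, [sources]).
    """
    ready = {}        # register -> first cycle a consumer may issue
    cycles = {}
    group_start = 1
    for group in groups:
        earliest = group_start
        for label, opcode, dest, sources in group:
            # In-order issue: never earlier than the previous instruction.
            cycle = max([earliest] + [ready.get(r, 0) for r in sources])
            cycles[label] = cycle
            if dest is not None:
                ready[dest] = cycle + LATENCY[opcode]
            earliest = cycle
        group_start = earliest + 1   # next group waits for this one to finish
    return cycles


incorrect = [
    [("a", "LDL", "R2", ["R1"]), ("b", "ADDL", "R4", ["R2", "R3"])],
    [("c", "ADDL", "R6", ["R2", "R5"])],
]
correct = [
    [("d", "LDL", "R2", ["R1"]), ("e", "NOP", None, [])],
    [("f", "ADDL", "R4", ["R2", "R3"]), ("g", "ADDL", "R6", ["R2", "R5"])],
]
print(issue_cycles(incorrect))
print(issue_cycles(correct))
```

The model gives cycles 1, 3, 4 for the first ordering and 1, 1, 3, 3 for the second, matching the (n) annotations in the example.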

2.3.3 Instruction Latencies

After slotting, instruction issue is governed by the availability of registers for read or write operations, and the availability of the floating divide unit and the integer multiply unit. There are producer–consumer dependencies, producer–producer dependencies (also known as write-after-write conflicts), and dynamic function unit availability dependencies (integer multiply and floating divide). The IDU logic in stage 3 of the 21164PC pipeline detects all these conflicts.
The latency to produce a valid result for most instructions is fixed. The exceptions are loads that miss, floating-point divides, and integer multiplies. Table 2–9 gives the latencies for each instruction class. A latency of 1 means that the result may be used by an instruction issued one cycle after the producing instruction. Most latencies are only a property of the producer. An exception is integer multiply latencies. There are no variations in latency due to which particular unit produces a given result relative to the particular unit that consumes it. In the case of integer multiply, the instruction is issued at the time determined by the standard latency numbers. The multiply's latency is dependent on which previous instructions produced its operands and when they executed.
Table 2–9 Instruction Latencies

                                                                    Additional Time Before
                                                                    Result Available to
Class   Latency                                                     Integer Multiply Unit¹
LD      Dcache hits, latency=2.                                     1 cycle
        Dcache miss/Bcache hit, latency=10 or longer.²
ST      Store operations produce no result.
MBX     LDx_L Dcache hits, latency=2.
        LDx_L Dcache miss/Bcache hit, latency=10 or longer.²
        LDx_L Dcache miss/Bcache miss, latency depends on
        memory subsystem state.
        STx_C, latency depends on memory subsystem state.
        MB, WMB, and FETCH produce no result.
RX      RS, RC, latency=1.                                          2 cycles
MXPR    HW_MFPR, latency=1, 2, or longer, depending on the IPR.     1 or 2 cycles
        HW_MTPR, produces no result.
IBR     Produces no result. (Taken branch issue latency minimum=1
        cycle, branch mispredict penalty=5 cycles.)
FBR     Produces no result. (Taken branch issue latency minimum=1
        cycle, branch mispredict penalty=5 cycles.)
JSR     All but HW_REI, latency=1.                                  2 cycles
        HW_REI produces no result. (Issue latency: minimum
        1 cycle.)
SEXT    Latency=1.                                                  2 cycles
IADD    Latency=1.                                                  2 cycles
ILOG    Latency=1.³                                                 2 cycles
SHIFT   Latency=1.                                                  2 cycles
CMOV    Latency=2.                                                  1 cycle
ICMP    Latency=1.³                                                 2 cycles
IMULL   Latency=8, plus up to 2 cycles of added latency,            1 cycle
        depending on the source of the data.¹ Latency until
        next IMULL, IMULQ, or IMULH instruction can issue (if
        there are no data dependencies) is 4 cycles plus the
        number of cycles added to the latency.
IMULQ   Latency=12, plus up to 2 cycles of added latency,           1 cycle
        depending on the source of the data.¹ Latency until
        next IMULL, IMULQ, or IMULH instruction can issue (if
        there are no data dependencies) is 8 cycles plus the
        number of cycles added to the latency.
IMULH   Latency=14, plus up to 2 cycles of added latency,           1 cycle
        depending on the source of the data.¹ Latency until
        next IMULL, IMULQ, or IMULH instruction can issue (if
        there are no data dependencies) is 8 cycles plus the
        number of cycles added to the latency.
MVI     Latency=2.                                                  1 cycle
FADD    Latency=4.
FDIV    Data-dependent latency: 15 to 31 single precision,          -
        22 to 60 double precision. Next floating divide can be
        issued in the same cycle that the result of the previous
        divide is available, regardless of data dependencies.
FMUL    Latency=4.                                                  -
FCPYS   Latency=4.                                                  -
MISC    RPCC, latency=2. TRAPB produces no result.                  1 cycle
UNOP    UNOP produces no result.

¹ The multiplier is unable to receive data from IEU bypass paths. The instruction issues at the expected time, but its latency is increased by the time it takes for the input data to become available to the multiplier. For example, an IMULL instruction issued one cycle later than an ADDL instruction, which produced one of its operands, has a latency of 10 (8 + 2). If the IMULL instruction is issued two cycles later than the ADDL instruction, the latency is 9 (8 + 1).
² When idle, Bcache arbitration predicts a load miss in E0. If a load actually does miss in E0, it is sent to the Bcache immediately. If it hits in the Bcache, and no other event in the CBU affects the operation, the requested data is available for use in 10 or more cycles. Otherwise, the request takes longer (possibly much longer, depending on the state of the CBU and memory). It should be possible to schedule some unrolled code loops for Bcache by prefetching data into the Dcache using LDQ R31, x(Rx).
³ A special bypass provides an effective latency of 0 (zero) cycles for an ICMP or ILOG instruction producing the test operand of an IBR or CMOV instruction. This is true only when the IBR or CMOV instruction issues in the same cycle as the ICMP or ILOG instruction that produced the test operand of the IBR or CMOV instruction. In all other cases, the effective latency of ICMP and ILOG instructions is 1 cycle.
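The added multiply latency described in footnote 1 can be worked through with a short calculation. The Python sketch below is an interpretation, not a statement of the hardware's behavior: the function name is hypothetical, and the formula (added cycles = additional time minus the cycles already elapsed beyond the producer's latency, never below zero) is an assumption generalized from the two ADDL examples in the footnote.

```python
IMUL_BASE = {"IMULL": 8, "IMULQ": 12, "IMULH": 14}

# Producer class -> (latency, additional cycles before the result is
# available to the integer multiply unit), taken from Table 2-9.
PRODUCER = {
    "LD": (2, 1), "IADD": (1, 2), "ILOG": (1, 2),
    "SHIFT": (1, 2), "CMOV": (2, 1), "MVI": (2, 1),
}


def multiply_latency(mul_class, producer_class, issue_distance):
    """Latency of a multiply issued `issue_distance` cycles after the
    instruction that produced one of its operands (assumed formula)."""
    latency, additional = PRODUCER[producer_class]
    if issue_distance < latency:
        raise ValueError("multiply cannot issue before the producer's latency")
    added = max(0, additional - (issue_distance - latency))
    return IMUL_BASE[mul_class] + added


print(multiply_latency("IMULL", "IADD", 1))   # ADDL one cycle earlier
print(multiply_latency("IMULL", "IADD", 2))   # ADDL two cycles earlier
```

With an ADDL producer (class IADD) the sketch reproduces the footnote's values of 10 and 9; once the multiply issues three or more cycles after the ADDL, the added latency is zero.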
2.3.3.1 Producer–Producer Latency
Producer–producer latency, also known as write-after-write conflicts, causes issue-stalls to preserve write order. If two instructions write the same register, they are forced to do so in different cycles by the IDU. This is necessary to ensure that the correct result is left in the register file after both instructions have executed. For most instructions, the order in which they write the register file is dictated by issue order. However, IMUL, FDIV, and LD instructions may require more time than other instructions to complete. Subsequent instructions that write the same destination register are issue-stalled to preserve write ordering at the register file.
Conditions that involve an intervening producer–consumer conflict can occur commonly in a multiple-issue situation when a register is reused. In these cases, producer–consumer latencies are equal to or greater than the required producer–producer latency as determined by write ordering and therefore dictate the overall latency.
An example of this case is shown in the following code:
LDQ  R2,0(R0)    ;R2 destination
ADDQ R2,R3,R4    ;wr-rd conflict stalls execution waiting for R2
LDQ  R2,D(R1)    ;wr-wr conflict may dual issue when ADDQ issues
Producer–producer latency is generally determined by applying the rule that register file write operations must occur in the correct order (enforced by IDU hardware). Two IADD or ILOG class instructions that write the same register issue at least one cycle apart. The same is true of a pair of CMOV-class instructions, even though their latency is 2. For IMUL, FDIV, and LD instructions, a producer–producer conflict with any subsequent instruction results in the second instruction being issue-stalled until the IMUL, FDIV, or LD instruction is about to complete. The second instruction is issued as soon as it is guaranteed to write the register file at least one cycle after the IMUL, FDIV, or LD instruction.
If a load writes a register, and within two cycles a subsequent instruction writes the same register, the subsequent instruction is issued speculatively, assuming the load hits. If the load misses, a load-miss-and-use trap is generated. This causes the second instruction to be replayed by the IDU. When the second instruction again reaches the issue point, it is issue-stalled until the load fill occurs.

2.3.4 Issue Rules

The following is a list of conditions that prevent the 21164PC from issuing an instruction:
No instruction can be issued until all of its source and destination registers are clean; that is, all outstanding write operations to the destination register are guaranteed to complete in issue order and there are no outstanding write operations to the source registers, or those write operations can be bypassed.
Technically, load-miss-and-use replay traps are an exception to this rule. The consumer of the load's result issues, and is aborted, because a load was predicted to hit and was discovered to miss just as the consumer instruction issued. In practice, the only difference is that the latency of the consumer may be longer than it would have been had the issue logic "known" the load would miss in time to prevent issue.
An instruction of class LD cannot be issued in the second cycle after an instruction of class ST is issued.
No LD, ST, MXPR (to an MTU register), or MBX class instructions can be issued after an MB instruction has been issued until the MB instruction has been acknowledged by the CBU.
No LD, ST, MXPR (to an MTU register), or MBX class instructions can be issued after a STx_C (or HW_ST-cond) instruction has been issued until the MTU writes the success/failure result of the STx_C (HW_ST-cond) in its destination register.
No IMUL instructions can be issued if the integer multiplier is busy.
No floating-point divide instructions can be issued if the floating-point divider is busy.
No instruction can be issued to pipe E0 exactly two cycles before an integer multiplication completes.
No instruction can be issued to pipe FA exactly five cycles before a floating-point divide completes.
No Store instruction can be issued exactly three cycles before a fill. The data store write operation, if the store hits, will conflict with the fill operation.
No instruction can be issued to pipe E0 or E1 exactly two cycles before an integer register fill is requested (speculatively) by the CBU, except IMULL, IMULQ, and IMULH instructions and instructions that do not produce any result.
No LD, ST, or MBX class instructions can be issued to pipe E0 or E1 exactly one cycle before an integer register fill is requested (speculatively) by the CBU.
No instruction issues after a TRAPB instruction until all previously issued instructions are guaranteed to finish without generating a trap other than a machine check.
All instructions sent to the issue stage (stage 3) by the slotting logic (stage 2) are issued subject to the previous rules. If issue is prevented for a given instruction at the issue stage, all logically subsequent instructions at that stage are prevented from issuing automatically. The 21164PC only issues instructions in order.
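The in-order constraint can be pictured with a small model. The following Python sketch is illustrative only: the function name and the list-of-booleans representation are assumptions made for this example, not part of the 21164PC design. It shows that the first instruction held at the issue stage also holds every logically subsequent instruction at that stage.

```python
def issue_in_order(ready):
    """Return which instructions at the issue stage actually issue.

    ready[i] is True when instruction i (in program order) violates
    none of the issue rules. Hypothetical model: the first blocked
    instruction also blocks every logically subsequent instruction.
    """
    issued = []
    blocked = False
    for ok in ready:
        # Once one instruction is held, everything behind it is held too.
        blocked = blocked or not ok
        issued.append(not blocked)
    return issued
```

For instance, if the second of four instructions is held, only the first one issues in that cycle, even when the third and fourth violate no rule themselves.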
2.4 Replay Traps
There are no stalls after the instruction issue point in the pipeline. In some situations, an MTU instruction cannot be executed because of insufficient resources (or some other reason). These instructions trap and the IDU restarts their execution from the beginning of the pipeline. This is called a replay trap. Replay traps occur in the following cases:
The write buffer is full when a store instruction is executed and there are already six write buffer entries allocated. The trap occurs even if the entry would have merged in the write buffer.
A load instruction is issued in pipe E0 when all six MAF entries are valid (not available), or a load instruction is issued in pipe E1 when five of the six MAF entries are valid. The trap occurs even if the load instruction would have hit in the Dcache or merged with an MAF entry.
Alpha shared memory model order trap (Litmus test 1 trap): If a load instruction issues whose address matches any miss in the MAF (down to the quadword boundary), the load instruction is aborted through a replay trap regardless of whether the newly issued load instruction hits or misses in the Dcache. This ensures that the two loads execute in issue order.
Load-after-store trap: A replay trap occurs if a load instruction is issued in the cycle immediately following a store instruction that hits in the Dcache, and both access the same location. The address match is exact for address bits <12:2> (longword granularity), but ignores address bits <42:13>.
When a load instruction is followed, within one cycle, by any instruction that uses the result of that load, and the load misses in the Dcache, the consumer instruction traps and is restarted from the beginning of the pipeline. This occurs because the consumer instruction is issued speculatively while the Dcache hit is being evaluated. If the load misses in the Dcache, the speculative issue of the consumer instruction was incorrect. The replay trap generally brings the consumer instruction to the issue point before or simultaneously with the availability of fill data.
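The partial address comparison used by the load-after-store trap can be sketched as follows. This is a minimal Python illustration; the constant and function names are hypothetical. It compares only address bits <12:2>, so two addresses that differ only in bits <42:13> (or in the byte offset within a longword) are treated as the same location.

```python
# Address bits <12:2>: longword granularity within the low 13 address bits.
MATCH_MASK = ((1 << 13) - 1) & ~0x3   # 0x1FFC

def load_after_store_match(store_addr, load_addr):
    """Return True if the two addresses match on bits <12:2>.

    Hypothetical helper: bits <42:13> and bits <1:0> are ignored, as
    the load-after-store trap description states.
    """
    return (store_addr & MATCH_MASK) == (load_addr & MATCH_MASK)
```

A consequence of ignoring the upper bits is that a load to an unrelated address can still match and be replayed. The match only ever errs toward replaying too often, so it costs time but does not affect correctness.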
2.5 Miss Address File and Load-Merging Rules
The following sections describe the miss address file (MAF) and its load-merging function, and the load-merging rules that apply after a load miss.

2.5.1 Merging Rules

When a load miss occurs, each MAF entry is checked to see if it contains a load miss that addresses the same 32-byte Dcache block. If it does, and certain merging rules are satisfied, then the new load miss is merged with an existing MAF entry. This allows the MTU to service two or more load misses with one data fill from the CBU. The merging rules for an individual MAF entry are different for cacheable and noncacheable space.
2.5.1.1 Cacheable Space Load-Merge Rules
The merging rules for cacheable space loads (physical address bit <39>=0) are as follows:
Merging only occurs if the new load miss addresses a different INT8 from all loads previously entered or merged to that MAF entry. If it addresses the same INT8, the machine traps and replays the instruction. This continues until the MAF entry is retired, at which time the trapping load hits in the Dcache.
Bytes, words, longwords, and quadwords can merge with each other, provided that they are not in the same INT8.
Merging is prevented for the MAF entry after the first data fill (to that MAF entry) from the Bcache, regardless of whether the Bcache access hits or not.
Load misses that match any MAF address down to the INT32 boundary, but could not merge (for any reason), are replay trapped. Once the Dcache is filled, this load instruction executes and hits in the Dcache.
Note: Merging rules result primarily from limitations of the implementation.
All DREAD load-merging is prevented when MAF_MODE<00>=1 (see Section 5.2.16).
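The cacheable-space rules can be summarized as a decision per new load miss. The Python sketch below is a simplified model under stated assumptions: the class, function, and field names are invented for illustration, the MAF-full condition and LDx_L instructions are not modeled, and `merging_disabled` stands in for MAF_MODE<00>=1.

```python
BLOCK_SHIFT = 5   # 32-byte Dcache block (INT32)
INT8_SHIFT = 3    # 8-byte INT8 within the block

class MafEntry:
    """Hypothetical model of one MAF load-miss entry."""
    def __init__(self, addr):
        self.block = addr >> BLOCK_SHIFT
        self.int8s = {(addr >> INT8_SHIFT) & 0x3}
        self.fill_started = False   # set after the first data fill

def cacheable_load_miss(entries, addr, merging_disabled=False):
    """Return 'merge', 'replay', or 'allocate' for a new load miss."""
    block = addr >> BLOCK_SHIFT
    int8 = (addr >> INT8_SHIFT) & 0x3
    for entry in entries:
        if entry.block != block:
            continue
        # Same INT32 but unable to merge for any reason: replay trap.
        if merging_disabled or entry.fill_started or int8 in entry.int8s:
            return "replay"
        entry.int8s.add(int8)
        return "merge"
    return "allocate"   # no matching entry; a new MAF entry is needed
```

In this model a second load to a different INT8 of the same block merges, while a load to an INT8 already recorded in the entry is replayed until the entry retires.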
2.5.1.2 Noncacheable Space Load-Merge Rules
The merging rules for noncacheable space loads (physical address bit <39>=1) are as follows:
Merging only occurs if the new load miss addresses a different INT8 from all loads previously entered or merged to that MAF entry. If it addresses the same INT8, the machine traps and replays the instruction. This continues until the MAF entry is retired, at which time the trapping load hits in the Dcache.
Only quadwords can merge with other quadwords, provided they are not in the same INT8. Bytes, words, and longwords cannot merge.
Merging stops for a load instruction to noncacheable space as soon as the CBU accepts the reference. This permits the system environment to access only those INT8s that are actually requested by load instructions.
All accesses that could not merge (except those to the same INT8) are allocated new MAF entries.
Noncacheable space load-merging is prevented when MAF_MODE<03>=1. All DREAD load-merging is prevented when MAF_MODE<00>=1 (see Section 5.2.16).
At the external interface, noncacheable read instructions indicate to the system environment which INT32 is addressed and which of the INT8s within the INT32 are actually accessed. Each load for longword, word, or byte data results in a separate request to the CBU.

2.5.2 Read Requests to the CBU

When two load instructions issue simultaneously and both miss, merging is done as if they had been issued sequentially, with the load from IEU pipe E0 first. The MTU sends a read request to the CBU for each MAF entry allocated.
A bypass is provided so that if the load instruction issues in IEU pipe E0, and no MAF requests are pending, the load instruction’s read request is sent to the CBU immediately, provided the CBU is ready for such an access. Similarly, if a load instruction from IEU pipe E1 misses, and there was no load instruction in pipe E0 to begin with, the E1 load miss is sent to the CBU immediately. In either case, the bypassed read request is aborted if the load hits in the Dcache, merges in the MAF, or is replay trapped by the MTU.

2.5.3 MAF Entries and MAF Full Conditions

There are six MAF entries for load misses and four for IDU instruction fetches and prefetches. Load misses are usually the highest MTU priority request.
If the MAF is full and a load instruction issues in pipe E0, or if five of the six MAF entries are valid and a load instruction issues in pipe E1, an MAF full trap occurs causing the IDU to restart execution with the load instruction that caused the MAF overflow. When the load instruction arrives at the MAF the second time, an MAF entry may have become available. If not, the MAF full trap occurs again.
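The two MAF full thresholds can be written as a small predicate. This Python sketch is illustrative: the function and constant names are hypothetical, and treating the pipe E1 condition as "five or more valid entries" is an assumption, since the text only names the five-of-six case explicitly.

```python
MAF_LOAD_ENTRIES = 6   # MAF entries available for load misses

def maf_full_trap(valid_entries, pipe):
    """Return True if a load issued in `pipe` takes an MAF full trap.

    Assumption: the E1 check uses >= 5, so that a completely full MAF
    also traps an E1 load.
    """
    if pipe == "E0":
        return valid_entries >= MAF_LOAD_ENTRIES
    if pipe == "E1":
        return valid_entries >= MAF_LOAD_ENTRIES - 1
    raise ValueError("pipe must be 'E0' or 'E1'")
```

With five valid entries, a load in pipe E0 proceeds while a load in pipe E1 is replayed.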

2.5.4 Fill Operation

Eventually, the CBU provides the data requested for a given MAF entry (a fill). The CBU requests that the IDU allocate up to three consecutive “bubble” cycles in the IEU pipelines. The first bubble prevents any store instruction from issuing. The second bubble prevents any instructions from issuing. The third bubble prevents only MTU instructions (particularly load and store instructions) from issuing. The first bubble prevents store data from colliding with the fill in the data cache. The fill uses the second bubble cycle as it progresses down the IEU/MTU pipelines to format the data and load the register file. It uses the third bubble cycle to fill the Dcache.
An instruction typically writes the register file in pipeline stage 6 (see Figure 2–2). Because there is only one register file write port per integer pipeline, a no-instruction bubble cycle is required to reserve a register file write port for the fill. A load or store instruction accesses the Dcache in the second half of stage 4 and the first half of stage 5. The fill operation writes the Dcache, making it unavailable for other accesses at that time. Relative to the register file write operation, the Dcache (write) access for a fill occurs a cycle later than the Dcache access for a load hit. Only load and store instructions use the Dcache in the pipeline. Therefore, the third bubble reserved for a fill is a no-MTU-instruction bubble.
Up to two floating or integer registers may be written for each CBU fill cycle. Fills deliver 32 bytes in two cycles: two INT8s per cycle. The MAF merging rules ensure that there is no more than one register to write for each INT8, so that there is a register file write port available for each INT8. After appropriate formatting, data from each INT8 is written into the IRF or FRF provided there is a miss recorded for that INT8.
Load misses are all checked against the write buffer contents for conflicts between new load instructions and previously issued store instructions. Refer to Section 2.7 for more information on write operations.
LDL_L and LDQ_L instructions always allocate a new MAF entry if they miss the Dcache. LDL_L and LDQ_L instructions that hit in the Dcache are retired by the MTU immediately. No load instructions that follow an LDL_L or LDQ_L instruction are allowed to merge with it. After an LDL_L or LDQ_L instruction is issued (and misses in the Dcache), the IDU does not issue any more MTU instructions until the MTU has successfully sent the LDL_L or LDQ_L instruction to the CBU. This guarantees correct ordering between an LDL_L or LDQ_L instruction and a subsequent STL_C or STQ_C instruction even if they access different addresses.
2.6 MTU Store Instruction Execution
Store instructions execute in the MTU by:
1. Reading the Dcache tag store in the pipeline stage in which a load instruction would read the Dcache
2. Checking for a hit in the next stage
3. Writing the Dcache data store, if there is a hit, in the second (following) pipeline stage
Load instructions are not allowed to issue in the second cycle after a store instruction (one bubble cycle). Other instructions can be issued in that cycle. Store instructions can issue at the rate of one per cycle because store instructions in the Dstream do not conflict in their use of resources. The Dcache tag store and Dcache data store are the principal resources. However, a load instruction uses the Dcache data store in the same early stage that it uses the Dcache tag store. Therefore, a load instruction would conflict with a store instruction if it were issued in the second cycle after any store instruction. Refer to Section 2.2 for more information on store instruction execution in the pipeline.
A load instruction that is issued one cycle after a store instruction in the pipeline creates a conflict if both access exactly the same memory location. This occurs because the store instruction has not yet updated the location when the load instruction reads it. This conflict is handled by forcing the load instruction to replay trap. The IDU flushes the pipeline and restarts execution from the load instruction. By the time the load instruction arrives at the Dcache the second time, the conflicting store instruction has written the Dcache and the load instruction is executed normally.
Software should not load data immediately after storing it. The replay trap that is incurred “costs” seven cycles. The best solution is to schedule the load instruction to issue three cycles after the store. No issue stalls or replay traps will occur in that case. If the load instruction is scheduled to issue two cycles after the store instruction, it will be issue-stalled for one cycle. This is not an optimal solution, but is preferred over incurring a replay trap on the load instruction.
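The scheduling guidance can be tabulated as a function of the distance, in cycles, between a store and a later load. The Python sketch below is only a reading aid; the function name, parameter names, and the tuple return format are assumptions made for this example.

```python
REPLAY_TRAP_COST = 7   # cycles, as quoted in the text

def load_after_store_penalty(distance, same_location=True, store_hits=True):
    """Return (event, cycles lost) for a load issued `distance` cycles
    after a store. Hypothetical helper summarizing the guidance above.
    """
    if distance < 1:
        raise ValueError("distance must be at least 1 cycle")
    if distance == 1:
        # Replay trap only when the store hits and the locations match.
        if same_location and store_hits:
            return ("replay trap", REPLAY_TRAP_COST)
        return ("none", 0)
    if distance == 2:
        # Loads may not issue in the second cycle after a store.
        return ("issue stall", 1)
    return ("none", 0)
```

A compiler or hand scheduler would therefore prefer a distance of three cycles, accept two, and avoid one whenever the load might read the location just stored.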
For each store instruction, a search of the MAF is done to detect load-before-store hazards. If a store instruction is executed, and a load of the same address is present in the MAF, two things happen:
1. Bits are set in each conflicting MAF entry to prevent its fill from being placed in the Dcache when it arrives, and to prevent subsequent load instructions from merging with that MAF entry.
2. Conflict bits are set with the store instruction in the write buffer to prevent the store instruction from being issued until all conflicting load instructions have been issued to the CBU.
Conflict checking is done at the 32-byte block granularity. This ensures proper results from the load instructions and prevents incorrect data from being cached in the Dcache.
A check is performed for each new store against store instructions in the write buffer that have already been sent to the CBU but have not been completed. Section 2.7 describes this process.
2.7 Write Buffer and the WMB Instruction
The following sections describe the write buffer and the WMB instruction.

2.7.1 The Write Buffer

The write buffer contains six fully associative 32-byte entries. The purpose of the write buffer is to minimize the number of CPU stall cycles by providing a finite, high-bandwidth resource for receiving store data. This is required because the 21164PC can generate store data at the peak rate of one INT8 every CPU cycle. This is greater than the average rate at which the Bcache can accept the data.
In addition to HW_ST and other store instructions, the STQ_C and STL_C instructions are also written into the write buffer and sent to the CBU. However, unlike store instructions, these write buffer-directed instructions are never merged into a write buffer entry with other instructions.

2.7.2 The Write Memory Barrier (WMB) Instruction

The memory barrier (MB) instruction is suitable for ordering memory references of any kind. The WMB instruction forces ordering of write operations only (store instructions). The WMB instruction has a special effect on the write buffer. When it is executed, a bit is set in every write buffer entry containing valid store data that will prevent future store instructions from merging with any of the entries. Also, the next entry to be allocated is marked with a WMB flag. At this point, the entry marked with the WMB flag does not yet have valid data in it. When an entry marked with a WMB flag is ready to issue to the CBU, the entry is not issued until every previously issued write instruction is complete. This ensures correct ordering between store instructions issued before the WMB instruction and store instructions issued after it.
Each write buffer entry contains a content-addressable memory (CAM) for holding physical address bits <39:05>, 32 bytes of data, 32 byte mask bits (that indicate which of the 32 bytes in the entry contain valid data), and miscellaneous control bits. Among the control bits are the WMB flag, and a no-merge bit, which indicates that the entry is closed to further merging.

2.7.3 Entry-Pointer Queues

Two entry-pointer queues are associated with the write buffer: a free-entry queue and a pending-request queue. The free-entry queue contains pointers to available invalid write buffer entries. The pending-request queue contains pointers to valid write buffer entries that have not yet been issued to the CBU. The pending-request queue is ordered in allocation order.
Each time the write buffer is presented with a store instruction, the physical address generated by the instruction is compared to the address in each valid write buffer entry that is open for merging. If the address is in the same INT32 as an address in a valid write buffer entry (that also contains a store instruction), and the entry is open for merging, then the new store data is merged into that entry and the entry’s byte mask bits are updated. If no matching address is found, or all entries are closed to merging, then the store data is written into the entry at the top of the free-entry queue. This entry is validated, and a pointer to the entry is moved from the free-entry queue to the pending-request queue.
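The merge-or-allocate decision for a store can be modeled in a few lines. The Python sketch below is a simplified illustration: all class, method, and field names are hypothetical, the two entry-pointer queues are collapsed into a single list of valid entries, STL_C and STQ_C handling is omitted, and stores are assumed not to cross a 32-byte boundary.

```python
class WriteBufferEntry:
    """Hypothetical model of one 32-byte write buffer entry."""
    def __init__(self, block, byte_mask):
        self.block = block          # physical address bits <39:05>
        self.byte_mask = byte_mask  # one bit per valid byte in the entry
        self.no_merge = False       # entry closed to further merging

class WriteBuffer:
    ENTRIES = 6

    def __init__(self):
        self.entries = []           # valid entries, in allocation order

    def store(self, addr, size):
        """Return 'replay', 'merged', or 'allocated' for one store."""
        if len(self.entries) == self.ENTRIES:
            return "replay"         # full: trap even if it could merge
        block = addr >> 5
        mask = ((1 << size) - 1) << (addr & 0x1F)
        for entry in self.entries:
            if entry.block == block and not entry.no_merge:
                entry.byte_mask |= mask
                return "merged"
        self.entries.append(WriteBufferEntry(block, mask))
        return "allocated"

    def wmb(self):
        """Close every valid entry to merging, as a WMB would."""
        for entry in self.entries:
            entry.no_merge = True
```

For example, a quadword store followed by a longword store to the next INT8 of the same INT32 ends up in one entry with twelve byte-mask bits set, while the same second store after a `wmb()` call allocates a fresh entry.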

2.7.4 Write Buffer Entry Processing

When the number of entries in the pending-request queue reaches the number programmed in MAF_MODE<WB_SET_LO_THRESH> (footnote 4), the MTU begins arbitration with the other MTU queue requests. Once the request is granted, the MTU sends the entry at the head of the pending-request queue to the CBU. The MTU then removes the entry from the pending-request queue without placing it in the free-entry queue. When the CBU has completely processed the write buffer entry, it notifies the MTU, and the now invalid write buffer entry is placed in the free-entry queue. The MTU may request that up to five additional write buffer entries be processed while waiting for the CBU to finish the first. The write buffer entries are invalidated and placed in the free-entry queue in the order that the requests complete. This order may be different from the order in which the requests were made.
The MTU sends write requests from the write buffer to the CBU. The CBU processes these requests according to the cache coherence protocol. Typically, this involves loading the target block into the Bcache, making it writable, and then writing it. Because the Bcache is write-back, this completes the operation.
The MTU continues to request that write buffer entries be processed as long as one of the following occurs:
One buffer contains an STQ_C or STL_C instruction
One buffer is marked by a WMB flag
An MB instruction is being executed by the MTU
The number of entries in the write buffer exceeds the number programmed in MAF_MODE<WB_CLR_LO_THRESH>.
This ensures that these instructions complete as quickly as possible.
The MTU requests that a write buffer entry be processed every 264 cycles (provided there is a valid entry in the write buffer), even if the write buffer is not arbitrating. This ensures that write instructions do not wait forever to be written to memory. (This is triggered by a free-running timer that is reset each time a write operation is completed.)
Footnote 4: The following actions can also cause the WB to begin arbitration: (1) an MB or WMB instruction is issued, or (2) 264 cycles have elapsed without completing a write operation while there were pending write operations in the WB (triggered by the WB write counter).
When an LDL_L or LDQ_L instruction is processed by the MTU, the MTU requests processing of the next pending write buffer request. This increases the chances of the write buffer being empty when an STL_C or STQ_C instruction is issued.
Every store instruction that does not merge in the write buffer is checked against every valid entry. If any entry is an address match, then the WMB flag is set on the newly allocated write buffer entry. This prevents the MTU from concurrently sending two write instructions to exactly the same block in the CBU.
Load misses are checked in the write buffer for conflicts. The granularity of this check is an INT32. Any load instruction matching any write buffer entry’s address is considered a hit even if it does not access a byte marked for update in that write buffer entry. If a load hits in the write buffer, a conflict bit is set in the load instruction’s MAF entry, which prevents the load instruction from being issued to the CBU before the conflicting write buffer entry has been issued and completed. At the same time, the no-merge bit is set in every write buffer entry with which the load hit. A write buffer flush flag is also set. The MTU continues to request that write buffer entries be processed until all the entries that were ahead of, and including, the conflicting write instructions at the time of the load hit have been processed.
2.7.5 Ordering of Noncacheable Space Write Instructions
Special logic ensures that write instructions to noncacheable space are sent offchip in the order in which their corresponding buffers were allocated (placed in the pending-request queue).
2.8 Performance Measurement Support–Performance Counters
The 21164PC contains a performance-recording feature. The implementation of this feature provides a mechanism to count various hardware events and causes an interrupt upon counter overflow. Interrupts are triggered six cycles after the event and, therefore, the exception PC might not reflect the exact instruction causing counter overflow. Three counters are provided to allow accurate comparison of two variables under a potentially nonrepeatable experimental condition. The three counters are designated counter 0 (16 bits), counter 1 (16 bits), and counter 2 (14 bits).
Counter inputs include:
Issues
Nonissues
Total cycles
Pipe dry
Pipe freeze
Mispredicts and cache misses
Counts for various instruction classifications
For information about counter control, refer to the following IPR descriptions:
Hardware interrupt clear (HWINT_CLR) register (see Section 5.1.23)
Interrupt summary register (ISR) (see Section 5.1.24)
Performance counter (PMCTR) register (see Section 5.1.27)
CBU configuration control (CBOX_CONFIG2) register bits <13:08> (see Section 5.3.4)

2.8.1 CBU Performance Counters

The counters in the CBU (counters 0 and 1) are used to count Bcache and system bus events. There are request events from the MTU to the CBU (three types), requests from the CBU to the system (three types), and requests from the system to the CBU (four types).
MTU-to-CBU Requests
The MTU can issue the following requests:
Istream read request (32 bytes of instruction data), due to an Icache miss
Dstream read request (32 bytes of noninstruction data), due to a Dcache miss
Write request (32 bytes)
Read and write requests can be to either cacheable or I/O space addresses, but the CBU performance counters only count requests to cacheable address space. The total number of read requests is equal to the sum of the Dstream read requests and the Istream read requests.
CBU-to-System Requests
The CBU can issue the following requests to the system:
READ MISS commands
BCACHE VICTIM commands
WRITE BLOCK commands
READ MISS commands to I/O space and WRITE BLOCK commands (which are always to I/O space on the 21164PC) are not counted by the performance counters. BCACHE VICTIM commands are always to cacheable space and, therefore, are always counted. READ MISS commands to cacheable space are generated when the 21164PC detects either a read miss or write miss in the Bcache. A BCACHE VICTIM command is also generated along with the READ MISS command if the block the request misses on is valid and dirty in the cache. In this case, the 64-byte Bcache block is read from the Bcache and sent to the system.
System-to-CBU Requests
The system can issue the following requests to the 21164PC:
FILL commands
READ commands
FLUSH commands
INVAL commands
Cacheable FILL commands are in response to READ MISS commands and write 64 bytes of data into the Bcache. I/O space FILL commands are not counted by the CBU performance counters. Depending on whether the miss was for a read or write request, the 21164PC will either forward the data to the onchip caches or write data from the write buffer into the newly filled block. The total number of FILL commands is the same as the total number of READ MISS commands.
The other three system commands are external probes of the Bcache. INVAL commands are not counted by the CBU performance counter.
Misses in the onchip caches can merge in the MTU before being issued to the CBU. Therefore, MTU read or write requests are not the same as onchip cache misses. Also, two Bcache misses can merge in the CBU and appear on the system bus as a single READ MISS request. Requests only merge with other requests of the same type (that is, Istream and Dstream requests do not merge, nor does a write request merge with a read request).
Using the Counters
The two counters work in parallel, so they can be used to determine simple ratios like Bcache miss rate or more complex statistics like Dstream read merging in the CBU (by running several tests and normalizing the results).
For example:
Bcache miss rate = 1 − (Bcache read hits/Total read requests)
Counter 0 selects 0x0 and counter 1 selects 0x1.
Dstream read merge rate in the CBOX = 1 − (Bcache Dstream read hits/Bcache Dstream read requests) − (Bcache Dstream read fills/Bcache Dstream read requests)
Counter 0 selects 0x1 and counter 1 selects 0x0 on the first pass, then counter 0 selects 0x2 and counter 1 selects 0x0 on the second pass.
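The two example statistics can be computed from raw counter readings as sketched below. This Python fragment is an illustration under stated assumptions: the function and parameter names are invented, and the merge-rate formula assumes that both the hit ratio and the fill ratio are subtracted from 1 (requests that neither hit nor caused their own fill are taken to have merged). Because the merge rate needs two passes, each ratio is normalized by the request count read in the same pass.

```python
def bcache_miss_rate(read_hits, read_requests):
    """Bcache miss rate = 1 - (read hits / total read requests)."""
    return 1 - read_hits / read_requests

def dstream_merge_rate(hits, requests_pass1, fills, requests_pass2):
    """Dstream read merge rate from two measurement passes.

    Assumption: merge rate = 1 - hits/requests - fills/requests, with
    each ratio normalized by its own pass's request count.
    """
    return 1 - hits / requests_pass1 - fills / requests_pass2
```

With 50 hits per 100 requests on the first pass and 25 fills per 100 requests on the second, the sketch reports that a quarter of the Dstream read requests merged.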

2.9 Floating-Point Control Register
Figure 2–3 shows the format of the floating-point control register (FPCR) and Table 2–10 describes the fields.
Figure 2–3 Floating-Point Control Register (FPCR) Format
[Figure: 64-bit register layout. Bit <63> SUM, <62> INED, <61> UNFD, <60> UNDZ, <59:58> DYN_RM, <57> IOV, <56> INE, <55> UNF, <54> OVF, <53> DZE, <52> INV, <51> OVFD, <50> DZED, <49> INVD; the remaining low-order bits are RAZ/IGN.]
Table 2–10 Floating-Point Control Register Bit Descriptions
Name     Extent    Description (Meaning When Set)
SUM      <63>      Summary bit. Records bitwise OR of FPCR exception bits. Equal to FPCR<57 | 56 | 55 | 54 | 53 | 52>.
INED     <62>      Inexact disable. Suppress INE trap and place correct IEEE nontrapping result in the destination register if the 21164PC is capable of producing correct IEEE nontrapping result.
UNFD     <61>      Underflow disable. Subset support: Suppress UNF trap if UNDZ is also set and the /S qualifier is set on the instruction.
UNDZ     <60>      Underflow to zero. When set together with UNFD, on underflow, the hardware places a true zero (all 64 bits zero) in the destination register rather than the denormal number specified by the IEEE standard.
DYN_RM   <59:58>   Dynamic rounding mode. Indicates the rounding mode to be used by an IEEE floating-point operate instruction when the instruction’s function field specifies dynamic mode (/D). The assignments are:
                   DYN   IEEE Rounding Mode Selected
                   00    Chopped rounding mode
                   01    Minus infinity
                   10    Normal rounding
                   11    Plus infinity
IOV      <57>      Integer overflow. An integer arithmetic operation or a conversion from floating to integer overflowed the destination precision.
INE      <56>      Inexact result. A floating arithmetic or conversion operation gave a result that differed from the mathematically exact result.
UNF      <55>      Underflow. A floating arithmetic or conversion operation underflowed the destination exponent.
OVF      <54>      Overflow. A floating arithmetic or conversion operation overflowed the destination exponent.
DZE      <53>      Division by zero. An attempt was made to perform a floating divide operation with a divisor of zero.
INV      <52>      Invalid operation. An attempt was made to perform a floating arithmetic, conversion, or comparison operation, and one or more of the operand values were illegal.
OVFD     <51>      Overflow disable. Not supported.
DZED     <50>      Division by zero disable. Not supported.
INVD     <49>      Invalid operation disable. Not supported.
Reserved <48:0>    Reserved. Read as zero; ignored when written.
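The bit assignments in Table 2–10 translate directly into a decoder. The Python sketch below is a reading aid, not a description of any DIGITAL software interface: the dictionary, function names, and rounding-mode strings are hypothetical, and the FPCR value is assumed to be available as a 64-bit integer.

```python
# Single-bit FPCR fields and their bit positions, from Table 2-10.
FPCR_BITS = {
    "SUM": 63, "INED": 62, "UNFD": 61, "UNDZ": 60,
    "IOV": 57, "INE": 56, "UNF": 55, "OVF": 54, "DZE": 53, "INV": 52,
    "OVFD": 51, "DZED": 50, "INVD": 49,
}

# DYN_RM <59:58> encodings.
ROUNDING = {0: "chopped", 1: "minus infinity", 2: "normal", 3: "plus infinity"}

def decode_fpcr(value):
    """Return a dict of FPCR field names to decoded values."""
    fields = {name: bool((value >> bit) & 1) for name, bit in FPCR_BITS.items()}
    fields["DYN_RM"] = ROUNDING[(value >> 58) & 0x3]
    return fields

def expected_sum(value):
    """SUM is the OR of the exception bits <57:52>."""
    return bool((value >> 52) & 0x3F)
```

For a value with SUM, INE, and DYN_RM=10 set, `decode_fpcr` reports an inexact result under normal rounding, and `expected_sum` agrees with the SUM bit.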

2.10 Design Examples
The 21164PC can be designed into many different uniprocessor system configurations. Figure 2–4 illustrates one possible configuration. This configuration employs additional system/memory controller chipsets.
Figure 2–4 shows a typical uniprocessor system with a board-level cache. This system configuration could be used in standalone or networked workstations.
Figure 2–4 Typical Uniprocessor Configuration
[Figure: block diagram. Labeled blocks: 21164PC, External Cache Tag, External Cache Data, Memory and I/O Interface, Main Memory (DRAM banks). Labeled buses: Addr/cmd, Data, I/O Bus.]

3 Hardware Interface

This chapter contains the 21164PC microprocessor logic symbol and provides a list of signal names and their functions.

3.1 21164PC Microprocessor Logic Symbol

Figure 3–1 shows the logic symbol for the 21164PC chip.
Figure 3–1 21164PC Microprocessor Logic Symbol
[Figure: logic symbol of the 21164PC. Signal groups shown: System/Bcache Interface, Interrupts, Clocks, Test Modes and Miscellaneous. Signals shown: addr_bus_req_h, cack_h, dack_h, data_bus_req_h, fill_h, fill_dirty_h, fill_error_h, fill_id_h, idle_bc_h, addr_h<39:4>, addr_res_h<1:0>, cmd_h<3:0>, data_h<127:0>, data_adsc_l, data_adv_l, data_ram_oe_l, data_ram_we_l<3:0>, index_h<21:4>, int4_valid_h<3:0>, lw_parity_h<3:0>, st_clk1_h, st_clk2_h, st_clk3_h, tag_data_h<32:19>, tag_data_par_h, tag_dirty_h, tag_ram_oe_l, tag_ram_we_l, tag_valid_h, victim_pending_h, irq_h<3:0>, mch_hlt_irq_h, pwr_fail_irq_h, sys_mch_chk_irq_h, clk_mode_h<1:0>, osc_clk_in_h, osc_clk_in_l, sys_reset_l, dc_ok_h, port_mode_h<1:0>, srom_data_h, tdi_h, temp_sense, tms_h, cpu_clk_out_h, sys_clk_out1_h, sys_clk_out2_h, srom_clk_h, srom_oe_l, srom_present_l, tck_h, tdo_h, test_status_h<1>, trst_l, Vddi, Vdd, Vss.]

3.2 21164PC Signal Names and Functions
The 21164PC is contained in a 413-pin interstitial pin grid array (IPGA) package. There are 264 functional signal pins, 2 spare signal pins (unused), 5 voltage reference pins (unused), 46 external power (Vdd) pins, 22 internal power (Vddi) pins, and 74 ground (Vss) pins.
The following table defines the 21164PC signal types referred to in this section:
Signal Type   Definition
B             Bidirectional
I             Input only
O             Output only
The remaining two tables describe the function of each 21164PC external signal.
Table 3–1 lists all signals in alphanumeric order. This table provides full signal descriptions. Table 3–2 lists signals by function and pro vides an abbreviate d de scr ip­tion.
Table 3–1 21164PC Signal Descriptions
Signal Type Count Description
addr_h<39:4> B 36 Address bus. These bidirectional signals provide the address of the requested data or operation between the 21164PC and the system. If addr_h<39> is asserted, then the reference is to noncached, I/O memory space.
When the byte/word instructions are used and addr_h<39> is asserted, six additional bits of information are communicated over the pin bus. Two of the new bits are driven over addr_h<38:37>, becoming transfer_size<1:0>, with the following values:
  00  Size = 8 bytes
  01  Size = 4 bytes
  10  Size = 2 bytes
  11  Size = 1 byte
addr_bus_req_h I 1 Address bus request. The system interface uses this signal to gain control of the addr_h<39:4> and cmd_h<3:0> pins (see Figure 4–22).
addr_res_h<1:0> O 2 Address response bits <1> and <0>. For system commands, the 21164PC uses these pins to indicate the state of the block in the Bcache:
  Bits  Command     Meaning
  00    NOP         Nothing.
  01    NOACK       Data not found or clean.
  10    Reserved.
  11    ACK/Bcache  Data from Bcache.
cack_h I 1 Command acknowledge. The system interface uses this signal to acknowledge any one of the commands driven by the 21164PC.
clk_mode_h<1:0> I 2 Clock test mode. These signals specify a relationship between osc_clk_in_h,l, the CPU cycle time, and the duty-cycle equalizer. These signals should be deasserted in normal operation mode.
  Bits  Description
  00    CPU clock frequency is equal to the input clock frequency.
  01    CPU clock frequency is equal to the input clock frequency, with the onchip duty-cycle equalizer enabled.
  10    Initialize the CPU clock, allowing the system clock to be synchronized to a stable reference clock.
  11    Initialize the CPU clock, allowing the system clock to be synchronized to a stable reference clock, with the onchip duty-cycle equalizer enabled.
cmd_h<3:0> B 4 Command bus. These signals drive and receive the commands from the command bus. The following tables define the commands that can be driven on the cmd_h<3:0> bus by the 21164PC or the system. For additional information, refer to Section 4.1.1.1.
  21164PC Commands to System:
  cmd_h<3:0>  Command        Meaning
  0000        NOP            Nothing.
  0001        Reserved.
  0010        Reserved.
  0011        Reserved.
  0100        Reserved.
  0101        Reserved.
  0110        WRITE BLOCK    Request to write a block.
  0111        Reserved.
  1000        READ MISS0     Request for data.
  1001        READ MISS1     Request for data.
  1010        Reserved.
  1011        Reserved.
  1100        BCACHE VICTIM  Bcache victim should be removed.
  1101        Reserved.
  1110        Reserved.
  1111        Reserved.
  System Commands to 21164PC:
  cmd_h<3:0>  Command        Meaning
  0000        NOP            Nothing.
  0001        FLUSH          Removes block from caches; return dirty data.
  0010        INVALIDATE     Invalidates the block from caches.
  0011        Reserved.
  0100        READ           Read a block.
  0101        Reserved.
  0111        Reserved.
  1xxx        Reserved.
cpu_clk_out_h O 1 CPU clock output. This signal is used for test purposes.
dack_h I 1 Data acknowledge. The system interface uses this signal to control data transfer between the 21164PC and the system.
data_h<127:0> B 128 Data bus. These signals are used to move data between the 21164PC, the system, and the Bcache.
data_adsc_l O 1 Load a new address into the Bcache SSRAM.
data_adv_l O 1 Advances the Bcache index to the next address.
data_bus_req_h I 1 Data bus request. If the 21164PC samples this signal asserted on the rising edge of sysclk n, then the 21164PC does not drive the data bus on the rising edge of sysclk n+1. Before asserting this signal, the system should assert idle_bc_h for the correct number of cycles. If the 21164PC samples this signal deasserted on the rising edge of sysclk n, then the 21164PC drives the data bus on the rising edge of sysclk n+1. For timing details, refer to Section 4.9.4.
data_ram_oe_l O 1 Data RAM output enable. This signal is asserted for Bcache read operations.
data_ram_we_l<3:0> O 4 Data RAM write-enable. These signals are asserted for any Bcache write operation. Refer to Section 5.3.1 for timing details.
dc_ok_h I 1 dc voltage OK. Must be deasserted until dc voltage reaches proper operating level. After that, dc_ok_h is asserted.
fill_h I 1 Fill warning. If the 21164PC samples this signal asserted on the rising edge of sysclk n, then the 21164PC provides the address indicated by fill_id_h to the Bcache on the rising edge of sysclk n+1. The Bcache begins to write in that sysclk. At the end of sysclk n+1, the 21164PC waits for the next sysclk and then begins the write operation again if dack_h is not asserted. Refer to Section 4.9.3 for timing details.
fill_dirty_h I 1 Fill dirty. If the block being filled is dirty, this pin should be asserted.
fill_error_h I 1 Fill error. If this signal is asserted during a fill from memory, it indicates to the 21164PC that the system has detected an invalid address or hard error. The system still provides an apparently normal read sequence with correct ECC/parity though the data is not valid. The 21164PC traps to the machine check (MCHK) PALcode entry point and indicates a serious hardware error. fill_error_h should be asserted when the data is returned. Each assertion produces a MCHK trap.
fill_id_h I 1 Fill identification. Asserted with fill_h to indicate which register is used. The 21164PC supports two outstanding load instructions. If this signal is asserted when the 21164PC samples fill_h asserted, then the 21164PC provides the address from miss register 1. If it is deasserted, then the address in miss register 0 is used for the read operation.
idle_bc_h I 1 Idle Bcache. When asserted, the 21164PC finishes the current Bcache read or write operation but does not start a new read or write operation until the signal is deasserted. The system interface must assert this signal in time to idle the Bcache before fill data arrives.
index_h<21:4> O 18 Index. These signals index the Bcache.
int4_valid_h<3:0> O 4 INT4 data valid. During write operations to noncached space, these signals are used to indicate which INT4 bytes of data are valid. This is useful for noncached write operations that have been merged in the write buffer.
  int4_valid_h<3:0>  Write Meaning
  xxx1               data_h<31:0> valid
  xx1x               data_h<63:32> valid
  x1xx               data_h<95:64> valid
  1xxx               data_h<127:96> valid
During read operations to noncached space, these signals indicate which INT8 bytes of a 32-byte block need to be read and returned to the processor. This is useful for read operations to noncached memory.
  int4_valid_h<3:0>  Read Meaning
  xxx1               data_h<63:0> valid
  xx1x               data_h<127:64> valid
  x1xx               data_h<191:128> valid
  1xxx               data_h<255:192> valid
Note: For both read and write operations, multiple int4_valid_h<3:0> bits can be set simultaneously.
When addr_h<39> is asserted, the int4_valid_h<3:0> signals are considered the addr_h<3:0> bits required for byte/word transactions. The functionality of these bits is tied to the value stored in addr_h<38:37>.
For read transactions:
  addr_h<38:37>  int4_valid_h<3:0> Value
  00             Valid INT8 mask
  01             addr_h<3:2> valid on int4_valid_h<3:2>; int4_valid<1:0> undefined
  10             addr_h<3:1> valid on int4_valid_h<3:1>; int4_valid<0> undefined
  11             addr_h<3:0> valid on int4_valid_h<3:0>
For write transactions:
  addr_h<38:37>  int4_valid_h<3:0> Value
  00             Valid INT4 mask
  01             Valid INT4 mask
  10             addr_h<3:1> valid on int4_valid_h<3:1>; int4_valid<0> undefined
  11             addr_h<3:0> valid on int4_valid_h<3:0>
irq_h<3:0> I 4 System interrupt requests. These signals have multiple modes of operation. During normal operation, these level-sensitive signals are used to signal interrupt requests. During initialization, these signals are used to set up the CPU cycle time divisor for sys_clk_out1_h as follows:
  irq_h<3>  irq_h<2>  irq_h<1>  irq_h<0>  Ratio
  Low       High      Low       Low       4
  Low       High      Low       High      5
  Low       High      High      Low       6
  Low       High      High      High      7
  High      Low       Low       Low       8
  High      Low       Low       High      9
  High      Low       High      Low       10
  High      Low       High      High      11
  High      High      Low       Low       12
  High      High      Low       High      13
  High      High      High      Low       14
  High      High      High      High      15
lw_parity_h<3:0> B 4 Longword parity. These signals set even INT4 parity for the current data cycle. Refer to Section 4.12.1 for information on the purpose of each lw_parity_h bit.
mch_hlt_irq_h I 1 Machine halt interrupt request. This signal has multiple modes of operation. During initialization, this signal is used to set up sys_clk_out2_h delay (see Table 4–3). During normal operation, it is used to signal a halt request.
osc_clk_in_h, osc_clk_in_l I, I 1, 1 Oscillator clock inputs. These signals provide the differential clock input that is the fundamental timing of the 21164PC. These signals are driven at the same frequency as the internal clock frequency (clk_mode_h<1:0> = 01).
port_mode_h<1:0> I 2 Select test port interface modes (normal, manufacturing, and debug). For normal operation, both signals must be deasserted.
pwr_fail_irq_h I 1 Power failure interrupt request. This signal has multiple modes of operation. During initialization, this signal is used to set up sys_clk_out2_h delay (see Table 4–3). During normal operation, this signal is used to signal a power failure.
srom_clk_h O 1 Serial ROM clock. Supplies the clock that causes the SROM to advance to the next bit. The cycle time of this clock is 128 times the cycle time of the CPU clock.
srom_data_h I 1 Serial ROM data. Input for the SROM.
srom_oe_l O 1 Serial ROM output enable. Supplies the output enable to the SROM.
srom_present_l¹ B 1 Serial ROM present. Indicates that SROM is present and ready to load the Icache.
st_clk1_h O 1 STRAM clock. Clock for synchronously timed RAMs (STRAMs). For Bcache, this signal is synchronous with index_h<21:4> during private read and write operations, and with sys_clk_out1_h during read and fill operations.
st_clk2_h O 1 This signal is a duplicate of st_clk1_h, to increase the fanout capability of the signal.
st_clk3_h O 1 This signal is another duplicate of st_clk1_h, to increase the fanout capability of the signal.
sys_clk_out1_h O 1 System clock output. Programmable system clock (cpu_clk_out_h divided by a value of 4 to 15) is used for board-level cache and system logic.
sys_clk_out2_h O 1 System clock output. A version of sys_clk_out1_h delayed by a programmable amount from 0 to 7 CPU cycles.
sys_mch_chk_irq_h I 1 System machine check interrupt request. This signal has multiple modes of operation. During initialization, it is used to set up sys_clk_out2_h delay (see Table 4–3). During normal operation, it is used to signal a machine check interrupt request.
sys_reset_l I 1 System reset. This signal protects the 21164PC from damage during initial power-up. It must be asserted until dc_ok_h is asserted. After that, it is deasserted and the 21164PC begins its reset sequence.
tag_data_h<32:19> B 14 Bcache tag data bits. This bit range supports 0.5MB to 4MB Bcaches.
tag_data_par_h B 1 Tag data parity bit. This signal indicates odd parity for tag_data_h<32:19>.
tag_dirty_h B 1 Tag dirty state bit. This bit is private to the 21164PC.
tag_ram_oe_l O 1 Tag RAM output enable. This signal is asserted during any Bcache read operation.
tag_ram_we_l O 1 Tag RAM write-enable. This signal is asserted during any tag write operation.
tag_valid_h B 1 Tag valid bit. During fills, this signal is asserted to indicate that the block has valid data. See Table 4–5 for information about Bcache protocol.
tck_h B 1 JTAG boundary-scan clock.
tdi_h I 1 JTAG serial boundary-scan data-in signal.
tdo_h O 1 JTAG serial boundary-scan data-out signal.
temp_sense I 1 Temperature sense. This signal is used to measure the die temperature and is for manufacturing use only. For normal operation, this signal must be left disconnected.
test_status_h<1> O 1 Icache test status or timeout reset. This signal is used for manufacturing test purposes only to extract Icache test status information from the chip.
tms_h I 1 JTAG test mode select signal.
trst_l¹ B 1 JTAG test access port (TAP) reset signal.
victim_pending_h O 1 Victim pending. When asserted, this signal indicates that the current read miss has generated a victim.
¹ This signal is shown as bidirectional. However, for normal operation, it is input only. The output function is used during manufacturing test and verification only.
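As an illustration of the cmd_h<3:0> encodings listed in Table 3–1, the following Python sketch maps a 4-bit command value to its name for each direction. The dictionary and function names are hypothetical and are not part of the 21164PC documentation. Any encoding that the table does not name is reported here as reserved, which is an assumption of this sketch.

```python
# Command encodings copied from the cmd_h<3:0> entry of Table 3-1.
# The names CPU_TO_SYSTEM, SYSTEM_TO_CPU, and decode_cmd are illustrative only.
CPU_TO_SYSTEM = {
    0b0000: "NOP",
    0b0110: "WRITE BLOCK",
    0b1000: "READ MISS0",
    0b1001: "READ MISS1",
    0b1100: "BCACHE VICTIM",
}

SYSTEM_TO_CPU = {
    0b0000: "NOP",
    0b0001: "FLUSH",
    0b0010: "INVALIDATE",
    0b0100: "READ",
}


def decode_cmd(value: int, from_cpu: bool) -> str:
    """Return the command name for a 4-bit cmd_h value.

    from_cpu selects the direction: True for commands driven by the
    21164PC, False for commands driven by the system.
    """
    if not 0 <= value <= 0xF:
        raise ValueError("cmd_h is a 4-bit field")
    table = CPU_TO_SYSTEM if from_cpu else SYSTEM_TO_CPU
    # Assumption: encodings not named in the table are treated as reserved.
    return table.get(value, "Reserved")
```

For example, decode_cmd(0b1100, from_cpu=True) names the BCACHE VICTIM command, and decode_cmd(0b0100, from_cpu=False) names the READ command.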
Table 3–2 lists signals by function and provides an abbreviated description.
Table 3–2 21164PC Signal Descriptions by Function
Signal              Type  Count  Description
Clocks
clk_mode_h<1:0>     I     2      Clock test mode.
cpu_clk_out_h       O     1      CPU clock output.
osc_clk_in_h,l      I     2      Oscillator clock inputs.
st_clk1_h           O     1      Bcache STRAM clock output.
st_clk2_h           O     1      Bcache STRAM clock output.
st_clk3_h           O     1      Bcache STRAM clock output.
sys_clk_out1_h      O     1      System clock output.
sys_clk_out2_h      O     1      System clock output.
sys_reset_l         I     1      System reset.
Bcache
data_h<127:0>       B     128    Data bus.
data_adsc_l         O     1      Data RAM address load enable.
data_adv_l          O     1      Data RAM address advance enable.
data_ram_oe_l       O     1      Data RAM output enable.
data_ram_we_l<3:0>  O     4      Data RAM write-enable bits.
index_h<21:4>       O     18     Index.
lw_parity_h<3:0>    B     4      Data check.
tag_data_h<32:19>   B     14     Bcache tag data bits.
tag_data_par_h      B     1      Tag data parity bit.
tag_dirty_h         B     1      Tag dirty state bit.
tag_ram_oe_l        O     1      Tag RAM output enable.
tag_ram_we_l        O     1      Tag RAM write-enable.
tag_valid_h         B     1      Tag valid bit.
System Interface
addr_h<39:4>        B     36     Address bus.
addr_bus_req_h      I     1      Address bus request.
addr_res_h<1:0>     O     2      Address response.
cack_h              I     1      Command acknowledge.
cmd_h<3:0>          B     4      Command bus.
dack_h              I     1      Data acknowledge.
data_bus_req_h      I     1      Data bus request.
fill_h              I     1      Fill warning.
fill_dirty_h        I     1      Fill dirty.
fill_error_h        I     1      Fill error.
fill_id_h           I     1      Fill identification.
idle_bc_h           I     1      Idle Bcache.
int4_valid_h<3:0>   O     4      INT4 data valid.
victim_pending_h    O     1      Victim pending.
Interrupts
irq_h<3:0>          I     4      System interrupt requests.
mch_hlt_irq_h       I     1      Machine halt interrupt request.
pwr_fail_irq_h      I     1      Power failure interrupt request.
sys_mch_chk_irq_h   I     1      System machine check interrupt request.
Test Modes and Miscellaneous
dc_ok_h             I     1      dc voltage OK.
port_mode_h<1:0>    I     2      Selects the test port interface mode (normal, manufacturing, and debug).
srom_clk_h          O     1      Serial ROM clock.
srom_data_h         I     1      Serial ROM data.
srom_oe_l           O     1      Serial ROM output enable.
srom_present_l¹     B     1      Serial ROM present.
tck_h               B     1      JTAG boundary-scan clock.
tdi_h               I     1      JTAG serial boundary-scan data in.
tdo_h               O     1      JTAG serial boundary-scan data out.
temp_sense          I     1      Temperature sense.
test_status_h<1>    O     1      Icache test status or timeout reset.
tms_h               I     1      JTAG test mode select.
trst_l¹             B     1      JTAG test access port (TAP) reset.
¹ This signal is shown as bidirectional. However, for normal operation, it is input only. The output function is used during manufacturing test and verification only.

4 Clocks, Cache, and External Interface

This chapter describes the 21164PC microprocessor external interface, which includes the backup cache (Bcache) and system interfaces. It also describes the clock circuitry, interrupt signals, and parity generation. It is organized as follows:
Introduction to the external interface
Clocks
Physical address considerations
Bcache structure and operation
Cache coherency
21164PC-to-Bcache transactions
21164PC-initiated system transactions
System-initiated transactions
Data bus and command/address bus contention
21164PC interface restrictions
21164PC/system race conditions
Data integrity and Bcache errors
Interrupts
Chapter 3 lists and defines all 21164PC hardware interface signal pins. Chapter 9 describes the 21164PC hardware interface electrical requirements.

4.1 Introduction to the External Interface
A 21164PC-based system can be divided into three major sections:
21164PC microprocessor
External Bcache
System interface logic
The 21164PC external interface is optimized for uniprocessor-based systems and mandates few design rules. The interface includes a 128-bit bidirectional data bus, a 36-bit bidirectional address bus, and several control signals.
Read latencies and data repetition rates of the external Bcache can be programmed by means of register bits. The Bcache clock frequency for private read and write operations is independent of the system interface clock frequency and makes for a more flexible design.
The cache system supports a 64-byte block size to the external Bcache.
Figure 4–1 shows a simplified view of the external interface. The function and purpose of each signal is described in Chapter 3.

4.1.1 System Interface

This section describes the system or external bus interface. The system interface is made up of bidirectional address and command buses, a data bus that is shared with the Bcache interface, and several control signals.
The system interface is under the control of the bus interface unit (BIU) in the CBU. The system interface is a 128-bit bidirectional data bus.
The cycle time of the system interface is programmable to speeds of 4 to 15 times the CPU cycle time (sysclk ratio). All system interface signals are driven or sampled by the 21164PC on the rising edge of signal sys_clk_out1_h. In this chapter, this edge is sometimes referred to as “sysclk.” Precisely when interface signals rise and fall does not matter as long as they meet the setup and hold times specified in Chapter 9.
Figure 4–1 21164PC System/Bcache Interface
(Block diagram showing the 21164PC connected to the Bcache tag and data SRAMs through the Bcache interface, to system memory and I/O through the system interface, and to the interrupt signals.)
4.1.1.1 Commands and Addresses
The 21164PC can take up to two commands from the system at a time. The Bcache is probed to determine what must be done with the command.
If nothing is to be done, the 21164PC acknowledges receiving the command.
If a Bcache read or invalidate operation is required, the 21164PC performs the task as soon as the Bcache becomes free. The 21164PC acknowledges receiving the command at the start of the Bcache transaction.
The BIU contains a three-entry BIU command/address buffer (BAF) capable of queueing up to three Bcache misses or I/O references. These buffers are capable of merging both read and write miss references, to reduce external system bus traffic.

4.1.2 Bcache Interface

The 21164PC includes an interface and control for a required backup cache (Bcache). The Bcache interface features the following:
Support for pipelined and flow-through synchronous burst SRAMs (SSRAMs)
Nonblocking, pipelined Bcache (up to three probes in flight)
Fully interleaved writes to saturate write-hit traffic
Flexible Bcache sizes (512KB - 4MB)
Direct-mapped organization with 64-byte block size
Read/write-allocate replacement policy
Write-back cache policy
A 128-bit data bus (shared with the system interface)
4.8 GB/s peak data transfer rate
Programmable Bcache clock rate up to 300-MHz operation
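The peak transfer rate quoted in the list above follows from the data bus width and the maximum Bcache clock rate. The short Python sketch below reproduces the arithmetic, assuming one full-width transfer on every Bcache clock cycle. The function and constant names are illustrative only.

```python
# Figures taken from the Bcache feature list: a 128-bit data bus and a
# Bcache clock rate of up to 300 MHz. Names are illustrative only.
DATA_BUS_BITS = 128
MAX_BCACHE_CLOCK_HZ = 300_000_000


def peak_bytes_per_second(bus_bits: int, clock_hz: int) -> int:
    """Peak rate, assuming one full-width transfer per clock cycle."""
    return (bus_bits // 8) * clock_hz


peak = peak_bytes_per_second(DATA_BUS_BITS, MAX_BCACHE_CLOCK_HZ)
print(peak / 1e9, "GB/s")  # 16 bytes per cycle at 300 MHz
```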
4.1.2.1 Bcache Interface Enhancements
With the advent of commodity SSRAMs, offchip high-speed caches can now be built at low cost to take advantage of the same performance techniques that until now had been restricted to onchip caches. The SSRAMs contain an address register, a self-incrementing address mechanism, and optionally, a data output register (pipelined). The 21164PC uses these additional control features to deliver a high-performance nonblocking, interleaved, fully pipelined Bcache interface.
4.1.2.2 Pipelined Bcache
A pipelined cache allows the processor to issue multiple cache operations that are overlapped in time to increase throughput. The 21164PC supports pipelining of up to three outstanding read or write probes at any given time to attain 100% data bus utilization. The outstanding Bcache probes are tracked by the BIU's “Bcache in flight” (or BIF) buffer. Figure 4–2 shows the benefits of having multiple probes in flight for a pipelined cache.
Figure 4–2 Merits of a Multiprobes In Flight – Pipelined Cache
(Timing diagram comparing index and data activity for a nonpipelined cache and for a pipelined cache with multiple probes in flight. Pipelining allows 100% utilization of the data bus.)
4.1.2.3 Write Interleaving
The 21164PC Bcache interface takes advantage of the SSRAM address input register to employ interleaving techniques to maximize write-hit dirty bandwidth. The Bcache interface decouples the tag and data store control to allow tag write probes to be interleaved with data writes. Figure 4–3 shows an example of write interleaving and its ability to keep the data bus at 100% utilization.

Figure 4–3 Tag/Data Store Interleaving
(Timing diagram of index, tag, and data activity with data writes interleaved with tag probes. Interleaving tag write probes with data write operations allows 100% utilization of the data bus.)
Tag probes for writes that hit clean (valid, not dirty) in the Bcache must schedule a tag store write to update the dirty bit.
4.2 Clocks
The 21164PC develops three clock signals that are available at output pins.
Signal          Description
cpu_clk_out_h   A 21164PC internal clock that may or may not drive the system clock.
sys_clk_out1_h  A clock of programmable speed supplied to the external interface.
sys_clk_out2_h  A delayed copy of sys_clk_out1_h. The delay is programmable and is an integer number of cpu_clk_out_h periods.
The behavior of the programmable clocks during the reset sequence is described in Section 7.1.

4.2.1 CPU Clock

The 21164PC uses the differential input clock lines osc_clk_in_h,l as a source to generate its CPU clock. The input signals clk_mode_h<1:0> control generation of the CPU clock, as listed in Table 4–1 and as shown in Figure 4–4.
The 21164PC uses clk_mode_h<0> to provide onchip capability to equalize the duty cycle of the input clock (eliminating the need for a 2× oscillator). When clk_mode_h<0> is asserted, the equalizing circuitry, called a symmetrator, is enabled.
The 21164PC uses clk_mode_h<1> to reset the CPU clock. When clk_mode_h<1> is set, the internal CPU clock is reset to a known state. When it is clear, the CPU clock is driven at the same frequency as the osc_clk_in_h,l differential input.
Table 4–1 CPU Clock Generation Control
Mode    clk_mode_h<1:0>  Description
Normal  0 0              CPU clock frequency is the same as the input clock frequency; symmetrator is disabled.
Normal  0 1              CPU clock frequency is the same as the input clock frequency; symmetrator is enabled. Also used to accommodate chip testers.
Reset   1 0              Initializes CPU clock, allowing system clock to be synchronized to a stable reference clock; symmetrator is disabled.
Reset   1 1              Initializes CPU clock, allowing system clock to be synchronized to a stable reference clock; symmetrator is enabled.
Caution: A clock source should always be provided on osc_clk_in_h,l when signal dc_ok_h is asserted.
Figure 4–4 Clock Signals and Functions
(Block diagram showing the inputs osc_clk_in_h,l, clk_mode_h<1:0>, irq_h<3:0>, mch_hlt_irq_h, pwr_fail_irq_h, sys_mch_chk_irq_h, sys_reset_l, and dc_ok_h; the symmetrator, CPU clock divider, system clock divider (/4 through /15), and system clock delay (0 through 7) blocks; and the outputs cpu_clk_out_h, sys_clk_out1_h, and sys_clk_out2_h.)

4.2.2 System Clock

The CPU clock is the source clock used to generate the system clock sys_clk_out1_h. The system clock divider controls the frequency of sys_clk_out1_h. The divisor, 4 to 15, is obtained from the four interrupt lines irq_h<3:0> at power-up as listed in Table 4–2. The system clock frequency is determined by dividing the CPU clock frequency by the ratio. Refer to Section 7.2 for information on sysclk behavior during reset. The value is also latched into the SYS_CLK_RATIO<3:0> field of the CBOX_STATUS IPR (bits <7:4>) for read-only purposes.
Table 4–2 System Clock Divisor
irq_h<3>  irq_h<2>  irq_h<1>  irq_h<0>  Ratio
Low       High      Low       Low       4
Low       High      Low       High      5
Low       High      High      Low       6
Low       High      High      High      7
High      Low       Low       Low       8
High      Low       Low       High      9
High      Low       High      Low       10
High      Low       High      High      11
High      High      Low       Low       12
High      High      Low       High      13
High      High      High      Low       14
High      High      High      High      15
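The rows of Table 4–2 read as a plain binary encoding of the ratio, with irq_h<3> as the most significant bit and High taken as 1. The Python sketch below decodes the four levels and derives the system clock frequency. The function names and the 400-MHz CPU clock in the example are hypothetical and are used only for illustration.

```python
def sysclk_ratio(irq3: int, irq2: int, irq1: int, irq0: int) -> int:
    """Decode the sysclk ratio from the irq_h<3:0> levels (1 = High, 0 = Low).

    Table 4-2 lists only the ratios 4 through 15, so any other
    combination is rejected here.
    """
    ratio = (irq3 << 3) | (irq2 << 2) | (irq1 << 1) | irq0
    if not 4 <= ratio <= 15:
        raise ValueError("combination not listed in Table 4-2")
    return ratio


def sysclk_frequency_hz(cpu_clock_hz: float, ratio: int) -> float:
    """System clock frequency is the CPU clock frequency divided by the ratio."""
    return cpu_clock_hz / ratio


# Example with a hypothetical 400-MHz CPU clock and irq_h<3:0> = High, Low, Low, Low.
ratio = sysclk_ratio(1, 0, 0, 0)
print(ratio, sysclk_frequency_hz(400e6, ratio))
```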
Figure 4–5 shows the 21164PC driving the system clock on a uniprocessor system.
Figure 4–5 21164PC Uniprocessor Clock
(Block diagram: the 21164PC sys_clk_out signal drives the memory ASIC and the bus ASIC.)

4.2.3 Delayed System Clock

The system clock sys_clk_out1_h is the source clock for the delayed system clock sys_clk_out2_h. These clock signals provide flexible timing for system use. The delay unit, from 0 to 7 CPU clock cycles, is obtained from the three interrupt signals mch_hlt_irq_h, pwr_fail_irq_h, and sys_mch_chk_irq_h at power-up, as listed in Table 4–3. The output of this programmable divider is symmetric if the divisor is even. The output is asymmetric if the divisor is odd. When the divisor is odd, the clock is high for an extra cycle. Refer to Section 7.2 for information on sysclk behavior during reset.
Table 4–3 System Clock Delay
sys_mch_chk_irq_h  pwr_fail_irq_h  mch_hlt_irq_h  Delay Cycles
Low                Low             Low            0
Low                Low             High           1
Low                High            Low            2
Low                High            High           3
High               Low             Low            4
High               Low             High           5
High               High            Low            6
High               High            High           7
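Table 4–3 can likewise be read as a 3-bit binary value, with sys_mch_chk_irq_h as the most significant bit and mch_hlt_irq_h as the least significant bit. The following Python sketch decodes the delay under that reading. The function name is hypothetical.

```python
def sysclk2_delay_cycles(sys_mch_chk: int, pwr_fail: int, mch_hlt: int) -> int:
    """Decode the sys_clk_out2_h delay in CPU cycles (1 = High, 0 = Low).

    Arguments are the levels sampled at power-up on sys_mch_chk_irq_h,
    pwr_fail_irq_h, and mch_hlt_irq_h, in the column order of Table 4-3.
    """
    for level in (sys_mch_chk, pwr_fail, mch_hlt):
        if level not in (0, 1):
            raise ValueError("levels must be 0 (Low) or 1 (High)")
    return (sys_mch_chk << 2) | (pwr_fail << 1) | mch_hlt


# Low, High, High selects a delay of 3 CPU cycles in Table 4-3.
print(sysclk2_delay_cycles(0, 1, 1))
```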
4.3 Physical Address Considerations
This section lists and describes the physical address regions. Cache and data wrapping characteristics of physical addresses are also described.

4.3.1 Physical Address Regions

Physical memory of the 21164PC is divided into three regions:
1. The first region is the first half of the physical address space. It is treated by the 21164PC as memory-like.
2. The second region is the second half of the physical address space except for a 1MB region reserved for CBU IPRs. It is treated by the 21164PC as noncacheable.
3. The third region is the 1MB region reserved for CBU IPRs.
In the first region, write merging and load merging are permitted. All 21164PC accesses in this region are 64-byte, the Bcache block size. This memory-like region is limited to 8GB (maximum).
The 21164PC does not cache data accessed in the second and third region of the physical address space; 21164PC read accesses in these regions are always INT32 requests. Load merging is permitted, but the request includes a mask to inform the system environment as to which INT8s are accessed. Write merging is permitted. Write accesses are INT32 requests with a mask indicating which INT4s are actually modified.
The 21164PC never writes more than 32 bytes at a time in noncached space.
The 21164PC does not broadcast accesses to the CBU IPR region if they map to a CBU IPR. Accesses in this region that are not to a defined CBU IPR produce UNDEFINED results. The system should not probe this region.
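The three regions can be told apart by comparing a 40-bit physical address against the ranges listed in Table 4–4. The Python sketch below does this comparison. The region labels and the function name are hypothetical, and addresses that fall outside every listed range are reported as "unlisted" because this section does not describe them.

```python
# Address ranges copied from Table 4-4 (hexadecimal, inclusive bounds).
REGIONS = (
    ("memory-like", 0x00_0000_0000, 0x01_FFFF_FFFF),
    ("noncacheable", 0x80_0000_0000, 0xFF_FFEF_FFFF),
    ("ipr", 0xFF_FFF0_0000, 0xFF_FFFF_FFFF),
)


def classify_physical_address(paddr: int) -> str:
    """Return the Table 4-4 region name for a 40-bit physical address."""
    if not 0 <= paddr < (1 << 40):
        raise ValueError("physical addresses are 40 bits wide")
    for name, low, high in REGIONS:
        if low <= paddr <= high:
            return name
    # Assumption: ranges not listed in Table 4-4 are simply flagged.
    return "unlisted"


print(classify_physical_address(0x80_0000_1000))  # address with bit 39 set
```

The IPR range spans 0x10_0000 bytes, which matches the 1MB size stated for the CBU IPR region.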
Table 4–4 shows the 21164PC physical memory regions.
Table 4–4 Physical Memory Regions
Region        Address Range (hexadecimal)  Description
Memory-like   00 0000 0000 – 01 FFFF FFFF  Write invalidate cached, load, and store merging allowed.
Noncacheable  80 0000 0000 – FF FFEF FFFF  Not cached, load merging limited.
IPR region    FF FFF0 0000 – FF FFFF FFFF  Accesses do not appear on the interface unless an undefined location is accessed (which produces UNDEFINED results).

4.3.2 Data Wrapping

The 21164PC requires that wrapped read operations be performed on INT16 boundaries. READ and FLUSH commands are all wrapped on INT16 boundaries as described here. The valid wrap orders for 64-byte blocks are selected by addr_h<5:4>. They are:
0, 1, 2, 3
1, 0, 3, 2
2, 3, 0, 1
3, 2, 1, 0
Similarly, when the system interface supplies a command that returns data from the 21164PC caches, the values that the system drives on addr_h<5:4> determine the order in which data is supplied by the 21164PC.
BCACHE VICTIM commands provide the data with the same wrap order as the read miss that produced them.
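The four wrap orders listed above are what results from exclusive-ORing the starting INT16 number, taken from addr_h<5:4>, with the counts 0 through 3. The Python sketch below generates the order for a given address on that observation. The function name is hypothetical, and the XOR formulation is an inference from the listed orders rather than a statement from this manual.

```python
from typing import List


def int16_wrap_order(paddr: int) -> List[int]:
    """Return the order in which the four INT16s of a 64-byte block transfer.

    The starting INT16 is selected by address bits <5:4>.
    """
    start = (paddr >> 4) & 0x3
    return [start ^ step for step in range(4)]


# Print the order for each possible value of addr_h<5:4>.
for start in range(4):
    print(start, int16_wrap_order(start << 4))
```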

4.3.3 Noncached Read Operations

Read operations to physical addresses that have addr_h<39> asserted are not cached in the Dcache or Bcache. They are merged like other read operations in the miss address file (MAF). To prevent several read operations to noncached memory from being merged into a single 32-byte bus request, software must insert memory barrier (MB) instructions or set MAF_MODE IPR bit (IO_NMERGE). The MAF merges as many Dstream read operations together as it can and sends the request to the BIU.
Rather than merging two 32-byte requests into a single 64-byte request, the BIU requests a READ MISS from the system. Signals int4_valid_h<3:0> indicate which of the four quadwords are being requested by software. The system should return the fill data to the 21164PC as usual. The 21164PC does not write the Dcache or Bcache with the fill data. The requested data is written in the register file or Icache.
Note: A special case using int4_valid_h<3:0> occurs during an Icache fill. In this case the entire returned block is valid although int4_valid_h<3:0> indicates zero.

4.3.4 Noncached Write Operations

Write operations to physical addresses that have addr_h<39> asserted are not written to any of the caches. These write operations are merged in the write buffer before being sent to the system. If software does not want write operations to merge, it must insert MB or WMB instructions between them.
When the write buffer decides to write data to noncached memory, the BIU requests a WRITE BLOCK. During each data cycle, int4_valid_h<3:0> indicates which INT4s within the INT16 are valid.
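As an illustration of how system logic might interpret the write mask, the Python sketch below converts an int4_valid_h<3:0> value into the data_h bit ranges that carry valid data, using the write mapping given for int4_valid_h<3:0> in Table 3–1. The function name is hypothetical.

```python
from typing import List, Tuple


def valid_write_lanes(int4_valid: int) -> List[Tuple[int, int]]:
    """Return (high_bit, low_bit) pairs of data_h for each valid INT4.

    Bit i of int4_valid_h corresponds to data_h<32*i+31:32*i> during
    a noncached write data cycle.
    """
    if not 0 <= int4_valid <= 0xF:
        raise ValueError("int4_valid_h is a 4-bit field")
    return [(32 * i + 31, 32 * i) for i in range(4) if int4_valid & (1 << i)]


# A mask of 0101 marks data_h<31:0> and data_h<95:64> as valid.
print(valid_write_lanes(0b0101))
```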
4.4 Bcache Structure
The 21164PC supports Bcache sizes of 0.5, 1, 2, and 4MB. The size is under program control and is specified by CBOX_CONFIG<13:12> (BC_SIZE<1:0>). The Bcache block size is 64 bytes.
Industry-standard, burst-mode synchronous static RAMs (SSRAMs) may be connected to the 21164PC without many extra components, although fanout buffers may be required for the index lines. The SSRAMs are directly controlled by the 21164PC, and the Bcache data lines are connected to the 21164PC data bus.
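The Python sketch below shows one way a physical address could be split into a Bcache index and tag for each supported size. It is a sketch under stated assumptions rather than a description of the hardware: it assumes a direct-mapped cache, an index expressed in INT16 units as on index_h<21:4>, and a tag formed from address bits <32:n>, where 2^n is the Bcache size in bytes (n = 19 for 0.5MB, matching the low end of tag_data_h<32:19>). The function and constant names are hypothetical, and the BC_SIZE<1:0> encoding is not modeled because it is not given in this section.

```python
from typing import Tuple

KB = 1024
MB = 1024 * KB
SUPPORTED_SIZES = (512 * KB, 1 * MB, 2 * MB, 4 * MB)


def split_bcache_address(paddr: int, bcache_bytes: int) -> Tuple[int, int]:
    """Return (index, tag) for a physical address.

    index: INT16-granular offset within the Bcache (address bits <n-1:4>).
    tag:   address bits <32:n>, where 2**n == bcache_bytes (an assumption).
    """
    if bcache_bytes not in SUPPORTED_SIZES:
        raise ValueError("unsupported Bcache size")
    size_bits = bcache_bytes.bit_length() - 1   # n, since sizes are powers of 2
    index = (paddr & (bcache_bytes - 1)) >> 4   # drop the 16-byte offset
    tag = (paddr & ((1 << 33) - 1)) >> size_bits
    return index, tag


# The same address splits differently for a 0.5MB and a 4MB Bcache.
for size in (512 * KB, 4 * MB):
    index, tag = split_bcache_address(0x1234_5678, size)
    print(hex(index), hex(tag))
```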