Compaq 21264, EV67 User Manual

Alpha 21264/EV67 Microprocessor Hardware Reference Manual

Order Number: DS–0028B–TE

This manual is directly derived from the internal 21264/EV67 Specifications, Revision 1.4. You can access this hardware reference manual in PDF format from the following site:

ftp://ftp.compaq.com/pub/products/alphaCPUdocs

Revision/Update Information: This is a revised document. It supersedes the Alpha 21264A Microprocessor Hardware Reference Manual (DS–0028A–TE).

Compaq Computer Corporation Shrewsbury, Massachusetts

September 2000

The information in this publication is subject to change without notice.

COMPAQ COMPUTER CORPORATION SHALL NOT BE LIABLE FOR TECHNICAL OR EDITORIAL ERRORS OR OMISSIONS CONTAINED HEREIN, NOR FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES RESULTING FROM THE FURNISHING, PERFORMANCE, OR USE OF THIS MATERIAL. THIS INFORMATION IS PROVIDED “AS IS” AND COMPAQ COMPUTER CORPORATION DISCLAIMS ANY WARRANTIES, EXPRESS, IMPLIED OR STATUTORY AND EXPRESSLY DISCLAIMS THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR PARTICULAR PURPOSE, GOOD TITLE AND AGAINST INFRINGEMENT.

This publication contains information protected by copyright. No part of this publication may be photocopied or reproduced in any form without prior written consent from Compaq Computer Corporation.

COMPAQ, the Compaq logo, the Digital logo, and VAX Registered in United States Patent and Trademark Office.

Pentium is a registered trademark of Intel Corporation.

Other product names mentioned herein may be trademarks and/or registered trademarks of their respective companies.

Alpha 21264/EV67 Hardware Reference Manual

Preface

1 Introduction

1.1 The Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1–1

1.1.1 Addressing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1–2

1.1.2 Integer Data Types. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1–2

1.1.3 Floating-Point Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1–2

1.2 21264/EV67 Microprocessor Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1–3

2 Internal Architecture

2.1 21264/EV67 Microarchitecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–1

2.1.1 Instruction Fetch, Issue, and Retire Unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–2

2.1.1.1 Virtual Program Counter Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–2

2.1.1.2 Branch Predictor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–3

2.1.1.3 Instruction-Stream Translation Buffer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–5

2.1.1.4 Instruction Fetch Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–6

2.1.1.5 Register Rename Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–6

2.1.1.6 Integer Issue Queue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–6

2.1.1.7 Floating-Point Issue Queue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–7

2.1.1.8 Exception and Interrupt Logic. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–8

2.1.1.9 Retire Logic. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–8

2.1.2 Integer Execution Unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–8

2.1.3 Floating-Point Execution Unit. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–10

2.1.4 External Cache and System Interface Unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–11

2.1.4.1 Victim Address File and Victim Data File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–11

2.1.4.2 I/O Write Buffer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–11

2.1.4.3 Probe Queue. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–11

2.1.4.4 Duplicate Dcache Tag Array. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–11

2.1.5 Onchip Caches. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–11

2.1.5.1 Instruction Cache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–11

2.1.5.2 Data Cache. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–12

2.1.6 Memory Reference Unit. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–12

2.1.6.1 Load Queue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–13

2.1.6.2 Store Queue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–13

2.1.6.3 Miss Address File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–13

2.1.6.4 Dstream Translation Buffer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–13

2.1.7 SROM Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–13

2.2 Pipeline Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–13

2.2.1 Pipeline Aborts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–16

2.3 Instruction Issue Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–16

2.3.1 Instruction Group Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–17

2.3.2 Ebox Slotting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–18

2.3.3 Instruction Latencies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–20

2.4 Instruction Retire Rules. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–21

2.4.1 Floating-Point Divide/Square Root Early Retire. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–22

2.5 Retire of Operate Instructions into R31/F31 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–22

2.6 Load Instructions to R31 and F31 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–23

2.6.1 Normal Prefetch: LDBU, LDF, LDG, LDL, LDT, LDWU, HW_LDL Instructions . . . . . . . 2–23

2.6.2 Prefetch with Modify Intent: LDS Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–23

2.6.3 Prefetch, Evict Next: LDQ and HW_LDQ Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . 2–24

2.6.4 Prefetch with the LDx_L / STx_C Instruction Sequence . . . . . . . . . . . . . . . . . . . . . . . . 2–24

2.7 Special Cases of Alpha Instruction Execution. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–24

2.7.1 Load Hit Speculation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–24

2.7.2 Floating-Point Store Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–26

2.7.3 CMOV Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–26

2.8 Memory and I/O Address Space Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–27

2.8.1 Memory Address Space Load Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–27

2.8.2 I/O Address Space Load Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–28

2.8.3 Memory Address Space Store Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–29

2.8.4 I/O Address Space Store Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–29

2.9 MAF Memory Address Space Merging Rules. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–30

2.10 Instruction Ordering. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–30

2.11 Replay Traps. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–31

2.11.1 Mbox Order Traps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–31

2.11.1.1 Load-Load Order Trap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–32

2.11.1.2 Store-Load Order Trap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–32

2.11.2 Other Mbox Replay Traps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–32

2.12 I/O Write Buffer and the WMB Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–32

2.12.1 Memory Barrier (MB/WMB/TB Fill Flow) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–32

2.12.1.1 MB Instruction Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–33

2.12.1.2 WMB Instruction Processing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–34

2.12.1.3 TB Fill Flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–34

2.13 Performance Measurement Support—Performance Counters . . . . . . . . . . . . . . . . . . . . . . . 2–36

2.14 Floating-Point Control Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–36

2.15 AMASK and IMPLVER Instruction Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–38

2.15.1 AMASK. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–38

2.15.2 IMPLVER . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–38

2.16 Design Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–39

3 Hardware Interface

3.1 21264/EV67 Microprocessor Logic Symbol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3–1

3.2 21264/EV67 Signal Names and Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3–3

3.3 Pin Assignments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3–8

3.4 Mechanical Specifications. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3–17

3.5 21264/EV67 Packaging. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3–18

4 Cache and External Interfaces

4.1 Introduction to the External Interfaces. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–1

4.1.1 System Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–3

4.1.1.1 Commands and Addresses. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–4

4.1.2 Second-Level Cache (Bcache) Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–4

4.2 Physical Address Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–4

4.3 Bcache Structure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–7

4.3.1 Bcache Interface Signals. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–7

4.3.2 System Duplicate Tag Stores. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–7

4.4 Victim Data Buffer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–8

4.5 Cache Coherency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–8

4.5.1 Cache Coherency Basics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–8

4.5.2 Cache Block States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–9

4.5.3 Cache Block State Transitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–10

4.5.4 Using SysDc Commands. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–11

4.5.5 Dcache States and Duplicate Tags. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–13

4.6 Lock Mechanism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–14

4.6.1 In-Order Processing of LDx_L/STx_C Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–15

4.6.2 Internal Eviction of LDx_L Blocks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–15

4.6.3 Liveness and Fairness. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–15

4.6.4 Managing Speculative Store Issues with Multiprocessor Systems . . . . . . . . . . . . . . . . 4–16

4.7 System Port . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–16

4.7.1 System Port Pins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–17

4.7.2 Programming the System Interface Clocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–18

4.7.3 21264/EV67-to-System Commands. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–19

4.7.3.1 Bank Interleave on Cache Block Boundary Mode . . . . . . . . . . . . . . . . . . . . . . . . . 4–19

4.7.3.2 Page Hit Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–20

4.7.4 21264/EV67-to-System Commands Descriptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–21

4.7.5 ProbeResponse Commands (Command[4:0] = 00001). . . . . . . . . . . . . . . . . . . . . . . . . 4–24

4.7.6 SysAck and 21264/EV67-to-System Commands Flow Control . . . . . . . . . . . . . . . . . . . 4–25

4.7.7 System-to-21264/EV67 Commands. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–26

4.7.7.1 Probe Commands (Four Cycles) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–26

4.7.7.2 Data Transfer Commands (Two Cycles). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–28

4.7.8 Data Movement In and Out of the 21264/EV67. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–30

4.7.8.1 21264/EV67 Clock Basics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–30

4.7.8.2 Fast Data Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–31

4.7.8.3 Fast Data Disable Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–33

4.7.8.4 SysDataInValid_L and SysDataOutValid_L . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–34

4.7.8.5 SysFillValid_L . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–35

4.7.8.6 Data Wrapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–36

4.7.9 Nonexistent Memory Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–38

4.7.10 Ordering of System Port Transactions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–40

4.7.10.1 21264/EV67 Commands and System Probes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–40

4.7.10.2 System Probes and SysDc Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–42

4.8 Bcache Port . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–42

4.8.1 Bcache Port Pins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–43

4.8.2 Bcache Clocking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–44

4.8.2.1 Setting the Period of the Cache Clock . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–45

4.8.3 Bcache Transactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–47

4.8.3.1 Bcache Data Read and Tag Read Transactions . . . . . . . . . . . . . . . . . . . . . . . . . . 4–47

4.8.3.2 Bcache Data Write Transactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–48

4.8.3.3 Bubbles on the Bcache Data Bus. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–49

4.8.4 Pin Descriptions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–51

4.8.4.1 BcAdd_H[23:4] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–51

4.8.4.2 Bcache Control Pins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–52

4.8.4.3 BcDataInClk_H and BcTagInClk_H . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–53

4.8.5 Bcache Banking. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–54

4.8.6 Disabling the Bcache for Debugging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–54

4.9 Interrupts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–54

5 Internal Processor Registers

5.1 Ebox IPRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–3

5.1.1 Cycle Counter Register – CC. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–3

5.1.2 Cycle Counter Control Register – CC_CTL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–3

5.1.3 Virtual Address Register – VA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–4

5.1.4 Virtual Address Control Register – VA_CTL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–4

5.1.5 Virtual Address Format Register – VA_FORM. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–5

5.2 Ibox IPRs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–6

5.2.1 ITB Tag Array Write Register – ITB_TAG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–6

5.2.2 ITB PTE Array Write Register – ITB_PTE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–6

5.2.3 ITB Invalidate All Process (ASM=0) Register – ITB_IAP. . . . . . . . . . . . . . . . . . . . . . . . 5–7

5.2.4 ITB Invalidate All Register – ITB_IA. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–7

5.2.5 ITB Invalidate Single Register – ITB_IS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–7

5.2.6 ProfileMe PC Register – PMPC. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–8

5.2.7 Exception Address Register – EXC_ADDR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–8

5.2.8 Instruction Virtual Address Format Register – IVA_FORM. . . . . . . . . . . . . . . . . . . . . . 5–9

5.2.9 Interrupt Enable and Current Processor Mode Register – IER_CM. . . . . . . . . . . . . . . . 5–9

5.2.10 Software Interrupt Request Register – SIRR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–10

5.2.11 Interrupt Summary Register – ISUM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–11

5.2.12 Hardware Interrupt Clear Register – HW_INT_CLR . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–12

5.2.13 Exception Summary Register – EXC_SUM. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–13

5.2.14 PAL Base Register – PAL_BASE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–15

5.2.15 Ibox Control Register – I_CTL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–15

5.2.16 Ibox Status Register – I_STAT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–18

5.2.17 Icache Flush Register – IC_FLUSH. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–21

5.2.18 Icache Flush ASM Register – IC_FLUSH_ASM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–21

5.2.19 Clear Virtual-to-Physical Map Register – CLR_MAP. . . . . . . . . . . . . . . . . . . . . . . . . . . 5–21

5.2.20 Sleep Mode Register – SLEEP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–21

5.2.21 Process Context Register – PCTX. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–21

5.2.22 Performance Counter Control Register – PCTR_CTL. . . . . . . . . . . . . . . . . . . . . . . . . . 5–23

5.3 Mbox IPRs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–25

5.3.1 DTB Tag Array Write Registers 0 and 1 – DTB_TAG0, DTB_TAG1 . . . . . . . . . . . . . . . 5–25

5.3.2 DTB PTE Array Write Registers 0 and 1 – DTB_PTE0, DTB_PTE1 . . . . . . . . . . . . . . . 5–26

5.3.3 DTB Alternate Processor Mode Register – DTB_ALTMODE. . . . . . . . . . . . . . . . . . . . . 5–26

5.3.4 Dstream TB Invalidate All Process (ASM=0) Register – DTB_IAP . . . . . . . . . . . . . . . . 5–27

5.3.5 Dstream TB Invalidate All Register – DTB_IA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–27

5.3.6 Dstream TB Invalidate Single Registers 0 and 1 – DTB_IS0,1 . . . . . . . . . . . . . . . . . . . 5–27

5.3.7 Dstream TB Address Space Number Registers 0 and 1 – DTB_ASN0,1 . . . . . . . . . . . 5–28

5.3.8 Memory Management Status Register – MM_STAT . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–28

5.3.9 Mbox Control Register – M_CTL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–29

5.3.10 Dcache Control Register – DC_CTL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–30

5.3.11 Dcache Status Register – DC_STAT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–31

5.4 Cbox CSRs and IPRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–32

5.4.1 Cbox Data Register – C_DATA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–33

5.4.2 Cbox Shift Register – C_SHFT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–33

5.4.3 Cbox WRITE_ONCE Chain Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–33

5.4.4 Cbox WRITE_MANY Chain Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–38

5.4.5 Cbox Read Register (IPR) Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–41

6 Privileged Architecture Library Code

6.1 PALcode Description. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–1

6.2 PALmode Environment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–2

6.3 Required PALcode Function Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–3

6.4 Opcodes Reserved for PALcode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–3

6.4.1 HW_LD Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–3

6.4.2 HW_ST Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–4

6.4.3 HW_RET Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–5

6.4.4 HW_MFPR and HW_MTPR Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–6

6.5 Internal Processor Register Access Mechanisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–7

6.5.1 IPR Scoreboard Bits. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–8

6.5.2 Hardware Structure of Explicitly Written IPRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–8

6.5.3 Hardware Structure of Implicitly Written IPRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–9

6.5.4 IPR Access Ordering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–9

6.5.5 Correct Ordering of Explicit Writers Followed by Implicit Readers. . . . . . . . . . . . . . . . . 6–10

6.5.6 Correct Ordering of Explicit Readers Followed by Implicit Writers. . . . . . . . . . . . . . . . . 6–11

6.6 PALshadow Registers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–11

6.7 PALcode Emulation of the FPCR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–11

6.7.1 Status Flags. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–12

6.7.2 MF_FPCR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–12

6.7.3 MT_FPCR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–12

6.8 PALcode Entry Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–12

6.8.1 CALL_PAL Entry Points. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–12

6.8.2 PALcode Exception Entry Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–13

6.9 Translation Buffer (TB) Fill Flows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–14

6.9.1 DTB Fill . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–14

6.9.2 ITB Fill . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–16

6.10 Performance Counter Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–17

6.10.1 General Precautions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–18

6.10.2 Aggregate Mode Programming Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–18

6.10.2.1 Aggregate Mode Precautions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–18

6.10.2.2 Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–19

6.10.2.3 Aggregate Counting Mode Description. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–20

6.10.2.3.1 Cycle counting. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–20

6.10.2.3.2 Retired instructions cycles. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–20

6.10.2.3.3 Bcache miss or long latency probes cycles. . . . . . . . . . . . . . . . . . . . . . . . . . . 6–20

6.10.2.3.4 Mbox replay traps cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–20

6.10.2.4 Counter Modes for Aggregate Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–20

6.10.3 ProfileMe Mode Programming Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–20

6.10.3.1 ProfileMe Mode Precautions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–20

6.10.3.2 Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–21

6.10.3.3 ProfileMe Counting Mode Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–23

6.10.3.3.1 Cycle counting. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–23

6.10.3.3.2 Inum retire delay cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–23

6.10.3.3.3 Retired instructions cycles. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–23

6.10.3.3.4 Bcache miss or long latency probes cycles. . . . . . . . . . . . . . . . . . . . . . . . . . . 6–23

6.10.3.3.5 Mbox replay traps cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–23

6.10.3.4 Counter Modes for ProfileMe Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–24

7 Initialization and Configuration

7.1 Power-Up Reset Flow and the Reset_L and DCOK_H Pins. . . . . . . . . . . . . . . . . . . . . . . . . 7–1

7.1.1 Power Sequencing and Reset State for Signal Pins . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–3

7.1.2 Clock Forwarding and System Clock Ratio Configuration . . . . . . . . . . . . . . . . . . . . . . . 7–4

7.1.3 PLL Ramp Up. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–6

7.1.4 BiST and SROM Load and the TestStat_H Pin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–6

7.1.5 Clock Forward Reset and System Interface Initialization. . . . . . . . . . . . . . . . . . . . . . . . 7–7

7.2 Fault Reset Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–8

7.3 Energy Star Certification and Sleep Mode Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–9

7.4 Warm Reset Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–11

7.5 Array Initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–12

7.6 Initialization Mode Processing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–12

7.7 External Interface Initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–14

7.8 Internal Processor Register Power-Up Reset State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–14

7.9 IEEE 1149.1 Test Port Reset. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–16

7.10 Reset State Machine. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–16

7.11 Phase-Lock Loop (PLL) Functional Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–19

7.11.1 Differential Reference Clocks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–19

7.11.2 PLL Output Clocks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–19

7.11.2.1 GCLK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–19

7.11.2.2 Differential 21264/EV67 Clocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–19

7.11.2.3 Nominal Operating Frequency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–19

7.11.2.4 Power-Up/Reset Clocking. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–20

8 Error Detection and Error Handling

8.1 Data Error Correction Code. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–2

8.2 Icache Data or Tag Parity Error. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–2

8.3 Dcache Tag Parity Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–2

8.4 Dcache Data Single-Bit Correctable ECC Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–3

8.4.1 Load Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–3

8.4.2 Store Instruction (Quadword or Smaller) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–4

8.4.3 Dcache Victim Extracts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–4

8.5 Dcache Store Second Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–4

8.6 Dcache Duplicate Tag Parity Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–4

8.7 Bcache Tag Parity Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–5

8.8 Bcache Data Single-Bit Correctable ECC Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–5

8.8.1 Icache Fill from Bcache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–5

8.8.2 Dcache Fill from Bcache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–6

8.8.3 Bcache Victim Read. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–6

8.8.3.1 Bcache Victim Read During a Dcache/Bcache Miss . . . . . . . . . . . . . . . . . . . . . . . 8–6

8.8.3.2 Bcache Victim Read During an ECB Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . 8–7

8.9 Memory/System Port Single-Bit Data Correctable ECC Error. . . . . . . . . . . . . . . . . . . . . . . . 8–7

8.9.1 Icache Fill from Memory. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–7

8.9.2 Dcache Fill from Memory. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–7

8.10 Bcache Data Single-Bit Correctable ECC Error on a Probe . . . . . . . . . . . . . . . . . . . . . . . . . 8–8

8.11 Double-Bit Fill Errors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–9

8.12 Error Case Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–9

9 Electrical Data

9.1 Electrical Characteristics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9–1

9.2 DC Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9–2

9.3 Power Supply Sequencing and Avoiding Potential Failure Mechanisms . . . . . . . . . . . . . . . 9–5

9.4 AC Characteristics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9–6

10 Thermal Management

10.1 Operating Temperature. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10–1

10.2 Heat Sink Specifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10–3

10.3 Thermal Design Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10–7

11 Testability and Diagnostics

11.1 Test Pins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11–1

11.2 SROM/Serial Diagnostic Terminal Port . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11–2

11.2.1 SROM Load Operation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11–2

11.2.2 Serial Terminal Port . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11–2

11.3 IEEE 1149.1 Port. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11–3

11.4 TestStat_H Pin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11–4

11.5 Power-Up Self-Test and Initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11–5

11.5.1 Built-in Self-Test. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11–5

11.5.2 SROM Initialization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11–5

11.5.2.1 Serial Instruction Cache Load Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11–6

11.6 Notes on IEEE 1149.1 Operation and Compliance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11–7

A Alpha Instruction Set

A.1 Alpha Instruction Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A–1

A.2 Reserved Opcodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A–8

A.2.1 Opcodes Reserved for Compaq. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A–8

A.2.2 Opcodes Reserved for PALcode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A–9

A.3 IEEE Floating-Point Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A–9

A.4 VAX Floating-Point Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A–11

A.5 Independent Floating-Point Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A–11

A.6 Opcode Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A–12

A.7 Required PALcode Function Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A–13

A.8 IEEE Floating-Point Conformance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A–14

B 21264/EV67 Boundary-Scan Register

B.1 Boundary-Scan Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B–1

B.1.1 BSDL Description of the Alpha 21264/EV67 Boundary-Scan Register . . . . . . . . . . . . . B–1

C Serial Icache Load Predecode Values

D PALcode Restrictions and Guidelines

D.1 Restriction 1 : Reset Sequence Required by Retire Logic and Mapper. . . . . . . . . . . . . . . D–1

D.2 Restriction 2 : No Multiple Writers to IPRs in Same Scoreboard Group . . . . . . . . . . . . . . . D–8

D.3 Restriction 4 : No Writers and Readers to IPRs in Same Scoreboard Group . . . . . . . . . . D–8

D.4 Guideline 6 : Avoid Consecutive Read-Modify-Write-Read-Modify-Write. . . . . . . . . . . . D–9

D.5 Restriction 7 : Replay Trap, Interrupt Code Sequence, and STF/ITOF . . . . . . . . . . . . . . . D–9

D.6 Restriction 9 : PALmode Istream Address Ranges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D–10

D.7 Restriction 10: Duplicate IPR Mode Bits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D–10

D.8 Restriction 11: Ibox IPR Update Synchronization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D–11

D.9 Restriction 12: MFPR of Implicitly-Written IPRs EXC_ADDR, IVA_FORM, and EXC_SUM . . . . . . D–11

D.10 Restriction 13 : DTB Fill Flow Collision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D–11

D.11 Restriction 14 : HW_RET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D–11

D.12 Guideline 16 : JSR-BAD VA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D–12

D.13 Restriction 17: MTPR to DTB_TAG0/DTB_PTE0/DTB_TAG1/DTB_PTE1 . . . . . . . . . . . . . D–12

D.14 Restriction 18: No FP Operates, FP Conditional Branches, FTOI, or STF in Same Fetch Block as HW_MTPR . . . . . . D–12

D.15 Restriction 19: HW_RET/STALL After Updating the FPCR by way of MT_FPCR in PALmode . . . . . . D–12

D.16 Guideline 20 : I_CTL[SBE] Stream Buffer Enable. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D–12

D.17 Restriction 21: HW_RET/STALL After HW_MTPR ASN0/ASN1. . . . . . . . . . . . . . . . . . . . . . D–12

D.18 Restriction 22: HW_RET/STALL After HW_MTPR IS0/IS1. . . . . . . . . . . . . . . . . . . . . . . . . . D–13

D.19 Restriction 23: HW_ST/P/CONDITIONAL Does Not Clear the Lock Flag. . . . . . . . . . . . . . . D–13

D.20 Restriction 24: HW_RET/STALL After HW_MTPR IC_FLUSH, IC_FLUSH_ASM, CLEAR_MAP D–

D.21 Restriction 25: HW_MTPR ITB_IA After Reset. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D–14

D.22 Guideline 26: Conditional Branches in PALcode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D–14

D.23 Restriction 27: Reset of ‘Force-Fail Lock Flag’ State in PALcode. . . . . . . . . . . . . . . . . . . . . D–15

D.24 Restriction 28: Enforce Ordering Between IPRs Implicitly Written by Loads and Subsequent Loads . . . . . . D–15

D.25 Guideline 29 : JSR, JMP, RET, and JSR_COR in PALcode. . . . . . . . . . . . . . . . . . . . . . . . . D–15

D.26 Restriction 30 : HW_MTPR and HW_MFPR to the Cbox CSR. . . . . . . . . . . . . . . . . . . . . . . D–15

D.27 Restriction 31 : I_CTL[VA_48] Update . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D–17

D.28 Restriction 32 : PCTR_CTL Update . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D–17

D.29 Restriction 33 : HW_LD Physical/Lock Use. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D–18

D.30 Restriction 34 : Writing Multiple ITB Entries in the Same PALcode Flow . . . . . . . . . . . . . . . D–18

D.31 Guideline 35 : HW_INT_CLR Update . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D–18

D.32 Restriction 36 : Updating I_CTL[SDE]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D–18

D.33 Restriction 37 : Updating VA_CTL[VA_48] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D–18

D.34 Restriction 38 : Updating PCTR_CTL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D–18

D.35 Guideline 39: Writing Multiple DTB Entries in the Same PAL Flow. . . . . . . . . . . . . . . . . . . . D–19

D.36 Restriction 40: Scrubbing a Single-Bit Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D–19

D.37 Restriction 41: MTPR ITB_TAG, MTPR ITB_PTE Must Be in the Same Fetch Block. . . . . D–21

D.38 Restriction 42: Updating VA_CTL, CC_CTL, or CC IPRs. . . . . . . . . . . . . . . . . . . . . . . . . . . D–21

D.39 Restriction 43: No Trappable Instructions Along with HW_MTPR. . . . . . . . . . . . . . . . . . . . . D–21

D.40 Restriction 44: Not Applicable to the 21264/EV67 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D–21

D.41 Restriction 45: No HW_JMP or JMP Instructions in PALcode . . . . . . . . . . . . . . . . . . . . . . . D–21

D.42 Restriction 46: Avoiding Livelocks in Speculative Load CRD Handlers . . . . . . . . . . . . . . . D–22

D.43 Restriction 47: Cache Eviction for Single-Bit Cache Errors . . . . . . . . . . . . . . . . . . . . . . . . . D–22

D.44 Restriction 48: MB Bracketing of Dcache Writes to Force Bad Data ECC and Force Bad Tag Parity . . . . . . D–24

E 21264/EV67-to-Bcache Pin Interconnections

E.1 Forwarding Clock Pin Groupings. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E–1

E.2 Late-Write Non-Bursting SSRAMs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E–2

E.3 Dual-Data Rate SSRAMs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E–3

Glossary

Index

Figures

2–1 21264/EV67 Block Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–3

2–2 Branch Predictor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–4

2–3 Local Predictor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–4

2–4 Global Predictor. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–5

2–5 Choice Predictor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–5

2–6 Integer Execution Unit—Clusters 0 and 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–9

2–7 Floating-Point Execution Units . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–10

2–8 Pipeline Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–14

2–9 Pipeline Timing for Integer Load Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–25

2–10 Pipeline Timing for Floating-Point Load Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–26

2–11 Floating-Point Control Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–36

2–12 Typical Uniprocessor Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–39

2–13 Typical Multiprocessor Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–40

3–1 21264/EV67 Microprocessor Logic Symbol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3–2

3–2 Package Dimensions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3–17

3–3 21264/EV67 Top View (Pin Down) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3–18

3–4 21264/EV67 Bottom View (Pin Up). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3–19

4–1 21264/EV67 System and Bcache Interfaces. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–3

4–2 21264/EV67 Bcache Interface Signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–7

4–3 Cache Subset Hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–9

4–4 System Interface Signals. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–17

4–5 Fast Transfer Timing Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–32

4–6 SysFillValid_L Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–36

5–1 Cycle Counter Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–3

5–2 Cycle Counter Control Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–3

5–3 Virtual Address Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–4

5–4 Virtual Address Control Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–4

5–5 Virtual Address Format Register (VA_48 = 0, VA_FORM_32 = 0) . . . . . . . . . . . . . . . . . . . . 5–5

5–6 Virtual Address Format Register (VA_48 = 1, VA_FORM_32 = 0) . . . . . . . . . . . . . . . . . . . . 5–6

5–7 Virtual Address Format Register (VA_48 = 0, VA_FORM_32 = 1) . . . . . . . . . . . . . . . . . . . . 5–6

5–8 ITB Tag Array Write Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–6

5–9 ITB PTE Array Write Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–7

5–10 ITB Invalidate Single Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–7

5–11 ProfileMe PC Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–8

5–12 Exception Address Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–8

5–13 Instruction Virtual Address Format Register (VA_48 = 0, VA_FORM_32 = 0) . . . . . . . . . . . 5–9

5–14 Instruction Virtual Address Format Register (VA_48 = 1, VA_FORM_32 = 0) . . . . . . . . . . . 5–9

5–15 Instruction Virtual Address Format Register (VA_48 = 0, VA_FORM_32 = 1) . . . . . . . . . . . 5–9

5–16 Interrupt Enable and Current Processor Mode Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–10

5–17 Software Interrupt Request Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–11

5–18 Interrupt Summary Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–11

5–19 Hardware Interrupt Clear Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–12

5–20 Exception Summary Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–14

5–21 PAL Base Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–15

5–22 Ibox Control Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–16

5–23 Ibox Status Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–19

5–24 Process Context Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–22

5–25 Performance Counter Control Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–23

5–26 DTB Tag Array Write Registers 0 and 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–25

5–27 DTB PTE Array Write Registers 0 and 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–26

5–28 DTB Alternate Processor Mode Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–26

5–29 Dstream Translation Buffer Invalidate Single Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–27

5–30 Dstream Translation Buffer Address Space Number Registers 0 and 1. . . . . . . . . . . . . . . . 5–28

5–31 Memory Management Status Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–28

5–32 Mbox Control Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–29

5–33 Dcache Control Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–31

5–34 Dcache Status Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–32

5–35 Cbox Data Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–33

5–36 Cbox Shift Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–33

5–37 WRITE_MANY Chain Write Transaction Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–39

6–1 HW_LD Instruction Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–4

6–2 HW_ST Instruction Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–4

6–3 HW_RET Instruction Format. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–6

6–4 HW_MFPR and HW_MTPR Instructions Format. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–6

6–5 Single-Miss DTB Instructions Flow Example. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–14

6–6 ITB Miss Instructions Flow Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–16

7–1 Power-Up Timing Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–3

7–2 Fault Reset Sequence of Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–9

7–3 Sleep Mode Sequence of Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–11

7–4 Example for Initializing Bcache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–13

7–5 21264/EV67 Reset State Machine State Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–17

10–1 Type 1 Heat Sink. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10–4

10–2 Type 2 Heat Sink. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10–5

10–3 Type 3 Heat Sink. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10–6

11–1 TAP Controller State Machine. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11–4

11–2 TestStat_H Pin Timing During Power-Up Built-In Self-Test (BiST) . . . . . . . . . . . . . . . . . . . 11–5

11–3 TestStat_H Pin Timing During Built-In Self-Initialization (BiSI) . . . . . . . . . . . . . . . . . . . . . . . 11–5

11–4 SROM Content Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11–6

Tables

1–1 Integer Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1–2

2–1 Pipeline Abort Delay (GCLK Cycles). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–16

2–2 Instruction Name, Pipeline, and Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–17

2–3 Instruction Group Definitions and Pipeline Unit. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–18

2–4 Instruction Class Latency in Cycles. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–20

2–5 Minimum Retire Latencies for Instruction Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–21

2–6 Instructions Retired Without Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–23

2–7 Rules for I/O Address Space Load Instruction Data Merging . . . . . . . . . . . . . . . . . . . . . . . . 2–28

2–8 Rules for I/O Address Space Store Instruction Data Merging. . . . . . . . . . . . . . . . . . . . . . . . 2–29

2–9 MAF Merging Rules. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–30

2–10 Memory Reference Ordering. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–31

2–11 I/O Reference Ordering. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–31

2–12 TB Fill Flow Example Sequence 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–34

2–13 TB Fill Flow Example Sequence 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–35

2–14 Floating-Point Control Register Fields. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–36

2–15 21264/EV67 AMASK Values. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–38

2–16 AMASK Bit Assignments. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–38

3–1 Signal Pin Types Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3–3

3–2 21264/EV67 Signal Descriptions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3–3

3–3 21264/EV67 Signal Descriptions by Function. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3–6

3–4 Pin List Sorted by Signal Name. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3–8

3–5 Pin List Sorted by PGA Location. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3–12

3–6 Ground and Power (VSS and VDD) Pin List . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3–16

4–1 Translation of Internal References to External Interface Reference . . . . . . . . . . . . . . . . . . . 4–5

4–2 21264/EV67-Supported Cache Block States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–9

4–3 Cache Block State Transitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–10

4–4 System Responses to 21264/EV67 Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–10

4–5 System Responses to 21264/EV67 Commands and 21264/EV67 Reactions. . . . . . . . . . . . 4–11

4–6 System Port Pins. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–17

4–7 Programming Values for System Interface Clocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–18

4–8 Program Values for Data-Sample/Drive CSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–18

4–9 Forwarded Clocks and Frame Clock Ratio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–19

4–10 Bank Interleave on Cache Block Boundary Mode of Operation . . . . . . . . . . . . . . . . . . . . . . 4–19

4–11 Page Hit Mode of Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–20

4–12 21264/EV67-to-System Command Fields Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–20

4–13 Maximum Physical Address for Short Bus Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–21

4–14 21264/EV67-to-System Commands Descriptions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–21

4–15 Programming INVAL_TO_DIRTY_ENABLE[1:0]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–23

4–16 Programming SET_DIRTY_ENABLE[2:0]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–24

4–17 21264/EV67 ProbeResponse Command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–24

4–18 ProbeResponse Fields Descriptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–25

4–19 System-to-21264/EV67 Probe Commands. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–26

4–20 System-to-21264/EV67 Probe Commands Fields Descriptions . . . . . . . . . . . . . . . . . . . . . . 4–27

4–21 Data Movement Selection by Probe[4:3]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–27

4–22 Next Cache Block State Selection by Probe[2:0] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–27

4–23 Data Transfer Command Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–28

4–24 SysDc[4:0] Field Description. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–29

4–25 SYSCLK Cycles Between SysAddOut and SysData. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–32

4–26 Cbox CSR SYSDC_DELAY[4:0] Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–33

4–27 Four Timing Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–34

4–28 Data Wrapping Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–36

4–29 System Wrap and Deliver Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–37

4–30 Wrap Interleave Order. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–37

4–31 Wrap Order for Double-Pumped Data Transfers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–38

4–32 21264/EV67 Commands with NXM Addresses and System Response . . . . . . . . . . . . . . . . 4–39

4–33 21264/EV67 Response to System Probe and In-Flight Command Interaction . . . . . . . . . . . 4–41

4–34 Rules for System Control of Cache Status Update Order. . . . . . . . . . . . . . . . . . . . . . . . . . . 4–42

4–35 Range of Maximum Bcache Clock Ratios. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–43

4–36 Bcache Port Pins. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–43

4–37 BC_CPU_CLK_DELAY[1:0] Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–45

4–38 BC_CLK_DELAY[1:0] Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–45

4–39 Program Values to Set the Cache Clock Period (Single-Data) . . . . . . . . . . . . . . . . . . . . . . . 4–46

4–40 Program Values to Set the Cache Clock Period (Dual-Data Rate) . . . . . . . . . . . . . . . . . . . . 4–46

4–41 Data-Sample/Drive Cbox CSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–47

4–42 Programming the Bcache to Support Each Size of the Bcache . . . . . . . . . . . . . . . . . . . . . . 4–51

4–43 Programming the Bcache Control Pins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–52

4–44 Control Pin Assertion for RAM_TYPE A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–52

4–45 Control Pin Assertion for RAM_TYPE B . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–52

4–46 Control Pin Assertion for RAM_TYPE C. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–53

4–47 Control Pin Assertion for RAM_TYPE D. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–53

5–1 Internal Processor Registers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–1

5–2 Cycle Counter Control Register Fields Description. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–4

5–3 Virtual Address Control Register Fields Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–5

5–4 ProfileMe PC Fields Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–8

5–5 IER_CM Register Fields Description. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–10

5–6 Software Interrupt Request Register Fields Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–11

5–7 Interrupt Summary Register Fields Description. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–12

5–8 Hardware Interrupt Clear Register Fields Description. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–13

5–9 Exception Summary Register Fields Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–14

5–10 PAL Base Register Fields Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–15

5–11 Ibox Control Register Fields Description. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–16

5–12 Ibox Status Register Fields Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–19

5–13 IPR Index Bits and Register Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–21

5–14 Process Context Register Fields Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–22

5–15 Performance Counter Control Register Fields Description . . . . . . . . . . . . . . . . . . . . . . . . . . 5–24

5–16 Performance Counter Control Register Input Select Fields. . . . . . . . . . . . . . . . . . . . . . . . . . 5–25

5–17 DTB Alternate Processor Mode Register Fields Description. . . . . . . . . . . . . . . . . . . . . . . . . 5–27

5–18 Memory Management Status Register Fields Description . . . . . . . . . . . . . . . . . . . . . . . . . . 5–28

5–19 Mbox Control Register Fields Description. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–30

5–20 Dcache Control Register Fields Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–31

5–21 Dcache Status Register Fields Description. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–32

5–22 Cbox Data Register Fields Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–33

5–23 Cbox Shift Register Fields Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–33

5–24 Cbox WRITE_ONCE Chain Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–34

5–25 Cbox WRITE_MANY Chain Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–39

5–26 Cbox Read IPR Fields Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–41

6–1 Required PALcode Function Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–3

6–2 Opcodes Reserved for PALcode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–3

6–3 HW_LD Instruction Fields Descriptions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–4

6–4 HW_ST Instruction Fields Descriptions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–5

6–5 HW_RET Instruction Fields Descriptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–6

6–6 HW_MFPR and HW_MTPR Instructions Fields Descriptions. . . . . . . . . . . . . . . . . . . . . . . . 6–7

6–7 Paired Instruction Fetch Order . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–9

6–8 PALcode Exception Entry Locations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–13

6–9 IPRs Used for Performance Counter Support. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–18

6–10 Aggregate Mode Returned IPR Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–19

6–11 Aggregate Mode Performance Counter IPR Input Select Fields. . . . . . . . . . . . . . . . . . . . . . 6–20

6–12 CMOV Decomposed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–21

6–13 ProfileMe Mode Returned IPR Contents. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–22

6–14 ProfileMe Mode PCTR_CTL Input Select Fields. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–24

7–1 21264/EV67 Reset State Machine Major Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–1

7–2 Signal Pin Reset State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–3

7–3 Pin Signal Names and Initialization State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–5

7–4 Power-Up Flow Signals and Their Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–7

7–5 Effect on IPRs After Fault Reset. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–8

7–6 Effect on IPRs After Transition Through Sleep Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–10

7–7 Signals and Constraints for the Sleep Mode Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–11

7–8 Effect on IPRs After Warm Reset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–11

7–9 WRITE_MANY Chain CSR Values for Bcache Initialization . . . . . . . . . . . . . . . . . . . . . . . . . 7–12

7–10 Internal Processor Registers at Power-Up Reset State . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–14

7–11 21264/EV67 Reset State Machine State Descriptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–17

7–12 Differential Reference Clock Frequencies in Full-Speed Lock . . . . . . . . . . . . . . . . . . . . . . . 7–20

8–1 21264/EV67 Error Detection Mechanisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–1

8–2 64-Bit Data and Check Bit ECC Code. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–2

8–3 Error Case Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–9

9–1 Maximum Electrical Ratings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9–1

9–2 Signal Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9–2

9–3 VDD (I_DC_POWER) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9–3

9–4 Input DC Reference Pin (I_DC_REF) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9–3

9–5 Input Differential Amplifier Receiver (I_DA). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9–3

9–6 Input Differential Amplifier Clock Receiver (I_DA_CLK) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9–3

9–7 Pin Type: Open-Drain Output Driver (O_OD) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9–4

9–8 Bidirectional, Differential Amplifier Receiver, Open-Drain Output Driver (B_DA_OD) . . . . . 9–4

9–9 Pin Type: Open-Drain Driver for Test Pins (O_OD_TP) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9–4

9–10 Bidirectional, Differential Amplifier Receiver, Push-Pull Output Driver (B_DA_PP) . . . . . . . 9–4

9–11 Push-Pull Output Driver (O_PP) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9–5

9–12 Push-Pull Output Clock Driver (O_PP_CLK). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9–5

9–13 AC Specifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9–7

10–1 Operating Temperature at Heat Sink Center (Tc) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10–1

10–2 qca at Various Airflows for 21264/EV67 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10–2

10–3 Maximum Ta for 21264/EV67 @ 600 MHz and @ 2.0 V with Various Airflows . . . . . . . . . . 10–2

10–4 Maximum Ta for 21264/EV67 @ 667 MHz and @ 2.0 V with Various Airflows . . . . . . . . . . 10–2

10–5 Maximum Ta for 21264/EV67 @ 700 MHz and @ 2.0 V with Various Airflows . . . . . . . . . . 10–2

10–6 Maximum Ta for 21264/EV67 @ 733 MHz and @ 2.0 V with Various Airflows . . . . . . . . . . 10–2

10–7 Maximum Ta for 21264/EV67 @ 750 MHz and @ 2.0 V with Various Airflows . . . . . . . . . . 10–3

10–8 Maximum Ta for 21264/EV67 @ 833 MHz and @ 2.0 V with Various Airflows . . . . . . . . . . 10–3

11–1 Dedicated Test Port Pins. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11–1

11–2 IEEE 1149.1 Instructions and Opcodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11–3

11–3 Icache Bit Fields in an SROM Line . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11–7

A–1 Instruction Format and Opcode Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A–1

A–2 Architecture Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A–2

A–3 Opcodes Reserved for Compaq . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A–8

A–4 Opcodes Reserved for PALcode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A–9

A–5 IEEE Floating-Point Instruction Function Codes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A–9

A–6 VAX Floating-Point Instruction Function Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A–11

A–7 Independent Floating-Point Instruction Function Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . A–12

A–8 Opcode Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A–12

A–9 Key to Opcode Summary Used in Table A–8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A–13

A–10 Required PALcode Function Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A–13

A–11 Exceptional Input and Output Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A–15

E–1 Bcache Forwarding Clock Pin Groupings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E–1

E–2 Late-Write Non-Bursting SSRAMs Data Pin Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E–2

E–3 Late-Write Non-Bursting SSRAMs Tag Pin Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E–2

E–4 Dual-Data Rate SSRAM Data Pin Usage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E–3

E–5 Dual-Data Rate SSRAM Tag Pin Usage. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E–4

Preface

Audience

This manual is for system designers and programmers who use the Alpha 21264/EV67 microprocessor (referred to as the 21264/EV67).

Content

This manual contains the following chapters and appendixes:

Chapter 1, Introduction, introduces the 21264/EV67 and provides an overview of the Alpha architecture.

Chapter 2, Internal Architecture, describes the major hardware functions and the internal chip architecture. It describes performance measurement facilities, coding rules, and design examples.

Chapter 3, Hardware Interface, lists and describes the internal hardware interface signals, and provides mechanical data and packaging information, including signal pin lists.

Chapter 4, Cache and External Interfaces, describes the external bus functions and transactions, lists bus commands, and describes the clock functions.

Chapter 5, Internal Processor Registers, lists and describes the internal processor register set.

Chapter 6, Privileged Architecture Library Code, describes the privileged architecture library code (PALcode).

Chapter 7, Initialization and Configuration, describes the initialization and configuration sequence.

Chapter 8, Error Detection and Error Handling, describes error detection and error handling.

Chapter 9, Electrical Data, provides electrical data and describes signal integrity issues.

Chapter 10, Thermal Management, provides information about thermal management.

Chapter 11, Testability and Diagnostics, describes chip and system testability features.

Appendix A, Alpha Instruction Set, summarizes the Alpha instruction set.

Appendix B, 21264/EV67 Boundary-Scan Register, presents the BSDL description of the 21264/EV67 boundary-scan register.

Appendix C, Serial Icache Load Predecode Values, provides a pointer to the Alpha Motherboards Software Developer’s Kit (SDK), which contains this information.

Appendix D, PALcode Restrictions and Guidelines, lists restrictions and guidelines that must be adhered to when generating PALcode.

Appendix E, 21264/EV67-to-Bcache Pin Interconnections, provides the pin interface between the 21264/EV67 and Bcache SSRAMs.

The Glossary lists and defines terms associated with the 21264/EV67.

An Index is provided at the end of the document.

Documentation Included by Reference

The companion volume to this manual, the Alpha Architecture Handbook, Version 4, contains the instruction set architecture. You can access this document from the following website:

ftp.digital.com/pub/Digital/info/semiconductor/literature/dsc-library.html

Also available is the Alpha Architecture Reference Manual, Third Edition, which contains the complete architecture information. That manual is available at bookstores from the Digital Press as EQ-W938E-DP.

Terminology and Conventions

This section defines the abbreviations, terminology, and other conventions used throughout this document.

Abbreviations

• Binary Multiples

The abbreviations K, M, and G (kilo, mega, and giga) represent binary multiples and have the following values.

K = 2^10 (1024)
M = 2^20 (1,048,576)
G = 2^30 (1,073,741,824)

For example:

2KB = 2 kilobytes = 2 × 2^10 bytes
4MB = 4 megabytes = 4 × 2^20 bytes
8GB = 8 gigabytes = 8 × 2^30 bytes
2K pixels = 2 kilopixels = 2 × 2^10 pixels
4M pixels = 4 megapixels = 4 × 2^20 pixels

• Register Access

The abbreviations used to indicate the type of access to register fields and bits have the following definitions:

Abbreviation Meaning

IGN Ignore

Bits and fields specified are ignored on writes.

MBZ Must Be Zero

Software must never place a nonzero value in bits and fields specified as MBZ. A nonzero read produces an Illegal Operand exception. Also, MBZ fields are reserved for future use.

RAZ Read As Zero

Bits and fields return a zero when read.

RC Read Clears

Bits and fields are cleared when read. Unless otherwise specified, such bits cannot be written.

RES Reserved

Bits and fields are reserved by Compaq and should not be used; however, zeros can be written to reserved fields that cannot be masked.

RO Read Only

The value may be read by software. It is written by hardware. Software write operations are ignored.

RO,n Read Only, and takes the value n at power-on reset.

The value may be read by software. It is written by hardware. Software write operations are ignored.

RW Read/Write

Bits and fields can be read and written.

RW,n Read/Write, and takes the value n at power-on reset.

Bits and fields can be read and written.

W1C Write One to Clear

If read operations are allowed to the register, then the value may be read by software. If it is a write-only register, then a read operation by software returns an UNPREDICTABLE result. Software write operations of a 1 cause the bit to be cleared by hardware. Software write operations of a 0 do not modify the state of the bit.

W1S Write One to Set

If read operations are allowed to the register, then the value may be read by software. If it is a write-only register, then a read operation by software returns an UNPREDICTABLE result. Software write operations of a 1 cause the bit to be set by hardware. Software write operations of a 0 do not modify the state of the bit.

WO Write Only

Bits and fields can be written but not read.

WO,n Write Only, and takes the value n at power-on reset.

Bits and fields can be written but not read.
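The write-one-to-clear (W1C), write-one-to-set (W1S), and read-clears (RC) behaviors defined above can be modeled in a few lines. The following Python sketch is illustrative only: the function names and the example bit patterns are assumptions made for this illustration and are not part of the 21264/EV67 register definitions.

```python
def write_w1c(current, written):
    """W1C: every 1 written clears that bit; every 0 written leaves it unchanged."""
    return current & ~written


def write_w1s(current, written):
    """W1S: every 1 written sets that bit; every 0 written leaves it unchanged."""
    return current | written


def read_rc(current):
    """RC: return (value seen by software, state after the read)."""
    return current, 0


# Clearing bit 1 of a W1C field that currently holds 1011 (binary).
after_clear = write_w1c(0b1011, 0b0010)

# Setting bits 1 and 0 of a W1S field that currently holds 1000 (binary).
after_set = write_w1s(0b1000, 0b0011)
```

Writing a 0 to either kind of field is harmless, so software can target individual bits without first reading the register.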

• Sign extension

SEXT(x) means x is sign-extended to the required size.
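As an illustration of SEXT(x), the following Python sketch sign-extends a value of a given width to a larger width. The function name and its parameters (from_bits, to_bits) are hypothetical; the manual itself only states that x is extended "to the required size".

```python
def sext(x, from_bits, to_bits=64):
    """Sign-extend the low from_bits of x to to_bits (result is non-negative)."""
    x &= (1 << from_bits) - 1          # keep only the source field
    sign = 1 << (from_bits - 1)        # position of the sign bit
    extended = (x ^ sign) - sign       # negative if the sign bit was set
    return extended & ((1 << to_bits) - 1)


# An 8-bit value with its sign bit set extends with ones.
byte_extended = sext(0x80, 8)

# An 8-bit value with its sign bit clear extends with zeros.
small_extended = sext(0x7F, 8)
```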

Addresses

Unless otherwise noted, all addresses and offsets are hexadecimal.

Aligned and Unaligned

The terms aligned and naturally aligned are interchangeable and refer to data objects that are powers of two in size. An aligned datum of size 2^n is stored in memory at a byte address that is a multiple of 2^n; that is, one that has n low-order zeros. For example, an aligned 64-byte stack frame has a memory address that is a multiple of 64.

A datum of size 2^n is unaligned if it is stored in a byte address that is not a multiple of 2^n.
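The alignment rule can be checked by testing the n low-order address bits. The following Python sketch is a minimal illustration; the function name is_aligned and the sample addresses are assumptions chosen for the example.

```python
def is_aligned(address, size):
    """Return True if address is a multiple of size, where size is a power of two."""
    if size <= 0 or (size & (size - 1)) != 0:
        raise ValueError("size must be a power of two")
    # A multiple of 2^n has n low-order zero bits.
    return (address & (size - 1)) == 0


# A 64-byte stack frame at 0x1000 is aligned; a quadword at 0x1004 is not.
frame_ok = is_aligned(0x1000, 64)
quadword_ok = is_aligned(0x1004, 8)
```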

Bit Notation

Multiple-bit fields can include contiguous and noncontiguous bits contained in square brackets ([]). Multiple contiguous bits are indicated by a pair of numbers separated by a colon [:]. For example, [9:7,5,2:0] specifies bits 9, 8, 7, 5, 2, 1, and 0. Similarly, single bits are frequently indicated with square brackets. For example, [27] specifies bit 27. See also Field Notation.
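To make the bit notation concrete, the following Python sketch expands a bit specification into the individual bit numbers it names and builds the corresponding mask. The helper names expand_bits and bits_to_mask are hypothetical and exist only for this illustration.

```python
def expand_bits(spec):
    """Expand a specification such as '[9:7,5,2:0]' into a list of bit numbers."""
    bits = []
    for part in spec.strip().strip("[]").split(","):
        if ":" in part:
            high, low = (int(n) for n in part.split(":"))
            bits.extend(range(high, low - 1, -1))   # inclusive, high to low
        else:
            bits.append(int(part))
    return bits


def bits_to_mask(bits):
    """Build an integer mask with a 1 in each listed bit position."""
    mask = 0
    for bit in bits:
        mask |= 1 << bit
    return mask


field_bits = expand_bits("[9:7,5,2:0]")
field_mask = bits_to_mask(field_bits)
```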

Caution

Cautions indicate potential damage to equipment or loss of data.

Data Units

The following data unit terminology is used throughout this manual.

Term        Words   Bytes   Bits   Other
Byte        ½       1       8      -
Word        1       2       16     -
Longword    2       4       32     Dword
Quadword    4       8       64     2 longwords

Do Not Care (X)

A capital X represents any valid value.

External

Unless otherwise stated, external means not contained in the chip.

Field Notation

The names of single-bit and multiple-bit fields can be used rather than the actual bit numbers (see Bit Notation). When the field name is used, it is contained in square brackets ([]). For example, RegisterName[LowByte] specifies RegisterName[7:0].

Note

Notes emphasize particularly important information.

Numbering

All numbers are decimal or hexadecimal unless otherwise indicated. The prefix 0x indicates a hexadecimal number. For example, 19 is decimal, but 0x19 and 0x19A are hexadecimal (also see Addresses). Otherwise, the base is indicated by a subscript; for example, 100₂ is a binary number.

Ranges and Extents

Ranges are specified by a pair of numbers separated by two periods (..) and are inclusive. For example, a range of integers 0..4 includes the integers 0, 1, 2, 3, and 4.

Extents are specified by a pair of numbers in square brackets ([]) separated by a colon (:) and are inclusive. Bit fields are often specified as extents. For example, bits [7:3] specifies bits 7, 6, 5, 4, and 3.
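The inclusive nature of ranges and extents is easy to get wrong by one. The following Python sketch shows both: a range such as 0..4 and the extraction of an extent such as [7:3] from a register value. The function names and the sample register value are assumptions used only for illustration.

```python
def expand_range(spec):
    """Expand an inclusive range such as '0..4' into [0, 1, 2, 3, 4]."""
    first, last = (int(n) for n in spec.split(".."))
    return list(range(first, last + 1))


def extract_extent(value, high, low):
    """Return the bits [high:low] of value, inclusive, shifted down to bit 0."""
    if high < low:
        raise ValueError("high must be greater than or equal to low")
    width = high - low + 1             # [7:3] is 5 bits wide
    return (value >> low) & ((1 << width) - 1)


integers = expand_range("0..4")
field = extract_extent(0xF8, 7, 3)     # bits 7, 6, 5, 4, and 3
```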

The gray areas in register figures indicate reserved or unused bits and fields.

Bit ranges that are coupled with the field name specify the bits of the named field that are included in the register. The bit range may, but need not necessarily, correspond to the bit Extent in the register. See the explanation above Table 5–1 for more information.

Signal Names

The following examples describe signal-name conventions used in this document.

AlphaSignal[n:n]
Boldface, mixed-case type denotes signal names that are assigned internal and external to the 21264/EV67 (that is, the signal traverses a chip interface pin).

AlphaSignal_x[n:n]
When a signal has high and low assertion states, a lowercase italic x represents the assertion states. For example, SignalName_x[3:0] represents SignalName_H[3:0] and SignalName_L[3:0].

UNDEFINED

Operations specified as UNDEFINED may vary from moment to moment, implementation to implementation, and instruction to instruction within implementations. The operation may vary in effect from nothing to stopping system operation.

UNDEFINED operations may halt the processor or cause it to lose information. However, UNDEFINED operations must not cause the processor to hang, that is, reach an unhalted state from which there is no transition to a normal state in which the machine executes instructions.

UNPREDICTABLE

UNPREDICTABLE results or occurrences do not disrupt the basic operation of the processor; it continues to execute instructions in its normal manner. Further:

• Results or occurrences specified as UNPREDICTABLE may vary from moment to moment, implementation to implementation, and instruction to instruction within implementations. Software can never depend on results specified as UNPREDICTABLE.

• An UNPREDICTABLE result may acquire an arbitrary value subject to a few constraints. Such a result may be an arbitrary function of the input operands or of any state information that is accessible to the process in its current access mode. UNPREDICTABLE results may be unchanged from their previous values.

Operations that produce UNPREDICTABLE results may also produce exceptions.

• An occurrence specified as UNPREDICTABLE may happen or not based on an arbitrary choice function. The choice function is subject to the same constraints as are UNPREDICTABLE results and, in particular, must not constitute a security hole.

Specifically, UNPREDICTABLE results must not depend upon, or be a function of, the contents of memory locations or registers that are inaccessible to the current process in the current access mode.

Also, operations that may produce UNPREDICTABLE results must not:

– Write or modify the contents of memory locations or registers to which the current process in the current access mode does not have access, or

– Halt or hang the system or any of its components.

For example, a security hole would exist if some UNPREDICTABLE result depended on the value of a register in another process, on the contents of processor temporary registers left behind by some previously running process, or on a sequence of actions of different processes.

Introduction

This chapter provides a brief introduction to the Alpha architecture, Compaq’s RISC (reduced instruction set computing) architecture designed for high performance. The chapter then summarizes the specific features of the Alpha 21264/EV67 microprocessor (hereafter called the 21264/EV67) that implements the Alpha architecture. Appendix A provides a list of Alpha instructions.

The companion volume to this manual, the Alpha Architecture Handbook, Version 4, contains the instruction set architecture. Also available is the Alpha Architecture Reference Manual, Third Edition, which contains the complete architecture information.

1.1 The Architecture

The Alpha architecture is a 64-bit load and store RISC architecture designed with particular emphasis on speed, multiple instruction issue, multiple processors, and software migration from many operating systems.

All registers are 64 bits long and all operations are performed between 64-bit registers. All instructions are 32 bits long. Memory operations are either load or store operations. All data manipulation is done between registers.

The Alpha architecture supports the following data types:

• 8-, 16-, 32-, and 64-bit integers

• IEEE 32-bit and 64-bit floating-point formats

• VAX architecture 32-bit and 64-bit floating-point formats

In the Alpha architecture, instructions interact with each other only by one instruction writing to a register or memory location and another instruction reading from that register or memory location. This use of resources makes it easy to build implementations that issue multiple instructions every CPU cycle.

The 21264/EV67 uses a set of subroutines, called privileged architecture library code (PALcode), that is specific to a particular Alpha operating system implementation and hardware platform. These subroutines provide operating system primitives for context switching, interrupts, exceptions, and memory management. These subroutines can be invoked by hardware or CALL_PAL instructions. CALL_PAL instructions use the function field of the instruction to vector to a specified subroutine. PALcode is written in standard machine code with some implementation-specific extensions to provide direct access to low-level hardware functions. PALcode supports optimizations for multiple operating systems, flexible memory-management implementations, and multi-instruction atomic sequences.
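As a rough illustration of how the function field is carried in a CALL_PAL instruction, the following Python sketch splits a 32-bit instruction word into its opcode and function field. It assumes the PALcode instruction format described in the Alpha Architecture Handbook (opcode in bits [31:26], function in bits [25:0], CALL_PAL opcode 0x00); the helper name decode_call_pal is hypothetical, and the sketch does not model how the 21264/EV67 turns the function code into a PALcode entry address.

```python
CALL_PAL_OPCODE = 0x00                 # assumed from the Alpha instruction format


def decode_call_pal(instruction):
    """Return the function field of a CALL_PAL word, or None for other opcodes."""
    opcode = (instruction >> 26) & 0x3F          # bits [31:26]
    if opcode != CALL_PAL_OPCODE:
        return None
    return instruction & 0x03FFFFFF              # bits [25:0]


# A CALL_PAL word whose function field is 0x86.
function_code = decode_call_pal(0x00000086)
```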

The Alpha architecture performs byte shifting and masking with normal 64-bit, register-to-register instructions. The 21264/EV67 performs single-byte and single-word load and store instructions.
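The effect of register-to-register byte shifting and masking can be sketched in Python as follows. This is only a model of the idea: the helper names extract_byte and insert_byte are hypothetical and do not correspond one-to-one to Alpha instructions.

```python
MASK64 = (1 << 64) - 1                 # registers are 64 bits long


def extract_byte(register, byte_index):
    """Shift the selected byte down to bits [7:0] and mask off the rest."""
    shift = 8 * (byte_index & 7)
    return (register >> shift) & 0xFF


def insert_byte(register, byte_index, value):
    """Mask out the selected byte, then merge in the new value."""
    shift = 8 * (byte_index & 7)
    cleared = register & ~(0xFF << shift) & MASK64
    return cleared | ((value & 0xFF) << shift)


quadword = 0x1122334455667788
low_byte = extract_byte(quadword, 0)
patched = insert_byte(quadword, 1, 0xAB)
```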

1.1.1 Addressing

The basic addressable unit in the Alpha architecture is the 8-bit byte. The 21264/EV67 supports a 48-bit or 43-bit virtual address (selectable under IPR control).

Virtual addresses as seen by the program are translated into physical memory addresses by the memory-management mechanism. The 21264/EV67 supports a 44-bit physical address.
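The address widths above translate directly into address-space sizes. The following Python sketch computes them using the binary multiple G defined in Terminology and Conventions; the function name address_space_bytes is an assumption for this illustration.

```python
G = 2 ** 30                            # binary giga, as defined in this manual


def address_space_bytes(address_bits):
    """Number of distinct byte addresses for a given address width."""
    return 1 << address_bits


# Sizes of the 43-bit and 48-bit virtual and 44-bit physical address spaces, in GB.
virtual_43_gb = address_space_bytes(43) // G
virtual_48_gb = address_space_bytes(48) // G
physical_44_gb = address_space_bytes(44) // G
```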

1.1.2 Integer Data Types

The Alpha architecture supports the four integer data types listed in Table 1–1.

Table 1–1 Integer Data Types

Data Type   Description

Byte        A byte is 8 contiguous bits that start at an addressable byte boundary. A byte is an 8-bit value.

Word        A word is 2 contiguous bytes that start at an arbitrary byte boundary. A word is a 16-bit value.

Longword    A longword is 4 contiguous bytes that start at an arbitrary byte boundary. A longword is a 32-bit value.

Quadword    A quadword is 8 contiguous bytes that start at an arbitrary byte boundary.

Note: Alpha implementations may impose a significant performance penalty when accessing operands that are not naturally aligned. Refer to the Alpha Architecture Handbook, Version 4, for details.

1.1.3 Floating-Point Data Types

The 21264/EV67 supports the following floating-point data types:

• Longword integer format in floating-point unit

• Quadword integer format in floating-point unit

• IEEE floating-point formats

– S_floating

– T_floating

• VAX floating-point formats

– F_floating

– G_floating

– D_floating (limited support)

1.2 21264/EV67 Microprocessor Features

The 21264/EV67 microprocessor is a superscalar pipelined processor. It is packaged in a 587-pin PGA carrier and has removable application-specific heat sinks. A number of configuration options allow its use in a range of system designs ranging from extremely simple uniprocessor systems with minimum component count to high-performance multiprocessor systems with very high cache and memory bandwidth.

The 21264/EV67 can issue four Alpha instructions in a single cycle, thereby minimizing the average cycles per instruction (CPI). A number of low-latency and/or high-throughput features in the instruction issue unit and the onchip components of the memory subsystem further reduce the average CPI.
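The relationship between issue width, CPI, and peak instruction rate is simple arithmetic, sketched below in Python. The constant and function names are assumptions for this illustration, and the 600 MHz figure is just one example clock frequency; sustained rates depend on the workload and are lower than this bound.

```python
ISSUE_WIDTH = 4                        # instructions issued per CPU cycle, at most


def peak_instructions_per_second(clock_hz):
    """Upper bound on the instruction rate: issue width times clock frequency."""
    return ISSUE_WIDTH * clock_hz


def minimum_cpi():
    """Lowest possible average cycles per instruction at full issue width."""
    return 1 / ISSUE_WIDTH


peak_at_600_mhz = peak_instructions_per_second(600_000_000)
best_cpi = minimum_cpi()
```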

The 21264/EV67 and associated PALcode implement IEEE single-precision and double-precision, VAX F_floating and G_floating data types, and support longword (32-bit) and quadword (64-bit) integers. Byte (8-bit) and word (16-bit) support is provided by byte-manipulation instructions. Limited hardware support is provided for the VAX D_floating data type.

Other 21264/EV67 features include:

• The ability to issue up to four instructions during each CPU clock cycle.

• A peak instruction execution rate of four times the CPU clock frequency.

• An onchip, demand-paged memory-management unit with translation buffer, which,

when used with PALcode, can implement a variety of page tabl e s tructures and translation algorithms. The uni t consists of a 128-entry , fully-associative data translation buffer (DTB) and a 128- entry, fully-associative inst ruction translat ion buf fer (ITB), with each entry able to map a single 8KB page or a group of 8, 64, or 512 8KB pages. The allocati on scheme f or t he ITB a nd DTB is r ound-r obin. Th e siz e of e ach

translation buffer entry’s group is specified by hint bits stored in the entry. The DTB and ITB implement 8-bit address space numbers (ASN), MAX_ASN=255.

• Two onchip, high-throughput pipelined floating-point units, capable of executing

both VAX and IEEE floating-point data types.

• An onchip, 64KB virtually-addressed instruction cache with 8-bit ASNs

(MAX_ASN=255).

• An onchip, virtually-indexed, physically-tagged dual-read-ported, 64KB data

cache.

• Supports a 48-bit or 43-bit virtual address (program selectable).

• Supports a 44-bit physical address.

• An onchip I/O write buff er with four 64-byte entries for I/O write transactions.

• An onchip, 8-entry victim data buffer.

• An onchip, 32-entry load queue.

• An onchip, 32-entry store queue.

• An onchip, 8-entry miss address file for cache fill requests and I/O read transactions.

• An onchip, 8-entry probe queue, holding pending system port probe commands.


• An onchip, duplicate tag array used to maintain level 2 cache coherency.

• A 64-bit data bus with onchip parity and error correction code (ECC) support.

• Support for an external second-level (Bcache) cache. The size and some timing parameters of the Bcache are programmable.

• An internal clock generator providing a high-speed clock used by the 21264/EV67, and two clocks for use by the CPU module.

• Onchip performance counters to measure and analyze CPU and system performance.

• Chip and module level test support, including an instruction cache test interface to support chip and module level testing.

• A 2.0-V external interface.

Refer to Chapter 9 for 21264/EV67 dc and ac electrical characteristics. Refer to the Alpha Architecture Handbook, Version 4, Appendix E, for waivers and any other implementation-dependent information.


Internal Architecture

This chapter provides both an overview of the 21264/EV67 microarchitecture and a system designer’s view of the 21264/EV67 implementation of the Alpha architecture. The combination of the 21264/EV67 microarchitecture and privileged architecture library code (PALcode) defines the chip’s implementation of the Alpha architecture. If a certain piece of hardware seems to be “architecturally incomplete,” the missing functionality is implemented in PALcode. Chapter 6 provides more information on PALcode.

This chapter describes the major functional hardware units and is not intended to be a detailed hardware description of the chip. It is organized as follows:

• 21264/EV67 microarchitecture

• Pipeline organization

• Instruction issue and retire rules

• Load instructions to R31/F31 (software-directed instruction prefetch)

• Special cases of Alpha instruction execution

• Memory and I/O address space

• Miss address file (MAF) and load-merging rules

• Instruction ordering

• Replay traps

• I/O write buffer and the WMB instruction

• Performance measurement support

• Floating-point control register

• AMASK and IMPLVER instruction values

• Design examples

2.1 21264/EV67 Microarchitecture

The 21264/EV67 microprocessor is a high-performance third-generation implementation of the Compaq Alpha architecture. The 21264/EV67 consists of the following sections, as shown in Figure 2–1:

• Instruction fetch, issue, and retire unit (Ibox)

• Integer execution unit (Ebox)


• Floating-point execution unit (Fbox)

• Onchip caches (Icache and Dcache)

• Memory reference unit (Mbox)

• External cache and system interface unit (Cbox)

• Pipeline operation sequence

2.1.1 Instruction Fetch, Issue, and Retire Unit

The instruction fetch, issue, and retire unit (Ibox) consists of the following subsections:

• Virtual program counter logic

• Branch predictor

• Instruction-stream translation buffer (ITB)

• Instruction fetch logic

• Register rename maps

• Integer and floating-point issue queues

• Exception and interrupt logic

• Retire logic

2.1.1.1 Virtual Program Counter Logic

The virtual program counter (VPC) logic maintains the virtual addresses for instructions that are in flight. There can be up to 80 instructions, in 20 successive fetch slots, in flight between the register rename mappers and the end of the pipeline. The VPC logic contains a 20-entry table to store these fetched VPC addresses.


Figure 2–1 21264/EV67 Block Diagram

[Block diagram showing the Ibox, Ebox, Fbox, Mbox, and Cbox, the instruction cache, and the dual-ported data cache, together with their issue queues and register files.]

2.1.1.2 Branch Predictor

The branch predictor is composed of three units: the local, global, and choice predictors. Figure 2–2 shows how the branch predictor generates the predicted branch address.


Figure 2–2 Branch Predictor

[Diagram: the local predictor, global predictor, and choice predictor combine to produce the predicted branch address.]

Local Predictor

The local predictor uses a 2-level table that holds the history of individual branches. The 2-level table design approaches the prediction accuracy of a larger single-level table while requiring fewer total bits of storage. Figure 2–3 shows how the local predictor generates a prediction. Bits [11:2] of the VPC of the current branch are used as the index to a 1K entry table in which each entry is a 10-bit value. This 10-bit value is used as the index to a 1K entry table of 3-bit saturating counters. The value of the saturating counter determines the prediction, taken/not-taken, of the current branch.
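The two-level lookup can be sketched in a few lines of Python. This is an illustrative model only, not the hardware design: the class and method names are hypothetical, and both the taken threshold (a counter value of 4 or more) and the shift-in history update are assumptions that the text does not specify.

```python
class LocalPredictor:
    """Sketch of a two-level local predictor: 1K x 10-bit history, 1K x 3-bit counters."""

    def __init__(self):
        self.history = [0] * 1024    # one 10-bit history pattern per table entry
        self.counters = [0] * 1024   # 3-bit saturating counters, range 0..7

    @staticmethod
    def _index(vpc):
        # VPC[11:2] selects one of the 1K history entries.
        return (vpc >> 2) & 0x3FF

    def predict(self, vpc):
        pattern = self.history[self._index(vpc)]
        # Assumption: the upper half of the counter range means "taken".
        return self.counters[pattern] >= 4

    def update(self, vpc, taken):
        i = self._index(vpc)
        pattern = self.history[i]
        c = self.counters[pattern]
        self.counters[pattern] = min(c + 1, 7) if taken else max(c - 1, 0)
        # Assumption: the newest outcome is shifted into the low bit of the history.
        self.history[i] = ((pattern << 1) | int(taken)) & 0x3FF
```

In this model, a branch that is always taken first fills its 10-bit history with ones and then drives the counter selected by that pattern toward saturation.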

Figure 2–3 Local Predictor

[Diagram: VPC[11:2] indexes the local history table (1K x 10), whose output indexes the local predictor (1K x 3) to produce the local branch prediction.]

Global Predictor

The global predictor is indexed by a global history of all recent branches. The global predictor correlates the local history of the current branch with all recent branches. Figure 2–4 shows how the global predictor generates a prediction. The global path history consists of the taken/not-taken state of the 12 most-recent branches. These 12 states are used to form an index into a 4K entry table of 2-bit saturating counters. The value of the saturating counter determines the prediction, taken/not-taken, of the current branch.
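A minimal Python sketch of the global lookup follows. It is a model under stated assumptions rather than the actual circuit: the names are hypothetical, a counter value of 2 or more is assumed to mean taken, and the newest outcome is assumed to enter the history at the low bit.

```python
class GlobalPredictor:
    """Sketch of a global predictor: 12-bit path history, 4K x 2-bit counters."""

    def __init__(self):
        self.path_history = 0         # taken/not-taken state of the 12 most recent branches
        self.counters = [0] * 4096    # 2-bit saturating counters, range 0..3

    def predict(self):
        # Assumption: the upper half of the counter range means "taken".
        return self.counters[self.path_history] >= 2

    def update(self, taken):
        c = self.counters[self.path_history]
        self.counters[self.path_history] = min(c + 1, 3) if taken else max(c - 1, 0)
        # Assumption: the newest outcome is shifted into the low bit.
        self.path_history = ((self.path_history << 1) | int(taken)) & 0xFFF
```

Because the index is the outcome pattern of recent branches, a short repeating pattern such as taken, taken, not-taken maps each phase of the pattern to its own counter, so the model learns the whole sequence.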


Figure 2–4 Global Predictor

[Diagram: the global path history indexes the global predictor (4K x 2) to produce the global branch prediction.]

Choice Predictor

The choice predictor monitors the history of the local and global predictors and chooses the best of the two predictors for a particular branch. Figure 2–5 shows how the choice predictor generates its choice of the result of the local or global prediction. The 12-bit global path history (see Figure 2–4) is used to index a 4K entry table of 2-bit saturating counters. The value of the saturating counter determines the choice between the outputs of the local and global predictors.
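The selection step might be modeled as follows. This is a sketch, not the documented design: the names are hypothetical, and two policies are assumptions borrowed from common tournament-predictor practice, namely that a counter value of 2 or more selects the global predictor and that the counter is trained only when the two predictors disagree.

```python
class ChoicePredictor:
    """Sketch of a choice predictor: 4K x 2-bit counters indexed by global path history."""

    def __init__(self):
        self.counters = [0] * 4096    # 2-bit saturating counters, range 0..3

    def choose(self, path_history, local_pred, global_pred):
        # Assumption: the upper half of the counter range favors the global predictor.
        use_global = self.counters[path_history & 0xFFF] >= 2
        return global_pred if use_global else local_pred

    def update(self, path_history, local_pred, global_pred, taken):
        # Assumption: train only when the two predictors disagree.
        if local_pred == global_pred:
            return
        i = path_history & 0xFFF
        c = self.counters[i]
        if global_pred == taken:
            self.counters[i] = min(c + 1, 3)   # global was right
        else:
            self.counters[i] = max(c - 1, 0)   # local was right
```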


Figure 2–5 Choice Predictor

[Diagram: the global path history indexes the choice predictor (4K x 2) to produce the choice prediction.]

2.1.1.3 Instruction-Stream Translation Buffer

The Ibox includes a 128-entry, fully-associative instruction-stream translation buffer (ITB) that is used to store recently used instruction-stream (Istream) address translations and page protection information. Each of the entries in the ITB can map 1, 8, 64, or 512 contiguous 8KB pages. The allocation scheme is round-robin.

The ITB supports an 8-bit ASN and contains an ASM bit. The Icache is virtually addressed and contains the access-check information, so the ITB is accessed only for Istream references that miss in the Icache.

Istream transactions to I/O address space are UNDEFINED.
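To make the 1, 8, 64, or 512 page groupings concrete, the following Python sketch computes how much address space one ITB entry covers. The function names are hypothetical. The 2-bit hint encoding (0 through 3 selecting 8 to the power of the hint pages) and the natural alignment of each group are assumptions here; this section states only that hint bits in the entry select the group size.

```python
PAGE_SIZE = 8 * 1024   # 8KB pages


def mapped_bytes(hint):
    """Bytes mapped by one translation buffer entry for a hint value 0..3 (assumed encoding)."""
    if hint not in range(4):
        raise ValueError("hint must be 0..3")
    return PAGE_SIZE * (8 ** hint)   # 1, 8, 64, or 512 pages


def same_entry(va_a, va_b, hint):
    """True if both addresses fall in the same group (assumes naturally aligned groups)."""
    size = mapped_bytes(hint)
    return va_a // size == va_b // size
```

With the largest grouping, a single entry covers 512 x 8KB = 4MB, so a 128-entry buffer could in principle span 512MB of contiguous mappings.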


2.1.1.4 Instruction Fetch Logic

The instruction prefetcher (predecode) reads an octaword, containing up to four naturally aligned instructions per cycle, from the Icache. Branch prediction and line prediction bits accompany the four instructions. The branch prediction scheme operates most efficiently when only one branch instruction is contained among the four fetched instructions. The line prediction scheme attempts to predict the Icache line that the branch predictor will generate, and is described in Section 2.2.

An entry from the subroutine return prediction stack, together with set prediction bits for use by the Icache stream controller, are fetched along with the octaword. The Icache stream controller generates fetch requests for additional Icache lines and stores the Istream data in the Icache. There is no separate buffer to hold Istream requests.

2.1.1.5 Register Rename Maps

The instruction prefetcher forwards instructions to the integer and floating-point register rename maps. The rename maps perform the two functions listed here:

• Eliminate register write-after-read (WAR) and write-after-write (WAW) data dependencies while preserving true read-after-write (RAW) data dependencies, in order to allow instructions to be dynamically rescheduled.

• Provide a means of speculatively executing instructions before the control flow previous to those instructions is resolved. Both exceptions and branch mispredictions represent deviations from the control flow predicted by the instruction prefetcher.

The map logic translates each instruction’s operand register specifiers from the virtual register numbers in the instruction to the physical register numbers that hold the corresponding architecturally-correct values. The map logic also renames each instruction’s destination register specifier from the virtual number in the instruction to a physical register number chosen from a list of free physical registers, and updates the register maps.

The map logic can process four instructions per cycle. It does not return the physical register, which holds the old value of an instruction’s virtual destination register, to the free list until the instruction has been retired, indicating that the control flow up to that instruction has been resolved.

If a branch mispredict or exception occurs, the map logic backs up the contents of the integer and floating-point register rename maps to the state associated with the instruction that triggered the condition, and the prefetcher restarts at the appropriate VPC. At most, 20 valid fetch slots containing up to 80 instructions can be in flight between the register maps and the end of the machine’s pipeline, where the control flow is finally resolved. The map logic is capable of backing up the contents of the maps to the state associated with any of these 80 instructions in a single cycle.

The register rename logic places instructions into an integer or floating-point issue queue, from which they are later issued to functional units for execution.
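The renaming behavior described in this section can be illustrated with a small Python model. It is a simplified sketch with hypothetical names: it uses 31 architectural and 80 physical integer registers as in the text, but it ignores the PALshadow registers, the map backup on mispredicts, and the stall that real hardware would need when the free list is empty.

```python
class RenameMap:
    """Sketch of register renaming with a free list of physical registers."""

    def __init__(self, num_arch=31, num_phys=80):
        self.map = list(range(num_arch))              # virtual register -> physical register
        self.free = list(range(num_arch, num_phys))   # unallocated physical registers

    def rename(self, dest, srcs):
        # Sources read the current mapping, which preserves true RAW dependencies.
        phys_srcs = [self.map[s] for s in srcs]
        old = self.map[dest]
        # A fresh destination register removes WAR and WAW conflicts.
        # (This sketch raises IndexError if the free list is empty.)
        new = self.free.pop(0)
        self.map[dest] = new
        return new, phys_srcs, old

    def retire(self, old_phys):
        # The old destination register is recycled only when the instruction retires.
        self.free.append(old_phys)
```

For example, two successive writes to the same virtual register receive different physical registers, so the second write no longer has to wait for readers of the first.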

2.1.1.6 Integer Issue Queue

The 20-entry integer issue queue (IQ), associated with the integer execution units (Ebox), issues the following types of instructions at a maximum rate of four per cycle:


• Integer operate

• Integer conditional branch

• Unconditional branch – both displacement and memory format

• Integer and floating-point load and store

• PAL-reserved instructions: HW_MTPR, HW_MFPR, HW_LD, HW_ST, HW_RET

• Integer-to-floating-point (ITOFx) and floating-point-to-integer (FTOIx)

Each queue entry asserts four request signals, one for each of the Ebox subclusters. A queue entry asserts a request when it contains an instruction that can be executed by the subcluster, if the instruction’s operand register values are available within the subcluster.

There are two arbiters, one for the upper subclusters and one for the lower subclusters. (Subclusters are described in Section 2.1.2.) Each arbiter picks two of the possible 20 requesters for service each cycle. A given instruction only requests upper subclusters or lower subclusters, but because many instructions can only be executed in one type or another, this is not too limiting.

For example, load and store instructions can only go to lower subclusters and shift instructions can only go to upper subclusters. Other instructions, such as addition and logic operations, can execute in either upper or lower subclusters and are statically assigned before being placed in the IQ.

The IQ arbiters choose between simultaneous requesters of a subcluster based on the age of the request: older requests are given priority over newer requests. If a given instruction requests both lower subclusters, and no older instruction requests a lower subcluster, then the arbiter assigns subcluster L0 to the instruction. If a given instruction requests both upper subclusters, and no older instruction requests an upper subcluster, then the arbiter assigns subcluster U1 to the instruction. This asymmetry between the upper and lower subcluster arbiters is a circuit implementation optimization with negligible overall performance effect.
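The age-based selection can be sketched as below. The model is deliberately simplified and its names and data layout are hypothetical: each queue entry is a dictionary, a smaller "age" value is assumed to mean an older request, and the choice between the two subclusters of a type (L0 versus L1, U0 versus U1) is not modeled.

```python
def pick_oldest(queue, subcluster_type, picks=2):
    """Return up to `picks` ready requesters for one arbiter, oldest first.

    queue: list of dicts with keys "age" (assumption: smaller means older),
           "type" ("U" or "L"), and "ready" (operands available).
    """
    requesters = [e for e in queue if e["ready"] and e["type"] == subcluster_type]
    return sorted(requesters, key=lambda e: e["age"])[:picks]


def issue_cycle(queue):
    """One arbitration cycle: two upper picks plus two lower picks, four at most."""
    return pick_oldest(queue, "U") + pick_oldest(queue, "L")
```

An entry whose operands are not yet available never requests service in this sketch, so a younger ready instruction can issue ahead of an older waiting one.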

2.1.1.7 Floating-Point Issue Queue

The 15-entry floating-point issue queue (FQ) associated with the Fbox issues the following instruction types:

• Floating-point operates

• Floating-point conditional branches

• Floating-point stores

• Floating-point register to integer register transfers (FTOIx)

Each queue entry has three request lines: one for the add pipeline, one for the multiply pipeline, and one for the two store pipelines. There are three arbiters, one for each of the add, multiply, and store pipelines. The add and multiply arbiters pick one requester per cycle, while the store pipeline arbiter picks two requesters per cycle, one for each store pipeline.


The FQ arbiters pick between simultaneous requesters of a pipeline based on the age of the request: older requests are given priority over newer requests. Floating-point store instructions and FTOIx instructions in even-numbered queue entries arbitrate for one store port. Floating-point store instructions and FTOIx instructions in odd-numbered queue entries arbitrate for the second store port.

Floating-point store instructions and FTOIx instructions are queued in both the integer and floating-point queues. They wait in the floating-point queue until their operand register values are available. They subsequently request service from the store arbiter. Upon being issued from the floating-point queue, they signal the corresponding entry in the integer queue to request service. Upon being issued from the integer queue, the operation is completed.

2.1.1.8 Exception and Interrupt Logic

There are two types of exceptions: faults and synchronous traps. Arithmetic exceptions are precise and are reported as synchronous traps.

The four sources of interrupts are listed as follows:

• Level-sensitive hardware interrupts sourced by the IRQ_H[5:0] pins

• Edge-sensitive hardware interrupts generated by the serial line receive pin, performance counter overflows, and hardware corrected read errors

• Software interrupts sourced by the software interrupt request (SIRR) register

• Asynchronous system traps (ASTs)

Interrupt sources can be individually masked. In addition, AST interrupts are qualified by the current processor mode.

2.1.1.9 Retire Logic

The Ibox fetches instructions in program order, executes them out of order, and then retires them in order. The Ibox retire logic maintains the architectural state of the machine by retiring an instruction only if all previous instructions have executed without generating exceptions or branch mispredictions. Retiring an instruction commits the machine to any changes the instruction may have made to the software-visible state. The three software-visible states are listed as follows:

• Integer and floating-point registers

• Memory

• Internal processor registers (including control/status registers and translation buffers)

The retire logic can sustain a maximum retire rate of eight instructions per cycle, and can retire as many as 11 instructions in a single cycle.
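The in-order retire rule can be illustrated with a short Python sketch. The function name and the dictionary layout are hypothetical, and the model captures only the ordering rule and the 11-instruction peak stated in the text, not the sustained rate or any other hardware constraint.

```python
MAX_RETIRE_PER_CYCLE = 11   # peak number of instructions retired in one cycle, per the text


def retire_cycle(window):
    """Split a program-ordered window into (retired, remaining) for one cycle.

    window: list of dicts with "done" (finished executing) and "fault"
            (raised an exception or mispredicted) flags; both are modeling assumptions.
    """
    n = 0
    for insn in window:
        # Stop at the cycle limit, or at the first instruction that is
        # unfinished or faulting: nothing younger may retire past it.
        if n == MAX_RETIRE_PER_CYCLE or not insn["done"] or insn["fault"]:
            break
        n += 1
    return window[:n], window[n:]
```

A completed instruction behind an unfinished older one stays in the window, which is what keeps the software-visible state consistent with program order.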

2.1.2 Integer Execution Unit

The integer execution unit (Ebox) is a 4-path integer execution unit that is implemented as two functional-unit “clusters” labeled 0 and 1. Each cluster contains a copy of an 80-entry, physical-register file and two “subclusters”, named upper (U) and lower (L). Figure 2–6 shows the integer execution unit. In the figure, iop_wr is the cross-cluster bus for moving integer result values between clusters.


Figure 2–6 Integer Execution Unit: Clusters 0 and 1

[Diagram: clusters 0 and 1, each with load/store data and effective virtual address (eff_VA) paths, linked by the iop_wr cross-cluster bus.]

Most instructions have 1-cycle latency for consumers that execute within the same cluster. Also, there is another 1-cycle delay associated with producing a value in one cluster and consuming the value in the other cluster. The instruction issue queue minimizes the performance effect of this cross-cluster delay. The Ebox contains the following resources:

• Four 64-bit adders that are used to calculate results for integer add instructions (located in U0, U1, L0, and L1)

• The adders in the lower subclusters that are used to generate the effective virtual address for load and store instructions (located in L0 and L1)

• Four logic units

• Two barrel shifters and associated byte logic (located in U0 and U1)

• Two sets of conditional branch logic (located in U0 and U1)

• Two copies of an 80-entry register file

• One pipelined multiplier (located in U1) with 7-cycle latency for all integer multiply operations

• One fully-pipelined unit (located in U0), with 3-cycle latency, that executes the following instructions:

– CTLZ, CTPOP, CTTZ

– PERR, MINxxx, MAXxxx, UNPKxx, PKxx


The Ebox has 80 register-file entries that contain storage for the values of the 31 Alpha integer registers (the value of R31 is not stored), the values of 8 PALshadow registers, and 41 results written by instructions that have not yet been retired.

Ignoring cross-cluster delay, the two copies of the Ebox register file contain identical values. Each copy of the Ebox register file contains four read ports and six write ports. The four read ports are used to source operands to each of the two subclusters within a cluster. The six write ports are used as follows:

• Two write ports are used to write results generated within the cluster.

• Two write ports are used to write results generated by the other cluster.

• Two write ports are used to write results from load instructions. These two ports are also used for FTOIx instructions.

2.1.3 Floating-Point Execution Unit

The floating-point execution unit (Fbox) has two paths. The Fbox executes both VAX and IEEE floating-point instructions. It supports IEEE S_floating-point and T_floating-point data types and all rounding modes. It also supports VAX F_floating-point and G_floating-point data types, and provides limited support for D_floating-point format.

The basic structure of the floating-point execution unit is shown in Figure 2–7.

Figure 2–7 Floating-Point Execution Units

The Fbox contains the following resources:

• 72-entry physical register file

• Fully-pipelined multiplier with 4-cycle latency

• Fully-pipelined adder with 4-cycle latency

• Nonpipelined divide unit associated with the adder pipeline

• Nonpipelined square root unit associated with the adder pipeline

The 72 Fbox register file entries contain storage for the values of the 31 Alpha floating-point registers (F31 is not stored) and 41 values written by instructions that have not been retired.


The Fbox register file contains six read ports and four write ports. Four read ports are used to source operands to the add and multiply pipelines, and two read ports are used to source data for store instructions. Two write ports are used to write results generated by the add and multiply pipelines, and two write ports are used to write results from floating-point load instructions.

2.1.4 External Cache and System Interface Unit

The interface for the system and external cache (Cbox) controls the Bcache and system ports. It contains the following structures:

• Victim address file (VAF)

• Victim data file (VDF)

• I/O write buffer (IOWB)

• Probe queue (PQ)

• Duplicate Dcache tag (DTAG)

2.1.4.1 Victim Address File and Victim Data File


The victim address file (VAF) and victim data file (VDF) together form an 8-entry victim buffer used for holding:

• Dcache blocks to be written to the Bcache

• Istream cache blocks from memory to be written to the Bcache

• Bcache blocks to be written to memory

• Cache blocks sent to the system in response to probe commands

2.1.4.2 I/O Write Buffer

The I/O write buffer (IOWB) consists of four 64-byte entries and associated address and control logic used for buffering I/O write data between the store queue and the system port.

2.1.4.3 Probe Queue

The probe queue (PQ) is an 8-entry queue that holds pending system port cache probe commands and addresses.

2.1.4.4 Duplicate Dcache Tag Array

The duplicate Dcache tag (DTAG) array holds a duplicate copy of the Dcache tags and is used by the Cbox when processing Dcache fills, Icache fills, and system port probes.

2.1.5 Onchip Caches

The 21264/EV67 contains two onchip primary-level caches.

2.1.5.1 Instruction Cache

The instruction cache (Icache) is a 64KB virtual-addressed, 2-way set-predict cache. Set prediction is used to approximate the performance of a 2-set cache without slowing the cache access time. Each Icache block contains:

• 16 Alpha instructions (64 bytes)


• Virtual tag bits [47:15]

• 8-bit address space number (ASN) field

• 1-bit address space match (ASM) bit

• 1-bit PALcode bit to indicate physical addressing

• Valid bit

• Data and tag parity bits

• Four access-check bits for the following modes: kernel, executive, supervisor, and user (KESU)

• Additional predecoded information to assist with instruction processing and fetch control

2.1.5.2 Data Cache

The data cache (Dcache) is a 64KB, 2-way set-associative, virtually indexed, physically tagged, write-back, read/write allocate cache with 64-byte blocks. During each cycle the Dcache can perform one of the following transactions:

• Two quadword (or shorter) read transactions to arbitrary addresses

• Two quadword write transactions to the same aligned octaword

• Two non-overlapping less-than-quadword writes to the same aligned quadword

• One sequential read and write transaction from and to the same aligned octaword

Each Dcache block contains:

• 64 data bytes and associated quadword ECC bits

• Physical tag bits

• Valid, dirty, shared, and modified bits

• Tag parity bit calculated across the tag, dirty, shared, and modified bits

• One bit to control round-robin set allocation (one bit per two cache blocks)

The Dcache contains two sets, each with 512 rows containing 64-byte blocks per row (that is, 32K bytes of data per set). The 21264/EV67 requires two additional bits of virtual address beyond the bits that specify an 8KB page, in order to specify a Dcache row index. A given virtual address might be found in four unique locations in the Dcache, depending on the virtual-to-physical translation for those two bits. The 21264/EV67 prevents this aliasing by keeping only one of the four possible translated addresses in the cache at any time.
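The arithmetic behind the two extra index bits can be worked through in Python. The function names are hypothetical. The bit positions follow from the figures in the text: 64-byte blocks use address bits [5:0], 512 rows need nine index bits [14:6], and an 8KB page offset covers bits [12:0], which leaves index bits [14:13] above the page offset.

```python
BLOCK_BYTES = 64        # bytes per Dcache block
ROWS = 512              # rows per set
PAGE_BYTES = 8 * 1024   # 8KB page


def dcache_row(addr):
    """Row index taken from address bits [14:6]."""
    return (addr // BLOCK_BYTES) % ROWS


def candidate_rows(addr):
    """The four rows that share index bits [12:6] and differ only in bits [14:13]."""
    low = dcache_row(addr) & 0x7F          # index bits that lie inside the page offset
    return [(hi << 7) | low for hi in range(4)]
```

The four candidates returned for an address are the "four unique locations" mentioned above: they agree in every index bit that lies inside the page offset and differ only in the two bits above it.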

2.1.6 Memory Reference Unit

The memory reference unit (Mbox) controls the Dcache and ensures architecturally correct behavior for load and store instructions. The Mbox contains the following structures:

• Load queue (LQ)

• Store queue (SQ)


• Miss address file (MAF)

• Dstream translation buffer (DTB)

2.1.6.1 Load Queue

The load queue (LQ) is a reorder buffer for load instructions. It contains 32 entries and maintains the state associated with load instructions that have been issued to the Mbox, but for which results have not been delivered to the processor and the instructions retired. The Mbox assigns load instructions to LQ slots based on the order in which they were fetched from the Icache, then places them into the LQ after they are issued by the IQ. The LQ helps ensure correct Alpha memory reference behavior.

2.1.6.2 Store Queue

The store queue (SQ) is a reorder buffer and graduation unit for store instructions. It contains 32 entries and maintains the state associated with store instructions that have been issued to the Mbox, but for which data has not been written to the Dcache and the instruction retired. The Mbox assigns store instructions to SQ slots based on the order in which they were fetched from the Icache and places them into the SQ after they are issued by the IQ. The SQ holds data associated with store instructions issued from the IQ until they are retired, at which point the store can be allowed to update the Dcache. The SQ also helps ensure correct Alpha memory reference behavior.


2.1.6.3 Miss Address File

The 8-entry miss address file (MAF) holds physical addresses associated with pending Icache and Dcache fill requests and pending I/O space read transactions.

2.1.6.4 Dstream Translation Buffer

The Mbox includes a 128-entry, fully associative Dstream translation buffer (DTB) used to store Dstream address translations and page protection information. Each of the entries in the DTB can map 1, 8, 64, or 512 contiguous 8KB pages. The allocation scheme is round-robin. The DTB supports an 8-bit ASN and contains an ASM bit.

2.1.7 SROM Interface

The serial read-only memory (SROM) interface provides the initialization data load path from a system SROM to the Icache. Refer to Chapter 7 for more information.

2.2 Pipeline Organization

The 7-stage pipeline provides an optimized environment for executing Alpha instructions. The pipeline stages (0 to 6) are shown in Figure 2–8 and described in the following paragraphs.


Figure 2–8 Pipeline Organization

[Diagram: pipeline stages 0 through 6, from the branch predictor and 64KB 2-set instruction cache, through the integer and floating-point register rename maps, issue queues (20 and 15 entries), and register files, to the execution units, the 64KB data cache, and the bus interface unit (64-bit system bus, 128-bit cache bus, 44-bit physical address).]

Stage 0 — Instruction Fetch

The branch predictor uses a branch history algorithm to predict a branch instruction target address.

Up to four aligned instructions are fetched from the Icache, in program order. The branch prediction tables are also accessed in this cycle. The branch predictor uses tables and a branch history algorithm to predict a branch instruction target address for one branch or memory format JSR instruction per cycle. Therefore, the prefetcher is limited to fetching through one branch per cycle. If there is more than one branch within the fetch line, and the branch predictor predicts that the first branch will not be taken, it will predict through subsequent branches at the rate of one per cycle, until it predicts a taken branch or predicts through the last branch in the fetch line.
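The one-branch-per-cycle rule can be expressed as a small counting function. This is a hypothetical helper for illustration only: it counts the cycles spent predicting the branches of one fetch line and says nothing about the rest of the fetch timing.

```python
def prediction_cycles(branch_predictions):
    """Cycles spent predicting the branches in one fetch line.

    branch_predictions: predicted outcome (True = taken) of each branch in the
    fetch line, in program order. One branch is predicted per cycle, and
    prediction stops at the first predicted-taken branch.
    """
    cycles = 0
    for taken in branch_predictions:
        cycles += 1
        if taken:
            break
    return cycles
```

A fetch line with three branches predicted not-taken, not-taken, taken therefore occupies the predictor for three cycles in this model.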

The Icache array also contains a line prediction field, the contents of which are applied to the Icache in the next cycle. The purpose of the line predictor is to remove the pipeline bubble which would otherwise be created when the branch predictor predicts a branch to be taken. In effect, the line predictor attempts to predict the Icache line which the branch predictor will generate. On fills, the line predictor value at each fetch line is initialized with the index of the next sequential fetch line, and later retrained by the branch predictor if necessary.

Stage 1 — Instruction Slot

The Ibox maps four instructions per cycle from the 64KB 2-way set-predict Icache. Instructions are mapped in order, executed dynamically, but are retired in order.


In the slot stage, the branch predictor compares the next Icache index that it generates to the index that was generated by the line predictor. If there is a mismatch, the branch predictor wins: the instructions fetched during that cycle are aborted, and the index predicted by the branch predictor is applied to the Icache during the next cycle. Line mispredictions result in one pipeline bubble.

The line predictor takes precedence over the branch predictor during memory format calls or jumps. If the line predictor was trained with a true (as opposed to predicted) memory format call or jump target, then its contents take precedence over the target hint field associated with these instructions. This allows dynamic calls or jumps to be correctly predicted.

The instruction fetcher produces the full VPC address during the fetch stage of the pipeline. The Icache produces the tags for both Icache sets 0 and 1 each time it is accessed. That enables the fetcher to separate set mispredictions from true Icache misses. If the access was caused by a set misprediction, the instruction fetcher aborts the last two fetched slots and refetches the slot in the next cycle. It also retrains the appropriate set prediction bits.

The instruction data is transferred from the Icache to the integer and floating-point register map hardware during this stage. When the integer instruction is fetched from the Icache and slotted into the IQ, the slot logic determines whether the instruction is for the upper or lower subclusters. The slot logic makes the decision based on the resources needed by the (up to four) integer instructions in the fetch block. Although all four instructions need not be issued simultaneously, distributing their resource usage improves instruction loading across the units. For example, if a fetch block contains two instructions that can be placed in either cluster followed by two instructions that must execute in the lower cluster, the slot logic would designate that combination as EELL and slot them as UULL. Slot combinations are described in Section 2.3.2 and Table 2–3.

Stage 2 — Map

Instructions are se nt from the Icache to the integer and floating-poi nt reg is ter maps dur ing the slot stage and register renaming is performed during the map stage. Also, each instruction is assigned a unique 8-bit number, called an inum, which is used to identify the instruction and its program order with respect to other instructions during the time that it is in flight. Instructions are considered to be in flight between the time they are mapped and the time they are retired.

Mapped instructions and their associated inums are placed in the integer and floating-point queues by the end of the map stage.

Stage 3 — Issue

The 20-entry integer issue queue (IQ) issues instructions at the rate of four per cycle. The 15-entry floating-point issue queue (FQ) issues floating-point operate instructions, conditional branch instructions, and store instructions, at the rate of two per cycle. Normally, instructions are deleted from the IQ or FQ two cycles after they are issued. For example, if an instruction is issued in cycle n, it remains in the FQ or IQ in cycle n+1 but does not request service, and is deleted in cycle n+2.
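
The queue-deletion timing in the paragraph above can be pictured with a small state function. This is a minimal Python sketch, not a description of the hardware; the function name and the state labels are hypothetical and were chosen only for illustration.

```python
def queue_entry_state(issue_cycle: int, cycle: int) -> str:
    """Return an illustrative state label for an IQ/FQ entry.

    issue_cycle is the cycle n in which the instruction issued;
    cycle is the cycle being asked about.
    """
    if cycle < issue_cycle:
        return "waiting"   # not yet issued, may still request service
    if cycle == issue_cycle:
        return "issued"    # cycle n
    if cycle == issue_cycle + 1:
        return "held"      # cycle n+1: still in the queue, no request
    return "deleted"       # cycle n+2 and later


if __name__ == "__main__":
    for c in range(4, 9):
        print(c, queue_entry_state(5, c))
```

The sketch covers only the normal case; the load hit speculation described in Section 2.7.1 can hold entries in the queue longer.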

Stage 4 — Register Read

Instructions issued from the issue queues read their operands from the integer and floating-point register files and receive bypass data.

Stage 5 — Execute

The Ebox and Fbox pipelines begin execution.

Stage 6 — Dcache Access

Memory reference instructions access the Dcache and data translation buffers. Normally load instructions access the tag and data arrays while store instructions only access the tag arrays. Store data is written to the store queue where it is held until the store instruction is retired. Most integer operate instructions write their register results in this cycle.

2.2.1 Pipeline Aborts

The abort penalty as given is measured from the cycle after the fetch stage of the instruction which triggers the abort to the fetch stage of the new target, ignoring any Ibox pipeline stalls or queuing delay that the triggering instruction might experience.

Table 2–1 lists the timing associated with each common source of pipeline abort.

Table 2–1 Pipeline Abort Delay (GCLK Cycles)

Abort Condition                 Penalty (Cycles)  Comments
Branch misprediction            7                 Integer or floating-point conditional branch misprediction.
JSR misprediction               8                 Memory format JSR or HW_RET.
Mbox order trap                 14                Load-load order or store-load order.
Other Mbox replay traps         13                -
DTB miss                        13                -
ITB miss                        7                 -
Integer arithmetic trap         12                -
Floating-point arithmetic trap  13+latency        Add latency of instruction. See Section 2.3.3 for instruction latencies.

2.3 Instruction Issue Rules

This section defines instruction classes, the functional unit pipelines to which they are issued, and their associated latencies.

2.3.1 Instruction Group Definitions

Table 2–2 lists the instruction class, the pipeline assignments, and the instructions included in the class.

Table 2–2 Instruction Name, Pipeline, and Types

Class Name  Pipeline                  Instruction Type
ild         L0, L1                    All integer load instructions
fld         L0, L1                    All floating-point load instructions
ist         L0, L1                    All integer store instructions
fst         FST0, FST1, L0, L1        All floating-point store instructions
lda         L0, L1, U0, U1            LDA, LDAH
mem_misc    L1                        WH64, ECB, WMB
rpcc        L1                        RPCC
rx          L1                        RS, RC
mxpr        L0, L1 (depends on IPR)   HW_MTPR, HW_MFPR
ibr         U0, U1                    Integer conditional branch instructions
jsr         L0                        BR, BSR, JMP, CALL, RET, COR, HW_RET, CALL_PAL
iadd        L0, U0, L1, U1            Instructions with opcode 10, except CMPBGE
ilog        L0, U0, L1, U1            AND, BIC, BIS, ORNOT, XOR, EQV, CMPBGE
ishf        U0, U1                    Instructions with opcode 12
cmov        L0, U0, L1, U1            Integer CMOV (either cluster)
imul        U1                        Integer multiply instructions
imisc       U0                        CTLZ, CTPOP, CTTZ, PERR, MINxxx, MAXxxx, PKxx, UNPKxx
fbr         FA                        Floating-point conditional branch instructions
fadd        FA                        All floating-point operate instructions except multiply, divide, square root, and conditional move instructions
fmul        FM                        Floating-point multiply instruction
fcmov1      FA                        Floating-point CMOV, first half
fcmov2      FA                        Floating-point CMOV, second half
fdiv        FA                        Floating-point divide instruction
fsqrt       FA                        Floating-point square root instruction
nop         None                      TRAP, EXCB, UNOP - LDQ_U R31, 0(Rx)
ftoi        FST0, FST1, L0, L1        FTOIS, FTOIT
itof        L0, L1                    ITOFS, ITOFF, ITOFT
mx_fpcr     FM                        Instructions that move data from the floating-point control register

2.3.2 Ebox Slotting

Instructions that are issued from the IQ, and could execute in either upper or lower Ebox subclusters, are slotted to one pair or the other during the pipeline mapping stage based on the instruction mixture in the fetch line. The codes that are used in Table 2–3 are as follows:

• U: The instruction only executes in an upper subcluster.

• L: The instruction only executes in a lower subcluster.

• E: The instruction could execute in either an upper or lower subcluster.

Table 2–3 defines the slotting rules. The table field Instruction Class 3, 2, 1 and 0 identifies each instruction’s location in the fetch line by the value of bits [3:2] in its PC.

Table 2–3 Instruction Group Definitions and Pipeline Unit

Instruction Class  Slotting    Instruction Class  Slotting
3 2 1 0            3 2 1 0     3 2 1 0            3 2 1 0
E E E E            U L U L     L L L L            L L L L
E E E L            U L U L     L L L U            L L L U
E E E U            U L L U     L L U E            L L U U
E E L E            U L L U     L L U L            L L U L
E E L L            U U L L     L L U U            L L U U
E E L U            U L L U     L U E E            L U L U
E E U E            U L U L     L U E L            L U U L
E E U L            U L U L     L U E U            L U L U
E E U U            L L U U     L U L E            L U L U
E L E E            U L U L     L U L L            L U L L
E L E L            U L U L     L U L U            L U L U
E L E U            U L L U     L U U E            L U U L
E L L E            U L L U     L U U L            L U U L
E L L L            U L L L     L U U U            L U U U
E L L U            U L L U     U E E E            U L U L
E L U E            U L U L     U E E L            U L U L
E L U L            U L U L     U E E U            U L L U
E L U U            L L U U     U E L E            U L L U
E U E E            L U L U     U E L L            U U L L
E U E L            L U U L     U E L U            U L L U
E U E U            L U L U     U E U E            U L U L
E U L E            L U L U     U E U L            U L U L
E U L L            U U L L     U E U U            U L U U
E U L U            L U L U     U L E E            U L U L
E U U E            L U U L     U L E L            U L U L
E U U L            L U U L     U L E U            U L L U
E U U U            L U U U     U L L E            U L L U
L E E E            L U L U     U L L L            U L L L
L E E L            L U U L     U L L U            U L L U
L E E U            L U L U     U L U E            U L U L
L E L E            L U L U     U L U L            U L U L
L E L L            L U L L     U L U U            U L U U
L E L U            L U L U     U U E E            U U L L
L E U E            L U U L     U U E L            U U L L
L E U L            L U U L     U U E U            U U L U
L E U U            L L U U     U U L E            U U L L
L L E E            L L U U     U U L L            U U L L
L L E L            L L U L     U U L U            U U L U
L L E U            L L U U     U U U E            U U U L
L L L E            L L L U     U U U L            U U U L
-                  -           U U U U            U U U U
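
One property that the rows of Table 2–3 share is that a U or L in the Instruction Class field is never changed by slotting; only E positions are resolved to U or L. The Python sketch below checks that property on five rows copied from the table. It is an illustration only: the dictionary is a small sample rather than the full 81-row table, the names are hypothetical, and the sketch does not attempt to reproduce the rule the slot logic uses to choose U or L for an E position.

```python
# Sample rows from Table 2-3, written in fetch-line order 3 2 1 0.
# Key: instruction class string, value: slotting string.
SLOTTING_SAMPLE = {
    "EEEE": "ULUL",
    "EELL": "UULL",
    "LLUE": "LLUU",
    "UEEE": "ULUL",
    "UUUU": "UUUU",
}


def respects_fixed_slots(classes: str, slotting: str) -> bool:
    """True if slotting keeps every U/L of classes and resolves every E."""
    if len(classes) != 4 or len(slotting) != 4:
        return False
    for cls, slot in zip(classes, slotting):
        if cls not in "EUL" or slot not in "UL":
            return False
        if cls in "UL" and cls != slot:
            return False  # a fixed U or L was moved to the other subcluster
    return True


if __name__ == "__main__":
    for cls, slot in SLOTTING_SAMPLE.items():
        print(cls, "->", slot, respects_fixed_slots(cls, slot))
```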

2.3.3 Instruction Latencies

After an instruction is placed in the IQ or FQ, its issue point is determined by the availability of its register operands, functional unit(s), and relationship to other instructions in the queue. There are register producer-consumer dependencies and dynamic functional unit availability dependencies that affect instruction issue. The mapper removes register producer-producer dependencies.

The latency to produce a register result is generally fixed. The one exception is for load instructions that miss the Dcache. Table 2–4 lists the latency, in cycles, for each instruction class.

Table 2–4 Instruction Class Latency in Cycles

Class   Latency  Comments
ild     3        Dcache hit.
        13+      Dcache miss, latency with 6-cycle Bcache. Add additional Bcache loop latency if Bcache latency is greater than 6 cycles.
fld     4
        14+
ist     -        Does not produce register value.
fst     -        Does not produce register value.
rpcc    1        Possible 1-cycle cross-cluster delay.
rx      1        -
mxpr    1 or 3   HW_MFPR: Ebox IPRs = 1. Ibox and Mbox IPRs = 3. HW_MTPR does not produce a register value.
icbr    -        Conditional branch. Does not produce register value.
ubr     3        Unconditional branch. Does not produce register value.
jsr     3        -
iadd    1        Possible 1-cycle Ebox cross-cluster delay.
ilog    1        Possible 1-cycle Ebox cross-cluster delay.
ishf    1        Possible 1-cycle Ebox cross-cluster delay.
cmov1   1        Only consumer is cmov2. Possible 1-cycle Ebox cross-cluster delay.
cmov2   1        Possible 1-cycle Ebox cross-cluster delay.
imul    7        Possible 1-cycle Ebox cross-cluster delay.
imisc   3        Possible 1-cycle Ebox cross-cluster delay.
fcbr    -        Does not produce register value.
fadd    4        Consumer other than fst or ftoi.
                 Consumer fst or ftoi. Measured from when an fadd is issued from the FQ to when an fst or ftoi is issued from the IQ.
fmul    4        Consumer other than fst or ftoi.
                 Consumer fst or ftoi. Measured from when an fmul is issued from the FQ to when an fst or ftoi is issued from the IQ.
fcmov1  4        Only consumer is fcmov2.
fcmov2  4        Consumer other than fst.
                 Consumer fst or ftoi. Measured from when an fcmov2 is issued from the FQ to when an fst or ftoi is issued from the IQ.
fdiv    12       Single precision - latency to consumer of result value.
        9        Single precision - latency to using divider again.
        15       Double precision - latency to consumer of result value.
        12       Double precision - latency to using divider again.
fsqrt   18       Single precision - latency to consumer of result value.
        15       Single precision - latency to using unit again.
        33       Double precision - latency to consumer of result value.
        30       Double precision - latency to using unit again.
ftoi    3        -
itof    4        -
nop     -        Does not produce register value.

2.4 Instruction Retire Rules

An instruction is retired when it has been executed to completion, and all previous instructions have been retired. The execution pipeline stage in which an instruction becomes eligible to be retired depends upon the instruction’s class. Table 2–5 gives the minimum retire latencies (assuming that all previous instructions have been retired) for various classes of instructions.

Table 2–5 Minimum Retire Latencies for Instruction Classes

Instruction Class                  Retire Stage   Comments
Integer conditional branch         7              -
Integer multiply                   7/13           Latency is 13 cycles for the MUL/V instruction.
Integer operate                    7              -
Memory                             10             -
Floating-point add                 11             -
Floating-point multiply            11             -
Floating-point DIV/SQRT            11 + latency   Add latency of unit reuse for the instruction indicated in Table 2–4. For example, latency for a single-precision fdiv would be 11 plus 9 from Table 2–4. Latency is 11 if hardware detects that no exception is possible (see Section 2.4.1).
Floating-point conditional branch  11             Branch instruction mispredict is reported in stage 7.
BSR/JSR                            10             JSR instruction mispredict is reported in stage 8.
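
The "11 + latency" entry for floating-point DIV/SQRT can be worked through with a short Python sketch. The unit-reuse values are taken from the fdiv and fsqrt rows of Table 2–4; the function and dictionary names are hypothetical, and the early_retire flag simply stands in for the hardware detection described in Section 2.4.1.

```python
FP_BASE_RETIRE_STAGE = 11  # Table 2-5: floating-point add and multiply

# Unit-reuse latencies ("latency to using divider/unit again") from Table 2-4.
UNIT_REUSE_LATENCY = {
    ("fdiv", "single"): 9,
    ("fdiv", "double"): 12,
    ("fsqrt", "single"): 15,
    ("fsqrt", "double"): 30,
}


def min_retire_latency(op: str, precision: str, early_retire: bool = False) -> int:
    """Minimum retire latency for an fdiv or fsqrt, per Table 2-5."""
    if early_retire:
        # Hardware has determined that no exception is possible.
        return FP_BASE_RETIRE_STAGE
    return FP_BASE_RETIRE_STAGE + UNIT_REUSE_LATENCY[(op, precision)]


if __name__ == "__main__":
    print(min_retire_latency("fdiv", "single"))        # 11 plus 9
    print(min_retire_latency("fdiv", "single", True))  # early retire
```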

2.4.1 Floating-Point Divide/Square Root Early Retire

The floating-point divider and square root unit can detect that, for many combinations of source operand values, no exception can be generated. Instructions with these operands can be retired before the result is generated. When detected, they are retired with the same latency as the FP add class. Early retirement is not possible for the following instruction/operand/architecture state conditions:

• Instruction is not a DIV or SQRT.

• SQRT source operand is negative.

• Divide operand exponent_a is 0.

• Either operand is NaN or INF.

• Divide operand exponent_b is 0.

• Trapping mode is /I (inexact).

• INE status bit is 0.

Early retirement is also not possible for divide instructions if the resulting exponent has any of the following characteristics (EXP is the result exponent):

• DIVT, DIVG: (EXP >= 3FF₁₆) OR (EXP <= 2₁₆)

• DIVS, DIVF: (EXP >= 7F₁₆) OR (EXP <= 382₁₆)

2.5 Retire of Operate Instructions into R31/F31

Many instructions that have R31 or F31 as their destination are retired immediately upon decode (stage 3). These instructions do not produce a result and are removed from the pipeline as well. They do not occupy a slot in the issue queues and do not occupy a functional unit. Table 2–6 lists these instructions and some of their characteristics. The instruction type in Table 2–6 is from Table C-6 in Appendix C of the Alpha Architecture Handbook, Version 4.

Table 2–6 Instructions Retired Without Execution

Instruction Type          Notes
INTA, INTL, INTM, INTS    All with R31 as destination.
FLTI, FLTL, FLTV          All with F31 as destination. MT_FPCR is not included because it has no destination; it is never removed from the pipeline.
LDQ_U                     All with R31 as destination.
MISC                      TRAPB and EXCB are always removed. Others are never removed.
FLTS                      All (SQRT, ITOF) with F31 as destination.

2.6 Load Instructions to R31 and F31

This section describes how the 21264/EV67 processes software-directed prefetch transactions and load instructions with a destination of R31 and F31.

Prefetches allocate a MAF entry. How the MAF entry is allocated is what distinguishes the type of prefetch. A normal prefetch is equivalent to a normal load MAF (that is, a MAF entry that puts the block into the Dcache in a readable state). A prefetch with modify intent is equivalent to a normal store MAF (that is, a MAF entry that puts the block into the Dcache in a writeable state). A prefetch, evict next, is equivalent to a normal load MAF, with the additional behavior described in Section 2.6.3, below.

A prefetch is not performed if the prefetch hits in the Dcache (as if it were a normal load).

Load operations to R31 and F31 may generate exceptions. These exceptions must be dismissed by PALcode.

The following sections describe the operational prefetch behavior of these instructions.

2.6.1 Normal Prefetch: LDBU, LDF, LDG, LDL, LDT, LDWU, HW_LDL Instructions

The 21264/EV67 processes these instructions as normal cache line prefetches. If the load instruction hits the Dcache, the instruction is dismissed, otherwise the addressed cache block is allocated into the Dcache.

The HW_LDL instruction construct equates to the HW_LD instruction with the LEN field clear. See Table 6–3.

2.6.2 Prefetch with Modify Intent: LDS Instruction

The 21264/EV67 processes an LDS instruction, with F31 as the destination, as a prefetch with modify intent transaction (ReadBlkMod command). If the transaction hits a dirty Dcache block, the instruction is dismissed. Otherwise, the addressed cache block is allocated into the Dcache for write access, with its dirty and modified bits set.

2.6.3 Prefetch, Evict Next: LDQ and HW_LDQ Instructions

The 21264/EV67 processes this instruction like a normal prefetch transaction (ReadBlkSpec command), with one exception: if the load misses the Dcache, the addressed cache block is allocated into the Dcache, but the Dcache set allocation pointer is left pointing to this block. The next miss to the same Dcache line will evict the block. For example, this instruction might be used when software is reading an array that is known to fit in the offchip Bcache, but will not fit into the onchip Dcache. In this case, the instruction ensures that the hardware provides the desired prefetch function without displacing useful cache blocks stored in the other set within the Dcache.

The HW_LDQ instruction construct equates to the HW_LD instruction with the LEN field set. See Table 6–3.

2.6.4 Prefetch with the LDx_L / STx_C Instruction Sequence

A prefetch within a dynamic 80-instruction window of a LDx_L instruction can cause the subsequent STx_C to incorrectly succeed when all three references are to the same 64-byte cache block. Within that 80-instruction window, the proximity of the prefetch to the LDx_L instruction directly affects the possibility of the incorrect behavior. Further, if the prefetch issues before the LDx_L, the error cannot occur, and if the prefetch issues after the LDx_L, the error can only occur when another processor is simultaneously acquiring the same lock.

2.7 Special Cases of Alpha Instruction Execution

This section describes the mechanisms that the 21264/EV67 uses to process irregular instructions in the Alpha instruction set, and cases in which the 21264/EV67 processes instructions in a non-intuitive way.

2.7.1 Load Hit Speculation

The latency of integer load instructions that hit in the Dcache is three cycles. Figure 2–9 shows the pipeline timing for these integer load instructions. In Figure 2–9:

Symbol  Meaning
Q       Issue queue
R       Register file read
E       Execute
D       Dcache access
B       Data bus active

Figure 2–9 Pipeline Timing for Integer Load Instructions

[Figure not reproduced: timing diagram over cycles 1 through 8 showing the Q, R, E, D, and B stages of an ILD that hits, together with Instruction 1 and Instruction 2.]

There are two cycles in which the IQ may speculatively issue instructions that use load data before Dcache hit information is known. Any instructions that are issued by the IQ within this 2-cycle speculative window are kept in the IQ with their requests inhibited until the load instruction’s hit condition is known, even if they are not dependent on the load operation. If the load instruction hits, then these instructions are removed from the queue. If the load instruction misses, then the execution of these instructions is aborted and the instructions are allowed to request service again.

For example, in Figure 2–9, instruction 1 and instruction 2 are issued within the speculative window of the load instruction. If the load instruction hits, then both instructions will be deleted from the queue by the start of cycle 7, which is one cycle later than normal for instruction 1 and the normal time for instruction 2. If the load instruction misses, both instructions are aborted from the execution pipelines and may request service again in cycle 6.

IQ-issued instructions are aborted if issued within the speculative window of an integer load instruction that missed in the Dcache, even if they are not dependent on the load data. However, if software misses are likely, the 21264/EV67 can still benefit from scheduling the instruction stream for Dcache miss latency. The 21264/EV67 includes a saturating counter that is incremented when load instructions hit and is decremented when load instructions miss. When the upper bit of the counter equals zero, the integer load latency is increased to five cycles and the speculative window is removed. The counter is 4 bits wide and is incremented by 1 on a hit and is decremented by two on a miss.
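
The behavior of the 4-bit saturating counter can be sketched in Python as follows. The class and method names are hypothetical. The initial counter value is not given in this section, so starting at the maximum is an assumption made only so that the example has a defined starting point.

```python
class LoadHitCounter:
    """Illustrative model of the 4-bit load hit/miss saturating counter."""

    BITS = 4
    MAX = (1 << BITS) - 1  # 15

    def __init__(self, value: int = MAX) -> None:
        # Assumption: the reset value is not documented here; MAX is a guess.
        self.value = value

    def record(self, hit: bool) -> None:
        """Add 1 on a hit and subtract 2 on a miss, saturating at 0 and MAX."""
        if hit:
            self.value = min(self.MAX, self.value + 1)
        else:
            self.value = max(0, self.value - 2)

    @property
    def upper_bit(self) -> int:
        return (self.value >> (self.BITS - 1)) & 1

    def integer_load_latency(self) -> int:
        """3 cycles normally; 5 cycles when the upper bit is zero."""
        return 3 if self.upper_bit else 5

    def speculative_window_enabled(self) -> bool:
        return self.upper_bit == 1


if __name__ == "__main__":
    counter = LoadHitCounter()
    for _ in range(4):
        counter.record(hit=False)
    print(counter.value, counter.integer_load_latency())
```

Because a miss costs two counts and a hit restores one, a load stream needs roughly two hits per miss for the counter to stay in the upper half of its range.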

Since load instructions to R31 do not produce a result, they do not create a speculative window when they execute and, therefore, never waste IQ-issue cycles if they miss.

Floating-point load instructions that hit in the Dcache have a latency of four cycles. Figure 2–10 shows the pipeline timing for floating-point load instructions. In Figure 2–10:

Symbol  Meaning
Q       Issue queue
R       Register file read
E       Execute
D       Dcache access
B       Data bus active

Figure 2–10 Pipeline Timing for Floating-Point Load Instructions

[Figure not reproduced: timing diagram over cycles 1 through 8 showing the Q, R, E, D, and B stages of an FLD that hits, together with Instruction 1 and Instruction 2.]

The speculative window for floating-point load instructions is one cycle wide. FQ-issued instructions that are issued within the speculative window of a floating-point load instruction that has missed are only aborted if they depend on the load being successful.

For example, in Figure 2–10 instruction 1 is issued in the speculative window of the load instruction.

If instruction 1 is not a user of the data returned by the load instruction, then it is removed from the queue at its normal time (at the start of cycle 7).

If instruction 1 is dependent on the load instruction data and the load instruction hits, instruction 1 is removed from the queue one cycle later (at the start of cycle 8). If the load instruction misses, then instruction 1 is aborted from the Fbox pipeline and may request service again in cycle 7.

2.7.2 Floating-Point Store Instructions

Floating-point store instructions are duplicated and loaded into both the IQ and the FQ from the mapper. Each IQ entry contains a control bit, fpWait, that when set prevents that entry from asserting its requests. This bit is initially set for each floating-point store instruction that enters the IQ, unless it was the target of a replay trap. The instruction’s FQ clone is issued when its Ra register is about to become clean, resulting in its IQ clone’s fpWait bit being cleared and allowing the IQ clone to issue and be executed by the Mbox. This mechanism ensures that floating-point store instructions are always issued to the Mbox, along with the associated data, without requiring the floating-point register dirty bits to be available within the IQ.

2.7.3 CMOV Instruction

For the 21264/EV67, the Alpha CMOV instruction has three operands, and so presents a special case. The required operation is to move either the value in register Rb or the value from the old physical destination register into the new destination register, based upon the value in Ra. Since neither the mapper nor the Ebox and Fbox data paths are otherwise required to handle three operand instructions, the CMOV instruction is decomposed by the Ibox pipeline into two 2-operand instructions:

The Alpha architecture instruction     CMOV Ra, Rb ⇒ Rc
Becomes the 21264/EV67 instructions    CMOV1 Ra, oldRc ⇒ newRc1
                                       CMOV2 newRc1, Rb ⇒ newRc2

The first instruction, CMOV1, tests the value of Ra and records the result of this test in a 65th bit of its destination register, newRc1. It also copies the value of the old physical destination register, oldRc, to newRc1.

The second instruction, CMOV2, then copies either the value in newRc1 or the value in Rb into a second physical destination register, newRc2, based on the CMOV predicate bit stored in newRc1.

In summary, the original CMOV instruction is decomposed into two dependent instructions that each use a physical register from the free list.

To further simplify this operation, the two component instructions of a CMOV instruction are driven through the mappers in successive cycles. Hence, if a fetch line contains n CMOV instructions, it takes n+1 cycles to run that fetch line through the mappers.

For example, the following fetch line:

ADD CMOVx SUB CMOVy

Results in the following three map cycles:

ADD CMOVx1
CMOVx2 SUB CMOVy1
CMOVy2
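
The n+1 rule and the example above can be reproduced with a short Python sketch. It assumes that a new map cycle begins immediately after each CMOV1 half, which is inferred from the example rather than stated as a general rule, and it identifies CMOV instructions by a name prefix purely for illustration. The function name is hypothetical.

```python
from typing import List


def map_cycles(fetch_line: List[str]) -> List[List[str]]:
    """Split a fetch line into map cycles, decomposing each CMOV in two.

    Each CMOV contributes its first half to the current cycle and its
    second half to the start of the next cycle.
    """
    cycles: List[List[str]] = [[]]
    for insn in fetch_line:
        if insn.startswith("CMOV"):
            cycles[-1].append(insn + "1")  # CMOV1 ends the current cycle
            cycles.append([insn + "2"])    # CMOV2 opens the next cycle
        else:
            cycles[-1].append(insn)
    return cycles


if __name__ == "__main__":
    for cycle in map_cycles(["ADD", "CMOVx", "SUB", "CMOVy"]):
        print(" ".join(cycle))
```

A fetch line with n CMOV instructions always yields n+1 lists from this function, which matches the cycle count given in the text.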

The Ebox executes integer CMOV instructions as two distinct 1-cycle latency operations. The Fbox add pipeline executes floating-point CMOV instructions as two distinct 4-cycle latency operations.

2.8 Memory and I/O Address Space Instructions

This section provides an overview of the way the 21264/EV67 processes memory and I/O address space instructions.

The 21264/EV67 supports, and internally recognizes, a 44-bit physical address space that is divided equally between memory address space and I/O address space. Memory address space resides in the lower half of the physical address space (PA[43]=0) and I/O address space resides in the upper half of the physical address space (PA[43]=1).
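
The split of the 44-bit physical address space on PA[43] can be expressed as a one-bit test. The following Python sketch is illustrative only; the constant and function names are hypothetical.

```python
PA_BITS = 44        # width of the physical address
IO_SELECT_BIT = 43  # PA[43]: 0 = memory space, 1 = I/O space


def is_io_address(pa: int) -> bool:
    """Return True if the physical address lies in I/O address space."""
    if not 0 <= pa < (1 << PA_BITS):
        raise ValueError("physical address does not fit in 44 bits")
    return ((pa >> IO_SELECT_BIT) & 1) == 1


if __name__ == "__main__":
    print(is_io_address(0x000_0000_1000))  # memory space
    print(is_io_address(0x800_0000_1000))  # I/O space
```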

The IQ can issue any combination of load and store instructions to the Mbox at the rate of two per cycle. The two lower Ebox subclusters, L0 and L1, generate the 48-bit effective virtual address for these instructions.

An instruction is defined to be newer than another instruction if it follows that instruction in program order and is older if it precedes that instruction in program order.

2.8.1 Memory Address Space Load Instructions

The Mbox begins execution of a load instruction by translating its virtual address to a physical address using the DTB and by accessing the Dcache. The Dcache is virtually indexed, allowing these two operations to be done in parallel. The Mbox puts information about the load instruction, including its physical address, destination register, and data format, into the LQ.

If the requested physical location is found in the Dcache (a hit), the data is formatted and written into the appropriate integer or floating-point register. If the location is not in the Dcache (a miss), the physical address is placed in the miss address file (MAF) for processing by the Cbox. The MAF performs a merging function in which a new miss address is compared to miss addresses already held in the MAF. If the new miss address points to the same Dcache block as a miss address in the MAF, then the new miss address is discarded.

When Dcache fill data is returned to the Dcache by the Cbox, the Mbox satisfies the requesting load instructions in the LQ.
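
The MAF merging function for load misses can be pictured with the Python sketch below. It models only the address comparison: a 64-byte block size is assumed from the 64-byte cache block mentioned elsewhere in this chapter, MAF capacity and entry state are not modeled, and the class and method names are hypothetical.

```python
BLOCK_BYTES = 64  # assumed Dcache block size


class MissAddressFile:
    """Illustrative model of MAF merging for load miss addresses."""

    def __init__(self) -> None:
        self.blocks = []  # block numbers with an outstanding miss

    def record_miss(self, physical_address: int) -> bool:
        """Record a miss; return True if a new entry was allocated.

        A miss to a block that is already outstanding is merged, meaning
        the new miss address is discarded and no entry is added.
        """
        block = physical_address // BLOCK_BYTES
        if block in self.blocks:
            return False
        self.blocks.append(block)
        return True

    def fill(self, physical_address: int) -> None:
        """Remove the entry for a block once its fill data has returned."""
        self.blocks.remove(physical_address // BLOCK_BYTES)


if __name__ == "__main__":
    maf = MissAddressFile()
    print(maf.record_miss(0x1000))  # new entry
    print(maf.record_miss(0x1038))  # same 64-byte block, merged
    print(maf.record_miss(0x1040))  # next block, new entry
```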

2.8.2 I/O Address Space Load Instructions

Because I/O space load instructions may have side effects, they cannot be performed speculatively. When the Mbox receives an I/O space load instruction, the Mbox places the load instruction in the LQ, where it is held until it retires. The Mbox replays retired I/O space load instructions from the LQ to the MAF in program order, at a rate of one per GCLK cycle.

The Mbox allocates a new MAF entry to an I/O load instruction and increases I/O bandwidth by attempting to merge I/O load instructions in a merge register. Table 2–7 shows the rules for merging data. The columns represent the load instructions replayed to the MAF while the rows represent the size of the load in the merge register.

Table 2–7 Rules for I/O Address Space Load Instruction Data Merging

Merge Register/
Replayed Instruction  Load Byte/Word  Load Longword         Load Quadword
Byte/Word             No merge        No merge              No merge
Longword              No merge        Merge up to 32 bytes  No merge
Quadword              No merge        No merge              Merge up to 64 bytes

In summary, Table 2–7 shows some of the following rules:

• Byte/word load instructions and different size load instructions are not allowed to merge.

• A stream of ascending non-overlapping, but not necessarily consecutive, longword load instructions are allowed to merge into naturally aligned 32-byte blocks.

• A stream of ascending non-overlapping, but not necessarily consecutive, quadword load instructions are allowed to merge into naturally aligned 64-byte blocks.

• Merging of quadwords can be limited to naturally-aligned 32-byte blocks based on the Cbox WRITE_ONCE chain 32_BYTE_IO field.

• Issued MB, WMB, and I/O load instructions close the I/O register merge window. To minimize latency, the merge window is also closed when a timer detects no I/O store instruction activity for 1024 cycles.

After the Mbox I/O register has closed its merge window, the Cbox sends I/O read requests offchip in the order that they were received from the Mbox.
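
The size and alignment rules listed above can be condensed into a single predicate. The Python sketch below is a simplification under stated assumptions: it compares a new access only against the most recently merged access, it assumes naturally aligned addresses, and it does not model the events that close the merge window. The function and parameter names, including limit_quadword_to_32 as a stand-in for the 32_BYTE_IO setting, are hypothetical.

```python
# Access size in bytes -> size of the naturally aligned merge block.
MERGE_BLOCK_BYTES = {4: 32, 8: 64}  # longword, quadword


def can_merge(prev_addr: int, prev_size: int,
              new_addr: int, new_size: int,
              limit_quadword_to_32: bool = False) -> bool:
    """Return True if the new I/O access may merge with the previous one."""
    if prev_size != new_size:
        return False                    # different sizes never merge
    block = MERGE_BLOCK_BYTES.get(new_size)
    if block is None:
        return False                    # byte/word accesses never merge
    if new_size == 8 and limit_quadword_to_32:
        block = 32                      # quadword merging limited to 32 bytes
    ascending = new_addr >= prev_addr + prev_size   # ascending, no overlap
    same_block = (prev_addr // block) == (new_addr // block)
    return ascending and same_block


if __name__ == "__main__":
    print(can_merge(0x100, 4, 0x108, 4))  # longwords in one 32-byte block
    print(can_merge(0x100, 4, 0x120, 4))  # crosses into the next block
```

The same predicate applies to the store merging rules in Table 2–8, since the two tables have the same shape.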

2.8.3 Memory Address Space Store Instructions

The Mbox begins execution of a store instruction by translating its virtual address to a physical address using the DTB and by probing the Dcache. The Mbox puts information about the store instruction, including its physical address, its data and the results of the Dcache probe, into the store queue (SQ).

If the Mbox does not find the addressed location in the Dcache, it places the address into the MAF for processing by the Cbox. If the Mbox finds the addressed location in a Dcache block that is not dirty, then it places a ChangeToDirty request into the MAF.

A store instruction can write its data into the Dcache when it is retired, and when the Dcache block containing its address is dirty and not shared. SQ entries that meet these two conditions can be placed into the writable state. These SQ entries are placed into the writable state in program order at a maximum rate of two entries per cycle. The Mbox transfers writable store queue entry data from the SQ to the Dcache in program order at a maximum rate of two entries per cycle. Dcache lines associated with writable store queue entries are locked by the Mbox. System port probe commands cannot evict these blocks until their associated writable SQ entries have been transferred into the Dcache. This restriction assists in STx_C instruction and Dcache ECC processing.

SQ entry data that has not been transferred to the Dcache may source data to newer load instructions. The Mbox compares the virtual Dcache index bits of incoming load instructions to queued SQ entries, and sources the data from the SQ, bypassing the Dcache, when necessary.

2.8.4 I/O Address Space Store Instructions

The Mbox begins processing I/O space store instructions, like memory space store instructions, by translating the virtual address and placing the state associated with the store instruction into the SQ.

The Mbox replays retired I/O space store entries from the SQ to the IOWB in program order at a rate of one per GCLK cycle. The Mbox never allows queued I/O space store instructions to source data to subsequent load instructions.

The Cbox maximizes I/O bandwidth when it allocates a new IOWB entry to an I/O store instruction by attempting to merge I/O store instructions in a merge register. Table 2–8 shows the rules for I/O space store instruction data merging. The columns represent the store instructions replayed to the IOWB while the rows represent the size of the store in the merge register.

Table 2–8 Rules for I/O Address Space Store Instruction Data Merging

Merge Register/
Replayed Instruction  Store Byte/Word  Store Longword        Store Quadword
Byte/Word             No merge         No merge              No merge
Longword              No merge         Merge up to 32 bytes  No merge
Quadword              No merge         No merge              Merge up to 64 bytes

Table 2–8 shows some of the following rules:

• Byte/word store instructions and different size store instructions are not allowed to merge.

• A stream of ascending non-overlapping, but not necessarily consecutive, longword store instructions are allowed to merge into naturally aligned 32-byte blocks.

• A stream of ascending non-overlapping, but not necessarily consecutive, quadword store instructions are allowed to merge into naturally aligned 64-byte blocks.

• Merging of quadwords can be limited to naturally-aligned 32-byte blocks based on the Cbox WRITE_ONCE chain 32_BYTE_IO field.

• Issued MB, WMB, and I/O load instructions close the I/O register merge window. To minimize latency, the merge window is also closed when a timer detects no I/O store instruction activity for 1024 cycles.

After the IOWB merge register has closed its merge window, the Cbox sends I/O space store requests offchip in the order that they were received from the Mbox.

2.9 MAF Memory Address Space Merging Rules

Because all memory transactions are to 64-byte blocks, efficiency is improved by merging several small data transactions into a single larger data transaction. Table 2–9 lists the rules the 21264/EV67 uses when merging memory transactions into 64-byte naturally aligned data block transactions. Rows represent the merged instruction in the MAF and columns represent the new issued transaction.

Table 2–9 MAF Merging Rules

MAF/New  LDx    STx    STx_C  WH64   ECB    Istream
LDx      Merge  -      -      -      -      -
STx      Merge  Merge  -      -      -      -
STx_C    -      -      Merge  -      -      -
WH64     -      -      -      Merge  -      -
ECB      -      -      -      -      Merge  -
Istream  -      -      -      -      -      Merge

In summary, Table 2–9 shows that only like instruction types, with the exception of load instructions merging with store instructions, are merged.
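Table 2–9 can be read as a lookup keyed by the transaction already in the MAF. The sketch below is a hypothetical encoding of that table for illustration; the dictionary and function names are not taken from the 21264/EV67 specification.

```python
# Hypothetical encoding of Table 2-9: for each transaction type held in the
# MAF (row), the set of newly issued transaction types (columns) that merge.
MAF_MERGE_RULES = {
    "LDx": {"LDx"},
    "STx": {"LDx", "STx"},
    "STx_C": {"STx_C"},
    "WH64": {"WH64"},
    "ECB": {"ECB"},
    "Istream": {"Istream"},
}


def maf_can_merge(in_maf, new):
    """Return True if the new transaction merges with the one held in the MAF."""
    return new in MAF_MERGE_RULES[in_maf]
```

The only off-diagonal entry is a newly issued LDx merging with an STx already in the MAF; the reverse case (new STx, LDx in the MAF) does not merge.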

2.10 Instruction Ordering

In the absence of explicit instruction ordering, such as with MB or WMB instructions, the 21264/EV67 maintains a default instruction ordering relationship between pairs of load and store instructions.


The 21264/EV67 maintains the default memory data instruction ordering as shown in Table 2–10 (assume address X and address Y are different).

Table 2–10 Memory Reference Ordering

First Instruction in Pair    Second Instruction in Pair   Reference Order

Load memory to address X     Load memory to address X     Maintained (litmus test 1)
Load memory to address X     Load memory to address Y     Not maintained
Store memory to address X    Store memory to address X    Maintained
Store memory to address X    Store memory to address Y    Maintained
Load memory to address X     Store memory to address X    Maintained
Load memory to address X     Store memory to address Y    Not maintained
Store memory to address X    Load memory to address X     Maintained
Store memory to address X    Load memory to address Y     Not maintained

The 21264/EV67 maintains the default I/O instruction ordering as shown in Table 2–11 (assume address X and address Y are different).

Table 2–11 I/O Reference Ordering

First Instruction in Pair    Second Instruction in Pair   Reference Order

Load I/O to address X        Load I/O to address X        Maintained
Load I/O to address X        Load I/O to address Y        Maintained
Store I/O to address X       Store I/O to address X       Maintained
Store I/O to address X       Store I/O to address Y       Maintained
Load I/O to address X        Store I/O to address X       Maintained
Load I/O to address X        Store I/O to address Y       Not maintained
Store I/O to address X       Load I/O to address X        Maintained
Store I/O to address X       Load I/O to address Y        Not maintained

2.11 Replay Traps

There are some situations in which a load or store instruction cannot be executed due to a condition that occurs after that instruction issues from the IQ or FQ. The instruction is aborted (along with all newer instructions) and restarted from the fetch stage of the pipeline. This mechanism is called a replay trap.

2.11.1 Mbox Order Traps

Load and store instructions may be issued from the IQ in a different order than they were fetched from the Icache, while the architecture dictates that Dstream memory transactions to the same physical bytes must be completed in order. Usually, the Mbox manages the memory reference stream by itself to achieve architecturally correct behavior, but the two cases in which the Mbox uses replay traps to manage the memory stream are load-load and store-load order traps.


2.11.1.1 Load-Load Order Trap

The Mbox ensures that load instructions that read the same physical byte(s) ultimately issue in correct order by using the load-load order trap. The Mbox compares the address of each load instruction, as it is issued, to the address of all load instructions in the load queue. If the Mbox finds a newer load instruction in the load queue, it invokes a load-load order trap on the newer instruction. This is a replay trap that aborts the target of the trap and all newer instructions from the machine and refetches instructions starting at the target of the trap.

2.11.1.2 Store-Load Order Trap

The Mbox ensures that a load instruction ultimately issues after an older store instruction that writes some portion of its memory operand by using the store-load order trap. The Mbox compares the address of each store instruction, as it is issued, to the address of all load instructions in the load queue. If the Mbox finds a newer load instruction in the load queue, it invokes a store-load order trap on the load instruction. This is a replay trap. It functions like the load-load order trap.
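Both order traps follow the same pattern: an issuing load or store is compared against the loads already in the load queue, and a newer load to the same location becomes the trap target. The sketch below models that check under simplifying assumptions that are not from the specification: program order is represented by an integer age (smaller is older), address overlap is reduced to address equality, and when several newer loads match, the oldest of them is taken as the trap target.

```python
def find_order_trap(issuing_kind, issuing_age, issuing_addr, load_queue):
    """Return (trap_name, target_age) or None.

    issuing_kind is "load" or "store". load_queue is a list of (age, addr)
    tuples for loads that have already issued; a larger age means a newer
    instruction in program order.
    """
    if issuing_kind not in ("load", "store"):
        raise ValueError("issuing_kind must be 'load' or 'store'")
    # Newer loads that touched the same address issued too early.
    newer = [age for age, addr in load_queue
             if addr == issuing_addr and age > issuing_age]
    if not newer:
        return None
    trap_name = "load-load" if issuing_kind == "load" else "store-load"
    # Assumption: replay from the oldest offending load, since a replay
    # trap also aborts every instruction newer than its target.
    return (trap_name, min(newer))
```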

The Ibox contains extra hardware to reduce the frequency of the store-load trap. There is a 1-bit by 1024-entry VPC-indexed table in the Ibox called the stWait table. When an Icache instruction is fetched, the associated stWait table entry is fetched along with the Icache instruction. The stWait table produces 1 bit for each instruction accessed from the Icache. When a load instruction gets a store-load order replay trap, its associated bit in the stWait table is set during the cycle that the load is refetched. Hence, the trapping load instruction’s stWait bit will be set the next time it is fetched.

The IQ will not issue load instructions whose stWait bit is set while there are older unissued store instructions in the queue. A load instruction whose stWait bit is set can be issued the cycle immediately after the last older store instruction is issued from the queue. All the bits in the stWait table are unconditionally cleared every 16384 cycles, or every 65536 cycles if I_CTL[ST_WAIT_64K] is set.
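The stWait behavior can be modeled in a few lines. This is a rough sketch only: the class and method names are hypothetical, and the index function (instruction address shifted right by two, modulo 1024) is an assumption, since the text says only that the table is VPC-indexed.

```python
class StWaitTable:
    """Illustrative model of the 1-bit by 1024-entry stWait table."""

    ENTRIES = 1024

    def __init__(self, clear_interval=16384):
        # Use clear_interval=65536 to model I_CTL[ST_WAIT_64K] being set.
        self.bits = [0] * self.ENTRIES
        self.clear_interval = clear_interval
        self.cycles = 0

    def _index(self, vpc):
        # Assumed hash: drop the byte offset of the 4-byte instruction.
        return (vpc >> 2) % self.ENTRIES

    def record_store_load_trap(self, vpc):
        """Set the bit for a load that took a store-load order trap."""
        self.bits[self._index(vpc)] = 1

    def must_wait(self, vpc, older_unissued_stores):
        """True if the IQ should hold this load behind older stores."""
        return bool(self.bits[self._index(vpc)]) and older_unissued_stores > 0

    def tick(self, cycles=1):
        """Advance time and clear the whole table at each interval."""
        self.cycles += cycles
        if self.cycles >= self.clear_interval:
            self.bits = [0] * self.ENTRIES
            self.cycles %= self.clear_interval
```

The periodic clear keeps a load that trapped once from being held behind stores indefinitely after the conflict has gone away.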

2.11.2 Other Mbox Replay Traps

The Mbox also uses replay traps to control the flow of the load queue and store queue, and to ensure that there are never multiple outstanding misses to different physical addresses that map to the same Dcache or Bcache line. Unlike the order traps, however, these replay traps are invoked on the incoming instruction that triggered the condition.

2.12 I/O Write Buffer and the WMB Instruction

The I/O write buffer (IOWB) consists of four 64-byte entries with the associated address and control logic used to buffer I/O write data between the store queue (SQ) and the system port.

2.12.1 Memory Barrier (MB/WMB/TB Fill Flow)

The Cbox CSR SYSBUS_MB_ENABLE bit determines if MB instructions produce external system port transactions. When the SYSBUS_MB_ENABLE bit equals 0, the Cbox CSR MB_CNT[3:0] field contains the number of pending uncommitted transactions. The counter will increment for each of the following commands:

• RdBlk, RdBlkMod, RdBlkI


• RdBlkSpec (valid), RdBlkModSpec (valid), RdBlkSpecI (valid)

• RdBlkVic, RdBlkModVic, RdBlkVicI

• CleanToDirty, SharedToDirty, STChangeToDirty, InvalToDirty

• FetchBlk, FetchBlkSpec (valid), Evict

• RdByte, RdLw, RdQw, WrByte, WrLW, WrQW

The counter is decremented with the C (commit) bit in the Probe and SysDc commands (see Section 4.7.7). Systems can assert the C bit in the SysDc fill response to the commands that originally incremented the counter, or attached to the last probe seen by that command when it reached the system serialization point. If the number of uncommitted transactions reaches 15 (saturating the counter), the Cbox will stall MAF and IOWB processing until at least one of the pending transactions has been committed. Probe processing is not interrupted by the state of this counter.
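The MB_CNT behavior described above amounts to a 4-bit counter that stalls new requests once it reaches 15. The following sketch models only that counting and stalling behavior; the class and method names are invented for illustration and do not correspond to hardware signals.

```python
class MbCounter:
    """Illustrative model of the Cbox MB_CNT[3:0] uncommitted-transaction counter."""

    MAX_COUNT = 15  # a 4-bit counter saturates at 15

    def __init__(self):
        self.count = 0

    @property
    def stalled(self):
        """True while MAF and IOWB processing would be stalled."""
        return self.count >= self.MAX_COUNT

    def issue(self):
        """Try to send a counted command; return False if processing is stalled."""
        if self.stalled:
            return False
        self.count += 1
        return True

    def commit(self):
        """Model a Probe or SysDc command arriving with the C bit set."""
        if self.count > 0:
            self.count -= 1
```

A single commit is enough to release the stall, which matches the statement that processing resumes once at least one pending transaction has been committed.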

2.12.1.1 MB Instruction Processing

When an MB instruction is fetched in the predicted instruction execution path, it stalls in the map stage of the pipeline. This also stalls all instructions after the MB, and control of instruction flow is based upon the value in Cbox CSR SYSBUS_MB_ENABLE as follows:

• If Cbox CSR SYSBUS_MB_ENABLE is clear, the Cbox waits until the IQ is empty and then performs the following actions:

a. Sends all pending MAF and IOWB entries to the system port.

b. Monitors Cbox CSR MB_CNT[3:0], a 4-bit counter of outstanding committed events. When the counter decrements from one to zero, the Cbox marks the youngest probe queue entry.

c. Waits until the MAF contains no more Dstream references and the SQ, LQ, and IOWB are empty.

When all of the above have occurred and a probe response has been sent to the system for the marked probe queue entry, instruction execution continues with the instruction after the MB.

• If Cbox CSR SYSBUS_MB_ENABLE is set, the Cbox waits until the IQ is empty and then performs the following actions:

a. Sends all pending MAF and IOWB entries to the system port.

b. Sends the MB command to the system port.

c. Waits until the MB command is acknowledged, then marks the youngest entry in the probe queue.

d. Waits until the MAF contains no more Dstream references and the SQ, LQ, and IOWB are empty.

When all of the above have occurred and a probe response has been sent to the system for the marked probe queue entry, instruction execution continues with the instruction after the MB.


Because the MB instruction is executed speculatively, MB processing can begin and the original MB can be killed. In the internal acknowledge case, the MB may have already been sent to the system interface, and the system is still expected to respond to the MB.

2.12.1.2 WMB Instruction Processing

Write memory barrier (WMB) instructions are issued into the Mbox store-queue, where they wait until they are retired and all prior store instructions become writable. The Mbox then stalls the writable pointer and informs the Cbox. The Cbox closes the IOWB merge register and responds in one of the following two ways:

• If Cbox CSR SYSBUS_MB_ENABLE is clear, the Cbox performs the following actions:

a. Stalls further MAF and IOWB processing.

b. Monitors Cbox CSR MB_CNT[3:0], a 4-bit counter of outstanding committed events. When the counter decrements from one to zero, the Cbox marks the youngest probe queue entry.

c. When a probe response has been sent to the system for the marked probe queue entry, the Cbox considers the WMB to be satisfied.

• If Cbox CSR SYSBUS_MB_ENABLE is set, the Cbox performs the following actions:

a. Stalls further MAF and IOWB processing.

b. Sends the MB command to the system port.

c. Waits until the MB command is acknowledged by the system with a SysDc MBDone command, then sends acknowledge and marks the youngest entry in the probe queue.

d. When a probe response has been sent to the system for the marked probe queue entry, the Cbox considers the WMB to be satisfied.

2.12.1.3 TB Fill Flow

Load instructions (HW_LDs) to a virtual page table entry (VPTE) are processed by the 21264/EV67 to avoid litmus test problems associated with the ordering of memory transactions from another processor against loading of a page table entry and the subsequent virtual-mode load from this processor.

Consider the sequence shown in Table 2–12. The data could be in the Bcache. Pj should fetch datai if it is using PTEi.

Table 2–12 TB Fill Flow Example Sequence 1

Pi            Pj

Write Datai   Load/Store datai
MB            <TB miss>
Write PTEi    Load-PTE
              <write TB>
              Load/Store (restart)


Also consider the related sequence shown in Table 2–13. In this case, the data could be cached in the Bcache; Pj should fetch datai if it is using PTEi.

Table 2–13 TB Fill Flow Example Sequence 2

Pi            Pj

Write Datai   Istream read datai
MB            <TB miss>
Write PTEi    Load-PTE
              <write TB>
              Istream read (restart) - will miss the Icache

The 21264/EV67 processes Dstream loads to the PTE by injecting, in hardware, some memory barrier processing between the PTE transaction and any subsequent load or store instruction. This is accomplished by the following mechanism:

1. The integer queue issues a HW_LD instruction with VPTE.

2. The integer queue issues a HW_MTPR instruction with a DTB_PTE0 that is data-dependent on the HW_LD instruction with a VPTE, and is required in order to fill the DTBs. The HW_MTPR instruction, when queued, sets IPR scoreboard bits [4] and [0].

3. When a HW_MTPR instruction with a DTB_PTE0 is issued, the Ibox signals the Cbox indicating that a HW_LD instruction with a VPTE has been processed. This causes the Cbox to begin processing the MB instruction. The Ibox prevents any subsequent memory operations from being issued by not clearing the IPR scoreboard bit [0]. IPR scoreboard bit [0] is one of the scoreboard bits associated with the HW_MTPR instruction with DTB_PTE0.

4. When the Cbox completes processing the MB instruction (using one of the above sequences, depending upon the state of SYSBUS_MB_ENABLE), the Cbox signals the Ibox to clear IPR scoreboard bit [0].

The 21264/EV67 uses a similar mechanism to process Istream TB misses and fills to the PTE for the Istream.

1. The integer queue issues a HW_LD instruction with VPTE.

2. The IQ issues a HW_MTPR instruction with an ITB_PTE that is data-dependent upon the HW_LD instruction with VPTE. This is required in order to fill the ITB. The HW_MTPR instruction, when queued, sets IPR scoreboard bits [4] and [0].

3. The Cbox issues a HW_MTPR instruction for the ITB_PTE and signals the Ibox that a HW_LD/VPTE instruction has been processed, causing the Cbox to start processing the MB instruction. The Mbox stalls Ibox fetching from when the HW_LD/VPTE instruction finishes until the probe queue is drained.

4. When the 21264/EV67 is finished (SYS_MB selects one of the above sequences), the Cbox directs the Ibox to clear IPR scoreboard bit [0]. Also, the Mbox directs the Ibox to start prefetching.

Inserting MB instruction processing within the TB fill flow is only required for multiprocessor systems. Uniprocessor systems can disable MB instruction processing by deasserting Ibox CSR I_CTL[TB_MB_EN].


2.13 Performance Measurement Support—Performance Counters

The 21264/EV67 provides hardware support for two methods of obtaining program performance feedback information. The two methods do not require program modification. The first method offers similar capabilities to earlier microprocessor performance counters. The second method supports the new ProfileMe way of statistically sampling individual instructions during program execution to develop a model of program execution. Both methods use the same hardware registers.

See Section 6.10 for information about counter control.

2.14 Floating-Point Control Register

The floating-point control register (FPCR) is shown in Figure 2–11.

Figure 2–11 Floating-Point Control Register

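[Figure 2–11 bit layout, most significant bit first: SUM (63), INED (62), UNFD (61), UNDZ (60), DYN (59:58), IOV (57), INE (56), UNF (55), OVF (54), DZE (53), INV (52), OVFD (51), DZED (50), INVD (49), DNZ (48). Bits 47:0 are reserved.]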

The floating-point control register fields are described in Table 2–14.

Table 2–14 Floating-Point Control Register Fields

Name      Extent    Type   Description

SUM       [63]      RW     Summary bit. Records bit-wise OR of FPCR exception bits.

INED      [62]      RW     Inexact Disable. If this bit is set and a floating-point instruction that enables trapping on inexact results generates an inexact value, the result is placed in the destination register and the trap is suppressed.


UNFD      [61]      RW     Underflow Disable. The 21264/EV67 hardware cannot generate IEEE compliant denormal results. UNFD is used in conjunction with UNDZ as follows:

                           UNFD   UNDZ   Result
                           0      X      Underflow trap.
                           1      0      Trap to supply a possible denormal result.
                           1      1      Underflow trap suppressed. Destination is written with a true zero (+0.0).

UNDZ      [60]      RW     Underflow to zero. When UNDZ is set together with UNFD, underflow traps are disabled and the 21264/EV67 places a true zero in the destination register. See UNFD, above.

DYN       [59:58]   RW     Dynamic rounding mode. Indicates the rounding mode to be used by an IEEE floating-point instruction when the instruction specifies dynamic rounding mode:

                           Bits   Meaning
                           00     Chopped
                           01     Minus infinity
                           10     Normal
                           11     Plus infinity

IOV       [57]      RW     Integer overflow. An integer arithmetic operation or a conversion from floating-point to integer overflowed the destination precision.

INE       [56]      RW     Inexact result. A floating-point arithmetic or conversion operation gave a result that differed from the mathematically exact result.

UNF       [55]      RW     Underflow. A floating-point arithmetic or conversion operation gave a result that underflowed the destination exponent.

OVF       [54]      RW     Overflow. A floating-point arithmetic or conversion operation gave a result that overflowed the destination exponent.

DZE       [53]      RW     Divide by zero. An attempt was made to perform a floating-point divide with a divisor of zero.

INV       [52]      RW     Invalid operation. An attempt was made to perform a floating-point arithmetic operation and one or more of its operand values were illegal.

OVFD      [51]      RW     Overflow disable. If this bit is set and a floating-point arithmetic operation generates an overflow condition, then the appropriate IEEE nontrapping result is placed in the destination register and the trap is suppressed.

DZED      [50]      RW     Division by zero disable. If this bit is set and a floating-point divide by zero is detected, the appropriate IEEE nontrapping result is placed in the destination register and the trap is suppressed.

INVD      [49]      RW     Invalid operation disable. If this bit is set and a floating-point operate generates an invalid operation condition and 21264/EV67 is capable of producing the correct IEEE nontrapping result, that result is placed in the destination register and the trap is suppressed.


DNZ       [48]      RW     Denormal operands to zero. If this bit is set, treat all Denormal operands as a signed zero value with the same sign as the Denormal operand.

Reserved  [47:0]    —      —

Alpha architecture FPCR bit 47 (DNOD) is not implemented by the 21264/EV67.
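The field positions in Table 2–14 can be turned into a small decoder for a 64-bit FPCR value. This is an illustrative sketch; the dictionary layout and function name are assumptions for the example, and only the bit positions and rounding-mode encodings come from the table.

```python
# Field name -> (least significant bit, width), taken from Table 2-14.
FPCR_FIELDS = {
    "SUM": (63, 1), "INED": (62, 1), "UNFD": (61, 1), "UNDZ": (60, 1),
    "DYN": (58, 2), "IOV": (57, 1), "INE": (56, 1), "UNF": (55, 1),
    "OVF": (54, 1), "DZE": (53, 1), "INV": (52, 1), "OVFD": (51, 1),
    "DZED": (50, 1), "INVD": (49, 1), "DNZ": (48, 1),
}

# DYN encodings from Table 2-14.
DYN_MODES = {0b00: "chopped", 0b01: "minus infinity",
             0b10: "normal", 0b11: "plus infinity"}


def decode_fpcr(value):
    """Split a 64-bit FPCR value into its named fields."""
    return {name: (value >> lsb) & ((1 << width) - 1)
            for name, (lsb, width) in FPCR_FIELDS.items()}
```

For instance, a value with bits 63, 59, and 56 set decodes to SUM = 1, DYN = 2 (normal rounding), and INE = 1, with every other field zero.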


2.15 AMASK and IMPLVER Instruction Values

The AMASK and IMPLVER instructions return the supported architecture extensions and the processor type, respectively.

2.15.1 AMASK

The 21264/EV67 returns the AMASK instruction values provided in Table 2–15. The I_CTL register reports the 21264/EV67 pass level (see I_CTL[CHIP_ID], Section 5.2.15).

Table 2–15 21264/EV67 AMASK Values

21264/EV67 Pass Level              AMASK Feature Mask Value

See I_CTL[CHIP_ID], Table 5–11     307

The AMASK bit definitions provided in Table 2–15 are defined in Table 2–16.

Table 2–16 AMASK Bit Assignments

Bit   Meaning

0     Support for the byte/word extension (BWX). The instructions that comprise the BWX extension are LDBU, LDWU, SEXTB, SEXTW, STB, and STW.

1     Support for the square-root and floating-point convert extension (FIX). The instructions that comprise the FIX extension are FTOIS, FTOIT, ITOFF, ITOFS, ITOFT, SQRTF, SQRTG, SQRTS, and SQRTT.

2     Support for the count extension (CIX). The instructions that comprise the CIX extension are CTLZ, CTPOP, and CTTZ.

8     Support for the multimedia extension (MVI). The instructions that comprise the MVI extension are MAXSB8, MAXSW4, MAXUB8, MAXUW4, MINSB8, MINSW4, MINUB8, MINUW4, PERR, PKLB, PKWB, UNPKBL, and UNPKBW.

9     Support for precise arithmetic trap reporting in hardware. The trap PC is the same as the instruction PC after the trapping instruction is executed.
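As a cross-check, the feature mask value in Table 2–15 can be decoded against the bit assignments in Table 2–16. The sketch assumes that 307 is a hexadecimal value, which is consistent with exactly bits 0, 1, 2, 8, and 9 being set; the names below are illustrative only.

```python
# Bit number -> feature, from Table 2-16.
AMASK_BITS = {
    0: "BWX",
    1: "FIX",
    2: "CIX",
    8: "MVI",
    9: "precise arithmetic traps",
}


def amask_features(mask):
    """List the features whose bits are set in a feature mask value."""
    return [name for bit, name in sorted(AMASK_BITS.items())
            if (mask >> bit) & 1]


# Assumption: the Table 2-15 value 307 is hexadecimal.
EV67_FEATURE_MASK = 0x307
```

Under that assumption, `amask_features(EV67_FEATURE_MASK)` lists all five features in Table 2–16 and no other bits are set.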

2.15.2 IMPLVER

For the 21264/EV67, the IMPLVER instruction returns the value 2.


2.16 Design Examples


The 21264/EV67 can be designed into many different uniprocessor and multiprocessor system configurations. Figures 2–12 and 2–13 illustrate two possible configurations. These configurations employ additional system/memory controller chipsets.

Figure 2–12 shows a typical uniprocessor system with a second-level cache. This system configuration could be used in standalone or networked workstations.

Figure 2–12 Typical Uniprocessor Configuration

[Block diagram. Labeled blocks: 21264; L2 cache (tag store and data store); 21272 core logic chipset (control chips, data slice chips, host PCI bridge chip); duplicate tag store (optional); DRAM arrays; 64-bit PCI bus. Labeled paths: tag, address, data, address out.]

Figure 2–13 shows a typical multiprocessor system, each processor with a second-level cache. Each interface controller must employ a duplicate tag store to maintain cache coherency. This system configuration could be used in a networked database server application.


Figure 2–13 Typical Multiprocessor Configuration

[Block diagram. Labeled blocks: 21264; cache; 21272 core logic chipset (control chip, data slice chips); DRAM arrays; host PCI bridge chips; 64-bit PCI buses. Labeled paths: address, data.]


3 Hardware Interface

This chapter contains the 21264/EV67 microprocessor logic symbol and provides information about signal names, their function, and their location. This chapter also describes the mechanical specifications of the 21264/EV67. It is organized as follows:

• The 21264/EV67 logic symbol

• The 21264/EV67 signal names and functions

• Lists of the signal pins, sorted by name and PGA location

• The specifications for the 21264/EV67 mechanical package

• The top and bottom views of the 21264/EV67 pinouts

3.1 21264/EV67 Microprocessor Logic Symbol

Figure 3–1 shows the logic symbol for the 21264/EV67 chip.


Figure 3–1 21264/EV67 Microprocessor Logic Symbol

[Logic symbol for the 21264. Signals are grouped as System Interface, Bcache Interface, Clocks (including the 3.3-V PLL_VDD supply), and Miscellaneous. The individual signals are listed in Table 3–2.]


3.2 21264/EV67 Signal Names and Functions

Table 3–1 defines the 21264/EV67 signal types referred to in this section.

Table 3–1 Signal Pin Types Definitions

Signal Type   Definition

Inputs
I_DC_REF      Input DC reference pin
I_DA          Input differential amplifier receiver
I_DA_CLK      Input clock pin

Outputs
O_OD          Open drain output driver
O_OD_TP       Open drain driver for test pins
O_PP          Push/pull output driver
O_PP_CLK      Push/pull output clock driver

Bidirectional
B_DA_OD       Bidirectional differential amplifier receiver with open drain output
B_DA_PP       Bidirectional differential amplifier receiver with push/pull output

Other
Spare         Reserved to Compaq
NoConnect     No connection. Do not connect to these pins for any revision of the 21264/EV67. These pins must float.

All Spare connections are Reserved to Compaq to maintain compatibility between passes of the chip. Designers should not use these pins.

Table 3–2 lists all signal pins in alphabetic order and provides a full functional description of the pins. Table 3–4 lists the signal pins and their corresponding pin grid array (PGA) locations in alphabetic order for the signal type. Table 3–5 lists the pin grid array locations in alphabetical order.

Table 3–2 21264/EV67 Signal Descriptions

Signal                Type      Count Description

BcAdd_H[23:4]         O_PP      20    These signals provide the index to the Bcache.
BcCheck_H[15:0]       B_DA_PP   16    ECC check bits for BcData_H[127:0].
BcData_H[127:0]       B_DA_PP   128   Bcache data signals.
BcDataInClk_H[7:0]    I_DA      8     Bcache data input clocks. These clocks are used with high speed SDRAMs, such as DDRs, that provide a clock-out with data-output pins to optimize Bcache read bandwidths. The 21264/EV67 internally synchronizes the data to its logic with clock forward receive circuits similar to the system interface.
BcDataOE_L            O_PP      1     Bcache data output enable. The 21264/EV67 asserts this signal during Bcache read operations.


BcDataOutClk_H[3:0]   O_PP      8     Bcache data output clocks. These free-running clocks are differential copies of the Bcache clock and are derived from the 21264/EV67 GCLK. Their period is a multiple of the GCLK and is fixed for all operations. They can be configured so that their rising edge lags BcAdd_H[23:4] by 0 to 2 GCLK cycles. The 21264/EV67 synchronizes tag output information with these clocks.
BcDataOutClk_L[3:0]
BcDataWr_L            O_PP      1     Bcache data write enable. The 21264/EV67 asserts this signal when writing data to the Bcache data arrays.
BcLoad_L              O_PP      1     Bcache burst enable.
BcTag_H[42:20]        B_DA_PP   23    Bcache tag bits.
BcTagDirty_H          B_DA_PP   1     Tag dirty state bit. During cache write operations, the 21264/EV67 will assert this signal if the Bcache data has been modified.
BcTagInClk_H          I_DA      1     Bcache tag input clock. The 21264/EV67 uses this input clock to latch the tag information on Bcache read operations. This clock is used with high-speed SDRAMs, such as DDRs, that provide a clock-out with data-output pins to optimize Bcache read bandwidths. The 21264/EV67 internally synchronizes the data to its logic with clock forward receive circuits similar to the system interface.
BcTagOE_L             O_PP      1     Bcache tag output enable. This signal is asserted by the 21264/EV67 for Bcache read operations.
BcTagOutClk_H         O_PP      2     Bcache tag output clock. These clocks “echo” the clock-forwarded BcDataOutClk_x[3:0] clocks.
BcTagOutClk_L
BcTagParity_H         B_DA_PP   1     Tag parity state bit.
BcTagShared_H         B_DA_PP   1     Tag shared state bit. The 21264/EV67 will write a 1 on this signal line if another agent has a copy of the cache line.
BcTagValid_H          B_DA_PP   1     Tag valid state bit. If set, this line indicates that the cache line is valid.
BcTagWr_L             O_PP      1     Tag RAM write enable. The 21264/EV67 asserts this signal when writing a tag to the Bcache tag arrays.
BcVref                I_DC_REF  1     Bcache tag reference voltage.
ClkFwdRst_H           I_DA      1     Systems assert this synchronous signal to wake up a powered-down 21264/EV67. The ClkFwdRst_H signal is clocked into a 21264/EV67 register by the captured FrameClk_x signals. Systems must ensure that the timing of this signal meets 21264/EV67 requirements (see Section 4.7.2).
ClkIn_H               I_DA_CLK  2     Differential input signals provided by the system.
ClkIn_L
DCOK_H                I_DA      1     dc voltage OK. Must be deasserted until dc voltage reaches proper operating level. After that, DCOK_H is asserted.
EV6Clk_H              O_PP_CLK  2     Provides an external test point to measure phase alignment of the PLL.
EV6Clk_L


FrameClk_H            I_DA_CLK  2     A skew-controlled differential 50% duty cycle copy of the system clock. It is used by the 21264/EV67 as a reference, or framing, clock.
FrameClk_L
IRQ_H[5:0]            I_DA      6     These six interrupt signal lines may be asserted by the system. The response of the 21264/EV67 is determined by the system software.
MiscVref              I_DC_REF  1     Voltage reference for the miscellaneous pins (see Table 3–3).
PllBypass_H           I_DA      1     When asserted, this signal will cause the two input clocks (ClkIn_x) to be applied to the 21264/EV67 internal circuits, instead of the 21264/EV67 global clock (GCLK).
PLL_VDD               3.3 V     1     3.3-V dedicated power supply for the 21264/EV67 PLL.
Reset_L               I_DA      1     System reset. This signal protects the 21264/EV67 from damage during initial power-up. It must be asserted until DCOK_H is asserted. After that, it is deasserted and the 21264/EV67 begins its reset sequence.
SromClk_H             O_OD_TP   1     Serial ROM clock. Supplies the clock that causes the SROM to advance to the next bit. The cycle time for this clock is 256 times the cycle time of the GCLK (internal 21264/EV67 clock).
SromData_H            I_DA      1     Serial ROM data. Input data line from the SROM.
SromOE_L              O_OD_TP   1     Serial ROM enable. Supplies the output enable to the SROM.
SysAddIn_L[14:0]      I_DA      15    Time-multiplexed command/address/ID/Ack from system to the 21264/EV67.
SysAddInClk_L         I_DA      1     Single-ended forwarded clock from system for SysAddIn_L[14:0] and SysFillValid_L.
SysAddOut_L[14:0]     O_OD      15    Time-multiplexed command/address/ID/mask from the 21264/EV67 to the system bus.
SysAddOutClk_L        O_OD      1     Single-ended forwarded clock output for SysAddOut_L[14:0].
SysCheck_L[7:0]       B_DA_OD   8     Quadword ECC check bits for SysData_L[63:0].
SysData_L[63:0]       B_DA_OD   64    Data bus for memory and I/O data.
SysDataInClk_H[7:0]   I_DA      8     Single-ended system-generated clocks for clock forwarded input system data.
SysDataInValid_L      I_DA      1     When asserted, marks a valid data cycle for data transfers to the 21264/EV67.
SysDataOutClk_L[7:0]  O_OD      8     Single-ended 21264/EV67-generated clocks for clock forwarded output system data.
SysDataOutValid_L     I_DA      1     When asserted, marks a valid data cycle for data transfers from the 21264/EV67.
SysFillValid_L        I_DA      1     When asserted, this bit indicates validation for the cache fill delivered in the previous system SysDc command.


SysVref               I_DC_REF  1     System interface reference voltage.
Tck_H                 I_DA      1     IEEE 1149.1 test clock.
Tdi_H                 I_DA      1     IEEE 1149.1 test data-in signal.
Tdo_H                 O_OD_TP   1     IEEE 1149.1 test data-out signal.
TestStat_H            O_OD_TP   1     Test status pin. System reset drives the test status pin low. The TestStat_H pin is forced high at the start of the Icache BiST. If the Icache BiST passes, the pin is deasserted at the end of the BiST operation; otherwise, it remains high. The 21264/EV67 generates a timeout reset signal if an instruction is not retired within one billion cycles. The 21264/EV67 signals the timeout reset event by outputting a 256 GCLK cycle wide pulse on TestStat_H.
Tms_H                 I_DA      1     IEEE 1149.1 test mode select signal.
Trst_L                I_DA      1     IEEE 1149.1 test access port (TAP) reset signal.

Table 3–3 lists signals by function and provides an abbreviated description.

Table 3–3 21264/EV67 Signal Descriptions by Function

Signal                Type      Count Description

BcVref Domain
BcAdd_H[23:4]         O_PP      20    Bcache index.
BcCheck_H[15:0]       B_DA_PP   16    ECC check bits for BcData_H[127:0].
BcData_H[127:0]       B_DA_PP   128   Bcache data.
BcDataInClk_H[7:0]    I_DA      8     Bcache data input clocks.
BcDataOE_L            O_PP      1     Bcache data output enable.
BcDataOutClk_H[3:0]   O_PP      8     Bcache data output clocks.
BcDataOutClk_L[3:0]
BcDataWr_L            O_PP      1     Bcache data write enable.
BcLoad_L              O_PP      1     Bcache burst enable.
BcTag_H[42:20]        B_DA_PP   23    Bcache tag bits.
BcTagDirty_H          B_DA_PP   1     Tag dirty state bit.
BcTagInClk_H          I_DA      1     Bcache tag input clock.
BcTagOE_L             O_PP      1     Bcache tag output enable.
BcTagOutClk_H         O_PP      2     Bcache tag output clocks.
BcTagOutClk_L
BcTagParity_H         B_DA_PP   1     Tag parity state bit.
BcTagShared_H         B_DA_PP   1     Tag shared state bit.
BcTagValid_H          B_DA_PP   1     Tag valid state bit.
BcTagWr_L             O_PP      1     Tag RAM write enable.


BcVr ef I_DC_REF 1 Tag data input reference voltage. SysVref Domain SysAddIn_L[14:0] I_DA 15 Time-multiplexed SysAddIn, system-to-21264/EV67. SysAddInClk_L I_DA 1 Single-ended forwarded clock from system for

SysAddIn_L[14:0] and SysFillValid_L. SysAddOut_L[14:0] O_OD 15 Time-multiplexed SysAddOut, 21264/EV67-to-system. SysAddOutClk_L O_OD 1 Single-ended forwarded-clock. SysCheck_L[7:0] B_DA_OD 8 Quadword ECC check bits for SysData_L[63:0]. SysData_L[63:0] B_DA_OD 64 Data bus for memory and I/O data. SysDataInClk_H[7:0] I_DA 8 Single-ended system-generated clocks for clock forwarded

input system data. SysDataInValid_L I_DA 1 When asserted, marks a valid data cycle for data transfers to

the 21264/EV67. SysDataOutClk_L[7:0] O_OD 8 Single-ended 21264/EV67-generated clocks for clock for-

warded output system data. SysDataOutValid_L I_DA 1 When asserted, marks a valid data cycle for data transfers

from the 21264/EV67.

SysFillValid_L I_DA 1 Validation for fill given in previous SysDC command. SysVref I_DC_REF 1 System interface reference voltage. Clocks and PLL ClkIn_H

ClkIn_L EV6Clk_H

EV6Clk_L FrameClk_H

FrameClk_L

PLL_VDD 3.3 V 1 3.3-V dedicated power supply for the 21264/EV67 PLL. MiscVref Domain ClkFwdRst_H I_DA 1 Systems assert this synchronous signal to wake up a powered-

DCOK_H I_DA 1 dc voltage OK. Must be deasserted until dc voltage reaches

I_DA_CLK 2 Differential input signals provided by the system.

O_PP_CLK 2 Provides an external test point to measure phase alignment of

the PLL.

I_DA_CLK 2 A skew-controlled differential 50% duty cycle copy of the

system clock. It is used by the 21264/EV67 as a reference, or

framing, clock.

down 21264/EV67. The ClkFwdRst_H signal is clocked in to

a 21264/EV67 register by the captured FrameClk_x sig nals.

proper operating level. After that, DCOK_H is asserted.

IRQ_H[5:0] I_DA 6 These six interrupt signal lines may be asserted by the system. MiscVref I_DC_REF 1 Reference voltage for miscellaneous pins. PllBypass_H I_DA 1 When asserted, this sig nal will cause the input clocks

(ClkIn_x) to be applied to the 21264/EV67 internal circuits,

instead of the 21264/EV67’s global clock (GCLK).


Reset_L I_DA 1 System reset. This signal protects the 21264/EV67 from damage during initial power-up. It must be asserted until DCOK_H is asserted. After that, it is deasserted and the 21264/EV67 begins its reset sequence.
SromClk_H O_OD_TP 1 Serial ROM clock.
SromData_H I_DA 1 Serial ROM data.
SromOE_L O_OD_TP 1 Serial ROM enable.
Tck_H I_DA 1 IEEE 1149.1 test clock.
Tdi_H I_DA 1 IEEE 1149.1 test data-in signal.
Tdo_H O_OD_TP 1 IEEE 1149.1 test data-out signal.
TestStat_H O_OD_TP 1 Test status pin.
Tms_H I_DA 1 IEEE 1149.1 test mode select signal.
Trst_L I_DA 1 IEEE 1149.1 test access port (TAP) reset signal.

3.3 Pin Assignments

The 21264/EV67 package has 587 pins aligned in a pin grid array (PGA) design. There are 380 functional signal pins, 1 dedicated 3.3-V pin for the PLL, 112 ground VSS pins, and 94 VDD pins. Table 3–4 lists the signal pins and their corresponding pin grid array (PGA) locations in alphabetical order for the signal type. Table 3–5 lists the pin grid array locations in alphabetical order.
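The pin counts above and the ordering used in the pin lists can be checked mechanically. The sketch below is illustrative only: the helper names are hypothetical, and the claim that the tables are ordered as plain character strings (so that A11 precedes A5, and A7 precedes AA3) is an observation drawn from the listings in Table 3–5, not a rule stated by this manual.

```python
import re

# Pin-class counts quoted in Section 3.3.
PIN_COUNTS = {"signal": 380, "pll_vdd": 1, "vss": 112, "vdd": 94}

# A PGA location is one or two row letters followed by a column number.
_LOCATION = re.compile(r"^([A-Z]{1,2})(\d{1,2})$")


def split_location(location):
    """Split a location such as 'AV22' into its row and column: ('AV', 22)."""
    match = _LOCATION.match(location)
    if match is None:
        raise ValueError(f"not a PGA location: {location!r}")
    return match.group(1), int(match.group(2))


def table_order(locations):
    """Order locations as plain strings, which is how Table 3-5 appears to be sorted."""
    return sorted(locations)


total_pins = sum(PIN_COUNTS.values())
sample = table_order(["A5", "AA3", "A11", "A7", "A41"])
```

Sorting by string rather than by (row, column) explains why a numerically small column such as A5 appears after A41 in the location-sorted list.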

Table 3–4 Pin List Sorted by Signal Name

Signal Name PGA Location Signal Name PGA Location Signal Name PGA Location

BcAdd_H_10 B30 BcAdd_H_11 D30 BcAdd_H_12 C31 BcAdd_H_13 H28 BcAdd_H_14 G29 BcAdd_H_15 A33 BcAdd_H_16 E31 BcAdd_H_17 D32 BcAdd_H_18 B34 BcAdd_H_19 A35 BcAdd_H_20 B36 BcAdd_H_21 H30 BcAdd_H_22 C35 BcAdd_H_23 E33 BcAdd_H_4 B28 BcAdd_H_5 E27 BcAdd_H_6 A29 BcAdd_H_7 G27 BcAdd_H_8 C29 BcAdd_H_9 F28 BcCheck_H_0 F2 BcCheck_H_1 AB4 BcCheck_H_10 AW1 BcCheck_H_11 BD10 BcCheck_H_12 E45 BcCheck_H_13 AC45 BcCheck_H_14 AT44 BcCheck_H_15 BB36 BcCheck_H_2 AT2 BcCheck_H_3 BC11 BcCheck_H_4 M38 BcCheck_H_5 AB42 BcCheck_H_6 AU43 BcCheck_H_7 BC37 BcCheck_H_8 M8 BcCheck_H_9 AA3 BcData_H_0 B10 BcData_H_1 D10 BcData_H_10 L3 BcData_H_100 D42 BcData_H_101 D44 BcData_H_102 H40 BcData_H_103 H42 BcData_H_104 G45 BcData_H_105 L43


BcData_H_106 L45 BcData_H_107 N45 BcData_H_108 T44 BcData_H_109 U45 BcData_H_11 M2 BcData_H_110 W45 BcData_H_111 AA43 BcData_H_112 AC43 BcData_H_113 AD44 BcData_H_114 AE41 BcData_H_115 AG45 BcData_H_116 AK44 BcData_H_117 AL43 BcData_H_118 AM42 BcData_H_119 AR45 BcData_H_12 T2 BcData_H_120 AP40 BcData_H_121 BA45 BcData_H_122 AV42 BcData_H_123 BB44 BcData_H_124 BB42 BcData_H_125 BC41 BcData_H_126 BA37 BcData_H_127 BD40 BcData_H_13 U1 BcData_H_14 V2 BcData_H_15 Y4 BcData_H_16 AC1 BcData_H_17 AD2 BcData_H_18 AE3 BcData_H_19 AG1 BcData_H_2 A5 BcData_H_20 AK2 BcData_H_21 AL3 BcData_H_22 AR1 BcData_H_23 AP2 BcData_H_24 AY2 BcData_H_25 BB2 BcData_H_26 AW5 BcData_H_27 BB4 BcData_H_28 BB8 BcData_H_29 BE5 BcData_H_3 C5 BcData_H_30 BB10 BcData_H_31 BE7 BcData_H_32 G33 BcData_H_33 C37 BcData_H_34 B40 BcData_H_35 C41 BcData_H_36 C43 BcData_H_37 E43 BcData_H_38 G41 BcData_H_39 F44 BcData_H_4 C3 BcData_H_40 K44 BcData_H_41 N41 BcData_H_42 M44 BcData_H_43 P42 BcData_H_44 U43 BcData_H_45 V44 BcData_H_46 Y42 BcData_H_47 AB44 BcData_H_48 AD42 BcData_H_49 AE43 BcData_H_5 E3 BcData_H_50 AF42 BcData_H_51 AJ45 BcData_H_52 AK42 BcData_H_53 AN45 BcData_H_54 AP44 BcData_H_55 AN41 BcData_H_56 AW45 BcData_H_57 AU41 BcData_H_58 AY44 BcData_H_59 BA43 BcData_H_6 H6 BcData_H_60 BC43 BcData_H_61 BD42 BcData_H_62 BB38 BcData_H_63 BE41 BcData_H_64 C11 BcData_H_65 A7 BcData_H_66 C9 BcData_H_67 B6 BcData_H_68 B4 BcData_H_69 D4 BcData_H_7 E1 BcData_H_70 G5 BcData_H_71 D2 BcData_H_72 H4 BcData_H_73 G1 BcData_H_74 N5 BcData_H_75 L1 BcData_H_76 N1 BcData_H_77 U3 BcData_H_78 W5 BcData_H_79 W1 BcData_H_8 J3 BcData_H_80 AB2 BcData_H_81 AC3 BcData_H_82 AD4 BcData_H_83 AF4 BcData_H_84 AJ3 BcData_H_85 AK4 BcData_H_86 AN1 BcData_H_87 AM4 BcData_H_88 AU5 BcData_H_89 BA1


BcData_H_9 K2 BcData_H_90 BA3 BcData_H_91 BC3 BcData_H_92 BD6 BcData_H_93 BA9 BcData_H_94 BC9 BcData_H_95 AY12 BcData_H_96 A39 BcData_H_97 D36 BcData_H_98 A41 BcData_H_99 B42 BcDataInClk_H_0 E7 BcDataInClk_H_1 R3 BcDataInClk_H_2 AH2 BcDataInClk_H_3 BC5 BcDataInClk_H_4 F38 BcDataInClk_H_5 U39 BcDataInClk_H_6 AH44 BcDataInClk_H_7 AY40 BcDataOE_L A27 BcDataOutClk_H_0 J5 BcDataOutClk_H_1 AU3 BcDataOutClk_H_2 J43 BcDataOutClk_H_3 AR43 BcDataOutClk_L_0 K4 BcDataOutClk_L_1 AV4 BcDataOutClk_L_2 K42 BcDataOutClk_L_3 AT42 BcDataWr_L D26 BcLoad_L F26 BcTag_H_20 E13 BcTag_H_21 H16 BcTag_H_22 A11 BcTag_H_23 B12 BcTag_H_24 D14 BcTag_H_25 E15 BcTag_H_26 A13 BcTag_H_27 G17 BcTag_H_28 C15 BcTag_H_29 H18 BcTag_H_30 D16 BcTag_H_31 B16 BcTag_H_32 C17 BcTag_H_33 A17 BcTag_H_34 E19 BcTag_H_35 B18 BcTag_H_36 A19 BcTag_H_37 F20 BcTag_H_38 D20 BcTag_H_39 E21 BcTag_H_40 C21 BcTag_H_41 D22 BcTag_H_42 H22 BcTagDirty_H C23 BcTagInClk_H G19 BcTagOE_L H24 BcTagOutClk_H C25 BcTagOutClk_L D24 BcTagParity_H B22 BcTagShared_H G23 BcTagValid_H B24 BcTagWr_L E25 BcVref F18 ClkFwdRst_H BE11 ClkIn_H AM8 ClkIn_L AN7 DCOK_H AY18 EV6Clk_H AM6 EV6Clk_L AL7 FrameClk_H AV16 FrameClk_L AW15 IRQ_H_0 BA15 IRQ_H_1 BE13 IRQ_H_2 AW17 IRQ_H_3 AV18 IRQ_H_4 BC15 IRQ_H_5 BB16 MiscVref AV22 NoConnect BB14 NoConnect BD2 PLL_VDD AV8 PllBypass_H BD12 Reset_L BD16 Spare AJ1

Spare V38 Spare AT4 Spare BE9 Spare F8 Spare BD4 Spare AJ43 Spare AR3 Spare T4 Spare E39 Spare BA39 Spare BC21 SromClk_H AW19

SromData_H BC17 SromOE_L BE17 SysAddIn_L_0 BD30 SysAddIn_L_1 BC29 SysAddIn_L_10 BB24 SysAddIn_L_11 AV24 SysAddIn_L_12 BD24 SysAddIn_L_13 BE23 SysAddIn_L_14 AW23 SysAddIn_L_2 AY28 SysAddIn_L_3 BE29 SysAddIn_L_4 AW27


SysAddIn_L_5 BA27 SysAddIn_L_6 BD28 SysAddIn_L_7 BE27 SysAddIn_L_8 AY26 SysAddIn_L_9 BC25 SysAddInClk_L BB26 SysAddOut_L_0 AW33 SysAddOut_L_1 BE39 SysAddOut_L_10 BE33 SysAddOut_L_11 AW29 SysAddOut_L_12 BC31 SysAddOut_L_13 AV28 SysAddOut_L_14 BB30 SysAddOut_L_2 BD36 SysAddOut_L_3 BC35 SysAddOut_L_4 BA33 SysAddOut_L_5 AY32 SysAddOut_L_6 BE35 SysAddOut_L_7 AV30 SysAddOut_L_8 BB32 SysAddOut_L_9 BA31 SysAddOutClk_L BD34 SysCheck_L_0 L7 SysCheck_L_1 AA5 SysCheck_L_2 AK8 SysCheck_L_3 BA13 SysCheck_L_4 L39 SysCheck_L_5 AA41 SysCheck_L_6 AM40 SysCheck_L_7 AY34 SysData_L_0 F14 SysData_L_1 G13 SysData_L_10 P6 SysData_L_11 T8 SysData_L_12 V8 SysData_L_13 V6 SysData_L_14 W7 SysData_L_15 Y6 SysData_L_16 AB8 SysData_L_17 AC7 SysData_L_18 AD8 SysData_L_19 AE5 SysData_L_2 F12 SysData_L_20 AH6 SysData_L_21 AH8 SysData_L_22 AJ7 SysData_L_23 AL5 SysData_L_24 AP8 SysData_L_25 AR7 SysData_L_26 AT8 SysData_L_27 AV6 SysData_L_28 AV10 SysData_L_29 AW11 SysData_L_3 H12 SysData_L_30 AV12 SysData_L_31 AW13 SysData_L_32 F32 SysData_L_33 F34 SysData_L_34 H34 SysData_L_35 G35 SysData_L_36 F40 SysData_L_37 G39 SysData_L_38 K38 SysData_L_39 J41 SysData_L_4 H10 SysData_L_40 M40 SysData_L_41 N39 SysData_L_42 P40 SysData_L_43 T38 SysData_L_44 V40 SysData_L_45 W41 SysData_L_46 W39 SysData_L_47 Y40 SysData_L_48 AB38 SysData_L_49 AC39 SysData_L_5 G7 SysData_L_50 AD38 SysData_L_51 AF40 SysData_L_52 AH38 SysData_L_53 AJ39 SysData_L_54 AL41 SysData_L_55 AK38 SysData_L_56 AN39 SysData_L_57 AP38 SysData_L_58 AR39 SysData_L_59 AT38 SysData_L_6 F6 SysData_L_60 AY38 SysData_L_61 AV36 SysData_L_62 AW35 SysData_L_63 AV34 SysData_L_7 K8 SysData_L_8 M6 SysData_L_9 N7 SysDataInClk_H_0 D8 SysDataInClk_H_1 P4 SysDataInClk_H_2 AF6 SysDataInClk_H_3 AY6 SysDataInClk_H_4 E37 SysDataInClk_H_5 R43 SysDataInClk_H_6 AG41 SysDataInClk_H_7 AV40 SysDataInValid_L BD22 SysDataOutClk_L_0 G11 SysDataOutClk_L_1 U7 SysDataOutClk_L_2 AG7 SysDataOutClk_L_3 AY8 SysDataOutClk_L_4 H36


SysDataOutClk_L_5 R41 SysDataOutClk_L_6 AH40 SysDataOutClk_L_7 AW39 SysDataOutValid_L BB22 SysFillValid_L BC23 SysVref BA25 Tck_H BE19 Tdi_H BA21 Tdo_H BB20 TestStat_H BA19 Tms_H BD18 Trst_L AY20

Table 3–5 Pin List Sorted by PGA Location

PGA Location Signal Name PGA Location Signal Name PGA Location Signal Name

A11 BcTag_H_22 A13 BcTag_H_26 A17 BcTag_H_33 A19 BcTag_H_36 A27 BcDataOE_L A29 BcAdd_H_6 A33 BcAdd_H_15 A35 BcAdd_H_19 A39 BcData_H_96 A41 BcData_H_98 A5 BcData_H_2 A7 BcData_H_65 AA3 BcCheck_H_9 AA41 SysCheck_L_5 AA43 BcData_H_111 AA5 SysCheck_L_1 AB2 BcData_H_80 AB38 SysData_L_48 AB4 BcCheck_H_1 AB42 BcCheck_H_5 AB44 BcData_H_47 AB8 SysData_L_16 AC1 BcData_H_16 AC3 BcData_H_81 AC39 SysData_L_49 AC43 BcData_H_112 AC45 BcCheck_H_13 AC7 SysData_L_17 AD2 BcData_H_17 AD38 SysData_L_50 AD4 BcData_H_82 AD42 BcData_H_48 AD44 BcData_H_113 AD8 SysData_L_18 AE3 BcData_H_18 AE41 BcData_H_114 AE43 BcData_H_49 AE5 SysData_L_19 AF4 BcData_H_83 AF40 SysData_L_51 AF42 BcData_H_50 AF6 SysDataInClk_H_2 AG1 BcData_H_19 AG41 SysDataInClk_H_6 AG45 BcData_H_115 AG7 SysDataOutClk_L_2 AH2 BcDataInClk_H_2 AH38 SysData_L_52 AH40 SysDataOutClk_L_6 AH44 BcDataInClk_H_6 AH6 SysData_L_20 AH8 SysData_L_21 AJ1 Spare AJ3 BcData_H_84 AJ39 SysData_L_53 AJ43 Spare AJ45 BcData_H_51 AJ7 SysData_L_22 AK2 BcData_H_20 AK38 SysData_L_55 AK4 BcData_H_85 AK42 BcData_H_52 AK44 BcData_H_116 AK8 SysCheck_L_2 AL3 BcData_H_21 AL41 SysData_L_54 AL43 BcData_H_117 AL5 SysData_L_23 AL7 EV6Clk_L AM4 BcData_H_87 AM40 SysCheck_L_6 AM42 BcData_H_118 AM6 EV6Clk_H AM8 ClkIn_H AN1 BcData_H_86 AN39 SysData_L_56 AN41 BcData_H_55 AN45 BcData_H_53 AN7 ClkIn_L AP2 BcData_H_23 AP38 SysData_L_57 AP40 BcData_H_120 AP44 BcData_H_54 AP8 SysData_L_24


AR1 BcData_H_22 AR3 Spare AR39 SysData_L_58 AR43 BcDataOutClk_H_3 AR45 BcData_H_119 AR7 SysData_L_25 AT2 BcCheck_H_2 AT38 SysData_L_59 AT4 Spare AT42 BcDataOutClk_L_3 AT44 BcCheck_H_14 AT8 SysData_L_26 AU3 BcDataOutClk_H_1 AU41 BcData_H_57 AU43 BcCheck_H_6 AU5 BcData_H_88 AV10 SysData_L_28 AV12 SysData_L_30 AV16 FrameClk_H AV18 IRQ_H_3 AV22 MiscVref AV24 SysAddIn_L_11 AV28 SysAddOut_L_13 AV30 SysAddOut_L_7 AV34 SysData_L_63 AV36 SysData_L_61 AV4 BcDataOutClk_L_1 AV40 SysDataInClk_H_7 AV42 BcData_H_122 AV6 SysData_L_27 AV8 PLL_VDD AW1 BcCheck_H_10 AW11 SysData_L_29 AW13 SysData_L_31 AW15 FrameClk_L AW17 IRQ_H_2 AW19 SromClk_H AW23 SysAddIn_L_14 AW27 SysAddIn_L_4 AW29 SysAddOut_L_11 AW33 SysAddOut_L_0 AW35 SysData_L_62 AW39 SysDataOutClk_L_7 AW45 BcData_H_56 AW5 BcData_H_26 AY12 BcData_H_95 AY18 DCOK_H AY2 BcData_H_24 AY20 Trst_L AY26 SysAddIn_L_8 AY28 SysAddIn_L_2 AY32 SysAddOut_L_5 AY34 SysCheck_L_7 AY38 SysData_L_60 AY40 BcDataInClk_H_7 AY44 BcData_H_58 AY6 SysDataInClk_H_3 AY8 SysDataOutClk_L_3 B10 BcData_H_0 B12 BcTag_H_23 B16 BcTag_H_31 B18 BcTag_H_35 B22 BcTagParity_H B24 BcTagValid_H B28 BcAdd_H_4 B30 BcAdd_H_10 B34 BcAdd_H_18 B36 BcAdd_H_20 B4 BcData_H_68 B40 BcData_H_34 B42 BcData_H_99 B6 BcData_H_67 BA1 BcData_H_89 BA13 SysCheck_L_3 BA15 IRQ_H_0 BA19 TestStat_H BA21 Tdi_H BA25 SysVref BA27 SysAddIn_L_5 BA3 BcData_H_90 BA31 SysAddOut_L_9 BA33 SysAddOut_L_4 BA37 BcData_H_126 BA39 Spare BA43 BcData_H_59 BA45 BcData_H_121 BA9 BcData_H_93 BB10 BcData_H_30 BB14 NoConnect BB16 IRQ_H_5 BB2 BcData_H_25 BB20 Tdo_H BB22 SysDataOutValid_L BB24 SysAddIn_L_10 BB26 SysAddInClk_L BB30 SysAddOut_L_14 BB32 SysAddOut_L_8 BB36 BcCheck_H_15 BB38 BcData_H_62 BB4 BcData_H_27 BB42 BcData_H_124 BB44 BcData_H_123 BB8 BcData_H_28 BC11 BcCheck_H_3 BC15 IRQ_H_4 BC17 SromData_H BC21 Spare BC23 SysFillValid_L


BC25 SysAddIn_L_9 BC29 SysAddIn_L_1 BC3 BcData_H_91 BC31 SysAddOut_L_12 BC35 SysAddOut_L_3 BC37 BcCheck_H_7 BC41 BcData_H_125 BC43 BcData_H_60 BC5 BcDataInClk_H_3 BC9 BcData_H_94 BD10 BcCheck_H_11 BD12 PllBypass_H BD16 Reset_L BD18 Tms_H BD2 NoConnect BD22 SysDataInValid_L BD24 SysAddIn_L_12 BD28 SysAddIn_L_6 BD30 SysAddIn_L_0 BD34 SysAddOutClk_L BD36 SysAddOut_L_2 BD4 Spare BD40 BcData_H_127 BD42 BcData_H_61 BD6 BcData_H_92 BE11 ClkFwdRst_H BE13 IRQ_H_1 BE17 SromOE_L BE19 Tck_H BE23 SysAddIn_L_13 BE27 SysAddIn_L_7 BE29 SysAddIn_L_3 BE33 SysAddOut_L_10 BE35 SysAddOut_L_6 BE39 SysAddOut_L_1 BE41 BcData_H_63 BE5 BcData_H_29 BE7 BcData_H_31 BE9 Spare C11 BcData_H_64 C15 BcTag_H_28 C17 BcTag_H_32 C21 BcTag_H_40 C23 BcTagDirty_H C25 BcTagOutClk_H C29 BcAdd_H_8 C3 BcData_H_4 C31 BcAdd_H_12 C35 BcAdd_H_22 C37 BcData_H_33 C41 BcData_H_35 C43 BcData_H_36 C5 BcData_H_3 C9 BcData_H_66 D10 BcData_H_1 D14 BcTag_H_24 D16 BcTag_H_30 D2 BcData_H_71 D20 BcTag_H_38 D22 BcTag_H_41 D24 BcTagOutClk_L D26 BcDataWr_L D30 BcAdd_H_11 D32 BcAdd_H_17 D36 BcData_H_97 D4 BcData_H_69 D42 BcData_H_100 D44 BcData_H_101 D8 SysDataInClk_H_0 E1 BcData_H_7 E13 BcTag_H_20 E15 BcTag_H_25 E19 BcTag_H_34 E21 BcTag_H_39 E25 BcTagWr_L E27 BcAdd_H_5 E3 BcData_H_5 E31 BcAdd_H_16 E33 BcAdd_H_23 E37 SysDataInClk_H_4 E39 Spare E43 BcData_H_37 E45 BcCheck_H_12 E7 BcDataInClk_H_0 F12 SysData_L_2 F14 SysData_L_0 F18 BcVref F2 BcCheck_H_0 F20 BcTag_H_37 F26 BcLoad_L F28 BcAdd_H_9 F32 SysData_L_32 F34 SysData_L_33 F38 BcDataInClk_H_4 F40 SysData_L_36 F44 BcData_H_39 F6 SysData_L_6 F8 Spare G1 BcData_H_73 G11 SysDataOutClk_L_0 G13 SysData_L_1 G17 BcTag_H_27 G19 BcTagInClk_H G23 BcTagShared_H G27 BcAdd_H_7 G29 BcAdd_H_14 G33 BcData_H_32 G35 SysData_L_35


G39 SysData_L_37 G41 BcData_H_38 G45 BcData_H_104 G5 BcData_H_70 G7 SysData_L_5 H10 SysData_L_4 H12 SysData_L_3 H16 BcTag_H_21 H18 BcTag_H_29 H22 BcTag_H_42 H24 BcTagOE_L H28 BcAdd_H_13 H30 BcAdd_H_21 H34 SysData_L_34 H36 SysDataOutClk_L_4 H4 BcData_H_72 H40 BcData_H_102 H42 BcData_H_103 H6 BcData_H_6 J3 BcData_H_8 J41 SysData_L_39 J43 BcDataOutClk_H_2 J5 BcDataOutClk_H_0 K2 BcData_H_9 K38 SysData_L_38 K4 BcDataOutClk_L_0 K42 BcDataOutClk_L_2 K44 BcData_H_40 K8 SysData_L_7 L1 BcData_H_75 L3 BcData_H_10 L39 SysCheck_L_4 L43 BcData_H_105 L45 BcData_H_106 L7 SysCheck_L_0 M2 BcData_H_11 M38 BcCheck_H_4 M40 SysData_L_40 M44 BcData_H_42 M6 SysData_L_8 M8 BcCheck_H_8 N1 BcData_H_76 N39 SysData_L_41 N41 BcData_H_41 N45 BcData_H_107 N5 BcData_H_74 N7 SysData_L_9 P4 SysDataInClk_H_1 P40 SysData_L_42 P42 BcData_H_43 P6 SysData_L_10 R3 BcDataInClk_H_1 R41 SysDataOutClk_L_5 R43 SysDataInClk_H_5 T2 BcData_H_12 T38 SysData_L_43 T4 Spare T44 BcData_H_108 T8 SysData_L_11 U1 BcData_H_13 U3 BcData_H_77 U39 BcDataInClk_H_5 U43 BcData_H_44 U45 BcData_H_109 U7 SysDataOutClk_L_1 V2 BcData_H_14 V38 Spare V40 SysData_L_44 V44 BcData_H_45 V6 SysData_L_13 V8 SysData_L_12 W1 BcData_H_79 W39 SysData_L_46 W41 SysData_L_45 W45 BcData_H_110 W5 BcData_H_78 W7 SysData_L_14 Y4 BcData_H_15 Y40 SysData_L_47 Y42 BcData_H_46 Y6 SysData_L_15


Table 3–6 lists the 21264/EV67 ground and power (VSS and VDD, respectively) pin list.

Table 3–6 Ground and Power (VSS and VDD) Pin List

Signal PGA Location

VSS
A15 A21 A25 A3 A31 A37 A43 A9 AA1 AA39
AA45 AA7 AC41 AC5 AE1 AE39 AE45 AE7 AG3 AG39
AG43 AG5 AJ41 AJ5 AL1 AL39 AL45 AN3 AN43 AN5
AR41 AR5 AU1 AU39 AU45 AU7 AW21 AW25 AW3 AW31
AW37 AW41 AW43 AW7 AW9 AY14 BA11 BA17 BA23 BA29
BA35 BA41 BA5 BA7 BC1 BC13 BC19 BC27 BC33 BC39
BC45 BC7 BE15 BE21 BE25 BE3 BE31 BE37 BE43 C1
C13 C19 C27 C33 C39 C45 C7 D38 E11 E17
E23 E29 E35 E41 E5 E9 G15 G21 G25 G3
G31 G37 G43 G9 J1 J39 J45 J7 L41 L5
N3 N43 R1 R39 R45 R5 R7 T42 U41 U5
W3 W43

VDD
A23 AB40 AB6 AD40 AD6 AF2 AF38 AF44 AF8 AH4
AH42 AK40 AK6 AM2 AM38 AM44 AP4 AP42 AP6 AT40
AT6 AV14 AV2 AV20 AV26 AV32 AV38 AV44 AY10 AY16
AY22 AY24 AY30 AY36 AY4 AY42 B14 B2 B20 B26
B32 B38 B44 B8 BB12 BB18 BB28 BB34 BB40 BB6
BD14 BD20 BD26 BD32 BD38 BD44 BD8 D12 D18 D28
D34 D40 D6 F10 F16 F22 F24 F30 F36 F4
F42 H14 H2 H20 H26 H32 H38 H44 K40 K6
M4 M42 P2 P38 P44 P8 T40 T6 V4 V42
Y2 Y38 Y44 Y8


3.4 Mechanical Specifications

This section shows the 21264/EV67 mechanical package dimensions without a heat sink. For heat sink information and dimensions, refer to Chapter 10.

Figure 3–2 shows the package physical dimensions without a heat sink.

Figure 3–2 Package Dimensions

[Figure: package outline drawing without a heat sink (FM-05662.AI4). Labeled features include the lid, four standoffs (4x), and two 1/4-20 studs (2x); the pin grid carries numeric column labels. Dimension callouts in the drawing: 2.54 mm (.100 in) Typ; 27.94 mm (1.100 in); 587x 1.40 mm (.055 in) Typ; 1.27 mm (.050 in) Typ; .13 mm (.005 in) R; 4.32 mm (.170 in) Typ; 1.377 mm (.055 in) Typ; 7.62 mm (.300 in) Typ; 1.905 mm (.075 in) Typ; 59.94 mm (2.360 in) Typ; 29.62 mm (1.180 in) Typ; 25.40 mm (1.000 in) Typ; 53.85 mm (2.120 in) Typ.]


3.5 21264/EV67 Packaging

Figure 3–3 shows the 21264/EV67 pinout from the top view with pins facing down.

Figure 3–3 21264/EV67 Top View (Pin Down)

[Figure: 21264/EV67 pin grid, top view with pins facing down (FM-05644-EV6). Rows are lettered and columns are numbered; individual pin assignments are given in Table 3–4 through Table 3–6.]

Figure 3–4 shows the 21264/EV67 pinout from the bottom view with pins facing up.

Figure 3–4 21264/EV67 Bottom View (Pin Up)

[Figure: 21264/EV67 pin grid, bottom view with pins facing up (FM-05645-EV6). Rows are lettered and columns are numbered; individual pin assignments are given in Table 3–4 through Table 3–6.]

Cache and External Interfaces

This chapter describes the 21264/EV67 cache and external interface, which includes the second-level cache (Bcache) interface and the system interface. It also describes locks, interrupt signals, and ECC/parity generation. It is organized as follows:

• Introduction to the external interfaces

• Physical address considerations

• Bcache structure

• Victim data buffer

• Cache coherency

• Lock mechanism

• System port

• Bcache port

• Interrupts

Chapter 3 lists and defines all 21264/EV67 hardware interface signal pins. Chapter 9 describes the 21264/EV67 hardware interface electrical requirements.

4.1 Introduction to the External Interfaces

A 21264/EV67-based system can be divided into three major sections:

• 21264/EV67 microprocessor

• Second-level Bcache

• System interface logic

– Optional duplicate tag store – Optional lock register – Optional victim buffers

The 21264/EV67 external interface is flexible and mandates few design rules, allowing a wide range of prospective systems. The external interface is composed of the Bcache interface and the system interface.

• Input clocks must have the same frequency as their corresponding output clock. For example, the frequency of SysAddInClk_L must be the same as SysAddOutClk_L.


• The Bcache interface includes a 128-bit bidirectional data bus, a 20-bit unidirectional address bus, and several control signals.

– The BcDataOutClk_x[3:0] clocks are free-running and are derived from the internal GCLK. The period of BcDataOutClk_x[3:0] is a programmable multiple of GCLK.

– The Bcache turns the BcDataOutClk_x[3:0] clocks around and returns them to the 21264/EV67 as BcDataInClk_H[7:0]. Likewise, BcTagOutClk_x returns as BcTagInClk_H.

– The Bcache interface supports a 64-byte block size.

• The system interface includes a 64-bit bidirectional data bus, two 15-bit unidirectional address buses, and several control signals.

– The SysAddOutClk_L clock is free-running and is derived from the internal GCLK. The period of SysAddOutClk_L is a programmable multiple of GCLK.

– The SysAddInClk_L clock is a turned-around copy of SysAddOutClk_L.

Figure 4–1 shows a simplified view of the external interface. The function and purpose of each signal is described in Chapter 3.


Figure 4–1 21264/EV67 System and Bcache Interfaces

[Figure: block diagram of the 21264 between the system and the Bcache (FM-05818B-EV67). System-side signals: SysAddIn_L[14:0], SysAddInClk_L, SysAddOut_L[14:0], SysAddOutClk_L, SysVref, SysData_L[63:0], SysCheck_L[7:0], SysDataInClk_H[7:0], SysDataOutClk_L[7:0], SysDataInValid_L, SysDataOutValid_L, SysFillValid_L, and IRQ_H[5:0]. Bcache-side signals: BcAdd_H[23:4], BcLoad_L, BcData_H[127:0], BcCheck_H[15:0], BcDataInClk_H[7:0], BcDataOutClk_x[3:0], BcDataOE_L, BcDataWr_L, BcTag_H[42:20], BcTagInClk_H, BcTagOutClk_x, BcVref, BcTagWr_L, BcTagOE_L, BcTagValid_H, BcTagDirty_H, BcTagShared_H, and BcTagParity_H. The Bcache is drawn as Data, Tag, and Status blocks, with address slices labeled [23:4], [23:6], and [23:6].]

4.1.1 System Interface

This section introduces the system (external) bus interface. The system interface is made up of two unidirectional 15-bit address buses, 64 bidirectional data lines, eight bidirectional check bits, two single-ended unidirectional clocks, and a few control pins. The 15-bit address buses provide time-shared address/command/ID in two or four GCLK cycles. The Cbox controls the system interface.


4.1.1.1 Commands and Addresses

The system sends probe and data movement commands to the 21264/EV67. The 21264/EV67 can hold up to eight probe commands from the system. The system controls the number of outstanding probe commands and must ensure that the 21264/EV67 8-entry probe queue does not overflow.

The Cbox contains an 8-entry miss buffer (MAF) and an 8-entry victim buffer (VAF). A miss occurs when the 21264/EV67 probes the Bcache but does not find the addressed block. The 21264/EV67 can queue eight cache misses to the system in its MAF.
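Because the system, not the processor, is responsible for keeping the 8-entry probe queue from overflowing, system logic typically needs some form of outstanding-probe count. The following is a minimal sketch of that bookkeeping, not a description of any actual chipset. The class and method names are hypothetical, and how the system learns that a probe has completed is outside this section, so `retire` is only a placeholder for that event.

```python
class ProbeCredit:
    """System-side count of probes outstanding at the 21264/EV67 (illustrative)."""

    QUEUE_ENTRIES = 8  # probe queue depth stated in Section 4.1.1.1

    def __init__(self):
        self.outstanding = 0

    def can_send(self):
        """True while sending one more probe cannot overflow the queue."""
        return self.outstanding < self.QUEUE_ENTRIES

    def send(self):
        """Account for one probe command sent to the processor."""
        if not self.can_send():
            raise OverflowError("probe queue would overflow")
        self.outstanding += 1

    def retire(self):
        """Account for one probe the processor has finished (hypothetical event)."""
        if self.outstanding == 0:
            raise ValueError("no probe outstanding")
        self.outstanding -= 1


credit = ProbeCredit()
while credit.can_send():
    credit.send()  # stops after eight probes are outstanding
```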

4.1.2 Second-Level Cache (Bcache) Interface

The 21264/EV67 Cbox provides control signals and an interface for a second-level cache, the Bcache. The 21264/EV67 supports a Bcache from 1MB to 16MB, with 64-byte blocks. A 128-bit data bus is used for transfers between the 21264/EV67 and the Bcache. The Bcache must be composed of synchronous static RAMs (SSRAMs), and the SSRAMs must contain either one, two, or three internal registers. All Bcache control and address pins are clocked synchronously on Bcache cycle boundaries. The Bcache clock rate varies as a multiple of the CPU clock cycle in half-cycle increments from 1.5 to 4.0, and in full-cycle increments of 5, 6, 7, and 8 times the CPU clock cycle. The 1.5 multiple is only available in dual-data mode.
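The set of permitted Bcache clock multiples can be enumerated directly from the rule above. This sketch is illustrative: the function names are hypothetical, the multiple is treated as a multiple of the GCLK period (so the Bcache clock frequency is the CPU frequency divided by the multiple), and the 600 MHz figure in the usage line is an arbitrary example value, not a specification.

```python
from fractions import Fraction


def valid_bcache_ratios(dual_data_mode):
    """Return the Bcache clock-period multiples allowed by Section 4.1.2."""
    half_steps = [Fraction(n, 2) for n in range(3, 9)]  # 1.5, 2.0, ... 4.0
    full_steps = [Fraction(n) for n in (5, 6, 7, 8)]
    ratios = half_steps + full_steps
    if not dual_data_mode:
        ratios.remove(Fraction(3, 2))  # 1.5 is only available in dual-data mode
    return ratios


def bcache_clock_mhz(gclk_mhz, ratio, dual_data_mode=False):
    """Bcache clock frequency for a given CPU clock and period multiple."""
    if ratio not in valid_bcache_ratios(dual_data_mode):
        raise ValueError(f"unsupported Bcache clock multiple: {ratio}")
    return Fraction(gclk_mhz) / ratio


example = bcache_clock_mhz(600, Fraction(3))  # hypothetical 600 MHz CPU clock
```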

4.2 Physical Address Considerations

The 21264/EV67 supports a 44-bit physical address space that is divided equally between memory space and I/O space. Memory space resides in the lower half of the physical address space (PA[43] = 0) and I/O space resides in the upper half of the physical address space (PA[43] = 1). The 21264/EV67 recognizes these spaces internally.

The 21264/EV67-generated external references to memory space are always of a fixed 64-byte size, though the internal access granularity is byte, word, longword, or quadword. All 21264/EV67-generated external references to memory or I/O space are physical addresses that are either successfully translated from a virtual address or produced by PALcode. Speculative execution may cause a reference to nonexistent memory. Systems must check the range of all addresses and report nonexistent addresses to the 21264/EV67.
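The address split and the fixed 64-byte external reference size described above reduce to two bit operations. The sketch below is illustrative only; the function and constant names are hypothetical.

```python
PA_BITS = 44        # width of the physical address space
IO_SPACE_BIT = 43   # PA[43] = 1 selects I/O space, 0 selects memory space
BLOCK_BYTES = 64    # fixed size of external memory-space references


def is_io_space(pa):
    """Return True if the 44-bit physical address lies in I/O space."""
    if not 0 <= pa < (1 << PA_BITS):
        raise ValueError("address outside the 44-bit physical address space")
    return bool((pa >> IO_SPACE_BIT) & 1)


def block_address(pa):
    """Return the address of the 64-byte block that contains pa."""
    return pa & ~(BLOCK_BYTES - 1)


first_io_address = 1 << IO_SPACE_BIT
```

Memory space therefore covers addresses 0 through `first_io_address - 1`, and I/O space covers the remaining upper half.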

Table 4–1 describes the translation of internal references to external interface references. The first column lists the instructions used by the programmer, including load (LDx) and store (STx) instructions of several sizes. The column headings are described here:

• DcHit (block was found in the Dcache)

• DcW (block was found in a writable state in the Dcache)

• BcHit (block was found in the Bcache)

• BcW (block was found in a writable state in the Bcache)

• Status and Action (status at end of instruction and action performed by the 21264/EV67)


Prefetches (LDL, LDF, LDG, LDT, LDBU, LDWU) to R31 use the LDx flow, and prefetch with modify intent (LDS) uses the STx flow. If the prefetch target is addressed to I/O space, the upper address bit is cleared, converting the address to memory space (PA[42:6]). Notes follow the table.

Table 4–1 Translation of Internal References to External Interface Reference

Instruction DcHit DcW BcHit BcW Status and Action

LDx Memory 1 X X X Dcache hit, done.
LDx Memory 0 X 1 X Bcache hit, done.
LDx Memory 0 X 0 X Miss, generate RdBlk command.
LDx I/O X X X X RdBytes, RdLWs, or RdQWs based on size.
Istream Memory 1 X X X Dcache hit, Istream serviced from Dcache.
Istream Memory 0 X 1 X Bcache hit, Istream serviced from Bcache.
Istream Memory 0 X 0 X Miss, generate RdBlkI command.
STx Memory 1 1 X X Store Dcache hit and writable, done.
STx Memory 1 0 X X Store hit and not writable, set dirty flow (note 1).
STx Memory 0 X 1 1 Store Bcache hit and writable, done.
STx Memory 0 X 1 0 Store hit and not writable, set-dirty flow (note 1).
STx Memory 0 X 0 X Miss, generate RdBlkMod command.
STx I/O X X X X WrBytes, WrLWs, or WrQWs based on size.
STx_C Memory 0 X X X Fail STx_C.
STx_C Memory 1 0 X X STx_C hit and not writable, set dirty flow (note 1).
STx_C I/O X X X X Always succeed and WrQws or WrLws are generated, based on the size.
WH64 Memory 1 1 X X Hit, done.
WH64 Memory 1 0 X X WH64 hit not writable, set dirty flow (note 1).
WH64 Memory 0 X 1 1 WH64 hit dirty, done.
WH64 Memory 0 X 1 0 WH64 hit not writable, set dirty flow (note 1).
WH64 Memory 0 X 0 X Miss, generate InvalToDirty command (note 2).
WH64 I/O X X X X NOP the instruction. WH64 is UNDEFINED for I/O space.
ECB Memory X X X X Generate evict command (note 3).
ECB I/O X X X X NOP the instruction. ECB instruction is UNDEFINED for I/O space.
MB/WMB X X X X Generate MB command (note 4).
TB Fill Flows See Section 2.12.1.
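The memory-space LDx and STx rows of Table 4–1 can be read as a small decision procedure over the four hit and writable flags. The sketch below covers only those rows and is illustrative: the function names and the returned strings are hypothetical labels for the table's Status and Action column, and flags marked X in the table are simply ignored.

```python
def ldx_memory(dc_hit, bc_hit):
    """Outcome of a memory-space load, following the LDx rows of Table 4-1."""
    if dc_hit:
        return "Dcache hit, done"
    if bc_hit:
        return "Bcache hit, done"
    return "RdBlk"


def stx_memory(dc_hit, dc_w, bc_hit, bc_w):
    """Outcome of a memory-space store, following the STx rows of Table 4-1."""
    if dc_hit:
        # Dcache hit: the Bcache flags are don't-care.
        return "done" if dc_w else "set-dirty flow"
    if bc_hit:
        # Dcache miss, Bcache hit: the Dcache writable flag is don't-care.
        return "done" if bc_w else "set-dirty flow"
    return "RdBlkMod"
```

For example, a store that misses the Dcache and hits a non-writable block in the Bcache, `stx_memory(False, False, True, False)`, takes the set-dirty flow described in note 1.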


Table 4–1 notes:

1. Set Dirty Flow: Based on the Cbox CSR SET_DIRTY_ENABLE[2:0], SetDirty requests can be either internally acknowledged (called a SetModify) or sent to the system environment for processing. When externally acknowledged, the shared status information for the cache block is also broadcast. The commands sent externally are SharedToDirty or CleanToDirty. Based on the Cbox CSR ENABLE_STC_COMMAND[0], the external system can be informed of a STx_C generating a SetDirty using the STCChangeToDirty command. See Table 4–16 for more information.

2. InvalToDirty: Based on the Cbox CSR INVAL_TO_DIRTY_ENABLE[1:0], InvalToDirty requests can be either internally acknowledged or sent to the system environment as InvalToDirty commands. This Cbox CSR provides the ability to convert WH64 instructions to RdModx operations. See Table 4–15 for more information.

3. Evict: There are two aspects to the commands that are generated by an ECB instruction: first, those commands that are generated to notify the system of an evict being performed; second, those commands that are generated by any victim that is created by servicing the ECB.

– If Cbox CSR ENABLE_EVICT[0] is clear, no command is issued by the 21264/EV67 on the external interface to notify the system of an evict being performed. If Cbox CSR ENABLE_EVICT[0] is set, the 21264/EV67 issues an Evict command on the system interface only if a Bcache index match to the ECB address is found in the 21264/EV67 cache system.

Note that whenever ENABLE_EVICT[0] is true (in the write-many chain), BC_CLEAN_VICTIM must also be true (in the write-once chain). Otherwise, the 21264/EV67 could respond miss to a probe, rather than hit, before an Evict command has been sent off chip, but after the Evict command has removed a (clean) block from the internal caches and the Bcache. That behavior might cause systems that maintain an external duplicate copy of the Bcache tags to become confused, because the system could receive the probe response indicating the miss before it receives the Evict command.

– The 21264/EV67 can issue the commands CleanVictimBlk and WrVictimBlk for a victim that is created by an ECB. CleanVictimBlk is issued only if Cbox CSR BC_CLEAN_VICTIM is set and there is a Bcache index match valid but not dirty in the 21264/EV67 cache system. WrVictimBlk is issued for any Bcache match of the ECB address that is dirty in the 21264/EV67 cache system.

4. MB: Based on the Cbox CSR SYSBUS_MB_ENABLE, the MB command can be sent to the pins.

Each of these CSRs is programmed appropriately, based on the cache coherence protocol used by the system environment. For example, uniprocessor systems would prefer to internally acknowledge most of these transactions. In contrast, multiprocessor systems may require notification and control of any change in cache state. The 21264/EV67 and the external system must cooperate to maintain cache coherence. Section 4.5 explains the 21264/EV67 part of the cache coherency protocol.
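As one illustration of how these CSR settings shape external traffic, the ECB rules in note 3 can be written as a function of the two CSR bits and the state of the matching block. This is a simplified sketch under stated assumptions: the function name and its parameters are hypothetical, a single `index_match` flag stands in for the match conditions that the note words slightly differently for each command, and the order of the returned list carries no meaning.

```python
def ecb_commands(enable_evict, bc_clean_victim, index_match, valid, dirty):
    """Commands an ECB may place on the system interface, per note 3 (illustrative)."""
    if enable_evict and not bc_clean_victim:
        # Note 3 requires BC_CLEAN_VICTIM whenever ENABLE_EVICT[0] is set.
        raise ValueError("ENABLE_EVICT[0] requires BC_CLEAN_VICTIM")

    commands = []
    if enable_evict and index_match:
        commands.append("Evict")
    if index_match and dirty:
        commands.append("WrVictimBlk")
    elif index_match and valid and bc_clean_victim:
        commands.append("CleanVictimBlk")
    return commands


quiet_uniprocessor = ecb_commands(False, False, True, True, False)
```

With both CSR bits clear, an ECB that hits a clean block generates no external command at all, which matches the uniprocessor preference described above.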


4.3 Bcache Structure

The 21264/EV67 Cbox provides control signals and an interface for a second-level cache (Bcache).

The 21264/EV67 supports a Bcache from 1MB to 16MB, with 64-byte blocks. A 128-bit bidirectional data bus is used for transfers between the 21264/EV67 and the Bcache. The Bcache is fully synchronous and the synchronous static RAMs (SSRAMs) must contain either one, two, or three internal registers. All Bcache control and address pins are clocked synchronously on Bcache cycle boundaries. The Bcache clock rate varies as a multiple of the CPU clock cycle in half-cycle increments from 1.5 to 4.0, and in full-cycle increments of 5, 6, 7, and 8 times the CPU clock cycle. The 1.5 multiple is only available in dual-data mode.
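The list of permitted clock multiples can be enumerated with a short Python sketch. The function name and the dual_data_mode parameter are assumptions for illustration; the sketch only encodes the multiples stated above and does not model how they are programmed.

```python
def valid_bcache_multiples(dual_data_mode):
    """Return the Bcache clock multiples of the CPU clock cycle named in the text."""
    # Half-cycle increments from 1.5 to 4.0: 1.5, 2.0, 2.5, 3.0, 3.5, 4.0.
    multiples = [n / 2 for n in range(3, 9)]
    # Full-cycle increments of 5, 6, 7, and 8.
    multiples += [5.0, 6.0, 7.0, 8.0]
    # The 1.5 multiple is only available in dual-data mode.
    if not dual_data_mode:
        multiples.remove(1.5)
    return multiples
```

A configuration check could then test a requested multiple for membership in the returned list before using it.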

4.3.1 Bcache Interface Signals

Figure 4–2 shows the 21264/EV67 Bcache interface signals.

Figure 4–2 21264/EV67 Bcache Interface Signals

[Figure: the 21264 with its Bcache interface signals: BcData_H[127:0], BcCheck_H[15:0], BcDataInClk_H[7:0], BcDataOutClk_[3:0], BcDataOE_L, BcDataWr_L, BcAdd_H[23:4], BcTag_H[42:20], BcTagInClk_H, BcTagOutClk_, BcVref, BcTagDirty_H, BcTagParity_H, BcTagShared_H, BcTagValid_H, BcTagOE_L, BcTagWr_L, BcLoad_L]

4.3.2 System Duplicate Tag Stores

The 21264/EV67 provides Bcache state support for systems with and without duplicate tag stores, and will take different actions on this basis. The system sets Cbox CSR DUP_TAG_ENA[0] to indicate that it has a duplicate tag store for the Bcache. Systems using the DUP_TAG_ENA[0] bit must also use the Cbox CSR BC_CLEAN_VICTIM[0] bit to avoid deadlock situations.

Systems using a Bcache duplicate tag store can accelerate system performance by:


• Issuing probes and SysDc fill commands to the 21264/EV67 out of order with respect to their order at the system serialization point

• Filtering out all probe misses from the 21264/EV67 cache system

If a probe misses in the 21264/EV67 cache system (Bcache miss and VAF miss), the 21264/EV67 stalls probe processing with the expectation that a SysDc fill will allocate this block. Because of this, in duplicate tag mode, the 21264/EV67 can never generate a probe miss response.

When Cbox CSR DUP_TAG_ENA[0] equals 0, the 21264/EV67 delivers a miss response for probes that do not hit in its cache system.

4.4 Victim Data Buffer

The 21264/EV67 has eight victim data buffers (VDBs). They have the following properties:

• The VDBs are used for both victims (fills that are replacing dirty cache blocks) and for system probes that require data movement. The CleanVictimBlk command (optional) assigns and uses a VDB.

• Each VDB has two valid bits that indicate whether the buffer is valid for a victim, valid for a probe, or valid for both a victim and a probe. Probe commands that match the address of a victim address file (VAF) entry with an asserted probe-valid bit (P) will stall the 21264/EV67 probe queue. No ProbeResponses will be returned until the P bit is clear.

• The release victim buffer (RVB) bit, when asserted, causes the victim valid bit on the victim data buffer (VDB) specified in the ID field to be cleared. The RVB bit will also clear the IOWB when systems move data on I/O write transactions. In this case, ID[3] equals one.

• The release probe buffer (RPB) bit, when asserted (with a WriteData or ReleaseBuffer SysDc command), clears the P bit in the victim buffer entry specified in the ID field.

• Read data commands and victim write commands use IDs 0-7, while IDs 8-11 are used to address the four I/O write buffers.
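The ID ranges in the last item can be decoded as in the sketch below. The function name and the returned tuple are illustrative assumptions, as is the choice to number the I/O write buffers 0 to 3 from the low ID bits; the text states only that IDs 0-7 are used by read data and victim write commands, that IDs 8-11 address the four IOWBs, and that ID[3] equals one in the IOWB case.

```python
def decode_buffer_id(buffer_id):
    """Classify a SysDc ID value as a VDB entry or an I/O write buffer (IOWB)."""
    if not 0 <= buffer_id <= 11:
        raise ValueError("ID must be in the range 0-11")
    if buffer_id & 0b1000:
        # ID[3] == 1: IDs 8-11 address the four I/O write buffers.
        # Numbering them 0-3 by the low bits is an assumption of this sketch.
        return ("IOWB", buffer_id & 0b0111)
    # ID[3] == 0: IDs 0-7 are used by read data and victim write commands.
    return ("VDB", buffer_id)
```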

4.5 Cache Coherency

This section describes the basics and protocols of the 21264/EV67 cache coherency scheme.

4.5.1 Cache Coherency Basics

The 21264/EV67 systems maintain the cache hierarchy shown in Figure 4–3.


Figure 4–3 Cache Subset Hierarchy

[Figure: diagram with the labels System, Main Memory, Bcache, Icache, and Dcache]

The following tasks must be performed to maintain cache coherency:

• Istream data from memory spaces may be cached in the Icache and Bcache. Icache coherence is not maintained by hardware; it must be maintained by software using the CALL_PAL IMB instruction.

• The 21264/EV67 maintains the Dcache as a subset of the Bcache. The Dcache is set-associative but is kept a subset of the larger externally implemented direct-mapped Bcache.

• System logic must help the 21264/EV67 to keep the Bcache coherent with main memory and other caches in the system.

• The 21264/EV67 requires the system to allow only one change to a block at a time. This means that if the 21264/EV67 gains the bus to read or write a block, no other node on the bus should be allowed to access that block until the data has been moved.

• The 21264/EV67 provides hardware mechanisms to support several cache coherency protocols. The protocols can be separated into two classes: write invalidate cache coherency protocol and flush cache coherency protocol.

4.5.2 Cache Block States

Table 4–2 lists the cache block states supported by the 21264/EV67.

Table 4–2 21264/EV67-Supported Cache Block States

State Name | Description
Invalid | The 21264/EV67 does not have a copy of the block.
Clean | This 21264/EV67 holds a read-only copy of the block, and no other agent in the system holds a copy. Upon eviction, the block is not written to memory.
Clean/Shared | This 21264/EV67 holds a read-only copy of the block, and at least one other agent in the system may hold a copy of the block. Upon eviction, the block is not written to memory.
Dirty | This 21264/EV67 holds a read-write copy of the block, and must write it to memory after it is evicted from the cache. No other agent in the system holds a copy of the block.
Dirty/Shared | This 21264/EV67 holds a read-only copy of the dirty block, which may be shared with another agent. The block must be written back to memory when it is evicted.
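The properties that Table 4–2 attaches to each state can be captured in a small Python sketch. The BlockState enum and the two helper functions are hypothetical names introduced for illustration; they restate the table and are not a description of the tag encoding.

```python
from enum import Enum


class BlockState(Enum):
    """Cache block states named in Table 4-2."""
    INVALID = "Invalid"
    CLEAN = "Clean"
    CLEAN_SHARED = "Clean/Shared"
    DIRTY = "Dirty"
    DIRTY_SHARED = "Dirty/Shared"


def is_writable(state):
    """Only the Dirty state is described as a read-write copy."""
    return state is BlockState.DIRTY


def needs_writeback_on_evict(state):
    """Dirty and Dirty/Shared blocks must be written to memory when evicted."""
    return state in (BlockState.DIRTY, BlockState.DIRTY_SHARED)
```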

4.5.3 Cache Block State Transitions

Cache block state transitions are reflected by 21264/EV67-generated commands to the system. Cache block state transitions can also be caused by system-generated commands to the 21264/EV67 (probes). Probes control the next state for the cache block.

The next state can be based on the previous state of the cache block. Table 4–3 lists the next state for the cache block.

Table 4–3 Cache Block State Transitions

Next State | Action Based on Probe Hit
No change | Do not update cache state. Useful for DMA transactions that sample data but do not want to update tag state.
Clean | Independent of previous state, update next state to Clean.
Clean/Shared | Independent of previous state, update next state to Clean/Shared. This transaction is useful for systems that update memory on probe hits.
T1: Clean ⇒ Clean/Shared, Dirty ⇒ Dirty/Shared | Based on the dirty bit, make the block clean or dirty shared. This transaction is useful for systems that do not update memory on probe hits.
T3: Clean ⇒ Clean/Shared, Dirty ⇒ Invalid, Dirty/Shared ⇒ Clean/Shared | If the block is Clean or Dirty/Shared, change to Clean/Shared. If the block is Dirty, change to Invalid. This transaction is useful for systems that use the Dirty/Shared state as an exclusive state.
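The probe next-state rules in Table 4–3 can be expressed as a lookup, sketched below in Python. The function name and the string encodings are assumptions for illustration. The table does not list a T3 result for a block that is already Clean/Shared, so the sketch assumes that state is left unchanged, and it treats T1 as keeping the dirty bit while marking the block shared.

```python
# T1: based on the dirty bit, make the block clean shared or dirty shared.
T1 = {
    "Clean": "Clean/Shared",
    "Clean/Shared": "Clean/Shared",
    "Dirty": "Dirty/Shared",
    "Dirty/Shared": "Dirty/Shared",
}

# T3: Clean or Dirty/Shared become Clean/Shared; Dirty becomes Invalid.
T3 = {
    "Clean": "Clean/Shared",
    "Clean/Shared": "Clean/Shared",  # assumption: not listed in the table
    "Dirty": "Invalid",
    "Dirty/Shared": "Clean/Shared",
}


def probe_next_state(current, request):
    """Return the block state after a probe hit with the given next-state request."""
    if current not in T1:
        raise ValueError("a probe hit requires a valid block state")
    if request == "No change":
        return current
    if request in ("Clean", "Clean/Shared"):
        return request  # independent of the previous state
    if request == "T1":
        return T1[current]
    if request == "T3":
        return T3[current]
    raise ValueError("unknown next-state request")
```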

The cache state transitions caused by 21264/EV67-generated commands are under the full control of the system environment using the SysDc (system data control) commands. Table 4–4 lists these commands.

Table 4–4 System Responses to 21264/EV67 Commands

Response Type | 21264/EV67 Action
SysDc ReadData | Fill block with the associated data and update tag with clean cache status.
SysDc ReadDataDirty | Fill block with the associated data and update tag with dirty cache status.
SysDc ReadDataShared | Fill block with the associated data and update tag with shared cache status.
SysDc ReadDataShared/Dirty | Fill block with the associated data and update tag with dirty/shared status.
SysDc ReadDataError | Fill block with all-ones reference pattern and update tag with invalid status.
SysDc ChangeToDirtySuccess | Unconditionally update block with dirty cache status.
SysDc ChangeToDirtyFail | Do not update cache status and fail any associated STx_C instructions.
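The tag updates in Table 4–4 amount to a simple mapping from SysDc response to cache status, sketched below. The function name and the lowercase status strings are assumptions for this example; the sketch covers only the tag status, not the data fill or the STx_C failure signalled by ChangeToDirtyFail.

```python
# Tag status written by each SysDc fill response, per Table 4-4.
FILL_STATUS = {
    "ReadData": "clean",
    "ReadDataDirty": "dirty",
    "ReadDataShared": "shared",
    "ReadDataShared/Dirty": "dirty/shared",
    "ReadDataError": "invalid",
}


def tag_status_after(response, current_status):
    """Return the tag status after the given SysDc response."""
    if response == "ChangeToDirtySuccess":
        return "dirty"  # unconditional update to dirty
    if response == "ChangeToDirtyFail":
        return current_status  # cache status is not updated
    return FILL_STATUS[response]  # KeyError for an unknown response
```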


4.5.4 Using SysDc Commands

Note the following:

• The conventional response for RdBlk commands is SysDc ReadData or ReadDataShared.

• The conventional response for a RdBlkMod command is SysDc ReadDataDirty.

• The conventional response for ChangeToDirty commands is ChangeToDirtySuccess or ChangeToDirtyFail.

However, the system environment is not limited to these responses. Table 4–5 shows all 21264/EV67 commands, system responses, and the 21264/EV67 reaction. The 21264/EV67 commands are described in the following list:

• Rdx commands are generated by load or Istream references.

• RdBlkModx commands are generated by store references.

• The ChxToDirty command group includes CleanToDirty, SharedToDirty, and STCChangeToDirty commands, which are generated by store references that hit in the 21264/EV67 cache system.


• InvalToDirty commands are generated by WH64 instructions that miss in the 21264/EV67 cache system.

• FetchBlk and FetchBlkSpec are noncached references to memory space that have missed in the 21264/EV67 cache system.

• Rdiox commands are noncached references to I/O address space.

• Evict and STCChangeToDirty commands are generated by ECB and STx_C instructions, respectively.

Table 4–5 shows the system responses to 21264/EV67 commands and 21264/EV67 reactions.

Table 4–5 System Responses to 21264/EV67 Commands and 21264/EV67 Reactions

21264/EV67 CMD | SysDc | 21264/EV67 Action
Rdx | ReadData, ReadDataShared | This is a normal fill. The cache block is filled and marked clean or shared based on SysDc.
Rdx | ReadDataShared/Dirty | The cache block is filled and marked dirty/shared. Succeeding store commands cannot update the block without external reference.
Rdx | ReadDataDirty | The cache block is filled and marked dirty.
Rdx | ReadDataError | The cache block access was to NXM address space. The 21264/EV67 delivers an all-ones pattern to any load command and evicts the block from the cache (with associated victim processing). The cache block is marked invalid.
Rdx | ChangeToDirtySuccess, ChangeToDirtyFail | Both SysDc responses are illegal for read commands.
RdBlkModx | ReadData, ReadDataShared, ReadDataShared/Dirty | The cache block is filled and marked with a nonwritable status. If the store instruction that generated the RdBlkModx command is still active (not killed), the 21264/EV67 will retry the instruction, generating the appropriate ChangeToDirty command. Succeeding store commands cannot update the block without external reference.
RdBlkModx | ReadDataDirty | The 21264/EV67 performs a normal fill response, and the cache block becomes writable.
RdBlkModx | ChangeToDirtySuccess, ChangeToDirtyFail | Both SysDc responses are illegal for read/modify commands.
RdBlkModx | ReadDataError | The cache block command was to NXM address space. The 21264/EV67 delivers an all-ones pattern to any dependent load command, forces a fail action on any pending store commands to this block, and any store to this block is not retried. The Cbox evicts the cache block from the cache system (with associated victim processing). The cache block is marked invalid.
ChxToDirty | ReadData, ReadDataShared, ReadDataShared/Dirty | The original data in the Dcache is replaced with the filled data. The block is not writable, so the 21264/EV67 will retry the store instruction and generate another ChxToDirty class command. To avoid a potential livelock situation, the STC_ENABLE CSR bit must be set. Any STx_C instruction to this block is forced to fail. In addition, a Shared/Dirty response causes the 21264/EV67 to generate a victim for this block upon eviction.
ChxToDirty | ReadDataDirty | The data in the Dcache is replaced with the filled data. The block is writable, so the store instruction that generated the original command can update this block. Any STx_C instruction to this block is forced to fail. In addition, the 21264/EV67 generates a victim for this block upon eviction.
ChxToDirty | ReadDataError | Impossible situation. The block must be cached to generate a ChxToDirty command. Caching the block is not possible because all NXM fills are filled noncached.
ChxToDirty | ChangeToDirtySuccess | Normal response. ChangeToDirtySuccess makes the block writable. The 21264/EV67 retries the store instruction and updates the Dcache. Any STx_C instruction associated with this block is allowed to succeed.
ChxToDirty | ChangeToDirtyFail | The MAF entry is retired. Any STx_C instruction associated with the block is forced to fail. If a STx instruction generated this block, the 21264/EV67 retries and generates either a RdBlkModx (because the reference that failed the ChangeToDirty also invalidated the cache by way of an invalidating probe) or another ChxToDirty command.
InvalToDirty | ReadData, ReadDataShared, ReadDataShared/Dirty | The block is not writable, so the 21264/EV67 will retry the WH64 instruction and generate a ChxToDirty command.
InvalToDirty | ReadDataError | The 21264/EV67 doesn’t send InvalToDirty commands offchip speculatively. This NXM condition is a hard error. Systems should perform a machine check.
InvalToDirty | ReadDataDirty, ChangeToDirtySuccess | The block is writable. Done.
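For the three store-side command classes, Table 4–5 can be condensed into a coarse outcome per SysDc response, as in the Python sketch below. The function name and the outcome labels ("writable", "retry", "fail", "illegal", "impossible", "hard error", "unspecified") are assumptions chosen for this summary; STx_C handling, victim generation, and fill data are deliberately left out.

```python
READ_FILLS = ("ReadData", "ReadDataShared", "ReadDataShared/Dirty")
STORE_COMMANDS = ("RdBlkModx", "ChxToDirty", "InvalToDirty")


def store_side_outcome(command, sysdc):
    """Coarse summary of Table 4-5 for RdBlkModx, ChxToDirty, and InvalToDirty."""
    if command not in STORE_COMMANDS:
        raise ValueError("unsupported command class")
    if sysdc in READ_FILLS:
        return "retry"  # block is not writable; the instruction is retried
    if sysdc == "ReadDataDirty":
        return "writable"
    if sysdc == "ReadDataError":
        return {
            "RdBlkModx": "fail",          # pending stores fail and are not retried
            "ChxToDirty": "impossible",   # NXM fills are never cached
            "InvalToDirty": "hard error", # systems should machine check
        }[command]
    if sysdc == "ChangeToDirtySuccess":
        return "illegal" if command == "RdBlkModx" else "writable"
    if sysdc == "ChangeToDirtyFail":
        if command == "RdBlkModx":
            return "illegal"
        if command == "ChxToDirty":
            return "retry"  # a plain STx is retried; any STx_C is forced to fail
        return "unspecified"  # the table has no InvalToDirty row for this response
    raise ValueError("unknown SysDc response")
```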

