Compaq 21264, EV67 User Manual

Alpha 21264/EV67 Microprocessor Hardware Reference Manual
Order Number: DS–0028B–TE
This manual is directly derived from the internal 21264/EV67 Specifications, Revision 1.4. You can access this hardware reference manual in PDF format from the following site:
ftp://ftp.compaq.com/pub/products/alphaCPUdocs
Revision/Update Information: This is a revised document. It supersedes the Alpha 21264A Microprocessor Hardware Reference Manual (DS–0028A–TE).
Compaq Computer Corporation Shrewsbury, Massachusetts
September 2000
The information in this publication is subject to change without notice.
COMPAQ COMPUTER CORPORATION SHALL NOT BE LIABLE FOR TECHNICAL OR EDITORIAL ERRORS OR OMISSIONS CONTAINED HEREIN, NOR FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES RESULTING FROM THE FURNISHING, PERFORMANCE, OR USE OF THIS MATERIAL.
THIS INFORMATION IS PROVIDED “AS IS” AND COMPAQ COMPUTER CORPORATION DISCLAIMS ANY WARRANTIES, EXPRESS, IMPLIED OR STATUTORY AND EXPRESSLY DISCLAIMS THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR PARTICULAR PURPOSE, GOOD TITLE AND AGAINST INFRINGEMENT.
This publication contains information protected by copyright. No part of this publication may be photocopied or reproduced in any form without prior written consent from Compaq Computer Corporation.
© Compaq Computer Corporation 2000. All rights reserved. Printed in the U.S.A.
COMPAQ, the Compaq logo, the Digital logo, and VAX Registered in United States Patent and Trademark Office.
Pentium is a registered trademark of Intel Corporation.
Other product names mentioned herein may be trademarks and/or registered trademarks of their respective companies.
Alpha 21264/EV67 Hardware Reference Manual

Table of Contents

Preface
1 Introduction
1.1 The Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1–1
1.1.1 Addressing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1–2
1.1.2 Integer Data Types. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1–2
1.1.3 Floating-Point Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1–2
1.2 21264/EV67 Microprocessor Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1–3
2 Internal Architecture
2.1 21264/EV67 Microarchitecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–1
2.1.1 Instruction Fetch, Issue, and Retire Unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–2
2.1.1.1 Virtual Program Counter Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–2
2.1.1.2 Branch Predictor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–3
2.1.1.3 Instruction-Stream Translation Buffer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–5
2.1.1.4 Instruction Fetch Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–6
2.1.1.5 Register Rename Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–6
2.1.1.6 Integer Issue Queue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–6
2.1.1.7 Floating-Point Issue Queue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–7
2.1.1.8 Exception and Interrupt Logic. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–8
2.1.1.9 Retire Logic. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–8
2.1.2 Integer Execution Unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–8
2.1.3 Floating-Point Execution Unit. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–10
2.1.4 External Cache and System Interface Unit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–11
2.1.4.1 Victim Address File and Victim Data File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–11
2.1.4.2 I/O Write Buffer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–11
2.1.4.3 Probe Queue. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–11
2.1.4.4 Duplicate Dcache Tag Array. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–11
2.1.5 Onchip Caches. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–11
2.1.5.1 Instruction Cache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–11
2.1.5.2 Data Cache. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–12
2.1.6 Memory Reference Unit. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–12
2.1.6.1 Load Queue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–13
2.1.6.2 Store Queue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–13
2.1.6.3 Miss Address File . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–13
2.1.6.4 Dstream Translation Buffer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–13
2.1.7 SROM Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–13
2.2 Pipeline Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–13
2.2.1 Pipeline Aborts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–16
2.3 Instruction Issue Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–16
2.3.1 Instruction Group Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–17
2.3.2 Ebox Slotting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–18
2.3.3 Instruction Latencies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–20
2.4 Instruction Retire Rules. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–21
2.4.1 Floating-Point Divide/Square Root Early Retire. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–22
2.5 Retire of Operate Instructions into R31/F31 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–22
2.6 Load Instructions to R31 and F31 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–23
2.6.1 Normal Prefetch: LDBU, LDF, LDG, LDL, LDT, LDWU, HW_LDL Instructions . . . . . . . 2–23
2.6.2 Prefetch with Modify Intent: LDS Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–23
2.6.3 Prefetch, Evict Next: LDQ and HW_LDQ Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . 2–24
2.6.4 Prefetch with the LDx_L / STx_C Instruction Sequence . . . . . . . . . . . . . . . . . . . . . . . . 2–24
2.7 Special Cases of Alpha Instruction Execution. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–24
2.7.1 Load Hit Speculation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–24
2.7.2 Floating-Point Store Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–26
2.7.3 CMOV Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–26
2.8 Memory and I/O Address Space Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–27
2.8.1 Memory Address Space Load Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–27
2.8.2 I/O Address Space Load Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–28
2.8.3 Memory Address Space Store Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–29
2.8.4 I/O Address Space Store Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–29
2.9 MAF Memory Address Space Merging Rules. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–30
2.10 Instruction Ordering. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–30
2.11 Replay Traps. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–31
2.11.1 Mbox Order Traps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–31
2.11.1.1 Load-Load Order Trap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–32
2.11.1.2 Store-Load Order Trap . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–32
2.11.2 Other Mbox Replay Traps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–32
2.12 I/O Write Buffer and the WMB Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–32
2.12.1 Memory Barrier (MB/WMB/TB Fill Flow) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–32
2.12.1.1 MB Instruction Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–33
2.12.1.2 WMB Instruction Processing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–34
2.12.1.3 TB Fill Flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–34
2.13 Performance Measurement Support—Performance Counters . . . . . . . . . . . . . . . . . . . . . . . 2–36
2.14 Floating-Point Control Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–36
2.15 AMASK and IMPLVER Instruction Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–38
2.15.1 AMASK. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–38
2.15.2 IMPLVER . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–38
2.16 Design Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–39
3 Hardware Interface
3.1 21264/EV67 Microprocessor Logic Symbol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3–1
3.2 21264/EV67 Signal Names and Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3–3
3.3 Pin Assignments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3–8
3.4 Mechanical Specifications. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3–17
3.5 21264/EV67 Packaging. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3–18
4 Cache and External Interfaces
4.1 Introduction to the External Interfaces. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–1
4.1.1 System Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–3
4.1.1.1 Commands and Addresses. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–4
4.1.2 Second-Level Cache (Bcache) Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–4
4.2 Physical Address Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–4
4.3 Bcache Structure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–7
4.3.1 Bcache Interface Signals. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–7
4.3.2 System Duplicate Tag Stores. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–7
4.4 Victim Data Buffer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–8
4.5 Cache Coherency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–8
4.5.1 Cache Coherency Basics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–8
4.5.2 Cache Block States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–9
4.5.3 Cache Block State Transitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–10
4.5.4 Using SysDc Commands. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–11
4.5.5 Dcache States and Duplicate Tags. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–13
4.6 Lock Mechanism . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–14
4.6.1 In-Order Processing of LDx_L/STx_C Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–15
4.6.2 Internal Eviction of LDx_L Blocks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–15
4.6.3 Liveness and Fairness. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–15
4.6.4 Managing Speculative Store Issues with Multiprocessor Systems . . . . . . . . . . . . . . . . 4–16
4.7 System Port . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–16
4.7.1 System Port Pins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–17
4.7.2 Programming the System Interface Clocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–18
4.7.3 21264/EV67-to-System Commands. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–19
4.7.3.1 Bank Interleave on Cache Block Boundary Mode . . . . . . . . . . . . . . . . . . . . . . . . . 4–19
4.7.3.2 Page Hit Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–20
4.7.4 21264/EV67-to-System Commands Descriptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–21
4.7.5 ProbeResponse Commands (Command[4:0] = 00001). . . . . . . . . . . . . . . . . . . . . . . . . 4–24
4.7.6 SysAck and 21264/EV67-to-System Commands Flow Control . . . . . . . . . . . . . . . . . . . 4–25
4.7.7 System-to-21264/EV67 Commands. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–26
4.7.7.1 Probe Commands (Four Cycles) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–26
4.7.7.2 Data Transfer Commands (Two Cycles). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–28
4.7.8 Data Movement In and Out of the 21264/EV67. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–30
4.7.8.1 21264/EV67 Clock Basics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–30
4.7.8.2 Fast Data Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–31
4.7.8.3 Fast Data Disable Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–33
4.7.8.4 SysDataInValid_L and SysDataOutValid_L . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–34
4.7.8.5 SysFillValid_L . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–35
4.7.8.6 Data Wrapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–36
4.7.9 Nonexistent Memory Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–38
4.7.10 Ordering of System Port Transactions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–40
4.7.10.1 21264/EV67 Commands and System Probes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–40
4.7.10.2 System Probes and SysDc Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–42
4.8 Bcache Port . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–42
4.8.1 Bcache Port Pins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–43
4.8.2 Bcache Clocking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–44
4.8.2.1 Setting the Period of the Cache Clock . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–45
4.8.3 Bcache Transactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–47
4.8.3.1 Bcache Data Read and Tag Read Transactions . . . . . . . . . . . . . . . . . . . . . . . . . . 4–47
4.8.3.2 Bcache Data Write Transactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–48
4.8.3.3 Bubbles on the Bcache Data Bus. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–49
4.8.4 Pin Descriptions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–51
4.8.4.1 BcAdd_H[23:4] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–51
4.8.4.2 Bcache Control Pins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–52
4.8.4.3 BcDataInClk_H and BcTagInClk_H . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–53
4.8.5 Bcache Banking. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–54
4.8.6 Disabling the Bcache for Debugging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–54
4.9 Interrupts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–54
5 Internal Processor Registers
5.1 Ebox IPRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–3
5.1.1 Cycle Counter Register – CC. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–3
5.1.2 Cycle Counter Control Register – CC_CTL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–3
5.1.3 Virtual Address Register – VA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–4
5.1.4 Virtual Address Control Register – VA_CTL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–4
5.1.5 Virtual Address Format Register – VA_FORM. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–5
5.2 Ibox IPRs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–6
5.2.1 ITB Tag Array Write Register – ITB_TAG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–6
5.2.2 ITB PTE Array Write Register – ITB_PTE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–6
5.2.3 ITB Invalidate All Process (ASM=0) Register – ITB_IAP. . . . . . . . . . . . . . . . . . . . . . . . 5–7
5.2.4 ITB Invalidate All Register – ITB_IA. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–7
5.2.5 ITB Invalidate Single Register – ITB_IS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–7
5.2.6 ProfileMe PC Register – PMPC. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–8
5.2.7 Exception Address Register – EXC_ADDR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–8
5.2.8 Instruction Virtual Address Format Register – IVA_FORM. . . . . . . . . . . . . . . . . . . . . . 5–9
5.2.9 Interrupt Enable and Current Processor Mode Register – IER_CM. . . . . . . . . . . . . . . . 5–9
5.2.10 Software Interrupt Request Register – SIRR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–10
5.2.11 Interrupt Summary Register – ISUM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–11
5.2.12 Hardware Interrupt Clear Register – HW_INT_CLR . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–12
5.2.13 Exception Summary Register – EXC_SUM. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–13
5.2.14 PAL Base Register – PAL_BASE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–15
5.2.15 Ibox Control Register – I_CTL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–15
5.2.16 Ibox Status Register – I_STAT. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–18
5.2.17 Icache Flush Register – IC_FLUSH. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–21
5.2.18 Icache Flush ASM Register – IC_FLUSH_ASM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–21
5.2.19 Clear Virtual-to-Physical Map Register – CLR_MAP. . . . . . . . . . . . . . . . . . . . . . . . . . . 5–21
5.2.20 Sleep Mode Register – SLEEP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–21
5.2.21 Process Context Register – PCTX. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–21
5.2.22 Performance Counter Control Register – PCTR_CTL. . . . . . . . . . . . . . . . . . . . . . . . . . 5–23
5.3 Mbox IPRs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–25
5.3.1 DTB Tag Array Write Registers 0 and 1 – DTB_TAG0, DTB_TAG1 . . . . . . . . . . . . . . . 5–25
5.3.2 DTB PTE Array Write Registers 0 and 1 – DTB_PTE0, DTB_PTE1 . . . . . . . . . . . . . . . 5–26
5.3.3 DTB Alternate Processor Mode Register – DTB_ALTMODE. . . . . . . . . . . . . . . . . . . . . 5–26
5.3.4 Dstream TB Invalidate All Process (ASM=0) Register – DTB_IAP . . . . . . . . . . . . . . . . 5–27
5.3.5 Dstream TB Invalidate All Register – DTB_IA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–27
5.3.6 Dstream TB Invalidate Single Registers 0 and 1 – DTB_IS0,1 . . . . . . . . . . . . . . . . . . . 5–27
5.3.7 Dstream TB Address Space Number Registers 0 and 1 – DTB_ASN0,1 . . . . . . . . . . . 5–28
5.3.8 Memory Management Status Register – MM_STAT . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–28
5.3.9 Mbox Control Register – M_CTL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–29
5.3.10 Dcache Control Register – DC_CTL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–30
5.3.11 Dcache Status Register – DC_STAT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–31
5.4 Cbox CSRs and IPRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–32
5.4.1 Cbox Data Register – C_DATA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–33
5.4.2 Cbox Shift Register – C_SHFT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–33
5.4.3 Cbox WRITE_ONCE Chain Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–33
5.4.4 Cbox WRITE_MANY Chain Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–38
5.4.5 Cbox Read Register (IPR) Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–41
6 Privileged Architecture Library Code
6.1 PALcode Description. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–1
6.2 PALmode Environment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–2
6.3 Required PALcode Function Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–3
6.4 Opcodes Reserved for PALcode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–3
6.4.1 HW_LD Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–3
6.4.2 HW_ST Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–4
6.4.3 HW_RET Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–5
6.4.4 HW_MFPR and HW_MTPR Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–6
6.5 Internal Processor Register Access Mechanisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–7
6.5.1 IPR Scoreboard Bits. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–8
6.5.2 Hardware Structure of Explicitly Written IPRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–8
6.5.3 Hardware Structure of Implicitly Written IPRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–9
6.5.4 IPR Access Ordering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–9
6.5.5 Correct Ordering of Explicit Writers Followed by Implicit Readers. . . . . . . . . . . . . . . . . 6–10
6.5.6 Correct Ordering of Explicit Readers Followed by Implicit Writers. . . . . . . . . . . . . . . . . 6–11
6.6 PALshadow Registers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–11
6.7 PALcode Emulation of the FPCR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–11
6.7.1 Status Flags. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–12
6.7.2 MF_FPCR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–12
6.7.3 MT_FPCR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–12
6.8 PALcode Entry Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–12
6.8.1 CALL_PAL Entry Points. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–12
6.8.2 PALcode Exception Entry Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–13
6.9 Translation Buffer (TB) Fill Flows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–14
6.9.1 DTB Fill . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–14
6.9.2 ITB Fill . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–16
6.10 Performance Counter Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–17
6.10.1 General Precautions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–18
6.10.2 Aggregate Mode Programming Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–18
6.10.2.1 Aggregate Mode Precautions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–18
6.10.2.2 Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–19
6.10.2.3 Aggregate Counting Mode Description. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–20
6.10.2.3.1 Cycle counting. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–20
6.10.2.3.2 Retired instructions cycles. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–20
6.10.2.3.3 Bcache miss or long latency probes cycles. . . . . . . . . . . . . . . . . . . . . . . . . . . 6–20
6.10.2.3.4 Mbox replay traps cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–20
6.10.2.4 Counter Modes for Aggregate Mode. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–20
6.10.3 ProfileMe Mode Programming Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–20
6.10.3.1 ProfileMe Mode Precautions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–20
6.10.3.2 Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–21
6.10.3.3 ProfileMe Counting Mode Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–23
6.10.3.3.1 Cycle counting. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–23
6.10.3.3.2 Inum retire delay cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–23
6.10.3.3.3 Retired instructions cycles. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–23
6.10.3.3.4 Bcache miss or long latency probes cycles. . . . . . . . . . . . . . . . . . . . . . . . . . . 6–23
6.10.3.3.5 Mbox replay traps cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–23
6.10.3.4 Counter Modes for ProfileMe Mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–24
7 Initialization and Configuration
7.1 Power-Up Reset Flow and the Reset_L and DCOK_H Pins. . . . . . . . . . . . . . . . . . . . . . . . . 7–1
7.1.1 Power Sequencing and Reset State for Signal Pins . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–3
7.1.2 Clock Forwarding and System Clock Ratio Configuration . . . . . . . . . . . . . . . . . . . . . . . 7–4
7.1.3 PLL Ramp Up. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–6
7.1.4 BiST and SROM Load and the TestStat_H Pin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–6
7.1.5 Clock Forward Reset and System Interface Initialization. . . . . . . . . . . . . . . . . . . . . . . . 7–7
7.2 Fault Reset Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–8
7.3 Energy Star Certification and Sleep Mode Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–9
7.4 Warm Reset Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–11
7.5 Array Initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–12
7.6 Initialization Mode Processing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–12
7.7 External Interface Initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–14
7.8 Internal Processor Register Power-Up Reset State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–14
7.9 IEEE 1149.1 Test Port Reset. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–16
7.10 Reset State Machine. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–16
7.11 Phase-Lock Loop (PLL) Functional Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–19
7.11.1 Differential Reference Clocks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–19
7.11.2 PLL Output Clocks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–19
7.11.2.1 GCLK . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–19
7.11.2.2 Differential 21264/EV67 Clocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–19
7.11.2.3 Nominal Operating Frequency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–19
7.11.2.4 Power-Up/Reset Clocking. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–20
8 Error Detection and Error Handling
8.1 Data Error Correction Code. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–2
8.2 Icache Data or Tag Parity Error. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–2
8.3 Dcache Tag Parity Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–2
8.4 Dcache Data Single-Bit Correctable ECC Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–3
8.4.1 Load Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–3
8.4.2 Store Instruction (Quadword or Smaller) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–4
8.4.3 Dcache Victim Extracts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–4
8.5 Dcache Store Second Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–4
8.6 Dcache Duplicate Tag Parity Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–4
8.7 Bcache Tag Parity Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–5
8.8 Bcache Data Single-Bit Correctable ECC Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–5
8.8.1 Icache Fill from Bcache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–5
8.8.2 Dcache Fill from Bcache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–6
8.8.3 Bcache Victim Read. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–6
8.8.3.1 Bcache Victim Read During a Dcache/Bcache Miss . . . . . . . . . . . . . . . . . . . . . . . 8–6
8.8.3.2 Bcache Victim Read During an ECB Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . 8–7
8.9 Memory/System Port Single-Bit Data Correctable ECC Error. . . . . . . . . . . . . . . . . . . . . . . . 8–7
8.9.1 Icache Fill from Memory. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–7
8.9.2 Dcache Fill from Memory. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–7
8.10 Bcache Data Single-Bit Correctable ECC Error on a Probe . . . . . . . . . . . . . . . . . . . . . . . . . 8–8
8.11 Double-Bit Fill Errors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–9
8.12 Error Case Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8–9
9 Electrical Data
9.1 Electrical Characteristics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9–1
9.2 DC Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9–2
9.3 Power Supply Sequencing and Avoiding Potential Failure Mechanisms . . . . . . . . . . . . . . . 9–5
9.4 AC Characteristics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9–6
10 Thermal Management
10.1 Operating Temperature. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10–1
10.2 Heat Sink Specifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10–3
10.3 Thermal Design Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10–7
11 Testability and Diagnostics
11.1 Test Pins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11–1
11.2 SROM/Serial Diagnostic Terminal Port . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11–2
11.2.1 SROM Load Operation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11–2
11.2.2 Serial Terminal Port . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11–2
11.3 IEEE 1149.1 Port. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11–3
11.4 TestStat_H Pin . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11–4
11.5 Power-Up Self-Test and Initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11–5
11.5.1 Built-in Self-Test. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11–5
11.5.2 SROM Initialization. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11–5
11.5.2.1 Serial Instruction Cache Load Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11–6
11.6 Notes on IEEE 1149.1 Operation and Compliance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11–7
A Alpha Instruction Set
A.1 Alpha Instruction Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A–1
A.2 Reserved Opcodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A–8
A.2.1 Opcodes Reserved for Compaq. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A–8
A.2.2 Opcodes Reserved for PALcode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A–9
A.3 IEEE Floating-Point Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A–9
A.4 VAX Floating-Point Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A–11
A.5 Independent Floating-Point Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A–11
A.6 Opcode Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A–12
A.7 Required PALcode Function Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A–13
A.8 IEEE Floating-Point Conformance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A–14
B 21264/EV67 Boundary-Scan Register
B.1 Boundary-Scan Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B–1
B.1.1 BSDL Description of the Alpha 21264/EV67 Boundary-Scan Register . . . . . . . . . . . . . B–1
C Serial Icache Load Predecode Values
D PALcode Restrictions and Guidelines
D.1 Restriction 1: Reset Sequence Required by Retire Logic and Mapper. . . . . . . . . . . . . . . D–1
D.2 Restriction 2: No Multiple Writers to IPRs in Same Scoreboard Group . . . . . . . . . . . . . . . D–8
D.3 Restriction 4: No Writers and Readers to IPRs in Same Scoreboard Group . . . . . . . . . . D–8
D.4 Guideline 6: Avoid Consecutive Read-Modify-Write-Read-Modify-Write. . . . . . . . . . . . D–9
D.5 Restriction 7: Replay Trap, Interrupt Code Sequence, and STF/ITOF . . . . . . . . . . . . . . . D–9
D.6 Restriction 9: PALmode Istream Address Ranges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D–10
D.7 Restriction 10: Duplicate IPR Mode Bits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D–10
D.8 Restriction 11: Ibox IPR Update Synchronization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D–11
D.9 Restriction 12: MFPR of Implicitly-Written IPRs EXC_ADDR, IVA_FORM, and EXC_SUM D–11
D.10 Restriction 13: DTB Fill Flow Collision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D–11
D.11 Restriction 14: HW_RET . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D–11
D.12 Guideline 16: JSR-BAD VA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D–12
D.13 Restriction 17: MTPR to DTB_TAG0/DTB_PTE0/DTB_TAG1/DTB_PTE1 . . . . . . . . . . . . . D–12
D.14 Restriction 18: No FP Operates, FP Conditional Branches, FTOI, or STF in Same Fetch Block as HW_MTPR . . . . D–12
D.15 Restriction 19: HW_RET/STALL After Updating the FPCR by way of MT_FPCR in PALmode D–12
D.16 Guideline 20: I_CTL[SBE] Stream Buffer Enable. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D–12
D.17 Restriction 21: HW_RET/STALL After HW_MTPR ASN0/ASN1. . . . . . . . . . . . . . . . . . . . . . D–12
D.18 Restriction 22: HW_RET/STALL After HW_MTPR IS0/IS1. . . . . . . . . . . . . . . . . . . . . . . . . . D–13
D.19 Restriction 23: HW_ST/P/CONDITIONAL Does Not Clear the Lock Flag. . . . . . . . . . . . . . . D–13
D.20 Restriction 24: HW_RET/STALL After HW_MTPR IC_FLUSH, IC_FLUSH_ASM, CLEAR_MAP D–14
D.21 Restriction 25: HW_MTPR ITB_IA After Reset. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D–14
D.22 Guideline 26: Conditional Branches in PALcode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D–14
D.23 Restriction 27: Reset of ‘Force-Fail Lock Flag’ State in PALcode. . . . . . . . . . . . . . . . . . . . . D–15
D.24 Restriction 28: Enforce Ordering Between IPRs Implicitly Written by Loads and Subsequent Loads D–15
D.25 Guideline 29: JSR, JMP, RET, and JSR_COR in PALcode. . . . . . . . . . . . . . . . . . . . . . . . . D–15
D.26 Restriction 30: HW_MTPR and HW_MFPR to the Cbox CSR. . . . . . . . . . . . . . . . . . . . . . . D–15
D.27 Restriction 31: I_CTL[VA_48] Update . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D–17
D.28 Restriction 32: PCTR_CTL Update . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D–17
D.29 Restriction 33: HW_LD Physical/Lock Use. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D–18
D.30 Restriction 34: Writing Multiple ITB Entries in the Same PALcode Flow . . . . . . . . . . . . . . . D–18
D.31 Guideline 35: HW_INT_CLR Update . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D–18
D.32 Restriction 36: Updating I_CTL[SDE]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D–18
D.33 Restriction 37: Updating VA_CTL[VA_48] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D–18
D.34 Restriction 38: Updating PCTR_CTL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D–18
D.35 Guideline 39: Writing Multiple DTB Entries in the Same PAL Flow. . . . . . . . . . . . . . . . . . . . D–19
D.36 Restriction 40: Scrubbing a Single-Bit Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D–19
D.37 Restriction 41: MTPR ITB_TAG, MTPR ITB_PTE Must Be in the Same Fetch Block. . . . . D–21
D.38 Restriction 42: Updating VA_CTL, CC_CTL, or CC IPRs. . . . . . . . . . . . . . . . . . . . . . . . . . . D–21
D.39 Restriction 43: No Trappable Instructions Along with HW_MTPR. . . . . . . . . . . . . . . . . . . . . D–21
D.40 Restriction 44: Not Applicable to the 21264/EV67 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . D–21
D.41 Restriction 45: No HW_JMP or JMP Instructions in PALcode . . . . . . . . . . . . . . . . . . . . . . . D–21
D.42 Restriction 46: Avoiding Livelocks in Speculative Load CRD Handlers . . . . . . . . . . . . . . . D–22
D.43 Restriction 47: Cache Eviction for Single-Bit Cache Errors . . . . . . . . . . . . . . . . . . . . . . . . . D–22
D.44 Restriction 48: MB Bracketing of Dcache Writes to Force Bad Data ECC and Force Bad Tag Parity D–24
E 21264/EV67-to-Bcache Pin Interconnections
E.1 Forwarding Clock Pin Groupings. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E–1
E.2 Late-Write Non-Bursting SSRAMs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E–2
E.3 Dual-Data Rate SSRAMs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . E–3
Glossary
Index

Figures

2–1 21264/EV67 Block Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–3
2–2 Branch Predictor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–4
2–3 Local Predictor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–4
2–4 Global Predictor. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–5
2–5 Choice Predictor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–5
2–6 Integer Execution Unit—Clusters 0 and 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–9
2–7 Floating-Point Execution Units . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–10
2–8 Pipeline Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–14
2–9 Pipeline Timing for Integer Load Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–25
2–10 Pipeline Timing for Floating-Point Load Instructions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–26
2–11 Floating-Point Control Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–36
2–12 Typical Uniprocessor Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–39
2–13 Typical Multiprocessor Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–40
3–1 21264/EV67 Microprocessor Logic Symbol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3–2
3–2 Package Dimensions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3–17
3–3 21264/EV67 Top View (Pin Down) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3–18
3–4 21264/EV67 Bottom View (Pin Up). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3–19
4–1 21264/EV67 System and Bcache Interfaces. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–3
4–2 21264/EV67 Bcache Interface Signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–7
4–3 Cache Subset Hierarchy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–9
4–4 System Interface Signals. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–17
4–5 Fast Transfer Timing Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–32
4–6 SysFillValid_L Timing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–36
5–1 Cycle Counter Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–3
5–2 Cycle Counter Control Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–3
5–3 Virtual Address Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–4
5–4 Virtual Address Control Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–4
5–5 Virtual Address Format Register (VA_48 = 0, VA_FORM_32 = 0) . . . . . . . . . . . . . . . . . . . . 5–5
5–6 Virtual Address Format Register (VA_48 = 1, VA_FORM_32 = 0) . . . . . . . . . . . . . . . . . . . . 5–6
5–7 Virtual Address Format Register (VA_48 = 0, VA_FORM_32 = 1) . . . . . . . . . . . . . . . . . . . . 5–6
5–8 ITB Tag Array Write Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–6
5–9 ITB PTE Array Write Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–7
5–10 ITB Invalidate Single Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–7
5–11 ProfileMe PC Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–8
5–12 Exception Address Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–8
5–13 Instruction Virtual Address Format Register (VA_48 = 0, VA_FORM_32 = 0) . . . . . . . . . . . 5–9
5–14 Instruction Virtual Address Format Register (VA_48 = 1, VA_FORM_32 = 0) . . . . . . . . . . . 5–9
5–15 Instruction Virtual Address Format Register (VA_48 = 0, VA_FORM_32 = 1) . . . . . . . . . . . 5–9
5–16 Interrupt Enable and Current Processor Mode Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–10
5–17 Software Interrupt Request Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–11
5–18 Interrupt Summary Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–11
5–19 Hardware Interrupt Clear Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–12
5–20 Exception Summary Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–14
5–21 PAL Base Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–15
5–22 Ibox Control Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–16
5–23 Ibox Status Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–19
5–24 Process Context Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–22
5–25 Performance Counter Control Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–23
5–26 DTB Tag Array Write Registers 0 and 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–25
5–27 DTB PTE Array Write Registers 0 and 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–26
5–28 DTB Alternate Processor Mode Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–26
5–29 Dstream Translation Buffer Invalidate Single Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–27
5–30 Dstream Translation Buffer Address Space Number Registers 0 and 1. . . . . . . . . . . . . . . . 5–28
5–31 Memory Management Status Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–28
5–32 Mbox Control Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–29
5–33 Dcache Control Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–31
5–34 Dcache Status Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–32
5–35 Cbox Data Register. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–33
5–36 Cbox Shift Register . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–33
5–37 WRITE_MANY Chain Write Transaction Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5–39
6–1 HW_LD Instruction Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–4
6–2 HW_ST Instruction Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–4
6–3 HW_RET Instruction Format. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–6
6–4 HW_MFPR and HW_MTPR Instructions Format. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–6
6–5 Single-Miss DTB Instructions Flow Example. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–14
6–6 ITB Miss Instructions Flow Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6–16
7–1 Power-Up Timing Sequence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–3
7–2 Fault Reset Sequence of Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–9
7–3 Sleep Mode Sequence of Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–11
7–4 Example for Initializing Bcache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–13
7–5 21264/EV67 Reset State Machine State Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7–17
10–1 Type 1 Heat Sink. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10–4
10–2 Type 2 Heat Sink. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10–5
10–3 Type 3 Heat Sink. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10–6
11–1 TAP Controller State Machine. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11–4
11–2 TestStat_H Pin Timing During Power-Up Built-In Self-Test (BiST) . . . . . . . . . . . . . . . . . . . 11–5
11–3 TestStat_H Pin Timing During Built-In Self-Initialization (BiSI) . . . . . . . . . . . . . . . . . . . . . . . 11–5
11–4 SROM Content Map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11–6

Tables

1–1 Integer Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1–2
2–1 Pipeline Abort Delay (GCLK Cycles). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–16
2–2 Instruction Name, Pipeline, and Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–17
2–3 Instruction Group Definitions and Pipeline Unit. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–18
2–4 Instruction Class Latency in Cycles. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–20
2–5 Minimum Retire Latencies for Instruction Classes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–21
2–6 Instructions Retired Without Execution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–23
2–7 Rules for I/O Address Space Load Instruction Data Merging . . . . . . . . . . . . . . . . . . . . . . . . 2–28
2–8 Rules for I/O Address Space Store Instruction Data Merging. . . . . . . . . . . . . . . . . . . . . . . . 2–29
2–9 MAF Merging Rules. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–30
2–10 Memory Reference Ordering. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–31
2–11 I/O Reference Ordering. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–31
2–12 TB Fill Flow Example Sequence 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–34
2–13 TB Fill Flow Example Sequence 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–35
2–14 Floating-Point Control Register Fields. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–36
2–15 21264/EV67 AMASK Values. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–38
2–16 AMASK Bit Assignments. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2–38
3–1 Signal Pin Types Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3–3
3–2 21264/EV67 Signal Descriptions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3–3
3–3 21264/EV67 Signal Descriptions by Function. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3–6
3–4 Pin List Sorted by Signal Name. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3–8
3–5 Pin List Sorted by PGA Location. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3–12
3–6 Ground and Power (VSS and VDD) Pin List . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3–16
4–1 Translation of Internal References to External Interface Reference . . . . . . . . . . . . . . . . . . . 4–5
4–2 21264/EV67-Supported Cache Block States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–9
4–3 Cache Block State Transitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–10
4–4 System Responses to 21264/EV67 Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–10
4–5 System Responses to 21264/EV67 Commands and 21264/EV67 Reactions. . . . . . . . . . . . 4–11
4–6 System Port Pins. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–17
4–7 Programming Values for System Interface Clocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–18
4–8 Program Values for Data-Sample/Drive CSRs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–18
4–9 Forwarded Clocks and Frame Clock Ratio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–19
4–10 Bank Interleave on Cache Block Boundary Mode of Operation . . . . . . . . . . . . . . . . . . . . . . 4–19
4–11 Page Hit Mode of Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–20
4–12 21264/EV67-to-System Command Fields Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–20
4–13 Maximum Physical Address for Short Bus Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–21
4–14 21264/EV67-to-System Commands Descriptions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–21
4–15 Programming INVAL_TO_DIRTY_ENABLE[1:0]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–23
4–16 Programming SET_DIRTY_ENABLE[2:0]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–24
4–17 21264/EV67 ProbeResponse Command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–24
4–18 ProbeResponse Fields Descriptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–25
4–19 System-to-21264/EV67 Probe Commands. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–26
4–20 System-to-21264/EV67 Probe Commands Fields Descriptions . . . . . . . . . . . . . . . . . . . . . . 4–27
4–21 Data Movement Selection by Probe[4:3]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–27
4–22 Next Cache Block State Selection by Probe[2:0] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–27
4–23 Data Transfer Command Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–28
4–24 SysDc[4:0] Field Description. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–29
4–25 SYSCLK Cycles Between SysAddOut and SysData. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–32
4–26 Cbox CSR SYSDC_DELAY[4:0] Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–33
4–27 Four Timing Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–34
4–28 Data Wrapping Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–36
4–29 System Wrap and Deliver Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–37
4–30 Wrap Interleave Order. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–37
4–31 Wrap Order for Double-Pumped Data Transfers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4–38
4–32 21264/EV67 Commands with NXM Addresses and System Response . . . . . . . . . . . . . . . . 4–39
4–33 21264/EV67 Response to System Probe and In-Flight Command Interaction . . . . . . . . . . . 4–41
4–34 Rules for System Control of Cache Status Update Order
4–35 Range of Maximum Bcache Clock Ratios
4–36 Bcache Port Pins
4–37 BC_CPU_CLK_DELAY[1:0] Values
4–38 BC_CLK_DELAY[1:0] Values
4–39 Program Values to Set the Cache Clock Period (Single-Data)
4–40 Program Values to Set the Cache Clock Period (Dual-Data Rate)
4–41 Data-Sample/Drive Cbox CSRs
4–42 Programming the Bcache to Support Each Size of the Bcache
4–43 Programming the Bcache Control Pins
4–44 Control Pin Assertion for RAM_TYPE A
4–45 Control Pin Assertion for RAM_TYPE B
4–46 Control Pin Assertion for RAM_TYPE C
4–47 Control Pin Assertion for RAM_TYPE D
5–1 Internal Processor Registers
5–2 Cycle Counter Control Register Fields Description
5–3 Virtual Address Control Register Fields Description
5–4 ProfileMe PC Fields Description
5–5 IER_CM Register Fields Description
5–6 Software Interrupt Request Register Fields Description
5–7 Interrupt Summary Register Fields Description
5–8 Hardware Interrupt Clear Register Fields Description
5–9 Exception Summary Register Fields Description
5–10 PAL Base Register Fields Description
5–11 Ibox Control Register Fields Description
5–12 Ibox Status Register Fields Description
5–13 IPR Index Bits and Register Fields
5–14 Process Context Register Fields Description
5–15 Performance Counter Control Register Fields Description
5–16 Performance Counter Control Register Input Select Fields
5–17 DTB Alternate Processor Mode Register Fields Description
5–18 Memory Management Status Register Fields Description
5–19 Mbox Control Register Fields Description
5–20 Dcache Control Register Fields Description
5–21 Dcache Status Register Fields Description
5–22 Cbox Data Register Fields Description
5–23 Cbox Shift Register Fields Description
5–24 Cbox WRITE_ONCE Chain Order
5–25 Cbox WRITE_MANY Chain Order
5–26 Cbox Read IPR Fields Description
6–1 Required PALcode Function Codes
6–2 Opcodes Reserved for PALcode
6–3 HW_LD Instruction Fields Descriptions
6–4 HW_ST Instruction Fields Descriptions
6–5 HW_RET Instruction Fields Descriptions
6–6 HW_MFPR and HW_MTPR Instructions Fields Descriptions
6–7 Paired Instruction Fetch Order
6–8 PALcode Exception Entry Locations
6–9 IPRs Used for Performance Counter Support
6–10 Aggregate Mode Returned IPR Contents
6–11 Aggregate Mode Performance Counter IPR Input Select Fields
6–12 CMOV Decomposed
6–13 ProfileMe Mode Returned IPR Contents
6–14 ProfileMe Mode PCTR_CTL Input Select Fields
7–1 21264/EV67 Reset State Machine Major Operations
7–2 Signal Pin Reset State
7–3 Pin Signal Names and Initialization State
7–4 Power-Up Flow Signals and Their Constraints
7–5 Effect on IPRs After Fault Reset
7–6 Effect on IPRs After Transition Through Sleep Mode
7–7 Signals and Constraints for the Sleep Mode Sequence
7–8 Effect on IPRs After Warm Reset
7–9 WRITE_MANY Chain CSR Values for Bcache Initialization
7–10 Internal Processor Registers at Power-Up Reset State
7–11 21264/EV67 Reset State Machine State Descriptions
7–12 Differential Reference Clock Frequencies in Full-Speed Lock
8–1 21264/EV67 Error Detection Mechanisms
8–2 64-Bit Data and Check Bit ECC Code
8–3 Error Case Summary
9–1 Maximum Electrical Ratings
9–2 Signal Types
9–3 VDD (I_DC_POWER)
9–4 Input DC Reference Pin (I_DC_REF)
9–5 Input Differential Amplifier Receiver (I_DA)
9–6 Input Differential Amplifier Clock Receiver (I_DA_CLK)
9–7 Pin Type: Open-Drain Output Driver (O_OD)
9–8 Bidirectional, Differential Amplifier Receiver, Open-Drain Output Driver (B_DA_OD)
9–9 Pin Type: Open-Drain Driver for Test Pins (O_OD_TP)
9–10 Bidirectional, Differential Amplifier Receiver, Push-Pull Output Driver (B_DA_PP)
9–11 Push-Pull Output Driver (O_PP)
9–12 Push-Pull Output Clock Driver (O_PP_CLK)
9–13 AC Specifications
10–1 Operating Temperature at Heat Sink Center (Tc)
10–2 θca at Various Airflows for 21264/EV67
10–3 Maximum Ta for 21264/EV67 @ 600 MHz and @ 2.0 V with Various Airflows
10–4 Maximum Ta for 21264/EV67 @ 667 MHz and @ 2.0 V with Various Airflows
10–5 Maximum Ta for 21264/EV67 @ 700 MHz and @ 2.0 V with Various Airflows
10–6 Maximum Ta for 21264/EV67 @ 733 MHz and @ 2.0 V with Various Airflows
10–7 Maximum Ta for 21264/EV67 @ 750 MHz and @ 2.0 V with Various Airflows
10–8 Maximum Ta for 21264/EV67 @ 833 MHz and @ 2.0 V with Various Airflows
11–1 Dedicated Test Port Pins
11–2 IEEE 1149.1 Instructions and Opcodes
11–3 Icache Bit Fields in an SROM Line
A–1 Instruction Format and Opcode Notation
A–2 Architecture Instructions
A–3 Opcodes Reserved for Compaq
A–4 Opcodes Reserved for PALcode
A–5 IEEE Floating-Point Instruction Function Codes
A–6 VAX Floating-Point Instruction Function Codes
A–7 Independent Floating-Point Instruction Function Codes
A–8 Opcode Summary
A–9 Key to Opcode Summary Used in Table A–8
A–10 Required PALcode Function Codes
A–11 Exceptional Input and Output Conditions
E–1 Bcache Forwarding Clock Pin Groupings
E–2 Late-Write Non-Bursting SSRAMs Data Pin Usage
E–3 Late-Write Non-Bursting SSRAMs Tag Pin Usage
E–4 Dual-Data Rate SSRAM Data Pin Usage
E–5 Dual-Data Rate SSRAM Tag Pin Usage
Preface

Audience
This manual is for system designers and programmers who use the Alpha 21264/EV67 microprocessor (referred to as the 21264/EV67).
Content
This manual contains the following chapters and appendixes:
Chapter 1, Introduction, introduces the 21264/EV67 and provides an overview of the Alpha architecture.
Chapter 2, Internal Architecture, describes the major hardware functions and the internal chip architecture. It describes performance measurement facilities, coding rules, and design examples.
Chapter 3, Hardware Interface, lists and describes the internal hardware interface signals, and provides mechanical data and packaging information, including signal pin lists.
Chapter 4, Cache and External Interfaces, describes the external bus functions and transactions, lists bus commands, and describes the clock functions.
Chapter 5, Internal Processor Registers, lists and describes the internal processor register set.
Chapter 6, Privileged Architecture Library Code, describes the privileged architecture library code (PALcode).
Chapter 7, Initialization and Configuration, describes the initialization and configuration sequence.
Chapter 8, Error Detection and Error Handling, describes error detection and error handling.
Chapter 9, Electrical Data, provides electrical data and describes signal integrity issues.
Chapter 10, Thermal Management, provides information about thermal management.
Chapter 11, Testability and Diagnostics, describes chip and system testability features.
Appendix A, Alpha Instruction Set, summarizes the Alpha instruction set.
Appendix B, 21264/EV67 Boundary-Scan Register, presents the BSDL description of the 21264/EV67 boundary-scan register.
Appendix C, Serial Icache Load Predecode Values, provides a pointer to the Alpha Motherboards Software Developer’s Kit (SDK), which contains this information.
Appendix D, PALcode Restrictions and Guidelines, lists restrictions and guidelines that must be adhered to when generating PALcode.
Appendix E, 21264/EV67-to-Bcache Pin Interconnections, provides the pin interface between the 21264/EV67 and Bcache SSRAMs.
The Glossary lists and defines terms associated with the 21264/EV67.
An Index is provided at the end of the document.
Documentation Included by Reference
The companion volume to this manual, the Alpha Architecture Handbook, Version 4, contains the instruction set architecture. You can access this document from the following website: ftp.digital.com/pub/Digital/info/semiconductor/literature/dsc-library.html
Also available is the Alpha Architecture Reference Manual, Third Edition, which contains the complete architecture information. That manual is available at bookstores from the Digital Press as EQ-W938E-DP.
Terminology and Conventions
This section defines the abbreviations, terminology, and other conventions used throughout this document.
Abbreviations
Binary Multiples
The abbreviations K, M, and G (kilo, mega, and giga) represent binary multiples and have the following values.
K = 2^10 (1024)
M = 2^20 (1,048,576)
G = 2^30 (1,073,741,824)
For example:
2KB = 2 kilobytes = 2 × 2^10 bytes
4MB = 4 megabytes = 4 × 2^20 bytes
8GB = 8 gigabytes = 8 × 2^30 bytes
2K pixels = 2 kilopixels = 2 × 2^10 pixels
4M pixels = 4 megapixels = 4 × 2^20 pixels
Register Access
The abbreviations used to indicate the type of access to register fields and bits have the following definitions:
Abbreviation Meaning
IGN Ignore
Bits and fields specified are ignored on writes.
MBZ Must Be Zero
Software must never place a nonzero value in bits and fields specified as MBZ. A nonzero read produces an Illegal Operand exception. Also, MBZ fields are reserved for future use.
RAZ Read As Zero
Bits and fields return a zero when read.
RC Read Clears
Bits and fields are cleared when read. Unless otherwise specified, such bits cannot be written.
RES Reserved
Bits and fields are reserved by Compaq and should not be used; however, zeros can be written to reserved fields that cannot be masked.
RO Read Only
The value may be read by software. It is written by hardware. Software write operations are ignored.
RO,n Read Only, and takes the value n at power-on reset.
The value may be read by software. It is written by hardware. Software write operations are ignored.
RW Read/Write
Bits and fields can be read and written.
RW,n Read/Write, and takes the value n at power-on reset.
Bits and fields can be read and written.
W1C Write One to Clear
If read operations are allowed to the register, then the value may be read by software. If it is a write-only register, then a read operation by software returns an UNPREDICTABLE result. Software write operations of a 1 cause the bit to be cleared by hardware. Software write operations of a 0 do not modify the state of the bit.
W1S Write One to Set
If read operations are allowed to the register, then the value may be read by software. If it is a write-only register, then a read operation by software returns an UNPREDICTABLE result. Software write operations of a 1 cause the bit to be set by hardware. Software write operations of a 0 do not modify the state of the bit.
WO Write Only
Bits and fields can be written but not read.
WO,n Write Only, and takes the value n at power-on reset.
Bits and fields can be written but not read.
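The write-one-to-clear, write-one-to-set, and read-clears definitions above can be illustrated with a small software model. This is only a sketch of the access semantics as defined in the table; the class name ModelRegister and its methods are hypothetical and do not correspond to any 21264/EV67 register or programming interface.

```python
class ModelRegister:
    """Toy model of a register whose bits follow W1C, W1S, or RC semantics."""

    def __init__(self, value=0, width=64):
        self.mask = (1 << width) - 1
        self.value = value & self.mask

    def write_w1c(self, data):
        # W1C: each 1 written clears that bit; each 0 leaves the bit unchanged.
        self.value &= ~data & self.mask

    def write_w1s(self, data):
        # W1S: each 1 written sets that bit; each 0 leaves the bit unchanged.
        self.value |= data & self.mask

    def read_rc(self):
        # RC: the read returns the current bits, which are then cleared.
        old = self.value
        self.value = 0
        return old


reg = ModelRegister(0b1011)
reg.write_w1c(0b0010)      # clears bit 1 only; value is now 0b1001
reg.write_w1s(0b0100)      # sets bit 2 only; value is now 0b1101
pending = reg.read_rc()    # returns 0b1101 and leaves the model at 0
```

Writing a 0 in either W1C or W1S mode never changes a bit, so software can update one bit without first reading the register.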
Sign extension
SEXT(x) means x is sign-extended to the required size.
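As a minimal sketch of what SEXT(x) denotes, the following function sign-extends a value of a given width to a wider size. The function name sext and its parameters are illustrative assumptions; the manual itself only uses the SEXT(x) notation and takes the sizes from context.

```python
def sext(x, from_bits, to_bits=64):
    """Sign-extend the low from_bits bits of x to to_bits bits.

    The result is returned as an unsigned integer of to_bits bits.
    """
    x &= (1 << from_bits) - 1          # keep only the source field
    sign = 1 << (from_bits - 1)        # position of the source sign bit
    signed = (x ^ sign) - sign         # negative if the sign bit was set
    return signed & ((1 << to_bits) - 1)


# A byte with its top bit set extends with ones; otherwise with zeros.
assert sext(0x80, 8) == 0xFFFFFFFFFFFFFF80
assert sext(0x7F, 8) == 0x7F
```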
Addresses
Unless otherwise noted, all addresses and offsets are hexadecimal.
Aligned and Unaligned
The terms aligned and naturally aligned are interchangeable and refer to data objects that are powers of two in size. An aligned datum of size 2^n is stored in memory at a byte address that is a multiple of 2^n; that is, one that has n low-order zeros. For example, an aligned 64-byte stack frame has a memory address that is a multiple of 64.
A datum of size 2^n is unaligned if it is stored in a byte address that is not a multiple of 2^n.
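The alignment rule above reduces to a test of the low-order address bits. The sketch below shows that check; the helper name is_aligned is an assumption made for illustration.

```python
def is_aligned(address, size):
    """Return True if address is naturally aligned for a datum of size bytes.

    size must be a power of two (2**n); the address is aligned when its
    n low-order bits are all zero.
    """
    if size <= 0 or (size & (size - 1)) != 0:
        raise ValueError("size must be a power of two")
    return (address & (size - 1)) == 0


# A 64-byte stack frame at 0x1000 is aligned; at 0x1008 it is not.
assert is_aligned(0x1000, 64)
assert not is_aligned(0x1008, 64)
```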
Bit Notation
Multiple-bit fields can include contiguous and noncontiguous bits contained in square brackets ([]). Multiple contiguous bits are indicated by a pair of numbers separated by a colon [:]. For example, [9:7,5,2:0] specifies bits 9, 8, 7, 5, 2, 1, and 0. Similarly, single bits are frequently indicated with square brackets. For example, [27] specifies bit 27. See also Field Notation.
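To make the bit notation concrete, the following sketch expands a specification such as 9:7,5,2:0 into the individual bit numbers it names. The function expand_bits is a hypothetical helper written for this illustration only.

```python
def expand_bits(spec):
    """Expand a bit specification like '9:7,5,2:0' into a list of bit numbers."""
    bits = []
    for part in spec.split(","):
        part = part.strip()
        if ":" in part:
            # A colon-separated pair names a contiguous, inclusive run of bits.
            high, low = (int(n) for n in part.split(":"))
            bits.extend(range(high, low - 1, -1))
        else:
            bits.append(int(part))
    return bits


assert expand_bits("9:7,5,2:0") == [9, 8, 7, 5, 2, 1, 0]
assert expand_bits("27") == [27]
```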
Caution
Cautions indicate potential damage to equipment or loss of data.
Data Units
The following data unit terminology is used throughout this manual.
Term      Words  Bytes  Bits  Other
Byte      ½      1      8     -
Word      1      2      16    -
Longword  2      4      32    Dword
Quadword  4      8      64    2 longwords
Do Not Care (X)
A capital X represents any valid value.
External
Unless otherwise stated, external means not contained in the chip.
Field Notation
The names of single-bit and multiple-bit fields can be used rather than the actual bit numbers (see Bit Notation). When the field name is used, it is contained in square brackets ([]). For example, RegisterName[LowByte] specifies RegisterName[7:0].
Note
Notes emphasize particularly important information.
Numbering
All numbers are decimal or hexadecimal unless otherwise indicated. The prefix 0x indicates a hexadecimal number. For example, 19 is decimal, but 0x19 and 0x19A are hexadecimal (also see Addresses). Otherwise, the base is indicated by a subscript; for example, 100₂ is a binary number.
Ranges and Extents
Ranges are specified by a pair of numbers separated by two periods (..) and are inclusive. For example, a range of integers 0..4 includes the integers 0, 1, 2, 3, and 4.
Extents are specified by a pair of numbers in square brackets ([]) separated by a colon (:) and are inclusive. Bit fields are often specified as extents. For example, bits [7:3] specifies bits 7, 6, 5, 4, and 3.
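Because both ranges and extents are inclusive at both ends, a field [high:low] is high - low + 1 bits wide. The sketch below shows one way to model both conventions; the names inclusive_range and extract are assumptions, not notation from this manual.

```python
def inclusive_range(first, last):
    """Return the integers in the range first..last, both ends included."""
    return list(range(first, last + 1))


def extract(value, high, low):
    """Return bits [high:low] of value, inclusive, right-justified."""
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


assert inclusive_range(0, 4) == [0, 1, 2, 3, 4]
# Bits [7:3] of 0xA5 (binary 1010 0101) are 10100, which is 0x14.
assert extract(0xA5, 7, 3) == 0x14
```

With these helpers, a field-name reference such as RegisterName[LowByte] corresponds to extract(value, 7, 0).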
Register Figures
The gray areas in register figures indicate reserved or unused bits and fields.
Bit ranges that are coupled with the field name specify the bits of the named field that are included in the register. The bit range may, but need not necessarily, correspond to the bit extent in the register. See the explanation above Table 5–1 for more information.
Signal Names
The following examples describe signal-name conventions used in this document.
AlphaSignal[n:n] Boldface, mixed-case type denotes signal names that are assigned internal and external to the 21264/EV67 (that is, the signal traverses a chip interface pin).
AlphaSignal_x[n:n] When a signal has high and low assertion states, a lowercase italic x represents the assertion states. For example, SignalName_x[3:0] represents SignalName_H[3:0] and SignalName_L[3:0].
UNDEFINED
Operations specified as UNDEFINED may vary from moment to moment, implementation to implementation, and instruction to instruction within implementations. The operation may vary in effect from nothing to stopping system operation.
UNDEFINED operations may halt the processor or cause it to lose information. However, UNDEFINED operations must not cause the processor to hang, that is, reach an unhalted state from which there is no transition to a normal state in which the machine executes instructions.
UNPREDICTABLE
UNPREDICTABLE results or occurrences do not disrupt the basic operation of the processor; it continues to execute instructions in its normal manner. Further:
Results or occurrences specified as UNPREDICTABLE may vary from moment to moment, implementation to implementation, and instruction to instruction within implementations. Software can never depend on results specified as UNPREDICTABLE.
An UNPREDICTABLE result may acquire an arbitrary value subject to a few constraints. Such a result may be an arbitrary function of the input operands or of any state information that is accessible to the process in its current access mode. UNPREDICTABLE results may be unchanged from their previous values.
Operations that produce UNPREDICTABLE results may also produce exceptions.
An occurrence specified as UNPREDICTABLE may happen or not based on an arbitrary choice function. The choice function is subject to the same constraints as are UNPREDICTABLE results and, in particular, must not constitute a security hole.
Specifically, UNPREDICTABLE results must not depend upon, or be a function of, the contents of memory locations or registers that are inaccessible to the current process in the current access mode.
Also, operations that may produce UNPREDICTABLE results must not:
– Write or modify the contents of memory locations or registers to which the current process in the current access mode does not have access, or
– Halt or hang the system or any of its components.
For example, a security hole would exist if some UNPREDICTABLE result depended on the value of a register in another process, on the contents of processor temporary registers left behind by some previously running process, or on a sequence of actions of different processes.
X
Do not care. A capital X represents any valid value.
1

Introduction

This chapter provides a brief introduction to the Alpha architecture, Compaq’s RISC (reduced instruction set computing) architecture designed for high performance. The chapter then summarizes the specific features of the Alpha 21264/EV67 microprocessor (hereafter called the 21264/EV67) that implements the Alpha architecture. Appendix A provides a list of Alpha instructions.
The companion volume to this manual, the Alpha Architecture Handbook, Version 4, contains the instruction set architecture. Also available is the Alpha Architecture Reference Manual, Third Edition, which contains the complete architecture information.

1.1 The Architecture

The Alpha architecture is a 64-bit load and store RISC architecture designed with particular emphasis on speed, multiple instruction issue, multiple processors, and software migration from many operating systems.
All registers are 64 bits long and all operations are performed between 64-bit registers. All instructions are 32 bits long. Memory operations are either load or store operations. All data manipulation is done between registers.

The Alpha architecture supports the following data types:
8-, 16-, 32-, and 64-bit integers
IEEE 32-bit and 64-bit floating-point formats
VAX architecture 32-bit and 64-bit floating-point formats
In the Alpha architecture, instructions interact with each other only by one instruction writing to a register or memory location and another instruction reading from that register or memory location. This use of resources makes it easy to build implementations that issue multiple instructions every CPU cycle.
The 21264/EV67 uses a set of subroutines, called privileged architecture library code (PALcode), that is specific to a particular Alpha operating system implementation and hardware platform. These subroutines provide operating system primitives for context switching, interrupts, exceptions, and memory management. These subroutines can be invoked by hardware or CALL_PAL instructions. CALL_PAL instructions use the function field of the instruction to vector to a specified subroutine. PALcode is written in standard machine code with some implementation-specific extensions to provide direct access to low-level hardware functions. PALcode supports optimizations for multiple operating systems, flexible memory-management implementations, and multi-instruction atomic sequences.
The Alpha architecture performs byte shifting and masking with normal 64-bit, register-to-register instructions. The 21264/EV67 performs single-byte and single-word load and store instructions.
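The idea of manipulating bytes purely with 64-bit register-to-register shifts and masks can be sketched as follows. This is an illustration of the general technique only: the helper names extract_byte and insert_byte are hypothetical and are not Alpha mnemonics, and the code does not model the exact operand encoding of any Alpha instruction.

```python
MASK64 = (1 << 64) - 1  # all values are treated as 64-bit register contents


def extract_byte(reg, byte_index):
    """Return byte byte_index (0 = least significant) of reg, zero-extended."""
    shift = 8 * (byte_index & 7)
    return (reg >> shift) & 0xFF


def insert_byte(reg, byte_index, byte):
    """Return reg with byte byte_index replaced by the low 8 bits of byte."""
    shift = 8 * (byte_index & 7)
    cleared = reg & ~(0xFF << shift) & MASK64   # mask out the old byte
    return cleared | ((byte & 0xFF) << shift)   # shift the new byte into place


quad = 0x1122334455667788
assert extract_byte(quad, 0) == 0x88
assert insert_byte(quad, 1, 0xAB) == 0x112233445566AB88
```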

1.1.1 Addressing

The basic addressable unit in the Alpha architecture is the 8-bit byte. The 21264/EV67 supports a 48-bit or 43-bit virtual address (selectable under IPR control).
Virtual addresses as seen by the program are translated into physical memory addresses by the memory-management mechanism. The 21264/EV67 supports a 44-bit physical address.
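With byte addressing, the address widths above translate directly into address-space sizes. The short calculation below is a sketch of that arithmetic; the constant TB (2^40 bytes) extends the binary-multiple convention of this manual and is an assumption of the example.

```python
TB = 1 << 40  # 2**40 bytes, following the manual's binary-multiple convention


def address_space_bytes(address_bits):
    """Number of distinct byte addresses reachable with address_bits bits."""
    return 1 << address_bits


assert address_space_bytes(43) == 8 * TB     # 43-bit virtual address
assert address_space_bytes(48) == 256 * TB   # 48-bit virtual address
assert address_space_bytes(44) == 16 * TB    # 44-bit physical address
```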

1.1.2 Integer Data Types

Alpha architecture supports the four integer data types listed in Table 1–1.
Table 1–1 Integer Data Types
Data Type Description
Byte A byte is 8 contiguous bits that start at an addressable byte boundary. A byte is an 8-bit value.
Word A word is 2 contiguous bytes that start at an arbitrary byte boundary. A word is a 16-bit value.
Longword A longword is 4 contiguous bytes that start at an arbitrary byte boundary. A longword is a 32-bit value.
Quadword A quadword is 8 contiguous bytes that start at an arbitrary byte boundary.
Note: Alpha implementations may impose a significant performance penalty when accessing operands that are not naturally aligned. Refer to the Alpha Architecture Handbook, Version 4 for details.

1.1.3 Floating-Point Data Types

The 21264/EV67 supports the following floating-point data types:
Longword integer format in floating-point unit
Quadword integer format in floating-point unit
IEEE floating-point formats
– S_floating
– T_floating
VAX floating-point formats
– F_floating
– G_floating
– D_floating (limited support)

1.2 21264/EV67 Microprocessor Features
The 21264/EV67 microprocessor is a superscalar pipelined processor. It is packaged in a 587-pin PGA carrier and has removable application-specific heat sinks. A number of configuration options allow its use in a range of system designs, from extremely simple uniprocessor systems with minimum component count to high-performance multiprocessor systems with very high cache and memory bandwidth.
The 21264/EV67 can issue four Alpha instructions in a single cycle, thereby minimizing the average cycles per instruction (CPI). A number of low-latency and/or high-throughput features in the instruction issue unit and the onchip components of the memory subsystem further reduce the average CPI.
The 21264/EV67 and associated PALcode implement IEEE single-precision and double-precision, VAX F_floating and G_floating data types, and support longword (32-bit) and quadword (64-bit) integers. Byte (8-bit) and word (16-bit) support is provided by byte-manipulation instructions. Limited hardware support is provided for the VAX D_floating data type.
Other 21264/EV67 features include:
The ability to issue up to four instructions during each CPU clock cycle.
A peak instruction execution rate of four times the CPU clock frequency.
An onchip, demand-paged memory-management unit with translation buffer, which, when used with PALcode, can implement a variety of page table structures and translation algorithms. The unit consists of a 128-entry, fully-associative data translation buffer (DTB) and a 128-entry, fully-associative instruction translation buffer (ITB), with each entry able to map a single 8KB page or a group of 8, 64, or 512 8KB pages. The allocation scheme for the ITB and DTB is round-robin. The size of each translation buffer entry’s group is specified by hint bits stored in the entry. The DTB and ITB implement 8-bit address space numbers (ASN), MAX_ASN=255.
Two onchip, high-throughput pipelined floating-point units, capable of executing both VAX and IEEE floating-point data types.
An onchip, 64KB virtually-addressed instruction cache with 8-bit ASNs (MAX_ASN=255).
An onchip, virtually-indexed, physically-tagged dual-read-ported, 64KB data cache.
Supports a 48-bit or 43-bit virtual address (program selectable).
Supports a 44-bit physical address.
An onchip I/O write buffer with four 64-byte entries for I/O write transactions.
An onchip, 8-entry victim data buffer.
An onchip, 32-entry load queue.
An onchip, 32-entry store queue.
An onchip, 8-entry miss address file for cache fill requests and I/O read transactions.
An onchip, 8-entry probe queue, holding pending system port probe commands.
An onchip, duplicate tag array used to maintain level 2 cache coherency.
A 64-bit data bus with onchip parity and error correction code (ECC) support.
Support for an external second-level (Bcache) cache. The size and some timing
parameters of the Bcache are programmable.
An internal clock generator providing a high-speed clock used by the 21264/EV67,
and two clocks for use by the CPU module.
Onchip performance counters to measure and analyze CPU and system performance.
Chip and module level test support, including an instruction cache test interface to
support chip and module level testing.
A 2.0-V external interface.
Refer to Chapter 9 for 21264/EV67 dc and ac electrical characteristics. Refer to the Alpha Architecture Handbook, Version 4, Appendix E, for waivers and any other implementation-dependent information.
2

Internal Architecture

This chapter provides both an overview of the 21264/EV67 microarchitecture and a system designer’s view of the 21264/EV67 implementation of the Alpha architecture. The combination of the 21264/EV67 microarchitecture and privileged architecture library code (PALcode) defines the chip’s implementation of the Alpha architecture. If a certain piece of hardware seems to be “architecturally incomplete,” the missing functionality is implemented in PALcode. Chapter 6 provides more information on PALcode.
This chapter describes the major functional hardware units and is not intended to be a detailed hardware description of the chip. It is organized as follows:
21264/EV67 microarchitecture
Pipeline organization
Instruction issue and retire rules
Load instructions to R31/F31 (software-directed instruction prefetch)
Special cases of Alpha instruction execution
Memory and I/O address space
Miss address file (MAF) and load-merging rules
Instruction ordering
Replay traps
I/O write buffer and the WMB instruction
Performance measurement support
Floating-point control register
AMASK and IMPLVER instruction values
Design examples

2.1 21264/EV67 Microarchitecture

The 21264/EV67 microprocessor is a high-performance third-generation implementation of the Compaq Alpha architecture. The 21264/EV67 consists of the following sections, as shown in Figure 2–1:
Instruction fetch, issue, and retire unit (Ibox)
Integer execution unit (Ebox)
Floating-point execution unit (Fbox)
Onchip caches (Icache and Dcache)
Memory reference unit (Mbox)
External cache and system interface unit (Cbox)
Pipeline operation sequence

2.1.1 Instruction Fetch, Issue, and Retire Unit

The instruction fetch, issue, and retire unit (Ibox) consists of the following subsections:
Virtual program counter logic
Branch predictor
Instruction-stream translation buffer (ITB)
Instruction fetch logic
Register rename maps
Integer and floating-point issue queues
Exception and interrupt logic
Retire logic
2.1.1.1 Virtual Program Counter Logic
The virtual program counter (VPC) logic maintains the virtual addresses for instructions that are in flight. There can be up to 80 instructions, in 20 successive fetch slots, in flight between the register rename mappers and the end of the pipeline. The VPC logic contains a 20-entry table to store these fetched VPC addresses.
Figure 2–1 21264/EV67 Block Diagram
[Block diagram: instruction cache; Ibox (fetch unit, VPC queue, branch predictor, ITB, predecode, decode and rename registers, 20-entry integer issue queue, 15-entry FP issue queue, retire unit); Ebox (integer units U0 and U1, address ALUs L0 and L1, two 80-entry integer register files); Fbox (FP ADD/DIV/SQRT, FP MUL, 72-entry FP register file); Mbox (load queue, store queue, miss address file, dual-ported 128-entry DTB); Cbox (victim buffer, IOWB, duplicate tag store, probe queue, arbiter); dual-ported data cache.]
2.1.1.2 Branch Predictor
The branch predictor is composed of three units: the local, global, and choice predictors. Figure 2–2 shows how the branch predictor generates the predicted branch address.
Figure 2–2 Branch Predictor
[Diagram: local predictor, global predictor, and choice predictor producing the predicted branch address.]
Local Predictor
The local predictor uses a 2-level table that holds the history of individual branches. The 2-level table design approaches the prediction accuracy of a larger single-level table while requiring fewer total bits of storage. Figure 2–3 shows how the local predictor generates a prediction. Bits [11:2] of the VPC of the current branch are used as the index to a 1K entry table in which each entry is a 10-bit value. This 10-bit value is used as the index to a 1K entry table of 3-bit saturating counters. The value of the saturating counter determines the prediction, taken/not-taken, of the current branch.
Figure 2–3 Local Predictor
[Diagram: VPC[11:2] indexes the local history table (1K x 10); the 10-bit history value indexes the local predictor (1K x 3), which produces the local branch prediction.]
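The two-level lookup described above can be sketched in Python. This is a minimal illustration, not the chip's actual logic: the class name LocalPredictor, the update rule (increment on taken, decrement on not-taken), and the use of the counter's most-significant bit as the prediction are all assumptions; the source specifies only the table sizes and the VPC[11:2] index.

```python
class LocalPredictor:
    """Two-level local predictor sketch: 1K x 10 history, 1K x 3 counters."""

    def __init__(self):
        self.history = [0] * 1024   # 10-bit local history per entry
        self.counters = [0] * 1024  # 3-bit saturating counters, range 0..7

    @staticmethod
    def _index(vpc):
        # VPC[11:2]: drop the two low bits, keep the next ten bits.
        return (vpc >> 2) & 0x3FF

    def predict(self, vpc):
        hist = self.history[self._index(vpc)]
        # Assumption: the counter's most-significant bit means "taken".
        return self.counters[hist] >= 4

    def update(self, vpc, taken):
        i = self._index(vpc)
        hist = self.history[i]
        # Assumed training rule: saturating increment/decrement.
        if taken:
            self.counters[hist] = min(7, self.counters[hist] + 1)
        else:
            self.counters[hist] = max(0, self.counters[hist] - 1)
        # Shift the branch outcome into the 10-bit history.
        self.history[i] = ((hist << 1) | int(taken)) & 0x3FF
```

With this sketch, a branch that is always taken is first predicted not-taken and flips to taken once its history pattern has trained the corresponding counter.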
Global Predictor
The global predictor is indexed by a global history of all recent branches. The global predictor correlates the local history of the current branch with all recent branches. Figure 2–4 shows how the global predictor generates a prediction. The global path history consists of the taken/not-taken state of the 12 most-recent branches. These 12 states are used to form an index into a 4K entry table of 2-bit saturating counters. The value of the saturating counter determines the prediction, taken/not-taken, of the current branch.
Figure 2–4 Global Predictor
[Diagram: the 12-bit global path history indexes the global predictor (4K x 2), which produces the global branch prediction.]
Choice Predictor
The choice predictor monitors the history of the local and global predictors and chooses the best of the two predictors for a particular branch. Figure 2–5 shows how the choice predictor generates its choice of the result of the local or global prediction. The 12-bit global path history (see Figure 2–4) is used to index a 4K entry table of 2-bit saturating counters. The value of the saturating counter determines the choice between the outputs of the local and global predictors.
Figure 2–5 Choice Predictor
[Diagram: the 12-bit global path history indexes the choice predictor (4K x 2), which produces the choice prediction.]
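The global and choice predictors share the same 12-bit path-history index, which the following Python sketch illustrates. The class name TournamentChooser, the counter polarity (a high choice counter selects the global predictor), and the training rule (train the choice counter only when the two predictors disagree) are assumptions made for illustration; the source gives only the table sizes and the index.

```python
class TournamentChooser:
    """Global and choice predictor sketch, both indexed by 12-bit path history."""

    def __init__(self):
        self.path_history = 0         # outcomes of the 12 most recent branches
        self.global_ctr = [0] * 4096  # 2-bit saturating counters, range 0..3
        self.choice_ctr = [0] * 4096  # 2-bit saturating counters, range 0..3

    def predict(self, local_prediction):
        h = self.path_history
        global_prediction = self.global_ctr[h] >= 2
        # Assumption: a high choice counter selects the global predictor.
        use_global = self.choice_ctr[h] >= 2
        return global_prediction if use_global else local_prediction

    def update(self, local_prediction, taken):
        h = self.path_history
        global_prediction = self.global_ctr[h] >= 2
        # Assumed rule: train the choice counter only on disagreement,
        # moving toward whichever predictor was right.
        if global_prediction != local_prediction:
            if global_prediction == taken:
                self.choice_ctr[h] = min(3, self.choice_ctr[h] + 1)
            else:
                self.choice_ctr[h] = max(0, self.choice_ctr[h] - 1)
        if taken:
            self.global_ctr[h] = min(3, self.global_ctr[h] + 1)
        else:
            self.global_ctr[h] = max(0, self.global_ctr[h] - 1)
        # Shift the outcome into the 12-bit global path history.
        self.path_history = ((h << 1) | int(taken)) & 0xFFF
```

The local prediction is passed in as an argument so that the sketch stays independent of any particular local predictor model.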
2.1.1.3 Instruction-Stream Translation Buffer
The Ibox includes a 128-entry, fully-associative instruction-stream translation buffer (ITB) that is used to store recently used instruction-stream (Istream) address translations and page protection information. Each of the entries in the ITB can map 1, 8, 64, or 512 contiguous 8KB pages. The allocation scheme is round-robin.
The ITB supports an 8-bit ASN and contains an ASM bit. The Icache is virtually addressed and contains the access-check information, so the ITB is accessed only for Istream references that miss in the Icache.
Istream transactions to I/O address space are UNDEFINED.
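As a rough worked example of the mapping sizes above, the Python sketch below computes how much address space one entry, and the whole 128-entry buffer, can cover. The 2-bit encoding in which hint value gh selects a group of 8**gh pages is an assumption for illustration; the text states only the group sizes (1, 8, 64, or 512 pages). The function names are hypothetical.

```python
PAGE_SIZE = 8 * 1024  # 8KB base page


def mapped_bytes(gh):
    """Bytes mapped by one translation buffer entry for hint value gh (0..3)."""
    if gh not in (0, 1, 2, 3):
        raise ValueError("hint value must be in the range 0..3")
    pages = 8 ** gh  # 1, 8, 64, or 512 contiguous pages
    return pages * PAGE_SIZE


def buffer_reach(entries=128, gh=0):
    """Total bytes covered if every entry uses the same hint value."""
    return entries * mapped_bytes(gh)
```

Under these assumptions, 128 entries of single pages cover 1MB, while 128 entries of 512-page groups cover 512MB.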
2.1.1.4 Instruction Fetch Logic
The instruction prefetcher (predecode) reads an octaword, containing up to four naturally aligned instructions per cycle, from the Icache. Branch prediction and line prediction bits accompany the four instructions. The branch prediction scheme operates most efficiently when only one branch instruction is contained among the four fetched instructions. The line prediction scheme attempts to predict the Icache line that the branch predictor will generate, and is described in Section 2.2.
An entry from the subroutine return prediction stack, together with set prediction bits for use by the Icache stream controller, are fetched along with the octaword. The Icache stream controller generates fetch requests for additional Icache lines and stores the Istream data in the Icache. There is no separate buffer to hold Istream requests.
2.1.1.5 Register Rename Maps
The instruction prefetcher forwards instructions to the integer and floating-point register rename maps. The rename maps perform the two functions listed here:
Eliminate register write-after-read (WAR) and write-after-write (WAW) data dependencies while preserving true read-after-write (RAW) data dependencies, in order to allow instructions to be dynamically rescheduled.
Provide a means of speculatively executing instructions before the control flow previous to those instructions is resolved. Both exceptions and branch mispredictions represent deviations from the control flow predicted by the instruction prefetcher.
The map logic translates each instruction’s operand register specifiers from the virtual register numbers in the instruction to the physical register numbers that hold the corresponding architecturally-correct values. The map logic also renames each instruction’s destination register specifier from the virtual number in the instruction to a physical register number chosen from a list of free physical registers, and updates the register maps.
The map logic can process four instructions per cycle. It does not return the physical register, which holds the old value of an instruction’s virtual destination register, to the free list until the instruction has been retired, indicating that the control flow up to that instruction has been resolved.
If a branch mispredict or exception occurs, the map logic backs up the contents of the integer and floating-point register rename maps to the state associated with the instruction that triggered the condition, and the prefetcher restarts at the appropriate VPC. At most, 20 valid fetch slots containing up to 80 instructions can be in flight between the register maps and the end of the machine’s pipeline, where the control flow is finally resolved. The map logic is capable of backing up the contents of the maps to the state associated with any of these 80 instructions in a single cycle.
The register rename logic places instructions into an integer or floating-point issue queue, from which they are later issued to functional units for execution.
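The renaming behavior described in this section can be sketched in Python as follows. This is a simplified model under stated assumptions: the class RenameMap and its methods are hypothetical, instruction numbers are assumed to increase monotonically without wrapping, and the rollback walks the in-flight list serially, whereas the text says the hardware restores the maps in a single cycle.

```python
from collections import deque


class RenameMap:
    """Register rename sketch with a free list and rollback."""

    def __init__(self, num_arch=31, num_phys=80):
        self.map = list(range(num_arch))             # virtual -> physical
        self.free = deque(range(num_arch, num_phys))  # free physical registers
        self.in_flight = []  # (inum, dest, new_phys, old_phys), program order

    def rename(self, inum, dest, srcs):
        # Sources read the current mapping, preserving RAW dependencies.
        phys_srcs = [self.map[s] for s in srcs]
        # A fresh destination register removes WAR and WAW dependencies.
        old_phys = self.map[dest]
        new_phys = self.free.popleft()
        self.map[dest] = new_phys
        self.in_flight.append((inum, dest, new_phys, old_phys))
        return phys_srcs, new_phys

    def retire_oldest(self):
        # The old physical register is freed only when the instruction retires.
        inum, _dest, _new_phys, old_phys = self.in_flight.pop(0)
        self.free.append(old_phys)
        return inum

    def squash_younger_than(self, inum):
        # Undo every instruction younger than inum, youngest first.
        while self.in_flight and self.in_flight[-1][0] > inum:
            _, dest, new_phys, old_phys = self.in_flight.pop()
            self.map[dest] = old_phys
            self.free.appendleft(new_phys)
```

For example, renaming two successive writes to the same virtual register gives each write its own physical register, and squashing the younger one restores the mapping produced by the older one.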
2.1.1.6 Integer Issue Queue
The 20-entry integer issue queue (IQ), associated with the integer execution units (Ebox), issues the following types of instructions at a maximum rate of four per cycle:
Integer operate
Integer conditional branch
Unconditional branch – both displacement and memory format
Integer and floating-point load and store
PAL-reserved instructions: HW_MTPR, HW_MFPR, HW_LD, HW_ST,
HW_RET
Integer-to-floating-point (ITOFx) and floating-point-to-integer (FTOIx)
Each queue entry asserts four request signals, one for each of the Ebox subclusters. A queue entry asserts a request when it contains an instruction that can be executed by the subcluster, if the instruction’s operand register values are available within the subcluster.
There are two arbiters, one for the upper subclusters and one for the lower subclusters. (Subclusters are described in Section 2.1.2.) Each arbiter picks two of the possible 20 requesters for service each cycle. A given instruction only requests upper subclusters or lower subclusters, but because many instructions can only be executed in one type or another this is not too limiting.
For example, load and store instructions can only go to lower subclusters and shift instructions can only go to upper subclusters. Other instructions, such as addition and logic operations, can execute in either upper or lower subclusters and are statically assigned before being placed in the IQ.
The IQ arbiters choose between simultaneous requesters of a subcluster based on the age of the request: older requests are given priority over newer requests. If a given instruction requests both lower subclusters, and no older instruction requests a lower subcluster, then the arbiter assigns subcluster L0 to the instruction. If a given instruction requests both upper subclusters, and no older instruction requests an upper subcluster, then the arbiter assigns subcluster U1 to the instruction. This asymmetry between the upper and lower subcluster arbiters is a circuit implementation optimization with negligible overall performance effect.
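The age-based arbitration described above can be illustrated with a small Python sketch. The function arbitrate and its data layout are hypothetical, and the handling of cases the text does not cover (for example, what happens when the oldest requester takes the preferred subcluster) is an assumption; only the oldest-first priority and the L0/U1 preference come from the text.

```python
def arbitrate(requests, preferred, other):
    """Grant up to two subclusters to the oldest requesters.

    requests:  list of (inum, allowed) pairs, where a smaller inum means an
               older instruction and allowed is the set of subclusters the
               instruction can use.
    preferred: subcluster assigned first when an instruction requests both
               ("L0" for the lower arbiter, "U1" for the upper arbiter).
    other:     the remaining subcluster of the pair.
    Returns a dict mapping subcluster name to the granted inum.
    """
    grants = {}
    for inum, allowed in sorted(requests, key=lambda r: r[0]):  # oldest first
        for sub in (preferred, other):
            if sub in allowed and sub not in grants:
                grants[sub] = inum
                break
        if len(grants) == 2:
            break
    return grants
```

A call such as arbitrate(reqs, "L0", "L1") models the lower arbiter and arbitrate(reqs, "U1", "U0") the upper arbiter.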
2.1.1.7 Floating-Point Issue Queue
The 15-entry floating-point issue queue (FQ) associated with the Fbox issues the following instruction types:
Floating-point operates
Floating-point conditional branches
Floating-point stores
Floating-point register to integer register transfers (FTOIx)
Each queue entry has three request lines: one for the add pipeline, one for the multiply pipeline, and one for the two store pipelines. There are three arbiters, one for each of the add, multiply, and store pipelines. The add and multiply arbiters pick one requester per cycle, while the store pipeline arbiter picks two requesters per cycle, one for each store pipeline.
The FQ arbiters pick between simultaneous requesters of a pipeline based on the age of the request: older requests are given priority over newer requests. Floating-point store instructions and FTOIx instructions in even-numbered queue entries arbitrate for one store port. Floating-point store instructions and FTOIx instructions in odd-numbered queue entries arbitrate for the second store port.
Floating-point store instructions and FTOIx instructions are queued in both the integer and floating-point queues. They wait in the floating-point queue until their operand register values are available. They subsequently request service from the store arbiter. Upon being issued from the floating-point queue, they signal the corresponding entry in the integer queue to request service. Upon being issued from the integer queue, the operation is completed.
2.1.1.8 Exception and Interrupt Logic
There are two types of exceptions: faults and synchronous traps. Arithmetic exceptions are precise and are reported as synchronous traps.
The four sources of interrupts are listed as follows:
Level-sensitive hardware interrupts sourced by the IRQ_H[5:0] pins
Edge-sensitive hardware interrupts generated by the serial line receive pin, performance counter overflows, and hardware corrected read errors
Software interrupts sourced by the software interrupt request (SIRR) register
Asynchronous system traps (ASTs)
Interrupt sources can be individually masked. In addition, AST interrupts are qualified by the current processor mode.
2.1.1.9 Retire Logic
The Ibox fetches instructions in program order, executes them out of order, and then retires them in order. The Ibox retire logic maintains the architectural state of the machine by retiring an instruction only if all previous instructions have executed without generating exceptions or branch mispredictions. Retiring an instruction commits the machine to any changes the instruction may have made to the software-visible state. The three software-visible states are listed as follows:
Integer and floating-point registers
Memory
Internal processor registers (including control/status registers and translation buffers)
The retire logic can sustain a maximum retire rate of eight instructions per cycle, and can retire up to as many as 11 instructions in a single cycle.
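In-order retirement can be sketched in Python as a loop over the oldest in-flight instructions. The function retire_cycle, the list-of-dicts representation, and the "done" and "fault" flags are hypothetical names chosen for illustration; the limit of 11 instructions per cycle is the peak figure quoted in the text.

```python
def retire_cycle(in_flight, max_retire=11):
    """Retire instructions from the head of a program-ordered list.

    in_flight:  list of dicts in program order, each with boolean keys
                "done" (finished executing) and "fault" (raised an
                exception or was a mispredicted branch).
    max_retire: peak number of instructions retired in one cycle.
    Returns the number of instructions retired this cycle.
    """
    retired = 0
    while in_flight and retired < max_retire:
        head = in_flight[0]
        # Stop at the first instruction that is unfinished or faulting:
        # nothing younger may commit its changes before it.
        if not head["done"] or head["fault"]:
            break
        in_flight.pop(0)
        retired += 1
    return retired
```

Younger instructions that have already finished stay in the list until every older instruction has retired, which is what keeps the software-visible state consistent.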

2.1.2 Integer Execution Unit

The integer execution unit (Ebox) is a 4-path integer execution unit that is implemented as two functional-unit “clusters” labeled 0 and 1. Each cluster contains a copy of an 80-entry, physical-register file and two “subclusters”, named upper (U) and lower (L). Figure 2–6 shows the integer execution unit. In the figure, iop_wr is the cross-cluster bus for moving integer result values between clusters.
Figure 2–6 Integer Execution Unit—Clusters 0 and 1
[Diagram: cluster 0 (U0, register file, L0) and cluster 1 (U1, register file, L1), connected by the iop_wr buses, with load/store data and eff_VA paths on each cluster.]
Most instructions have 1-cycle latency for consumers that execute within the same cluster. Also, there is another 1-cycle delay associated with producing a value in one cluster and consuming the value in the other cluster. The instruction issue queue minimizes the performance effect of this cross-cluster delay. The Ebox contains the following resources:
Four 64-bit adders that are used to calculate results for integer add instructions
(located in U0, U1, L0, and L1)
The adders in the lower subclusters that are used to generate the effective virtual
address for load and store instructions (located in L0 and L1)
Four logic units
Two barrel shifters and associated byte logic (located in U0 and U1)
Two sets of conditional branch logic (located in U0 and U1)
Two copies of an 80-entry register file
One pipelined multiplier (located in U1) with 7-cycle latency for all integer multiply operations
One fully-pipelined unit (located in U0), with 3-cycle latency, that executes the following instructions:
– CTLZ, CTPOP, CTTZ
– PERR, MINxxx, MAXxxx, UNPKxx, PKxx
The Ebox has 80 register-file entries that contain storage for the values of the 31 Alpha integer registers (the value of R31 is not stored), the values of 8 PALshadow registers, and 41 results written by instructions that have not yet been retired.
Ignoring cross-cluster delay, the two copies of the Ebox register file contain identical values. Each copy of the Ebox register file contains four read ports and six write ports. The four read ports are used to source operands to each of the two subclusters within a cluster. The six write ports are used as follows:
Two write ports are used to write results generated within the cluster.
Two write ports are used to write results generated by the other cluster.
Two write ports are used to write results from load instructions. These two ports are also used for FTOIx instructions.
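As a rough illustration of the Ebox figures above, the Python sketch below checks the register-file budget and models the cross-cluster delay. The helper names are hypothetical, and treating the extra cycle as a simple addition to any base latency is an assumption made for illustration rather than a statement from the specification.

```python
# Register-file budget from the text: 31 integer registers (R31 is not
# stored), 8 PALshadow registers, and 41 results of unretired instructions.
EBOX_ENTRIES = {"architectural": 31, "palshadow": 8, "unretired_results": 41}


def ebox_total():
    """Total number of physical entries in one copy of the register file."""
    return sum(EBOX_ENTRIES.values())


def result_latency(base_latency, producer_cluster, consumer_cluster):
    """Cycles until a consumer can use a result (hypothetical helper).

    Assumption: one extra cycle is added whenever the producer and the
    consumer execute in different clusters.
    """
    cross_cluster_delay = 1 if producer_cluster != consumer_cluster else 0
    return base_latency + cross_cluster_delay
```

Under these assumptions a 1-cycle operation is visible after one cycle within its own cluster and after two cycles in the other cluster.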

2.1.3 Floating-Point Execution Unit

The floating-point execution unit (Fbox) has two paths. The Fbox executes both VAX and IEEE floating-point instructions. It supports IEEE S_floating-point and T_floating-point data types and all rounding modes. It also supports VAX F_floating-point and G_floating-point data types, and provides limited support for D_floating-point format.
The basic structure of the floating-point execution unit is shown in Figure 2–7.
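Figure 2–7 Floating-Point Execution Units
[Diagram: the floating-point register file feeding the FP multiply pipeline and the FP add pipeline with its divide and square root units.]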
The Fbox contains the following resources:
72-entry physical register file
Fully-pipelined multiplier with 4-cycle latency
Fully-pipelined adder with 4-cycle latency
Nonpipelined divide unit associated with the adder pipeline
Nonpipelined square root unit associated with the adder pipeline
The 72 Fbox register file entries contain storage for the values of the 31 Alpha floating-point registers (F31 is not stored) and 41 values written by instructions that have not been retired.
The Fbox register file contains six read ports and four write ports. Four read ports are used to source operands to the add and multiply pipelines, and two read ports are used to source data for store instructions. Two write ports are used to write results generated by the add and multiply pipelines, and two write ports are used to write results from floating-point load instructions.

2.1.4 External Cache and System Interface Unit

The interface for the system and external cache (Cbox) controls the Bcache and system ports. It contains the following structures:
Victim address file (VAF)
Victim data file (VDF)
I/O write buffer (IOWB)
Probe queue (PQ)
Duplicate Dcache tag (DTAG)
2.1.4.1 Victim Address File and Victim Data File
The victim address file (VAF) and victim data file (VDF) together form an 8-entry victim buffer used for holding:
Dcache blocks to be written to the Bcache
Istream cache blocks from memory to be written to the Bcache
Bcache blocks to be written to memory
Cache blocks sent to the system in response to probe commands
2.1.4.2 I/O Write Buffer
The I/O write buffer (IOWB) consists of four 64-byte entries and associated address and control logic used for buffering I/O write data between the store queue and the system port.
2.1.4.3 Probe Queue
The probe queue (PQ) is an 8-entry queue that holds pending system port cache probe commands and addresses.
2.1.4.4 Duplicate Dcache Tag Array
The duplicate Dcache tag (DTAG) array holds a duplicate copy of the Dcache tags and is used by the Cbox when processing Dcache fills, Icache fills, and system port probes.

2.1.5 Onchip Caches

The 21264/EV67 contains two onchip primary-level caches.
2.1.5.1 Instruction Cache
The instruction cache (Icache) is a 64KB virtual-addressed, 2-way set-predict cache. Set prediction is used to approximate the performance of a 2-set cache without slowing the cache access time. Each Icache block contains:
16 Alpha instructions (64 bytes)
Virtual tag bits [47:15]
8-bit address space number (ASN) field
1-bit address space match (ASM) bit
1-bit PALcode bit to indicate physical addressing
Valid bit
Data and tag parity bits
Four access-check bits for the following modes: kernel, executive, supervisor, and
user (KESU)
Additional predecoded information to assist with instruction processing and fetch
control
2.1.5.2 Data Cache
The data cache (Dcache) is a 64KB, 2-way set-associative, virtually indexed, physically tagged, write-back, read/write allocate cache with 64-byte blocks. During each cycle the Dcache can perform one of the following transactions:
Two quadword (or shorter) read transactions to arbitrary addresses
Two quadword write transactions to the same aligned octaword
Two non-overlapping less-than-quadword writes to the same aligned quadword
One sequential read and write transaction from and to the same aligned octaword
Each Dcache block contains:
64 data bytes and associated quadword ECC bits
Physical tag bits
Valid, dirty, shared, and modified bits
Tag parity bit calculated across the tag, dirty, shared, and modified bits
One bit to control round-robin set allocation (one bit per two cache blocks)
The Dcache contains two sets, each with 512 rows containing 64-byte blocks per row (that is, 32K bytes of data per set). The 21264/EV67 requires two additional bits of virtual address beyond the bits that specify an 8KB page, in order to specify a Dcache row index. A given virtual address might be found in four unique locations in the Dcache, depending on the virtual-to-physical translation for those two bits. The 21264/EV67 prevents this aliasing by keeping only one of the four possible translated addresses in the cache at any time.
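The index arithmetic behind the four possible locations can be worked through in Python. The bit positions below follow from the sizes given in the text (64-byte blocks, 512 rows per set, 8KB pages); the function names are hypothetical and the code is only a sketch of the address arithmetic, not of the hardware's alias-prevention mechanism.

```python
BLOCK_BITS = 6   # 64-byte blocks  -> address bits [5:0]
ROW_BITS = 9     # 512 rows per set -> address bits [14:6]
PAGE_BITS = 13   # 8KB pages        -> page offset bits [12:0]


def dcache_row(addr):
    """Row index selected by address bits [14:6]."""
    return (addr >> BLOCK_BITS) & ((1 << ROW_BITS) - 1)


def alias_rows(addr):
    """Rows a block could occupy as the index bits above the page offset vary."""
    extra_bits = BLOCK_BITS + ROW_BITS - PAGE_BITS  # 2 bits: [14:13]
    page_offset = addr & ((1 << PAGE_BITS) - 1)
    return [dcache_row(page_offset | (hi << PAGE_BITS))
            for hi in range(1 << extra_bits)]
```

Because bits [14:13] lie above the 8KB page offset, they can differ between a virtual address and its physical translation, which yields the four candidate rows per set that alias_rows returns.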

2.1.6 Memory Reference Unit

The memory reference unit (Mbox) controls the Dcache and ensures architecturally correct behavior for load and store instructions. The Mbox contains the following structures:
Load queue (LQ)
Store queue (SQ)
Miss address file (MAF)
Dstream translation buffer (DTB)
2.1.6.1 Load Queue
The load queue (LQ) is a reorder buffer for load instructions. It contains 32 entries and maintains the state associated with load instructions that have been issued to the Mbox, but for which results have not been delivered to the processor and the instructions retired. The Mbox assigns load instructions to LQ slots based on the order in which they were fetched from the Icache, then places them into the LQ after they are issued by the IQ. The LQ helps ensure correct Alpha memory reference behavior.
2.1.6.2 Store Queue
The store queue (SQ) is a reorder buffer and graduation unit for store instructions. It contains 32 entries and maintains the state associated with store instructions that have been issued to the Mbox, but for which data has not been written to the Dcache and the instruction retired. The Mbox assigns store instructions to SQ slots based on the order in which they were fetched from the Icache and places them into the SQ after they are issued by the IQ. The SQ holds data associated with store instructions issued from the IQ until they are retired, at which point the store can be allowed to update the Dcache. The SQ also helps ensure correct Alpha memory reference behavior.

2.1.6.3 Miss Address File
The 8-entry miss address file (MAF) holds physical addresses associated with pending Icache and Dcache fill requests and pending I/O space read transactions.
2.1.6.4 Dstream Translation Buffer
The Mbox includes a 128-entry, fully associative Dstream translation buffer (DTB) used to store Dstream address translations and page protection information. Each of the entries in the DTB can map 1, 8, 64, or 512 contiguous 8KB pages. The allocation scheme is round-robin. The DTB supports an 8-bit ASN and contains an ASM bit.

2.1.7 SROM Interface

The serial read-only memory (SROM) interface provides the initialization data load path from a system SROM to the Icache. Refer to Chapter 7 for more information.
2.2 Pipeline Organization
The 7-stage pipeline provides an optimized environment for executing Alpha instructions. The pipeline stages (0 to 6) are shown in Figure 2–8 and described in the following paragraphs.
Figure 2–8 Pipeline Organization
[Diagram: pipeline stages 0 through 6. The branch predictor and the 64KB 2-set instruction cache supply four instructions to the integer and floating-point register rename maps, which feed the 20-entry integer issue queue and the 15-entry floating-point issue queue. The queues read the integer and floating-point register files and issue to the ALUs, shifters, multiplier, address ALUs, floating-point add/divide/square root unit, and floating-point multiply unit, followed by the 64KB data cache and the bus interface unit (64-bit system bus, 128-bit cache bus, 44-bit physical address).]
Stage 0 Instruction Fetch
The branch predictor uses a branch history algorithm to predict a branch instruction target address.
Up to four aligned instructions are fetched from the Icache, in program order. The branch prediction tables are also accessed in this cycle. The branch predictor uses tables and a branch history algorithm to predict a branch instruction target address for one branch or memory format JSR instruction per cycle. Therefore, the prefetcher is limited to fetching through one branch per cycle. If there is more than one branch within the fetch line, and the branch predictor predicts that the first branch will not be taken, it will predict through subsequent branches at the rate of one per cycle, until it predicts a taken branch or predicts through the last branch in the fetch line.
The Icache array also contains a line prediction field, the contents of which are applied to the Icache in the next cycle. The purpose of the line predictor is to remove the pipeline bubble which would otherwise be created when the branch predictor predicts a branch to be taken. In effect, the line predictor attempts to predict the Icache line which the branch predictor will generate. On fills, the line predictor value at each fetch line is initialized with the index of the next sequential fetch line, and later retrained by the branch predictor if necessary.
Stage 1 — Instruction Slot
The Ibox maps four instructions per cycle from the 64KB 2-way set-predict Icache. Instructions are mapped in order, executed dynamically, but are retired in order.
In the slot stage, the branch predictor compares the next Icache index that it generates to the index that was generated by the line predictor. If there is a mismatch, the branch
predictor wins—the instructions fetched during that cycle are aborted, and the index predicted by the branch predictor is applied to the Icache during the next cycle. Line mispredictions result in one pipeline bubble.
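The interaction between the line predictor and the branch predictor described above can be sketched in Python. The class LinePredictor, its dictionary storage, and the return convention are hypothetical; wrap-around of the sequential index and the memory format call and jump cases are ignored in this sketch.

```python
class LinePredictor:
    """Line predictor sketch: one predicted next index per fetch line."""

    def __init__(self):
        self.next_line = {}  # fetch-line index -> predicted next index

    def fill(self, index):
        # On a fill, the prediction starts as the next sequential fetch line.
        self.next_line[index] = index + 1

    def fetch(self, index, branch_predictor_index):
        """Return (next index applied to the Icache, bubbles incurred)."""
        predicted = self.next_line[index]
        if predicted == branch_predictor_index:
            return predicted, 0
        # Mismatch: the branch predictor wins, the fetched instructions are
        # aborted (one bubble), and the line predictor is retrained.
        self.next_line[index] = branch_predictor_index
        return branch_predictor_index, 1
```

After one misprediction for a given fetch line, the retrained entry matches the branch predictor on the next fetch of that line, so the bubble is not repeated.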
The line predictor takes precedence over the branch predictor during memory format calls or jumps. If the line predictor was trained with a true (as opposed to predicted) memory format call or jump target, then its contents take precedence over the target hint field associated with these instructions. This allows dynamic calls or jumps to be correctly predicted.
The instruction fetcher produces the full VPC address during the fetch stage of the pipeline. The Icache produces the tags for both Icache sets 0 and 1 each time it is accessed. That enables the fetcher to separate set mispredictions from true Icache misses. If the access was caused by a set misprediction, the instruction fetcher aborts the last two fetched slots and refetches the slot in the next cycle. It also retrains the appropriate set prediction bits.
The instruction data is transferred from the Icache to the integer and floating-point register map hardware during this stage. When the integer instruction is fetched from the Icache and slotted into the IQ, the slot logic determines whether the instruction is for the upper or lower subclusters. The slot logic makes the decision based on the resources needed by the (up to four) integer instructions in the fetch block. Although all four instructions need not be issued simultaneously, distributing their resource usage improves instruction loading across the units. For example, if a fetch block contains two instructions that can be placed in either subcluster followed by two instructions that must execute in the lower subcluster, the slot logic would designate that combination as EELL and slot them as UULL. Slot combinations are described in Section 2.3.2 and Table 2–3.
Stage 2 Map
Instructions are sent from the Icache to the integer and floating-point register maps during the slot stage and register renaming is performed during the map stage. Also, each instruction is assigned a unique 8-bit number, called an inum, which is used to identify the instruction and its program order with respect to other instructions during the time that it is in flight. Instructions are considered to be in flight between the time they are mapped and the time they are retired.
Mapped instructions and their associated inums are placed in the integer and floating-point queues by the end of the map stage.
Stage 3 — Issue
The 20-entry integer issue queue (IQ) issues instructions at the rate of four per cycle. The 15-entry floating-point issue queue (FQ) issues floating-point operate instructions, conditional branch instructions, and store instructions, at the rate of two per cycle. Normally, instructions are deleted from the IQ or FQ two cycles after they are issued. For example, if an instruction is issued in cycle n, it remains in the FQ or IQ in cycle n+1 but does not request service, and is deleted in cycle n+2.
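The normal lifetime of a queue entry around its issue cycle can be written as a small Python function. The function entry_state and its state names are hypothetical labels for illustration; only the n, n+1, n+2 timing comes from the text.

```python
def entry_state(issue_cycle, cycle):
    """State of an IQ or FQ entry in a given cycle (normal case only).

    issue_cycle: cycle n in which the instruction is issued.
    cycle:       cycle being examined.
    """
    if cycle < issue_cycle:
        return "waiting"   # in the queue, may request service
    if cycle == issue_cycle:
        return "issued"    # cycle n
    if cycle == issue_cycle + 1:
        return "held"      # cycle n+1: still in the queue, no request
    return "deleted"       # cycle n+2 onward: entry removed
```

The one-cycle "held" state means that a queue slot is not reusable in the cycle immediately after its instruction issues.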
Stage 4 Register Read
Instructions issued from the issue queues read their operands from the integer and floating-point register files and receive bypass data.
Stage 5 — Execute
The Ebox and Fbox pipelines begin execution.
Stage 6 Dcache Access
Memory reference instructions access the Dcache and data translation buffers. Normally load instructions access the tag and data arrays while store instructions only access the tag arrays. Store data is written to the store queue where it is held until the store instruction is retired. Most integer operate instructions write their register results in this cycle.

2.2.1 Pipeline Aborts

The abort penalty as given is measured from the cycle after the fetch stage of the instruction that triggers the abort to the fetch stage of the new target, ignoring any Ibox pipeline stalls or queuing delay that the triggering instruction might experience.
Table 2–1 lists the timing associated with each common source of pipeline abort.
Table 2–1 Pipeline Abort Delay (GCLK Cycles)
Abort Condition                 Penalty (Cycles)  Comments
Branch misprediction            7                 Integer or floating-point conditional branch misprediction.
JSR misprediction               8                 Memory format JSR or HW_RET.
Mbox order trap                 14                Load-load order or store-load order.
Other Mbox replay traps         13                -
DTB miss                        13                -
ITB miss                        7                 -
Integer arithmetic trap         12                -
Floating-point arithmetic trap  13+latency        Add latency of instruction. See Section 2.3.3 for instruction latencies.
2.3 Instruction Issue Rules
This section defines instruction classes, the functional unit pipelines to which they are issued, and their associated latencies.

2.3.1 Instruction Group Definitions

Table 2–2 lists the instruction class, the pipeline assignments, and the instructions included in the class.
Table 2–2 Instruction Name, Pipeline, and Types
Class Name  Pipeline                  Instruction Type
ild         L0, L1                    All integer load instructions
fld         L0, L1                    All floating-point load instructions
ist         L0, L1                    All integer store instructions
fst         FST0, FST1, L0, L1        All floating-point store instructions
lda         L0, L1, U0, U1            LDA, LDAH
mem_misc    L1                        WH64, ECB, WMB
rpcc        L1                        RPCC
rx          L1                        RS, RC
mxpr        L0, L1 (depends on IPR)   HW_MTPR, HW_MFPR
ibr         U0, U1                    Integer conditional branch instructions
jsr         L0                        BR, BSR, JMP, CALL, RET, COR, HW_RET, CALL_PAL
iadd        L0, U0, L1, U1            Instructions with opcode 10₁₆, except CMPBGE
ilog        L0, U0, L1, U1            AND, BIC, BIS, ORNOT, XOR, EQV, CMPBGE
ishf        U0, U1                    Instructions with opcode 12₁₆
cmov        L0, U0, L1, U1            Integer CMOV (either cluster)
imul        U1                        Integer multiply instructions
imisc       U0                        CTLZ, CTPOP, CTTZ, PERR, MINxxx, MAXxxx, PKxx, UNPKxx
fbr         FA                        Floating-point conditional branch instructions
fadd        FA                        All floating-point operate instructions except multiply, divide, square root, and conditional move instructions
fmul        FM                        Floating-point multiply instruction
fcmov1      FA                        Floating-point CMOV, first half
fcmov2      FA                        Floating-point CMOV, second half
fdiv        FA                        Floating-point divide instruction
fsqrt       FA                        Floating-point square root instruction
nop         None                      TRAP, EXCB, UNOP - LDQ_U R31, 0(Rx)
ftoi        FST0, FST1, L0, L1        FTOIS, FTOIT
itof        L0, L1                    ITOFS, ITOFF, ITOFT
mx_fpcr     FM                        Instructions that move data from the floating-point control register

2.3.2 Ebox Slotting

Instructions that are issued from the IQ, and could execute in either upper or lower Ebox subclusters, are slotted to one pair or the other during the pipeline mapping stage based on the instruction mixture in the fetch line. The codes that are used in Table 2–3 are as follows:
U: The instruction only executes in an upper subcluster.
L: The instruction only executes in a lower subcluster.
E: The instruction could execute in either an upper or lower subcluster.
Table 2–3 defines the slotting rules. The table field Instruction Class 3, 2, 1 and 0 identifies each instruction’s location in the fetch line by the value of bits [3:2] in its PC.
Table 2–3 Instruction Group Definitions and Pipeline Unit
Instruction Class  Slotting   Instruction Class  Slotting
3 2 1 0            3 2 1 0    3 2 1 0            3 2 1 0
E E E E            U L U L    L L L L            L L L L
E E E L            U L U L    L L L U            L L L U
E E E U            U L L U    L L U E            L L U U
E E L E            U L L U    L L U L            L L U L
E E L L            U U L L    L L U U            L L U U
E E L U            U L L U    L U E E            L U L U
E E U E            U L U L    L U E L            L U U L
E E U L            U L U L    L U E U            L U L U
E E U U            L L U U    L U L E            L U L U
E L E E            U L U L    L U L L            L U L L
E L E L            U L U L    L U L U            L U L U
E L E U            U L L U    L U U E            L U U L
E L L E            U L L U    L U U L            L U U L
E L L L            U L L L    L U U U            L U U U
E L L U            U L L U    U E E E            U L U L
E L U E            U L U L    U E E L            U L U L
E L U L            U L U L    U E E U            U L L U
E L U U            L L U U    U E L E            U L L U
E U E E            L U L U    U E L L            U U L L
E U E L            L U U L    U E L U            U L L U
E U E U            L U L U    U E U E            U L U L
E U L E            L U L U    U E U L            U L U L
E U L L            U U L L    U E U U            U L U U
E U L U            L U L U    U L E E            U L U L
E U U E            L U U L    U L E L            U L U L
E U U L            L U U L    U L E U            U L L U
E U U U            L U U U    U L L E            U L L U
L E E E            L U L U    U L L L            U L L L
L E E L            L U U L    U L L U            U L L U
L E E U            L U L U    U L U E            U L U L
L E L E            L U L U    U L U L            U L U L
L E L L            L U L L    U L U U            U L U U
L E L U            L U L U    U U E E            U U L L
L E U E            L U U L    U U E L            U U L L
L E U L            L U U L    U U E U            U U L U
L E U U            L L U U    U U L E            U U L L
L L E E            L L U U    U U L L            U U L L
L L E L            L L U L    U U L U            U U L U
L L E U            L L U U    U U U E            U U U L
L L L E            L L L U    U U U L            U U U L
                              U U U U            U U U U
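As a reading aid for Table 2–3, the following Python sketch checks that a slotting assignment is consistent with the U, L, and E class codes: a U or L class must keep its subcluster, while an E class may be slotted to either. The function name and the choice of sample rows are illustrative assumptions. The sketch only validates an assignment; it does not reproduce how the slot logic chooses between upper and lower for E instructions.

```python
def slot_is_consistent(classes: str, slotting: str) -> bool:
    """Return True if every slot honors its class code (positions 3 2 1 0)."""
    if len(classes) != 4 or len(slotting) != 4:
        return False
    for cls, slot in zip(classes, slotting):
        if cls not in ("U", "L", "E") or slot not in ("U", "L"):
            return False          # unknown code, or slot is not a concrete subcluster
        if cls != "E" and cls != slot:
            return False          # U and L classes cannot change subcluster
    return True

# A few rows copied from Table 2-3 (instruction class -> slotting).
SAMPLE_ROWS = {"EELL": "UULL", "EEEE": "ULUL", "LLUE": "LLUU", "UUUU": "UUUU"}
```

For example, `slot_is_consistent("EELL", "UULL")` holds, while `slot_is_consistent("EELL", "UULU")` does not, because position 0 is an L-only instruction.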

2.3.3 Instruction Latencies

After an instruction is placed in the IQ or FQ, its issue point is determined by the availability of its register operands, functional unit(s), and relationship to other instructions in the queue. There are register producer-consumer dependencies and dynamic functional unit availability dependencies that affect instruction issue. The mapper removes register producer-producer dependencies.
The latency to produce a register result is generally fixed. The one exception is for load instructions that miss the Dcache. Table 2–4 lists the latency, in cycles, for each instruction class.
Table 2–4 Instruction Class Latency in Cycles
Class   Latency  Comments
ild     3        Dcache hit.
        13+      Dcache miss, latency with 6-cycle Bcache. Add additional Bcache loop latency if Bcache latency is greater than 6 cycles.
fld     4        Dcache hit.
        14+      Dcache miss, latency with 6-cycle Bcache. Add additional Bcache loop latency if Bcache latency is greater than 6 cycles.
ist     -        Does not produce register value.
fst     -        Does not produce register value.
rpcc    1        Possible 1-cycle cross-cluster delay.
rx      1        -
mxpr    1 or 3   HW_MFPR: Ebox IPRs = 1. Ibox and Mbox IPRs = 3. HW_MTPR does not produce a register value.
icbr    -        Conditional branch. Does not produce register value.
ubr     3        Unconditional branch. Does not produce register value.
jsr     3        -
iadd    1        Possible 1-cycle Ebox cross-cluster delay.
ilog    1        Possible 1-cycle Ebox cross-cluster delay.
ishf    1        Possible 1-cycle Ebox cross-cluster delay.
cmov1   1        Only consumer is cmov2. Possible 1-cycle Ebox cross-cluster delay.
cmov2   1        Possible 1-cycle Ebox cross-cluster delay.
imul    7        Possible 1-cycle Ebox cross-cluster delay.
imisc   3        Possible 1-cycle Ebox cross-cluster delay.
fcbr    -        Does not produce register value.
fadd    4        Consumer other than fst or ftoi.
        6        Consumer fst or ftoi. Measured from when an fadd is issued from the FQ to when an fst or ftoi is issued from the IQ.
fmul    4        Consumer other than fst or ftoi.
        6        Consumer fst or ftoi. Measured from when an fmul is issued from the FQ to when an fst or ftoi is issued from the IQ.
fcmov1  4        Only consumer is fcmov2.
fcmov2  4        Consumer other than fst.
        6        Consumer fst or ftoi. Measured from when an fcmov2 is issued from the FQ to when an fst or ftoi is issued from the IQ.
fdiv    12       Single precision - latency to consumer of result value.
        9        Single precision - latency to using divider again.
        15       Double precision - latency to consumer of result value.
        12       Double precision - latency to using divider again.
fsqrt   18       Single precision - latency to consumer of result value.
        15       Single precision - latency to using unit again.
        33       Double precision - latency to consumer of result value.
        30       Double precision - latency to using unit again.
ftoi    3        -
itof    4        -
nop     -        Does not produce register value.
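The consumer-dependent entries in Table 2–4 can be read as a small lookup. The Python sketch below is illustrative only: the dictionary and function names are assumptions, and it covers just the fadd, fmul, and fcmov2 rows, where the latency is 4 cycles to most consumers and 6 cycles when the consumer is an fst or ftoi issued from the IQ.

```python
# Base latencies (cycles) for the classes whose latency depends on the consumer,
# copied from Table 2-4.
FP_BASE_LATENCY = {"fadd": 4, "fmul": 4, "fcmov2": 4}

# Consumers that are issued from the IQ rather than the FQ.
IQ_CONSUMERS = {"fst", "ftoi"}

def fp_result_latency(producer: str, consumer: str) -> int:
    """Cycles from producer issue (FQ) until the consumer can issue."""
    base = FP_BASE_LATENCY[producer]
    # Table 2-4 lists 6 cycles instead of 4 when the consumer is fst or ftoi.
    return base + 2 if consumer in IQ_CONSUMERS else base
```

For instance, `fp_result_latency("fmul", "fadd")` gives 4 and `fp_result_latency("fmul", "fst")` gives 6.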
2.4 Instruction Retire Rules
An instruction is retired when it has been executed to completion, and all previous instructions have been retired. The execution pipeline stage in which an instruction becomes eligible to be retired depends upon the instruction’s class.
Table 2–5 gives the minimum retire latencies (assuming that all previous instructions have been retired) for various classes of instructions.
Table 2–5 Minimum Retire Latencies for Instruction Classes
Instruction Class                  Retire Stage  Comments
Integer conditional branch         7             -
Integer multiply                   7/13          Latency is 13 cycles for the MUL/V instruction.
Integer operate                    7             -
Memory                             10            -
Floating-point add                 11            -
Floating-point multiply            11            -
Floating-point DIV/SQRT            11 + latency  Add latency of unit reuse for the instruction indicated in Table 2–4. For example, latency for a single-precision fdiv would be 11 plus 9 from Table 2–4. Latency is 11 if hardware detects that no exception is possible (see Section 2.4.1).
Floating-point conditional branch  11            Branch instruction mispredict is reported in stage 7.
BSR/JSR                            10            JSR instruction mispredict is reported in stage 8.
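The "11 + latency" entry for floating-point DIV/SQRT in Table 2–5 is a simple sum with the unit-reuse latency from Table 2–4. A minimal Python sketch of that arithmetic follows; the names are hypothetical, and the `early_retire` flag stands in for the hardware detection described in Section 2.4.1.

```python
# Unit-reuse latencies (cycles) copied from Table 2-4.
UNIT_REUSE = {
    ("fdiv", "single"): 9,
    ("fdiv", "double"): 12,
    ("fsqrt", "single"): 15,
    ("fsqrt", "double"): 30,
}
FP_BASE_RETIRE_STAGE = 11  # retire stage of the floating-point add class, Table 2-5

def min_retire_stage(op: str, precision: str, early_retire: bool = False) -> int:
    """Minimum retire stage for fdiv/fsqrt per Table 2-5."""
    if early_retire:
        # Hardware detected that no exception is possible.
        return FP_BASE_RETIRE_STAGE
    return FP_BASE_RETIRE_STAGE + UNIT_REUSE[(op, precision)]
```

With these values, a single-precision fdiv gives 11 + 9 = 20, matching the example in the table comment.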

2.4.1 Floating-Point Divide/Square Root Early Retire

The floating-point divider and square root unit can detect that, for many combinations of source operand values, no exception can be generated. Instructions with these operands can be retired before the result is generated. When detected, they are retired with the same latency as the FP add class. Early retirement is not possible for the following instruction/operand/architecture state conditions:
Instruction is not a DIV or SQRT.
SQRT source operand is negative.
Divide operand exponent_a is 0.
Either operand is NaN or INF.
Divide operand exponent_b is 0.
Trapping mode is /I (inexact).
INE status bit is 0.
Early retirement is also not possible for divide instructions if the resulting exponent has any of the following characteristics (EXP is the result exponent):
DIVT, DIVG: (EXP >= 3FF₁₆) OR (EXP <= 2₁₆)
DIVS, DIVF: (EXP >= 7F₁₆) OR (EXP <= 382₁₆)
2.5 Retire of Operate Instructions into R31/F31
Many instructions that have R31 or F31 as their destination are retired immediately upon decode (stage 3). These instructions do not produce a result and are removed from the pipeline as well. They do not occupy a slot in the issue queues and do not occupy a functional unit. Table 2–6 lists these instructions and some of their characteristics. The instruction type in Table 2–6 is from Table C-6 in Appendix C of the Alpha Architecture Handbook, Version 4.
Table 2–6 Instructions Retired Without Execution
Instruction Type        Notes
INTA, INTL, INTM, INTS  All with R31 as destination.
FLTI, FLTL, FLTV        All with F31 as destination. MT_FPCR is not included because it has no destination; it is never removed from the pipeline.
LDQ_U                   All with R31 as destination.
MISC                    TRAPB and EXCB are always removed. Others are never removed.
FLTS                    All (SQRT, ITOF) with F31 as destination.

2.6 Load Instructions to R31 and F31

This section describes how the 21264/EV67 processes software-directed prefetch transactions and load instructions with a destination of R31 and F31.
Prefetches allocate a MAF entry. How the MAF entry is allocated is what distinguishes the type of prefetch. A normal prefetch is equivalent to a normal load MAF (that is, a MAF entry that puts the block into the Dcache in a readable state). A prefetch with modify intent is equivalent to a normal store MAF (that is, a MAF entry that puts the block into the Dcache in a writeable state). A prefetch, evict next, is equivalent to a normal load MAF, with the additional behavior described in Section 2.6.3, below.
A prefetch is not performed if the prefetch hits in the Dcache (as if it were a normal load).
Load operations to R31 and F31 may generate exceptions. These exceptions must be dismissed by PALcode.
The following sections describe the operational prefetch behavior of these instructions.

2.6.1 Normal Prefetch: LDBU, LDF, LDG, LDL, LDT, LDWU, HW_LDL Instructions

The 21264/EV67 processes these instructions as normal cache line prefetches. If the load instruction hits the Dcache, the instruction is dismissed; otherwise, the addressed cache block is allocated into the Dcache.
The HW_LDL instruction construct equates to the HW_LD instruction with the LEN field clear. See Table 6–3.

2.6.2 Prefetch with Modify Intent: LDS Instruction

The 21264/EV67 processes an LDS instruction, with F31 as the destination, as a prefetch with modify intent transaction (ReadBlkMod command). If the transaction hits a dirty Dcache block, the instruction is dismissed. Otherwise, the addressed cache block is allocated into the Dcache for write access, with its dirty and modified bits set.

2.6.3 Prefetch, Evict Next: LDQ and HW_LDQ Instructions

The 21264/EV67 processes this instruction like a normal prefetch transaction (ReadBlkSpec command), with one exception: if the load misses the Dcache, the addressed cache block is allocated into the Dcache, but the Dcache set allocation pointer is left pointing to this block. The next miss to the same Dcache line will evict the block. For example, this instruction might be used when software is reading an array that is known to fit in the offchip Bcache, but will not fit into the onchip Dcache. In this case, the instruction ensures that the hardware provides the desired prefetch function without displacing useful cache blocks stored in the other set within the Dcache.
The HW_LDQ instruction construct equates to the HW_LD instruction with the LEN field set. See Table 6–3.
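The three prefetch flavors in Sections 2.6.1 through 2.6.3 are selected purely by the load mnemonic when the destination is R31 or F31. The Python sketch below summarizes that mapping; the function name, the string labels it returns, and the simplified destination check are assumptions made for illustration.

```python
from typing import Optional

# Mnemonics taken from the headings of Sections 2.6.1, 2.6.2, and 2.6.3.
NORMAL = {"LDBU", "LDF", "LDG", "LDL", "LDT", "LDWU", "HW_LDL"}
MODIFY_INTENT = {"LDS"}
EVICT_NEXT = {"LDQ", "HW_LDQ"}

def prefetch_kind(mnemonic: str, dest: str) -> Optional[str]:
    """Classify a load as a prefetch type, or None if it is not a prefetch."""
    if dest not in ("R31", "F31"):
        return None                 # an ordinary load that produces a result
    if mnemonic in NORMAL:
        return "normal"
    if mnemonic in MODIFY_INTENT:
        return "modify_intent"
    if mnemonic in EVICT_NEXT:
        return "evict_next"
    return None                     # e.g. LDQ_U R31 is retired as a nop instead
```

The sketch does not check that the mnemonic and register file agree (for example, that LDS targets F31 rather than R31).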

2.6.4 Prefetch with the LDx_L / STx_C Instruction Sequence

A prefetch within a dynamic 80-instruction window of a LDx_L instruction can cause the subsequent STx_C to incorrectly succeed when all three references are to the same 64-byte cache block. Within that 80-instruction window, the proximity of the prefetch to the LDx_L instruction directly affects the possibility of the incorrect behavior. Further, if the prefetch issues before the LDx_L, the error cannot occur, and if the prefetch issues after the LDx_L, the error can only occur when another processor is simultaneously acquiring the same lock.
2.7 Special Cases of Alpha Instruction Execution
This section describes the mechanisms that the 21264/EV67 uses to process irregular instructions in the Alpha instruction set, and cases in which the 21264/EV67 processes instructions in a non-intuitive way.

2.7.1 Load Hit Speculation

The latency of integer load instructions that hit in the Dcache is three cycles. Figure 2–9 shows the pipeline timing for these integer load instructions. In Figure 2–9:
Symbol  Meaning
Q       Issue queue
R       Register file read
E       Execute
D       Dcache access
B       Data bus active
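Figure 2–9 Pipeline Timing for Integer Load Instructions
[Timing diagram not reproduced. Cycles 1 through 8; the ILD row shows stages Q R E D B, the Instruction 1 row shows Q R, and the Instruction 2 row shows Q; the Dcache hit indication is marked.]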
There are two cycles in which the IQ may speculatively issue instructions that use load data before Dcache hit information is known. Any instructions that are issued by the IQ within this 2-cycle speculative window are kept in the IQ with their requests inhibited until the load instruction’s hit condition is known, even if they are not dependent on the load operation. If the load instruction hits, then these instructions are removed from the queue. If the load instruction misses, then the execution of these instructions is aborted and the instructions are allowed to request service again.
For example, in Figure 2–9, instruction 1 and instruction 2 are issued within the speculative window of the load instruction. If the load instruction hits, then both instructions will be deleted from the queue by the start of cycle 7: one cycle later than normal for instruction 1 and at the normal time for instruction 2. If the load instruction misses, both instructions are aborted from the execution pipelines and may request service again in cycle 6.
IQ-issued instructions are aborted if issued within the speculative window of an integer load instruction that missed in the Dcache, even if they are not dependent on the load data. However, if software misses are likely, the 21264/EV67 can still benefit from scheduling the instruction stream for Dcache miss latency. The 21264/EV67 includes a saturating counter that is incremented when load instructions hit and is decremented when load instructions miss. When the upper bit of the counter equals zero, the integer load latency is increased to five cycles and the speculative window is removed. The counter is 4 bits wide and is incremented by 1 on a hit and is decremented by 2 on a miss.
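The hit/miss counter can be modeled in a few lines. The Python sketch below assumes the counter saturates at 0 and 15 and starts at its maximum value; the starting value and all names are assumptions, since the text specifies only the width, the step sizes, and the meaning of the upper bit.

```python
class LoadHitCounter:
    """4-bit saturating counter: +1 on a Dcache hit, -2 on a miss."""

    MAX = 15  # largest value that fits in 4 bits

    def __init__(self, value: int = 15):  # initial value is an assumption
        self.value = value

    def record(self, hit: bool) -> None:
        if hit:
            self.value = min(self.MAX, self.value + 1)
        else:
            self.value = max(0, self.value - 2)

    def speculation_enabled(self) -> bool:
        # The speculative window is used only while the upper bit (bit 3) is set.
        return bool(self.value & 0b1000)

    def integer_load_latency(self) -> int:
        # Three cycles with load-hit speculation, five cycles without it.
        return 3 if self.speculation_enabled() else 5
```

Starting from 15, four consecutive misses bring the counter to 7, which clears the upper bit and switches the integer load latency to five cycles; a single subsequent hit raises it to 8 and restores the speculative window.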
Since load instructions to R31 do not produce a result, they do not create a speculative window when they execute and, therefore, never waste IQ-issue cycles if they miss.
Floating-point load instructions that hit in the Dcache have a latency of four cycles. Figure 2–10 shows the pipeline timing for floating-point load instructions. In Figure 2–10:
Symbol  Meaning
Q       Issue queue
R       Register file read
E       Execute
D       Dcache access
B       Data bus active
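Figure 2–10 Pipeline Timing for Floating-Point Load Instructions
[Timing diagram not reproduced. Cycles 1 through 8; the FLD row shows stages Q R E D B, followed by rows for Instruction 1 and Instruction 2; the Dcache hit indication is marked.]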
The speculative window for floating-point load instructions is one cycle wide. FQ-issued instructions that are issued within the speculative window of a floating-point load instruction that has missed are only aborted if they depend on the load being successful.
For example, in Figure 2–10 instruction 1 is issued in the speculative window of the load instruction.
If instruction 1 is not a user of the data returned by the load instruction, then it is removed from the queue at its normal time (at the start of cycle 7).
If instruction 1 is dependent on the load instruction data and the load instruction hits, instruction 1 is removed from the queue one cycle later (at the start of cycle 8). If the load instruction misses, then instruction 1 is aborted from the Fbox pipeline and may request service again in cycle 7.

2.7.2 Floating-Point Store Instructions

Floating-point store instructions are duplicated and loaded into both the IQ and the FQ from the mapper. Each IQ entry contains a control bit, fpWait, that when set prevents that entry from asserting its requests. This bit is initially set for each floating-point store instruction that enters the IQ, unless it was the target of a replay trap. The instruction’s FQ clone is issued when its Ra register is about to become clean, resulting in its IQ clone’s fpWait bit being cleared and allowing the IQ clone to issue and be executed by the Mbox. This mechanism ensures that floating-point store instructions are always issued to the Mbox, along with the associated data, without requiring the floating-point register dirty bits to be available within the IQ.

2.7.3 CMOV Instruction

For the 21264/EV67, the Alpha CMOV instruction has three operands, and so presents a special case. The required operation is to move either the value in register Rb or the value from the old physical destination register into the new destination register, based upon the value in Ra. Since neither the mapper nor the Ebox and Fbox data paths are otherwise required to handle three operand instructions, the CMOV instruction is decomposed by the Ibox pipeline into two 2-operand instructions:
The Alpha architecture instruction    CMOV Ra, Rb ⇒ Rc
Becomes the 21264/EV67 instructions   CMOV1 Ra, oldRc ⇒ newRc1
                                      CMOV2 newRc1, Rb ⇒ newRc2
The first instruction, CMOV1, tests the value of Ra and records the result of this test in a 65th bit of its destination register, newRc1. It also copies the value of the old physical destination register, oldRc, to newRc1.
The second instruction, CMOV2, then copies either the value in newRc1 or the value in Rb into a second physical destination register, newRc2, based on the CMOV predicate bit stored in newRc1.
In summary, the original CMOV instruction is decomposed into two dependent instructions that each use a physical register from the free list.
To further simplify this operation, the two component instructions of a CMOV instruction are driven through the mappers in successive cycles. Hence, if a fetch line contains n CMOV instructions, it takes n+1 cycles to run that fetch line through the mappers.
For example, the following fetch line:
ADD CMOVx SUB CMOVy
Results in the following three map cycles:
ADD CMOVx1
CMOVx2 SUB CMOVy1
CMOVy2
The Ebox executes integer CMOV instructions as two distinct 1-cycle latency operations. The Fbox add pipeline executes floating-point CMOV instructions as two distinct 4-cycle latency operations.
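The CMOV decomposition can be illustrated with a short Python model. This is a sketch under stated assumptions: `PhysReg`, `cmov1`, `cmov2`, and `map_cycles` are hypothetical names, the condition is passed in as a callable, and the rule that a CMOV's first half closes one map cycle while its second half opens the next is inferred from the ADD/CMOVx/SUB/CMOVy example rather than stated as the hardware algorithm.

```python
from typing import Callable, List, NamedTuple

class PhysReg(NamedTuple):
    value: int        # 64-bit data
    predicate: bool   # the "65th bit" recorded by CMOV1

def cmov1(ra: int, old_rc: int, cond: Callable[[int], bool]) -> PhysReg:
    """Test Ra, record the predicate, and copy oldRc into newRc1."""
    return PhysReg(value=old_rc, predicate=cond(ra))

def cmov2(new_rc1: PhysReg, rb: int) -> int:
    """Select Rb if the predicate is set, else the old value carried in newRc1."""
    return rb if new_rc1.predicate else new_rc1.value

def map_cycles(fetch_line: List[str]) -> List[List[str]]:
    """Split a fetch line so the halves of each CMOV map in successive cycles."""
    cycles, current = [], []
    for insn in fetch_line:
        if insn.startswith("CMOV"):
            current.append(insn + "1")   # first half ends the current cycle
            cycles.append(current)
            current = [insn + "2"]       # second half starts the next cycle
        else:
            current.append(insn)
    if current:
        cycles.append(current)
    return cycles
```

Calling `map_cycles(["ADD", "CMOVx", "SUB", "CMOVy"])` yields three groups, consistent with the n+1 rule for n = 2 CMOV instructions.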
2.8 Memory and I/O Address Space Instructions
This section provides an overview of the way the 21264/EV67 processes memory and I/O address space instructions.
The 21264/EV67 supports, and internally recognizes, a 44-bit physical address space that is divided equally between memory address space and I/O address space. Memory address space resides in the lower half of the physical address space (PA[43]=0) and I/O address space resides in the upper half of the physical address space (PA[43]=1).
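The split of the 44-bit physical address space can be expressed as a single bit test on PA[43]. The following Python sketch uses a hypothetical helper name and rejects values that do not fit in 44 bits.

```python
PA_BITS = 44  # width of the physical address recognized by the 21264/EV67

def is_io_address(pa: int) -> bool:
    """Return True for I/O space (PA[43] = 1), False for memory space (PA[43] = 0)."""
    if not 0 <= pa < (1 << PA_BITS):
        raise ValueError("physical address does not fit in 44 bits")
    return bool((pa >> 43) & 1)
```

The highest memory-space address is therefore `(1 << 43) - 1`, and the lowest I/O-space address is `1 << 43`.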
The IQ can issue any combination of load and store instructions to the Mbox at the rate of two per cycle. The two lower Ebox subclusters, L0 and L1, generate the 48-bit effective virtual address for these instructions.
An instruction is defined to be newer than another instruction if it follows that instruction in program order and is older if it precedes that instruction in program order.

2.8.1 Memory Address Space Load Instructions

The Mbox begins execution of a load instruction by translating its virtual address to a physical address using the DTB and by accessing the Dcache. The Dcache is virtually indexed, allowing these two operations to be done in parallel. The Mbox puts information about the load instruction, including its physical address, destination register, and data format, into the LQ.
If the requested physical location is found in the Dcache (a hit), the data is formatted and written into the appropriate integer or floating-point register. If the location is not in the Dcache (a miss), the physical address is placed in the miss address file (MAF) for processing by the Cbox. The MAF performs a merging function in which a new miss address is compared to miss addresses already held in the MAF. If the new miss address points to the same Dcache block as a miss address in the MAF, then the new miss address is discarded.
When Dcache fill data is returned to the Dcache by the Cbox, the Mbox satisfies the requesting load instructions in the LQ.
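The MAF merging function for load misses amounts to comparing block addresses. The Python sketch below is a simplified model: the class and method names are hypothetical, a 64-byte block size is assumed from the block size used elsewhere in this chapter, and MAF capacity and the instruction-type rules of Section 2.9 are ignored.

```python
BLOCK_BYTES = 64  # assumed Dcache block size

class MissAddressFile:
    """Tracks outstanding miss blocks and discards misses to a block already held."""

    def __init__(self) -> None:
        self.blocks = []  # block numbers with an outstanding miss

    def record_miss(self, pa: int) -> bool:
        """Return True if a new entry was allocated, False if the miss was merged."""
        block = pa // BLOCK_BYTES
        if block in self.blocks:
            return False          # same block already pending: address is discarded
        self.blocks.append(block)
        return True
```

Two misses to addresses 0x1000 and 0x1038 fall in the same block, so only the first allocates an entry; a miss to 0x1040 allocates a second entry.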

2.8.2 I/O Address Space Load Instructions

Because I/O space load instructions may have side effects, they cannot be performed speculatively. When the Mbox receives an I/O space load instruction, the Mbox places the load instruction in the LQ, where it is held until it retires. The Mbox replays retired I/O space load instructions from the LQ to the MAF in program order, at a rate of one per GCLK cycle.
The Mbox allocates a new MAF entry to an I/O load instruction and increases I/O bandwidth by attempting to merge I/O load instructions in a merge register. Table 2–7 shows the rules for merging data. The columns represent the load instructions replayed to the MAF while the rows represent the size of the load in the merge register.
Table 2–7 Rules for I/O Address Space Load Instruction Data Merging
Merge Register/Replayed Instruction  Load Byte/Word  Load Longword         Load Quadword
Byte/Word                            No merge        No merge              No merge
Longword                             No merge        Merge up to 32 bytes  No merge
Quadword                             No merge        No merge              Merge up to 64 bytes
In summary, Table 2–7 shows some of the following rules:
Byte/word load instructions and different size load instructions are not allowed to merge.
A stream of ascending non-overlapping, but not necessarily consecutive, longword load instructions is allowed to merge into naturally aligned 32-byte blocks.
A stream of ascending non-overlapping, but not necessarily consecutive, quadword load instructions is allowed to merge into naturally aligned 64-byte blocks.
Merging of quadwords can be limited to naturally-aligned 32-byte blocks based on the Cbox WRITE_ONCE chain 32_BYTE_IO field.
Issued MB, WMB, and I/O load instructions close the I/O register merge window.
To minimize latency, the merge window is also closed when a timer detects no I/O store instruction activity for 1024 cycles.
After the Mbox I/O register has closed its merge window, the Cbox sends I/O read requests offchip in the order that they were received from the Mbox.
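The size and alignment rules for I/O merging can be sketched as a predicate over two consecutive replayed accesses. The Python below is illustrative: the function and table names are hypothetical, addresses are assumed to be naturally aligned byte addresses, and the 32_BYTE_IO limit, the barrier instructions, and the 1024-cycle timer that close the merge window are not modeled.

```python
ACCESS_BYTES = {"longword": 4, "quadword": 8}
MERGE_BLOCK_BYTES = {"longword": 32, "quadword": 64}

def can_merge(prev_kind: str, prev_addr: int, new_kind: str, new_addr: int) -> bool:
    """Return True if the new access may join the merge register holding prev."""
    if prev_kind != new_kind or new_kind not in MERGE_BLOCK_BYTES:
        return False      # byte/word accesses and mixed sizes never merge
    block = MERGE_BLOCK_BYTES[new_kind]
    same_block = prev_addr // block == new_addr // block
    # Ascending and non-overlapping, but not necessarily consecutive.
    ascending = new_addr >= prev_addr + ACCESS_BYTES[new_kind]
    return same_block and ascending
```

For example, longword accesses at 0x00 and 0x08 may merge even though they are not consecutive, while longword accesses at 0x1C and 0x20 may not, because they fall in different naturally aligned 32-byte blocks.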

2.8.3 Memory Address Space Store Instructions

The Mbox begins execution of a store instruction by translating its virtual address to a physical address using the DTB and by probing the Dcache. The Mbox puts information about the store instruction, including its physical address, its data, and the results of the Dcache probe, into the store queue (SQ).
If the Mbox does not find the addressed location in the Dcache, it places the address into the MAF for processing by the Cbox. If the Mbox finds the addressed location in a Dcache block that is not dirty, then it places a ChangeToDirty request into the MAF.
A store instruction can write its data into the Dcache when it is retired, and when the Dcache block containing its address is dirty and not shared. SQ entries that meet these two conditions can be placed into the writable state. These SQ entries are placed into the writable state in program order at a maximum rate of two entries per cycle. The Mbox transfers writable store queue entry data from the SQ to the Dcache in program order at a maximum rate of two entries per cycle. Dcache lines associated with writable store queue entries are locked by the Mbox. System port probe commands cannot evict these blocks until their associated writable SQ entries have been transferred into the Dcache. This restriction assists in STx_C instruction and Dcache ECC processing.
SQ entry data that has not been transferred to the Dcache may source data to newer load instructions. The Mbox compares the virtual Dcache index bits of incoming load instructions to queued SQ entries, and sources the data from the SQ, bypassing the Dcache, when necessary.

2.8.4 I/O Address Space Store Instructions

The Mbox begins processing I/O space store instructions, like memory space store instructions, by translating the virtual address and placing the state associated with the store instruction into the SQ.
The Mbox replays retired I/O space store entries from the SQ to the IOWB in program order at a rate of one per GCLK cycle. The Mbox never allows queued I/O space store instructions to source data to subsequent load instructions.
The Cbox maximizes I/O bandwidth when it allocates a new IOWB entry to an I/O store instruction by attempting to merge I/O store instructions in a merge register. Table 2–8 shows the rules for I/O space store instruction data merging. The columns represent the store instructions replayed to the IOWB while the rows represent the size of the store in the merge register.
Table 2–8 Rules for I/O Address Space Store Instruction Data Merging
Merge Register/Replayed Instruction  Store Byte/Word  Store Longword        Store Quadword
Byte/Word                            No merge         No merge              No merge
Longword                             No merge         Merge up to 32 bytes  No merge
Quadword                             No merge         No merge              Merge up to 64 bytes
Table 2–8 shows some of the following rules:
Byte/word store instructions and different size store instructions are not allowed to merge.
A stream of ascending non-overlapping, but not necessarily consecutive, longword store instructions is allowed to merge into naturally aligned 32-byte blocks.
A stream of ascending non-overlapping, but not necessarily consecutive, quadword store instructions is allowed to merge into naturally aligned 64-byte blocks.
Merging of quadwords can be limited to naturally-aligned 32-byte blocks based on the Cbox WRITE_ONCE chain 32_BYTE_IO field.
Issued MB, WMB, and I/O load instructions close the I/O register merge window.
To minimize latency, the merge window is also closed when a timer detects no I/O store instruction activity for 1024 cycles.
After the IOWB merge register has closed its merge window, the Cbox sends I/O space store requests offchip in the order that they were received from the Mbox.
2.9 MAF Memory Address Space Merging Rules
Because all memory transactions are to 64-byte blocks, efficiency is improved by merging several small data transactions into a single larger data transaction. Table 2–9 lists the rules the 21264/EV67 uses when merging memory transactions into 64-byte naturally aligned data block transactions. Rows represent the merged instruction in the MAF and columns represent the new issued transaction.
Table 2–9 MAF Merging Rules
MAF/New  LDx    STx    STx_C  WH64   ECB    Istream
LDx      Merge  -      -      -      -      -
STx      Merge  Merge  -      -      -      -
STx_C    -      -      Merge  -      -      -
WH64     -      -      -      Merge  -      -
ECB      -      -      -      -      Merge  -
Istream  -      -      -      -      -      Merge
In summary, Table 2–9 shows that only like instruction types, with the exception of load instructions merging with store instructions, are merged.
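Table 2–9 is not symmetric: a new LDx may merge into an STx entry already in the MAF, but a new STx does not merge into an LDx entry. The Python sketch below encodes the table as a lookup keyed by the MAF row; the names are hypothetical and the check assumes both transactions already address the same 64-byte block.

```python
# Row = transaction type already in the MAF; value = new types that may merge into it.
MAF_MERGES = {
    "LDx": {"LDx"},
    "STx": {"LDx", "STx"},
    "STx_C": {"STx_C"},
    "WH64": {"WH64"},
    "ECB": {"ECB"},
    "Istream": {"Istream"},
}

def maf_can_merge(in_maf: str, new: str) -> bool:
    """Return True if Table 2-9 marks (in_maf row, new column) as Merge."""
    return new in MAF_MERGES.get(in_maf, set())
```

Here `maf_can_merge("STx", "LDx")` is true while `maf_can_merge("LDx", "STx")` is false.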

2.10 Instruction Ordering

In the absence of explicit instruction ordering, such as with MB or WMB instructions, the 21264/EV67 maintains a default instruction ordering relationship between pairs of load and store instructions.
The 21264/EV67 maintains the default memory data instruction ordering as shown in Table 2–10 (assume address X and address Y are different).
Table 2–10 Memory Reference Ordering
First Instruction in Pair  Second Instruction in Pair  Reference Order
Load memory to address X   Load memory to address X    Maintained (litmus test 1)
Load memory to address X   Load memory to address Y    Not maintained
Store memory to address X  Store memory to address X   Maintained
Store memory to address X  Store memory to address Y   Maintained
Load memory to address X   Store memory to address X   Maintained
Load memory to address X   Store memory to address Y   Not maintained
Store memory to address X  Load memory to address X    Maintained
Store memory to address X  Load memory to address Y    Not maintained
The 21264/EV67 maintains t he defa ult I/ O instru ctio n order ing as sho wn in Table 2–11 (assume address X and address Y are different).
Table 2–11 I/O Reference Ordering
First Instruction in Pair    Second Instruction in Pair   Reference Order
Load I/O to address X        Load I/O to address X        Maintained
Load I/O to address X        Load I/O to address Y        Maintained
Store I/O to address X       Store I/O to address X       Maintained
Store I/O to address X       Store I/O to address Y       Maintained
Load I/O to address X        Store I/O to address X       Maintained
Load I/O to address X        Store I/O to address Y       Not maintained
Store I/O to address X       Load I/O to address X        Maintained
Store I/O to address X       Load I/O to address Y        Not maintained
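As a reading aid, the default ordering in Tables 2–10 and 2–11 can be expressed as a lookup. The function and table names below are hypothetical. The sketch relies on one observation from the tables: every same-address pair is maintained, so only the different-address pairs need to be listed.

```python
# Different-address pairs whose order is NOT maintained by default,
# transcribed from Table 2-10 (memory) and Table 2-11 (I/O).
NOT_MAINTAINED = {
    "memory": {("load", "load"), ("load", "store"), ("store", "load")},
    "io": {("load", "store"), ("store", "load")},
}

def order_maintained(space, first, second, same_address):
    """Return True if the default reference order of the pair is maintained.

    space is "memory" or "io"; first and second are "load" or "store".
    """
    if same_address:
        return True  # all same-address rows in both tables read "Maintained"
    return (first, second) not in NOT_MAINTAINED[space]
```

The only difference between the two address spaces is the load-load pair to different addresses, which is ordered for I/O space but not for memory space.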
2.11 Replay Traps
There are some situations in which a load or store instruction cannot be executed due to a condition that occurs after that instruction issues from the IQ or FQ. The instruction is aborted (along with all newer instructions) and restarted from the fetch stage of the pipeline. This mechanism is called a replay trap.
2.11.1 Mbox Order Traps
Load and store instructions may be issued from the IQ in a different order than they were fetched from the Icache, while the architecture dictates that Dstream memory transactions to the same physical bytes must be completed in order. Usually, the Mbox manages the memory reference stream by itself to achieve architecturally correct behavior, but the two cases in which the Mbox uses replay traps to manage the memory stream are load-load and store-load order traps.
2.11.1.1 Load-Load Order Trap
The Mbox ensures that load instructions that read the same physical byte(s) ultimately issue in correct order by using the load-load order trap. The Mbox compares the address of each load instruction, as it is issued, to the address of all load instructions in the load queue. If the Mbox finds a newer load instruction in the load queue, it invokes a load-load order trap on the newer instruction. This is a replay trap that aborts the target of the trap and all newer instructions from the machine and refetches instructions starting at the target of the trap.
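The address comparison described above can be sketched as follows. This is a simplified model under stated assumptions, not the Mbox logic: the Load record, its seq field (program order, larger means newer), and the choice to report the oldest matching newer load as the trap target are all illustrative. The model also assumes that the load queue passed in holds only loads that have already issued.

```python
from dataclasses import dataclass

@dataclass
class Load:
    seq: int    # hypothetical program-order tag; larger means newer
    addr: int   # physical byte address
    size: int   # access size in bytes

def overlaps(a, b):
    """True if the two loads read at least one common physical byte."""
    return a.addr < b.addr + b.size and b.addr < a.addr + a.size

def load_load_trap_target(issuing, load_queue):
    """Return the newer, already-issued load that must replay, or None.

    load_queue is assumed to contain only loads that have already issued.
    """
    newer = [ld for ld in load_queue
             if ld.seq > issuing.seq and overlaps(ld, issuing)]
    # Assumption: trapping the oldest such load also aborts everything newer.
    return min(newer, key=lambda ld: ld.seq) if newer else None
```

A store-load order check has the same shape, with an issuing store compared against newer loads in the load queue.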
2.11.1.2 Store-Load Order Trap
The Mbox ensures that a load instruction ultimately issues after an older store instruction that writes some portion of its memory operand by using the store-load order trap. The Mbox compares the address of each store instruction, as it is issued, to the address of all load instructions in the load queue. If the Mbox finds a newer load instruction in the load queue, it invokes a store-load order trap on the load instruction. This is a replay trap. It functions like the load-load order trap.
The Ibox contains extra hardware to reduce the frequency of the store-load trap. There is a 1-bit by 1024-entry VPC-indexed table in the Ibox called the stWait table. When an Icache instruction is fetched, the associated stWait table entry is fetched along with the Icache instruction. The stWait table produces 1 bit for each instruction accessed from the Icache. When a load instruction gets a store-load order replay trap, its associated bit in the stWait table is set during the cycle that the load is refetched. Hence, the trapping load instruction’s stWait bit will be set the next time it is fetched.
The IQ will not issue load instructions whose stWait bit is set while there are older unissued store instructions in the queue. A load instruction whose stWait bit is set can be issued the cycle immediately after the last older store instruction is issued from the queue. All the bits in the stWait table are unconditionally cleared every 16384 cycles, or every 65536 cycles if I_CTL[ST_WAIT_64K] is set.
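The stWait behavior described above can be modeled in a few lines. This is a behavioral sketch only. The class and method names are hypothetical, and the index function is an assumption: the text says the table is VPC-indexed with 1024 entries but does not say which PC bits form the index.

```python
class StWaitTable:
    """Behavioral model of the 1-bit by 1024-entry stWait table."""

    ENTRIES = 1024

    def __init__(self, st_wait_64k=False):
        self.bits = [0] * self.ENTRIES
        # Clear period: 16384 cycles, or 65536 if I_CTL[ST_WAIT_64K] is set.
        self.clear_interval = 65536 if st_wait_64k else 16384
        self.cycle = 0

    def index(self, vpc):
        # Assumption: instruction address (4-byte instructions) modulo table size.
        return (vpc >> 2) % self.ENTRIES

    def record_store_load_trap(self, vpc):
        """Set the bit for a load that took a store-load order replay trap."""
        self.bits[self.index(vpc)] = 1

    def load_may_issue(self, vpc, older_unissued_stores):
        """A marked load waits while older stores are still unissued."""
        return not (self.bits[self.index(vpc)] and older_unissued_stores > 0)

    def tick(self, cycles=1):
        """Advance time; clear every bit each time a clear period elapses."""
        before = self.cycle // self.clear_interval
        self.cycle += cycles
        if self.cycle // self.clear_interval != before:
            self.bits = [0] * self.ENTRIES
```

The periodic clear matters because a set bit is otherwise sticky: without it, a load that trapped once would wait behind older stores indefinitely, even after the conflicting pattern has gone away.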

2.11.2 Other Mbox Replay Traps

The Mbox also uses replay traps to control the flow of the load queue and store queue, and to ensure that there are never multiple outstanding misses to different physical addresses that map to the same Dcache or Bcache line. Unlike the order traps, however, these replay traps are invoked on the incoming instruction that triggered the condition.
2.12 I/O Write Buffer and the WMB Instruction
The I/O write buffer (IOWB) consists of four 64-byte entries with the associated address and control logic used to buffer I/O write data between the store queue (SQ) and the system port.

2.12.1 Memory Barrier (MB/WMB/TB Fill Flow)

The Cbox CSR SYSBUS_MB_ENABLE bit determines if MB instructions produce external system port transactions. When the SYSBUS_MB_ENABLE bit equals 0, the Cbox CSR MB_CNT[3:0] field contains the number of pending uncommitted transactions. The counter will increment for each of the following commands:
RdBlk, RdBlkMod, RdBlkI
RdBlkSpec (valid), RdBlkModSpec (valid), RdBlkSpecI (valid)
RdBlkVic, RdBlkModVic, RdBlkVicI
CleanToDirty, SharedToDirty, STChangeToDirty, InvalToDirty
FetchBlk, FetchBlkSpec (valid), Evict
RdByte, RdLw, RdQw, WrByte, WrLW, WrQW
The counter is decremented with the C (commit) bit in the Probe and SysDc commands (see Section 4.7.7). Systems can assert the C bit in the SysDc fill response to the commands that originally incremented the counter, or attached to the last probe seen by that command when it reached the system serialization point. If the number of uncommitted transactions reaches 15 (saturating the counter), the Cbox will stall MAF and IOWB processing until at least one of the pending transactions has been committed. Probe processing is not interrupted by the state of this counter.
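The counting and stall behavior of MB_CNT[3:0] can be sketched as a saturating counter. The class MbCounter and its methods are hypothetical names for illustration. The command names come from the list above, and speculative commands are counted only when marked valid, as that list indicates.

```python
# Commands that increment MB_CNT, taken from the list in this section.
COUNTED = {
    "RdBlk", "RdBlkMod", "RdBlkI",
    "RdBlkSpec", "RdBlkModSpec", "RdBlkSpecI",
    "RdBlkVic", "RdBlkModVic", "RdBlkVicI",
    "CleanToDirty", "SharedToDirty", "STChangeToDirty", "InvalToDirty",
    "FetchBlk", "FetchBlkSpec", "Evict",
    "RdByte", "RdLw", "RdQw", "WrByte", "WrLW", "WrQW",
}
# Speculative commands count only when they are valid.
SPECULATIVE = {"RdBlkSpec", "RdBlkModSpec", "RdBlkSpecI", "FetchBlkSpec"}

class MbCounter:
    LIMIT = 15  # a 4-bit counter saturates at 15

    def __init__(self):
        self.count = 0

    @property
    def stalled(self):
        """MAF and IOWB processing stalls while the counter is saturated."""
        return self.count >= self.LIMIT

    def send(self, command, valid=True):
        """Try to send a command; return False if processing is stalled."""
        if self.stalled:
            return False
        if command in COUNTED and (command not in SPECULATIVE or valid):
            self.count += 1
        return True

    def commit(self):
        """Model a Probe or SysDc command that carries the C (commit) bit."""
        if self.count > 0:
            self.count -= 1
```

Probe processing is deliberately left out of the model, since the text states that it is not interrupted by the state of the counter.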
2.12.1.1 MB Instruction Processing
When an MB instruction is fetched in the predicted instruction execution path, it stalls in the map stage of the pipeline. This also stalls all instructions after the MB, and control of instruction flow is based upon the value in Cbox CSR SYSBUS_MB_ENABLE as follows:
If Cbox CSR SYSBUS_MB_ENABLE is clear, the Cbox waits until the IQ is empty and then performs the following actions:
a. Sends all pending MAF and IOWB entries to the system port.
b. Monitors Cbox CSR MB_CNT[3:0], a 4-bit counter of outstanding committed events. When the counter decrements from one to zero, the Cbox marks the youngest probe queue entry.
c. Waits until the MAF contains no more Dstream references and the SQ, LQ, and IOWB are empty.
When all of the above have occurred and a probe response has been sent to the system for the marked probe queue entry, instruction execution continues with the instruction after the MB.
If Cbox CSR SYSBUS_MB_ENABLE is set, the Cbox waits until the IQ is empty and then performs the following actions:
a. Sends all pending MAF and IOWB entries to the system port.
b. Sends the MB command to the system port.
c. Waits until the MB command is acknowledged, then marks the youngest entry in the probe queue.
d. Waits until the MAF contains no more Dstream references and the SQ, LQ, and IOWB are empty.
When all of the above have occurred and a probe response has been sent to the system for the marked probe queue entry, instruction execution continues with the instruction after the MB.
Because the MB instruction is executed speculatively, MB processing can begin and the original MB can be killed. In the internal acknowledge case, the MB may have already been sent to the system interface, and the system is still expected to respond to the MB.
2.12.1.2 WMB Instruction Processing
Write memory barrier (WMB) instructions are issued into the Mbox store-queue, where they wait until they are retired and all prior store instructions become writable. The Mbox then stalls the writable pointer and informs the Cbox. The Cbox closes the IOWB merge register and responds in one of the following two ways:
If Cbox CSR SYSBUS_MB_ENABLE is clear, the Cbox performs the following actions:
a. Stalls further MAF and IOWB processing.
b. Monitors Cbox CSR MB_CNT[3:0], a 4-bit counter of outstanding committed events. When the counter decrements from one to zero, the Cbox marks the youngest probe queue entry.
c. When a probe response has been sent to the system for the marked probe queue entry, the Cbox considers the WMB to be satisfied.
If Cbox CSR SYSBUS_MB_ENABLE is set, the Cbox performs the following actions:
a. Stalls further MAF and IOWB processing.
b. Sends the MB command to the system port.
c. Waits until the MB command is acknowledged by the system with a SysDc MBDone command, then sends acknowledge and marks the youngest entry in the probe queue.
d. When a probe response has been sent to the system for the marked probe queue entry, the Cbox considers the WMB to be satisfied.
2.12.1.3 TB Fill Flow
Load instructions (HW_LDs) to a virtual page table entry (VPTE) are processed by the 21264/EV67 to avoid litmus test problems associated with the ordering of memory transactions from another processor against loading of a page table entry and the subsequent virtual-mode load from this processor.
Consider the sequence shown in Table 2–12. The data could be in the Bcache. Pj should fetch datai if it is using PTEi.
Table 2–12 TB Fill Flow Example Sequence 1
Pi              Pj
Write Datai     Load/Store datai
MB              <TB miss>
Write PTEi      Load-PTE
                <write TB>
                Load/Store (restart)
Also consider the related sequence shown in Table 2–13. In this case, the data could be cached in the Bcache; Pj should fetch datai if it is using PTEi.
Table 2–13 TB Fill Flow Example Sequence 2
Pi              Pj
Write Datai     Istream read datai
MB              <TB miss>
Write PTEi      Load-PTE
                <write TB>
                Istream read (restart) - will miss the Icache
The 21264/EV67 processes Dstream loads to the PTE by injecting, in hardware, some memory barrier processing between the PTE transaction and any subsequent load or store instruction. This is accomplished by the following mechanism:
1. The integer queue issues a HW_LD instruction with VPTE.
2. The integer queue issues a HW_MTPR instruction with a DTB_PTE0, that is data-dependent on the HW_LD instruction with a VPTE, and is required in order to fill the DTBs. The HW_MTPR instruction, when queued, sets IPR scoreboard bits [4] and [0].
3. When a HW_MTPR instruction with a DTB_PTE0 is issued, the Ibox signals the Cbox indicating that a HW_LD instruction with a VPTE has been processed. This causes the Cbox to begin processing the MB instruction. The Ibox prevents any subsequent memory operations being issued by not clearing the IPR scoreboard bit [0]. IPR scoreboard bit [0] is one of the scoreboard bits associated with the HW_MTPR instruction with DTB_PTE0.
4. When the Cbox completes processing the MB instruction (using one of the above sequences, depending upon the state of SYSBUS_MB_ENABLE), the Cbox signals the Ibox to clear IPR scoreboard bit [0].
The 21264/EV67 uses a similar mechanism to process Istream TB misses and fills to the PTE for the Istream.
1. The integer queue issues a HW_LD instruction with VPTE.
2. The IQ issues a HW_MTPR instruction with an ITB_PTE that is data-dependent upon the HW_LD instruction with VPTE. This is required in order to fill the ITB. The HW_MTPR instruction, when queued, sets IPR scoreboard bits [4] and [0].
3. The Cbox issues a HW_MTPR instruction for the ITB_PTE and signals the Ibox that a HW_LD/VPTE instruction has been processed, causing the Cbox to start processing the MB instruction. The Mbox stalls Ibox fetching from when the HW_LD/VPTE instruction finishes until the probe queue is drained.
4. When the 21264/EV67 is finished (SYS_MB selects one of the above sequences), the Cbox directs the Ibox to clear IPR scoreboard bit [0]. Also, the Mbox directs the Ibox to start prefetching.
Inserting MB instruction processing within the TB fill flow is only required for multiprocessor systems. Uniprocessor systems can disable MB instruction processing by deasserting Ibox CSR I_CTL[TB_MB_EN].
2.13 Performance Measurement Support—Performance Counters
The 21264/EV67 provides hardware support for two methods of obtaining program performance feedback information. The two methods do not require program modification. The first method offers similar capabilities to earlier microprocessor performance counters. The second method supports the new ProfileMe way of statistically sampling individual instructions during program execution to develop a model of program execution. Both methods use the same hardware registers.
See Section 6.10 for information about counter control.

2.14 Floating-Point Control Register

The floating-point control register (FPCR) is shown in Figure 2–11.
Figure 2–11 Floating-Point Control Register
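Figure 2–11 bit layout, from bit 63 down to bit 0: SUM [63], INED [62], UNFD [61], UNDZ [60], DYN [59:58], IOV [57], INE [56], UNF [55], OVF [54], DZE [53], INV [52], OVFD [51], DZED [50], INVD [49], DNZ [48], reserved [47:0].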
The floating-point control register fields are described in Table 2–14.
Table 2–14 Floating-Point Control Register Fields
Name Extent Type Description
SUM       [63]     RW    Summary bit. Records bit-wise OR of FPCR exception bits.
INED      [62]     RW    Inexact Disable. If this bit is set and a floating-point instruction that enables trapping on inexact results generates an inexact value, the result is placed in the destination register and the trap is suppressed.
UNFD      [61]     RW    Underflow Disable. The 21264/EV67 hardware cannot generate IEEE compliant denormal results. UNFD is used in conjunction with UNDZ as follows:
                         UNFD  UNDZ  Result
                         0     X     Underflow trap.
                         1     0     Trap to supply a possible denormal result.
                         1     1     Underflow trap suppressed. Destination is written with a true zero (+0.0).
UNDZ      [60]     RW    Underflow to zero. When UNDZ is set together with UNFD, underflow traps are disabled and the 21264/EV67 places a true zero in the destination register. See UNFD, above.
DYN       [59:58]  RW    Dynamic rounding mode. Indicates the rounding mode to be used by an IEEE floating-point instruction when the instruction specifies dynamic rounding mode:
                         Bits  Meaning
                         00    Chopped
                         01    Minus infinity
                         10    Normal
                         11    Plus infinity
IOV       [57]     RW    Integer overflow. An integer arithmetic operation or a conversion from floating-point to integer overflowed the destination precision.
INE       [56]     RW    Inexact result. A floating-point arithmetic or conversion operation gave a result that differed from the mathematically exact result.
UNF       [55]     RW    Underflow. A floating-point arithmetic or conversion operation gave a result that underflowed the destination exponent.
OVF       [54]     RW    Overflow. A floating-point arithmetic or conversion operation gave a result that overflowed the destination exponent.
DZE       [53]     RW    Divide by zero. An attempt was made to perform a floating-point divide with a divisor of zero.
INV       [52]     RW    Invalid operation. An attempt was made to perform a floating-point arithmetic operation and one or more of its operand values were illegal.
OVFD      [51]     RW    Overflow disable. If this bit is set and a floating-point arithmetic operation generates an overflow condition, then the appropriate IEEE nontrapping result is placed in the destination register and the trap is suppressed.
DZED      [50]     RW    Division by zero disable. If this bit is set and a floating-point divide by zero is detected, the appropriate IEEE nontrapping result is placed in the destination register and the trap is suppressed.
INVD      [49]     RW    Invalid operation disable. If this bit is set and a floating-point operate generates an invalid operation condition and 21264/EV67 is capable of producing the correct IEEE nontrapping result, that result is placed in the destination register and the trap is suppressed.
DNZ       [48]     RW    Denormal operands to zero. If this bit is set, treat all Denormal operands as a signed zero value with the same sign as the Denormal operand.
Reserved  [47:0]   -     - (see note 1)
Note 1: Alpha architecture FPCR bit 47 (DNOD) is not implemented by the 21264/EV67.
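The field extents in Table 2–14 translate directly into shift-and-mask operations. The sketch below decodes a 64-bit FPCR value into its named fields and applies the UNFD/UNDZ and DYN sub-tables. The function names and the result strings are illustrative only. How the FPCR value is obtained is outside the scope of the sketch.

```python
# (high bit, low bit) for each field, transcribed from Table 2-14.
FPCR_FIELDS = {
    "SUM": (63, 63), "INED": (62, 62), "UNFD": (61, 61), "UNDZ": (60, 60),
    "DYN": (59, 58), "IOV": (57, 57), "INE": (56, 56), "UNF": (55, 55),
    "OVF": (54, 54), "DZE": (53, 53), "INV": (52, 52), "OVFD": (51, 51),
    "DZED": (50, 50), "INVD": (49, 49), "DNZ": (48, 48),
}

DYN_MODES = {0: "Chopped", 1: "Minus infinity", 2: "Normal", 3: "Plus infinity"}

def decode_fpcr(value):
    """Split a 64-bit FPCR value into a dict of field name -> field value."""
    fields = {}
    for name, (hi, lo) in FPCR_FIELDS.items():
        width = hi - lo + 1
        fields[name] = (value >> lo) & ((1 << width) - 1)
    return fields

def underflow_action(value):
    """Apply the UNFD/UNDZ sub-table from the UNFD row of Table 2-14."""
    f = decode_fpcr(value)
    if not f["UNFD"]:
        return "underflow trap"
    if not f["UNDZ"]:
        return "trap to supply a possible denormal result"
    return "underflow trap suppressed, destination written with +0.0"
```

For instance, a value with UNFD and UNDZ set and DYN equal to binary 10 decodes to normal rounding with underflow flushed to a true zero.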
2.15 AMASK and IMPLVER Instruction Values
The AMASK and IMPLVER instructions return the supported architecture extensions and the processor type, respectively.

2.15.1 AMASK

The 21264/EV67 returns the AMASK instruction values provided in Table 2–15. The I_CTL register reports the 21264/EV67 pass level (see I_CTL[CHIP_ID], Section 5.2.15).
Table 2–15 21264/EV67 AMASK Values
21264/EV67 Pass Level              AMASK Feature Mask Value
See I_CTL[CHIP_ID], Table 5–11     307 (hexadecimal)
The AMASK bit definitions provided in Table 2–15 are defined in Table 2–16.
Table 2–16 AMASK Bit Assignments
Bit  Meaning
0    Support for the byte/word extension (BWX). The instructions that comprise the BWX extension are LDBU, LDWU, SEXTB, SEXTW, STB, and STW.
1    Support for the square-root and floating-point convert extension (FIX). The instructions that comprise the FIX extension are FTOIS, FTOIT, ITOFF, ITOFS, ITOFT, SQRTF, SQRTG, SQRTS, and SQRTT.
2    Support for the count extension (CIX). The instructions that comprise the CIX extension are CTLZ, CTPOP, and CTTZ.
8    Support for the multimedia extension (MVI). The instructions that comprise the MVI extension are MAXSB8, MAXSW4, MAXUB8, MAXUW4, MINSB8, MINSW4, MINUB8, MINUW4, PERR, PKLB, PKWB, UNPKBL, and UNPKBW.
9    Support for precise arithmetic trap reporting in hardware. The trap PC is the same as the instruction PC after the trapping instruction is executed.
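As a quick check of how the feature mask in Table 2–15 relates to the bit assignments in Table 2–16, the sketch below expands a mask into feature names. The helper name amask_features and the short label for bit 9 are illustrative. The sketch only decodes a mask value and does not model the AMASK instruction itself.

```python
# Bit assignments transcribed from Table 2-16.
AMASK_BITS = {
    0: "BWX",
    1: "FIX",
    2: "CIX",
    8: "MVI",
    9: "precise arithmetic traps",
}

def amask_features(mask):
    """Return the names of the features whose bits are set in the mask."""
    return [name for bit, name in sorted(AMASK_BITS.items())
            if (mask >> bit) & 1]

# The 21264/EV67 feature mask value from Table 2-15 is 307 hexadecimal.
EV67_FEATURES = amask_features(0x307)
```

The value 307 hexadecimal has exactly bits 0, 1, 2, 8, and 9 set, so it names all five features listed in Table 2–16.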

2.15.2 IMPLVER

For the 21264/EV67, the IMPLVER instruction returns the value 2.

2.16 Design Examples

The 21264/EV67 can be designed into many different uniprocessor and multiprocessor system configurations. Figures 2–12 and 2–13 illustrate two possible configurations. These configurations employ additional system/memory controller chipsets.
Figure 2–12 shows a typical uniprocessor system with a second-level cache. This system configuration could be used in standalone or networked workstations.
Figure 2–12 Typical Uniprocessor Configuration
Block diagram not reproduced. Blocks shown: 21264; L2 cache (tag store and data store); 21272 core logic chipset (control chips, data slice chips, host PCI bridge chip); duplicate tag store (optional); DRAM arrays; 64-bit PCI bus.
Figure 2–13 shows a typical multiprocessor system, each processor with a second-level cache. Each interface controller must employ a duplicate tag store to maintain cache coherency. This system configuration could be used in a networked database server application.
Figure 2–13 Typical Multiprocessor Configuration
Block diagram not reproduced. Blocks shown: two 21264 processors; two L2 caches; 21272 core logic chipset (control chip, data slice chips); two host PCI bridge chips; two 64-bit PCI buses; two sets of DRAM arrays.

3 Hardware Interface

This chapter contains the 21264/EV67 microprocessor logic symbol and provides information about signal names, their function, and their location. This chapter also describes the mechanical specifications of the 21264/EV67. It is organized as follows:
The 21264/EV67 logic symbol
The 21264/EV67 signal names and functions
Lists of the signal pins, sorted by name and PGA location
The specifications for the 21264/EV67 mechanical package
The top and bottom views of the 21264/EV67 pinouts

3.1 21264/EV67 Microprocessor Logic Symbol

Figure 3–1 shows the logic symbol for the 21264/EV67 chip.
Figure 3–1 21264/EV67 Microprocessor Logic Symbol
Logic symbol not reproduced. The symbol groups the 21264 pins under four labels: System Interface, Bcache Interface, Clocks, and Miscellaneous. Table 3–2 describes the individual signals.
3.2 21264/EV67 Signal Names and Functions
Table 3–1 defines the 21264/EV67 signal types referred to in this section.
Table 3–1 Signal Pin Types Definitions
Signal Type   Definition
Inputs
I_DC_REF      Input DC reference pin
I_DA          Input differential amplifier receiver
I_DA_CLK      Input clock pin
Outputs
O_OD          Open drain output driver
O_OD_TP       Open drain driver for test pins
O_PP          Push/pull output driver
O_PP_CLK      Push/pull output clock driver
Bidirectional
B_DA_OD       Bidirectional differential amplifier receiver with open drain output
B_DA_PP       Bidirectional differential amplifier receiver with push/pull output
Other
Spare         Reserved to Compaq (see note 1)
NoConnect     No connection. Do not connect to these pins for any revision of the 21264/EV67. These pins must float.
Note 1: All Spare connections are Reserved to Compaq to maintain compatibility between passes of the chip. Designers should not use these pins.
Table 3–2 lists all signal pins in alphabetic order and provides a full functional description of the pins. Table 3–4 lists the signal pins and their corresponding pin grid array (PGA) locations in alphabetic order for the signal type. Table 3–5 lists the pin grid array locations in alphabetical order.
Table 3–2 21264/EV67 Signal Descriptions
Signal                Type      Count Description
BcAdd_H[23:4]         O_PP      20    These signals provide the index to the Bcache.
BcCheck_H[15:0]       B_DA_PP   16    ECC check bits for BcData_H[127:0].
BcData_H[127:0]       B_DA_PP   128   Bcache data signals.
BcDataInClk_H[7:0]    I_DA      8     Bcache data input clocks. These clocks are used with high speed SDRAMs, such as DDRs, that provide a clock-out with data-output pins to optimize Bcache read bandwidths. The 21264/EV67 internally synchronizes the data to its logic with clock forward receive circuits similar to the system interface.
BcDataOE_L            O_PP      1     Bcache data output enable. The 21264/EV67 asserts this signal during Bcache read operations.
BcDataOutClk_H[3:0]   O_PP      8     Bcache data output clocks. These free-running clocks are differential copies of the Bcache clock and are derived from the 21264/EV67 GCLK. Their period is a multiple of the GCLK and is fixed for all operations. They can be configured so that their rising edge lags BcAdd_H[23:4] by 0 to 2 GCLK cycles. The 21264/EV67 synchronizes tag output information with these clocks.
BcDataOutClk_L[3:0]
BcDataWr_L            O_PP      1     Bcache data write enable. The 21264/EV67 asserts this signal when writing data to the Bcache data arrays.
BcLoad_L              O_PP      1     Bcache burst enable.
BcTag_H[42:20]        B_DA_PP   23    Bcache tag bits.
BcTagDirty_H          B_DA_PP   1     Tag dirty state bit. During cache write operations, the 21264/EV67 will assert this signal if the Bcache data has been modified.
BcTagInClk_H          I_DA      1     Bcache tag input clock. The 21264/EV67 uses this input clock to latch the tag information on Bcache read operations. This clock is used with high-speed SDRAMs, such as DDRs, that provide a clock-out with data-output pins to optimize Bcache read bandwidths. The 21264/EV67 internally synchronizes the data to its logic with clock forward receive circuits similar to the system interface.
BcTagOE_L             O_PP      1     Bcache tag output enable. This signal is asserted by the 21264/EV67 for Bcache read operations.
BcTagOutClk_H         O_PP      2     Bcache tag output clock. These clocks “echo” the clock-forwarded BcDataOutClk_x[3:0] clocks.
BcTagOutClk_L
BcTagParity_H         B_DA_PP   1     Tag parity state bit.
BcTagShared_H         B_DA_PP   1     Tag shared state bit. The 21264/EV67 will write a 1 on this signal line if another agent has a copy of the cache line.
BcTagValid_H          B_DA_PP   1     Tag valid state bit. If set, this line indicates that the cache line is valid.
BcTagWr_L             O_PP      1     Tag RAM write enable. The 21264/EV67 asserts this signal when writing a tag to the Bcache tag arrays.
BcVref                I_DC_REF  1     Bcache tag reference voltage.
ClkFwdRst_H           I_DA      1     Systems assert this synchronous signal to wake up a powered-down 21264/EV67. The ClkFwdRst_H signal is clocked into a 21264/EV67 register by the captured FrameClk_x signals. Systems must ensure that the timing of this signal meets 21264/EV67 requirements (see Section 4.7.2).
ClkIn_H               I_DA_CLK  2     Differential input signals provided by the system.
ClkIn_L
DCOK_H                I_DA      1     dc voltage OK. Must be deasserted until dc voltage reaches proper operating level. After that, DCOK_H is asserted.
EV6Clk_H              O_PP_CLK  2     Provides an external test point to measure phase alignment of the PLL.
EV6Clk_L
FrameClk_H            I_DA_CLK  2     A skew-controlled differential 50% duty cycle copy of the system clock. It is used by the 21264/EV67 as a reference, or framing, clock.
FrameClk_L
IRQ_H[5:0]            I_DA      6     These six interrupt signal lines may be asserted by the system. The response of the 21264/EV67 is determined by the system software.
MiscVref              I_DC_REF  1     Voltage reference for the miscellaneous pins (see Table 3–3).
PllBypass_H           I_DA      1     When asserted, this signal will cause the two input clocks (ClkIn_x) to be applied to the 21264/EV67 internal circuits, instead of the 21264/EV67 global clock (GCLK).
PLL_VDD               3.3 V     1     3.3-V dedicated power supply for the 21264/EV67 PLL.
Reset_L               I_DA      1     System reset. This signal protects the 21264/EV67 from damage during initial power-up. It must be asserted until DCOK_H is asserted. After that, it is deasserted and the 21264/EV67 begins its reset sequence.
SromClk_H             O_OD_TP   1     Serial ROM clock. Supplies the clock that causes the SROM to advance to the next bit. The cycle time for this clock is 256 times the cycle time of the GCLK (internal 21264/EV67 clock).
SromData_H            I_DA      1     Serial ROM data. Input data line from the SROM.
SromOE_L              O_OD_TP   1     Serial ROM enable. Supplies the output enable to the SROM.
SysAddIn_L[14:0]      I_DA      15    Time-multiplexed command/address/ID/Ack from system to the 21264/EV67.
SysAddInClk_L         I_DA      1     Single-ended forwarded clock from system for SysAddIn_L[14:0] and SysFillValid_L.
SysAddOut_L[14:0]     O_OD      15    Time-multiplexed command/address/ID/mask from the 21264/EV67 to the system bus.
SysAddOutClk_L        O_OD      1     Single-ended forwarded clock output for SysAddOut_L[14:0].
SysCheck_L[7:0]       B_DA_OD   8     Quadword ECC check bits for SysData_L[63:0].
SysData_L[63:0]       B_DA_OD   64    Data bus for memory and I/O data.
SysDataInClk_H[7:0]   I_DA      8     Single-ended system-generated clocks for clock forwarded input system data.
SysDataInValid_L      I_DA      1     When asserted, marks a valid data cycle for data transfers to the 21264/EV67.
SysDataOutClk_L[7:0]  O_OD      8     Single-ended 21264/EV67-generated clocks for clock forwarded output system data.
SysDataOutValid_L     I_DA      1     When asserted, marks a valid data cycle for data transfers from the 21264/EV67.
SysFillValid_L        I_DA      1     When asserted, this bit indicates validation for the cache fill delivered in the previous system SysDc command.
SysVref               I_DC_REF  1     System interface reference voltage.
Tck_H                 I_DA      1     IEEE 1149.1 test clock.
Tdi_H                 I_DA      1     IEEE 1149.1 test data-in signal.
Tdo_H                 O_OD_TP   1     IEEE 1149.1 test data-out signal.
TestStat_H            O_OD_TP   1     Test status pin. System reset drives the test status pin low. The TestStat_H pin is forced high at the start of the Icache BiST. If the Icache BiST passes, the pin is deasserted at the end of the BiST operation; otherwise, it remains high. The 21264/EV67 generates a timeout reset signal if an instruction is not retired within one billion cycles. The 21264/EV67 signals the timeout reset event by outputting a 256 GCLK cycle wide pulse on TestStat_H.
Tms_H                 I_DA      1     IEEE 1149.1 test mode select signal.
Trst_L                I_DA      1     IEEE 1149.1 test access port (TAP) reset signal.
Table 3–3 lists signals by function and provides an abbreviated description.
Table 3–3 21264/EV67 Signal Descriptions by Function
Signal Type Count Description
BcVref Domain BcAdd_H[23:4] O_PP 20 Bcache index. BcCheck_H[15:0] B_DA_PP 16 ECC check bits for BcData_H[127:0]. BcData_H[127:0] B_DA_PP 128 Bcache data. BcDataInClk_H[7:0] I_DA 8 Bcache data input clocks. BcDataOE_L O_PP 1 Bcache data output enable. BcDataOutClk_H[3:0]
BcDataOutClk_L[3:0] BcDataWr_L O_PP 1 Bcache data write enable. BcLoad_L O_PP 1 Bcache burst enable. BcTag_H[42:20] B_DA_PP 23 Bcache tag bits. BcTagDirty_H B_DA_PP 1 Tag dirty state bit. BcTagInClk_H I_DA 1 Bcache tag input clock. BcTagOE_L O_PP 1 Bcache tag output enable.
O_PP 8 Bcache data output clocks.
BcTagOutClk_H BcTagOutClk_L
BcTagParity_H B_DA_PP 1 Tag parity state bit. BcTagShared_H B_DA_PP 1 Tag shared state bit. BcTagValid_H B_DA_PP 1 Tag valid state bit. BcTagWr_L O_PP 1 Tag RAM write enable.
3–6 Hardware Interface
O_PP 2 Bcache tag output clocks.
Alpha 21264/EV67 Hardware Reference Manual
21264/EV67 Signal Names and Functions
Table 3–3 21264/EV67 Signal Descriptions by Function (Continued)
Signal Type Count Description
BcVref I_DC_REF 1 Tag data input reference voltage.
SysVref Domain
SysAddIn_L[14:0] I_DA 15 Time-multiplexed SysAddIn, system-to-21264/EV67.
SysAddInClk_L I_DA 1 Single-ended forwarded clock from system for SysAddIn_L[14:0] and SysFillValid_L.
SysAddOut_L[14:0] O_OD 15 Time-multiplexed SysAddOut, 21264/EV67-to-system.
SysAddOutClk_L O_OD 1 Single-ended forwarded clock.
SysCheck_L[7:0] B_DA_OD 8 Quadword ECC check bits for SysData_L[63:0].
SysData_L[63:0] B_DA_OD 64 Data bus for memory and I/O data.
SysDataInClk_H[7:0] I_DA 8 Single-ended system-generated clocks for clock-forwarded input system data.
SysDataInValid_L I_DA 1 When asserted, marks a valid data cycle for data transfers to the 21264/EV67.
SysDataOutClk_L[7:0] O_OD 8 Single-ended 21264/EV67-generated clocks for clock-forwarded output system data.
SysDataOutValid_L I_DA 1 When asserted, marks a valid data cycle for data transfers from the 21264/EV67.
SysFillValid_L I_DA 1 Validation for fill given in previous SysDC command.
SysVref I_DC_REF 1 System interface reference voltage.
Clocks and PLL
ClkIn_H, ClkIn_L I_DA_CLK 2 Differential input signals provided by the system.
EV6Clk_H, EV6Clk_L O_PP_CLK 2 Provides an external test point to measure phase alignment of the PLL.
FrameClk_H, FrameClk_L I_DA_CLK 2 A skew-controlled differential 50% duty cycle copy of the system clock. It is used by the 21264/EV67 as a reference, or framing, clock.
PLL_VDD 3.3 V 1 3.3-V dedicated power supply for the 21264/EV67 PLL.
MiscVref Domain
ClkFwdRst_H I_DA 1 Systems assert this synchronous signal to wake up a powered-down 21264/EV67. The ClkFwdRst_H signal is clocked in to a 21264/EV67 register by the captured FrameClk_x signals.
DCOK_H I_DA 1 dc voltage OK. Must be deasserted until dc voltage reaches proper operating level. After that, DCOK_H is asserted.
IRQ_H[5:0] I_DA 6 These six interrupt signal lines may be asserted by the system.
MiscVref I_DC_REF 1 Reference voltage for miscellaneous pins.
PllBypass_H I_DA 1 When asserted, this signal will cause the input clocks (ClkIn_x) to be applied to the 21264/EV67 internal circuits, instead of the 21264/EV67's global clock (GCLK).
Reset_L I_DA 1 System reset. This signal protects the 21264/EV67 from damage during initial power-up. It must be asserted until DCOK_H is asserted. After that, it is deasserted and the 21264/EV67 begins its reset sequence.
SromClk_H O_OD_TP 1 Serial ROM clock.
SromData_H I_DA 1 Serial ROM data.
SromOE_L O_OD_TP 1 Serial ROM enable.
Tck_H I_DA 1 IEEE 1149.1 test clock.
Tdi_H I_DA 1 IEEE 1149.1 test data-in signal.
Tdo_H O_OD_TP 1 IEEE 1149.1 test data-out signal.
TestStat_H O_OD_TP 1 Test status pin.
Tms_H I_DA 1 IEEE 1149.1 test mode select signal.
Trst_L I_DA 1 IEEE 1149.1 test access port (TAP) reset signal.
3.3 Pin Assignments
The 21264/EV67 package has 587 pins aligned in a pin grid array (PGA) design. There are 380 functional signal pins, 1 dedicated 3.3-V pin for the PLL, 112 ground VSS pins, and 94 VDD pins. Table 3–4 lists the signal pins and their corresponding pin grid array (PGA) locations in alphabetical order for the signal type. Table 3–5 lists the pin grid array locations in alphabetical order.
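As a quick cross-check of the counts above, and of the ordering that Tables 3–4 and 3–5 appear to follow, the sketch below sums the pin budget and sorts a few names the way the tables list them. The function names are illustrative only, and the case-insensitive, character-by-character ordering is an inference from the table contents, not something this manual states.

```python
# Pin counts quoted in Section 3.3.
PIN_BUDGET = {"signal": 380, "pll_vdd": 1, "vss": 112, "vdd": 94}


def total_pins(budget):
    """Sum the per-category pin counts."""
    return sum(budget.values())


def table_order(names):
    """Order names character by character, ignoring case.

    Assumption: this matches the tables, where A11 precedes A5 and
    BcData_H_99 precedes BcDataInClk_H_0.
    """
    return sorted(names, key=str.lower)


if __name__ == "__main__":
    print(total_pins(PIN_BUDGET))
    print(table_order(["A5", "A11", "A7", "A41"]))
    print(table_order(["BcDataInClk_H_0", "BcData_H_99", "BcData_H_9"]))
```

Because the ordering is not numeric, a lookup tool that expects A5 to precede A11 would need its own sort key.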
Table 3–4 Pin List Sorted by Signal Name
Signal Name PGA Location Signal Name PGA Location Signal Name PGA Location
BcAdd_H_10 B30   BcAdd_H_11 D30   BcAdd_H_12 C31
BcAdd_H_13 H28   BcAdd_H_14 G29   BcAdd_H_15 A33
BcAdd_H_16 E31   BcAdd_H_17 D32   BcAdd_H_18 B34
BcAdd_H_19 A35   BcAdd_H_20 B36   BcAdd_H_21 H30
BcAdd_H_22 C35   BcAdd_H_23 E33   BcAdd_H_4 B28
BcAdd_H_5 E27   BcAdd_H_6 A29   BcAdd_H_7 G27
BcAdd_H_8 C29   BcAdd_H_9 F28   BcCheck_H_0 F2
BcCheck_H_1 AB4   BcCheck_H_10 AW1   BcCheck_H_11 BD10
BcCheck_H_12 E45   BcCheck_H_13 AC45   BcCheck_H_14 AT44
BcCheck_H_15 BB36   BcCheck_H_2 AT2   BcCheck_H_3 BC11
BcCheck_H_4 M38   BcCheck_H_5 AB42   BcCheck_H_6 AU43
BcCheck_H_7 BC37   BcCheck_H_8 M8   BcCheck_H_9 AA3
BcData_H_0 B10   BcData_H_1 D10   BcData_H_10 L3
BcData_H_100 D42   BcData_H_101 D44   BcData_H_102 H40
BcData_H_103 H42   BcData_H_104 G45   BcData_H_105 L43
BcData_H_106 L45   BcData_H_107 N45   BcData_H_108 T44
BcData_H_109 U45   BcData_H_11 M2   BcData_H_110 W45
BcData_H_111 AA43   BcData_H_112 AC43   BcData_H_113 AD44
BcData_H_114 AE41   BcData_H_115 AG45   BcData_H_116 AK44
BcData_H_117 AL43   BcData_H_118 AM42   BcData_H_119 AR45
BcData_H_12 T2   BcData_H_120 AP40   BcData_H_121 BA45
BcData_H_122 AV42   BcData_H_123 BB44   BcData_H_124 BB42
BcData_H_125 BC41   BcData_H_126 BA37   BcData_H_127 BD40
BcData_H_13 U1   BcData_H_14 V2   BcData_H_15 Y4
BcData_H_16 AC1   BcData_H_17 AD2   BcData_H_18 AE3
BcData_H_19 AG1   BcData_H_2 A5   BcData_H_20 AK2
BcData_H_21 AL3   BcData_H_22 AR1   BcData_H_23 AP2
BcData_H_24 AY2   BcData_H_25 BB2   BcData_H_26 AW5
BcData_H_27 BB4   BcData_H_28 BB8   BcData_H_29 BE5
BcData_H_3 C5   BcData_H_30 BB10   BcData_H_31 BE7
BcData_H_32 G33   BcData_H_33 C37   BcData_H_34 B40
BcData_H_35 C41   BcData_H_36 C43   BcData_H_37 E43
BcData_H_38 G41   BcData_H_39 F44   BcData_H_4 C3
BcData_H_40 K44   BcData_H_41 N41   BcData_H_42 M44
BcData_H_43 P42   BcData_H_44 U43   BcData_H_45 V44
BcData_H_46 Y42   BcData_H_47 AB44   BcData_H_48 AD42
BcData_H_49 AE43   BcData_H_5 E3   BcData_H_50 AF42
BcData_H_51 AJ45   BcData_H_52 AK42   BcData_H_53 AN45
BcData_H_54 AP44   BcData_H_55 AN41   BcData_H_56 AW45
BcData_H_57 AU41   BcData_H_58 AY44   BcData_H_59 BA43
BcData_H_6 H6   BcData_H_60 BC43   BcData_H_61 BD42
BcData_H_62 BB38   BcData_H_63 BE41   BcData_H_64 C11
BcData_H_65 A7   BcData_H_66 C9   BcData_H_67 B6
BcData_H_68 B4   BcData_H_69 D4   BcData_H_7 E1
BcData_H_70 G5   BcData_H_71 D2   BcData_H_72 H4
BcData_H_73 G1   BcData_H_74 N5   BcData_H_75 L1
BcData_H_76 N1   BcData_H_77 U3   BcData_H_78 W5
BcData_H_79 W1   BcData_H_8 J3   BcData_H_80 AB2
BcData_H_81 AC3   BcData_H_82 AD4   BcData_H_83 AF4
BcData_H_84 AJ3   BcData_H_85 AK4   BcData_H_86 AN1
BcData_H_87 AM4   BcData_H_88 AU5   BcData_H_89 BA1
BcData_H_9 K2   BcData_H_90 BA3   BcData_H_91 BC3
BcData_H_92 BD6   BcData_H_93 BA9   BcData_H_94 BC9
BcData_H_95 AY12   BcData_H_96 A39   BcData_H_97 D36
BcData_H_98 A41   BcData_H_99 B42   BcDataInClk_H_0 E7
BcDataInClk_H_1 R3   BcDataInClk_H_2 AH2   BcDataInClk_H_3 BC5
BcDataInClk_H_4 F38   BcDataInClk_H_5 U39   BcDataInClk_H_6 AH44
BcDataInClk_H_7 AY40   BcDataOE_L A27   BcDataOutClk_H_0 J5
BcDataOutClk_H_1 AU3   BcDataOutClk_H_2 J43   BcDataOutClk_H_3 AR43
BcDataOutClk_L_0 K4   BcDataOutClk_L_1 AV4   BcDataOutClk_L_2 K42
BcDataOutClk_L_3 AT42   BcDataWr_L D26   BcLoad_L F26
BcTag_H_20 E13   BcTag_H_21 H16   BcTag_H_22 A11
BcTag_H_23 B12   BcTag_H_24 D14   BcTag_H_25 E15
BcTag_H_26 A13   BcTag_H_27 G17   BcTag_H_28 C15
BcTag_H_29 H18   BcTag_H_30 D16   BcTag_H_31 B16
BcTag_H_32 C17   BcTag_H_33 A17   BcTag_H_34 E19
BcTag_H_35 B18   BcTag_H_36 A19   BcTag_H_37 F20
BcTag_H_38 D20   BcTag_H_39 E21   BcTag_H_40 C21
BcTag_H_41 D22   BcTag_H_42 H22   BcTagDirty_H C23
BcTagInClk_H G19   BcTagOE_L H24   BcTagOutClk_H C25
BcTagOutClk_L D24   BcTagParity_H B22   BcTagShared_H G23
BcTagValid_H B24   BcTagWr_L E25   BcVref F18
ClkFwdRst_H BE11   ClkIn_H AM8   ClkIn_L AN7
DCOK_H AY18   EV6Clk_H AM6   EV6Clk_L AL7
FrameClk_H AV16   FrameClk_L AW15   IRQ_H_0 BA15
IRQ_H_1 BE13   IRQ_H_2 AW17   IRQ_H_3 AV18
IRQ_H_4 BC15   IRQ_H_5 BB16   MiscVref AV22
NoConnect BB14   NoConnect BD2   PLL_VDD AV8
PllBypass_H BD12   Reset_L BD16   Spare AJ1
Spare V38   Spare AT4   Spare BE9
Spare F8   Spare BD4   Spare AJ43
Spare AR3   Spare T4   Spare E39
Spare BA39   Spare BC21   SromClk_H AW19
SromData_H BC17   SromOE_L BE17   SysAddIn_L_0 BD30
SysAddIn_L_1 BC29   SysAddIn_L_10 BB24   SysAddIn_L_11 AV24
SysAddIn_L_12 BD24   SysAddIn_L_13 BE23   SysAddIn_L_14 AW23
SysAddIn_L_2 AY28   SysAddIn_L_3 BE29   SysAddIn_L_4 AW27
SysAddIn_L_5 BA27   SysAddIn_L_6 BD28   SysAddIn_L_7 BE27
SysAddIn_L_8 AY26   SysAddIn_L_9 BC25   SysAddInClk_L BB26
SysAddOut_L_0 AW33   SysAddOut_L_1 BE39   SysAddOut_L_10 BE33
SysAddOut_L_11 AW29   SysAddOut_L_12 BC31   SysAddOut_L_13 AV28
SysAddOut_L_14 BB30   SysAddOut_L_2 BD36   SysAddOut_L_3 BC35
SysAddOut_L_4 BA33   SysAddOut_L_5 AY32   SysAddOut_L_6 BE35
SysAddOut_L_7 AV30   SysAddOut_L_8 BB32   SysAddOut_L_9 BA31
SysAddOutClk_L BD34   SysCheck_L_0 L7   SysCheck_L_1 AA5
SysCheck_L_2 AK8   SysCheck_L_3 BA13   SysCheck_L_4 L39
SysCheck_L_5 AA41   SysCheck_L_6 AM40   SysCheck_L_7 AY34
SysData_L_0 F14   SysData_L_1 G13   SysData_L_10 P6
SysData_L_11 T8   SysData_L_12 V8   SysData_L_13 V6
SysData_L_14 W7   SysData_L_15 Y6   SysData_L_16 AB8
SysData_L_17 AC7   SysData_L_18 AD8   SysData_L_19 AE5
SysData_L_2 F12   SysData_L_20 AH6   SysData_L_21 AH8
SysData_L_22 AJ7   SysData_L_23 AL5   SysData_L_24 AP8
SysData_L_25 AR7   SysData_L_26 AT8   SysData_L_27 AV6
SysData_L_28 AV10   SysData_L_29 AW11   SysData_L_3 H12
SysData_L_30 AV12   SysData_L_31 AW13   SysData_L_32 F32
SysData_L_33 F34   SysData_L_34 H34   SysData_L_35 G35
SysData_L_36 F40   SysData_L_37 G39   SysData_L_38 K38
SysData_L_39 J41   SysData_L_4 H10   SysData_L_40 M40
SysData_L_41 N39   SysData_L_42 P40   SysData_L_43 T38
SysData_L_44 V40   SysData_L_45 W41   SysData_L_46 W39
SysData_L_47 Y40   SysData_L_48 AB38   SysData_L_49 AC39
SysData_L_5 G7   SysData_L_50 AD38   SysData_L_51 AF40
SysData_L_52 AH38   SysData_L_53 AJ39   SysData_L_54 AL41
SysData_L_55 AK38   SysData_L_56 AN39   SysData_L_57 AP38
SysData_L_58 AR39   SysData_L_59 AT38   SysData_L_6 F6
SysData_L_60 AY38   SysData_L_61 AV36   SysData_L_62 AW35
SysData_L_63 AV34   SysData_L_7 K8   SysData_L_8 M6
SysData_L_9 N7   SysDataInClk_H_0 D8   SysDataInClk_H_1 P4
SysDataInClk_H_2 AF6   SysDataInClk_H_3 AY6   SysDataInClk_H_4 E37
SysDataInClk_H_5 R43   SysDataInClk_H_6 AG41   SysDataInClk_H_7 AV40
SysDataInValid_L BD22   SysDataOutClk_L_0 G11   SysDataOutClk_L_1 U7
SysDataOutClk_L_2 AG7   SysDataOutClk_L_3 AY8   SysDataOutClk_L_4 H36
SysDataOutClk_L_5 R41   SysDataOutClk_L_6 AH40   SysDataOutClk_L_7 AW39
SysDataOutValid_L BB22   SysFillValid_L BC23   SysVref BA25
Tck_H BE19   Tdi_H BA21   Tdo_H BB20
TestStat_H BA19   Tms_H BD18   Trst_L AY20
Table 3–5 Pin List Sorted by PGA Location
PGA Location Signal Name PGA Location Signal Name PGA Location Signal Name
A11 BcTag_H_22   A13 BcTag_H_26   A17 BcTag_H_33
A19 BcTag_H_36   A27 BcDataOE_L   A29 BcAdd_H_6
A33 BcAdd_H_15   A35 BcAdd_H_19   A39 BcData_H_96
A41 BcData_H_98   A5 BcData_H_2   A7 BcData_H_65
AA3 BcCheck_H_9   AA41 SysCheck_L_5   AA43 BcData_H_111
AA5 SysCheck_L_1   AB2 BcData_H_80   AB38 SysData_L_48
AB4 BcCheck_H_1   AB42 BcCheck_H_5   AB44 BcData_H_47
AB8 SysData_L_16   AC1 BcData_H_16   AC3 BcData_H_81
AC39 SysData_L_49   AC43 BcData_H_112   AC45 BcCheck_H_13
AC7 SysData_L_17   AD2 BcData_H_17   AD38 SysData_L_50
AD4 BcData_H_82   AD42 BcData_H_48   AD44 BcData_H_113
AD8 SysData_L_18   AE3 BcData_H_18   AE41 BcData_H_114
AE43 BcData_H_49   AE5 SysData_L_19   AF4 BcData_H_83
AF40 SysData_L_51   AF42 BcData_H_50   AF6 SysDataInClk_H_2
AG1 BcData_H_19   AG41 SysDataInClk_H_6   AG45 BcData_H_115
AG7 SysDataOutClk_L_2   AH2 BcDataInClk_H_2   AH38 SysData_L_52
AH40 SysDataOutClk_L_6   AH44 BcDataInClk_H_6   AH6 SysData_L_20
AH8 SysData_L_21   AJ1 Spare   AJ3 BcData_H_84
AJ39 SysData_L_53   AJ43 Spare   AJ45 BcData_H_51
AJ7 SysData_L_22   AK2 BcData_H_20   AK38 SysData_L_55
AK4 BcData_H_85   AK42 BcData_H_52   AK44 BcData_H_116
AK8 SysCheck_L_2   AL3 BcData_H_21   AL41 SysData_L_54
AL43 BcData_H_117   AL5 SysData_L_23   AL7 EV6Clk_L
AM4 BcData_H_87   AM40 SysCheck_L_6   AM42 BcData_H_118
AM6 EV6Clk_H   AM8 ClkIn_H   AN1 BcData_H_86
AN39 SysData_L_56   AN41 BcData_H_55   AN45 BcData_H_53
AN7 ClkIn_L   AP2 BcData_H_23   AP38 SysData_L_57
AP40 BcData_H_120   AP44 BcData_H_54   AP8 SysData_L_24
AR1 BcData_H_22   AR3 Spare   AR39 SysData_L_58
AR43 BcDataOutClk_H_3   AR45 BcData_H_119   AR7 SysData_L_25
AT2 BcCheck_H_2   AT38 SysData_L_59   AT4 Spare
AT42 BcDataOutClk_L_3   AT44 BcCheck_H_14   AT8 SysData_L_26
AU3 BcDataOutClk_H_1   AU41 BcData_H_57   AU43 BcCheck_H_6
AU5 BcData_H_88   AV10 SysData_L_28   AV12 SysData_L_30
AV16 FrameClk_H   AV18 IRQ_H_3   AV22 MiscVref
AV24 SysAddIn_L_11   AV28 SysAddOut_L_13   AV30 SysAddOut_L_7
AV34 SysData_L_63   AV36 SysData_L_61   AV4 BcDataOutClk_L_1
AV40 SysDataInClk_H_7   AV42 BcData_H_122   AV6 SysData_L_27
AV8 PLL_VDD   AW1 BcCheck_H_10   AW11 SysData_L_29
AW13 SysData_L_31   AW15 FrameClk_L   AW17 IRQ_H_2
AW19 SromClk_H   AW23 SysAddIn_L_14   AW27 SysAddIn_L_4
AW29 SysAddOut_L_11   AW33 SysAddOut_L_0   AW35 SysData_L_62
AW39 SysDataOutClk_L_7   AW45 BcData_H_56   AW5 BcData_H_26
AY12 BcData_H_95   AY18 DCOK_H   AY2 BcData_H_24
AY20 Trst_L   AY26 SysAddIn_L_8   AY28 SysAddIn_L_2
AY32 SysAddOut_L_5   AY34 SysCheck_L_7   AY38 SysData_L_60
AY40 BcDataInClk_H_7   AY44 BcData_H_58   AY6 SysDataInClk_H_3
AY8 SysDataOutClk_L_3   B10 BcData_H_0   B12 BcTag_H_23
B16 BcTag_H_31   B18 BcTag_H_35   B22 BcTagParity_H
B24 BcTagValid_H   B28 BcAdd_H_4   B30 BcAdd_H_10
B34 BcAdd_H_18   B36 BcAdd_H_20   B4 BcData_H_68
B40 BcData_H_34   B42 BcData_H_99   B6 BcData_H_67
BA1 BcData_H_89   BA13 SysCheck_L_3   BA15 IRQ_H_0
BA19 TestStat_H   BA21 Tdi_H   BA25 SysVref
BA27 SysAddIn_L_5   BA3 BcData_H_90   BA31 SysAddOut_L_9
BA33 SysAddOut_L_4   BA37 BcData_H_126   BA39 Spare
BA43 BcData_H_59   BA45 BcData_H_121   BA9 BcData_H_93
BB10 BcData_H_30   BB14 NoConnect   BB16 IRQ_H_5
BB2 BcData_H_25   BB20 Tdo_H   BB22 SysDataOutValid_L
BB24 SysAddIn_L_10   BB26 SysAddInClk_L   BB30 SysAddOut_L_14
BB32 SysAddOut_L_8   BB36 BcCheck_H_15   BB38 BcData_H_62
BB4 BcData_H_27   BB42 BcData_H_124   BB44 BcData_H_123
BB8 BcData_H_28   BC11 BcCheck_H_3   BC15 IRQ_H_4
BC17 SromData_H   BC21 Spare   BC23 SysFillValid_L
BC25 SysAddIn_L_9   BC29 SysAddIn_L_1   BC3 BcData_H_91
BC31 SysAddOut_L_12   BC35 SysAddOut_L_3   BC37 BcCheck_H_7
BC41 BcData_H_125   BC43 BcData_H_60   BC5 BcDataInClk_H_3
BC9 BcData_H_94   BD10 BcCheck_H_11   BD12 PllBypass_H
BD16 Reset_L   BD18 Tms_H   BD2 NoConnect
BD22 SysDataInValid_L   BD24 SysAddIn_L_12   BD28 SysAddIn_L_6
BD30 SysAddIn_L_0   BD34 SysAddOutClk_L   BD36 SysAddOut_L_2
BD4 Spare   BD40 BcData_H_127   BD42 BcData_H_61
BD6 BcData_H_92   BE11 ClkFwdRst_H   BE13 IRQ_H_1
BE17 SromOE_L   BE19 Tck_H   BE23 SysAddIn_L_13
BE27 SysAddIn_L_7   BE29 SysAddIn_L_3   BE33 SysAddOut_L_10
BE35 SysAddOut_L_6   BE39 SysAddOut_L_1   BE41 BcData_H_63
BE5 BcData_H_29   BE7 BcData_H_31   BE9 Spare
C11 BcData_H_64   C15 BcTag_H_28   C17 BcTag_H_32
C21 BcTag_H_40   C23 BcTagDirty_H   C25 BcTagOutClk_H
C29 BcAdd_H_8   C3 BcData_H_4   C31 BcAdd_H_12
C35 BcAdd_H_22   C37 BcData_H_33   C41 BcData_H_35
C43 BcData_H_36   C5 BcData_H_3   C9 BcData_H_66
D10 BcData_H_1   D14 BcTag_H_24   D16 BcTag_H_30
D2 BcData_H_71   D20 BcTag_H_38   D22 BcTag_H_41
D24 BcTagOutClk_L   D26 BcDataWr_L   D30 BcAdd_H_11
D32 BcAdd_H_17   D36 BcData_H_97   D4 BcData_H_69
D42 BcData_H_100   D44 BcData_H_101   D8 SysDataInClk_H_0
E1 BcData_H_7   E13 BcTag_H_20   E15 BcTag_H_25
E19 BcTag_H_34   E21 BcTag_H_39   E25 BcTagWr_L
E27 BcAdd_H_5   E3 BcData_H_5   E31 BcAdd_H_16
E33 BcAdd_H_23   E37 SysDataInClk_H_4   E39 Spare
E43 BcData_H_37   E45 BcCheck_H_12   E7 BcDataInClk_H_0
F12 SysData_L_2   F14 SysData_L_0   F18 BcVref
F2 BcCheck_H_0   F20 BcTag_H_37   F26 BcLoad_L
F28 BcAdd_H_9   F32 SysData_L_32   F34 SysData_L_33
F38 BcDataInClk_H_4   F40 SysData_L_36   F44 BcData_H_39
F6 SysData_L_6   F8 Spare   G1 BcData_H_73
G11 SysDataOutClk_L_0   G13 SysData_L_1   G17 BcTag_H_27
G19 BcTagInClk_H   G23 BcTagShared_H   G27 BcAdd_H_7
G29 BcAdd_H_14   G33 BcData_H_32   G35 SysData_L_35
G39 SysData_L_37   G41 BcData_H_38   G45 BcData_H_104
G5 BcData_H_70   G7 SysData_L_5   H10 SysData_L_4
H12 SysData_L_3   H16 BcTag_H_21   H18 BcTag_H_29
H22 BcTag_H_42   H24 BcTagOE_L   H28 BcAdd_H_13
H30 BcAdd_H_21   H34 SysData_L_34   H36 SysDataOutClk_L_4
H4 BcData_H_72   H40 BcData_H_102   H42 BcData_H_103
H6 BcData_H_6   J3 BcData_H_8   J41 SysData_L_39
J43 BcDataOutClk_H_2   J5 BcDataOutClk_H_0   K2 BcData_H_9
K38 SysData_L_38   K4 BcDataOutClk_L_0   K42 BcDataOutClk_L_2
K44 BcData_H_40   K8 SysData_L_7   L1 BcData_H_75
L3 BcData_H_10   L39 SysCheck_L_4   L43 BcData_H_105
L45 BcData_H_106   L7 SysCheck_L_0   M2 BcData_H_11
M38 BcCheck_H_4   M40 SysData_L_40   M44 BcData_H_42
M6 SysData_L_8   M8 BcCheck_H_8   N1 BcData_H_76
N39 SysData_L_41   N41 BcData_H_41   N45 BcData_H_107
N5 BcData_H_74   N7 SysData_L_9   P4 SysDataInClk_H_1
P40 SysData_L_42   P42 BcData_H_43   P6 SysData_L_10
R3 BcDataInClk_H_1   R41 SysDataOutClk_L_5   R43 SysDataInClk_H_5
T2 BcData_H_12   T38 SysData_L_43   T4 Spare
T44 BcData_H_108   T8 SysData_L_11   U1 BcData_H_13
U3 BcData_H_77   U39 BcDataInClk_H_5   U43 BcData_H_44
U45 BcData_H_109   U7 SysDataOutClk_L_1   V2 BcData_H_14
V38 Spare   V40 SysData_L_44   V44 BcData_H_45
V6 SysData_L_13   V8 SysData_L_12   W1 BcData_H_79
W39 SysData_L_46   W41 SysData_L_45   W45 BcData_H_110
W5 BcData_H_78   W7 SysData_L_14   Y4 BcData_H_15
Y40 SysData_L_47   Y42 BcData_H_46   Y6 SysData_L_15
Table 3–6 lists the 21264/EV67 ground and power (VSS and VDD, respectively) pin list.
Table 3–6 Ground and Power (VSS and VDD) Pin List
Signal PGA Location
VSS A15 A21 A25 A3 A31 A37 A43 A9 AA1 AA39
AA45 AA7 AC41 AC5 AE1 AE39 AE45 AE7 AG3 AG39
AG43 AG5 AJ41 AJ5 AL1 AL39 AL45 AN3 AN43 AN5
AR41 AR5 AU1 AU39 AU45 AU7 AW21 AW25 AW3 AW31
AW37 AW41 AW43 AW7 AW9 AY14 BA11 BA17 BA23 BA29
BA35 BA41 BA5 BA7 BC1 BC13 BC19 BC27 BC33 BC39
BC45 BC7 BE15 BE21 BE25 BE3 BE31 BE37 BE43 C1
C13 C19 C27 C33 C39 C45 C7 D38 E11 E17
E23 E29 E35 E41 E5 E9 G15 G21 G25 G3
G31 G37 G43 G9 J1 J39 J45 J7 L41 L5
N3 N43 R1 R39 R45 R5 R7 T42 U41 U5
W3 W43
VDD A23 AB40 AB6 AD40 AD6 AF2 AF38 AF44 AF8 AH4
AH42 AK40 AK6 AM2 AM38 AM44 AP4 AP42 AP6 AT40
AT6 AV14 AV2 AV20 AV26 AV32 AV38 AV44 AY10 AY16
AY22 AY24 AY30 AY36 AY4 AY42 B14 B2 B20 B26
B32 B38 B44 B8 BB12 BB18 BB28 BB34 BB40 BB6
BD14 BD20 BD26 BD32 BD38 BD44 BD8 D12 D18 D28
D34 D40 D6 F10 F16 F22 F24 F30 F36 F4
F42 H14 H2 H20 H26 H32 H38 H44 K40 K6
M4 M42 P2 P38 P44 P8 T40 T6 V4 V42
Y2 Y38 Y44 Y8
3.4 Mechanical Specifications
This section shows the 21264/EV67 mechanical package dimensions without a heat sink. For heat sink information and dimensions, refer to Chapter 10.
Figure 3–2 shows the package physical dimensions without a heat sink.
Figure 3–2 Package Dimensions
[Package outline drawing not reproduced. It shows the 587-pin grid (rows A through BE, columns 01 through 45), the lid, four standoffs, and two 1/4-20 studs.]
Dimension callouts in the drawing: 2.54 mm (.100 in) Typ; 27.94 mm (1.100 in), twice; 587x 1.40 mm (.055 in) Typ; 1.27 mm (.050 in) Typ, twice; .13 mm (.005 in) R; 4.32 mm (.170 in) Typ; 1.377 mm (.055 in) Typ; 7.62 mm (.300 in) Typ; 1.905 mm (.075 in) Typ; 59.94 mm (2.360 in) Typ; 29.62 mm (1.180 in) Typ, twice; 25.40 mm (1.000 in) Typ; 53.85 mm (2.120 in) Typ.
3.5 21264/EV67 Packaging
Figure 3–3 shows the 21264/EV67 pinout from the top view with pins facing down.
Figure 3–3 21264/EV67 Top View (Pin Down)
[Pin grid diagram not reproduced: rows A through BE and columns 01 through 45, seen from the top with the pins facing down.]
Figure 3–4 shows the 21264/EV67 pinout from the bottom view with pins facing up.
Figure 3–4 21264/EV67 Bottom View (Pin Up)
[Pin grid diagram not reproduced: rows A through BE and columns 01 through 45, seen from the bottom with the pins facing up.]
4 Cache and External Interfaces

This chapter describes the 21264/EV67 cache and external interface, which includes the second-level cache (Bcache) interface and the system interface. It also describes locks, interrupt signals, and ECC/parity generation. It is organized as follows:
Introduction to the external interfaces
Physical address considerations
Bcache structure
Victim data buffer
Cache coherency
Lock mechanism
System port
Bcache port
Interrupts
Chapter 3 lists and defines all 21264/EV67 hardware interface signal pins. Chapter 9 describes the 21264/EV67 hardware interface electrical requirements.

4.1 Introduction to the External Interfaces

A 21264/EV67-based system can be divided into three major sections:
21264/EV67 microprocessor
Second-level Bcache
System interface logic
– Optional duplicate tag store
– Optional lock register
– Optional victim buffers
The 21264/EV67 external interface is flexible and mandates few design rules, allowing a wide range of prospective systems. The external interface is composed of the Bcache interface and the system interface.
Input clocks must have the same frequency as their corresponding output clock. For example, the frequency of SysAddInClk_L must be the same as SysAddOutClk_L.
The Bcache interface includes a 128-bit bidirectional data bus, a 20-bit unidirectional address bus, and several control signals.
– The BcDataOutClk_x[3:0] clocks are free-running and are derived from the internal GCLK. The period of BcDataOutClk_x[3:0] is a programmable multiple of GCLK.
– The Bcache turns the BcDataOutClk_x[3:0] clocks around and returns them to the 21264/EV67 as BcDataInClk_H[7:0]. Likewise, BcTagOutClk_x returns as BcTagInClk_H.
– The Bcache interface supports a 64-byte block size.
The system interface includes a 64-bit bidirectional data bus, two 15-bit unidirectional address buses, and several control signals.
– The SysAddOutClk_L clock is free-running and is derived from the internal GCLK. The period of SysAddOutClk_L is a programmable multiple of GCLK.
– The SysAddInClk_L clock is a turned-around copy of SysAddOutClk_L.
Figure 4–1 shows a simplified view of the external interface. The function and purpose of each signal is described in Chapter 3.
Figure 4–1 21264/EV67 System and Bcache Interfaces
[Block diagram not reproduced. It shows the 21264 between the System and the Bcache. The Bcache is drawn as Data, Tag, and Status arrays, labeled with address bits [23:4], [23:6], and [23:6].]
System-side signals: SysAddIn_L[14:0], SysAddInClk_L, SysAddOut_L[14:0], SysAddOutClk_L, SysVref, SysData_L[63:0], SysCheck_L[7:0], SysDataInClk_H[7:0], SysDataOutClk_L[7:0], SysDataInValid_L, SysDataOutValid_L, SysFillValid_L.
Bcache-side signals: BcAdd_H[23:4], BcLoad_L, BcData_H[127:0], BcCheck_H[15:0], BcDataInClk_H[7:0], BcDataOutClk_x[3:0], BcDataOE_L, BcDataWr_L, BcTag_H[42:20], BcTagInClk_H, BcTagOutClk_x, BcVref, BcTagWr_L, BcTagOE_L, BcTagValid_H, BcTagDirty_H, BcTagShared_H, BcTagParity_H.
Also shown: IRQ_H[5:0].

4.1.1 System Interface

This section introduces the system (external) bus interface. The system interface is made up of two unidirectional 15-bit address buses, 64 bidirectional data lines, eight bidirectional check bits, two single-ended unidirectional clocks, and a few control pins. The 15-bit address buses provide time-shared address/command/ID in two or four GCLK cycles. The Cbox controls the system interface.

4.1.1.1 Commands and Addresses
The system sends probe and data movement commands to the 21264/EV67. The 21264/EV67 can hold up to eight probe commands from the system. The system controls the number of outstanding probe commands and must ensure that the 21264/EV67 8-entry probe queue does not overflow.
The Cbox contains an 8-entry miss buffer (MAF) and an 8-entry victim buffer (VAF). A miss occurs when the 21264/EV67 probes the Bcache but does not find the addressed block. The 21264/EV67 can queue eight cache misses to the system in its MAF.
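Because the system, not the processor, is responsible for keeping the probe queue from overflowing, system logic typically needs some form of outstanding-probe counter. The following is a minimal sketch of such a counter. The class and method names are hypothetical, and the assumption that one probe response frees one queue entry is an illustration only; the actual release condition is defined by the system port protocol.

```python
PROBE_QUEUE_DEPTH = 8  # the 21264/EV67 holds up to eight probe commands


class ProbeCredits:
    """Hypothetical system-side count of probes outstanding at the CPU."""

    def __init__(self, depth=PROBE_QUEUE_DEPTH):
        self.depth = depth
        self.outstanding = 0

    def can_send(self):
        """True if one more probe fits in the processor's probe queue."""
        return self.outstanding < self.depth

    def send_probe(self):
        """Account for a probe issued to the processor."""
        if not self.can_send():
            raise RuntimeError("probe queue would overflow")
        self.outstanding += 1

    def probe_response(self):
        """Account for a response (assumed here to free one entry)."""
        if self.outstanding == 0:
            raise RuntimeError("response without an outstanding probe")
        self.outstanding -= 1
```

A system model would call can_send() before driving each probe and hold the probe back while it returns False.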

4.1.2 Second-Level Cache (Bcache) Interface

The 21264/EV67 Cbox provides control signals and an interface for a second-level cache, the Bcache. The 21264/EV67 supports a Bcache from 1MB to 16MB, with 64-byte blocks. A 128-bit data bus is used for transfers between the 21264/EV67 and the Bcache. The Bcache must be built from synchronous static RAMs (SSRAMs) that contain one, two, or three internal registers. All Bcache control and address pins are clocked synchronously on Bcache cycle boundaries. The Bcache clock period is a multiple of the CPU clock cycle: 1.5 to 4.0 in half-cycle increments, or 5, 6, 7, or 8 in full-cycle increments. The 1.5 multiple is only available in dual-data mode.
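The set of permitted Bcache clock multiples can be enumerated directly from the rule above. The sketch below is illustrative only: the function names are hypothetical, and the GCLK period used in the usage note is an arbitrary example value, not a specification.

```python
from fractions import Fraction


def valid_bcache_ratios(dual_data_mode):
    """Return the Bcache clock multiples allowed by the rule above."""
    half_steps = [Fraction(n, 2) for n in range(3, 9)]  # 1.5, 2.0, ..., 4.0
    full_steps = [Fraction(n) for n in (5, 6, 7, 8)]
    ratios = half_steps + full_steps
    if not dual_data_mode:
        # The 1.5 multiple is only available in dual-data mode.
        ratios.remove(Fraction(3, 2))
    return ratios


def bcache_period_ns(gclk_period_ns, ratio, dual_data_mode):
    """Bcache clock period for a given GCLK period and multiple."""
    if Fraction(ratio) not in valid_bcache_ratios(dual_data_mode):
        raise ValueError(f"unsupported Bcache clock multiple: {ratio}")
    return gclk_period_ns * float(ratio)
```

For example, with a hypothetical 1.5 ns GCLK period, bcache_period_ns(1.5, Fraction(5, 2), False) gives a 3.75 ns Bcache clock period.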
4.2 Physical Address Considerations
The 21264/EV67 supports a 44-bit physical address space that is divided equally between memory space and I/O space. Memory space resides in the lower half of the physical address space (PA[43] = 0) and I/O space resides in the upper half of the physical address space (PA[43] = 1). The 21264/EV67 recognizes these spaces internally.
The 21264/EV67-generated external references to memory space are always of a fixed 64-byte size, though the internal access granularity is byte, word, longword, or quadword. All 21264/EV67-generated external references to memory or I/O space are physical addresses that are either successfully translated from a virtual address or produced by PALcode. Speculative execution may cause a reference to nonexistent memory. Systems must check the range of all addresses and report nonexistent addresses to the 21264/EV67.
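The address-space split and the fixed 64-byte size of external memory references can be expressed as simple bit operations. This is a minimal sketch with hypothetical helper names; it models only the PA[43] test and the 64-byte block alignment described above.

```python
PA_BITS = 44               # width of the physical address
IO_SPACE_BIT = 1 << 43     # PA[43]: 0 = memory space, 1 = I/O space
BLOCK_BYTES = 64           # fixed size of external memory references


def is_io_space(pa):
    """True if the physical address lies in I/O space (PA[43] = 1)."""
    if not 0 <= pa < (1 << PA_BITS):
        raise ValueError("address does not fit in 44 bits")
    return bool(pa & IO_SPACE_BIT)


def memory_block_address(pa):
    """Address of the 64-byte block containing a memory-space address."""
    if is_io_space(pa):
        raise ValueError("address is in I/O space")
    return pa & ~(BLOCK_BYTES - 1)
```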
Table 4–1 describes the translation of internal references to external interface references. The first column lists the instructions used by the programmer, including load (LDx) and store (STx) instructions of several sizes. The column headings are described here:
DcHit (block was found in the Dcache)
DcW (block was found in a writable state in the Dcache)
BcHit (block was found in the Bcache)
BcW (block was found in a writable state in the Bcache)
Status and Action (status at end of instruction and action performed by the 21264/EV67)
Prefetches (LDL, LDF, LDG, LDT, LDBU, LDWU) to R31 use the LDx flow, and prefetch with modify intent (LDS) uses the STx flow. If the prefetch target is addressed to I/O space, the upper address bit is cleared, converting the address to memory space (PA[42:6]). Notes follow the table.
Table 4–1 Translation of Internal References to External Interface Reference
Instruction DcHit DcW BcHit BcW Status and Action
LDx Memory 1 X X X Dcache hit, done.
LDx Memory 0 X 1 X Bcache hit, done.
LDx Memory 0 X 0 X Miss, generate RdBlk command.
LDx I/O X X X X RdBytes, RdLWs, or RdQWs based on size.
Istream Memory 1 X X X Dcache hit, Istream serviced from Dcache.
Istream Memory 0 X 1 X Bcache hit, Istream serviced from Bcache.
Istream Memory 0 X 0 X Miss, generate RdBlkI command.
STx Memory 1 1 X X Store Dcache hit and writable, done.
STx Memory 1 0 X X Store hit and not writable, set dirty flow (note 1).
STx Memory 0 X 1 1 Store Bcache hit and writable, done.
STx Memory 0 X 1 0 Store hit and not writable, set-dirty flow (note 1).
STx Memory 0 X 0 X Miss, generate RdBlkMod command.
STx I/O X X X X WrBytes, WrLWs, or WrQWs based on size.
STx_C Memory 0 X X X Fail STx_C.
STx_C Memory 1 0 X X STx_C hit and not writable, set dirty flow (note 1).
STx_C I/O X X X X Always succeed and WrQws or WrLws are generated, based on the size.
WH64 Memory 1 1 X X Hit, done.
WH64 Memory 1 0 X X WH64 hit not writable, set dirty flow (note 1).
WH64 Memory 0 X 1 1 WH64 hit dirty, done.
WH64 Memory 0 X 1 0 WH64 hit not writable, set dirty flow (note 1).
WH64 Memory 0 X 0 X Miss, generate InvalToDirty command (note 2).
WH64 I/O X X X X NOP the instruction. WH64 is UNDEFINED for I/O space.
ECB Memory X X X X Generate evict command (note 3).
ECB I/O X X X X NOP the instruction. ECB instruction is UNDEFINED for I/O space.
MB/WMB TB Fill Flows X X X X Generate MB command (note 4). See Section 2.12.1.
Table 4–1 notes:
1. Set Dirty Flow: Based on the Cbox CSR SET_DIRTY_ENABLE[2:0], SetDirty requests can be either internally acknowledged (called a SetModify) or sent to the system environment for processing. When externally acknowledged, the shared status information for the cache block is also broadcast. The commands sent externally are SharedToDirty or CleanToDirty. Based on the Cbox CSR ENABLE_STC_COMMAND[0], the external system can be informed of a STx_C generating a SetDirty using the STCChangeToDirty command. See Table 4–16 for more information.
2. InvalToDirty: Based on the Cbox CSR INVAL_TO_DIRTY_ENABLE[1:0], InvalToDirty requests can be either internally acknowledged or sent to the system environment as InvalToDirty commands. This Cbox CSR provides the ability to convert WH64 instructions to RdModx operations. See Table 4–15 for more information.
3. Evict: There are two aspects to the commands that are generated by an ECB instruction: first, those commands that are generated to notify the system of an evict being performed; second, those commands that are generated by any victim that is created by servicing the ECB.
– If Cbox CSR ENABLE_EVICT[0] is clear, no command is issued by the 21264/EV67 on the external interface to notify the system of an evict being performed. If Cbox CSR ENABLE_EVICT[0] is set, the 21264/EV67 issues an Evict command on the system interface only if a Bcache index match to the ECB address is found in the 21264/EV67 cache system.
Note that whenever ENABLE_EVICT[0] is true (in the write-many chain), BC_CLEAN_VICTIM must also be true (in the write-once chain). Otherwise, the 21264/EV67 could respond to a probe with a miss, rather than a hit, before an Evict command has been sent off chip, but after the Evict command has removed a (clean) block from the internal caches and the Bcache. That behavior might confuse systems that maintain an external duplicate copy of the Bcache tags, because the system could receive the probe response indicating the miss before it receives the Evict command.
– The 21264/EV67 can issue the commands CleanVictimBlk and WrVictimBlk for a victim that is created by an ECB. CleanVictimBlk is issued only if Cbox CSR BC_CLEAN_VICTIM is set and there is a Bcache index match valid but not dirty in the 21264/EV67 cache system. WrVictimBlk is issued for any Bcache match of the ECB address that is dirty in the 21264/EV67 cache system.
4. MB: Based on the Cbox CSR SYSBUS_MB_ENABLE, the MB command can be sent to the pins.
Each of these CSRs is programmed appropriately, based on the cache coherence protocol used by the system environment. For example, uniprocessor systems would prefer to internally acknowledge most of these transactions. In contrast, multiprocessor systems may require notification and control of any change in cache state. The 21264/EV67 and the external system must cooperate to maintain cache coherence. Section 4.5 explains the 21264/EV67 part of the cache coherency protocol.
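The LDx and STx rows of Table 4–1 amount to a small decision procedure. The sketch below restates those rows as two functions. The function names and the returned strings are illustrative, the CSR-dependent details of the set-dirty flow (note 1) are not modeled, and the other instruction classes in the table are omitted.

```python
def ldx_action(io_space, dc_hit, bc_hit):
    """Status and action for an LDx reference, following Table 4-1."""
    if io_space:
        return "RdBytes, RdLWs, or RdQWs based on size"
    if dc_hit:
        return "Dcache hit, done"
    if bc_hit:
        return "Bcache hit, done"
    return "Miss, generate RdBlk command"


def stx_action(io_space, dc_hit, dc_w, bc_hit, bc_w):
    """Status and action for an STx reference, following Table 4-1."""
    if io_space:
        return "WrBytes, WrLWs, or WrQWs based on size"
    if dc_hit:
        # DcW decides between completing the store and the set-dirty flow.
        return "Dcache hit and writable, done" if dc_w else "set-dirty flow (note 1)"
    if bc_hit:
        return "Bcache hit and writable, done" if bc_w else "set-dirty flow (note 1)"
    return "Miss, generate RdBlkMod command"
```

Arguments that the table marks as X (don't care) for a given row are simply ignored on that path.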

4.3 Bcache Structure

The 21264/EV67 Cbox provides control signals and an interface for a second-level cache (Bcache).
The 21264/EV67 supports a Bcache from 1MB to 16MB, with 64-byte blocks. A 128-bit bidirectional data bus is used for transfers between the 21264/EV67 and the Bcache. The Bcache is fully synchronous and the synchronous static RAMs (SSRAMs) must contain either one, two, or three internal registers. All Bcache control and address pins are clocked synchronously on Bcache cycle boundaries. The Bcache clock period is a multiple of the CPU clock cycle: 1.5 to 4.0 in half-cycle increments, or 5, 6, 7, or 8 in full-cycle increments. The 1.5 multiple is only available in dual-data mode.

4.3.1 Bcache Interface Signals

Figure 4–2 shows the 21264/EV67 Bcache interface signals.
Figure 4–2 21264/EV67 Bcache Interface Signals
[Block diagram not reproduced. It shows the 21264 with the following Bcache interface signals: BcData_H[127:0], BcCheck_H[15:0], BcDataInClk_H[7:0], BcDataOutClk_x[3:0], BcDataOE_L, BcDataWr_L, BcAdd_H[23:4], BcTag_H[42:20], BcTagInClk_H, BcTagOutClk_x, BcVref, BcTagDirty_H, BcTagParity_H, BcTagShared_H, BcTagValid_H, BcTagOE_L, BcTagWr_L, BcLoad_L.]

4.3.2 System Duplicate Tag Stores

The 21264/EV67 provides Bcache state support for systems with and without duplicate tag stores, and will take different actions on this basis. The system sets the Cbox CSR DUP_TAG_ENA[0], indicating that it has a duplicate tag store for the Bcache. Systems using the DUP_TAG_ENA[0] bit must also use the Cbox CSR BC_CLEAN_VICTIM[0] bit to avoid deadlock situations.
Systems using a Bcache duplicate tag store can accelerate system performance by:
Issuing probes and SysDc fill commands to the 21264/EV67 out-of-order with respect to their order at the system serialization point
Filtering out all probe misses from the 21264/EV67 cache system
If a probe misses in the 21264/EV67 cache system (Bcache miss and VAF miss), the 21264/EV67 stalls probe processing with the expectation that a SysDc fill will allocate this block. Because of this, in duplicate tag mode, the 21264/EV67 can never generate a probe miss response.
When Cbox CSR DUP_TAG_ENA[0] equals 0, the 21264/EV67 delivers a miss response for probes that do not hit in its cache system.
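The difference between the two modes can be sketched as a small decision function. This is an illustrative model only: `handle_probe`, its arguments, and the returned strings are hypothetical names, and the cache and VAF contents are modeled as plain sets of block addresses.

```python
def handle_probe(addr, cached_blocks, vaf_blocks, dup_tag_ena):
    """Model how a probe is resolved, following the description above.

    cached_blocks and vaf_blocks are sets of block addresses (an assumption
    of this sketch, not a hardware structure).
    """
    if addr in cached_blocks or addr in vaf_blocks:
        return "hit"
    if dup_tag_ena:
        # Duplicate tag mode: never answer with a miss. Stall probe
        # processing and expect a SysDc fill to allocate the block.
        return "stall"
    # DUP_TAG_ENA[0] = 0: deliver a miss response.
    return "miss"
```

With `dup_tag_ena` set, a probe to an absent block returns `"stall"` rather than `"miss"`, which is why the system is expected to filter probe misses before they reach the processor.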
4.4 Victim Data Buffer
The 21264/EV67 has eight victim data buffers (VDBs). They have the following properties:
The VDBs are used for both victims (fills that are replacing dirty cache blocks) and for system probes that require data movement. The CleanVictimBlk command (optional) assigns and uses a VDB.
Each VDB has two valid bits that indicate the buffer is valid for a victim or valid for a probe or valid for both a victim and a probe. Probe commands that match the address of a victim address file (VAF) entry with an asserted probe-valid bit (P) will stall the 21264/EV67 probe queue. No ProbeResponses will be returned until the P bit is clear.
The release victim buffer (RVB) bit, when asserted, causes the victim valid bit, on the victim data buffer (VDB) specified in the ID field, to be cleared. The RVB bit will also clear the IOWB when systems move data on I/O write transactions. In this case, ID[3] equals one.
The release probe buffer (RPB) bit, when asserted (with a WriteData or ReleaseBuffer SysDc command), clears the P bit in the victim buffer entry specified in the ID field.
Read data commands and victim write commands use IDs 0-7, while IDs 8-11 are used to address the four I/O write buffers.
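The valid bits and release bits described above can be modeled in a few lines. The following is a sketch under stated assumptions: the `Vdb` class, the function names, the `iowb_busy` list, and the parallel `vaf_addrs` list are all hypothetical and exist only to illustrate the ID mapping and the effect of RVB and RPB.

```python
from dataclasses import dataclass

NUM_VDB = 8     # victim data buffers, IDs 0-7
NUM_IOWB = 4    # I/O write buffers, IDs 8-11


@dataclass
class Vdb:
    victim_valid: bool = False   # buffer is valid for a victim
    probe_valid: bool = False    # P bit: buffer is valid for a probe


def buffer_kind(buf_id):
    """Classify the ID field: 0-7 select a VDB, 8-11 (ID[3] = 1) an IOWB."""
    if 0 <= buf_id < NUM_VDB:
        return "VDB"
    if NUM_VDB <= buf_id < NUM_VDB + NUM_IOWB:
        return "IOWB"
    raise ValueError(f"ID {buf_id} is outside the range 0-11")


def apply_release(vdbs, iowb_busy, buf_id, rvb=False, rpb=False):
    """Apply the RVB and RPB bits of a SysDc command to the buffer model."""
    kind = buffer_kind(buf_id)
    if rvb:
        if kind == "VDB":
            vdbs[buf_id].victim_valid = False      # RVB clears the victim valid bit
        else:
            iowb_busy[buf_id - NUM_VDB] = False    # RVB also releases an IOWB
    if rpb and kind == "VDB":
        vdbs[buf_id].probe_valid = False           # RPB clears the P bit


def probe_stalls(vdbs, vaf_addrs, addr):
    """A probe stalls while it matches a VAF entry whose P bit is asserted."""
    return any(v.probe_valid and a == addr for v, a in zip(vdbs, vaf_addrs))
```

A buffer that holds both a victim and probe data is released in two steps: RVB clears only the victim valid bit, and the probe queue stays stalled for that address until RPB clears the P bit.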

4.5 Cache Coherency

This section describes the basics and protocols of the 21264/EV67 cache coherency scheme.

4.5.1 Cache Coherency Basics

The 21264/EV67 systems maintain the cache hierarchy shown in Figure 4–3.
Figure 4–3 Cache Subset Hierarchy
[Figure: diagram with the labels System, Main Memory, Bcache, Icache, and Dcache]
The following tasks must be performed to maintain cache coherency:
Istream data from memory spaces may be cached in the Icache and Bcache. Icache coherence is not maintained by hardware; it must be maintained by software using the CALL_PAL IMB instruction.
The 21264/EV67 maintains the Dcache as a subset of the Bcache. The Dcache is set-associative but is kept a subset of the larger externally implemented direct-mapped Bcache.
System logic must help the 21264/EV67 to keep the Bcache coherent with main memory and other caches in the system.
The 21264/EV67 requires the system to allow only one change to a block at a time. This means that if the 21264/EV67 gains the bus to read or write a block, no other node on the bus should be allowed to access that block until the data has been moved.
The 21264/EV67 provides hardware mechanisms to support several cache coherency protocols. The protocols can be separated into two classes: write invalidate cache coherency protocol and flush cache coherency protocol.

4.5.2 Cache Block States

Table 4–2 lists the cache block states supported by the 21264/EV67.
Table 4–2 21264/EV67-Supported Cache Block States
State Name      Description
Invalid         The 21264/EV67 does not have a copy of the block.
Clean           This 21264/EV67 holds a read-only copy of the block, and no other agent in the system holds a copy. Upon eviction, the block is not written to memory.
Clean/Shared    This 21264/EV67 holds a read-only copy of the block, and at least one other agent in the system may hold a copy of the block. Upon eviction, the block is not written to memory.
Dirty           This 21264/EV67 holds a read-write copy of the block, and must write it to memory after it is evicted from the cache. No other agent in the system holds a copy of the block.
Dirty/Shared    This 21264/EV67 holds a read-only copy of the dirty block, which may be shared with another agent. The block must be written back to memory when it is evicted.
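The properties in Table 4–2 can be summarized as predicates on the five states. This is a minimal sketch for illustration; the `BlockState` enumeration and the helper names are hypothetical and do not correspond to any hardware register encoding.

```python
from enum import Enum


class BlockState(Enum):
    INVALID = "Invalid"
    CLEAN = "Clean"
    CLEAN_SHARED = "Clean/Shared"
    DIRTY = "Dirty"
    DIRTY_SHARED = "Dirty/Shared"


def is_writable(state):
    """Only the Dirty state is described as a read-write copy."""
    return state is BlockState.DIRTY


def writes_back_on_eviction(state):
    """Dirty and Dirty/Shared blocks must be written to memory when evicted."""
    return state in (BlockState.DIRTY, BlockState.DIRTY_SHARED)


def may_be_shared(state):
    """States in which another agent may also hold a copy of the block."""
    return state in (BlockState.CLEAN_SHARED, BlockState.DIRTY_SHARED)
```

Note that Dirty/Shared is the only state that is both read-only and written back on eviction, which is why a store to such a block needs an external reference before it can proceed.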

4.5.3 Cache Block State Transitions

Cache block state transitions are reflected by 21264/EV67-generated commands to the system. Cache block state transitions can also be caused by system-generated commands to the 21264/EV67 (probes). Probes control the next state for the cache block.
The next state can be based on the previous state of the cache block. Table 4–3 lists the next state for the cache block.
Table 4–3 Cache Block State Transitions
Next State      Action Based on Probe Hit
No change       Do not update cache state. Useful for DMA transactions that sample data but do not want to update tag state.
Clean           Independent of previous state, update next state to Clean.
Clean/Shared    Independent of previous state, update next state to Clean/Shared. This transaction is useful for systems that update memory on probe hits.
T1              Clean -> Clean/Shared; Dirty -> Dirty/Shared. Based on the dirty bit, make the block Clean/Shared or Dirty/Shared. This transaction is useful for systems that do not update memory on probe hits.
T3              Clean -> Clean/Shared; Dirty -> Invalid; Dirty/Shared -> Clean/Shared. If the block is Clean or Dirty/Shared, change to Clean/Shared. If the block is Dirty, change to Invalid. This transaction is useful for systems that use the Dirty/Shared state as an exclusive state.
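The next-state rules in Table 4–3 can be sketched as a single function. This is an illustrative model, not a description of the hardware implementation: the string keys and the function name are hypothetical, and the T3 result for a block that is already Clean/Shared is not listed in the table, so the sketch assumes it stays Clean/Shared.

```python
def probe_next_state(next_state_field, state):
    """Return the cache block state after a probe hit.

    next_state_field is one of "NoChange", "Clean", "Clean/Shared", "T1", "T3".
    state is one of "Clean", "Clean/Shared", "Dirty", "Dirty/Shared".
    """
    if state == "Invalid":
        raise ValueError("a probe hit requires a valid block")
    dirty = state in ("Dirty", "Dirty/Shared")
    if next_state_field == "NoChange":
        return state
    if next_state_field == "Clean":
        return "Clean"
    if next_state_field == "Clean/Shared":
        return "Clean/Shared"
    if next_state_field == "T1":
        # Decided by the dirty bit alone.
        return "Dirty/Shared" if dirty else "Clean/Shared"
    if next_state_field == "T3":
        if state == "Dirty":
            return "Invalid"
        # Clean and Dirty/Shared are listed in the table; leaving an existing
        # Clean/Shared block unchanged is an ASSUMPTION of this sketch.
        return "Clean/Shared"
    raise ValueError(f"unknown next-state field: {next_state_field}")
```

For instance, a T1 probe that hits a Dirty block leaves it Dirty/Shared, while a T3 probe that hits the same block invalidates it.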
The cache state transitions caused by 21264/EV67-generated commands are under the full control of the system environment using the SysDc (system data control) commands. Table 4–4 lists these commands.
Table 4–4 System Responses to 21264/EV67 Commands
Response Type                   21264/EV67 Action
SysDc ReadData                  Fill block with the associated data and update tag with clean cache status.
SysDc ReadDataDirty             Fill block with the associated data and update tag with dirty cache status.
SysDc ReadDataShared            Fill block with the associated data and update tag with shared cache status.
SysDc ReadDataShared/Dirty      Fill block with the associated data and update tag with dirty/shared status.
SysDc ReadDataError             Fill block with all-ones reference pattern and update tag with invalid status.
SysDc ChangeToDirtySuccess      Unconditionally update block with dirty cache status.
SysDc ChangeToDirtyFail         Do not update cache status and fail any associated STx_C instructions.
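The tag updates in Table 4–4 amount to a lookup from SysDc response to resulting cache status. The sketch below is illustrative only: the dictionary and function names are hypothetical, and mapping the table's "shared cache status" to the Clean/Shared state of Table 4–2 is an assumption of this sketch.

```python
# Tag status written by each fill-type SysDc response (per Table 4-4).
SYSDC_FILL_STATUS = {
    "ReadData": "Clean",
    "ReadDataDirty": "Dirty",
    "ReadDataShared": "Clean/Shared",      # ASSUMPTION: "shared" means Clean/Shared
    "ReadDataShared/Dirty": "Dirty/Shared",
    "ReadDataError": "Invalid",            # block is filled with an all-ones pattern
}


def apply_sysdc(response, current_state):
    """Return the tag status after the given SysDc response."""
    if response in SYSDC_FILL_STATUS:
        return SYSDC_FILL_STATUS[response]
    if response == "ChangeToDirtySuccess":
        return "Dirty"                     # unconditional update to dirty
    if response == "ChangeToDirtyFail":
        return current_state               # cache status is not updated
    raise ValueError(f"unknown SysDc response: {response}")
```

Only `ChangeToDirtyFail` depends on the current state; every other response overwrites the tag status regardless of what was there before.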

4.5.4 Using SysDc Commands

Note the following:
The conventional response for RdBlk commands is SysDc ReadData or ReadDataShared.
The conventional response for a RdBlkMod command is SysDc ReadDataDirty.
The conventional response for ChangeToDirty commands is ChangeToDirtySuccess or ChangeToDirtyFail.
However, the system environment is not limited to these responses. Table 4–5 shows all 21264/EV67 commands, system responses, and the 21264/EV67 reaction. The 21264/EV67 commands are described in the following list:
Rdx commands are generated by load or Istream references.
RdBlkModx commands are generated by store references.
The ChxToDirty command group includes CleanToDirty, SharedToDirty, and STCChangeToDirty commands, which are generated by store references that hit in the 21264/EV67 cache system.
InvalToDirty commands are generated by WH64 instructions that miss in the 21264/EV67 cache system.
FetchBlk and FetchBlkSpec are noncached references to memory space that have missed in the 21264/EV67 cache system.
Rdiox commands are noncached references to I/O address space.
Evict and STCChangeToDirty commands are generated by ECB and STx_C instructions, respectively.
Table 4–5 shows the system responses to 21264/EV67 commands and 21264/EV67 reactions.
Table 4–5 System Responses to 21264/EV67 Commands and 21264/EV67 Reactions
21264/EV67 CMD  SysDc                                           21264/EV67 Action
Rdx             ReadData, ReadDataShared                        This is a normal fill. The cache block is filled and marked clean or shared based on SysDc.
Rdx             ReadDataShared/Dirty                            The cache block is filled and marked dirty/shared. Succeeding store commands cannot update the block without external reference.
Rdx             ReadDataDirty                                   The cache block is filled and marked dirty.
Rdx             ReadDataError                                   The cache block access was to NXM address space. The 21264/EV67 delivers an all-ones pattern to any load command and evicts the block from the cache (with associated victim processing). The cache block is marked invalid.
Rdx             ChangeToDirtySuccess, ChangeToDirtyFail         Both SysDc responses are illegal for read commands.
RdBlkModx       ReadData, ReadDataShared, ReadDataShared/Dirty  The cache block is filled and marked with a nonwritable status. If the store instruction that generated the RdBlkModx command is still active (not killed), the 21264/EV67 will retry the instruction, generating the appropriate ChangeToDirty command. Succeeding store commands cannot update the block without external reference.
RdBlkModx       ReadDataDirty                                   The 21264/EV67 performs a normal fill response, and the cache block becomes writable.
RdBlkModx       ChangeToDirtySuccess, ChangeToDirtyFail         Both SysDc responses are illegal for read/modify commands.
RdBlkModx       ReadDataError                                   The cache block command was to NXM address space. The 21264/EV67 delivers an all-ones pattern to any dependent load command, forces a fail action on any pending store commands to this block, and any store to this block is not retried. The Cbox evicts the cache block from the cache system (with associated victim processing). The cache block is marked invalid.
ChxToDirty      ReadData, ReadDataShared, ReadDataShared/Dirty  The original data in the Dcache is replaced with the filled data. The block is not writable, so the 21264/EV67 will retry the store instruction and generate another ChxToDirty class command. To avoid a potential livelock situation, the STC_ENABLE CSR bit must be set. Any STx_C instruction to this block is forced to fail. In addition, a Shared/Dirty response causes the 21264/EV67 to generate a victim for this block upon eviction.
ChxToDirty      ReadDataDirty                                   The data in the Dcache is replaced with the filled data. The block is writable, so the store instruction that generated the original command can update this block. Any STx_C instruction to this block is forced to fail. In addition, the 21264/EV67 generates a victim for this block upon eviction.
ChxToDirty      ReadDataError                                   Impossible situation. The block must be cached to generate a ChxToDirty command. Caching the block is not possible because all NXM fills are filled noncached.
ChxToDirty      ChangeToDirtySuccess                            Normal response. ChangeToDirtySuccess makes the block writable. The 21264/EV67 retries the store instruction and updates the Dcache. Any STx_C instruction associated with this block is allowed to succeed.
ChxToDirty      ChangeToDirtyFail                               The MAF entry is retired. Any STx_C instruction associated with the block is forced to fail. If a STx instruction generated this block, the 21264/EV67 retries and generates either a RdBlkModx (because the reference that failed the ChangeToDirty also invalidated the cache by way of an invalidating probe) or another ChxToDirty command.
InvalToDirty    ReadData, ReadDataShared, ReadDataShared/Dirty  The block is not writable, so the 21264/EV67 will retry the WH64 instruction and generate a ChxToDirty command.
InvalToDirty    ReadDataError                                   The 21264/EV67 does not send InvalToDirty commands offchip speculatively. This NXM condition is a hard error. Systems should perform a machine check.
InvalToDirty    ReadDataDirty, ChangeToDirtySuccess             The block is writable. Done.
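The rows of Table 4–5 shown above fall into a small number of outcome classes. The following sketch groups them; it is an illustrative summary rather than a complete model. The function name and the outcome labels are hypothetical, and command/response pairs that do not appear in the rows above are reported as "not_covered" instead of being guessed.

```python
NONWRITABLE_FILLS = {"ReadData", "ReadDataShared", "ReadDataShared/Dirty"}
CHANGE_TO_DIRTY = {"ChangeToDirtySuccess", "ChangeToDirtyFail"}
STORE_SIDE = {"RdBlkModx", "ChxToDirty", "InvalToDirty"}


def classify_reaction(cmd, sysdc):
    """Summarize the 21264/EV67 reaction for a command/response pair."""
    if cmd in ("Rdx", "RdBlkModx") and sysdc in CHANGE_TO_DIRTY:
        return "illegal"        # illegal for read and read/modify commands
    if sysdc == "ReadDataError":
        if cmd == "ChxToDirty":
            return "impossible" # NXM fills are noncached, so no ChxToDirty can occur
        if cmd == "InvalToDirty":
            return "hard_error" # systems should perform a machine check
        return "nxm"            # all-ones data, block evicted and marked invalid
    if cmd == "Rdx":
        return "fill"           # block filled and marked per the SysDc response
    if cmd in STORE_SIDE:
        if sysdc in NONWRITABLE_FILLS:
            return "retry"      # block not writable; the instruction is retried
        if sysdc == "ReadDataDirty":
            return "writable"
        if sysdc == "ChangeToDirtySuccess" and cmd != "RdBlkModx":
            return "writable"
        if sysdc == "ChangeToDirtyFail" and cmd == "ChxToDirty":
            return "retry"      # STx_C fails; a plain STx is retried
    return "not_covered"        # pair does not appear in the rows above
```

For example, `classify_reaction("RdBlkModx", "ReadDataShared")` yields `"retry"`, reflecting that the store must issue a ChangeToDirty command before it can update the block.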