MPC604D
SA14-2053-00 (IBM Order Number)
MPC604E/D (Motorola Order Number)
1/96
REV 1
Advance Information
PowerPC 604e RISC Microprocessor
Technical Summary
This document provides an overview of the PowerPC 604e microprocessor features, including a block diagram showing the major functional components. It provides information about how the 604e implementation complies with the PowerPC architecture definition. This document is divided into two parts:
•Part 1, “PowerPC 604e Microprocessor Overview,” provides an overview of the 604e features, including a block diagram showing the major functional components.
•Part 2, “PowerPC 604e Microprocessor: Implementation,” gives specific details about the implementation of the 604e as a 32-bit member of the PowerPC processor family.
In this document, the term “604e” is used as an abbreviation for the phrase “PowerPC 604e microprocessor” and “604” is an abbreviation for the phrase “PowerPC 604 microprocessor.” The PowerPC 604e microprocessors are available from IBM as PPC604e and from Motorola as MPC604e.
The PowerPC name, the PowerPC logotype, PowerPC 604, and PowerPC 604e, are trademarks of International Business Machines Corporation, used by Motorola under license from International Business Machines Corporation.
This document contains information on a new product under development by Motorola and IBM. Motorola and IBM reserve the right to change or discontinue this product without notice.
Motorola Inc., 1996. All rights reserved.
Portions hereof International Business Machines Corporation, 1991–1996. All rights reserved.
604e Technical Summary
Part 1 PowerPC 604e Microprocessor Overview
This section describes the features of the 604e, provides a block diagram showing the major functional units, and describes briefly how those units interact.
The 604e is an implementation of the PowerPC family of reduced instruction set computer (RISC) microprocessors. The 604e implements the PowerPC architecture as it is specified for 32-bit addressing, which provides 32-bit effective (logical) addresses, integer data types of 8, 16, and 32 bits, and floating-point data types of 32 and 64 bits (single-precision and double-precision). For 64-bit PowerPC implementations, the PowerPC architecture provides additional 64-bit integer data types, 64-bit addressing, and related features.
The 604e is a superscalar processor capable of issuing four instructions simultaneously. As many as seven instructions can finish execution in parallel. The 604e has seven execution units that can operate in parallel:
•Floating-point unit (FPU)
•Branch processing unit (BPU)
•Condition register unit (CRU)
•Load/store unit (LSU)
•Three integer units (IUs):
—Two single-cycle integer units (SCIUs)
—One multiple-cycle integer unit (MCIU)
This parallel design, combined with the PowerPC architecture’s specification of uniform instructions that allows for rapid execution times, yields high efficiency and throughput. The 604e’s rename buffers, reservation stations, dynamic branch prediction, and completion unit increase instruction throughput, guarantee in-order completion, and ensure a precise exception model. (Note that the PowerPC architecture specification refers to all exceptions as interrupts.)
The 604e has separate memory management units (MMUs) and separate 32-Kbyte on-chip caches for instructions and data. The 604e implements two 128-entry, two-way set associative translation lookaside buffers (TLBs), one for instructions and one for data, and provides support for demand-paged virtual memory address translation and variable-sized block translation. The TLBs and the caches use least recently used (LRU) replacement algorithms.
The 604e has a 64-bit external data bus and a 32-bit address bus. The 604e interface protocol allows multiple masters to compete for system resources through a central external arbiter. Additionally, on-chip snooping logic maintains data cache coherency for multiprocessor applications. The 604e supports single-beat and burst data transfers for memory accesses and memory-mapped I/O accesses.
The 604e uses an advanced, 2.5-V CMOS process technology and is fully compatible with TTL devices.
1.1 PowerPC 604e Microprocessor Features
This section summarizes features of the 604e’s implementation of the PowerPC architecture.
Figure 1 provides a block diagram showing features of the 604e. Note that this is a conceptual block diagram intended to show the basic features rather than an attempt to show how these features are physically implemented on the chip.
Figure 1. Block Diagram
(Diagram not reproduced. Blocks shown: instruction unit with fetcher, 8-word instruction queue, and dispatch unit; branch processing unit with BTAC, BHT, CTR, and LR; condition register logical unit with the CR file and eight rename buffers; two single-cycle integer units and a multiple-cycle integer unit with the GPR file and 12 rename buffers; floating-point unit with FPSCR, the FPR file, and eight rename buffers; load/store unit with EA calculation, store queue, and finish load queue; two-entry reservation stations; completion unit with 16-entry reorder buffer; instruction and data MMUs with SRs, IBAT/DBAT arrays, and ITLB/DTLB; 32-Kbyte instruction and data caches with tags; time base counter/decrementer; clock multiplier; JTAG/COP interface; and bus interface unit with the 32-bit address bus and 64-bit data bus.)
1.1.1 New Features of the PowerPC 604e Processor
Features of the 604e that are not implemented in the 604 are as follows:
•Additional special-purpose registers
—HID1 provides four read-only PLL_CFG bits for indicating the processor/bus clock ratio.
—Three additional registers to support the performance monitor—MMCR1 is a second control register that includes bits to support the use of two additional counter registers, PMC3 and PMC4.
•Instruction execution
—Separate units for branch and condition register (CR) instructions. The BPU is now split into a CR logical unit and a branch unit, which makes it possible for branch instructions to execute and resolve before preceding CR logical instructions. The 604e can still only dispatch one CR logical or branch instruction per cycle, but it can execute both branch and CR logical instructions at the same time.
—Branch correction in decode stage. Branch correction in the decode stage can now predict branches whose target is taken from the count or link registers if no updates of the count and link register are pending. This saves at least one cycle on branch correction when the Move to Special-Purpose Register (mtspr) instruction can be sufficiently separated from the branch that uses the special-purpose register (SPR) as a target address.
—Ability to disable the branch target address cache (BTAC)—HID0[30] has been defined to allow the BTAC to be disabled. When HID0[30] is set, the BTAC contents are invalidated and the BTAC behaves as if it were empty. New entries cannot be added until the BTAC is enabled.
•Improvements to cache implementation
—32-Kbyte split data and instruction caches. Like the 604, both caches are four-way set associative; however, each cache has twice as many sets, logically separated into 128 sets of odd lines and 128 sets of even lines.
—Data cache line-fill buffer forwarding. In the 604 only the critical double word of a burst operation was made available to the requesting unit at the time it was burst into the line-fill buffer. Subsequent data was unavailable until the cache block was filled. On the 604e, subsequent data is also made available as it arrives in the line-fill buffer.
—Additional cache copyback buffers. The 604e implements three copyback write buffers (as opposed to one in the 604). Having multiple copyback buffers provides the ability for certain instructions to take fuller advantage of the pipelined system bus to provide more efficient handling of cache copyback, block invalidate operations caused by the Data Cache Block Flush (dcbf) instruction, and cache block clean operations resulting from the Data Cache Block Store (dcbst) instruction.
—Coherency support for instruction fetching. Instruction fetching coherency is controlled by HID0[23]. In the default mode (HID0[23] = 0), GBL is not asserted for instruction accesses, as is the case with the 604. If the bit is set and instruction translation is enabled (MSR[IR] = 1), the GBL signal is set to reflect the M bit for this page or block. If instruction translation is disabled (MSR[IR] = 0), the GBL signal is asserted.
•System interface operation
—The 604e has the same pin configuration as the 604; however, on the 604e Vdd and AVdd must be tied to 2.5 Vdc and OVdd must be tied to 3.3 Vdc. The 604e uses split voltage planes, and for replacement compatibility, 604/604e designs should provide both 2.5-V and 3.3-V planes and the ability to tie those two planes together and disable the 2.5-V plane for operation with a 604.
—Support for additional processor/bus clock ratios (5:2 and 4:1). Configuration of the processor/bus clock ratios is displayed through a new 604e-specific register, HID1.
—To support the changes in the clocking configuration, different precharge timings for the ABB, DBB, ARTRY, and SHD signals are implemented internally by the processor. The precharge timings for ARTRY and SHD can be disabled by setting HID0[7].
—No-DRTRY mode. In addition to the normal and fast L2 modes implemented on the 604, a no-DRTRY mode is implemented on the 604e that improves performance on read operations for systems that do not use the DRTRY signal. No-DRTRY mode makes read data available to the processor one bus clock cycle sooner than in normal mode. In no-DRTRY mode, the DRTRY signal is no longer sampled as part of a qualified bus grant.
•Full hardware support for little-endian accesses. Little-endian accesses take alignment exceptions for only the same set of causes as big-endian accesses. Accesses that cross a word boundary require two accesses with the lower-addressed word accessed first.
•Additional enhancements to the performance monitor.
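The instruction-fetch coherency rule described above can be sketched as a small truth function. This is an illustrative Python model only, not vendor code: the function and constant names are hypothetical, the mask helper assumes the PowerPC convention that bit 0 is the most-significant bit of a 32-bit register, and the last case reads the "translation disabled" sentence as applying when HID0[23] is set.

```python
def hid0_mask(bit: int) -> int:
    """Mask for HID0[bit], assuming PowerPC numbering (bit 0 = most-significant bit)."""
    if not 0 <= bit <= 31:
        raise ValueError("HID0 is modeled here as a 32-bit register")
    return 1 << (31 - bit)


# Hypothetical names for the two HID0 bits discussed in this section.
HID0_IFETCH_COHERENCY = hid0_mask(23)  # HID0[23]: instruction fetch coherency
HID0_BTAC_DISABLE = hid0_mask(30)      # HID0[30]: disable the BTAC


def gbl_asserted_for_ifetch(hid0: int, msr_ir: bool, m_bit: bool) -> bool:
    """Model of whether GBL is asserted for an instruction access."""
    if not hid0 & HID0_IFETCH_COHERENCY:
        return False   # default mode: GBL not asserted, as on the 604
    if msr_ir:
        return m_bit   # translation enabled: GBL reflects the page/block M bit
    return True        # translation disabled: GBL asserted
```

For example, `gbl_asserted_for_ifetch(HID0_IFETCH_COHERENCY, True, False)` models a fetch from a page whose M bit is clear, for which GBL would not be asserted.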
1.1.2 Overview of the PowerPC 604e Microprocessor Features
Major features of the 604e are as follows:
•High-performance, superscalar microprocessor
—As many as four instructions can be issued per clock
—As many as seven instructions can start executing per clock (including three integer instructions)
—Single-clock-cycle execution for most instructions
•Seven independent execution units and two register files
—BPU featuring dynamic branch prediction
–Two-entry reservation station
–Out-of-order execution through two branches
–Shares dispatch bus with CRU
–64-entry fully-associative branch target address cache (BTAC). In the 604e, the BTAC can be disabled and invalidated.
–512-entry branch history table (BHT) with two bits per entry for four levels of prediction: strongly not-taken, not-taken, taken, and strongly taken
—Condition register logical unit
–Two-entry reservation station
–Shares dispatch bus with BPU
—Two single-cycle IUs (SCIUs) and one multiple-cycle IU (MCIU)
–Instructions that execute in the SCIU take one cycle to execute; most instructions that execute in the MCIU take multiple cycles to execute.
–Each SCIU has a two-entry reservation station to minimize stalls
–The MCIU has a single-entry reservation station and provides early exit (three cycles) for 16- x 32-bit and overflow operations.
–Thirty-two GPRs for integer operands
—Three-stage floating-point unit (FPU)
–Fully IEEE 754-1985-compliant FPU for both single- and double-precision operations
–Supports non-IEEE mode for time-critical operations
–Fully pipelined, single-pass double-precision design
–Hardware support for denormalized numbers
–Two-entry reservation station to minimize stalls
–Thirty-two 64-bit FPRs for single- or double-precision operands
—Load/store unit (LSU)
–Two-entry reservation station to minimize stalls
–Single-cycle, pipelined cache access
–Dedicated adder performs EA calculations
–Performs alignment and precision conversion for floating-point data
–Performs alignment and sign extension for integer data
–Four-entry finish load queue (FLQ) provides load miss buffering
–Six-entry store queue
–Supports both big- and little-endian modes
•Rename buffers
—Twelve GPR rename buffers
—Eight FPR rename buffers
—Eight condition register (CR) rename buffers
The 604e rename buffers are described in Section 1.2.7, “Rename Buffers.”
•Completion unit
—The completion unit retires an instruction from the 16-entry reorder buffer when all instructions ahead of it have been completed and the instruction has finished execution.
—Guarantees sequential programming model (precise exception model)
—Monitors all dispatched instructions and retires them in order
—Tracks unresolved branches and flushes executed, dispatched, and fetched instructions if branch is mispredicted
—Retires as many as four instructions per clock
•Separate on-chip instruction and data caches (Harvard architecture)
—32-Kbyte, four-way set-associative instruction and data caches
—LRU replacement algorithm
—32-byte (eight-word) cache block size
—Physically indexed/physical tags. (Note that the PowerPC architecture refers to physical address space as real address space.)
—Cache write-back or write-through operation programmable on a per page or per block basis
—Instruction cache can provide four instructions per clock; data cache can provide two words per clock
—Caches can be disabled in software
—Caches can be locked
—Parity checking performed on both caches
—Data cache coherency (MESI) maintained in hardware
—Secondary data cache support provided
—Instruction cache coherency maintained in hardware
—Data cache line-fill buffer forwarding. In the 604 only the critical double word of the cache block was made available to the requesting unit at the time it was burst into the line-fill buffer. Subsequent data was unavailable until the cache block was filled. On the 604e, subsequent data is also made available as it arrives in the line-fill buffer.
•Separate memory management units (MMUs) for instructions and data
—Address translation facilities for 4-Kbyte page size, variable block size, and 256-Mbyte segment size
—Both TLBs are 128-entry and two-way set associative
—TLBs are hardware reloadable (that is, the page table search is performed in hardware)
—Separate IBATs and DBATs (four each) also defined as SPRs
—Separate instruction and data translation lookaside buffers (TLBs)
—LRU replacement algorithm
—52-bit virtual address; 32-bit physical address
•Bus interface features include the following:
—Selectable processor-to-bus clock frequency ratios (1:1, 3:2, 2:1, 5:2, 3:1, and 4:1)
—A 64-bit split-transaction external data bus with burst transfers
—Support for address pipelining and limited out-of-order bus transactions
—Four burst write queues—three for cache copyback operations and one for snoop push operations
—Two single-beat write queues
—Additional signals and signal redefinition for direct-store operations
—Provides a data streaming mode that allows consecutive burst read data transfers to occur without intervening dead cycles. This mode also disables data retry operations.
—No-DRTRY mode eliminates the DRTRY signal from the qualified bus grant, which improves performance on read operations for systems that do not use the DRTRY signal. No-DRTRY mode makes read data available to the processor one bus clock cycle sooner than if normal mode is used.
•Multiprocessing support features include the following:
—Hardware enforced, four-state cache coherency protocol (MESI) for data cache. Bits are provided in the instruction cache to indicate only whether a cache block is valid or invalid.
—Separate port into data cache tags for bus snooping
—Load/store with reservation instruction pair for atomic memory references, semaphores, and other multiprocessor operations
•Power management
—NAP mode supports full shut down and snooping
—Operating voltage of 2.5 ± 0.3 V
•Performance monitor can be used to help in debugging system designs and improving software efficiency, especially in multiprocessor systems.
•In-system testability and debugging features through JTAG boundary-scan capability
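The cache, TLB, and address-translation sizes listed above are mutually consistent, which a few lines of arithmetic can confirm. The following Python sketch only derives set counts and field widths from the stated sizes; the variable names are hypothetical, and it makes no claim about which address bits the hardware actually uses for indexing.

```python
def log2_exact(n: int) -> int:
    """Return log2(n) for a power of two; raise otherwise."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


# Cache parameters stated in the feature list (same for both caches).
CACHE_BYTES = 32 * 1024
CACHE_WAYS = 4
BLOCK_BYTES = 32

cache_sets = CACHE_BYTES // (CACHE_WAYS * BLOCK_BYTES)  # 128 odd + 128 even sets
block_offset_bits = log2_exact(BLOCK_BYTES)
set_index_bits = log2_exact(cache_sets)

# TLB parameters stated in the feature list (same for ITLB and DTLB).
TLB_ENTRIES = 128
TLB_WAYS = 2
tlb_sets = TLB_ENTRIES // TLB_WAYS

# Address-translation sizes stated in the feature list.
VIRTUAL_ADDRESS_BITS = 52
page_offset_bits = log2_exact(4 * 1024)              # 4-Kbyte page
segment_offset_bits = log2_exact(256 * 1024 * 1024)  # 256-Mbyte segment
page_index_bits = segment_offset_bits - page_offset_bits
virtual_segment_bits = VIRTUAL_ADDRESS_BITS - segment_offset_bits

print(cache_sets, block_offset_bits, set_index_bits)
print(tlb_sets, page_offset_bits, page_index_bits, virtual_segment_bits)
```

The 256 sets computed here match the "128 sets of odd lines and 128 sets of even lines" described in Section 1.1.1.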
1.2 PowerPC 604e Microprocessor Hardware Implementation
This section provides an overview of the 604e’s hardware implementation, including descriptions of the functional units, shown in Figure 2, the cache implementation, MMU, and the system interface.
Note that Figure 2 provides a more detailed block diagram than Figure 1. It shows the additional data paths that contribute to the improved efficiency in instruction execution, and it shows more clearly the relationships between execution units and their associated register files.
Figure 2. Block Diagram—Internal Data Paths
(Diagram not reproduced. Blocks shown: fetch unit with branch correction; dispatch unit (four-instruction dispatch); instruction dispatch buses; GPR operand and result buses; FPR operand and result buses; CR result bus; two-entry reservation stations, RS(2), for the CRU, BPU, both SCIUs, LSU, and FPU, and a single-entry station, RS(1), for the MCIU; 32 GPRs and 32 FPRs with their rename buffers; result status buses to the completion unit; and the 32-Kbyte data cache (4-way, 8 words/block) with its result and operand buses.)
1.2.1 Instruction Flow
Several units on the 604e ensure the proper flow of instructions and operands and guarantee the correct update of the architectural machine state. These units include the following:
•Fetch unit—Using the next sequential address or the address supplied by the BPU when a branch is predicted or resolved, the fetch unit supplies instructions to the eight-word instruction buffer.
•Dispatch unit—The decode/dispatch unit decodes instructions and dispatches them to the appropriate execution unit. During dispatch, operands are provided to the execution unit (or reservation station) from the register files, rename buffers, and result buses.
•Branch processing unit (BPU)—Provides the fetcher with predicted target instructions when a branch is predicted (and a mispredict recovery address if a branch is incorrectly predicted).
•Condition register unit (CRU)—The CRU executes all condition register logical and flow control instructions. The CRU shares the dispatch bus with the BPU; only one condition register or branch instruction can be issued per clock cycle.
•Completion unit—The completion unit retires executed instructions in program order and controls the updating of the architectural machine state.
1.2.2 Fetch Unit
The fetch unit provides instructions to the eight-entry instruction queue by accessing the on-chip instruction cache. Typically, the fetch unit continues fetching sequentially as many as four instructions at a time.
The address of the next instruction to be fetched is determined by several conditions, which are prioritized as follows:
1.Detection of an exception. Instruction fetching begins at the exception vector.
2.The BPU recovers from an incorrect prediction when a branch instruction is in the execute stage. Undispatched instructions are flushed and fetching begins at the correct target address.
3.The BPU recovers from an incorrect prediction when a branch instruction is in the dispatch stage. Subsequent instructions are flushed and fetching begins at the correct target address.
4.The BPU recovers from an incorrect prediction when a branch instruction is in the decode stage. Subsequent instructions are flushed and fetching begins at the correct target address.
5.A fetch address is found in the BTAC. As a cache block is fetched, the branch target address cache (BTAC) and the branch history table (BHT) are searched with the fetch address. If it is found in the BTAC, the target address from the BTAC is the first candidate for being the next fetch address.
6.If none of the previous conditions exists, the instruction is fetched from the next sequential address.
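The prioritized selection above can be sketched as a first-match search over candidate addresses. This is a minimal Python model, assuming each condition is represented by an optional address; the `FetchInputs` type and its field names are hypothetical and do not correspond to documented signals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FetchInputs:
    """Hypothetical snapshot of the six conditions; None means 'not present'."""
    exception_vector: Optional[int] = None     # 1. exception detected
    execute_correction: Optional[int] = None   # 2. mispredict found in execute stage
    dispatch_correction: Optional[int] = None  # 3. mispredict found in dispatch stage
    decode_correction: Optional[int] = None    # 4. mispredict found in decode stage
    btac_target: Optional[int] = None          # 5. fetch address hit in the BTAC
    sequential: int = 0                        # 6. next sequential address


def next_fetch_address(inp: FetchInputs) -> int:
    """Return the highest-priority candidate, falling back to sequential fetch."""
    candidates = (
        inp.exception_vector,
        inp.execute_correction,
        inp.dispatch_correction,
        inp.decode_correction,
        inp.btac_target,
    )
    for address in candidates:
        if address is not None:
            return address
    return inp.sequential
```

Testing against `None` rather than truthiness keeps address 0 usable as a valid candidate.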
1.2.3 Decode/Dispatch Unit
The decode/dispatch unit provides the logic for decoding instructions and issuing them to the appropriate execution unit. The eight-entry instruction queue consists of two four-entry queues—a decode queue (DEQ) and a dispatch queue (DISQ).
The decode logic decodes the four instructions in the decode queue. For many branch instructions, these decoded instructions along with the bits in the BHT, are used during the decode stage for branch correction.
The dispatch logic decodes the instructions in the DISQ for possible dispatch. The dispatch logic resolves unconditional branch instructions and predicts conditional branch instructions using the branch decode logic, the BHT, and values in the CTR.
The 512-entry BHT provides two bits per entry, indicating four levels of dynamic prediction: strongly not-taken, not-taken, taken, and strongly taken. The history of a branch’s direction is maintained in these two bits. Each time a branch is taken, the value is incremented (with a maximum value of three meaning strongly taken); when it is not taken, the value is decremented (with a minimum value of zero meaning strongly not-taken). If the current value predicts taken and the next branch is taken again, the BHT entry then predicts strongly taken. If the branch after that is not taken, the entry is decremented and again predicts taken.
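The update rule for a single BHT entry is a two-bit saturating counter, sketched below in Python. The mapping of the two intermediate values (1 to not-taken, 2 to taken) is inferred from the stated endpoints and ordering, and the function names are hypothetical; how the 604e indexes the 512 entries is not modeled.

```python
# Counter value -> prediction level. Values 0 and 3 are stated in the text;
# the meaning of 1 and 2 is an inference from the ordering.
STATES = ("strongly not-taken", "not-taken", "taken", "strongly taken")


def update(counter: int, taken: bool) -> int:
    """Increment on a taken branch, decrement otherwise, saturating at 0 and 3."""
    if taken:
        return min(counter + 1, 3)
    return max(counter - 1, 0)


def predicts_taken(counter: int) -> bool:
    """The upper two levels predict taken; the lower two predict not taken."""
    return counter >= 2


# Trace the sequence described above: taken -> strongly taken -> taken.
counter = 2
for outcome in (True, False):
    counter = update(counter, outcome)
    print(outcome, STATES[counter])
```

Because the counter saturates, a single not-taken outcome from the strongly taken state still leaves the entry predicting taken.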
The dispatch logic also allocates each instruction to the appropriate execution unit. A reorder buffer (ROB) entry is allocated for each instruction, and dependency checking is done between the instructions in the dispatch queue. The rename buffers are searched for the operands as the operands are fetched from the register file. Operands that are written by other instructions ahead of this one in the dispatch queue are given the tag of that instruction’s rename buffer; otherwise, the rename buffer or register file supplies either the operand or a tag. As instructions are dispatched, the fetch unit is notified that the dispatch queue can be updated with more instructions.
1.2.4 Branch Processing Unit (BPU)
The BPU handles prediction and recovery for branch instructions. All branches, including unconditional branches, are placed in a two-entry reservation station until conditions are resolved and they can be executed. At that point, branch instructions are executed in order and the completion unit is notified whether the prediction was correct.
Unlike the 604, the 604e has a separate unit for executing condition register logical instructions, which makes it possible for branch instructions to execute and resolve before a preceding CR logical instruction. The 604e can still only dispatch one CR logical or branch instruction per cycle, but it can execute both branch and CR logical instructions at the same time.
Branch correction in the decode stage in the 604e can predict branches whose target is taken from the count or link registers if no updates of the count and link register are pending. This saves at least one cycle on branch correction when the mtspr instruction can be sufficiently separated from the branch that uses the SPR as a target address.
HID0[30] has been defined to allow the BTAC to be disabled. When HID0[30] is set, the BTAC contents are invalidated and the BTAC behaves as if it were empty. New entries cannot be added until the BTAC is enabled.
The BPU shares a dispatch bus with the condition register unit (CRU).
1.2.5 Condition Register Unit (CRU)
Condition register logical instructions are executed by the CRU, which shares the dispatch bus with the BPU. The CRU has its own two-entry reservation station. The 604e can still only dispatch one CR logical or branch instruction per cycle, but it can execute both branch and CR logical instructions at the same time.
1.2.6 Completion Unit
The completion unit retires executed instructions from the reorder buffer (ROB) in the completion unit and updates register files and control registers. The completion unit recognizes exception conditions and discards any operations being performed on subsequent instructions in program order. The completion unit can quickly remove instructions from a mispredicted branch, and the decode/dispatch unit begins dispatching from the correct path.
The instruction is retired from the reorder buffer when it has finished execution and all instructions ahead of it have been completed. The instruction’s result is written into the appropriate register file and is removed from the rename buffers at or after completion. At completion, the 604e also updates any other resource affected by this instruction. Several instructions can complete simultaneously. Most exception conditions are recognized at completion time.
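The in-order retirement rule can be illustrated with a small queue model. This Python sketch assumes only what is stated in this document (a 16-entry reorder buffer and at most four retirements per clock); the class and method names are hypothetical, and exceptions, register write-back, and branch flushes are left out.

```python
from collections import deque
from dataclasses import dataclass

ROB_ENTRIES = 16          # reorder buffer size stated in this document
MAX_RETIRE_PER_CLOCK = 4  # retirement limit stated in this document


@dataclass
class RobEntry:
    name: str
    finished: bool = False


class ReorderBuffer:
    """Toy model: entries are kept in program (dispatch) order."""

    def __init__(self) -> None:
        self.entries = deque()

    def dispatch(self, name: str) -> bool:
        """Allocate an entry; return False if the buffer is full."""
        if len(self.entries) >= ROB_ENTRIES:
            return False
        self.entries.append(RobEntry(name))
        return True

    def finish(self, name: str) -> None:
        """Mark an instruction as having finished execution (any order)."""
        for entry in self.entries:
            if entry.name == name:
                entry.finished = True
                return
        raise KeyError(name)

    def retire(self) -> list:
        """One clock of retirement: oldest-first, stopping at the first unfinished entry."""
        retired = []
        while (self.entries and self.entries[0].finished
               and len(retired) < MAX_RETIRE_PER_CLOCK):
            retired.append(self.entries.popleft().name)
        return retired
```

An instruction that finishes early simply waits in the buffer until every older instruction has finished, which is what preserves the sequential programming model.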
1.2.7 Rename Buffers
To avoid contention for a given register location, the 604e provides rename registers for storing instruction results before the completion unit commits them to the architected register. Twelve rename registers are provided for the GPRs, eight for the FPRs, and eight for the condition register. GPRs are described in Section 2.1.1.1, “General-Purpose Registers (GPRs),” FPRs are described in Section 2.1.1.2, “Floating-Point Registers (FPRs),” and the condition register is described in Section 2.1.1.3, “Condition Register (CR).”
When the dispatch unit dispatches an instruction to its execution unit, it allocates a rename register for the results of that instruction. The dispatch unit also provides a tag to the execution unit identifying the result that should be used as the operand. When the proper result is returned to the rename buffer, it is latched into the reservation station. When all operands are available in the reservation station, execution can begin.
The completion unit does not transfer instruction results from the rename registers to the architected registers until any branch conditions preceding the instruction in the completion queue are resolved and the instruction itself is retired from the completion queue without exceptions. If a branch is found to have been incorrectly predicted, all instructions following the branch are flushed from the completion queue and any results of those instructions are flushed from the rename registers.
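The allocate, write-back, commit, and flush steps described in this section can be sketched with a toy rename pool. This Python model is illustrative only: the class, its methods, and the use of increasing integer tags to stand for program order are assumptions made for the example, not a description of the 604e's internal tagging.

```python
class RenameBuffers:
    """Toy model of one rename pool (for example, the 12 GPR rename buffers)."""

    def __init__(self, size: int, num_regs: int = 32) -> None:
        self.size = size
        self.arch = [0] * num_regs  # architected register file
        self.slots = {}             # tag -> [destination register, result or None]
        self.next_tag = 0           # assumption: tags increase in program order

    def allocate(self, dest_reg: int):
        """At dispatch, reserve a buffer; return its tag, or None if the pool is full."""
        if len(self.slots) >= self.size:
            return None
        tag = self.next_tag
        self.next_tag += 1
        self.slots[tag] = [dest_reg, None]
        return tag

    def write_result(self, tag: int, value: int) -> None:
        """When execution finishes, hold the result in the rename buffer."""
        self.slots[tag][1] = value

    def commit(self, tag: int) -> None:
        """At completion, copy the result to the architected register and free the buffer."""
        dest_reg, value = self.slots[tag]
        if value is None:
            raise RuntimeError("instruction has not finished execution")
        self.arch[dest_reg] = value
        del self.slots[tag]

    def flush_from(self, tag: int) -> None:
        """On a mispredicted branch, discard results of instructions at or after tag."""
        for stale in [t for t in self.slots if t >= tag]:
            del self.slots[stale]
```

Two in-flight instructions may target the same architected register because each writes its own buffer; only `commit` changes the architected value, so flushed results never become visible.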
1.2.8 Execution Units
The following sections describe the 604e’s arithmetic execution units: the two single-cycle integer units (SCIUs), the multiple-cycle integer unit (MCIU), and the FPU. Reservation stations temporarily store dispatched instructions that cannot be executed until all of the source operands are valid. When a reservation station sees the needed result being written back, it takes the result directly from one of the result buses. Once all operands for an instruction are in the reservation station, the instruction is eligible to be executed.
1.2.8.1 Integer Units (IUs)
The two SCIUs and one MCIU execute all integer instructions. These are shown in Figure 1 and Figure 2. Each IU has a dedicated result bus that connects to rename buffers and to all reservation stations. Each SCIU has a two-entry reservation station and the MCIU has a single-entry reservation station to reduce stalls. A reservation station can receive instructions from the decode/dispatch unit and operands from the GPRs, the rename buffers, or the result buses.
Each SCIU consists of three single-cycle subunits—a fast adder/comparator, a subunit for logical operations, and a subunit for performing rotates, shifts, and count-leading-zero operations. These subunits handle all one-cycle arithmetic instructions; only one subunit can execute an instruction at a time.
The MCIU consists of a 32-bit integer multiplier/divider. The multiplier supports early exit on 16- x 32-bit operations. The MCIU is also responsible for executing the Move from Special-Purpose Register (mfspr) and Move to Special-Purpose Register (mtspr) instructions, which are used to read and write special-purpose registers. Note that the load and store instructions that update their address base register (specified by the rA operand) pass the update results on the MCIU’s result bus. Otherwise, the MCIU’s result bus is dedicated to MCIU operations.