This document provides an overview of the PowerPC 601 RISC microprocessor features,
including a block diagram showing the major functional components. It also provides an
overview of the PowerPC architecture, and information about how the 601 implementation
differs from the architectural definitions.
This document is divided into three parts:
•Part 1, “PowerPC 601 Microprocessor Overview,” provides an overview of the 601
features, including a block diagram showing the major functional components.
•Part 2, “Levels of the PowerPC Architecture,” describes the three levels of the
PowerPC architecture.
•Part 3, “PowerPC 601 Microprocessor: Implementation,” describes the PowerPC
architecture in general, noting where the 601 differs.
In this document, the terms “PowerPC 601 RISC Microprocessor” and “601” are used to
denote the first microprocessor from the PowerPC architecture family. The PowerPC 601
microprocessors are available from IBM as PPC601 and from Motorola as MPC601.
Freescale Semiconductor, Inc.
601 RISC Microprocessor
(Motorola Order Number) MPC601/D
11/93
REV 1
PowerPC is a trademark of International Business Machines Corp.
This document contains information on a new product under development. Specifications and information herein are subject to change without notice.
Part 1 PowerPC 601 Microprocessor Overview
Part 1 describes the features of the 601, provides a block diagram showing the major functional units, and
gives an overview of how the 601 operates.
The 601 is the first implementation of the PowerPC family of reduced instruction set computer (RISC)
microprocessors. The 601 implements the 32-bit portion of the PowerPC architecture, which provides 32-bit effective (logical) addresses, integer data types of 8, 16, and 32 bits, and floating-point data types of 32
and 64 bits. For 64-bit PowerPC implementations, the PowerPC architecture provides 64-bit integer data
types, 64-bit addressing, and other features required to complete the 64-bit architecture.
The 601 is a superscalar processor capable of issuing and retiring three instructions per clock, one to each
of three execution units. Instructions can complete out of order for increased performance; however, the 601
makes execution appear sequential.
The 601 integrates three execution units—an integer unit (IU), a branch processing unit (BPU), and a
floating-point unit (FPU). The ability to execute three instructions in parallel and the use of simple
instructions with rapid execution times yield high efficiency and throughput for 601-based systems. Most
integer instructions execute in one clock cycle. The FPU is pipelined so a single-precision multiply-add
instruction can be issued every clock cycle.
The 601 includes an on-chip, 32-Kbyte, eight-way set-associative, physically addressed, unified instruction
and data cache and an on-chip memory management unit (MMU). The MMU contains a 256-entry, two-way
set-associative, unified translation lookaside buffer (UTLB) and provides support for demand paged virtual
memory address translation and variable-sized block translation. Both the UTLB and the cache use least
recently used (LRU) replacement algorithms.
The 601 has a 64-bit data bus and a 32-bit address bus. The 601 interface protocol allows multiple masters
to compete for system resources through a central external arbiter. Additionally, on-chip snooping logic
maintains cache coherency in multiprocessor applications. The 601 supports single-beat and burst data
transfers for memory accesses; it also supports both memory-mapped I/O and I/O controller interface
addressing.
The 601 uses an advanced, 3.6-V CMOS process technology and maintains full interface compatibility with
TTL devices.
1.1 PowerPC 601 Microprocessor Features
This section describes details of the 601’s implementation of the PowerPC architecture. Major features of
the 601 are as follows:
•High-performance, superscalar microprocessor
— As many as three instructions in execution per clock (one to each of the three execution units)
— Single clock cycle execution for most instructions
— Pipelined FPU for all single-precision and most double-precision operations
•Three independent execution units and two register files
— BPU featuring static branch prediction
— A 32-bit IU
— Fully IEEE 754-compliant FPU for both single- and double-precision operations
— Thirty-two GPRs for integer operands
— Thirty-two FPRs for single- or double-precision operands
For More Information On This Product,
Go to: www.freescale.com
PowerPC 601 RISC Microprocessor Technical Summary
•High instruction and data throughput
— Zero-cycle branch capability
— Programmable static branch prediction on unresolved conditional branches
— Instruction unit capable of fetching eight instructions per clock from the cache
— An eight-entry instruction queue that provides look-ahead capability
— Interlocked pipelines with feed-forwarding that control data dependencies in hardware
— Unified 32-Kbyte cache—eight-way set-associative, physically addressed; LRU replacement
algorithm
— Cache write-back or write-through operation programmable on a per page or per block basis
— Memory unit with a two-element read queue and a three-element write queue
— Run-time reordering of loads and stores
— BPU that performs condition register (CR) look-ahead operations
— Address translation facilities for 4-Kbyte page size, variable block size, and 256-Mbyte
segment size
— A 256-entry, two-way set-associative UTLB
— Four-entry BAT array providing 128-Kbyte to 8-Mbyte blocks
— Four-entry, first-level ITLB
— Hardware table search (caused by UTLB misses) through hashed page tables
— 52-bit virtual address; 32-bit physical address
•Facilities for enhanced system performance
— Bus speed defined as selectable division of operating frequency
— A 64-bit split-transaction external data bus with burst transfers
— Support for address pipelining and limited out-of-order bus transactions
— Snooped copyback queues for cache block (sector) copyback operations
— Bus extensions for I/O controller interface operations
— Multiprocessing support features that include the following:
– Hardware enforced, four-state cache coherency protocol (MESI)
– Separate port into cache tags for bus snooping
•In-system testability and debugging features through boundary-scan capability
1.2 Block Diagram
Figure 1 provides a block diagram of the 601 that illustrates how the execution units—IU, FPU, and BPU—
operate independently and in parallel.
The 601's 32-Kbyte, unified cache tag directory has a port dedicated to snooping bus transactions,
preventing interference with processor access to the cache. The 601 also provides address translation and
protection facilities, including a UTLB and a BAT array, and a four-entry ITLB that contains the four most
recently used instruction address translations for fast access by the instruction unit.
Instruction fetching and issuing are handled in the instruction unit. Translation of addresses for cache or
external memory accesses is handled by the memory management unit. Both units are discussed in more
detail in Sections 1.3, “Instruction Unit,” and 1.5, “Memory Management Unit (MMU).”
1.3 Instruction Unit
As shown in Figure 1, the 601 instruction unit, which contains an instruction queue and the BPU, provides
centralized control of instruction flow to the execution units. The instruction unit determines the address of
the next instruction to be fetched based on information from a sequential fetcher and the BPU. The IU also
enforces pipeline interlocks and controls feed-forwarding.
The sequential fetcher contains a dedicated adder that computes the address of the next sequential
instruction based on the address of the last fetch and the number of words accepted into the queue. The BPU
searches the bottom half of the instruction queue for a branch instruction and uses static branch prediction
on unresolved conditional branches to allow the instruction fetch unit to fetch instructions from a predicted
target instruction stream while a conditional branch is evaluated. The BPU also folds out branch instructions
for unconditional branches.
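The sequential fetcher's address computation described above can be sketched as follows. This is a minimal illustration, not the hardware logic: it assumes 4-byte instruction words and 32-bit address wraparound, and the function name is hypothetical.

```python
WORD_BYTES = 4             # assumption: each instruction occupies one 32-bit word
ADDRESS_MASK = 0xFFFFFFFF  # the 601 uses 32-bit effective addresses


def next_sequential_address(last_fetch_address: int, words_accepted: int) -> int:
    """Return the address of the next sequential instruction.

    The result is the address of the last fetch advanced by the number of
    words that were accepted into the instruction queue.
    """
    return (last_fetch_address + WORD_BYTES * words_accepted) & ADDRESS_MASK
```

For example, a fetch at 0x1000 that places eight words in the queue yields 0x1020 as the next sequential fetch address.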
Instructions issued beyond a predicted branch do not complete execution until the branch is resolved,
preserving the programming model of sequential execution. If any of these instructions are to be executed
in the BPU, they are decoded but not issued. FPU and IU instructions are issued and allowed to complete
up to the register write-back stage. Write-back is performed when a correctly predicted branch is resolved,
and instruction execution continues without interruption along the predicted path.
If branch prediction is incorrect, the instruction fetcher flushes all predicted path instructions and
instructions are issued from the correct path.
1.3.1 Instruction Queue
The instruction queue, shown in Figure 1, holds as many as eight instructions (a cache block) and can be
filled from the cache during a single cycle. The instruction fetch can access only one cache sector at a time
and will load as many instructions as space in the IQ allows.
The upper half of the instruction queue (Q4–Q7) provides buffering to reduce the frequency of cache
accesses. Integer and branch instructions are dispatched to their respective execution units from Q0 through
Q3. Q0 functions as the initial decode stage for the IU.
For a more detailed overview of instruction dispatch, see Section 3.7, “Instruction Timing.”
1.4 Independent Execution Units
The PowerPC architecture’s support for independent floating-point, integer, and branch processing
execution units allows implementation of processors with out-of-order instruction issue. For example,
because branch instructions do not depend on GPRs or FPRs, branches can often be resolved early,
eliminating stalls caused by taken branches.
The following sections describe the 601’s three execution units—the BPU, IU, and FPU.
1.4.1 Branch Processing Unit (BPU)
The BPU performs condition register (CR) look-ahead operations on conditional branches. The BPU looks
through the bottom half of the instruction queue for a conditional branch instruction and attempts to resolve
it early, achieving the effect of a zero-cycle branch in many cases.
The BPU uses a bit in the instruction encoding to predict the direction of the conditional branch. Therefore,
when an unresolved conditional branch instruction is encountered, the 601 fetches instructions from the
predicted target stream until the conditional branch is resolved.
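Static prediction from an encoding bit can be illustrated with a short sketch. This summary says only that a bit in the instruction encoding predicts the direction; the default rule below (backward branches predicted taken, forward branches predicted not taken, with the hint bit reversing the default) is an assumption based on the general PowerPC convention. The function and parameter names are hypothetical.

```python
def predict_taken(displacement: int, hint_bit: int) -> bool:
    """Statically predict the direction of an unresolved conditional branch.

    Assumption: a negative displacement (backward branch, typically a loop)
    is predicted taken by default, and a set hint bit reverses that default.
    """
    default_taken = displacement < 0
    return default_taken != bool(hint_bit)
```

Under these assumptions, a loop-closing backward branch with the hint bit clear is predicted taken, so fetching continues from the loop body while the condition is still being evaluated.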
The BPU contains an adder to compute branch target addresses and three special-purpose, user-control
registers: the link register (LR), the count register (CTR), and the CR. The BPU calculates the return
pointer for subroutine calls and saves it into the LR for certain types of branch instructions. The LR also
contains the branch target address for the Branch Conditional to Link Register (bclrx) instruction. The CTR
contains the branch target address for the Branch Conditional to Count Register (bcctrx) instruction. The
contents of the LR and CTR can be copied to or from any GPR. Because the BPU uses dedicated registers
rather than general-purpose or floating-point registers, execution of branch instructions is largely
independent from execution of integer and floating-point instructions.
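The role of the LR in subroutine linkage can be sketched as below. This is an illustrative model only: it assumes 4-byte instructions, so that the return pointer is the address following the branch, and it assumes the low two bits of the LR are ignored when the LR supplies a branch target. The function names are hypothetical.

```python
ADDRESS_MASK = 0xFFFFFFFF  # 32-bit addresses


def branch_and_link(branch_address: int, target_address: int) -> tuple[int, int]:
    """Model a branch that saves a return pointer.

    Returns (next_fetch_address, new_lr), where new_lr is the address of the
    instruction following the branch (assumption: 4-byte instructions).
    """
    new_lr = (branch_address + 4) & ADDRESS_MASK
    return target_address & ADDRESS_MASK, new_lr


def branch_to_link_register(lr: int) -> int:
    """Model a branch whose target comes from the LR (word-aligned target)."""
    return lr & 0xFFFFFFFC
```

A call at 0x2000 to a routine at 0x3000 leaves 0x2004 in the LR, and a later branch to the LR resumes execution there.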
1.4.2 Integer Unit (IU)
The IU executes all integer instructions and executes floating-point memory accesses in concert with the
FPU. The IU executes one integer instruction at a time, performing computations with its arithmetic logic
unit (ALU), multiplier, divider, integer exception register (XER), and the general-purpose register file. Most
integer instructions are single-cycle instructions.
The IU interfaces with the cache and MMU for all instructions that access memory. Addresses are formed
by adding the source 1 register operand specified by the instruction (or zero) to either a source 2 register
operand or to a 16-bit, immediate value embedded in the instruction.
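The address formation described above can be sketched as follows. The sketch assumes, following general PowerPC conventions rather than anything stated in this summary, that the 16-bit immediate is sign-extended, that the value zero replaces the source 1 operand when the instruction names register 0, and that the sum wraps to 32 bits. All names are hypothetical.

```python
ADDRESS_MASK = 0xFFFFFFFF  # 32-bit effective addresses


def sign_extend_16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def base_operand(gprs: list[int], ra: int) -> int:
    """Source 1 operand: zero when register 0 is named (assumption)."""
    return 0 if ra == 0 else gprs[ra]


def ea_immediate(gprs: list[int], ra: int, immediate: int) -> int:
    """Effective address = source 1 (or zero) + sign-extended 16-bit immediate."""
    return (base_operand(gprs, ra) + sign_extend_16(immediate)) & ADDRESS_MASK


def ea_indexed(gprs: list[int], ra: int, rb: int) -> int:
    """Effective address = source 1 (or zero) + source 2 register."""
    return (base_operand(gprs, ra) + gprs[rb]) & ADDRESS_MASK
```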
Load and store instructions are issued and translated in program order; however, the accesses can occur out
of order. Synchronizing instructions are provided to enforce strict ordering.
Load and store instructions are considered to have completed execution with respect to precise exceptions
after the address is translated. If the address for a load or store instruction hits in the UTLB or BAT array
and it is aligned, the instruction execution (that is, calculation of the address) takes one clock cycle, allowing
back-to-back issue of load and store instructions. The time required to perform the actual load or store
operation varies depending on whether the operation involves the cache, system memory, or an I/O device.
1.4.3 Floating-Point Unit (FPU)
The FPU contains a single-precision multiply-add array, the floating-point status and control register
(FPSCR), and thirty-two 64-bit FPRs. The multiply-add array allows the 601 to efficiently implement
floating-point operations such as multiply, add, divide, and multiply-add. The FPU is pipelined so that most
single-precision instructions and many double-precision instructions can be issued back-to-back. The FPU
contains two additional instruction queues. These queues allow floating-point instructions to be issued from
the instruction queue even if the FPU is busy, making instructions available for issue to the other execution
units.
Like the BPU, the FPU can access instructions from the bottom half of the instruction queue (Q3–Q0),
which permits floating-point instructions that do not depend on unexecuted instructions to be issued early
to the FPU.
The 601 supports all IEEE 754 floating-point data types (normalized, denormalized, NaN, zero, and infinity)
in hardware, eliminating the latency incurred by software exception routines.
1.5 Memory Management Unit (MMU)
The 601’s MMU supports up to 4 Petabytes (2^52) of virtual memory and 4 Gigabytes (2^32) of physical
memory. The MMU also controls access privileges for these spaces on block and page granularities.
Referenced and changed status are maintained by the processor for each page to assist implementation of a
demand-paged virtual memory system.
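The address-space sizes quoted above follow directly from the address widths. The short calculation below checks them; it assumes binary units (1 Petabyte taken as 2^50 bytes, 1 Gigabyte as 2^30 bytes) and uses the 4-Kbyte page size listed among the 601's features.

```python
VIRTUAL_ADDRESS_BITS = 52   # from the feature list: 52-bit virtual address
PHYSICAL_ADDRESS_BITS = 32  # from the feature list: 32-bit physical address
PAGE_BYTES = 4 * 1024       # 4-Kbyte page size

virtual_bytes = 2 ** VIRTUAL_ADDRESS_BITS
physical_bytes = 2 ** PHYSICAL_ADDRESS_BITS

# Assumption: binary units, so 1 Petabyte = 2**50 bytes and 1 Gigabyte = 2**30 bytes.
virtual_petabytes = virtual_bytes // 2 ** 50
physical_gigabytes = physical_bytes // 2 ** 30

# Number of 4-Kbyte pages in each address space.
virtual_pages = virtual_bytes // PAGE_BYTES
physical_pages = physical_bytes // PAGE_BYTES
```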
The instruction unit generates all instruction addresses; these addresses are both for sequential instruction
fetches and addresses that correspond to a change of program flow. The integer unit generates addresses for
data accesses (both for memory and the I/O controller interface).
After an address is generated, the upper order bits of the logical (effective) address are translated by the
MMU into physical address bits. Simultaneously, the lower order address bits (that are untranslated and
therefore considered both logical and physical), are directed to the on-chip cache where they form the index
into the eight-way set-associative tag array. After translating the address, the MMU passes the higher-order
bits of the physical address to the cache, and the cache lookup completes. For cache-inhibited accesses or
accesses that miss in the cache, the untranslated lower order address bits are concatenated with the translated
higher-order address bits; the resulting 32-bit physical address is then used by the memory unit and the
system interface, which accesses external memory.
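The parallel translation and cache indexing described above can be sketched for the page-translation case. The field widths are derived from figures in this document rather than quoted from it: a 4-Kbyte page gives a 12-bit untranslated offset, and a 32-Kbyte, eight-way cache with 64-byte lines gives 64 sets, so the set index and the byte-in-line field both fit inside the untranslated bits. The function names are hypothetical.

```python
PAGE_OFFSET_BITS = 12  # 4-Kbyte pages: low 12 bits are not translated
LINE_BYTES = 64        # cache line size
NUM_SETS = 64          # derived: 32 Kbytes / (8 ways * 64 bytes)


def split_effective_address(ea: int) -> tuple[int, int, int, int]:
    """Split a 32-bit effective address into the fields used in parallel.

    Returns (effective_page, page_offset, set_index, byte_in_line). The page
    number goes to the MMU; the set index selects a cache set at the same time.
    """
    page_offset = ea & ((1 << PAGE_OFFSET_BITS) - 1)
    effective_page = ea >> PAGE_OFFSET_BITS
    set_index = (page_offset // LINE_BYTES) % NUM_SETS
    byte_in_line = page_offset % LINE_BYTES
    return effective_page, page_offset, set_index, byte_in_line


def physical_address(physical_page: int, page_offset: int) -> int:
    """Concatenate translated upper bits with the untranslated lower bits."""
    return ((physical_page << PAGE_OFFSET_BITS) | page_offset) & 0xFFFFFFFF
```

Because the set index depends only on untranslated bits, the tag array can be read while the MMU is still producing the physical page number.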
The MMU also directs the address translation and enforces the protection hierarchy programmed by the
operating system in relation to the supervisor/user privilege level of the access and in relation to whether the
access is a load or store.
For instruction accesses, the MMU first performs a lookup in the four entries of the ITLB for both block- and page-based physical address translation. Instruction accesses that miss in the ITLB and all data accesses
cause a lookup in the UTLB and BAT array for the physical address translation. In most cases, the physical
address translation resides in one of the TLBs and the physical address bits are readily available to the on-chip cache. In the case where the physical address translation misses in the TLBs, the 601 automatically
performs a search of the translation tables in memory using the information in the table search description
register 1 (SDR1) and the corresponding segment register.
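The lookup order described above can be sketched with simple dictionaries. This is a simplified model with several labeled assumptions: each structure is keyed by effective page number (a real BAT array matches variable-sized blocks), the BAT array is checked before the UTLB when both are consulted, the UTLB is reloaded after a table search, and the table search is passed in as a callable. None of these details or names comes from this summary.

```python
from typing import Callable


def translate(
    effective_page: int,
    is_instruction: bool,
    itlb: dict[int, int],
    utlb: dict[int, int],
    bat: dict[int, int],
    table_search: Callable[[int], int],
) -> tuple[int, str]:
    """Return (physical_page, source) following the lookup order in the text."""
    # Instruction accesses try the four-entry ITLB first.
    if is_instruction and effective_page in itlb:
        return itlb[effective_page], "ITLB"
    # ITLB misses and all data accesses consult the BAT array and the UTLB.
    if effective_page in bat:  # assumption: BAT match takes precedence
        return bat[effective_page], "BAT"
    if effective_page in utlb:
        return utlb[effective_page], "UTLB"
    # A miss in the TLBs triggers a search of the translation tables in memory.
    physical_page = table_search(effective_page)
    utlb[effective_page] = physical_page  # assumption: the UTLB is reloaded
    return physical_page, "table search"
```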
Memory management in the 601 is described in more detail in Section 3.6.2, “PowerPC 601 Microprocessor
Memory Management.”
1.6 Cache Unit
The PowerPC 601 microprocessor contains a 32-Kbyte, eight-way set associative, unified (instruction and
data) cache. The cache line size is 64 bytes, divided into two eight-word sectors, each of which can be
snooped, loaded, cast-out, or invalidated independently. The cache is designed to adhere to a write-back
policy, but the 601 allows control of cacheability, write policy, and memory coherency at the page and block
level. The cache uses a least recently used (LRU) replacement policy.
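The cache organization stated above implies a few derived quantities, worked out in the sketch below. The input figures come from this section; the set count and index width are derived here, and the function name is hypothetical.

```python
def cache_geometry() -> dict[str, int]:
    """Derive set count, sector size, and index width from the stated figures."""
    cache_bytes = 32 * 1024  # 32-Kbyte unified cache
    ways = 8                 # eight-way set associative
    line_bytes = 64          # 64-byte cache line
    sectors_per_line = 2     # two sectors per line
    word_bytes = 4           # 32-bit words

    sets = cache_bytes // (ways * line_bytes)
    sector_bytes = line_bytes // sectors_per_line
    return {
        "sets": sets,
        "sector_bytes": sector_bytes,
        "words_per_sector": sector_bytes // word_bytes,
        "index_bits": sets.bit_length() - 1,  # log2 of a power-of-two set count
    }
```

The result is 64 sets of eight lines each, with 32-byte sectors holding eight words, which agrees with the "two eight-word sectors" description.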
As shown in Figure 1, the cache provides an eight-word interface to the instruction fetcher and load/store
unit. The surrounding logic selects, organizes, and forwards the requested information to the requesting unit.
Write operations to the cache can be performed on a byte basis, and a complete read-modify-write operation
to the cache can occur in each cycle.
The instruction unit provides the cache with the address of the next instruction to be fetched. In the case of
a cache hit, the cache returns the instruction and as many of the instructions following it as can be placed in
the eight-word instruction queue up to the cache sector boundary. If the queue is empty, as many as eight
words (an entire sector) can be loaded into the queue in parallel.
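How many instructions a single cache hit can deliver can be sketched as the smaller of two limits: the words remaining before the sector boundary and the free entries in the queue. The sketch assumes a word-aligned fetch address; the function name is hypothetical.

```python
SECTOR_BYTES = 32   # one eight-word sector
WORD_BYTES = 4      # one instruction per 32-bit word
QUEUE_ENTRIES = 8   # eight-entry instruction queue


def instructions_returned(fetch_address: int, queue_occupancy: int) -> int:
    """Number of instructions delivered to the queue on a cache hit.

    Assumes fetch_address is word aligned. Delivery stops at the sector
    boundary or when the queue is full, whichever comes first.
    """
    words_to_boundary = (SECTOR_BYTES - fetch_address % SECTOR_BYTES) // WORD_BYTES
    free_entries = QUEUE_ENTRIES - queue_occupancy
    return min(words_to_boundary, free_entries)
```

With an empty queue and a fetch address at the start of a sector, all eight words are delivered; a fetch that starts in the middle of a sector delivers only the words up to the boundary.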
The cache tag directory has one address port dedicated to instruction fetch and load/store accesses and one
dedicated to snooping transactions on the system interface. Therefore, snooping does not require additional
clock cycles unless a snoop hit that requires a cache status update occurs.
1.7 Memory Unit
The 601’s memory unit contains read and write queues that buffer operations between the external interface
and the cache. These operations comprise those resulting from load and store instructions that
are cache misses and the read and write operations required to maintain cache coherency, table search, and other
operations. The memory unit also handles address-only operations and cache-inhibited loads and stores. As
shown in Figure 2, the read queue contains two elements and the write queue contains three elements. Each
element of the write queue can contain as many as eight words (one sector) of data. One element of the write
queue, marked snoop in Figure 2, is dedicated to writing cache sectors to system memory after a modified
sector is hit by a snoop from another processor or snooping device on the system bus. The use of the write
queue guarantees a high priority operation that ensures a deterministic response behavior when snooping
hits a modified sector.
The other two elements in the write queue are used for store operations and writing back modified sectors
that have been deallocated by updating the queue; that is, when a cache location is full, the least-recently
used cache sector is deallocated by first being copied into the write queue and from there to system memory.
Note that snooping can occur after a sector has been pushed out into the write queue and before the data has
been written to system memory. Therefore, to maintain a coherent memory, the write queue elements are
compared to snooped addresses in the same way as the cache tags. If a snoop hits a write queue element, the
data is first stored in system memory before it can be loaded into the cache of the snooping bus master.
Coherency checking between the cache and the write queue prevents dependency conflicts. Single-beat
writes in the write queue are not snooped; coherency is ensured through the use of special cache operations
that accompany the single-beat write operation on the bus.
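The comparison of snooped addresses against both the cache tags and the write queue can be sketched as follows. This is a simplified model: the cache tags are represented as a set of sector-aligned addresses, the write queue as a list of addresses of sectors awaiting write-back, and only sector write-backs are modeled (single-beat writes, which the text says are not snooped, are left out). All names are hypothetical.

```python
SECTOR_BYTES = 32  # snooping operates on eight-word sectors


def sector_address(address: int) -> int:
    """Align an address down to the start of its 32-byte sector."""
    return address & ~(SECTOR_BYTES - 1)


def snoop_hit(
    snooped_address: int,
    cache_tags: set[int],
    write_queue: list[int],
) -> tuple[bool, bool]:
    """Return (hit_in_cache, hit_in_write_queue) for a snooped address.

    Both structures are checked, because a modified sector may already have
    left the cache but not yet reached system memory.
    """
    sector = sector_address(snooped_address)
    hit_in_cache = sector in cache_tags
    hit_in_write_queue = any(sector_address(a) == sector for a in write_queue)
    return hit_in_cache, hit_in_write_queue
```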
Figure 2. Memory Unit (block diagram; labels recoverable from the original figure: read queue, write queue with a dedicated snoop element, four-word data queue, address and data paths to and from the cache, system interface)
Execution of a load or store instruction is considered complete when the associated address translation
completes, guaranteeing that the instruction has completed to the point where it is known that it will not
generate an internal exception. However, after address translation is complete, a read or write operation can
still generate an external exception.
Load and store instructions are always issued and translated in program order with respect to other load and
store instructions. However, a load or store operation that hits in the cache can complete ahead of those that
miss in the cache; additionally, loads and stores that miss the cache can be reordered as they arbitrate for the
system bus.
If a load or store misses in the cache, the operation is managed by the memory unit, which prioritizes
accesses to the system bus. Read requests, such as loads, read-with-intent-to-modify (RWITM) operations, and instruction fetches, have priority
over single-beat write operations. The 601 ensures memory consistency by comparing target addresses and
prohibiting instructions from completing out of order if an address matches. Load and store operations can
be forced to execute in strict program order.
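The address comparison that guards out-of-order completion can be sketched as a single check. The granularity of the hardware comparison is not given in this summary, so the sketch compares exact addresses as a stated simplification; the function name is hypothetical.

```python
def may_complete_out_of_order(address: int, older_pending_addresses: list[int]) -> bool:
    """Allow an access to pass older pending accesses only if no address matches.

    Assumption: exact-address comparison; the real comparison granularity is
    not stated in this document.
    """
    return all(address != older for older in older_pending_addresses)
```

A load to 0x1000 may pass a pending store to 0x2000, but must wait behind a pending store to 0x1000.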
1.8 System Interface
Because the cache on the 601 is an on-chip, write-back primary cache, the predominant type of transaction
for most applications is burst-read memory operations, followed by burst-write memory operations, I/O
controller interface operations, and single-beat (noncacheable or write-through) memory read and write
operations. Additionally, there can be address-only operations, variants of the burst and single-beat
operations (global memory operations that are snooped, and atomic memory operations, for example), and
address retry activity (for example, when a snooped read access hits a modified line in the cache).
Memory accesses can occur in single-beat (1–8 bytes) and four-beat burst (32 bytes) data transfers. The
address and data buses are independent for memory accesses to support pipelining and split transactions.
The 601 can pipeline as many as two transactions and has limited support for out-of-order split-bus
transactions.
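The two transfer sizes follow from the 64-bit data bus: each beat moves up to 8 bytes, so a four-beat burst moves 32 bytes, which is one cache sector. The sketch below encodes that arithmetic; the function name is hypothetical, and rejecting other sizes is a simplification of this sketch rather than a statement about the bus protocol.

```python
BUS_BYTES_PER_BEAT = 8  # 64-bit data bus
BURST_BEATS = 4         # four-beat burst
BURST_BYTES = BUS_BYTES_PER_BEAT * BURST_BEATS  # 32 bytes, one cache sector


def beats_for_transfer(byte_count: int) -> int:
    """Return the number of data beats for a memory access of byte_count bytes."""
    if 1 <= byte_count <= BUS_BYTES_PER_BEAT:
        return 1  # single-beat transfer (1 to 8 bytes)
    if byte_count == BURST_BYTES:
        return BURST_BEATS  # burst transfer of one sector
    raise ValueError("this sketch models only single-beat and 32-byte burst transfers")
```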
Access to the system interface is granted through an external arbitration mechanism that allows devices to
compete for bus mastership. This arbitration mechanism is flexible, allowing the 601 to be integrated into
systems that implement various fairness and bus parking procedures to avoid arbitration overhead.
Additional multiprocessor support is provided through coherency mechanisms that provide snooping,
external control of the on-chip cache and TLB, and support for a secondary cache. Multiprocessor software
support is provided through the use of atomic memory operations.
Typically, memory accesses are weakly ordered: sequences of operations, including load/store string and
multiple instructions, do not necessarily complete in the order they begin. This maximizes the efficiency of the
bus without sacrificing coherency of the data. The 601 allows read operations to precede store operations
(except when a dependency exists, of course). In addition, the 601 can be configured to reorder high priority
write operations ahead of lower priority store operations. Because the processor can dynamically optimize
run-time ordering of load/store traffic, overall performance is improved.
Part 2 Levels of the PowerPC Architecture
The PowerPC architecture consists of the following layers, and adherence to the PowerPC architecture can
be measured in terms of which of the following levels of the architecture is implemented:
•PowerPC user instruction set architecture—Defines the base user-level instruction set, user-level
registers, data types, floating-point exception model, memory models for a uniprocessor
environment, and programming model for a uniprocessor environment.
•PowerPC virtual environment architecture—Describes the memory model for a multiprocessor
environment, defines cache control instructions, and describes other aspects of virtual
environments. Implementations that conform to the PowerPC virtual environment architecture also
adhere to the PowerPC user instruction set architecture, but may not necessarily adhere to the
PowerPC operating environment architecture.
•PowerPC operating environment architecture—Defines the memory management model,
supervisor-level registers, synchronization requirements, and the exception model.
Implementations that conform to the PowerPC operating environment architecture also adhere to
the PowerPC user instruction set architecture and the PowerPC virtual environment architecture
definition.
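The conformance relationships among the three levels can be written down as a small lookup. The short labels UISA, VEA, and OEA are shorthand introduced here for the user instruction set, virtual environment, and operating environment architectures; the table and function name are illustrative only.

```python
# Levels that conformance to a given level also implies, per the list above.
IMPLIED_LEVELS = {
    "UISA": set(),
    "VEA": {"UISA"},
    "OEA": {"UISA", "VEA"},
}


def levels_adhered_to(level: str) -> set[str]:
    """Return every architecture level an implementation at `level` adheres to."""
    return {level} | IMPLIED_LEVELS[level]
```

An implementation conforming to the virtual environment architecture therefore adheres to two levels, but not necessarily to the operating environment architecture.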
Note that while the 601 is said to adhere to the PowerPC architecture at all three levels, it diverges in aspects
of its implementation to a greater extent than should be expected of subsequent PowerPC processors. Many
of the differences result from the fact that the 601 design provides compatibility with an existing architecture
standard (POWER), while providing a reliable platform for hardware and software development compatible
with subsequent PowerPC processors.
Note that except for the POWER instructions and the RTC implementation, the differences between the 601
and the PowerPC architecture are primarily differences in the operating environment architecture.
The PowerPC architecture allows a wide range of designs for such features as cache and system interface
implementations.
Part 3 PowerPC 601 Microprocessor: Implementation
The PowerPC architecture is derived from the IBM Performance Optimized with Enhanced RISC
(POWER) architecture. The PowerPC architecture shares the benefits of the POWER architecture optimized
for single-chip implementations. The architecture design facilitates parallel instruction execution and is
scalable to take advantage of future technological gains. For compatibility, the 601 also implements
instructions from the POWER user programming model that are not part of the PowerPC definition.
Part 3, “PowerPC 601 Microprocessor: Implementation,” describes the PowerPC architecture in general,
noting where the 601 differs. The organization of Part 3 follows the sequence of the chapters in the PowerPC
601 RISC Microprocessor User’s Manual as follows:
•Features—Section 3.1, “Features,” describes general features that the 601 shares with the PowerPC
family of microprocessors. It does not list PowerPC features not implemented in the 601.
•Registers and programming model—Section 3.2, “Registers and Programming Model,” describes
the registers for the operating environment architecture common among PowerPC processors and
describes the programming model. It also describes differences in how the registers are used in the
601 and describes the additional registers that are unique to the 601.
•Instruction set and addressing modes—Section 3.3, “Instruction Set and Addressing Modes,”
describes the PowerPC instruction set and addressing modes for the PowerPC operating
environment architecture. It defines the PowerPC instructions implemented in the 601 as well as
additional instructions implemented in the 601 but not defined in the PowerPC architecture.