All Rights Reserved
Printed in the United States of America September 2002
The following are trademarks of International Business Machines Corporation in the United States, or other countries, or
both.
IBM, IBM Logo
CoreConnect
PowerPC, PowerPC logo
PowerPC Architecture
RISCTrace, RISCWatch
Other company, product, and service names may be trademarks or service marks of others.
All information contained in this document is subject to change without notice. The products described in this document
are NOT intended for use in implantation, life support, space, nuclear, or military applications, or other hazardous uses
where malfunction could result in death, bodily injury, or catastrophic property damage. The information contained in this
document does not affect or change IBM product specifications or warranties. Nothing in this document shall operate as
an express or implied license or indemnity under the intellectual property rights of IBM or third parties. All information
contained in this document was obtained in specific environments, and is presented as an illustration. The results
obtained in other operating environments may vary.
While the information contained herein is believed to be accurate, such information is preliminary and should not be
relied upon for accuracy or completeness, and no representations or warranties of accuracy or completeness are made.
Note: This document contains information on products in the sampling and/or initial production phases of
development. This information is subject to change without notice. Verify with your IBM field applications
engineer that you have the latest version of this document before finalizing a design.
THE INFORMATION CONTAINED IN THIS DOCUMENT IS PROVIDED ON AN “AS IS” BASIS. In no event will IBM be
liable for damages arising directly or indirectly from any use of the information contained in this document.
IBM Microelectronics Division
1580 Route 52, Bldg. 504
Hopewell Junction, NY 12533-6351
The IBM home page can be found at
The IBM Microelectronics Division home page can be found at
ppc440x5LOT.fm.
September 12, 2002
Page 21 of 583
User’s Manual
PPC440x5 CPU CorePreliminary
About This Book
This user’s manual provides the architectural overview, programming model, and detailed information about
the instruction set, registers, and other facilities of the IBM™ Book-E Enhanced PowerPC™ 440x5
(PPC440x5™) 32-bit embedded controller core.
The PPC440x5 embedded controller core features:
• Book-E Enhanced PowerPC Architecture™
• Dual-issue superscalar pipeline with dynamic branch prediction
• Separate, configurable (up to 32KB each) instruction and data caches, with cache line locking
• DSP acceleration with 24 new integer multiply-accumulate (MAC) instructions
• Memory Management Unit (MMU) with 64-entry TLB and support for page sizes of 1KB–256MB
• 64GB (36-bit) physical address capability
• 128-bit PLB interface, part of the IBM CoreConnect™ on-chip system bus architecture
• JTAG debug interface with extensive integrated debug facilities, including real-time trace
Who Should Use This Book
This book is for system hardware and software developers, and for application developers who need to
understand the PPC440x5. The audience should understand embedded system design, operating systems,
RISC microprocessing, and computer organization and architecture.
How to Use This Book
This book describes the PPC440x5 device architecture, programming model, registers, and instruction set.
This book contains the following chapters:
Chapter 1. Overview
Chapter 2. Programming Model
Chapter 3. Initialization
Chapter 4. Instruction and Data Caches
Chapter 5. Memory Management
Chapter 6. Interrupts and Exceptions
Chapter 7. Timer Facilities
Chapter 8. Debug Facilities
Chapter 9. Instruction Set
Chapter 10. Register Summary
This book contains the following appendixes:
Appendix A. Instruction Summary
Appendix B. PPC440 Core Compiler Optimizations
Appendix B contains preliminary information.
To help readers find material in these chapters, this book contains:
• Contents, on page v.
• Figures, on page xi.
• Tables, on page xiii.
• Index, on page 571.
Notation
The manual uses the following notational conventions:
• Active low signals are shown with an overbar (Active_Low)
• All numbers are decimal unless otherwise specified.
• 0bnnnn means a number expressed in binary format.
• 0xnnnn means a number expressed in hexadecimal format.
Underscores may be used between digits.
• RA refers to General Purpose Register (GPR) RA.
• (RA) refers to the contents of GPR RA.
• (RA|0) refers to the contents of GPR RA, or to the value 0 if the RA field is 0.
• Bits in registers, instructions, and fields are specified as follows.
• Bits are numbered most-significant bit to least-significant bit, starting with bit 0.
Note: This document differs from the Book-E architecture specification in the use of bit numbering
for architected registers. Book-E defines the full, 64-bit instruction set architecture, and all
registers are shown as having bit numbers from 0 to 63, with bit 63 being the least
significant. This manual describes a 32-bit subset implementation of the architecture.
Architected registers are described as being 32 bits long, with bits numbered from 0 to 31,
and with bit 31 being the least significant. When this document refers to register bits 0 to 31,
they actually correspond to bits 32 to 63 of the same register in the Book-E architecture
specification.
• $X_{p}$ means bit p of register, instruction, or field X.
• $X_{p:q}$ means bits p through q of register, instruction, or field X.
• $X_{p,q,\ldots}$ means bits p, q, ... of register, instruction, or field X.
• X[p] means a named field p of register X.
• X[p:q] means named fields p through q of register X.
• X[p,q,...] means named fields p, q, ... of register X.
...
• ¬X means the ones complement of the contents of X.
• A period (.) as the last character of an instruction mnemonic means that the instruction records status
information in certain fields of the Condition Register as a side effect of execution, as described in
Chapter 9, “Instruction Set.”
• The symbol || is used to describe the concatenation of two values. For example, 0b010 || 0b111 is the
same as 0b010111.
• $x^{n}$ means x raised to the nth power.
• ${}^{n}x$ means the replication of x, n times (that is, x concatenated to itself n – 1 times). ${}^{n}0$ and ${}^{n}1$ are special cases:
• ${}^{n}0$ means a field of n bits with each bit equal to 0. Thus ${}^{5}0$ is equivalent to 0b00000.
• ${}^{n}1$ means a field of n bits with each bit equal to 1. Thus ${}^{5}1$ is equivalent to 0b11111.
• /, //, ///, ... denotes a reserved field in an instruction or in a register.
• ? denotes an allocated bit in a register.
• A shaded field denotes a field that is reserved or allocated in an instruction or in a register.
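The bit notation above can be expressed as a few C helpers for a 32-bit register. This is a minimal sketch: the function names are hypothetical, and the code assumes the manual's MSB-0 convention (bit 0 is the most significant bit, bit 31 the least significant). It also shows the mapping from this manual's bit p to Book-E bit p + 32.

```c
#include <stdint.h>

/* X_p: value (0 or 1) of bit p of x, where bit 0 is the MSB and bit 31 the LSB. */
static inline uint32_t bit_p(uint32_t x, unsigned p)
{
    return (x >> (31u - p)) & 1u;
}

/* X_{p:q}: bits p through q of x (p <= q), right-justified. */
static inline uint32_t bits_pq(uint32_t x, unsigned p, unsigned q)
{
    unsigned width = q - p + 1u;
    uint32_t field = x >> (31u - q);          /* move bit q to the LSB position */
    return (width == 32u) ? field : (field & ((1u << width) - 1u));
}

/* a || b, where b is b_width bits wide. */
static inline uint64_t concat(uint64_t a, uint64_t b, unsigned b_width)
{
    return (a << b_width) | b;
}

/* ^n x for a single bit x (n <= 32): ^5 0 == 0b00000, ^5 1 == 0b11111. */
static inline uint32_t replicate_bit(unsigned x, unsigned n)
{
    if (x == 0u)
        return 0u;
    return (n == 32u) ? 0xFFFFFFFFu : ((1u << n) - 1u);
}

/* Book-E bit number corresponding to bit p (0..31) of a register in this manual. */
static inline unsigned booke_bit(unsigned p)
{
    return p + 32u;
}
```

For example, `concat(2u, 7u, 3)` is 0b010 || 0b111, which gives 0b010111 (0x17).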
Related Publications
The following book describes the Book-E Enhanced PowerPC Architecture:
• Book E: PowerPC Architecture Enhanced for Embedded Applications
The following CD-ROM contains publications describing the IBM PowerPC 400 family of embedded controllers, including this manual (PowerPC PPC440x5 User's Manual) and application and technical notes.
• IBM PowerPC Embedded Processor Solutions (Order Number SC09-3032)
1. Overview
The IBM™ PowerPC™ 440x5 32-bit embedded processor core, referred to as the PPC440x5 core, implements the Book-E Enhanced PowerPC Architecture.
This chapter describes:
• PPC440x5 core features
• The PPC440x5 core as an implementation of the Book-E Enhanced PowerPC Architecture
• The organization of the PPC440x5 core, including a block diagram and descriptions of the functional units
• PPC440x5 core interfaces
1.1 PPC440x5 Features
The PPC440x5 core is a high-performance, low-power engine that implements the flexible and powerful
Book-E Enhanced PowerPC Architecture.
The PPC440x5 contains a dual-issue, superscalar, pipelined processing unit, along with other functional
elements required by embedded ASIC product specifications. These other functions include memory
management, cache control, timers, and debug facilities. Interfaces for custom co-processors and floating
point functions are provided, along with separate instruction and data cache array interfaces which can be
configured to various sizes (optimized for 32KB). The processor local bus (PLB) system interface has been
extended to 128 bits and is fully compatible with the IBM CoreConnect on-chip system architecture, providing
the framework to efficiently support system-on-a-chip (SOC) designs.
In addition, the PPC440x5 core is a member of the PowerPC 400 Series of advanced embedded processor
cores, which is supported by the PowerPC Embedded Tools Program. In this program, IBM and many third-party vendors offer a full range of robust development tools for embedded applications. Among these are
compilers, debuggers, real-time operating systems, and logic analyzers.
PPC440x5 features include:
• High performance, dual-issue, superscalar 32-bit RISC CPU
• Superscalar implementation of the full 32-bit Book-E Enhanced PowerPC Architecture
• Dual instruction fetch, decode, and out-of-order issue
• Out-of-order dispatch, execution, and completion
• High-accuracy dynamic branch prediction using a Branch History Table (BHT)
• Reduced branch latency using Branch Target Address Cache (BTAC)
• Three independent pipelines
• Combined complex integer, system, and branch pipeline
• Simple integer pipeline
• Load/store pipeline
• Single cycle multiply
• Single cycle multiply-accumulate (DSP instruction set extensions)
• 9-port (6-read, 3-write) 32x32-bit General Purpose Register (GPR) file
• Hardware support for all CPU misaligned accesses
• Full support for both big and little endian byte ordering
• Extensive power management designed into core for maximum performance/power efficiency
• Primary caches
• Independently configurable instruction and data cache arrays
• Array size offerings: 32KB, 16KB, and 8KB
• Single-cycle access
• 32-byte (eight word) line size
• Highly-associative (64-way for 32KB/16KB, 32-way for 8KB)
• Write-back and write-through operation
• Control over whether stores will allocate or write-through on cache miss
• Extensive load/store queues and multiple line fill/flush buffers
• Non-blocking with up to four outstanding load misses
• Cache line locking supported
• Caches can be partitioned to provide separate regions for “transient” instructions and data
• High associativity permits efficient allocation of cache memory
• Critical word first data access and forwarding
• Cache tags and data are parity protected against soft errors.
• Memory Management Unit
• Separate instruction and data shadow TLBs
• 64-entry, fully-associative unified TLB array
• Variable page sizes (1KB-256MB), simultaneously resident in TLB
• 4-bit extended real address for 36-bit (64 GB) addressability
• Flexible TLB management with software page table search
• Storage attribute controls for write-through, caching inhibited, guarded, and byte order (endianness)
• Four user-definable storage attribute controls (for controlling CodePack™ code compression and
transient data, for example)
• TLB tags and data are parity protected against soft errors.
• Debug facilities
• Extensive hardware debug facilities incorporated into the IEEE 1149.1 JTAG port
• Multiple instruction and data address breakpoints (including range)
• Data value compare
• Single-step, branch, trap, and other debug events
• Non-invasive real-time software trace interface
• Timer facilities
– 64-bit time base
– Decrementer with auto-reload capability
– Fixed Interval Timer (FIT)
– Watchdog Timer with critical interrupt and/or auto-reset
• Multiple core interfaces defined by the IBM CoreConnect on-chip system architecture
• PLB interfaces
• Three independent 128-bit interfaces for instruction reads, data reads, and data writes
• Glueless attachment to 32-, 64-, or 128-bit CoreConnect system environments
• Multiple CPU:PLB frequency ratios supported (N:1, N:2, N:3)
• 6.4 GB/sec maximum data rate to CPU
• On-chip memory (OCM) integration capability over the PLB interface
• Auxiliary Processor Unit (APU) Port
• Provides functional extensions to the processor pipelines, including GPR file operations
• 128-bit load/store interface (direct access between APU and the primary data cache)
• Interface can support APU execution of all PowerPC floating point instructions
• Attachment capability for DSP co-processing such as accumulators and SIMD computation
• Enables customer-specific instruction enhancements for multimedia applications
• Device Control Register (DCR) interface for independent access to on-chip control registers
• Avoids contention for high-bandwidth PLB system bus
• Clock and power management interface
• JTAG debug interface
1.2 The PPC440x5 as a PowerPC Implementation
The PPC440x5 core implements the full, 32-bit fixed-point subset of the Book-E Enhanced PowerPC Architecture. The PPC440x5 core fully complies with these architectural specifications. The 64-bit operations of
the architecture are not supported, and the core does not implement the floating point operations, although a
floating point unit (FPU) may be attached (using the APU interface). Within the core, the 64-bit operations and
the floating point operations are trapped, and the floating point operations can be emulated using software.
See Appendix A of the Book-E Enhanced PowerPC Architecture specification for more information on 32-bit
subset implementations of the architecture.
Note: This document differs from the Book-E architecture specification in the use of bit numbering for
architected registers. Specifically, Book-E defines the full, 64-bit instruction set architecture, and
thus all registers are shown as having bit numbers from 0 to 63, with bit 63 being the least
significant. On the other hand, this document describes the PPC440x5 core, which is a 32-bit
subset implementation of the architecture. Accordingly, all architected registers are described as
being 32 bits in length, with the bits numbered from 0 to 31, and with bit 31 being the least
significant. Therefore, when this document makes reference to register bit numbers from 0 to 31,
they actually correspond to bits 32 to 63 of the same register in the Book-E architecture
specification.
1.3 PPC440x5 Organization
The PPC440x5 core includes a seven-stage pipelined PowerPC core, which consists of a three-stage, dual-issue instruction fetch and decode unit with an attached branch unit, together with three independent, four-stage
pipelines for complex integer, simple integer, and load/store operations, respectively. The PPC440x5 core
also includes a memory management unit (MMU); separate instruction and data cache units; JTAG, debug,
and trace logic; and timer facilities.
Figure 1-1 illustrates the logical organization of the PPC440x5 core:
[Figure 1-1 image not reproduced. Blocks shown: instruction unit with branch unit (target address cache, 4KB BHT) and issue slots 0 and 1; I-cache controller and size-configurable instruction cache with ITLB; MMU with 64-entry TLB; complex integer pipe (with MAC), simple integer pipe, and load/store pipe, with GPR files; D-cache controller and size-configurable data cache with DTLB and load/store queues; two 128-bit PLB interfaces; DCR bus; debug, JTAG, and trace; interrupt and timers; clocks and power management.]
Figure 1-1. PPC440 Core Block Diagram
1.3.1 Superscalar Instruction Unit
The instruction unit of the PPC440x5 core fetches, decodes, and issues two instructions per cycle to any
combination of the three execution pipelines and/or the APU interface (see “Execution Pipelines” below, and
Auxiliary Processor Unit (APU) Port on page 36). The instruction unit includes a branch unit which provides
dynamic branch prediction using a branch history table (BHT), as well as a branch target address cache
(BTAC). These mechanisms greatly improve the branch prediction accuracy and reduce the latency of taken
branches, such that the target of a branch can usually be executed immediately after the branch itself, with no
penalty.
1.3.2 Execution Pipelines
The PPC440x5 core contains three execution pipelines: complex integer, simple integer, and load/store.
Each pipeline consists of four stages and can access the nine-ported (six read, three write) GPR file. In order
to improve performance and avoid contention for the GPR file, there are two identical copies of it. One is dedicated to the complex integer pipeline, while the other is shared by the simple integer and the load/store pipelines.
The complex integer pipeline handles all arithmetic, logical, branch, and system management instructions
(such as interrupt and TLB management, move to/from system registers, and so on). This pipeline also
handles multiply and divide operations, and 24 DSP instructions that perform a variety of multiply-accumulate
operations. The complex integer pipeline multiply unit can perform 32-bit × 32-bit multiply operations with
single-cycle throughput and three-cycle latency; 16-bit × 32-bit multiply operations have only two-cycle
latency. Divide operations take 33 cycles.
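The difference between single-cycle throughput and three-cycle latency can be made concrete with a little arithmetic. The following minimal C sketch uses only the figures stated above. It is an idealized model that ignores issue pairing, surrounding instructions, and all stalls, so it should not be read as a cycle-accurate timing model (see Appendix B for that). Independent multiplies overlap in the pipeline, whereas a chain in which each multiply consumes the previous result pays the full latency every time.

```c
/* Latencies quoted above for the complex integer pipeline multiply unit. */
#define MUL32X32_LATENCY 3u
#define MUL16X32_LATENCY 2u

/* Cycles from the first issue until the last result is available, for n
   independent multiplies issued back to back (one per cycle). The last
   one issues at cycle n - 1 and its result is ready `latency` cycles later. */
static unsigned long independent_mul_cycles(unsigned long n, unsigned latency)
{
    return n == 0 ? 0 : n - 1 + latency;
}

/* Cycles for a chain of n dependent multiplies: each must wait for the
   previous result, so the latencies simply add up. */
static unsigned long dependent_mul_cycles(unsigned long n, unsigned latency)
{
    return n * latency;
}
```

Under this model, four independent 32 × 32 multiplies finish in 6 cycles, but four dependent ones take 12. This is why a scheduler tries to interleave independent work between dependent multiplies.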
The simple integer pipeline can handle most arithmetic and logical operations which do not update the Condition Register (CR).
The load/store pipeline handles all load, store, and cache management instructions. All misaligned operations are handled in hardware, with no penalty on any operation which is contained within an aligned 16-byte
region. The load/store pipeline supports all operations to both big endian and little endian data regions.
Appendix B, “PPC440x5 Core Compiler Optimizations,” provides detailed information on instruction timings
and performance implications in the PPC440x5 core.
1.3.3 Instruction and Data Cache Controllers
The PPC440x5 core provides separate instruction and data cache controllers and arrays, which allow concurrent access and minimize pipeline stalls. The storage capacity of the cache arrays, which can range from
8KB–32KB each, depends upon the implementation. Both cache controllers have 32-byte lines, and both are
highly-associative, with 64-way set-associativity for 32KB and 16KB sizes, and 32-way set-associativity for
the 8KB size. Both caches support parity checking on the tags and data in the memory arrays, to protect
against soft errors. If a parity error is detected, the CPU will cause a machine check exception.
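The number of sets in each cache configuration follows from the figures above as size / (line size × ways). The minimal C sketch below performs that arithmetic. The `set_index` helper assumes the conventional mapping, in which the address bits just above the 5-bit line offset select the set. The text does not specify the PPC440x5's actual index function, so treat that helper as illustrative only.

```c
#include <stdint.h>

#define CACHE_LINE_BYTES 32u  /* 32-byte (eight-word) lines, per the text */

/* Sets = size / (line size * ways). */
static unsigned cache_sets(unsigned size_bytes, unsigned ways)
{
    return size_bytes / (CACHE_LINE_BYTES * ways);
}

/* Integer log2 for powers of two (e.g. number of offset or index bits). */
static unsigned log2u(unsigned v)
{
    unsigned r = 0;
    while (v > 1u) {
        v >>= 1;
        r++;
    }
    return r;
}

/* Set index under an ASSUMED conventional mapping: line number modulo sets. */
static unsigned set_index(uint32_t addr, unsigned sets)
{
    return (addr / CACHE_LINE_BYTES) % sets;
}
```

The three configurations give 16 sets (32KB, 64-way), 8 sets (16KB, 64-way), and 8 sets (8KB, 32-way). These counts match the 16-line and 8-line locking granularities quoted for the ICC. That correspondence is an observation from the arithmetic, not something the text states.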
The PowerPC instruction set provides a rich set of cache management instructions for software-enforced
coherency. The PPC440x5 implementation also provides special debug instructions that can directly read the
tag and data arrays. See Chapter 4, “Instruction and Data Caches,” for detailed information about the instruction and data cache controllers.
The cache controllers attach to the PLB, which connects them to the IBM CoreConnect system-on-a-chip environment.
1.3.3.1 Instruction Cache Controller (ICC)
The ICC delivers two instructions per cycle to the instruction unit of the PPC440x5 core. The ICC also
handles the execution of the PowerPC instruction cache management instructions for coherency. The ICC
includes a speculative pre-fetch mechanism which can be configured to automatically pre-fetch a burst of up
to three additional lines upon any fetch request which misses in the instruction cache. These speculative prefetches can be abandoned if the instruction execution branches away from the original instruction stream.
The ICC supports cache line locking, at either an 8-line or 16-line granularity, depending on cache size (16-line for 32KB, 8-line for 8KB and 16KB). In addition, the notion of a “transient” portion of the cache is
supported, in which the cache can be configured such that only a limited portion is used for instruction cache
lines from memory pages that are designated by a storage attribute from the MMU as being transient in
nature. Such memory pages would contain code which is unlikely to be reused once the processor moves on
to the next series of instruction lines, and thus performance may be improved by preventing each series of
instruction lines from overwriting all of the “regular” code in the instruction cache.
1.3.3.2 Data Cache Controller (DCC)
The DCC handles all load and store data accesses, as well as the PowerPC data cache management instructions. All misaligned accesses are handled in hardware, with those accesses that are contained within a half-line (16 bytes) being handled as a single request. Load and store accesses which cross a 16-byte boundary
are broken into two separate accesses by the hardware.
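Whether a misaligned access is handled as one request or two depends only on whether it stays within an aligned 16-byte half-line. The minimal C sketch below expresses that rule. It assumes an access length of 1 to 16 bytes (16 bytes being the quadword maximum mentioned for APU loads and stores), and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if an access of len bytes (1..16) starting at addr stays inside
   one aligned 16-byte half-line. */
static bool within_halfline(uint32_t addr, unsigned len)
{
    return (addr & 0xFu) + len <= 16u;
}

/* Number of requests the DCC issues for the access, per the rule above. */
static unsigned dcc_requests(uint32_t addr, unsigned len)
{
    return within_halfline(addr, len) ? 1u : 2u;
}
```

For example, a misaligned word load at 0x1003 stays within the half-line starting at 0x1000 and is a single request. A word load at 0x100E spans bytes 0x100E to 0x1011, which crosses the 0x1010 boundary, so it is split into two.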
The DCC interfaces to the APU port to provide direct load/store access to the data cache for APU load and
store operations. Such APU load and store instructions can access up to 16 bytes (one quadword) in a single
cycle.
The data cache can be operated in a store-in (copy-back) or write-through manner, according to the write-through storage attribute specified for the memory page by the MMU. The DCC also supports both “store-with-allocate” and “store-without-allocate” operations. A store that misses in the data cache can therefore either “allocate” the line in the cache by reading it in and storing the new data into the cache, or bypass the cache on a miss and simply store the data to memory. This characteristic can also be
specified on a page-by-page basis by a storage attribute in the MMU.
The DCC also supports cache line locking and “transient” data, in the same manner as the ICC (see Instruction Cache Controller (ICC) on page 31).
The DCC provides extensive load, store, and flush queues, such that up to three outstanding line fills and up
to four outstanding load misses can be pending, and the DCC can continue servicing subsequent load and
store hits in an out-of-order fashion. Store-gathering can also be performed on caching inhibited, write-through, and “without-allocate” store operations, for up to 16 contiguous bytes. Finally, each cache line has
four separate “dirty” bits (one per doubleword), so that the amount of data flushed on cache line replacement
can be minimized.
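The benefit of per-doubleword dirty bits can be shown with a small model. A 32-byte line holds four 8-byte doublewords, so byte offset o within the line falls in doubleword o / 8. This minimal C sketch is hypothetical. The `line_state` type and the assignment of bit i to the i-th doubleword from the start of the line are assumptions, since the hardware's internal layout is not described here. The sketch also assumes that each store stays within one line.

```c
#include <stdint.h>

/* Hypothetical model of one 32-byte line's four dirty bits. */
typedef struct {
    uint8_t dirty;  /* bit i set => doubleword i (bytes 8i..8i+7) is dirty */
} line_state;

/* Mark the doublewords touched by a store of len bytes (len >= 1) at addr.
   Assumes the store does not cross the 32-byte line boundary. */
static void mark_store(line_state *line, uint32_t addr, unsigned len)
{
    unsigned first = (addr & 31u) >> 3;
    unsigned last = ((addr & 31u) + len - 1u) >> 3;
    for (unsigned dw = first; dw <= last; dw++)
        line->dirty |= (uint8_t)(1u << dw);
}

/* Bytes that must be written back when the line is replaced. */
static unsigned flush_bytes(const line_state *line)
{
    unsigned n = 0;
    for (unsigned dw = 0; dw < 4u; dw++)
        if (line->dirty & (1u << dw))
            n += 8u;
    return n;
}
```

A single word store dirties one doubleword, so replacing that line flushes 8 bytes instead of the full 32.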
1.3.4 Memory Management Unit (MMU)
The PPC440x5 supports a flat, 36-bit (64GB) real (physical) address space. This 36-bit real address is generated by the MMU, as part of the translation process from the 32-bit effective address, which is calculated by
the processor core as an instruction fetch or load/store address.
The MMU provides address translation, access protection, and storage attribute control for embedded applications. The MMU supports demand paged virtual memory and other management schemes that require
precise control of logical to physical address mapping and flexible memory protection. Working with appropriate system level software, the MMU provides the following functions:
• Translation of the 32-bit effective address space into the 36-bit real address space
• Page level read, write, and execute access control
• Storage attributes for cache policy, byte order (endianness), and speculative memory access
• Software control of page replacement strategy
The translation lookaside buffer (TLB) is the primary hardware resource involved in the control of translation,
protection, and storage attributes. It consists of 64 entries, each specifying the various attributes of a given
page of the address space. The TLB is fully-associative; the entry for a given page can be placed anywhere
in the TLB. The TLB tag and data memory arrays are parity protected against soft errors; if a parity error is
detected, the CPU will cause a machine check exception.
Software manages the establishment and replacement of TLB entries. This gives system software significant
flexibility in implementing a custom page replacement strategy. For example, to reduce TLB thrashing or
translation delays, software can reserve several TLB entries for globally accessible static mappings. The
instruction set provides several instructions for managing TLB entries. These instructions are privileged and
the processor must be in supervisor state in order for them to be executed.
The first step in the address translation process is to expand the effective address into a virtual address. This
is done by taking the 32-bit effective address and appending to it an 8-bit Process ID (PID), as well as a 1-bit
“address space” identifier (AS). The PID value is provided by the PID register (see Chapter 5, “Memory
Management”). The AS identifier is provided by the Machine State Register (MSR; see Chapter 6, “Interrupts
and Exceptions”), which contains separate bits for the instruction fetch address space (MSR[IS]) and the data
access address space (MSR[DS]). Together, the 32-bit effective address, the 8-bit PID, and the 1-bit AS form
a 41-bit virtual address. This 41-bit virtual address is then translated into the 36-bit real address using the
TLB.
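Forming the virtual address amounts to concatenating three fields. The minimal C sketch below packs them into a 64-bit integer as AS || PID || EA. The text gives only the field widths (1 + 8 + 32 = 41 bits), so that field order, and the helper names, are assumptions for illustration.

```c
#include <stdint.h>

/* Pack AS (1 bit), PID (8 bits), and EA (32 bits) into a 41-bit virtual
   address. Field order AS || PID || EA is an ASSUMPTION for illustration. */
static uint64_t make_va(unsigned as, uint8_t pid, uint32_t ea)
{
    return ((uint64_t)(as & 1u) << 40) | ((uint64_t)pid << 32) | ea;
}

/* AS comes from MSR[IS] for instruction fetches and MSR[DS] for data accesses. */
static unsigned select_as(int is_fetch, unsigned msr_is, unsigned msr_ds)
{
    return is_fetch ? msr_is : msr_ds;
}
```

With every field at its maximum value, the result is 2^41 − 1, confirming that the virtual address needs exactly 41 bits.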
The MMU divides the address space (whether effective, virtual, or real) into pages. Eight page sizes (1KB,
4KB, 16KB, 64KB, 256KB, 1MB, 16MB, 256MB) are simultaneously supported, such that at any given time
the TLB can contain entries for any combination of page sizes. In order for an address translation to occur, a
valid entry for the page containing the virtual address must be in the TLB. An attempt to access an address
for which no TLB entry exists causes an Instruction (for fetches) or Data (for load/store accesses) TLB Error
exception.
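Because the TLB is fully associative and the eight page sizes can coexist, each entry must be compared using its own page size. The minimal C sketch below shows that search. The `tlb_entry` layout is hypothetical and simplified; real entries carry more fields, and Chapter 5 describes additional matching rules that this sketch omits. Each page size determines how many low-order effective-address bits pass through untranslated as the page offset. A miss corresponds to the Instruction or Data TLB Error exception.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Page-offset widths for 1KB, 4KB, 16KB, 64KB, 256KB, 1MB, 16MB, 256MB. */
static const unsigned page_shift[8] = {10, 12, 14, 16, 18, 20, 24, 28};

/* Hypothetical, simplified TLB entry (not the real field layout). */
typedef struct {
    bool     valid;
    unsigned as;    /* address space the entry applies to */
    uint8_t  pid;   /* process ID the entry applies to */
    uint32_t epn;   /* effective page base address */
    uint64_t rpn;   /* 36-bit real page base address */
    unsigned size;  /* index into page_shift[] */
} tlb_entry;

/* Search every entry; on a hit, write the 36-bit real address to *ra.
   Returning false models an Instruction or Data TLB Error exception. */
static bool tlb_translate(const tlb_entry *tlb, size_t n, unsigned as,
                          uint8_t pid, uint32_t ea, uint64_t *ra)
{
    for (size_t i = 0; i < n; i++) {
        const tlb_entry *e = &tlb[i];
        uint32_t offset = (1u << page_shift[e->size]) - 1u; /* page-offset bits */
        if (e->valid && e->as == as && e->pid == pid &&
            (ea & ~offset) == (e->epn & ~offset)) {
            *ra = (e->rpn & ~(uint64_t)offset) | (ea & offset);
            return true;
        }
    }
    return false;
}
```

For example, a 4KB entry mapping EA 0x00010000 to real address 0x2_0001_0000 translates EA 0x00010ABC to 0x2_0001_0ABC. A 256MB entry at EA 0x10000000 maps a much wider range with a single entry, which is how software can keep static mappings from crowding the 64-entry TLB.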
To improve performance, both the instruction cache and the data cache maintain separate “shadow” TLBs.
The instruction shadow TLB (ITLB) contains four entries, while the data shadow TLB (DTLB) contains eight.
These shadow arrays minimize TLB contention between instruction fetch and data load/store operations. The
instruction fetch and data access mechanisms only access the main 64-entry unified TLB when a miss occurs
in the respective shadow TLB. The penalty for a miss in either of the shadow TLBs is three cycles. Hardware
manages the replacement and invalidation of both the ITLB and DTLB; no system software action is required.
Each TLB entry provides separate user state and supervisor state read, write, and execute permission
controls for the memory page associated with the entry. If software attempts to access a page for which it
does not have the necessary permission, an Instruction (for fetches) or Data (for load/store accesses)
Storage exception will occur.
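The permission check selects one of six controls based on the processor state and the kind of access. The minimal C sketch below shows that selection. The bit values in the enum are hypothetical; the names follow the Book-E UR/UW/UX and SR/SW/SX convention, and the real TLB field layout is given in Chapter 5.

```c
#include <stdbool.h>

/* Hypothetical encoding of one TLB entry's six permission controls. */
enum {
    PERM_UR = 1 << 0, PERM_UW = 1 << 1, PERM_UX = 1 << 2,  /* user state */
    PERM_SR = 1 << 3, PERM_SW = 1 << 4, PERM_SX = 1 << 5,  /* supervisor state */
};

typedef enum { ACCESS_READ, ACCESS_WRITE, ACCESS_EXECUTE } access_kind;

/* Returns true if the access is permitted. A false result corresponds to an
   Instruction Storage exception (execute) or a Data Storage exception
   (read or write). */
static bool access_allowed(unsigned perms, bool supervisor, access_kind kind)
{
    unsigned need;
    switch (kind) {
    case ACCESS_READ:
        need = supervisor ? PERM_SR : PERM_UR;
        break;
    case ACCESS_WRITE:
        need = supervisor ? PERM_SW : PERM_UW;
        break;
    default:
        need = supervisor ? PERM_SX : PERM_UX;
        break;
    }
    return (perms & need) != 0;
}
```

A page marked readable in both states but writable only in supervisor state would use `PERM_UR | PERM_SR | PERM_SW`. In that case, a user-state store is rejected while a supervisor-state store succeeds.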
Each TLB entry also provides a collection of storage attributes for the associated page. These attributes
control cache policy (such as cacheability and write-through as opposed to copy-back behavior), byte order
(big endian as opposed to little endian), and enabling of speculative access for the page. In addition, a set of
four user-definable storage attributes is provided. These attributes can be used to control various system-level behaviors, such as instruction compression using IBM CodePack technology. They can also be configured to control whether data cache lines are allocated upon a store miss, and whether accesses to a given
page should use the “normal” or “transient” portions of the instruction or data cache (see Chapter 4, “Instruction and Data Caches,” for detailed information about these features).
Chapter 5, “Memory Management,” describes the PPC440x5 MMU functions.
1.3.5 Timers
The PPC440x5 contains a Time Base and three timers: a Decrementer (DEC), a Fixed Interval Timer (FIT),
and a Watchdog Timer. The Time Base is a 64-bit counter which gets incremented at a frequency either
equal to the processor core clock rate or as controlled by a separate asynchronous timer clock input to the
core. No interrupt is generated as a result of the Time Base wrapping back to zero.
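On a 32-bit core, the 64-bit Time Base is read as two 32-bit halves (upper and lower). If the lower half carries into the upper half between the two reads, the combined value is wrong. The usual PowerPC idiom therefore rereads the upper half and retries if it changed. The minimal C sketch below implements that retry loop. The register accessors are passed in as hypothetical callbacks so that the logic can be exercised off-target; on the core they would wrap the register reads described in Chapter 7. The `sim_tb` counter is a test fixture that advances one tick per read.

```c
#include <stdint.h>

typedef uint32_t (*read32_fn)(void *ctx);

/* Read a consistent 64-bit Time Base from its upper and lower halves. */
static uint64_t read_time_base(read32_fn read_tbu, read32_fn read_tbl, void *ctx)
{
    uint32_t hi, lo, hi2;
    do {
        hi  = read_tbu(ctx);
        lo  = read_tbl(ctx);
        hi2 = read_tbu(ctx);
    } while (hi != hi2);  /* lower half carried into upper half: retry */
    return ((uint64_t)hi << 32) | lo;
}

/* Off-target simulation: a counter that advances one tick on every read. */
typedef struct { uint64_t tb; } sim_tb;

static uint32_t sim_read_tbu(void *ctx)
{
    sim_tb *s = ctx;
    uint32_t v = (uint32_t)(s->tb >> 32);
    s->tb++;
    return v;
}

static uint32_t sim_read_tbl(void *ctx)
{
    sim_tb *s = ctx;
    uint32_t v = (uint32_t)s->tb;
    s->tb++;
    return v;
}
```

Starting the simulated counter at 0x1_FFFF_FFFE forces a carry between the first two reads, so the first attempt is discarded and the second returns the consistent value 0x2_0000_0002.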
The DEC is a 32-bit register that is decremented at the same rate at which the Time Base is incremented.
The user loads the DEC register with a value to create the desired interval. When the register is decremented
to zero, a number of actions occur: the DEC stops decrementing, a status bit is set in the Timer Status
Register (TSR), and a Decrementer exception is reported to the interrupt mechanism of the PPC440x5 core.
Optionally, the DEC can be programmed to reload automatically the value contained in the Decrementer
Auto-Reload register (DECAR), after which the DEC resumes decrementing. The Timer Control Register
(TCR) contains the interrupt enable for the Decrementer interrupt.
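Since the DEC decrements at the Time Base rate, the value to load into DEC (and into DECAR for auto-reload) is the desired interval multiplied by the Time Base frequency. The minimal C sketch below does that conversion and reports when the 32-bit DEC cannot hold the result. The frequency is system-specific (core clock or external timer clock), so any value passed as `tb_hz` is hypothetical. Whether an auto-reloaded period is exactly N or N + 1 ticks depends on reload timing details covered in Chapter 7.

```c
#include <stdbool.h>
#include <stdint.h>

/* Compute the DEC/DECAR load value for an interval in microseconds.
   Returns false if the value does not fit in the 32-bit DEC.
   Assumes tb_hz * interval_us does not overflow 64 bits. */
static bool dec_value_for_us(uint64_t tb_hz, uint64_t interval_us, uint32_t *out)
{
    uint64_t ticks = tb_hz * interval_us / 1000000u;
    if (ticks > UINT32_MAX)
        return false;
    *out = (uint32_t)ticks;
    return true;
}

/* Longest interval, in whole microseconds, that one DEC load can express. */
static uint64_t dec_max_interval_us(uint64_t tb_hz)
{
    return (uint64_t)UINT32_MAX * 1000000u / tb_hz;
}
```

With an illustrative 400 MHz Time Base, a 1 ms interval needs a DEC value of 400,000. The longest single interval is about 10.7 seconds.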
The FIT generates periodic interrupts based on the transition of a selected bit from the Time Base. Users can
select one of four intervals for the FIT period by setting a control field in the TCR to select the appropriate bit
from the Time Base. When the selected Time Base bit transitions from 0 to 1, a status bit is set in the TSR
and a Fixed Interval Timer exception is reported to the interrupt mechanism of the PPC440x5 core. The FIT
interrupt enable is contained in the TCR.
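The FIT period follows directly from which Time Base bit is selected. Using this manual's MSB-0 numbering for the 64-bit Time Base (bit 63 is the least significant), bit b has weight 2^(63−b). That bit therefore goes from 0 to 1 once every 2^(64−b) ticks. The minimal C sketch below computes that period and checks it by brute-force simulation. The specific bits that the TCR can select are listed in Chapter 7 and are not assumed here.

```c
#include <stdint.h>

/* Ticks between successive 0->1 transitions of Time Base bit b
   (MSB-0 numbering, 1 <= b <= 63). */
static uint64_t tb_bit_rise_period(unsigned b)
{
    return (uint64_t)1 << (64u - b);
}

/* Count 0->1 transitions of bit b as the Time Base counts n ticks from start. */
static uint64_t count_rises(uint64_t start, uint64_t n, unsigned b)
{
    uint64_t rises = 0;
    unsigned shift = 63u - b;  /* convert MSB-0 bit number to a shift count */
    for (uint64_t t = start; t < start + n; t++) {
        unsigned before = (unsigned)((t >> shift) & 1u);
        unsigned after  = (unsigned)(((t + 1u) >> shift) & 1u);
        if (before == 0u && after == 1u)
            rises++;
    }
    return rises;
}
```

For example, bit 61 (weight 4) rises every 8 ticks, so counting from 0 for 64 ticks yields 8 transitions. Each transition would set the TSR status bit and raise a FIT exception.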
Similar to the FIT, the Watchdog Timer also generates a periodic interrupt based on the transition of a
selected bit from the Time Base. Users can select one of four intervals for the watchdog period, again by
setting a control field in the TCR to select the appropriate bit from the Time Base. Upon the first transition
from 0 to 1 of the selected Time Base bit, a status bit is set in the TSR and a Watchdog Timer exception is
reported to the interrupt mechanism of the PPC440x5 core. The Watchdog Timer can also be configured to
initiate a hardware reset if a second transition of the selected Time Base bit occurs prior to the first Watchdog
exception being serviced. This capability provides an extra measure of recoverability from potential system
lock-ups.
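The two-stage watchdog behavior described above can be sketched as a small state machine. This minimal C model follows only this overview: the first 0-to-1 transition sets a status bit and raises a Watchdog Timer exception. If a second transition arrives before that exception is serviced, a reset occurs, provided reset has been configured. The names are hypothetical, and the architected watchdog state machine in Chapter 7 is more detailed than this.

```c
#include <stdbool.h>

typedef enum { WD_NONE, WD_INTERRUPT, WD_RESET } wd_action;

typedef struct {
    bool pending;        /* status bit set, exception not yet serviced */
    bool reset_enabled;  /* configured to reset on a second transition */
} watchdog;

/* Called on each 0->1 transition of the selected Time Base bit. */
static wd_action wd_on_transition(watchdog *w)
{
    if (!w->pending) {
        w->pending = true;
        return WD_INTERRUPT;
    }
    return w->reset_enabled ? WD_RESET : WD_NONE;
}

/* The exception handler services the watchdog (e.g. clears the TSR status bit). */
static void wd_service(watchdog *w)
{
    w->pending = false;
}
```

A healthy system services each exception before the next transition and never reaches `WD_RESET`. A hung system misses one service, and the second transition triggers the reset.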
The timer functions of the PPC440x5 core are more fully described in Chapter 7, “Timer Facilities.”
1.3.6 Debug Facilities
The PPC440x5 debug facilities include debug modes for the various types of debugging used during hardware and software development. Also included are debug events that allow developers to control the debug
process. Debug modes and debug events are controlled using debug registers in the chip. The debug registers are accessed either through software running on the processor, or through the JTAG port.
The debug modes, events, controls, and interfaces provide a powerful combination of debug facilities for
hardware development tools, such as the RISCWatch™ debugger from IBM.
A brief overview of the debug modes and development tool support is provided below. Chapter 8, “Debug
Facilities,” provides detailed information about each debug mode and other debug resources.
1.3.6.1 Debug Modes
The PPC440x5 core supports four debug modes: internal, external, real-time-trace, and debug wait. Each
mode supports a different type of debug tool used in embedded systems development. Internal debug mode
supports software-based ROM monitors, and external debug mode supports a hardware emulator type of
debug. Real-time-trace mode uses the debug facilities to indicate events within a trace of processor execution
in real time. Debug wait mode enables the processor to continue to service real-time critical interrupts
while instruction execution is otherwise stopped for hardware debug. The debug modes are controlled by
Debug Control Register 0 (DBCR0) and the setting of bits in the Machine State Register (MSR).
overview.fm.
Page 34 of 589
September 12, 2002
User’s Manual
Preliminary
PPC440x5 CPU Core
Internal debug mode supports accessing architected processor resources, setting hardware and software
breakpoints, and monitoring processor status. In internal debug mode, debug events can generate debug
exceptions, which can interrupt normal program flow so that monitor software can collect processor status
and alter processor resources.
Internal debug mode relies on exception-handling software, running on the processor, along with an
external communications path to debug software problems. This mode is used while the processor continues
executing instructions and enables debugging of problems in application or operating system code. Access to
debugger software executing in the processor while in internal debug mode is through a communications port
on the processor board, such as a serial port or Ethernet connection.
External debug mode supports stopping, starting, and single-stepping the processor, accessing architected
processor resources, setting hardware and software breakpoints, and monitoring processor status. In
external debug mode, debug events can architecturally “freeze” the processor. While the processor is frozen,
normal instruction execution stops, and the architected processor resources can be accessed and altered
using a debug tool (such as RISCWatch) attached through the JTAG port. This mode is useful for debugging
hardware and low-level control software problems.
1.3.6.2 Development Tool Support
The PPC440x5 provides powerful debug support for a wide range of hardware and software development
tools.
The OS Open real-time operating system debugger is an example of an operating system-aware debugger,
implemented using software traps.
RISCWatch is an example of a development tool that uses the external debug mode, debug events, and the
JTAG port to support hardware and software development and debugging.
The RISCTrace™ feature of RISCWatch is an example of a development tool that uses the real-time trace
capability of the PPC440x5.
1.4 Core Interfaces
Several interfaces to the PPC440x5 core support the IBM CoreConnect on-chip system architecture, which
simplifies the attachment of on-chip devices. These interfaces include:
• Processor local bus (PLB)
• Device configuration register (DCR) interface
• Auxiliary processor unit (APU) port
• JTAG, debug, and trace ports
• Interrupt interface
• Clock and power management interface
Several of these interfaces are described briefly in the sections below.
1.4.1 Processor Local Bus (PLB)
There are three independent 128-bit PLB interfaces to the PPC440x5 core. Each of these interfaces includes
a 36-bit address bus and a 128-bit data bus. One PLB interface supports instruction cache reads, while the
other two support data cache reads and writes, respectively. The frequency of each PLB interface can be
independently specified, allowing an IBM CoreConnect system in which the interfaces are not all connected
as part of the same PLB and in which each PLB subsystem operates at its own frequency. Each PLB interface frequency can be configured to any value such that the ratio of the processor core frequency to the PLB
frequency (core:PLB) is n:1, n:2, or n:3, where n is any integer greater than or equal to the denominator of the ratio.
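The sketch below tests whether a pair of frequencies satisfies the core:PLB ratio rule. It assumes the frequencies are exact integers in the same unit (for example, MHz). The function name is hypothetical and does not correspond to any IBM tool or API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if core:PLB can be written as n:d with d in {1, 2, 3} and
 * integer n >= d. On success, the smallest such d and its n are stored
 * through n_out and d_out (either pointer may be NULL). */
static bool plb_ratio_ok(uint32_t core, uint32_t plb,
                         uint32_t *n_out, uint32_t *d_out)
{
    if (core == 0 || plb == 0)
        return false;
    for (uint32_t d = 1; d <= 3; d++) {
        uint64_t scaled = (uint64_t)core * d;   /* core * d = n * plb */
        if (scaled % plb != 0)
            continue;                           /* n would not be an integer */
        uint64_t n = scaled / plb;
        if (n >= d) {                           /* n must be >= denominator */
            if (n_out) *n_out = (uint32_t)n;
            if (d_out) *d_out = d;
            return true;
        }
    }
    return false;
}
```

For example, a 500 MHz core with a 200 MHz PLB gives 5:2 and is accepted. A 400 MHz core with a 250 MHz PLB (8:5) is rejected, because no denominator of 1, 2, or 3 produces an integer n.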
Each of the PLB interfaces supports connection to a PLB subsystem of 32, 64, or 128 bits. The instruction and data cache controllers handle any dynamic data bus resizing that is required when the subsystem
data width is less than the 128 bits of the PPC440x5 core PLB interfaces.
The data cache PLB interfaces make requests for 32-byte lines, as well as for 1 to 15 bytes within a 16-byte
(quadword) aligned region. A 16-byte line request is used for quadword APU load operations to caching
inhibited pages, and for quadword APU store operations to caching inhibited, write-through, or “without allocate” pages.
The instruction cache controller makes 32-byte line read requests, and also presents quadword burst read
requests for up to three 32-byte lines (six quadwords), as part of its speculative line fill mechanism.
Each of the PLB interfaces fully supports the address pipelining capabilities of the PLB, and in fact can go
beyond the pipeline depth and minimum latency which the PLB supports. Specifically, each interface
supports up to three pipelined request/acknowledge sequences prior to performing the data transfers associated with the first request. For the data cache, if each of the requests must itself be broken into three
separate transactions (for example, for a misaligned doubleword request to a 32-bit PLB slave), then the
interface actually supports up to nine outstanding request/acknowledge sequences prior to the first data
transfer. Furthermore, each PLB interface tolerates a zero-cycle latency between the request and the
address and data acknowledge (that is, the request, address acknowledge, and data acknowledge may all
occur in the same cycle).
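A simple model reproduces the count of three transactions for a misaligned doubleword sent to a 32-bit slave: split each request into one transfer per aligned bus-width unit it touches. This model is an illustrative assumption, not a description of the PLB protocol, and the function name is hypothetical.

```c
#include <stdint.h>

/* Simplified model: a request of len bytes at address addr, sent to a slave
 * whose data bus is width_bytes wide (4, 8, or 16), is split into one
 * transfer per aligned width_bytes unit that the request touches. */
static unsigned bus_transfers(uint32_t addr, uint32_t len, uint32_t width_bytes)
{
    if (len == 0)
        return 0;
    uint64_t first = addr / width_bytes;
    uint64_t last = ((uint64_t)addr + len - 1) / width_bytes;  /* no 32-bit wrap */
    return (unsigned)(last - first + 1);
}
```

Under this model, `bus_transfers(0x1001, 8, 4)` returns 3. Three such requests therefore account for the nine outstanding request/acknowledge sequences mentioned above.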
1.4.2 Device Control Register (DCR) Interface
The DCR interface provides a mechanism for the PPC440x5 core to set up other on-chip facilities. For
example, programmable resources in an external bus interface unit may be configured for use with various
memory devices according to their transfer characteristics and address assignments. DCRs are accessed
through the use of the PowerPC mfdcr and mtdcr instructions.
The interface is interlocked with control signals such that it may be connected to peripheral units that may be
clocked at different frequencies from the processor core. The design allows for future expansion of the noncore facilities without changing the I/O on either the PPC440x5 core or the ASIC peripherals.
The DCR interface also allows the PPC440x5 core to communicate with peripheral devices without using the
PLB interface, thereby avoiding the impact to the primary system bus bandwidth, and without additional
segmentation of the usable address map.
1.4.3 Auxiliary Processor Unit (APU) Port
This interface provides the PPC440x5 core with the flexibility for attaching a tightly-coupled coprocessor-type
macro incorporating instructions which go beyond those provided within the processor core itself. The APU
port provides sufficient functionality for attachment of various coprocessor functions such as a fully-compliant
PowerPC floating point unit (single or double precision), multimedia engine, DSP, or other custom function
implementing algorithms appropriate for specific system applications. The APU interface supports dual-issue
pipeline designs, and can be used with macros that contain their own register files, or with simpler macros
which use the CPU GPR file for source and/or target operands. APU load and store instructions can directly
access the PPC440x5 data cache, with operands of up to a quadword (16 bytes) in length.
The APU interface allows a coprocessor to execute, concurrently with the PPC440x5 core, instructions that
are not part of the PowerPC instruction set. Accordingly, areas have been reserved within the architected
instruction space to allow for these customer-specific or application-specific APU instruction set
extensions.
1.4.4 JTAG Port
The PPC440x5 JTAG port is enhanced to support the attachment of a debug tool such as the RISCWatch
product from IBM. Through the JTAG test access port, and using the debug facilities designed into the
PPC440x5 core, a debug workstation can single-step the processor and interrogate internal processor state
to facilitate hardware and software debugging. The enhancements comply with the IEEE 1149.1 specification
for vendor-specific extensions, and are therefore compatible with standard JTAG hardware for boundary-scan system testing.
2. Programming Model
The programming model of the PPC440x5 core describes how the following features and operations of the
core appear to programmers:
• Storage addressing (including data types and byte ordering), starting on page 39
• Registers, starting on page 47
• Instruction classes, starting on page 53
• Instruction set, starting on page 56
• Branch processing, starting on page 64
• Integer processing, starting on page 71
• Processor control, starting on page 74
• User and supervisor state, starting on page 80
• Speculative access, starting on page 81
• Synchronization, starting on page 82
2.1 Storage Addressing
As a 32-bit implementation of the Book-E Enhanced PowerPC Architecture, the PPC440x5 core implements
a uniform 32-bit effective address (EA) space. Effective addresses are expanded into virtual addresses and
then translated to 36-bit (64GB) real addresses by the memory management unit (see Memory Management
on page 133 for more information on the translation process). The organization of the real address space into
a physical address space is system-dependent, and is described in the user’s manuals for chip-level products
that incorporate a PPC440x5 core.
The PPC440x5 generates an effective address whenever it executes a storage access, branch, cache
management, or translation lookaside buffer (TLB) management instruction, or when it fetches the next
sequential instruction.
2.1.1 Storage Operands
Bytes in storage are numbered consecutively starting with 0. Each number is the address of the corresponding byte.
Data storage operands accessed by the integer load/store instructions may be bytes, halfwords, or words, or
(for load/store multiple and string instructions) a sequence of words or bytes, respectively. Data storage operands accessed by auxiliary processor (AP) load/store instructions can be bytes, halfwords, words, doublewords, or quadwords. The address of a storage operand is the address of its first byte (that is, of its lowest-numbered byte). Byte ordering can be either big endian or little endian, as controlled by the endian storage
attribute (see Byte Ordering on page 42; also see Endian (E) on page 146 for more information on the endian
storage attribute).
Operand length is implicit for each scalar storage access instruction type (that is, each storage access
instruction type other than the load/store multiple and string instructions). The operand of such a scalar
storage access instruction has a “natural” alignment boundary equal to the operand length. In other words,
the “natural” address of an operand is an integral multiple of the operand length. A storage operand is said to
be aligned if it is aligned at its natural boundary; otherwise it is said to be unaligned.
Data storage operands for storage access instructions have the following characteristics.
Table 2-1. Data Operand Definitions

Storage Access Instruction Type   Operand Length   Addr[28:31] if aligned
Byte (or String)                  8 bits           0bxxxx
Halfword                          2 bytes          0bxxx0
Word (or Multiple)                4 bytes          0bxx00
Doubleword (AP only)              8 bytes          0bx000
Quadword (AP only)                16 bytes         0b0000

Note: An “x” in an address bit position indicates that the bit can be 0
or 1 independent of the state of other bits in the address.
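Every operand length in Table 2-1 is a power of two, so the alignment test reduces to checking that the low-order address bits are zero. A minimal C sketch follows. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* An operand of len bytes (1, 2, 4, 8, or 16, per Table 2-1) is aligned
 * when its effective address is an integral multiple of len, that is, when
 * the low-order log2(len) address bits are all zero. len must be a power
 * of two for this mask test to be valid. */
static bool is_naturally_aligned(uint32_t ea, uint32_t len)
{
    return (ea & (len - 1)) == 0;
}
```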
The alignment of the operand effective address of some storage access instructions may affect performance,
and in some cases may cause an Alignment exception to occur. For such storage access instructions, the
best performance is obtained when the storage operands are aligned. Table 2-2 summarizes the effects of
alignment on those storage access instruction types for which such effects exist. If an instruction type is not
shown in the table, then there are no alignment effects for that instruction type.
Table 2-2. Alignment Effects for Storage Access Instructions

Storage Access Instruction Type        Alignment Effects
Integer load/store halfword            Broken into two byte accesses if crosses 16-byte boundary
                                       (EA[28:31] = 0b1111); otherwise no effect
Integer load/store word                Broken into two accesses if crosses 16-byte boundary
                                       (EA[28:31] > 0b1100); otherwise no effect
Integer load/store multiple or string  Broken into a series of 4-byte accesses until the last byte is
                                       accessed or a 16-byte boundary is reached, whichever occurs
                                       first. If bytes remain past a 16-byte boundary, resume accessing
                                       4 bytes at a time until the last byte is accessed or the next
                                       16-byte boundary is reached, whichever occurs first; repeat.
AP load/store halfword                 Alignment exception if crosses 16-byte boundary
                                       (EA[28:31] = 0b1111); otherwise no effect (see note)
AP load/store word                     Alignment exception if crosses 16-byte boundary
                                       (EA[28:31] > 0b1100); otherwise no effect (see note)
AP load/store doubleword               Alignment exception if crosses 16-byte boundary
                                       (EA[28:31] > 0b1000); otherwise no effect (see note)
AP load/store quadword                 Alignment exception if crosses 16-byte boundary
                                       (EA[28:31] ≠ 0b0000); otherwise no effect
Note: An auxiliary processor can specify that the EA for a given AP load/store instruction must be aligned at the
operand-size boundary, or alternatively, at a word boundary. If the AP so indicates this requirement and the
calculated EA fails to meet it, the PPC440x5 core generates an Alignment exception. Alternatively, an
auxiliary processor can specify that the EA for a given AP load/store instruction should be “forced” to be
aligned, by ignoring the appropriate number of low-order EA bits and processing the AP load/store as if
those bits were 0. Byte, halfword, word, doubleword, and quadword AP load/store instructions would ignore
0, 1, 2, 3, and 4 low-order EA bits, respectively.
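The boundary conditions in Table 2-2 and the forced-alignment rule in the note can be expressed compactly in C. The function names in this sketch are hypothetical. `split_multiple` encodes one reading of the load/store multiple and string rule: each access is at most four bytes and stops early at a 16-byte boundary.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if a len-byte operand starting at ea spans two 16-byte (quadword)
 * regions. For len = 2, 4, 8, 16 this reproduces the EA[28:31] conditions
 * in Table 2-2 (= 0b1111, > 0b1100, > 0b1000, != 0b0000). */
static bool crosses_quadword(uint32_t ea, uint32_t len)
{
    return (ea & 0xF) + len > 16;
}

/* "Forced" alignment for an AP load/store, as described in the note:
 * treat the log2(len) low-order EA bits as zero (len a power of two). */
static uint32_t ap_force_align(uint32_t ea, uint32_t len)
{
    return ea & ~(len - 1);
}

/* Load/store multiple or string: issue accesses of up to 4 bytes, ending
 * an access early at a 16-byte boundary. Writes each access size to
 * sizes[] (at most max entries) and returns the number of accesses. */
static size_t split_multiple(uint32_t ea, uint32_t nbytes,
                             uint32_t *sizes, size_t max)
{
    size_t count = 0;
    while (nbytes > 0 && count < max) {
        uint32_t to_boundary = 16 - (ea & 0xF);
        uint32_t chunk = 4;
        if (chunk > to_boundary) chunk = to_boundary;
        if (chunk > nbytes) chunk = nbytes;
        sizes[count++] = chunk;
        ea += chunk;
        nbytes -= chunk;
    }
    return count;
}
```

For example, an 8-byte load/store multiple starting at EA 0x0E is split into accesses of 2, 4, and 2 bytes under this reading.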
Cache management instructions access cache block operands, and for the PPC440x5 core the cache block
size is 32 bytes. However, the effective addresses calculated by cache management instructions are not
required to be aligned on cache block boundaries. Instead, the architecture specifies that the associated low-order effective address bits (bits 27:31 for PPC440x5) are ignored during the execution of these instructions.
Similarly, the TLB management instructions access page operands, and the associated low-order effective
address bits (as determined by the page size) are ignored during the execution of these instructions.
Instruction storage operands, on the other hand, are always four bytes long, and the effective addresses
calculated by Branch instructions are therefore always word-aligned.
2.1.2 Effective Address Calculation
For a storage access instruction, if the sum of the effective address and the operand length exceeds $2^{32}$, one more than the
maximum effective address of $2^{32} - 1$ (that is, the storage operand itself crosses the maximum address
boundary), the result of the operation is undefined, as specified by the architecture. The PPC440x5 core
performs the operation as if the storage operand wrapped around from the maximum effective address to
effective address 0. Software, however, should not depend upon this behavior, so that it may be ported to
other implementations that do not handle this scenario in the same fashion. Accordingly, software should
ensure that no data storage operands cross the maximum address boundary.
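Software can follow this recommendation by checking operands before accessing them. The sketch below uses hypothetical function names. It performs the check in 64-bit arithmetic so that the check itself cannot wrap. It also shows the modulo $2^{32}$ addressing that the PPC440x5 applies if the boundary is crossed anyway.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a len-byte operand at ea (len >= 1) would extend past the
 * maximum effective address 0xFFFFFFFF. The sum is formed in 64 bits so
 * that the check cannot itself wrap. */
static bool crosses_max_ea(uint32_t ea, uint32_t len)
{
    return (uint64_t)ea + len - 1 > UINT32_MAX;
}

/* Address of byte i of the operand as the PPC440x5 computes it:
 * unsigned 32-bit arithmetic wraps modulo 2^32 in C, just as described. */
static uint32_t byte_ea(uint32_t ea, uint32_t i)
{
    return ea + i;
}
```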
Note that since instructions are words and since the effective addresses of instructions are always implicitly
on word boundaries, it is not possible for an instruction storage operand to cross any word boundary,
including the maximum address boundary.
Effective address arithmetic, which calculates the starting address for storage operands, wraps around from
the maximum address to address 0, for all effective address computations except next sequential instruction
fetching. See Instruction Storage Addressing Modes on page 41 for more information on next sequential
instruction fetching at the maximum address boundary.
2.1.2.1 Data Storage Addressing Modes
There are two data storage addressing modes supported by the PPC440x5 core:
• Base + displacement (D-mode) addressing mode:
The 16-bit D field is sign-extended and added to the contents of the GPR designated by RA or to zero if
RA = 0; the low-order 32 bits of the sum form the effective address of the data storage operand.
• Base + index (X-mode) addressing mode:
The contents of the GPR designated by RB (or the value 0 for lswi and stswi) are added to the contents
of the GPR designated by RA, or to 0 if RA = 0; the low-order 32 bits of the sum form the effective
address of the data storage operand.
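The two data addressing modes can be sketched in C as follows. The register-file type and function names are hypothetical. The key details are that RA = 0 selects the value 0 rather than the contents of GPR 0, and that the 32-bit sum wraps.

```c
#include <stdint.h>

/* Hypothetical register file: gpr[0..31] holds 32-bit GPR contents. */
typedef struct { uint32_t gpr[32]; } regs_t;

/* (RA|0): RA = 0 selects the value 0, not the contents of GPR 0. */
static uint32_t ra_or_zero(const regs_t *r, unsigned ra)
{
    return ra == 0 ? 0 : r->gpr[ra];
}

/* Portable sign extension of the 16-bit D field. */
static int32_t sext16(uint16_t d)
{
    return (d & 0x8000u) ? (int32_t)d - 0x10000 : (int32_t)d;
}

/* D-mode: sign-extended D plus (RA|0); the sum wraps modulo 2^32. */
static uint32_t ea_dmode(const regs_t *r, unsigned ra, uint16_t d)
{
    return ra_or_zero(r, ra) + (uint32_t)sext16(d);
}

/* X-mode: (RB) plus (RA|0). For lswi and stswi the value 0 is used in
 * place of (RB); that case is not modeled here. */
static uint32_t ea_xmode(const regs_t *r, unsigned ra, unsigned rb)
{
    return ra_or_zero(r, ra) + r->gpr[rb];
}
```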
2.1.2.2 Instruction Storage Addressing Modes
There are four instruction storage addressing modes supported by the PPC440x5 core:
• I-form branch instructions (unconditional):
The 24-bit LI field is concatenated on the right with 0b00, sign-extended, and then added to either the
address of the branch instruction if AA=0, or to 0 if AA=1; the low-order 32 bits of the sum form the effective address of the next instruction.
• Taken B-form branch instructions:
The 14-bit BD field is concatenated on the right with 0b00, sign-extended, and then added to either the
address of the branch instruction if AA=0, or to 0 if AA=1; the low-order 32 bits of the sum form the effective address of the next instruction.
• Taken XL-form branch instructions:
The contents of bits 0:29 of the Link Register (LR) or the Count Register (CTR) are concatenated on the
right with 0b00 to form the 32-bit effective address of the next instruction.
• Next sequential instruction fetching (including non-taken branch instructions):
The value 4 is added to the address of the current instruction to form the 32-bit effective address of the
next instruction. If the address of the current instruction is 0xFFFFFFFC, the PPC440x5 core wraps the
next sequential instruction address back to address 0. This behavior is not required by the architecture,
which specifies that the next sequential instruction address is undefined under these circumstances.
Therefore, software should not depend upon this behavior, so that it may be ported to other implementations that do not handle this scenario in the same fashion. Accordingly, if software wishes to execute
across this maximum address boundary and wrap back to address 0, it should place an unconditional
branch at the boundary, with a displacement of 4.
In addition to the above four instruction storage addressing modes, the following behavior applies to
branch instructions:
• Any branch instruction with LK=1:
The value 4 is added to the address of the current instruction and the low-order 32 bits of the result are
placed into the LR. As for the similar scenario for next sequential instruction fetching, if the address of the
branch instruction is 0xFFFFFFFC, the result placed into the LR is architecturally undefined, although
once again the PPC440x5 core wraps the LR update value back to address 0. Again, however, software
should not depend on this behavior, in order that it may be ported to implementations which do not handle this scenario in the same fashion.
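The target-address rules above can be modeled in C. The function names in this sketch are hypothetical. The sign extension is written out explicitly so that it does not rely on implementation-defined conversions.

```c
#include <stdint.h>

/* Sign-extend the low `bits` bits of v (1 <= bits <= 31). */
static int32_t sext(uint32_t v, unsigned bits)
{
    uint32_t sign = (uint32_t)1 << (bits - 1);
    v &= (sign << 1) - 1;                      /* keep only the field */
    return (int32_t)(v ^ sign) - (int32_t)sign;
}

/* I-form (b, ba, bl, bla): LI || 0b00, sign-extended, plus CIA unless AA = 1. */
static uint32_t iform_target(uint32_t cia, uint32_t li, int aa)
{
    return (aa ? 0 : cia) + (uint32_t)sext(li << 2, 26);
}

/* Taken B-form (bc): BD || 0b00, sign-extended, plus CIA unless AA = 1. */
static uint32_t bform_target(uint32_t cia, uint32_t bd, int aa)
{
    return (aa ? 0 : cia) + (uint32_t)sext(bd << 2, 16);
}

/* Taken XL-form (bclr, bcctr): bits 0:29 of LR or CTR || 0b00. */
static uint32_t xlform_target(uint32_t lr_or_ctr)
{
    return lr_or_ctr & ~(uint32_t)3;
}

/* Next sequential address, also the LR value written when LK = 1. Both
 * wrap from 0xFFFFFFFC to 0 on the PPC440x5; the architecture leaves this
 * case undefined. */
static uint32_t next_sequential(uint32_t cia)
{
    return cia + 4;
}
```

This also checks the recommendation above. `iform_target(0xFFFFFFFC, 1, 0)` yields 0, so an unconditional branch with a displacement of 4 at the last word of the address space reaches address 0 through ordinary effective address wraparound.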
2.1.3 Byte Ordering
If scalars (individual data items and instructions) were indivisible, there would be no such concept as “byte
ordering.” It is meaningless to consider the order of bits or groups of bits within the smallest addressable unit
of storage, because nothing can be observed about such order. Only when scalars, which the programmer
and processor regard as indivisible quantities, can comprise more than one addressable unit of storage does
the question of order arise.
For a machine in which the smallest addressable unit of storage is the 64-bit doubleword, there is no question
of the ordering of bytes within doublewords. All transfers of individual scalars between registers and storage
are of doublewords, and the address of the byte containing the high-order eight bits of a scalar is no different
from the address of a byte containing any other part of the scalar.
For the Book-E Enhanced PowerPC Architecture, as for most current computer architectures, the smallest
addressable unit of storage is the 8-bit byte. Many scalars are halfwords, words, or doublewords, which
consist of groups of bytes. When a word-length scalar is moved from a register to storage, the scalar occupies four consecutive byte addresses. It thus becomes meaningful to discuss the order of the byte addresses
with respect to the value of the scalar: which byte contains the highest-order eight bits of the scalar, which
byte contains the next-highest-order eight bits, and so on.
Given a scalar that contains multiple bytes, the choice of byte ordering is essentially arbitrary. There are 4! =
24 ways to specify the ordering of four bytes within a word, but only two of these orderings are sensible:
• The ordering that assigns the lowest address to the highest-order (“left-most”) eight bits of the scalar, the
next sequential address to the next-highest-order eight bits, and so on.
This ordering is called big endian because the “big end” (most-significant end) of the scalar, considered
as a binary number, comes first in storage. IBM RISC System/6000, IBM System/390, and Motorola
680x0 are examples of computer architectures using this byte ordering.
• The ordering that assigns the lowest address to the lowest-order (“right-most”) eight bits of the scalar, the
next sequential address to the next-lowest-order eight bits, and so on.
This ordering is called little endian because the “little end” (least-significant end) of the scalar, considered
as a binary number, comes first in storage. The Intel x86 is an example of a processor architecture using
this byte ordering.
PowerPC Book-E supports both big endian and little endian byte ordering, for both instruction and data
storage accesses. Which byte ordering is used is controlled on a memory page basis by the endian (E)
storage attribute, which is a field within the TLB entry for the page. The endian storage attribute is set to 0 for
a big endian page, and is set to 1 for a little endian page. See Memory Management on page 133 for more
information on memory pages, the TLB, and storage attributes, including the endian storage attribute.
2.1.3.1 Structure Mapping Examples
The following C language structure, s, contains an assortment of scalars and a character string. The
comments show the value assumed to be in each structure element; these values show how the bytes
comprising each structure element are mapped into storage.
struct {
    int a;          /* 0x1112_1314 word */
    long long b;    /* 0x2122_2324_2526_2728 doubleword */
    char *c;        /* 0x3132_3334 word */
    char d[7];      /* 'A','B','C','D','E','F','G' array of bytes */
    short e;        /* 0x5152 halfword */
    int f;          /* 0x6162_6364 word */
} s;
C structure mapping rules permit the use of padding (skipped bytes) to align scalars on desirable boundaries.
The structure mapping examples below show each scalar aligned at its natural boundary. This alignment
introduces padding of four bytes between a and b, one byte between d and e, and two bytes between e and
f. The same amount of padding is present in both big endian and little endian mappings.
Big Endian Mapping
The big endian mapping of structure s follows. Each row shows eight consecutive bytes; the hexadecimal
address of the row's first byte is given at the left, and the column headings give the offset within the row.
The contents of each byte, as defined in structure s, are shown as a (hexadecimal) number or character (for
the string elements). Padded bytes are shown as "--".

Addr  +0   +1   +2   +3   +4   +5   +6   +7
0x00  11   12   13   14   --   --   --   --
0x08  21   22   23   24   25   26   27   28
0x10  31   32   33   34   'A'  'B'  'C'  'D'
0x18  'E'  'F'  'G'  --   51   52   --   --
0x20  61   62   63   64   --   --   --   --
Little Endian Mapping
Structure s is shown mapped little endian. Each row lists eight bytes starting at the address given at the
left; padded bytes are shown as "--".

Addr  +0   +1   +2   +3   +4   +5   +6   +7
0x00  14   13   12   11   --   --   --   --
0x08  28   27   26   25   24   23   22   21
0x10  34   33   32   31   'A'  'B'  'C'  'D'
0x18  'E'  'F'  'G'  --   52   51   --   --
0x20  64   63   62   61   --   --   --   --
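The two mappings can be reproduced by writing each member of s into a byte buffer at the offsets shown, in either byte order. The sketch below hard-codes those offsets rather than relying on the host compiler's layout, which may differ (for example, pointers are 8 bytes on most 64-bit hosts). The helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Store the low `size` bytes of value at buf[off] in the given byte order. */
static void put_scalar(uint8_t *buf, size_t off, uint64_t value,
                       size_t size, int big_endian)
{
    for (size_t i = 0; i < size; i++) {
        unsigned shift = 8 * (big_endian ? (unsigned)(size - 1 - i) : (unsigned)i);
        buf[off + i] = (uint8_t)(value >> shift);
    }
}

/* Lay out structure s at the offsets used in the mappings above
 * (a at 0x00, b at 0x08, c at 0x10, d at 0x14, e at 0x1C, f at 0x20).
 * Padding bytes are left as 0. */
static void map_s(uint8_t buf[0x28], int big_endian)
{
    memset(buf, 0, 0x28);
    put_scalar(buf, 0x00, 0x11121314u, 4, big_endian);
    put_scalar(buf, 0x08, 0x2122232425262728ull, 8, big_endian);
    put_scalar(buf, 0x10, 0x31323334u, 4, big_endian);
    memcpy(buf + 0x14, "ABCDEFG", 7);   /* bytes are never reordered */
    put_scalar(buf, 0x1C, 0x5152u, 2, big_endian);
    put_scalar(buf, 0x20, 0x61626364u, 4, big_endian);
}
```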
2.1.3.2 Instruction Byte Ordering
PowerPC Book-E defines instructions as aligned words (four bytes) in memory. As such, instructions in a big
endian program image are arranged with the most-significant byte (MSB) of the instruction word at the
lowest-numbered address.
Consider the big endian mapping of instruction p at address 0x00, where, for example, p = add r7, r7, r4:
MSB                   LSB
0x00  0x01  0x02  0x03
On the other hand, in a little endian mapping the same instruction is arranged with the least-significant byte
(LSB) of the instruction word at the lowest-numbered address:
LSB                   MSB
0x00  0x01  0x02  0x03
By the definition of PowerPC Book-E bit numbering, the most-significant byte of an instruction is the byte
containing bits 0:7 of the instruction. As depicted in the instruction format diagrams (see Instruction Formats
on page 250), this most-significant byte is the one which contains the primary opcode field (bits 0:5). Due to
this difference in byte orderings, the processor must perform whatever byte reversal is required (depending
on the particular byte ordering in use) in order to correctly deliver the opcode field to the instruction decoder.
In the PPC440x5, this reversal is performed between the memory interface and the instruction cache,
according to the value of the endian storage attribute for each memory page, such that the bytes in the
instruction cache are always correctly arranged for delivery directly to the instruction decoder.
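The example instruction p = add r7, r7, r4 (primary opcode 31, extended opcode 266) encodes as the instruction word 0x7CE72214. The sketch below shows its memory image in each byte order and the reassembly that places the primary opcode back in bits 0:5. The function names are hypothetical.

```c
#include <stdint.h>

#define ADD_R7_R7_R4 0x7CE72214u   /* add r7, r7, r4 */

/* Bytes of an instruction word as they appear in memory at addresses
 * 0x00..0x03 for a big endian or little endian page. */
static void insn_to_memory(uint32_t insn, uint8_t mem[4], int big_endian)
{
    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? 8 * (3 - i) : 8 * i;
        mem[i] = (uint8_t)(insn >> shift);
    }
}

/* The reversal the fetch path performs: reassemble the word so that its
 * most-significant byte (bits 0:7) is correct for the decoder. */
static uint32_t memory_to_insn(const uint8_t mem[4], int big_endian)
{
    uint32_t w = 0;
    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? 8 * (3 - i) : 8 * i;
        w |= (uint32_t)mem[i] << shift;
    }
    return w;
}

/* Primary opcode = instruction bits 0:5, the six most-significant bits. */
static unsigned primary_opcode(uint32_t insn)
{
    return insn >> 26;
}
```

In a big endian page the bytes at 0x00 to 0x03 are 7C E7 22 14. In a little endian page they are 14 22 E7 7C.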
If the endian storage attribute for a memory page is reprogrammed from one byte ordering to the other, the
contents of the memory page must be reloaded with program and data structures that are in the appropriate
byte ordering. Furthermore, anytime the contents of instruction memory change, the instruction cache must
be made coherent with the updates by invalidating the instruction cache and refetching the updated memory
contents with the new byte ordering.
2.1.3.3 Data Byte Ordering
Unlike instruction fetches, data accesses cannot be byte-reversed between memory and the data cache.
Data byte ordering in memory depends upon the data type (byte, halfword, word, and so on) of a specific data
item. It is only when moving a data item of a specific type from or to an architected register (as directed by the
execution of a particular storage access instruction) that it becomes known what kind of byte reversal may be
required due to the byte ordering of the memory page containing the data item. Therefore, byte reversal
during load or store accesses is performed between data cache (or memory, on a data cache miss, for
example) and the load register target or store register source, depending on the specific type of load or store
instruction (that is, byte, halfword, word, and so on).
Comparing the big endian and little endian mappings of structure s, as shown in Structure Mapping Examples
on page 43, the differences between the byte locations of any data item in the structure depend upon the
size of the particular data item. For example (again referring to the big endian and little endian mappings of
structure s):
• The word a has its four bytes reversed within the word spanning addresses 0x00 – 0x03.
• The halfword e has its two bytes reversed within the halfword spanning addresses 0x1C – 0x1D.
Note that the array of bytes d, where each data item is a byte, is not reversed when the big endian and little
endian mappings are compared. For example, the character 'A' is located at address 0x14 in both the big
endian and little endian mappings.
The size of the data item being loaded or stored must be known before the processor can decide whether,
and if so, how to reorder the bytes when moving them between a register and the data cache (or memory).
• For byte loads and stores, including strings, no reordering of bytes occurs, regardless of byte ordering.
• For halfword loads and stores, bytes are reversed within the halfword, for one byte order with respect to
the other.
• For word loads and stores (including load/store multiple), bytes are reversed within the word, for one byte
order with respect to the other.
• For doubleword loads and stores (AP loads/stores only), bytes are reversed within the doubleword, for
one byte order with respect to the other.
• For quadword loads and stores (AP loads/stores only), bytes are reversed within the quadword, for one
byte order with respect to the other.
Note that this mechanism applies independent of the alignment of data. In other words, when loading a multibyte data operand with a scalar load instruction, bytes are accessed from the data cache (or memory) starting
with the byte at the calculated effective address and continuing with consecutively higher-numbered bytes
until the required number of bytes has been retrieved. Then, the bytes are arranged such that either the byte
from the highest-numbered address (for big endian storage regions) or the lowest-numbered address (for
little endian storage regions) is placed into the least-significant byte of the register. The rest of the register is
filled in corresponding order with the rest of the accessed bytes. An analogous procedure is followed for
scalar store instructions.
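The load procedure above, and the analogous store, can be sketched in C for operands of up to eight bytes. Quadword AP operands are not modeled, and the function names are hypothetical. Alignment plays no part, because only the sequence of bytes starting at the EA matters.

```c
#include <stddef.h>
#include <stdint.h>

/* Load an n-byte scalar (n <= 8) whose bytes were fetched from consecutively
 * higher addresses starting at the EA (bytes[0] is at the EA). */
static uint64_t load_scalar(const uint8_t *bytes, size_t n, int big_endian)
{
    uint64_t reg = 0;
    for (size_t i = 0; i < n; i++) {
        /* Big endian: the byte from the highest address ends up least
         * significant. Little endian: the byte from the lowest address does. */
        size_t src = big_endian ? i : n - 1 - i;
        reg = (reg << 8) | bytes[src];
    }
    return reg;
}

/* Analogous store: the least-significant register byte goes to the highest
 * address (big endian) or the lowest address (little endian). */
static void store_scalar(uint8_t *bytes, uint64_t reg, size_t n, int big_endian)
{
    for (size_t i = 0; i < n; i++) {
        size_t dst = big_endian ? n - 1 - i : i;
        bytes[dst] = (uint8_t)(reg >> (8 * i));
    }
}
```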
For load/store multiple instructions, each group of four bytes is transferred between memory and the register
according to the procedure for a scalar load word instruction.
For load/store string instructions, the most-significant byte of the first register is transferred to or from memory
at the starting (lowest-numbered) effective address, regardless of byte ordering. Subsequent register bytes
(from most-significant to least-significant, and then moving into the next register, starting with the most-significant byte, and so on) are transferred to or from memory at sequentially higher-numbered addresses. This
behavior for byte strings ensures that if two strings are loaded into registers and then compared, the first
bytes of the strings are treated as most significant with respect to the comparison.
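The sketch below models the string-load rule, with hypothetical names. Leaving the unused low-order bytes of the last register as 0 is an assumption of this model, not something stated here. Because the first string byte always lands in the most-significant byte, comparing the resulting registers as unsigned integers orders the strings by their first differing byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Load nbytes of a byte string into consecutive 32-bit registers. The byte
 * at the starting EA goes into the most-significant byte of the first
 * register, regardless of the page's byte ordering. Unused low-order bytes
 * of the last register are left as 0 (an assumption of this sketch).
 * Returns the number of registers filled. */
static size_t load_string(const uint8_t *mem, size_t nbytes, uint32_t *regs)
{
    size_t nregs = (nbytes + 3) / 4;
    for (size_t r = 0; r < nregs; r++)
        regs[r] = 0;
    for (size_t i = 0; i < nbytes; i++) {
        unsigned shift = 8 * (3 - (unsigned)(i % 4));   /* MSB first */
        regs[i / 4] |= (uint32_t)mem[i] << shift;
    }
    return nregs;
}
```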
2.1.3.4 Byte-Reverse Instructions
PowerPC Book-E defines load/store byte-reverse instructions which can access storage which is specified as
being of one byte ordering in the same manner that a regular (that is, non-byte-reverse) load/store instruction
would access storage which is specified as being of the opposite byte ordering. In other words, a load/store
byte-reverse instruction to a big endian memory page transfers data between the data cache (or memory)
and the register in the same manner that a normal load/store would transfer the data to or from a little endian
memory page. Similarly, a load/store byte-reverse instruction to a little endian memory page transfers data
between the data cache (or memory) and the register in the same manner that a normal load/store would
transfer the data to or from a big endian memory page.
The function of the load/store byte-reverse instructions is useful when a particular memory page contains a
combination of data with both big endian and little endian byte ordering. In such an environment, the Endian
storage attribute for the memory page would be set according to the predominant byte ordering for the page,
and the normal load/store instructions would be used to access data operands which used this predominant
byte ordering. Conversely, the load/store byte-reverse instructions would be used to access the data operands which were of the other (less prevalent) byte ordering.
Software compilers cannot typically make general use of the load/store byte-reverse instructions, so they are
ordinarily used only in special, hand-coded device drivers.
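In effect, a byte-reverse load behaves like a normal load followed by a byte swap. The sketch below models a word access to a big endian page; `load_word_be_page` and `byte_reverse32` are illustrative names, and the `byte_reverse` flag stands in for choosing `lwbrx` rather than `lwz`.

```c
#include <stdint.h>

/* Reverse the order of the four bytes in a 32-bit value. */
uint32_t byte_reverse32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Model a word load from a big endian page. mem[0] is the lowest address.
 * A normal load returns the bytes in address order; a byte-reverse load
 * returns them as a normal load from a little endian page would. */
uint32_t load_word_be_page(const uint8_t mem[4], int byte_reverse)
{
    uint32_t normal = ((uint32_t)mem[0] << 24) | ((uint32_t)mem[1] << 16) |
                      ((uint32_t)mem[2] << 8) | mem[3];
    return byte_reverse ? byte_reverse32(normal) : normal;
}
```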
Page 46 of 589
prgmodel.fm.
September 12, 2002
User’s Manual
PreliminaryPPC440x5 CPU Core
2.2 Registers
This section provides an overview of the register categories and types provided by the PPC440x5. Detailed
descriptions of each of the registers are provided within the chapters covering the functions with which they
are associated (for example, the cache control and cache debug registers are described in Instruction and
Data Caches on page 95). An alphabetical summary of all registers, including bit definitions, is provided in
Register Summary on page 451.
All registers in the PPC440x5 core are architected as 32 bits wide, although certain bits in some registers are
reserved and thus not necessarily implemented. For all registers with fields marked as reserved, these
reserved fields should be written as 0 and read as undefined. The recommended coding practice is to
perform the initial write to a register with reserved fields set to 0, and to perform all subsequent writes to the
register using a read-modify-write strategy: read the register; use logical instructions to alter defined fields,
leaving reserved fields unmodified; and write the register.
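The read-modify-write recommendation reduces to a masked merge. In this sketch `rmw_update` is a hypothetical helper; in real code `current` would come from the appropriate move-from instruction (for example `mfspr`) and the result would be written back with the matching move-to instruction. Any mask values used with it are illustrative, not real register layouts.

```c
#include <stdint.h>

/* Replace only the defined fields selected by field_mask, leaving every other
 * bit (including reserved bits) exactly as it was read. */
uint32_t rmw_update(uint32_t current, uint32_t field_mask, uint32_t field_value)
{
    return (current & ~field_mask) | (field_value & field_mask);
}
```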
All of the registers are grouped into categories according to the processor functions with which they are associated. In addition, each register is classified as being of a particular type, as characterized by the specific
instructions which are used to read and write registers of that type. Finally, most of the registers contained
within the PPC440x5 core are defined by the Book-E Enhanced PowerPC Architecture, although some registers are implementation-specific and unique to the PPC440x5.
Figure 2-1 on page 48 illustrates the PPC440x5 registers contained in the user programming model, that is,
those registers to which access is non-privileged and which are available to both user and supervisor
programs. Figure 2-2 on page 49 illustrates the PPC440x5 registers contained in the supervisor programming model, to which access is privileged and which are available to supervisor programs only. See User and Supervisor Modes on page 80 for more information on privileged instructions and register access, and the
user and supervisor programming models.
Table 2-3 on page 50 lists each register category and the registers that belong to each category, along with
their types and a cross-reference to the section of this document which describes them more fully. Registers
that are not part of PowerPC Book-E, and are thus specific to the PPC440x5, are shown in italics in
Table 2-3. Unless otherwise indicated, all registers have read/write access.
Figure 2-1. User Programming Model Registers
Integer Processing: General Purpose Registers (GPR0–GPR31); Integer Exception Register (XER)
Timer: Time Base (TBL, TBU)
Branch Control: Condition Register (CR); Count Register (CTR); Link Register (LR)
Processor Control: SPR General 4–7 (SPRG4, SPRG5, SPRG6, SPRG7); User SPR General 0 (USPRG0)
Figure 2-2. Supervisor Programming Model Registers
Processor Control: Machine State Register (MSR); Processor Version Register (PVR); Processor ID Register (PIR); Core Configuration Registers (CCR0, CCR1); Reset Configuration (RSTCFG); SPR General (SPRG0–SPRG7)
Interrupt Processing: Exception Syndrome Register (ESR); Machine Check Syndrome Register (MCSR); Data Exception Address Register (DEAR); Save/Restore Registers (SRR0, SRR1); Critical Save/Restore Registers (CSRR0, CSRR1); Machine Check Save/Restore Registers (MCSRR0, MCSRR1); Interrupt Vector Prefix Register (IVPR); Interrupt Vector Offset Registers (IVOR0–IVOR15)
Timer: Time Base (TBU, TBL); Timer Control Register (TCR); Timer Status Register (TSR); Decrementer (DEC); Decrementer Auto-Reload (DECAR)
Cache Control: Instruction Cache Victim Limit (IVLIM); Instruction Cache Normal Victim (INV0–INV3); Instruction Cache Transient Victim (ITV0–ITV3); Data Cache Victim Limit (DVLIM); Data Cache Normal Victim (DNV0–DNV3); Data Cache Transient Victim (DTV0–DTV3)
Storage Control: Process ID (PID); MMU Control Register (MMUCR)
Debug: Debug Status Register (DBSR); Debug Data Register (DBDR); Debug Control Registers (DBCR0, DBCR1, DBCR2); Data Address Compares (DAC1, DAC2); Data Value Compares (DVC1, DVC2); Instruction Address Compares (IAC1–IAC4)
Cache Debug: Instruction Cache Debug Data Register (ICDBDR); Instruction Cache Debug Tag Registers (ICDBTRH, ICDBTRL); Data Cache Debug Tag Registers (DCDBTRH, DCDBTRL)
Table 2-3. Register Categories
Register Category | Register(s) | Model and Access | Type | Page
2.2.1 Register Types
There are five register types contained within and/or supported by the PPC440x5 core. Each register type is
characterized by the instructions which are used to read and write the registers of that type. The following
subsections provide an overview of each of the register types and the instructions associated with them.
2.2.1.1 General Purpose Registers
The PPC440x5 core contains 32 integer general purpose registers (GPRs); each contains 32 bits. Data from
the data cache or memory can be loaded into GPRs using integer load instructions; the contents of GPRs can
be stored to the data cache or memory using integer store instructions. Most of the integer instructions reference GPRs. The GPRs are also used as targets and sources for most of the instructions which read and write
the other register types.
Integer Processing on page 71 provides more information on integer operations and the use of GPRs.
2.2.1.2 Special Purpose Registers
Special Purpose Registers (SPRs) are directly accessed using the mtspr and mfspr instructions. In addition,
certain SPRs may be updated as a side-effect of the execution of various instructions. For example, the
Integer Exception Register (XER) (see Integer Exception Register (XER) on page 72) is an SPR which is
updated with arithmetic status (such as carry and overflow) upon execution of certain forms of integer arithmetic instructions.
SPRs control the use of the debug facilities, timers, interrupts, memory management, caches, and other
architected processor resources. Table 10-2 on page 454 shows the mnemonic, name, and number for each
SPR, in order by SPR number. Each of the SPRs is described in more detail within the section or chapter
covering the function with which it is associated. See Table 2-3 on page 50 for a cross-reference to the associated document section for each register.
2.2.1.3 Condition Register
The Condition Register (CR) is a 32-bit register of its own unique type and is divided into eight independent 4-bit fields (CR0–CR7). The CR may be used to record certain conditional results of various arithmetic
and logical operations. Subsequently, conditional branch instructions may designate a bit of the CR as one of
the branch conditions (see Branch Processing on page 64). Instructions are also provided for performing
logical bit operations and for moving fields within the CR.
See Condition Register (CR) on page 67 for more information on the various instructions which can update
the CR.
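Because CR bits are numbered from the most-significant end, field n occupies bits 4n through 4n+3. The sketch below extracts a field or a single bit from a 32-bit CR image. The helper names are hypothetical, and the LT/GT/EQ/SO names for the four bits of a field follow the usual PowerPC convention for compare results rather than anything stated in this subsection.

```c
#include <stdint.h>

/* Bits within a 4-bit CR field; field bit 0 is the most significant. */
enum { CR_LT = 0x8, CR_GT = 0x4, CR_EQ = 0x2, CR_SO = 0x1 };

/* Return CR field n (0..7). CR0 is the four most-significant bits. */
unsigned cr_field(uint32_t cr, unsigned n)
{
    return (cr >> (28 - 4 * n)) & 0xFu;
}

/* Return CR bit b (0..31), numbered from the most-significant bit. */
unsigned cr_bit(uint32_t cr, unsigned b)
{
    return (cr >> (31 - b)) & 1u;
}
```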
2.2.1.4 Machine State Register
The Machine State Register (MSR) is a register of its own unique type that controls important chip functions,
such as the enabling or disabling of various interrupt types.
The MSR can be written from a GPR using the mtmsr instruction. The contents of the MSR can be read into a GPR
using the mfmsr instruction. The MSR[EE] bit can be set or cleared atomically using the wrtee or wrteei
instructions. The MSR contents are also automatically saved, altered, and restored by the interrupt-handling
mechanism. See Machine State Register (MSR) on page 165 for more detailed information on the
MSR and the function of each of its bits.
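The effect of `wrtee` and `wrteei` can be modelled as a single-bit update of the MSR. This sketch assumes MSR[EE] is bit 16 in PowerPC big-endian bit numbering (mask 0x00008000); the `model_*` function names are hypothetical stand-ins for the instructions.

```c
#include <stdint.h>

#define MSR_EE 0x00008000u  /* assumed: MSR bit 16, External Enable */

/* wrtee RS: MSR[EE] takes the value of bit 16 of RS; other MSR bits unchanged. */
uint32_t model_wrtee(uint32_t msr, uint32_t rs)
{
    return (msr & ~MSR_EE) | (rs & MSR_EE);
}

/* wrteei E: MSR[EE] takes the immediate E (0 or 1); other MSR bits unchanged. */
uint32_t model_wrteei(uint32_t msr, unsigned e)
{
    return e ? (msr | MSR_EE) : (msr & ~MSR_EE);
}
```

A typical use is a short critical section: save the MSR with `mfmsr`, execute `wrteei 0`, do the work, then execute `wrtee` with the saved value. The final step re-enables external interrupts only if they were enabled on entry, so nested sections compose correctly.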
2.2.1.5 Device Control Registers
Device Control Registers (DCRs) are on-chip registers that exist architecturally and physically outside the
PPC440x5 core, and thus are not specified by the Book-E Enhanced PowerPC Architecture, nor by this
user’s manual for the PPC440x5 core. Rather, PowerPC Book-E simply defines the existence of the DCR
address space and the instructions that access the DCRs, and does not define any particular DCRs. The
DCR access instructions are mtdcr (move to device control register) and mfdcr (move from device control
register), which move data between GPRs and the DCRs.
DCRs may be used to control various on-chip system functions, such as the operation of on-chip buses,
peripherals, and certain processor core behaviors.
2.3 Instruction Classes
PowerPC Book-E architecture defines all instructions as falling into exactly one of the following four classes,
as determined by the primary opcode (and the extended opcode, if any):
1. Defined
2. Allocated
3. Preserved
4. Reserved (-illegal or -nop)
2.3.1 Defined Instruction Class
This class of instructions consists of all the instructions defined in PowerPC Book-E. In general, defined
instructions are guaranteed to be supported within a PowerPC Book-E system as specified by the architecture, either within the processor implementation itself or within emulation software supported by the system
operating software.
One exception to this is that, for implementations (such as the PPC440x5) that only provide the 32-bit subset
of PowerPC Book-E, it is not expected (and likely not even possible) that emulation of the 64-bit behavior of
the defined instructions will be provided by the system.
As defined by PowerPC Book-E, any attempt to execute a defined instruction will:
• cause an Illegal Instruction exception type Program interrupt, if the instruction is not recognized by the
implementation; or
• cause an Unimplemented Instruction exception type Program interrupt, if the instruction is recognized by
the implementation and is not a floating-point instruction, but is not supported by the implementation; or
• cause a Floating-Point Unavailable interrupt if the instruction is recognized as a floating-point instruction,
but floating-point processing is disabled; or
• cause an Unimplemented Instruction exception type Program interrupt, if the instruction is recognized as
a floating-point instruction and floating-point processing is enabled, but the instruction is not supported by
the implementation; or
• perform the actions described in the rest of this document, if the instruction is recognized and supported
by the implementation. The architected behavior may cause other exceptions.
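The list of outcomes above amounts to a decision procedure, sketched here in C. The `struct insn_info` description and the enum names are hypothetical, and `fp_enabled` stands for whatever enables floating-point processing on a given system.

```c
#include <stdbool.h>

enum outcome {
    ILLEGAL_INSTRUCTION,       /* Program interrupt, Illegal Instruction exception */
    UNIMPLEMENTED_INSTRUCTION, /* Program interrupt, Unimplemented Instruction exception */
    FP_UNAVAILABLE,            /* Floating-Point Unavailable interrupt */
    EXECUTE                    /* perform the architected actions */
};

/* Hypothetical description of how an implementation treats one defined instruction. */
struct insn_info {
    bool recognized;
    bool floating_point;
    bool supported;
};

enum outcome defined_insn_outcome(struct insn_info i, bool fp_enabled)
{
    if (!i.recognized)
        return ILLEGAL_INSTRUCTION;
    if (i.floating_point && !fp_enabled)
        return FP_UNAVAILABLE;            /* checked before "supported" */
    if (!i.supported)
        return UNIMPLEMENTED_INSTRUCTION; /* integer, or FP with FP enabled */
    return EXECUTE;
}
```

On a PPC440x5 with no floating-point auxiliary processor attached, floating-point instructions are not recognized, so they take the first path and cause an Illegal Instruction exception.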
The PPC440x5 core recognizes and fully supports all of the instructions in the defined class, with a few
exceptions. First, because the PPC440x5 is a 32-bit implementation, those operations which are defined
specifically for 64-bit operation are not supported at all, and will always cause an Illegal Instruction exception
type Program interrupt.
Second, instructions that are defined for floating-point processing are not supported within the PPC440x5
core, but may be implemented within an auxiliary processor and attached to the core using the AP interface.
If no such auxiliary processor is attached, attempting to execute any floating-point instructions will cause an
Illegal Instruction exception type Program interrupt. If an auxiliary processor which supports the floating-point
instructions is attached, the behavior of these instructions is as defined above and as determined by the
implementation details of the floating-point auxiliary processor.
Finally, there are two other defined instructions which are not supported within the PPC440x5 core. One is a
TLB management instruction (tlbiva, TLB Invalidate Virtual Address) that is specifically intended for
coherent multiprocessor systems. The other is mfapidi (Move From Auxiliary Processor ID Indirect), which
is a special instruction intended to assist with identification of the auxiliary processors which may be attached
to a particular processor implementation. Since the PPC440x5 core does not support mfapidi, the means of
identifying the auxiliary processors in a PPC440x5 core-based system are implementation-dependent.
Execution of either tlbiva or mfapidi will cause an Illegal Instruction exception type Program interrupt.
2.3.2 Allocated Instruction Class
This class of instructions contains a set of primary opcodes, as well as extended opcodes for certain primary
opcodes. The specific opcodes are listed in Appendix A.3 on page 557.
Allocated instructions are provided for purposes that are outside the scope of PowerPC Book-E, and are for
implementation-dependent and application-specific use.
PowerPC Book-E declares that any attempt to execute an allocated instruction results in one of the following
effects:
• Causes an Illegal Instruction exception type Program interrupt, if the instruction is not recognized by the
implementation
• Causes an Auxiliary Processor Unavailable interrupt if the instruction is recognized by the implementation, but allocated instruction processing is disabled
• Causes an Unimplemented Instruction exception type Program interrupt, if the instruction is recognized
and allocated instruction processing is enabled, but the instruction is not supported by the implementation
• Performs the actions described for the particular implementation of the allocated instruction. The implementation-dependent behavior may cause other exceptions.
In addition to supporting the defined instructions of PowerPC Book-E, the PPC440x5 also implements a
number of instructions which use the allocated instruction opcodes, and thus are not part of the PowerPC
Book-E architecture. Table 2-21 on page 63 identifies the allocated instructions that are implemented within
the PPC440x5 core. All of these instructions are always enabled and supported, and thus they always
perform the functions defined for them within this document, and never cause Illegal Instruction, Auxiliary
Processor Unavailable, or Unimplemented Instruction exceptions.
The PPC440x5 also supports the use of any of the allocated opcodes by an attached auxiliary processor,
except for those allocated opcodes which have been implemented within the PPC440x5 core, as mentioned
above. Also, there is one other allocated opcode (primary opcode 31, secondary opcode 262) that has been
implemented within the PPC440x5 core and is thus not available for use by an attached auxiliary processor.
This is the opcode which was used on previous PowerPC 400 Series embedded controllers for the icbt
(Instruction Cache Block Touch) instruction. The icbt instruction is now part of the defined instruction class
for PowerPC Book-E, and uses a new opcode (primary opcode 31, secondary opcode 22). The PPC440x5
implements the new defined opcode, but also continues to support the previous opcode, in order to support
legacy software written for earlier PowerPC 400 Series implementations. The icbt instruction description in
Instruction Set on page 249 only identifies the defined opcode, although Appendix A, “Instruction Summary,”
includes both the defined and the allocated opcode in the table which lists all the instructions by opcode. In
order to ensure portability between the PPC440x5 and future PowerPC Book-E implementations, software
should take care to only use the defined opcode for icbt, and avoid usage of the previous opcode which is
now in the allocated class.
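A tool that scans legacy binaries for the old `icbt` encoding could decode the opcode fields directly. This sketch assumes the standard PowerPC X-form layout (primary opcode in bits 0–5, extended opcode in bits 21–30, bit 0 most significant); `classify_icbt` and its return codes are hypothetical.

```c
#include <stdint.h>

unsigned primary_opcode(uint32_t insn)  { return insn >> 26; }
unsigned extended_opcode(uint32_t insn) { return (insn >> 1) & 0x3FFu; }

/* 0 = not icbt
 * 1 = defined PowerPC Book-E opcode (31/22)
 * 2 = previous PowerPC 400 Series opcode (31/262), now in the allocated class */
int classify_icbt(uint32_t insn)
{
    if (primary_opcode(insn) != 31)
        return 0;
    switch (extended_opcode(insn)) {
    case 22:  return 1;
    case 262: return 2;
    default:  return 0;
    }
}
```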
2.3.3 Preserved Instruction Class
The preserved instruction class is provided to support backward compatibility with the PowerPC Architecture,
and/or earlier versions of the PowerPC Book-E architecture. This instruction class includes opcodes which
were defined for these previous architectures, but which are no longer defined for PowerPC Book-E.
Any attempt to execute a preserved instruction results in one of the following effects:
• Performs the actions described in the previous version of the architecture, if the instruction is recognized
by the implementation
• Causes an Illegal Instruction exception type Program interrupt, if the instruction is not recognized by the
implementation.
The only preserved instruction recognized and supported by the PPC440x5 is the mftb (Move From Time
Base) opcode. This instruction was used in the PowerPC Architecture to read the Time Base Upper
(TBU) and Time Base Lower (TBL) registers. PowerPC Book-E architecture instead defines TBU and TBL as
Special Purpose Registers (SPRs), and thus the mfspr (Move From Special Purpose Register) instruction is
used to read them. In order to enable legacy time base management software to be run on the PPC440x5,
the core also supports the preserved opcode of mftb. However, the mftb instruction is not included in the
various sections of this document that describe the implemented instructions, and software should take care
to use the currently architected mechanism of mfspr to read the time base registers, in order to guarantee
portability between the PPC440x5 and future implementations of PowerPC Book-E.
On the other hand, Appendix A, “Instruction Summary,” does identify the mftb instruction as an implemented
preserved opcode in the table which lists all the instructions by opcode.
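Reading the 64-bit time base with two `mfspr` reads needs care, because TBL can carry into TBU between the reads. A common technique, shown here as a sketch, reads TBU, then TBL, then TBU again, and retries if the two TBU values differ. The SPR reads are modelled by hypothetical function pointers, and the `sim_*` definitions are a host-side simulation of a time base that advances one tick per read, not real hardware access.

```c
#include <stdint.h>

/* Hypothetical accessor type standing in for "mfspr rD,TBU" or "mfspr rD,TBL". */
typedef uint32_t (*spr_reader)(void);

/* Read the 64-bit time base as two 32-bit halves. If TBL carries into TBU
 * between the reads, the two TBU samples differ and the read is retried. */
uint64_t read_time_base(spr_reader read_tbu, spr_reader read_tbl)
{
    uint32_t hi, lo, hi2;
    do {
        hi  = read_tbu();
        lo  = read_tbl();
        hi2 = read_tbu();
    } while (hi != hi2);
    return ((uint64_t)hi << 32) | lo;
}

/* Host-side simulation only: a time base that advances one tick per read. */
static uint64_t sim_tb;
static uint32_t sim_read_tbu(void) { uint32_t v = (uint32_t)(sim_tb >> 32); sim_tb++; return v; }
static uint32_t sim_read_tbl(void) { uint32_t v = (uint32_t)sim_tb; sim_tb++; return v; }
```

Starting the simulation at 0x1FFFFFFFF, the first pass sees TBU change from 1 to 2 and retries. A single TBU/TBL pair would have returned 0x100000000, about 2^32 ticks earlier than the true time.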
2.3.4 Reserved Instruction Class
This class of instructions consists of all instruction primary opcodes (and associated extended opcodes, if
applicable) which do not belong to the defined, allocated, or preserved instruction classes.
Reserved instructions are available for future versions of PowerPC Book-E architecture. That is, future
versions of PowerPC Book-E may define any of these instructions to perform new functions or make them
available for implementation-dependent use as allocated instructions. There are two types of reserved
instructions: reserved-illegal and reserved-nop.
Any attempt to execute a reserved-illegal instruction will cause an Illegal Instruction exception type Program
interrupt on implementations (such as the PPC440x5) that conform to the current version of PowerPC Book-E. Reserved-illegal instructions are, therefore, available for future extensions to PowerPC Book-E that would
affect architected state. Such extensions might include new forms of integer or floating-point arithmetic
instructions, or new forms of load or store instructions that affect architected registers or the contents of
memory.
Any attempt to execute a reserved-nop instruction, on the other hand, either has no effect (that is, is treated
as a no-operation instruction), or causes an Illegal Instruction exception type Program interrupt, on implementations (such as the PPC440x5) that conform to the current version of PowerPC Book-E. Because implementations are typically expected to treat reserved-nop instructions as true no-ops, these instruction opcodes
are thus available for future extensions to PowerPC Book-E which have no effect on architected state. Such
extensions might include performance-enhancing hints, such as new forms of cache touch instructions. Software would be able to take advantage of the functionality offered by the new instructions, and still remain
backwards-compatible with implementations of previous versions of PowerPC Book-E.
The PPC440x5 implements all of the reserved-nop instruction opcodes as true no-ops. The specific reserved-nop opcodes are listed in Appendix A.5 on page 558.
2.4 Implemented Instruction Set Summary
This section provides an overview of the various types and categories of instructions implemented within the
PPC440x5. In addition, Instruction Set on page 249 provides a complete alphabetical listing of every implemented instruction, including its register transfer language (RTL) and a detailed description of its operation.
Also, Appendix A, “Instruction Summary,” lists each implemented instruction alphabetically (and by opcode)
along with a short-form description and its extended mnemonic(s).
Table 2-4 summarizes the PPC440x5 instruction set by category. Instructions within each category are
described in subsequent sections.
Table 2-4. Instruction Categories
Category | Subcategory | Instruction Types
Integer | Storage Access | load, store
Integer | Arithmetic | add, subtract, multiply, divide, negate
2.4.1 Integer Instructions
Integer instructions transfer data between memory and the GPRs, and perform various operations on the
GPRs. This category of instructions is further divided into eight sub-categories, described below.
2.4.1.1 Integer Storage Access Instructions
Integer storage access instructions load and store data between memory and the GPRs. These instructions
operate on bytes, halfwords, and words. Integer storage access instructions also support loading and storing
multiple registers, character strings, and byte-reversed data, and loading data with sign-extension.
Table 2-5 shows the integer storage access instructions in the PPC440x5. In the table, the syntax “[u]” indicates
that the instruction has both an “update” form (in which the RA addressing register is updated with the
calculated address) and a “non-update” form. Similarly, the syntax “[x]” indicates that the instruction has both
an “indexed” form (in which the address is formed by adding the contents of the RA and RB GPRs) and a
“base + displacement” form (in which the address is formed by adding a 16-bit signed immediate value, specified
as part of the instruction, to the contents of GPR RA). See the detailed instruction descriptions in Instruction Set
on page 249.
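The two address forms and the update variant can be sketched as follows. The `cpu_state` type and the function names are hypothetical. The sketch assumes the usual PowerPC convention that an RA field of 0 in a non-update form means the value 0 rather than GPR0 (the same (RA|0) rule this chapter states for `isel`), and that update forms with RA = 0 are invalid, so they never use that rule.

```c
#include <stdint.h>

/* Hypothetical register-file model, for illustration only. */
typedef struct { uint32_t gpr[32]; } cpu_state;

/* Sign-extend a 16-bit displacement to 32 bits. */
static uint32_t exts16(uint16_t d)
{
    return ((uint32_t)d ^ 0x8000u) - 0x8000u;
}

/* (RA|0): 0 when the RA field is 0, otherwise the contents of GPR[RA]. */
static uint32_t ra_or_zero(const cpu_state *c, unsigned ra)
{
    return ra ? c->gpr[ra] : 0u;
}

/* Base + displacement form (for example lwz): EA = (RA|0) + EXTS(D). */
uint32_t ea_base_disp(const cpu_state *c, unsigned ra, uint16_t d)
{
    return ra_or_zero(c, ra) + exts16(d);
}

/* Indexed form (for example lwzx): EA = (RA|0) + (RB). */
uint32_t ea_indexed(const cpu_state *c, unsigned ra, unsigned rb)
{
    return ra_or_zero(c, ra) + c->gpr[rb];
}

/* Update form (for example lwzu): EA = (RA) + EXTS(D), and RA receives the EA. */
uint32_t ea_base_disp_update(cpu_state *c, unsigned ra, uint16_t d)
{
    uint32_t ea = c->gpr[ra] + exts16(d);
    c->gpr[ra] = ea;
    return ea;
}
```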
2.4.1.2 Integer Arithmetic Instructions
Arithmetic operations are performed on integer or ordinal operands stored in registers. Instructions that
perform operations on two operands are defined in a three-operand format; an operation is performed on the
operands, which are stored in two registers. The result is placed in a third register. Instructions that perform
operations on one operand are defined in a two-operand format; the operation is performed on the operand in
a register and the result is placed in another register. Several instructions also have immediate formats in
which one of the source operands is a field in the instruction.
Most integer arithmetic instructions have versions that can update CR[CR0] and/or XER[SO, OV] (Summary
Overflow, Overflow), based on the result of the instruction. Some integer arithmetic instructions also update
XER[CA] (Carry) implicitly. See Integer Processing on page 71 for more information on how these instructions update the CR and/or the XER.
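To illustrate how a carrying "o"-form instruction updates the XER, the sketch below models `addco` on 32-bit operands. It assumes the standard XER bit positions (SO = bit 0, OV = bit 1, CA = bit 2, big-endian numbering); the name `model_addco` is hypothetical. CA records unsigned carry-out, OV records signed overflow, and SO is sticky: it is set along with OV and is never cleared by this model.

```c
#include <stdint.h>

#define XER_SO 0x80000000u  /* Summary Overflow, bit 0 */
#define XER_OV 0x40000000u  /* Overflow, bit 1 */
#define XER_CA 0x20000000u  /* Carry, bit 2 */

/* Model of "addco RT,RA,RB": returns RT and updates *xer. */
uint32_t model_addco(uint32_t a, uint32_t b, uint32_t *xer)
{
    uint32_t sum = a + b;                       /* wraps modulo 2^32 */
    int carry = sum < a;                        /* unsigned carry-out */
    /* Signed overflow: inputs share a sign and the result's sign differs. */
    int overflow = ((~(a ^ b) & (a ^ sum)) >> 31) & 1;

    *xer &= ~(XER_CA | XER_OV);                 /* SO is left untouched */
    if (carry)
        *xer |= XER_CA;
    if (overflow)
        *xer |= XER_OV | XER_SO;
    return sum;
}
```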
Table 2-6 lists the integer arithmetic instructions in the PPC440x5. In the table, the syntax “[o]” indicates that
the instruction has both an “o” form (which updates the XER[SO,OV] fields) and a “non-o” form. Similarly, the
syntax “[.]” indicates that the instruction has both a “record” form (which updates CR[CR0]) and a “non-record” form.
Table 2-6. Integer Arithmetic Instructions
Add: add[o][.], addc[o][.], adde[o][.], addi, addic[.], addis, addme[o][.], addze[o][.]
Subtract: subf[o][.], subfc[o][.], subfe[o][.], subfic, subfme[o][.], subfze[o][.]
Multiply: mulhw[.], mulhwu[.], mulli, mullw[o][.]
Divide: divw[o][.], divwu[o][.]
Negate: neg[o][.]
2.4.1.3 Integer Logical Instructions
Table 2-7 lists the integer logical instructions in the PPC440x5. See Integer Arithmetic Instructions on
page 58 for an explanation of the “[.]” syntax.
Table 2-7. Integer Logical Instructions
And: and[.], andi., andis.
And with complement: andc[.]
Nand: nand[.]
Or: or[.], ori, oris
Or with complement: orc[.]
Nor: nor[.]
Xor: xor[.], xori, xoris
Equivalence: eqv[.]
Extend sign: extsb[.], extsh[.]
Count leading zeros: cntlzw[.]
2.4.1.4 Integer Compare Instructions
These instructions perform arithmetic or logical comparisons between two operands and update the CR with
the result of the comparison.
Table 2-8 lists the integer compare instructions in the PPC440x5.
Table 2-8. Integer Compare Instructions
Arithmetic: cmp, cmpi
Logical: cmpl, cmpli
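Arithmetic and logical compares differ only in how the operands are interpreted. The sketch returns the 4-bit value that would be written to the target CR field. The function names are hypothetical, the LT/GT/EQ/SO bit order is the usual PowerPC convention, and the signed comparison flips the sign bits so that unsigned ordering matches signed ordering without implementation-defined casts.

```c
#include <stdint.h>

/* Bits of a 4-bit CR field, most significant first. */
enum { CR_LT = 0x8, CR_GT = 0x4, CR_EQ = 0x2, CR_SO = 0x1 };

/* Encode a three-way comparison plus a copy of XER[SO]. */
static unsigned cr_field_from(int less, int greater, int xer_so)
{
    unsigned f = less ? CR_LT : greater ? CR_GT : CR_EQ;
    return f | (xer_so ? CR_SO : 0u);
}

/* cmp: operands compared as signed 32-bit integers. */
unsigned model_cmp(uint32_t a, uint32_t b, int xer_so)
{
    uint32_t sa = a ^ 0x80000000u, sb = b ^ 0x80000000u;
    return cr_field_from(sa < sb, sa > sb, xer_so);
}

/* cmpl: operands compared as unsigned 32-bit integers. */
unsigned model_cmpl(uint32_t a, uint32_t b, int xer_so)
{
    return cr_field_from(a < b, a > b, xer_so);
}
```

For example, comparing 0xFFFFFFFF with 1 sets LT under `cmp` (-1 < 1) but GT under `cmpl` (4294967295 > 1).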
2.4.1.5 Integer Trap Instructions
Table 2-9 lists the integer trap instructions in the PPC440x5.
Table 2-9. Integer Trap Instructions
Trap: tw, twi
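A trap instruction compares its operands and traps if any condition selected by the 5-bit TO field holds. The sketch assumes the standard PowerPC TO encoding (from the most-significant bit: signed less than, signed greater than, equal, unsigned less than, unsigned greater than); `tw_traps` is a hypothetical name. For `twi`, `b` would be the sign-extended immediate.

```c
#include <stdint.h>

/* TO field bits (assumed standard encoding; 0x10 is the most-significant bit). */
enum {
    TO_LT  = 0x10,  /* signed a < b */
    TO_GT  = 0x08,  /* signed a > b */
    TO_EQ  = 0x04,  /* a == b */
    TO_LTU = 0x02,  /* unsigned a < b */
    TO_GTU = 0x01   /* unsigned a > b */
};

/* Returns nonzero if "tw TO,RA,RB" with RA = a and RB = b would trap. */
int tw_traps(unsigned to, uint32_t a, uint32_t b)
{
    uint32_t sa = a ^ 0x80000000u, sb = b ^ 0x80000000u; /* signed order */
    return ((to & TO_LT)  && sa < sb) ||
           ((to & TO_GT)  && sa > sb) ||
           ((to & TO_EQ)  && a == b)  ||
           ((to & TO_LTU) && a < b)   ||
           ((to & TO_GTU) && a > b);
}
```

TO = 31 selects every condition, so the comparison always traps; that is the unconditional `trap` form.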
2.4.1.6 Integer Rotate Instructions
These instructions rotate operands stored in the GPRs. Rotate instructions can also mask rotated operands.
Table 2-10 lists the rotate instructions in the PPC440x5. See Integer Arithmetic Instructions on page 58 for an
explanation of the “[.]” syntax.
Table 2-10. Integer Rotate Instructions
Rotate and Insert: rlwimi[.]
Rotate and Mask: rlwinm[.], rlwnm[.]
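The rotate-and-mask instructions combine a 32-bit rotate with a mask that runs from bit MB to bit ME, wrapping around when MB > ME. This sketch models `rlwinm` and `rlwimi` on the host. The function names are hypothetical, and bits are numbered from the most-significant end (bit 0), as elsewhere in this document.

```c
#include <stdint.h>

uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* MASK(mb, me): ones from bit mb through bit me, wrapping past bit 31 if mb > me. */
uint32_t ppc_mask(unsigned mb, unsigned me)
{
    uint32_t from_mb = 0xFFFFFFFFu >> mb;        /* bits mb..31 */
    uint32_t to_me   = 0xFFFFFFFFu << (31 - me); /* bits 0..me */
    return (mb <= me) ? (from_mb & to_me) : (from_mb | to_me);
}

/* rlwinm: rotate left, then AND with the mask. */
uint32_t model_rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh) & ppc_mask(mb, me);
}

/* rlwimi: rotate left, then insert the masked bits into the existing RA value. */
uint32_t model_rlwimi(uint32_t ra, uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    uint32_t m = ppc_mask(mb, me);
    return (rotl32(rs, sh) & m) | (ra & ~m);
}
```

Shift-by-constant extended mnemonics are special cases: `slwi rA,rS,4` is `rlwinm rA,rS,4,0,27`, and `srwi rA,rS,8` is `rlwinm rA,rS,24,8,31`.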
2.4.1.7 Integer Shift Instructions
Table 2-11 lists the integer shift instructions in the PPC440x5. Note that the shift right algebraic instructions
implicitly update the XER[CA] field. See Integer Arithmetic Instructions on page 58 for an explanation of the
“[.]” syntax.
Table 2-11. Integer Shift Instructions
Shift Left: slw[.]
Shift Right: srw[.]
Shift Right Algebraic: sraw[.], srawi[.]
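The XER[CA] update performed by `sraw` can be sketched as follows. This assumes the standard PowerPC definition, which this subsection does not spell out: the shift count is the low-order six bits of RB, and CA is set only when the source is negative and at least one 1 bit is shifted out. The name `model_sraw` is hypothetical; `srawi` behaves the same way with an immediate count of 0 to 31.

```c
#include <stdint.h>

/* Model of "sraw RA,RS,RB": returns RA and sets *ca to the new XER[CA]. */
uint32_t model_sraw(uint32_t rs, uint32_t rb, int *ca)
{
    unsigned n = rb & 0x3Fu;
    int negative = (rs & 0x80000000u) != 0;

    if (n == 0) {
        *ca = 0;
        return rs;
    }
    if (n >= 32) {                          /* every bit shifted out */
        *ca = negative;
        return negative ? 0xFFFFFFFFu : 0u;
    }
    uint32_t lost = rs & ((1u << n) - 1u);  /* bits shifted out the right end */
    uint32_t result = rs >> n;              /* logical shift ... */
    if (negative)
        result |= ~(0xFFFFFFFFu >> n);      /* ... then replicate the sign bit */
    *ca = negative && lost != 0;
    return result;
}
```

This CA rule is what lets the common `srawi` plus `addze` sequence divide a signed value by a power of two with truncation toward zero: shifting -7 right by 1 gives -4 with CA = 1, and adding CA yields -3.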
2.4.1.8 Integer Select Instruction
Table 2-12 lists the integer select instruction in the PPC440x5. The RA operand is 0 if the RA field of the
instruction is 0, or is the contents of GPR[RA] otherwise.
Table 2-12. Integer Select Instruction
Integer Select: isel
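`isel` selects between two registers based on a single CR bit, without a branch. The sketch models it with a hypothetical register-file array. The CR bit is numbered from the most-significant end, and the (RA|0) rule from the text above is applied to the first source.

```c
#include <stdint.h>

/* Model of "isel RT,RA,RB,BC": RT = CR[BC] ? (RA|0) : (RB). */
uint32_t model_isel(const uint32_t gpr[32], uint32_t cr,
                    unsigned ra, unsigned rb, unsigned bc)
{
    uint32_t a = ra ? gpr[ra] : 0u;          /* (RA|0) */
    unsigned bit = (cr >> (31 - bc)) & 1u;   /* CR bit BC */
    return bit ? a : gpr[rb];
}
```

Paired with a compare, this yields a branch-free select; for example, `cmpw rA,rB` followed by `isel rD,rA,rB,0` (testing CR0[LT]) picks the smaller of two signed values.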
2.4.2 Branch Instructions
These instructions unconditionally or conditionally branch to an address. Conditional branch instructions can
test condition codes set in the CR by a previous instruction and branch accordingly. Conditional branch
instructions can also decrement and test the Count Register (CTR) as part of branch determination, and can
save the return address in the Link Register (LR). The target address for a branch can be a displacement from
the current instruction address or an absolute address, or it can be contained in the LR or CTR.
See Branch Processing on page 64 for more information on branch operations.
Table 2-13 lists the branch instructions in the PPC440x5. In the table, the syntax “[l]” indicates that the
instruction has both a “link update” form (which updates LR with the address of the instruction after the
branch) and a “non-link update” form. Similarly, the syntax “[a]” indicates that the instruction has both an
“absolute address” form (in which the target address is formed directly using the immediate field specified as
part of the instruction) and a “relative” form (in which the target address is formed by adding the specified
immediate field to the address of the branch instruction).
Table 2-13. Branch Instructions
Branch: b[l][a], bc[l][a], bcctr[l], bclr[l]
2.4.3 Processor Control Instructions
Processor control instructions manipulate system registers, perform system software linkage, and synchronize processor operations. The instructions in the four sub-categories of processor control instructions are
described below.
2.4.3.1 Condition Register Logical Instructions
These instructions perform logical operations on a specified pair of bits in the CR, placing the result in
another specified bit. The benefit of these instructions is that they can logically combine the results of several
comparison operations without incurring the overhead of conditional branching between each one. Software
performance can significantly improve if multiple conditions are tested at once as part of a branch decision.
Table 2-14 lists the condition register logical instructions in the PPC440x5.
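The benefit described above can be seen by modelling two compares whose results are combined with `crand` before a single branch. The sketch assumes CR0 and CR1 hold the results of two earlier compares, with bits numbered from the most-significant end (CR0[EQ] is bit 2, CR1[EQ] is bit 6); the helper names are hypothetical.

```c
#include <stdint.h>

static unsigned get_bit(uint32_t cr, unsigned b) { return (cr >> (31 - b)) & 1u; }

static uint32_t set_bit(uint32_t cr, unsigned b, unsigned v)
{
    uint32_t m = 1u << (31 - b);
    return v ? (cr | m) : (cr & ~m);
}

/* crand BT,BA,BB: CR[BT] = CR[BA] & CR[BB] */
uint32_t model_crand(uint32_t cr, unsigned bt, unsigned ba, unsigned bb)
{
    return set_bit(cr, bt, get_bit(cr, ba) & get_bit(cr, bb));
}

/* cror BT,BA,BB: CR[BT] = CR[BA] | CR[BB] */
uint32_t model_cror(uint32_t cr, unsigned bt, unsigned ba, unsigned bb)
{
    return set_bit(cr, bt, get_bit(cr, ba) | get_bit(cr, bb));
}
```

`crand 2,2,6` leaves CR0[EQ] set only when both compares found equality, so a single `beq` tests both conditions.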
2.4.3.2 Register Management Instructions
These instructions move data between the GPRs and control registers in the PPC440x5.
Table 2-15 lists the register management instructions in the PPC440x5.
Table 2-15. Register Management Instructions
CR: mcrf, mcrxr, mfcr, mtcrf
DCR: mfdcr, mtdcr
MSR: mfmsr, mtmsr, wrtee, wrteei
SPR: mfspr, mtspr
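`mtcrf` updates only the CR fields selected by its 8-bit FXM mask. The sketch expands FXM into a 32-bit mask and merges. The function names are hypothetical, and the FXM bit-to-field mapping (the most-significant FXM bit selects CR0) is the standard PowerPC one rather than something stated in this table.

```c
#include <stdint.h>

/* Expand an 8-bit FXM mask into a 32-bit CR mask: FXM bit 0 (0x80) selects CR0
 * (CR bits 0-3), ..., FXM bit 7 (0x01) selects CR7 (CR bits 28-31). */
uint32_t fxm_to_mask(unsigned fxm)
{
    uint32_t mask = 0;
    for (unsigned i = 0; i < 8; i++)
        if (fxm & (0x80u >> i))
            mask |= 0xFu << (28 - 4 * i);
    return mask;
}

/* Model of "mtcrf FXM,RS": only the selected CR fields take RS's bits. */
uint32_t model_mtcrf(uint32_t cr, unsigned fxm, uint32_t rs)
{
    uint32_t m = fxm_to_mask(fxm);
    return (rs & m) | (cr & ~m);
}
```

The extended mnemonic `mtcr rS` is `mtcrf 0xFF,rS`, which replaces the whole CR.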
2.4.3.3 System Linkage Instructions
These instructions invoke supervisor-level software for system services and return from interrupts.
Table 2-16 lists the system linkage instructions in the PPC440x5.
Table 2-16. System Linkage Instructions
rfi
rfci
rfmci
sc
2.4.3.4 Processor Synchronization Instruction
The processor synchronization instruction, isync, forces the processor to complete all instructions preceding
the isync before allowing any context changes as a result of any instructions that follow the isync. Additionally, all instructions that follow the isync will execute within the context established by the completion of all
the instructions that precede the isync. See Synchronization on page 82 for more information on the
synchronizing effect of isync.
Table 2-17 shows the processor synchronization instruction in the PPC440x5.
Table 2-17. Processor Synchronization Instruction
isync
2.4.4 Storage Control Instructions
These instructions manage the instruction and data caches and the TLB of the PPC440x5 core. Instructions
are also provided to synchronize and order storage accesses. The instructions in these three sub-categories
of storage control instructions are described below.
2.4.4.1 Cache Management Instructions
These instructions control the operation of the data and instruction caches. Instructions are provided to fill,
flush, invalidate, or zero data cache blocks, where a block is defined as a 32-byte cache line. Instructions are
also provided to fill or invalidate instruction cache blocks.
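Because these instructions operate on whole 32-byte blocks, a driver that flushes or invalidates a buffer must visit every block the buffer overlaps, including partial blocks at either end. The sketch computes those block addresses. `for_each_cache_block` and the `block_op` callback (standing in for one `dcbst`, `dcbf`, `dcbi`, or `icbi` per block) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 32u  /* PPC440x5 cache block size in bytes */

/* Hypothetical callback standing in for one cache management instruction. */
typedef void (*block_op)(uintptr_t block_addr, void *ctx);

/* Apply op to every 32-byte block overlapping [addr, addr + len).
 * op may be NULL to only count blocks. Returns the number of blocks. */
size_t for_each_cache_block(uintptr_t addr, size_t len, block_op op, void *ctx)
{
    if (len == 0)
        return 0;
    uintptr_t first = addr & ~(uintptr_t)(CACHE_LINE - 1);             /* round down */
    uintptr_t last  = (addr + len - 1) & ~(uintptr_t)(CACHE_LINE - 1); /* final byte's block */
    size_t count = 0;
    for (uintptr_t a = first; ; a += CACHE_LINE) {
        if (op)
            op(a, ctx);
        count++;
        if (a == last)   /* stop here instead of comparing a < end, avoiding wraparound */
            break;
    }
    return count;
}
```

Partial blocks deserve care: `dcbi` and `dcbz` act on the entire block, so any data sharing a block with the buffer is affected too. Aligning such buffers to 32 bytes avoids the problem.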
Table 2-18 lists the cache management instructions in the PPC440x5.
Table 2-18. Cache Management Instructions
Data Cache: dcba, dcbf, dcbi, dcbst, dcbt, dcbtst, dcbz
Instruction Cache: icbi, icbt
2.4.4.2 TLB Management Instructions
The TLB management instructions read and write entries of the TLB array, and search the TLB array for an
entry which will translate a given virtual address. There is also an instruction for synchronizing TLB updates
with other processors, but since the PPC440x5 core is intended for use in uni-processor environments, this
instruction performs no operation on the PPC440x5.
Table 2-19 lists the TLB management instructions in the PPC440x5. See Integer Arithmetic Instructions on
page 58 for an explanation of the “[.]” syntax.
Table 2-19. TLB Management Instructions
tlbre
tlbsx[.]
tlbsync
tlbwe
2.4.4.3 Storage Synchronization Instructions
The storage synchronization instructions allow software to enforce ordering amongst the storage accesses
caused by load and store instructions, which by default are “weakly-ordered” by the processor. “Weakly-ordered” means that the processor is architecturally permitted to perform loads and stores generally out-of-order with respect to their sequence within the instruction stream, with some exceptions. However, if a
storage synchronization instruction is executed, then all storage accesses prompted by instructions
preceding the synchronizing instruction must be performed before any storage accesses prompted by
instructions which come after the synchronizing instruction. See Synchronization on page 82 for more information on storage synchronization.
Table 2-20 lists the storage synchronization instructions in the PPC440x5.
Table 2-20. Storage Synchronization Instructions
msync
mbar
2.4.5 Allocated Instructions
These instructions are not part of the PowerPC Book-E architecture, but they are included as part of the
PPC440x5 core. Architecturally, they are considered allocated instructions, as they use opcodes which are
within the allocated class of instructions, which the PowerPC Book-E architecture identifies as being available
for implementation-dependent and/or application-specific purposes. However, all of the allocated instructions
which are implemented within the PPC440x5 core are “standard” for IBM’s family of PowerPC embedded
controllers, and are not unique to the PPC440x5.
The allocated instructions implemented within the PPC440x5 are divided into four sub-categories, and are
shown in Table 2-21. See Integer Arithmetic Instructions on page 58 for an explanation of the “[.]” and “[o]”
syntax.
The four branch instructions provided by the PPC440x5 are summarized in section 2.4.2 on page 60. In addition,
each of these instructions is described in detail in Instruction Set on page 249. The following sections provide
additional information on branch addressing, instruction fields, prediction, and registers.
2.5.1 Branch Addressing
The branch instruction (b[l][a]) specifies the displacement of the branch target address as a 26-bit value (the
24-bit LI field right-extended with 0b00). This displacement is regarded as a signed 26-bit number covering
an address range of ±32MB. Similarly, the branch conditional instruction (bc[l][a]) specifies the displacement
as a 16-bit value (the 14-bit BD field right-extended with 0b00). This displacement covers an address range of
±32KB.
For the relative form of the branch and branch conditional instructions (b[l] and bc[l], with instruction field
AA = 0), the target address is the address of the branch instruction itself (the Current Instruction Address, or
CIA) plus the signed displacement. This address calculation is defined to “wrap around” from the maximum
effective address (0xFFFFFFFF) to 0x0000 0000, and vice-versa.
For the absolute form of the branch and branch conditional instructions (ba[l] and bca[l], with instruction field
AA = 1), the target address is the sign-extended displacement. This means that with absolute forms of the
branch and branch conditional instructions, the branch target can be within the first or last 32MB or 32KB of
the address space, respectively.
The other two branch instructions, bclr (branch conditional to LR) and bcctr (branch conditional to CTR), use
neither absolute nor relative addressing. Instead, they use indirect addressing, in which the target of the
branch is specified indirectly as the contents of the LR or CTR.
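The displacement arithmetic for the relative and absolute forms can be sketched in C. This is an illustrative decoder, not an IBM routine. It assumes the standard PowerPC encodings (primary opcode 18 for b[l][a] and 16 for bc[l][a], AA in instruction bit 30) and IBM bit numbering, where bit 0 is the most-significant bit. `branch_target` and `sext` are hypothetical names. Unsigned 32-bit addition reproduces the wrap-around behavior described above.

```c
#include <stdint.h>

/* Sign-extend the low `bits` bits of v; the result stays in uint32_t so
   that later address arithmetic wraps modulo 2^32. */
static uint32_t sext(uint32_t v, unsigned bits) {
    uint32_t m = 1u << (bits - 1u);
    return (v ^ m) - m;
}

/* Compute the target of b[l][a] or bc[l][a] located at address cia.
   Returns 0 for any other opcode (bclr and bcctr branch indirectly). */
int branch_target(uint32_t insn, uint32_t cia, uint32_t *target) {
    uint32_t opcd = insn >> 26;            /* primary opcode, bits 0:5 */
    uint32_t aa   = (insn >> 1) & 1u;      /* AA, bit 30 */
    uint32_t disp;

    if (opcd == 18)
        disp = sext(insn & 0x03FFFFFCu, 26);   /* LI || 0b00 */
    else if (opcd == 16)
        disp = sext(insn & 0x0000FFFCu, 16);   /* BD || 0b00 */
    else
        return 0;

    /* AA = 1: absolute; AA = 0: CIA + displacement (wraps around). */
    *target = aa ? disp : cia + disp;
    return 1;
}
```

For example, the word 0x4BFFFFF0 (b with a displacement of -16) placed at address 0 yields 0xFFFFFFF0, and the absolute form 0x4BFFFF02 (ba) yields 0xFFFFFF00, a target in the last 32MB of the address space.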
2.5.2 Branch Instruction BI Field
Conditional branch instructions can optionally test one bit of the CR, as indicated by instruction field BO[0]
(see BO field description below). The value of instruction field BI specifies the CR bit to be tested (0-31). The
BI field is ignored if BO[0] = 1. The branch (b[l][a]) instruction is by definition unconditional, and hence does
not have a BI instruction field. Instead, the position of this field is part of the LI displacement field.
2.5.3 Branch Instruction BO Field
The BO field specifies the condition under which a conditional branch is taken, and whether the branch decrements the CTR. The branch (b[l][a]) instruction is by definition unconditional, and hence does not have a BO
instruction field. Instead, the position of this field is part of the LI displacement field.
Conditional branch instructions can optionally test one bit in the CR. This option is selected when BO[0] = 0; if
BO[0] = 1, the CR does not participate in the branch condition test. If the CR condition option is selected, the
condition is satisfied (branch can occur) if the CR bit selected by the BI instruction field matches BO[1].
Conditional branch instructions can also optionally decrement the CTR by one, and test whether the decremented value is 0. This option is selected when BO[2] = 0; if BO[2] = 1, the CTR is not decremented and
does not participate in the branch condition test. If the CTR decrement option is selected, BO[3] specifies the
condition that must be satisfied to allow the branch to be taken. If BO[3] = 0, CTR ≠ 0 is required for the
branch to occur. If BO[3] = 1, CTR = 0 is required for the branch to occur.
Table 2-22 summarizes the usage of the bits of the BO field. BO[4] is further discussed in Branch Prediction
on page 65.
Table 2-22. BO Field Definition
BO Bit   Description
BO[0]    CR Test Control
           0  Test CR bit specified by BI field for value specified by BO[1]
           1  Do not test CR
BO[1]    CR Test Value
           0  If BO[0] = 0, test for CR[BI] = 0.
           1  If BO[0] = 0, test for CR[BI] = 1.
BO[2]    CTR Decrement and Test Control
           0  Decrement CTR by one and test whether the decremented CTR satisfies the condition specified by BO[3].
           1  Do not decrement CTR, do not test CTR.
BO[3]    CTR Test Value
           0  If BO[2] = 0, test for decremented CTR ≠ 0.
           1  If BO[2] = 0, test for decremented CTR = 0.
BO[4]    Branch Prediction Reversal
           0  Apply standard branch prediction.
           1  Reverse the standard branch prediction.
Table 2-23 lists specific BO field contents, and the resulting actions; z represents a mandatory value of zero,
and y is a branch prediction option discussed in Branch Prediction on page 65.
Table 2-23. BO Field Examples
BO Value   Description
0000y      Decrement the CTR, then branch if the decremented CTR ≠ 0 and CR[BI] = 0.
0001y      Decrement the CTR, then branch if the decremented CTR = 0 and CR[BI] = 0.
001zy      Branch if CR[BI] = 0.
0100y      Decrement the CTR, then branch if the decremented CTR ≠ 0 and CR[BI] = 1.
0101y      Decrement the CTR, then branch if the decremented CTR = 0 and CR[BI] = 1.
011zy      Branch if CR[BI] = 1.
1z00y      Decrement the CTR, then branch if the decremented CTR ≠ 0.
1z01y      Decrement the CTR, then branch if the decremented CTR = 0.
1z1zz      Branch always.
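The BO and BI semantics of Tables 2-22 and 2-23 can be modeled by a small C function. This is a hypothetical model (`bc_condition` and `BO_BIT` are not IBM-provided names). It uses IBM bit numbering, in which BO[0] and CR bit 0 are the most-significant bits, and it decrements the CTR whenever BO[2] = 0, before testing it.

```c
#include <stdbool.h>
#include <stdint.h>

/* BO[n] with IBM numbering: BO[0] is the most-significant of the 5 bits. */
#define BO_BIT(bo, n) (((bo) >> (4u - (n))) & 1u)

/* Model of the bc/bclr/bcctr branch condition.
   bo: 5-bit BO field; bi: CR bit number 0-31; cr: 32-bit CR;
   ctr: decremented in place when BO[2] = 0. Returns true if taken. */
bool bc_condition(uint32_t bo, uint32_t bi, uint32_t cr, uint32_t *ctr) {
    bool ctr_ok = true;
    bool cond_ok = true;

    if (BO_BIT(bo, 2) == 0) {                   /* decrement, then test CTR */
        *ctr -= 1u;
        ctr_ok = BO_BIT(bo, 3) ? (*ctr == 0) : (*ctr != 0);
    }
    if (BO_BIT(bo, 0) == 0) {                   /* test CR[BI] against BO[1] */
        uint32_t cr_bit = (cr >> (31u - bi)) & 1u;   /* CR bit 0 is the MSB */
        cond_ok = (cr_bit == BO_BIT(bo, 1));
    }
    return ctr_ok && cond_ok;
}
```

As a usage example, BO = 0b10000 (row 1z00y with z = 0 and y = 0) is the usual loop-closing form. With CTR = 2 the branch is taken and CTR becomes 1. On the next pass CTR becomes 0 and the branch falls through. BO = 0b01100 (row 011zy) with BI = 2 tests the EQ bit of CR0 and leaves the CTR untouched.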
2.5.4 Branch Prediction
Conditional branches might be taken or not taken; if taken, instruction fetching is re-directed to the target
address. If the branch is not taken, instruction fetching simply falls through to the next sequential instruction.
The PPC440x5 core attempts to predict whether or not a branch is taken before all information necessary to
determine the branch direction is available. This action is called branch prediction. The core can then
prefetch instructions down the predicted path. If the prediction is correct, performance is improved because
the branch target instruction is available immediately, instead of having to wait until the branch conditions are
resolved. If the prediction is incorrect, then the prefetched instructions (which were fetched from addresses
down the “wrong” path of the branch) must be discarded, and new instructions fetched from the correct path.
The PPC440x5 core combines the static prediction mechanism defined by PowerPC Book-E, together with a
dynamic branch prediction mechanism, in order to provide correct branch prediction as often as possible. The
dynamic branch prediction mechanism is an implementation optimization, and is not part of the architecture,
nor is it visible to the programming model. Appendix B, “PPC440x5 Core Compiler Optimizations,” provides
additional information on the dynamic branch prediction mechanism.
The static branch prediction mechanism enables software to designate the “preferred” branch prediction via
bits in the instruction encoding. The “default” static branch prediction for conditional branches is as follows:
Predict that the branch is to be taken if ((BO[0] ∧ BO[2]) ∨ s) = 1
where s is bit 16 of the instruction (the sign bit of the displacement for all bc forms, and zero for all bclr and
bcctr forms). In other words, conditional branches are predicted taken if they are “unconditional” (i.e., they do
not test the CR nor the CTR decrement, and are always taken), or if their branch displacement is “negative”
(i.e., the branch is branching “backwards” from the current instruction address). The standard prediction for
this case derives from considering the relative form of bc, often used at the end of loops to control the
number of times that a loop is executed. The branch is taken each time the loop is executed except the last,
so it is best if the branch is predicted taken. The branch target is the beginning of the loop, so the branch
displacement is negative and s = 1. Because this situation is the most common, a branch is predicted taken if s = 1.
If the branch displacement is positive (s = 0) and the branch is not “unconditional,” the branch is predicted not taken. Also, if the branch instruction is any form of bclr or bcctr except the “unconditional” form, then s = 0, and the branch is predicted not taken.
There is a peculiar consequence of this prediction algorithm for the absolute forms of bc (bca and bcla). As
described in Branch Addressing on page 64, if s = 1, the branch target is in high memory. If s = 0, the branch
target is in low memory. Because these are absolute-addressing forms, there is no reason to treat high and
low memory differently. Nevertheless, for the high memory case the standard prediction is taken, and for the
low memory case the standard prediction is not taken.
Another bit in the BO field allows software further control over branch prediction. Specifically, BO[4] is the
prediction reversal bit. If BO[4] = 0, the default prediction is applied. If BO[4] = 1, the reverse of the default
prediction is applied. For the cases in Table 2-23 where BO[4] = y, software can reverse the default prediction by setting y to 1. This should only be done when the default prediction is likely to be wrong. Note that for
the “branch always” condition, reversal of the default prediction is not allowed, as BO[4] is designated as z for
this case, meaning the bit must be set to 0 or the instruction form is invalid.
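The default static prediction and the BO[4] reversal can be expressed in a few lines of C. This is an illustrative model, not an IBM routine. `predict_taken` is a hypothetical name, and the function assumes that its argument is already known to be a bc, bclr, or bcctr instruction word, with BO in instruction bits 6:10 per the standard PowerPC encoding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Default static prediction for a bc, bclr or bcctr instruction word,
   using IBM bit numbering (bit 0 = MSB). */
bool predict_taken(uint32_t insn) {
    uint32_t bo  = (insn >> 21) & 0x1Fu;   /* BO: instruction bits 6:10 */
    uint32_t s   = (insn >> 15) & 1u;      /* s: instruction bit 16 */
    uint32_t bo0 = (bo >> 4) & 1u;
    uint32_t bo2 = (bo >> 2) & 1u;
    uint32_t bo4 = bo & 1u;                /* prediction reversal bit */

    uint32_t by_default = (bo0 & bo2) | s; /* ((BO[0] AND BO[2]) OR s) */
    return (by_default ^ bo4) != 0;        /* BO[4] = 1 reverses it */
}
```

Using the common simplified mnemonics: a bdnz with a negative displacement (0x4200FFF8) is predicted taken because s = 1. A forward beq (0x41820008) is predicted not taken unless BO[4] is set (0x41A20008). An unconditional blr (0x4E800020) is predicted taken because BO[0] = BO[2] = 1.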
2.5.5 Branch Control Registers
There are three registers in the PPC440x5 which are associated with branch processing, and they are
described in the following sections.
2.5.5.1 Link Register (LR)
The LR is written from a GPR using mtspr, and can be read into a GPR using mfspr. The LR can also be
updated by the “link update” form of branch instructions (instruction field LK = 1). Such branch instructions
load the LR with the address of the instruction following the branch instruction (4 + address of the branch
instruction). Thus, the LR contents can be used as a return address for a subroutine that was entered using a
link update form of branch. The bclr instruction uses the LR in this fashion, enabling indirect branching to any
address.
When being used as a return address by a bclr instruction, bits 30:31 of the LR are ignored, since all instruction addresses are on word boundaries.
Access to the LR is non-privileged.
Figure 2-3. Link Register (LR)
0:31   Link Register contents   Target address of bclr instruction
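The two LR rules above fit in a few lines of C. The functions `lr_after_link` and `bclr_target` are hypothetical helpers. The first applies the "4 + address of the branch instruction" rule for LK = 1. The second discards LR bits 30:31, the two least-significant bits, as bclr does.

```c
#include <stdint.h>

/* A link-update branch (LK = 1) at address cia sets LR to cia + 4. */
uint32_t lr_after_link(uint32_t cia) {
    return cia + 4u;
}

/* bclr ignores LR bits 30:31, so the return target is word-aligned. */
uint32_t bclr_target(uint32_t lr) {
    return lr & ~3u;
}
```

A subroutine entered by bl from 0x2000 therefore returns to 0x2004, and any nonzero values left in the two low-order bits of the LR have no effect on that target.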
2.5.5.2 Count Register (CTR)
The CTR is written from a GPR using mtspr, and can be read into a GPR using mfspr. The CTR contents
can be used as a loop count that gets decremented and tested by conditional branch instructions that specify
count decrement as one of their branch conditions (instruction field BO[2] = 0). Alternatively, the CTR
contents can specify a target address for the bcctr instruction, enabling indirect branching to any address.
Access to the CTR is non-privileged.
Figure 2-4. Count Register (CTR)
0:31   Count   Used as count for branch conditional with decrement instructions, or as target address for bcctr
               instructions
2.5.5.3 Condition Register (CR)
The CR is used to record certain information (“conditions”) related to the results of the various instructions
which are enabled to update the CR. A bit in the CR may also be selected to be tested as part of the condition
of a conditional branch instruction.
The CR is organized into eight 4-bit fields (CR0–CR7), as shown in Figure 2-5. Table 2-24 lists the instructions which update the CR.
Access to the CR is non-privileged.
Figure 2-5. Condition Register (CR)
0:3     CR0   Condition Register Field 0
4:7     CR1   Condition Register Field 1
8:11    CR2   Condition Register Field 2
12:15   CR3   Condition Register Field 3
16:19   CR4   Condition Register Field 4
20:23   CR5   Condition Register Field 5
24:27   CR6   Condition Register Field 6
28:31   CR7   Condition Register Field 7
Instruction Set on page 249 provides detailed information on how each of the instructions listed in Table 2-24 updates the CR.
To summarize, the CR can be accessed in any of the following ways:
• mfcr reads the CR into a GPR. Note that this instruction does not update the CR and is therefore not
listed in Table 2-24.
• Conditional branch instructions can designate a CR bit to be used as a branch condition. Note that these
instructions do not update the CR and are therefore not listed in Table 2-24.
• mtcrf sets specified CR fields by writing to the CR from a GPR, under control of a mask field specified as
part of the instruction.
• mcrf updates a specified CR field by copying another specified CR field into it.
• mcrxr copies certain bits of the XER into a specified CR field, and clears the corresponding XER bits.
• Integer compare instructions update a specified CR field.
• CR-logical instructions update a specified CR bit with the result of any one of eight logical operations on
a specified pair of CR bits.
• Certain forms of various integer instructions (the “.” forms) implicitly update CR[CR0], as do certain forms
of the auxiliary processor instructions implemented within the PPC440x5 core.
• Auxiliary processor instructions may in general update a specified CR field in an implementation-specified manner. In addition, if an auxiliary processor implements the floating-point operations specified by
PowerPC Book-E, then those instructions update the CR in the manner defined by the architecture. See
Book E: PowerPC Architecture Enhanced for Embedded Applications for details.
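The field-oriented CR moves in the list above (mtcrf and mcrf) reduce to masking 4-bit fields, with CR0 in the most-significant nibble. The sketch below is a hypothetical model, and `model_mtcrf`, `model_mcrf` and `cr_field_mask` are illustrative names. It assumes the architected FXM encoding, in which FXM is an 8-bit mask and FXM bit i, counting from its most-significant bit, selects CR field i.

```c
#include <stdint.h>

/* Mask covering CR field n (0..7); CR0 occupies bits 0:3, the MSBs. */
uint32_t cr_field_mask(unsigned n) {
    return 0xF0000000u >> (4u * n);
}

/* mtcrf model: each CR field selected by FXM is replaced by the same
   bits of rs; unselected fields keep their old values. */
uint32_t model_mtcrf(uint32_t cr, uint32_t fxm, uint32_t rs) {
    uint32_t mask = 0;
    for (unsigned i = 0; i < 8; i++)
        if ((fxm >> (7u - i)) & 1u)
            mask |= cr_field_mask(i);
    return (cr & ~mask) | (rs & mask);
}

/* mcrf model: copy CR field bfa into CR field bf. */
uint32_t model_mcrf(uint32_t cr, unsigned bf, unsigned bfa) {
    uint32_t value = (cr >> (28u - 4u * bfa)) & 0xFu;
    return (cr & ~cr_field_mask(bf)) | (value << (28u - 4u * bf));
}
```

With FXM = 0x01 only CR7 is written, so model_mtcrf(0x12345678, 0x01, 0xAAAAAAAA) changes just the last nibble, giving 0x1234567A.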
CR[CR0] Implicit Update By Integer Instructions
Most of the CR-updating instructions listed in Table 2-24 implicitly update the CR0 field. These are the
various “dot-form” instructions, indicated by a “.” in the instruction mnemonic. Most of these instructions
update CR[CR0] according to an arithmetic comparison of 0 with the 32-bit result which the instruction writes
to the GPR file. That is, after performing the operation defined for the instruction, the 32-bit result which is
written to the GPR file is compared to 0 using a signed comparison, independent of whether the actual operation being performed by the instruction is considered “signed” or not. For example, logical instructions such
as and., or., and nor. update CR[CR0] according to this signed comparison to 0, even though the result of
such a logical operation is not typically interpreted as a signed value. For each of these dot-form instructions,
the individual bits in CR[CR0] are updated as follows:
CR[CR0]0 — LT   Less than 0; set if the most-significant bit of the 32-bit result is 1.
CR[CR0]1 — GT   Greater than 0; set if the 32-bit result is non-zero and the most-significant bit of the result is 0.
CR[CR0]2 — EQ   Equal to 0; set if the 32-bit result is 0.
CR[CR0]3 — SO   Summary overflow; a copy of XER[SO] at the completion of the instruction (including any XER[SO]
                update being performed by the instruction itself).
Note that if an arithmetic overflow occurs, the “sign” of an instruction result indicated in CR[CR0] might not
represent the “true” (infinitely precise) algebraic result of the instruction that set CR0. For example, if an add.
instruction adds two large positive numbers and the magnitude of the result cannot be represented as a twos-complement number in a 32-bit register, an overflow occurs and CR[CR0]0 is set, even though the infinitely
precise result of the add is positive.
Similarly, adding the largest 32-bit twos-complement negative number (0x80000000) to itself results in an
arithmetic overflow and 0x0000 0000 is recorded in the target register. CR[CR0]2 is set, indicating a result of
0, but the infinitely precise result is negative.
CR[CR0]3 is a copy of XER[SO] at the completion of the instruction, whether or not the instruction which is
updating CR[CR0] is also updating XER[SO]. Note that if an instruction causes an arithmetic overflow but is
not of the form which actually updates XER[SO], then the value placed in CR[CR0]3 does not reflect the arithmetic overflow which occurred on the instruction (it is merely a copy of the value of XER[SO] which was
already in the XER before the execution of the instruction updating CR[CR0]).
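The CR0 rules for ordinary dot-form instructions amount to a signed test of the 32-bit result plus a copy of XER[SO]. The C sketch below is a hypothetical helper (`cr0_for_result`). It returns the 4-bit field image with LT as its most-significant bit. It does not cover stwcx., tlbsx., or dlmzb., which update CR0 differently.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values within a 4-bit CR field; bit 0 of the field (LT) is its MSB. */
enum { CR_LT = 0x8, CR_GT = 0x4, CR_EQ = 0x2, CR_SO = 0x1 };

/* CR0 image written by a dot-form instruction whose 32-bit GPR result is
   `result`; xer_so is XER[SO] after the instruction completes. */
uint32_t cr0_for_result(uint32_t result, bool xer_so) {
    uint32_t field;
    if (result & 0x80000000u)   /* MSB = 1: "less than 0" */
        field = CR_LT;
    else if (result != 0)       /* non-zero with MSB = 0 */
        field = CR_GT;
    else
        field = CR_EQ;
    return field | (xer_so ? CR_SO : 0u);
}
```

The two overflow cases discussed above fall out directly: 0x7FFFFFFF + 1 wraps to 0x80000000 and reports LT, and 0x80000000 + 0x80000000 wraps to 0 and reports EQ, even though neither matches the sign of the infinitely precise sum.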
There are a few dot-form instructions which do not update CR[CR0] in the fashion described above. These
instructions are: stwcx., tlbsx., and dlmzb. See the instruction descriptions in Instruction Set on page 249 for
details on how these instructions update CR[CR0].
CR Update By Integer Compare Instructions
Integer compare instructions update a specified CR field with the result of a comparison of two 32-bit
numbers, the first of which is from a GPR and the second of which is either an immediate value or from
another GPR. There are two types of integer compare instructions, arithmetic and logical, and they are distinguished by the interpretation given to the 32-bit numbers being compared. For arithmetic compares, the
numbers are considered to be signed, whereas for logical compares, the numbers are considered to be
unsigned. As an example, consider the comparison of 0 with 0xFFFFFFFF. In an arithmetic compare, 0 is
larger; in a logical compare, 0xFFFFFFFF is larger.
A compare instruction can direct its result to any CR field. The BF field (bits 6:8) of the instruction specifies
the CR field to be updated. After a compare, the specified CR field is interpreted as follows:
CR[(BF)]0 — LT   The first operand is less than the second operand.
CR[(BF)]1 — GT   The first operand is greater than the second operand.
CR[(BF)]2 — EQ   The first operand is equal to the second operand.
CR[(BF)]3 — SO   Summary overflow; a copy of XER[SO].
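The difference between arithmetic and logical compares, and the placement of the result in field BF, can be sketched in C. `compare_field` and `cr_set_field` are hypothetical helpers. Flipping the sign bit of both operands maps signed order onto unsigned order, which avoids implementation-defined conversions of large unsigned values to `int32_t`.

```c
#include <stdbool.h>
#include <stdint.h>

enum { CR_LT = 0x8, CR_GT = 0x4, CR_EQ = 0x2, CR_SO = 0x1 };

/* 4-bit CR field image for comparing a (first operand) with b.
   is_signed selects an arithmetic compare; otherwise the compare is logical. */
uint32_t compare_field(uint32_t a, uint32_t b, bool is_signed, bool xer_so) {
    if (is_signed) {            /* flip sign bits: signed order -> unsigned order */
        a ^= 0x80000000u;
        b ^= 0x80000000u;
    }
    uint32_t f = (a < b) ? CR_LT : (a > b) ? CR_GT : CR_EQ;
    return f | (xer_so ? CR_SO : 0u);
}

/* Write a 4-bit field image into CR field bf (0..7; CR0 is the MSB nibble). */
uint32_t cr_set_field(uint32_t cr, unsigned bf, uint32_t field) {
    uint32_t shift = 28u - 4u * bf;
    return (cr & ~(0xFu << shift)) | ((field & 0xFu) << shift);
}
```

This reproduces the example in the text: comparing 0 with 0xFFFFFFFF reports GT for an arithmetic compare (0 > -1) and LT for a logical compare.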
2.6 Integer Processing
Integer processing includes loading and storing data between memory and GPRs, as well as performing
various operations on the values in GPRs and other registers (the categories of integer instructions are
summarized in Table 2-4 on page 57). The sections which follow describe the registers which are used for
integer processing, and how they are updated by various instructions. In addition, Condition Register (CR) on
page 67 provides more information on the CR updates caused by integer instructions. Finally, Instruction Set
on page 249 also provides details on the various register updates performed by integer instructions.
2.6.1 General Purpose Registers (GPRs)
The PPC440x5 contains 32 GPRs. The contents of these registers can be transferred to and from memory
using integer storage access instructions. Operations are performed on GPRs by most other instructions.
Access to the GPRs is non-privileged.
Figure 2-6. General Purpose Registers (R0-R31)
0:31   General Purpose Register data
2.6.2 Integer Exception Register (XER)
The XER records overflow and carry indications from integer arithmetic and shift instructions. It also provides
a byte count for string indexed integer storage access instructions (lswx and stswx). Note that the term
exception in the name of this register does not refer to exceptions as they relate to interrupts, but rather to the
arithmetic exceptions of carry and overflow.
Figure 2-7 illustrates the fields of the XER, while Tables 2-25 and 2-26 list the instructions which update
XER[SO,OV] and XER[CA], respectively. The sections which follow the figure and tables describe
the fields of the XER in more detail.
Access to the XER is non-privileged.
Figure 2-7. Integer Exception Register (XER)
0       SO    Summary Overflow
                0 No overflow has occurred.
                1 Overflow has occurred.
              Can be set by mtspr or by integer or auxiliary processor instructions with the [o] option; can be
              reset by mtspr or by mcrxr.
1       OV    Overflow
                0 No overflow has occurred.
                1 Overflow has occurred.
              Can be set by mtspr or by integer or allocated instructions with the [o] option; can be reset by
              mtspr, by mcrxr, or by integer or allocated instructions with the [o] option.
2       CA    Carry
                0 Carry has not occurred.
                1 Carry has occurred.
              Can be set by mtspr or by certain integer arithmetic and shift instructions; can be reset by
              mtspr, by mcrxr, or by certain integer arithmetic and shift instructions.
3:24          Reserved
25:31   TBC   Transfer Byte Count
              Used as a byte count by lswx and stswx; written by dlmzb[.] and by mtspr.
2.6.2.1 Summary Overflow (SO) Field
This field is set to 1 when an instruction is executed that causes XER[OV] to be set to 1, except for the case
of mtspr(XER), which writes XER[SO,OV] with the values in (RS)0:1, respectively. Once set, XER[SO] is not
reset until either an mtspr(XER) is executed with data that explicitly writes 0 to XER[SO], or an mcrxr
instruction is executed. The mcrxr instruction sets XER[SO] (as well as XER[OV,CA]) to 0 after copying all
three fields into CR[CR0]0:2 (and setting CR[CR0]3 to 0).
Given this behavior, XER[SO] does not necessarily indicate that an overflow occurred on the most recent
integer arithmetic operation, but rather that one occurred at some time subsequent to the last clearing of
XER[SO] by mtspr(XER) or mcrxr.
XER[SO] is read (along with the rest of the XER) into a GPR by mfspr(XER). In addition, various integer
instructions copy XER[SO] into CR[CR0]3 (see Condition Register (CR) on page 67).
2.6.2.2 Overflow (OV) Field
This field is updated by certain integer arithmetic instructions to indicate whether the infinitely precise result of
the operation can be represented in 32 bits. For those integer arithmetic instructions that update XER[OV]
and produce signed results, XER[OV] = 1 if the result is greater than $2^{31} - 1$ or less than $-2^{31}$; otherwise,
XER[OV] = 0. For those integer arithmetic instructions that update XER[OV] and produce unsigned results
(certain integer divide instructions and multiply-accumulate auxiliary processor instructions), XER[OV] = 1 if
the result is greater than $2^{32} - 1$; otherwise, XER[OV] = 0. See the instruction descriptions in Instruction Set on
page 249 for more details on the conditions under which the integer divide instructions set XER[OV] to 1.
The mtspr(XER) and mcrxr instructions also update XER[OV]. Specifically, mcrxr sets XER[OV] (and
XER[SO,CA]) to 0 after copying all three fields into CR[CR0]0:2 (and setting CR[CR0]3 to 0), while
mtspr(XER) writes XER[OV] with the value in (RS)1.
XER[OV] is read (along with the rest of the XER) into a GPR by mfspr(XER).
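The difference between the per-instruction OV bit and the sticky SO bit is easiest to see in a small model. The C sketch below is illustrative only. `model_addo` and `model_mcrxr` are hypothetical names, and the XER masks follow the bit positions in Figure 2-7 (bit 0 is the most-significant bit). `model_mcrxr` returns the 4-bit CR field image that mcrxr produces, rather than writing a particular CR field.

```c
#include <stdint.h>

#define XER_SO 0x80000000u   /* XER bit 0 */
#define XER_OV 0x40000000u   /* XER bit 1 */
#define XER_CA 0x20000000u   /* XER bit 2 */

/* addo model: OV is rewritten on every execution; SO is only ever set. */
uint32_t model_addo(uint32_t a, uint32_t b, uint32_t *xer) {
    uint32_t r = a + b;
    /* Signed overflow: both operands share a sign that the result lacks. */
    uint32_t overflow = ((a ^ r) & (b ^ r)) >> 31;
    if (overflow)
        *xer |= XER_OV | XER_SO;
    else
        *xer &= ~XER_OV;
    return r;
}

/* mcrxr model: returns SO, OV, CA as field bits 0:2 (field bit 3 = 0),
   then clears XER[SO,OV,CA]; other XER bits such as TBC are preserved. */
uint32_t model_mcrxr(uint32_t *xer) {
    uint32_t field = (*xer >> 28) & 0xEu;
    *xer &= ~(XER_SO | XER_OV | XER_CA);
    return field;
}
```

After an overflowing addo followed by a non-overflowing one, OV reads 0 but SO still reads 1. Only mcrxr (or an explicit mtspr(XER)) clears it.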
2.6.2.3 Carry (CA) Field
This field is updated by certain integer arithmetic instructions (the “carrying” and “extended” versions of add
and subtract) to indicate whether or not there is a carry-out of the most-significant bit of the 32-bit result.
XER[CA] = 1 indicates a carry. The integer shift right algebraic instructions update XER[CA] to indicate
whether or not any 1-bits were shifted out of the least significant bit of the result, if the source operand was
negative (see the instruction descriptions in Instruction Set on page 249 for more details).
The mtspr(XER) and mcrxr instructions also update XER[CA]. Specifically, mcrxr sets XER[CA] (as well as
XER[SO,OV]) to 0 after copying all three fields into CR[CR0]0:2 (and setting CR[CR0]3 to 0), while
mtspr(XER) writes XER[CA] with the value in (RS)2.
XER[CA] is read (along with the rest of the XER) into a GPR by mfspr(XER). In addition, the “extended”
versions of the add and subtract integer arithmetic instructions use XER[CA] as a source operand for their
arithmetic operations.
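The three uses of XER[CA] described above can be sketched in C: the carry-out of a carrying add, its consumption by an extended add, and the rule for shift right algebraic. The names `model_addc`, `model_adde` and `model_srawi` are hypothetical and refer to the addc, adde and srawi instructions. The shift model assumes a shift amount of 0 to 31.

```c
#include <stdbool.h>
#include <stdint.h>

#define XER_CA 0x20000000u   /* XER bit 2 */

static uint32_t set_ca(uint32_t xer, bool ca) {
    return ca ? (xer | XER_CA) : (xer & ~XER_CA);
}

/* addc model: CA = carry out of the most-significant bit of the sum. */
uint32_t model_addc(uint32_t a, uint32_t b, uint32_t *xer) {
    uint32_t r = a + b;
    *xer = set_ca(*xer, r < a);          /* unsigned wrap means carry-out */
    return r;
}

/* adde model: a + b + XER[CA]; CA then receives the new carry-out. */
uint32_t model_adde(uint32_t a, uint32_t b, uint32_t *xer) {
    uint64_t wide = (uint64_t)a + b + ((*xer & XER_CA) ? 1u : 0u);
    *xer = set_ca(*xer, (wide >> 32) != 0);
    return (uint32_t)wide;
}

/* srawi model (sh in 0..31): CA = 1 only if the source is negative and
   at least one 1-bit is shifted out of the least-significant end. */
uint32_t model_srawi(uint32_t s, unsigned sh, uint32_t *xer) {
    bool negative = (s & 0x80000000u) != 0;
    uint32_t lost = s & ((1u << sh) - 1u);             /* bits shifted out */
    uint32_t r = negative ? ~(~s >> sh) : (s >> sh);   /* arithmetic shift */
    *xer = set_ca(*xer, negative && lost != 0);
    return r;
}
```

Chaining addc on the low words with adde on the high words gives a 64-bit add. For example, 0x00000000FFFFFFFF + 1 yields 0 in the low word with CA = 1, and then 1 in the high word.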
2.6.2.4 Transfer Byte Count (TBC) Field
The TBC field is used by the string indexed integer storage access instructions (lswx and stswx) as a byte
count. The TBC field is updated by the dlmzb[.] instruction with a value indicating the number of bytes up to
and including the zero byte detected by the instruction (see the instruction description for dlmzb in Instruction Set on page 249 for more details). The TBC field is also written by mtspr(XER) with the value in (RS)25:31.
XER[TBC] is read (along with the rest of the XER) into a GPR by mfspr(XER).
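The TBC field is a 7-bit count in XER bits 25:31, the least-significant bits of the register. As an illustration only, the sketch below models the count that dlmzb records and how a value is placed into XER[TBC]. `model_dlmzb_count` and `xer_with_tbc` are hypothetical names. The scanning order (RS first, most-significant byte first) and the result of 8 when no zero byte is found are assumptions to check against the dlmzb description in Instruction Set.

```c
#include <stdint.h>

#define XER_TBC 0x0000007Fu   /* XER bits 25:31 */

/* Byte count up to and including the first zero byte of the 8-byte
   string RS || RB. Returning 8 when there is no zero byte is an
   assumption of this sketch. */
uint32_t model_dlmzb_count(uint32_t rs, uint32_t rb) {
    uint64_t bytes = ((uint64_t)rs << 32) | rb;
    for (unsigned i = 0; i < 8; i++) {
        if (((bytes >> (56u - 8u * i)) & 0xFFu) == 0)
            return i + 1u;
    }
    return 8u;
}

/* Replace XER[TBC] with count, leaving XER bits 0:24 unchanged. */
uint32_t xer_with_tbc(uint32_t xer, uint32_t count) {
    return (xer & ~XER_TBC) | (count & XER_TBC);
}
```

For the string "ab" followed by a zero byte (RS = 0x61620000), the count is 3. That is the number of bytes an lswx/stswx pair would then move, including the terminator.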
2.7 Processor Control
The PPC440x5 core provides several registers for general processor control and status. These include:
• Machine State Register (MSR)
Controls interrupts and other processor functions
• Special Purpose Registers General (SPRGs)
SPRs for general purpose software use
• Processor Version Register (PVR)
Indicates the specific implementation of a processor
• Processor Identification Register (PIR)
Indicates the specific instance of a processor in a multi-processor system
• Core Configuration Register 0 (CCR0)
Controls specific processor functions, such as instruction prefetch
• Reset Configuration (RSTCFG)
Reports the values of certain fields of the TLB as supplied at reset
Except for the MSR, each of these registers is described in more detail in the following sections. The MSR is
described in more detail in Interrupts and Exceptions on page 159.
2.7.1 Special Purpose Registers General (USPRG0, SPRG0–SPRG7)
USPRG0 and SPRG0–SPRG7 are provided for general purpose, system-dependent software use. One
common system usage of these registers is as temporary storage locations. For example, a routine might
save the contents of a GPR to an SPRG, and later restore the GPR from it. This is faster than a save/restore
to a memory location. These registers are written using mtspr and read using mfspr.
Access to USPRG0 is non-privileged for both read and write.
Access to SPRG4–SPRG7 is non-privileged for read but privileged for write, and hence different SPR
numbers are used for reading than for writing.
Access to SPRG0–SPRG3 is privileged for both read and write.
Figure 2-8. Special Purpose Registers General (USPRG0, SPRG0–SPRG7)
0:31   General data   Software value; no hardware usage.
2.7.2 Processor Version Register (PVR)
The PVR is a read-only register typically used to identify a specific processor core and chip implementation.
Software can read the PVR to determine processor core and chip hardware features. The PVR can be read
into a GPR using mfspr.
Refer to PowerPC 440x5 Embedded Processor Data Sheet for the PVR value.
Access to the PVR is privileged.
Figure 2-9. Processor Version Register (PVR)
0:11    OWN   Owner Identifier   Identifies the owner of a core.
12:31   PVN   Processor Version Number
              Implementation-specific value identifying the specific version and use of a processor core within a
              chip.
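Once software has read the PVR into a GPR with mfspr, splitting it into its two fields is simple shifting and masking. The helpers below (`pvr_own`, `pvr_pvn`) are hypothetical, and the value 0x12345678 used in the example is made up. Real PVR values come from the PowerPC 440x5 Embedded Processor Data Sheet.

```c
#include <stdint.h>

/* OWN occupies PVR bits 0:11, the 12 most-significant bits. */
uint32_t pvr_own(uint32_t pvr) {
    return pvr >> 20;
}

/* PVN occupies PVR bits 12:31, the 20 least-significant bits. */
uint32_t pvr_pvn(uint32_t pvr) {
    return pvr & 0x000FFFFFu;
}
```

For the made-up value 0x12345678, OWN is 0x123 and PVN is 0x45678.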
2.7.3 Processor Identification Register (PIR)
The PIR is a read-only register that uniquely identifies a specific instance of a processor core, within a multiprocessor configuration, enabling software to determine exactly which processor it is running on. This capability is important for operating system software within multiprocessor configurations. The PIR can be read
into a GPR using mfspr.
Because the PPC440x5 is a uniprocessor, PIR[PIN] = 0b0000.
Access to the PIR is privileged.
0:27    Reserved
28:31   PIN   Processor Identification Number (PIN)
2.7.4 Core Configuration Register 0 (CCR0)
The CCR0 controls a number of special chip functions, including data cache and auxiliary processor operation, speculative instruction fetching, trace, and the operation of the cache block touch instructions. The
CCR0 is written from a GPR using mtspr, and can be read into a GPR using mfspr. Figure 2-11 on page 77
illustrates the fields of the CCR0, and gives a brief description of their functions. A cross reference after the
bit-field description indicates the section of this document which describes each field in more detail.
Must be set to 1 to guarantee full recoverability
from MMU and data cache parity errors.
for data cache
2:3     Reserved
4       CRPE    Cache Read Parity Enable
                  0 Disable parity information reads
                  1 Enable parity information reads
                When enabled, execution of icread, dcread, or tlbre loads parity information into the ICDBTRH,
                DCDBTRL, or target GPR, respectively.
5:9     Reserved
10      DSTG    Disable Store Gathering
                  0 Enabled; stores to contiguous addresses may be gathered into a single transfer
                  1 Disabled; all stores to memory will be performed independently
                See Store Gathering on page 119.
11      DAPUIB  Disable APU Instruction Broadcast
                  0 Enabled.
                  1 Disabled; instructions not broadcast to APU for decoding
                This mechanism is provided as a means of reducing power consumption when an auxiliary processor
                is not attached and/or is not being used.
                See Initialization on page 85.
12:15   Reserved
16      DTB     Disable Trace Broadcast
                  0 Enabled.
                  1 Disabled; no trace information is broadcast.
                This mechanism is provided as a means of reducing power consumption when instruction tracing is
                not needed.
                See Initialization on page 85.
17      GICBT   Guaranteed Instruction Cache Block Touch
                  0 icbt may be abandoned without having filled cache line if instruction pipeline stalls.
                  1 icbt is guaranteed to fill cache line even if instruction pipeline stalls.
                See icbt Operation on page 111.
18      GDCBT   Guaranteed Data Cache Block Touch
                  0 dcbt/dcbtst may be abandoned without having filled cache line if load/store pipeline stalls.
                  1 dcbt/dcbtst are guaranteed to fill cache line even if load/store pipeline stalls.
                See Data Cache Control and Debug on page 125.
19:22   Reserved
23      FLSTA   Force Load/Store Alignment
                  0 No Alignment exception on integer storage access instructions, regardless of alignment
                  1 An alignment exception occurs on integer storage access instructions if data address is not
                    on an operand boundary.
                See Load and Store Alignment on page 117.
24:27   Reserved
28:29   ICSLC   Instruction Cache Speculative Line Count
                Number of additional lines (0–3) to fill on instruction fetch miss.
                See Speculative Prefetch Mechanism on page 105.
30:31   ICSLT   Instruction Cache Speculative Line Threshold
                Number of doublewords that must have already been filled in order that the current speculative
                line fill is not abandoned on a redirection of the instruction stream.
                See Speculative Prefetch Mechanism on page 105.
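Software typically updates CCR0 with a read-modify-write sequence: mfspr, then change one field, then mtspr. The field arithmetic must account for IBM bit numbering, in which bit 0 is the most-significant bit. The helpers below (`ibm_mask`, `ibm_field`, `ibm_insert`) are hypothetical and cover only that arithmetic. The mfspr/mtspr steps themselves need inline assembly and are not shown.

```c
#include <stdint.h>

/* Mask for a field occupying IBM-numbered bits first..last (bit 0 = MSB). */
static uint32_t ibm_mask(unsigned first, unsigned last) {
    unsigned width = last - first + 1u;
    uint32_t ones = (width >= 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return ones << (31u - last);
}

/* Read the field at bits first..last of reg. */
uint32_t ibm_field(uint32_t reg, unsigned first, unsigned last) {
    return (reg & ibm_mask(first, last)) >> (31u - last);
}

/* Return reg with the field at bits first..last replaced by v. */
uint32_t ibm_insert(uint32_t reg, unsigned first, unsigned last, uint32_t v) {
    uint32_t mask = ibm_mask(first, last);
    return (reg & ~mask) | ((v << (31u - last)) & mask);
}
```

For example, setting FLSTA (bit 23) corresponds to the mask 0x00000100, and setting ICSLC (bits 28:29) to 3 ORs in 0x0000000C. The same helpers also decode other registers in this chapter, such as the OWN field of the PVR at bits 0:11.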
2.7.5 Core Configuration Register 1 (CCR1)
Bits 0:19 of CCR1 can be used to induce every possible parity error exception, so that correct operation of the machine check exception
handler can be verified. Other CCR1 bits can force a full-line data cache flush, or select a CPU timer clock input
other than CPUClock. The CCR1 is written from a GPR using mtspr, and can be read into a GPR using
mfspr. Figure 2-12 illustrates the fields of the CCR1, and gives a brief description of their functions.
Access to the CCR1 is privileged.
Figure 2-12. Core Configuration Register 1 (CCR1)
0:7     ICDPEI  Instruction Cache Data Parity Error Insert
                  0 record even parity (normal)
                  1 record odd parity (simulate parity error)
                Controls inversion of parity bits recorded when the instruction cache is filled. Each of the 8 bits
                corresponds to one of the instruction words in the line.
8:9     ICTPEI  Instruction Cache Tag Parity Error Insert
                  0 record even parity (normal)
                  1 record odd parity (simulate parity error)
                Controls inversion of parity bits recorded for the tag field in the instruction cache.
10:11   DCTPEI  Data Cache Tag Parity Error Insert
                  0 record even parity (normal)
                  1 record odd parity (simulate parity error)
                Controls inversion of parity bits recorded for the tag field in the data cache.
12      DCDPEI  Data Cache Data Parity Error Insert
                  0 record even parity (normal)
                  1 record odd parity (simulate parity error)
                Controls inversion of parity bits recorded for the data field in the data cache.
Page 78 of 589
prgmodel.fm.
September 12, 2002
User’s Manual
PreliminaryPPC440x5 CPU Core
Data Cache U-bit Parity Error Insert
13DCUPEI
14DCMPEI
15FCOM
16:19MMUPEI
20FFF
21:23Reserved
24TCS
25:31Reserved
0 record even parity (normal)
1 record odd parity (simulate parity error)
Data Cache Modified-bit Parity Error Insert
0 record even parity (normal)
1 record odd parity (simulate parity error)
Force Cache Operation Miss
0 normal operation
1 cache ops appear to miss the cache
Memory Management Unit Parity Error Insert
0 record even parity (normal)
1 record odd parity (simulate parity error)
Force Full-line Flush
0 flush only as much data as necessary.
1 always flush entire cache lines
Timer Clock Select
0 CPU timer advances by one at each rising edge
of the CPU input clock (CPMC440CLOCK).
1 CPU timer advances by one for each rising edge
of the CPU timer clock
(CPMC440TIMERCLOCK).
Controls inversion of parity bit recorded for the U
fields in the data cache.
Controls inversion of parity bits recorded for the
modified (dirty) field in the data cache.
Force icbt , dcbt, dcbtst, dcbst, dcbf, dcbi, and
dcbz to appear to miss the caches. The intended
use is with icbt and dcbt only, which will fill a duplicate line and allow testing of multi-hit parity errors.
See Section 4.2.4.7 Simulating Instruction Cache
Parity Errors for Software Testingon page 114 and
Figure 4.3.3.7 on page 130.
Controls inversion of paritybits recorded forthe tag
field in the MMU.
When flushing 32-byte (8-word) lines from the data
cache, normal operation is to write nothing, a double word, quad word, or the entire 8-word block to
the memory as required by the dirty bits. This bit
ensures that none or all dirty bits are set so that
either nothing or the entire 8-word block is written
to memory when flushing a line from the data
cache. Refer to Section 4.3.1.4 Line Flush Opera-tions on page 121.
When TCS = 1, CPU timer clock input can toggle
at up to half of the CPU clock frequency.
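The CCR1 bit numbers above translate into masks as in the following C sketch, which builds a normal-operation value with every parity error insertion field and FCOM cleared. The helper names are hypothetical; mapping ICDPEI bit n to instruction word n of the line is an assumption, writing 0 to the reserved bits is an assumption (a read-modify-write may be preferable), and the privileged mtspr that loads CCR1 is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Mask for one bit in PowerPC numbering (bit 0 = most significant). */
static uint32_t ppc_bit(unsigned bit)
{
    assert(bit <= 31);
    return 0x80000000u >> bit;
}

/* ICDPEI is bits 0:7; assumed here that bit n covers instruction word n
   of the cache line. Setting it records odd parity for that word. */
static uint32_t ccr1_icdpei_word(unsigned word)
{
    assert(word <= 7);
    return ppc_bit(word);
}

/* Normal-operation CCR1 value: all xxxPEI fields and FCOM are 0, reserved
   bits are written as 0, and FFF (bit 20) and TCS (bit 24) are chosen by
   the caller. */
static uint32_t ccr1_normal(int full_line_flush, int use_timer_clock)
{
    uint32_t v = 0;
    if (full_line_flush)
        v |= ppc_bit(20);   /* FFF: always flush entire cache lines   */
    if (use_timer_clock)
        v |= ppc_bit(24);   /* TCS: timer runs from CPMC440TIMERCLOCK */
    return v;
}
```

With both options selected the value is 0x00000880; a parity test routine would OR in a mask such as ccr1_icdpei_word(0) only for the duration of the test.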
2.7.6 Reset Configuration (RSTCFG)
The read-only RSTCFG register reports the values of certain TLB fields, as supplied at reset.
Access to RSTCFG is privileged.
Figure 2-13. Reset Configuration
0:15 Reserved
16 U0 U0 Storage Attribute
0 U0 storage attribute is disabled
1 U0 storage attribute is enabled
See Table 5-1 on page 135.
17 U1 U1 Storage Attribute
0 U1 storage attribute is disabled
1 U1 storage attribute is enabled
See Table 5-1 on page 135.
18 U2 U2 Storage Attribute
0 U2 storage attribute is disabled
1 U2 storage attribute is enabled
See Table 5-1 on page 135.
19 U3 U3 Storage Attribute
0 U3 storage attribute is disabled
1 U3 storage attribute is enabled
See Table 5-1 on page 135.
20:23 Reserved
24 E E Storage Attribute
0 Accesses to the page are big endian.
1 Accesses to the page are little endian.
25:27 Reserved
28:31 ERPN Extended Real Page Number
This TLB field is prepended to the translated address to form a 36-bit real address. See Section 5.4 Address Translation on page 140 and Table 5-3 Page Size and Real Address Formation on page 142.
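The RSTCFG fields can be decoded, and the ERPN used to form a 36-bit real address, roughly as follows. This C sketch uses hypothetical structure and function names, and assumes the 32-bit translated address has already been formed from the RPN and the page offset.

```c
#include <assert.h>
#include <stdint.h>

/* Extract bits first..last (PowerPC numbering: bit 0 = MSB). */
static uint32_t ppc_field(uint32_t reg, unsigned first, unsigned last)
{
    assert(first <= last && last <= 31);
    unsigned width = last - first + 1;
    uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (reg >> (31 - last)) & mask;
}

/* Hypothetical decoded view of RSTCFG (read-only, privileged). */
struct rstcfg_fields {
    unsigned u0, u1, u2, u3; /* bits 16-19: reset U0-U3 storage attributes */
    unsigned e;              /* bit 24: 1 = little endian                 */
    unsigned erpn;           /* bits 28:31: extended real page number     */
};

static struct rstcfg_fields rstcfg_decode(uint32_t v)
{
    struct rstcfg_fields f;
    f.u0 = ppc_field(v, 16, 16);
    f.u1 = ppc_field(v, 17, 17);
    f.u2 = ppc_field(v, 18, 18);
    f.u3 = ppc_field(v, 19, 19);
    f.e = ppc_field(v, 24, 24);
    f.erpn = ppc_field(v, 28, 31);
    return f;
}

/* Prepend the 4-bit ERPN to a 32-bit translated address. */
static uint64_t real_address_36(unsigned erpn, uint32_t addr32)
{
    assert(erpn <= 0xF);
    return ((uint64_t)erpn << 32) | addr32;
}
```

For instance, an ERPN of 0xF prepended to the translated address 0xFFFFF000 gives the 36-bit real address 0xFFFFFF000.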
2.8 User and Supervisor Modes
PowerPC Book-E architecture defines two operating “states” or “modes,” supervisor (privileged), and user
(non-privileged). Which mode the processor is operating in is controlled by MSR[PR]. When MSR[PR] is 0,
the processor is in supervisor mode, and can execute all instructions and access all registers, including privileged ones. When MSR[PR] is 1, the processor is in user mode, and can only execute non-privileged instructions and access non-privileged registers. An attempt to execute a privileged instruction or to access a
privileged register while in user mode causes a Privileged Instruction exception type Program interrupt to
occur.
Note that the name “PR” for the MSR field refers to an historical alternative name for user mode, which is
“problem state.” Hence the value 1 in the field indicates “problem state,” and not “privileged” as one might
expect.
2.8.1 Privileged Instructions
The following instructions are privileged and cannot be executed in user mode:
Note: mfspr and mtspr are privileged only for SPR numbers with SPRN5 = 1. See Privileged SPRs on page 81.
2.8.2 Privileged SPRs
Most SPRs are privileged. The only defined non-privileged SPRs are the LR, CTR, XER, USPRG0, SPRG4–
7 (read access only), TBU (read access only), and TBL (read access only). The PPC440x5 core also treats
all SPR numbers with a 1 in bit 5 of the SPRN field as privileged, whether the particular SPR number is
defined or not. Thus the core causes a Privileged Instruction exception type Program interrupt on any attempt
to access such an SPR number while in user mode. In addition, the core causes an Illegal Instruction exception type Program interrupt on any attempt, while in user mode, to access an undefined SPR number with a 0
in SPRN5. On the other hand, the result of attempting to access an undefined SPR number in supervisor
mode is undefined, regardless of the value in SPRN5.
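The user-mode rules above can be modeled as a small decision function. In this C sketch, the MSR[PR] bit position (bit 17, mask 0x00004000) and the SPR numbers of the non-privileged SPRs (XER = 1, LR = 8, CTR = 9, USPRG0 = 256, SPRG4–7 read = 260–263, TBL read = 268, TBU read = 269) are standard Book-E values assumed here rather than taken from this section, and treating a write to a read-only alias as an Illegal Instruction exception is also an assumption. SPRN5 is taken as bit 5 of the 10-bit SPR number with bit 0 as the most significant bit, so its weight is 16. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed: MSR[PR] is bit 17 of the 32-bit MSR. */
#define MSR_PR (0x80000000u >> 17)

enum spr_access_result {
    SPR_ACCESS_OK,      /* user-mode access proceeds                          */
    SPR_PRIVILEGED_EXC, /* Program interrupt, Privileged Instruction exception */
    SPR_ILLEGAL_EXC,    /* Program interrupt, Illegal Instruction exception    */
    SPR_SUPERVISOR      /* supervisor mode: not modeled here                  */
};

/* SPRN5: bit 5 of the 10-bit SPR number, bit 0 being the MSB (weight 16). */
static bool sprn5(unsigned sprn)
{
    assert(sprn < 1024);
    return ((sprn >> 4) & 1u) != 0;
}

/* Defined non-privileged SPRs, using assumed Book-E numbers. */
static bool user_spr_defined(unsigned sprn, bool is_write)
{
    switch (sprn) {
    case 1:   /* XER    */
    case 8:   /* LR     */
    case 9:   /* CTR    */
    case 256: /* USPRG0 */
        return true;
    case 260: case 261: case 262: case 263: /* SPRG4-7, read only */
    case 268: case 269:                     /* TBL, TBU, read only */
        return !is_write;
    default:
        return false;
    }
}

static enum spr_access_result spr_access(uint32_t msr, unsigned sprn, bool is_write)
{
    if ((msr & MSR_PR) == 0)
        return SPR_SUPERVISOR;      /* undefined SPRs give undefined results */
    if (sprn5(sprn))
        return SPR_PRIVILEGED_EXC;  /* any SPRN with SPRN5 = 1, defined or not */
    if (user_spr_defined(sprn, is_write))
        return SPR_ACCESS_OK;
    return SPR_ILLEGAL_EXC;         /* undefined SPRN with SPRN5 = 0 */
}
```

For example, a user-mode read of SPR 26 (SRR0) yields SPR_PRIVILEGED_EXC because 26 has SPRN5 = 1, whereas a read of SPR 100, which is undefined and has SPRN5 = 0, yields SPR_ILLEGAL_EXC.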
2.9 Speculative Accesses
The PowerPC Book-E Architecture permits implementations to perform speculative accesses to memory,
either for instruction fetching, or for data loads. A speculative access is defined as any access that is not
required by the sequential execution model (SEM).
For example, the PPC440x5 speculatively prefetches instructions down the predicted path of a conditional
branch; if the branch is later determined to not go in the predicted direction, the fetching of the instructions
from the predicted path is not required by the SEM and thus is speculative. Similarly, the PPC440x5 executes
load instructions out-of-order, and may read data from memory for a load instruction that is past an undetermined branch.
Sometimes speculative accesses are inappropriate, however. For example, attempting to access data at
addresses to which I/O devices are mapped can cause problems. If the I/O device is a serial port, reading it
speculatively could cause data to be lost.
The architecture provides two mechanisms for protecting against errant accesses to such “non-well-behaved”
memory addresses. The first is the guarded (G) storage attribute, which protects against speculative data
accesses. The second is the execute permission mechanism, which protects against speculative instruction
fetches. Both of these mechanisms are described in Memory Management on page 133.
2.10 Synchronization
The PPC440x5 supports the synchronization operations of the PowerPC Book-E architecture. There are
three kinds of synchronization defined by the architecture, each of which is described in the following
sections.
2.10.1 Context Synchronization
The context of a program is the environment in which the program executes. For example, the mode (user or
supervisor) is part of the context, as are the address translation space and storage attributes of the memory
pages being accessed by the program. Context is controlled by the contents of certain registers and other
resources, such as the MSR and the translation lookaside buffer (TLB).
Under certain circumstances, it is necessary for the hardware or software to force the synchronization of a
program’s context. Context synchronizing operations include all interrupts except Machine Check, as well as
the isync, sc, rfi, rfci, and rfmci instructions. Context synchronizing operations satisfy the following requirements:
1. The operation is not initiated until all instructions preceding the operation have completed to the point at
which they have reported any and all exceptions that they will cause.
2. All instructions preceding the operation must complete in the context in which they were initiated. That is,
they must not be affected by any context changes caused by the context synchronizing operation, or any
instructions after the context synchronizing operation.
3. If the operation is the sc instruction (which causes a System Call interrupt) or is itself an interrupt, then
the operation is not initiated until no higher priority interrupt is pending (see Interrupts and Exceptions on
page 159).
4. All instructions that follow the operation must be re-fetched and executed in the context that is established
by the completion of the context synchronizing operation and all of the instructions which preceded it.
Note that context synchronizing operations do not force the completion of storage accesses, nor do they
enforce any ordering amongst accesses before and/or after the context synchronizing operation. If such
behavior is required, then a storage synchronizing instruction must be used (see Storage Ordering and Synchronization on page 84).
Also note that architecturally Machine Check interrupts are not context synchronizing. Therefore, an instruction that precedes a context synchronizing operation can cause a Machine Check interrupt after the context
synchronizing operation occurs and additional instructions have completed. For the PPC440x5 core, this can
only occur with Data Machine Check exceptions, and not Instruction Machine Check exceptions.
The following scenarios use pseudocode examples to illustrate the effects of context synchronization. Subsequent text explains how software can further guarantee “storage ordering.”
1. Consider the following self-modifying code instruction sequence:
    stw XYZ    Store to caching inhibited address XYZ
    isync
    XYZ        Fetch and execute the instruction at address XYZ
In this sequence, the isync instruction does not guarantee that the XYZ instruction is fetched after the
store has occurred to memory. There is no guarantee which XYZ instruction will execute; either the old
version or the new (stored) version might.
2. Now consider the required self-modifying code sequence:
    stw        Write new instruction to data cache
    dcbst      Push the new instruction from the data cache to memory
    msync      Guarantee that dcbst completes before subsequent instructions begin
    icbi       Invalidate old copy of instruction in instruction cache
    msync      Guarantee that icbi completes before subsequent instructions begin
    isync      Force context synchronization, discard previously fetched instructions, and re-fetch;
               the fetch of the stored instruction is guaranteed to get the new value
3. This final example illustrates the use of isync with context changes to the debug facilities:
    mtdbcr0    Enable the instruction address compare (IAC) debug event
    isync      Wait for the new Debug Control Register 0 (DBCR0) context to be established
    XYZ        This instruction is at the IAC address; an isync is necessary to guarantee that the
               IAC event is recognized on the execution of this instruction; without the isync, the
               XYZ instruction may be prefetched and dispatched to execution before
               recognizing that the IAC event has been enabled.
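The required self-modifying code sequence of example 2 might be wrapped in a C helper along the following lines. This is a sketch assuming GCC-style inline assembly on a PowerPC target; patch_instruction is a hypothetical name, and msync is written as sync because the Book-E msync shares the sync encoding. On other hosts the cache maintenance is replaced by a compiler fence only, so that the code can still be compiled and exercised; that fallback does not make modified code safe to execute.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: store one new instruction word and make it
   visible to instruction fetch, following example 2 above. */
static void patch_instruction(volatile uint32_t *dst, uint32_t insn)
{
    assert(((uintptr_t)dst & 3u) == 0);  /* instructions are word aligned */
    *dst = insn;                         /* stw: write new instruction    */
#if defined(__powerpc__) || defined(__PPC__)
    __asm__ volatile(
        "dcbst 0,%0\n\t" /* push the new instruction to memory         */
        "sync\n\t"       /* msync: dcbst completes before continuing   */
        "icbi 0,%0\n\t"  /* invalidate the old copy in the I-cache     */
        "sync\n\t"       /* msync: icbi completes before continuing    */
        "isync"          /* discard fetched instructions and re-fetch  */
        : : "r"(dst) : "memory");
#else
    /* Not a PowerPC target: no cache maintenance is performed; this
       fence only keeps the sketch compilable for testing elsewhere. */
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}
```

Patching more than one word would repeat dcbst and icbi for each cache line touched before the final msync and isync.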
2.10.2 Execution Synchronization
Execution synchronization is a subset of context synchronization. An execution synchronizing operation
satisfies the first two requirements of context synchronizing operations, but not the latter two. That is, execution synchronizing operations guarantee that preceding instructions execute in the “old” context, but do not
guarantee that subsequent instructions operate in the “new” context. An example of a scenario requiring
execution synchronization would be just before the execution of a TLB-updating instruction (such as tlbwe).
An execution synchronizing instruction should be executed to guarantee that all preceding storage access
instructions have performed their address translations before executing tlbwe to invalidate an entry which
might be used by those preceding instructions.
There are four execution synchronizing instructions: mtmsr, wrtee, wrteei, and msync. Of course, all
context synchronizing instructions are also implicitly execution synchronizing, since context synchronization is
a superset of execution synchronization.
Note that PowerPC Book-E imposes additional requirements on updates to MSR[EE] (the external interrupt
enable bit). Specifically, if an mtmsr, wrtee, or wrteei instruction sets MSR[EE] = 1, and an External Input,
Decrementer, or Fixed Interval Timer exception is pending, the interrupt must be taken before the instruction
that follows the MSR[EE]-updating instruction is executed. In this sense, these MSR[EE]-updating instructions can be
thought of as being context synchronizing with respect to the MSR[EE] bit, in that they guarantee that subsequent instructions execute (or are prevented from executing and an interrupt taken) according to the new
context of MSR[EE].
2.10.3 Storage Ordering and Synchronization
Storage synchronization enforces ordering between storage access instructions executed by the PPC440x5
core. There are two storage synchronizing instructions: msync and mbar. PowerPC Book-E architecture
defines different ordering requirements for these two instructions, but the PPC440x5 core implements them in
an identical fashion. Architecturally, msync is the “stronger” of the two, and is also execution synchronizing,
whereas mbar is not.
mbar acts as a “barrier” between all storage access instructions executed before the mbar and all those
executed after the mbar. That is, mbar ensures that all of the storage accesses initiated by instructions
before the mbar are performed with respect to the memory subsystem before any of the accesses initiated by
instructions after the mbar. However, mbar does not prevent subsequent instructions from executing (nor
even from completing) before the completion of the storage accesses initiated by instructions before the
mbar.
msync, on the other hand, does guarantee that all preceding storage accesses have actually been
performed with respect to the memory subsystem before the execution of any instruction after the msync.
Note that this requirement goes beyond the requirements of mere execution synchronization, in that execution synchronization doesn’t require the completion of preceding storage accesses.
The following two examples illustrate the distinctive use of mbar vs. msync.
    stw      Store data to an I/O device
    msync    Wait for store to actually complete
    mtdcr    Reconfigure the I/O device
In this example, the mtdcr is reconfiguring the I/O device in a manner which would cause the preceding store
instruction to fail, were the mtdcr to change the device before the completion of the store. Since mtdcr is not
a storage access instruction, the use of mbar instead of msync would not guarantee that the store is
performed before letting the mtdcr reconfigure the device. It only guarantees that subsequent storage
accesses are not performed to memory or any device before the earlier store.
Now consider this next example:
    stb X    Store data to an I/O device at address X, causing a status bit at address Y to be reset
    mbar     Guarantee preceding store is performed to the device before any subsequent
             storage accesses are performed
    lbz Y    Load status from the I/O device at address Y
Here, mbar is appropriate instead of msync, because all that is required is that the store to the I/O device
happens before the load does, but not that other instructions subsequent to the mbar won’t get executed
before the store.
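The two barrier patterns above can be expressed as C helpers for memory-mapped I/O. In this sketch the macro and function names are hypothetical; the mbar mnemonic assumes an assembler targeting a Book-E core such as the 440, msync is again written as sync, and the reconfigure callback merely stands in for the mtdcr of the first example. On non-PowerPC hosts both barriers fall back to a sequentially consistent fence so the code still builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__powerpc__) || defined(__PPC__)
#define PPC_MBAR()  __asm__ volatile("mbar" ::: "memory") /* Book-E assembler assumed */
#define PPC_MSYNC() __asm__ volatile("sync" ::: "memory") /* same encoding as msync   */
#else
#define PPC_MBAR()  __atomic_thread_fence(__ATOMIC_SEQ_CST)
#define PPC_MSYNC() __atomic_thread_fence(__ATOMIC_SEQ_CST)
#endif

/* Second example: the store to X resets a status bit at Y, then the status
   is read. Both are storage accesses, so mbar is enough to order them. */
static uint8_t io_ack_and_read_status(volatile uint8_t *x, volatile uint8_t *y,
                                      uint8_t val)
{
    assert(x != NULL && y != NULL);
    *x = val;     /* stb X */
    PPC_MBAR();   /* store is performed before the load */
    return *y;    /* lbz Y */
}

/* First example: the device is reconfigured by a non-storage operation
   (mtdcr in the text), so msync is required before it. */
static void io_store_then_reconfigure(volatile uint32_t *reg, uint32_t val,
                                      void (*reconfigure)(void))
{
    assert(reg != NULL);
    *reg = val;          /* stw */
    PPC_MSYNC();         /* wait for the store to actually complete */
    if (reconfigure != NULL)
        reconfigure();   /* stands in for mtdcr */
}
```

Choosing mbar where it suffices lets later non-storage instructions proceed while the store drains, which is the performance reason for having both barriers.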
3. Initialization
This chapter describes the initial state of the PPC440x5 core after a hardware reset, and contains a description of the initialization software required to complete initialization so that the PPC440x5 core can begin
executing application code. Initialization of other on-chip and/or off-chip system components may also be
needed, in addition to the processor core initialization described in this chapter.
3.1 PPC440x5 Core State After Reset
In general, the contents of registers and other facilities within the PPC440x5 core are undefined after a hardware reset. Reset is defined to initialize only the minimal resources required such that instructions can be
fetched and executed from the initial program memory page, and so that repeatable, deterministic behavior
can be guaranteed provided that the proper software initialization sequence is followed. System software
must fully configure the rest of the PPC440x5 core resources, as well as the other facilities within the chip
and/or system.
The following list summarizes the requirements of the Book-E Enhanced PowerPC Architecture with regard
to the processor state after reset, prior to any additional initialization by software.
• All fields of the MSR are set to 0, disabling all asynchronous interrupts, placing the processor in supervisor mode, and specifying that instruction and data accesses are to the system (as opposed to application) address space.
• DBCR0[RST] is set to 0, thereby ending any previous software-initiated reset operation.
• DBSR[MRR] records the type of the just ended reset operation (core, chip, or system; see Reset Types
on page 89).
• TCR[WRC] is set to 0, thereby disabling the Watchdog timer reset operation.
• TSR[WRS] records the type of the just ended reset operation, if the reset was initiated by the Watchdog
Timer (otherwise this field is unchanged from its pre-reset value).
• The PVR is defined, after reset and otherwise, to contain a value that indicates the specific processor
implementation.
• The program counter (PC) is set to 0xFFFFFFFC, the effective address (EA) of the last word of the
address space.
The memory management resources are set to values such that the processor is able to successfully fetch
and execute instructions and read (but not write) data within the 4KB program memory page located at the
end of the 32-bit effective address space. Exactly how this is accomplished is implementation-dependent. For
example, it may or may not be the case that a TLB entry is established in a manner which is visible to software using the TLB management instructions. Regardless of how the implementation enables access to the
initial program memory page, instruction execution starts at the effective address of 0xFFFFFFFC, the last
word of the effective address space. The instruction at this address must be an unconditional branch backwards to the start of the initialization sequence, which must lie somewhere within the initial 4KB program
memory page. The real address to which the initial effective address will be translated is also implementation- or system-dependent, as are the various storage attributes of the initial program memory page such as the
caching inhibited and endian attributes.
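The unconditional backward branch placed at 0xFFFFFFFC can be computed as in this C sketch, which encodes a relative b instruction (primary opcode 18, 24-bit LI word displacement, AA = 0, LK = 0) and checks that the target lies earlier in the same 4KB page. The function names are hypothetical, and the target address is whatever the initialization code chooses.

```c
#include <assert.h>
#include <stdint.h>

#define RESET_VECTOR  0xFFFFFFFCu
#define PAGE_4KB_MASK 0xFFFFF000u

/* Encode "b target" relative to the branch address (I-form, AA = LK = 0). */
static uint32_t encode_b(uint32_t from, uint32_t to)
{
    int32_t disp = (int32_t)(to - from);
    assert((disp & 3) == 0);                        /* word aligned       */
    assert(disp >= -(1 << 25) && disp < (1 << 25)); /* fits in LI || 0b00 */
    return (18u << 26) | ((uint32_t)disp & 0x03FFFFFCu);
}

/* The reset instruction must branch backwards within the initial page. */
static uint32_t reset_branch(uint32_t init_start)
{
    assert((init_start & PAGE_4KB_MASK) == (RESET_VECTOR & PAGE_4KB_MASK));
    assert(init_start < RESET_VECTOR);
    return encode_b(RESET_VECTOR, init_start);
}
```

For instance, a branch from 0xFFFFFFFC back to the start of the page at 0xFFFFF000 encodes as 0x4BFFF004.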
Note: In the PPC440x5 core, a single entry is established in the instruction shadow TLB (ITLB) and data
shadow TLB (DTLB) at reset with the properties described in Table 3-1. Initialization software must insert an entry into the UTLB to cover this same memory region before performing any context synchronizing operation (including causing any exceptions which would lead to an interrupt), since a context
synchronizing operation will invalidate the shadow TLB entries.
Initialization software should consider all other resources within the PPC440x5 core to be undefined after
reset, in order for the initialization sequence to be compatible with other PowerPC implementations. There
are, however, additional core resources which are initialized by reset, in order to guarantee correct and deterministic operation of the processor during the initialization sequence. Table 3-1 shows the reset state of all
PPC440x5 core resources which are defined to be initialized by reset. While certain other register fields and
other facilities within the PPC440x5 core may be affected by reset, this is neither an architectural nor a hardware
requirement, and software must treat those resources as undefined. Likewise, even those resources which
are included in Table 3-1 but which are not identified in the previous list as being architecturally required,
should be treated as undefined by the initialization software.
During chip initialization, some chip control registers must be initialized to ensure proper chip operation.
Peripheral devices can also be initialized as appropriate for the system design.
Table 3-1. Reset Values of Registers and Other PPC440x5 Facilities
Resource    Field       Reset Value         Comment
CCR0        DAPUIB      0                   Enable broadcast of instruction data to auxiliary processor interface
            DTB         0                   Enable broadcast of trace information
CCR1        ICDPEI      0                   Disable Parity Error Insertion (enabled only for s/w testing)
            ICTPEI      0                   Disable Parity Error Insertion (enabled only for s/w testing)
            DCTPEI      0                   Disable Parity Error Insertion (enabled only for s/w testing)
            DCDPEI      0                   Disable Parity Error Insertion (enabled only for s/w testing)
            DCUPEI      0                   Disable Parity Error Insertion (enabled only for s/w testing)
            DCMPEI      0                   Disable Parity Error Insertion (enabled only for s/w testing)
            FCOM        0                   Do not force cache ops to miss.
            MMUPEI      0                   Disable Parity Error Insertion (enabled only for s/w testing)
            FFF         0                   Flush only as much data from dirty lines as needed.
DBCR0       EDM         0                   External Debug mode disabled
            RST         0b00                Software-initiated debug reset disabled
            ICMP        0                   Instruction completion debug events disabled
            BRT         0                   Branch taken debug events disabled
            IAC1        0                   Instruction Address Compare 1 (IAC1) debug events disabled
            IAC2        0                   IAC2 debug events disabled
            IAC3        0                   IAC3 debug events disabled
            IAC4        0                   IAC4 debug events disabled
DBSR        UDE         0                   Unconditional debug event has not occurred
            MRR         Reset-dependent     Indicates most recent type of reset as follows:
                                            00 No reset has occurred since this field was last cleared by software
                                            01 Core reset
                                            10 Chip reset
                                            11 System reset
            ICMP        0                   Instruction completion debug event has not occurred
            BRT         0                   Branch taken debug event has not occurred
            IRPT        0                   Interrupt debug event has not occurred
            TRAP        0                   Trap debug event has not occurred
            IAC1        0                   IAC1 debug event has not occurred
            IAC2        0                   IAC2 debug event has not occurred
            IAC3        0                   IAC3 debug event has not occurred
            IAC4        0                   IAC4 debug event has not occurred
            DAC1R       0                   Data address compare 1 (DAC1) read debug event has not occurred
            DAC1W       0                   DAC1 write debug event has not occurred
            DAC2R       0                   DAC2 read debug event has not occurred
            DAC2W       0                   DAC2 write debug event has not occurred
            RET         0                   Return debug event has not occurred
ESR         MCI         0                   Synchronous Instruction Machine Check exception has not occurred
MCSR        MCS         0                   Asynchronous Instruction Machine Check exception has not occurred
MSR         WE          0                   Wait state disabled
            CE          0                   Asynchronous critical interrupts disabled
            EE          0                   Asynchronous non-critical interrupts disabled
            PR          0                   Processor in supervisor mode
            FP          0                   Floating-point Unavailable interrupts disabled
            ME          0                   Machine Check interrupts disabled
            FE0         0                   Floating-point Enabled interrupts disabled
            DWE         0                   Debug Wait mode disabled
            DE          0                   Debug interrupts disabled
            FE1         0                   Floating-point Enabled interrupts disabled
            IS          0                   Instruction fetch access is to system-level virtual address space
            DS          0                   Data access is to system-level virtual address space
PC                      0xFFFFFFFC          Initial reset instruction fetched from last word of effective address space
PVR         OWN         System-dependent    PVR[OWN] value (after reset and otherwise) is specified by core input signals
            PVN         System-dependent    PVR[PVN] value (after reset and otherwise) is specified by core input signals
RSTCFG      All fields  System-dependent    All RSTCFG fields are specified by core input signals
TLB entry (see Note 1)
            V           1                   Translation table entry for the initial program memory page is valid.
            EPN 0:19    0xFFFFF             Match EA of initial reset instruction (only EPN[0:19] is compared to the EA because the page size is 4KB).
            EPN 20:21   Undefined           EPN[20:21] are undefined, as they are not compared for a 4KB page.
            TS          0                   Initial program memory page is in system-level virtual address space.
            SIZE        0b0001              Initial program memory page size is 4KB.
            TID         0x00                Initial program memory page is globally shared; no match required against PID register.
            RPN 0:21    0xFFFFF || 0b00     Initial program memory page mapped effective=real.
            ERPN        System-dependent    Extended real page number of the initial program memory page is specified by core input signals.
            U0–U3       System-dependent    Reset values of user-definable storage attributes are specified by core input signals.
            W           0                   Write-through storage attribute disabled.
            I           1                   Caching inhibited storage attribute enabled.
            M           0                   Memory coherent storage attribute disabled.
            G           1                   Guarded storage attribute enabled.
            E           System-dependent    Reset value of endian storage attribute is specified by a core input signal.
            SX          1                   Supervisor mode execution access enabled.
            SW          0                   Supervisor mode write access disabled.
            SR          1                   Supervisor mode read access enabled.
TSR         WRS         Copy of TCR[WRC]    If reset caused by Watchdog Timer
                        Unchanged           If reset not caused by Watchdog Timer
                        Undefined           After power-up
Note 1: “TLB entry” refers to an entry in the shadow instruction and data TLB arrays that is automatically
configured by the PPC440x5 core to enable fetching and reading (but not writing) from the initial
program memory page. This entry is not architecturally visible to software, and is invalidated upon any
context synchronizing operation. Software must initialize a corresponding entry in the main unified
TLB array before executing any operation which could lead to a context synchronization. See
Initialization Software Requirements on page 89 for more information.
3.2 Reset Types
The PPC440x5 core supports three types of reset: core, chip, and system. The type of reset is indicated by a
set of core input signals. For each type of reset, the core resources are initialized as indicated in Table 3-1 on
page 86. Core reset is intended to reset the PPC440x5 core without necessarily resetting the rest of the on-chip logic. The chip reset operation is intended to reset the entire chip, but off-chip hardware in the system is
not informed of the reset operation. System reset is intended to reset the entire chip, and also to signal the
rest of the off-chip system that the chip is being reset.
3.3 Reset Sources
A reset operation can be initiated on the PPC440x5 core through the use of any of four separate mechanisms. The first is a set of three input signals to the core, one for each of the three reset types. These signals
can be asserted asynchronously by hardware outside the core to initiate a reset operation. The second reset
source is the TCR[WRC] field, which can be set up by software to initiate a reset operation upon certain
Watchdog Timer expiration events. The third reset source is the DBCR0[RST] field, which can be written by
software to immediately initiate a reset operation. The fourth reset source is the JTAG interface, which can be
used by a JTAG-attached debug tool to initiate a reset operation asynchronously to program execution on the
PPC440x5 core.
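Software that needs to report or request a particular reset type might use helpers like these. This C sketch assumes that DBSR[MRR] and DBCR0[RST] both occupy bits 2:3 of their registers and that DBCR0[RST] uses the same 01/10/11 encoding as DBSR[MRR] in Table 3-1; neither bit position is given in this section, and the function names are hypothetical. Writing a nonzero RST value to DBCR0 with mtspr initiates the reset immediately, so only the value computation is shown.

```c
#include <assert.h>
#include <stdint.h>

enum reset_type {
    RESET_NONE = 0,   /* 00: no reset recorded (DBSR) / no action (DBCR0) */
    RESET_CORE = 1,   /* 01 */
    RESET_CHIP = 2,   /* 10 */
    RESET_SYSTEM = 3  /* 11 */
};

/* Assumed: both fields occupy bits 2:3 (PowerPC numbering), so the
   2-bit field sits 31 - 3 = 28 bits above the least significant bit. */
#define RESET_FIELD_SHIFT 28u
#define RESET_FIELD_MASK  (3u << RESET_FIELD_SHIFT)

/* Type of the most recent reset, from DBSR[MRR]. */
static enum reset_type dbsr_mrr(uint32_t dbsr)
{
    return (enum reset_type)((dbsr & RESET_FIELD_MASK) >> RESET_FIELD_SHIFT);
}

/* DBCR0 value that requests a software-initiated reset of type t. */
static uint32_t dbcr0_request_reset(uint32_t dbcr0, enum reset_type t)
{
    assert((unsigned)t <= 3u);
    return (dbcr0 & ~RESET_FIELD_MASK) | ((uint32_t)t << RESET_FIELD_SHIFT);
}
```

Under these assumptions, a DBSR value of 0x20000000 reports a chip reset, and requesting a core reset from a cleared DBCR0 produces 0x10000000.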
3.4 Initialization Software Requirements
After a reset operation occurs, the PPC440x5 core is initialized to a minimum configuration to enable the
fetching and execution of the software initialization code, and to guarantee deterministic behavior of the core
during the execution of this code. Initialization software is necessary to complete the configuration of the
processor core and the rest of the on-chip and off-chip system.
The system must provide non-volatile memory (or memory initialized by some mechanism other than the
PPC440x5 core) at the real address corresponding to effective address 0xFFFFFFFC, and at the rest of the
initial program memory page. The instruction at the initial address must be an unconditional branch backwards to the beginning of the initialization software sequence.
The initialization software functions described in this section perform the configuration tasks required to
prepare the PPC440x5 core to boot an operating system and subsequently execute an application program.
The initialization software must also perform functions associated with hardware resources that are outside
the PPC440x5 core, and hence that are beyond the scope of this manual. This section makes reference to
some of these functions, but their full scope is described in the user’s manual for the specific chip and/or
system implementation.
Initialization software should perform the following tasks in order to fully configure the PPC440x5 core. For
more information on the various functions referenced in the initialization sequence, see the corresponding
chapters of this document.
1. Branch backwards from effective address 0xFFFFFFFC to the start of the initialization sequence
2. Invalidate the instruction cache (iccci)
3. Invalidate the data cache (dccci)
4. Synchronize memory accesses (msync)
This step forces any data PLB operations that may have been in progress prior to the reset operation to
complete, thereby allowing subsequent data accesses to be initiated and completed properly.
5. Clear DBCR0 register (disable all debug events)
Although the PPC440x5 core is defined to reset some of the debug event enables during the reset operation (as specified in Table 3-1 on page 86), this is not required by the architecture and hence the initialization software should not assume this behavior. Software should disable all debug events in order to
prevent non-deterministic behavior on the trace interface to the core.
6. Clear DBSR register (initialize all debug event status)
Although the PPC440x5 core is defined to reset the DBSR debug event status bits during the reset operation (as specified in Table 3-1 on page 86), this is not required by the architecture and hence the initialization software should not assume this behavior. Software should clear all such status in order to
prevent non-deterministic behavior on the JTAG interface to the core.
7. Initialize CCR0 register
1. Enable/disable broadcast of instructions to auxiliary processor (save power if no AP attached)
2. Enable/disable broadcast of trace information (save power if not tracing)
3. Enable/configure or disable speculative instruction cache line prefetching
4. Specify behavior for icbt and dcbt/dcbtst instructions
5. Enable/disable gathering of separate store accesses
6. Enable/disable hardware support for misaligned data accesses
7. Enable/disable cache read of parity bits depending on s/w compatibility requirements
8. Initialize CCR1 register
1. Enable/disable full-line flushes as desired.
2. Disable force cache-op miss (FCOM) and the various parity error insertion fields (xxxPEI).
3. Users may wish to initialize CCR1[TCS] here, or in the timer facilities section.
9. Configure instruction and data cache regions
These steps must be performed prior to enabling the caches by setting the caching inhibited storage
attribute of the corresponding TLB entry to 0.
1. Clear the instruction and data cache normal victim index registers (INV0–INV3, DNV0–DNV3)
2. Clear the instruction and data cache transient victim index registers (ITV0–ITV3, DTV0–DTV3)
3. Set the instruction and data cache victim limit registers (IVLIM and DVLIM) according to the desired
size of the normal, locked, and transient regions of each cache
10. Set up a TLB entry to cover the initial program memory page
Since the PPC440x5 core only initializes an architecturally invisible shadow TLB entry during the reset
operation, and since all shadow TLB entries are invalidated upon any context synchronization, special
care must be taken during the initialization sequence to prevent any such context synchronizing operations (such as interrupts and the isync instruction) until after this step is completed and an architected
TLB entry has been established in the TLB. Particular care should be taken to avoid store operations:
write permission is disabled upon reset, so any store operation would cause a Data Storage interrupt,
thereby invalidating the shadow TLB entry.
Page 90 of 589
September 12, 2002
init.fm.
User’s Manual
PreliminaryPPC440x5 CPU Core
1. Initialize MMUCR
- Specify TID field to be written to TLB entries
- Specify TS field to be used for TLB searches
- Specify store miss allocation behavior
- Enable/disable transient cache mechanism
- Enable/disable cache locking exceptions
2. Write TLB entry for initial program memory page
- Specify EPN, RPN, ERPN, and SIZE as appropriate for system
- Set valid bit
- Specify TID = 0 (disable comparison to PID) or else initialize PID register to matching value
- Specify TS = 0 (system address space) or else MSR[IS,DS] must be set to correspond to TS=1
- Specify storage attributes (W, I, M, G, E, U0–U3) as appropriate for system
3. Initialize PID register to match TID field of TLB entry (unless using TID = 0)
4. Set up for subsequent MSR[IS,DS] initialization to correspond to the TS field of the TLB entry
Only necessary if the TS field of the TLB entry is being set to 1 (MSR[IS,DS] are already reset to 0)
- Write new MSR value into SRR1
- Write address from which to continue execution into SRR0
5. Set up for subsequent change in instruction fetch address
Only necessary if the EPN field of the TLB entry is changed from the initial value (EPN[0:19] ≠ 0xFFFFF)
- Write initial/new MSR value into SRR1
- Write address from which to continue execution into SRR0
6. Initialize or invalidate all other TLB entries as desired
7. Context synchronize to invalidate shadow TLB contents and cause new TLB contents to take effect
- Use isync if not changing MSR contents and not changing the effective address of the rest of
the initialization sequence
- Use rfi if changing MSR to match new TS field of TLB entry (SRR1 will be copied into MSR, and
program execution will resume at value in SRR0)
- Use rfi if changing next instruction fetch address to correspond to new EPN field of TLB entry
(SRR1 will be copied into MSR, and program execution will resume at value in SRR0)
Instruction and data caches will now begin to be used, if the corresponding TLB entry has been set up
with the caching inhibited storage attribute set to 0. Initialization software can now branch outside of
the initial 4KB memory region, as permitted by the address and size of the new TLB entry and/or any
other TLB entries that have been set up.
11. Initialize interrupt resources
1. Initialize IVPR to specify high-order address of the interrupt handling routines
Make sure that the corresponding address region is covered by a TLB entry (or entries)
2. Initialize IVOR0–IVOR15 registers (individual interrupt vector addresses)
Make sure that the corresponding addresses are covered by a TLB entry (or entries)
Because the low-order four bits of IVOR0–IVOR15 are reserved, the values written to those bits are
ignored when the registers are written, and are read as zero when the registers are used. Therefore,
all interrupt vector offsets are implicitly aligned on quadword boundaries. Software must take care to
ensure that all interrupt handlers are quadword-aligned.
3. Set up corresponding memory contents with the interrupt handling routines
4. Synchronize any program memory changes as required. (See Self-Modifying Code on page 106 for
more information on the instruction sequence necessary to synchronize changes to program memory
prior to executing the new instructions.)
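The C sketch below shows how a vector address could be computed. It assumes the Book E convention that IVPR supplies address bits 0:15 and each IVOR supplies bits 16:27. This section does not spell out that split, so treat it as an assumption. Forcing bits 28:31 to zero mirrors the reserved low-order IVOR bits described above, which is why handlers must be quadword-aligned. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption (Book E convention): IVPR provides bits 0:15 of the vector
   address, IVORn provides bits 16:27, and bits 28:31 are always zero. */
uint32_t interrupt_vector_address(uint32_t ivpr, uint32_t ivor) {
    return (ivpr & 0xFFFF0000u) | (ivor & 0x0000FFF0u);
}

/* A handler at 'handler' is reachable only if it is quadword-aligned and
   lies in the 64KB region selected by IVPR (same assumption as above). */
bool handler_is_reachable(uint32_t ivpr, uint32_t handler) {
    return (handler & 0xFu) == 0 &&
           (handler & 0xFFFF0000u) == (ivpr & 0xFFFF0000u);
}
```

For example, with IVPR = 0xFFFF0000, writing 0x0000010C to an IVOR yields a vector at 0xFFFF0100 because the low four bits are ignored. A handler placed at 0xFFFF0108 would therefore never be reached.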
12. Configure debug facilities as desired
1. Write DBCR1 and DBCR2 to specify IAC and DAC event conditions
2. Clear DBSR to initialize IAC auto-toggle status
3. Initialize IAC1–IAC4, DAC1–DAC2, DVC1–DVC2 registers to desired values
4. Write MSR[DWE] to enable Debug Wait mode (if desired)
5. Write DBCR0 to enable desired debug mode(s) and event(s)
6. Context synchronize to establish new debug facility context (isync)
13. Configure timer facilities as desired
1. Write DEC to 0 to prevent Decrementer exception after TSR is cleared
2. Write TBL to 0 to prevent Fixed Interval Timer and Watchdog Timer exceptions after TSR is cleared,
and to prevent increment into TBH prior to full initialization
3. CCR1[TCS] (Timer Clock Select) can be initialized here, or earlier with the rest of the CCR1.
4. Clear TSR to clear all timer exception status
5. Write TCR to configure and enable timers as desired
Software must take care when enabling the Watchdog Timer reset function: once
this function is enabled, it cannot be disabled except by reset itself.
6. Initialize TBH value as desired
7. Initialize TBL value as desired
8. Initialize DECAR to desired value (if enabling the auto-reload function)
9. Initialize DEC to desired value
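The ordering constraints of step 13 can be captured in a host-side C sketch that records SPR writes instead of issuing mtspr. The spr_id enum, the write_spr hook, and init_timers are hypothetical names. The all-ones TSR value assumes the Book E write-one-to-clear convention, which this section does not state. CCR1[TCS] (sub-step 3) is assumed to have been set already.

```c
#include <stdint.h>

/* Hypothetical SPR identifiers; on hardware each write would be an mtspr. */
typedef enum { SPR_DEC, SPR_TBL, SPR_TBH, SPR_TSR, SPR_TCR, SPR_DECAR } spr_id;

typedef struct { spr_id spr; uint32_t value; } spr_write;
typedef struct { spr_write log[16]; int count; } spr_log;

/* Record one write (stand-in for mtspr). */
static void write_spr(spr_log *l, spr_id spr, uint32_t value) {
    if (l->count < 16) {
        l->log[l->count].spr = spr;
        l->log[l->count].value = value;
        l->count++;
    }
}

/* Timer set-up in the order given by step 13. */
void init_timers(spr_log *l, uint32_t tcr, uint32_t tbh, uint32_t tbl,
                 uint32_t decar, uint32_t dec) {
    write_spr(l, SPR_DEC, 0);           /* 1: no Decrementer exception once TSR is cleared */
    write_spr(l, SPR_TBL, 0);           /* 2: no FIT/WDT exception, no carry into TBH */
    write_spr(l, SPR_TSR, 0xFFFFFFFFu); /* 4: clear status (assumed write-1-to-clear) */
    write_spr(l, SPR_TCR, tcr);         /* 5: a WDT reset enabled here stays enabled */
    write_spr(l, SPR_TBH, tbh);         /* 6 */
    write_spr(l, SPR_TBL, tbl);         /* 7 */
    write_spr(l, SPR_DECAR, decar);     /* 8: only relevant if auto-reload is used */
    write_spr(l, SPR_DEC, dec);         /* 9 */
}
```

TBL is written twice on purpose. The first write parks it at 0 so that it cannot carry into TBH while TBH is being set, and the second write loads the final value.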
14. Initialize facilities outside the processor core which are possible sources of asynchronous interrupt
requests (including DCRs and/or other memory-mapped resources)
This must be done prior to enabling asynchronous interrupts in the MSR.
15. Initialize the MSR to enable interrupts as desired
1. Set MSR[CE] to enable/disable Critical Input and Watchdog Timer interrupts
2. Set MSR[EE] to enable/disable External Input, Decrementer, and Fixed Interval Timer interrupts
3. Set MSR[DE] to enable/disable Debug interrupts
4. Set MSR[ME] to enable/disable Machine Check interrupts
Software should first check the status of the ESR[MCI] field and MCSR[MCS] field to determine
whether any Machine Check exceptions have occurred after these fields were cleared by reset and
before Machine Check interrupts were enabled (by this step). Any such exceptions would have set
ESR[MCI] or MCSR[MCS] to 1, and this status can only be cleared explicitly by software. After the
MCSR[MCS] field is known to be clear, the MCSR status bits (MCSR[1:8]) should be cleared by software to avoid possible confusion upon later service of a machine check interrupt. Once MSR[ME]
has been set to 1, subsequent Machine Check exceptions will result in a Machine Check interrupt.
5. Context synchronize to establish new MSR context (isync)
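The machine check precondition in sub-step 4 can be expressed with IBM bit numbering, where bit 0 is the most significant bit. The sketch below builds masks such as MCSR[1:8] and decides whether MSR[ME] may be set. It assumes MCSR[MCS] is MCSR bit 0 and passes ESR[MCI] in as a flag, because this section gives neither bit position. The function names are hypothetical, and the code models register values only, not how the registers are written.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask for IBM-numbered bits first..last of a 32-bit register. */
uint32_t ibm_mask(unsigned first, unsigned last) {
    uint32_t m = 0;
    for (unsigned b = first; b <= last; b++)
        m |= UINT32_C(1) << (31u - b);
    return m;
}

/* True if no machine check was latched since reset, so MSR[ME] may be set.
   Assumption: MCSR[MCS] is MCSR bit 0. */
bool machine_check_status_clear(bool esr_mci, uint32_t mcsr) {
    return !esr_mci && (mcsr & ibm_mask(0, 0)) == 0;
}

/* MCSR contents after software clears the detail bits MCSR[1:8]. */
uint32_t mcsr_with_detail_cleared(uint32_t mcsr) {
    return mcsr & ~ibm_mask(1, 8);
}
```

With this numbering, MCSR[1:8] corresponds to the mask 0x7F800000.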
16. Initialize any other processor core resources as required by the system (GPRs, SPRGs, and so on)
17. Initialize any other facilities outside the processor core as required by the system
18. Initialize system memory as required by the system software
Synchronize any program memory changes as required. (See Self-Modifying Code on page 106 for more
information on the instruction sequence necessary to synchronize changes to program memory prior to
executing the new instructions.)
19. Start the system software
System software is generally responsible for initializing and/or managing the rest of the MSR fields,
including:
1. MSR[FP] to enable or disable the execution of floating-point instructions
2. MSR[FE0,FE1] to enable/disable Floating-Point Enabled exception type Program interrupts
3. MSR[PR] to specify user mode or supervisor mode
4. MSR[IS,DS] to specify application address space or system address space for instructions and data
5. MSR[WE] to place the processor into Wait State (halt execution pending an interrupt)
4. Instruction and Data Caches
The PPC440x5 core provides separate instruction and data cache controllers and arrays, which allow concurrent access and minimize pipeline stalls. The storage capacity of the cache arrays, which can range from
8KB–32KB each, depends upon the implementation. Both cache controllers have 32-byte lines, and both are
highly associative, having 64-way set-associativity for 32KB and 16KB sizes, and 32-way set-associativity for
the 8KB size. The PowerPC instruction set provides a rich set of cache management instructions for software-enforced coherency. The PPC440x5 implementation also provides special debug instructions that can
directly read the tag and data arrays. The cache controllers interface to the processor local bus (PLB) for
connection to the IBM CoreConnect system-on-a-chip environment.
Both the data and instruction caches are parity protected against soft errors. If such errors are detected, the
CPU will vector to the machine check interrupt handler, where software can take appropriate action. The
details of suggested interrupt handling are described below in section 4.2, “Instruction Cache Controller,” and
in section 4.3, “Data Cache Controller.”
The rest of this chapter provides more detailed information about the operation of the instruction and data
cache controllers and arrays.
4.1 Cache Array Organization and Operation
The instruction and data cache arrays are organized identically, although the fields of the tag and data
portions of the arrays are slightly different because the functions of the arrays differ, and because the instruction cache is virtually tagged while the data cache has real tags.
The associativity of each cache varies according to its size: the 32KB and 16KB cache sizes are 64-way set-associative, while the 8KB cache size is 32-way set-associative. Accordingly, the number of “sets” in each
cache varies according to its size: the 32KB cache has 16 sets, while the 16KB and 8KB caches have 8 sets.
Regardless of cache array size, the cache line size is always 32 bytes.
The organization of the cache into “ways” and “sets” is as follows. Using the 32KB cache as an example,
there are 64 ways in each set, with a set consisting of all 64 lines (one line from each way) at which a given
memory location can reside. Conversely, and again using the 32KB cache as an example, there are 16 sets
in each way, with a way consisting of 16 lines (one from each set).
Table 4-1 on page 96 illustrates generically the ways and sets of the cache arrays, for any cache size, while
Table 4-2 on page 96 provides specific values for the parameters used in Table 4-1 for the different cache
sizes. As shown in Table 4-2, the tag field for each line in each way holds the high-order address bits associated
with the line that currently resides in that way. The middle-order address bits form an index to select a
specific set of the cache, while the five lowest-order address bits form a byte offset to choose a specific byte
(or bytes, depending on the size of the operation) from the 32-byte cache line.
Table 4-1. Instruction and Data Cache Array Organization
           Way 0       Way 1        ...   Way w – 2          Way w – 1
Set 0      Line 0      Line n       ...   Line (w – 2)n      Line (w – 1)n
Set 1      Line 1      Line n + 1   ...   Line (w – 2)n + 1  Line (w – 1)n + 1
...        ...         ...          ...   ...                ...
Set n – 2  Line n – 2  Line 2n – 2  ...   Line (w – 1)n – 2  Line wn – 2
Set n – 1  Line n – 1  Line 2n – 1  ...   Line (w – 1)n – 1  Line wn – 1
Table 4-2. Cache Sizes and Parameters
Array Size  w (Ways)  n (Sets)  Tag Address Bits (Note 1)  Set Address Bits  Byte Offset Address Bits
8KB         32        8         A[0:23]                    A[24:26]          A[27:31]
16KB        64        8         A[0:23]                    A[24:26]          A[27:31]
32KB        64        16        A[0:22]                    A[23:26]          A[27:31]
Note 1: The tag address bits shown in the table refer to the effective address bits,
and are for illustrative purposes only. Because the instruction cache is
tagged with the virtual address, and the data cache is tagged with the real
address, the actual tag address bits contained within each array are
different. See Figure 4-8 and Figure 4-9 on page 113 for instruction cache
tag information, and Figure 4-10 and Figure 4-11 on page 128 for data
cache tag information. Also, see “Instruction Cache Synonyms” on
page 107 for details on instruction cache synonyms associated with the use
of virtual tags for the instruction cache.
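The bit ranges in Table 4-2 translate directly into shifts and masks. The C sketch below splits a 32-bit address into tag, set, and byte offset for each cache size. As Note 1 explains, the tag here is illustrative only: the real instruction cache tag comes from the virtual address and the data cache tag from the real address. split_address and cache_sets are hypothetical helpers.

```c
#include <stdint.h>

typedef struct {
    uint32_t tag;     /* A[0:23] (8KB/16KB) or A[0:22] (32KB) */
    uint32_t set;     /* A[24:26] (8 sets) or A[23:26] (16 sets) */
    uint32_t offset;  /* A[27:31]: byte within the 32-byte line */
} cache_addr;

/* size_kb must be 8, 16, or 32. sets = size / (32-byte line * ways). */
unsigned cache_sets(unsigned size_kb) {
    unsigned ways = (size_kb == 8) ? 32u : 64u;
    return (size_kb * 1024u) / (32u * ways);
}

cache_addr split_address(uint32_t addr, unsigned size_kb) {
    unsigned sets = cache_sets(size_kb);         /* 8 or 16 */
    unsigned set_bits = (sets == 16u) ? 4u : 3u;
    cache_addr a;
    a.offset = addr & 0x1Fu;                     /* low 5 bits */
    a.set = (addr >> 5) & (sets - 1u);           /* next 3 or 4 bits */
    a.tag = addr >> (5u + set_bits);             /* remaining high bits */
    return a;
}
```

For instance, address 0x12345678 falls in set 3 with byte offset 0x18 in both the 16KB and 32KB configurations. Its tag, however, is 0x123456 in the 16KB cache and 0x91A2B in the 32KB cache, because the 32KB cache uses one more address bit for the set index.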
4.1.1 Cache Line Replacement Policy
Memory addresses are specified as being cacheable or caching inhibited on a page basis, using the caching
inhibited (I) storage attribute (see Caching Inhibited (I) on page 145). When a program references a cacheable memory location and that location is not already in the cache (a cache miss), the line may be brought
into the cache (a cache line fill operation) and placed into any one of the ways within the set selected by the
middle portion of the address (the specific address bits that select the set are specified in Table 4-2). If the
particular way within the set already contains a valid line from some other address, the existing line is
removed and replaced by the newly referenced line from memory. The line being replaced is referred to as
the victim.
The way selected to be the victim for replacement is controlled by a field within a Special Purpose Register
(SPR). There is a separate “victim index field” for each set within the cache. The registers controlling the
victim selection are shown in Figure 4-1.
Figure 4-1. Instruction Cache Normal Victim Registers (INV0–INV3), Instruction Cache Transient Victim
Registers (ITV0–ITV3), Data Cache Normal Victim Registers (DNV0–DNV3), and Data Cache Transient
Victim Registers (DTV0–DTV3)
0:7     VNDXA   Victim Index A (for cache lines with EA[25:26] = 0b00)
8:15    VNDXB   Victim Index B (for cache lines with EA[25:26] = 0b01)
16:23   VNDXC   Victim Index C (for cache lines with EA[25:26] = 0b10)
24:31   VNDXD   Victim Index D (for cache lines with EA[25:26] = 0b11)
For all victim index fields, the number of bits used to select the cache way for replacement depends
on the implemented cache size. See Table 4-3 on page 98 for more information.
Each of the 16 SPRs illustrated in Figure 4-1 can be written from a GPR using mtspr, and can be read into a
GPR using mfspr. In general, however, these registers are initialized by software once at startup, and then
are managed automatically by hardware after that. Specifically, every time a new cache line is placed into the
cache, the appropriate victim index field (as controlled by the type of access and the particular cache set
being updated) is first referenced to determine which way within that set should be replaced. Then, that same
field is incremented such that the ways within that set are replaced in a round-robin fashion as each new line
is brought into that set. When the victim index field value reaches the index of the last way (according to the
size of the cache and the type of access being performed), the value is wrapped back to the index of the first
way for that type of access. The first and last ways for the different types of accesses are controlled by fields
in a pair of victim limit SPRs, one for each cache (see Cache Locking and Transient Mechanism on page 99
for more information).
The size of the victim index fields varies according to the size of the respective cache. Also, which field is
used varies according to the type of access, the size of the cache, and the address of the cache line.
Table 4-3 describes the correlation between the victim index fields and different access types, cache sizes,
and addresses.
Note 1: In the victim index field columns, the “xx” in the SPR name refers
to one of “IN”, “IT”, “DN”, or “DT”, depending on whether the
access is to the instruction or data cache, and whether it is a
“normal” or a “transient” access (see Cache Locking and Transient Mechanism on page 99).
Note 2: As shown in the table, the 8KB cache size only uses bits 3:7 of
the victim index fields to select a way, since there are only 32
ways. Similarly, the 16KB and 32KB cache sizes use bits 2:7 of
the victim index fields, since those cache sizes have 64 ways. In
all cases, the unused bits of the victim index fields are reserved.
The sizes of the fields of the victim limit registers (IVLIM, DVLIM)
are similarly affected by the number of ways in the cache (see
Cache Locking and Transient Mechanism on page 99).
Note 3: Since the 8KB and 16KB cache sizes only have 8 sets, they only
use address bits A[24:26] to select the set and the victim index field, and
thus they do not use the xxV2 and xxV3 SPRs.
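Combining Figure 4-1 with these notes gives a way to locate the victim index field for a given set, as in the C sketch below. Two parts of it are inferences rather than statements from this section. The first is the choice of SPR (xxV0–xxV3) from the set number's upper bits: A[24] for the 8-set caches, per Note 3, and, by assumption, A[23:24] for the 16-set 32KB cache. The second is the way mask, which follows Note 2. victim_field_for_set and victim_way are hypothetical names.

```c
#include <stdint.h>

typedef struct {
    unsigned reg;    /* 0..3 selects xxV0..xxV3 */
    unsigned field;  /* 0..3 selects VNDXA..VNDXD, i.e. EA[25:26] */
} victim_field;

/* set = A[24:26] (8 sets) or A[23:26] (16 sets). */
victim_field victim_field_for_set(unsigned set) {
    victim_field f;
    f.field = set & 3u;  /* EA[25:26] */
    f.reg = set >> 2;    /* EA[24], or EA[23:24] (assumed) for 16 sets */
    return f;
}

/* Way number held in one victim index field of an SPR value. Field k
   occupies IBM bits 8k:8k+7; only bits 3:7 (32 ways) or 2:7 (64 ways)
   of that byte are used, and the rest are reserved. */
unsigned victim_way(uint32_t spr_value, unsigned field, unsigned size_kb) {
    unsigned byte = (spr_value >> (24u - 8u * field)) & 0xFFu;
    unsigned mask = (size_kb == 8) ? 0x1Fu : 0x3Fu;
    return byte & mask;
}
```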
4.1.2 Cache Locking and Transient Mechanism
Both caches support locking, at a “way” granularity. Any number of ways can be locked, from 0 ways to one
less than the total number of ways (64 ways for 32KB and 16KB cache sizes, 32 ways for the 8KB cache
size). At least one way must always be left unlocked, for use by cacheable line fills. Each way contains one
line from each set; that is, either 16 lines (512 bytes), for the 32KB cache size, or 8 lines (256 bytes), for the
16KB and 8KB cache sizes.
In addition, a portion of each cache can be designated as a “transient” region, by specifying that only a limited
number of ways are used for cache lines from memory pages that are identified as being transient in nature
by a storage attribute from the MMU (see Memory Management on page 133). For the instruction cache,
such memory pages can be used for code sequences that are unlikely to be reused once the processor
moves on to the next series of instruction lines. Thus, performance may be improved by preventing each
series of instruction lines from overwriting the rest of the “regular” code in the instruction cache. Similarly, for
the data cache, transient pages can be used for large “streaming” data structures, such as multimedia data.
As each piece of the data stream is processed and written back to memory, the next piece can be brought in,
overwriting the previous (now obsolete) cache lines instead of displacing other areas of the cache, which may
contain other data that should remain in the cache.
A set of fields in a pair of victim limit registers specifies which ways of the cache are used for normal
accesses and/or transient accesses, as well as which ways are locked. These registers, Instruction Cache
Victim Limit (IVLIM) and Data Cache Victim Limit (DVLIM), are illustrated in Figure 4-2. They can be written
from a GPR using mtspr, and can be read into a GPR using mfspr.
TFLOOR     The number of bits in this field varies, depending on the implemented cache size.
TCEILING   The number of bits in this field varies, depending on the implemented cache size.
NFLOOR     The number of bits in this field varies, depending on the implemented cache size.
See Table 4-3 on page 98 for more information.
When a cache line fill occurs as the result of a normal memory access (that is, one not marked as transient
using the U1 storage attribute from the MMU; see Memory Management on page 133), the cache line to be
replaced is selected by the corresponding victim index field from one of the normal victim index registers
(INV0–INV3 for instruction cache lines, DNV0–DNV3 for data cache lines). As the processor increments any
of these normal victim index fields according to the round-robin mechanism described in Cache Line
Replacement Policy on page 96, the values of the fields are constrained to lie within the range specified by
the NFLOOR field of the corresponding victim limit register, and the last way of the cache (way 31 for the 8KB
cache size, way 63 for the 16KB or 32KB cache size). That is, when one of the normal victim index fields is
incremented past the last way of the cache, it wraps back to the value of the NFLOOR field of the associated
victim limit register.
Similarly, when a cache line fill occurs as the result of a transient memory access, the cache line to be
replaced is selected by the corresponding victim index field from one of the transient victim index registers
(ITV0–ITV3 for instruction cache lines, DTV0–DTV3 for data cache lines). As the processor increments any of
these transient victim index fields according to the round-robin replacement mechanism, the values of the
fields are constrained to lie within the range specified by the TFLOOR and the TCEILING fields of the corresponding victim limit register. That is, when one of the transient victim index fields is incremented past the
TCEILING value of the associated victim limit register, it wraps back to the value of the TFLOOR field of that
victim limit register.
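The wrap-around rules for the normal and transient victim index fields can be modeled as in the C sketch below. It is a software model of the update the hardware performs after each line fill, not code that software must run. victim_limits and the function names are hypothetical.

```c
typedef struct {
    unsigned nfloor;
    unsigned tfloor;
    unsigned tceiling;
} victim_limits;

/* Last way: 31 for the 8KB cache, 63 for the 16KB and 32KB caches. */
unsigned last_way(unsigned size_kb) { return (size_kb == 8) ? 31u : 63u; }

/* Normal fills cycle through NFLOOR..last way. */
unsigned next_normal_victim(unsigned cur, const victim_limits *lim,
                            unsigned size_kb) {
    return (cur >= last_way(size_kb)) ? lim->nfloor : cur + 1u;
}

/* Transient fills cycle through TFLOOR..TCEILING. */
unsigned next_transient_victim(unsigned cur, const victim_limits *lim) {
    return (cur >= lim->tceiling) ? lim->tfloor : cur + 1u;
}
```

With NFLOOR = 4, TFLOOR = 2, and TCEILING = 10, a normal index at way 63 of a 64-way cache wraps to 4, and a transient index at way 10 wraps to 2.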
Given the operation of this mechanism, if both the NFLOOR and TFLOOR fields are set to 0, and the
TCEILING is set to the index of the last way of the cache, then all cache line fills—both normal and transient—are permitted to use the entire cache, and nothing is locked. Alternatively, if both the NFLOOR and
TFLOOR fields are set to values greater than 0, the lines in the ways whose indexes run from 0 up to one less
than the lower of the two floor values are effectively locked, as no cache line fills (neither normal
nor transient) will be allowed to replace the lines in those ways. Yet another example is when the TFLOOR is
lower than the NFLOOR, and the TCEILING lies between the NFLOOR and the last way of the cache. In this scenario, the ways
from the TFLOOR up to (but not including) the NFLOOR contain only transient lines, the ways from the NFLOOR
through the TCEILING may contain either normal or transient lines, the ways above the TCEILING through the last
way of the cache contain only normal lines, and any ways below the TFLOOR are locked.
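All three scenarios above follow from one rule. A way can receive normal fills if its index is at least NFLOOR, and it can receive transient fills if its index lies from TFLOOR through TCEILING. A way that satisfies neither condition receives no fills and is therefore effectively locked. The C classification sketch below applies that rule. The names are hypothetical, and the way index is assumed not to exceed the last way of the cache.

```c
typedef enum {
    WAY_LOCKED,          /* no fills of either kind */
    WAY_TRANSIENT_ONLY,  /* TFLOOR <= way <= TCEILING, way < NFLOOR */
    WAY_SHARED,          /* both normal and transient fills */
    WAY_NORMAL_ONLY      /* way >= NFLOOR, outside the transient range */
} way_use;

way_use classify_way(unsigned way, unsigned nfloor, unsigned tfloor,
                     unsigned tceiling) {
    int normal = way >= nfloor;                      /* NFLOOR..last way */
    int transient = way >= tfloor && way <= tceiling;
    if (normal && transient) return WAY_SHARED;
    if (normal) return WAY_NORMAL_ONLY;
    if (transient) return WAY_TRANSIENT_ONLY;
    return WAY_LOCKED;
}
```

For a 64-way cache with TFLOOR = 8, NFLOOR = 16, and TCEILING = 40, ways 0 to 7 are locked, ways 8 to 15 are transient-only, ways 16 to 40 are shared, and ways 41 to 63 are normal-only.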
Programming Note: It is a programming error for software to program the TCEILING field to a
value lower than that of the TFLOOR field. Furthermore, software must
initialize each of the normal and transient victim index fields to values that
are within the ranges designated by the respective victim limit fields,
prior to performing any cacheable accesses intended to utilize these
ranges.
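The C sketch below is a check in the spirit of this programming note. It also enforces the earlier rule that at least one way must stay unlocked, which holds as long as NFLOOR does not exceed the last way. It is a hypothetical helper that software could run on its intended IVLIM/DVLIM and victim index values before writing them. It does not read any hardware registers.

```c
#include <stdbool.h>

/* normal_idx / transient_idx: intended values of the normal and transient
   victim index fields (nfields entries each). */
bool victim_config_valid(unsigned nfloor, unsigned tfloor, unsigned tceiling,
                         unsigned last_way, const unsigned *normal_idx,
                         const unsigned *transient_idx, unsigned nfields) {
    if (tceiling < tfloor)   /* programming error per the note above */
        return false;
    if (tceiling > last_way || nfloor > last_way)
        return false;        /* at least one normal way must remain */
    for (unsigned i = 0; i < nfields; i++) {
        if (normal_idx[i] < nfloor || normal_idx[i] > last_way)
            return false;
        if (transient_idx[i] < tfloor || transient_idx[i] > tceiling)
            return false;
    }
    return true;
}
```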
In order to setup a locked area within the data cache, software must perform the following steps (the procedure for the instruction cache is similar, with icbt instructions substituting for dcbt instructions):
1. Execute msync and then isync to guarantee that all previous cache operations have completed.
2. Mark all TLB entries associated with memory pages which are being used to perform the locking function
as caching-inhibited. Leave the TLB entries associated with the memory pages containing the data which
is to be locked into the data cache marked as cacheable, however.
3. Execute msync and then isync again, to cause the new TLB entry values to take effect.
4. Set both the NFLOOR and the TFLOOR values to the index of the first way which should be locked, and
set the TCEILING value to the last way of the cache.
5. Set each of the normal and transient victim index fields to the same value as the NFLOOR and TFLOOR.
6. Execute dcbt instructions to the cache lines within the cacheable memory pages which contain the data
which is to be locked in the data cache. The number of dcbt instructions executed to any given set should
not exceed the number of ways which will exist in the locked region (otherwise not all of the lines will be
able to be simultaneously locked in the data cache). Remember that when a series of dcbt instructions
are executed to sequentially increasing addresses (with the address increment being the size of a cache