The Xilinx logo shown above is a registered trademark of Xilinx, Inc.
The shadow X shown above is a trademark of Xilinx, Inc.
"Xilinx" and the Xilinx logo are registered trademarks of Xilinx, Inc. Any rights not expressly granted herein are reserved.
CoolRunner, RocketChips, Rocket IP, Spartan, StateBENCH, StateCAD, Virtex, XACT, XC2064, XC3090, XC4005, XC5210 are registered
Trademarks of Xilinx, Inc.
ACE Controller, ACE Flash, A.K.A. Speed, Alliance Series, AllianceCORE, Bencher, ChipScope, Configurable Logic Cell, CORE Generator,
CoreLINX, Dual Block, EZTag, Fast CLK, Fast CONNECT, Fast FLASH, FastMap, Fast Zero Power, Foundation, Gigabit Speeds...and Beyond!,
HardWire, HDL Bencher, IRL, J Drive, JBits, LCA, LogiBLOX, Logic Cell, LogiCORE, LogicProfessor, MicroBlaze, MicroVia, MultiLINX, NanoBlaze, PicoBlaze, PLUSASM, PowerGuide, PowerMaze, QPro, Real-PCI, Rocket I/O, SelectI/O, SelectRAM, SelectRAM+, Silicon Xpresso,
Smartguide, Smart-IP, SmartSearch, SMARTswitch, System ACE, Testbench In A Minute, TrueMap, UIM, VectorMaze, VersaBlock, VersaRing,
Virtex-II Pro, Wave Table, WebFITTER, WebPACK, WebPOWERED, XABEL, XACT-Floorplanner, XACT-Performance, XACTstep Advanced,
XACTstep Foundry, XAM, XAPP, X-BLOX +, XC designated products, XChecker, XDM, XEPLD, Xilinx Foundation Series, Xilinx XDTV, Xinfo,
XSI, XtremeDSP and ZERO+ are trademarks of Xilinx, Inc.
The Programmable Logic Company is a service mark of Xilinx, Inc.
The following are trademarks of International Business Machines Corporation in the United States, or other countries, or both:
IBM IBM Logo PowerPC PowerPC Logo Blue Logic CoreConnect CodePack
All other trademarks are the property of their respective owners.
Xilinx does not assume any liability arising out of the application or use of any product described or shown herein; nor does it convey any license
under its patents, copyrights, or maskwork rights or any rights of others. Xilinx reserves the right to make changes, at any time, in order to
improve reliability, function or design and to supply the best product possible. Xilinx will not assume responsibility for the use of any circuitry
described herein other than circuitry entirely embodied in its products. Xilinx provides any design, code, or information shown or described herein
"as is." By providing the design, code, or information as one possible implementation of a feature, application, or standard, Xilinx makes no representation that such implementation is free from any claims of infringement. You are responsible for obtaining any rights you may require for your
implementation. Xilinx expressly disclaims any warranty whatsoever with respect to the adequacy of any such implementation, including but not
limited to any warranties or representations that the implementation is free from claims of infringement, as well as any implied warranties of merchantability or fitness for a particular purpose. Xilinx assumes no obligation to correct any errors contained herein or to advise any user of this
text of any correction if such be made. Xilinx will not assume any liability for the accuracy or correctness of any engineering or software support
or assistance provided to a user.
Xilinx products are not intended for use in life support appliances, devices, or systems. Use of a Xilinx product in such applications without the
written consent of the appropriate Xilinx officer is prohibited.
March 2002 Release, www.xilinx.com, page 311, Virtex-II Pro™ Platform FPGA Documentation, 1-800-255-7778
PowerPC® 405D5 processor. It combines information from the
•IBM PowerPC Embedded Processors Application Note: Programming Model Differences
of the IBM PowerPC 400 Family and 600/700 Family Processors.
Document Organization
•Chapter 1, Introduction to the PPC405, provides a general understanding of the
PPC405 as an implementation of the PowerPC embedded-environment architecture.
This chapter also contains an overview of the features supported by the PPC405.
•Chapter 2, Operational Concepts, introduces the processor operating modes,
execution model, synchronization, operand conventions, and instruction conventions.
•Chapter 3, User Programming Model, describes the registers and instructions
available to application software.
•Chapter 4, PPC405 Privileged-Mode Programming Model, introduces the registers
and instructions available to system software.
•Chapter 5, Memory-System Management, describes the operation of the memory
system, including caches. Real-mode storage control is also described in this chapter.
•Chapter 6 describes address translation as supported by the PPC405. Virtual-mode
storage control is also described in this chapter.
•Chapter 7, Exceptions and Interrupts, provides details of all exceptions recognized by
the PPC405 and how software can use the interrupt mechanism to handle exceptions.
•Chapter 8, Timer Resources, describes the timer registers and timer-interrupt controls
available in the PPC405.
•Chapter 9, Debugging, describes the debug resources available to software and
hardware debuggers.
•Chapter 10, Reset and Initialization, describes the state of the PPC405 following reset
and the requirements for initializing the processor.
•Chapter 11, Instruction Set, provides a detailed description of each instruction
supported by the PPC405.
•Appendix A, Register Summary, is a reference of all registers supported by the
PPC405.
•Appendix B, Instruction Summary, lists all instructions sorted by mnemonic, opcode,
function, and form. Each entry for an instruction shows its complete encoding.
General instruction-set information is also provided.
•Appendix C, Simplified Mnemonics, lists the simplified mnemonics recognized by
many PowerPC assemblers. These mnemonics provide a shorthand means of
specifying frequently-used instruction encodings and can greatly improve assembler
code readability.
•Appendix D, Programming Considerations, provides information on improving
performance of software written for the PPC405.
•Appendix E, PowerPC® 6xx/7xx Compatibility, describes the programming model
differences between the PPC405 and PowerPC 6xx and 7xx series processors.
•Appendix F, PowerPC® Book-E Compatibility, describes the programming model
differences between the PPC405 and PowerPC Book-E processors.
Document Conventions
General Conventions
Table P-1 lists the general notational conventions used throughout this document.
Table P-1: General Notational Conventions
Convention                      Definition
mnemonic                        Instruction mnemonics are shown in lower-case bold.
. (period)                      Update. When used as a character in an instruction
                                mnemonic, a period (.) means that the instruction
                                updates the condition-register field.
! (exclamation)                 In instruction listings, an exclamation (!) indicates the
                                start of a comment.
variable                        Variable items are shown in italic.
<optional>                      Optional items are shown in angle brackets.
ActiveLow                       An overbar indicates an active-low signal.
n                               A decimal number.
0xn                             A hexadecimal number.
0bn                             A binary number.
(rn)                            The contents of GPR rn.
(rA|0)                          The contents of the register rA, or 0 if the rA instruction
                                field is 0.
cr_bit                          Used in simplified mnemonics to specify a CR-bit
                                position (0 to 31) used as an operand.
cr_field                        Used in simplified mnemonics to specify a CR field
                                (0 to 7) used as an operand.
OBJECT_b                        A single bit in any object (a register, an instruction, an
                                address, or a field) is shown as a subscripted number or
                                name.
OBJECT_b:b                      A range of bits in any object (a register, an instruction,
                                an address, or a field).
OBJECT_b,b, . . .               A list of bits in any object (a register, an instruction, an
                                address, or a field).
REGISTER[FIELD]                 Fields within any register are shown in square brackets.
REGISTER[FIELD, FIELD . . .]    A list of fields in any register.
REGISTER[FIELD:FIELD]           A range of fields in any register.
Instruction Fields
Table P-2 lists the instruction fields used in the various instruction formats. They are found in
the instruction encodings and pseudocode, and are referred to throughout this document
when describing instructions. The table includes the bit locations for the field within the
instruction encoding.
Table P-2: Instruction Field Definitions
Field   Location  Description
AA      30        Absolute-address bit (branch instructions).
                  0: The immediate field represents an address relative to the
                  current instruction address (CIA). The effective address (EA) of
                  the branch is either the sum of the LI field sign-extended to 32
                  bits and the branch instruction address, or the sum of the BD
                  field sign-extended to 32 bits and the branch instruction address.
                  1: The immediate field represents an absolute address. The EA of
                  the branch is either the LI field or the BD field, sign-extended to
                  32 bits.
BD      16:29     An immediate field specifying a 14-bit signed two’s-complement
                  branch displacement. This field is concatenated on the right with
                  0b00 and sign-extended to 32 bits.
BI      11:15     Specifies a bit in the CR used as a source for the condition of a
                  conditional-branch instruction.
BO      6:10      Specifies options for conditional-branch instructions. See
                  Conditional Branch Control, page 367.
crbA    11:15     Specifies a bit in the CR used as a source of a CR-logical instruction.
crbB    16:20     Specifies a bit in the CR used as a source of a CR-logical instruction.
crbD    6:10      Specifies a bit in the CR used as a destination of a CR-logical
                  instruction.
crfD    6:8       Specifies a field in the CR used as a target in a compare or mcrf
                  instruction.
crfS    11:13     Specifies a field in the CR used as a source in an mcrf instruction.
CRM     12:19     The field mask used to identify CR fields to be updated by the
                  mtcrf instruction.
d       16:31     Specifies a 16-bit signed two’s-complement integer displacement
                  for load/store instructions.
DCRF    11:20     A split field used to specify a device control register (DCR). The
                  field is used to form the DCR number (DCRN).
E       16        A single-bit immediate field in the wrteei instruction specifying the
                  value to be written to the MSR[EE] bit.
LI      6:29      An immediate field specifying a 24-bit signed two’s-complement
                  branch displacement. This field is concatenated on the right with
                  0b00 and sign-extended to 32 bits.
LK      31        Link bit.
                  0: Do not update the link register (LR).
                  1: Update the LR with the address of the next instruction.
MB      21:25     Mask begin. Used in rotate-and-mask instructions to specify the
                  beginning bit of a mask.
ME      26:30     Mask end. Used in rotate-and-mask instructions to specify the
                  ending bit of a mask.
NB      16:20     Specifies the number of bytes to move in an immediate-string load
                  or immediate-string store.
OE      21        Enables setting the OV and SO fields in the fixed-point exception
                  register (XER) for extended arithmetic.
OPCD    0:5       Primary opcode. Primary opcodes, in decimal, appear in the
                  instruction format diagrams presented with individual
                  instructions. The OPCD field name does not appear in instruction
                  descriptions.
rA      11:15     Specifies a GPR source operand and/or destination operand.
rB      16:20     Specifies a GPR source operand.
Rc      31        Record bit.
                  0: Instruction does not update the CR.
                  1: Instruction updates the CR to reflect the result of an
                  operation.
                  See Condition Register (CR), page 361 for a further discussion of
                  how the CR bits are set.
SIMM    16:31     An immediate field used to specify a 16-bit signed-integer value.
SPRF    11:20     A split field used to specify a special purpose register (SPR). The
                  field is used to form the SPR number (SPRN).
TBRF    11:20     A split field used to specify a time-base register (TBR). The field is
                  used to form the TBR number (TBRN).
TO      6:10      Specifies the trap conditions, as defined in the tw and twi
                  instruction descriptions.
UIMM    16:31     An immediate field used to specify a 16-bit unsigned-integer value.
XO      21:30     Extended opcode for instructions without an OE field. Extended
                  opcodes, in decimal, appear in the instruction format diagrams
                  presented with individual instructions. The XO field name does
                  not appear in instruction descriptions.
XO      22:30     Extended opcode for instructions with an OE field. Extended
                  opcodes, in decimal, appear in the instruction format diagrams
                  presented with individual instructions. The XO field name does
                  not appear in instruction descriptions.
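The bit locations in Table P-2 count from bit 0 at the most-significant end of the 32-bit instruction word. The following Python sketch is illustrative only and is not part of the original manual: the helper names field, exts, and branch_target are hypothetical, and the sample words are hand-assembled encodings (primary opcode 14 with rA = 4 and SIMM = 100, and primary opcode 18 branches) chosen for the example. It shows one way to extract a field by its location and to form a branch EA from the LI and AA fields as the table describes.

```python
MASK32 = 0xFFFFFFFF

def field(word, first, last):
    """Extract bits first..last of a 32-bit word, where bit 0 is the
    most-significant bit (the numbering used in Table P-2)."""
    width = last - first + 1
    shift = 31 - last
    return (word >> shift) & ((1 << width) - 1)

def exts(value, bits):
    """Sign-extend a bits-wide two's-complement value to a Python int."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

def branch_target(word, cia):
    """EA of a branch that uses the LI field (hypothetical helper).
    LI is concatenated on the right with 0b00 and sign-extended;
    AA selects a CIA-relative (0) or absolute (1) address."""
    li = field(word, 6, 29)
    displacement = exts(li << 2, 26)
    base = 0 if field(word, 30, 30) else cia
    return (base + displacement) & MASK32

# Hand-assembled sample: OPCD = 14, rA = 4, SIMM = 100.
sample = 0x38640064
opcd = field(sample, 0, 5)
ra = field(sample, 11, 15)
simm = exts(field(sample, 16, 31), 16)

# Hand-assembled branches with OPCD = 18.
forward_with_link = branch_target(0x48000101, 0x1000)  # AA = 0, LK = 1
backward = branch_target(0x4BFFFFFC, 0x1000)           # displacement of -4
absolute = branch_target(0x48000102, 0x1000)           # AA = 1
```

Counting from the most-significant bit is why the shift amount is 31 minus the last bit number, rather than the first bit number as it would be with least-significant-bit-0 numbering.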
Pseudocode Conventions
Table P-3 lists additional conventions used primarily in the pseudocode.
Table P-3: Pseudocode Conventions
Convention          Definition
%                   Remainder of an integer division. For example, (33 % 32) = 1.
||                  Concatenation
=, ≠                Equal, not-equal relations
<, >                Signed comparison relations
<u, >u              Unsigned comparison relations (the u is written as a superscript)
c0:3                A four-bit object used to store condition results in compare
                    instructions.
ⁿb                  The bit or bit value b is replicated n times.
x                   Bit positions that are don’t-cares.
CEIL(n)             Least integer ≥ n.
CIA                 Current instruction address. The 32-bit address of the instruction
                    being described by a sequence of pseudocode. This address is
                    used to set the next instruction address (NIA). Does not
                    correspond to any architected register.
DCR(DCRN)           A specific device control register, as indicated by DCRN.
DCRN                The device control register number formed using the split DCRF
                    field in a mfdcr or mtdcr instruction.
do                  Do loop. “to” and “by” clauses specify incrementing an iteration
                    variable. “while” and “until” clauses specify terminating
                    conditions. Indenting indicates the scope of a loop.
EA                  Effective address. The 32-bit address that specifies a location in
                    main storage. Derived by applying indexing or indirect
                    addressing rules to the specified operand.
EXTS(n)             The result of extending n on the left with sign bits.
if...then...else... Conditional execution: if condition then a else b, where a and b
                    represent one or more pseudocode statements. Indenting
                    indicates the ranges of a and b. If b is null, the else does not
                    appear.
instruction(EA)     An instruction operating on a data-cache block or instruction-
                    cache block associated with an EA.
leave               Leave innermost do-loop or the do-loop specified by the leave
                    statement.
MASK(MB,ME)         Mask having 1’s in positions MB through ME (wrapping if
                    MB > ME) and 0’s elsewhere.
MS(addr, n)         The number of bytes represented by n at the location in main
                    storage represented by addr.
NIA                 Next instruction address. The 32-bit address of the next
                    instruction to be executed. In pseudocode, a successful branch is
                    indicated by assigning a value to NIA. For instructions that do
                    not branch, the NIA is CIA + 4.
RESERVE             Reserve bit. Indicates whether a process has reserved a block of
                    storage.
ROTL((RS),n)        Rotate left. The contents of RS are shifted left the number of bits
                    specified by n.
SPR(SPRN)           A specific special-purpose register, as indicated by SPRN.
SPRN                The special-purpose register number formed using the split
                    SPRF field in a mfspr or mtspr instruction.
TBR(TBRN)           A specific time-base register, as indicated by TBRN.
TBRN                The time-base register number formed using the split TBRF field
                    in a mftb instruction.
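Several of the pseudocode conventions above map directly onto small integer helpers. The Python sketch below is a minimal illustration under stated assumptions, not text from the original manual: the function names (rotl32, mask, exts32, replicate, concat) are hypothetical, values are modeled as 32-bit unsigned integers, and bit 0 is taken to be the most-significant bit, as in the instruction-field locations.

```python
MASK32 = 0xFFFFFFFF

def rotl32(value, n):
    """ROTL: rotate a 32-bit value left by n bits; bits shifted out on
    the left re-enter on the right."""
    n %= 32
    value &= MASK32
    return ((value << n) | (value >> (32 - n))) & MASK32

def mask(mb, me):
    """MASK(MB,ME): 1's in bit positions mb..me (bit 0 is the
    most-significant bit), wrapping around when mb > me."""
    from_mb = MASK32 >> mb                  # 1's in bits mb..31
    up_to_me = (MASK32 << (31 - me)) & MASK32  # 1's in bits 0..me
    return (from_mb & up_to_me) if mb <= me else (from_mb | up_to_me)

def exts32(value, bits):
    """EXTS: extend a bits-wide value on the left with copies of its
    sign bit, giving a 32-bit result."""
    sign = 1 << (bits - 1)
    return ((value ^ sign) - sign) & MASK32

def replicate(bit, n):
    """The bit value replicated n times (written as a superscript n
    before b in the pseudocode)."""
    return (1 << n) - 1 if bit else 0

def concat(left, right, right_bits):
    """The || operator: left followed by a right_bits-wide right."""
    return (left << right_bits) | right

# A rotate-and-mask step: rotate left by 8, keep bit positions 24..31.
top_byte = rotl32(0x12345678, 8) & mask(24, 31)
```

Combining rotl32 with mask in this way mirrors how the MB and ME fields are described for rotate-and-mask instructions; the wrapped case, such as mask(28, 3), selects bits at both ends of the word.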
Operator Precedence
Table P-4 lists the pseudocode operators and their associativity in descending order of
precedence:
Table P-4: Operator Precedence
Operators                                           Associativity
REGISTER_b, REGISTER[FIELD], function evaluation    Left to right
ⁿb                                                  Right to left
¬, – (unary minus)                                  Right to left
×, ÷                                                Left to right
+, –                                                Left to right
||                                                  Left to right
=, ≠, <, >, <u, >u                                  Left to right
∧, ⊕                                                Left to right
∨                                                   Left to right
←                                                   None
Registers
Table P-5 lists the PPC405 registers and their descriptive names.
Table P-5: PPC405 Registers
Register  Descriptive Name
CCR0      Core-configuration register 0
CR        Condition register
CTR       Count register
DACn      Data-address compare n
DBCRn     Debug-control register n
DBSR      Debug-status register
DCCR      Data-cache cacheability register
DCWR      Data-cache write-through register
DEAR      Data-error address register
DVCn      Data-value compare n
ESR       Exception-syndrome register
EVPR      Exception-vector prefix register
GPR       General-purpose register. Specific GPRs are identified using the
          notational convention rn (see below)
IACn      Instruction-address compare n
ICCR      Instruction-cache cacheability register
ICDBDR    Instruction-cache debug-data register
LR        Link register
MSR       Machine-state register
PID       Process ID
PIT       Programmable-interval timer
PVR       Processor-version register
rn        Specifies GPR n (r15, for example)
SGR       Storage-guarded register
SLER      Storage little-endian register
SPRGn     SPR general-purpose register n
SRRn      Save/restore register n
SU0R      Storage user-defined 0 register
TBL       Time-base lower
TBU       Time-base upper
TCR       Timer-control register
TSR       Timer-status register
USPRGn    User SPR general-purpose register n
XER       Fixed-point exception register
ZPR       Zone-protection register
Terms
atomic access
A memory access that attempts to read from and write to the
same address uninterrupted by other accesses to that address.
The term refers to the fact that such transactions are indivisible.
big endian
A memory byte ordering where the address of an item
corresponds to the most-significant byte.
Book-E
A version of the PowerPC architecture designed specifically
for embedded applications.
cache block
Synonym for cacheline.
cacheline
A portion of a cache array that contains a copy of contiguous
system-memory addresses. Cachelines are 32 bytes long and
aligned on a 32-byte address.
clear
To write a bit value of 0.
cache set
Synonym for congruence class.
congruence class
A collection of cachelines with the same index.
dirty
An indication that cache information is more recent than the
copy in memory.
doubleword
Eight bytes, or 64 bits.
effective address
The untranslated memory address as seen by a program.
exception
An abnormal event or condition that requires the processor’s
attention. Exceptions can be caused by instruction execution or
an external device. The processor records the occurrence of an
exception, which often causes an interrupt to occur.
fill buffer
A buffer that receives and sends data and instructions between
the processor and PLB. It is used when cache misses occur and
when access to non-cacheable memory occurs.
flush
A cache or TLB operation that involves writing back a modified
entry to memory, followed by an invalidation of the entry.
GB
Gigabyte, or one-billion bytes.
halfword
Two bytes, or 16 bits.
hit
For cache arrays and TLB arrays, an indication that requested
information exists in the accessed array.
interrupt
The process of stopping the currently executing program so that
an exception can be handled.
invalidate
A cache or TLB operation that causes an entry to be marked as
invalid. An invalid entry can be subsequently replaced.
KB
Kilobyte, or one-thousand bytes.
line buffer
A buffer located in the cache array that can temporarily hold the
contents of an entire cacheline. It is loaded with the contents of
a cacheline when a cache hit occurs.
little endian
A memory byte ordering where the address of an item
corresponds to the least-significant byte.
logical address
Synonym for effective address.
MB
Megabyte, or one-million bytes.
memory
Collectively, cache memory and system memory.
miss
For cache arrays and TLB arrays, an indication that requested
information does not exist in the accessed array.
OEA
The PowerPC operating-environment architecture, which
defines the memory-management model, supervisor-level
registers and instructions, synchronization requirements, the
exception model, and the time-base resources as seen by
supervisor programs.
on chip
In system-on-chip implementations, this indicates on the same
chip as the processor core, but external to the processor core.
pending
As applied to interrupts, this indicates that an exception
occurred, but the interrupt is disabled. The interrupt occurs
when it is later enabled.
physical address
The address used to access physically-implemented memory.
This address can be translated from the effective address. When
address translation is not used, this address is equal to the
effective address.
PLB
Processor local bus.
privileged mode
The operating mode typically used by system software.
Privileged operations are allowed and software can access all
registers and memory.
process
A program (or portion of a program) and any data required for
the program to run.
problem state
Synonym for user mode.
real address
Synonym for physical address.
scalar
Individual data objects and instructions. Scalars are of arbitrary
size.
set
To write a bit value of 1.
sticky
A bit that can be set by software, but cleared only by the
processor. Alternatively, a bit that can be cleared by software,
but set only by the processor.
string
A sequence of consecutive bytes.
supervisor state
Synonym for privileged mode.
system memory
Physical memory installed in a computer system external to the
processor core, such as RAM, ROM, and flash.
tag
As applied to caches, a set of address bits used to uniquely
identify a specific cacheline within a congruence class. As
applied to TLBs, a set of address bits used to uniquely identify
a specific entry within the TLB.
UISA
The PowerPC user instruction-set architecture, which defines
the base user-level instruction set, registers, data types, the
memory model, the programming model, and the exception
model as seen by user programs.
user mode
The operating mode typically used by application software.
Privileged operations are not allowed in user mode, and
software can access a restricted set of registers and memory.
VEA
The PowerPC virtual-environment architecture, which defines
a multi-access memory model, the cache model, cache-control
instructions, and the time-base resources as seen by user
programs.
virtual address
An intermediate address used to translate an effective address
into a physical address. It consists of a process ID and the
effective address. It is only used when address translation is
enabled.
word
Four bytes, or 32 bits.
In addition to the source documents listed on page 311, the following documents contain
additional information of potential interest to readers of this manual:
•The PowerPC Architecture: A Specification for a New Family of RISC Processors, IBM
5/1994. Published by Morgan Kaufmann Publishers, Inc., San Francisco (ASIN:
1558603166).
•Book E: Enhanced PowerPC Architecture, IBM 3/2000.
•The PowerPC Compiler Writer’s Guide, IBM 1/1996. Published by Warthman Associates,
Palo Alto, CA (ISBN 0-9649654-0-2).
•Optimizing PowerPC Code: Programming the PowerPC Chip in Assembly Language, by
Gary Kacmarcik (ASIN: 0201408392).
•PowerPC Programming Pocket Book, by Steve Heath (ISBN 0750621117).
•Computer Architecture: A Quantitative Approach, by John L. Hennessy and David A.
Patterson.
Chapter 1
Introduction to the PPC405
The PPC405 is a 32-bit implementation of the PowerPC® embedded-environment architecture
that is derived from the PowerPC architecture. Specifically, the PPC405 is an embedded
PowerPC 405D5 processor core.
The PowerPC architecture provides a software model that ensures compatibility between
implementations of the PowerPC family of microprocessors. The PowerPC architecture
defines parameters that guarantee compatible processor implementations at the
application-program level, allowing broad flexibility in the development of derivative
PowerPC implementations that meet specific market requirements.
This chapter provides an overview of the PowerPC architecture and an introduction to the
features of the PPC405 core.
PowerPC Architecture Overview
The PowerPC architecture is a 64-bit architecture with a 32-bit subset. The material in this
document only covers aspects of the 32-bit architecture implemented by the PPC405.
In general, the PowerPC architecture defines the following:
•Instruction set
•Programming model
•Memory model
•Exception model
•Memory-management model
•Time-keeping model
Instruction Set
The instruction set specifies the types of instructions (such as load/store, integer arithmetic,
and branch instructions), the specific instructions, and the encoding used for the
instructions. The instruction set definition also specifies the addressing modes used for
accessing memory.
Programming Model
The programming model defines the register set and the memory conventions, including
details regarding the bit and byte ordering, and the conventions for how data are stored.
Memory Model
The memory model defines the address-space size and how it is subdivided into pages. It
also defines attributes for specifying memory-region cacheability, byte ordering (big-endian or little-endian), coherency, and protection.
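The two byte orderings named above differ only in which byte of a multi-byte item sits at the item's address, as the big endian and little endian entries in the Terms section describe. The following Python sketch is illustrative only; the function name store_word and the sample value are assumptions made for this example and do not come from the manual.

```python
def store_word(value, byte_order):
    """Return the four bytes of a 32-bit word in memory order, lowest
    address first. byte_order is "big" or "little"."""
    return value.to_bytes(4, byte_order)

word = 0x11223344

# Big endian: the lowest address holds the most-significant byte (0x11).
big = store_word(word, "big")

# Little endian: the lowest address holds the least-significant byte (0x44).
little = store_word(word, "little")

# Reading big-endian bytes with the wrong ordering reverses the bytes.
misread = int.from_bytes(big, "little")
```

The misread value shows why a memory region's byte-ordering attribute has to match the ordering that software and devices assume for data stored there.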
Exception Model
The exception model defines the set of exceptions and the conditions that can cause those
exceptions. The model specifies exception characteristics, such as whether they are precise
or imprecise, synchronous or asynchronous, and maskable or non-maskable. The model
defines the exception vectors and a set of registers used when interrupts occur as a result of
an exception. The model also provides memory space for implementation-specific
exceptions.
Memory-Management Model
The memory-management model defines how memory is partitioned, configured, and
protected. The model also specifies how memory translation is performed, defines special
memory-control instructions, and specifies other memory-management characteristics.
Time-Keeping Model
The time-keeping model defines resources that permit the time of day to be determined and
the resources and mechanisms required for supporting timer-related exceptions.
PowerPC Architecture Levels
These aspects of the PowerPC architecture are defined at three levels. This layering
provides flexibility by allowing degrees of software compatibility across a wide range of
implementations. For example, an implementation such as an embedded controller can
support the user instruction set, but not the memory management, exception, and cache
models where it might be impractical to do so.
The three levels of the PowerPC architecture are defined in Table 1-1.
Table 1-1: Three Levels of PowerPC Architecture
User Instruction-Set Architecture (UISA)
•Defines the architecture level to which user-level (sometimes referred to as problem
state) software should conform
•Defines the base user-level instruction set, user-level registers, data types,
floating-point memory conventions, exception model as seen by user programs,
memory model, and the programming model
Note: All PowerPC implementations adhere to the UISA.
Virtual Environment Architecture (VEA)
•Describes the memory model for an environment in which multiple devices can access
memory
•Defines aspects of the cache model and cache-control instructions
•Defines the time-base resources from a user-level perspective
Note: Implementations that conform to the VEA level are guaranteed to conform to the
UISA level.
Operating Environment Architecture (OEA)
•Defines supervisor-level resources typically required by an operating system
•Defines the memory-management model, supervisor-level registers, synchronization
requirements, and the exception model
•Defines the time-base resources from a supervisor-level perspective
Note: Implementations that conform to the OEA level are guaranteed to conform to the
UISA and VEA levels.
The PowerPC architecture requires that all PowerPC implementations adhere to the UISA,
offering compatibility among all PowerPC application programs. However, different
versions of the VEA and OEA are permitted.
Embedded applications written for the PPC405 are compatible with other PowerPC
implementations. Privileged software generally is not compatible. The migration of
privileged software from the PowerPC architecture to the PPC405 is in many cases
straightforward because of the simplifications made by the PowerPC embedded-environment
architecture. Software developers who are concerned with cross-compatibility of
privileged software between the PPC405 and other PowerPC implementations should refer
to Appendix E, PowerPC® 6xx/7xx Compatibility.
Latitude Within the PowerPC Architecture Levels
Although the PowerPC architecture defines parameters necessary to ensure compatibility
among PowerPC processors, it also allows a wide range of options for individual
implementations. These options include the following:
•Some resources are optional, such as certain registers, bits within registers,
instructions, and exceptions.
•Implementations can define additional privileged special-purpose registers (SPRs),
exceptions, and instructions to meet special system requirements, such as power
management in processors designed for very low-power operation.
•Implementations can define many operating parameters. For example, the PowerPC
architecture can define the possible conditions that cause an alignment exception. A
particular implementation can choose to solve the alignment problem without
causing an exception.
•Processors can implement any architectural resource or instruction with assistance
from software (that is, they can trap and emulate) as long as the results (aside from
performance) are identical to those specified by the architecture. In this case, a
complete implementation requires both hardware and software.
•Some parameters are defined at one level of the architecture and defined more
specifically at another. For example, the UISA defines conditions that can cause an
alignment exception and the OEA specifies the exception itself.
Features Not Defined by the PowerPC Architecture
Because flexibility is an important feature of the PowerPC architecture, many aspects of
processor design (typically relating to the hardware implementation) are not defined,
including the following:
System-Bus Interface
Although many implementations can share similar interfaces, the PowerPC architecture
does not define individual signals or the bus protocol. For example, the OEA allows each
implementation to specify the signal or signals that trigger a machine-check exception.
Cache Design
The PowerPC architecture does not define the size, structure, replacement algorithm, or
mechanism used for maintaining cache coherency. The PowerPC architecture supports,
but does not require, the use of separate instruction and data caches.
Execution Units
The PowerPC architecture is a RISC architecture, and as such has been designed to
facilitate the design of processors that use pipelining and parallel execution units to
maximize instruction throughput. However, the PowerPC architecture does not define the
internal hardware details of an implementation. For example, one processor might
implement two units dedicated to executing integer-arithmetic instructions and another
might implement a single unit for executing all integer instructions.
Other Internal Microarchitecture Issues
The PowerPC architecture does not specify the execution unit responsible for executing a
particular instruction. The architecture does not define details regarding the instruction-fetch mechanism, how instructions are decoded and dispatched, and how results are
written to registers. Dispatch and write-back can occur in-order or out-of-order. Although
the architecture specifies certain registers, such as the GPRs and FPRs, implementations
can use register renaming or other schemes to reduce the impact of data dependencies and
register contention.
Implementation-Specific Registers
Each implementation can have its own unique set of implementation registers that are not
defined by the architecture.
PowerPC Embedded-Environment Architecture
The PowerPC embedded-environment architecture is optimized for embedded controllers.
This architecture is a forerunner to the PowerPC Book-E architecture. The PowerPC
embedded-environment architecture provides an alternative definition for certain features
specified by the PowerPC VEA and OIA. Implementations that adhere to the PowerPC
embedded-environment architecture also adhere to the PowerPC UISA. PowerPC
embedded-environment processors are 32-bit only implementations and thus do not
include the special 64-bit extensions to the PowerPC UISA. Also, floating-point support
can be provided either in hardware or software by PowerPC embedded-environment
processors.
Figure 1-1 shows the relationship between the PowerPC embedded-environment
architecture, the PowerPC architecture, and the PowerPC Book-E architecture.
[Figure omitted. Labels recoverable from the figure:
PowerPC Architecture: UISA; 32-Bit/64-Bit Modes; OEA (Hashed Paging; Segments, BATs).
PowerPC Embedded-Environment Architecture: 32-Bit Only; VEA Enhancements (True
Little-Endian Support, Enhanced Cache Management); OEA Enhancements (Simplified Memory
Management, Software-Managed TLB, Variable Page Sizes); Interrupt Extensions
(Critical/Non-Critical, Virtual-Memory Relocatable); Timer Extensions; Debug Extensions.
PowerPC Book-E Architecture: 64-Bit UISA Extensions; Synchronization Using Memory Barriers.]
Figure 1-1: Relationship of PowerPC Architectures
The PowerPC embedded-environment architecture features:
•Memory management optimized for embedded software environments.
•Cache-management instructions for optimizing performance and memory control in
complex applications that are graphically and numerically intensive.
•Storage attributes for controlling memory-system behavior.
•Special-purpose registers for controlling the use of debug resources, timer resources,
interrupts, real-mode storage attributes, memory-management facilities, and other
architected processor resources.
•A device-control-register address space for managing on-chip peripherals such as
memory controllers.
•A dual-level interrupt structure and interrupt-control instructions.
•Multiple timer resources.
•Debug resources that enable hardware-debug and software-debug functions such as
instruction breakpoints, data breakpoints, and program single-stepping.
Virtual Environment
The virtual environment defines architectural features that enable application programs to
create or modify code, to manage storage coherency, and to optimize memory-access
performance. It defines the cache and memory models, the timekeeping resources from a
user perspective, and resources that are accessible in user mode but are primarily used by
system-library routines. The following summarizes the virtual-environment features of the
PowerPC embedded-environment architecture:
•Storage model:
-Storage-control instructions as defined in the PowerPC virtual-environment
architecture. These instructions are used to manage instruction caches and data
caches, and for synchronizing and ordering instruction execution.
-Storage attributes for controlling memory-system behavior. These are: write-through,
cacheability, memory coherence (optional), guarded, and endian.
-Operand-placement requirements and their effect on performance.
•The time-base function as defined by the PowerPC virtual-environment architecture,
for user-mode read access to the 64-bit time base.
Operating Environment
The operating environment describes features of the architecture that enable operating
systems to allocate and manage storage, to handle errors encountered by application
programs, to support I/O devices, and to provide operating-system services. It specifies
the resources and mechanisms that require privileged access, including the memory-protection and address-translation mechanisms, the exception-handling model, and
privileged timer resources. Table 1-2 summarizes the operating-environment features of
the PowerPC embedded-environment architecture.
Table 1-2: Operating-Environment Features of the PowerPC Embedded-Environment Architecture
Operating Environment: Features
Register model:
•Privileged special-purpose registers (SPRs) and instructions for accessing those
registers
•Device control registers (DCRs) and instructions for accessing those registers
Storage model:
•Privileged cache-management instructions
•Storage-attribute controls
•Address translation and memory protection
•Privileged TLB-management instructions
Exception model:
•Dual-level interrupt structure supporting various exception types
•Specification of interrupt priorities and masking
•Privileged SPRs for controlling and handling exceptions
•Interrupt-control instructions
•Specification of how partially executed instructions are handled when an interrupt
occurs
Debug model:
•Privileged SPRs for controlling debug modes and debug events
•Specification for seven types of debug events
•Specification for allowing a debug event to cause a reset
•The ability of the debug mechanism to freeze the timer resources
Time-keeping model:
•64-bit time base
•32-bit decrementer (the programmable-interval timer)
•Three timer-event interrupts:
-Programmable-interval timer (PIT)
-Fixed-interval timer (FIT)
-Watchdog timer (WDT)
•Privileged SPRs for controlling the timer resources
•The ability to freeze the timer resources using the debug mechanism
Synchronization requirements:
•Requirements for special registers and the TLB
•Requirements for instruction fetch and for data access
•Specifications for context synchronization and execution synchronization
Reset and initialization requirements:
•Specification for two internal mechanisms that can cause a reset:
-Debug-control register (DBCR)
-Timer-control register (TCR)
•Contents of processor resources after a reset
•The software-initialization requirements, including an initialization code example
The PowerPC Book-E architecture extends the capabilities introduced in the PowerPC
embedded-environment architecture. Although the PPC405 is not a PowerPC Book-E
implementation, it provides many of the features available in the 32-bit subset of the
PowerPC Book-E architecture. The PowerPC Book-E architecture differs from the PowerPC
embedded-environment architecture in the following general ways:
•64-bit addressing and 64-bit operands are available. Unlike 64-bit mode in the
PowerPC UISA, 64-bit support in PowerPC Book-E architecture is non-modal and
instead defines new 64-bit instructions and flags.
•Real mode is eliminated, and the memory-management unit is active at all times. The
elimination of real mode results in the elimination of real-mode storage-attribute
registers.
•Memory synchronization requirements are changed in the architecture and a
memory-barrier instruction is introduced.
•A small number of new instructions are added to the architecture and several
instructions are removed.
•Several SPR addresses and names are changed in the architecture, as are the
assignment and meanings of some bits within certain SPRs.
Embedded applications written for the PPC405 are compatible with PowerPC Book-E
implementations. Privileged software is, in general, not compatible, but the differences are
relatively minor. Software developers who are concerned with cross-compatibility of
privileged software between the PPC405 and PowerPC Book-E implementations should
refer to Appendix F, PowerPC® Book-E Compatibility.
PPC405 Features
The PPC405 processor core is an implementation of the PowerPC embedded-environment
architecture. The processor provides fixed-point embedded applications with high
performance at low power consumption. It is compatible with the PowerPC UISA. Much
of the PPC405 VEA and OEA support is also available in implementations of the PowerPC
Book-E architecture. Key features of the PPC405 include:
•A fixed-point execution unit fully compliant with the PowerPC UISA:
watchdog timer (All are synchronous with the time base)
-Static branch prediction
-Five-stage pipeline with single-cycle execution of most instructions, including
loads and stores
-Multiply-accumulate instructions
-Hardware multiply/divide for faster integer arithmetic (4-cycle multiply, 35-cycle
divide)
-Enhanced string and multiple-word handling
-Support for unaligned loads and unaligned stores to cache arrays, main memory,
and on-chip memory (OCM)
-Minimized interrupt latency
•Integrated instruction-cache:
-16 KB, 2-way set associative
-Eight words (32 bytes) per cacheline
-Fetch line buffer
-Instruction-fetch hits are supplied from the fetch line buffer
-Programmable prefetch of next-sequential line into the fetch line buffer
-Programmable prefetch of non-cacheable instructions: full line (eight words) or
half line (four words)
-Non-blocking during fetch line fills
•Integrated data-cache:
-16 KB, 2-way set associative
-Eight words (32 bytes) per cacheline
-Read and write line buffers
-Load and store hits are supplied from/to the line buffers
-Write-back and write-through support
-Programmable load and store cacheline allocation
-Operand forwarding during cacheline fills
-Non-blocking during cacheline fills and flushes
•Support for on-chip memory (OCM) that can provide memory-access performance
identical to a cache hit
•Flexible memory management:
-Translation of the 4 GB logical-address space into the physical-address space
-Independent control over instruction translation and protection, and data
translation and protection
-Page-level access control using the translation mechanism
-Software control over the page-replacement strategy
-Write-through, cacheability, user-defined 0, guarded, and endian (WIU0GE)
storage-attribute control for each virtual-memory region
-WIU0GE storage-attribute control for thirty-two 128 MB regions in real mode
-Additional protection control using zones
•Enhanced debug support with logical operators:
-Four instruction-address compares
-Two data-address compares
-Two data-value compares
-JTAG instruction for writing into the instruction cache
-Forward and backward instruction tracing
•Advanced power management support
Privilege Modes
Software running on the PPC405 executes in one of two privilege modes: privileged and
user. The privilege modes supported by the PPC405 are described in Processor Operating
Privileged Mode
Privileged mode allows programs to access all registers and execute all instructions
supported by the processor. Normally, the operating system and low-level device drivers
operate in this mode.
User Mode
User mode restricts access to some registers and instructions. Normally, application
programs operate in this mode.
The PPC405 also supports two modes of address translation: real and virtual. Refer to
Chapter 6, Virtual-Memory Management, for more information on address translation.
Real Mode
In real mode, programs address physical memory directly.
Virtual Mode
In virtual mode, programs address virtual memory and virtual-memory addresses are
translated by the processor into physical-memory addresses. This allows programs to
access much larger address spaces than might be implemented in the system.
Addressing Modes
Whether the PPC405 is running in real mode or virtual mode, data addressing is supported
by the load and store instructions using one of the following addressing modes:
•Register-indirect with immediate index—A base address is stored in a register, and a
displacement from the base address is specified as an immediate value in the
instruction.
•Register-indirect with index—A base address is stored in a register, and a
displacement from the base address is stored in a second register.
•Register indirect—The data address is stored in a register.
Instructions that use the two indexed forms of addressing also allow for automatic updates
to the base-address register. With these instruction forms, the new data address is
calculated, used in the load or store data access, and stored in the base-address register.
The data-addressing modes are described in Operand-Address Calculation, page 378.
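The three data-addressing modes and the update form can be sketched as a small model. This is a minimal illustration, not the processor's implementation: the function names, the list-based register file, and the dictionary standing in for memory are all assumptions made for the example, and architecture-level special cases are ignored.

```python
MASK32 = 0xFFFFFFFF  # effective addresses are 32 bits wide


def ea_immediate_index(gpr, ra, disp):
    """Register-indirect with immediate index: base register plus immediate displacement."""
    return (gpr[ra] + disp) & MASK32


def ea_index(gpr, ra, rb):
    """Register-indirect with index: base register plus a displacement held in a second register."""
    return (gpr[ra] + gpr[rb]) & MASK32


def ea_register_indirect(gpr, ra):
    """Register indirect: the register holds the data address itself."""
    return gpr[ra] & MASK32


def load_word_with_update(gpr, memory, rt, ra, disp):
    """Update form: compute the address, use it for the access, then store it in the base register."""
    ea = ea_immediate_index(gpr, ra, disp)
    gpr[rt] = memory[ea]   # the data access uses the new address
    gpr[ra] = ea           # the base-address register is updated afterwards
    return ea


# Hypothetical register and memory contents for illustration only.
gpr = [0] * 32
gpr[4] = 0x1000
gpr[5] = 0x20
memory = {0x1008: 0xDEADBEEF}
```

Calling `load_word_with_update(gpr, memory, 3, 4, 8)` on this state loads the word at 0x1008 into `gpr[3]` and leaves 0x1008 in `gpr[4]`, which is what makes the update forms convenient for walking through arrays.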
With sequential-instruction execution, the next-instruction address is calculated by adding
four bytes to the current-instruction address. In the case of branch instructions, however,
the next-instruction address is determined using one of four branch-addressing modes:
•Branch to relative—The next-instruction address is at a location relative to the
current-instruction address.
•Branch to absolute—The next-instruction address is at an absolute location in
memory.
•Branch to link register—The next-instruction address is stored in the link register.
•Branch to count register—The next-instruction address is stored in the count register.
The branch-addressing modes are described in Branch-Target Address Calculation,
page 372.
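The next-instruction address calculations described above can be sketched as follows. The function names are hypothetical. Clearing the two low-order bits of a register-supplied target is an assumption based on instructions being four bytes long and word-aligned; it is not stated in this section.

```python
MASK32 = 0xFFFFFFFF  # instruction addresses are 32 bits wide


def next_sequential(cia):
    """Sequential execution: add four bytes to the current-instruction address."""
    return (cia + 4) & MASK32


def branch_relative(cia, displacement):
    """Branch to relative: target is an offset from the current-instruction address."""
    return (cia + displacement) & MASK32


def branch_absolute(target):
    """Branch to absolute: target is an absolute location in memory."""
    return target & MASK32


def branch_to_register(register_value):
    """Branch to link register or count register: target comes from the register.

    Assumption: the two low-order bits are ignored so the target is word-aligned.
    """
    return register_value & 0xFFFFFFFC
```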
Data Types
PPC405 instructions support byte, halfword, and word operands. Multiple-word operands
are supported by the load/store multiple instructions and byte strings are supported by
the load/store string instructions. Integer data is either signed or unsigned, and signed
data is represented using two’s-complement format.
The address of a multi-byte operand is determined using the lowest memory address
occupied by that operand. For example, if the four bytes in a word operand occupy
addresses 4, 5, 6, and 7, the word address is 4. The PPC405 supports both big-endian (an
operand’s most-significant byte is at the lowest memory address) and little-endian (an
operand’s least-significant byte is at the lowest memory address) addressing.
See Operand Conventions, page 347, for more information on the supported data types
and byte ordering.
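The two byte orderings can be illustrated with the word at address 4 from the example above. This sketch uses a hypothetical byte-addressed dictionary as memory; `store_word` and `load_word` are illustrative helpers, not PPC405 instructions.

```python
def store_word(memory, address, value, big_endian=True):
    """Store a 32-bit word; the word address is the lowest address it occupies."""
    order = "big" if big_endian else "little"
    for offset, byte in enumerate(value.to_bytes(4, order)):
        memory[address + offset] = byte


def load_word(memory, address, big_endian=True):
    """Load a 32-bit word from four consecutive byte addresses."""
    order = "big" if big_endian else "little"
    raw = bytes(memory[address + offset] for offset in range(4))
    return int.from_bytes(raw, order)


big = {}
little = {}
store_word(big, 4, 0x11223344, big_endian=True)        # most-significant byte 0x11 lands at address 4
store_word(little, 4, 0x11223344, big_endian=False)    # least-significant byte 0x44 lands at address 4
```

Reading a word back with the wrong byte ordering reverses its bytes, which is why the endian storage attribute must match the layout the data was written with.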
Register Set Summary
Figure 1-2, page 333 shows the registers contained in the PPC405. Descriptions of the
General-Purpose Registers
The processor contains thirty-two 32-bit general-purpose registers (GPRs), identified as r0
through r31. The contents of the GPRs are read from memory using load instructions and
written to memory using store instructions. Computational instructions often read
operands from the GPRs and write their results in GPRs. Other instructions move data
between the GPRs and other registers. GPRs can be accessed by all software. See General-
Purpose Registers (GPRs), page 360, for more information.
Special-Purpose Registers
The processor contains a number of 32-bit special-purpose registers (SPRs). SPRs provide
access to additional processor resources, such as the count register, the link register, debug
resources, timers, interrupt registers, and others. Most SPRs are accessed only by
privileged software, but a few, such as the count register and link register, are accessed by
all software. See User Registers, page 359, and Privileged Registers, page 429 for more
information.
Machine-State Register
The 32-bit machine-state register (MSR) contains fields that control the operating state of the
processor. This register can be accessed only by privileged software. See Machine-State
Register, page 431, for more information.
Condition Register
The 32-bit condition register (CR) contains eight 4-bit fields, CR0–CR7. The values in the CR
fields can be used to control conditional branching. Arithmetic instructions can set CR0
and compare instructions can set any CR field. Additional instructions are provided to
perform logical operations and tests on CR fields and bits within the fields. The CR can be
accessed by all software. See Condition Register (CR), page 361, for more information.
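Extracting one of the eight 4-bit fields from the 32-bit CR is a shift and a mask. The sketch below assumes the usual PowerPC bit numbering, in which CR0 occupies the four most-significant bits and CR7 the four least-significant bits; that layout is not spelled out in this section, and the function names are hypothetical.

```python
def cr_field(cr, n):
    """Return the 4-bit value of field CRn from a 32-bit condition-register value."""
    if not 0 <= n <= 7:
        raise ValueError("CR field number must be 0 through 7")
    return (cr >> (28 - 4 * n)) & 0xF


def set_cr_field(cr, n, value):
    """Return a new 32-bit CR value with field CRn replaced by the low four bits of value."""
    if not 0 <= n <= 7:
        raise ValueError("CR field number must be 0 through 7")
    shift = 28 - 4 * n
    cleared = cr & ~(0xF << shift) & 0xFFFFFFFF
    return cleared | ((value & 0xF) << shift)
```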
Device Control Registers
The 32-bit device control registers (not shown) are used to configure, control, and report
status for various external devices that are not part of the PPC405 processor. Although the
DCRs are not part of the PPC405 implementation, they are accessed using the mtdcr and
mfdcr instructions. The DCRs can be accessed only by privileged software. See the PPC405
Processor Block Manual for more information on implementing DCRs.
PPC405 Organization
As shown in Figure 1-3, the PPC405 processor contains the following elements:
•A 5-stage pipeline consisting of fetch, decode, execute, write-back, and load write-back stages
•A virtual-memory-management unit that supports multiple page sizes and a variety
of storage-protection attributes and access-control options
•Separate instruction-cache and data-cache units
•Debug support, including a JTAG interface
•Three programmable timers
The following sections provide an overview of each element.
The PPC405 central-processing unit (CPU) implements a 5-stage instruction pipeline
consisting of fetch, decode, execute, write-back, and load write-back stages.
The fetch and decode logic sends a steady flow of instructions to the execute unit. All
instructions are decoded before they are forwarded to the execute unit. Instructions are
queued in the fetch queue if execution stalls. The fetch queue consists of three elements:
two prefetch buffers and a decode buffer. If the prefetch buffers are empty, instructions
flow directly to the decode buffer.
Up to two branches are processed simultaneously by the fetch and decode logic. If a branch
cannot be resolved prior to execution, the fetch and decode logic predicts how that branch
is resolved, causing the processor to speculatively fetch instructions from the predicted
path. Branches with negative-address displacements are predicted as taken, as are
branches that do not test the condition register or count register. The default prediction can
be overridden by software at assembly or compile time. This capability is described further
in Branch Prediction, page 370.
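The default prediction rule can be written as a short predicate. This is a sketch under two assumptions: every branch not covered by the rules above is predicted not taken, and the software override is modeled as a hypothetical `reverse_default` flag that inverts the default prediction.

```python
def predict_taken(displacement, tests_cr_or_ctr, reverse_default=False):
    """Return True if the branch is predicted taken.

    displacement     -- signed branch displacement in bytes
    tests_cr_or_ctr  -- True if the branch tests the condition register or count register
    reverse_default  -- hypothetical software override that inverts the default prediction
    """
    # Branches that test nothing are always predicted taken; conditional
    # branches are predicted taken only when they branch backward.
    default = (not tests_cr_or_ctr) or displacement < 0
    return default != reverse_default
```

Predicting backward conditional branches as taken favors loops, where the branch at the bottom of the loop body is taken on every iteration except the last.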
The PPC405 has a single-issue execute unit containing the general-purpose register file
(GPR), arithmetic-logic unit (ALU), and the multiply-accumulate unit (MAC). The GPRs
consist of thirty-two 32-bit registers that are accessed by the execute unit using three read
ports and two write ports. During the decode stage, data is read out of the GPRs for use by
the execute unit. During the write-back stage, results are written to the GPR. The use of five
read/write ports on the GPRs allows the processor to execute load/store operations in
parallel with ALU and MAC operations.
The execute unit supports all 32-bit PowerPC UISA integer instructions in hardware, and is
compliant with the PowerPC embedded-environment architecture specification. Floating-point operations are not supported.
The MAC unit supports implementation-specific multiply-accumulate instructions and
multiply-halfword instructions. MAC instructions operate on either signed or unsigned
16-bit operands, and they store their results in a 32-bit GPR. These instructions can
produce results using either modulo arithmetic or saturating arithmetic. All MAC
instructions have a single-cycle throughput. See Multiply-Accumulate Instruction-Set
Extensions, page 405 for more information.
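The difference between modulo and saturating results can be seen in a small model of a signed multiply-accumulate. This sketch covers only signed 16-bit operands and a signed 32-bit accumulator; `mac_signed` and its helpers are hypothetical names, not PPC405 mnemonics.

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def to_signed16(x):
    """Interpret the low 16 bits of x as a two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def wrap_signed32(x):
    """Modulo arithmetic: keep the low 32 bits and reinterpret them as signed."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def mac_signed(acc, a, b, saturate=False):
    """Return acc + a*b for signed 16-bit a and b, as a signed 32-bit result."""
    total = acc + to_signed16(a) * to_signed16(b)
    if saturate:
        # Saturating arithmetic: clamp to the representable 32-bit range.
        return max(INT32_MIN, min(INT32_MAX, total))
    return wrap_signed32(total)
```

With the accumulator already at the largest positive value, adding a positive product wraps to a negative number under modulo arithmetic but stays pinned at the maximum under saturating arithmetic, which is usually the preferred behavior in signal-processing loops.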
Exception Handling Logic
Exceptions are divided into two classes: critical and noncritical. The PPC405 CPU services
exceptions caused by error conditions, the internal timers, debug events, and the external
interrupt controller (EIC) interface. Across the two classes, a total of 19 possible exceptions
are supported, including the two provided by the EIC interface.
Each exception class has its own pair of save/restore registers. SRR0 and SRR1 are used for
noncritical interrupts, and SRR2 and SRR3 are used for critical interrupts. The exception-return address and the machine state are written to these registers when an exception
occurs, and they are automatically restored when an interrupt handler exits using the
return-from-interrupt (rfi) or return-from-critical-interrupt (rfci) instruction. Use of
separate save/restore registers allows the PPC405 to handle critical interrupts
independently of noncritical interrupts.
See Chapter 7, Exceptions and Interrupts, for information on exception handling in the
PPC405.
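The benefit of separate save/restore register pairs can be sketched with a toy model in which a critical interrupt arrives while a noncritical handler is running. The `Core` class, the handler addresses, and the MSR values are hypothetical; only the pairing of SRR0/SRR1 with noncritical interrupts and SRR2/SRR3 with critical interrupts comes from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    pc: int = 0
    msr: int = 0
    srr: list = field(default_factory=lambda: [0, 0, 0, 0])  # SRR0..SRR3


def take_exception(core, handler, critical, new_msr=0):
    """Save the return address and machine state, then enter the handler."""
    base = 2 if critical else 0          # SRR2/SRR3 for critical, SRR0/SRR1 otherwise
    core.srr[base] = core.pc
    core.srr[base + 1] = core.msr
    core.pc, core.msr = handler, new_msr


def return_from_interrupt(core, critical):
    """Model rfi (noncritical) or rfci (critical): restore address and machine state."""
    base = 2 if critical else 0
    core.pc, core.msr = core.srr[base], core.srr[base + 1]


core = Core(pc=0x1000, msr=0x8000)
take_exception(core, handler=0x500, critical=False)   # noncritical handler starts
take_exception(core, handler=0x100, critical=True)    # critical interrupt nests inside it
return_from_interrupt(core, critical=True)            # back in the noncritical handler
resumed_handler_pc = core.pc
return_from_interrupt(core, critical=False)           # back in the interrupted program
```

Because the critical interrupt uses its own register pair, the return state saved for the noncritical handler survives the nesting without any software save.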
Memory Management Unit
The PPC405 supports 4 GB of flat (non-segmented) address space. The memory-management unit (MMU) provides address translation, protection functions, and storage-attribute control for this address space. The MMU supports demand-paged virtual
memory using multiple page sizes of 1 KB, 4 KB, 16 KB, 64 KB, 256 KB, 1 MB, 4 MB and
16 MB. Multiple page sizes can improve memory efficiency and minimize the number of
TLB misses. When supported by system software, the MMU provides the following
functions:
•Translation of the 4 GB logical-address space into a physical-address space.
•Independent enabling of instruction translation and protection from that of data
translation and protection.
•Page-level access control using the translation mechanism.
•Software control over the page-replacement strategy.
•Additional protection control using zones.
•Storage attributes for cache policy and speculative memory-access control.
The translation look-aside buffer (TLB) is used to control memory translation and
protection. Each one of its 64 entries specifies a page translation. It is fully associative, and
can simultaneously hold translations for any combination of page sizes. To prevent TLB
contention between data and instruction accesses, a 4-entry instruction and an 8-entry data
shadow-TLB are maintained by the processor transparently to software.
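A fully associative lookup across mixed page sizes can be sketched as below. The entry format, a tuple of virtual page base, page size, and real page base, is an assumption for illustration and is not the PPC405 TLB-entry layout. The sketch also omits protection checks and any other match criteria.

```python
# The eight supported page sizes: 1 KB, 4 KB, 16 KB, ..., 16 MB (each four times the last).
PAGE_SIZES = [1024 * 4 ** n for n in range(8)]


def translate(tlb, effective_address):
    """Return the real address for effective_address, or None on a TLB miss.

    tlb is a list of hypothetical (virtual_base, page_size, real_base) entries.
    Every entry is compared, which is what "fully associative" means.
    """
    for virtual_base, page_size, real_base in tlb:
        offset_mask = page_size - 1
        if (effective_address & ~offset_mask & 0xFFFFFFFF) == virtual_base:
            return real_base | (effective_address & offset_mask)
    return None


# Hypothetical entries: one 4 MB page and one 4 KB page held at the same time.
tlb = [
    (0x00400000, 4 * 1024 * 1024, 0x10000000),
    (0x00001000, 4 * 1024, 0x00080000),
]
```

A single 4 MB entry here covers a range that would otherwise need 1024 entries of 4 KB each, which illustrates how larger pages reduce the number of TLB misses.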
Software manages the initialization and replacement of TLB entries. The PPC405 includes
instructions for managing TLB entries by software running in privileged mode. This
capability gives significant control to system software over the implementation of a page
replacement strategy. For example, software can reduce the potential for TLB thrashing or
delays associated with TLB-entry replacement by reserving a subset of TLB entries for
globally accessible pages or critical pages.
Storage attributes are provided to control access of memory regions. When memory
translation is enabled, storage attributes are maintained on a page basis and read from the
TLB when a memory access occurs. When memory translation is disabled, storage
attributes are maintained in storage-attribute control registers. A zone-protection register
(ZPR) is provided to allow system software to override the TLB access controls without
requiring the manipulation of individual TLB entries. For example, the ZPR can provide a
simple method for denying read access to certain application programs.
Chapter 6, Virtual-Memory Management, describes these memory-management
resources in detail.
Instruction and Data Caches
The PPC405 accesses memory through the instruction-cache unit (ICU) and data-cache
unit (DCU). Each cache unit includes a PLB-master interface, cache arrays, and a cache
controller. Hits into the instruction cache and data cache appear to the CPU as single-cycle
memory accesses. Cache misses are handled as requests over the PLB bus to another PLB
device, such as an external-memory controller.
The PPC405 implements separate instruction-cache and data-cache arrays. Each is 16 KB in
size, is two-way set-associative, and operates using 8-word (32 byte) cachelines. The caches
are non-blocking, allowing the PPC405 to overlap instruction execution with reads over
the PLB (when cache misses occur).
The cache controllers replace cachelines according to a least-recently used (LRU)
replacement policy. When a cacheline fill occurs, the most-recently accessed line in the
cache set is retained and the other line is replaced. The cache controller updates the LRU
during a cacheline fill.
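The cache geometry and the two-way replacement rule can be worked through in a few lines. With 16 KB, two ways, and 32-byte lines, each cache has 16384 / (2 × 32) = 256 sets. The address split and the `TwoWaySet` class below are a sketch that assumes conventional set-associative indexing; the actual PPC405 index and tag bit assignments are not given in this section.

```python
CACHE_BYTES = 16 * 1024
WAYS = 2
LINE_BYTES = 32
NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)   # 256 sets


def split_address(address):
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = address % LINE_BYTES
    index = (address // LINE_BYTES) % NUM_SETS
    tag = address // (LINE_BYTES * NUM_SETS)
    return tag, index, offset


class TwoWaySet:
    """One cache set with two lines and a single LRU indicator."""

    def __init__(self):
        self.tags = [None, None]
        self.lru = 0                      # way that will be replaced on the next fill

    def access(self, tag):
        """Return True on a hit; on a miss, fill the least-recently used way."""
        if tag in self.tags:
            way = self.tags.index(tag)
            hit = True
        else:
            way = self.lru                # the most-recently accessed line is retained
            self.tags[way] = tag
            hit = False
        self.lru = 1 - way                # the other way is now least-recently used
        return hit
```

With only two ways per set, a single bit per set is enough to track which line was used least recently.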
The ICU supplies up to two instructions every cycle to the fetch and decode unit. The ICU
can also forward instructions to the fetch and decode unit during a cacheline fill,
minimizing execution stalls caused by instruction-cache misses. When the ICU is accessed,
four instructions are read from the appropriate cacheline and placed temporarily in a line
buffer. Subsequent ICU accesses check this line buffer for the requested instruction prior to
accessing the cache array. This allows the ICU cache array to be accessed as little as once
every four instructions, significantly reducing ICU power consumption.
The DCU can independently process load/store operations and cache-control instructions.
The DCU can also dynamically reprioritize PLB requests to reduce the length of an
execution stall. For example, if the DCU is busy with a low-priority request and a
subsequent storage operation requested by the CPU is stalled, the DCU automatically
increases the priority of the current (low-priority) request. The current request is thus
finished sooner, allowing the DCU to process the stalled request sooner. The DCU can
forward data to the execute unit during a cacheline fill, further minimizing execution stalls
caused by data-cache misses.
Additional features allow programmers to tailor data-cache performance to a specific
application. The DCU can function in write-back or write-through mode, as determined by
the storage-control attributes. Loads and stores that do not allocate cachelines can also be
specified. Inhibiting certain cacheline fills can reduce potential pipeline stalls and
unwanted external-bus traffic.
See Chapter 5, Memory-System Management, for details on the operation and control of
the PPC405 caches.
Timer Resources
The PPC405 contains a 64-bit time base and three timers. The time base is incremented
synchronously using the CPU clock or an external clock source. The three timers are
incremented synchronously with the time base. (See Chapter 8, Timer Resources, for more
information on these features.) The three timers supported by the PPC405 are:
•Programmable Interval Timer
•Fixed Interval Timer
•Watchdog Timer
Programmable Interval Timer
The programmable interval timer (PIT) is a 32-bit register that is decremented at the time-base
increment frequency. The PIT register is loaded with a delay value. When the PIT count
reaches 0, a PIT interrupt occurs. Optionally, the PIT can be programmed to automatically
reload the last delay value and begin decrementing again.
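The PIT behavior can be modeled as a down-counter with an optional reload. This is a behavioral sketch only: the class and method names are hypothetical, and the exact cycle on which the hardware reloads the count is an assumption.

```python
class ProgrammableIntervalTimer:
    """Behavioral model of a 32-bit down-counter with optional auto-reload."""

    def __init__(self, auto_reload=False):
        self.count = 0
        self.reload_value = 0
        self.auto_reload = auto_reload

    def load(self, delay):
        """Load a delay value; it is also remembered for auto-reload."""
        self.count = self.reload_value = delay & 0xFFFFFFFF

    def tick(self):
        """Advance by one time-base increment; return True if a PIT interrupt occurs."""
        if self.count == 0:
            return False                  # a stopped timer does nothing
        self.count -= 1
        if self.count == 0:
            if self.auto_reload:
                self.count = self.reload_value   # begin decrementing again
            return True
        return False
```

With auto-reload enabled and a delay of 3, the model raises an interrupt on every third tick, giving a periodic interrupt without software having to rewrite the register.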
Fixed Interval Timer
The fixed interval timer (FIT) causes an interrupt when a selected bit in the time-base register
changes from 0 to 1. Programmers can select one of four predefined bits in the time-base
for triggering a FIT interrupt.
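Triggering on a 0-to-1 transition of a time-base bit yields a fixed period of 2^(bit+1) time-base increments. The sketch below counts bit positions from the least-significant bit, which is an assumption made for readability; the function name and arguments are hypothetical, and the four selectable bits are not listed in this section.

```python
MASK64 = (1 << 64) - 1  # the time base is 64 bits wide


def fit_events(start, ticks, bit):
    """Return the time-base values at which the selected bit changes from 0 to 1.

    bit is counted from the least-significant bit (an assumption of this sketch).
    """
    events = []
    time_base = start
    for _ in range(ticks):
        following = (time_base + 1) & MASK64
        was_clear = ((time_base >> bit) & 1) == 0
        now_set = ((following >> bit) & 1) == 1
        if was_clear and now_set:
            events.append(following)      # a FIT interrupt would occur here
        time_base = following
    return events
```

Selecting bit 2 in this model produces an event every 8 increments, so choosing a higher bit lengthens the interval by a factor of two per bit position.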
Watchdog Timer
The watchdog timer causes a hardware reset when a selected bit in the time-base register
changes from 0 to 1. Programmers can select one of four predefined bits in the time-base
for triggering a reset, and the type of reset can be defined by the programmer.
Note: The time-base register alone does not cause interrupts to occur.
Debug
The PPC405 debug resources include special debug modes that support the various types
of debugging used during hardware and software development. These are:
•Internal-debug mode for use by ROM monitors and software debuggers
•External-debug mode for use by JTAG debuggers
•Debug-wait mode, which allows the servicing of interrupts while the processor appears
to be stopped
•Real-time trace mode, which supports event triggering for real-time tracing
Debug events are supported that allow developers to manage the debug process. Debug
modes and debug events are controlled using debug registers in the processor. The debug
registers are accessed either through software running on the processor or through the
JTAG port. The JTAG port can also be used for board tests.
The debug modes, events, controls, and interfaces provide a powerful combination of
debug resources for hardware and software development tools. Chapter 9, Debugging,
describes these resources in detail.
PPC405 Interfaces
The PPC405 provides a set of interfaces that supports the attachment of cores and user
logic. The software resources used to manage the PPC405 interfaces are described in the
Core-Configuration Register, page 459. For information on the hardware operation, use,
and electrical characteristics of these interfaces, refer to the PPC405 Processor Block
Manual. The following interfaces are provided:
•Processor local bus interface
•Device control register interface
•Clock and power management interface
•JTAG port interface
•On-chip interrupt controller interface
•On-chip memory controller interface
Processor Local Bus
The processor local bus (PLB) interface provides a 32-bit address and three 64-bit data buses
attached to the instruction-cache and data-cache units. Two of the 64-bit buses are attached
to the data-cache unit, one supporting read operations and the other supporting write
operations. The third 64-bit bus is attached to the instruction-cache unit to support
instruction fetching.
The device control register (DCR) bus interface supports the attachment of on-chip registers
for device control. Software can access these registers using the mfdcr and mtdcr
instructions.
Clock and Power Management
The clock and power-management interface supports several methods of clock distribution
and power management.
JTAG Port
The JTAG port interface supports the attachment of external debug tools. Using the JTAG
test-access port, a debug tool can single-step the processor and examine internal-processor
state to facilitate software debugging. This capability complies with the IEEE 1149.1
specification for vendor-specific extensions, and is therefore compatible with standard
JTAG hardware for boundary-scan system testing.
On-Chip Interrupt Controller
The on-chip interrupt controller interface is an external interrupt controller that combines
asynchronous interrupt inputs from on-chip and off-chip sources and presents them to the
core using a pair of interrupt signals (critical and noncritical). Asynchronous interrupt
sources can include external signals, the JTAG and debug units, and any other on-chip
peripherals.
On-Chip Memory Controller
An on-chip memory (OCM) interface supports the attachment of additional memory to the
instruction and data caches that can be accessed at performance levels matching the cache
arrays.
This chapter describes the operational concepts governing the PPC405 programming
model. These concepts include the execution and memory-access models, processor
operating modes, memory organization and management, and instruction conventions.
Execution Model
From a software viewpoint, PowerPC® processors implement a sequential-execution model.
That is, the processors appear to execute instructions in program order. Internally and
invisible to software, PowerPC processors can execute instructions out-of-order and can
speculatively execute instructions. The processor is responsible for maintaining an in-order execution state visible to software. The execution of an instruction sequence can be
interrupted by an exception caused by one of the executing instructions or by an
asynchronous event. The PPC405 does not support out-of-order instruction execution.
However, the processor does support speculative instruction execution, typically by
predicting the outcome of branch instructions.
As described in Ordering Memory Accesses, page 448, the PowerPC architecture specifies
a weakly consistent memory model for shared-memory multiprocessor systems. The
weakly consistent memory model allows system bus operations to be reordered
dynamically. The goal of reordering bus operations is to reduce the effect of memory
latency and improve overall performance. In single-processor systems, loads and stores
can be reordered dynamically to allow efficient utilization of the processor bus. Loads can
be performed speculatively to enhance the speculative-execution capabilities. This model
provides an opportunity for significantly improved performance over a model that has
stronger memory-consistency rules, but places the responsibility for access ordering on the
programmer.
When a program requires strict instruction-execution ordering or memory-access ordering
for proper execution, the programmer must insert the appropriate ordering or
synchronization instructions into the program. These instructions are described in
Synchronizing Instructions, page 424. The concept of synchronization is described in the
Synchronization Operations section that follows.
The PPC405 supports many aspects of the weakly consistent model but not all of them.
Specifically, the PPC405 does not provide hardware support for multiprocessor memory
coherency and does not support speculative loads. If the order of memory accesses is
important to the correct operation of a program, care must be taken in porting such a
program from the PPC405 to a processor that supports multiprocessor memory coherency
and speculative loads.
Chapter 2
Synchronization Operations
Various forms of synchronizing operations can be used by programs executing on the
PPC405 processor to control the behavior of instruction execution and memory accesses.
Synchronizing operations fall into the following three categories:
•Context synchronization
•Execution synchronization
•Storage synchronization
Each synchronization category is described in the following sections. Instructions
provided by the PowerPC architecture for synchronization purposes are described on
page 424.
Context Synchronization
The state of the execution environment (privilege level, translation mode, and memory
protection) defines a program’s context. An instruction or event is context synchronizing if
the operation satisfies all of the following conditions:
•Instruction dispatch is halted when the operation is recognized by the processor. This
means the instruction-fetch mechanism stops issuing (sending) instructions to the
execution units.
•The operation is not initiated (for instructions, this means dispatched) until all prior
instructions complete execution to a point where they report any exceptions they
cause to occur. In the case of an instruction-synchronize (isync) instruction, the isync
does not complete execution until all prior instructions complete execution to a point
where they report any exceptions they cause to occur.
•All instructions that precede the operation complete execution in the context they
were initiated. This includes privilege level, translation mode, and memory
protection.
•All instructions following the operation complete execution in the new context
established by the operation.
•If the operation is an exception, or directly causes an exception to occur (for example,
the sc instruction causes a system-call exception), the operation is not initiated until
all higher-priority exceptions are recognized by the exception mechanism.
The system-call instruction (sc), return-from-interrupt instructions (rfi and rfci), and most
exceptions are examples of context-synchronizing operations.
Context-synchronizing operations do not guarantee that subsequent memory accesses are
performed using the memory context established by previous instructions. When
memory-access ordering must be enforced, storage-synchronizing instructions are
required.
Chapter 2: Operational Concepts
Execution Synchronization
An instruction is execution synchronizing if it satisfies the conditions of the first two items
(as described above) for context synchronization:
•Instruction dispatch is halted when the operation is recognized by the processor. This
means the instruction-fetch mechanism stops issuing (sending) instructions to the
execution units.
•The operation is not initiated until all instructions in execution complete to a point
where they report any exceptions they cause to occur. In the case of a synchronize
(sync) instruction, the sync does not complete execution until all prior instructions
complete execution to a point where they report any exceptions they cause to occur.
The sync and move-to machine-state register (mtmsr) instructions are examples of execution-synchronizing instructions.
All context-synchronizing instructions are execution synchronizing. However, unlike a
context-synchronizing operation, there is no guarantee that subsequent instructions
execute in the context established by an execution-synchronizing instruction. The new
context becomes effective sometime after the execution-synchronizing instruction
completes and before or during a subsequent context-synchronizing operation.
Storage Synchronization
The PowerPC architecture specifies a weakly consistent memory model for shared-memory multiprocessor systems. With this model, the order that the processor performs
memory accesses, the order that those accesses complete in memory, and the order that
those accesses are viewed as occurring by another processor can all differ. The PowerPC
architecture supports storage-synchronizing operations that provide a capability for
enforcing memory-access ordering, allowing programs to share memory. Support is also
provided to allow programs executing on a processor to share memory with some other
mechanism that can access memory, such as an I/O device.
Device control registers (DCRs) are treated as memory-mapped registers from a
synchronization standpoint. Storage-synchronization operations must be used to enforce
synchronization of DCR reads and writes.
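The need for storage synchronization when sharing memory with an I/O device can be sketched in C as follows. This is a minimal illustration under stated assumptions, not code from this manual: the names storage_sync, dev_regs, and start_transfer are hypothetical, and the device layout is invented. On a PowerPC build the helper emits the sync instruction; on any other host it falls back to a C11 fence so that the sketch still compiles.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: force earlier memory accesses to be performed
   before any later ones. */
static inline void storage_sync(void)
{
#if defined(__powerpc__) || defined(__PPC__)
    __asm__ volatile ("sync" ::: "memory");     /* PowerPC synchronize */
#else
    atomic_thread_fence(memory_order_seq_cst);  /* host stand-in only */
#endif
}

/* Hypothetical device: a descriptor word and a "go" register. */
struct dev_regs {
    volatile uint32_t descriptor;
    volatile uint32_t go;
};

static void start_transfer(struct dev_regs *dev, uint32_t desc)
{
    dev->descriptor = desc;  /* 1. publish the descriptor           */
    storage_sync();          /* 2. order it ahead of the next store */
    dev->go = 1;             /* 3. only then start the device       */
}
```

Without the synchronizing step, a weakly consistent implementation would be free to let the device observe the store to go before the descriptor is visible.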
Processor Operating Modes
The PowerPC architecture defines two levels of privilege, each with an associated
processor operating mode:
•Privileged mode
•User mode
The processor operating mode is controlled by the privilege-level field in the machine-state
register (MSR[PR]). When MSR[PR] = 0, the processor operates in privileged mode. When
MSR[PR] = 1, the processor operates in user mode. MSR[PR] = 0 following reset, placing
the processor in privileged mode. See Machine-State Register, page 431 for more
information on this register.
Attempting to execute a privileged instruction when in user mode causes a privileged-instruction program exception (see Program Interrupt (0x0700), page 511).
Throughout this book, the terms privileged and system are used interchangeably to refer to
software that operates under the privileged-programming model. Likewise, the terms user
and application are used to refer to software that operates under the user-programming
model. Registers and instructions are defined as either privileged or user, indicating which
of the two programming models they belong to. User registers and user instructions
belong to both the user-programming and privileged-programming models.
Privileged Mode
Privileged mode allows programs to access all registers and execute all instructions
supported by the processor. The privileged-programming model comprises the entire register
set and instruction set supported by the PPC405. Operating systems are typically the only
software that runs in privileged mode.
The registers available only in privileged mode are shown in Figure 4-1, page 430. Refer to
the corresponding section describing each register for more information. The instructions
available only in privileged mode are shown in Table 4-3, page 434. The operation of each
instruction is described in Chapter 11, Instruction Set.
Privileged mode is sometimes referred to as supervisor state.
User Mode
User mode restricts access to some registers and instructions. The user-programming model
comprises the register set and instruction set supported by the processor running in user
mode, and is a subset of the privileged-programming model. Operating systems typically
confine the execution of application programs to user mode, thereby protecting system
resources and other software from the effects of errant applications.
The registers available in user mode are shown in Figure 3-1, page 360. Refer to the
corresponding section in Chapter 3 for a description of each register. All instructions are
available in user mode except as shown in Table 4-3, page 434.
User mode is sometimes referred to as problem state.
Memory Organization
PowerPC programs reference memory using an effective address computed by the
processor when executing a load, store, branch, or cache-control instruction, and when
fetching the next-sequential instruction. Depending on the address-relocation mode, this
effective address is either used to directly access physical memory or is treated as a virtual
address that is translated into physical memory.
Effective-Address Calculation
Programs reference memory using an effective address (also called a logical address). An
effective address (EA) is the 32-bit unsigned sum computed by the processor when
accessing memory, executing a branch instruction, or fetching the next-sequential
instruction. An EA is often referred to as the next-instruction address (NIA) when it is used
to fetch an instruction (sequentially or as the result of a branch). The input values and
method used by the processor to calculate an EA depend on the instruction that is
executed.
When accessing data in memory, effective addresses are calculated in one of the following
ways:
•EA = (rA|0)—this is referred to as register-indirect addressing.
•EA = (rA|0) + offset—this is referred to as register-indirect with immediate-index
addressing.
•EA = (rA|0) + (rB)—this is referred to as register-indirect with index addressing.
Note: In the above, the notation (rA|0) specifies the following:
If the rA instruction field is 0, the base address is 0.
If the rA instruction field is not 0, the contents of register rA are used as the base address.
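The three data-addressing forms above can be modeled with a short C sketch. The type regs_t and the function names are hypothetical, introduced only for illustration, and the 16-bit offset is assumed to be sign-extended before the addition. Because the arithmetic is done on uint32_t, the sum wraps modulo 2^32.

```c
#include <stdint.h>

/* Illustrative model of the 32 general-purpose registers. */
typedef struct { uint32_t gpr[32]; } regs_t;

/* (rA|0): 0 when the rA field is 0, otherwise the contents of rA. */
static uint32_t ra_or_zero(const regs_t *r, unsigned ra)
{
    return ra == 0 ? 0u : r->gpr[ra];
}

/* Register-indirect: EA = (rA|0) */
static uint32_t ea_indirect(const regs_t *r, unsigned ra)
{
    return ra_or_zero(r, ra);
}

/* Register-indirect with immediate index: EA = (rA|0) + offset.
   Assumption: the 16-bit offset is sign-extended. */
static uint32_t ea_immediate(const regs_t *r, unsigned ra, int16_t offset)
{
    return ra_or_zero(r, ra) + (uint32_t)(int32_t)offset;
}

/* Register-indirect with index: EA = (rA|0) + (rB) */
static uint32_t ea_indexed(const regs_t *r, unsigned ra, unsigned rb)
{
    return ra_or_zero(r, ra) + r->gpr[rb];
}
```

Note that a field value of 0 for rA selects the constant 0 and not the contents of register 0, which is why the model never reads gpr[0] through ra_or_zero.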
When instructions execute sequentially, the next-instruction effective address is the
current-instruction address (CIA) + 4. This is because all instructions are four bytes long.
When branching to a new address, the next-instruction effective address is calculated in
one of the following ways:
•NIA = CIA + displacement—this is referred to as branch-to-relative addressing.
•NIA = displacement—this is referred to as branch-to-absolute addressing.
•NIA = (LR)—this is referred to as branch to link-register addressing.
•NIA = (CTR)—this is referred to as branch to count-register addressing.
When the NIA is calculated for a branch instruction, the two low-order bits (30:31) are
always cleared to 0, forcing word-alignment of the address. This is true even when the
address is contained in the LR or CTR, and the register contents are not word-aligned.
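The next-instruction address rules can be expressed as a small C model. The function names are hypothetical and the displacement is assumed to arrive already sign-extended to 32 bits. The sketch shows only the arithmetic and the forced word alignment, not instruction decoding.

```c
#include <stdint.h>

/* Clear the two low-order bits (30:31) to force word alignment. */
static uint32_t word_align(uint32_t addr)
{
    return addr & ~UINT32_C(3);
}

/* Sequential execution: NIA = CIA + 4 */
static uint32_t nia_sequential(uint32_t cia)
{
    return cia + 4;
}

/* Branch-to-relative: NIA = CIA + displacement */
static uint32_t nia_relative(uint32_t cia, int32_t displacement)
{
    return word_align(cia + (uint32_t)displacement);
}

/* Branch-to-absolute: NIA = displacement */
static uint32_t nia_absolute(int32_t displacement)
{
    return word_align((uint32_t)displacement);
}

/* Branch to link register or count register: NIA = (LR) or (CTR) */
static uint32_t nia_register(uint32_t reg_contents)
{
    return word_align(reg_contents);
}
```

For example, a register holding 0x1003 yields a branch target of 0x1000 in this model.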
All effective-address computations are performed by the processor using unsigned binary
arithmetic. Carries from bit 0 are ignored and the effective address wraps from the
Physical memory represents the address space of memory installed in a computer system,
including memory-mapped I/O devices. Generally, the amount of physical memory
actually available in a system is smaller than that supported by the processor. When
address translation is supported by the operating system—as it is in virtual-memory
systems—the very-large virtual-address space is translated into the smaller physical-address space using the memory-management resources supported by the processor.
The PPC405 supports up to four gigabytes of physical memory using a 32-bit physical
address. A hierarchical-memory system involving external (system) memory and the
caches internal to the processor are employed to support that address space. The PPC405
supports separate level-1 (L1) caches for instructions and data. The operation and control
of these caches is described in Chapter 5, Memory-System Management.
Virtual memory is a relocatable address space that is generally larger than the physical-memory space installed in a computer system. Operating systems relocate (map)
applications and data in virtual memory so it appears that more memory is available than
actually exists. Virtual memory software moves unused instructions and data between
physical memory and external storage devices (such as a hard drive) when insufficient
physical memory is available. The PPC405 supports a 40-bit virtual address that allows
privileged software to manage a one-terabyte virtual-memory space.
Memory Management
Memory management describes the collection of mechanisms used to translate the addresses
generated by programs into physical-memory addresses. Memory management also
consists of the mechanisms used to characterize memory-region behavior, also referred to
as storage control. Memory management is performed by privileged-mode software and is
completely transparent to user-mode programs running in virtual mode.
The PPC405 is a PowerPC embedded-environment implementation. The memory-management resources defined by the PowerPC embedded-environment architecture (and
its successor, the PowerPC Book-E architecture) differ significantly from the resources
defined by the PowerPC architecture. The resources defined by the PowerPC
embedded-environment architecture are well-suited for the special requirements of embedded-system
applications. The resources defined by the PowerPC architecture better meet the
requirements of desktop and commercial-workstation systems.
Generally, the differences between the two memory-management mechanisms are as
follows:
•The PPC405 supports software page translation and provides special instructions for
managing the page tables and the translation look-aside buffer (TLB) internal to the
processor. The page-translation table format, organization, and search algorithms are
software-dependent and transparent to the PPC405 processor. The PowerPC
architecture, on the other hand, defines the page-translation table organization,
format, and search algorithms. It does not define support for the special page table
and TLB instructions but instead assumes the processor hardware is responsible for
searching page tables and updating the TLB.
•The PPC405 supports variable-sized pages. The PowerPC architecture defines fixed-size
pages of 4 KB.
•The PPC405 does not support the segment-translation mechanism defined by the
PowerPC architecture.
•The PPC405 does not support the block-address-translation (BAT) mechanism defined
by the PowerPC architecture.
•Additional storage-control attributes not defined by the PowerPC architecture are
supported by the PPC405. The methods for using these attributes to characterize
memory regions also differ.
At a high level, Figure 2-1 shows the differences between 32-bit memory management in
the PowerPC embedded-environment architecture (and PowerPC Book-E architecture)
and in the PowerPC architecture. See Chapter 6, Virtual-Memory Management for more
information on the resources supported by the PPC405. Additional information on the
differences with the PowerPC architecture is described in Appendix E, PowerPC® 6xx/7xx
Compatibility. PowerPC Book-E architecture extends the resources first defined by the
PowerPC embedded-environment architecture. A description of those extensions is in
Appendix F, PowerPC® Book-E Compatibility.
[Figure 2-1 (diagram not reproduced). Panels and labels:
PowerPC Embedded Environment and PowerPC Book-E: 32-Bit Effective Address; PID; 40-Bit Virtual Address; Page Translation; 32-Bit Physical Address.
PowerPC Architecture: 32-Bit Effective Address; Segment Translation; 51-Bit Virtual Address; Page Translation; 32-Bit Physical Address; Block Address Translation.]
Figure 2-1: PowerPC 32-Bit Memory Management
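Figure 2-1 indicates that, in the embedded environment, the PID and the 32-bit effective address together form the 40-bit virtual address. A minimal C sketch of that composition follows. It assumes the PID contributes the upper eight bits (40 minus 32), which is inferred from the widths in the figure and is not stated in this chapter. The function name virtual_address is hypothetical. See Chapter 6 for the authoritative description.

```c
#include <stdint.h>

/* Assumption: VA = 8-bit PID concatenated with the 32-bit EA. */
static uint64_t virtual_address(uint8_t pid, uint32_t ea)
{
    return ((uint64_t)pid << 32) | ea;
}
```

With this layout the largest value is 2^40 - 1, which corresponds to the one-terabyte virtual-memory space mentioned earlier.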
Addressing Modes
Programs can use 32-bit effective addresses to reference the 4 GB physical-address space
using one of two addressing modes:
•Real mode
•Virtual mode
Real mode and virtual mode are enabled and disabled independently for instruction
fetches and data accesses. The instruction-fetch address mode is controlled using the
instruction-relocate (IR) field in the machine-state register (MSR). When MSR[IR] = 0,
instruction fetches are performed in real mode. When MSR[IR] = 1, instruction fetches are
performed in virtual mode. Similarly, the data-access address mode is controlled using the
data-relocate (DR) field in the MSR. When MSR[DR] = 0, data accesses are performed in
real mode. Setting MSR[DR] = 1 enables virtual mode for data accesses. See Virtual Mode,
In real mode, an effective address is used directly as the physical address into the 4 GB
address space. Here, the logical-address space is mapped directly onto the physical-address space.
Virtual Mode
In virtual mode, address translation is enabled. Effective addresses are translated into
physical addresses using the memory-management unit, as shown in Figure 2-1, page 346.
In this mode, pages within the logical-address space are mapped onto pages in the
physical-address space. An overview of memory management is provided in the following
section.
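The independent control of instruction-fetch and data-access address modes can be sketched in C as a pair of tests on an MSR image. The bit positions used here (IR at bit 26 and DR at bit 27, in PowerPC numbering where bit 0 is the most-significant bit) are assumptions not given in this section, so check them against the machine-state register description. The macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask for bit n of a 32-bit register, PowerPC numbering (bit 0 = MSB). */
#define PPC_BIT32(n) (UINT32_C(1) << (31 - (n)))

/* Assumed field positions; verify against the MSR definition. */
#define MSR_IR PPC_BIT32(26)  /* instruction relocate */
#define MSR_DR PPC_BIT32(27)  /* data relocate */

/* True when instruction fetches use virtual mode (MSR[IR] = 1). */
static bool fetch_is_virtual(uint32_t msr)
{
    return (msr & MSR_IR) != 0;
}

/* True when data accesses use virtual mode (MSR[DR] = 1). */
static bool data_is_virtual(uint32_t msr)
{
    return (msr & MSR_DR) != 0;
}
```

An MSR image with only IR set therefore describes translated instruction fetches combined with real-mode data accesses.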
Operand Conventions
Bit positions within registers and memory operands (bytes, halfwords, and words) are
numbered consecutively from left to right, starting with zero. The most-significant bit is
always numbered 0. The number assigned to the least-significant bit depends on the size of
the register or memory operand, as follows:
•Byte—the least-significant bit is numbered 7.
•Halfword—the least-significant bit is numbered 15.
•Word—the least-significant bit is numbered 31.
A bit set to 1 has a numerical value associated with its position (b) relative to the least-significant bit (lsb). This value is equal to 2^(lsb-b). For example, if bit 5 is set to 1 in a byte,
halfword, or word memory operand, its value is determined as follows:
•Byte—the value is 2^(7-5), or 4.
•Halfword—the value is 2^(15-5), or 1024.
•Word—the value is 2^(31-5), or 67108864.
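The bit-value rule can be checked with a few lines of C. The function name bit_value is hypothetical. It takes the operand width in bits and a bit position b in the left-to-right numbering described above.

```c
#include <stdint.h>

/* Value of bit b in an operand of size_bits bits, where bit 0 is the
   most-significant bit and the least-significant bit is size_bits - 1. */
static uint32_t bit_value(unsigned size_bits, unsigned b)
{
    unsigned lsb = size_bits - 1;
    return UINT32_C(1) << (lsb - b);
}
```

Calling bit_value(8, 5), bit_value(16, 5), and bit_value(32, 5) reproduces the three values listed above: 4, 1024, and 67108864.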
Bytes in memory are addressed consecutively starting with zero. The PPC405 supports
both big-endian and little-endian byte ordering, with big-endian being the default byte
ordering. Bit ordering within bytes and registers is always big endian.
The operand length is implicit for each instruction. Memory operands can be bytes (eight
bits), halfwords (two bytes), words (four bytes), or strings (one to 128 bytes). For the
load/store multiple instructions, memory operands are a sequence of words. The address
of any memory operand is the address of its first byte (that is, of its lowest-numbered byte).
Figure 2-2 shows how word, halfword, and byte operands appear in memory (using big-
endian ordering) and in a register. The memory operand appears on the left in this diagram
and the equivalent register representation appears on the right.
The following sections describe the concepts of byte ordering and data alignment, and
their significance to the PowerPC PPC405.
The order that addresses are assigned to individual bytes within a scalar (a single data
object or instruction) is referred to as endianness. Halfwords, words, and doublewords all
consist of more than one byte, so it is important to understand the relationship between the
bytes in a scalar and the addresses of those bytes. For example, when the processor loads a
register with a value from memory, it needs to know which byte in memory holds the high-order byte, which byte holds the next-highest-order byte, and so on.
Computer systems generally use one of the following two byte orders to address data:
•Big-endian ordering assigns the lowest-byte address to the highest-order (“left-most”)
byte in the scalar. The next sequential-byte address is assigned to the next-highest
byte, and so on. The term “big endian” is used because the “big end” of the scalar
(when considered as a binary number) comes first in memory.
•Little-endian ordering assigns the lowest-byte address to the lowest-order (“right-
most”) byte in the scalar. The next sequential-byte address is assigned to the next-lowest byte, and so on. The term “little endian” is used because the “little end” of the
scalar (when considered as a binary number) comes first in memory.
The following sections further describe the differences between big-endian and little-endian byte ordering. The default byte ordering assumed by the PPC405 is big-endian.
However, the PPC405 also fully supports little-endian peripherals and memory.
The following C language structure, s, contains an assortment of scalars and a character
string. The comments show the values assumed in each structure element. These values
show how the bytes comprising each structure element are mapped into memory.
struct {
    int a;          /* 0x1112_1314 word */
    long long b;    /* 0x2122_2324_2526_2728 doubleword */
    char *c;        /* 0x3132_3334 word */
    char d[7];      /* 'A','B','C','D','E','F','G' array of bytes */
    short e;        /* 0x5152 halfword */
    int f;          /* 0x6162_6364 word */
} s;
C structure-mapping rules permit the use of padding (skipped bytes) to align scalars on
desirable boundaries. The structure-mapping examples show how each scalar aligns on its
natural boundary (the alignment boundary is equal to the scalar size). This alignment
introduces padding of four bytes between a and b, one byte between d and e, and two bytes
between e and f. The same amount of padding is present in both big-endian and little-endian mappings.
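The padding described above can be measured with offsetof. The sketch below is an approximation of structure s and not a copy of it: fixed-width types replace int, long long, and short, and a uint32_t stands in for the 32-bit pointer c so that the layout does not depend on the host pointer size. It assumes an ABI that aligns each scalar on its natural boundary. Some ABIs align 64-bit integers on 4-byte boundaries and would give different results. The names s_layout and PAD_BETWEEN are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for structure s using fixed-width members. */
struct s_layout {
    int32_t  a;     /* word */
    int64_t  b;     /* doubleword */
    uint32_t c;     /* stands in for the 32-bit pointer char *c */
    char     d[7];  /* array of bytes */
    int16_t  e;     /* halfword */
    int32_t  f;     /* word */
};

/* Number of padding bytes between member prev and member next. */
#define PAD_BETWEEN(type, prev, next) \
    (offsetof(type, next) - \
     (offsetof(type, prev) + sizeof(((type *)0)->prev)))
```

On a naturally aligning ABI, PAD_BETWEEN(struct s_layout, a, b) evaluates to 4, the d-to-e gap to 1, and the e-to-f gap to 2, matching the text.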
Big-Endian Mapping
The big-endian mapping of structure s follows. The contents of each byte, as defined in
structure s, is shown as a (hexadecimal) number or character (for the string elements). Data
addresses (in hexadecimal) are shown below the corresponding data value.
11    12    13    14
0x00  0x01  0x02  0x03  0x04  0x05  0x06  0x07
21    22    23    24    25    26    27    28
0x08  0x09  0x0A  0x0B  0x0C  0x0D  0x0E  0x0F
31    32    33    34    'A'   'B'   'C'   'D'
0x10  0x11  0x12  0x13  0x14  0x15  0x16  0x17
'E'   'F'   'G'         51    52
0x18  0x19  0x1A  0x1B  0x1C  0x1D  0x1E  0x1F
61    62    63    64
0x20  0x21  0x22  0x23  0x24  0x25  0x26  0x27
Little-Endian Mapping
The little-endian mapping of structure s follows.
14    13    12    11
0x00  0x01  0x02  0x03  0x04  0x05  0x06  0x07
28    27    26    25    24    23    22    21
0x08  0x09  0x0A  0x0B  0x0C  0x0D  0x0E  0x0F
34    33    32    31    'A'   'B'   'C'   'D'
0x10  0x11  0x12  0x13  0x14  0x15  0x16  0x17
'E'   'F'   'G'         52    51
0x18  0x19  0x1A  0x1B  0x1C  0x1D  0x1E  0x1F
64    63    62    61
0x20  0x21  0x22  0x23  0x24  0x25  0x26  0x27
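The two mappings differ only in the order in which the bytes of each scalar are placed in memory. The following C sketch writes a 32-bit value into a byte array in each order. The function names store_be32 and store_le32 are hypothetical, and the code is host-independent because it uses shifts and does not rely on the byte order of the machine running it.

```c
#include <stdint.h>

/* Big-endian: most-significant byte at the lowest address. */
static void store_be32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Little-endian: least-significant byte at the lowest address. */
static void store_le32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)v;
    out[1] = (uint8_t)(v >> 8);
    out[2] = (uint8_t)(v >> 16);
    out[3] = (uint8_t)(v >> 24);
}
```

Storing the value of element a (0x11121314) produces the bytes 11 12 13 14 with store_be32 and 14 13 12 11 with store_le32, as shown at addresses 0x00 through 0x03 in the two mappings.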
Little-Endian Byte Ordering Support
Except as noted, this book describes the processor from the perspective of big-endian
operations. However, the PPC405 processor also fully supports little-endian operations.
This support is provided by the endian (E) storage attribute described in the following
sections. The endian-storage attribute is defined by both the PowerPC embedded-environment architecture and PowerPC Book-E architecture.
Little-endian mode, defined by the PowerPC architecture, is not implemented by the PPC405.
Little-endian mode does not support true little-endian memory accesses. This is because
little-endian mode modifies memory addresses rather than reordering bytes as they are
accessed. Memory-address modification restricts how the processor can access misaligned
data and I/O. The PPC405 little-endian support does not have these restrictions.
The endian (E) storage attribute allows the PPC405 to support direct connection of little-endian peripherals and memory containing little-endian instructions and data. An E
storage attribute is associated with every memory reference—instruction fetch, data load,
and data store. The E attribute specifies whether the memory region being accessed should
be interpreted as big endian (E = 0) or little endian (E = 1).
If virtual mode is enabled (MSR[IR] = 1 or MSR[DR] = 1), the E field in the corresponding
TLB entry defines the endianness of a memory region. When virtual mode is disabled
(MSR[IR] = 0 and MSR[DR] = 0), the SLER defines the endianness of a memory region. See
Chapter 6, Virtual-Memory Management for more information on virtual memory, and Storage Little-Endian Register (SLER), page 455 for more information on the SLER.
When a memory region is defined as little endian, the processor accesses those bytes as if
they are arranged in true little-endian order. Unlike the little-endian mode defined by the
PowerPC architecture, no address modification is performed when accessing memory
regions designated as little endian. Instead, the PPC405 reorders the bytes as they are
transferred between the processor and memory.
On-the-fly reversal of bytes in little-endian memory regions is handled in one of two ways,
depending on whether the memory access is an instruction fetch or a data access (load or
store). The following sections describe byte reordering for both types of memory accesses.
Little-Endian Instruction Fetching
Instructions are word (four-byte) data types that are always aligned on word boundaries in
memory. Instructions stored in a big-endian memory region are arranged with the most-significant byte (MSB) of the instruction word at the lowest byte address.
Consider the big-endian mapping of instruction p at address 0x00, where, for example, p is
an add r7,r7,r4 instruction (instruction opcode bytes are shown in hexadecimal on top,
with the corresponding byte address shown below):
MSB               LSB
7C    E7    22    14
0x00  0x01  0x02  0x03
In the little-endian mapping, instruction p is arranged with the least-significant byte (LSB)
of the instruction word at the lowest byte address:
LSB               MSB
14    22    E7    7C
0x00  0x01  0x02  0x03
The instruction decoder on the PPC405 assumes the instructions it receives are in big-endian order. When an instruction is fetched from memory, the instruction must be placed
in the instruction queue in big-endian order so that the instruction is properly decoded.
When instructions are fetched from little-endian memory regions, the four bytes of an
instruction word are reversed by the processor before the instruction is decoded. This byte
reversal occurs between memory and the instruction-cache unit (ICU) and is transparent to
software. The ICU always stores instructions in big-endian order regardless of whether the
instruction-memory region is defined as big endian or little endian. This means the bytes
are already in the proper order when an instruction is transferred from the ICU to the
instruction decoder.
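The effect of the byte reversal on instruction fetches can be modeled in C. This is a behavioral sketch only: the function name fetch_instruction is hypothetical, and the parameter e plays the role of the E storage attribute of the region being fetched from.

```c
#include <stdint.h>

/* Return the instruction word as the decoder must see it.
   e = 0: big-endian region, bytes used as stored.
   e = 1: little-endian region, the four bytes are reversed. */
static uint32_t fetch_instruction(const uint8_t mem[4], int e)
{
    uint32_t w = ((uint32_t)mem[0] << 24) | ((uint32_t)mem[1] << 16) |
                 ((uint32_t)mem[2] << 8)  |  (uint32_t)mem[3];
    if (!e)
        return w;
    return (w >> 24) | ((w >> 8) & 0x0000FF00u) |
           ((w << 8) & 0x00FF0000u) | (w << 24);
}
```

Both mappings of instruction p shown above (7C E7 22 14 with e = 0, and 14 22 E7 7C with e = 1) yield the same word, 0x7CE72214, in this model.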
If the endian-storage attribute is changed, the affected memory region must be reloaded
with program and data structures using the new endian ordering. If the endian ordering of
instruction memory changes, the ICU must be made coherent with the updates. This is
accomplished by invalidating the ICU and updating the instruction memory with
instructions using the new endian ordering. Subsequent fetches from the updated memory
region are interpreted correctly before they are cached and decoded. See Instruction-
Cache Control Instructions, page 456 for information on instruction-cache invalidation.
Little-Endian Data Accesses
Unlike instruction fetches, data accesses from little-endian memory regions are not byte-reversed between memory and the data-cache unit (DCU). The data-byte ordering stored
in memory depends on the data size (byte, halfword, or word). The data size is not known
until the data item is moved between memory and a general-purpose register. In the
PPC405, byte reversal of load and store accesses is performed between the DCU and the
GPRs.
When accessing data in a little-endian memory region, the processor automatically does
the following regardless of data alignment:
•For byte loads/stores, no reordering occurs
•For halfword loads/stores, bytes are reversed within the halfword
•For word loads/stores, bytes are reversed within the word
The big-endian and little-endian mappings of the structure s, shown in Structure-
Mapping Examples, page 349, demonstrate how the size of a data item determines its byte
ordering. For example:
•The word a has its four bytes reversed within the word spanning addresses 0x00–0x03
•The halfword e has its two bytes reversed within the halfword spanning addresses 0x1C–0x1D
•The array of bytes d (where each data item is a byte) is not reversed when the big-
endian and little-endian mappings are compared (for example, the character 'A' is
located at address 0x14 in both the big-endian and little-endian mappings)
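The size-dependent reordering can be modeled with one C function that assembles a register value from memory bytes. The name load and its parameters are hypothetical. The parameter e stands for the E storage attribute of the region, and size is the operand length in bytes (1, 2, or 4).

```c
#include <stddef.h>
#include <stdint.h>

/* Model a load of size bytes at addr from a region with attribute e.
   e = 0: the lowest-addressed byte becomes the most-significant byte.
   e = 1: the bytes are reversed within the operand. */
static uint32_t load(const uint8_t *mem, size_t addr, unsigned size, int e)
{
    uint32_t value = 0;
    for (unsigned i = 0; i < size; i++) {
        size_t idx = e ? addr + (size - 1 - i) : addr + i;
        value = (value << 8) | mem[idx];
    }
    return value;
}
```

Applied to the little-endian mapping of structure s with e = 1, a word load at 0x00 returns 0x11121314, a halfword load at 0x1C returns 0x5152, and a byte load at 0x14 returns 'A'.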
In little-endian memory regions, data alignment is treated as it is in big-endian memory
regions. Unlike little-endian mode in the PowerPC architecture, no special alignment
exceptions occur when accessing data in little-endian memory regions versus big-endian
regions.
Load and Store Byte-Reverse Instructions
When accessing big-endian memory regions, load/store instructions move the more-significant register bytes to and from the lower-numbered memory addresses and the less-significant register bytes to and from the higher-numbered memory addresses.
The load/store with byte-reverse instructions, as described in Load and Store with Byte-
Reverse Instructions, page 385, do the opposite. The more-significant register bytes are
moved to and from the higher-numbered memory addresses, and the less-significant
register bytes are moved to and from the lower-numbered memory addresses.
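The difference between an ordinary word load and a byte-reversed word load from a big-endian region can be sketched in C. The function names are hypothetical. The second function models the behavior of a byte-reversed word load (such as the PowerPC lwbrx instruction) and is not a definition of it.

```c
#include <stdint.h>

/* Ordinary load: the byte at the lowest address becomes the
   most-significant register byte. */
static uint32_t load_word(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Byte-reversed load: the byte at the lowest address becomes the
   least-significant register byte. */
static uint32_t load_word_byte_reversed(const uint8_t *p)
{
    return  (uint32_t)p[0]        | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

If a little-endian device has written 0x11121314 to memory as the bytes 14 13 12 11, the byte-reversed load recovers 0x11121314, while the ordinary load returns 0x14131211.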
Even though the load/store with byte-reverse instructions can be used to access little-endian memory, the E storage attribute provides two advantages over using those
instructions:
•The load/store with byte-reverse instructions do not solve the problem of fetching
instructions from a little-endian memory region. Only the E storage attribute
mechanism supports little-endian instruction fetching.
•Typical compilers cannot make general use of the load/store with byte-reverse
instructions, so these instructions are normally used only in device drivers written in
hand-coded assembler. However, compilers can take full advantage of the E storage-attribute mechanism, allowing application programmers working in a high-level
language, such as C, to compile programs and data structures using little-endian
ordering.
The operand of a memory-access instruction has a natural alignment boundary equal to
the operand length. In other words, the natural address of an operand is an integral
multiple of the operand length. A memory operand is said to be aligned if it is aligned on
its natural boundary, otherwise it is misaligned.
All instructions are words and are always aligned on word boundaries.
Table 2-1 shows the value required by the least-significant four address bits (bits 28:31) of
each data type for it to be aligned in memory. A value of x in a given bit position indicates
the address bit can have a value of 0 or 1.
Table 2-1: Memory Operand Alignment Requirements
Data Type    Size      Aligned Address Bits 28:31
Byte         8 Bits    xxxx
Halfword     2 Bytes   xxx0
Word         4 Bytes   xx00
Doubleword   8 Bytes   x000
The concept of alignment can be generally applied to any data in memory. For example, a
12-byte data item is said to be word aligned if its address is a multiple of four.
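The alignment test in Table 2-1 amounts to checking that the low-order address bits are zero. A minimal C sketch follows. The function name is_aligned is hypothetical, and the boundary argument must be a power of two (1, 2, 4, or 8 for the data types in the table).

```c
#include <stdbool.h>
#include <stdint.h>

/* True when addr is a multiple of boundary (a power of two). */
static bool is_aligned(uint32_t addr, uint32_t boundary)
{
    return (addr & (boundary - 1)) == 0;
}
```

For the 12-byte data item mentioned above, word alignment is tested with is_aligned(addr, 4), because the boundary of interest is the word and not the item length.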
Some instructions require aligned memory operands. Also, alignment can affect
performance. For single-register memory access instructions, the best performance is
obtained when memory operands are aligned.
Alignment and Endian Storage Control
The endian storage-control attribute (E) does not affect how the processor handles operand
alignment. Data alignment is handled identically for accesses to big-endian and little-endian memory regions. No special alignment exceptions occur when accessing data in
little-endian memory regions. However, alignment exceptions that apply to big-endian
memory accesses also apply to little-endian memory accesses.
Performance Effects of Operand Alignment
The performance of accesses varies depending on the following parameters:
•Operand size
•Operand alignment
•Boundary crossing:
-None
-Cache block
-Page
To obtain the best performance across the widest range of PowerPC embedded-environment implementations and PowerPC Book-E processor implementations,
programmers should assume the alignment performance effects described in Table 2-2.
This table applies to both big-endian and little-endian accesses. Table 2-2 also applies to
PowerPC processors running in the default big-endian mode. However, those same
processors suffer further performance degradation when running in PowerPC little-endian mode.
March 2002 Releasewww.xilinx.com353Virtex-II Pro™ Platform FPGA Documentation1-800-255-7778
Table 2-2: Performance Effects of Operand Alignment
                                  Boundary Crossing
Size            Byte Alignment    None       Cache Block       Page
Byte            1                 Optimal    Not Applicable    Not Applicable
Halfword        2                 Optimal    Not Applicable    Not Applicable
Word            4                 Optimal    Not Applicable    Not Applicable
Multiple Word   4                 Good       Good              Good
Byte String     1                 Good       Good              Poor
Note: Assumes both pages have identical storage-control attributes. Performance is poor
otherwise.
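Whether an access falls in the None, Cache Block, or Page column can be determined from its address and length. The following is a minimal sketch: the function name crosses_boundary is hypothetical, and the boundary sizes used in the usage note are assumptions for illustration, not values taken from this section.

```python
def crosses_boundary(address: int, length: int, boundary: int) -> bool:
    """Return True if the bytes [address, address + length) span a boundary.

    boundary is the block size in bytes (for example, a cache-block size or
    a page size). The caller supplies it; it is not defined by this sketch.
    """
    if length <= 0 or boundary <= 0:
        raise ValueError("length and boundary must be positive")
    first_block = address // boundary
    last_block = (address + length - 1) // boundary
    return first_block != last_block
```

For example, assuming a 32-byte cache block, crosses_boundary(0x101E, 4, 32) is true because the word starts two bytes before the end of a block.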
Alignment Exceptions
Misalignment occurs when an address is not an integral multiple of the data-object size. The
PPC405 automatically handles misalignments within word boundaries and across word
boundaries, generally at a cost in performance. Some instructions cause an alignment
exception if their operand is not properly aligned, as shown in Table 2-3.
Cache-control instructions ignore the four least-significant bits of the EA. No alignment
restrictions are placed on an EA when executing a cache-control instruction. However,
certain storage-control attributes can cause an alignment exception to occur when a cache-control instruction is executed. If data-address translation is disabled (MSR[DR]=0) and a
dcbz instruction references a non-cacheable memory region, or the memory region uses a
write-through caching policy, an alignment exception occurs. The alignment exception
allows the operating system to emulate the write-through caching policy. See Alignment
Interrupt (0x0600), page 510 for more information.
Instruction Conventions
Instruction Forms
Opcode tables and instruction listings often contain information regarding the instruction
form. This information refers to the type of format used to encode the instruction. Grouping
instructions by format is useful for programmers who must deal directly with machine-level code, particularly programmers who write assemblers and disassemblers.
The formats used for the instructions of the PowerPC embedded-environment architecture
are shown in Instructions Grouped by Form, page 792. The Instruction Set Information,
page 797 also shows the form used by each instruction, listed alphabetically by mnemonic.
PowerPC instructions belong to one of the following three classes:
•Defined
•Illegal
•Reserved
An instruction class is determined by examining the primary opcode, and the extended
opcode if one exists. If the opcode and extended opcode combination does not specify a
defined instruction or reserved instruction, the instruction is illegal. Although the
definitions of these terms are consistent among PowerPC processor implementations, the
assignment of these classifications is not. For example, an instruction specific to 64-bit
implementations is considered defined for 64-bit implementations but illegal for 32-bit
implementations.
In future versions of the PowerPC architecture, instruction encodings that are now illegal
or reserved can become defined (by being added to the architecture) or reserved (by being
assigned a special purpose in an implementation).
Boundedly Undefined
The results of executing an instruction are said to be boundedly undefined if those results
could be achieved by executing an arbitrary sequence of instructions, starting in the
machine state prior to executing the given instruction. Boundedly-undefined results for an
instruction can vary between implementations and between different executions on the
same implementation.
Defined Instruction Class
Defined instructions contain all the instructions defined by the PowerPC architecture.
Defined instructions are guaranteed to be supported by all implementations of the
PowerPC architecture. The only exceptions are the instructions defined only for 64-bit
implementations, instructions defined only for 32-bit implementations, and instructions
defined only for embedded implementations. A PowerPC processor can invoke the illegal-instruction error handler (through the program-interrupt handler) when an
unimplemented instruction is encountered, allowing emulation of the instruction in
software.
A defined instruction can have preferred forms and invalid forms as described in the
following sections.
Preferred Instruction Forms
A preferred form of a defined instruction is one in which the instruction executes in an
efficient manner. Any form other than the preferred form can take significantly longer to
execute. The following instructions have preferred forms:
•Load-multiple and store-multiple instructions
•Load-string and store-string instructions
•OR-immediate instruction (preferred form of no-operation)
Invalid Instruction Forms
An invalid form of a defined instruction is one in which one or more operands are coded
incorrectly and in a manner that can be deduced only by examining the instruction
encoding (primary and extended opcodes). For example, coding a value of 1 in a reserved
bit (normally cleared to 0) produces an invalid instruction form.
The following instructions have invalid forms:
•Branch-conditional instructions
•Load with update and store with update instructions
•Load multiple instructions
•Load string instructions
•Integer compare instructions
On the PPC405, attempting to execute an invalid instruction form generally yields a
boundedly-undefined result, although in some cases a program exception (illegal-instruction error) can occur.
Optional Instructions
The PowerPC architecture allows implementations to optionally support some defined
instructions. The PPC405 does not implement the following instructions:
•Floating-point instructions
•External-control instructions (eciwx, ecowx)
•Invalidate TLB entry (tlbie)
Illegal Instruction Class
Illegal instructions are grouped into the following categories:
•Unused primary opcodes. The following primary opcodes are defined as illegal but
can be defined by future extensions to the architecture:
1, 5, 6, 56, 57, 60, 61
•Unused extended opcodes. Unused extended opcodes can be derived from
information in Instructions Sorted by Opcode, page 781. The following primary
opcodes have unused extended opcodes:
19, 31, 59, 63
•An instruction consisting entirely of zeros is guaranteed to be an illegal instruction.
This increases the probability that an attempt to execute data or uninitialized memory
causes an illegal-instruction error. If only the primary opcode consists of all zeros, the
instruction is considered a reserved instruction, as described in the following section.
An attempt to execute an illegal instruction causes an illegal-instruction error (program
exception). With the exception of an instruction consisting entirely of zeros, illegal
instructions are available for future addition to the PowerPC architecture.
Reserved Instruction Class
Reserved instructions are allocated to specific implementation-dependent purposes not
defined by the PowerPC architecture. An attempt to execute an unimplemented reserved
instruction causes an illegal-instruction error (program exception). The following types of
instructions are included in this class:
•Instructions for the POWER architecture that have not been included in the PowerPC
architecture.
•Implementation-specific instructions used to conform to the PowerPC architecture
specification. For example, load data-TLB entry (tlbld) and load instruction-TLB entry
(tlbli) instructions in the PowerPC 603™.
•The instruction with primary opcode 0, when the instruction does not consist entirely
of binary zeros.
•Any other implementation-specific instruction not defined by the PowerPC
architecture.
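The primary-opcode rules for the illegal and reserved classes can be sketched as follows. This is an illustration only: the function names are hypothetical, the sketch covers just the cases that can be decided from the primary opcode (instruction bits 0:5, the six most-significant bits), and everything else is reported as undetermined because it needs the extended-opcode tables.

```python
# Unused primary opcodes listed under "Illegal Instruction Class".
UNUSED_PRIMARY_OPCODES = {1, 5, 6, 56, 57, 60, 61}

def primary_opcode(instruction: int) -> int:
    """Extract instruction bits 0:5 (PowerPC numbers bit 0 as the msb)."""
    return (instruction >> 26) & 0x3F

def classify_by_primary_opcode(instruction: int) -> str:
    """Classify a 32-bit instruction word using only the rules above."""
    instruction &= 0xFFFFFFFF
    if instruction == 0:
        return "illegal"        # all zeros is guaranteed illegal
    opcode = primary_opcode(instruction)
    if opcode == 0:
        return "reserved"       # primary opcode 0, not entirely zeros
    if opcode in UNUSED_PRIMARY_OPCODES:
        return "illegal"        # unused primary opcode
    return "undetermined"       # would need the extended-opcode tables
```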
PowerPC Embedded-Environment Instructions
To support functions required in embedded-system applications, the PowerPC embedded-environment architecture defines instructions that are not part of the PowerPC
architecture. Table 2-4 lists the instructions specific to the PPC405 and other PowerPC
embedded-environment family implementations. From the standpoint of the PowerPC
architecture, these instructions are part of the reserved class and are implementation
dependent. Programs using these instructions are not portable to implementations that do
not support the PowerPC embedded-environment architecture.
In the table, the syntax “[o]” indicates the instruction has an overflow-enabled form that
updates XER[OV,SO] as well as a non-overflow-enabled form. The syntax “[.]” indicates
the instruction has a record form that updates CR[CR0] as well as a non-record form. The
headings “defined” and “allocated”, as they are used in Table 2-4, are described in the
following section, PowerPC Book-E Instruction Classes.
PowerPC Book-E Instruction Classes
The PowerPC Book-E architecture defines four instruction classes:
•Defined
•Allocated
•Reserved
•Preserved
Referring to Table 2-4, the first two columns indicate which PPC405 instructions are part of
the defined instruction class and are guaranteed support in PowerPC Book-E processor
implementations. The last three columns indicate which PPC405 instructions are part of
the allocated instruction class. Support of these instructions by PowerPC Book-E
processors is implementation-dependent.
macchw[o][.]
macchws[o][.]
macchwsu[o][.]
macchwu[o][.]
machhw[o][.]
machhws[o][.]
machhwsu[o][.]
machhwu[o][.]
maclhw[o][.]
maclhws[o][.]
maclhwsu[o][.]
maclhwu[o][.]
nmacchw[o][.]
nmacchws[o][.]
nmachhw[o][.]
nmachhws[o][.]
nmaclhw[o][.]
nmaclhws[o][.]
mulchw[.]
mulchwu[.]
mulhhw[.]
mulhhwu[.]
mullhw[.]
mullhwu[.]
Defined Book-E Instruction Class
The defined instruction class consists of all instructions defined by the PowerPC Book E
architecture. In general, defined instructions are guaranteed to be supported by a PowerPC
Book E processor as specified by the architecture, either within the processor
implementation itself or within emulation software supported by the operating system.
Allocated Book-E Instruction Class
The allocated instruction class contains the set of instructions used for implementation-dependent and application-specific use, outside the scope of the PowerPC Book E
architecture.
Reserved Book-E Instruction Class
The reserved instruction class consists of all instruction primary opcodes (and associated
extended opcodes, if applicable) that do not belong to either the defined class or the
allocated class.
Preserved Book-E Instruction Class
The preserved instruction class is provided to support backward compatibility with previous
generations of this architecture.
Chapter 3
User Programming Model
This chapter describes the processor resources and instructions available to all programs
running on the PPC405, whether they are running in user mode or privileged mode. These
resources and instructions are referred to as the user-programming model, which is a subset
of the privileged-programming model. Applications are typically restricted to running in
user mode. System software runs in privileged mode, has access to all
processor resources, and can execute all instructions supported by the PPC405. System
software typically creates a context (execution environment) that protects itself and other
applications from the effects of an errant application program.
The remaining chapters in this book generally describe aspects of the privileged-programming model and are not relevant to application programmers. There are two
exceptions:
•Chapter 5, Memory-System Management, describes cache management features
available to both system and application programs.
•Chapter 8, Timer Resources, describes the time base, which can be read by
application programs.
User Registers
Figure 3-1 shows the user registers supported by the PPC405, all of which are available to
software running in user mode and privileged mode. In the PPC405, all user registers are
32 bits wide, except for the time base as described in Time Base, page 524. Floating-point
registers are not supported by the PPC405.
Chapter 3: User Programming Model
General-Purpose Registers
    r0 through r31
Condition Register
    CR
Special-Purpose Registers (SPRs)
    Fixed-Point Exception Register              XER       SPR 0x001
    Link Register                               LR        SPR 0x008
    Count Register                              CTR       SPR 0x009
    User-SPR General-Purpose Register           USPRG0    SPR 0x100
    SPR General-Purpose Registers (read only)   SPRG4     SPR 0x104
                                                SPRG5     SPR 0x105
                                                SPRG6     SPR 0x106
                                                SPRG7     SPR 0x107
Time-Base Registers (read only)
    TBL    TBR 0x10C
    TBU    TBR 0x10D
Figure 3-1: PPC405 User Registers
Most registers in the PPC405 are special-purpose registers, or SPRs. SPRs control the
operation of debug facilities, timers, interrupts, storage control attributes, and other
processor resources. All SPRs can be accessed explicitly using the move to special-purpose
register (mtspr) and move from special-purpose register (mfspr) instructions. See
Special-Purpose Register Instructions, page 424 for more information on these instructions. A few
registers are accessed as a by-product of executing certain instructions. For example, some
branch instructions access and update the link register.
The PPC405 SPRs in the user-programming model are shown in Figure 3-1. The SPR
number (SPRN) for each SPR is shown above the corresponding register. See Appendix A,
Special-Purpose Registers, page 770 for a complete list of all SPRs (user and privileged)
supported by the PPC405.
Simplified instruction mnemonics are available for the mtspr and mfspr instructions for
some SPRs. See Special-Purpose Registers, page 830 for more information.
General-Purpose Registers (GPRs)
The PPC405 contains thirty-two 32-bit general-purpose registers (GPRs), numbered r0
through r31, as shown in Figure 3-2. Data from memory are read into GPRs using load
instructions and the contents of GPRs are written to memory using store instructions. Most
integer instructions use the GPRs for source and destination operands.
Condition Register (CR)
The condition register (CR) is a 32-bit register that reflects the result of certain instructions
and provides a mechanism for testing and conditional branching. The bits in the CR are
grouped into eight 4-bit fields, CR0–CR7, as shown in Figure 3-3. The bits within an
arbitrary CRn field are shown in Figure 3-4. In this figure, the bit positions shown are
relative positions within the field rather than absolute positions within the CR register.
Bits:    0:3   4:7   8:11   12:15   16:19   20:23   24:27   28:31
Field:   CR0   CR1   CR2    CR3     CR4     CR5     CR6     CR7
Figure 3-3: Condition Register (CR)
Bit:     0    1    2    3
Name:    LT   GT   EQ   SO
Figure 3-4: CRn Field
In the PPC405, the CR fields are modified in the following ways:
•The mtcrf instruction can update specific fields in the CR from a GPR.
•The mcrxr instruction can update a CR field with the contents of XER[0:3].
•The mcrf instruction can copy one CR field into another CR field.
•The condition-register logical instructions can update specific bits in the CR.
•The integer-arithmetic instructions can update CR0 to reflect their result.
•The integer-compare instructions can update a specific CR field to reflect their result.
Conditional-branch instructions can test bits in the CR and use the results of such a test as
the branch condition.
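Because PowerPC numbers bits from the most-significant end, CR0 sits in the top four bits of the 32-bit CR value. The following sketch shows how a CRn field or a single CR bit could be extracted from an integer copy of the CR; the function names are hypothetical.

```python
def cr_field(cr: int, n: int) -> int:
    """Return the 4-bit CRn field (LT, GT, EQ, SO from msb to lsb)."""
    if not 0 <= n <= 7:
        raise ValueError("CR field number must be 0..7")
    # CRn occupies CR bits 4n:4n+3, so CR0 is the most-significant nibble.
    return (cr >> (28 - 4 * n)) & 0xF

def cr_bit(cr: int, bit: int) -> int:
    """Return CR[bit], where bit 0 is the most-significant bit."""
    if not 0 <= bit <= 31:
        raise ValueError("CR bit number must be 0..31")
    return (cr >> (31 - bit)) & 1
```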
CR0 Field
The CR0 field is updated to reflect the result of an integer instruction if the Rc opcode field
(record bit) is set to 1. The addic., andi., and andis. instructions also update CR0 to reflect
the result they produce. For all of these instructions, CR0 is updated as follows:
•The instruction result is interpreted as a signed integer and algebraically compared to
0. The first three bits of CR0 (CR0[0:2]) are updated to reflect the result of the algebraic
comparison.
•The fourth bit of CR0 (CR0[3]) is copied from XER[SO].
The CR0 bits are interpreted as described in Table 3-1. If any portion of the result is
undefined, the value written into CR0[0:2] is undefined.
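The CR0 update rule can be modeled as below. This is a sketch under stated assumptions: the function name is hypothetical, the result is treated as a 32-bit two's-complement value, and the returned nibble is ordered LT, GT, EQ, SO from most-significant to least-significant bit.

```python
def cr0_from_result(result: int, xer_so: int) -> int:
    """Model the CR0 value produced by a record-form integer instruction."""
    result &= 0xFFFFFFFF
    # Interpret the 32-bit result as a signed integer.
    signed = result - (1 << 32) if result & 0x80000000 else result
    lt = int(signed < 0)
    gt = int(signed > 0)
    eq = int(signed == 0)
    # CR0[3] is a copy of XER[SO].
    return (lt << 3) | (gt << 2) | (eq << 1) | (xer_so & 1)
```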
Table 3-1: CR0-Field Bit Settings
Bit 0 (LT): Negative
    0: Result is not negative.
    1: Result is negative.
    This bit is set when the result is negative, otherwise it is cleared.
Bit 1 (GT): Positive
    0: Result is not positive.
    1: Result is positive.
    This bit is set when the result is positive (and not zero), otherwise it is cleared.
Bit 2 (EQ): Zero
    0: Result is not equal to zero.
    1: Result is equal to zero.
    This bit is set when the result is zero, otherwise it is cleared.
Bit 3 (SO): Summary overflow
    0: No overflow occurred.
    1: Overflow occurred.
    This is a copy of the final state of XER[SO] at the completion of the instruction.
CR1 Field
In PowerPC® implementations that support floating-point operations, the CR1 field can be
updated by the processor to reflect the result of those operations. Because the PPC405 does
not support floating-point operations in hardware, CR1 is not updated in this manner.
CRn Fields (Compare Instructions)
Any one of the eight CRn fields (including CR0 and CR1) can be updated to reflect the
result of a compare instruction. The CRn-field bits are interpreted as described in Table 3-2.
Table 3-2: CRn-Field Bit Settings
Bit 0 (LT): Less than
    0: rA is not less than.
    1: rA is less than.
    This bit is set when rA < SIMM or rB (signed comparison), or
    rA < UIMM or rB (unsigned comparison), otherwise it is cleared.
Bit 1 (GT): Greater than
    0: rA is not greater than.
    1: rA is greater than.
    This bit is set when rA > SIMM or rB (signed comparison), or
    rA > UIMM or rB (unsigned comparison), otherwise it is cleared.
Bit 2 (EQ): Equal to
    0: rA is not equal.
    1: rA is equal.
    This bit is set when rA = SIMM or rB (signed comparison), or
    rA = UIMM or rB (unsigned comparison), otherwise it is cleared.
Bit 3 (SO): Summary overflow
    0: No overflow occurred.
    1: Overflow occurred.
    This is a copy of the final state of XER[SO] at the completion of the instruction.
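The difference between the signed and unsigned comparisons in Table 3-2 can be seen in a small model. The function name compare_field is hypothetical; both operands are treated as 32-bit values, and the result is the 4-bit CRn field ordered LT, GT, EQ, SO.

```python
def compare_field(ra: int, operand: int, xer_so: int, signed: bool = True) -> int:
    """Model the CRn field written by an integer-compare instruction.

    operand stands for SIMM/UIMM (already extended to 32 bits) or rB.
    """
    a = ra & 0xFFFFFFFF
    b = operand & 0xFFFFFFFF
    if signed:
        # Reinterpret both 32-bit values as two's-complement integers.
        a = a - (1 << 32) if a & 0x80000000 else a
        b = b - (1 << 32) if b & 0x80000000 else b
    lt = int(a < b)
    gt = int(a > b)
    eq = int(a == b)
    return (lt << 3) | (gt << 2) | (eq << 1) | (xer_so & 1)
```

With rA = 0xFFFF_FFFF and an operand of 1, the signed comparison sets LT (rA is -1), while the unsigned comparison sets GT.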
Fixed-Point Exception Register (XER)
The fixed-point exception register (XER) is a 32-bit register that reflects the result of
arithmetic operations that have resulted in an overflow or carry. This register is also used
to indicate the number of bytes to be transferred by load/store string indexed instructions.
Figure 3-5 shows the format of the XER. The bits in the XER are defined as shown in
Table 3-3.
Bits:    0    1    2    3:24       25:31
Field:   SO   OV   CA   Reserved   TBC
Figure 3-5: Fixed Point Exception Register (XER)
Table 3-3: Fixed Point Exception Register (XER) Bit Definitions
Bit 0 (SO): Summary overflow
    0: No overflow occurred.
    1: Overflow occurred.
    SO is set to 1 whenever an instruction (except mtspr) sets the
    overflow bit (XER[OV]). Once set, the SO bit remains set until it is
    cleared to 0 by an mtspr instruction (specifying the XER) or an
    mcrxr instruction. SO can be cleared to 0 and OV set to 1 using an
    mtspr instruction.
Bit 1 (OV): Overflow
    0: No overflow occurred.
    1: Overflow occurred.
    OV can be modified by instructions when the overflow-enable bit
    in the instruction encoding is set (OE=1). Add, subtract, and negate
    instructions set OV=1 if the carry out from the result msb is not
    equal to the carry out from the result msb + 1. Otherwise, they clear
    OV=0. Multiply and divide set OV=1 if the result cannot be
    represented in 32 bits. mtspr can be used to set OV=1, and mtspr
    and mcrxr can be used to clear OV=0.
Bit 2 (CA): Carry
    0: Carry did not occur.
    1: Carry occurred.
    CA can be modified by add-carrying, subtract-from-carrying, add-extended,
    and subtract-from-extended instructions. These instructions
    set CA=1 when there is a carry out from the result msb. Otherwise,
    they clear CA=0. Shift-right algebraic instructions set CA=1 if any 1
    bits are shifted out of a negative operand. Otherwise, they clear
    CA=0. mtspr can be used to set CA=1, and mtspr and mcrxr can be
    used to clear CA=0.
Bits 3:24: Reserved
Bits 25:31 (TBC): Transfer-byte count
    TBC is modified using the mtspr instruction. It specifies the
    number of bytes to be transferred by a load-string word indexed
    (lswx) or store-string word indexed (stswx) instruction.
The XER is an SPR with an address of 1 (0x001) and can be read and written using the
mfspr and mtspr instructions. The mcrxr instruction can be used to move XER[0:3] into
one of the eight CR fields.
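The XER layout and the add-instruction rules for OV and CA can be modeled as follows. This is a sketch, not a description of the hardware: the function names are hypothetical, and only a plain 32-bit add is modeled. The carry out of msb + 1 (bit 1) is the carry into the msb (bit 0), which is how the OV rule is computed here.

```python
MASK32 = 0xFFFFFFFF

def decode_xer(xer: int) -> dict:
    """Split a 32-bit XER value into its fields (bit 0 is the msb)."""
    return {
        "SO": (xer >> 31) & 1,   # bit 0
        "OV": (xer >> 30) & 1,   # bit 1
        "CA": (xer >> 29) & 1,   # bit 2
        "TBC": xer & 0x7F,       # bits 25:31
    }

def add32_flags(a: int, b: int) -> tuple:
    """Return (result, ov, ca) for a 32-bit add of a and b."""
    a &= MASK32
    b &= MASK32
    total = a + b
    result = total & MASK32
    ca = total >> 32  # carry out of the result msb
    # Carry out of msb + 1, that is, the carry into the msb.
    carry_into_msb = ((a & 0x7FFFFFFF) + (b & 0x7FFFFFFF)) >> 31
    ov = ca ^ carry_into_msb  # OV=1 when the two carries differ
    return result, ov, ca
```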
Link Register (LR)
The link register (LR) is a 32-bit register that is used by branch instructions, generally for
the purpose of subroutine linkage. Two types of branch instructions use the link register:
•Branch-conditional to link-register (bclrx) instructions read the branch-target address from
the LR.
•Branch instructions with the link-register update-option enabled load the LR with the
effective address of the instruction following the branch instruction. The link-register
update-option is enabled when the branch-instruction LK opcode field (bit 31) is set
to 1.
The format of LR is shown in Figure 3-6.
Bits 0:31: Branch Address
Figure 3-6: Link Register (LR)
The LR is an SPR with an address of 8 (0x008) and can be read and written using the mfspr
and mtspr instructions. It is possible for the processor to prefetch instructions along the
target path specified by the LR provided the LR is loaded sufficiently ahead of the branch
to link-register instruction, giving branch-prediction hardware time to calculate the branch
address.
The two least-significant bits (LR[30:31]) can be written with any value. However, those
bits are ignored and assumed to have a value of 0 when the LR is used as a branch-target
address.
Some PowerPC processors implement a software-invisible link-register stack for
performance reasons. Although the PPC405 processor does not implement such a stack,
certain programming conventions should be followed so that software running on
multiple PowerPC processors can benefit from this stack. See Link-Register Stack,
page 371 for more information.
Count Register (CTR)
The count register (CTR) is a 32-bit register that can be used by branch instructions in the
following two ways:
•The CTR can hold a loop count that is decremented by a conditional-branch
instruction with an appropriately coded BO opcode field. The value in the CTR wraps
to 0xFFFF_FFFF if the value in the register is 0 prior to the decrement. See
Conditional Branch Control, page 367 for information on encoding the BO opcode
field.
•The CTR can hold the branch-target address used by branch-conditional to count-register
(bcctrx) instructions.
The format of CTR is shown in Figure 3-7.
Bits 0:31: Count
Figure 3-7: Count Register (CTR)
The CTR is an SPR with an address of 9 (0x009) and can be read and written using the
mfspr and mtspr instructions. It is possible for the processor to prefetch instructions along
the target path specified by the CTR provided the CTR is loaded sufficiently ahead of the
branch to count-register instruction, giving branch-prediction hardware time to calculate
the branch address.
The two least-significant bits (CTR[30:31]) can be written with any value. However, those
bits are ignored and assumed to have a value of 0 when the CTR is used as a branch-target
address.
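Two behaviors described for the LR and CTR lend themselves to a short model: the wrap to 0xFFFF_FFFF when a zero count is decremented, and the treatment of the two least-significant bits as 0 when a register supplies a branch target. The function names below are hypothetical.

```python
def decrement_ctr(ctr: int) -> int:
    """Decrement a 32-bit count; 0 wraps to 0xFFFF_FFFF."""
    return (ctr - 1) & 0xFFFFFFFF

def branch_target_from_register(value: int) -> int:
    """Branch target taken from the LR or CTR: bits 30:31 are treated as 0."""
    return value & 0xFFFFFFFC
```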
User-SPR General-Purpose Register
The user-SPR general-purpose register (USPRG0) is a 32-bit register that can be used by
application software for any purpose. The value stored in this register does not have an
effect on the operation of the PPC405 processor.
The USPRG0 is an SPR with an address of 256 (0x100) and can be read and written using
the mfspr and mtspr instructions.
SPR General-Purpose Registers
The SPR general-purpose registers (SPRG0–SPRG7) are 32-bit registers that can be used by
system software for any purpose. Four of the registers (SPRG4–SPRG7) are available from
user mode with read-only access. Application software can read the contents of SPRG4–
SPRG7, but cannot modify them. The values stored in these registers do not affect the
operation of the PPC405 processor.
The format of all SPRGn registers is shown in Figure 3-9.
The SPRGn registers are SPRs with the following addresses:
•SPRG4—260 (0x104).
•SPRG5—261 (0x105).
•SPRG6—262 (0x106).
•SPRG7—263 (0x107).
These registers can be read using the mfspr instruction. In privileged mode, system
software accesses these registers using different SPR numbers (see page 432).
Time-Base Registers
The time base is a 64-bit incrementing counter implemented as two 32-bit registers. The
time-base upper register (TBU) holds time-base bits 0:31, and the time-base lower register
(TBL) holds time-base bits 32:63. Figure 3-10 shows the format of the time base.
TBU, bits 0:31: Time Base [0:31]
TBL, bits 0:31: Time Base [32:63]
Figure 3-10: Time-Base Register
The TBU and TBL registers are SPRs with user-mode read access and privileged-mode
write access. Reading the time-base registers requires use of the mftb instruction with the
following addresses:
•TBU—269 (0x10D).
•TBL—268 (0x10C).
See Time Base, page 524, for information on using the time base.
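Since the 64-bit time base is read as two 32-bit halves, software has to combine them. The sketch below is an assumption-laden model rather than PPC405 code: read_tbu and read_tbl are hypothetical callables standing in for mftb reads of TBU and TBL, and the re-read loop is a commonly used convention (not stated in this section) for guarding against TBL carrying into TBU between the two reads.

```python
def combine_time_base(tbu: int, tbl: int) -> int:
    """Form the 64-bit time base: TBU is bits 0:31, TBL is bits 32:63."""
    return ((tbu & 0xFFFFFFFF) << 32) | (tbl & 0xFFFFFFFF)

def read_time_base(read_tbu, read_tbl) -> int:
    """Read a consistent 64-bit value from two 32-bit reads.

    read_tbu and read_tbl are hypothetical stand-ins for reading TBU and TBL.
    """
    while True:
        upper = read_tbu()
        lower = read_tbl()
        # If TBU changed while TBL was being read, the pair is inconsistent.
        if read_tbu() == upper:
            return combine_time_base(upper, lower)
```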
Exception Summary
An exception is an event that can be caused by a number of sources, including:
•Error conditions arising from instruction execution.
•Internal timer resources.
•Internal debug resources.
•External peripherals.
When an exception occurs, the processor can interrupt the currently executing program so
that system software can deal with the exception condition. The action taken by an
interrupt includes saving the processor context and transferring control to a
predetermined exception-handler address operating under a new context. When the
interrupt handler completes execution, it can return to the interrupted program by
executing a return-from-interrupt instruction.
Exceptions are handled by privileged software. The exception mechanism is described in
Chapter 7, Exceptions and Interrupts. Following is a list of exceptions that can be caused
by the execution of an instruction in user mode.
•Data-Storage Exception.
An attempt to access data in memory that results in a memory-protection violation
causes the data-storage interrupt handler to be invoked.
•Instruction-Storage Exception.
An attempt to access instructions in memory that results in a memory-protection
violation causes the instruction-storage interrupt handler to be invoked.
•Alignment Exception.
An attempt to access memory with an invalid effective-address alignment (for the
specific instruction) causes the alignment-interrupt handler to be invoked.
•Program Exception.
Three different types of interrupt handlers can be invoked when a program exception
occurs: illegal instruction, privileged instruction, and system trap. The conditions
causing a program interrupt include:
-An attempt to execute an illegal instruction causes the illegal-instruction interrupt
handler to be invoked.
-An attempt to execute an optional instruction not implemented by the PPC405
causes the illegal-instruction interrupt handler to be invoked.
-An attempt by a user-level program to execute a supervisor-level instruction
causes the privileged-instruction interrupt handler to be invoked.
-An attempt to execute a defined instruction with an invalid form causes either the
illegal-instruction interrupt handler or the privileged-instruction interrupt
handler to be invoked.
-Executing a trap instruction can cause the system-trap interrupt handler to be
invoked.
•Floating-Point Unavailable Exception.
On processors that support floating-point instructions, executing such instructions
when the floating-point unit is disabled (MSR[FP]=0) invokes the floating-point-unavailable interrupt handler.
•System-Call Exception.
The execution of an sc instruction causes the system-call interrupt handler to be
invoked. The interrupt handler can be used to call a system-service routine.
•Data TLB-Miss Exception.
If data translation is enabled, an attempt to access data in memory when a valid TLB
entry is not present causes the data TLB-miss interrupt handler to be invoked.
•Instruction TLB-Miss Exception.
If instruction translation is enabled, an attempt to access instructions in memory when
a valid TLB entry is not present causes the instruction TLB-miss interrupt handler to be
invoked.
Other exceptions can occur during user-mode program execution that are not directly
caused by instruction execution. These are also described in Chapter 7:
•Machine-check exceptions.
•Exceptions caused by external devices.
•Exceptions caused by a timer.
•Debug exceptions.
Branch and Flow-Control Instructions
Branch instructions redirect program flow by altering the next-instruction address non-sequentially. Branches unconditionally or conditionally alter program flow forward or
backward using either an absolute address or an address relative to the branch-instruction
address. Branches calculate the target address using the contents of the CTR, LR, or fields
within the branch instruction. Optionally, a branch-return address can be automatically
loaded into the LR by setting the LK instruction-opcode bit to 1. This option is useful for
specifying the return address for subroutine calls and causes the address of the instruction
following the branch to be loaded in the LR. Branches are used for all non-sequential
program flow including jumps, loops, calls and returns.
Branch-conditional instructions redirect program flow if a tested condition is true. These
instructions can test a bit value within the CR, the value of the CTR, or both. Condition-register logical instructions are provided to set up the tests for branch-conditional
instructions.
Conditional Branch Control
With branch-conditional instructions, the BO opcode field specifies the branch-control
conditions and how the branch affects the CTR. The BO field can specify a test of the CR
and it can specify that the CTR be decremented and tested. The BO field can also be
initialized to reverse the default prediction performed by the processor. The bits within the
BO field are defined as shown in Table 3-4.
Table 3-4: BO Field Bit Definitions
BO Bit   Description
BO[0]    CR Test Control
         0: Test the CR bit specified by the BI opcode field for the value indicated by BO[1].
         1: Do not test the CR.
BO[1]    CR Test Value
         0: Test for CR[BI]=0.
         1: Test for CR[BI]=1.
BO[2]    CTR Test Control
         0: Decrement CTR by one, and test whether CTR satisfies the condition specified by BO[3].
         1: Do not change or test CTR.
BO[3]    CTR Test Value
         0: Test for CTR ≠ 0.
         1: Test for CTR = 0.
BO[4]    Branch Prediction Reversal
         0: Apply standard branch prediction.
         1: Reverse the standard branch prediction.
The 5-bit BI opcode field in branch-conditional instructions specifies which of the 32 bits in
the CR are used in the branch-condition test. For example, if BI=0b01010, CR[10] is used in
the test.
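The BO and BI rules in Table 3-4 can be modeled as a small decision function. This is a sketch under stated assumptions: the function name is hypothetical, BO is passed as a 5-bit integer with BO[0] as its most-significant bit, the branch is taken only when both the CTR test and the CR test pass, and the prediction bit BO[4] is ignored because it does not affect the outcome.

```python
def branch_taken(bo: int, bi: int, cr: int, ctr: int) -> tuple:
    """Return (taken, new_ctr) for a branch-conditional instruction."""
    bo0 = (bo >> 4) & 1  # CR test control
    bo1 = (bo >> 3) & 1  # CR test value
    bo2 = (bo >> 2) & 1  # CTR test control
    bo3 = (bo >> 1) & 1  # CTR test value

    if bo2 == 0:
        # Decrement first, then test the decremented value.
        ctr = (ctr - 1) & 0xFFFFFFFF
        ctr_ok = (ctr == 0) if bo3 else (ctr != 0)
    else:
        ctr_ok = True  # CTR is neither changed nor tested

    if bo0 == 0:
        cr_bit = (cr >> (31 - bi)) & 1  # CR[BI], bit 0 is the msb
        cr_ok = cr_bit == bo1
    else:
        cr_ok = True  # CR is not tested

    return (ctr_ok and cr_ok), ctr
```

For example, BO=0b10000 models a count-down loop branch: with CTR=2 the branch is taken and CTR becomes 1, and with CTR=1 the branch falls through with CTR=0.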
In some encodings of the BO field, certain BO bits are ignored. Ignored bits can be assigned
a meaning in future extensions of the PowerPC architecture and should be cleared to 0.
Valid BO field encodings are shown in Table 3-5. In this table, z indicates the ignored bits
that should be cleared to 0. The y bit (BO[4]) specifies the branch-prediction behavior for
the instruction as described in Specifying Branch-Prediction Behavior, page 370.
Table 3-5: Valid BO Opcode-Field Encoding
BO[0:4]   Description
0000y     Decrement the CTR. Branch if the decremented CTR ≠ 0 and CR[BI]=0.
0001y     Decrement the CTR. Branch if the decremented CTR = 0 and CR[BI]=0.
001zy     Branch if CR[BI]=0.
0100y     Decrement the CTR. Branch if the decremented CTR ≠ 0 and CR[BI]=1.
0101y     Decrement the CTR. Branch if the decremented CTR = 0 and CR[BI]=1.
011zy     Branch if CR[BI]=1.
1z00y     Decrement the CTR. Branch if the decremented CTR ≠ 0.
1z01y     Decrement the CTR. Branch if the decremented CTR = 0.
1z1zz     Branch always.
Branch Instructions
The following sections describe the branch instructions defined by the PowerPC
architecture. A number of simplified mnemonics are defined for the branch instructions.
See Branch Instructions, page 821 for more information.
Branch Unconditional
Table 3-6 lists the PowerPC unconditional branch instructions. These branches specify a 26-bit
signed displacement to the branch-target address by appending 0b00 to the 24-bit LI instruction
field. The displacement value gives unconditional branches the ability to cover
an address range of ±32 MB.
Mnemonic  Name                      Operand Syntax  Operation
b         Branch                    tgt_addr        Branch to relative address.
ba        Branch Absolute           tgt_addr        Branch to absolute address.
bl        Branch and Link           tgt_addr        Branch to relative address. LR is updated with the address of the instruction following the branch.
bla       Branch Absolute and Link  tgt_addr        Branch to absolute address. LR is updated with the address of the instruction following the branch.
Branch Conditional
Table 3-7 lists the PowerPC branch-conditional instructions. The BO field specifies the
condition tested by the branch, as shown in Table 3-5, page 368. The BI field specifies the
CR bit used in the test. These branches specify a 16-bit signed displacement to the
branch-target address by appending 0b00 to the 14-bit BD instruction field. The displacement
value gives conditional branches the ability to cover an address range of ±32 KB.
Table 3-7: Branch-Conditional Instructions
Mnemonic  Name                                  Operand Syntax  Operation
bc        Branch Conditional                    BO,BI,tgt_addr  Branch-conditional to relative address.
bca       Branch Conditional Absolute           BO,BI,tgt_addr  Branch-conditional to absolute address.
bcl       Branch Conditional and Link           BO,BI,tgt_addr  Branch-conditional to relative address. LR is updated with the address of the instruction following the branch.
bcla      Branch Conditional Absolute and Link  BO,BI,tgt_addr  Branch-conditional to absolute address. LR is updated with the address of the instruction following the branch.
Branch Conditional to Link Register
Table 3-8 lists the PowerPC branch-conditional to link-register instructions. The BO field
specifies the condition tested by the branch, as shown in Table 3-5, page 368. The BI field
specifies the CR bit used in the test. The branch-target address is read from the LR, with
LR[30:31] cleared to zero to form a word-aligned address. Using the 32-bit LR as a branch
target gives these branches the ability to cover the full 4 GB address range.
Table 3-8: Branch-Conditional to Link-Register Instructions
Mnemonic  Name                                          Operand Syntax  Operation
bclr      Branch Conditional to Link Register           BO,BI           Branch-conditional to address in LR.
bclrl     Branch Conditional to Link Register and Link  BO,BI           Branch-conditional to address in LR. LR is updated with the address of the instruction following the branch.
Branch Conditional to Count Register
Table 3-9 lists the PowerPC branch-conditional to count-register instructions. The BO field
specifies the condition tested by the branch, as shown in Table 3-5, page 368. The BI field
specifies the CR bit used in the test. The branch-target address is read from the CTR, with
CTR[30:31] cleared to zero to form a word-aligned address. Using the 32-bit CTR as a
branch target gives these branches the ability to cover the full 4 GB address range.
Table 3-9: Branch-Conditional to Count-Register Instructions
Mnemonic  Name                                           Operand Syntax  Operation
bcctr     Branch Conditional to Count Register           BO,BI           Branch-conditional to address in CTR.
bcctrl    Branch Conditional to Count Register and Link  BO,BI           Branch-conditional to address in CTR. LR is updated with the address of the instruction following the branch.
Branch Prediction
Conditional branches alter program flow based on the value of bits in the CR. If a condition
is met by the CR bits, the branch instruction alters the next-instruction address
nonsequentially. Otherwise, the next-sequential instruction following the branch is executed.
When the processor encounters a conditional branch, it scans the execution pipelines to
determine whether an instruction in progress can affect the CR bit tested by the branch. If
no such instruction is found, the branch can be resolved immediately by checking the bit in
the CR and taking the action defined by the branch instruction.
However, if a CR-altering instruction is detected, the branch is considered unresolved until
the CR-altering instruction completes execution and writes its result to the CR. Prior to that
time, the processor can predict how the branch is resolved. First, the processor uses special
dynamic prediction hardware to analyze instruction flow and branch history to predict
resolution of the current branch. If branches are predicted correctly, performance
improvements can be realized because instruction execution does not stall waiting for the
branch to be resolved. The PowerPC architecture provides software with the ability to
override (reverse) the dynamic prediction using a static prediction hint encoded in the
instruction opcode. This can be useful when it is known at compile time that a branch is
likely to behave contrary to what the processor expects. The use of static prediction is
described in the next section, Specifying Branch-Prediction Behavior.
When a prediction is made, instructions are fetched from the predicted execution path. If
the processor determines the prediction was incorrect after the CR-altering instruction
completes execution, all instructions fetched as a result of the prediction are discarded by
the processor. Instruction fetch is restarted along the correct path. If the prediction was
correct, instruction fetch and execution proceed normally along the predicted (and now
resolved) path.
Branch prediction is most effective when the branch-target address is computed well in
advance of resolving the branch. If a branch instruction contains immediate addressing
operands, the processor can compute the branch-target address ahead of branch
resolution. If the branch instruction uses the LR or CTR for addressing, it is important that
the register is loaded by software sufficiently ahead of the branch instruction.
Specifying Branch-Prediction Behavior
All PowerPC processors predict a conditional branch as taken using the following rules:
•For the bcx instruction with a negative value in the displacement operand, the branch
is predicted taken.
•For all other branch-conditional instructions (bcx with a non-negative value in the
displacement operand, bclrx, or bcctrx), the branch is predicted not taken.
These rules are equivalent to predicting the branch as taken when the following equation
evaluates to 1:
(BO[0] ∧ BO[2]) ∨ s
where s is the sign bit of the displacement operand, if the instruction has a displacement
operand (bit 16 of the branch-conditional instruction encoding).
When the result of the above equation is 0, the branch is predicted not-taken and the
processor speculatively fetches instructions that sequentially follow the branch
instruction.
Examining the above equation, BO[0] ∧ BO[2]=1 only when the conditional branch tests
nothing, meaning the branch is always taken. In this case, the processor predicts the branch
as taken.
If the conditional branch tests anything (BO[0] ∧ BO[2]=0), s controls the prediction. In the
bclrx and bcctrx instructions, bit 16 (s) is reserved and always 0. In this case those
instructions are predicted not-taken.
Only the bcx instructions can specify a displacement value. The bcx instructions are
commonly used at the end of loops to control the number of times a loop is executed. Here,
the branch is taken every time the loop is executed except the last time, so a branch should
normally be predicted as taken. Because the branch target is at the beginning of the loop,
the branch displacement is negative and s=1, so the processor predicts the branch as taken.
Forward branches have a positive displacement and are predicted not-taken.
When the y bit (BO[4]) is cleared to 0, the default branch prediction behavior described
above is followed by the processor. Setting the y bit to 1 reverses the above behavior.
For the branch-always encoding (BO[0] and BO[2] both set), branch prediction cannot be
reversed (no y bit is recognized).
The sign of the displacement operand (s) is used as described above even when the target
is an absolute address. The default value for the y bit should be 0. Compilers can set this bit
if they determine that the prediction corresponding to y=1 is more likely to be correct
than the prediction corresponding to y=0. Compilers that do not statically predict branches
should always clear the y bit.
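The static prediction described in this section can be modeled as a small function. The sketch below assumes the default prediction is (BO[0] ∧ BO[2]) ∨ s and that the y bit (BO[4]) reverses it except for the branch-always encoding; the function names are hypothetical and chosen only for illustration.

```python
def displacement_sign(instruction):
    """Return s, bit 16 of a 32-bit instruction word (bit 0 is the MSB)."""
    return (instruction >> (31 - 16)) & 1


def predicted_taken(bo, s):
    """Return True if the branch is statically predicted taken.

    bo -- 5-bit BO field, BO[0] is the most-significant bit
    s  -- sign bit of the displacement (0 for bclrx and bcctrx)
    """
    bo0 = (bo >> 4) & 1
    bo2 = (bo >> 2) & 1
    y = bo & 1
    if bo0 and bo2:
        # Branch always: predicted taken, and the y bit is not recognized.
        return True
    # Otherwise s gives the default prediction and y=1 reverses it.
    return bool(s ^ y)
```

A loop-closing branch with a negative displacement (s=1) and y=0 is predicted taken; setting y=1 on the same branch reverses the prediction to not-taken.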
Link-Register Stack
Some processor implementations keep a stack (history) of the LR values most recently
used by branch-and-link instructions. Those processors use this software-invisible stack to
predict the target address of nested-subroutine returns. Although the PPC405 processor
does not implement such a stack, the following programming conventions should be
followed so that software running on multiple PowerPC processors can benefit from this
stack.
In the following examples, let A, B, and Glue represent subroutine labels:
•When obtaining the address of the next instruction, use the following form of branch-and-link:
bcl 20,31,$+4
•Loop counts:
Keep loop counts in the CTR, and use one of the branch-conditional instructions to
decrement the count and to control branching (for example, branching back to the start
of a loop if the decremented CTR value is nonzero).
•Computed “go to”, case statements, etc.:
Use the CTR to hold the branch-target address, and use the bcctr instruction with the
link register option disabled (LK=0) to branch to the selected address.
•Direct subroutine linkage, where A calls B and B returns to A:
-A calls B—use a branch instruction that enables the LR (LK=1).
-B returns to A—use the bclr instruction with the link-register option disabled
(LK=0). The return address is in, or can be restored to, the LR.
•Indirect subroutine linkage, where A calls Glue, Glue calls B, and B returns to A rather
than to Glue.
Such a calling sequence is common in linkage code where the subroutine that the
programmer wants to call, B, is in a different module than the caller, A. The binder
inserts “glue” code to mediate the branch:
-A calls Glue—use a branch instruction that sets the LR with the link-register
option enabled (LK=1).
-Glue calls B—write the address of B in the CTR, and use the bcctr instruction with
the link-register option disabled (LK=0).
-B returns to A—use the bclr instruction with the link-register option disabled
(LK=0). The return address is in, or can be restored to, the LR.
Branch-Target Address Calculation
Branch instructions compute the effective address (EA) of the next instruction using the
following addressing modes:
•Branch to relative (conditional and unconditional).
•Branch to absolute (conditional and unconditional).
•Branch to link register (conditional only).
•Branch to count register (conditional only).
Instruction addresses are always assumed to be word aligned. PowerPC processors ignore
the two low-order bits of the generated branch-target address.
Branch to Relative
Instructions that use branch-to-relative addressing generate the next-instruction address by
appending 0b00 to the immediate-displacement operand (LI) and sign-extending the
result. That result is added to the current-instruction address to produce the
next-instruction address. Branches using this addressing mode must have the
absolute-addressing option disabled by clearing the AA instruction field (bit 30) to 0. The
link-register update option is enabled by setting the LK instruction field (bit 31) to 1. This
option causes the effective address of the instruction following the branch instruction to be
loaded into the LR.
Figure 3-11 shows how the branch-target address is generated when using the
branch-to-relative addressing mode.
Branch-Conditional to Relative
If the branch conditions are met, instructions that use branch-conditional to relative
addressing generate the next-instruction address by appending 0b00 to the
immediate-displacement operand (BD) and sign-extending the result. That result is added to the
current-instruction address to produce the next-instruction address. Branches using this
addressing mode must have the absolute-addressing option disabled by clearing the AA
instruction field (bit 30) to 0. The link-register update option is enabled by setting the LK
instruction field (bit 31) to 1. This option causes the effective address of the instruction
following the branch instruction to be loaded into the LR.
Figure 3-12 shows how the branch-target address is generated when using the
branch-conditional to relative addressing mode.
Figure 3-12: Branch-Conditional to Relative Addressing
Branch to Absolute
Instructions that use branch-to-absolute addressing generate the next-instruction address by
appending 0b00 to the immediate-displacement operand (LI) and sign-extending the
result. Branches using this addressing mode must have the absolute-addressing option
enabled by setting the AA instruction field (bit 30) to 1. The link-register update option is
enabled by setting the LK instruction field (bit 31) to 1. This option causes the effective
address of the instruction following the branch instruction to be loaded into the LR.
Figure 3-13 shows how the branch-target address is generated when using the branch-to-
absolute addressing mode.
Figure 3-13: Branch-to-Absolute Addressing
Branch-Conditional to Absolute
If the branch conditions are met, instructions that use branch-conditional to absolute
addressing generate the next-instruction address by appending 0b00 to the immediate-displacement operand (BD) and sign-extending the result. Branches using this addressing
mode must have the absolute-addressing option enabled by setting the AA instruction
field (bit 30) to 1. The link-register update option is enabled by setting the LK instruction
field (bit 31) to 1. This option causes the effective address of the instruction following the
branch instruction to be loaded into the LR.
Figure 3-14 shows how the branch-target address is generated when using the branch-
conditional to absolute-addressing mode.
Figure 3-14: Branch-Conditional to Absolute Addressing
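The relative and absolute target calculations described above differ only in whether the current-instruction address is added. The following sketch illustrates that arithmetic; sign_extend and branch_target are hypothetical helper names, and the field widths (24 bits for LI, 14 bits for BD) are taken from the instruction descriptions in this chapter.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def branch_target(current_address, field, field_bits, aa):
    """Compute a 32-bit branch-target address.

    field      -- LI (24 bits) or BD (14 bits) taken from the instruction
    field_bits -- width of the field: 24 for LI, 14 for BD
    aa         -- AA instruction field: 0 for relative, 1 for absolute
    """
    # Append 0b00 to the field, then sign-extend the widened value.
    displacement = sign_extend(field << 2, field_bits + 2)
    base = 0 if aa else current_address
    return (base + displacement) & 0xFFFFFFFF
```

The most negative 26-bit displacement is -32 MB and the most negative 16-bit displacement is -32 KB, which matches the address ranges quoted for unconditional and conditional branches.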
Branch-Conditional to Link Register
If the branch conditions are met, the branch-conditional to link-register instruction generates
the next-instruction address by reading the contents of the LR and clearing the two
low-order bits to zero. The link-register update option is enabled by setting the LK instruction
field (bit 31) to 1. This option causes the effective address of the instruction following the
branch instruction to be loaded into the LR.
Figure 3-15 shows how the branch-target address is generated when using the branch-
conditional to link-register addressing mode.
Figure 3-15: Branch-Conditional to Link-Register Addressing
Branch-Conditional to Count Register
If the branch conditions are met, the branch-conditional to count-register instruction
generates the next-instruction address by reading the contents of the CTR and clearing the
two low-order bits to zero. The link-register update option is enabled by setting the LK
instruction field (bit 31) to 1. This option causes the effective address of the instruction
following the branch instruction to be loaded into the LR.
Figure 3-16 shows how the branch-target address is generated when using the branch-
conditional to count-register addressing mode.
Figure 3-16: Branch-Conditional to Count-Register Addressing
Condition-Register Logical Instructions
Table 3-10 lists the PowerPC condition-register logical instructions. The condition-register
logical instructions perform logical operations on any two bits within the CR and store the
result of the operation in any CR bit. The move condition-register field instruction is used to
move any CR field (each field comprising four bits) to any other CR-field location. All of
these instructions are considered flow-control instructions because they are generally used
to set up conditions for testing by the branch-conditional instructions and to reduce the
number of branches in a code sequence. Simplified mnemonics are defined for the
condition-register logical instructions. See CR-Logical Instructions, page 828 for more
information.
In Table 3-10, the instruction-operand fields crbA, crbB, and crbD all specify a single bit
within the CR. The instruction-operand fields crfD and crfS specify a 4-bit field within the
CR.
Mnemonic  Name                                    Operand Syntax  Operation
crand     Condition Register AND                  crbD,crbA,crbB  CR-bit crbA is ANDed with CR-bit crbB and the result is stored in CR-bit crbD.
crandc    Condition Register AND with Complement  crbD,crbA,crbB  CR-bit crbA is ANDed with the complement of CR-bit crbB and the result is stored in CR-bit crbD.
creqv     Condition Register Equivalent           crbD,crbA,crbB  CR-bit crbA is XORed with CR-bit crbB and the complemented result is stored in CR-bit crbD.
crnand    Condition Register NAND                 crbD,crbA,crbB  CR-bit crbA is ANDed with CR-bit crbB and the complemented result is stored in CR-bit crbD.
crnor     Condition Register NOR                  crbD,crbA,crbB  CR-bit crbA is ORed with CR-bit crbB and the complemented result is stored in CR-bit crbD.
cror      Condition Register OR                   crbD,crbA,crbB  CR-bit crbA is ORed with CR-bit crbB and the result is stored in CR-bit crbD.
crorc     Condition Register OR with Complement   crbD,crbA,crbB  CR-bit crbA is ORed with the complement of CR-bit crbB and the result is stored in CR-bit crbD.
crxor     Condition Register XOR                  crbD,crbA,crbB  CR-bit crbA is XORed with CR-bit crbB and the result is stored in CR-bit crbD.
mcrf      Move Condition Register Field           crfD,crfS       CR-field crfS is copied into CR-field crfD. No other CR fields are modified.
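The condition-register logical operations reduce to single-bit Boolean functions applied to two CR bits. The following is a minimal sketch of that behavior, assuming CR bit 0 is the most-significant bit and CR field 0 occupies bits 0:3; CR_OPS, cr_logical, and mcrf are hypothetical Python names that mirror the mnemonics.

```python
# Single-bit result of each operation for source bits a and b.
CR_OPS = {
    "crand":  lambda a, b: a & b,
    "crandc": lambda a, b: a & (b ^ 1),
    "creqv":  lambda a, b: (a ^ b) ^ 1,
    "crnand": lambda a, b: (a & b) ^ 1,
    "crnor":  lambda a, b: (a | b) ^ 1,
    "cror":   lambda a, b: a | b,
    "crorc":  lambda a, b: a | (b ^ 1),
    "crxor":  lambda a, b: a ^ b,
}


def cr_logical(op, cr, crbD, crbA, crbB):
    """Apply a CR logical operation and return the updated 32-bit CR."""
    a = (cr >> (31 - crbA)) & 1
    b = (cr >> (31 - crbB)) & 1
    result = CR_OPS[op](a, b)
    mask = 1 << (31 - crbD)
    return (cr & ~mask & 0xFFFFFFFF) | (result << (31 - crbD))


def mcrf(cr, crfD, crfS):
    """Copy 4-bit CR field crfS into field crfD; other fields are unchanged."""
    field = (cr >> (28 - 4 * crfS)) & 0xF
    shift = 28 - 4 * crfD
    return (cr & ~(0xF << shift) & 0xFFFFFFFF) | (field << shift)
```

Applying crxor with the same bit for all three operands clears that bit, and applying creqv the same way sets it.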
System Call
Table 3-11 lists the PowerPC system-call instruction. The sc instruction is a user-level
instruction that can be used by a user-mode program to transfer control to a
privileged-mode program (typically a system-service routine). Executing the sc instruction causes a
system-call exception to occur. See System-Call Interrupt (0x0C00), page 514 for more
information on the operation of this instruction.
Table 3-11: System-Call Instruction
Mnemonic  Name         Operand Syntax  Operation
sc        System Call  (none)          Causes a system-call exception to occur.
Table 3-12 lists the PowerPC system-trap instructions. System-trap instructions are
normally used by software-debug applications to set breakpoints. These instructions test
for a specified set of conditions and cause a program exception to occur if any of the
conditions are met. If the tested conditions are not met, instruction execution continues
normally with the instruction following the system-trap instruction (a program exception
does not occur). The system-trap handler can be called from the program-interrupt handler
when it is determined that a system-trap instruction caused the exception. See Program
Interrupt (0x0700), page 511 for more information on program exceptions caused by the
system-trap instructions.
Trap instructions can also be used to cause a debug exception. See Trap-Instruction Debug
Event, page 546 for more information.
Simplified mnemonics are defined for the system-trap instructions. See Trap Instructions,
page 832 for more information.
Table 3-12: System-Trap Instructions
Mnemonic  Name                 Operand Syntax  Operation
tw        Trap Word            TO,rA,rB        The contents of rA are compared with rB. A program exception occurs if the comparison meets any test condition enabled by the TO operand.
twi       Trap Word Immediate  TO,rA,SIMM      The contents of rA are compared with the sign-extended SIMM operand. A program exception occurs if the comparison meets any test condition enabled by the TO operand.
The TO operand field in the system-trap instructions specifies the test conditions
performed on the remaining two operands. Multiple test conditions can be set
simultaneously, expanding the number of possible conditions that can cause the trap
(program exception). If all bits in the TO operand field are set, the trap always occurs
because one of the trap conditions is always met. The bits within the TO field are defined
as shown in Table 3-13.
Table 3-13: TO Field Bit Definitions
TO Bit  Description
TO[0]   Less-than arithmetic comparison.
        0: Ignore trap condition.
        1: Trap if first operand is arithmetically less-than second operand.
TO[1]   Greater-than arithmetic comparison.
        0: Ignore trap condition.
        1: Trap if first operand is arithmetically greater-than second operand.
TO[2]   Equal-to arithmetic comparison.
        0: Ignore trap condition.
        1: Trap if first operand is arithmetically equal-to second operand.
TO[3]   Less-than unsigned comparison.
        0: Ignore trap condition.
        1: Trap if first operand is less-than second operand.
TO[4]   Greater-than unsigned comparison.
        0: Ignore trap condition.
        1: Trap if first operand is greater-than second operand.
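The TO field can be read as a five-entry mask over the comparison results. The sketch below models the trap decision for tw and twi under that reading; to_signed, trap_taken, and twi_taken are hypothetical names, and the operands are treated as 32-bit values.

```python
def to_signed(value):
    """Interpret a 32-bit value as a two's-complement (arithmetic) number."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def trap_taken(to, a, b):
    """Return True if any comparison enabled in the 5-bit TO field is met."""
    a_u, b_u = a & 0xFFFFFFFF, b & 0xFFFFFFFF
    a_s, b_s = to_signed(a), to_signed(b)
    conditions = [
        a_s < b_s,   # TO[0]: less-than, arithmetic
        a_s > b_s,   # TO[1]: greater-than, arithmetic
        a_s == b_s,  # TO[2]: equal-to
        a_u < b_u,   # TO[3]: less-than, unsigned
        a_u > b_u,   # TO[4]: greater-than, unsigned
    ]
    # TO[0] is the most-significant bit of the 5-bit field.
    return any(met for i, met in enumerate(conditions) if (to >> (4 - i)) & 1)


def twi_taken(to, a, simm):
    """Model twi: compare rA with the sign-extended 16-bit SIMM operand."""
    simm &= 0xFFFF
    if simm & 0x8000:
        simm -= 0x10000
    return trap_taken(to, a, simm & 0xFFFFFFFF)
```

With TO=0b11111 the function returns True for every operand pair, because two values are always less-than, greater-than, or equal.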
Integer Load and Store Instructions
The integer load and store instructions move data between the general-purpose registers
and memory. Several types of loads and stores are supported by the PowerPC instruction
set:
•Load and zero
•Load algebraic
•Store
•Load with byte reverse and store with byte reverse
•Load multiple and store multiple
•Load string and store string
•Memory synchronization instructions
Memory accesses performed by the load and store instructions can occur out of order.
Synchronizing instructions are provided to enforce strict memory-access ordering. See
Synchronizing Instructions, page 424 for more information.
In general, the PowerPC architecture defines a sequential-execution model. When a store
instruction modifies an instruction-memory location, software synchronization is required
to ensure subsequent instruction fetches from that location obtain the modified version of
the instruction. See Self-Modifying Code, page 467 for more information.
Operand-Address Calculation
Integer load and store instructions generate effective addresses using one of three
addressing modes: register-indirect with immediate index, register-indirect with index, or
register indirect. These addressing modes are described in the following sections. For some
instructions, update forms that load the calculated effective address into rA are also
provided.
In the PPC405 processor, loads and stores to unaligned addresses can suffer from
performance degradation. Refer to Performance Effects of Operand Alignment, page 353
for more information.
Register-Indirect with Immediate Index
Load and store instructions using this addressing mode contain a signed, 16-bit immediate
index (d operand) and a general-purpose register operand, rA. The index is sign-extended
to 32 bits and added to the contents of rA to generate the effective address. If the rA
instruction field is 0 (specifying r0), a value of zero—rather than the contents of r0—is
added to the sign-extended immediate index. The option to specify rA or 0 is shown in the
instruction description as (rA|0).
Figure 3-17 shows how an effective address is generated when using register-indirect with
immediate-index addressing.
Figure 3-17: Register-Indirect with Immediate-Index Addressing
Register-Indirect with Index
Load and store instructions using this addressing mode contain two general-purpose
register operands, rA and rB. The contents of these two registers are added to generate the
effective address. If the rA instruction field is 0 (specifying r0), a value of zero—rather than
the contents of r0—is added to rB. The option to specify rA or 0 is shown in the instruction
description as (rA|0).
Figure 3-18 shows how an effective address is generated when using register-indirect with
index addressing.
Figure 3-18: Register-Indirect with Index Addressing
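Both addressing modes described above share the (rA|0) rule and differ only in the second term of the sum. The following sketch shows the two effective-address calculations; the function names are hypothetical, and gpr stands for a list of 32 register values indexed by register number.

```python
def ea_immediate_index(gpr, ra, d):
    """EA = (rA|0) + sign-extended 16-bit d, truncated to 32 bits."""
    base = 0 if ra == 0 else gpr[ra]   # rA field of 0 means the value zero, not r0
    d &= 0xFFFF
    if d & 0x8000:
        d -= 0x10000                   # sign-extend the immediate index
    return (base + d) & 0xFFFFFFFF


def ea_index(gpr, ra, rb):
    """EA = (rA|0) + (rB), truncated to 32 bits."""
    base = 0 if ra == 0 else gpr[ra]
    return (base + gpr[rb]) & 0xFFFFFFFF
```

Note that the contents of r0 never contribute to the base address in either mode, even when r0 holds a nonzero value.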
Register Indirect
Only load-string and store-string instructions can use this addressing mode. This mode
uses only the contents of the general-purpose register specified by the rA operand as the
effective address. Rather than using the contents of r0, a zero in the rA operand causes an
effective address of zero to be generated. The option to specify rA or 0 is shown in the
instruction descriptions as (rA|0).
Figure 3-19 shows how an effective address is generated when using register-indirect
addressing.
Integer-load instructions read an operand from memory and store it in a GPR destination
register, rD. Each type of load is characterized by what it does with unused high-order
bits in rD when the operand size is less than a word (32 bits). Load-and-zero instructions
clear the unused high-order bits in rD to zero. Load-algebraic instructions fill the unused
high-order bits in rD with a copy of the most-significant bit in the operand.
Load-with-update instructions are provided, but the following two rules apply:
•rA must not be equal to 0. If rA = 0, the instruction form is invalid.
•rA must not be equal to rD. If rA = rD, the instruction form is invalid.
In the PPC405, the above invalid instruction forms produce a boundedly-undefined result.
In other PowerPC implementations, those forms can cause a program exception.
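The difference between load-and-zero and load-algebraic, and the two update-form rules, can be illustrated with a short model. This sketch assumes big-endian memory held in a Python bytes object; load_halfword and update_form_valid are hypothetical helper names.

```python
def load_halfword(memory, ea, algebraic):
    """Return the 32-bit rD value produced by a halfword load.

    algebraic=False models load-and-zero (upper 16 bits cleared);
    algebraic=True models load-algebraic (upper 16 bits copy the sign bit).
    """
    value = int.from_bytes(memory[ea:ea + 2], "big")
    if algebraic and value & 0x8000:
        value |= 0xFFFF0000
    return value


def update_form_valid(ra, rd):
    """Load-with-update forms require rA != 0 and rA != rD."""
    return ra != 0 and ra != rd
```

For the halfword 0x8001, the load-and-zero result is 0x00008001 and the load-algebraic result is 0xFFFF8001.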
Load Byte and Zero
Table 3-14 lists the PowerPC load byte and zero instructions. These instructions load a byte
from memory into the lower-eight bits of rD and clear the upper-24 bits of rD to 0.
Table 3-14: Load Byte and Zero Instructions
Mnemonic  Name                                    Operand Syntax  Addressing Mode
lbz       Load Byte and Zero                      rD,d(rA)        Register-indirect with immediate index: EA = (rA|0) + d
lbzu      Load Byte and Zero with Update          rD,d(rA)        Register-indirect with immediate index: EA = (rA) + d; rA ← EA; rA ≠ 0, rA ≠ rD
lbzx      Load Byte and Zero Indexed              rD,rA,rB        Register-indirect with index: EA = (rA|0) + (rB)
lbzux     Load Byte and Zero with Update Indexed  rD,rA,rB        Register-indirect with index: EA = (rA) + (rB); rA ← EA; rA ≠ 0, rA ≠ rD
Load Halfword and Zero
Table 3-15 lists the PowerPC load halfword and zero instructions. These instructions load a
halfword from memory into the lower-16 bits of rD and clear the upper-16 bits of rD to 0.
Table 3-15: Load Halfword and Zero Instructions
Mnemonic  Name                                        Operand Syntax  Addressing Mode
lhz       Load Halfword and Zero                      rD,d(rA)        Register-indirect with immediate index: EA = (rA|0) + d
lhzu      Load Halfword and Zero with Update          rD,d(rA)        Register-indirect with immediate index: EA = (rA) + d; rA ← EA; rA ≠ 0, rA ≠ rD
lhzx      Load Halfword and Zero Indexed              rD,rA,rB        Register-indirect with index: EA = (rA|0) + (rB)
lhzux     Load Halfword and Zero with Update Indexed  rD,rA,rB        Register-indirect with index: EA = (rA) + (rB); rA ← EA; rA ≠ 0, rA ≠ rD
Load Word and Zero
Table 3-16 lists the PowerPC load word and zero instructions. These instructions load a word
from memory into rD.
Table 3-16: Load-Word and Zero Instructions
Mnemonic  Name                                    Operand Syntax  Addressing Mode
lwz       Load Word and Zero                      rD,d(rA)        Register-indirect with immediate index: EA = (rA|0) + d
lwzu      Load Word and Zero with Update          rD,d(rA)        Register-indirect with immediate index: EA = (rA) + d; rA ← EA; rA ≠ 0, rA ≠ rD
lwzx      Load Word and Zero Indexed              rD,rA,rB        Register-indirect with index: EA = (rA|0) + (rB)
lwzux     Load Word and Zero with Update Indexed  rD,rA,rB        Register-indirect with index: EA = (rA) + (rB); rA ← EA; rA ≠ 0, rA ≠ rD
Load Halfword Algebraic
Table 3-17 lists the PowerPC load halfword algebraic instructions. These instructions load a
halfword from memory into the lower-16 bits of rD. The upper-16 bits of rD are filled with
a copy of the most-significant bit (bit 16) of the operand.
Mnemonic  Name                                          Operand Syntax  Addressing Mode
lha       Load Halfword Algebraic                       rD,d(rA)        Register-indirect with immediate index: EA = (rA|0) + d
lhau      Load Halfword Algebraic with Update           rD,d(rA)        Register-indirect with immediate index: EA = (rA) + d; rA ← EA; rA ≠ 0, rA ≠ rD
lhax      Load Halfword Algebraic Indexed               rD,rA,rB        Register-indirect with index: EA = (rA|0) + (rB)
lhaux     Load Halfword Algebraic with Update Indexed   rD,rA,rB        Register-indirect with index: EA = (rA) + (rB); rA ← EA; rA ≠ 0, rA ≠ rD
Store Instructions
Integer-store instructions read an operand from a GPR source register, rS, and write it into
memory. Store-with-update instructions are provided, but the following two rules apply:
•rA must not be equal to 0. If rA = 0, the instruction form is invalid.
•If rS = rA, rS is written to memory first, and then the effective address is loaded into
rS.
In the PPC405, the above invalid instruction form produces a boundedly-undefined result.
In other PowerPC implementations, that form can cause a program exception.
Store Byte
Table 3-18 lists the PowerPC store byte instructions. These instructions store the lower-eight
bits of rS into the specified byte location in memory.
Table 3-18: Store Byte Instructions
Mnemonic  Name                            Operand Syntax  Addressing Mode
stb       Store Byte                      rS,d(rA)        Register-indirect with immediate index: EA = (rA|0) + d
stbu      Store Byte with Update          rS,d(rA)        Register-indirect with immediate index: EA = (rA) + d; rA ← EA; rA ≠ 0
stbx      Store Byte Indexed              rS,rA,rB        Register-indirect with index: EA = (rA|0) + (rB)
stbux     Store Byte with Update Indexed  rS,rA,rB        Register-indirect with index: EA = (rA) + (rB); rA ← EA; rA ≠ 0
Store Halfword
Table 3-19 lists the PowerPC store halfword instructions. These instructions store the
lower-16 bits of rS into the specified halfword location in memory.
Mnemonic  Name                                Operand Syntax  Addressing Mode
sth       Store Halfword                      rS,d(rA)        Register-indirect with immediate index: EA = (rA|0) + d
sthu      Store Halfword with Update          rS,d(rA)        Register-indirect with immediate index: EA = (rA) + d; rA ← EA; rA ≠ 0
sthx      Store Halfword Indexed              rS,rA,rB        Register-indirect with index: EA = (rA|0) + (rB)
sthux     Store Halfword with Update Indexed  rS,rA,rB        Register-indirect with index: EA = (rA) + (rB); rA ← EA; rA ≠ 0
Store Word
Table 3-20 lists the PowerPC store word instructions. These instructions store the entire
contents of rS into the specified word location in memory.
Table 3-20: Store Word Instructions
Mnemonic  Name                            Operand Syntax  Addressing Mode
stw       Store Word                      rS,d(rA)        Register-indirect with immediate index: EA = (rA|0) + d
stwu      Store Word with Update          rS,d(rA)        Register-indirect with immediate index: EA = (rA) + d; rA ← EA; rA ≠ 0
stwx      Store Word Indexed              rS,rA,rB        Register-indirect with index: EA = (rA|0) + (rB)
stwux     Store Word with Update Indexed  rS,rA,rB        Register-indirect with index: EA = (rA) + (rB); rA ← EA; rA ≠ 0
Load and Store with Byte-Reverse Instructions
Table 3-21 lists the PowerPC load and store with byte-reverse instructions. Figure 3-20 shows
(using big-endian memory) how bytes are moved between memory and the GPRs for each
of the byte-reverse instructions. When an lhbrx instruction is executed, the unloaded bytes
in rD are cleared to 0.
When used in a system operating with the default big-endian byte order, these instructions
have the effect of loading and storing data in little-endian order. Likewise, when used in a
system operating with little-endian byte order, these instructions have the effect of loading
and storing data in big-endian order. For more information about big-endian and
little-endian byte ordering, see Byte Ordering, page 349.
Table 3-21: Load and Store with Byte-Reverse Instructions
Mnemonic  Name                                 Addressing Mode               Operand Syntax
lhbrx     Load Halfword Byte-Reverse Indexed   Register-indirect with index  rD,rA,rB
                                               EA = (rA|0) + (rB)
lwbrx     Load Word Byte-Reverse Indexed       Register-indirect with index  rD,rA,rB
                                               EA = (rA|0) + (rB)
sthbrx    Store Halfword Byte-Reverse Indexed  Register-indirect with index  rS,rA,rB
                                               EA = (rA|0) + (rB)
stwbrx    Store Word Byte-Reverse Indexed      Register-indirect with index  rS,rA,rB
                                               EA = (rA|0) + (rB)
Figure 3-20: Load and Store with Byte-Reverse Instructions (panels lwbrx, stwbrx, and lhbrx, each
showing a memory word or halfword beside the byte-reversed contents of rD or rS; for lhbrx the
upper 16 bits of rD are 0000_0000 0000_0000)
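The byte movement described for the byte-reverse instructions can be modeled in a few lines. The
following is a minimal Python sketch, not PowerPC code. The function names mirror the mnemonics
for readability, and the use of a bytearray as big-endian memory is an assumption made for
illustration.

```python
def lwbrx(mem, ea):
    """Model of lwbrx: load the 4 bytes at ea with their order reversed."""
    # Reading the bytes as a little-endian integer places memory byte 0
    # in the least-significant byte of the result.
    return int.from_bytes(mem[ea:ea + 4], "little")


def lhbrx(mem, ea):
    """Model of lhbrx: load 2 bytes reversed; the upper 16 bits are 0."""
    return int.from_bytes(mem[ea:ea + 2], "little")


def stwbrx(mem, ea, rs):
    """Model of stwbrx: store the 32-bit value rs with its bytes reversed."""
    mem[ea:ea + 4] = (rs & 0xFFFFFFFF).to_bytes(4, "little")


def sthbrx(mem, ea, rs):
    """Model of sthbrx: store the low 16 bits of rs with bytes reversed."""
    mem[ea:ea + 2] = (rs & 0xFFFF).to_bytes(2, "little")


memory = bytearray([0x11, 0x22, 0x33, 0x44])
word = lwbrx(memory, 0)   # 0x44332211
half = lhbrx(memory, 0)   # 0x00002211
```

A store followed by a load through the same pair of models returns the original register value,
which is the property that lets big-endian code exchange little-endian data.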
Load and Store Multiple Instructions
Table 3-22 lists the PowerPC load and store multiple instructions and their operation.
Figure 3-21 shows how bytes are moved between memory and the GPRs for each of these
instructions.
These instructions are used to move blocks of data between memory and the GPRs. When
the load multiple word instruction (lmw) is executed, rD through r31 are loaded with n
consecutive words from memory, where n=32-rD. For the lmw instruction, if rA is in the
range of registers to be loaded, or if rD=0, the instruction form is invalid.
When the store multiple word instruction (stmw) is executed, the n consecutive words in
rS through r31 are stored into memory, where n=32-rS.
Table 3-22: Load and Store Multiple Instructions
Mnemonic  Name                 Addressing Mode                         Operand Syntax
lmw       Load Multiple Word   Register-indirect with immediate index  rD,d(rA)
                               EA = (rA|0) + d
stmw      Store Multiple Word  Register-indirect with immediate index  rS,d(rA)
                               EA = (rA|0) + d
Figure 3-21: Load and Store Multiple Instructions (lmw and stmw panels: Word 0 through Word n-1 at
memory addresses EA through EA + 4(n-1), and the GPRs from rD or rS through r31)
Load and Store String Instructions
Table 3-23 lists the PowerPC load and store string instructions and their addressing modes.
See the individual instruction listings in Chapter 11, Instruction Set for more information
on their operation and restrictions on the instruction forms.
Table 3-23: Load and Store String Instructions
Mnemonic  Name                         Addressing Mode
lswi      Load String Word Immediate   Register-indirect
                                       EA = (rA|0)
lswx      Load String Word Indexed     Register-indirect with index
                                       EA = (rA|0) + (rB)
stswi     Store String Word Immediate  Register-indirect
                                       EA = (rA|0)
stswx     Store String Word Indexed    Register-indirect with index
                                       EA = (rA|0) + (rB)
These instructions are used to move up to 32 consecutive bytes of data between memory
and the GPRs without concern for alignment. The instructions can be used for short moves
between arbitrary memory locations or for long moves between misaligned memory
fields. Performance of these instructions is degraded if the leading and/or trailing bytes
are not aligned on a word boundary (see Performance Effects of Operand Alignment,
page 353 for more information).
The immediate forms of the instructions take the byte count, n, from the NB instruction
field. If NB=0, then n=32. The indexed forms take the byte count from XER[25:31]. Unlike
the immediate forms, if XER[25:31]=0, then n=0. For the lswx instruction, the contents of
rD are undefined if n=0.
The n bytes are loaded into and stored from registers beginning with the most-significant
register byte. For loads, any unfilled low-order register bytes are cleared to 0. The sequence
of registers loaded or stored wraps through r0 if necessary. Figure 3-22 shows an example
of the string-instruction operation.
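The byte-count and register-packing rules above can be sketched as a small Python model of the
immediate-form load. This is an illustration only: the function name, the 32-entry list standing
in for the GPRs, and the bytes object standing in for memory are assumptions, and the
instruction-form restrictions listed in the instruction set chapter are not modeled.

```python
def lswi(gprs, mem, rd, ea, nb):
    """Model of lswi: load n bytes from mem[ea:] into GPRs starting at rd."""
    n = 32 if nb == 0 else nb            # NB=0 encodes a 32-byte move
    reg = rd
    for offset in range(0, n, 4):
        chunk = bytes(mem[ea + offset:ea + min(offset + 4, n)])
        # Fill from the most-significant register byte and clear any
        # unfilled low-order bytes to 0.
        gprs[reg] = int.from_bytes(chunk.ljust(4, b"\x00"), "big")
        reg = (reg + 1) % 32             # the sequence wraps from r31 to r0
    return gprs


registers = [0] * 32
lswi(registers, bytes(range(1, 11)), rd=31, ea=0, nb=6)
# registers[31] now holds 0x01020304 and registers[0] holds 0x05060000
```

Starting at r31 with a six-byte count shows both the wrap through r0 and the zero fill of the two
unused low-order bytes in the last register.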
Integer Instructions
Integer instructions operate on the contents of GPRs. They use the GPRs (and sometimes
immediate values coded in the instruction) as source operands. Results are written into
GPRs. These instructions do not operate on memory locations. Integer instructions treat
the source operands as signed integers unless the instruction is explicitly identified as
performing an unsigned operation. For example, the multiply high-word unsigned (mulhwu)
and divide-word unsigned (divwu) instructions interpret both operands as unsigned
integers.
The following types of integer instructions are supported by the PowerPC architecture:
• Arithmetic Instructions
• Logical Instructions
• Compare Instructions
• Rotate Instructions
• Shift Instructions
Figure 3-22: Load and Store String Instructions
The arithmetic, shift, and rotate instructions can update and/or read bits from the XER.
Those instructions, plus the integer-logical instructions, can also update bits in the CR.
Unless otherwise noted, when XER and/or CR are updated, they reflect the value written
to the destination register. XER and CR can be updated by the integer instructions in the
following ways:
• The XER[CA] bit is updated to reflect the carry out of bit 0 in the result.
• The XER[OV] bit is set or cleared to reflect a result overflow. When XER[OV] is set,
XER[SO] is also set to reflect a summary overflow. XER[SO] can only be cleared using
the mtspr and mcrxr instructions. Instructions that update these bits have the
overflow-enable (OE) bit set to 1 in the instruction encoding. This is indicated by the
“o” suffix in the instruction mnemonic.
• Bits in CR0 (CR[0:3]) are updated to reflect a signed comparison of the result to zero.
Instructions that update CR0 have the record (Rc) bit set to 1 in the instruction
encoding. This is indicated by the “.” suffix in the instruction mnemonic. See CR0
Field, page 361, for information on how these bits are updated.
Instructions that update XER[OV] or XER[CA] can delay the execution of subsequent
instructions. See Fixed-Point Exception Register (XER), page 363 for more information on
these register bits.
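The three update rules can be illustrated with a Python model of a 32-bit addition. This is a
sketch under stated assumptions: the function and dictionary key names are hypothetical, and a
real instruction updates only the bits selected by its form (carrying, “o” suffix, “.” suffix),
whereas this model reports all of them at once. Copying SO into CR0 is taken from the PowerPC
definition of the CR0 field rather than from the text above.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def add_record(ra, rb, so=0):
    """Model of a 32-bit add that reports the result, CA, OV, and CR0."""
    total = (ra & MASK32) + (rb & MASK32)
    result = total & MASK32
    ca = total >> 32                      # carry out of bit 0 (the MSB)
    exact = to_signed(ra) + to_signed(rb)
    ov = int(not (-(1 << 31) <= exact < (1 << 31)))   # signed overflow
    so = so | ov                          # SO stays set once OV sets it
    signed = to_signed(result)            # CR0 compares the written value to 0
    cr0 = {"LT": int(signed < 0), "GT": int(signed > 0),
           "EQ": int(signed == 0), "SO": so}
    return result, ca, ov, cr0
```

For example, adding 1 to 0x7FFF_FFFF overflows as a signed operation without producing a carry,
while adding 1 to 0xFFFF_FFFF produces a carry without a signed overflow.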
Arithmetic Instructions
The integer-arithmetic instructions support addition, subtraction, multiplication, and
division between operands in the GPRs and in some cases between GPRs and signed-immediate
values.
Integer-Addition Instructions
Table 3-24 shows the PowerPC integer-addition instructions. The instructions in this table
are grouped by the type of addition operation they perform. For each type of instruction
shown, the “Operation” column indicates the addition-operation performed, and on an
instruction-by-instruction basis, how the XER and CR registers are updated (if at all).
“SIMM” indicates an immediate value that is sign-extended prior to being used in the
operation.
The add-extended instructions can be used to perform addition on integers larger than 32
bits. For example, assume a 64-bit integer i is represented by the register pair r3:r4, where
r3 contains the most-significant 32 bits of i, and r4 contains the least-significant 32 bits. The
64-bit integer j is similarly represented by the register pair r5:r6. The 64-bit result i+j=r
(represented by the pair r7:r8) is produced by pairing adde with addc as follows:
addc r8,r6,r4    ! Add the least-significant words and record a
                 ! carry.
adde r7,r5,r3    ! Add the most-significant words, using
                 ! previous carry.
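The carry chaining in the addc/adde pair can be checked with a short Python model. This is a
minimal sketch: the helper names mirror the mnemonics, each helper returns the 32-bit result
together with the carry that would be written to XER[CA], and add64 is a hypothetical wrapper
that follows the register assignment used in the text.

```python
MASK32 = 0xFFFFFFFF


def addc(ra, rb):
    """Model of addc: return (32-bit sum, carry out)."""
    total = (ra & MASK32) + (rb & MASK32)
    return total & MASK32, total >> 32


def adde(ra, rb, ca):
    """Model of adde: add two words plus the incoming carry."""
    total = (ra & MASK32) + (rb & MASK32) + ca
    return total & MASK32, total >> 32


def add64(i_hi, i_lo, j_hi, j_lo):
    """i is held in r3:r4, j in r5:r6; the sum is returned as (r7, r8)."""
    r8, ca = addc(j_lo, i_lo)       # addc r8,r6,r4
    r7, _ = adde(j_hi, i_hi, ca)    # adde r7,r5,r3
    return r7, r8
```

Adding 1 to 0x0000_0001_FFFF_FFFF carries out of the low word, so the model returns a high word
of 2 and a low word of 0.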
Table 3-24: Integer-Addition Instructions
Mnemonic  Name                                  Operation
Add Instructions: rD is loaded with the sum (rA) + (rB). Operand syntax: rD,rA,rB
add       Add                                   XER and CR0 are not updated.
add.      Add and Record                        CR0 is updated to reflect the result.
addo      Add with Overflow Enabled             XER[OV,SO] are updated to reflect the result.
addo.     Add with Overflow Enabled and Record  XER[OV,SO] and CR0 are updated to reflect the result.
XER[CA] and CR0 are updated to reflect the result.
XER[CA,OV,SO] are updated to reflect the result.
XER[CA,OV,SO] and CR0 are updated to reflect
the result.
Integer-Subtraction Instructions
Table 3-25 shows the PowerPC integer-subtraction instructions. The instructions in this table
are grouped by the type of subtraction operation they perform. For each type of instruction
shown, the “Operation” column indicates the subtraction-operation performed. The
column also shows, on an instruction-by-instruction basis, how the XER and CR registers
are updated (if at all). The subtraction operation is expressed as addition so that the
two’s-complement operation is clear. “SIMM” indicates an immediate value that is
sign-extended prior to being used in the operation.
The integer-subtraction instructions subtract the second operand (rA) from the third
operand (rB). Simplified mnemonics are provided with a more familiar operand ordering,
whereby the third operand is subtracted from the second. Simplified mnemonics are also
defined for the addi instruction to provide a subtract-immediate operation. See Subtract
Instructions, page 831 for more information.
The subtract-from extended instructions can be used to perform subtraction on integers
larger than 32 bits. For example, assume a 64-bit integer i is represented by the register pair
r3:r4, where r3 contains the most-significant 32 bits of i, and r4 contains the least-significant
32 bits. The 64-bit integer j is similarly represented by the register pair r5:r6. The 64-bit
result i−j=r (represented by the pair r7:r8) is produced by pairing subfe with subfc as
follows:
subfc r8,r6,r4   ! Subtract the least-significant words and record a
                 ! carry.
subfe r7,r5,r3   ! Subtract the most-significant words, using
                 ! previous carry.
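The same check can be made for the subfc/subfe pair. The Python sketch below models each
instruction as the sum ¬(rA) + (rB) + 1 (or + XER[CA]); the helper names mirror the mnemonics
and sub64 is a hypothetical wrapper that follows the register assignment used in the text.

```python
MASK32 = 0xFFFFFFFF


def subfc(ra, rb):
    """Model of subfc: rD = ~rA + rB + 1, returned with the carry out."""
    total = (~ra & MASK32) + (rb & MASK32) + 1
    return total & MASK32, total >> 32


def subfe(ra, rb, ca):
    """Model of subfe: rD = ~rA + rB + CA, returned with the carry out."""
    total = (~ra & MASK32) + (rb & MASK32) + ca
    return total & MASK32, total >> 32


def sub64(i_hi, i_lo, j_hi, j_lo):
    """i is held in r3:r4, j in r5:r6; i - j is returned as (r7, r8)."""
    r8, ca = subfc(j_lo, i_lo)      # subfc r8,r6,r4 computes r4 - r6
    r7, _ = subfe(j_hi, i_hi, ca)   # subfe r7,r5,r3 computes r3 - r5 with carry
    return r7, r8
```

Note that a carry of 1 from subfc means no borrow occurred, which is why subfe adds the carry
instead of subtracting it.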
Operand
Syntax
rD,rA
Table 3-25: Integer-Subtraction Instructions
Mnemonic  Name                                                               Operation
Subtract-From Instructions: rD is loaded with the sum ¬(rA) + (rB) + 1. Operand syntax: rD,rA,rB
subf      Subtract from                                                      XER and CR0 are not updated.
subf.     Subtract from and Record                                           CR0 is updated to reflect the result.
subfo     Subtract from with Overflow Enabled                                XER[OV,SO] are updated to reflect the result.
subfo.    Subtract from with Overflow Enabled and Record                     XER[OV,SO] and CR0 are updated to reflect the result.
Subtract-From Carrying Instructions: rD is loaded with the sum ¬(rA) + (rB) + 1. Operand syntax: rD,rA,rB
subfc     Subtract from Carrying                                             XER[CA] is updated to reflect the result.
subfc.    Subtract from Carrying and Record                                  XER[CA] and CR0 are updated to reflect the result.
subfco    Subtract from Carrying with Overflow Enabled                       XER[CA,OV,SO] are updated to reflect the result.
subfco.   Subtract from Carrying with Overflow Enabled and Record            XER[CA,OV,SO] and CR0 are updated to reflect the result.
Subtract-From Immediate Instructions: rD is loaded with the sum ¬(rA) + SIMM + 1. Operand syntax: rD,rA,SIMM
subfic    Subtract from Immediate Carrying                                   XER[CA] is updated to reflect the result.
Subtract-From Extended Instructions: rD is loaded with the sum ¬(rA) + (rB) + XER[CA]. Operand syntax: rD,rA,rB
subfe     Subtract from Extended                                             XER[CA] is updated to reflect the result.
subfe.    Subtract from Extended and Record                                  XER[CA] and CR0 are updated to reflect the result.
subfeo    Subtract from Extended with Overflow Enabled                       XER[CA,OV,SO] are updated to reflect the result.
subfeo.   Subtract from Extended with Overflow Enabled and Record            XER[CA,OV,SO] and CR0 are updated to reflect the result.
Subtract-From Minus-One-Extended Instructions: rD is loaded with the sum ¬(rA) + XER[CA] + 0xFFFF_FFFF. Operand syntax: rD,rA
subfme    Subtract from Minus One Extended                                   XER[CA] is updated to reflect the result.
subfme.   Subtract from Minus One Extended and Record                        XER[CA] and CR0 are updated to reflect the result.
subfmeo   Subtract from Minus One Extended with Overflow Enabled             XER[CA,OV,SO] are updated to reflect the result.
subfmeo.  Subtract from Minus One Extended with Overflow Enabled and Record  XER[CA,OV,SO] and CR0 are updated to reflect the result.
Subtract-From Zero-Extended Instructions: rD is loaded with the sum ¬(rA) + XER[CA]. Operand syntax: rD,rA
subfze    Subtract from Zero Extended                                        XER[CA] is updated to reflect the result.
subfze.   Subtract from Zero Extended and Record                             XER[CA] and CR0 are updated to reflect the result.
subfzeo   Subtract from Zero Extended with Overflow Enabled                  XER[CA,OV,SO] are updated to reflect the result.
subfzeo.  Subtract from Zero Extended with Overflow Enabled and Record       XER[CA,OV,SO] and CR0 are updated to reflect the result.
Negation Instructions
Table 3-26 shows the PowerPC integer-negation instructions. Negation takes the operand
specified by rA and writes the two’s-complement equivalent in rD. For each instruction
shown, the “Operation” column indicates (on an instruction-by-instruction basis) how the
XER and CR registers are updated (if at all).
Table 3-26: Negation Instructions
Mnemonic  Name                                     Operation
Negation Instructions: rD is loaded with the sum ¬(rA) + 1. Operand syntax: rD,rA
neg       Negate                                   XER and CR0 are not updated.
neg.      Negate and Record                        CR0 is updated to reflect the result.
nego      Negate with Overflow Enabled             XER[OV,SO] are updated to reflect the result.
nego.     Negate with Overflow Enabled and Record  XER[OV,SO] and CR0 are updated to reflect the result.
Multiply Instructions
Table 3-27 shows the PowerPC integer-multiply instructions. Multiplication of two 32-bit
values can result in a 64-bit result. The multiply low-word instructions are used with the
multiply high-word instructions to calculate the full 64-bit product. For each type of
instruction shown, the “Operation” column indicates the multiplication-operation
performed. The column also shows, on an instruction-by-instruction basis, how the XER
and CR registers are updated (if at all). “SIMM” indicates an immediate value that is
sign-extended prior to being used in the operation.
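The way the low-word and high-word multiplies combine into a 64-bit product can be sketched in
Python. The helper names mirror the mnemonics for readability; this is a behavioural model that
assumes 32-bit register values, not a description of how the processor computes the product.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def mullw(ra, rb):
    """Low 32 bits of the product (rA) x (rB)."""
    return (to_signed(ra) * to_signed(rb)) & MASK32


def mulhw(ra, rb):
    """High 32 bits of the signed 64-bit product."""
    return ((to_signed(ra) * to_signed(rb)) >> 32) & MASK32


def mulhwu(ra, rb):
    """High 32 bits of the product with both operands taken as unsigned."""
    return (((ra & MASK32) * (rb & MASK32)) >> 32) & MASK32


# Full signed product of -1 and 2, assembled from the two halves.
product = (mulhw(0xFFFFFFFF, 2) << 32) | mullw(0xFFFFFFFF, 2)   # 0xFFFFFFFF_FFFFFFFE
```

The low word is the same for signed and unsigned operands, which is why only the high-word
multiply needs an unsigned variant.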
Table 3-27: Multiply Instructions
Mnemonic  Name                                                Operation
Multiply Low-Word Instructions: rD is loaded with the low-32 bits of the product (rA) × (rB). Operand syntax: rD,rA,rB
mullw     Multiply Low Word                                   XER and CR0 are not updated.
mullw.    Multiply Low Word and Record                        CR0 is updated to reflect the result.
mullwo    Multiply Low Word with Overflow Enabled             XER[OV,SO] are updated to reflect the result.
mullwo.   Multiply Low Word with Overflow Enabled and Record  XER[OV,SO] and CR0 are updated to reflect the result.
Multiply Low-Word Immediate Instructions: rD is loaded with the low-32 bits of the product (rA) × SIMM. Operand syntax: rD,rA,SIMM
mulli     Multiply Low Immediate                              XER and CR0 are not updated.
Multiply High-Word Instructions: rD is loaded with the high-32 bits of the product (rA) × (rB). Operand syntax: rD,rA,rB
mulhw     Multiply High Word                                  XER and CR0 are not updated.
mulhw.    Multiply High Word and Record                       CR0 is updated to reflect the result.
Multiply High-Word Unsigned Instructions: rD is loaded with the high-32 bits of the product (rA) × (rB). The
contents of rA and rB are interpreted as unsigned integers.
Divide Instructions
Table 3-28 shows the PowerPC integer-divide instructions. Only the low-32 bits of the
quotient are returned. The remainder is not supplied as a result of executing these
instructions. For each type of instruction shown, the “Operation” column indicates the
divide-operation performed. The column also shows, on an instruction-by-instruction
basis, how the XER and CR registers are updated (if at all).
Table 3-28: Divide Instructions
Mnemonic  Name                                                   Operation
Divide-Word Instructions: rD is loaded with the low-32 bits of the 64-bit quotient (rA) ÷ (rB). Operand syntax: rD,rA,rB
divw      Divide Word                                            XER and CR0 are not updated.
divw.     Divide Word and Record                                 CR0 is updated to reflect the result.
divwo     Divide Word with Overflow Enabled                      XER[OV,SO] are updated to reflect the result.
divwo.    Divide Word with Overflow Enabled and Record           XER[OV,SO] and CR0 are updated to reflect the result.
Divide-Word Unsigned Instructions: rD is loaded with the low-32 bits of the 64-bit quotient (rA) ÷ (rB).
The contents of rA and rB are interpreted as unsigned integers. Operand syntax: rD,rA,rB
divwu     Divide Word Unsigned                                   XER and CR0 are not updated.
divwu.    Divide Word Unsigned and Record                        CR0 is updated to reflect the result.
divwuo    Divide Word Unsigned with Overflow Enabled             XER[OV,SO] are updated to reflect the result.
divwuo.   Divide Word Unsigned with Overflow Enabled and Record  XER[OV,SO] and CR0 are updated to reflect the result.
Logical Instructions
The logical instructions perform bit operations on the 32-bit operands. If an immediate
value is specified as an operand, the processor either zero-extends or left-shifts it prior to
performing the operation, depending on the instruction. If the instruction has the record
(Rc) bit set to 1 in the instruction encoding, CR0 (CR[0:3]) is updated to reflect the result of
the operation. A set Rc bit is indicated by the “.” suffix in the instruction mnemonic.
The logical instructions do not update any bits in the XER register.
In the operand syntax for logical instructions, the rA operand specifies a destination register
rather than a source register. rS is used to specify one of the source registers.
AND and NAND Instructions
Table 3-29 shows the PowerPC AND and NAND instructions. For each type of instruction
shown, the “Operation” column indicates the Boolean operation performed. The column
also shows, on an instruction-by-instruction basis, whether the CR0 field is updated.
Table 3-29: AND and NAND Instructions
Mnemonic  Name                              Operation
AND Instructions: rA is loaded with the logical result (rS) AND (rB). Operand syntax: rA,rS,rB
and       AND                               CR0 is not updated.
and.      AND and Record                    CR0 is updated to reflect the result.
AND-Immediate Instructions: rA is loaded with the logical result (rS) AND UIMM. Operand syntax: rA,rS,UIMM
andi.     AND Immediate and Record          CR0 is updated to reflect the result.
AND Immediate-Shifted Instructions: rA is loaded with the logical result (rS) AND (UIMM || 0x0000). Operand syntax: rA,rS,UIMM
andis.    AND Immediate Shifted and Record  CR0 is updated to reflect the result.
AND with Complement Instructions: rA is loaded with the logical result (rS) AND ¬(rB). Operand syntax: rA,rS,rB
andc      AND with Complement               CR0 is not updated.
andc.     AND with Complement and Record    CR0 is updated to reflect the result.
NAND Instructions: rA is loaded with the logical result ¬((rS) AND (rB)). Operand syntax: rA,rS,rB
nand      NAND                              CR0 is not updated.
nand.     NAND and Record                   CR0 is updated to reflect the result.
OR and NOR Instructions
Table 3-30 shows the PowerPC OR and NOR instructions. For each type of instruction
shown, the “Operation” column indicates the Boolean operation performed. The column
also shows, on an instruction-by-instruction basis, whether the CR0 field is updated.
Simplified mnemonics are provided for some common operations that use the OR and
NOR instructions, such as move register and complement (not) register. See Other
Simplified Mnemonics, page 834 for more information.
Table 3-30: OR and NOR Instructions
Mnemonic  Name                           Operation
NOR Instructions: rA is loaded with the logical result ¬((rS) OR (rB)). Operand syntax: rA,rS,rB
nor       NOR                            CR0 is not updated.
nor.      NOR and Record                 CR0 is updated to reflect the result.
OR Instructions: rA is loaded with the logical result (rS) OR (rB). Operand syntax: rA,rS,rB
or        OR                             CR0 is not updated.
or.       OR and Record                  CR0 is updated to reflect the result.
OR-Immediate Instructions: rA is loaded with the logical result (rS) OR UIMM. Operand syntax: rA,rS,UIMM
ori       OR Immediate                   CR0 is not updated.
OR Immediate-Shifted Instructions: rA is loaded with the logical result (rS) OR (UIMM || 0x0000). Operand syntax: rA,rS,UIMM
oris      OR Immediate Shifted           CR0 is not updated.
OR with Complement Instructions: rA is loaded with the logical result (rS) OR ¬(rB). Operand syntax: rA,rS,rB
orc       OR with Complement             CR0 is not updated.
orc.      OR with Complement and Record  CR0 is updated to reflect the result.
XOR and Equivalence Instructions
Table 3-31 shows the PowerPC XOR and equivalence (XNOR) instructions. For each type of
instruction shown, the “Operation” column indicates the Boolean operation performed.
The column also shows, on an instruction-by-instruction basis, whether the CR0 field is
updated.
Table 3-31: XOR and Equivalence Instructions
Mnemonic  Name                   Operation
Equivalence Instructions: rA is loaded with the logical result ¬((rS) XOR (rB)). Operand syntax: rA,rS,rB
eqv       Equivalent             CR0 is not updated.
eqv.      Equivalent and Record  CR0 is updated to reflect the result.
XOR Instructions: rA is loaded with the logical result (rS) XOR (rB). Operand syntax: rA,rS,rB
xor       XOR                    CR0 is not updated.
xor.      XOR and Record         CR0 is updated to reflect the result.
XOR-Immediate Instructions: rA is loaded with the logical result (rS) XOR UIMM. Operand syntax: rA,rS,UIMM
xori      XOR Immediate          CR0 is not updated.
XOR Immediate-Shifted Instructions: rA is loaded with the logical result (rS) XOR (UIMM || 0x0000). Operand syntax: rA,rS,UIMM
xoris     XOR Immediate Shifted  CR0 is not updated.
Sign-Extension Instructions
Table 3-32 shows the sign-extension instructions. These instructions sign-extend the value
in the rS register and write the result in the rA register. For each type of instruction shown,
the “Operation” column indicates the operation performed. The column also shows, on an
instruction-by-instruction basis, whether the CR0 field is updated.
Table 3-32: Sign-Extension Instructions
Mnemonic  Name                             Operation
Extend-Sign Byte Instructions: rA[24:31] is loaded with (rS[24:31]). The remaining bits rA[0:23] are
each loaded with a copy of (rS[24]). Operand syntax: rA,rS
extsb     Extend Sign Byte                 CR0 is not updated.
extsb.    Extend Sign Byte and Record      CR0 is updated to reflect the result.
Extend-Sign Halfword Instructions: rA[16:31] is loaded with (rS[16:31]). The remaining bits rA[0:15] are
each loaded with a copy of (rS[16]). Operand syntax: rA,rS
extsh     Extend Sign Halfword             CR0 is not updated.
extsh.    Extend Sign Halfword and Record  CR0 is updated to reflect the result.
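The sign-extension rule is easy to check with a Python model. This is a minimal sketch whose
function names mirror the mnemonics. Because bit 0 is the most-significant bit in PowerPC
numbering, rS[24] corresponds to the value 0x80 and rS[16] to the value 0x8000.

```python
def extsb(rs):
    """Model of extsb: copy rS[24:31] and replicate rS[24] into rA[0:23]."""
    low = rs & 0xFF
    return low | 0xFFFFFF00 if low & 0x80 else low


def extsh(rs):
    """Model of extsh: copy rS[16:31] and replicate rS[16] into rA[0:15]."""
    low = rs & 0xFFFF
    return low | 0xFFFF0000 if low & 0x8000 else low


negative_byte = extsb(0x12345680)   # 0xFFFFFF80
positive_byte = extsb(0x1234567F)   # 0x0000007F
```

The upper bits of the source register never affect the result; only the byte or halfword being
extended does.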
Count Leading-Zeros Instructions
Table 3-33 shows the count leading-zeros instructions. These instructions count the number
of consecutive zero bits in the rS register starting at bit 0. The count result is written to the
rA register. For each type of instruction shown, the “Operation” column indicates the
operation performed. The column also shows, on an instruction-by-instruction basis,
whether the CR0 field is updated.
Table 3-33: Count Leading-Zeros Instructions
Mnemonic  Name                                 Operation
Count Leading-Zeros Instructions: rA is loaded with a count of leading zeros in rS. Operand syntax: rA,rS
cntlzw    Count Leading Zeros Word             CR0 is not updated.
cntlzw.   Count Leading Zeros Word and Record  CR0 is updated to reflect the result. CR0[LT] is always cleared to 0.
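A one-line Python model shows the range of results the count can take. The function name mirrors
the mnemonic; the use of int.bit_length is simply a convenient way to find the highest set bit
and says nothing about the hardware implementation.

```python
def cntlzw(rs):
    """Model of cntlzw: number of consecutive zero bits starting at bit 0 (the MSB)."""
    return 32 - (rs & 0xFFFFFFFF).bit_length()


all_zero = cntlzw(0x00000000)   # 32
top_set = cntlzw(0x80000000)    # 0
```

The result is always between 0 and 32, so it is never negative when compared with zero, which is
why the record form always clears CR0[LT].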
Compare Instructions
The integer-compare instructions support algebraic and logical comparisons between
operands in the GPRs and between GPRs and immediate values. Immediate values are
signed in algebraic comparisons and unsigned in logical comparisons.
All compare instructions have four operands. The first operand, crfD, specifies the field in
the CR register that is updated with the comparison result. The left-most three bits in the
CR field are updated to reflect a less-than, greater-than, or equal comparison. The fourth
(least-significant) bit is updated with a copy of XER[SO]. The crfD operand can be omitted
if the comparison results are written to CR0. See CRn Fields (Compare Instructions),
page 362 for more information on the CR fields.
The second operand specifies the operand length. This is referred to as the “L” bit in the
compare-instruction encoding. When using the compare instructions on 32-bit PowerPC
implementations like the PPC405, this bit must always be coded as 0. It cannot be omitted
from the standard instruction syntax. Simplified mnemonics are provided that omit this
operand. See Compare Instructions, page 828 for more information.
The last two operands specify the quantities to be compared (the contents of a register and
a register or immediate value).
Algebraic-Comparison Instructions
Table 3-34 shows the PowerPC algebraic-comparison instructions. During comparison, both
operands are treated as signed integers. If a comparison is made with a signed-immediate
value (SIMM), that value is sign-extended by the processor prior to performing the
comparison.
Table 3-34: Algebraic-Comparison Instructions
Mnemonic  Name               Operand Syntax  Operation
cmp       Compare            crfD,0,rA,rB    crfD[LT,GT,EQ] are loaded with the result of algebraically
                                             comparing (rA) with (rB). crfD[SO] is loaded with a copy of
                                             XER[SO].
cmpi      Compare Immediate  crfD,0,rA,SIMM  crfD[LT,GT,EQ] are loaded with the result of algebraically
                                             comparing (rA) with SIMM. crfD[SO] is loaded with a copy of
                                             XER[SO].
Logical-Comparison Instructions
Table 3-35 shows the PowerPC logical-comparison instructions. During comparison, both
operands are treated as unsigned integers. If a comparison is made with an unsigned-immediate
value (UIMM), that value is zero-extended by the processor prior to performing
the comparison.
Table 3-35: Logical-Comparison Instructions
Mnemonic  Name                       Operand Syntax  Operation
cmpl      Compare Logical            crfD,0,rA,rB    crfD[LT,GT,EQ] are loaded with the result of logically
                                                     comparing (rA) with (rB). crfD[SO] is loaded with a copy
                                                     of XER[SO].
cmpli     Compare Logical Immediate  crfD,0,rA,UIMM  crfD[LT,GT,EQ] are loaded with the result of logically
                                                     comparing (rA) with UIMM. crfD[SO] is loaded with a copy
                                                     of XER[SO].
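The difference between algebraic and logical comparison shows up when an operand has its
most-significant bit set. The Python sketch below models the four bits written to the selected
CR field; the function name, the logical flag, and the dictionary representation of the field
are assumptions made for illustration.

```python
def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def compare(ra, rb, logical=False, xer_so=0):
    """Return the CR field bits for comparing (rA) with (rB) or an extended immediate."""
    ra &= 0xFFFFFFFF
    rb &= 0xFFFFFFFF
    if not logical:
        # Algebraic comparison treats both operands as signed integers.
        ra, rb = to_signed(ra), to_signed(rb)
    return {"LT": int(ra < rb), "GT": int(ra > rb),
            "EQ": int(ra == rb), "SO": xer_so}


algebraic = compare(0xFFFFFFFF, 1)               # -1 < 1, so LT is set
logical = compare(0xFFFFFFFF, 1, logical=True)   # 4294967295 > 1, so GT is set
```

The same register contents therefore give opposite orderings depending on which comparison is
used.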
Rotate Instructions
Rotate instructions operate on 32-bit data in the GPRs, returning the result in a second
GPR. These instructions rotate data to the left, in the direction from the least-significant
bit to the most-significant bit. Bits rotated out of the most-significant bit (bit 0) are
rotated into the least-significant bit (bit 31). Programmers can achieve apparent right
rotation using these left-rotation instructions by specifying a rotation amount of 32-n,
where n is the number of bits to rotate right.
If the rotate instruction has the record (Rc) bit set to 1 in the instruction encoding, CR0
(CR[0:3]) is updated to reflect the result of the operation. A set Rc bit is indicated by the “.”
suffix in the instruction mnemonic. Rotate instructions do not update any bits in the XER
register.
In the operand syntax for rotate instructions, the rA operand specifies the destination
register rather than a source register. rS is used to specify the source register.
Simplified mnemonics using the rotate instructions are provided for easy coding of
extraction, insertion, left or right justification, and other bit-manipulation operations. See
Rotate and Shift Instructions, page 829 for more information.
crfD,0,rA,UIMM
Mask Generation
The rotate instructions write their results into the destination register under the control of
a mask specified in the rotate-instruction encoding. The mask is used to write or insert a
partial result into the destination register.
Rotate masks are 32-bits long. Two instruction-opcode fields are used to specify the mask:
MB and ME. MB is a 5-bit field specifying the starting bit position of the mask and ME is a
5-bit field specifying the ending bit position of the mask. The mask consists of all 1’s from
MB to ME inclusive and all 0’s elsewhere. If MB > ME, the string of 1’s wraps around from
bit 31 to bit 0. In this case, 0’s are found from ME to MB exclusive. The generation of an all-
zero mask is not possible.
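The function of the MASK(MB,ME) generator is summarized as:
if MB ≤ ME then
   mask[MB:ME] = 1’s
   all other mask bits = 0’s
else
   mask[MB:31] = 1’s
   mask[0:ME] = 1’s
   all other mask bits = 0’s
Figure 3-23 shows the generated mask for both cases.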
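The mask rule described above (1’s from MB to ME inclusive, wrapping through bit 31 to bit 0 when
MB > ME) can be sketched in Python. The function name mask and the helper ones are hypothetical;
bit 0 is treated as the most-significant bit, so bit position i corresponds to the value
1 << (31 - i).

```python
def mask(mb, me):
    """Model of MASK(MB,ME) using PowerPC bit numbering (bit 0 is the MSB)."""
    def ones(first, last):
        # 1's in bit positions first..last inclusive, 0's elsewhere.
        return ((1 << (last - first + 1)) - 1) << (31 - last)

    if mb <= me:
        return ones(mb, me)
    # MB > ME: the run of 1's wraps from bit 31 around to bit 0.
    return ones(mb, 31) | ones(0, me)


byte2 = mask(16, 23)    # 0x0000FF00
wrapped = mask(28, 3)   # 0xF000000F
```

Every (MB, ME) pair selects at least one bit, which matches the statement that an all-zero mask
cannot be generated.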
Rotate Left then AND-with-Mask Instructions
Table 3-36 shows the PowerPC rotate left then AND-with-mask instructions. For each type of
instruction shown, the “Operation” column indicates the rotate operation performed. The
column also shows, on an instruction-by-instruction basis, whether the CR0 field is
updated.
Table 3-36: Rotate Left then AND-with-Mask Instructions
Mnemonic  Name                                                      Operation
Rotate Left then AND-with-Mask Immediate Instructions: rA is loaded with the masked result of left-rotating (rS) the
number of bits specified by SH. The mask is specified by operands MB and ME. Operand syntax: rA,rS,SH,MB,ME
rlwinm    Rotate Left Word Immediate then AND with Mask             CR0 is not updated.
rlwinm.   Rotate Left Word Immediate then AND with Mask and Record  CR0 is updated to reflect the result.
Rotate Left then AND-with-Mask Instructions: rA is loaded with the masked result of left-rotating (rS) the number of
bits specified by (rB). The mask is specified by operands MB and ME. Operand syntax: rA,rS,rB,MB,ME
rlwnm     Rotate Left Word then AND with Mask                       CR0 is not updated.
rlwnm.    Rotate Left Word then AND with Mask and Record            CR0 is updated to reflect the result.
These instructions left rotate GPR contents and logically AND the result with the mask
prior to writing it into the destination GPR. The destination register contains the rotated
result in the unmasked bit positions (mask bits with 1’s), and 0’s in the masked bit
positions (mask bits with 0’s). Rotation amounts are specified using an immediate field in
the instruction (the SH opcode field) or using a value in a register.
Figure 3-24 shows an example of a rotate left then AND-with-mask immediate instruction.
In this example, the rotation amount is 16 bits as specified by the SH field in the instruction.
The mask specifies an unmasked byte in bit positions 16:23 (MB=16, ME=23) and masks all
other bit positions. The example shows the original contents of the destination register, rA,
and the source register, rS. rS is left-rotated 16 bits and the result is written to rA after
ANDing with the mask. This has the effect of extracting byte 0 from rS (rS[0:7]) and placing
it in byte 2 of rA (rA[16:23]).
rA (original contents)         0xFF 0xEE 0xDD 0xCC
rS                             0x88 0x77 0x66 0x55
rS rotated left by SH=16 bits  0x66 0x55 0x88 0x77
Mask (MB=16, ME=23)            0000_0000_0000_0000 1111_1111 0000_0000
rA (result)                    0x00 0x00 0x88 0x00
Figure 3-24: Rotate Left then AND-with-Mask Immediate Example
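The example in Figure 3-24 can be reproduced with a short Python model. This is a sketch: rotl32
and mask are hypothetical helpers, and rlwinm is a behavioural model named after the immediate
form of the instruction.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit value left by n bits (n is taken modulo 32)."""
    n &= 31
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


def mask(mb, me):
    """1's from bit MB to bit ME inclusive, with bit 0 as the MSB."""
    def ones(first, last):
        return ((1 << (last - first + 1)) - 1) << (31 - last)
    return ones(mb, me) if mb <= me else ones(mb, 31) | ones(0, me)


def rlwinm(rs, sh, mb, me):
    """Rotate rS left by SH bits, then AND the result with MASK(MB,ME)."""
    return rotl32(rs, sh) & mask(mb, me)


result = rlwinm(0x88776655, 16, 16, 23)   # 0x00008800, as in Figure 3-24
```

The original contents of rA play no part in the result, because every bit position outside the
mask is written with 0.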
Rotate Left then Mask-Insert Instructions
Table 3-37 shows the PowerPC rotate left then mask-insert instructions. For each type of
instruction shown, the “Operation” column indicates the rotate operation performed. The
column also shows, on an instruction-by-instruction basis, whether the CR0 field is
updated.
Table 3-37: Rotate Left then Mask-Insert Instructions
Mnemonic  Name                                                    Operation
Rotate Left then Mask-Insert Immediate Instructions: The masked result of left-rotating (rS) the number of bits
specified by SH is inserted into rA. The mask is specified by operands MB and ME. Operand syntax: rA,rS,SH,MB,ME
rlwimi    Rotate Left Word Immediate then Mask Insert             CR0 is not updated.
rlwimi.   Rotate Left Word Immediate then Mask Insert and Record  CR0 is updated to reflect the result.
These instructions left rotate GPR contents and insert the results into the destination GPR
under control of the mask. The destination register contains the rotated result in the
unmasked bit positions (mask bits with 1’s) and the original contents of the destination
register in the masked bit positions (mask bits with 0’s). Rotation amounts are specified
using an immediate field in the instruction (the SH opcode field).
Figure 3-25 shows an example of a rotate left then mask-insert immediate instruction. In
this example, the rotation amount is 16 bits as specified by the SH field in the instruction.
The mask specifies an unmasked byte in bit positions 16:23 (MB=16, ME=23) and masks all
other bit positions. The example shows the original contents of the destination register, rA,
and the source register, rS. rS is rotated 16 bits and the result is inserted into rA after
ANDing with the mask. This has the effect of extracting byte 0 from rS (rS[0:7]) and
inserting it into byte 2 of rA (rA[16:23]), leaving all remaining bytes in rA unmodified.
rA (original contents)         0xFF 0xEE 0xDD 0xCC
rS                             0x88 0x77 0x66 0x55
rS rotated left by SH=16 bits  0x66 0x55 0x88 0x77
Mask (MB=16, ME=23)            0000_0000_0000_0000 1111_1111 0000_0000
rA (result)                    0xFF 0xEE 0x88 0xCC
Figure 3-25: Rotate Left then Mask-Insert Immediate Example
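The mask-insert example in Figure 3-25 differs from the AND-with-mask case only in what happens
to the bits outside the mask. The Python sketch below makes that explicit; rotl32 and mask are
hypothetical helpers and rlwimi is a behavioural model named after the instruction.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit value left by n bits (n is taken modulo 32)."""
    n &= 31
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


def mask(mb, me):
    """1's from bit MB to bit ME inclusive, with bit 0 as the MSB."""
    def ones(first, last):
        return ((1 << (last - first + 1)) - 1) << (31 - last)
    return ones(mb, me) if mb <= me else ones(mb, 31) | ones(0, me)


def rlwimi(ra, rs, sh, mb, me):
    """Insert the rotated rS into rA where the mask is 1; keep rA elsewhere."""
    m = mask(mb, me)
    return (rotl32(rs, sh) & m) | (ra & ~m & MASK32)


result = rlwimi(0xFFEEDDCC, 0x88776655, 16, 16, 23)   # 0xFFEE88CC, as in Figure 3-25
```

Because the destination is also a source, the model takes the original rA as an argument, unlike
the AND-with-mask model.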
Shift instructions operate on 32-bit data in the GPRs and return the result in a GPR. Both
logical and algebraic shifts are provided:
• Logical left-shift instructions shift bits from the direction of least-significant bit to
  most-significant bit. Bits shifted out of bit 0 are lost. The vacated bit positions on the
  right are filled with zeros.
• Logical right-shift instructions shift bits from the direction of most-significant bit to
  least-significant bit. Bits shifted out of bit 31 are lost. The vacated bit positions on the
  left are filled with zeros.
• Algebraic right-shift instructions shift bits from the direction of most-significant bit to
  least-significant bit. Bits shifted out of bit 31 are lost. The vacated bit positions on the
  left are filled with a copy of the original bit 0 (the value prior to starting the shift).
If the shift instruction has the record (Rc) bit set to 1 in the instruction encoding, CR0
(CR[0:3]) is updated to reflect the result of the operation. A set Rc bit is indicated by the “.”
suffix in the instruction mnemonic. Algebraic right-shift instructions update XER[CA] to
reflect the result of the operation but the other shift instructions do not modify XER[CA].
XER[OV,SO] are not modified by any shift instructions.
In the operand syntax for shift instructions, the rA operand specifies the destination register
rather than a source register. rS is used to specify the source register.
Simplified mnemonics using the rotate instructions are provided for coding of logical
shift-left immediate and logical shift-right immediate operations. See Rotate and Shift
Instructions, page 829 for more information.
Logical-Shift Instructions
Table 3-38 shows the PowerPC logical-shift instructions. For each type of instruction shown,
the “Operation” column indicates the shift operation performed. The column also shows,
on an instruction-by-instruction basis, whether the CR0 field is updated. XER is not
updated by these instructions.
Table 3-38: Logical-Shift Instructions

| Mnemonic | Name | Operation | Operand Syntax |
|---|---|---|---|
| Shift-Left-Logical Instructions | | rA is loaded with the result of logically left-shifting (rS) the number of bits specified by (rB). | rA,rS,rB |
| slw | Shift Left Word | CR0 is not updated. | |
| slw. | Shift Left Word and Record | CR0 is updated to reflect the result. | |
| Shift-Right-Logical Instructions | | rA is loaded with the result of logically right-shifting (rS) the number of bits specified by (rB). | rA,rS,rB |
| srw | Shift Right Word | CR0 is not updated. | |
| srw. | Shift Right Word and Record | CR0 is updated to reflect the result. | |

Figure 3-26 shows two examples of logical-shift operations. The top example shows a left
shift of seven bits, and the bottom example shows a right shift of seven bits. As is seen in
these examples, bits shifted out of the register are lost and vacated bits are filled with zeros.
Left Shift (shift by 7 bits)
  rS:                1000_0111_0110_0101_0100_0011_0010_0001
  Bits shifted out:  1000_011
  rA:                1011_0010_1010_0001_1001_0000_1000_0000
Right Shift (shift by 7 bits)
  rS:                1000_0111_0110_0101_0100_0011_0010_0001
  Bits shifted out:  010_0001
  rA:                0000_0001_0000_1110_1100_1010_1000_0110
Figure 3-26: Logical-Shift Examples
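The two shifts in Figure 3-26 can be reproduced with a short Python model. This is a sketch under stated assumptions: the function names mirror the mnemonics but are only illustrative, and the handling of the shift amount (the low six bits of rB are used, and amounts of 32 through 63 produce zero) comes from the general PowerPC definition of these instructions rather than from this section.

```python
MASK32 = 0xFFFFFFFF


def slw(rs, rb):
    """Model of slw: logical left shift, vacated low-order bits become 0."""
    n = rb & 0x3F            # assumption: only the low six bits of rB matter
    if n > 31:
        return 0             # every bit has been shifted out
    return (rs << n) & MASK32


def srw(rs, rb):
    """Model of srw: logical right shift, vacated high-order bits become 0."""
    n = rb & 0x3F
    if n > 31:
        return 0
    return (rs & MASK32) >> n


# Values taken from Figure 3-26 (rS = 0x87654321, shift by 7 bits).
print(f"{slw(0x87654321, 7):08X}")  # B2A19080
print(f"{srw(0x87654321, 7):08X}")  # 010ECA86
```

Neither function models the CR0 update of the record forms `slw.` and `srw.`.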
Algebraic-Shift Instructions
Table 3-39 shows the PowerPC algebraic-shift instructions. For each type of instruction
shown, the “Operation” column indicates the shift operation performed. The column also
shows, on an instruction-by-instruction basis, whether the CR0 field is updated. XER[CA]
is always updated by these instructions to reflect the result.
The shift-right-algebraic instructions can be followed by an addze instruction to
implement a divide-by-2^n operation. See Multiple-Precision Shifts, page 840, for more
information.

Table 3-39: Algebraic-Shift Instructions

| Mnemonic | Name | Operation | Operand Syntax |
|---|---|---|---|
| Shift-Right-Algebraic Immediate Instructions | | rA is loaded with the result of algebraically right-shifting (rS) the number of bits specified by SH. | rA,rS,SH |
| srawi | Shift Right Algebraic Word Immediate | CR0 is not updated. XER[CA] is updated to reflect the result. | |
| srawi. | Shift Right Algebraic Word Immediate and Record | CR0 and XER[CA] are updated to reflect the result. | |
| Shift-Right-Algebraic Instructions | | rA is loaded with the result of algebraically right-shifting (rS) the number of bits specified by (rB). | rA,rS,rB |
| sraw | Shift Right Algebraic Word | CR0 is not updated. XER[CA] is updated to reflect the result. | |
| sraw. | Shift Right Algebraic Word and Record | CR0 and XER[CA] are updated to reflect the result. | |

Figure 3-27 shows an example of an algebraic-shift operation. In this example, a shift of
seven bits is performed. Bits shifted out of the least-significant register bit are lost and
vacated bits on the left side are filled with a copy of the original bit 0 (prior to the shift). In
this example, the original value of bit 0 is 0b1.

Algebraic Right Shift (shift by 7 bits)
  rS:                1000_0111_0110_0101_0100_0011_0010_0001
  Bits shifted out:  010_0001
  rA:                1111_1111_0000_1110_1100_1010_1000_0110
Figure 3-27: Algebraic-Shift Example
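The following Python sketch models the immediate form of the algebraic shift and the srawi-plus-addze pairing used for a signed divide by 2^n. It rests on assumptions that this section does not spell out: XER[CA] is taken to be set only when rS is negative and at least one 1 bit is shifted out (the general PowerPC rule), addze is modeled as adding CA to its source register, and the names `to_signed` and `div_pow2` are hypothetical helpers.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def srawi(rs, sh):
    """Model of srawi for sh in 0..31: returns (rA, CA)."""
    value = to_signed(rs)
    result = (value >> sh) & MASK32          # Python's >> sign-fills negatives
    shifted_out = rs & ((1 << sh) - 1)       # the bits that fall off the right
    ca = 1 if (value < 0 and shifted_out != 0) else 0
    return result, ca


def addze(ra, ca):
    """Model of addze: add the carry bit to a register value."""
    return (ra + ca) & MASK32


def div_pow2(x, n):
    """Signed divide by 2**n, rounding toward zero, via srawi then addze."""
    quotient, ca = srawi(x & MASK32, n)
    return to_signed(addze(quotient, ca))


# Value taken from Figure 3-27 (rS = 0x87654321, shift by 7 bits).
ra, ca = srawi(0x87654321, 7)
print(f"{ra:08X} CA={ca}")   # FF0ECA86 CA=1
print(div_pow2(-7, 2))       # -1
```

The shift alone rounds a negative quotient toward negative infinity (−7 shifted right by 2 gives −2); adding CA corrects it to the round-toward-zero result −1.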
Multiply-Accumulate Instruction-Set Extensions
The PPC405 supports an integer multiply-accumulate instruction-set extension that provides
functions usable by certain computationally intensive applications, such as those that
implement DSP algorithms. These instructions comply with the architectural requirements
for auxiliary-processor units (APUs) defined by the PowerPC embedded-environment
architecture and the PowerPC Book-E architecture. They are considered
implementation-dependent instructions and are not part of the PowerPC architecture, the PowerPC
embedded-environment architecture, or the PowerPC Book-E architecture. Programs that
use these instructions are not portable to all PowerPC implementations.
The multiply-accumulate instruction-set extensions include multiply-accumulate
instructions, negative multiply-accumulate instructions, and multiply-halfword
instructions.
Modulo and Saturating Arithmetic
The multiply-accumulate and negative multiply-accumulate instructions produce a 33-bit
intermediate result. The method used to store this result in the 32-bit destination register
depends on whether the instruction performs modulo arithmetic or saturating arithmetic.
With modulo-arithmetic instructions, the most-significant bit in the intermediate result is
discarded and the low-32 bits of this result are stored in the destination register.
With saturating-arithmetic instructions, the low 32 bits of the intermediate result are
stored in the destination register if the intermediate result does not overflow 32 bits.
However, if the intermediate result overflows what is representable in 32 bits, the
instruction loads the nearest representable value into the destination register. For the
various instruction forms, these results are:
• Signed arithmetic: if the result exceeds 2^31 − 1 (> 0x7FFF_FFFF), the instruction loads
  the destination register with 2^31 − 1.
• Signed arithmetic: if the result is less than −2^31 (< 0x8000_0000), the instruction loads
  the destination register with −2^31.
• Unsigned arithmetic: if the result exceeds 2^32 − 1 (> 0xFFFF_FFFF), the instruction
  loads the destination register with 2^32 − 1.
Multiply-Accumulate Instructions
Multiply-Accumulate Cross-Halfword to Word Instructions
Table 3-40 shows the PPC405 integer multiply-accumulate cross-halfword to word instructions.
These instructions take the lower halfword of the first source operand (rA[16:31]) and
multiply it with the upper halfword of the second source operand (rB[0:15]), producing a
32-bit product. The product is signed or unsigned, depending on the instruction. This
product is added to the value in the destination register, rD, producing a 33-bit
intermediate result. Generally, rD is loaded with the lower-32 bits of the 33-bit
intermediate result. However, if the instruction performs saturating arithmetic and the
intermediate result overflows, rD is loaded with the nearest representable value (see
Modulo and Saturating Arithmetic, above).
For each type of instruction shown in Table 3-40, the “Operation” column indicates the
multiply-accumulate operation performed. The column also shows, on an
instruction-by-instruction basis, how the XER and CR registers are updated (if at all).
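The difference between the modulo and saturating forms of the cross-halfword multiply-accumulate can be seen in a small Python model. This is a sketch, not a specification: the functions are named after the mnemonics for readability, the helpers `s16` and `s32` are hypothetical, and only the value written to rD is modeled (the XER[OV,SO] and CR0 updates are left out).

```python
MASK32 = 0xFFFFFFFF


def s16(x):
    """Interpret the low 16 bits of x as a signed halfword."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def s32(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def cross_product_signed(ra, rb):
    """Signed product of rA[16:31] (low half) and rB[0:15] (high half)."""
    return s16(ra) * s16(rb >> 16)


def macchw(rd, ra, rb):
    """Modulo signed: keep the low 32 bits of the 33-bit intermediate."""
    return (s32(rd) + cross_product_signed(ra, rb)) & MASK32


def macchws(rd, ra, rb):
    """Saturate signed: clamp the intermediate to -2^31 .. 2^31 - 1."""
    total = s32(rd) + cross_product_signed(ra, rb)
    total = max(-(1 << 31), min((1 << 31) - 1, total))
    return total & MASK32


def macchwsu(rd, ra, rb):
    """Saturate unsigned: clamp the intermediate to at most 2^32 - 1."""
    total = (rd & MASK32) + (ra & 0xFFFF) * ((rb >> 16) & 0xFFFF)
    return min(total, MASK32)


# rD = 0x7FFFFFFF, product = 2 * 3 = 6: the sum does not fit in 32 signed bits.
print(f"{macchw(0x7FFFFFFF, 0x00000002, 0x00030000):08X}")   # 80000005
print(f"{macchws(0x7FFFFFFF, 0x00000002, 0x00030000):08X}")  # 7FFFFFFF
```

With the same inputs, the modulo form wraps to a negative value while the saturating form holds rD at the largest positive value.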
Table 3-40: Multiply-Accumulate Cross-Halfword to Word Instructions

| Mnemonic | Name | Operation | Operand Syntax |
|---|---|---|---|
| Multiply-Accumulate Cross-Halfword to Word Modulo Signed Instructions | | rD is added to the signed product (rA[16:31]) × (rB[0:15]), producing a 33-bit result. The low-32 bits of this result are stored in rD. | rD,rA,rB |
| macchw | Multiply Accumulate Cross Halfword to Word Modulo Signed | XER and CR0 are not updated. | |
| macchw. | Multiply Accumulate Cross Halfword to Word Modulo Signed and Record | CR0 is updated to reflect the result. | |
| macchwo | Multiply Accumulate Cross Halfword to Word Modulo Signed with Overflow Enabled | XER[OV,SO] are updated to reflect the result. | |
| macchwo. | Multiply Accumulate Cross Halfword to Word Modulo Signed with Overflow Enabled and Record | XER[OV,SO] and CR0 are updated to reflect the result. | |
| Multiply-Accumulate Cross-Halfword to Word Saturate Signed Instructions | | rD is added to the signed product (rA[16:31]) × (rB[0:15]), producing a 33-bit result. If the result does not overflow, the low-32 bits of this result are stored in rD. Otherwise, the nearest representable value is stored in rD. | rD,rA,rB |
| macchws | Multiply Accumulate Cross Halfword to Word Saturate Signed | XER and CR0 are not updated. | |
| macchws. | Multiply Accumulate Cross Halfword to Word Saturate Signed and Record | CR0 is updated to reflect the result. | |
| macchwso | Multiply Accumulate Cross Halfword to Word Saturate Signed with Overflow Enabled | XER[OV,SO] are updated to reflect the result. | |
| macchwso. | Multiply Accumulate Cross Halfword to Word Saturate Signed with Overflow Enabled and Record | XER[OV,SO] and CR0 are updated to reflect the result. | |
| Multiply-Accumulate Cross-Halfword to Word Saturate Unsigned Instructions | | rD is added to the unsigned product (rA[16:31]) × (rB[0:15]), producing a 33-bit result. If the result does not overflow, the low-32 bits of this result are stored in rD. Otherwise, the nearest representable value is stored in rD. | rD,rA,rB |
| macchwsu | Multiply Accumulate Cross Halfword to Word Saturate Unsigned | XER and CR0 are not updated. | |
| macchwsu. | Multiply Accumulate Cross Halfword to Word Saturate Unsigned and Record | CR0 is updated to reflect the result. | |
| macchwsuo | Multiply Accumulate Cross Halfword to Word Saturate Unsigned with Overflow Enabled | XER[OV,SO] are updated to reflect the result. | |
| macchwsuo. | Multiply Accumulate Cross Halfword to Word Saturate Unsigned with Overflow Enabled and Record | XER[OV,SO] and CR0 are updated to reflect the result. | |
| Multiply-Accumulate Cross-Halfword to Word Modulo Unsigned Instructions | | rD is added to the unsigned product (rA[16:31]) × (rB[0:15]), producing a 33-bit result. The low-32 bits of this result are stored in rD. | rD,rA,rB |
| macchwu | Multiply Accumulate Cross Halfword to Word Modulo Unsigned | XER and CR0 are not updated. | |
| macchwu. | Multiply Accumulate Cross Halfword to Word Modulo Unsigned and Record | CR0 is updated to reflect the result. | |
| macchwuo | Multiply Accumulate Cross Halfword to Word Modulo Unsigned with Overflow Enabled | XER[OV,SO] are updated to reflect the result. | |
| macchwuo. | Multiply Accumulate Cross Halfword to Word Modulo Unsigned with Overflow Enabled and Record | XER[OV,SO] and CR0 are updated to reflect the result. | |

Figure 3-28 shows the operation of the integer multiply-accumulate cross-halfword to
word instructions.
  rA[16:31] × rB[0:15]   ->  32-bit product
  product + rD[0:31]     ->  33-bit intermediate result
  intermediate result    ->  rD[0:31]
Figure 3-28: Multiply-Accumulate Cross-Halfword to Word Operation
Multiply-Accumulate High-Halfword to Word Instructions
Table 3-41 shows the PPC405 multiply-accumulate high-halfword to word instructions. These
instructions multiply the high halfword of both source operands, rA[0:15] and rB[0:15],
producing a 32-bit product. The product is signed or unsigned, depending on the
instruction. This product is added to the value in the destination register, rD, producing a
33-bit intermediate result. Generally, rD is loaded with the lower-32 bits of the 33-bit
intermediate result. However, if the instruction performs saturating arithmetic and the
intermediate result overflows, rD is loaded with the nearest representable value (see
Modulo and Saturating Arithmetic, page 405).
For each type of instruction shown in Table 3-41, the “Operation” column indicates the
multiply-accumulate operation performed. The column also shows, on an
instruction-by-instruction basis, how the XER and CR registers are updated (if at all).
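As a brief illustration of the operand selection, the Python sketch below models the modulo signed high-halfword form. It assumes nothing beyond the description above; the helper `signed_high_half` is a hypothetical name, and the XER and CR0 updates are not modeled.

```python
MASK32 = 0xFFFFFFFF


def signed_high_half(x):
    """Return rX[0:15], the high halfword, as a signed 16-bit integer."""
    half = (x >> 16) & 0xFFFF
    return half - 0x10000 if half & 0x8000 else half


def machhw(rd, ra, rb):
    """Modulo signed: rD + rA[0:15] * rB[0:15], keeping the low 32 bits."""
    product = signed_high_half(ra) * signed_high_half(rb)
    return (rd + product) & MASK32


# High halfwords are 3 and -2; the low halfwords do not take part.
print(machhw(10, 0x0003FFFF, 0xFFFE1234))  # 4
```

Because the form is modulo, a sum that does not fit in 32 bits simply wraps: adding 1 × 1 to 0x7FFF_FFFF yields 0x8000_0000.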
Table 3-41: Multiply-Accumulate High-Halfword to Word Instructions

| Mnemonic | Name | Operation | Operand Syntax |
|---|---|---|---|
| Multiply-Accumulate High-Halfword to Word Modulo Signed Instructions | | rD is added to the signed product (rA[0:15]) × (rB[0:15]), producing a 33-bit result. The low-32 bits of this result are stored in rD. | rD,rA,rB |
| machhw | Multiply Accumulate High Halfword to Word Modulo Signed | XER and CR0 are not updated. | |
| machhw. | Multiply Accumulate High Halfword to Word Modulo Signed and Record | CR0 is updated to reflect the result. | |
| machhwo | Multiply Accumulate High Halfword to Word Modulo Signed with Overflow Enabled | XER[OV,SO] are updated to reflect the result. | |
| machhwo. | Multiply Accumulate High Halfword to Word Modulo Signed with Overflow Enabled and Record | XER[OV,SO] and CR0 are updated to reflect the result. | |