Xilinx PPC405 User Manual

Volume 2(a): PPC405 User Manual
Virtex-II Pro™ Platform FPGA Developer’s Kit
March 2002 Release
The shadow X shown above is a trademark of Xilinx, Inc.
"Xilinx" and the Xilinx logo are registered trademarks of Xilinx, Inc. Any rights not expressly granted herein are reserved.
CoolRunner, RocketChips, Rocket IP, Spartan, StateBENCH, StateCAD, Virtex, XACT, XC2064, XC3090, XC4005, XC5210 are registered Trademarks of Xilinx, Inc.
ACE Controller, ACE Flash, A.K.A. Speed, Alliance Series, AllianceCORE, Bencher, ChipScope, Configurable Logic Cell, CORE Generator, CoreLINX, Dual Block, EZTag, Fast CLK, Fast CONNECT, Fast FLASH, FastMap, Fast Zero Power, Foundation, Gigabit Speeds...and Beyond!, HardWire, HDL Bencher, IRL, J Drive, JBits, LCA, LogiBLOX, Logic Cell, LogiCORE, LogicProfessor, MicroBlaze, MicroVia, MultiLINX, NanoBlaze, PicoBlaze, PLUSASM, PowerGuide, PowerMaze, QPro, Real-PCI, Rocket I/O, SelectI/O, SelectRAM, SelectRAM+, Silicon Xpresso, Smartguide, Smart-IP, SmartSearch, SMARTswitch, System ACE, Testbench In A Minute, TrueMap, UIM, VectorMaze, VersaBlock, VersaRing, Virtex-II Pro, Wave Table, WebFITTER, WebPACK, WebPOWERED, XABEL, XACT-Floorplanner, XACT-Performance, XACTstep Advanced, XACTstep Foundry, XAM, XAPP, X-BLOX +, XC designated products, XChecker, XDM, XEPLD, Xilinx Foundation Series, Xilinx XDTV, Xinfo, XSI, XtremeDSP and ZERO+ are trademarks of Xilinx, Inc.
The Programmable Logic Company is a service mark of Xilinx, Inc.
The following are trademarks of International Business Machines Corporation in the United States, or other countries, or both: IBM IBM Logo PowerPC PowerPC Logo Blue Logic CoreConnect CodePack
All other trademarks are the property of their respective owners.
Xilinx does not assume any liability arising out of the application or use of any product described or shown herein; nor does it convey any license under its patents, copyrights, or maskwork rights or any rights of others. Xilinx reserves the right to make changes, at any time, in order to improve reliability, function or design and to supply the best product possible. Xilinx will not assume responsibility for the use of any circuitry described herein other than circuitry entirely embodied in its products. Xilinx provides any design, code, or information shown or described herein "as is." By providing the design, code, or information as one possible implementation of a feature, application, or standard, Xilinx makes no representation that such implementation is free from any claims of infringement. You are responsible for obtaining any rights you may require for your implementation. Xilinx expressly disclaims any warranty whatsoever with respect to the adequacy of any such implementation, including but not limited to any warranties or representations that the implementation is free from claims of infringement, as well as any implied warranties of merchantability or fitness for a particular purpose. Xilinx assumes no obligation to correct any errors contained herein or to advise any user of this text of any correction if such be made. Xilinx will not assume any liability for the accuracy or correctness of any engineering or software support or assistance provided to a user.
Xilinx products are not intended for use in life support appliances, devices, or systems. Use of a Xilinx product in such applications without the written consent of the appropriate Xilinx officer is prohibited.
Copyright 2002 Xilinx, Inc. All Rights Reserved.
Virtex-II Pro™ Platform FPGA Developer’s Kit www.xilinx.com March 2002 Release
1-800-255-7778

About This Book

Preface
This document is intended to serve as a stand-alone reference for application and system programmers of the PowerPC® 405D5 processor. It combines information from the following documents:
PowerPC 405 Embedded Processor Core User’s Manual, published by IBM Corporation (IBM order number SA14-2339-01).
The IBM PowerPC Embedded Environment Architectural Specifications for IBM PowerPC Embedded Controllers, published by IBM Corporation.
PowerPC Microprocessor Family: The Programming Environments, published by IBM Corporation (IBM order number G522-0290-01).
IBM PowerPC Embedded Processors Application Note: PowerPC 400 Series Caches: Programming and Coherency Issues.
IBM PowerPC Embedded Processors Application Note: PowerPC 40x Watch Dog Timer.
IBM PowerPC Embedded Processors Application Note: Programming Model Differences of the IBM PowerPC 400 Family and 600/700 Family Processors.

Document Organization

Chapter 1, Introduction to the PPC405, provides a general understanding of the PPC405 as an implementation of the PowerPC embedded-environment architecture. This chapter also contains an overview of the features supported by the PPC405.
Chapter 2, Operational Concepts, introduces the processor operating modes, execution model, synchronization, operand conventions, and instruction conventions.
Chapter 3, User Programming Model, describes the registers and instructions available to application software.
Chapter 4, PPC405 Privileged-Mode Programming Model, introduces the registers and instructions available to system software.
Chapter 5, Memory-System Management, describes the operation of the memory system, including caches. Real-mode storage control is also described in this chapter.
Chapter 6, Virtual-Memory Management, describes virtual-to-physical address translation as supported by the PPC405. Virtual-mode storage control is also described in this chapter.
Chapter 7, Exceptions and Interrupts, provides details of all exceptions recognized by the PPC405 and how software can use the interrupt mechanism to handle exceptions.
Chapter 8, Timer Resources, describes the timer registers and timer-interrupt controls available in the PPC405.
Chapter 9, Debugging, describes the debug resources available to software and hardware debuggers.
Chapter 10, Reset and Initialization, describes the state of the PPC405 following reset and the requirements for initializing the processor.
Chapter 11, Instruction Set, provides a detailed description of each instruction supported by the PPC405.
Appendix A, Register Summary, is a reference of all registers supported by the PPC405.
Appendix B, Instruction Summary, lists all instructions sorted by mnemonic, opcode, function, and form. Each entry for an instruction shows its complete encoding. General instruction-set information is also provided.
Appendix C, Simplified Mnemonics, lists the simplified mnemonics recognized by many PowerPC assemblers. These mnemonics provide a shorthand means of specifying frequently-used instruction encodings and can greatly improve assembler code readability.
Appendix D, Programming Considerations, provides information on improving performance of software written for the PPC405.
Appendix E, PowerPC® 6xx/7xx Compatibility, describes the programming model differences between the PPC405 and PowerPC 6xx and 7xx series processors.
Appendix F, PowerPC® Book-E Compatibility, describes the programming model differences between the PPC405 and PowerPC Book-E processors.

Document Conventions

General Conventions

Table P-1 lists the general notational conventions used throughout this document.
Table P-1: General Notational Conventions
Convention                    Definition
mnemonic                      Instruction mnemonics are shown in lower-case bold.
. (period)                    Update. When used as a character in an instruction mnemonic, a period (.) means that the instruction updates the condition-register field.
! (exclamation)               In instruction listings, an exclamation (!) indicates the start of a comment.
variable                      Variable items are shown in italic.
<optional>                    Optional items are shown in angle brackets.
ActiveLow                     An overbar indicates an active-low signal.
n                             A decimal number.
0xn                           A hexadecimal number.
0bn                           A binary number.
(rn)                          The contents of GPR rn.
(rA|0)                        The contents of the register rA, or 0 if the rA instruction field is 0.
cr_bit                        Used in simplified mnemonics to specify a CR-bit position (0 to 31) used as an operand.
cr_field                      Used in simplified mnemonics to specify a CR field (0 to 7) used as an operand.
OBJECT_b                      A single bit in any object (a register, an instruction, an address, or a field) is shown as a subscripted number or name.
OBJECT_b:b                    A range of bits in any object (a register, an instruction, an address, or a field).
OBJECT_b,b, . . .             A list of bits in any object (a register, an instruction, an address, or a field).
REGISTER[FIELD]               Fields within any register are shown in square brackets.
REGISTER[FIELD, FIELD . . .]  A list of fields in any register.
REGISTER[FIELD:FIELD]         A range of fields in any register.

Instruction Fields

Table P-2 lists the instruction fields used in the various instruction formats. They are found in the instruction encodings and pseudocode, and are referred to throughout this document when describing instructions. The table includes the bit locations for the field within the instruction encoding.
Table P-2: Instruction Field Definitions
Field  Location  Description
AA     30        Absolute-address bit (branch instructions).
                 0  The immediate field represents an address relative to the current instruction address (CIA). The effective address (EA) of the branch is either the sum of the LI field sign-extended to 32 bits and the branch instruction address, or the sum of the BD field sign-extended to 32 bits and the branch instruction address.
                 1  The immediate field represents an absolute address. The EA of the branch is either the LI field or the BD field, sign-extended to 32 bits.
BD     16:29     An immediate field specifying a 14-bit signed two’s-complement branch displacement. This field is concatenated on the right with 0b00 and sign-extended to 32 bits.
BI     11:15     Specifies a bit in the CR used as a source for the condition of a conditional-branch instruction.
BO     6:10      Specifies options for conditional-branch instructions. See Conditional Branch Control, page 367.
crbA   11:15     Specifies a bit in the CR used as a source of a CR-logical instruction.
crbB   16:20     Specifies a bit in the CR used as a source of a CR-logical instruction.
crbD   6:10      Specifies a bit in the CR used as a destination of a CR-logical instruction.
crfD   6:8       Specifies a field in the CR used as a target in a compare or mcrf instruction.
crfS   11:13     Specifies a field in the CR used as a source in a mcrf instruction.
CRM    12:19     The field mask used to identify CR fields to be updated by the mtcrf instruction.
d      16:31     Specifies a 16-bit signed two’s-complement integer displacement for load/store instructions.
DCRF   11:20     A split field used to specify a device control register (DCR). The field is used to form the DCR number (DCRN).
E      16        A single-bit immediate field in the wrteei instruction specifying the value to be written to the MSR[EE] bit.
LI     6:29      An immediate field specifying a 24-bit signed two’s-complement branch displacement. This field is concatenated on the right with 0b00 and sign-extended to 32 bits.
LK     31        Link bit.
                 0  Do not update the link register (LR).
                 1  Update the LR with the address of the next instruction.
MB     21:25     Mask begin. Used in rotate-and-mask instructions to specify the beginning bit of a mask.
ME     26:30     Mask end. Used in rotate-and-mask instructions to specify the ending bit of a mask.
NB     16:20     Specifies the number of bytes to move in an immediate-string load or immediate-string store.
OE     21        Enables setting the OV and SO fields in the fixed-point exception register (XER) for extended arithmetic.
OPCD   0:5       Primary opcode. Primary opcodes, in decimal, appear in the instruction format diagrams presented with individual instructions. The OPCD field name does not appear in instruction descriptions.
rA     11:15     Specifies a GPR source operand and/or destination operand.
rB     16:20     Specifies a GPR source operand.
Rc     31        Record bit.
                 0  Instruction does not update the CR.
                 1  Instruction updates the CR to reflect the result of an operation.
                 See Condition Register (CR), page 361 for a further discussion of how the CR bits are set.
rD     6:10      Specifies a GPR destination operand.
rS     6:10      Specifies a GPR source operand.
SH     16:20     Specifies a shift amount.
SIMM   16:31     An immediate field used to specify a 16-bit signed-integer value.
SPRF   11:20     A split field used to specify a special purpose register (SPR). The field is used to form the SPR number (SPRN).
TBRF   11:20     A split field used to specify a time-base register (TBR). The field is used to form the TBR number (TBRN).
TO     6:10      Specifies the trap conditions, as defined in the tw and twi instruction descriptions.
UIMM   16:31     An immediate field used to specify a 16-bit unsigned-integer value.
XO     21:30     Extended opcode for instructions without an OE field. Extended opcodes, in decimal, appear in the instruction format diagrams presented with individual instructions. The XO field name does not appear in instruction descriptions.
XO     22:30     Extended opcode for instructions with an OE field. Extended opcodes, in decimal, appear in the instruction format diagrams presented with individual instructions. The XO field name does not appear in instruction descriptions.
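The field locations in Table P-2 can be exercised with a short sketch. It assumes the usual PowerPC numbering in which bit 0 is the most-significant bit of the 32-bit instruction word. The helper names (`field`, `exts`, `decode_immediate_form`, `branch_target`) and the sample encodings (`0x38640008`, taken here to be `addi r3,r4,8`, and `0x4BFFFFF8`, taken to be a relative branch back by 8 bytes) are illustrative assumptions, not material from this manual.

```python
def field(word, first, last):
    """Return instruction bits first:last, where bit 0 is the most-significant bit."""
    width = last - first + 1
    shift = 31 - last
    return (word >> shift) & ((1 << width) - 1)


def exts(value, bits):
    """Sign-extend a `bits`-wide two's-complement value to a Python int."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def decode_immediate_form(word):
    """Split an instruction that uses OPCD, rD, rA, and SIMM into those fields."""
    return {
        "OPCD": field(word, 0, 5),
        "rD": field(word, 6, 10),
        "rA": field(word, 11, 15),
        "SIMM": exts(field(word, 16, 31), 16),
    }


def branch_target(word, cia):
    """Effective address of a branch that uses the LI and AA fields."""
    displacement = exts(field(word, 6, 29) << 2, 26)  # LI || 0b00, sign-extended
    absolute = field(word, 30, 30)                    # AA bit
    base = 0 if absolute else cia
    return (base + displacement) & 0xFFFFFFFF


print(decode_immediate_form(0x38640008))
print(hex(branch_target(0x4BFFFFF8, 0x1000)))
```

The split fields (DCRF, SPRF, TBRF) are left out on purpose, because forming DCRN, SPRN, or TBRN from them needs the rules given with the individual instructions.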

Pseudocode Conventions

Table P-3 lists additional conventions used primarily in the pseudocode describing the operation of each instruction.
Table P-3: Pseudocode Conventions
Convention            Definition
←                     Assignment
∧                     AND logical operator
¬                     NOT logical operator
∨                     OR logical operator
⊕                     Exclusive-OR (XOR) logical operator
+                     Two’s-complement addition
–                     Two’s-complement subtraction, unary minus
×                     Multiplication
÷                     Division yielding a quotient
%                     Remainder of an integer division. For example, (33 % 32) = 1.
||                    Concatenation
=, ≠                  Equal, not-equal relations
<, >                  Signed comparison relations
<ᵘ, >ᵘ                Unsigned comparison relations
c_0:3                 A four-bit object used to store condition results in compare instructions.
ⁿb                    The bit or bit value b is replicated n times.
x                     Bit positions that are don’t-cares.
CEIL(n)               Least integer ≥ n.
CIA                   Current instruction address. The 32-bit address of the instruction being described by a sequence of pseudocode. This address is used to set the next instruction address (NIA). Does not correspond to any architected register.
DCR(DCRN)             A specific device control register, as indicated by DCRN.
DCRN                  The device control register number formed using the split DCRF field in a mfdcr or mtdcr instruction.
do                    Do loop. “to” and “by” clauses specify incrementing an iteration variable. “while” and “until” clauses specify terminating conditions. Indenting indicates the scope of a loop.
EA                    Effective address. The 32-bit address that specifies a location in main storage. Derived by applying indexing or indirect addressing rules to the specified operand.
EXTS(n)               The result of extending n on the left with sign bits.
if...then...else...   Conditional execution: if condition then a else b, where a and b represent one or more pseudocode statements. Indenting indicates the ranges of a and b. If b is null, the else does not appear.
instruction(EA)       An instruction operating on a data-cache block or instruction-cache block associated with an EA.
leave                 Leave innermost do-loop or the do-loop specified by the leave statement.
MASK(MB,ME)           Mask having 1s in positions MB through ME (wrapping if MB > ME) and 0s elsewhere.
MS(addr, n)           The number of bytes represented by n at the location in main storage represented by addr.
NIA                   Next instruction address. The 32-bit address of the next instruction to be executed. In pseudocode, a successful branch is indicated by assigning a value to NIA. For instructions that do not branch, the NIA is CIA + 4.
RESERVE               Reserve bit. Indicates whether a process has reserved a block of storage.
ROTL((RS),n)          Rotate left. The contents of RS are rotated left the number of bits specified by n.
SPR(SPRN)             A specific special-purpose register, as indicated by SPRN.
SPRN                  The special-purpose register number formed using the split SPRF field in a mfspr or mtspr instruction.
TBR(TBRN)             A specific time-base register, as indicated by TBRN.
TBRN                  The time-base register number formed using the split TBRF field in a mftb instruction.
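Several of the pseudocode conventions above translate directly into small functions, which can help when reading the instruction descriptions. The following is a minimal sketch, assuming 32-bit registers with bit 0 as the most-significant bit. The Python names (`exts`, `rotl32`, `mask`, `replicate`, `ra_or_zero`) are illustrative choices for EXTS(n), ROTL((RS),n), MASK(MB,ME), ⁿb, and (rA|0), and are not defined by this manual.

```python
WORD = 0xFFFFFFFF  # all 32 bits set


def exts(value, bits):
    """EXTS: extend a `bits`-wide value on the left with its sign bit."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def rotl32(value, n):
    """ROTL: rotate a 32-bit value left by n bit positions."""
    n %= 32
    value &= WORD
    return ((value << n) | (value >> (32 - n))) & WORD


def mask(mb, me):
    """MASK(MB,ME): 1s in positions MB through ME, wrapping if MB > ME."""
    ones_from_mb = WORD >> mb                # 1s in positions mb..31
    ones_to_me = (WORD << (31 - me)) & WORD  # 1s in positions 0..me
    if mb <= me:
        return ones_from_mb & ones_to_me
    return ones_from_mb | ones_to_me


def replicate(bit, n):
    """ⁿb: the bit value b replicated n times, returned as an integer."""
    return ((1 << n) - 1) if bit else 0


def ra_or_zero(gprs, ra):
    """(rA|0): the contents of GPR rA, or 0 if the rA field is 0."""
    return 0 if ra == 0 else gprs[ra]


# A displacement-style effective address, EA = (rA|0) + EXTS(d), wrapped to 32 bits.
gprs = [0] * 32
gprs[4] = 0x1000
ea = (ra_or_zero(gprs, 4) + exts(0xFFFC, 16)) & WORD
print(hex(ea))

# Rotate left by 8 and keep the low byte, in the style of a rotate-and-mask instruction.
print(hex(rotl32(0x12345678, 8) & mask(24, 31)))
```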

Operator Precedence

Table P-4 lists the pseudocode operators and their associativity in descending order of precedence:
Table P-4: Operator Precedence
Operators                                          Associativity
REGISTER_b, REGISTER[FIELD], function evaluation   Left to right
ⁿb                                                 Right to left
¬, – (unary minus)                                 Right to left
×, ÷                                               Left to right
+, –                                               Left to right
||                                                 Left to right
=, ≠, <, >, <ᵘ, >ᵘ                                 Left to right
∧, ⊕                                               Left to right
∨                                                  Left to right
←                                                  None

Registers

Table P-5 lists the PPC405 registers and their descriptive names.
Table P-5: PPC405 Registers
Register Descriptive Name
CCR0 Core-configuration register 0
CR Condition register
CTR Count register
DACn Data-address compare n
DBCRn Debug-control register n
DBSR Debug-status register
DCCR Data-cache cacheability register
DCWR Data-cache write-through register
DEAR Data-error address register
DVCn Data-value compare n
ESR Exception-syndrome register
EVPR Exception-vector prefix register
GPR General-purpose register. Specific GPRs are identified using the notational convention rn (see below)
IACn Instruction-address compare n
ICCR Instruction-cache cacheability register
ICDBDR Instruction-cache debug-data register
LR Link register
MSR Machine-state register
PID Process ID
PIT Programmable-interval timer
PVR Processor-version register
rn Specifies GPR n (r15, for example)
SGR Storage-guarded register
SLER Storage little-endian register
SPRGn SPR general-purpose register n
SRRn Save/restore register n
SU0R Storage user-defined 0 register
TBL Time-base lower
TBU Time-base upper
TCR Timer-control register
TSR Timer-status register
USPRGn User SPR general-purpose register n
XER Fixed-point exception register
ZPR Zone-protection register

Terms

atomic access
A memory access that attempts to read from and write to the same address uninterrupted by other accesses to that address. The term refers to the fact that such transactions are indivisible.
big endian
A memory byte ordering where the address of an item corresponds to the most-significant byte.
Book-E
A version of the PowerPC architecture designed specifically for embedded applications.
cache block
Synonym for cacheline.
cacheline
A portion of a cache array that contains a copy of contiguous system-memory addresses. Cachelines are 32 bytes long and aligned on a 32-byte address.
clear
To write a bit value of 0.
cache set
Synonym for congruence class.
congruence class
A collection of cachelines with the same index.
dirty
An indication that cache information is more recent than the copy in memory.
doubleword
Eight bytes, or 64 bits.
effective address
The untranslated memory address as seen by a program.
exception
An abnormal event or condition that requires the processor’s attention. Exceptions can be caused by instruction execution or by an external device. The processor records the occurrence of an exception, which often causes an interrupt to occur.
fill buffer
A buffer that receives and sends data and instructions between the processor and PLB. It is used when cache misses occur and when access to non-cacheable memory occurs.
flush
A cache or TLB operation that involves writing back a modified entry to memory, followed by an invalidation of the entry.
GB
Gigabyte, or one-billion bytes.
halfword
Two bytes, or 16 bits.
hit
For cache arrays and TLB arrays, an indication that requested information exists in the accessed array.
interrupt
The process of stopping the currently executing program so that an exception can be handled.
invalidate
A cache or TLB operation that causes an entry to be marked as invalid. An invalid entry can be subsequently replaced.
KB
Kilobyte, or one-thousand bytes.
line buffer
A buffer located in the cache array that can temporarily hold the contents of an entire cacheline. It is loaded with the contents of a cacheline when a cache hit occurs.
little endian
A memory byte ordering where the address of an item corresponds to the least-significant byte.
logical address
Synonym for effective address.
MB
Megabyte, or one-million bytes.
memory
Collectively, cache memory and system memory.
miss
For cache arrays and TLB arrays, an indication that requested information does not exist in the accessed array.
OEA
The PowerPC operating-environment architecture, which defines the memory-management model, supervisor-level registers and instructions, synchronization requirements, the exception model, and the time-base resources as seen by supervisor programs.
on chip
In system-on-chip implementations, this indicates on the same chip as the processor core, but external to the processor core.
pending
As applied to interrupts, this indicates that an exception occurred, but the interrupt is disabled. The interrupt occurs when it is later enabled.
physical address
The address used to access physically-implemented memory. This address can be translated from the effective address. When address translation is not used, this address is equal to the effective address.
PLB
Processor local bus.
privileged mode
The operating mode typically used by system software. Privileged operations are allowed and software can access all registers and memory.
process
A program (or portion of a program) and any data required for the program to run.
problem state
Synonym for user mode.
real address
Synonym for physical address.
scalar
Individual data objects and instructions. Scalars are of arbitrary size.
set
To write a bit value of 1.
sticky
A bit that can be set by software, but cleared only by the processor. Alternatively, a bit that can be cleared by software, but set only by the processor.
string
A sequence of consecutive bytes.
supervisor state
Synonym for privileged mode.
system memory
Physical memory installed in a computer system external to the processor core, such as RAM, ROM, and flash.
tag
As applied to caches, a set of address bits used to uniquely identify a specific cacheline within a congruence class. As applied to TLBs, a set of address bits used to uniquely identify a specific entry within the TLB.
UISA
The PowerPC user instruction-set architecture, which defines the base user-level instruction set, registers, data types, the memory model, the programming model, and the exception model as seen by user programs.
user mode
The operating mode typically used by application software. Privileged operations are not allowed in user mode, and software can access a restricted set of registers and memory.
VEA
The PowerPC virtual-environment architecture, which defines a multi-access memory model, the cache model, cache-control instructions, and the time-base resources as seen by user programs.
virtual address
An intermediate address used to translate an effective address into a physical address. It consists of a process ID and the effective address. It is only used when address translation is enabled.
word
Four bytes, or 32 bits.

Additional Reading

In addition to the source documents listed at the beginning of this preface, the following documents contain additional information of potential interest to readers of this manual:
The PowerPC Architecture: A Specification for a New Family of RISC Processors, IBM 5/1994. Published by Morgan Kaufmann Publishers, Inc. San Francisco (ASIN: 1558603166).
Book E: Enhanced PowerPC Architecture, IBM 3/2000.
The PowerPC Compiler Writers Guide, IBM 1/1996. Published by Warthman Associates, Palo Alto, CA (ISBN 0-9649654-0-2).
Optimizing PowerPC Code: Programming the PowerPC Chip in Assembly Language, by Gary Kacmarcik (ASIN: 0201408392).
PowerPC Programming Pocket Book, by Steve Heath (ISBN 0750621117).
Computer Architecture: A Quantitative Approach, by John L. Hennessy and David A. Patterson.

Chapter 1: Introduction to the PPC405

The PPC405 is a 32-bit implementation of the PowerPC® embedded-environment architecture that is derived from the PowerPC architecture. Specifically, the PPC405 is an embedded PowerPC 405D5 processor core.
The PowerPC architecture provides a software model that ensures compatibility between implementations of the PowerPC family of microprocessors. The PowerPC architecture defines parameters that guarantee compatible processor implementations at the application-program level, allowing broad flexibility in the development of derivative PowerPC implementations that meet specific market requirements.
This chapter provides an overview of the PowerPC architecture and an introduction to the features of the PPC405 core.

PowerPC Architecture Overview

The PowerPC architecture is a 64-bit architecture with a 32-bit subset. The material in this document only covers aspects of the 32-bit architecture implemented by the PPC405.
In general, the PowerPC architecture defines the following:
Instruction set
Programming model
Memory model
Exception model
Memory-management model
Time-keeping model
Instruction Set
The instruction set specifies the types of instructions (such as load/store, integer arithmetic, and branch instructions), the specific instructions, and the encoding used for the instructions. The instruction set definition also specifies the addressing modes used for accessing memory.
Programming Model
The programming model defines the register set and the memory conventions, including details regarding the bit and byte ordering, and the conventions for how data are stored.
Memory Model
The memory model defines the address-space size and how it is subdivided into pages. It also defines attributes for specifying memory-region cacheability, byte ordering (big-endian or little-endian), coherency, and protection.
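The two byte orderings named above can be illustrated with a small sketch. It models memory as a Python `bytearray` and stores the same 32-bit word in both orders. The helper names `store_word` and `load_word` and the sample value `0x12345678` are assumptions made for illustration and do not describe any PPC405 interface.

```python
def store_word(memory, address, value, byteorder):
    """Store a 32-bit value at `address` using 'big' or 'little' byte ordering."""
    memory[address:address + 4] = value.to_bytes(4, byteorder)


def load_word(memory, address, byteorder):
    """Load a 32-bit value from `address` using 'big' or 'little' byte ordering."""
    return int.from_bytes(memory[address:address + 4], byteorder)


memory = bytearray(8)
store_word(memory, 0, 0x12345678, "big")     # most-significant byte at the lowest address
store_word(memory, 4, 0x12345678, "little")  # least-significant byte at the lowest address

print(memory[0:4].hex())  # big-endian layout
print(memory[4:8].hex())  # little-endian layout

# Reading back with the wrong ordering reverses the bytes of the word.
print(hex(load_word(memory, 0, "little")))
```

In this model the address of the word is the address of its most-significant byte under big-endian ordering and of its least-significant byte under little-endian ordering, which matches the definitions in the Terms section.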
Exception Model
The exception model defines the set of exceptions and the conditions that can cause those exceptions. The model specifies exception characteristics, such as whether they are precise or imprecise, synchronous or asynchronous, and maskable or non-maskable. The model defines the exception vectors and a set of registers used when interrupts occur as a result of an exception. The model also provides memory space for implementation-specific exceptions.
Memory-Management Model
The memory-management model defines how memory is partitioned, configured, and protected. The model also specifies how memory translation is performed, defines special memory-control instructions, and specifies other memory-management characteristics.
Time-Keeping Model
The time-keeping model defines resources that permit the time of day to be determined and the resources and mechanisms required for supporting timer-related exceptions.

PowerPC Architecture Levels

The above aspects of the PowerPC architecture are defined at three levels. This layering provides flexibility by allowing degrees of software compatibility across a wide range of implementations. For example, an implementation such as an embedded controller can support the user instruction set, but not the memory management, exception, and cache models where it might be impractical to do so.
The three levels of the PowerPC architecture are defined in Table 1-1.
Table 1-1: Three Levels of PowerPC Architecture
User Instruction-Set Architecture (UISA)
  Defines the architecture level to which user-level (sometimes referred to as problem state) software should conform
  Defines the base user-level instruction set, user-level registers, data types, floating-point memory conventions, exception model as seen by user programs, memory model, and the programming model
  Note: All PowerPC implementations adhere to the UISA.
Virtual Environment Architecture (VEA)
  Defines additional user-level functionality that falls outside typical user-level software requirements
  Describes the memory model for an environment in which multiple devices can access memory
  Defines aspects of the cache model and cache-control instructions
  Defines the time-base resources from a user-level perspective
  Note: Implementations that conform to the VEA level are guaranteed to conform to the UISA level.
Operating Environment Architecture (OEA)
  Defines supervisor-level resources typically required by an operating system
  Defines the memory-management model, supervisor-level registers, synchronization requirements, and the exception model
  Defines the time-base resources from a supervisor-level perspective
  Note: Implementations that conform to the OEA level are guaranteed to conform to the UISA and VEA levels.
The PowerPC architecture requires that all PowerPC implementations adhere to the UISA, offering compatibility among all PowerPC application programs. However, different versions of the VEA and OEA are permitted.
Embedded applications written for the PPC405 are compatible with other PowerPC implementations. Privileged software generally is not compatible. The migration of privileged software from the PowerPC architecture to the PPC405 is in many cases straightforward because of the simplifications made by the PowerPC embedded-environment architecture. Software developers who are concerned with cross-compatibility of privileged software between the PPC405 and other PowerPC implementations should refer to Appendix E, PowerPC® 6xx/7xx Compatibility.
Latitude Within the PowerPC Architecture Levels
Although the PowerPC architecture defines parameters necessary to ensure compatibility among PowerPC processors, it also allows a wide range of options for individual implementations. These are:
Some resources are optional, such as certain registers, bits within registers, instructions, and exceptions.
Implementations can define additional privileged special-purpose registers (SPRs), exceptions, and instructions to meet special system requirements, such as power management in processors designed for very low-power operation.
Implementations can define many operating parameters. For example, the PowerPC architecture can define the possible condition causing an alignment exception. A particular implementation can choose to solve the alignment problem without causing an exception.
Processors can implement any architectural resource or instruction with assistance from software (that is, they can trap and emulate) as long as the results (aside from performance) are identical to those specified by the architecture. In this case, a complete implementation requires both hardware and software.
Some parameters are defined at one level of the architecture and defined more specifically at another. For example, the UISA defines conditions that can cause an alignment exception and the OEA specifies the exception itself.
Features Not Defined by the PowerPC Architecture
Because flexibility is an important feature of the PowerPC architecture, many aspects of processor design (typically relating to the hardware implementation) are not defined, including the following:
System-Bus Interface
Although many implementations can share similar interfaces, the PowerPC architecture does not define individual signals or the bus protocol. For example, the OEA allows each implementation to specify the signal or signals that trigger a machine-check exception.
Cache Design
The PowerPC architecture does not define the size, structure, replacement algorithm, or mechanism used for maintaining cache coherency. The PowerPC architecture supports, but does not require, the use of separate instruction and data caches.
Execution Units
The PowerPC architecture is a RISC architecture, and as such has been designed to facilitate the design of processors that use pipelining and parallel execution units to maximize instruction throughput. However, the PowerPC architecture does not define the internal hardware details of an implementation. For example, one processor might implement two units dedicated to executing integer-arithmetic instructions and another might implement a single unit for executing all integer instructions.
Other Internal Microarchitecture Issues
The PowerPC architecture does not specify the execution unit responsible for executing a particular instruction. The architecture does not define details regarding the instruction-fetch mechanism, how instructions are decoded and dispatched, and how results are written to registers. Dispatch and write-back can occur in-order or out-of-order. Although the architecture specifies certain registers, such as the GPRs and FPRs, implementations can use register renaming or other schemes to reduce the impact of data dependencies and register contention.
Implementation-Specific Registers
Each implementation can have its own unique set of implementation registers that are not defined by the architecture.

PowerPC Embedded-Environment Architecture

The PowerPC embedded-environment architecture is optimized for embedded controllers. This architecture is a forerunner to the PowerPC Book-E architecture. The PowerPC embedded-environment architecture provides an alternative definition for certain features specified by the PowerPC VEA and OEA. Implementations that adhere to the PowerPC embedded-environment architecture also adhere to the PowerPC UISA. PowerPC embedded-environment processors are 32-bit only implementations and thus do not include the special 64-bit extensions to the PowerPC UISA. Also, floating-point support can be provided either in hardware or software by PowerPC embedded-environment processors.
Figure 1-1 shows the relationship between the PowerPC embedded-environment architecture, the PowerPC architecture, and the PowerPC Book-E architecture.
Figure 1-1: Relationship of PowerPC Architectures
[Diagram not reproduced. Its labels are listed below.]
UISA
PowerPC Architecture:
- 32-Bit/64-Bit Modes
- OEA: Hashed Paging; Segments, BATs
PowerPC Embedded-Environment Architecture:
- 32-Bit Only
- VEA Enhancements: True Little-Endian Support; Enhanced Cache Management
- OEA Enhancements: Simplified Memory Management; Software-Managed TLB; Variable Page Sizes; Interrupt Extensions; Critical/Non-Critical; Virtual-Memory Relocatable; Timer Extensions; Debug Extensions
PowerPC Book-E Architecture:
- 64-Bit UISA Extensions
- Synchronization Using Memory Barriers
The PowerPC embedded-environment architecture features:
Memory management optimized for embedded software environments.
Cache-management instructions for optimizing performance and memory control in complex applications that are graphically and numerically intensive.
Storage attributes for controlling memory-system behavior.
Special-purpose registers for controlling the use of debug resources, timer resources, interrupts, real-mode storage attributes, memory-management facilities, and other architected processor resources.
A device-control-register address space for managing on-chip peripherals such as memory controllers.
A dual-level interrupt structure and interrupt-control instructions.
Multiple timer resources.
Debug resources that enable hardware-debug and software-debug functions such as instruction breakpoints, data breakpoints, and program single-stepping.
Virtual Environment
The virtual environment defines architectural features that enable application programs to create or modify code, to manage storage coherency, and to optimize memory-access performance. It defines the cache and memory models, the timekeeping resources from a user perspective, and resources that are accessible in user mode but are primarily used by system-library routines. The following summarizes the virtual-environment features of the PowerPC embedded-environment architecture:
Storage model:
- Storage-control instructions as defined in the PowerPC virtual-environment architecture. These instructions are used to manage instruction caches and data caches, and for synchronizing and ordering instruction execution.
- Storage attributes for controlling memory-system behavior. These are: write-through, cacheability, memory coherence (optional), guarded, and endian.
- Operand-placement requirements and their effect on performance.
The time-base function as defined by the PowerPC virtual-environment architecture, for user-mode read access to the 64-bit time base.
Operating Environment
The operating environment describes features of the architecture that enable operating systems to allocate and manage storage, to handle errors encountered by application programs, to support I/O devices, and to provide operating-system services. It specifies the resources and mechanisms that require privileged access, including the memory-protection and address-translation mechanisms, the exception-handling model, and privileged timer resources. Table 1-2 summarizes the operating-environment features of the PowerPC embedded-environment architecture.
Table 1-2: Operating-Environment Features of the PowerPC Embedded-Environment Architecture
Register model:
- Privileged special-purpose registers (SPRs) and instructions for accessing those registers
- Device control registers (DCRs) and instructions for accessing those registers
Storage model:
- Privileged cache-management instructions
- Storage-attribute controls
- Address translation and memory protection
- Privileged TLB-management instructions
Exception model:
- Dual-level interrupt structure supporting various exception types
- Specification of interrupt priorities and masking
- Privileged SPRs for controlling and handling exceptions
- Interrupt-control instructions
- Specification of how partially executed instructions are handled when an interrupt occurs
Debug model:
- Privileged SPRs for controlling debug modes and debug events
- Specification for seven types of debug events
- Specification for allowing a debug event to cause a reset
- The ability of the debug mechanism to freeze the timer resources
Time-keeping model:
- 64-bit time base
- 32-bit decrementer (the programmable-interval timer)
- Three timer-event interrupts: programmable-interval timer (PIT), fixed-interval timer (FIT), and watchdog timer (WDT)
- Privileged SPRs for controlling the timer resources
- The ability to freeze the timer resources using the debug mechanism
Synchronization requirements:
- Requirements for special registers and the TLB
- Requirements for instruction fetch and for data access
- Specifications for context synchronization and execution synchronization
Reset and initialization requirements:
- Specification for two internal mechanisms that can cause a reset: the debug-control register (DBCR) and the timer-control register (TCR)
- Contents of processor resources after a reset
- The software-initialization requirements, including an initialization code example

PowerPC Book-E Architecture

The PowerPC Book-E architecture extends the capabilities introduced in the PowerPC embedded-environment architecture. Although the PPC405 is not a PowerPC Book-E implementation, many of the features available in the 32-bit subset of the PowerPC Book-E architecture are available in the PPC405. The PowerPC Book-E architecture and the PowerPC embedded-environment architecture differ in the following general ways:
64-bit addressing and 64-bit operands are available. Unlike 64-bit mode in the PowerPC UISA, 64-bit support in PowerPC Book-E architecture is non-modal and instead defines new 64-bit instructions and flags.
Real mode is eliminated, and the memory-management unit is active at all times. The elimination of real mode results in the elimination of real-mode storage-attribute registers.
Memory synchronization requirements are changed in the architecture and a memory-barrier instruction is introduced.
A small number of new instructions are added to the architecture and several instructions are removed.
Several SPR addresses and names are changed in the architecture, as are the assignment and meanings of some bits within certain SPRs.
Embedded applications written for the PPC405 are compatible with PowerPC Book-E implementations. Privileged software is, in general, not compatible, but the differences are relatively minor. Software developers who are concerned with cross-compatibility of privileged software between the PPC405 and PowerPC Book-E implementations should refer to Appendix F, PowerPC® Book-E Compatibility.
PPC405 Features
The PPC405 processor core is an implementation of the PowerPC embedded-environment architecture. The processor provides fixed-point embedded applications with high performance at low power consumption. It is compatible with the PowerPC UISA. Much of the PPC405 VEA and OEA support is also available in implementations of the PowerPC Book-E architecture. Key features of the PPC405 include:
A fixed-point execution unit fully compliant with the PowerPC UISA:
- 32-bit architecture, containing thirty-two 32-bit general purpose registers (GPRs).
PowerPC embedded-environment architecture extensions providing additional support for embedded-systems applications:
- True little-endian operation
- Flexible memory management
- Multiply-accumulate instructions for computationally intensive applications
- Enhanced debug capabilities
- 64-bit time base
- 3 timers: programmable interval timer (PIT), fixed interval timer (FIT), and watchdog timer (all are synchronous with the time base)
Performance-enhancing features, including:
- Static branch prediction
- Five-stage pipeline with single-cycle execution of most instructions, including loads and stores
- Multiply-accumulate instructions
- Hardware multiply/divide for faster integer arithmetic (4-cycle multiply, 35-cycle divide)
- Enhanced string and multiple-word handling
- Support for unaligned loads and unaligned stores to cache arrays, main memory, and on-chip memory (OCM)
- Minimized interrupt latency
Integrated instruction-cache:
- 16 KB, 2-way set associative
- Eight words (32 bytes) per cacheline
- Fetch line buffer
- Instruction-fetch hits are supplied from the fetch line buffer
- Programmable prefetch of next-sequential line into the fetch line buffer
- Programmable prefetch of non-cacheable instructions: full line (eight words) or half line (four words)
- Non-blocking during fetch line fills
Integrated data-cache:
- 16 KB, 2-way set associative
- Eight words (32 bytes) per cacheline
- Read and write line buffers
- Load and store hits are supplied from/to the line buffers
- Write-back and write-through support
- Programmable load and store cacheline allocation
- Operand forwarding during cacheline fills
- Non-blocking during cacheline fills and flushes
Support for on-chip memory (OCM) that can provide memory-access performance identical to a cache hit
Flexible memory management:
- Translation of the 4 GB logical-address space into the physical-address space
- Independent control over instruction translation and protection, and data translation and protection
- Page-level access control using the translation mechanism
- Software control over the page-replacement strategy
- Write-through, cacheability, user-defined 0, guarded, and endian (WIU0GE) storage-attribute control for each virtual-memory region
- WIU0GE storage-attribute control for thirty-two 128 MB regions in real mode
- Additional protection control using zones
Enhanced debug support with logical operators:
- Four instruction-address compares
- Two data-address compares
- Two data-value compares
- JTAG instruction for writing into the instruction cache
- Forward and backward instruction tracing
Advanced power management support

Privilege Modes

Software running on the PPC405 can do so in one of two privilege modes: privileged and user. The privilege modes supported by the PPC405 are described in Processor Operating Modes, page 343.
Privileged Mode
Privileged mode allows programs to access all registers and execute all instructions supported by the processor. Normally, the operating system and low-level device drivers operate in this mode.
User Mode
User mode restricts access to some registers and instructions. Normally, application programs operate in this mode.

Address Translation Modes

The PPC405 also supports two modes of address translation: real and virtual. Refer to Chapter 6, Virtual-Memory Management, for more information on address translation.
Real Mode
In real mode, programs address physical memory directly.
Virtual Mode
In virtual mode, programs address virtual memory and virtual-memory addresses are translated by the processor into physical-memory addresses. This allows programs to access much larger address spaces than might be implemented in the system.

Addressing Modes

Whether the PPC405 is running in real mode or virtual mode, data addressing is supported by the load and store instructions using one of the following addressing modes:
Register-indirect with immediate index: A base address is stored in a register, and a displacement from the base address is specified as an immediate value in the instruction.
Register-indirect with index: A base address is stored in a register, and a displacement from the base address is stored in a second register.
Register indirect: The data address is stored in a register.
Instructions that use the two indexed forms of addressing also allow for automatic updates to the base-address register. With these instruction forms, the new data address is calculated, used in the load or store data access, and stored in the base-address register.
The data-addressing modes are described in Operand-Address Calculation, page 378.
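The address calculations above can be sketched in C. This is an illustrative model only: the `regs_t` type and function names are hypothetical, the 16-bit signed displacement reflects the usual PowerPC immediate-index instruction form rather than anything stated in this section, and the architecture's special treatment of r0 as a base register is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the register file: thirty-two 32-bit GPRs. */
typedef struct { uint32_t gpr[32]; } regs_t;

/* Register-indirect with immediate index: base register plus a
 * sign-extended immediate displacement (assumed to be 16 bits). */
uint32_t ea_imm_index(const regs_t *r, unsigned ra, int16_t d)
{
    assert(ra < 32);
    return r->gpr[ra] + (uint32_t)(int32_t)d;
}

/* Register-indirect with index: base register plus a second register. */
uint32_t ea_reg_index(const regs_t *r, unsigned ra, unsigned rb)
{
    assert(ra < 32 && rb < 32);
    return r->gpr[ra] + r->gpr[rb];
}

/* Update form: the calculated address is used for the access and is
 * also stored back into the base-address register. */
uint32_t ea_imm_index_update(regs_t *r, unsigned ra, int16_t d)
{
    uint32_t ea = ea_imm_index(r, ra, d);
    r->gpr[ra] = ea;
    return ea;
}
```

Register-indirect addressing is the special case in which the displacement is zero.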
With sequential-instruction execution, the next-instruction address is calculated by adding four bytes to the current-instruction address. In the case of branch instructions, however, the next-instruction address is determined using one of four branch-addressing modes:
Branch to relative: The next-instruction address is at a location relative to the current-instruction address.
Branch to absolute: The next-instruction address is at an absolute location in memory.
Branch to link register: The next-instruction address is stored in the link register.
Branch to count register: The next-instruction address is stored in the count register.
The branch-addressing modes are described in Branch-Target Address Calculation, page 372.
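As a rough illustration of the four branch-addressing modes, the following C sketch computes the next-instruction address. The enum and function names are hypothetical, and forcing the low two bits of the link-register and count-register values to zero is an assumption based on word-aligned instructions, not something this section states.

```c
#include <stdint.h>

/* Hypothetical names for the four branch-addressing modes. */
typedef enum { BR_RELATIVE, BR_ABSOLUTE, BR_LINK, BR_COUNT } branch_mode_t;

/* cia: current-instruction address; disp: displacement or absolute target. */
uint32_t next_instruction_address(uint32_t cia, int taken, branch_mode_t mode,
                                  int32_t disp, uint32_t lr, uint32_t ctr)
{
    if (!taken)
        return cia + 4;                 /* sequential execution */

    switch (mode) {
    case BR_RELATIVE: return cia + (uint32_t)disp;
    case BR_ABSOLUTE: return (uint32_t)disp;
    case BR_LINK:     return lr & ~3u;  /* assumed word alignment */
    case BR_COUNT:    return ctr & ~3u; /* assumed word alignment */
    }
    return cia + 4;
}
```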

Data Types

PPC405 instructions support byte, halfword, and word operands. Multiple-word operands are supported by the load/store multiple instructions and byte strings are supported by the load/store string instructions. Integer data is either signed or unsigned, and signed data is represented using twos-complement format.
The address of a multi-byte operand is determined using the lowest memory address occupied by that operand. For example, if the four bytes in a word operand occupy addresses 4, 5, 6, and 7, the word address is 4. The PPC405 supports both big-endian (an operand's most-significant byte is at the lowest memory address) and little-endian (an operand's least-significant byte is at the lowest memory address) addressing.
See Operand Conventions, page 347, for more information on the supported data types and byte ordering.
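The two byte orderings can be illustrated with a short, portable C sketch that assembles a word operand from the four bytes starting at the operand address. The function names are hypothetical and the code models only the byte-ordering convention, not the processor's load instructions.

```c
#include <stdint.h>

/* Big-endian: the most-significant byte is at the lowest address. */
uint32_t load_word_be(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Little-endian: the least-significant byte is at the lowest address. */
uint32_t load_word_le(const uint8_t *p)
{
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}
```

For bytes 0x11, 0x22, 0x33, 0x44 at addresses 4 through 7, the word at address 4 reads as 0x11223344 in big-endian order and 0x44332211 in little-endian order.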

Register Set Summary

Figure 1-2 shows the registers contained in the PPC405. Descriptions of the registers are in the following sections.
Figure 1-2: PPC405 Registers
User Registers
- General-Purpose Registers: r0 through r31
- Condition Register: CR
- Fixed-Point Exception Register: XER
- Link Register: LR
- Count Register: CTR
- User-SPR General-Purpose Registers: USPRG0
- SPR General-Purpose Registers (read only): SPRG4, SPRG5, SPRG6, SPRG7
- Time-Base Registers (read only): TBU, TBL
Privileged Registers
- Machine-State Register: MSR
- Core-Configuration Register: CCR0
- SPR General-Purpose Registers: SPRG0, SPRG1, SPRG2, SPRG3, SPRG4, SPRG5, SPRG6, SPRG7
- Exception-Handling Registers: EVPR, ESR, DEAR, SRR0, SRR1, SRR2, SRR3
- Memory-Management Registers: PID, ZPR
- Storage-Attribute Control Registers: DCCR, DCWR, ICCR, SGR, SLER, SU0R
- Debug Registers: DBSR, DBCR0, DBCR1, DAC1, DAC2, DVC1, DVC2, IAC1, IAC2, IAC3, IAC4, ICDBR
- Timer Registers: TCR, TSR, PIT
- Processor-Version Register: PVR
- Time-Base Registers: TBU, TBL
General-Purpose Registers
The processor contains thirty-two 32-bit general-purpose registers (GPRs), identified as r0 through r31. The contents of the GPRs are read from memory using load instructions and written to memory using store instructions. Computational instructions often read operands from the GPRs and write their results in GPRs. Other instructions move data between the GPRs and other registers. GPRs can be accessed by all software. See General-Purpose Registers (GPRs), page 360, for more information.
Special-Purpose Registers
The processor contains a number of 32-bit special-purpose registers (SPRs). SPRs provide access to additional processor resources, such as the count register, the link register, debug resources, timers, interrupt registers, and others. Most SPRs are accessed only by privileged software, but a few, such as the count register and link register, are accessed by all software. See User Registers, page 359, and Privileged Registers, page 429 for more information.
Machine-State Register
The 32-bit machine-state register (MSR) contains fields that control the operating state of the processor. This register can be accessed only by privileged software. See Machine-State Register, page 431, for more information.
Condition Register
The 32-bit condition register (CR) contains eight 4-bit fields, CR0–CR7. The values in the CR fields can be used to control conditional branching. Arithmetic instructions can set CR0 and compare instructions can set any CR field. Additional instructions are provided to perform logical operations and tests on CR fields and bits within the fields. The CR can be accessed by all software. See Condition Register (CR), page 361, for more information.
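A minimal C sketch of extracting one 4-bit CR field from a 32-bit CR image follows. It assumes the PowerPC convention that CR0 occupies the most-significant four bits and that each field holds LT, GT, EQ, and SO flags in that order. The names are hypothetical and the field layout is not spelled out in this section.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed flag positions within a 4-bit CR field. */
enum { CR_LT = 0x8, CR_GT = 0x4, CR_EQ = 0x2, CR_SO = 0x1 };

/* Return field CRn (n = 0..7), assuming CR0 is the most-significant
 * four bits of the 32-bit condition register. */
unsigned cr_field(uint32_t cr, unsigned n)
{
    assert(n < 8);
    return (cr >> (28 - 4 * n)) & 0xFu;
}
```

A conditional branch on "equal" in CR0 then corresponds to testing `cr_field(cr, 0) & CR_EQ`.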
Device Control Registers
The 32-bit device control registers (DCRs, not shown) are used to configure, control, and report status for various external devices that are not part of the PPC405 processor. Although the DCRs are not part of the PPC405 implementation, they are accessed using the mtdcr and mfdcr instructions. The DCRs can be accessed only by privileged software. See the PPC405 Processor Block Manual for more information on implementing DCRs.

PPC405 Organization

As shown in Figure 1-3, the PPC405 processor contains the following elements:
A 5-stage pipeline consisting of fetch, decode, execute, write-back, and load write-back stages
A virtual-memory-management unit that supports multiple page sizes and a variety of storage-protection attributes and access-control options
Separate instruction-cache and data-cache units
Debug support, including a JTAG interface
Three programmable timers
The following sections provide an overview of each element.
Figure 1-3: PPC405 Organization
[Block diagram not reproduced. Labeled blocks: instruction-cache unit and data-cache unit (each with a cache array and cache controller); PLB master read and write interfaces; instruction and data OCM interfaces; MMU with a 4-entry instruction shadow-TLB, a 64-entry unified TLB, and an 8-entry data shadow-TLB; CPU with fetch and decode logic, a 3-element fetch queue, a 32x32 GPR file, and an execute unit containing the ALU and MAC; timers and debug logic; external-interrupt controller interface; JTAG; and instruction trace.]
Central-Processing Unit
The PPC405 central-processing unit (CPU) implements a 5-stage instruction pipeline consisting of fetch, decode, execute, write-back, and load write-back stages.
The fetch and decode logic sends a steady flow of instructions to the execute unit. All instructions are decoded before they are forwarded to the execute unit. Instructions are queued in the fetch queue if execution stalls. The fetch queue consists of three elements: two prefetch buffers and a decode buffer. If the prefetch buffers are empty, instructions flow directly to the decode buffer.
Up to two branches are processed simultaneously by the fetch and decode logic. If a branch cannot be resolved prior to execution, the fetch and decode logic predicts how that branch is resolved, causing the processor to speculatively fetch instructions from the predicted path. Branches with negative-address displacements are predicted as taken, as are branches that do not test the condition register or count register. The default prediction can be overridden by software at assembly or compile time. This capability is described further in Branch Prediction, page 370.
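The default prediction rule can be expressed as a small C sketch. The function and parameter names are hypothetical. Modeling the software override as a flag that reverses the default prediction is an assumption, and the actual mechanism is described in Branch Prediction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Static prediction for a branch that cannot be resolved before execution.
 * reverse_default models the software override (assumed to invert the
 * default prediction for conditional branches). */
bool predict_taken(int32_t displacement, bool tests_cr_or_ctr,
                   bool reverse_default)
{
    if (!tests_cr_or_ctr)
        return true;                       /* no condition tested: taken */

    bool default_taken = displacement < 0; /* backward branch: taken */
    return reverse_default ? !default_taken : default_taken;
}
```

Under this rule a loop-closing backward branch is predicted taken by default, which matches the common case of loops that iterate more than once.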
The PPC405 has a single-issue execute unit containing the general-purpose register file (GPR), arithmetic-logic unit (ALU), and the multiply-accumulate unit (MAC). The GPRs consist of thirty-two 32-bit registers that are accessed by the execute unit using three read ports and two write ports. During the decode stage, data is read out of the GPRs for use by the execute unit. During the write-back stage, results are written to the GPR. The use of five read/write ports on the GPRs allows the processor to execute load/store operations in parallel with ALU and MAC operations.
The execute unit supports all 32-bit PowerPC UISA integer instructions in hardware, and is compliant with the PowerPC embedded-environment architecture specification. Floating-point operations are not supported.
The MAC unit supports implementation-specific multiply-accumulate instructions and multiply-halfword instructions. MAC instructions operate on either signed or unsigned 16-bit operands, and they store their results in a 32-bit GPR. These instructions can produce results using either modulo arithmetic or saturating arithmetic. All MAC instructions have a single cycle throughput. See Multiply-Accumulate Instruction-Set Extensions, page 405 for more information.
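The difference between modulo and saturating accumulation can be shown with a C sketch for signed 16-bit operands and a 32-bit accumulator. The function names are hypothetical and the code models only the arithmetic, not any specific MAC instruction or its status flags.

```c
#include <stdint.h>

/* Modulo arithmetic: the 32-bit result wraps around on overflow. */
int32_t mac_modulo(int32_t acc, int16_t a, int16_t b)
{
    uint32_t product = (uint32_t)((int32_t)a * (int32_t)b);
    /* Unsigned addition wraps; converting back to int32_t is
     * implementation-defined in C but wraps on common compilers. */
    return (int32_t)((uint32_t)acc + product);
}

/* Saturating arithmetic: the result is clamped to the 32-bit range. */
int32_t mac_saturate(int32_t acc, int16_t a, int16_t b)
{
    int64_t sum = (int64_t)acc + (int64_t)a * (int64_t)b;
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}
```

Saturation is often preferred in signal-processing loops, where clipping at the maximum value is less harmful than a wrapped result that changes sign.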
Exception Handling Logic
Exceptions are divided into two classes: critical and noncritical. The PPC405 CPU services exceptions caused by error conditions, the internal timers, debug events, and the external interrupt controller (EIC) interface. Across the two classes, a total of 19 possible exceptions are supported, including the two provided by the EIC interface.
Each exception class has its own pair of save/restore registers. SRR0 and SRR1 are used for noncritical interrupts, and SRR2 and SRR3 are used for critical interrupts. The exception-return address and the machine state are written to these registers when an exception occurs, and they are automatically restored when an interrupt handler exits using the return-from-interrupt (rfi) or return-from-critical-interrupt (rfci) instruction. Use of separate save/restore registers allows the PPC405 to handle critical interrupts independently of noncritical interrupts.
See Chapter 7, Exceptions and Interrupts, for information on exception handling in the PPC405.
Memory Management Unit
The PPC405 supports 4 GB of flat (non-segmented) address space. The memory-management unit (MMU) provides address translation, protection functions, and storage-attribute control for this address space. The MMU supports demand-paged virtual memory using multiple page sizes of 1 KB, 4 KB, 16 KB, 64 KB, 256 KB, 1 MB, 4 MB and 16 MB. Multiple page sizes can improve memory efficiency and minimize the number of TLB misses. When supported by system software, the MMU provides the following functions:
Translation of the 4 GB logical-address space into a physical-address space.
Independent enabling of instruction translation and protection from that of data translation and protection.
Page-level access control using the translation mechanism.
Software control over the page-replacement strategy.
Additional protection control using zones.
Storage attributes for cache policy and speculative memory-access control.
The translation look-aside buffer (TLB) is used to control memory translation and protection. Each one of its 64 entries specifies a page translation. It is fully associative, and can simultaneously hold translations for any combination of page sizes. To prevent TLB contention between data and instruction accesses, a 4-entry instruction and an 8-entry data shadow-TLB are maintained by the processor transparently to software.
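The eight supported page sizes are 1 KB multiplied by successive powers of four. The following C sketch uses that relationship in a simplified, fully associative lookup over 64 entries. The entry layout, the 0-to-7 size encoding, and all names are assumptions for illustration. The sketch also omits process-ID matching, access control, and the shadow TLBs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_ENTRIES 64

/* Hypothetical, simplified TLB entry. */
typedef struct {
    bool     valid;
    unsigned size;  /* assumed encoding: 0 = 1 KB, 1 = 4 KB, ... 7 = 16 MB */
    uint32_t epn;   /* effective page base address, aligned to the page size */
    uint32_t rpn;   /* real page base address, aligned to the page size */
} tlb_entry_t;

/* Page size in bytes: 1 KB times 4 to the power of size. */
uint32_t page_bytes(unsigned size)
{
    assert(size < 8);
    return 1024u << (2 * size);
}

/* Search every entry (fully associative). Returns false on a TLB miss. */
bool tlb_translate(const tlb_entry_t tlb[TLB_ENTRIES], uint32_t ea,
                   uint32_t *ra)
{
    for (size_t i = 0; i < TLB_ENTRIES; i++) {
        if (!tlb[i].valid)
            continue;
        uint32_t offset_mask = page_bytes(tlb[i].size) - 1;
        if ((ea & ~offset_mask) == tlb[i].epn) {
            *ra = tlb[i].rpn | (ea & offset_mask);
            return true;
        }
    }
    return false;
}
```

Because each entry carries its own size, entries for different page sizes can coexist in the same array, as the text describes.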
Software manages the initialization and replacement of TLB entries. The PPC405 includes instructions for managing TLB entries by software running in privileged mode. This capability gives significant control to system software over the implementation of a page replacement strategy. For example, software can reduce the potential for TLB thrashing or delays associated with TLB-entry replacement by reserving a subset of TLB entries for globally accessible pages or critical pages.
Storage attributes are provided to control access of memory regions. When memory translation is enabled, storage attributes are maintained on a page basis and read from the TLB when a memory access occurs. When memory translation is disabled, storage attributes are maintained in storage-attribute control registers. A zone-protection register (ZPR) is provided to allow system software to override the TLB access controls without requiring the manipulation of individual TLB entries. For example, the ZPR can provide a simple method for denying read access to certain application programs.
Chapter 6, Virtual-Memory Management, describes these memory-management resources in detail.
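In real mode, the storage attributes apply to thirty-two 128 MB regions (32 x 128 MB = 4 GB), so the upper five address bits select the region. The C sketch below illustrates the arithmetic. The function names are hypothetical, and the mapping of region 0 to the most-significant bit of a 32-bit storage-attribute control register is an assumption based on PowerPC bit numbering.

```c
#include <stdbool.h>
#include <stdint.h>

/* 128 MB = 2^27 bytes, so the region number is the upper five address bits. */
unsigned real_mode_region(uint32_t addr)
{
    return addr >> 27;
}

/* Test the attribute bit for the region containing addr, assuming
 * region 0 maps to the most-significant bit of the control register. */
bool region_attribute(uint32_t control_reg, uint32_t addr)
{
    return (control_reg >> (31 - real_mode_region(addr))) & 1u;
}
```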
Instruction and Data Caches
The PPC405 accesses memory through the instruction-cache unit (ICU) and data-cache unit (DCU). Each cache unit includes a PLB-master interface, cache arrays, and a cache controller. Hits into the instruction cache and data cache appear to the CPU as single-cycle memory accesses. Cache misses are handled as requests over the PLB bus to another PLB device, such as an external-memory controller.
The PPC405 implements separate instruction-cache and data-cache arrays. Each is 16 KB in size, is two-way set-associative, and operates using 8-word (32 byte) cachelines. The caches are non-blocking, allowing the PPC405 to overlap instruction execution with reads over the PLB (when cache misses occur).
The cache controllers replace cachelines according to a least-recently used (LRU) replacement policy. When a cacheline fill occurs, the most-recently accessed line in the cache set is retained and the other line is replaced. The cache controller updates the LRU during a cacheline fill.
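The cache geometry implies 16 KB / (2 ways x 32 bytes) = 256 sets. The following C sketch models set selection and LRU replacement for such a two-way cache. The structure and function names are hypothetical, and updating the LRU state on hits as well as on fills is a modeling assumption, not a statement about the cache controller.

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_BYTES 32u
#define WAYS       2u
#define SETS       (16u * 1024u / (LINE_BYTES * WAYS))  /* 256 sets */

/* Hypothetical model of one cache set. */
typedef struct {
    bool     valid[WAYS];
    uint32_t tag[WAYS];
    unsigned lru;   /* index of the least-recently used way */
} cache_set_t;

unsigned set_index(uint32_t addr) { return (addr / LINE_BYTES) % SETS; }
uint32_t line_tag(uint32_t addr)  { return addr / (LINE_BYTES * SETS); }

/* Returns true on a hit. On a miss, the least-recently used way is
 * filled, so the most-recently accessed line in the set is retained. */
bool cache_access(cache_set_t cache[SETS], uint32_t addr)
{
    cache_set_t *s = &cache[set_index(addr)];
    uint32_t tag = line_tag(addr);

    for (unsigned w = 0; w < WAYS; w++) {
        if (s->valid[w] && s->tag[w] == tag) {
            s->lru = 1u - w;        /* the other way is now the LRU way */
            return true;
        }
    }

    unsigned victim = s->lru;
    s->valid[victim] = true;
    s->tag[victim] = tag;
    s->lru = 1u - victim;
    return false;
}
```

With only two ways per set, a single bit per set is enough to track which line was used least recently.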
The ICU supplies up to two instructions every cycle to the fetch and decode unit. The ICU can also forward instructions to the fetch and decode unit during a cacheline fill, minimizing execution stalls caused by instruction-cache misses. When the ICU is accessed, four instructions are read from the appropriate cacheline and placed temporarily in a line buffer. Subsequent ICU accesses check this line buffer for the requested instruction prior to accessing the cache array. This allows the ICU cache array to be accessed as little as once every four instructions, significantly reducing ICU power consumption.
The DCU can independently process load/store operations and cache-control instructions. The DCU can also dynamically reprioritize PLB requests to reduce the length of an execution stall. For example, if the DCU is busy with a low-priority request and a subsequent storage operation requested by the CPU is stalled, the DCU automatically increases the priority of the current (low-priority) request. The current request is thus finished sooner, allowing the DCU to process the stalled request sooner. The DCU can forward data to the execute unit during a cacheline fill, further minimizing execution stalls caused by data-cache misses.
Additional features allow programmers to tailor data-cache performance to a specific application. The DCU can function in write-back or write-through mode, as determined by the storage-control attributes. Loads and stores that do not allocate cachelines can also be specified. Inhibiting certain cacheline fills can reduce potential pipeline stalls and unwanted external-bus traffic.
See Chapter 5, Memory-System Management, for details on the operation and control of the PPC405 caches.
Timer Resources
The PPC405 contains a 64-bit time base and three timers. The time base is incremented synchronously using the CPU clock or an external clock source. The three timers are incremented synchronously with the time base. (See Chapter 8, Timer Resources, for more information on these features.) The three timers supported by the PPC405 are:
Programmable Interval Timer
Fixed Interval Timer
Watchdog Timer
Programmable Interval Timer
The programmable interval timer (PIT) is a 32-bit register that is decremented at the time-base increment frequency. The PIT register is loaded with a delay value. When the PIT count reaches 0, a PIT interrupt occurs. Optionally, the PIT can be programmed to automatically reload the last delay value and begin decrementing again.
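The PIT behavior can be modeled with a few lines of C. The structure and function names are hypothetical, and treating a count of zero as a stopped timer is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of the PIT. */
typedef struct {
    uint32_t count;       /* current value, decremented on each tick */
    uint32_t reload;      /* last delay value loaded */
    bool     auto_reload; /* reload and continue after reaching 0 */
} pit_t;

void pit_load(pit_t *p, uint32_t delay, bool auto_reload)
{
    p->count = delay;
    p->reload = delay;
    p->auto_reload = auto_reload;
}

/* Advance by one time-base increment. Returns true when the count
 * reaches 0, which is when a PIT interrupt would occur. */
bool pit_tick(pit_t *p)
{
    if (p->count == 0)
        return false;               /* assumed: a zero count is stopped */
    if (--p->count != 0)
        return false;
    if (p->auto_reload)
        p->count = p->reload;       /* begin decrementing again */
    return true;
}
```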
Fixed Interval Timer
The fixed interval timer (FIT) causes an interrupt when a selected bit in the time-base register changes from 0 to 1. Programmers can select one of four predefined bits in the time-base for triggering a FIT interrupt.
Watchdog Timer
The watchdog timer causes a hardware reset when a selected bit in the time-base register changes from 0 to 1. Programmers can select one of four predefined bits in the time-base for triggering a reset, and the type of reset can be defined by the programmer.
Note: The time-base register alone does not cause interrupts to occur.
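Because the FIT and watchdog timer trigger on a 0-to-1 transition of a time-base bit, the selected bit fixes the interval between events. The following C sketch computes that interval for a bit in a 32-bit register, using the big-endian bit numbering of this manual, and checks it by simulating a counter. The function names are hypothetical, and any bit number passed in is only an example: this section does not state which four bits the PPC405 allows.

```c
#include <assert.h>
#include <stdint.h>

/* Time-base increments between successive 0-to-1 transitions of one bit
 * in a 32-bit register, using big-endian bit numbering (bit 0 is the MSB).
 * Bit b has weight 2^(31-b), so it rises once every 2^(32-b) increments. */
uint64_t rising_edge_period(unsigned bit)
{
    assert(bit < 32);
    return (uint64_t)1 << (32 - bit);
}

/* Count 0-to-1 transitions of the selected bit while a 32-bit counter
 * is incremented 'ticks' times, starting from 'start'. */
unsigned count_rising_edges(uint32_t start, uint64_t ticks, unsigned bit)
{
    assert(bit < 32);
    uint32_t mask = (uint32_t)1 << (31 - bit);
    uint32_t value = start;
    unsigned edges = 0;

    for (uint64_t i = 0; i < ticks; i++) {
        uint32_t next = value + 1;            /* wraps modulo 2^32 */
        if (!(value & mask) && (next & mask))
            edges++;                          /* an event would occur here */
        value = next;
    }
    return edges;
}
```

For example, rising_edge_period(23) is 512, and a counter started at 0 and incremented 1024 times sees two rising edges of bit 23.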
Debug
The PPC405 debug resources include special debug modes that support the various types of debugging used during hardware and software development. These are:
Internal-debug mode for use by ROM monitors and software debuggers
External-debug mode for use by JTAG debuggers
Debug-wait mode, which allows the servicing of interrupts while the processor appears to be stopped
Real-time trace mode, which supports event triggering for real-time tracing
Debug events are supported that allow developers to manage the debug process. Debug modes and debug events are controlled using debug registers in the processor. The debug registers are accessed either through software running on the processor or through the JTAG port. The JTAG port can also be used for board tests.
The debug modes, events, controls, and interfaces provide a powerful combination of debug resources for hardware and software development tools. Chapter 9, Debugging, describes these resources in detail.
PPC405 Interfaces
The PPC405 provides a set of interfaces that supports the attachment of cores and user logic. The software resources used to manage the PPC405 interfaces are described in Core-Configuration Register, page 459. For information on the hardware operation, use, and electrical characteristics of these interfaces, refer to the PPC405 Processor Block Manual. The following interfaces are provided:
Processor local bus interface
Device control register interface
Clock and power management interface
JTAG port interface
On-chip interrupt controller interface
On-chip memory controller interface
Processor Local Bus
The processor local bus (PLB) interface provides a 32-bit address and three 64-bit data buses attached to the instruction-cache and data-cache units. Two of the 64-bit buses are attached to the data-cache unit, one supporting read operations and the other supporting write operations. The third 64-bit bus is attached to the instruction-cache unit to support instruction fetching.
Device Control Register
The device control register (DCR) bus interface supports the attachment of on-chip registers for device control. Software can access these registers using the mfdcr and mtdcr instructions.
Clock and Power Management
The clock and power-management interface supports several methods of clock distribution and power management.
JTAG Port
The JTAG port interface supports the attachment of external debug tools. Using the JTAG test-access port, a debug tool can single-step the processor and examine internal-processor state to facilitate software debugging. This capability complies with the IEEE 1149.1 specification for vendor-specific extensions, and is therefore compatible with standard JTAG hardware for boundary-scan system testing.
On-Chip Interrupt Controller
The on-chip interrupt controller interface is an external interrupt controller that combines asynchronous interrupt inputs from on-chip and off-chip sources and presents them to the core using a pair of interrupt signals (critical and noncritical). Asynchronous interrupt sources can include external signals, the JTAG and debug units, and any other on-chip peripherals.
On-Chip Memory Controller
An on-chip memory (OCM) interface supports the attachment of additional memory to the instruction and data caches that can be accessed at performance levels matching the cache arrays.

Chapter 2: Operational Concepts

This chapter describes the operational concepts governing the PPC405 programming model. These concepts include the execution and memory-access models, processor operating modes, memory organization and management, and instruction conventions.

Execution Model

From a software viewpoint, PowerPC® processors implement a sequential-execution model. That is, the processors appear to execute instructions in program order. Internally and invisible to software, PowerPC processors can execute instructions out-of-order and can speculatively execute instructions. The processor is responsible for maintaining an in-order execution state visible to software. The execution of an instruction sequence can be interrupted by an exception caused by one of the executing instructions or by an asynchronous event. The PPC405 does not support out-of-order instruction execution. However, the processor does support speculative instruction execution, typically by predicting the outcome of branch instructions.
As described in Ordering Memory Accesses, page 448, the PowerPC architecture specifies a weakly consistent memory model for shared-memory multiprocessor systems. The weakly consistent memory model allows system bus operations to be reordered dynamically. The goal of reordering bus operations is to reduce the effect of memory latency and improve overall performance. In single-processor systems, loads and stores can be reordered dynamically to allow efficient utilization of the processor bus. Loads can be performed speculatively to enhance the speculative-execution capabilities. This model provides an opportunity for significantly improved performance over a model that has stronger memory-consistency rules, but places the responsibility for access ordering on the programmer.
When a program requires strict instruction-execution ordering or memory-access ordering for proper execution, the programmer must insert the appropriate ordering or synchronization instructions into the program. These instructions are described in Synchronizing Instructions, page 424. The concept of synchronization is described in the Synchronization Operations section that follows.
The PPC405 supports many aspects of the weakly consistent model but not all of them. Specifically, the PPC405 does not provide hardware support for multiprocessor memory coherency and does not support speculative loads. If the order of memory accesses is important to the correct operation of a program, care must be taken in porting such a program from the PPC405 to a processor that supports multiprocessor memory coherency and speculative loads.

Synchronization Operations

Various forms of synchronizing operations can be used by programs executing on the PPC405 processor to control the behavior of instruction execution and memory accesses. Synchronizing operations fall into the following three categories:
Context synchronization
Execution synchronization
Storage synchronization
Each synchronization category is described in the following sections. Instructions provided by the PowerPC architecture for synchronization purposes are described on page 424.

Context Synchronization

The state of the execution environment (privilege level, translation mode, and memory protection) defines a program's context. An instruction or event is context synchronizing if the operation satisfies all of the following conditions:
Instruction dispatch is halted when the operation is recognized by the processor. This means the instruction-fetch mechanism stops issuing (sending) instructions to the execution units.
The operation is not initiated (for instructions, this means dispatched) until all prior instructions complete execution to a point where they report any exceptions they cause to occur. In the case of an instruction-synchronize (isync) instruction, the isync does not complete execution until all prior instructions complete execution to a point where they report any exceptions they cause to occur.
All instructions that precede the operation complete execution in the context they were initiated. This includes privilege level, translation mode, and memory protection.
All instructions following the operation complete execution in the new context established by the operation.
If the operation is an exception, or directly causes an exception to occur (for example, the sc instruction causes a system-call exception), the operation is not initiated until all higher-priority exceptions are recognized by the exception mechanism.
The system-call instruction (sc), return-from-interrupt instructions (rfi and rfci), and most exceptions are examples of context-synchronizing operations.
Context-synchronizing operations do not guarantee that subsequent memory accesses are performed using the memory context established by previous instructions. When memory-access ordering must be enforced, storage-synchronizing instructions are required.

Execution Synchronization

An instruction is execution synchronizing if it satisfies the conditions of the first two items (as described above) for context synchronization:
Instruction dispatch is halted when the operation is recognized by the processor. This means the instruction-fetch mechanism stops issuing (sending) instructions to the execution units.
The operation is not initiated until all instructions in execution complete to a point where they report any exceptions they cause to occur. In the case of a synchronize (sync) instruction, the sync does not complete execution until all prior instructions complete execution to a point where they report any exceptions they cause to occur.
The sync and move-to machine-state register (mtmsr) instructions are examples of execution-synchronizing instructions.
All context-synchronizing instructions are execution synchronizing. However, unlike a context-synchronizing operation, there is no guarantee that subsequent instructions execute in the context established by an execution-synchronizing instruction. The new context becomes effective sometime after the execution-synchronizing instruction completes and before or during a subsequent context-synchronizing operation.

Storage Synchronization

The PowerPC architecture specifies a weakly consistent memory model for shared-memory multiprocessor systems. With this model, the order that the processor performs memory accesses, the order that those accesses complete in memory, and the order that those accesses are viewed as occurring by another processor can all differ. The PowerPC architecture supports storage-synchronizing operations that provide a capability for enforcing memory-access ordering, allowing programs to share memory. Support is also provided to allow programs executing on a processor to share memory with some other mechanism that can access memory, such as an I/O device.
Device control registers (DCRs) are treated as memory-mapped registers from a synchronization standpoint. Storage-synchronization operations must be used to enforce synchronization of DCR reads and writes.
Processor Operating Modes
The PowerPC architecture defines two levels of privilege, each with an associated processor operating mode:
Privileged mode
User mode
The processor operating mode is controlled by the privilege-level field in the machine-state register (MSR[PR]). When MSR[PR] = 0, the processor operates in privileged mode. When MSR[PR] = 1, the processor operates in user mode. MSR[PR] = 0 following reset, placing the processor in privileged mode. See Machine-State Register, page 431 for more information on this register.
Attempting to execute a privileged instruction when in user mode causes a privileged-instruction program exception (see Program Interrupt (0x0700), page 511).
Throughout this book, the terms privileged and system are used interchangeably to refer to software that operates under the privileged-programming model. Likewise, the terms user and application are used to refer to software that operates under the user-programming model. Registers and instructions are defined as either privileged or user, indicating which of the two programming models they belong to. User registers and user instructions belong to both the user-programming and privileged-programming models.
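The MSR[PR] test described above can be sketched in C as a mask operation on a saved MSR value. The bit position used here (bit 17 in big-endian numbering) is an assumption that does not come from this section and should be checked against Machine-State Register, page 431. The macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of MSR[PR]: bit 17 in big-endian numbering (bit 0 = MSB). */
#define MSR_PR_BIT  17u
#define MSR_PR_MASK ((uint32_t)1 << (31u - MSR_PR_BIT))   /* 0x00004000 */

/* True when the MSR value selects user mode (PR = 1). */
bool msr_is_user_mode(uint32_t msr)
{
    return (msr & MSR_PR_MASK) != 0;
}

/* True when the MSR value selects privileged mode (PR = 0), as after reset. */
bool msr_is_privileged_mode(uint32_t msr)
{
    return !msr_is_user_mode(msr);
}
```

These helpers operate on a copy of the MSR, such as a value saved by an interrupt handler. Reading the live MSR is itself a privileged operation.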

Privileged Mode

Privileged mode allows programs to access all registers and execute all instructions supported by the processor. The privileged-programming model comprises the entire register set and instruction set supported by the PPC405. Operating systems are typically the only software that runs in privileged mode.
The registers available only in privileged mode are shown in Figure 4-1, page 430. Refer to the corresponding section describing each register for more information. The instructions available only in privileged mode are shown in Table 4-3, page 434. The operation of each instruction is described in Chapter 11, Instruction Set.
Privileged mode is sometimes referred to as supervisor state.

User Mode

User mode restricts access to some registers and instructions. The user-programming model comprises the register set and instruction set supported by the processor running in user mode, and is a subset of the privileged-programming model. Operating systems typically confine the execution of application programs to user mode, thereby protecting system resources and other software from the effects of errant applications.
The registers available in user mode are shown in Figure 3-1, page 360. Refer to the corresponding section in Chapter 3 for a description of each register. All instructions are available in user mode except as shown in Table 4-3, page 434.
User mode is sometimes referred to as problem state.

Memory Organization

PowerPC programs reference memory using an effective address computed by the processor when executing a load, store, branch, or cache-control instruction, and when fetching the next-sequential instruction. Depending on the address-relocation mode, this effective address is either used to directly access physical memory or is treated as a virtual address that is translated into physical memory.

Effective-Address Calculation

Programs reference memory using an effective address (also called a logical address). An effective address (EA) is the 32-bit unsigned sum computed by the processor when accessing memory, executing a branch instruction, or fetching the next-sequential instruction. An EA is often referred to as the next-instruction address (NIA) when it is used to fetch an instruction (sequentially or as the result of a branch). The input values and method used by the processor to calculate an EA depend on the instruction that is executed.
When accessing data in memory, effective addresses are calculated in one of the following ways:
EA = (rA|0): this is referred to as register-indirect addressing.
EA = (rA|0) + offset: this is referred to as register-indirect with immediate-index addressing.
EA = (rA|0) + (rB): this is referred to as register-indirect with index addressing.
Note: In the above, the notation (rA|0) specifies the following:
If the rA instruction field is 0, the base address is 0. If the rA instruction field is not 0, the contents of register rA are used as the base address.
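The three data-addressing forms and the (rA|0) rule can be written out as C functions. This is an illustrative sketch: the gpr_file structure and the function names are hypothetical, and the immediate offset is modeled as a sign-extended 16-bit value, which is an assumption not stated in this section.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register-file model: 32 general-purpose registers. */
struct gpr_file {
    uint32_t r[32];
};

/* (rA|0): 0 when the rA field is 0, otherwise the contents of register rA. */
uint32_t base_or_zero(const struct gpr_file *gpr, unsigned ra)
{
    assert(ra < 32);
    return ra == 0 ? 0 : gpr->r[ra];
}

/* Register-indirect: EA = (rA|0) */
uint32_t ea_indirect(const struct gpr_file *gpr, unsigned ra)
{
    return base_or_zero(gpr, ra);
}

/* Register-indirect with immediate index: EA = (rA|0) + offset.
 * The offset is modeled as a sign-extended 16-bit field (an assumption);
 * the sum is unsigned and wraps modulo 2^32. */
uint32_t ea_immediate_index(const struct gpr_file *gpr, unsigned ra,
                            int16_t offset)
{
    return base_or_zero(gpr, ra) + (uint32_t)(int32_t)offset;
}

/* Register-indirect with index: EA = (rA|0) + (rB).
 * Only rA is subject to the "|0" rule; rB always supplies its contents. */
uint32_t ea_index(const struct gpr_file *gpr, unsigned ra, unsigned rb)
{
    assert(rb < 32);
    return base_or_zero(gpr, ra) + gpr->r[rb];
}
```

Note that an rA field of 0 yields a base of 0 even when register r0 holds a nonzero value.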
When instructions execute sequentially, the next-instruction effective address is the current-instruction address (CIA) + 4. This is because all instructions are four bytes long. When branching to a new address, the next-instruction effective address is calculated in one of the following ways:
NIA = CIA + displacement: this is referred to as branch-to-relative addressing.
NIA = displacement: this is referred to as branch-to-absolute addressing.
NIA = (LR): this is referred to as branch to link-register addressing.
NIA = (CTR): this is referred to as branch to count-register addressing.
When the NIA is calculated for a branch instruction, the two low-order bits (30:31) are always cleared to 0, forcing word-alignment of the address. This is true even when the address is contained in the LR or CTR, and the register contents are not word-aligned.
All effective-address computations are performed by the processor using unsigned binary arithmetic. Carries from bit 0 are ignored and the effective address wraps from the maximum address (2^32 - 1) to address 0 when the calculation overflows.
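The next-instruction address rules above (sequential CIA + 4, the four branch forms, forced word alignment, and 32-bit wraparound) can be sketched in C as follows. The function names are hypothetical, and the relative displacement is modeled as a signed 32-bit value, which is an assumption. Unsigned 32-bit arithmetic in C wraps modulo 2^32, which matches the behavior described for carries out of bit 0.

```c
#include <stdint.h>

/* Clear bits 30:31 (the two low-order bits) to force word alignment. */
uint32_t word_align(uint32_t address)
{
    return address & ~(uint32_t)3;
}

/* Sequential execution: NIA = CIA + 4 (all instructions are four bytes). */
uint32_t nia_sequential(uint32_t cia)
{
    return cia + 4u;
}

/* Branch-to-relative: NIA = CIA + displacement (signed value assumed). */
uint32_t nia_relative(uint32_t cia, int32_t displacement)
{
    return word_align(cia + (uint32_t)displacement);
}

/* Branch-to-absolute: NIA = displacement */
uint32_t nia_absolute(uint32_t displacement)
{
    return word_align(displacement);
}

/* Branch to link register or count register: NIA = (LR) or NIA = (CTR).
 * The low-order bits are cleared even if the register is not word-aligned. */
uint32_t nia_from_register(uint32_t lr_or_ctr)
{
    return word_align(lr_or_ctr);
}
```

For example, nia_sequential(0xFFFFFFFC) wraps to address 0, and nia_from_register(0x1003) yields 0x1000.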

Physical Memory

Physical memory represents the address space of memory installed in a computer system, including memory-mapped I/O devices. Generally, the amount of physical memory actually available in a system is smaller than that supported by the processor. When address translation is supported by the operating system, as it is in virtual-memory systems, the very-large virtual-address space is translated into the smaller physical-address space using the memory-management resources supported by the processor.
The PPC405 supports up to four gigabytes of physical memory using a 32-bit physical address. A hierarchical-memory system involving external (system) memory and the caches internal to the processor is employed to support that address space. The PPC405 supports separate level-1 (L1) caches for instructions and data. The operation and control of these caches is described in Chapter 5, Memory-System Management.

Virtual Memory

Virtual memory is a relocatable address space that is generally larger than the physical-memory space installed in a computer system. Operating systems relocate (map) applications and data in virtual memory so it appears that more memory is available than actually exists. Virtual memory software moves unused instructions and data between physical memory and external storage devices (such as a hard drive) when insufficient physical memory is available. The PPC405 supports a 40-bit virtual address that allows privileged software to manage a one-terabyte virtual-memory space.
Memory Management
Memory management describes the collection of mechanisms used to translate the addresses generated by programs into physical-memory addresses. Memory management also consists of the mechanisms used to characterize memory-region behavior, also referred to as storage control. Memory management is performed by privileged-mode software and is completely transparent to user-mode programs running in virtual mode.
The PPC405 is a PowerPC embedded-environment implementation. The memory-management resources defined by the PowerPC embedded-environment architecture (and its successor, the PowerPC Book-E architecture) differ significantly from the resources defined by the PowerPC architecture. The resources defined by the PowerPC embedded-environment architecture are well-suited for the special requirements of embedded-system applications. The resources defined by the PowerPC architecture better meet the requirements of desktop and commercial-workstation systems.
Generally, the differences between the two memory-management mechanisms are as follows:
The PPC405 supports software page translation and provides special instructions for managing the page tables and the translation look-aside buffer (TLB) internal to the processor. The page-translation table format, organization, and search algorithms are software-dependent and transparent to the PPC405 processor. The PowerPC architecture, on the other hand, defines the page-translation table organization, format, and search algorithms. It does not define support for the special page table and TLB instructions but instead assumes the processor hardware is responsible for searching page tables and updating the TLB.
The PPC405 supports variable-sized pages. The PowerPC architecture defines fixed-size pages of 4 KB.
The PPC405 does not support the segment-translation mechanism defined by the PowerPC architecture.
The PPC405 does not support the block-address-translation (BAT) mechanism defined by the PowerPC architecture.
Additional storage-control attributes not defined by the PowerPC architecture are supported by the PPC405. The methods for using these attributes to characterize memory regions also differ.
At a high level, Figure 2-1 shows the differences between 32-bit memory management in the PowerPC embedded-environment architecture (and PowerPC Book-E architecture) and in the PowerPC architecture. See Chapter 6, Virtual-Memory Management for more information on the resources supported by the PPC405. Additional information on the differences with the PowerPC architecture is described in Appendix E, PowerPC® 6xx/7xx Compatibility. PowerPC Book-E architecture extends the resources first defined by the PowerPC embedded-environment architecture. A description of those extensions is in Appendix F, PowerPC® Book-E Compatibility.
[Figure 2-1 compares two address-translation paths. PowerPC embedded environment and PowerPC Book-E: the 32-bit effective address and the PID form a 40-bit virtual address, which page translation maps to a 32-bit physical address. PowerPC architecture: the 32-bit effective address goes through segment translation to a 51-bit virtual address and then page translation to a 32-bit physical address, with block address translation shown as a separate path.]
Figure 2-1: PowerPC 32-Bit Memory Management
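The embedded-environment path in Figure 2-1 can be illustrated with a short C sketch that builds a 40-bit virtual address from a PID and a 32-bit effective address. The sketch assumes that the PID supplies the upper 8 bits (40 - 32) and the effective address the lower 32 bits. That layout is inferred from the address widths and should be confirmed in Chapter 6, Virtual-Memory Management. The names are hypothetical.

```c
#include <stdint.h>

/* Size of the 40-bit virtual-address space: 2^40 bytes (one terabyte). */
#define VIRTUAL_SPACE_BYTES ((uint64_t)1 << 40)

/* Assumed layout: 8-bit PID in the upper bits, 32-bit EA in the lower bits. */
uint64_t virtual_address(uint8_t pid, uint32_t ea)
{
    return ((uint64_t)pid << 32) | ea;
}
```

Under this assumption, each PID value selects its own 4 GB window in the virtual-address space, so the same effective address maps to different virtual addresses for different PIDs.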

Addressing Modes

Programs can use 32-bit effective addresses to reference the 4 GB physical-address space using one of two addressing modes:
Real mode
Virtual mode
Real mode and virtual mode are enabled and disabled independently for instruction fetches and data accesses. The instruction-fetch address mode is controlled using the instruction-relocate (IR) field in the machine-state register (MSR). When MSR[IR] = 0, instruction fetches are performed in real mode. When MSR[IR] = 1, instruction fetches are performed in virtual mode. Similarly, the data-access address mode is controlled using the data-relocate (DR) field in the MSR. When MSR[DR] = 0, data accesses are performed in real mode. Setting MSR[DR] = 1 enables virtual mode for data accesses. See Virtual Mode, page 472 for more information on these fields.
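The independent selection of address modes can be sketched in C as two mask tests on an MSR value. The bit positions (MSR[IR] at bit 26 and MSR[DR] at bit 27, in big-endian numbering) are assumptions not given in this section, and the names are hypothetical.

```c
#include <stdint.h>

/* Assumed bit positions (big-endian numbering, bit 0 = MSB). */
#define MSR_IR_MASK ((uint32_t)1 << (31 - 26))   /* 0x00000020 */
#define MSR_DR_MASK ((uint32_t)1 << (31 - 27))   /* 0x00000010 */

enum address_mode { REAL_MODE, VIRTUAL_MODE };

/* Address mode used for instruction fetches. */
enum address_mode fetch_mode(uint32_t msr)
{
    return (msr & MSR_IR_MASK) ? VIRTUAL_MODE : REAL_MODE;
}

/* Address mode used for data accesses (loads and stores). */
enum address_mode data_mode(uint32_t msr)
{
    return (msr & MSR_DR_MASK) ? VIRTUAL_MODE : REAL_MODE;
}
```

Because the two fields are independent, an MSR value with only DR set translates data accesses while instruction fetches continue to use real mode.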

Real Mode
In real mode, an effective address is used directly as the physical address into the 4 GB address space. Here, the logical-address space is mapped directly onto the physical-address space.
Virtual Mode
In virtual mode, address translation is enabled. Effective addresses are translated into physical addresses using the memory-management unit, as shown in Figure 2-1, page 346. In this mode, pages within the logical-address space are mapped onto pages in the physical-address space. An overview of memory management is provided in the following section.
Operand Conventions
Bit positions within registers and memory operands (bytes, halfwords, and words) are numbered consecutively from left to right, starting with zero. The most-significant bit is always numbered 0. The number assigned to the least-significant bit depends on the size of the register or memory operand, as follows:
Byte: the least-significant bit is numbered 7.
Halfword: the least-significant bit is numbered 15.
Word: the least-significant bit is numbered 31.
A bit set to 1 has a numerical value associated with its position (b) relative to the least-significant bit (lsb). This value is equal to 2^(lsb-b). For example, if bit 5 is set to 1 in a byte, halfword, or word memory operand, its value is determined as follows:
Byte: the value is 2^(7-5), or 4.
Halfword: the value is 2^(15-5), or 1024.
Word: the value is 2^(31-5), or 67108864.
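The bit-weight rule above translates directly into a shift. The following C sketch uses a hypothetical helper name and reproduces the three examples for bit 5. The same expression is useful for building masks from the big-endian bit numbers used in register descriptions.

```c
#include <assert.h>
#include <stdint.h>

/* Value of bit b when set to 1, where lsb is the number of the
 * least-significant bit of the operand (7, 15, or 31): 2^(lsb - b). */
uint32_t bit_value(unsigned lsb, unsigned b)
{
    assert(lsb <= 31 && b <= lsb);
    return (uint32_t)1 << (lsb - b);
}
```

Here bit_value(7, 5) is 4, bit_value(15, 5) is 1024, and bit_value(31, 5) is 67108864. Bit 0 of a word, the most-significant bit, has the value 0x80000000.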
Bytes in memory are addressed consecutively starting with zero. The PPC405 supports both big-endian and little-endian byte ordering, with big-endian being the default byte ordering. Bit ordering within bytes and registers is always big endian.
The operand length is implicit for each instruction. Memory operands can be bytes (eight bits), halfwords (two bytes), words (four bytes), or strings (one to 128 bytes). For the load/store multiple instructions, memory operands are a sequence of words. The address of any memory operand is the address of its first byte (that is, of its lowest-numbered byte).
Figure 2-2 shows how word, halfword, and byte operands appear in memory (using big-endian ordering) and in a register. The memory operand appears on the left in this diagram and the equivalent register representation appears on the right.
The following sections describe the concepts of byte ordering and data alignment, and their significance to the PowerPC PPC405.
[Figure 2-2: for word, halfword, and byte operands, the figure pairs the memory content (Byte 0, the MSB, at the lowest memory address, 0x00) with the register content in a 32-bit register (bit numbers 0 to 31). Bit weights run from 2^31 to 2^0 for a word, 2^15 to 2^0 for a halfword, and 2^7 to 2^0 for a byte.]
Figure 2-2: Operand Data Types

Byte Ordering

The order that addresses are assigned to individual bytes within a scalar (a single data object or instruction) is referred to as endianness. Halfwords, words, and doublewords all consist of more than one byte, so it is important to understand the relationship between the bytes in a scalar and the addresses of those bytes. For example, when the processor loads a register with a value from memory, it needs to know which byte in memory holds the high-order byte, which byte holds the next-highest-order byte, and so on.
Computer systems generally use one of the following two byte orders to address data:
Big-endian ordering assigns the lowest-byte address to the highest-order ("left-most") byte in the scalar. The next sequential-byte address is assigned to the next-highest byte, and so on. The term big endian is used because the big end of the scalar (when considered as a binary number) comes first in memory.
Little-endian ordering assigns the lowest-byte address to the lowest-order ("right-most") byte in the scalar. The next sequential-byte address is assigned to the next-lowest byte, and so on. The term little endian is used because the little end of the scalar (when considered as a binary number) comes first in memory.
The following sections further describe the differences between big-endian and little-endian byte ordering. The default byte ordering assumed by the PPC405 is big-endian. However, the PPC405 also fully supports little-endian peripherals and memory.

Structure-Mapping Examples

The following C language structure, s, contains an assortment of scalars and a character string. The comments show the values assumed in each structure element. These values show how the bytes comprising each structure element are mapped into memory.
struct {
    int a;        /* 0x1112_1314 word */
    long long b;  /* 0x2122_2324_2526_2728 doubleword */
    char *c;      /* 0x3132_3334 word */
    char d[7];    /* 'A','B','C','D','E','F','G' array of bytes */
    short e;      /* 0x5152 halfword */
    int f;        /* 0x6162_6364 word */
} s;
C structure-mapping rules permit the use of padding (skipped bytes) to align scalars on desirable boundaries. The structure-mapping examples show how each scalar aligns on its natural boundary (the alignment boundary is equal to the scalar size). This alignment introduces padding of four bytes between a and b, one byte between d and e, and two bytes between e and f. The same amount of padding is present in both big-endian and little-endian mappings.
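The padding described above can be checked on a host compiler with offsetof. The following is a sketch under stated assumptions, not a statement about any particular PPC405 toolchain. The s_layout structure is a stand-in that uses fixed-width types, with uint32_t modeling the 32-bit pointer, so that the host's own type sizes do not change the layout. The _Alignas(8) specifier (C11) forces natural alignment of the doubleword on hosts that would otherwise align it to four bytes. PAD_AFTER is a hypothetical helper macro.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for structure s, using fixed-width types. */
struct s_layout {
    int32_t             a;     /* word */
    _Alignas(8) int64_t b;     /* doubleword, forced to an 8-byte boundary */
    uint32_t            c;     /* models a 32-bit pointer */
    char                d[7];  /* array of bytes */
    int16_t             e;     /* halfword */
    int32_t             f;     /* word */
};

/* Number of padding bytes between member 'prev' and member 'next'. */
#define PAD_AFTER(type, prev, next) \
    (offsetof(type, next) - \
     (offsetof(type, prev) + sizeof(((type *)0)->prev)))
```

With natural alignment, PAD_AFTER(struct s_layout, a, b) is 4, PAD_AFTER(struct s_layout, d, e) is 1, and PAD_AFTER(struct s_layout, e, f) is 2. The member offsets are 0x00, 0x08, 0x10, 0x14, 0x1C, and 0x20, matching the addresses in the mappings that follow.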
Big-Endian Mapping
The big-endian mapping of structure s follows. The contents of each byte, as defined in structure s, are shown as a (hexadecimal) number or character (for the string elements). Data addresses (in hexadecimal) are shown below the corresponding data value. Bytes not occupied by a structure element (padding) are shown as --.
11   12   13   14   --   --   --   --
0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07
21   22   23   24   25   26   27   28
0x08 0x09 0x0A 0x0B 0x0C 0x0D 0x0E 0x0F
31   32   33   34   'A'  'B'  'C'  'D'
0x10 0x11 0x12 0x13 0x14 0x15 0x16 0x17
'E'  'F'  'G'  --   51   52   --   --
0x18 0x19 0x1A 0x1B 0x1C 0x1D 0x1E 0x1F
61   62   63   64   --   --   --   --
0x20 0x21 0x22 0x23 0x24 0x25 0x26 0x27
Little-Endian Mapping
The little-endian mapping of structure s follows. Bytes not occupied by a structure element (padding) are shown as --.
14   13   12   11   --   --   --   --
0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07
28   27   26   25   24   23   22   21
0x08 0x09 0x0A 0x0B 0x0C 0x0D 0x0E 0x0F
34   33   32   31   'A'  'B'  'C'  'D'
0x10 0x11 0x12 0x13 0x14 0x15 0x16 0x17
'E'  'F'  'G'  --   52   51   --   --
0x18 0x19 0x1A 0x1B 0x1C 0x1D 0x1E 0x1F
64   63   62   61   --   --   --   --
0x20 0x21 0x22 0x23 0x24 0x25 0x26 0x27
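The two mappings differ only in the order in which each scalar's bytes are laid out. The following C sketch, with hypothetical function names, stores a value byte by byte in each order and is independent of the host's own endianness. Note that the character array d is identical in both mappings, because each array element is a one-byte scalar.

```c
#include <stdint.h>

/* Big-endian: most-significant byte at the lowest address. */
void store_be32(uint8_t *dst, uint32_t value)
{
    dst[0] = (uint8_t)(value >> 24);
    dst[1] = (uint8_t)(value >> 16);
    dst[2] = (uint8_t)(value >> 8);
    dst[3] = (uint8_t)value;
}

/* Little-endian: least-significant byte at the lowest address. */
void store_le32(uint8_t *dst, uint32_t value)
{
    dst[0] = (uint8_t)value;
    dst[1] = (uint8_t)(value >> 8);
    dst[2] = (uint8_t)(value >> 16);
    dst[3] = (uint8_t)(value >> 24);
}

/* Halfword variants of the same two orderings. */
void store_be16(uint8_t *dst, uint16_t value)
{
    dst[0] = (uint8_t)(value >> 8);
    dst[1] = (uint8_t)value;
}

void store_le16(uint8_t *dst, uint16_t value)
{
    dst[0] = (uint8_t)value;
    dst[1] = (uint8_t)(value >> 8);
}
```

Storing 0x11121314 with store_be32 produces the bytes 11 12 13 14, as at addresses 0x00 to 0x03 of the big-endian mapping. store_le32 produces 14 13 12 11, as in the little-endian mapping.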
Little-Endian Byte Ordering Support
Except as noted, this book describes the processor from the perspective of big-endian operations. However, the PPC405 processor also fully supports little-endian operations. This support is provided by the endian (E) storage attribute described in the following sections. The endian-storage attribute is defined by both the PowerPC embedded-environment architecture and PowerPC Book-E architecture.
Little-endian mode, defined by the PowerPC architecture, is not implemented by the PPC405. Little-endian mode does not support true little-endian memory accesses. This is because little-endian mode modifies memory addresses rather than reordering bytes as they are accessed. Memory-address modification restricts how the processor can access misaligned data and I/O. The PPC405 little-endian support does not have these restrictions.
Endian (E) Storage Attribute
The endian (E) storage attribute allows the PPC405 to support direct connection of little-endian peripherals and memory containing little-endian instructions and data. An E storage attribute is associated with every memory reference: instruction fetch, data load, and data store. The E attribute specifies whether the memory region being accessed should be interpreted as big endian (E = 0) or little endian (E = 1).
If virtual mode is enabled (MSR[IR] = 1 or MSR[DR] = 1), the E field in the corresponding TLB entry defines the endianness of a memory region. When virtual mode is disabled (MSR[IR] = 0 and MSR[DR] = 0), the SLER defines the endianness of a memory region. See Chapter 6, Virtual-Memory Management for more information on virtual memory, and Storage Little-Endian Register (SLER), page 455 for more information on the SLER.
When a memory region is defined as little endian, the processor accesses those bytes as if they are arranged in true little-endian order. Unlike the little-endian mode defined by the PowerPC architecture, no address modification is performed when accessing memory regions designated as little endian. Instead, the PPC405 reorders the bytes as they are transferred between the processor and memory.
On-the-fly reversal of bytes in little-endian memory regions is handled in one of two ways, depending on whether the memory access is an instruction fetch or a data access (load or store). The following sections describe byte reordering for both types of memory accesses.
Little-Endian Instruction Fetching
Instructions are word (four-byte) data types that are always aligned on word boundaries in memory. Instructions stored in a big-endian memory region are arranged with the most-significant byte (MSB) of the instruction word at the lowest byte address.
Consider the big-endian mapping of instruction p at address 0x00, where, for example, p is an add r7,r7,r4 instruction (instruction opcode bytes are shown in hexadecimal on top, with the corresponding byte address shown below):
MSB LSB
7C E7 22 14
0x00 0x01 0x02 0x03
In the little-endian mapping, instruction p is arranged with the least-significant byte (LSB) of the instruction word at the lowest byte address:
LSB MSB
14 22 E7 7C
0x00 0x01 0x02 0x03
The instruction decoder on the PPC405 assumes the instructions it receives are in big-endian order. When an instruction is fetched from memory, the instruction must be placed in the instruction queue in big-endian order so that the instruction is properly decoded. When instructions are fetched from little-endian memory regions, the four bytes of an instruction word are reversed by the processor before the instruction is decoded. This byte reversal occurs between memory and the instruction-cache unit (ICU) and is transparent to software. The ICU always stores instructions in big-endian order regardless of whether the instruction-memory region is defined as big endian or little endian. This means the bytes are already in the proper order when an instruction is transferred from the ICU to the instruction decoder.
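The byte reversal applied to an instruction fetched from a little-endian region can be modeled in software. The following C sketch is illustrative only: the function names are hypothetical, and the code models the effect of the reversal, not the hardware mechanism. It uses the add r7,r7,r4 encoding from the example above.

```c
#include <stdint.h>

/* Reverse the four bytes of a 32-bit word. */
uint32_t reverse_word_bytes(uint32_t w)
{
    return ((w & 0x000000FFu) << 24) |
           ((w & 0x0000FF00u) << 8)  |
           ((w & 0x00FF0000u) >> 8)  |
           ((w & 0xFF000000u) >> 24);
}

/*
 * Hypothetical model of an instruction fetch from a little-endian
 * region. mem[0] is the byte at the lowest address (the LSB of the
 * instruction); the return value is the word in the big-endian
 * order that the instruction decoder expects.
 */
uint32_t fetch_le_instruction(const uint8_t mem[4])
{
    return ((uint32_t)mem[3] << 24) |
           ((uint32_t)mem[2] << 16) |
           ((uint32_t)mem[1] << 8)  |
            (uint32_t)mem[0];
}
```

With the little-endian byte sequence 14 22 E7 7C at addresses 0x00 through 0x03, fetch_le_instruction yields 0x7CE72214, the same word that a big-endian region holding 7C E7 22 14 would supply.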
If the endian-storage attribute is changed, the affected memory region must be reloaded with program and data structures using the new endian ordering. If the endian ordering of
instruction memory changes, the ICU must be made coherent with the updates. This is accomplished by invalidating the ICU and updating the instruction memory with instructions using the new endian ordering. Subsequent fetches from the updated memory region are interpreted correctly before they are cached and decoded. See Instruction-Cache Control Instructions, page 456 for information on instruction-cache invalidation.
Little-Endian Data Accesses
Unlike instruction fetches, data accesses from little-endian memory regions are not byte-reversed between memory and the data-cache unit (DCU). The data-byte ordering stored in memory depends on the data size (byte, halfword, or word). The data size is not known until the data item is moved between memory and a general-purpose register. In the PPC405, byte reversal of load and store accesses is performed between the DCU and the GPRs.
When accessing data in a little-endian memory region, the processor automatically does the following regardless of data alignment:
For byte loads/stores, no reordering occurs
For halfword loads/stores, bytes are reversed within the halfword
For word loads/stores, bytes are reversed within the word
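The size-dependent reordering listed above can be illustrated with a small software model. This is a sketch under stated assumptions: load_from_le_region is a hypothetical name, the value is returned zero-extended, and the model reads one byte at a time, so it does not depend on the alignment of the data.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of a load from a memory region with E = 1.
 * mem points at the lowest-addressed byte of the operand and size
 * is 1 (byte), 2 (halfword), or 4 (word). The result is the value
 * as it would appear in a GPR, zero-extended.
 */
uint32_t load_from_le_region(const uint8_t *mem, size_t size)
{
    uint32_t value = 0;

    /* In little-endian order the highest address holds the MSB,
       so accumulate from the highest address downward. */
    for (size_t i = size; i > 0; i--) {
        value = (value << 8) | mem[i - 1];
    }
    return value;
}
```

For a byte operand the loop runs once and no reordering takes place. For halfword and word operands the bytes are reversed within the operand, matching the list above.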
The big-endian and little-endian mappings of the structure s, shown in Structure-Mapping Examples, page 349, demonstrate how the size of a data item determines its byte ordering. For example:
The word a has its four bytes reversed within the word spanning addresses 0x00–0x03
The halfword e has its two bytes reversed within the halfword spanning addresses 0x1C–0x1D
The array of bytes d (where each data item is a byte) is not reversed when the big-endian and little-endian mappings are compared (for example, the character 'A' is located at address 14 in both the big-endian and little-endian mappings)
In little-endian memory regions, data alignment is treated as it is in big-endian memory regions. Unlike little-endian mode in the PowerPC architecture, no special alignment exceptions occur when accessing data in little-endian memory regions versus big-endian regions.
Load and Store Byte-Reverse Instructions
When accessing big-endian memory regions, load/store instructions move the more-significant register bytes to and from the lower-numbered memory addresses, and the less-significant register bytes to and from the higher-numbered memory addresses. The load/store with byte-reverse instructions, as described in Load and Store with Byte-Reverse Instructions, page 385, do the opposite. The more-significant register bytes are moved to and from the higher-numbered memory addresses, and the less-significant register bytes are moved to and from the lower-numbered memory addresses.
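The difference between a normal word store and a byte-reversed word store can be sketched in C as follows. The function names are hypothetical, and the second function only models the byte placement described above for the byte-reverse instructions (for example, the PowerPC stwbrx instruction); it is not a substitute for those instructions.

```c
#include <stdint.h>

/* Normal word store: the more-significant register bytes go to
   the lower-numbered addresses. */
void store_word(uint8_t mem[4], uint32_t reg)
{
    mem[0] = (uint8_t)(reg >> 24);
    mem[1] = (uint8_t)(reg >> 16);
    mem[2] = (uint8_t)(reg >> 8);
    mem[3] = (uint8_t)reg;
}

/* Byte-reversed word store: the more-significant register bytes
   go to the higher-numbered addresses. */
void store_word_byte_reversed(uint8_t mem[4], uint32_t reg)
{
    mem[0] = (uint8_t)reg;
    mem[1] = (uint8_t)(reg >> 8);
    mem[2] = (uint8_t)(reg >> 16);
    mem[3] = (uint8_t)(reg >> 24);
}
```

Storing 0x11121314 with store_word produces the byte sequence 11 12 13 14 in ascending address order, while store_word_byte_reversed produces 14 13 12 11.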
Even though the load/store with byte-reverse instructions can be used to access little-endian memory, the E storage attribute provides two advantages over using those instructions:
The load/store with byte-reverse instructions do not solve the problem of fetching instructions from a little-endian memory region. Only the E storage attribute mechanism supports little-endian instruction fetching.
Typical compilers cannot make general use of the load/store with byte-reverse instructions, so these instructions are normally used only in device drivers written in hand-coded assembler. However, compilers can take full advantage of the E storage-attribute mechanism, allowing application programmers working in a high-level language, such as C, to compile programs and data structures using little-endian ordering.

Operand Alignment

The operand of a memory-access instruction has a natural alignment boundary equal to the operand length. In other words, the natural address of an operand is an integral multiple of the operand length. A memory operand is said to be aligned if it is aligned on its natural boundary, otherwise it is misaligned.
All instructions are words and are always aligned on word boundaries.
Table 2-1 shows the value required by the least-significant four address bits (bits 28:31) of each data type for it to be aligned in memory. A value of x in a given bit position indicates the address bit can have a value of 0 or 1.
Table 2-1: Memory Operand Alignment Requirements
Data Type     Size       Aligned Address Bits 28:31
Byte          8 Bits     xxxx
Halfword      2 Bytes    xxx0
Word          4 Bytes    xx00
Doubleword    8 Bytes    x000
The concept of alignment can be generally applied to any data in memory. For example, a 12-byte data item is said to be word aligned if its address is a multiple of four.
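The alignment rule in Table 2-1 reduces to a test of the low-order address bits. The following C sketch uses hypothetical function names and assumes the operand size is a power of two, as it is for every data type in the table.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if effective address ea is a multiple of size.
 * size must be a power of two (1, 2, 4, or 8 bytes), so the test
 * only needs to examine the low-order address bits.
 */
bool is_naturally_aligned(uint32_t ea, uint32_t size)
{
    return (ea & (size - 1u)) == 0;
}

/* Word alignment of any data item: the address is a multiple of
   four, regardless of the length of the item. */
bool is_word_aligned(uint32_t ea)
{
    return (ea & 0x3u) == 0;
}
```

For example, address 0x1002 is aligned for a halfword (low-order bits xxx0) but misaligned for a word (low-order bits must be xx00).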
Some instructions require aligned memory operands. Also, alignment can affect performance. For single-register memory access instructions, the best performance is obtained when memory operands are aligned.
Alignment and Endian Storage Control
The endian storage-control attribute (E) does not affect how the processor handles operand alignment. Data alignment is handled identically for accesses to big-endian and little-endian memory regions. No special alignment exceptions occur when accessing data in little-endian memory regions. However, alignment exceptions that apply to big-endian memory accesses also apply to little-endian memory accesses.
Performance Effects of Operand Alignment
The performance of accesses varies depending on the following parameters:
Operand size
Operand alignment
Boundary crossing:
-None
-Cache block
-Page
To obtain the best performance across the widest range of PowerPC embedded-environment implementations and PowerPC Book-E processor implementations, programmers should assume the alignment performance effects described in Table 2-2. This table applies to both big-endian and little-endian accesses. Table 2-2 also applies to PowerPC processors running in the default big-endian mode. However, those same processors suffer further performance degradation when running in PowerPC little-endian mode.
Table 2-2: Performance Effects of Operand Alignment
Operand                           Boundary Crossing
Size             Byte Alignment   None       Cache Block       Page
Byte             1                Optimal    Not Applicable    Not Applicable
Halfword         2                Optimal    Not Applicable    Not Applicable
                 1                Good       Good              Poor
Word             4                Optimal    Not Applicable    Not Applicable
                 <4               Good       Good              Poor
Multiple Word    4                Good       Good              Good (see Note)
Byte String      1                Good       Good              Poor
Note: Assumes both pages have identical storage-control attributes. Performance is poor otherwise.
Alignment Exceptions
Misalignment occurs when an address is not evenly divisible by the data-object size. The PPC405 automatically handles misalignments within word boundaries and across word boundaries, generally at a cost in performance. Some instructions cause an alignment exception if their operand is not properly aligned, as shown in Table 2-3.
Table 2-3: Instructions Causing Alignment Exceptions
Mnemonic                 Condition
dcbz                     EA is in non-cacheable or write-through memory.
dcread, lwarx, stwcx.    EA is not word aligned.
Cache-control instructions ignore the four least-significant bits of the EA. No alignment restrictions are placed on an EA when executing a cache-control instruction. However, certain storage-control attributes can cause an alignment exception to occur when a cache-control instruction is executed. If data-address translation is disabled (MSR[DR]=0) and a dcbz instruction references a non-cacheable memory region, or the memory region uses a write-through caching policy, an alignment exception occurs. The alignment exception allows the operating system to emulate the write-through caching policy. See Alignment Interrupt (0x0600), page 510 for more information.

Instruction Conventions

Instruction Forms

Opcode tables and instruction listings often contain information regarding the instruction form. This information refers to the type of format used to encode the instruction. Grouping instructions by format is useful for programmers who must deal directly with machine-level code, particularly programmers who write assemblers and disassemblers.
The formats used for the instructions of the PowerPC embedded-environment architecture are shown in Instructions Grouped by Form, page 792. The Instruction Set Information, page 797 also shows the form used by each instruction, listed alphabetically by mnemonic.

Instruction Classes

PowerPC instructions belong to one of the following three classes:
Defined
Illegal
Reserved
An instruction class is determined by examining the primary opcode, and the extended opcode if one exists. If the opcode and extended opcode combination does not specify a defined instruction or reserved instruction, the instruction is illegal. Although the definitions of these terms are consistent among PowerPC processor implementations, the assignment of these classifications is not. For example, an instruction specific to 64-bit implementations is considered defined for 64-bit implementations but illegal for 32-bit implementations.
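Extracting the primary and extended opcodes from an instruction word can be sketched as follows. The function names are hypothetical. The field positions are the standard PowerPC ones (primary opcode in instruction bits 0:5, extended opcode in bits 21:30 for X-form instructions and bits 22:30 for XO-form instructions) and are stated here as assumptions because this section does not list them.

```c
#include <stdint.h>

/* Primary opcode: instruction bits 0:5, the six most-significant
   bits of the 32-bit instruction word. */
unsigned primary_opcode(uint32_t insn)
{
    return (insn >> 26) & 0x3Fu;
}

/* Extended opcode of an X-form instruction: bits 21:30 (10 bits). */
unsigned extended_opcode_x(uint32_t insn)
{
    return (insn >> 1) & 0x3FFu;
}

/* Extended opcode of an XO-form instruction: bits 22:30 (9 bits).
   Bit 21 is the OE (overflow-enable) bit in this form. */
unsigned extended_opcode_xo(uint32_t insn)
{
    return (insn >> 1) & 0x1FFu;
}
```

For the add r7,r7,r4 encoding 0x7CE72214, the primary opcode is 31 and the XO-form extended opcode is 266.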
In future versions of the PowerPC architecture, instruction encodings that are now illegal or reserved can become defined (by being added to the architecture) or reserved (by being assigned a special purpose in an implementation).
Boundedly Undefined
The results of executing an instruction are said to be boundedly undefined if those results could be achieved by executing an arbitrary sequence of instructions, starting in the machine state prior to executing the given instruction. Boundedly-undefined results for an instruction can vary between implementations and between different executions on the same implementation.
Defined Instruction Class
Defined instructions contain all the instructions defined by the PowerPC architecture. Defined instructions are guaranteed to be supported by all implementations of the PowerPC architecture. The only exceptions are the instructions defined only for 64-bit implementations, instructions defined only for 32-bit implementations, and instructions defined only for embedded implementations. A PowerPC processor can invoke the illegal-instruction error handler (through the program-interrupt handler) when an unimplemented instruction is encountered, allowing emulation of the instruction in software.
A defined instruction can have preferred forms and invalid forms as described in the following sections.
Preferred Instruction Forms
A preferred form of a defined instruction is one in which the instruction executes in an efficient manner. Any form other than the preferred form can take significantly longer to execute. The following instructions have preferred forms:
Load-multiple and store-multiple instructions
Load-string and store-string instructions
OR-immediate instruction (preferred form of no-operation)
Invalid Instruction Forms
An invalid form of a defined instruction is one in which one or more operands are coded incorrectly and in a manner that can be deduced only by examining the instruction encoding (primary and extended opcodes). For example, coding a value of 1 in a reserved bit (normally cleared to 0) produces an invalid instruction form.
The following instructions have invalid forms:
Branch-conditional instructions
Load with update and store with update instructions
Load multiple instructions
Load string instructions
Integer compare instructions
On the PPC405, attempting to execute an invalid instruction form generally yields a boundedly-undefined result, although in some cases a program exception (illegal-instruction error) can occur.
Optional Instructions
The PowerPC architecture allows implementations to optionally support some defined instructions. The PPC405 does not implement the following instructions:
Floating-point instructions
External-control instructions (eciwx, ecowx)
Invalidate TLB entry (tlbie)
Illegal Instruction Class
Illegal instructions are grouped into the following categories:
Unused primary opcodes. The following primary opcodes are defined as illegal but can be defined by future extensions to the architecture:
1, 5, 6, 56, 57, 60, 61
Unused extended opcodes. Unused extended opcodes can be derived from information in Instructions Sorted by Opcode, page 781. The following primary opcodes have unused extended opcodes:
19, 31, 59, 63
An instruction consisting entirely of zeros is guaranteed to be an illegal instruction. This increases the probability that an attempt to execute data or uninitialized memory causes an illegal-instruction error. If only the primary opcode consists of all zeros, the instruction is considered a reserved instruction, as described in the following section.
An attempt to execute an illegal instruction causes an illegal-instruction error (program exception). With the exception of an instruction consisting entirely of zeros, illegal instructions are available for future addition to the PowerPC architecture.
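A disassembler or emulator might apply the primary-opcode rules above as a first-pass filter. The following C sketch is a coarse illustration with hypothetical names. It does not check extended opcodes, so any instruction that passes the filter still requires a full decode. The placement of the primary opcode in the six most-significant bits of the word is the standard PowerPC encoding and is an assumption here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unused primary opcodes listed in this section. */
bool is_unused_primary_opcode(unsigned opcd)
{
    switch (opcd) {
    case 1: case 5: case 6: case 56: case 57: case 60: case 61:
        return true;
    default:
        return false;
    }
}

typedef enum {
    INSN_ILLEGAL,           /* all zeros, or unused primary opcode */
    INSN_RESERVED,          /* primary opcode 0, word not all zeros */
    INSN_NEEDS_FULL_DECODE  /* extended opcode must be examined */
} coarse_class;

/* First-pass classification based on the primary opcode only. */
coarse_class coarse_classify(uint32_t insn)
{
    unsigned opcd = (insn >> 26) & 0x3Fu;

    if (insn == 0) {
        return INSN_ILLEGAL;
    }
    if (opcd == 0) {
        return INSN_RESERVED;
    }
    if (is_unused_primary_opcode(opcd)) {
        return INSN_ILLEGAL;
    }
    return INSN_NEEDS_FULL_DECODE;
}
```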
Reserved Instruction Class
Reserved instructions are allocated to specific implementation-dependent purposes not defined by the PowerPC architecture. An attempt to execute an unimplemented reserved instruction causes an illegal-instruction error (program exception). The following types of instructions are included in this class:
Instructions for the POWER architecture that have not been included in the PowerPC architecture.
Implementation-specific instructions used to conform to the PowerPC architecture specification. For example, load data-TLB entry (tlbld) and load instruction-TLB entry (tlbli) instructions in the PowerPC 603.
The instruction with primary opcode 0, when the instruction does not consist entirely of binary zeros.
Any other implementation-specific instruction not defined by the PowerPC architecture.
PowerPC Embedded-Environment Instructions
To support functions required in embedded-system applications, the PowerPC embedded-environment architecture defines instructions that are not part of the PowerPC architecture. Table 2-4 lists the instructions specific to the PPC405 and other PowerPC embedded-environment family implementations. From the standpoint of the PowerPC architecture, these instructions are part of the reserved class and are implementation dependent. Programs using these instructions are not portable to implementations that do not support the PowerPC embedded-environment architecture.
In the table, the syntax “[o]” indicates the instruction has an overflow-enabled form that updates XER[OV,SO] as well as a non-overflow-enabled form. The syntax “[.]” indicates the instruction has a record form that updates CR[CR0] as well as a non-record form. The headings defined and allocated, as they are used in Table 2-4, are described in the following section, PowerPC Book-E Instruction Classes.
Table 2-4: PowerPC Embedded-Environment Instructions
Defined (Book-E)        Allocated (Book-E)
mfdcr     tlbre         dccci     macchw[o][.]      nmacchw[o][.]
mtdcr     tlbsx[.]      dcread    macchws[o][.]     nmacchws[o][.]
rfci      tlbwe         iccci     macchwsu[o][.]    nmachhw[o][.]
wrtee                   icread    macchwu[o][.]     nmachhws[o][.]
wrteei                            machhw[o][.]      nmaclhw[o][.]
                                  machhws[o][.]     nmaclhws[o][.]
                                  machhwsu[o][.]    mulchw[.]
                                  machhwu[o][.]     mulchwu[.]
                                  maclhw[o][.]      mulhhw[.]
                                  maclhws[o][.]     mulhhwu[.]
                                  maclhwsu[o][.]    mullhw[.]
                                  maclhwu[o][.]     mullhwu[.]

PowerPC Book-E Instruction Classes

The PowerPC Book-E architecture defines four instruction classes:
Defined
Allocated
Reserved
Preserved
Referring to Table 2-4, the first two columns indicate which PPC405 instructions are part of the defined instruction class and are guaranteed support in PowerPC Book-E processor implementations. The last three columns indicate which PPC405 instructions are part of the allocated instruction class. Support of these instructions by PowerPC Book-E processors is implementation-dependent.
Defined Book-E Instruction Class
The defined instruction class consists of all instructions defined by the PowerPC Book-E architecture. In general, defined instructions are guaranteed to be supported by a PowerPC Book-E processor as specified by the architecture, either within the processor implementation itself or within emulation software supported by the operating system.
Allocated Book-E Instruction Class
The allocated instruction class contains the set of instructions used for implementation-dependent and application-specific use, outside the scope of the PowerPC Book-E architecture.
Reserved Book-E Instruction Class
The reserved instruction class consists of all instruction primary opcodes (and associated extended opcodes, if applicable) that do not belong to either the defined class or the allocated class.
Preserved Book-E Instruction Class
The preserved instruction class is provided to support backward compatibility with previous generations of this architecture.

Chapter 3

User Programming Model

This chapter describes the processor resources and instructions available to all programs running on the PPC405, whether they are running in user mode or privileged mode. These resources and instructions are referred to as the user-programming model, which is a subset of the privileged-programming model. Applications are typically restricted to running in user mode. System software runs in privileged mode, has access to all registers and processor resources, and can execute all instructions supported by the PPC405. System software typically creates a context (execution environment) that protects itself and other applications from the effects of an errant application program.
The remaining chapters in this book generally describe aspects of the privileged-programming model and are not relevant to application programmers. There are two exceptions:
Chapter 5, Memory-System Management, describes cache management features available to both system and application programs.
Chapter 8, Timer Resources, describes the time base, which can be read by application programs.

User Registers

Figure 3-1 shows the user registers supported by the PPC405, all of which are available to software running in user mode and privileged mode. In the PPC405, all user registers are 32 bits wide, except for the time base as described in Time Base, page 524. Floating-point registers are not supported by the PPC405.
General-Purpose Registers: r0, r1, ..., r31
Condition Register: CR
Fixed-Point Exception Register: XER (SPR 0x001)
Link Register: LR (SPR 0x008)
Count Register: CTR (SPR 0x009)
User-SPR General-Purpose Register: USPRG0 (SPR 0x100)
SPR General-Purpose Registers (read only): SPRG4 (SPR 0x104), SPRG5 (SPR 0x105), SPRG6 (SPR 0x106), SPRG7 (SPR 0x107)
Time-Base Registers (read only): TBU (TBR 0x10D), TBL (TBR 0x10C)
Figure 3-1: PPC405 User Registers

Special-Purpose Registers (SPRs)

Most registers in the PPC405 are special-purpose registers, or SPRs. SPRs control the operation of debug facilities, timers, interrupts, storage control attributes, and other processor resources. All SPRs can be accessed explicitly using the move to special-purpose register (mtspr) and move from special-purpose register (mfspr) instructions. See Special-Purpose Register Instructions, page 424 for more information on these instructions. A few registers are accessed as a by-product of executing certain instructions. For example, some branch instructions access and update the link register.
The PPC405 SPRs in the user-programming model are shown in Figure 3-1. The SPR number (SPRN) for each SPR is shown above the corresponding register. See Appendix A, Special-Purpose Registers, page 770 for a complete list of all SPRs (user and privileged) supported by the PPC405.
Simplified instruction mnemonics are available for the mtspr and mfspr instructions for some SPRs. See Special-Purpose Registers, page 830 for more information.

General-Purpose Registers (GPRs)

The PPC405 contains thirty-two 32-bit general-purpose registers (GPRs), numbered r0 through r31, as shown in Figure 3-2. Data from memory are read into GPRs using load instructions and the contents of GPRs are written to memory using store instructions. Most integer instructions use the GPRs for source and destination operands.
0 31
Figure 3-2: General Purpose Registers (R0-R31)

Condition Register (CR)

The condition register (CR) is a 32-bit register that reflects the result of certain instructions and provides a mechanism for testing and conditional branching. The bits in the CR are grouped into eight 4-bit fields, CR0–CR7, as shown in Figure 3-3. The bits within an arbitrary CRn field are shown in Figure 3-4. In this figure, the bit positions shown are relative positions within the field rather than absolute positions within the CR register.
Bits:   0:3   4:7   8:11   12:15   16:19   20:23   24:27   28:31
Field:  CR0   CR1   CR2    CR3     CR4     CR5     CR6     CR7
Figure 3-3: Condition Register (CR)
Bit:    0    1    2    3
Name:   LT   GT   EQ   SO
Figure 3-4: CRn Field
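Because PowerPC bit numbering starts at the most-significant bit, CR0 occupies the four most-significant bits of the 32-bit CR value. The following C sketch shows one way to extract a CRn field and test its bits. The names are hypothetical and the masks are expressed relative to the extracted 4-bit field.

```c
#include <stdint.h>

/* Bit masks within a 4-bit CRn field. Field bit 0 (LT) is the
   most-significant bit of the field. */
enum {
    CR_LT = 0x8,  /* field bit 0 */
    CR_GT = 0x4,  /* field bit 1 */
    CR_EQ = 0x2,  /* field bit 2 */
    CR_SO = 0x1   /* field bit 3 */
};

/* Return the 4-bit field CRn (n = 0..7) from a 32-bit CR value.
   CR0 is CR bits 0:3, the four most-significant bits. */
unsigned cr_field(uint32_t cr, unsigned n)
{
    return (cr >> (28u - 4u * n)) & 0xFu;
}
```

For example, a CR value of 0x80000002 has LT set in CR0 and EQ set in CR7.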
In the PPC405, the CR fields are modified in the following ways:
The mtcrf instruction can update specific fields in the CR from a GPR.
The mcrxr instruction can update a CR field with the contents of XER[0:3].
The mcrf instruction can copy one CR field into another CR field.
The condition-register logical instructions can update specific bits in the CR.
The integer-arithmetic instructions can update CR0 to reflect their result.
The integer-compare instructions can update a specific CR field to reflect their result.
Conditional-branch instructions can test bits in the CR and use the results of such a test as the branch condition.
CR0 Field
The CR0 field is updated to reflect the result of an integer instruction if the Rc opcode field (record bit) is set to 1. The addic., andi., and andis. instructions also update CR0 to reflect the result they produce. For all of these instructions, CR0 is updated as follows:
The instruction result is interpreted as a signed integer and algebraically compared to 0. The first three bits of CR0 (CR0[0:2]) are updated to reflect the result of the algebraic comparison.
The fourth bit of CR0 (CR0[3]) is copied from XER[SO].
The CR0 bits are interpreted as described in Table 3-1. If any portion of the result is undefined, the value written into CR0[0:2] is undefined.
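The CR0 update rule can be expressed as a short software model. The function name is hypothetical, and the returned value is the 4-bit CR0 field with LT as its most-significant bit.

```c
#include <stdint.h>

/*
 * Hypothetical model of the CR0 update performed by a record-form
 * integer instruction. result is the 32-bit instruction result
 * interpreted as a signed integer; xer_so is the final state of
 * XER[SO] (zero or nonzero).
 * Returned field layout: LT = 0x8, GT = 0x4, EQ = 0x2, SO = 0x1.
 */
unsigned cr0_from_result(int32_t result, unsigned xer_so)
{
    unsigned cr0;

    if (result < 0) {
        cr0 = 0x8;          /* LT: result is negative */
    } else if (result > 0) {
        cr0 = 0x4;          /* GT: result is positive */
    } else {
        cr0 = 0x2;          /* EQ: result is zero */
    }
    if (xer_so) {
        cr0 |= 0x1;         /* SO: copied from XER[SO] */
    }
    return cr0;
}
```

Exactly one of LT, GT, and EQ is set for any defined result, and SO is independent of the comparison.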
Table 3-1: CR0-Field Bit Settings
Bit   Name   Function            Description
0     LT     Negative            0: Result is not negative. 1: Result is negative. This bit is set when the result is negative, otherwise it is cleared.
1     GT     Positive            0: Result is not positive. 1: Result is positive. This bit is set when the result is positive (and not zero), otherwise it is cleared.
2     EQ     Zero                0: Result is not equal to zero. 1: Result is equal to zero. This bit is set when the result is zero, otherwise it is cleared.
3     SO     Summary overflow    0: No overflow occurred. 1: Overflow occurred. This is a copy of the final state of XER[SO] at the completion of the instruction.
CR1 Field
In PowerPC® implementations that support floating-point operations, the CR1 field can be updated by the processor to reflect the result of those operations. Because the PPC405 does not support floating-point operations in hardware, CR1 is not updated in this manner.
CRn Fields (Compare Instructions)
Any one of the eight CRn fields (including CR0 and CR1) can be updated to reflect the result of a compare instruction. The CRn-field bits are interpreted as described in Table 3-2.
Table 3-2: CRn-Field Bit Settings
Bit   Name   Function            Description
0     LT     Less than           0: rA is not less than. 1: rA is less than. This bit is set when rA < SIMM or rB (signed comparison), or rA < UIMM or rB (unsigned comparison), otherwise it is cleared.
1     GT     Greater than        0: rA is not greater than. 1: rA is greater than. This bit is set when rA > SIMM or rB (signed comparison), or rA > UIMM or rB (unsigned comparison), otherwise it is cleared.
2     EQ     Equal to            0: rA is not equal. 1: rA is equal. This bit is set when rA = SIMM or rB (signed comparison), or rA = UIMM or rB (unsigned comparison), otherwise it is cleared.
3     SO     Summary overflow    0: No overflow occurred. 1: Overflow occurred. This is a copy of the final state of XER[SO] at the completion of the instruction.
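The distinction between signed and unsigned comparison in Table 3-2 matters when the most-significant bit of an operand is set. The following C sketch models both cases. The function names are hypothetical, and the second operand stands for either the immediate value (SIMM or UIMM, already extended to 32 bits) or the contents of rB.

```c
#include <stdint.h>

/* Returned field layout: LT = 0x8, GT = 0x4, EQ = 0x2, SO = 0x1. */

/* Model of a signed compare of rA against SIMM or rB. */
unsigned cr_compare_signed(int32_t ra, int32_t operand, unsigned xer_so)
{
    unsigned field = (ra < operand) ? 0x8u
                   : (ra > operand) ? 0x4u
                   : 0x2u;
    return field | (xer_so ? 0x1u : 0x0u);
}

/* Model of an unsigned (logical) compare of rA against UIMM or rB. */
unsigned cr_compare_unsigned(uint32_t ra, uint32_t operand, unsigned xer_so)
{
    unsigned field = (ra < operand) ? 0x8u
                   : (ra > operand) ? 0x4u
                   : 0x2u;
    return field | (xer_so ? 0x1u : 0x0u);
}
```

The register value 0xFFFFFFFF compares as less than 1 in a signed comparison (it represents -1) and as greater than 1 in an unsigned comparison.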

Fixed-Point Exception Register (XER)

The fixed-point exception register (XER) is a 32-bit register that reflects the result of arithmetic operations that have resulted in an overflow or carry. This register is also used to indicate the number of bytes to be transferred by load/store string indexed instructions.
Figure 3-5 shows the format of the XER. The bits in the XER are defined as shown in Table 3-3.
Bits:   0    1    2    3:24         25:31
Field:  SO   OV   CA   (reserved)   TBC
Figure 3-5: Fixed Point Exception Register (XER)
Table 3-3: Fixed Point Exception Register (XER) Bit Definitions
Bit     Name   Function              Description
0       SO     Summary overflow      0: No overflow occurred. 1: Overflow occurred. SO is set to 1 whenever an instruction (except mtspr) sets the overflow bit (XER[OV]). Once set, the SO bit remains set until it is cleared to 0 by an mtspr instruction (specifying the XER) or an mcrxr instruction. SO can be cleared to 0 and OV set to 1 using an mtspr instruction.
1       OV     Overflow              0: No overflow occurred. 1: Overflow occurred. OV can be modified by instructions when the overflow-enable bit in the instruction encoding is set (OE=1). Add, subtract, and negate instructions set OV=1 if the carry out from the result msb is not equal to the carry out from the result msb + 1. Otherwise, they clear OV=0. Multiply and divide set OV=1 if the result cannot be represented in 32 bits. mtspr can be used to set OV=1, and mtspr and mcrxr can be used to clear OV=0.
2       CA     Carry                 0: Carry did not occur. 1: Carry occurred. CA can be modified by add-carrying, subtract-from-carrying, add-extended, and subtract-from-extended instructions. These instructions set CA=1 when there is a carry out from the result msb. Otherwise, they clear CA=0. Shift-right algebraic instructions set CA=1 if any 1 bits are shifted out of a negative operand. Otherwise, they clear CA=0. mtspr can be used to set CA=1, and mtspr and mcrxr can be used to clear CA=0.
3:24                                 Reserved
25:31   TBC    Transfer-byte count   TBC is modified using the mtspr instruction. It specifies the number of bytes to be transferred by a load-string word indexed (lswx) or store-string word indexed (stswx) instruction.
The XER is an SPR with an address of 1 (0x001) and can be read and written using the mfspr and mtspr instructions. The mcrxr instruction can be used to move XER[0:3] into one of the eight CR fields.
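The XER bit positions and the carry and overflow rules for addition can be modeled in C as shown below. This is a sketch with hypothetical names. The add model corresponds to an add-carrying instruction with the overflow-enable bit set (OE=1), and it detects overflow with the equivalent test that both operands have the same sign and the sum has the opposite sign.

```c
#include <stdint.h>

#define XER_SO   0x80000000u  /* bit 0: summary overflow */
#define XER_OV   0x40000000u  /* bit 1: overflow */
#define XER_CA   0x20000000u  /* bit 2: carry */
#define XER_TBC  0x0000007Fu  /* bits 25:31: transfer-byte count */

/* Extract the transfer-byte count from an XER value. */
unsigned xer_tbc(uint32_t xer)
{
    return xer & XER_TBC;
}

/*
 * Hypothetical model of an add-carrying instruction with OE=1.
 * Returns the 32-bit sum and updates CA, OV, and SO in *xer.
 */
uint32_t add_carrying_oe(uint32_t a, uint32_t b, uint32_t *xer)
{
    uint32_t sum = a + b;

    /* Carry out of the msb: the unsigned sum wrapped around. */
    unsigned carry = (sum < a) ? 1u : 0u;

    /* Signed overflow: a and b have the same sign, sum differs.
       Equivalent to carry into the msb != carry out of the msb. */
    unsigned overflow = (~(a ^ b) & (a ^ sum)) >> 31;

    *xer &= ~(XER_OV | XER_CA);
    if (carry) {
        *xer |= XER_CA;
    }
    if (overflow) {
        *xer |= XER_OV | XER_SO;  /* SO is sticky once set */
    }
    return sum;
}
```

Adding 1 to 0x7FFFFFFF sets OV and SO without a carry. A following addition of 1 to 0xFFFFFFFF sets CA and clears OV, while SO remains set.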

Link Register (LR)

The link register (LR) is a 32-bit register that is used by branch instructions, generally for the purpose of subroutine linkage. Two types of branch instructions use the link register:
Branch-conditional to link-register (bclrx) instructions read the branch-target address from the LR.
Branch instructions with the link-register update-option enabled load the LR with the effective address of the instruction following the branch instruction. The link-register update-option is enabled when the branch-instruction LK opcode field (bit 31) is set to 1.
The format of LR is shown in Figure 3-6.
0 31
Branch Address
Figure 3-6: Link Register (LR)
The LR is an SPR with an address of 8 (0x008) and can be read and written using the mfspr and mtspr instructions. It is possible for the processor to prefetch instructions along the target path specified by the LR provided the LR is loaded sufficiently ahead of the branch to link-register instruction, giving branch-prediction hardware time to calculate the branch address.
The two least-significant bits (LR[30:31]) can be written with any value. However, those bits are ignored and assumed to have a value of 0 when the LR is used as a branch-target address.
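The two uses of the LR can be summarized in a short C model. The function names are hypothetical. The link value assumes four-byte instructions, which follows from the statement that all instructions are words.

```c
#include <stdint.h>

/* Branch-target address taken from the LR: the two
   least-significant bits (LR[30:31]) are treated as 0. */
uint32_t branch_target_from_lr(uint32_t lr)
{
    return lr & ~0x3u;
}

/* Value loaded into the LR by a branch with LK = 1: the effective
   address of the instruction following the branch. */
uint32_t link_value(uint32_t branch_ea)
{
    return branch_ea + 4u;
}
```

For example, an LR value of 0x00001003 results in a branch to 0x00001000.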
Some PowerPC processors implement a software-invisible link-register stack for performance reasons. Although the PPC405 processor does not implement such a stack, certain programming conventions should be followed so that software running on multiple PowerPC processors can benefit from this stack. See Link-Register Stack, page 371 for more information.

Count Register (CTR)

The count register (CTR) is a 32-bit register that can be used by branch instructions in the following two ways:
The CTR can hold a loop count that is decremented by a conditional-branch instruction with an appropriately coded BO opcode field. The value in the CTR wraps to 0xFFFF_FFFF if the value in the register is 0 prior to the decrement. See Conditional Branch Control, page 367 for information on encoding the BO opcode field.
The CTR can hold the branch-target address used by branch-conditional to count-register (bcctrx) instructions.
The format of CTR is shown in Figure 3-7.
0 31
Count
Figure 3-7: Count Register (CTR)
The CTR is an SPR with an address of 9 (0x009) and can be read and written using the mfspr and mtspr instructions. It is possible for the processor to prefetch instructions along the target path specified by the CTR provided the CTR is loaded sufficiently ahead of the branch to count-register instruction, giving branch-prediction hardware time to calculate the branch address.
The two least-significant bits (CTR[30:31]) can be written with any value. However, those bits are ignored and assumed to have a value of 0 when the CTR is used as a branch-target address.
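The wrap-around behavior of the CTR decrement follows directly from unsigned 32-bit arithmetic. The following C sketch uses hypothetical names and models one common BO encoding, in which the CTR is decremented and the branch is taken if the new CTR value is nonzero.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of a decrement-and-branch-if-nonzero step.
 * Decrements *ctr and returns true if the branch would be taken.
 * A CTR of 0 wraps to 0xFFFFFFFF, so the branch is taken.
 */
bool decrement_and_test_nonzero(uint32_t *ctr)
{
    *ctr -= 1u;
    return *ctr != 0;
}

/* Branch-target address taken from the CTR: the two
   least-significant bits (CTR[30:31]) are treated as 0. */
uint32_t branch_target_from_ctr(uint32_t ctr)
{
    return ctr & ~0x3u;
}
```

A loop that starts with a CTR value of 0 therefore does not terminate immediately. The first decrement produces 0xFFFFFFFF and the loop continues.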

User-SPR General-Purpose Register

The user-SPR general-purpose register (USPRG0) is a 32-bit register that can be used by application software for any purpose. The value stored in this register does not have an effect on the operation of the PPC405 processor.
The format of USPRG0 is shown in Figure 3-8.
[Register format: bits 0:31 = General-Purpose Application-Software Data]
Figure 3-8: User SPR General-Purpose Register (USPRG0)
The USPRG0 is an SPR with an address of 256 (0x100) and can be read and written using the mfspr and mtspr instructions.

SPR General-Purpose Registers

The SPR general-purpose registers (SPRG0–SPRG7) are 32-bit registers that can be used by system software for any purpose. Four of the registers (SPRG4–SPRG7) are available from user mode with read-only access. Application software can read the contents of SPRG4–SPRG7, but cannot modify them. The values stored in these registers do not affect the operation of the PPC405 processor.
The format of all SPRGn registers is shown in Figure 3-9.
[Register format: bits 0:31 = General-Purpose System-Software Data]
Figure 3-9: SPR General-Purpose Registers (SPRG4–SPRG7)
The SPRGn registers are SPRs with the following addresses:
SPRG4: 260 (0x104).
SPRG5: 261 (0x105).
SPRG6: 262 (0x106).
SPRG7: 263 (0x107).
These registers can be read using the mfspr instruction. In privileged mode, system software accesses these registers using different SPR numbers (see page 432).

Time-Base Registers

The time base is a 64-bit incrementing counter implemented as two 32-bit registers. The time-base upper register (TBU) holds time-base bits 0:31, and the time-base lower register (TBL) holds time-base bits 32:63. Figure 3-10 shows the format of the time base.
[Register format: TBU bits 0:31 = Time Base [0:31]; TBL bits 0:31 = Time Base [32:63]]
Figure 3-10: Time-Base Register
The TBU and TBL registers are SPRs with user-mode read access and privileged-mode write access. Reading the time-base registers requires use of the mftb instruction with the following addresses:
TBU: 269 (0x10D).
TBL: 268 (0x10C).
See Time Base, page 524, for information on using the time base.
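Because the time base is split across two 32-bit registers, software that wants a consistent 64-bit value has to allow for TBL carrying into TBU between the two reads. This section does not prescribe a read sequence; the sketch below shows one commonly used convention (read TBU, read TBL, read TBU again, and retry if TBU changed). The callables `read_tbu` and `read_tbl` are hypothetical stand-ins for mftb reads of the two registers.

```python
from typing import Callable

def read_time_base(read_tbu: Callable[[], int], read_tbl: Callable[[], int]) -> int:
    """Combine TBU and TBL into one 64-bit value, retrying if TBU changed."""
    while True:
        upper = read_tbu()
        lower = read_tbl()
        if read_tbu() == upper:        # no carry into TBU between the reads
            return (upper << 32) | lower
```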

Exception Summary

An exception is an event that can be caused by a number of sources, including:
Error conditions arising from instruction execution.
Internal timer resources.
Internal debug resources.
External peripherals.
When an exception occurs, the processor can interrupt the currently executing program so that system software can deal with the exception condition. The action taken by an interrupt includes saving the processor context and transferring control to a predetermined exception-handler address operating under a new context. When the interrupt handler completes execution, it can return to the interrupted program by executing a return-from-interrupt instruction.
Exceptions are handled by privileged software. The exception mechanism is described in
Chapter 7, Exceptions and Interrupts. Following is a list of exceptions that can be caused
by the execution of an instruction in user mode.
Data-Storage Exception.
An attempt to access data in memory that results in a memory-protection violation causes the data-storage interrupt handler to be invoked.
Instruction-Storage Exception.
An attempt to access instructions in memory that results in a memory-protection violation causes the instruction-storage interrupt handler to be invoked.
Alignment Exception.
An attempt to access memory with an invalid effective-address alignment (for the specific instruction) causes the alignment-interrupt handler to be invoked.
Program Exception.
Three different types of interrupt handlers can be invoked when a program exception occurs: illegal instruction, privileged instruction, and system trap. The conditions causing a program interrupt include:
- An attempt to execute an illegal instruction causes the illegal-instruction interrupt
handler to be invoked.
- An attempt to execute an optional instruction not implemented by the PPC405
causes the illegal-instruction interrupt handler to be invoked.
- An attempt by a user-level program to execute a supervisor-level instruction
causes the privileged-instruction interrupt handler to be invoked.
- An attempt to execute a defined instruction with an invalid form causes either the
illegal-instruction interrupt handler or the privileged-instruction interrupt handler to be invoked.
- Executing a trap instruction can cause the system-trap interrupt handler to be
invoked.
Floating-Point Unavailable Exception.
On processors that support floating-point instructions, executing such instructions when the floating-point unit is disabled (MSR[FP]=0) invokes the floating-point-unavailable interrupt handler.
System-Call Exception.
The execution of an sc instruction causes the system-call interrupt handler to be invoked. The interrupt handler can be used to call a system-service routine.
Data TLB-Miss Exception.
If data translation is enabled, an attempt to access data in memory when a valid TLB entry is not present causes the data TLB-miss interrupt handler to be invoked.
Instruction TLB-Miss Exception.
If instruction translation is enabled, an attempt to access instructions in memory when a valid TLB entry is not present causes the instruction TLB-miss interrupt handler to be invoked.
Other exceptions can occur during user-mode program execution that are not directly caused by instruction execution. These are also described in Chapter 7:
Machine-check exceptions.
Exceptions caused by external devices.
Exceptions caused by a timer.
Debug exceptions.
Branch and Flow-Control Instructions
Branch instructions redirect program flow by altering the next-instruction address non-sequentially. Branches unconditionally or conditionally alter program flow forward or backward using either an absolute address or an address relative to the branch-instruction address. Branches calculate the target address using the contents of the CTR, LR, or fields within the branch instruction. Optionally, a branch-return address can be automatically loaded into the LR by setting the LK instruction-opcode bit to 1. This option is useful for specifying the return address for subroutine calls and causes the address of the instruction following the branch to be loaded in the LR. Branches are used for all non-sequential program flow including jumps, loops, calls and returns.
Branch-conditional instructions redirect program flow if a tested condition is true. These instructions can test a bit value within the CR, the value of the CTR, or both. Condition-register logical instructions are provided to set up the tests for branch-conditional instructions.

Conditional Branch Control

With branch-conditional instructions, the BO opcode field specifies the branch-control conditions and how the branch affects the CTR. The BO field can specify a test of the CR and it can specify that the CTR be decremented and tested. The BO field can also be initialized to reverse the default prediction performed by the processor. The bits within the BO field are defined as shown in Table 3-4.
Table 3-4: BO Field Bit Definitions
BO Bit  Description
BO[0]   CR Test Control
        0: Test the CR bit specified by the BI opcode field for the value indicated by BO[1].
        1: Do not test the CR.
BO[1]   CR Test Value
        0: Test for CR[BI]=0.
        1: Test for CR[BI]=1.
BO[2]   CTR Test Control
        0: Decrement CTR by one, and test whether CTR satisfies the condition specified by BO[3].
        1: Do not change or test CTR.
BO[3]   CTR Test Value
        0: Test for CTR ≠ 0.
        1: Test for CTR = 0.
BO[4]   Branch Prediction Reversal
        0: Apply standard branch prediction.
        1: Reverse the standard branch prediction.
The 5-bit BI opcode field in branch-conditional instructions specifies which of the 32 bits in the CR are used in the branch-condition test. For example, if BI=0b01010, CR[10] is used in the test.
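The BO and BI definitions in Table 3-4 can be exercised with a short Python model. This is a sketch under stated assumptions: `branch_taken` is a hypothetical helper name, fields use PowerPC bit numbering (bit 0 is the most-significant bit), and BO[4] is ignored because it affects only prediction, not the branch outcome.

```python
def branch_taken(bo: int, bi: int, cr: int, ctr: int) -> tuple:
    """Evaluate a branch-conditional test per Table 3-4.

    bo and bi are 5-bit fields; cr and ctr are 32-bit register values.
    Returns (taken, new_ctr).
    """
    bo0, bo1, bo2, bo3 = ((bo >> shift) & 1 for shift in (4, 3, 2, 1))
    if bo2 == 0:                                   # decrement, then test CTR
        ctr = (ctr - 1) & 0xFFFF_FFFF
    ctr_ok = bo2 == 1 or ((ctr == 0) == (bo3 == 1))
    cr_bit = (cr >> (31 - bi)) & 1                 # CR[BI]
    cr_ok = bo0 == 1 or cr_bit == bo1
    return ctr_ok and cr_ok, ctr
```

For instance, with BO=0b10000 (decrement the CTR, branch if the result is nonzero, no CR test) and a CTR of 2, the model reports the branch as taken and the CTR as 1.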
In some encodings of the BO field, certain BO bits are ignored. Ignored bits can be assigned a meaning in future extensions of the PowerPC architecture and should be cleared to 0. Valid BO field encodings are shown in Table 3-5. In this table, z indicates the ignored bits that should be cleared to 0. The y bit (BO[4]) specifies the branch-prediction behavior for the instruction as described in Specifying Branch-Prediction Behavior, page 370.
Table 3-5: Valid BO Opcode-Field Encoding
BO[0:4]  Description
0000y    Decrement the CTR. Branch if the decremented CTR ≠ 0 and CR[BI]=0.
0001y    Decrement the CTR. Branch if the decremented CTR = 0 and CR[BI]=0.
001zy    Branch if CR[BI]=0.
0100y    Decrement the CTR. Branch if the decremented CTR ≠ 0 and CR[BI]=1.
0101y    Decrement the CTR. Branch if the decremented CTR = 0 and CR[BI]=1.
011zy    Branch if CR[BI]=1.
1z00y    Decrement the CTR. Branch if the decremented CTR ≠ 0.
1z01y    Decrement the CTR. Branch if the decremented CTR = 0.
1z1zz    Branch always.

Branch Instructions

The following sections describe the branch instructions defined by the PowerPC architecture. A number of simplified mnemonics are defined for the branch instructions. See Branch Instructions, page 821 for more information.
Branch Unconditional
Table 3-6 lists the PowerPC unconditional branch instructions. These branches specify a 26-bit signed displacement to the branch-target address by appending 0b00 to the 24-bit LI instruction field. The displacement value gives unconditional branches the ability to cover an address range of ±32 MB.
Table 3-6: Branch-Unconditional Instructions
Mnemonic  Name                      Operation
b         Branch                    Branch to relative address.
ba        Branch Absolute           Branch to absolute address.
bl        Branch and Link           Branch to relative address. LR is updated with the address of the instruction following the branch.
bla       Branch Absolute and Link  Branch to absolute address. LR is updated with the address of the instruction following the branch.
Operand syntax (all instructions): tgt_addr
Branch Conditional
Table 3-7 lists the PowerPC branch-conditional instructions. The BO field specifies the condition tested by the branch, as shown in Table 3-5, page 368. The BI field specifies the CR bit used in the test. These branches specify a 16-bit signed displacement to the branch-target address by appending 0b00 to the 14-bit BD instruction field. The displacement value gives conditional branches the ability to cover an address range of ±32 KB.
Table 3-7: Branch-Conditional Instructions
Mnemonic  Name                                  Operation
bc        Branch Conditional                    Branch-conditional to relative address.
bca       Branch Conditional Absolute           Branch-conditional to absolute address.
bcl       Branch Conditional and Link           Branch-conditional to relative address. LR is updated with the address of the instruction following the branch.
bcla      Branch Conditional Absolute and Link  Branch-conditional to absolute address. LR is updated with the address of the instruction following the branch.
Operand syntax (all instructions): BO,BI,tgt_addr
Branch Conditional to Link Register
Table 3-8 lists the PowerPC branch-conditional to link-register instructions. The BO field specifies the condition tested by the branch, as shown in Table 3-5, page 368. The BI field specifies the CR bit used in the test. The branch-target address is read from the LR, with LR[30:31] cleared to zero to form a word-aligned address. Using the 32-bit LR as a branch target gives these branches the ability to cover the full 4 GB address range.
Table 3-8: Branch-Conditional to Link-Register Instructions
Mnemonic  Name                                          Operation
bclr      Branch Conditional to Link Register           Branch-conditional to address in LR.
bclrl     Branch Conditional to Link Register and Link  Branch-conditional to address in LR. LR is updated with the address of the instruction following the branch.
Operand syntax (all instructions): BO,BI
Branch Conditional to Count Register
Table 3-9 lists the PowerPC branch-conditional to count-register instructions. The BO field specifies the condition tested by the branch, as shown in Table 3-5, page 368. The BI field specifies the CR bit used in the test. The branch-target address is read from the CTR, with CTR[30:31] cleared to zero to form a word-aligned address. Using the 32-bit CTR as a branch target gives these branches the ability to cover the full 4 GB address range.
Table 3-9: Branch-Conditional to Count-Register Instructions
Mnemonic  Name                                           Operation
bcctr     Branch Conditional to Count Register           Branch-conditional to address in CTR.
bcctrl    Branch Conditional to Count Register and Link  Branch-conditional to address in CTR. LR is updated with the address of the instruction following the branch.
Operand syntax (all instructions): BO,BI

Branch Prediction

Conditional branches alter program flow based on the value of bits in the CR. If a condition is met by the CR bits, the branch instruction alters the next-instruction address non-sequentially. Otherwise, the next-sequential instruction following the branch is executed. When the processor encounters a conditional branch, it scans the execution pipelines to determine whether an instruction in progress can affect the CR bit tested by the branch. If no such instruction is found, the branch can be resolved immediately by checking the bit in the CR and taking the action defined by the branch instruction.
However, if a CR-altering instruction is detected, the branch is considered unresolved until the CR-altering instruction completes execution and writes its result to the CR. Prior to that time, the processor can predict how the branch is resolved. To do so, the processor uses special dynamic prediction hardware to analyze instruction flow and branch history to predict resolution of the current branch. If branches are predicted correctly, performance improvements can be realized because instruction execution does not stall waiting for the branch to be resolved. The PowerPC architecture provides software with the ability to override (reverse) the dynamic prediction using a static prediction hint encoded in the instruction opcode. This can be useful when it is known at compile time that a branch is likely to behave contrary to what the processor expects. The use of static prediction is described in the next section, Specifying Branch-Prediction Behavior.
When a prediction is made, instructions are fetched from the predicted execution path. If the processor determines the prediction was incorrect after the CR-altering instruction completes execution, all instructions fetched as a result of the prediction are discarded by the processor. Instruction fetch is restarted along the correct path. If the prediction was correct, instruction fetch and execution proceed normally along the predicted (and now resolved) path.
Branch prediction is most effective when the branch-target address is computed well in advance of resolving the branch. If a branch instruction contains immediate addressing operands, the processor can compute the branch-target address ahead of branch resolution. If the branch instruction uses the LR or CTR for addressing, it is important that the register is loaded by software sufficiently ahead of the branch instruction.
Specifying Branch-Prediction Behavior
All PowerPC processors predict a conditional branch as taken or not taken using the following rules:
For the bcx instruction with a negative value in the displacement operand, the branch is predicted taken.
For all other branch-conditional instructions (bcx with a non-negative value in the displacement operand, bclrx, or bcctrx), the branch is predicted not taken.
Algorithmically, a branch is predicted taken if:
((BO[0] ∧ BO[2]) ∨ s) = 1
where s is the sign bit of the displacement operand, if the instruction has a displacement operand (bit 16 of the branch-conditional instruction encoding).
When the result of the above equation is 0, the branch is predicted not-taken and the processor speculatively fetches instructions that sequentially follow the branch instruction.
Examining the above equation, BO[0] ∧ BO[2] = 1 only when the conditional branch tests nothing, meaning the branch is always taken. In this case, the processor predicts the branch as taken.
If the conditional branch tests anything (BO[0] ∧ BO[2] = 0), s controls the prediction. In the bclrx and bcctrx instructions, bit 16 (s) is reserved and always 0. In this case those instructions are predicted not-taken.
Only the bcx instructions can specify a displacement value. The bcx instructions are commonly used at the end of loops to control the number of times a loop is executed. Here, the branch is taken every time the loop is executed except the last time, so a branch should normally be predicted as taken. Because the branch target is at the beginning of the loop, the branch displacement is negative and s=1, so the processor predicts the branch as taken. Forward branches have a positive displacement and are predicted not-taken.
When the y bit (BO[4]) is cleared to 0, the default branch prediction behavior described above is followed by the processor. Setting the y bit to 1 reverses the above behavior. For the branch-always encoding (BO[0]=1 and BO[2]=1), branch prediction cannot be reversed (no y bit is recognized).
The sign of the displacement operand (s) is used as described above even when the target is an absolute address. The default value for the y bit should be 0. Compilers can set this bit if they determine that the prediction corresponding to y=1 is more likely to be correct than the prediction corresponding to y=0. Compilers that do not statically predict branches should always clear the y bit.
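The prediction rule and the effect of the y bit can be summarized in a few lines of Python. This is a minimal sketch of the rule as stated in this section, not a model of any particular processor's prediction hardware; `predicted_taken` is a hypothetical name, and BO[0] is taken to be the most-significant bit of the 5-bit field.

```python
def predicted_taken(bo: int, s: int) -> bool:
    """Static prediction: (BO[0] AND BO[2]) OR s, reversed when y (BO[4]) is 1.

    bo is the 5-bit BO field; s is the displacement sign bit (bit 16 of the
    instruction), which is 0 for bclrx and bcctrx.
    """
    bo0 = (bo >> 4) & 1
    bo2 = (bo >> 2) & 1
    y = bo & 1
    if bo0 and bo2:              # branch always: no y bit is recognized
        return True
    return bool(s) != bool(y)    # default follows s; y=1 reverses it
```

A loop-closing branch with a negative displacement (s=1, y=0) is predicted taken, and the same branch with y=1 is predicted not taken.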
Link-Register Stack
Some processor implementations keep a stack (history) of the LR values most recently used by branch-and-link instructions. Those processors use this software-invisible stack to predict the target address of nested-subroutine returns. Although the PPC405 processor does not implement such a stack, the following programming conventions should be followed so that software running on multiple PowerPC processors can benefit from this stack.
In the following examples, let A, B, and Glue represent subroutine labels:
When obtaining the address of the next instruction, use the following form of branch-and-link:
bcl 20,31,$+4
Loop counts:
Keep loop counts in the CTR, and use one of the branch-conditional instructions to decrement the count and to control branching (for example, branching back to the start of a loop if the decremented CTR value is nonzero).
Computed go to, case statements, etc.:
Use the CTR to hold the branch-target address, and use the bcctr instruction with the link register option disabled (LK=0) to branch to the selected address.
Direct subroutine linkage, where A calls B and B returns to A:
- A calls B: use a branch instruction that enables the LR (LK=1).
- B returns to A: use the bclr instruction with the link-register option disabled (LK=0). The return address is in, or can be restored to, the LR.
Indirect subroutine linkage, where A calls Glue, Glue calls B, and B returns to A rather than to Glue.
Such a calling sequence is common in linkage code where the subroutine that the programmer wants to call, B, is in a different module than the caller, A. The binder inserts glue code to mediate the branch:
- A calls Glue: use a branch instruction that sets the LR with the link-register option enabled (LK=1).
- Glue calls B: write the address of B in the CTR, and use the bcctr instruction with the link-register option disabled (LK=0).
- B returns to A: use the bclr instruction with the link-register option disabled (LK=0). The return address is in, or can be restored to, the LR.
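To see why these conventions help, consider a deliberately simplified Python model of a link-register stack. It is a toy under stated assumptions, not a description of any real processor: the class and method names are hypothetical, a branch with LK=1 pushes its return address, and a bclr with LK=0 pops the most recent entry as the predicted return target.

```python
class LinkStackModel:
    """Toy return-address predictor used only to illustrate the conventions."""

    def __init__(self) -> None:
        self.stack = []

    def branch_and_link(self, branch_addr: int) -> None:
        # LK=1: remember the address of the instruction after the branch.
        self.stack.append(branch_addr + 4)

    def predict_return(self):
        # bclr with LK=0: the most recent remembered address is the prediction.
        return self.stack.pop() if self.stack else None
```

In the indirect-linkage case, A's branch-and-link at a hypothetical address 0x1000 pushes 0x1004. Glue reaches B through bcctr with LK=0, which pushes nothing, so when B executes bclr the model predicts 0x1004, which is the return address in A.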

Branch-Target Address Calculation

Branch instructions compute the effective address (EA) of the next instruction using the following addressing modes:
Branch to relative (conditional and unconditional).
Branch to absolute (conditional and unconditional).
Branch to link register (conditional only).
Branch to count register (conditional only).
Instruction addresses are always assumed to be word aligned. PowerPC processors ignore the two low-order bits of the generated branch-target address.
Branch to Relative
Instructions that use branch-to-relative addressing generate the next-instruction address by appending 0b00 to the immediate-displacement operand (LI) and sign-extending the result. That result is added to the current-instruction address to produce the next-instruction address. Branches using this addressing mode must have the absolute-addressing option disabled by clearing the AA instruction field (bit 30) to 0. The link-register update option is enabled by setting the LK instruction field (bit 31) to 1. This option causes the effective address of the instruction following the branch instruction to be loaded into the LR.
Figure 3-11 shows how the branch-target address is generated when using the branch-to-relative addressing mode.
[Figure: instruction encoding with opcode 18 in bits 0:5, LI in bits 6:29, AA in bit 30, and LK in bit 31. LI with 0b00 appended is sign-extended and added to the current instruction address to form the branch target address.]
Figure 3-11: Branch-to-Relative Addressing
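The branch-to-relative calculation can be sketched in Python as follows. The helper names are hypothetical, and addresses are modeled as 32-bit values that wrap modulo 2**32.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def branch_relative_target(cia: int, li: int) -> int:
    """Target of a relative branch: CIA + sign-extended (LI || 0b00)."""
    displacement = sign_extend((li & 0xFF_FFFF) << 2, 26)
    return (cia + displacement) & 0xFFFF_FFFF
```

With a current-instruction address of 0x1000, an LI of 0x000004 gives a target of 0x1010, and an LI of 0xFFFFFF (a displacement of -4) gives 0x0FFC.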
Branch-Conditional to Relative
If the branch conditions are met, instructions that use branch-conditional to relative addressing generate the next-instruction address by appending 0b00 to the immediate-displacement operand (BD) and sign-extending the result. That result is added to the current-instruction address to produce the next-instruction address. Branches using this addressing mode must have the absolute-addressing option disabled by clearing the AA instruction field (bit 30) to 0. The link-register update option is enabled by setting the LK instruction field (bit 31) to 1. This option causes the effective address of the instruction following the branch instruction to be loaded into the LR.
Figure 3-12 shows how the branch-target address is generated when using the branch-conditional to relative addressing mode.
[Figure: instruction encoding with opcode 16 in bits 0:5, BO in bits 6:10, BI in bits 11:15, BD in bits 16:29, AA in bit 30, and LK in bit 31. If the condition is met, BD with 0b00 appended is sign-extended and added to the current instruction address to form the branch target address; otherwise the next sequential instruction address is used.]
Figure 3-12: Branch-Conditional to Relative Addressing
Branch to Absolute
Instructions that use branch-to-absolute addressing generate the next-instruction address by appending 0b00 to the immediate-displacement operand (LI) and sign-extending the result. Branches using this addressing mode must have the absolute-addressing option enabled by setting the AA instruction field (bit 30) to 1. The link-register update option is enabled by setting the LK instruction field (bit 31) to 1. This option causes the effective address of the instruction following the branch instruction to be loaded into the LR.
Figure 3-13 shows how the branch-target address is generated when using the branch-to-absolute addressing mode.
[Figure: instruction encoding with opcode 18 in bits 0:5, LI in bits 6:29, AA in bit 30, and LK in bit 31. LI with 0b00 appended is sign-extended to form the branch target address.]
Figure 3-13: Branch-to-Absolute Addressing
Branch-Conditional to Absolute
If the branch conditions are met, instructions that use branch-conditional to absolute addressing generate the next-instruction address by appending 0b00 to the immediate-displacement operand (BD) and sign-extending the result. Branches using this addressing mode must have the absolute-addressing option enabled by setting the AA instruction field (bit 30) to 1. The link-register update option is enabled by setting the LK instruction field (bit 31) to 1. This option causes the effective address of the instruction following the branch instruction to be loaded into the LR.
Figure 3-14 shows how the branch-target address is generated when using the branch-conditional to absolute-addressing mode.
[Figure: instruction encoding with opcode 16 in bits 0:5, BO in bits 6:10, BI in bits 11:15, BD in bits 16:29, AA in bit 30, and LK in bit 31. If the condition is met, BD with 0b00 appended is sign-extended to form the branch target address; otherwise the next sequential instruction address is used.]
Figure 3-14: Branch-Conditional to Absolute Addressing
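The conditional forms add two things to the address calculation: the fall-through path when the condition is not met, and the AA bit selecting whether the current-instruction address is added. The Python sketch below models both for the 14-bit BD field; `bc_next_address` is a hypothetical name, and whether the condition is met is passed in as a flag rather than derived from BO and BI.

```python
def bc_next_address(cia: int, bd: int, aa: int, condition_met: bool) -> int:
    """Next-instruction address for a branch-conditional with a BD field.

    cia is the current-instruction address, bd the 14-bit BD field, and aa
    the AA bit (1 selects absolute addressing).
    """
    if not condition_met:
        return (cia + 4) & 0xFFFF_FFFF       # next sequential instruction
    raw = (bd & 0x3FFF) << 2                 # BD with 0b00 appended (16 bits)
    if raw & 0x8000:                         # sign-extend from bit 15
        raw -= 0x1_0000
    base = 0 if aa else cia
    return (base + raw) & 0xFFFF_FFFF
```

Note that with AA=1 a negative displacement sign-extends into the top of the 32-bit address space; for example, a BD of 0x3FFF yields 0xFFFF_FFFC.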
Branch-Conditional to Link Register
If the branch conditions are met, the branch-conditional to link-register instruction generates the next-instruction address by reading the contents of the LR and clearing the two low-order bits to zero. The link-register update option is enabled by setting the LK instruction field (bit 31) to 1. This option causes the effective address of the instruction following the branch instruction to be loaded into the LR.
Figure 3-15 shows how the branch-target address is generated when using the branch-conditional to link-register addressing mode.
[Figure: instruction encoding with opcode 19 in bits 0:5, BO in bits 6:10, BI in bits 11:15, 0b00000 in bits 16:20, extended opcode 16 in bits 21:30, and LK in bit 31. If the condition is met, LR[0:29] with 0b00 appended forms the branch target address; otherwise the next sequential instruction address is used.]
Figure 3-15: Branch-Conditional to Link-Register Addressing
Branch-Conditional to Count Register
If the branch conditions are met, the branch-conditional to count-register instruction generates the next-instruction address by reading the contents of the CTR and clearing the two low-order bits to zero. The link-register update option is enabled by setting the LK instruction field (bit 31) to 1. This option causes the effective address of the instruction following the branch instruction to be loaded into the LR.
Figure 3-16 shows how the branch-target address is generated when using the branch-conditional to count-register addressing mode.
[Figure: instruction encoding with opcode 19 in bits 0:5, BO in bits 6:10, BI in bits 11:15, 0b00000 in bits 16:20, extended opcode 528 in bits 21:30, and LK in bit 31. If the condition is met, CTR[0:29] with 0b00 appended forms the branch target address; otherwise the next sequential instruction address is used.]
Figure 3-16: Branch-Conditional to Count-Register Addressing

Condition-Register Logical Instructions

Table 3-10 lists the PowerPC condition-register logical instructions. The condition-register logical instructions perform logical operations on any two bits within the CR and store the result of the operation in any CR bit. The move condition-register field instruction is used to move any CR field (each field comprising four bits) to any other CR-field location. All of these instructions are considered flow-control instructions because they are generally used to set up conditions for testing by the branch-conditional instructions and to reduce the number of branches in a code sequence. Simplified mnemonics are defined for the condition-register logical instructions. See CR-Logical Instructions, page 828 for more information.
In Table 3-10, the instruction-operand fields crbA, crbB, and crbD all specify a single bit within the CR. The instruction-operand fields crfD and crfS specify a 4-bit field within the CR.
Table 3-10: Condition-Register Logical Instructions
Mnemonic  Name                                    Operation
crand     Condition Register AND                  CR-bit crbA is ANDed with CR-bit crbB and the result is stored in CR-bit crbD.
crandc    Condition Register AND with Complement  CR-bit crbA is ANDed with the complement of CR-bit crbB and the result is stored in CR-bit crbD.
creqv     Condition Register Equivalent           CR-bit crbA is XORed with CR-bit crbB and the complemented result is stored in CR-bit crbD.
crnand    Condition Register NAND                 CR-bit crbA is ANDed with CR-bit crbB and the complemented result is stored in CR-bit crbD.
crnor     Condition Register NOR                  CR-bit crbA is ORed with CR-bit crbB and the complemented result is stored in CR-bit crbD.
cror      Condition Register OR                   CR-bit crbA is ORed with CR-bit crbB and the result is stored in CR-bit crbD.
crorc     Condition Register OR with Complement   CR-bit crbA is ORed with the complement of CR-bit crbB and the result is stored in CR-bit crbD.
crxor     Condition Register XOR                  CR-bit crbA is XORed with CR-bit crbB and the result is stored in CR-bit crbD.
mcrf      Move Condition Register Field           CR-field crfS is copied into CR-field crfD. No other CR fields are modified.
Operand syntax: crbD,crbA,crbB (crand through crxor); crfD,crfS (mcrf)
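The eight bit-level operations in Table 3-10 can be modeled on a 32-bit CR value as shown below. This Python sketch is illustrative only: the function names are hypothetical, and CR[0] is taken to be the most-significant bit, following the bit numbering used elsewhere in this chapter.

```python
def cr_bit(cr: int, n: int) -> int:
    """Return CR[n], where CR[0] is the most-significant bit."""
    return (cr >> (31 - n)) & 1

def cr_logical(op: str, cr: int, crbD: int, crbA: int, crbB: int) -> int:
    """Return the new 32-bit CR after a condition-register logical operation."""
    a, b = cr_bit(cr, crbA), cr_bit(cr, crbB)
    result = {
        "crand": a & b, "crandc": a & (b ^ 1), "creqv": (a ^ b) ^ 1,
        "crnand": (a & b) ^ 1, "crnor": (a | b) ^ 1, "cror": a | b,
        "crorc": a | (b ^ 1), "crxor": a ^ b,
    }[op]
    mask = 1 << (31 - crbD)
    return (cr & ~mask) | (result << (31 - crbD))
```

Two familiar uses fall out of the definitions: crxor with identical source bits always clears the destination bit, and creqv with identical source bits always sets it.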

System Call

Table 3-11 lists the PowerPC system-call instruction. The sc instruction is a user-level instruction that can be used by a user-mode program to transfer control to a privileged-mode program (typically a system-service routine). Executing the sc instruction causes a system-call exception to occur. See System-Call Interrupt (0x0C00), page 514 for more information on the operation of this instruction.
Table 3-11: System-Call Instruction
Mnemonic  Name         Operation
sc        System Call  Causes a system-call exception to occur.
Operand syntax: (none)

System Trap

Table 3-12 lists the PowerPC system-trap instructions. System-trap instructions are normally used by software-debug applications to set breakpoints. These instructions test for a specified set of conditions and cause a program exception to occur if any of the conditions are met. If the tested conditions are not met, instruction execution continues normally with the instruction following the system-trap instruction (a program exception does not occur). The system-trap handler can be called from the program-interrupt handler when it is determined that a system-trap instruction caused the exception. See Program Interrupt (0x0700), page 511 for more information on program exceptions caused by the system-trap instructions.
Trap instructions can also be used to cause a debug exception. See Trap-Instruction Debug Event, page 546 for more information.
Simplified mnemonics are defined for the system-trap instructions. See Trap Instructions, page 832 for more information.
Table 3-12: System-Trap Instructions
Mnemonic | Name | Operation | Operand Syntax
tw | Trap Word | The contents of rA are compared with rB. A program exception occurs if the comparison meets any test condition enabled by the TO operand. | TO,rA,rB
twi | Trap Word Immediate | The contents of rA are compared with the sign-extended SIMM operand. A program exception occurs if the comparison meets any test condition enabled by the TO operand. | TO,rA,SIMM
The TO operand field in the system-trap instructions specifies the test conditions performed on the remaining two operands. Multiple test conditions can be set simultaneously, expanding the number of possible conditions that can cause the trap (program exception). If all bits in the TO operand field are set, the trap always occurs because one of the trap conditions is always met. The bits within the TO field are defined as shown in Table 3-13.
Table 3-13: TO Field Bit Definitions
TO Bit | Description
TO[0] | Less-than arithmetic comparison. 0: Ignore trap condition. 1: Trap if first operand is arithmetically less than second operand.
TO[1] | Greater-than arithmetic comparison. 0: Ignore trap condition. 1: Trap if first operand is arithmetically greater than second operand.
TO[2] | Equal-to arithmetic comparison. 0: Ignore trap condition. 1: Trap if first operand is arithmetically equal to second operand.
TO[3] | Less-than unsigned comparison. 0: Ignore trap condition. 1: Trap if first operand is less than second operand.
TO[4] | Greater-than unsigned comparison. 0: Ignore trap condition. 1: Trap if first operand is greater than second operand.
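The way the TO bits combine can be illustrated with a small behavioral model. The following Python sketch is not part of the processor documentation: the function names to_signed and trap_taken are hypothetical, and the sketch assumes TO[0] is the most-significant bit of the 5-bit field, following the big-endian bit numbering used in this manual.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit value as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def trap_taken(to, a, b):
    """Return True if a tw/twi with the 5-bit TO field `to` would trap.

    `a` models the contents of rA; `b` models the contents of rB (tw) or
    the sign-extended SIMM operand (twi). Assumption: TO[0] is the
    most-significant bit of the 5-bit field.
    """
    a &= MASK32
    b &= MASK32
    sa, sb = to_signed(a), to_signed(b)
    conditions = [
        sa < sb,  # TO[0]: less than, arithmetic (signed)
        sa > sb,  # TO[1]: greater than, arithmetic (signed)
        a == b,   # TO[2]: equal
        a < b,    # TO[3]: less than, unsigned
        a > b,    # TO[4]: greater than, unsigned
    ]
    # A trap occurs if any enabled condition is met.
    return any(met and ((to >> (4 - i)) & 1) for i, met in enumerate(conditions))
```

With all five bits set, one of the conditions is met for every operand pair, which matches the statement that the trap then always occurs.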

Integer Load and Store Instructions

The integer load and store instructions move data between the general-purpose registers and memory. Several types of loads and stores are supported by the PowerPC instruction set:
Load and zero
Load algebraic
Store
Load with byte reverse and store with byte reverse
Load multiple and store multiple
Load string and store string
Memory synchronization instructions
Memory accesses performed by the load and store instructions can occur out of order. Synchronizing instructions are provided to enforce strict memory-access ordering. See
Synchronizing Instructions, page 424 for more information.
In general, the PowerPC architecture defines a sequential-execution model. When a store instruction modifies an instruction-memory location, software synchronization is required to ensure subsequent instruction fetches from that location obtain the modified version of the instruction. See Self-Modifying Code, page 467 for more information.
Chapter 3: User Programming Model

Operand-Address Calculation

Integer load and store instructions generate effective addresses using one of three addressing modes: register-indirect with immediate index, register-indirect with index, or register indirect. These addressing modes are described in the following sections. For some instructions, update forms that load the calculated effective address into rA are also provided.
In the PPC405 processor, loads and stores to unaligned addresses can suffer from performance degradation. Refer to Performance Effects of Operand Alignment, page 353 for more information.
Register-Indirect with Immediate Index
Load and store instructions using this addressing mode contain a signed, 16-bit immediate index (d operand) and a general-purpose register operand, rA. The index is sign-extended to 32 bits and added to the contents of rA to generate the effective address. If the rA instruction field is 0 (specifying r0), a value of zero, rather than the contents of r0, is added to the sign-extended immediate index. The option to specify rA or 0 is shown in the instruction description as (rA|0).
Figure 3-17 shows how an effective address is generated when using register-indirect with
immediate-index addressing.
Figure 3-17: Register-Indirect with Immediate-Index Addressing
Register-Indirect with Index
Load and store instructions using this addressing mode contain two general-purpose register operands, rA and rB. The contents of these two registers are added to generate the effective address. If the rA instruction field is 0 (specifying r0), a value of zero, rather than the contents of r0, is added to rB. The option to specify rA or 0 is shown in the instruction description as (rA|0).
Figure 3-18 shows how an effective address is generated when using register-indirect with
index addressing.
Figure 3-18: Register-Indirect with Index Addressing
Register Indirect
Only load-string and store-string instructions can use this addressing mode. This mode uses only the contents of the general-purpose register specified by the rA operand as the effective address. Rather than using the contents of r0, a zero in the rA operand causes an effective address of zero to be generated. The option to specify rA or 0 is shown in the instruction descriptions as (rA|0).
Figure 3-19 shows how an effective address is generated when using register-indirect
addressing.
Figure 3-19: Register-Indirect Addressing
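The three addressing modes described above can be summarized in a short behavioral sketch. This Python model is illustrative only: the function names and the gpr list (a hypothetical stand-in for the 32 general-purpose registers) are assumptions, and the sketch assumes the address sum wraps modulo 2^32.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(d):
    """Sign-extend the 16-bit d operand."""
    d &= 0xFFFF
    return d - 0x10000 if d & 0x8000 else d

def base_or_zero(gpr, ra):
    """Implement (rA|0): an rA field of 0 selects the value 0, not r0."""
    return 0 if ra == 0 else gpr[ra] & MASK32

def ea_immediate_index(gpr, ra, d):
    """Register-indirect with immediate index: EA = (rA|0) + d."""
    return (base_or_zero(gpr, ra) + sign_extend16(d)) & MASK32

def ea_index(gpr, ra, rb):
    """Register-indirect with index: EA = (rA|0) + (rB)."""
    return (base_or_zero(gpr, ra) + gpr[rb]) & MASK32

def ea_register_indirect(gpr, ra):
    """Register indirect: EA = (rA|0)."""
    return base_or_zero(gpr, ra)
```

In this model, an rA field of 0 with d = 8 yields an effective address of 8 regardless of what r0 contains.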

Load Instructions

Integer-load instructions read an operand from memory and store it in a GPR destination register, rD. Each type of load is characterized by how it treats the unused high-order bits in rD when the operand size is less than a word (32 bits). Load-and-zero instructions clear the unused high-order bits in rD to zero. Load-algebraic instructions fill the unused high-order bits in rD with a copy of the most-significant bit in the operand.
Load-with-update instructions are provided, but the following two rules apply:
rA must not be equal to 0. If rA = 0, the instruction form is invalid.
rA must not be equal to rD. If rA = rD, the instruction form is invalid.
In the PPC405, the above invalid instruction forms produce a boundedly-undefined result. In other PowerPC implementations, those forms can cause a program exception.
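The difference between load-and-zero and load-algebraic, and the two update-form rules, can be sketched as follows. This Python model is an illustration under stated assumptions: the function names are hypothetical, and the operand is passed in as an already-fetched integer rather than being read from a memory model.

```python
def load_and_zero(operand, size_bytes):
    """Model lbz/lhz/lwz: unused high-order bits of rD are cleared."""
    return operand & ((1 << (8 * size_bytes)) - 1)

def load_algebraic(operand, size_bytes):
    """Model lha: unused high-order bits of rD copy the operand's msb."""
    bits = 8 * size_bytes
    operand &= (1 << bits) - 1
    if operand & (1 << (bits - 1)):
        # Fill bits above the operand with ones (32-bit rD assumed).
        operand |= 0xFFFFFFFF & ~((1 << bits) - 1)
    return operand

def update_form_is_valid(ra, rd):
    """Load-with-update rules: rA must not be 0 and must not equal rD."""
    return ra != 0 and ra != rd
```

For example, the halfword 0x8001 becomes 0x0000_8001 under load-and-zero and 0xFFFF_8001 under load-algebraic.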
Load Byte and Zero
Table 3-14 lists the PowerPC load byte and zero instructions. These instructions load a byte from memory into the lower-eight bits of rD and clear the upper-24 bits of rD to 0.
Table 3-14: Load Byte and Zero Instructions
Mnemonic | Name | Addressing Mode | Operand Syntax
lbz | Load Byte and Zero | Register-indirect with immediate index; EA = (rA|0) + d | rD,d(rA)
lbzu | Load Byte and Zero with Update | Register-indirect with immediate index; EA = (rA) + d; rA ← EA; rA ≠ 0, rA ≠ rD | rD,d(rA)
lbzx | Load Byte and Zero Indexed | Register-indirect with index; EA = (rA|0) + (rB) | rD,rA,rB
lbzux | Load Byte and Zero with Update Indexed | Register-indirect with index; EA = (rA) + (rB); rA ← EA; rA ≠ 0, rA ≠ rD | rD,rA,rB
Load Halfword and Zero
Table 3-15 lists the PowerPC load halfword and zero instructions. These instructions load a halfword from memory into the lower-16 bits of rD and clear the upper-16 bits of rD to 0.
Table 3-15: Load Halfword and Zero Instructions
Mnemonic | Name | Addressing Mode | Operand Syntax
lhz | Load Halfword and Zero | Register-indirect with immediate index; EA = (rA|0) + d | rD,d(rA)
lhzu | Load Halfword and Zero with Update | Register-indirect with immediate index; EA = (rA) + d; rA ← EA; rA ≠ 0, rA ≠ rD | rD,d(rA)
lhzx | Load Halfword and Zero Indexed | Register-indirect with index; EA = (rA|0) + (rB) | rD,rA,rB
lhzux | Load Halfword and Zero with Update Indexed | Register-indirect with index; EA = (rA) + (rB); rA ← EA; rA ≠ 0, rA ≠ rD | rD,rA,rB
Load Word and Zero
Table 3-16 lists the PowerPC load word and zero instructions. These instructions load a word from memory into rD.
Table 3-16: Load Word and Zero Instructions
Mnemonic | Name | Addressing Mode | Operand Syntax
lwz | Load Word and Zero | Register-indirect with immediate index; EA = (rA|0) + d | rD,d(rA)
lwzu | Load Word and Zero with Update | Register-indirect with immediate index; EA = (rA) + d; rA ← EA; rA ≠ 0, rA ≠ rD | rD,d(rA)
lwzx | Load Word and Zero Indexed | Register-indirect with index; EA = (rA|0) + (rB) | rD,rA,rB
lwzux | Load Word and Zero with Update Indexed | Register-indirect with index; EA = (rA) + (rB); rA ← EA; rA ≠ 0, rA ≠ rD | rD,rA,rB
Load Halfword Algebraic
Table 3-17 lists the PowerPC load halfword algebraic instructions. These instructions load a halfword from memory into the lower-16 bits of rD. The upper-16 bits of rD are filled with a copy of the most-significant bit (bit 16) of the operand.
Table 3-17: Load Halfword Algebraic Instructions
Mnemonic | Name | Addressing Mode | Operand Syntax
lha | Load Halfword Algebraic | Register-indirect with immediate index; EA = (rA|0) + d | rD,d(rA)
lhau | Load Halfword Algebraic with Update | Register-indirect with immediate index; EA = (rA) + d; rA ← EA; rA ≠ 0, rA ≠ rD | rD,d(rA)
lhax | Load Halfword Algebraic Indexed | Register-indirect with index; EA = (rA|0) + (rB) | rD,rA,rB
lhaux | Load Halfword Algebraic with Update Indexed | Register-indirect with index; EA = (rA) + (rB); rA ← EA; rA ≠ 0, rA ≠ rD | rD,rA,rB

Store Instructions

Integer-store instructions read an operand from a GPR source register, rS, and write it into memory. Store-with-update instructions are provided, but the following two rules apply:
rA must not be equal to 0. If rA = 0, the instruction form is invalid.
If rS = rA, rS is written to memory first, and then the effective address is loaded into
rS.
In the PPC405, the above invalid instruction form produces a boundedly-undefined result. In other PowerPC implementations, that form can cause a program exception.
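The ordering rule for the rS = rA case can be made concrete with a small model of a store-word-with-update. This Python sketch is illustrative only: the function name, the gpr list, and the word-granular memory dictionary are hypothetical, d is passed as an already sign-extended Python integer, and raising an exception for rA = 0 is a modelling choice (the PPC405 result for that form is boundedly undefined).

```python
MASK32 = 0xFFFFFFFF

def store_word_with_update(gpr, memory, rs, ra, d):
    """Model stwu rS,d(rA) on a dict that maps word addresses to values."""
    if ra == 0:
        # Modelling choice: reject the invalid form instead of guessing.
        raise ValueError("invalid instruction form: rA = 0")
    ea = (gpr[ra] + d) & MASK32
    memory[ea] = gpr[rs] & MASK32  # 1. rS is written to memory first
    gpr[ra] = ea                   # 2. then the EA is loaded into rA
    return ea
```

With rS = rA = r1 and d = -16, the old value of r1 is stored at the new address and r1 then holds that address, which is how a store-with-update can save the previous pointer while allocating space below it.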
Store Byte
Table 3-18 lists the PowerPC store byte instructions. These instructions store the lower-eight bits of rS into the specified byte location in memory.
Table 3-18: Store Byte Instructions
Mnemonic | Name | Addressing Mode | Operand Syntax
stb | Store Byte | Register-indirect with immediate index; EA = (rA|0) + d | rS,d(rA)
stbu | Store Byte with Update | Register-indirect with immediate index; EA = (rA) + d; rA ← EA; rA ≠ 0 | rS,d(rA)
stbx | Store Byte Indexed | Register-indirect with index; EA = (rA|0) + (rB) | rS,rA,rB
stbux | Store Byte with Update Indexed | Register-indirect with index; EA = (rA) + (rB); rA ← EA; rA ≠ 0 | rS,rA,rB
Store Halfword
Table 3-19 lists the PowerPC store halfword instructions. These instructions store the lower-16 bits of rS into the specified halfword location in memory.
Table 3-19: Store Halfword Instructions
Mnemonic | Name | Addressing Mode | Operand Syntax
sth | Store Halfword | Register-indirect with immediate index; EA = (rA|0) + d | rS,d(rA)
sthu | Store Halfword with Update | Register-indirect with immediate index; EA = (rA) + d; rA ← EA; rA ≠ 0 | rS,d(rA)
sthx | Store Halfword Indexed | Register-indirect with index; EA = (rA|0) + (rB) | rS,rA,rB
sthux | Store Halfword with Update Indexed | Register-indirect with index; EA = (rA) + (rB); rA ← EA; rA ≠ 0 | rS,rA,rB
Store Word
Table 3-20 lists the PowerPC store word instructions. These instructions store the entire contents of rS into the specified word location in memory.
Table 3-20: Store Word Instructions
Mnemonic | Name | Addressing Mode | Operand Syntax
stw | Store Word | Register-indirect with immediate index; EA = (rA|0) + d | rS,d(rA)
stwu | Store Word with Update | Register-indirect with immediate index; EA = (rA) + d; rA ← EA; rA ≠ 0 | rS,d(rA)
stwx | Store Word Indexed | Register-indirect with index; EA = (rA|0) + (rB) | rS,rA,rB
stwux | Store Word with Update Indexed | Register-indirect with index; EA = (rA) + (rB); rA ← EA; rA ≠ 0 | rS,rA,rB

Load and Store with Byte-Reverse Instructions

Table 3-21 lists the PowerPC load and store with byte-reverse instructions. Figure 3-20 shows (using big-endian memory) how bytes are moved between memory and the GPRs for each of the byte-reverse instructions. When an lhbrx instruction is executed, the unloaded bytes in rD are cleared to 0.
When used in a system operating with the default big-endian byte order, these instructions have the effect of loading and storing data in little-endian order. Likewise, when used in a system operating with little-endian byte order, these instructions have the effect of loading and storing data in big-endian order. For more information about big-endian and little-endian byte ordering, see Byte Ordering, page 349.
Table 3-21: Load and Store with Byte-Reverse Instructions
Mnemonic | Name | Addressing Mode | Operand Syntax
lhbrx | Load Halfword Byte-Reverse Indexed | Register-indirect with index; EA = (rA|0) + (rB) | rD,rA,rB
lwbrx | Load Word Byte-Reverse Indexed | Register-indirect with index; EA = (rA|0) + (rB) | rD,rA,rB
sthbrx | Store Halfword Byte-Reverse Indexed | Register-indirect with index; EA = (rA|0) + (rB) | rS,rA,rB
stwbrx | Store Word Byte-Reverse Indexed | Register-indirect with index; EA = (rA|0) + (rB) | rS,rA,rB
Figure 3-20: Load and Store with Byte-Reverse Instructions (panels: lwbrx, stwbrx, lhbrx, sthbrx)
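The byte movement performed by the byte-reverse instructions can be modelled compactly. The following Python sketch is illustrative and assumes big-endian memory, as in the figure. The function names mirror the mnemonics for readability but are ordinary Python helpers, and memory is represented as a bytes object in ascending address order.

```python
def lwbrx(memory_bytes):
    """Load four bytes so the byte at the lowest address lands in the
    least-significant byte of rD."""
    return int.from_bytes(memory_bytes[:4], "little")

def lhbrx(memory_bytes):
    """Load two bytes reversed; the upper 16 bits of rD are cleared to 0."""
    return int.from_bytes(memory_bytes[:2], "little")

def stwbrx(rs):
    """Return the four bytes written to memory, lowest address first."""
    return (rs & 0xFFFFFFFF).to_bytes(4, "little")

def sthbrx(rs):
    """Return the two bytes written from the low halfword of rS."""
    return (rs & 0xFFFF).to_bytes(2, "little")
```

Reading the byte sequence 0x11, 0x22, 0x33, 0x44 with this model gives 0x4433_2211, the value a little-endian load of the same bytes would produce.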

Load and Store Multiple Instructions

Table 3-22 lists the PowerPC load and store multiple instructions and their operation. Figure 3-21 shows how bytes are moved between memory and the GPRs for each of these instructions.
These instructions are used to move blocks of data between memory and the GPRs. When the load multiple word instruction (lmw) is executed, rD through r31 are loaded with n consecutive words from memory, where n = 32 - rD. For the lmw instruction, if rA is in the range of registers to be loaded, or if rD=0, the instruction form is invalid. When the store multiple word instruction (stmw) is executed, the n consecutive words in rS through r31 are stored into memory, where n = 32 - rS.
Table 3-22: Load and Store Multiple Instructions
Mnemonic | Name | Addressing Mode | Operand Syntax
lmw | Load Multiple Word | Register-indirect with immediate index; EA = (rA|0) + d | rD,d(rA)
stmw | Store Multiple Word | Register-indirect with immediate index; EA = (rA|0) + d | rS,d(rA)
Figure 3-21: Load and Store Multiple Instructions (panels: lmw, stmw; Word 0 at EA through Word n-1 at EA + 4(n-1))
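The register range and word count used by lmw and stmw can be sketched as follows. This Python model is an assumption-laden illustration: the function names, the gpr list, and the word-addressed memory dictionary are hypothetical, d is passed as an already sign-extended integer, and raising an exception for the invalid lmw forms is a modelling choice.

```python
MASK32 = 0xFFFFFFFF

def effective_address(gpr, ra, d):
    """EA = (rA|0) + d, with d already sign-extended."""
    base = 0 if ra == 0 else gpr[ra]
    return (base + d) & MASK32

def load_multiple_word(gpr, memory, rd, ra, d):
    """Model lmw: load rD through r31 with n = 32 - rD consecutive words."""
    if rd == 0 or ra >= rd:
        # rD = 0, or rA within rD..r31, is an invalid instruction form.
        raise ValueError("invalid lmw form")
    ea = effective_address(gpr, ra, d)
    n = 32 - rd
    for i in range(n):
        gpr[rd + i] = memory[(ea + 4 * i) & MASK32]
    return n

def store_multiple_word(gpr, memory, rs, ra, d):
    """Model stmw: store rS through r31 as n = 32 - rS consecutive words."""
    ea = effective_address(gpr, ra, d)
    n = 32 - rs
    for i in range(n):
        memory[(ea + 4 * i) & MASK32] = gpr[rs + i] & MASK32
    return n
```

Storing from r29 moves three words (r29, r30, r31) to EA, EA + 4, and EA + 8 in this model.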

Load and Store String Instructions

Table 3-23 lists the PowerPC load and store string instructions and their addressing modes. See the individual instruction listings in Chapter 11, Instruction Set for more information on their operation and restrictions on the instruction forms.
Table 3-23: Load and Store String Instructions
Mnemonic | Name | Addressing Mode | Operand Syntax
lswi | Load String Word Immediate | Register-indirect; EA = (rA|0) | rD,rA,NB
lswx | Load String Word Indexed | Register-indirect with index; EA = (rA|0) + (rB) | rD,rA,rB
stswi | Store String Word Immediate | Register-indirect; EA = (rA|0) | rS,rA,NB
stswx | Store String Word Indexed | Register-indirect with index; EA = (rA|0) + (rB) | rS,rA,rB
These instructions are used to move up to 32 consecutive bytes of data between memory and the GPRs without concern for alignment. The instructions can be used for short moves between arbitrary memory locations or for long moves between misaligned memory fields. Performance of these instructions is degraded if the leading and/or trailing bytes are not aligned on a word boundary (see Performance Effects of Operand Alignment, page 353 for more information).
The immediate forms of the instructions take the byte count, n, from the NB instruction field. If NB=0, then n=32. The indexed forms take the byte count from XER[25:31]. Unlike the immediate forms, if XER[25:31]=0, then n=0. For the lswx instruction, the contents of rD are undefined if n=0.
The n bytes are loaded into and stored from registers beginning with the most-significant register byte. For loads, any unfilled low-order register bytes are cleared to 0. The sequence of registers loaded or stored wraps through r0 if necessary. Figure 3-22 shows an example of the string-instruction operation.
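The register packing used by the load-string instructions can be sketched in a few lines. This Python model covers only the byte-to-register placement described above. The function names and the gpr list are hypothetical, and the n bytes are passed in directly instead of being fetched from a memory model.

```python
def immediate_byte_count(nb):
    """Byte count for lswi/stswi: an NB field of 0 means 32 bytes."""
    return 32 if nb == 0 else nb

def load_string(gpr, data, rd):
    """Pack the bytes in `data` into registers starting at rD.

    Bytes fill each register from the most-significant byte down, unfilled
    low-order bytes are cleared to 0, and the register sequence wraps from
    r31 to r0.
    """
    reg = rd
    for offset in range(0, len(data), 4):
        chunk = data[offset:offset + 4]
        gpr[reg] = int.from_bytes(chunk.ljust(4, b"\x00"), "big")
        reg = (reg + 1) % 32  # wrap through r0 if necessary
    return gpr
```

Loading six bytes starting at r31 fills r31 completely and places the remaining two bytes in the upper half of r0.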

Figure 3-22: Load and Store String Instructions (panels: Load String Example, Store String Example)

Integer Instructions

Integer instructions operate on the contents of GPRs. They use the GPRs (and sometimes immediate values coded in the instruction) as source operands. Results are written into GPRs. These instructions do not operate on memory locations. Integer instructions treat the source operands as signed integers unless the instruction is explicitly identified as performing an unsigned operation. For example, the multiply high-word unsigned (mulhwu) and divide-word unsigned (divwu) instructions interpret both operands as unsigned integers.
The following types of integer instructions are supported by the PowerPC architecture:
Arithmetic Instructions
Logical Instructions
Compare Instructions
Rotate Instructions
Shift Instructions
The arithmetic, shift, and rotate instructions can update and/or read bits from the XER. Those instructions, plus the integer-logical instructions, can also update bits in the CR. Unless otherwise noted, when XER and/or CR are updated, they reflect the value written to the destination register. XER and CR can be updated by the integer instructions in the following ways:
The XER[CA] bit is updated to reflect the carry out of bit 0 in the result.
The XER[OV] bit is set or cleared to reflect a result overflow. When XER[OV] is set, XER[SO] is also set to reflect a summary overflow. XER[SO] can only be cleared using the mtspr and mcrxr instructions. Instructions that update these bits have the overflow-enable (OE) bit set to 1 in the instruction encoding. This is indicated by the o suffix in the instruction mnemonic.
Bits in CR0 (CR[0:3]) are updated to reflect a signed comparison of the result to zero. Instructions that update CR0 have the record (Rc) bit set to 1 in the instruction encoding. This is indicated by the “.” suffix in the instruction mnemonic. See CR0 Field, page 361, for information on how these bits are updated.
Instructions that update XER[OV] or XER[CA] can delay the execution of subsequent instructions. See Fixed-Point Exception Register (XER), page 363 for more information on these register bits.
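How an addition produces the carry, overflow, and CR0 comparison results can be illustrated with a behavioral sketch. This Python model is an assumption, not a description of the hardware: the function name is hypothetical, CR0 is reduced to a single LT/GT/EQ label, and the summary-overflow bit is not modelled.

```python
MASK32 = 0xFFFFFFFF

def add_and_flags(ra, rb, carry_in=0):
    """Return (result, CA, OV, CR0 label) for a 32-bit addition."""
    ra &= MASK32
    rb &= MASK32
    total = ra + rb + carry_in
    result = total & MASK32
    ca = total >> 32  # carry out of bit 0 (the most-significant bit)
    # Signed overflow: both operands share a sign that the result lacks.
    ov = (((ra ^ result) & (rb ^ result)) >> 31) & 1
    # CR0 reflects a signed comparison of the written result to zero.
    if result == 0:
        cr0 = "EQ"
    elif result & 0x80000000:
        cr0 = "LT"
    else:
        cr0 = "GT"
    return result, ca, ov, cr0
```

Adding 1 to 0x7FFF_FFFF in this model sets OV without a carry, while adding 1 to 0xFFFF_FFFF produces a carry without overflow.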

Arithmetic Instructions

The integer-arithmetic instructions support addition, subtraction, multiplication, and division between operands in the GPRs and in some cases between GPRs and signed-immediate values.
Integer-Addition Instructions
Table 3-24 shows the PowerPC integer-addition instructions. The instructions in this table are grouped by the type of addition operation they perform. For each type of instruction shown, the “Operation” column indicates the addition-operation performed, and on an instruction-by-instruction basis, how the XER and CR registers are updated (if at all). SIMM indicates an immediate value that is sign-extended prior to being used in the operation.
The add-extended instructions can be used to perform addition on integers larger than 32 bits. For example, assume a 64-bit integer i is represented by the register pair r3:r4, where r3 contains the most-significant 32 bits of i, and r4 contains the least-significant 32 bits. The 64-bit integer j is similarly represented by the register pair r5:r6. The 64-bit result i+j=r (represented by the pair r7:r8) is produced by pairing adde with addc as follows:
addc r8,r6,r4 ! Add the least-significant words and record a
! carry.
adde r7,r5,r3 ! Add the most-significant words, using
! previous carry.
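The addc/adde pairing above can be checked with a small model. In this Python sketch the helper names addc, adde, and add64 are hypothetical functions that mirror the instruction semantics described in this section (sum plus carry out), and the register names in the comments follow the example.

```python
MASK32 = 0xFFFFFFFF

def addc(a, b):
    """Return (sum, carry out) for (rA) + (rB)."""
    total = (a & MASK32) + (b & MASK32)
    return total & MASK32, total >> 32

def adde(a, b, ca):
    """Return (sum, carry out) for (rA) + (rB) + XER[CA]."""
    total = (a & MASK32) + (b & MASK32) + ca
    return total & MASK32, total >> 32

def add64(i, j):
    """Add two 64-bit integers held as register pairs r3:r4 and r5:r6."""
    r3, r4 = (i >> 32) & MASK32, i & MASK32
    r5, r6 = (j >> 32) & MASK32, j & MASK32
    r8, ca = addc(r6, r4)      # addc r8,r6,r4
    r7, ca = adde(r5, r3, ca)  # adde r7,r5,r3
    return (r7 << 32) | r8
```

The carry recorded by the low-word addition is what lets the high-word addition produce the correct upper 32 bits.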
Table 3-24: Integer-Addition Instructions
Mnemonic | Name | Operation | Operand Syntax
Add Instructions: rD is loaded with the sum (rA) + (rB).
add | Add | XER and CR0 are not updated. | rD,rA,rB
add. | Add and Record | CR0 is updated to reflect the result. | rD,rA,rB
addo | Add with Overflow Enabled | XER[OV,SO] are updated to reflect the result. | rD,rA,rB
addo. | Add with Overflow Enabled and Record | XER[OV,SO] and CR0 are updated to reflect the result. | rD,rA,rB
Add-Carrying Instructions: rD is loaded with the sum (rA) + (rB).
addc | Add Carrying | XER[CA] is updated to reflect the result. | rD,rA,rB
addc. | Add Carrying and Record | XER[CA] and CR0 are updated to reflect the result. | rD,rA,rB
addco | Add Carrying with Overflow Enabled | XER[CA,OV,SO] are updated to reflect the result. | rD,rA,rB
addco. | Add Carrying with Overflow Enabled and Record | XER[CA,OV,SO] and CR0 are updated to reflect the result. | rD,rA,rB
Add-Immediate Instructions: rD is loaded with the sum (rA|0) + SIMM.
addi | Add Immediate | XER and CR0 are not updated. | rD,rA,SIMM
addic | Add Immediate Carrying | XER[CA] is updated to reflect the result. | rD,rA,SIMM
addic. | Add Immediate Carrying and Record | XER[CA] and CR0 are updated to reflect the result. | rD,rA,SIMM
Add Immediate-Shifted Instructions: rD is loaded with the sum (rA|0) + (SIMM || 0x0000).
addis | Add Immediate Shifted | XER and CR0 are not updated. | rD,rA,SIMM
Add-Extended Instructions: rD is loaded with the sum (rA) + (rB) + XER[CA].
adde | Add Extended | XER[CA] is updated to reflect the result. | rD,rA,rB
adde. | Add Extended and Record | XER[CA] and CR0 are updated to reflect the result. | rD,rA,rB
addeo | Add Extended with Overflow Enabled | XER[CA,OV,SO] are updated to reflect the result. | rD,rA,rB
addeo. | Add Extended with Overflow Enabled and Record | XER[CA,OV,SO] and CR0 are updated to reflect the result. | rD,rA,rB
Add to Minus-One-Extended Instructions: rD is loaded with the sum (rA) + XER[CA] + 0xFFFF_FFFF.
addme | Add to Minus One Extended | XER[CA] is updated to reflect the result. | rD,rA
addme. | Add to Minus One Extended and Record | XER[CA] and CR0 are updated to reflect the result. | rD,rA
addmeo | Add to Minus One Extended with Overflow Enabled | XER[CA,OV,SO] are updated to reflect the result. | rD,rA
addmeo. | Add to Minus One Extended with Overflow Enabled and Record | XER[CA,OV,SO] and CR0 are updated to reflect the result. | rD,rA
Add to Zero-Extended Instructions: rD is loaded with the sum (rA) + XER[CA].
addze | Add to Zero Extended | XER[CA] is updated to reflect the result. | rD,rA
addze. | Add to Zero Extended and Record | XER[CA] and CR0 are updated to reflect the result. | rD,rA
addzeo | Add to Zero Extended with Overflow Enabled | XER[CA,OV,SO] are updated to reflect the result. | rD,rA
addzeo. | Add to Zero Extended with Overflow Enabled and Record | XER[CA,OV,SO] and CR0 are updated to reflect the result. | rD,rA
Integer-Subtraction Instructions
Table 3-25 shows the PowerPC integer-subtraction instructions. The instructions in this table are grouped by the type of subtraction operation they perform. For each type of instruction shown, the “Operation” column indicates the subtraction-operation performed. The column also shows, on an instruction-by-instruction basis, how the XER and CR registers are updated (if at all). The subtraction operation is expressed as addition so that the two’s-complement operation is clear. “SIMM” indicates an immediate value that is sign-extended prior to being used in the operation.
The integer-subtraction instructions subtract the second operand (rA) from the third operand (rB). Simplified mnemonics are provided with a more familiar operand ordering, whereby the third operand is subtracted from the second. Simplified mnemonics are also defined for the addi instruction to provide a subtract-immediate operation. See Subtract Instructions, page 831 for more information.
The subtract-from extended instructions can be used to perform subtraction on integers larger than 32 bits. For example, assume a 64-bit integer i is represented by the register pair r3:r4, where r3 contains the most-significant 32 bits of i, and r4 contains the least-significant 32 bits. The 64-bit integer j is similarly represented by the register pair r5:r6. The 64-bit result i-j=r (represented by the pair r7:r8) is produced by pairing subfe with subfc as follows:
subfc r8,r6,r4 ! Subtract the least-significant words and record a
! carry.
subfe r7,r5,r3 ! Subtract the most-significant words, using
! previous carry.
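The subfc/subfe pairing can be modelled the same way, using the addition form of the subtraction given in the Operation column of the subtraction table. In this Python sketch the helper names subfc, subfe, and sub64 are hypothetical and the register names in the comments follow the example above.

```python
MASK32 = 0xFFFFFFFF

def subfc(a, b):
    """Return (result, carry out) for ¬(rA) + (rB) + 1, that is (rB) - (rA)."""
    total = (~a & MASK32) + (b & MASK32) + 1
    return total & MASK32, total >> 32

def subfe(a, b, ca):
    """Return (result, carry out) for ¬(rA) + (rB) + XER[CA]."""
    total = (~a & MASK32) + (b & MASK32) + ca
    return total & MASK32, total >> 32

def sub64(i, j):
    """Compute i - j for 64-bit integers held as r3:r4 and r5:r6."""
    r3, r4 = (i >> 32) & MASK32, i & MASK32
    r5, r6 = (j >> 32) & MASK32, j & MASK32
    r8, ca = subfc(r6, r4)      # subfc r8,r6,r4 -> r4 - r6
    r7, ca = subfe(r5, r3, ca)  # subfe r7,r5,r3 -> r3 - r5 - borrow
    return (r7 << 32) | r8
```

A carry of 1 from the low-word step means no borrow occurred, so the high-word step effectively computes r3 - r5. A carry of 0 subtracts one more.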
Table 3-25: Integer-Subtraction Instructions
Mnemonic | Name | Operation | Operand Syntax
Subtract-From Instructions: rD is loaded with the sum ¬(rA) + (rB) + 1.
subf | Subtract from | XER and CR0 are not updated. | rD,rA,rB
subf. | Subtract from and Record | CR0 is updated to reflect the result. | rD,rA,rB
subfo | Subtract from with Overflow Enabled | XER[OV,SO] are updated to reflect the result. | rD,rA,rB
subfo. | Subtract from with Overflow Enabled and Record | XER[OV,SO] and CR0 are updated to reflect the result. | rD,rA,rB
Subtract-From Carrying Instructions: rD is loaded with the sum ¬(rA) + (rB) + 1.
subfc | Subtract from Carrying | XER[CA] is updated to reflect the result. | rD,rA,rB
subfc. | Subtract from Carrying and Record | XER[CA] and CR0 are updated to reflect the result. | rD,rA,rB
subfco | Subtract from Carrying with Overflow Enabled | XER[CA,OV,SO] are updated to reflect the result. | rD,rA,rB
subfco. | Subtract from Carrying with Overflow Enabled and Record | XER[CA,OV,SO] and CR0 are updated to reflect the result. | rD,rA,rB
Subtract-From Immediate Instructions: rD is loaded with the sum ¬(rA) + SIMM + 1.
subfic | Subtract from Immediate Carrying | XER[CA] is updated to reflect the result. | rD,rA,SIMM
Subtract-From Extended Instructions: rD is loaded with the sum ¬(rA) + (rB) + XER[CA].
subfe | Subtract from Extended | XER[CA] is updated to reflect the result. | rD,rA,rB
subfe. | Subtract from Extended and Record | XER[CA] and CR0 are updated to reflect the result. | rD,rA,rB
subfeo | Subtract from Extended with Overflow Enabled | XER[CA,OV,SO] are updated to reflect the result. | rD,rA,rB
subfeo. | Subtract from Extended with Overflow Enabled and Record | XER[CA,OV,SO] and CR0 are updated to reflect the result. | rD,rA,rB
Subtract-From Minus-One-Extended Instructions: rD is loaded with the sum ¬(rA) + XER[CA] + 0xFFFF_FFFF.
subfme | Subtract from Minus One Extended | XER[CA] is updated to reflect the result. | rD,rA
subfme. | Subtract from Minus One Extended and Record | XER[CA] and CR0 are updated to reflect the result. | rD,rA
subfmeo | Subtract from Minus One Extended with Overflow Enabled | XER[CA,OV,SO] are updated to reflect the result. | rD,rA
subfmeo. | Subtract from Minus One Extended with Overflow Enabled and Record | XER[CA,OV,SO] and CR0 are updated to reflect the result. | rD,rA
Subtract-From Zero-Extended Instructions: rD is loaded with the sum ¬(rA) + XER[CA].
subfze | Subtract from Zero Extended | XER[CA] is updated to reflect the result. | rD,rA
subfze. | Subtract from Zero Extended and Record | XER[CA] and CR0 are updated to reflect the result. | rD,rA
subfzeo | Subtract from Zero Extended with Overflow Enabled | XER[CA,OV,SO] are updated to reflect the result. | rD,rA
subfzeo. | Subtract from Zero Extended with Overflow Enabled and Record | XER[CA,OV,SO] and CR0 are updated to reflect the result. | rD,rA
Negation Instructions
Table 3-26 shows the PowerPC integer-negation instructions. Negation takes the operand specified by rA and writes the two's-complement equivalent in rD. For each instruction shown, the “Operation” column indicates (on an instruction-by-instruction basis) how the XER and CR registers are updated (if at all).
Table 3-26: Negation Instructions
Mnemonic | Name | Operation | Operand Syntax
Negation Instructions: rD is loaded with the sum ¬(rA) + 1.
neg | Negate | XER and CR0 are not updated. | rD,rA
neg. | Negate and Record | CR0 is updated to reflect the result. | rD,rA
nego | Negate with Overflow Enabled | XER[OV,SO] are updated to reflect the result. | rD,rA
nego. | Negate with Overflow Enabled and Record | XER[OV,SO] and CR0 are updated to reflect the result. | rD,rA
Multiply Instructions
Table 3-27 shows the PowerPC integer-multiply instructions. Multiplication of two 32-bit values can result in a 64-bit result. The multiply low-word instructions are used with the multiply high-word instructions to calculate the full 64-bit product. For each type of instruction shown, the “Operation” column indicates the multiplication-operation performed. The column also shows, on an instruction-by-instruction basis, how the XER and CR registers are updated (if at all). “SIMM” indicates an immediate value that is sign-extended prior to being used in the operation.
Table 3-27: Multiply Instructions
Mnemonic | Name | Operation | Operand Syntax
Multiply Low-Word Instructions: rD is loaded with the low-32 bits of the product (rA) × (rB).
mullw | Multiply Low Word | XER and CR0 are not updated. | rD,rA,rB
mullw. | Multiply Low Word and Record | CR0 is updated to reflect the result. | rD,rA,rB
mullwo | Multiply Low Word with Overflow Enabled | XER[OV,SO] are updated to reflect the result. | rD,rA,rB
mullwo. | Multiply Low Word with Overflow Enabled and Record | XER[OV,SO] and CR0 are updated to reflect the result. | rD,rA,rB
Multiply Low-Word Immediate Instructions: rD is loaded with the low-32 bits of the product (rA) × SIMM.
mulli | Multiply Low Immediate | XER and CR0 are not updated. | rD,rA,SIMM
Multiply High-Word Instructions: rD is loaded with the high-32 bits of the product (rA) × (rB).
mulhw | Multiply High Word | XER and CR0 are not updated. | rD,rA,rB
mulhw. | Multiply High Word and Record | CR0 is updated to reflect the result. | rD,rA,rB
Multiply High-Word Unsigned Instructions: rD is loaded with the high-32 bits of the product (rA) × (rB). The contents of rA and rB are interpreted as unsigned integers.
mulhwu | Multiply High Word Unsigned | XER and CR0 are not updated. | rD,rA,rB
mulhwu. | Multiply High Word Unsigned and Record | CR0 is updated to reflect the result. | rD,rA,rB
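The way a low-word and a high-word multiply together yield the full 64-bit product can be sketched as follows. This Python model is illustrative only: the function names mirror the mnemonics but are ordinary helpers, and flag updates are not modelled.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Interpret a 32-bit value as a two's-complement signed integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def mullw(a, b):
    """Low 32 bits of the product (identical for signed and unsigned)."""
    return (to_signed(a) * to_signed(b)) & MASK32

def mulhw(a, b):
    """High 32 bits of the signed 64-bit product."""
    return ((to_signed(a) * to_signed(b)) >> 32) & MASK32

def mulhwu(a, b):
    """High 32 bits of the unsigned 64-bit product."""
    return ((a & MASK32) * (b & MASK32)) >> 32

def full_product_signed(a, b):
    """Combine mulhw and mullw into the 64-bit two's-complement product."""
    return (mulhw(a, b) << 32) | mullw(a, b)
```

For the operands 0xFFFF_FFFF and 2, the signed high word is 0xFFFF_FFFF (the product is -2) while the unsigned high word is 1, which shows why separate signed and unsigned high-word instructions exist.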
Divide Instructions
Ta bl e 3 -2 8 shows the PowerPC integer-divide instructions. Only the low-32 bits of the
quotient are returned. The remainder is not supplied as a result of executing these instructions. For each type of instruction shown, the “Operation” column indicates the divide-operation performed. The column also shows, on an instruction-by-instruction basis, how the XER and CR registers are updated (if at all).
Table 3-28: Divide Instructions
Mnemonic | Name | Operation | Operand Syntax
Divide-Word Instructions: rD is loaded with the low-32 bits of the 64-bit quotient (rA) ÷ (rB).
divw | Divide Word | XER and CR0 are not updated. | rD,rA,rB
divw. | Divide Word and Record | CR0 is updated to reflect the result. | rD,rA,rB
divwo | Divide Word with Overflow Enabled | XER[OV,SO] are updated to reflect the result. | rD,rA,rB
divwo. | Divide Word with Overflow Enabled and Record | XER[OV,SO] and CR0 are updated to reflect the result. | rD,rA,rB
Divide-Word Unsigned Instructions: rD is loaded with the low-32 bits of the 64-bit quotient (rA) ÷ (rB). The contents of rA and rB are interpreted as unsigned integers.
divwu | Divide Word Unsigned | XER and CR0 are not updated. | rD,rA,rB
divwu. | Divide Word Unsigned and Record | CR0 is updated to reflect the result. | rD,rA,rB
divwuo | Divide Word Unsigned with Overflow Enabled | XER[OV,SO] are updated to reflect the result. | rD,rA,rB
divwuo. | Divide Word Unsigned with Overflow Enabled and Record | XER[OV,SO] and CR0 are updated to reflect the result. | rD,rA,rB
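Because the divide instructions do not return a remainder, software usually reconstructs it from the quotient. The C sketch below models one common approach, a divide followed by a multiply and a subtract. The function name is hypothetical, and the instruction names in the comments are only an assumed mapping. The sketch also assumes the quotient truncates toward zero, as C integer division does.

```c
#include <stdint.h>

/* Remainder recovered from the quotient: r = a - (a / b) * b.
 * The caller must avoid b == 0 and INT32_MIN / -1, which C leaves
 * undefined. */
int32_t remainder_from_quotient(int32_t a, int32_t b)
{
    int32_t q = a / b;   /* divide step   (for example, divw)  */
    int32_t p = q * b;   /* multiply step (for example, mullw) */
    return a - p;        /* subtract step (for example, subf)  */
}
```

The remainder takes the sign of the dividend, so dividing -17 by 5 gives a quotient of -3 and a remainder of -2.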

Logical Instructions

The logical instructions perform bit operations on the 32-bit operands. If an immediate value is specified as an operand, the processor either zero-extends or left-shifts it prior to performing the operation, depending on the instruction. If the instruction has the record (Rc) bit set to 1 in the instruction encoding, CR0 (CR[0:3]) is updated to reflect the result of the operation. A set Rc bit is indicated by the “.” suffix in the instruction mnemonic.
The logical instructions do not update any bits in the XER register.
In the operand syntax for logical instructions, the rA operand specifies a destination register rather than a source register. rS is used to specify one of the source registers.
AND and NAND Instructions
Table 3-29 shows the PowerPC AND and NAND instructions. For each type of instruction shown, the “Operation” column indicates the Boolean operation performed. The column also shows, on an instruction-by-instruction basis, whether the CR0 field is updated.
Table 3-29: AND and NAND Instructions
Mnemonic | Name | Operation | Operand Syntax
AND Instructions: rA is loaded with the logical result (rS) AND (rB).
and | AND | CR0 is not updated. | rA,rS,rB
and. | AND and Record | CR0 is updated to reflect the result. | rA,rS,rB
AND-Immediate Instructions: rA is loaded with the logical result (rS) AND UIMM.
andi. | AND Immediate and Record | CR0 is updated to reflect the result. | rA,rS,UIMM
AND Immediate-Shifted Instructions: rA is loaded with the logical result (rS) AND (UIMM || 0x0000).
andis. | AND Immediate Shifted and Record | CR0 is updated to reflect the result. | rA,rS,UIMM
AND with Complement Instructions: rA is loaded with the logical result (rS) AND ¬(rB).
andc | AND with Complement | CR0 is not updated. | rA,rS,rB
andc. | AND with Complement and Record | CR0 is updated to reflect the result. | rA,rS,rB
NAND Instructions: rA is loaded with the logical result ¬((rS) AND (rB)).
nand | NAND | CR0 is not updated. | rA,rS,rB
nand. | NAND and Record | CR0 is updated to reflect the result. | rA,rS,rB
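The two immediate forms differ only in where the 16-bit UIMM value is placed before the AND, and both always record into CR0. The following C sketch models that behavior. The function names are hypothetical, and the CR0 encoding used here (LT=8, GT=4, EQ=2, SO=1, with the result compared to zero as a signed value) is an assumption based on the usual PowerPC record-form convention.

```c
#include <stdint.h>

/* CR0 as a 4-bit value: LT=8, GT=4, EQ=2, SO=1 (LT is CR bit 0). */
unsigned cr0_from_result(uint32_t result, unsigned xer_so)
{
    unsigned cr0;
    if (result == 0u)
        cr0 = 2u;                 /* EQ: result is zero           */
    else if (result >> 31)
        cr0 = 8u;                 /* LT: sign bit (bit 0) is set  */
    else
        cr0 = 4u;                 /* GT: positive, nonzero result */
    return cr0 | (xer_so & 1u);   /* SO is a copy of XER[SO]      */
}

/* andi.: UIMM is zero-extended to 32 bits. */
uint32_t andi(uint32_t rs, uint16_t uimm)
{
    return rs & (uint32_t)uimm;
}

/* andis.: UIMM is shifted left 16 bits (UIMM || 0x0000). */
uint32_t andis(uint32_t rs, uint16_t uimm)
{
    return rs & ((uint32_t)uimm << 16);
}
```

With rS = 0x1234_5678 and UIMM = 0xFF00, the andi. model keeps only bits from the low halfword and the andis. model keeps only bits from the high halfword.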
OR and NOR Instructions
Table 3-30 shows the PowerPC OR and NOR instructions. For each type of instruction shown, the “Operation” column indicates the Boolean operation performed. The column also shows, on an instruction-by-instruction basis, whether the CR0 field is updated.
Simplified mnemonics are provided for some common operations that use the OR and NOR instructions, such as move register and complement (not) register. See Other Simplified Mnemonics, page 834 for more information.
Table 3-30: OR and NOR Instructions
Mnemonic | Name | Operation | Operand Syntax
NOR Instructions: rA is loaded with the logical result ¬((rS) OR (rB)).
nor | NOR | CR0 is not updated. | rA,rS,rB
nor. | NOR and Record | CR0 is updated to reflect the result. | rA,rS,rB
OR Instructions: rA is loaded with the logical result (rS) OR (rB).
or | OR | CR0 is not updated. | rA,rS,rB
or. | OR and Record | CR0 is updated to reflect the result. | rA,rS,rB
OR-Immediate Instructions: rA is loaded with the logical result (rS) OR UIMM.
ori | OR Immediate | CR0 is not updated. | rA,rS,UIMM
OR Immediate-Shifted Instructions: rA is loaded with the logical result (rS) OR (UIMM || 0x0000).
oris | OR Immediate Shifted | CR0 is not updated. | rA,rS,UIMM
OR with Complement Instructions: rA is loaded with the logical result (rS) OR ¬(rB).
orc | OR with Complement | CR0 is not updated. | rA,rS,rB
orc. | OR with Complement and Record | CR0 is updated to reflect the result. | rA,rS,rB
XOR and Equivalence Instructions
Table 3-31 shows the PowerPC XOR and equivalence (XNOR) instructions. For each type of instruction shown, the “Operation” column indicates the Boolean operation performed. The column also shows, on an instruction-by-instruction basis, whether the CR0 field is updated.
Table 3-31: XOR and Equivalence Instructions
Mnemonic | Name | Operation | Operand Syntax
Equivalence Instructions: rA is loaded with the logical result ¬((rS) XOR (rB)).
eqv | Equivalent | CR0 is not updated. | rA,rS,rB
eqv. | Equivalent and Record | CR0 is updated to reflect the result. | rA,rS,rB
XOR Instructions: rA is loaded with the logical result (rS) XOR (rB).
xor | XOR | CR0 is not updated. | rA,rS,rB
xor. | XOR and Record | CR0 is updated to reflect the result. | rA,rS,rB
XOR-Immediate Instructions: rA is loaded with the logical result (rS) XOR UIMM.
xori | XOR Immediate | CR0 is not updated. | rA,rS,UIMM
XOR Immediate-Shifted Instructions: rA is loaded with the logical result (rS) XOR (UIMM || 0x0000).
xoris | XOR Immediate Shifted | CR0 is not updated. | rA,rS,UIMM
Sign-Extension Instructions
Table 3-32 shows the sign-extension instructions. These instructions sign-extend the value in the rS register and write the result in the rA register. For each type of instruction shown, the “Operation” column indicates the operation performed. The column also shows, on an instruction-by-instruction basis, whether the CR0 field is updated.
Table 3-32: Sign-Extension Instructions
Mnemonic | Name | Operation | Operand Syntax
Extend-Sign Byte Instructions: rA[24:31] is loaded with (rS[24:31]). The remaining bits rA[0:23] are each loaded with a copy of (rS[24]).
extsb | Extend Sign Byte | CR0 is not updated. | rA,rS
extsb. | Extend Sign Byte and Record | CR0 is updated to reflect the result. | rA,rS
Extend-Sign Halfword Instructions: rA[16:31] is loaded with (rS[16:31]). The remaining bits rA[0:15] are each loaded with a copy of (rS[16]).
extsh | Extend Sign Halfword | CR0 is not updated. | rA,rS
extsh. | Extend Sign Halfword and Record | CR0 is updated to reflect the result. | rA,rS
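As a rough illustration of the sign-extension rule, the C sketch below copies the low byte or halfword and replicates its top bit into the remaining upper bits. The function names follow the mnemonics but are hypothetical helpers, not part of any Xilinx or IBM library.

```c
#include <stdint.h>

/* extsb model: keep rS[24:31], replicate rS[24] into rA[0:23]. */
uint32_t extsb(uint32_t rs)
{
    uint32_t low = rs & 0xFFu;
    return (low & 0x80u) ? (low | 0xFFFFFF00u) : low;
}

/* extsh model: keep rS[16:31], replicate rS[16] into rA[0:15]. */
uint32_t extsh(uint32_t rs)
{
    uint32_t low = rs & 0xFFFFu;
    return (low & 0x8000u) ? (low | 0xFFFF0000u) : low;
}
```

Only the low byte or halfword of rS influences the result. The upper bits of rS are discarded.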
Count Leading-Zeros Instructions
Table 3-33 shows the count leading-zeros instructions. These instructions count the number of consecutive zero bits in the rS register starting at bit 0. The count result is written to the rA register. For each type of instruction shown, the “Operation” column indicates the operation performed. The column also shows, on an instruction-by-instruction basis, whether the CR0 field is updated.
Table 3-33: Count Leading-Zeros Instructions
Mnemonic | Name | Operation | Operand Syntax
Count Leading-Zeros Instructions: rA is loaded with a count of leading zeros in rS.
cntlzw | Count Leading Zeros Word | CR0 is not updated. | rA,rS
cntlzw. | Count Leading Zeros Word and Record | CR0 is updated to reflect the result. CR0[LT] is always cleared to 0. | rA,rS
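A minimal C model of the count, assuming bit 0 is the most-significant bit as elsewhere in this chapter. The function name is a hypothetical helper. Because the count is always in the range 0 to 32, the result can never be negative, which is consistent with CR0[LT] always being cleared by the record form.

```c
#include <stdint.h>

/* cntlzw model: count consecutive zero bits starting at bit 0 (the MSB). */
unsigned cntlzw(uint32_t rs)
{
    unsigned count = 0;
    /* Walk a one-bit probe from the MSB toward the LSB. */
    for (uint32_t bit = 0x80000000u; bit != 0u && (rs & bit) == 0u; bit >>= 1)
        count++;
    return count;   /* 32 when rs is zero */
}
```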

Compare Instructions

The integer-compare instructions support algebraic and logical comparisons between operands in the GPRs and between GPRs and immediate values. Immediate values are signed in algebraic comparisons and unsigned in logical comparisons.
All compare instructions have four operands. The first operand, crfD, specifies the field in the CR register that is updated with the comparison result. The left-most three bits in the CR field are updated to reflect a less-than, greater-than, or equal comparison. The fourth (least-significant) bit is updated with a copy of XER[SO]. The crfD operand can be omitted if the comparison results are written to CR0. See CRn Fields (Compare Instructions), page 362 for more information on the CR fields.
The second operand specifies the operand length. This is referred to as the “L” bit in the compare-instruction encoding. When using the compare instructions on 32-bit PowerPC implementations like the PPC405, this bit must always be coded as 0. It cannot be omitted from the standard instruction syntax. Simplified mnemonics are provided that omit this operand. See Compare Instructions, page 828 for more information.
The last two operands specify the quantities to be compared (the contents of a register and a register or immediate value).
Algebraic-Comparison Instructions
Table 3-34 shows the PowerPC algebraic-comparison instructions. During comparison, both operands are treated as signed integers. If a comparison is made with a signed-immediate value (SIMM), that value is sign-extended by the processor prior to performing the comparison.
Table 3-34: Algebraic-Comparison Instructions
Mnemonic | Name | Operation | Operand Syntax
cmp | Compare | crfD[LT,GT,EQ] are loaded with the result of algebraically comparing (rA) with (rB). crfD[SO] is loaded with a copy of XER[SO]. | crfD,0,rA,rB
cmpi | Compare Immediate | crfD[LT,GT,EQ] are loaded with the result of algebraically comparing (rA) with SIMM. crfD[SO] is loaded with a copy of XER[SO]. | crfD,0,rA,SIMM
Logical-Comparison Instructions
Table 3-35 shows the PowerPC logical-comparison instructions. During comparison, both operands are treated as unsigned integers. If a comparison is made with an unsigned-immediate value (UIMM), that value is zero-extended by the processor prior to performing the comparison.
Table 3-35: Logical-Comparison Instructions
Mnemonic | Name | Operation | Operand Syntax
cmpl | Compare Logical | crfD[LT,GT,EQ] are loaded with the result of logically comparing (rA) with (rB). crfD[SO] is loaded with a copy of XER[SO]. | crfD,0,rA,rB
cmpli | Compare Logical Immediate | crfD[LT,GT,EQ] are loaded with the result of logically comparing (rA) with UIMM. crfD[SO] is loaded with a copy of XER[SO]. | crfD,0,rA,UIMM
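The practical difference between algebraic and logical comparison shows up when an operand has its most-significant bit set. The C sketch below models the 4-bit CR field written by each form. The function names and the numeric encoding of the field (LT=8, GT=4, EQ=2, SO=1) are assumptions chosen for illustration.

```c
#include <stdint.h>

enum { CR_LT = 8, CR_GT = 4, CR_EQ = 2, CR_SO = 1 };

/* cmp/cmpi model: operands compared as signed integers. */
unsigned cmp_algebraic(int32_t ra, int32_t rb, unsigned xer_so)
{
    unsigned field = (ra < rb) ? CR_LT : (ra > rb) ? CR_GT : CR_EQ;
    return field | (xer_so ? CR_SO : 0u);
}

/* cmpl/cmpli model: operands compared as unsigned integers. */
unsigned cmp_logical(uint32_t ra, uint32_t rb, unsigned xer_so)
{
    unsigned field = (ra < rb) ? CR_LT : (ra > rb) ? CR_GT : CR_EQ;
    return field | (xer_so ? CR_SO : 0u);
}
```

For example, the bit pattern 0xFFFF_FFFF compared with 1 sets LT in the algebraic form (it is -1) and GT in the logical form (it is the largest unsigned word).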

Rotate Instructions

Rotate instructions operate on 32-bit data in the GPRs, returning the result in a second GPR. These instructions rotate data to the left, in the direction of least-significant bit to most-significant bit. Bits rotated out of the most-significant bit (bit 0) are rotated into the least-significant bit (bit 31). Programmers can achieve apparent right rotation using these left-rotation instructions by specifying a rotation amount of 32-n, where n is the number of bits to rotate right.
If the rotate instruction has the record (Rc) bit set to 1 in the instruction encoding, CR0 (CR[0:3]) is updated to reflect the result of the operation. A set Rc bit is indicated by the “.” suffix in the instruction mnemonic. Rotate instructions do not update any bits in the XER register.
In the operand syntax for rotate instructions, the rA operand specifies the destination register rather than a source register. rS is used to specify the source register.
Simplified mnemonics using the rotate instructions are provided for easy coding of extraction, insertion, left or right justification, and other bit-manipulation operations. See Rotate and Shift Instructions, page 829 for more information.
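The 32-n rule for right rotation can be checked with a short C sketch. The helper names are hypothetical, and the rotation amount is reduced modulo 32 here so that n = 0 is handled without an out-of-range shift.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (n is taken modulo 32). */
uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x << n) | (x >> (32u - n)) : x;
}

/* Right rotation by n expressed as a left rotation by 32-n. */
uint32_t rotr32_via_left(uint32_t x, unsigned n)
{
    return rotl32(x, 32u - (n & 31u));
}
```

Rotating 0x1234_5678 right by 8 bits is therefore the same as rotating it left by 24 bits, and both move the low byte 0x78 to the top of the word.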
Mask Generation
The rotate instructions write their results into the destination register under the control of a mask specified in the rotate-instruction encoding. The mask is used to write or insert a partial result into the destination register.
Rotate masks are 32 bits long. Two instruction-opcode fields are used to specify the mask: MB and ME. MB is a 5-bit field specifying the starting bit position of the mask and ME is a 5-bit field specifying the ending bit position of the mask. The mask consists of all 1s from MB to ME inclusive and all 0s elsewhere. If MB > ME, the string of 1s wraps around from bit 31 to bit 0. In this case, 0s are found from ME to MB exclusive. The generation of an all-zero mask is not possible.
The function of the MASK(MB,ME) generator is summarized as:
    if MB ≤ ME then
        mask[MB:ME] = 1s
        mask[all remaining bits] = 0s
    else
        mask[MB:31] = 1s
        mask[0:ME] = 1s
        mask[all remaining bits] = 0s
Figure 3-23 shows the generated mask for both cases.
    MB ≤ ME (bit positions 0, MB, ME, 31 from left to right):
        0 0 . . . 0 1 1 . . . 1 0 0 . . . 0
    MB > ME (bit positions 0, ME, MB, 31 from left to right):
        1 1 . . . 1 0 0 . . . 0 1 1 . . . 1
Figure 3-23: Rotate Mask Generation
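The mask generator can be modeled in a few lines of C. This is a sketch under two assumptions: the function name rotate_mask is hypothetical, and MB equal to ME is treated as the non-wrapping case, which yields a single 1 bit. Bit 0 is the most-significant bit, so bit n corresponds to the C value 0x80000000 >> n.

```c
#include <stdint.h>

/* Build the rotate mask for MB and ME, each in the range 0..31. */
uint32_t rotate_mask(unsigned mb, unsigned me)
{
    uint32_t from_mb = 0xFFFFFFFFu >> mb;          /* 1s in bits MB..31 */
    uint32_t to_me   = 0xFFFFFFFFu << (31u - me);  /* 1s in bits 0..ME  */

    /* Non-wrapping: intersection. Wrapping (MB > ME): union. */
    return (mb <= me) ? (from_mb & to_me) : (from_mb | to_me);
}
```

For MB=16 and ME=23 the model gives 0x0000_FF00, and for the wrapping case MB=24, ME=7 it gives 0xFF00_00FF. Setting MB to ME+1 produces an all-1s mask, and no combination produces all 0s.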
Rotate Left then AND-with-Mask Instructions
Table 3-36 shows the PowerPC rotate left then AND-with-mask instructions. For each type of instruction shown, the “Operation” column indicates the rotate operation performed. The column also shows, on an instruction-by-instruction basis, whether the CR0 field is updated.
Table 3-36: Rotate Left then AND-with-Mask Instructions
Mnemonic | Name | Operation | Operand Syntax
Rotate Left then AND-with-Mask Immediate Instructions: rA is loaded with the masked result of left-rotating (rS) the number of bits specified by SH. The mask is specified by operands MB and ME.
rlwinm | Rotate Left Word Immediate then AND with Mask | CR0 is not updated. | rA,rS,SH,MB,ME
rlwinm. | Rotate Left Word Immediate then AND with Mask and Record | CR0 is updated to reflect the result. | rA,rS,SH,MB,ME
Rotate Left then AND-with-Mask Instructions: rA is loaded with the masked result of left-rotating (rS) the number of bits specified by (rB). The mask is specified by operands MB and ME.
rlwnm | Rotate Left Word then AND with Mask | CR0 is not updated. | rA,rS,rB,MB,ME
rlwnm. | Rotate Left Word then AND with Mask and Record | CR0 is updated to reflect the result. | rA,rS,rB,MB,ME
These instructions left rotate GPR contents and logically AND the result with the mask prior to writing it into the destination GPR. The destination register contains the rotated result in the unmasked bit positions (mask bits with 1s), and 0s in the masked bit positions (mask bits with 0s). Rotation amounts are specified using an immediate field in the instruction (the SH opcode field) or using a value in a register.
Figure 3-24 shows an example of a rotate left then AND-with-mask immediate instruction. In this example, the rotation amount is 16 bits as specified by the SH field in the instruction. The mask specifies an unmasked byte in bit positions 16:23 (MB=16, ME=23) and masks all other bit positions. The example shows the original contents of the destination register, rA, and the source register, rS. rS is left-rotated 16 bits and the result is written to rA after ANDing with the mask. This has the effect of extracting byte 0 from rS (rS[0:7]) and placing it in byte 2 of rA (rA[16:23]).
    rA (original contents)          | 0xFF 0xEE 0xDD 0xCC
    rS                              | 0x88 0x77 0x66 0x55
    rS rotated left by SH=16 bits   | 0x66 0x55 0x88 0x77
    Mask (MB=16, ME=23)             | 0000_0000_0000_0000_1111_1111_0000_0000
    rA (result)                     | 0x00 0x00 0x88 0x00
Figure 3-24: Rotate Left then AND-with-Mask Immediate Example
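The example in Figure 3-24 can be reproduced with a small C model of the immediate form. The helper names rotl32 and rotate_mask are hypothetical, and the model assumes SH, MB, and ME are each in the range 0 to 31.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (n is taken modulo 32). */
uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x << n) | (x >> (32u - n)) : x;
}

/* Mask of 1s from bit MB to bit ME (bit 0 is the MSB), with wraparound. */
uint32_t rotate_mask(unsigned mb, unsigned me)
{
    uint32_t from_mb = 0xFFFFFFFFu >> mb;
    uint32_t to_me   = 0xFFFFFFFFu << (31u - me);
    return (mb <= me) ? (from_mb & to_me) : (from_mb | to_me);
}

/* rlwinm model: rotate rS left by SH, then AND with MASK(MB,ME). */
uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh) & rotate_mask(mb, me);
}
```

With rS = 0x8877_6655, SH=16, MB=16, and ME=23, the model returns 0x0000_8800, matching the final rA value in the figure. The previous contents of rA play no part in the result.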
Rotate Left then Mask-Insert Instructions
Table 3-37 shows the PowerPC rotate left then mask-insert instructions. For each type of instruction shown, the “Operation” column indicates the rotate operation performed. The column also shows, on an instruction-by-instruction basis, whether the CR0 field is updated.
Table 3-37: Rotate Left then Mask-Insert Instructions
Mnemonic | Name | Operation | Operand Syntax
Rotate Left then Mask-Insert Immediate Instructions: The masked result of left-rotating (rS) the number of bits specified by SH is inserted into rA. The mask is specified by operands MB and ME.
rlwimi | Rotate Left Word Immediate then Mask Insert | CR0 is not updated. | rA,rS,SH,MB,ME
rlwimi. | Rotate Left Word Immediate then Mask Insert and Record | CR0 is updated to reflect the result. | rA,rS,SH,MB,ME
These instructions left rotate GPR contents and insert the results into the destination GPR under control of the mask. The destination register contains the rotated result in the unmasked bit positions (mask bits with 1s) and the original contents of the destination register in the masked bit positions (mask bits with 0s). Rotation amounts are specified using an immediate field in the instruction (the SH opcode field).
Figure 3-25 shows an example of a rotate left then mask-insert immediate instruction. In this example, the rotation amount is 16 bits as specified by the SH field in the instruction. The mask specifies an unmasked byte in bit positions 16:23 (MB=16, ME=23) and masks all other bit positions. The example shows the original contents of the destination register, rA, and the source register, rS. rS is rotated 16 bits and the result is inserted into rA after ANDing with the mask. This has the effect of extracting byte 0 from rS (rS[0:7]) and inserting it into byte 2 of rA (rA[16:23]), leaving all remaining bytes in rA unmodified.
    rA (original contents)          | 0xFF 0xEE 0xDD 0xCC
    rS                              | 0x88 0x77 0x66 0x55
    rS rotated left by SH=16 bits   | 0x66 0x55 0x88 0x77
    Mask (MB=16, ME=23)             | 0000_0000_0000_0000_1111_1111_0000_0000
    rA (result)                     | 0xFF 0xEE 0x88 0xCC
Figure 3-25: Rotate Left then Mask-Insert Immediate Example
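The insert behavior in Figure 3-25 differs from the AND-with-mask form only in what happens to the masked positions: they keep the old destination bits instead of being cleared. A minimal C sketch, using hypothetical helper names and assuming SH, MB, and ME are each in the range 0 to 31:

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (n is taken modulo 32). */
uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x << n) | (x >> (32u - n)) : x;
}

/* Mask of 1s from bit MB to bit ME (bit 0 is the MSB), with wraparound. */
uint32_t rotate_mask(unsigned mb, unsigned me)
{
    uint32_t from_mb = 0xFFFFFFFFu >> mb;
    uint32_t to_me   = 0xFFFFFFFFu << (31u - me);
    return (mb <= me) ? (from_mb & to_me) : (from_mb | to_me);
}

/* rlwimi model: rotated rS where the mask is 1, old rA where it is 0. */
uint32_t rlwimi(uint32_t ra, uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    uint32_t mask = rotate_mask(mb, me);
    return (rotl32(rs, sh) & mask) | (ra & ~mask);
}
```

With rA = 0xFFEE_DDCC and rS = 0x8877_6655, the call rlwimi(rA, rS, 16, 16, 23) yields 0xFFEE_88CC, the value shown in the figure.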

Shift Instructions

Shift instructions operate on 32-bit data in the GPRs and return the result in a GPR. Both logical and algebraic shifts are provided:
• Logical left-shift instructions shift bits from the direction of least-significant bit to most-significant bit. Bits shifted out of bit 0 are lost. The vacated bit positions on the right are filled with zeros.
• Logical right-shift instructions shift bits from the direction of most-significant bit to least-significant bit. Bits shifted out of bit 31 are lost. The vacated bit positions on the left are filled with zeros.
• Algebraic right-shift instructions shift bits from the direction of most-significant bit to least-significant bit. Bits shifted out of bit 31 are lost. The vacated bit positions on the left are filled with a copy of the original bit 0 (the value prior to starting the shift).
If the shift instruction has the record (Rc) bit set to 1 in the instruction encoding, CR0 (CR[0:3]) is updated to reflect the result of the operation. A set Rc bit is indicated by the “.” suffix in the instruction mnemonic. Algebraic right-shift instructions update XER[CA] to reflect the result of the operation but the other shift instructions do not modify XER[CA]. XER[OV,SO] are not modified by any shift instructions.
In the operand syntax for shift instructions, the rA operand specifies the destination register rather than a source register. rS is used to specify the source register.
Simplified mnemonics using the rotate instructions are provided for coding of logical shift-left immediate and logical shift-right immediate operations. See Rotate and Shift Instructions, page 829 for more information.
Logical-Shift Instructions
Table 3-38 shows the PowerPC logical-shift instructions. For each type of instruction shown, the “Operation” column indicates the shift operation performed. The column also shows, on an instruction-by-instruction basis, whether the CR0 field is updated. XER is not updated by these instructions.
Table 3-38: Logical-Shift Instructions
Mnemonic | Name | Operation | Operand Syntax
Shift-Left-Logical Instructions: rA is loaded with the result of logically left-shifting (rS) the number of bits specified by (rB).
slw | Shift Left Word | CR0 is not updated. | rA,rS,rB
slw. | Shift Left Word and Record | CR0 is updated to reflect the result. | rA,rS,rB
Shift-Right-Logical Instructions: rA is loaded with the result of logically right-shifting (rS) the number of bits specified by (rB).
srw | Shift Right Word | CR0 is not updated. | rA,rS,rB
srw. | Shift Right Word and Record | CR0 is updated to reflect the result. | rA,rS,rB
Figure 3-26 shows two examples of logical-shift operations. The top example shows a left shift of seven bits, and the bottom example shows a right shift of seven bits. As is seen in these examples, bits shifted out of the register are lost and vacated bits are filled with zeros.
    Left shift by 7 bits:
    rS                       | 1000_0111_0110_0101_0100_0011_0010_0001
    Bits shifted out (lost)  | 1000_011
    rA                       | 1011_0010_1010_0001_1001_0000_1000_0000
    Right shift by 7 bits:
    rS                       | 1000_0111_0110_0101_0100_0011_0010_0001
    Bits shifted out (lost)  | 010_0001
    rA                       | 0000_0001_0000_1110_1100_1010_1000_0110
Figure 3-26: Logical-Shift Examples
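The two examples in Figure 3-26 can be checked with a short C model. The function names are hypothetical. One detail is not stated in this section and is taken here as an assumption from the general PowerPC definition of these instructions: the shift amount comes from the low six bits of rB, and amounts of 32 through 63 produce a result of zero.

```c
#include <stdint.h>

/* slw model: logical left shift; amounts of 32..63 clear the result. */
uint32_t slw(uint32_t rs, uint32_t rb)
{
    unsigned n = rb & 0x3Fu;          /* assumed 6-bit shift amount */
    return (n < 32u) ? (rs << n) : 0u;
}

/* srw model: logical right shift; amounts of 32..63 clear the result. */
uint32_t srw(uint32_t rs, uint32_t rb)
{
    unsigned n = rb & 0x3Fu;          /* assumed 6-bit shift amount */
    return (n < 32u) ? (rs >> n) : 0u;
}
```

The figure's source value is 0x8765_4321. Shifting it left by seven bits gives 0xB2A1_9080 and shifting it right by seven bits gives 0x010E_CA86, matching the bit patterns shown.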
Algebraic-Shift Instructions
Table 3-39 shows the PowerPC algebraic-shift instructions. For each type of instruction shown, the “Operation” column indicates the shift operation performed. The column also shows, on an instruction-by-instruction basis, whether the CR0 field is updated. XER[CA] is always updated by these instructions to reflect the result.
The shift-right-algebraic instructions can be followed by an addze instruction to implement a divide-by-$2^n$ operation. See Multiple-Precision Shifts, page 840, for more information.
Table 3-39: Algebraic-Shift Instructions
Mnemonic | Name | Operation | Operand Syntax
Shift-Right-Algebraic Immediate Instructions: rA is loaded with the result of algebraically right-shifting (rS) the number of bits specified by SH.
srawi | Shift Right Algebraic Word Immediate | CR0 is not updated. XER[CA] is updated to reflect the result. | rA,rS,SH
srawi. | Shift Right Algebraic Word Immediate and Record | CR0 and XER[CA] are updated to reflect the result. | rA,rS,SH
Shift-Right-Algebraic Instructions: rA is loaded with the result of algebraically right-shifting (rS) the number of bits specified by (rB).
sraw | Shift Right Algebraic Word | CR0 is not updated. XER[CA] is updated to reflect the result. | rA,rS,rB
sraw. | Shift Right Algebraic Word and Record | CR0 and XER[CA] are updated to reflect the result. | rA,rS,rB
Figure 3-27 shows an example of an algebraic-shift operation. In this example, a shift of seven bits is performed. Bits shifted out of the least-significant register bit are lost and vacated bits on the left side are filled with a copy of the original bit 0 (prior to the shift). In this example, the original value of bit 0 is 0b1.
    rS                       | 1000_0111_0110_0101_0100_0011_0010_0001
    Bits shifted out (lost)  | 010_0001
    rA (shift by 7 bits)     | 1111_1111_0000_1110_1100_1010_1000_0110
Figure 3-27: Algebraic-Shift Example
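The pairing of an algebraic right shift with addze can be sketched in C as follows. The function names are hypothetical. The sketch assumes the usual PowerPC rule for the carry: XER[CA] is set when the source is negative and at least one 1 bit is shifted out. Adding that carry back rounds the quotient toward zero.

```c
#include <stdint.h>

/* srawi model for sh in 0..31: returns the shifted word and stores the
 * value XER[CA] would take (assumed: negative source and nonzero lost bits). */
uint32_t srawi(uint32_t rs, unsigned sh, unsigned *ca)
{
    uint32_t negative = rs >> 31;                        /* original bit 0  */
    uint32_t fill = (negative && sh) ? ~(0xFFFFFFFFu >> sh) : 0u;
    uint32_t lost = rs & ~(0xFFFFFFFFu << sh);           /* bits shifted out */

    *ca = (negative && lost != 0u) ? 1u : 0u;
    return (rs >> sh) | fill;
}

/* Signed divide by 2^n: srawi followed by adding the carry (addze). */
uint32_t div_pow2(uint32_t x, unsigned n)
{
    unsigned ca;
    uint32_t shifted = srawi(x, n, &ca);
    return shifted + ca;
}
```

Without the carry correction, shifting -7 right by one bit gives -4. With it, div_pow2 returns -3, which is the truncated quotient of -7 ÷ 2. The figure's value 0x8765_4321 shifted by seven bits gives 0xFF0E_CA86 in this model.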
Multiply-Accumulate Instruction-Set Extensions
The PPC405 supports an integer multiply-accumulate instruction-set extension that provides functions usable by certain computationally intensive applications, such as those that implement DSP algorithms. These instructions comply with the architectural requirements for auxiliary-processor units (APUs) defined by the PowerPC embedded-environment architecture and the PowerPC Book-E architecture. They are considered implementation-dependent instructions and are not part of the PowerPC architecture, the PowerPC embedded-environment architecture, or the PowerPC Book-E architecture. Programs that use these instructions are not portable to all PowerPC implementations.
The multiply-accumulate instruction-set extensions include multiply-accumulate instructions, negative multiply-accumulate instructions, and multiply-halfword instructions.

Modulo and Saturating Arithmetic

The multiply-accumulate and negative multiply-accumulate instructions produce a 33-bit intermediate result. The method used to store this result in the 32-bit destination register depends on whether the instruction performs modulo arithmetic or saturating arithmetic.
With modulo-arithmetic instructions, the most-significant bit in the intermediate result is discarded and the low-32 bits of this result are stored in the destination register.
With saturating-arithmetic instructions, the low 32 bits of the intermediate result are stored in the destination register if the intermediate result does not overflow 32 bits. However, if the intermediate result overflows what is representable in 32 bits, the instruction loads the nearest representable value into the destination register. For the various instruction forms, these results are:
• Signed arithmetic: if the result exceeds $2^{31}-1$ (> 0x7FFF_FFFF), the instruction loads the destination register with $2^{31}-1$.
• Signed arithmetic: if the result is less than $-2^{31}$ (< 0x8000_0000), the instruction loads the destination register with $-2^{31}$.
• Unsigned arithmetic: if the result exceeds $2^{32}-1$ (> 0xFFFF_FFFF), the instruction loads the destination register with $2^{32}-1$.

Multiply-Accumulate Instructions

Multiply-Accumulate Cross-Halfword to Word Instructions
Table 3-40 shows the PPC405 integer multiply-accumulate cross-halfword to word instructions.
These instructions take the lower halfword of the first source operand (rA[16:31]) and multiply it with the upper halfword of the second source operand (rB[0:15]), producing a 32-bit product. The product is signed or unsigned, depending on the instruction. This product is added to the value in the destination register, rD, producing a 33-bit intermediate result. Generally, rD is loaded with the lower-32 bits of the 33-bit intermediate result. However, if the instruction performs saturating arithmetic and the intermediate result overflows, rD is loaded with the nearest representable value (see Modulo and Saturating Arithmetic, above).
For each type of instruction shown in Table 3-40, the “Operation” column indicates the multiply-accumulate operation performed. The column also shows, on an instruction-by-instruction basis, how the XER and CR registers are updated (if at all).
Table 3-40: Multiply-Accumulate Cross-Halfword to Word Instructions
Mnemonic | Name | Operation | Operand Syntax
Multiply-Accumulate Cross-Halfword to Word Modulo Signed Instructions: rD is added to the signed product (rA[16:31]) × (rB[0:15]), producing a 33-bit result. The low-32 bits of this result are stored in rD.
macchw | Multiply Accumulate Cross Halfword to Word Modulo Signed | XER and CR0 are not updated. | rD,rA,rB
macchw. | Multiply Accumulate Cross Halfword to Word Modulo Signed and Record | CR0 is updated to reflect the result. | rD,rA,rB
macchwo | Multiply Accumulate Cross Halfword to Word Modulo Signed with Overflow Enabled | XER[OV,SO] are updated to reflect the result. | rD,rA,rB
macchwo. | Multiply Accumulate Cross Halfword to Word Modulo Signed with Overflow Enabled and Record | XER[OV,SO] and CR0 are updated to reflect the result. | rD,rA,rB
Multiply-Accumulate Cross-Halfword to Word Saturate Signed Instructions: rD is added to the signed product (rA[16:31]) × (rB[0:15]), producing a 33-bit result. If the result does not overflow, the low-32 bits of this result are stored in rD. Otherwise, the nearest-representable value is stored in rD.
macchws | Multiply Accumulate Cross Halfword to Word Saturate Signed | XER and CR0 are not updated. | rD,rA,rB
macchws. | Multiply Accumulate Cross Halfword to Word Saturate Signed and Record | CR0 is updated to reflect the result. | rD,rA,rB
macchwso | Multiply Accumulate Cross Halfword to Word Saturate Signed with Overflow Enabled | XER[OV,SO] are updated to reflect the result. | rD,rA,rB
macchwso. | Multiply Accumulate Cross Halfword to Word Saturate Signed with Overflow Enabled and Record | XER[OV,SO] and CR0 are updated to reflect the result. | rD,rA,rB
Multiply-Accumulate Cross-Halfword to Word Saturate Unsigned Instructions: rD is added to the unsigned product (rA[16:31]) × (rB[0:15]), producing a 33-bit result. If the result does not overflow, the low-32 bits of this result are stored in rD. Otherwise, the nearest-representable value is stored in rD.
macchwsu | Multiply Accumulate Cross Halfword to Word Saturate Unsigned | XER and CR0 are not updated. | rD,rA,rB
macchwsu. | Multiply Accumulate Cross Halfword to Word Saturate Unsigned and Record | CR0 is updated to reflect the result. | rD,rA,rB
macchwsuo | Multiply Accumulate Cross Halfword to Word Saturate Unsigned with Overflow Enabled | XER[OV,SO] are updated to reflect the result. | rD,rA,rB
macchwsuo. | Multiply Accumulate Cross Halfword to Word Saturate Unsigned with Overflow Enabled and Record | XER[OV,SO] and CR0 are updated to reflect the result. | rD,rA,rB
Multiply-Accumulate Cross-Halfword to Word Modulo Unsigned Instructions: rD is added to the unsigned product (rA[16:31]) × (rB[0:15]), producing a 33-bit result. The low-32 bits of this result are stored in rD.
macchwu | Multiply Accumulate Cross Halfword to Word Modulo Unsigned | XER and CR0 are not updated. | rD,rA,rB
macchwu. | Multiply Accumulate Cross Halfword to Word Modulo Unsigned and Record | CR0 is updated to reflect the result. | rD,rA,rB
macchwuo | Multiply Accumulate Cross Halfword to Word Modulo Unsigned with Overflow Enabled | XER[OV,SO] are updated to reflect the result. | rD,rA,rB
macchwuo. | Multiply Accumulate Cross Halfword to Word Modulo Unsigned with Overflow Enabled and Record | XER[OV,SO] and CR0 are updated to reflect the result. | rD,rA,rB
Figure 3-28 shows the operation of the integer multiply-accumulate cross-halfword to word instructions.
[Figure: rA[16:31] is multiplied by rB[0:15]; the product is added to rD[0:31], giving a 33-bit intermediate result from which rD[0:31] is loaded.]
Figure 3-28: Multiply-Accumulate Cross-Halfword to Word Operation
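The cross-halfword operation and the two ways of storing the intermediate result can be modeled in C. This is a sketch, not a definition of the hardware: the function names mirror three of the mnemonics but are hypothetical helpers, a 64-bit integer stands in for the 33-bit intermediate result, and no XER or CR0 updates are modeled.

```c
#include <stdint.h>

/* Interpret the low 16 bits of a value as a signed halfword. */
int64_t sext16(uint32_t half)
{
    int64_t v = (int64_t)(half & 0xFFFFu);
    return (v & 0x8000) ? v - 0x10000 : v;
}

/* Interpret a 32-bit word as a signed integer. */
int64_t sext32(uint32_t word)
{
    int64_t v = (int64_t)word;
    return (v & 0x80000000LL) ? v - 0x100000000LL : v;
}

/* macchw model: signed, modulo. Keep the low 32 bits of the sum. */
uint32_t macchw(uint32_t rd, uint32_t ra, uint32_t rb)
{
    int64_t sum = sext32(rd) + sext16(ra) * sext16(rb >> 16);
    return (uint32_t)(uint64_t)sum;
}

/* macchws model: signed, saturating. Clamp to the 32-bit signed range. */
uint32_t macchws(uint32_t rd, uint32_t ra, uint32_t rb)
{
    int64_t sum = sext32(rd) + sext16(ra) * sext16(rb >> 16);
    if (sum > INT32_MAX) return 0x7FFFFFFFu;
    if (sum < INT32_MIN) return 0x80000000u;
    return (uint32_t)(uint64_t)sum;
}

/* macchwsu model: unsigned, saturating. Clamp to 0xFFFFFFFF. */
uint32_t macchwsu(uint32_t rd, uint32_t ra, uint32_t rb)
{
    uint64_t sum = (uint64_t)rd + (uint64_t)(ra & 0xFFFFu) * (uint64_t)(rb >> 16);
    return (sum > 0xFFFFFFFFu) ? 0xFFFFFFFFu : (uint32_t)sum;
}
```

Starting from rD = 0x7FFF_FFFF and adding a product of 2, the modulo model wraps to 0x8000_0001 while the saturating model stays at 0x7FFF_FFFF.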
Multiply-Accumulate High-Halfword to Word Instructions
Table 3-41 shows the PPC405 multiply-accumulate high-halfword to word instructions. These instructions multiply the high halfword of both source operands, rA[0:15] and rB[0:15], producing a 32-bit product. The product is signed or unsigned, depending on the instruction. This product is added to the value in the destination register, rD, producing a 33-bit intermediate result. Generally, rD is loaded with the lower-32 bits of the 33-bit intermediate result. However, if the instruction performs saturating arithmetic and the intermediate result overflows, rD is loaded with the nearest representable value (see Modulo and Saturating Arithmetic, page 405).
For each type of instruction shown in Table 3-41, the “Operation” column indicates the multiply-accumulate operation performed. The column also shows, on an instruction-by-instruction basis, how the XER and CR registers are updated (if at all).
Table 3-41: Multiply-Accumulate High-Halfword to Word Instructions
Mnemonic | Name | Operation | Operand Syntax
Multiply-Accumulate High-Halfword to Word Modulo Signed Instructions: rD is added to the signed product (rA[0:15]) × (rB[0:15]), producing a 33-bit result. The low-32 bits of this result are stored in rD.
machhw | Multiply Accumulate High Halfword to Word Modulo Signed | XER and CR0 are not updated. | rD,rA,rB
machhw. | Multiply Accumulate High Halfword to Word Modulo Signed and Record | CR0 is updated to reflect the result. | rD,rA,rB
machhwo | Multiply Accumulate High Halfword to Word Modulo Signed with Overflow Enabled | XER[OV,SO] are updated to reflect the result. | rD,rA,rB
machhwo. | Multiply Accumulate High Halfword to Word Modulo Signed with Overflow Enabled and Record | XER[OV,SO] and CR0 are updated to reflect the result. | rD,rA,rB