Xilinx PPC405 User Manual

Volume 2(a): PPC405 User Manual

Virtex-II Pro™ Platform FPGA Developer’s Kit

March 2002 Release

The Xilinx logo shown above is a registered trademark of Xilinx, Inc.

The shadow X shown above is a trademark of Xilinx, Inc.

"Xilinx" and the Xilinx logo are registered trademarks of Xilinx, Inc. Any rights not expressly granted herein are reserved.

CoolRunner, RocketChips, Rocket IP, Spartan, StateBENCH, StateCAD, Virtex, XACT, XC2064, XC3090, XC4005, XC5210 are registered Trademarks of Xilinx, Inc.

ACE Controller, ACE Flash, A.K.A. Speed, Alliance Series, AllianceCORE, Bencher, ChipScope, Configurable Logic Cell, CORE Generator, CoreLINX, Dual Block, EZTag, Fast CLK, Fast CONNECT, Fast FLASH, FastMap, Fast Zero Power, Foundation, Gigabit Speeds...and Beyond!, HardWire, HDL Bencher, IRL, J Drive, JBits, LCA, LogiBLOX, Logic Cell, LogiCORE, LogicProfessor, MicroBlaze, MicroVia, MultiLINX, NanoBlaze, PicoBlaze, PLUSASM, PowerGuide, PowerMaze, QPro, Real-PCI, Rocket I/O, SelectI/O, SelectRAM, SelectRAM+, Silicon Xpresso, Smartguide, Smart-IP, SmartSearch, SMARTswitch, System ACE, Testbench In A Minute, TrueMap, UIM, VectorMaze, VersaBlock, VersaRing, Virtex-II Pro, Wave Table, WebFITTER, WebPACK, WebPOWERED, XABEL, XACT-Floorplanner, XACT-Performance, XACTstep Advanced, XACTstep Foundry, XAM, XAPP, X-BLOX +, XC designated products, XChecker, XDM, XEPLD, Xilinx Foundation Series, Xilinx XDTV, Xinfo, XSI, XtremeDSP and ZERO+ are trademarks of Xilinx, Inc.

The Programmable Logic Company is a service mark of Xilinx, Inc.

The following are trademarks of International Business Machines Corporation in the United States, or other countries, or both: IBM, IBM Logo, PowerPC, PowerPC Logo, Blue Logic, CoreConnect, CodePack.

All other trademarks are the property of their respective owners.

Xilinx does not assume any liability arising out of the application or use of any product described or shown herein; nor does it convey any license under its patents, copyrights, or maskwork rights or any rights of others. Xilinx reserves the right to make changes, at any time, in order to improve reliability, function or design and to supply the best product possible. Xilinx will not assume responsibility for the use of any circuitry described herein other than circuitry entirely embodied in its products. Xilinx provides any design, code, or information shown or described herein "as is." By providing the design, code, or information as one possible implementation of a feature, application, or standard, Xilinx makes no representation that such implementation is free from any claims of infringement. You are responsible for obtaining any rights you may require for your implementation. Xilinx expressly disclaims any warranty whatsoever with respect to the adequacy of any such implementation, including but not limited to any warranties or representations that the implementation is free from claims of infringement, as well as any implied warranties of merchantability or fitness for a particular purpose. Xilinx assumes no obligation to correct any errors contained herein or to advise any user of this text of any correction if such be made. Xilinx will not assume any liability for the accuracy or correctness of any engineering or software support or assistance provided to a user.

Xilinx products are not intended for use in life support appliances, devices, or systems. Use of a Xilinx product in such applications without the written consent of the appropriate Xilinx officer is prohibited.

Virtex-II Pro™ Platform FPGA Developer’s Kit www.xilinx.com March 2002 Release

1-800-255-7778

About This Book

Preface

This document is intended to serve as a stand-alone reference for application and system programmers of the PowerPC 405D5 processor. It combines information from the following documents:

• PowerPC 405 Embedded Processor Core User’s Manual, published by IBM Corporation (IBM order number SA14-2339-01).

• The IBM PowerPC Embedded Environment Architectural Specifications for IBM PowerPC Embedded Controllers, published by IBM Corporation.

• PowerPC Microprocessor Family: The Programming Environments, published by IBM Corporation (IBM order number G522-0290-01).

• IBM PowerPC Embedded Processors Application Note: PowerPC 400 Series Caches: Programming and Coherency Issues.

• IBM PowerPC Embedded Processors Application Note: PowerPC 40x Watch Dog Timer.

• IBM PowerPC Embedded Processors Application Note: Programming Model Differences of the IBM PowerPC 400 Family and 600/700 Family Processors.

Document Organization

• Chapter 1, Introduction to the PPC405, provides a general understanding of the PPC405 as an implementation of the PowerPC embedded-environment architecture. This chapter also contains an overview of the features supported by the PPC405.

• Chapter 2, Operational Concepts, introduces the processor operating modes, execution model, synchronization, operand conventions, and instruction conventions.

• Chapter 3, User Programming Model, describes the registers and instructions available to application software.

• Chapter 4, PPC405 Privileged-Mode Programming Model, introduces the registers and instructions available to system software.

• Chapter 5, Memory-System Management, describes the operation of the memory system, including caches. Real-mode storage control is also described in this chapter.

• Chapter 6, Virtual-Memory Management, describes virtual-to-physical address translation as supported by the PPC405. Virtual-mode storage control is also described in this chapter.

• Chapter 7, Exceptions and Interrupts, provides details of all exceptions recognized by the PPC405 and how software can use the interrupt mechanism to handle exceptions.

• Chapter 8, Timer Resources, describes the timer registers and timer-interrupt controls available in the PPC405.

• Chapter 9, Debugging, describes the debug resources available to software and hardware debuggers.

• Chapter 10, Reset and Initialization, describes the state of the PPC405 following reset and the requirements for initializing the processor.

• Chapter 11, Instruction Set, provides a detailed description of each instruction supported by the PPC405.

• Appendix A, Register Summary, is a reference of all registers supported by the PPC405.

• Appendix B, Instruction Summary, lists all instructions sorted by mnemonic, opcode, function, and form. Each entry for an instruction shows its complete encoding. General instruction-set information is also provided.

• Appendix C, Simplified Mnemonics, lists the simplified mnemonics recognized by many PowerPC assemblers. These mnemonics provide a shorthand means of specifying frequently-used instruction encodings and can greatly improve assembler code readability.

• Appendix D, Programming Considerations, provides information on improving performance of software written for the PPC405.

• Appendix E, PowerPC 6xx/7xx Compatibility, describes the programming model differences between the PPC405 and PowerPC 6xx and 7xx series processors.

• Appendix F, PowerPC Book-E Compatibility, describes the programming model differences between the PPC405 and PowerPC Book-E processors.

Document Conventions

General Conventions

Table P-1 lists the general notational conventions used throughout this document.

Table P-1: General Notational Conventions

Convention Definition

mnemonic Instruction mnemonics are shown in lower-case bold.

. (period) Update. When used as a character in an instruction mnemonic, a period (.) means that the instruction updates the condition-register field.

! (exclamation) In instruction listings, an exclamation (!) indicates the start of a comment.

variable Variable items are shown in italic.

<optional> Optional items are shown in angle brackets.

ActiveLow An overbar indicates an active-low signal.

n A decimal number.

0xn A hexadecimal number.

0bn A binary number.

(rn) The contents of GPR rn.

(rA|0) The contents of the register rA, or 0 if the rA instruction field is 0.

cr_bit Used in simplified mnemonics to specify a CR-bit position (0 to 31) used as an operand.

cr_field Used in simplified mnemonics to specify a CR field (0 to 7) used as an operand.

OBJECT_b A single bit in any object (a register, an instruction, an address, or a field) is shown as a subscripted number or name.

OBJECT_b:b A range of bits in any object (a register, an instruction, an address, or a field).

OBJECT_b,b, . . . A list of bits in any object (a register, an instruction, an address, or a field).

REGISTER[FIELD, FIELD, . . .] A list of fields in any register.

REGISTER[FIELD:FIELD] A range of fields in any register.

Instruction Fields

Table P-2 lists the instruction fields used in the various instruction formats. They are found in the instruction encodings and pseudocode, and are referred to throughout this document when describing instructions. The table includes the bit locations for the field within the instruction encoding.

Table P-2: Instruction Field Definitions

Field Location Description

AA 30 Absolute-address bit (branch instructions).

0: The immediate field represents an address relative to the current instruction address (CIA). The effective address (EA) of the branch is either the sum of the LI field sign-extended to 32 bits and the branch instruction address, or the sum of the BD field sign-extended to 32 bits and the branch instruction address.

1: The immediate field represents an absolute address. The EA of the branch is either the LI field or the BD field, sign-extended to 32 bits.

BD 16:29 An immediate field specifying a 14-bit signed two’s-complement branch displacement. This field is concatenated on the right with 0b00 and sign-extended to 32 bits.

BI 11:15 Specifies a bit in the CR used as a source for the condition of a conditional-branch instruction.

BO 6:10 Specifies options for conditional-branch instructions. See Conditional Branch Control, page 367.

crbA 11:15 Specifies a bit in the CR used as a source of a CR-logical instruction.

crbB 16:20 Specifies a bit in the CR used as a source of a CR-logical instruction.

crbD 6:10 Specifies a bit in the CR used as a destination of a CR-logical instruction.

crfD 6:8 Specifies a field in the CR used as a target in a compare or mcrf instruction.

crfS 11:13 Specifies a field in the CR used as a source in a mcrf instruction.

CRM 12:19 The field mask used to identify CR fields to be updated by the mtcrf instruction.

d 16:31 Specifies a 16-bit signed two’s-complement integer displacement for load/store instructions.

DCRF 11:20 A split field used to specify a device control register (DCR). The field is used to form the DCR number (DCRN).

E 16 A single-bit immediate field in the wrteei instruction specifying the value to be written to the MSR[EE] bit.

LI 6:29 An immediate field specifying a 24-bit signed two’s-complement branch displacement. This field is concatenated on the right with 0b00 and sign-extended to 32 bits.

LK 31 Link bit.

0: Do not update the link register (LR).

1: Update the LR with the address of the next instruction.

MB 21:25 Mask begin. Used in rotate-and-mask instructions to specify the beginning bit of a mask.

ME 26:30 Mask end. Used in rotate-and-mask instructions to specify the ending bit of a mask.

NB 16:20 Specifies the number of bytes to move in an immediate-string load or immediate-string store.

OE 21 Enables setting the OV and SO fields in the fixed-point exception register (XER).

OPCD 0:5 Primary opcode. Primary opcodes, in decimal, appear in the instruction format diagrams presented with individual instructions. The OPCD field name does not appear in instruction descriptions.

rA 11:15 Specifies a GPR source operand and/or destination operand.

rB 16:20 Specifies a GPR source operand.

Rc 31 Record bit.

0: Instruction does not update the CR.

1: Instruction updates the CR to reflect the result of an operation.

See Condition Register (CR), page 361 for a further discussion of how the CR bits are set.

rD 6:10 Specifies a GPR destination operand.

rS 6:10 Specifies a GPR source operand.

SH 16:20 Specifies a shift amount.

SIMM 16:31 An immediate field used to specify a 16-bit signed-integer value.

SPRF 11:20 A split field used to specify a special purpose register (SPR). The field is used to form the SPR number (SPRN).

TBRF 11:20 A split field used to specify a time-base register (TBR). The field is used to form the TBR number (TBRN).

TO 6:10 Specifies the trap conditions, as defined in the tw and twi instruction descriptions.

UIMM 16:31 An immediate field used to specify a 16-bit unsigned-integer value.

XO 21:30 Extended opcode for instructions without an OE field. Extended opcodes, in decimal, appear in the instruction format diagrams presented with individual instructions. The XO field name does not appear in instruction descriptions.

XO 22:30 Extended opcode for instructions with an OE field. Extended opcodes, in decimal, appear in the instruction format diagrams presented with individual instructions. The XO field name does not appear in instruction descriptions.
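The bit locations in Table P-2 count from the most-significant bit of the 32-bit instruction word (OPCD occupies bits 0:5). As a minimal sketch of how those locations map onto ordinary shift-and-mask operations, the Python below extracts a few fields from one instruction word. The helper names `field` and `split_field_number` are hypothetical and exist only for illustration. The swap of the two 5-bit halves of a split field is an assumption based on the usual PowerPC convention; this preface says only that the split field "is used to form" the register number.

```python
def field(instr, first, last):
    """Return bits first..last (inclusive) of a 32-bit word.

    Bit 0 is the most-significant bit, matching the locations in Table P-2.
    """
    width = last - first + 1
    return (instr >> (31 - last)) & ((1 << width) - 1)


def split_field_number(split):
    """Swap the two 5-bit halves of a 10-bit split field.

    Assumption: SPRN, DCRN, and TBRN are formed this way from SPRF, DCRF,
    and TBRF. Check the individual instruction descriptions before relying
    on it.
    """
    return ((split & 0x1F) << 5) | (split >> 5)


# Example word: 0x7C0802A6, commonly the encoding of "mfspr r0, LR".
instr = 0x7C0802A6

opcd = field(instr, 0, 5)      # primary opcode
rd = field(instr, 6, 10)       # rD, the destination GPR
sprf = field(instr, 11, 20)    # split SPR field
xo = field(instr, 21, 30)      # extended opcode (instruction has no OE field)
rc = field(instr, 31, 31)      # record bit
sprn = split_field_number(sprf)

print(opcd, rd, sprf, xo, rc, sprn)
```

Under these assumptions the example decodes to primary opcode 31, extended opcode 339, and SPR number 8.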

Pseudocode Conventions

Table P-3 lists additional conventions used primarily in the pseudocode describing the operation of each instruction.

Table P-3: Pseudocode Conventions

Convention Definition

← Assignment

∧ AND logical operator

¬ NOT logical operator

∨ OR logical operator

⊕ Exclusive-OR (XOR) logical operator

+ Two’s-complement addition

– Two’s-complement subtraction, unary minus

× Multiplication

÷ Division yielding a quotient

% Remainder of an integer division. For example, (33 % 32) = 1.

|| Concatenation

=, ≠ Equal, not-equal relations

<, > Signed comparison relations

<ᵘ, >ᵘ Unsigned comparison relations

c_0:3 A four-bit object used to store condition results in compare instructions.

ⁿb The bit or bit value b is replicated n times.

x Bit positions that are don’t-cares.

CEIL(n) Least integer ≥ n.

CIA Current instruction address. The 32-bit address of the instruction being described by a sequence of pseudocode. This address is used to set the next instruction address (NIA). Does not correspond to any architected register.

DCR(DCRN) A specific device control register, as indicated by DCRN.

DCRN The device control register number formed using the split DCRF field in a mfdcr or mtdcr instruction.

do Do loop. “to” and “by” clauses specify incrementing an iteration variable. “while” and “until” clauses specify terminating conditions. Indenting indicates the scope of a loop.

EA Effective address. The 32-bit address that specifies a location in main storage. Derived by applying indexing or indirect addressing rules to the specified operand.

EXTS(n) The result of extending n on the left with sign bits.

if...then...else... Conditional execution: if condition then a else b, where a and b represent one or more pseudocode statements. Indenting indicates the ranges of a and b. If b is null, the else does not appear.

instruction(EA) An instruction operating on a data-cache block or instruction-cache block associated with an EA.

leave Leave innermost do-loop or the do-loop specified by the leave statement.

MASK(MB,ME) Mask having 1’s in positions MB through ME (wrapping if MB > ME) and 0’s elsewhere.

MS(addr, n) The number of bytes represented by n at the location in main storage represented by addr.

NIA Next instruction address. The 32-bit address of the next instruction to be executed. In pseudocode, a successful branch is indicated by assigning a value to NIA. For instructions that do not branch, the NIA is CIA + 4.

RESERVE Reserve bit. Indicates whether a process has reserved a block of storage.

ROTL((RS),n) Rotate left. The contents of RS are shifted left the number of bits specified by n.

SPR(SPRN) A specific special-purpose register, as indicated by SPRN.

SPRN The special-purpose register number formed using the split SPRF field in a mfspr or mtspr instruction.

TBR(TBRN) A specific time-base register, as indicated by TBRN.

TBRN The time-base register number formed using the split TBRF field in a mftb instruction.
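Several of the pseudocode helpers in Table P-3 are easy to model on 32-bit values. The following Python is a rough sketch, not the manual's own definition: the function names `exts`, `rotl32`, `mask`, and `branch_target` are hypothetical, bit 0 is taken to be the most-significant bit as in Table P-2, and `branch_target` assumes the LI-field rule described for the AA bit (LI concatenated with 0b00, sign-extended, then added to CIA unless AA is 1).

```python
WORD = 0xFFFFFFFF  # 32-bit wrap-around mask


def exts(value, bits):
    """EXTS: sign-extend a bits-wide value to 32 bits (returned as unsigned)."""
    sign = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return ((value ^ sign) - sign) & WORD


def rotl32(value, n):
    """ROTL: rotate a 32-bit value left by n bit positions."""
    n %= 32
    value &= WORD
    return ((value << n) | (value >> (32 - n))) & WORD


def mask(mb, me):
    """MASK(MB,ME): ones in bit positions mb..me (bit 0 = MSB), wrapping if mb > me."""
    if mb <= me:
        width = me - mb + 1
        return (((1 << width) - 1) << (31 - me)) & WORD
    # Wrapped case: ones in mb..31 and in 0..me.
    return mask(mb, 31) | mask(0, me)


def branch_target(cia, li, aa):
    """Assumed NIA for a taken branch using the 24-bit LI field."""
    displacement = exts(li << 2, 26)   # LI || 0b00, then sign-extended
    base = 0 if aa else cia            # AA = 1 selects an absolute address
    return (base + displacement) & WORD


print(hex(mask(24, 31)), hex(mask(28, 3)))
print(hex(rotl32(0x80000001, 1)))
print(hex(branch_target(0x1000, 0xFFFFFF, 0)))
```

Returning unsigned values from `exts` keeps every intermediate result inside 32 bits, which mirrors how the register contents would look on the processor.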

Operator Precedence

Table P-4 lists the pseudocode operators and their associativity in descending order of precedence.

Table P-4: Operator Precedence

Operators Associativity

REGISTER_b, REGISTER[FIELD], function evaluation Left to right

ⁿb Right to left

¬, – (unary minus) Right to left

×, ÷ Left to right

+, – Left to right

|| Left to right

=, ≠, <, >, <ᵘ, >ᵘ Left to right

∧, ⊕ Left to right

∨ Left to right

← None

Registers

Table P-5 lists the PPC405 registers and their descriptive names.

Table P-5: PPC405 Registers

CCR0 Core-configuration register 0

CR Condition register

CTR Count register

DACn Data-address compare n

DBCRn Debug-control register n

DBSR Debug-status register

DCCR Data-cache cacheability register

DCWR Data-cache write-through register

DEAR Data-error address register

DVCn Data-value compare n

ESR Exception-syndrome register

EVPR Exception-vector prefix register

GPR General-purpose register. Specific GPRs are identified using the notational convention rn (see below)

IACn Instruction-address compare n

ICCR Instruction-cache cacheability register

ICDBDR Instruction-cache debug-data register

LR Link register

MSR Machine-state register

PID Process ID

PIT Programmable-interval timer

PVR Processor-version register

rn Specifies GPR n (r15, for example)

SGR Storage-guarded register

SLER Storage little-endian register

SPRGn SPR general-purpose register n

SRRn Save/restore register n

SU0R Storage user-defined 0 register

TBL Time-base lower

TBU Time-base upper

TCR Timer-control register

TSR Timer-status register

USPRGn User SPR general-purpose register n

XER Fixed-point exception register

ZPR Zone-protection register

Terms

atomic access

A memory access that attempts to read from and write to the same address uninterrupted by other accesses to that address. The term refers to the fact that such transactions are indivisible.

big endian

A memory byte ordering where the address of an item corresponds to the most-significant byte.

Book-E

A version of the PowerPC architecture designed specifically for embedded applications.

cache block

Synonym for cacheline.

cacheline

A portion of a cache array that contains a copy of contiguous system-memory addresses. Cachelines are 32 bytes long and aligned on a 32-byte address.

clear

To write a bit value of 0.

cache set

Synonym for congruence class.

congruence class

A collection of cachelines with the same index.

dirty

An indication that cache information is more recent than the copy in memory.

doubleword

Eight bytes, or 64 bits.

effective address

The untranslated memory address as seen by a program.

exception

An abnormal event or condition that requires the processor’s attention. Exceptions can be caused by instruction execution or by an external device. The processor records the occurrence of an exception, and an exception often causes an interrupt to occur.

fill buffer

A buffer that receives and sends data and instructions between the processor and PLB. It is used when cache misses occur and when access to non-cacheable memory occurs.

flush

A cache or TLB operation that involves writing back a modified entry to memory, followed by an invalidation of the entry.

GB

Gigabyte, or one-billion bytes.

halfword

Two bytes, or 16 bits.

hit

For cache arrays and TLB arrays, an indication that requested information exists in the accessed array.

interrupt

The process of stopping the currently executing program so that an exception can be handled.

invalidate

A cache or TLB operation that causes an entry to be marked as invalid. An invalid entry can be subsequently replaced.

KB

Kilobyte, or one-thousand bytes.

line buffer

A buffer located in the cache array that can temporarily hold the contents of an entire cacheline. It is loaded with the contents of a cacheline when a cache hit occurs.

little endian

A memory byte ordering where the address of an item corresponds to the least-significant byte.

logical address

Synonym for effective address.

MB

Megabyte, or one-million bytes.

memory

Collectively, cache memory and system memory.

miss

For cache arrays and TLB arrays, an indication that requested information does not exist in the accessed array.

OEA

The PowerPC operating-environment architecture, which defines the memory-management model, supervisor-level registers and instructions, synchronization requirements, the exception model, and the time-base resources as seen by supervisor programs.

on chip

In system-on-chip implementations, this indicates on the same chip as the processor core, but external to the processor core.

pending

As applied to interrupts, this indicates that an exception occurred, but the interrupt is disabled. The interrupt occurs when it is later enabled.

physical address

The address used to access physically-implemented memory. This address can be translated from the effective address. When address translation is not used, this address is equal to the effective address.

PLB

Processor local bus.

privileged mode

The operating mode typically used by system software. Privileged operations are allowed and software can access all registers and memory.

process

A program (or portion of a program) and any data required for the program to run.

problem state

Synonym for user mode.

real address

Synonym for physical address.

scalar

Individual data objects and instructions. Scalars are of arbitrary size.

set

To write a bit value of 1.

sticky

A bit that can be set by software, but cleared only by the processor. Alternatively, a bit that can be cleared by software, but set only by the processor.

string

A sequence of consecutive bytes.

supervisor state

Synonym for privileged mode.

system memory

Physical memory installed in a computer system external to the processor core, such as RAM, ROM, and flash.

tag

As applied to caches, a set of address bits used to uniquely identify a specific cacheline within a congruence class. As applied to TLBs, a set of address bits used to uniquely identify a specific entry within the TLB.

UISA

The PowerPC user instruction-set architecture, which defines the base user-level instruction set, registers, data types, the memory model, the programming model, and the exception model as seen by user programs.

user mode

The operating mode typically used by application software. Privileged operations are not allowed in user mode, and software can access a restricted set of registers and memory.

VEA

The PowerPC virtual-environment architecture, which defines a multi-access memory model, the cache model, cache-control instructions, and the time-base resources as seen by user programs.

virtual address

An intermediate address used to translate an effective address into a physical address. It consists of a process ID and the effective address. It is only used when address translation is enabled.

word

Four bytes, or 32 bits.

Additional Reading

In addition to the source documents listed at the beginning of this preface, the following documents contain additional information of potential interest to readers of this manual:

• The PowerPC Architecture: A Specification for a New Family of RISC Processors, IBM 5/1994. Published by Morgan Kaufmann Publishers, Inc., San Francisco (ASIN: 1558603166).

• Book E: Enhanced PowerPC Architecture, IBM 3/2000.

• The PowerPC Compiler Writer’s Guide, IBM 1/1996. Published by Warthman Associates, Palo Alto, CA (ISBN 0-9649654-0-2).

• Optimizing PowerPC Code: Programming the PowerPC Chip in Assembly Language, by Gary Kacmarcik (ASIN: 0201408392).

• PowerPC Programming Pocket Book, by Steve Heath (ISBN 0750621117).

• Computer Architecture: A Quantitative Approach, by John L. Hennessy and David A. Patterson.

Chapter 1

Introduction to the PPC405

The PPC405 is a 32-bit implementation of the PowerPC® embedded-environment architecture that is derived from the PowerPC architecture. Specifically, the PPC405 is an embedded PowerPC 405D5 processor core.

The PowerPC architecture provides a software model that ensures compatibility between implementations of the PowerPC family of microprocessors. The PowerPC architecture defines parameters that guarantee compatible processor implementations at the application-program level, allowing broad flexibility in the development of derivative PowerPC implementations that meet specific market requirements.

This chapter provides an overview of the PowerPC architecture and an introduction to the features of the PPC405 core.

PowerPC Architecture Overview

The PowerPC architecture is a 64-bit architecture with a 32-bit subset. The material in this document only covers aspects of the 32-bit architecture implemented by the PPC405.

In general, the PowerPC architecture defines the following:

• Instruction set

• Programming model

• Memory model

• Exception model

• Memory-management model

• Time-keeping model

Instruction Set

The instruction set specifies the types of instructions (such as load/store, integer arithmetic, and branch instructions), the specific instructions, and the encoding used for the instructions. The instruction set definition also specifies the addressing modes used for accessing memory.

Programming Model

The programming model defines the register set and the memory conventions, including details regarding the bit and byte ordering, and the conventions for how data are stored.

Memory Model

The memory model defines the address-space size and how it is subdivided into pages. It also defines attributes for specifying memory-region cacheability, byte ordering (big-endian or little-endian), coherency, and protection.
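As a small illustration of the two byte orderings named above, the Python sketch below lays out one 32-bit value both ways. The value `0x11223344` and the variable names are arbitrary choices for the example; index 0 of each byte string stands for the lowest memory address of the item.

```python
value = 0x11223344

# Big endian: the item's address holds the most-significant byte.
big = value.to_bytes(4, byteorder="big")

# Little endian: the item's address holds the least-significant byte.
little = value.to_bytes(4, byteorder="little")

lowest_big = big[0]        # byte stored at the item's address, big endian
lowest_little = little[0]  # byte stored at the item's address, little endian

print(big.hex(), little.hex())
```

Reading the bytes back with the matching ordering (`int.from_bytes(big, "big")`) recovers the original value; reading them with the wrong ordering yields the byte-reversed value.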

Exception Model

The exception model defines the set of exceptions and the conditions that can cause those exceptions. The model specifies exception characteristics, such as whether they are precise or imprecise, synchronous or asynchronous, and maskable or non-maskable. The model defines the exception vectors and a set of registers used when interrupts occur as a result of an exception. The model also provides memory space for implementation-specific exceptions.

Memory-Management Model

The memory-management model defines how memory is partitioned, configured, and protected. The model also specifies how memory translation is performed, defines special memory-control instructions, and specifies other memory-management characteristics.

Time-Keeping Model

The time-keeping model defines resources that permit the time of day to be determined and the resources and mechanisms required for supporting timer-related exceptions.

PowerPC Architecture Levels

The above aspects of the PowerPC architecture are defined at three levels. This layering provides flexibility by allowing degrees of software compatibility across a wide range of implementations. For example, an implementation such as an embedded controller can support the user instruction set, but not the memory management, exception, and cache models where it might be impractical to do so.

The three levels of the PowerPC architecture are defined in Table 1-1.

Table 1-1: Three Levels of PowerPC Architecture

User Instruction-Set Architecture (UISA)

• Defines the architecture level to which user-level (sometimes referred to as problem state) software should conform

• Defines the base user-level instruction set, user-level registers, data types, floating-point memory conventions, exception model as seen by user programs, memory model, and the programming model

Note: All PowerPC implementations adhere to the UISA.

Virtual Environment Architecture (VEA)

• Defines additional user-level functionality that falls outside typical user-level software requirements

• Describes the memory model for an environment in which multiple devices can access memory

• Defines aspects of the cache model and cache-control instructions

• Defines the time-base resources from a user-level perspective

Note: Implementations that conform to the VEA level are guaranteed to conform to the UISA level.

Operating Environment Architecture (OEA)

• Defines supervisor-level resources typically required by an operating system

• Defines the memory-management model, supervisor-level registers, synchronization requirements, and the exception model

• Defines the time-base resources from a supervisor-level perspective

Note: Implementations that conform to the OEA level are guaranteed to conform to the UISA and VEA levels.

The PowerPC architecture requires that all PowerPC implementations adhere to the UISA, offering compatibility among all PowerPC application programs. However, different versions of the VEA and OEA are permitted.

Embedded applications written for the PPC405 are compatible with other PowerPC implementations. Privileged software generally is not compatible. The migration of privileged software from the PowerPC architecture to the PPC405 is in many cases straightforward because of the simplifications made by the PowerPC embedded-environment architecture. Software developers who are concerned with cross-compatibility of privileged software between the PPC405 and other PowerPC implementations should refer to Appendix E, PowerPC 6xx/7xx Compatibility.

Latitude Within the PowerPC Architecture Levels

Although the PowerPC architecture defines parameters necessary to ensure compatibility among PowerPC processors, it also allows a wide range of options for individual implementations. These are:

• Some resources are optional, such as certain registers, bits within registers, instructions, and exceptions.

• Implementations can define additional privileged special-purpose registers (SPRs), exceptions, and instructions to meet special system requirements, such as power management in processors designed for very low-power operation.

• Implementations can define many operating parameters. For example, the PowerPC architecture can define the possible condition causing an alignment exception. A particular implementation can choose to solve the alignment problem without causing an exception.

• Processors can implement any architectural resource or instruction with assistance from software (that is, they can trap and emulate) as long as the results (aside from performance) are identical to those specified by the architecture. In this case, a complete implementation requires both hardware and software.

• Some parameters are defined at one level of the architecture and defined more specifically at another. For example, the UISA defines conditions that can cause an alignment exception and the OEA specifies the exception itself.

Features Not Defined by the PowerPC Architecture

Because flexibility is an important feature of the PowerPC architecture, many aspects of processor design (typically relating to the hardware implementation) are not defined, including the following:

System-Bus Interface

Although many implementations can share similar interfaces, the PowerPC architecture does not define individual signals or the bus protocol. For example, the OEA allows each implementation to specify the signal or signals that trigger a machine-check exception.

Cache Design

The PowerPC architecture does not define the size, structure, replacement algorithm, or mechanism used for maintaining cache coherency. The PowerPC architecture supports, but does not require, the use of separate instruction and data caches.

Execution Units

The PowerPC architecture is a RISC architecture, and as such has been designed to facilitate the design of processors that use pipelining and parallel execution units to maximize instruction throughput. However, the PowerPC architecture does not define the internal hardware details of an implementation. For example, one processor might implement two units dedicated to executing integer-arithmetic instructions and another might implement a single unit for executing all integer instructions.

Other Internal Microarchitecture Issues

The PowerPC architecture does not specify the execution unit responsible for executing a particular instruction. The architecture does not define details regarding the instruction-fetch mechanism, how instructions are decoded and dispatched, and how results are written to registers. Dispatch and write-back can occur in-order or out-of-order. Although the architecture specifies certain registers, such as the GPRs and FPRs, implementations can use register renaming or other schemes to reduce the impact of data dependencies and register contention.

Implementation-Specific Registers

Each implementation can have its own unique set of implementation registers that are not defined by the architecture.

PowerPC Embedded-Environment Architecture

The PowerPC embedded-environment architecture is optimized for embedded controllers. This architecture is a forerunner to the PowerPC Book-E architecture. The PowerPC embedded-environment architecture provides an alternative definition for certain features specified by the PowerPC VEA and OEA. Implementations that adhere to the PowerPC embedded-environment architecture also adhere to the PowerPC UISA. PowerPC embedded-environment processors are 32-bit-only implementations and thus do not include the special 64-bit extensions to the PowerPC UISA. Also, floating-point support can be provided either in hardware or software by PowerPC embedded-environment processors.

Figure 1-1 shows the relationship between the PowerPC embedded-environment architecture, the PowerPC architecture, and the PowerPC Book-E architecture.

[Figure: diagram relating the three architectures. Labels recovered from the diagram:]

• UISA

• PowerPC Embedded-Environment Architecture: 32-bit only; VEA enhancements (true little-endian support, enhanced cache management); OEA enhancements (simplified memory management with a software-managed TLB and variable page sizes; interrupt extensions, critical/non-critical and virtual-memory relocatable; timer extensions; debug extensions)

• PowerPC Book-E Architecture: 64-bit UISA extensions; synchronization using memory barriers

• PowerPC Architecture: 32-bit/64-bit modes; OEA (hashed paging; segments, BATs)

Figure 1-1: Relationship of PowerPC Architectures

The PowerPC embedded-environment architecture features:

• Memory management optimized for embedded software environments.

• Cache-management instructions for optimizing performance and memory control in complex applications that are graphically and numerically intensive.

• Storage attributes for controlling memory-system behavior.

• Special-purpose registers for controlling the use of debug resources, timer resources, interrupts, real-mode storage attributes, memory-management facilities, and other architected processor resources.

• A device-control-register address space for managing on-chip peripherals such as memory controllers.

• A dual-level interrupt structure and interrupt-control instructions.

• Multiple timer resources.

• Debug resources that enable hardware-debug and software-debug functions such as instruction breakpoints, data breakpoints, and program single-stepping.

Virtual Environment

The virtual environment defines architectural features that enable application programs to create or modify code, to manage storage coherency, and to optimize memory-access performance. It defines the cache and memory models, the timekeeping resources from a user perspective, and resources that are accessible in user mode but are primarily used by system-library routines. The following summarizes the virtual-environment features of the PowerPC embedded-environment architecture:

• Storage model:

- Storage-control instructions as defined in the PowerPC virtual-environment architecture. These instructions are used to manage instruction caches and data caches, and for synchronizing and ordering instruction execution.

- Storage attributes for controlling memory-system behavior. These are: write-through, cacheability, memory coherence (optional), guarded, and endian.

- Operand-placement requirements and their effect on performance.

• The time-base function as defined by the PowerPC virtual-environment architecture, for user-mode read access to the 64-bit time base.

Operating Environment

The operating environment describes features of the architecture that enable operating systems to allocate and manage storage, to handle errors encountered by application programs, to support I/O devices, and to provide operating-system services. It specifies the resources and mechanisms that require privileged access, including the memory-protection and address-translation mechanisms, the exception-handling model, and privileged timer resources. Table 1-2 summarizes the operating-environment features of the PowerPC embedded-environment architecture.

Table 1-2: Operating-Environment Features of the PowerPC Embedded-Environment Architecture

Register model

• Privileged special-purpose registers (SPRs) and instructions for accessing those registers

• Device control registers (DCRs) and instructions for accessing those registers

Storage model

• Privileged cache-management instructions

• Storage-attribute controls

• Address translation and memory protection

• Privileged TLB-management instructions

Exception model

• Dual-level interrupt structure supporting various exception types

• Specification of interrupt priorities and masking

• Privileged SPRs for controlling and handling exceptions

• Interrupt-control instructions

• Specification of how partially executed instructions are handled when an interrupt occurs

Debug model

• Privileged SPRs for controlling debug modes and debug events

• Specification for seven types of debug events

• Specification for allowing a debug event to cause a reset

• The ability of the debug mechanism to freeze the timer resources

Time-keeping model

• 64-bit time base

• 32-bit decrementer (the programmable-interval timer)

• Three timer-event interrupts:

- Programmable-interval timer (PIT)

- Fixed-interval timer (FIT)

- Watchdog timer (WDT)

• Privileged SPRs for controlling the timer resources

• The ability to freeze the timer resources using the debug mechanism

Synchronization requirements

• Requirements for special registers and the TLB

• Requirements for instruction fetch and for data access

• Specifications for context synchronization and execution synchronization

Reset and initialization requirements

• Specification for two internal mechanisms that can cause a reset:

- Debug-control register (DBCR)

- Timer-control register (TCR)

• Contents of processor resources after a reset

• The software-initialization requirements, including an initialization code example


PowerPC Book-E Architecture

The PowerPC Book-E architecture extends the capabilities introduced in the PowerPC embedded-environment architecture. Although the PPC405 is not a PowerPC Book-E implementation, many of the features available in the 32-bit subset of the PowerPC Book-E architecture are available in the PPC405. The PowerPC Book-E architecture and the PowerPC embedded-environment architecture differ in the following general ways:

• 64-bit addressing and 64-bit operands are available. Unlike 64-bit mode in the PowerPC UISA, 64-bit support in PowerPC Book-E architecture is non-modal and instead defines new 64-bit instructions and flags.

• Real mode is eliminated, and the memory-management unit is active at all times. The elimination of real mode results in the elimination of real-mode storage-attribute registers.

• Memory synchronization requirements are changed in the architecture and a memory-barrier instruction is introduced.

• A small number of new instructions are added to the architecture and several instructions are removed.

• Several SPR addresses and names are changed in the architecture, as are the assignment and meanings of some bits within certain SPRs.

Embedded applications written for the PPC405 are compatible with PowerPC Book-E implementations. Privileged software is, in general, not compatible, but the differences are relatively minor. Software developers who are concerned with cross-compatibility of privileged software between the PPC405 and PowerPC Book-E implementations should refer to Appendix F, PowerPC Book-E Compatibility.

PPC405 Features

The PPC405 processor core is an implementation of the PowerPC embedded-environment architecture. The processor provides fixed-point embedded applications with high performance at low power consumption. It is compatible with the PowerPC UISA. Much of the PPC405 VEA and OEA support is also available in implementations of the PowerPC Book-E architecture. Key features of the PPC405 include:

• A fixed-point execution unit fully compliant with the PowerPC UISA:

- 32-bit architecture, containing thirty-two 32-bit general purpose registers (GPRs).

• PowerPC embedded-environment architecture extensions providing additional support for embedded-systems applications:

- True little-endian operation

- Flexible memory management

- Multiply-accumulate instructions for computationally intensive applications

- Enhanced debug capabilities

- 64-bit time base

- 3 timers: programmable interval timer (PIT), fixed interval timer (FIT), and watchdog timer (all are synchronous with the time base)

• Performance-enhancing features, including:

- Static branch prediction

- Five-stage pipeline with single-cycle execution of most instructions, including loads and stores

- Multiply-accumulate instructions

- Hardware multiply/divide for faster integer arithmetic (4-cycle multiply, 35-cycle divide)

- Enhanced string and multiple-word handling

- Support for unaligned loads and unaligned stores to cache arrays, main memory, and on-chip memory (OCM)

- Minimized interrupt latency

• Integrated instruction-cache:

- 16 KB, 2-way set associative

- Eight words (32 bytes) per cacheline

- Fetch line buffer

- Instruction-fetch hits are supplied from the fetch line buffer

- Programmable prefetch of next-sequential line into the fetch line buffer

- Programmable prefetch of non-cacheable instructions: full line (eight words) or half line (four words)

- Non-blocking during fetch line fills

• Integrated data-cache:

- 16 KB, 2-way set associative

- Eight words (32 bytes) per cacheline

- Read and write line buffers

- Load and store hits are supplied from/to the line buffers

- Write-back and write-through support

- Programmable load and store cacheline allocation

- Operand forwarding during cacheline fills

- Non-blocking during cacheline fills and flushes

• Support for on-chip memory (OCM) that can provide memory-access performance identical to a cache hit

• Flexible memory management:

- Translation of the 4 GB logical-address space into the physical-address space

- Independent control over instruction translation and protection, and data translation and protection

- Page-level access control using the translation mechanism

- Software control over the page-replacement strategy

- Write-through, cacheability, user-defined 0, guarded, and endian (WIU0GE) storage-attribute control for each virtual-memory region

- WIU0GE storage-attribute control for thirty-two 128 MB regions in real mode

- Additional protection control using zones

• Enhanced debug support with logical operators:

- Four instruction-address compares

- Two data-address compares

- Two data-value compares

- JTAG instruction for writing into the instruction cache

- Forward and backward instruction tracing

• Advanced power management support

Privilege Modes

Software running on the PPC405 can do so in one of two privilege modes: privileged and user. The privilege modes supported by the PPC405 are described in Processor Operating Modes, page 343.

Privileged Mode

Privileged mode allows programs to access all registers and execute all instructions supported by the processor. Normally, the operating system and low-level device drivers operate in this mode.

User Mode

User mode restricts access to some registers and instructions. Normally, application programs operate in this mode.

Address Translation Modes

The PPC405 also supports two modes of address translation: real and virtual. Refer to Chapter 6, Virtual-Memory Management, for more information on address translation.

Real Mode

In real mode, programs address physical memory directly.

Virtual Mode

In virtual mode, programs address virtual memory and virtual-memory addresses are translated by the processor into physical-memory addresses. This allows programs to access much larger address spaces than might be implemented in the system.

Addressing Modes

Whether the PPC405 is running in real mode or virtual mode, data addressing is supported by the load and store instructions using one of the following addressing modes:

• Register-indirect with immediate index—A base address is stored in a register, and a displacement from the base address is specified as an immediate value in the instruction.

• Register-indirect with index—A base address is stored in a register, and a displacement from the base address is stored in a second register.

• Register indirect—The data address is stored in a register.

Instructions that use the two indexed forms of addressing also allow for automatic updates to the base-address register. With these instruction forms, the new data address is calculated, used in the load or store data access, and stored in the base-address register.

The data-addressing modes are described in Operand-Address Calculation, page 378.
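The three data-addressing modes and the update forms can be sketched as a small Python model. This is an illustration only, not PPC405 code: the `gpr` list, the function names, and the 16-bit signed width assumed for the immediate displacement are assumptions made for the example, and any special-case register encodings are ignored.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def ea_immediate_index(gpr, base_reg, displacement, update=False):
    """Register-indirect with immediate index: base register plus immediate."""
    ea = (gpr[base_reg] + sign_extend16(displacement)) & MASK32
    if update:
        gpr[base_reg] = ea  # update form: new address is stored in the base register
    return ea


def ea_register_index(gpr, base_reg, index_reg, update=False):
    """Register-indirect with index: base register plus a second register."""
    ea = (gpr[base_reg] + gpr[index_reg]) & MASK32
    if update:
        gpr[base_reg] = ea
    return ea


def ea_register_indirect(gpr, base_reg):
    """Register indirect: the register holds the data address itself."""
    return gpr[base_reg] & MASK32
```

In this model the update forms return the same address that they write back, which mirrors the description that the new address is both used for the access and stored in the base-address register.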

With sequential-instruction execution, the next-instruction address is calculated by adding four bytes to the current-instruction address. In the case of branch instructions, however, the next-instruction address is determined using one of four branch-addressing modes:

• Branch to relative: The next-instruction address is at a location relative to the current-instruction address.

• Branch to absolute: The next-instruction address is at an absolute location in memory.

• Branch to link register: The next-instruction address is stored in the link register.

• Branch to count register: The next-instruction address is stored in the count register.

The branch-addressing modes are described in Branch-Target Address Calculation, page 372.
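As a rough sketch of the next-instruction address calculation, the Python function below covers sequential execution and the four branch-addressing modes. The function name, the `mode` strings, and the parameter names are hypothetical and chosen only for this example; details such as address alignment are left out.

```python
MASK32 = 0xFFFFFFFF


def next_instruction_address(cia, mode="sequential", target=0, lr=0, ctr=0):
    """Return the next-instruction address for a given current address (cia).

    target is a signed displacement for "relative" and an address for "absolute".
    """
    if mode == "sequential":
        return (cia + 4) & MASK32      # instructions are four bytes long
    if mode == "relative":
        return (cia + target) & MASK32
    if mode == "absolute":
        return target & MASK32
    if mode == "link":
        return lr & MASK32             # address held in the link register
    if mode == "count":
        return ctr & MASK32            # address held in the count register
    raise ValueError("unknown branch-addressing mode: " + mode)
```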

Data Types

PPC405 instructions support byte, halfword, and word operands. Multiple-word operands are supported by the load/store multiple instructions and byte strings are supported by the load/store string instructions. Integer data is either signed or unsigned, and signed data is represented using two’s-complement format.

The address of a multi-byte operand is determined using the lowest memory address occupied by that operand. For example, if the four bytes in a word operand occupy addresses 4, 5, 6, and 7, the word address is 4. The PPC405 supports both big-endian (an operand’s most-significant byte is at the lowest memory address) and little-endian (an operand’s least-significant byte is at the lowest memory address) addressing.

See Operand Conventions, page 347, for more information on the supported data types and byte ordering.
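The byte-ordering and two's-complement conventions can be illustrated with a short Python model of memory. The helper names and the `bytearray` standing in for memory are assumptions for this sketch; it reuses the word at addresses 4 through 7 from the example in the text.

```python
def store_word(memory, address, value, little_endian=False):
    """Store a 32-bit word starting at its lowest byte address."""
    order = "little" if little_endian else "big"
    memory[address:address + 4] = (value & 0xFFFFFFFF).to_bytes(4, order)


def load_word(memory, address, little_endian=False):
    """Load a 32-bit word from its lowest byte address."""
    order = "little" if little_endian else "big"
    return int.from_bytes(memory[address:address + 4], order)


def as_signed(value, bits=32):
    """Interpret an unsigned bit pattern as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value
```

Storing 0x11223344 at address 4 places 0x11 at address 4 in big-endian order and 0x44 at address 4 in little-endian order; in both cases the word address is 4.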

Register Set Summary

Figure 1-2, page 333, shows the registers contained in the PPC405. Descriptions of the registers are in the following sections.

[Figure: register diagram. Registers recovered from the diagram:]

User Registers

• General-Purpose Registers: r0 through r31

• Condition Register

• Fixed-Point Exception Register: XER

• Link Register

• Count Register: CTR

• User-SPR General-Purpose Registers: USPRG0

• SPR General-Purpose Registers (read only): SPRG4, SPRG5, SPRG6, SPRG7

• Time-Base Registers (read only): TBU, TBL

Privileged Registers

• Machine-State Register: MSR

• Core-Configuration Register: CCR0

• SPR General-Purpose Registers: SPRG0, SPRG1, SPRG2, SPRG3, SPRG4, SPRG5, SPRG6, SPRG7

• Exception-Handling Registers: EVPR, ESR, DEAR, SRR0, SRR1, SRR2, SRR3

• Memory-Management Registers: PID, ZPR

• Storage-Attribute Control Registers: DCCR, DCWR, ICCR, SGR, SLER, SU0R

• Debug Registers: DBSR, DBCR0, DBCR1, DAC1, DAC2, DVC1, DVC2, IAC1, IAC2, IAC3, IAC4, ICDBR

• Timer Registers: TCR, TSR, PIT

• Processor-Version Register: PVR

• Time-Base Registers: TBU, TBL

Figure 1-2: PPC405 Registers

General-Purpose Registers

The processor contains thirty-two 32-bit general-purpose registers (GPRs), identified as r0 through r31. The contents of the GPRs are read from memory using load instructions and written to memory using store instructions. Computational instructions often read operands from the GPRs and write their results in GPRs. Other instructions move data between the GPRs and other registers. GPRs can be accessed by all software. See General-Purpose Registers (GPRs), page 360, for more information.

Special-Purpose Registers

The processor contains a number of 32-bit special-purpose registers (SPRs). SPRs provide access to additional processor resources, such as the count register, the link register, debug resources, timers, interrupt registers, and others. Most SPRs are accessed only by privileged software, but a few, such as the count register and link register, are accessed by all software. See User Registers, page 359, and Privileged Registers, page 429 for more information.

Machine-State Register

The 32-bit machine-state register (MSR) contains fields that control the operating state of the processor. This register can be accessed only by privileged software. See Machine-State

Condition Register

The 32-bit condition register (CR) contains eight 4-bit fields, CR0–CR7. The values in the CR fields can be used to control conditional branching. Arithmetic instructions can set CR0 and compare instructions can set any CR field. Additional instructions are provided to perform logical operations and tests on CR fields and bits within the fields. The CR can be accessed by all software. See Condition Register (CR), page 361, for more information.
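The layout of eight 4-bit fields in a 32-bit register can be sketched as follows. The example assumes the usual PowerPC bit numbering, in which CR0 occupies the four most-significant bits and CR7 the four least-significant bits; the function names are hypothetical.

```python
def cr_field(cr, n):
    """Return the 4-bit value of field CRn from a 32-bit CR image."""
    if not 0 <= n <= 7:
        raise ValueError("CR field number must be 0..7")
    return (cr >> (28 - 4 * n)) & 0xF


def set_cr_field(cr, n, value):
    """Return a new 32-bit CR image with field CRn replaced by value."""
    if not 0 <= n <= 7:
        raise ValueError("CR field number must be 0..7")
    shift = 28 - 4 * n
    cleared = cr & ~(0xF << shift) & 0xFFFFFFFF
    return cleared | ((value & 0xF) << shift)
```

A compare instruction that targets CR7, for example, would only change the low four bits of the register image in this model, leaving CR0 through CR6 untouched.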

Device Control Registers

The 32-bit device control registers (not shown) are used to configure, control, and report status for various external devices that are not part of the PPC405 processor. Although the DCRs are not part of the PPC405 implementation, they are accessed using the mtdcr and mfdcr instructions. The DCRs can be accessed only by privileged software. See the PPC405 Processor Block Manual for more information on implementing DCRs.

PPC405 Organization

As shown in Figure 1-3, the PPC405 processor contains the following elements:

• A 5-stage pipeline consisting of fetch, decode, execute, write-back, and load write-back stages

• A virtual-memory-management unit that supports multiple page sizes and a variety of storage-protection attributes and access-control options

• Separate instruction-cache and data-cache units

• Debug support, including a JTAG interface

• Three programmable timers

The following sections provide an overview of each element.

[Figure: block diagram of the PPC405. Labels recovered from the diagram:]

• Cache units: instruction-cache unit (I-cache array, I-cache controller) and data-cache unit (D-cache array, D-cache controller)

• MMU: instruction shadow-TLB (4-entry), unified TLB (64-entry), data shadow-TLB (8-entry)

• CPU: fetch and decode logic, 3-element fetch queue, execute unit (32x32 GPR, ALU, MAC)

• Timers and debug logic

• Interfaces: PLB master read interface (instruction side), PLB master read and write interfaces (data side), instruction OCM, data OCM, external-interrupt controller interface, JTAG, instruction trace

Figure 1-3: PPC405 Organization

Central-Processing Unit

The PPC405 central-processing unit (CPU) implements a 5-stage instruction pipeline consisting of fetch, decode, execute, write-back, and load write-back stages.

The fetch and decode logic sends a steady flow of instructions to the execute unit. All instructions are decoded before they are forwarded to the execute unit. Instructions are queued in the fetch queue if execution stalls. The fetch queue consists of three elements: two prefetch buffers and a decode buffer. If the prefetch buffers are empty, instructions flow directly to the decode buffer.

Up to two branches are processed simultaneously by the fetch and decode logic. If a branch cannot be resolved prior to execution, the fetch and decode logic predicts how that branch is resolved, causing the processor to speculatively fetch instructions from the predicted path. Branches with negative-address displacements are predicted as taken, as are branches that do not test the condition register or count register. The default prediction can be overridden by software at assembly or compile time. This capability is described further in Branch Prediction, page 370.
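The default prediction rule can be written as a small Python function. This is a sketch of the rule as stated in the paragraph above, not of the hardware: the function name and parameters are hypothetical, and the `override` argument is only a stand-in for the software override, whose actual encoding is covered in the Branch Prediction section.

```python
def predict_taken(displacement, conditional, override=None):
    """Static prediction for a branch that cannot be resolved early.

    displacement: signed branch displacement in bytes
    conditional:  True if the branch tests the condition or count register
    override:     None for the default rule, or True/False to force a prediction
    """
    if override is not None:
        return override
    if not conditional:
        return True                # branches that test nothing are predicted taken
    return displacement < 0        # backward branches (loops) are predicted taken
```

Under the default rule a loop-closing backward branch is predicted taken, while a forward conditional branch is predicted not taken unless software overrides it.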

The PPC405 has a single-issue execute unit containing the general-purpose register file (GPR), arithmetic-logic unit (ALU), and the multiply-accumulate unit (MAC). The GPRs consist of thirty-two 32-bit registers that are accessed by the execute unit using three read ports and two write ports. During the decode stage, data is read out of the GPRs for use by the execute unit. During the write-back stage, results are written to the GPR. The use of five read/write ports on the GPRs allows the processor to execute load/store operations in parallel with ALU and MAC operations.


The execute unit supports all 32-bit PowerPC UISA integer instructions in hardware, and is compliant with the PowerPC embedded-environment architecture specification. Floating-point operations are not supported.

The MAC unit supports implementation-specific multiply-accumulate instructions and multiply-halfword instructions. MAC instructions operate on either signed or unsigned 16-bit operands, and they store their results in a 32-bit GPR. These instructions can produce results using either modulo arithmetic or saturating arithmetic. All MAC instructions have a single cycle throughput. See Multiply-Accumulate Instruction-Set Extensions, page 405 for more information.
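The difference between modulo and saturating accumulation can be shown with a generic signed multiply-accumulate model. This sketch does not correspond to any particular PPC405 instruction; the function names are hypothetical, and only the signed case is modeled.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def wrap32(value):
    """Reduce an integer to a signed 32-bit value using modulo arithmetic."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value & 0x80000000 else value


def mac_signed(acc, a, b, saturate=False):
    """Return acc + a*b for signed 16-bit a and b and a signed 32-bit acc."""
    for operand in (a, b):
        if not -2**15 <= operand < 2**15:
            raise ValueError("operands must fit in 16 signed bits")
    total = acc + a * b
    if saturate:
        return max(INT32_MIN, min(INT32_MAX, total))  # clamp at the 32-bit limits
    return wrap32(total)                              # wrap around on overflow
```

When the accumulator is already at the largest positive value, the modulo form wraps to a large negative number while the saturating form stays at the limit, which is usually the preferred behavior in signal-processing loops.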

Exception Handling Logic

Exceptions are divided into two classes: critical and noncritical. The PPC405 CPU services exceptions caused by error conditions, the internal timers, debug events, and the external interrupt controller (EIC) interface. Across the two classes, a total of 19 possible exceptions are supported, including the two provided by the EIC interface.

Each exception class has its own pair of save/restore registers. SRR0 and SRR1 are used for noncritical interrupts, and SRR2 and SRR3 are used for critical interrupts. The exception-return address and the machine state are written to these registers when an exception occurs, and they are automatically restored when an interrupt handler exits using the return-from-interrupt (rfi) or return-from-critical-interrupt (rfci) instruction. Use of separate save/restore registers allows the PPC405 to handle critical interrupts independently of noncritical interrupts.

See Chapter 7, Exceptions and Interrupts, for information on exception handling in the PPC405.
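The benefit of separate save/restore register pairs can be seen in a toy model of the dual-level interrupt structure. The class below is a simplified sketch: the class and method names, the vector addresses, and the treatment of the machine state as a single integer are all assumptions made for illustration.

```python
class InterruptModel:
    """Toy model of SRR0/SRR1 (noncritical) and SRR2/SRR3 (critical)."""

    def __init__(self, pc=0, msr=0):
        self.pc = pc
        self.msr = msr
        self.srr = [0, 0, 0, 0]   # SRR0, SRR1, SRR2, SRR3

    def take(self, critical, vector, new_msr=0):
        """Save the return address and machine state, then enter the handler."""
        base = 2 if critical else 0
        self.srr[base] = self.pc
        self.srr[base + 1] = self.msr
        self.pc, self.msr = vector, new_msr

    def rfi(self):
        """Return from a noncritical interrupt using SRR0/SRR1."""
        self.pc, self.msr = self.srr[0], self.srr[1]

    def rfci(self):
        """Return from a critical interrupt using SRR2/SRR3."""
        self.pc, self.msr = self.srr[2], self.srr[3]
```

Because a critical interrupt taken inside a noncritical handler writes only SRR2 and SRR3, the noncritical return state in SRR0 and SRR1 survives and a later rfi still returns to the original program.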


Memory Management Unit

The PPC405 supports 4 GB of flat (non-segmented) address space. The memory-management unit (MMU) provides address translation, protection functions, and storage-attribute control for this address space. The MMU supports demand-paged virtual memory using multiple page sizes of 1 KB, 4 KB, 16 KB, 64 KB, 256 KB, 1 MB, 4 MB and 16 MB. Multiple page sizes can improve memory efficiency and minimize the number of TLB misses. When supported by system software, the MMU provides the following functions:

• Translation of the 4 GB logical-address space into a physical-address space.

• Independent enabling of instruction translation and protection from that of data translation and protection.

• Page-level access control using the translation mechanism.

• Software control over the page-replacement strategy.

• Additional protection control using zones.

• Storage attributes for cache policy and speculative memory-access control.
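The eight supported page sizes listed above each differ from the previous one by a factor of four, starting at 1 KB. The Python sketch below generates them and shows how a larger page size reduces the number of translations needed to map a region. The function names are hypothetical.

```python
# 1 KB, 4 KB, 16 KB, 64 KB, 256 KB, 1 MB, 4 MB, 16 MB
PAGE_SIZES = [1024 << (2 * n) for n in range(8)]


def split_address(address, page_size):
    """Split a 32-bit address into (page number, offset within the page)."""
    if page_size not in PAGE_SIZES:
        raise ValueError("unsupported page size")
    offset_bits = page_size.bit_length() - 1
    return address >> offset_bits, address & (page_size - 1)


def pages_needed(region_bytes, page_size):
    """Number of pages (and so page translations) needed to map a region."""
    return -(-region_bytes // page_size)   # ceiling division
```

Mapping a 16 MB region takes 4096 translations with 4 KB pages but only one with a 16 MB page, which is why mixing page sizes can reduce TLB misses.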

The translation look-aside buffer (TLB) is used to control memory translation and protection. Each one of its 64 entries specifies a page translation. It is fully associative, and can simultaneously hold translations for any combination of page sizes. To prevent TLB contention between data and instruction accesses, a 4-entry instruction and an 8-entry data shadow-TLB are maintained by the processor transparently to software.

Software manages the initialization and replacement of TLB entries. The PPC405 includes instructions for managing TLB entries by software running in privileged mode. This capability gives significant control to system software over the implementation of a page replacement strategy. For example, software can reduce the potential for TLB thrashing or delays associated with TLB-entry replacement by reserving a subset of TLB entries for globally accessible pages or critical pages.
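A software replacement strategy that reserves part of the TLB, as suggested above, might look like the following model. It is a deliberately simplified sketch: the class names, the round-robin policy, and the entry format are assumptions, and process-ID matching, access control, and storage attributes are left out.

```python
class TlbEntry:
    def __init__(self, virt_base, phys_base, page_size):
        self.virt_base = virt_base
        self.phys_base = phys_base
        self.page_size = page_size


class Tlb:
    """Fully associative 64-entry TLB with a reserved, never-evicted region."""

    ENTRIES = 64

    def __init__(self, reserved=0):
        self.entries = [None] * self.ENTRIES
        self.reserved = reserved        # entries 0..reserved-1 hold critical pages
        self.next_victim = reserved     # round-robin pointer over the other entries

    def write(self, index, entry):
        """Place an entry at a chosen index (used for the reserved entries)."""
        self.entries[index] = entry

    def replace(self, entry):
        """Install an entry in the next non-reserved slot and return its index."""
        index = self.next_victim
        self.entries[index] = entry
        self.next_victim += 1
        if self.next_victim == self.ENTRIES:
            self.next_victim = self.reserved
        return index

    def translate(self, address):
        """Return the physical address, or None on a TLB miss."""
        for entry in self.entries:
            if entry and entry.virt_base <= address < entry.virt_base + entry.page_size:
                return entry.phys_base + (address - entry.virt_base)
        return None
```

Any entry can hold a page of any size, so a single reserved 16 MB entry can keep a kernel region permanently mapped while the remaining entries cycle through smaller application pages.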

Storage attributes are provided to control access of memory regions. When memory translation is enabled, storage attributes are maintained on a page basis and read from the TLB when a memory access occurs. When memory translation is disabled, storage attributes are maintained in storage-attribute control registers. A zone-protection register (ZPR) is provided to allow system software to override the TLB access controls without requiring the manipulation of individual TLB entries. For example, the ZPR can provide a simple method for denying read access to certain application programs.
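In real mode, the feature list earlier in this chapter notes that storage attributes are controlled for thirty-two 128 MB regions, which together cover the 4 GB address space. The sketch below shows how an address could select one bit of a 32-bit storage-attribute control register. The mapping of region 0 to the most-significant bit follows the usual PowerPC bit numbering and is an assumption here, as are the function names.

```python
REGION_SHIFT = 27   # 128 MB = 2**27 bytes, so 32 regions span 4 GB


def region_index(address):
    """Return which of the thirty-two 128 MB regions contains the address."""
    return (address & 0xFFFFFFFF) >> REGION_SHIFT


def attribute_for(address, control_register):
    """Return the attribute bit that applies to the address.

    Assumption: register bit 0 (the most-significant bit) controls region 0.
    """
    return (control_register >> (31 - region_index(address))) & 1
```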

Chapter 6, Virtual-Memory Management, describes these memory-management resources in detail.

Instruction and Data Caches

The PPC405 accesses memory through the instruction-cache unit (ICU) and data-cache unit (DCU). Each cache unit includes a PLB-master interface, cache arrays, and a cache controller. Hits into the instruction cache and data cache appear to the CPU as single-cycle memory accesses. Cache misses are handled as requests over the PLB bus to another PLB device, such as an external-memory controller.

The PPC405 implements separate instruction-cache and data-cache arrays. Each is 16 KB in size, is two-way set-associative, and operates using 8-word (32 byte) cachelines. The caches are non-blocking, allowing the PPC405 to overlap instruction execution with reads over the PLB (when cache misses occur).
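The cache geometry above implies 256 sets per cache (16 KB divided by two ways of 32 bytes). The following sketch derives that figure and splits an address into tag, set index, and byte offset. It assumes a conventional layout in which the low address bits select the byte and the next bits select the set; the constant and function names are hypothetical.

```python
CACHE_BYTES = 16 * 1024
WAYS = 2
LINE_BYTES = 32

SETS = CACHE_BYTES // (WAYS * LINE_BYTES)    # number of sets per cache
OFFSET_BITS = LINE_BYTES.bit_length() - 1    # bits selecting a byte in the line
INDEX_BITS = SETS.bit_length() - 1           # bits selecting a set


def decompose(address):
    """Split a 32-bit address into (tag, set index, byte offset)."""
    offset = address & (LINE_BYTES - 1)
    index = (address >> OFFSET_BITS) & (SETS - 1)
    tag = (address & 0xFFFFFFFF) >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset
```

Two addresses that differ only in their tag compete for the same set, and with two ways the cache can hold both lines at once before a replacement is needed.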

The cache controllers replace cachelines according to a least-recently used (LRU) replacement policy. When a cacheline fill occurs, the most-recently accessed line in the cache set is retained and the other line is replaced. The cache controller updates the LRU during a cacheline fill.
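For a two-way set, LRU replacement needs only one bit per set to record which way to replace next. The class below is a minimal model of that idea; the class name is hypothetical, and for simplicity it updates the LRU state on every access, which is an assumption that goes beyond the text above.

```python
class TwoWaySet:
    """One set of a two-way set-associative cache with a single LRU bit."""

    def __init__(self):
        self.tags = [None, None]
        self.lru = 0                 # way that will be replaced on the next miss

    def access(self, tag):
        """Look up a tag; fill on a miss. Returns True for a hit."""
        if tag in self.tags:
            way = self.tags.index(tag)
            hit = True
        else:
            way = self.lru           # replace the least-recently used way
            self.tags[way] = tag
            hit = False
        self.lru = 1 - way           # the other way is now least-recently used
        return hit
```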

The ICU supplies up to two instructions every cycle to the fetch and decode unit. The ICU can also forward instructions to the fetch and decode unit during a cacheline fill, minimizing execution stalls caused by instruction-cache misses. When the ICU is accessed, four instructions are read from the appropriate cacheline and placed temporarily in a line buffer. Subsequent ICU accesses check this line buffer for the requested instruction prior to accessing the cache array. This allows the ICU cache array to be accessed as little as once every four instructions, significantly reducing ICU power consumption.

The DCU can independently process load/store operations and cache-control instructions. The DCU can also dynamically reprioritize PLB requests to reduce the length of an execution stall. For example, if the DCU is busy with a low-priority request and a subsequent storage operation requested by the CPU is stalled, the DCU automatically increases the priority of the current (low-priority) request. The current request is thus finished sooner, allowing the DCU to process the stalled request sooner. The DCU can forward data to the execute unit during a cacheline fill, further minimizing execution stalls caused by data-cache misses.

Additional features allow programmers to tailor data-cache performance to a specific application. The DCU can function in write-back or write-through mode, as determined by the storage-control attributes. Loads and stores that do not allocate cachelines can also be specified. Inhibiting certain cacheline fills can reduce potential pipeline stalls and unwanted external-bus traffic.

See Chapter 5, Memory-System Management, for details on the operation and control of the PPC405 caches.

Timer Resources

The PPC405 contains a 64-bit time base and three timers. The time base is incremented synchronously using the CPU clock or an external clock source. The three timers are incremented synchronously with the time base. (See Chapter 8, Timer Resources, for more information on these features.) The three timers supported by the PPC405 are:

• Programmable Interval Timer

• Fixed Interval Timer

• Watchdog Timer

Programmable Interval Timer

The programmable interval timer (PIT) is a 32-bit register that is decremented at the time-base increment frequency. The PIT register is loaded with a delay value. When the PIT count reaches 0, a PIT interrupt occurs. Optionally, the PIT can be programmed to automatically reload the last delay value and begin decrementing again.
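The countdown and optional reload described above can be modeled in a few lines of C. This is a minimal sketch, not a description of the hardware: `pit_model` and `pit_tick` are hypothetical names, each call stands for one time-base increment, and the sketch assumes that a PIT holding 0 is stopped and raises no further interrupts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of the PIT (not a PPC405 register layout). */
typedef struct {
    uint32_t count;       /* current PIT value */
    uint32_t reload;      /* last delay value loaded by software */
    bool     auto_reload; /* reload and keep decrementing after reaching 0 */
} pit_model;

/* Advance the model by one time-base increment.
 * Returns true when the count reaches 0 (the PIT interrupt condition). */
static bool pit_tick(pit_model *pit)
{
    if (pit->count == 0)
        return false;             /* assumed: a PIT at 0 is stopped */
    pit->count--;
    if (pit->count != 0)
        return false;
    if (pit->auto_reload)
        pit->count = pit->reload; /* begin decrementing again */
    return true;
}
```

With a delay value of 3 and auto-reload enabled, the model reports an interrupt condition on every third call.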

Fixed Interval Timer

The fixed interval timer (FIT) causes an interrupt when a selected bit in the time-base register changes from 0 to 1. Programmers can select one of four predefined bits in the time-base for triggering a FIT interrupt.

Watchdog Timer

The watchdog timer causes a hardware reset when a selected bit in the time-base register changes from 0 to 1. Programmers can select one of four predefined bits in the time-base for triggering a reset, and the type of reset can be defined by the programmer.

Note: The time-base register alone does not cause interrupts to occur.
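Both the FIT and the watchdog timer react to a 0-to-1 transition of a single time-base bit. The following C sketch shows that edge detection only. The helper `tb_bit_rose` is hypothetical, and it assumes the 64-bit time base uses the same numbering as other registers in this manual (bit 0 is the most-significant bit, so bit 63 is the least-significant). The four selectable bit positions are not modeled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when time-base bit `bit` (0 = most significant, 63 = least
 * significant; must be <= 63) is 0 in `before` and 1 in `after`. */
static bool tb_bit_rose(uint64_t before, uint64_t after, unsigned bit)
{
    uint64_t mask = UINT64_C(1) << (63u - bit);
    return (before & mask) == 0 && (after & mask) != 0;
}
```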

Debug

The PPC405 debug resources include special debug modes that support the various types of debugging used during hardware and software development. These are:

• Internal-debug mode for use by ROM monitors and software debuggers

• External-debug mode for use by JTAG debuggers

• Debug-wait mode, which allows the servicing of interrupts while the processor appears to be stopped

• Real-time trace mode, which supports event triggering for real-time tracing

Debug events are supported that allow developers to manage the debug process. Debug modes and debug events are controlled using debug registers in the processor. The debug registers are accessed either through software running on the processor or through the JTAG port. The JTAG port can also be used for board tests.

The debug modes, events, controls, and interfaces provide a powerful combination of debug resources for hardware and software development tools. Chapter 9, Debugging, describes these resources in detail.

PPC405 Interfaces

The PPC405 provides a set of interfaces that supports the attachment of cores and user logic. The software resources used to manage the PPC405 interfaces are described in Core-Configuration Register, page 459. For information on the hardware operation, use, and electrical characteristics of these interfaces, refer to the PPC405 Processor Block Manual. The following interfaces are provided:

• Processor local bus interface

• Device control register interface

• Clock and power management interface

• JTAG port interface

• On-chip interrupt controller interface

• On-chip memory controller interface

Processor Local Bus

The processor local bus (PLB) interface provides a 32-bit address and three 64-bit data buses attached to the instruction-cache and data-cache units. Two of the 64-bit buses are attached to the data-cache unit, one supporting read operations and the other supporting write operations. The third 64-bit bus is attached to the instruction-cache unit to support instruction fetching.

Device Control Register

The device control register (DCR) bus interface supports the attachment of on-chip registers for device control. Software can access these registers using the mfdcr and mtdcr instructions.

Clock and Power Management

The clock and power-management interface supports several methods of clock distribution and power management.

JTAG Port

The JTAG port interface supports the attachment of external debug tools. Using the JTAG test-access port, a debug tool can single-step the processor and examine internal-processor state to facilitate software debugging. This capability complies with the IEEE 1149.1 specification for vendor-specific extensions, and is therefore compatible with standard JTAG hardware for boundary-scan system testing.

On-Chip Interrupt Controller

The on-chip interrupt controller interface is an external interrupt controller that combines asynchronous interrupt inputs from on-chip and off-chip sources and presents them to the core using a pair of interrupt signals (critical and noncritical). Asynchronous interrupt sources can include external signals, the JTAG and debug units, and any other on-chip peripherals.

On-Chip Memory Controller

An on-chip memory (OCM) interface supports the attachment of additional memory to the instruction and data caches that can be accessed at performance levels matching the cache arrays.

Chapter 2

Operational Concepts

This chapter describes the operational concepts governing the PPC405 programming model. These concepts include the execution and memory-access models, processor operating modes, memory organization and management, and instruction conventions.

Execution Model

From a software viewpoint, PowerPC® processors implement a sequential-execution model. That is, the processors appear to execute instructions in program order. Internally and invisible to software, PowerPC processors can execute instructions out-of-order and can speculatively execute instructions. The processor is responsible for maintaining an in-order execution state visible to software. The execution of an instruction sequence can be interrupted by an exception caused by one of the executing instructions or by an asynchronous event. The PPC405 does not support out-of-order instruction execution. However, the processor does support speculative instruction execution, typically by predicting the outcome of branch instructions.

As described in Ordering Memory Accesses, page 448, the PowerPC architecture specifies a weakly consistent memory model for shared-memory multiprocessor systems. The weakly consistent memory model allows system bus operations to be reordered dynamically. The goal of reordering bus operations is to reduce the effect of memory latency and improve overall performance. In single-processor systems, loads and stores can be reordered dynamically to allow efficient utilization of the processor bus. Loads can be performed speculatively to enhance the speculative-execution capabilities. This model provides an opportunity for significantly improved performance over a model that has stronger memory-consistency rules, but places the responsibility for access ordering on the programmer.

When a program requires strict instruction-execution ordering or memory-access ordering for proper execution, the programmer must insert the appropriate ordering or synchronization instructions into the program. These instructions are described in Synchronizing Instructions, page 424. The concept of synchronization is described in the Synchronization Operations section that follows.

The PPC405 supports many aspects of the weakly consistent model but not all of them. Specifically, the PPC405 does not provide hardware support for multiprocessor memory coherency and does not support speculative loads. If the order of memory accesses is important to the correct operation of a program, care must be taken in porting such a program from the PPC405 to a processor that supports multiprocessor memory coherency and speculative loads.

Synchronization Operations

Various forms of synchronizing operations can be used by programs executing on the PPC405 processor to control the behavior of instruction execution and memory accesses. Synchronizing operations fall into the following three categories:

• Context synchronization

• Execution synchronization

• Storage synchronization

Each synchronization category is described in the following sections. Instructions provided by the PowerPC architecture for synchronization purposes are described on page 424.

Context Synchronization

The state of the execution environment (privilege level, translation mode, and memory protection) defines a program’s context. An instruction or event is context synchronizing if the operation satisfies all of the following conditions:

• Instruction dispatch is halted when the operation is recognized by the processor. This means the instruction-fetch mechanism stops issuing (sending) instructions to the execution units.

• The operation is not initiated (for instructions, this means dispatched) until all prior instructions complete execution to a point where they report any exceptions they cause to occur. In the case of an instruction-synchronize (isync) instruction, the isync does not complete execution until all prior instructions complete execution to a point where they report any exceptions they cause to occur.

• All instructions that precede the operation complete execution in the context they were initiated. This includes privilege level, translation mode, and memory protection.

• All instructions following the operation complete execution in the new context established by the operation.

• If the operation is an exception, or directly causes an exception to occur (for example, the sc instruction causes a system-call exception), the operation is not initiated until all higher-priority exceptions are recognized by the exception mechanism.

The system-call instruction (sc), return-from-interrupt instructions (rfi and rfci), and most exceptions are examples of context-synchronizing operations.

Context-synchronizing operations do not guarantee that subsequent memory accesses are performed using the memory context established by previous instructions. When memory-access ordering must be enforced, storage-synchronizing instructions are required.

Execution Synchronization

An instruction is execution synchronizing if it satisfies the conditions of the first two items (as described above) for context synchronization:

• Instruction dispatch is halted when the operation is recognized by the processor. This means the instruction-fetch mechanism stops issuing (sending) instructions to the execution units.

• The operation is not initiated until all instructions in execution complete to a point where they report any exceptions they cause to occur. In the case of a synchronize (sync) instruction, the sync does not complete execution until all prior instructions complete execution to a point where they report any exceptions they cause to occur.

The sync and move-to machine-state register (mtmsr) instructions are examples of execution-synchronizing instructions.

All context-synchronizing instructions are execution synchronizing. However, unlike a context-synchronizing operation, there is no guarantee that subsequent instructions execute in the context established by an execution-synchronizing instruction. The new context becomes effective sometime after the execution-synchronizing instruction completes and before or during a subsequent context-synchronizing operation.

Storage Synchronization

The PowerPC architecture specifies a weakly consistent memory model for shared-memory multiprocessor systems. With this model, the order that the processor performs memory accesses, the order that those accesses complete in memory, and the order that those accesses are viewed as occurring by another processor can all differ. The PowerPC architecture supports storage-synchronizing operations that provide a capability for enforcing memory-access ordering, allowing programs to share memory. Support is also provided to allow programs executing on a processor to share memory with some other mechanism that can access memory, such as an I/O device.

Device control registers (DCRs) are treated as memory-mapped registers from a synchronization standpoint. Storage-synchronization operations must be used to enforce synchronization of DCR reads and writes.

Processor Operating Modes

The PowerPC architecture defines two levels of privilege, each with an associated processor operating mode:

• Privileged mode

• User mode

The processor operating mode is controlled by the privilege-level field in the machine-state register (MSR[PR]). When MSR[PR] = 0, the processor operates in privileged mode. When MSR[PR] = 1, the processor operates in user mode. MSR[PR] = 0 following reset, placing the processor in privileged mode. See Machine-State Register, page 431 for more information on this register.
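The MSR[PR] test can be expressed as a simple mask operation on a copy of the MSR. The sketch below is illustrative: `msr_is_user_mode` is a hypothetical helper, and the position of PR (bit 17 in the big-endian bit numbering used by this manual) is an assumption that is not stated in this section. See Machine-State Register, page 431 for the actual field layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: MSR[PR] is bit 17, where bit 0 is the most-significant bit. */
#define MSR_PR_BIT 17u

/* Convert a big-endian bit number (0..31) into a 32-bit mask. */
#define MSR_MASK(bit) (UINT32_C(1) << (31u - (bit)))

/* Returns true when the MSR value selects user mode (MSR[PR] = 1). */
static bool msr_is_user_mode(uint32_t msr)
{
    return (msr & MSR_MASK(MSR_PR_BIT)) != 0;
}
```

An MSR value of 0, as described for reset, yields privileged mode in this model.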

Attempting to execute a privileged instruction when in user mode causes a privileged-instruction program exception (see Program Interrupt (0x0700), page 511).

Throughout this book, the terms privileged and system are used interchangeably to refer to software that operates under the privileged-programming model. Likewise, the terms user and application are used to refer to software that operates under the user-programming model. Registers and instructions are defined as either privileged or user, indicating which of the two programming models they belong to. User registers and user instructions belong to both the user-programming and privileged-programming models.

Privileged Mode

Privileged mode allows programs to access all registers and execute all instructions supported by the processor. The privileged-programming model comprises the entire register set and instruction set supported by the PPC405. Operating systems are typically the only software that runs in privileged mode.

The registers available only in privileged mode are shown in Figure 4-1, page 430. Refer to the corresponding section describing each register for more information. The instructions available only in privileged mode are shown in Table 4-3, page 434. The operation of each instruction is described in Chapter 11, Instruction Set.

Privileged mode is sometimes referred to as supervisor state.

User Mode

User mode restricts access to some registers and instructions. The user-programming model comprises the register set and instruction set supported by the processor running in user mode, and is a subset of the privileged-programming model. Operating systems typically confine the execution of application programs to user mode, thereby protecting system resources and other software from the effects of errant applications.

The registers available in user mode are shown in Figure 3-1, page 360. Refer to the corresponding section in Chapter 3 for a description of each register. All instructions are available in user mode except as shown in Table 4-3, page 434.

User mode is sometimes referred to as problem state.

Memory Organization

PowerPC programs reference memory using an effective address computed by the processor when executing a load, store, branch, or cache-control instruction, and when fetching the next-sequential instruction. Depending on the address-relocation mode, this effective address is either used to directly access physical memory or is treated as a virtual address that is translated into physical memory.

Effective-Address Calculation

Programs reference memory using an effective address (also called a logical address). An effective address (EA) is the 32-bit unsigned sum computed by the processor when accessing memory, executing a branch instruction, or fetching the next-sequential instruction. An EA is often referred to as the next-instruction address (NIA) when it is used to fetch an instruction (sequentially or as the result of a branch). The input values and method used by the processor to calculate an EA depend on the instruction that is executed.

When accessing data in memory, effective addresses are calculated in one of the following ways:

• EA = (rA|0). This is referred to as register-indirect addressing.

• EA = (rA|0) + offset. This is referred to as register-indirect with immediate-index addressing.

• EA = (rA|0) + (rB). This is referred to as register-indirect with index addressing.

Note: In the above, the notation (rA|0) specifies the following:

If the rA instruction field is 0, the base address is 0. If the rA instruction field is not 0, the contents of register rA are used as the base address.
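The three data-address forms and the (rA|0) rule can be sketched in C as follows. The function names and the `gpr` array are hypothetical, and the sketch assumes the immediate offset is a signed 16-bit value that is sign-extended before the addition. Using `uint32_t` arithmetic gives the 32-bit unsigned sum directly.

```c
#include <stdint.h>

/* (rA|0): the base is 0 when the rA field is 0, otherwise the contents
 * of general-purpose register rA. */
static uint32_t base_or_zero(const uint32_t gpr[32], unsigned ra)
{
    return (ra == 0) ? 0u : gpr[ra];
}

/* EA = (rA|0): register-indirect addressing. */
static uint32_t ea_indirect(const uint32_t gpr[32], unsigned ra)
{
    return base_or_zero(gpr, ra);
}

/* EA = (rA|0) + offset: register-indirect with immediate index.
 * Assumption: the offset is a sign-extended 16-bit immediate. */
static uint32_t ea_immediate_index(const uint32_t gpr[32], unsigned ra,
                                   int16_t offset)
{
    return base_or_zero(gpr, ra) + (uint32_t)(int32_t)offset;
}

/* EA = (rA|0) + (rB): register-indirect with index. */
static uint32_t ea_index(const uint32_t gpr[32], unsigned ra, unsigned rb)
{
    return base_or_zero(gpr, ra) + gpr[rb];
}
```

Note that when the rA field is 0, the contents of register 0 are ignored, so `ea_immediate_index(gpr, 0, 8)` is 8 regardless of what `gpr[0]` holds.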

When instructions execute sequentially, the next-instruction effective address is the current-instruction address (CIA) + 4. This is because all instructions are four bytes long. When branching to a new address, the next-instruction effective address is calculated in one of the following ways:

• NIA = CIA + displacement—this is referred to as branch-to-relative addressing.

• NIA = displacement—this is referred to as branch-to-absolute addressing.

• NIA = (LR)—this is referred to as branch to link-register addressing.

• NIA = (CTR)—this is referred to as branch to count-register addressing.

When the NIA is calculated for a branch instruction, the two low-order bits (30:31) are always cleared to 0, forcing word-alignment of the address. This is true even when the address is contained in the LR or CTR, and the register contents are not word-aligned.

All effective-address computations are performed by the processor using unsigned binary arithmetic. Carries from bit 0 are ignored and the effective address wraps from the maximum address ($2^{32} - 1$) to address 0 when the calculation overflows.
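The next-instruction address rules, including the forced word alignment and the wrap at the top of the address space, can be sketched in C. The function names are hypothetical and the displacement is assumed to be already sign-extended to 32 bits. Unsigned 32-bit arithmetic in C discards the carry out of the most-significant bit, which matches the wrap behavior described above.

```c
#include <stdint.h>

/* Sequential execution: NIA = CIA + 4, wrapping modulo 2^32. */
static uint32_t nia_sequential(uint32_t cia)
{
    return cia + 4u;
}

/* Branch-to-relative: NIA = CIA + displacement, low two bits cleared.
 * Assumption: displacement is already sign-extended to 32 bits. */
static uint32_t nia_relative(uint32_t cia, int32_t displacement)
{
    return (cia + (uint32_t)displacement) & ~UINT32_C(3);
}

/* Branch-to-absolute: NIA = displacement, low two bits cleared. */
static uint32_t nia_absolute(int32_t displacement)
{
    return (uint32_t)displacement & ~UINT32_C(3);
}

/* Branch to a register-held target: the register contents are used
 * with bits 30:31 cleared, even if the register is not word-aligned. */
static uint32_t nia_from_register(uint32_t reg)
{
    return reg & ~UINT32_C(3);
}
```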

Physical Memory

Physical memory represents the address space of memory installed in a computer system, including memory-mapped I/O devices. Generally, the amount of physical memory actually available in a system is smaller than that supported by the processor. When address translation is supported by the operating system (as it is in virtual-memory systems), the very large virtual-address space is translated into the smaller physical-address space using the memory-management resources supported by the processor.

The PPC405 supports up to four gigabytes of physical memory using a 32-bit physical address. A hierarchical-memory system involving external (system) memory and the caches internal to the processor is employed to support that address space. The PPC405 supports separate level-1 (L1) caches for instructions and data. The operation and control of these caches is described in Chapter 5, Memory-System Management.

Virtual Memory

Virtual memory is a relocatable address space that is generally larger than the physical-memory space installed in a computer system. Operating systems relocate (map) applications and data in virtual memory so it appears that more memory is available than actually exists. Virtual memory software moves unused instructions and data between physical memory and external storage devices (such as a hard drive) when insufficient physical memory is available. The PPC405 supports a 40-bit virtual address that allows privileged software to manage a one-terabyte virtual-memory space.

Memory Management

Memory management describes the collection of mechanisms used to translate the addresses generated by programs into physical-memory addresses. Memory management also consists of the mechanisms used to characterize memory-region behavior, also referred to as storage control. Memory management is performed by privileged-mode software and is completely transparent to user-mode programs running in virtual mode.

The PPC405 is a PowerPC embedded-environment implementation. The memory-management resources defined by the PowerPC embedded-environment architecture (and its successor, the PowerPC Book-E architecture) differ significantly from the resources defined by the PowerPC architecture. The resources defined by the PowerPC embedded-environment architecture are well-suited for the special requirements of embedded-system applications. The resources defined by the PowerPC architecture better meet the requirements of desktop and commercial-workstation systems.

Generally, the differences between the two memory-management mechanisms are as follows:

• The PPC405 supports software page translation and provides special instructions for managing the page tables and the translation look-aside buffer (TLB) internal to the processor. The page-translation table format, organization, and search algorithms are software-dependent and transparent to the PPC405 processor. The PowerPC architecture, on the other hand, defines the page-translation table organization, format, and search algorithms. It does not define support for the special page table and TLB instructions but instead assumes the processor hardware is responsible for searching page tables and updating the TLB.

• The PPC405 supports variable-sized pages. The PowerPC architecture defines fixed-size pages of 4 KB.

• The PPC405 does not support the segment-translation mechanism defined by the PowerPC architecture.

• The PPC405 does not support the block-address-translation (BAT) mechanism defined by the PowerPC architecture.

• Additional storage-control attributes not defined by the PowerPC architecture are supported by the PPC405. The methods for using these attributes to characterize memory regions also differ.

At a high level, Figure 2-1 shows the differences between 32-bit memory management in the PowerPC embedded-environment architecture (and PowerPC Book-E architecture) and in the PowerPC architecture. See Chapter 6, Virtual-Memory Management for more information on the resources supported by the PPC405. Additional information on the differences with the PowerPC architecture is described in Appendix E, PowerPC 6xx/7xx Compatibility. PowerPC Book-E architecture extends the resources first defined by the PowerPC embedded-environment architecture. A description of those extensions is in Appendix F, PowerPC Book-E Compatibility.

[Figure 2-1 diagram not reproduced. Labels in the PowerPC Embedded Environment / PowerPC Book-E panel: 32-Bit Effective Address, PID, 40-Bit Virtual Address, Page Translation, 32-Bit Physical Address. Labels in the PowerPC Architecture panel: 32-Bit Effective Address, Segment Translation, 51-Bit Virtual Address, Page Translation, Block Address Translation, 32-Bit Physical Address.]

Figure 2-1: PowerPC 32-Bit Memory Management
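For the embedded-environment case, the figure indicates that a PID is combined with the 32-bit effective address to form the 40-bit virtual address. A minimal C sketch of that combination follows. It assumes an 8-bit PID placed in the high-order bits above the effective address, which is inferred from the 40-bit and 32-bit widths rather than stated in this section, and `virtual_address` is a hypothetical helper name. See Chapter 6, Virtual-Memory Management for the actual definition.

```c
#include <stdint.h>

/* Form a 40-bit virtual address from an (assumed) 8-bit PID and a
 * 32-bit effective address: VA = PID || EA. */
static uint64_t virtual_address(uint8_t pid, uint32_t ea)
{
    return ((uint64_t)pid << 32) | ea;
}
```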

Addressing Modes

Programs can use 32-bit effective addresses to reference the 4 GB physical-address space using one of two addressing modes:

• Real mode

• Virtual mode

Real mode and virtual mode are enabled and disabled independently for instruction fetches and data accesses. The instruction-fetch address mode is controlled using the instruction-relocate (IR) field in the machine-state register (MSR). When MSR[IR] = 0, instruction fetches are performed in real mode. When MSR[IR] = 1, instruction fetches are performed in virtual mode. Similarly, the data-access address mode is controlled using the data-relocate (DR) field in the MSR. When MSR[DR] = 0, data accesses are performed in real mode. Setting MSR[DR] = 1 enables virtual mode for data accesses. See Virtual Mode, page 472 for more information on these fields.
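Because the two fields are independent, the addressing mode has to be evaluated separately for instruction fetches and data accesses. The C sketch below illustrates this. The field positions (IR as bit 26 and DR as bit 27, with bit 0 as the most-significant bit) are assumptions not stated in this section, and the type and function names are hypothetical.

```c
#include <stdint.h>

typedef enum { ADDR_REAL, ADDR_VIRTUAL } addr_mode;

/* Assumed field positions in big-endian bit numbering: IR = 26, DR = 27. */
#define MSR_IR (UINT32_C(1) << (31 - 26)) /* 0x00000020 */
#define MSR_DR (UINT32_C(1) << (31 - 27)) /* 0x00000010 */

/* Addressing mode used for instruction fetches. */
static addr_mode fetch_mode(uint32_t msr)
{
    return (msr & MSR_IR) ? ADDR_VIRTUAL : ADDR_REAL;
}

/* Addressing mode used for data accesses (loads and stores). */
static addr_mode data_mode(uint32_t msr)
{
    return (msr & MSR_DR) ? ADDR_VIRTUAL : ADDR_REAL;
}
```

For example, an MSR value with only IR set gives virtual-mode instruction fetches while data accesses remain in real mode.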

Real Mode

In real mode, an effective address is used directly as the physical address into the 4 GB address space. Here, the logical-address space is mapped directly onto the physical-address space.

Virtual Mode

In virtual mode, address translation is enabled. Effective addresses are translated into physical addresses using the memory-management unit, as shown in Figure 2-1, page 346. In this mode, pages within the logical-address space are mapped onto pages in the physical-address space. An overview of memory management is provided in the preceding Memory Management section.

Operand Conventions

Bit positions within registers and memory operands (bytes, halfwords, and words) are numbered consecutively from left to right, starting with zero. The most-significant bit is always numbered 0. The number assigned to the least-significant bit depends on the size of the register or memory operand, as follows:

• Byte: the least-significant bit is numbered 7.

• Halfword: the least-significant bit is numbered 15.

• Word: the least-significant bit is numbered 31.

A bit set to 1 has a numerical value associated with its position (b) relative to the least-significant bit (lsb). This value is equal to $2^{(\mathrm{lsb}-b)}$. For example, if bit 5 is set to 1 in a byte, halfword, or word memory operand, its value is determined as follows:

• Byte: the value is $2^{(7-5)}$, or 4.

• Halfword: the value is $2^{(15-5)}$, or 1024.

• Word: the value is $2^{(31-5)}$, or 67108864.
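The bit-weight rule translates directly into a shift. In the following C sketch, `bit_value` is a hypothetical helper that reproduces the three worked values above. It assumes `b` is no greater than `lsb` and that `lsb` is at most 31.

```c
#include <stdint.h>

/* Numerical value of bit b (bit 0 is the most-significant bit) in an
 * operand whose least-significant bit is numbered lsb: 2^(lsb - b).
 * Precondition: b <= lsb <= 31. */
static uint32_t bit_value(unsigned lsb, unsigned b)
{
    return UINT32_C(1) << (lsb - b);
}
```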

Bytes in memory are addressed consecutively starting with zero. The PPC405 supports both big-endian and little-endian byte ordering, with big-endian being the default byte ordering. Bit ordering within bytes and registers is always big endian.

The operand length is implicit for each instruction. Memory operands can be bytes (eight bits), halfwords (two bytes), words (four bytes), or strings (one to 128 bytes). For the load/store multiple instructions, memory operands are a sequence of words. The address of any memory operand is the address of its first byte (that is, of its lowest-numbered byte).

Figure 2-2 shows how word, halfword, and byte operands appear in memory (using big-endian ordering) and in a register. The memory operand appears on the left in this diagram and the equivalent register representation appears on the right.

The following sections describe the concepts of byte ordering and data alignment, and their significance to the PowerPC PPC405.

[Figure 2-2 diagram not reproduced. Labels for the word, halfword, and byte panels: Memory Content (MSB, LSB, Byte 0 through Byte 3), Memory Address (0x00 through 0x04), and the register representation with Bit Number (0 through 31) and Bit Weight.]

Figure 2-2: Operand Data Types

Byte Ordering

The order that addresses are assigned to individual bytes within a scalar (a single data object or instruction) is referred to as endianness. Halfwords, words, and doublewords all consist of more than one byte, so it is important to understand the relationship between the bytes in a scalar and the addresses of those bytes. For example, when the processor loads a register with a value from memory, it needs to know which byte in memory holds the high-order byte, which byte holds the next-highest-order byte, and so on.

Computer systems generally use one of the following two byte orders to address data:

• Big-endian ordering assigns the lowest-byte address to the highest-order (“left-most”) byte in the scalar. The next sequential-byte address is assigned to the next-highest byte, and so on. The term “big endian” is used because the “big end” of the scalar (when considered as a binary number) comes first in memory.

• Little-endian ordering assigns the lowest-byte address to the lowest-order (“right-most”) byte in the scalar. The next sequential-byte address is assigned to the next-lowest byte, and so on. The term “little endian” is used because the “little end” of the scalar (when considered as a binary number) comes first in memory.

The following sections further describe the differences between big-endian and little-endian byte ordering. The default byte ordering assumed by the PPC405 is big-endian. However, the PPC405 also fully supports little-endian peripherals and memory.

Structure-Mapping Examples

The following C language structure, s, contains an assortment of scalars and a character string. The comments show the values assumed in each structure element. These values show how the bytes comprising each structure element are mapped into memory.

struct {
    int a;          /* 0x1112_1314 word */
    long long b;    /* 0x2122_2324_2526_2728 doubleword */
    char *c;        /* 0x3132_3334 word */
    char d[7];      /* 'A','B','C','D','E','F','G' array of bytes */
    short e;        /* 0x5152 halfword */
    int f;          /* 0x6162_6364 word */
} s;

C structure-mapping rules permit the use of padding (skipped bytes) to align scalars on desirable boundaries. The structure-mapping examples show how each scalar aligns on its natural boundary (the alignment boundary is equal to the scalar size). This alignment introduces padding of four bytes between a and b, one byte between d and e, and two bytes between e and f. The same amount of padding is present in both big-endian and little-endian mappings.
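The padding amounts can be checked with `offsetof`. The sketch below uses a stand-in structure, `struct s_layout`, rather than structure s itself, so that the result does not depend on the host compiler: fixed-width types replace `int`, `long long`, and `short`, the pointer member is modeled as a 32-bit integer, and member `b` is explicitly aligned to 8 bytes. These substitutions and the `PAD_BETWEEN` macro are assumptions made for illustration, and the code requires C11 for `_Alignas`.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for structure s with a layout fixed to match a 32-bit target
 * that aligns each scalar on its natural boundary. */
struct s_layout {
    int32_t  a;             /* word */
    _Alignas(8) int64_t b;  /* doubleword, forced to 8-byte alignment */
    uint32_t c;             /* stands in for the 32-bit pointer */
    char     d[7];          /* array of bytes */
    int16_t  e;             /* halfword */
    int32_t  f;             /* word */
};

/* Number of padding bytes between the end of member `prev` and the
 * start of member `next`. */
#define PAD_BETWEEN(type, prev, next) \
    (offsetof(type, next) - \
     (offsetof(type, prev) + sizeof(((type *)0)->prev)))
```

For this stand-in, `PAD_BETWEEN(struct s_layout, a, b)` evaluates to 4, matching the four bytes of padding described above.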

Big-Endian Mapping

The big-endian mapping of structure s follows. The contents of each byte, as defined in structure s, is shown as a (hexadecimal) number or character (for the string elements). Data addresses (in hexadecimal) are shown below the corresponding data value.

(In the following layout, -- marks a byte position that holds no structure data.)

11    12    13    14    --    --    --    --
0x00  0x01  0x02  0x03  0x04  0x05  0x06  0x07

21    22    23    24    25    26    27    28
0x08  0x09  0x0A  0x0B  0x0C  0x0D  0x0E  0x0F

31    32    33    34    'A'   'B'   'C'   'D'
0x10  0x11  0x12  0x13  0x14  0x15  0x16  0x17

'E'   'F'   'G'   --    51    52    --    --
0x18  0x19  0x1A  0x1B  0x1C  0x1D  0x1E  0x1F

61    62    63    64    --    --    --    --
0x20  0x21  0x22  0x23  0x24  0x25  0x26  0x27

Little-Endian Mapping

The little-endian mapping of structure s follows.

(In the following layout, -- marks a byte position that holds no structure data.)

14    13    12    11    --    --    --    --
0x00  0x01  0x02  0x03  0x04  0x05  0x06  0x07

28    27    26    25    24    23    22    21
0x08  0x09  0x0A  0x0B  0x0C  0x0D  0x0E  0x0F

34    33    32    31    'A'   'B'   'C'   'D'
0x10  0x11  0x12  0x13  0x14  0x15  0x16  0x17

'E'   'F'   'G'   --    52    51    --    --
0x18  0x19  0x1A  0x1B  0x1C  0x1D  0x1E  0x1F

64    63    62    61    --    --    --    --
0x20  0x21  0x22  0x23  0x24  0x25  0x26  0x27
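The difference between the two mappings for a single word, such as element a (0x1112_1314), can be reproduced with shift-based stores that do not depend on the byte order of the machine running the code. The helper names `store_be32` and `store_le32` are hypothetical.

```c
#include <stdint.h>

/* Big-endian: the highest-order byte goes to the lowest address. */
static void store_be32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* Little-endian: the lowest-order byte goes to the lowest address. */
static void store_le32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)v;
    out[1] = (uint8_t)(v >> 8);
    out[2] = (uint8_t)(v >> 16);
    out[3] = (uint8_t)(v >> 24);
}
```

Storing 0x11121314 with `store_be32` produces the bytes 11 12 13 14 at increasing addresses, and `store_le32` produces 14 13 12 11, as in the first row of each mapping. Note that the character array d is stored in the same order in both mappings, because each of its elements is a single byte.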

Little-Endian Byte Ordering Support

Except as noted, this book describes the processor from the perspective of big-endian operations. However, the PPC405 processor also fully supports little-endian operations. This support is provided by the endian (E) storage attribute described in the following sections. The endian-storage attribute is defined by both the PowerPC embedded-environment architecture and PowerPC Book-E architecture.

Little-endian mode, defined by the PowerPC architecture, is not implemented by the PPC405. Little-endian mode does not support true little-endian memory accesses. This is because little-endian mode modifies memory addresses rather than reordering bytes as they are accessed. Memory-address modification restricts how the processor can access misaligned data and I/O. The PPC405 little-endian support does not have these restrictions.

Endian (E) Storage Attribute

The endian (E) storage attribute allows the PPC405 to support direct connection of little-endian peripherals and memory containing little-endian instructions and data. An E storage attribute is associated with every memory reference: instruction fetch, data load, and data store. The E attribute specifies whether the memory region being accessed should be interpreted as big endian (E = 0) or little endian (E = 1).

If virtual mode is enabled (MSR[IR] = 1 or MSR[DR] = 1), the E field in the corresponding TLB entry defines the endianness of a memory region. When virtual mode is disabled (MSR[IR] = 0 and MSR[DR] = 0), the SLER defines the endianness of a memory region. See Chapter 6, Virtual-Memory Management for more information on virtual memory, and Storage Little-Endian Register (SLER), page 455 for more information on the SLER.

When a memory region is defined as little endian, the processor accesses those bytes as if they are arranged in true little-endian order. Unlike the little-endian mode defined by the PowerPC architecture, no address modification is performed when accessing memory regions designated as little endian. Instead, the PPC405 reorders the bytes as they are transferred between the processor and memory.

On-the-fly reversal of bytes in little-endian memory regions is handled in one of two ways, depending on whether the memory access is an instruction fetch or a data access (load or store). The following sections describe byte reordering for both types of memory accesses.

Little-Endian Instruction Fetching

Instructions are word (four-byte) data types that are always aligned on word boundaries in memory. Instructions stored in a big-endian memory region are arranged with the most-significant byte (MSB) of the instruction word at the lowest byte address.

Consider the big-endian mapping of instruction p at address 0x00, where, for example, p is an add r7,r7,r4 instruction (instruction opcode bytes are shown in hexadecimal on top, with the corresponding byte address shown below):

MSB LSB

7C E7 22 14

0x00 0x01 0x02 0x03

In the little-endian mapping, instruction p is arranged with the least-significant byte (LSB) of the instruction word at the lowest byte address:

LSB MSB

14 22 E7 7C

0x00 0x01 0x02 0x03

The instruction decoder on the PPC405 assumes the instructions it receives are in big-endian order. When an instruction is fetched from memory, the instruction must be placed in the instruction queue in big-endian order so that the instruction is properly decoded. When instructions are fetched from little-endian memory regions, the four bytes of an instruction word are reversed by the processor before the instruction is decoded. This byte reversal occurs between memory and the instruction-cache unit (ICU) and is transparent to software. The ICU always stores instructions in big-endian order regardless of whether the instruction-memory region is defined as big endian or little endian. This means the bytes are already in the proper order when an instruction is transferred from the ICU to the instruction decoder.
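The effect of this byte reversal can be sketched in C as a software model. This is an illustration only, not a description of the hardware implementation: the function name fetch_instruction and its e parameter (standing in for the E storage attribute of the region being fetched from) are assumptions made for this example.

```c
#include <stdint.h>

/* Hypothetical model: build the 32-bit instruction word seen by the
 * decoder from four consecutive bytes in memory.
 * e = 0 models a big-endian region, e = 1 a little-endian region. */
uint32_t fetch_instruction(const uint8_t *mem, int e)
{
    if (e) {
        /* Little-endian region: the LSB is at the lowest address,
         * so the four bytes are reversed while being assembled. */
        return ((uint32_t)mem[3] << 24) | ((uint32_t)mem[2] << 16) |
               ((uint32_t)mem[1] << 8)  |  (uint32_t)mem[0];
    }
    /* Big-endian region: the MSB is at the lowest address. */
    return ((uint32_t)mem[0] << 24) | ((uint32_t)mem[1] << 16) |
           ((uint32_t)mem[2] << 8)  |  (uint32_t)mem[3];
}
```

Under this model, both mappings of the add r7,r7,r4 example (bytes 7C E7 22 14 with e = 0, bytes 14 22 E7 7C with e = 1) produce the same word, 0x7CE72214.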

If the endian-storage attribute is changed, the affected memory region must be reloaded with program and data structures using the new endian ordering. If the endian ordering of instruction memory changes, the ICU must be made coherent with the updates. This is accomplished by invalidating the ICU and updating the instruction memory with instructions using the new endian ordering. Subsequent fetches from the updated memory region are interpreted correctly before they are cached and decoded. See Instruction-Cache Control Instructions, page 456 for information on instruction-cache invalidation.

Little-Endian Data Accesses

Unlike instruction fetches, data accesses from little-endian memory regions are not byte-reversed between memory and the data-cache unit (DCU). The data-byte ordering stored in memory depends on the data size (byte, halfword, or word). The data size is not known until the data item is moved between memory and a general-purpose register. In the PPC405, byte reversal of load and store accesses is performed between the DCU and the GPRs.

When accessing data in a little-endian memory region, the processor automatically does the following regardless of data alignment:

• For byte loads/stores, no reordering occurs

• For halfword loads/stores, bytes are reversed within the halfword

• For word loads/stores, bytes are reversed within the word

The big-endian and little-endian mappings of the structure s, shown in Structure-Mapping Examples, page 349, demonstrate how the size of a data item determines its byte ordering. For example:

• The word a has its four bytes reversed within the word spanning addresses 0x00–0x03

• The halfword e has its two bytes reversed within the halfword spanning addresses 0x1C–0x1D

• The array of bytes d (where each data item is a byte) is not reversed when the big-endian and little-endian mappings are compared. For example, the character 'A' is located at address 0x14 in both the big-endian and little-endian mappings.

In little-endian memory regions, data alignment is treated as it is in big-endian memory regions. Unlike little-endian mode in the PowerPC architecture, no special alignment exceptions occur when accessing data in little-endian memory regions versus big-endian regions.
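The size-dependent reordering described above can be modeled with a short C sketch. It is a software illustration under stated assumptions rather than a description of the DCU data path: load_data, its parameters, and the flat byte-array view of memory are all hypothetical names introduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a load of `size` bytes (1, 2, or 4) starting at
 * mem[addr]. e = 0 models a big-endian region, e = 1 a little-endian
 * region. The address is used as-is; only the byte order differs. */
uint32_t load_data(const uint8_t *mem, size_t addr, unsigned size, int e)
{
    uint32_t value = 0;
    for (unsigned i = 0; i < size; i++) {
        /* Big endian: lowest address holds the most-significant byte.
         * Little endian: lowest address holds the least-significant byte. */
        unsigned shift = e ? 8u * i : 8u * (size - 1u - i);
        value |= (uint32_t)mem[addr + i] << shift;
    }
    return value;
}
```

For a byte load (size = 1) the shift is zero in both cases, which matches the rule that byte accesses are never reordered.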

Load and Store Byte-Reverse Instructions

When accessing big-endian memory regions, load/store instructions move the more-significant register bytes to and from the lower-numbered memory addresses and the less-significant register bytes are moved to and from the higher-numbered memory addresses. The load/store with byte-reverse instructions, as described in Load and Store with Byte-Reverse Instructions, page 385, do the opposite. The more-significant register bytes are moved to and from the higher-numbered memory addresses, and the less-significant register bytes are moved to and from the lower-numbered memory addresses.
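As a rough illustration of the difference between the two kinds of store, the following C sketch models a word store and a byte-reversed word store into a byte array. The function names are hypothetical and chosen for this example; they model the byte placement only, not the instructions themselves.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of a normal word store to a big-endian region:
 * the most-significant register byte goes to the lowest address. */
void store_word(uint8_t *mem, size_t addr, uint32_t value)
{
    mem[addr + 0] = (uint8_t)(value >> 24);
    mem[addr + 1] = (uint8_t)(value >> 16);
    mem[addr + 2] = (uint8_t)(value >> 8);
    mem[addr + 3] = (uint8_t)value;
}

/* Model of a byte-reversed word store:
 * the least-significant register byte goes to the lowest address. */
void store_word_byte_reversed(uint8_t *mem, size_t addr, uint32_t value)
{
    mem[addr + 0] = (uint8_t)value;
    mem[addr + 1] = (uint8_t)(value >> 8);
    mem[addr + 2] = (uint8_t)(value >> 16);
    mem[addr + 3] = (uint8_t)(value >> 24);
}
```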

Even though the load/store with byte-reverse instructions can be used to access little-endian memory, the E storage attribute provides two advantages over using those instructions:

• The load/store with byte-reverse instructions do not solve the problem of fetching instructions from a little-endian memory region. Only the E storage attribute mechanism supports little-endian instruction fetching.

• Typical compilers cannot make general use of the load/store with byte-reverse instructions, so these instructions are normally used only in device drivers written in hand-coded assembler. However, compilers can take full advantage of the E storage-attribute mechanism, allowing application programmers working in a high-level language, such as C, to compile programs and data structures using little-endian ordering.

Operand Alignment

The operand of a memory-access instruction has a natural alignment boundary equal to the operand length. In other words, the natural address of an operand is an integral multiple of the operand length. A memory operand is said to be aligned if it is aligned on its natural boundary, otherwise it is misaligned.

All instructions are words and are always aligned on word boundaries.

Table 2-1 shows the value required by the least-significant four address bits (bits 28:31) of each data type for it to be aligned in memory. A value of x in a given bit position indicates the address bit can have a value of 0 or 1.

Table 2-1: Memory Operand Alignment Requirements

Data Type     Size       Aligned Address Bits 28:31
Byte          8 Bits     xxxx
Halfword      2 Bytes    xxx0
Word          4 Bytes    xx00
Doubleword    8 Bytes    x000

The concept of alignment can be generally applied to any data in memory. For example, a 12-byte data item is said to be word aligned if its address is a multiple of four.
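The alignment rule can be expressed as a one-line check. The sketch below is a minimal C illustration; the function name is_aligned is an assumption for this example, and it relies on the boundary being a power of two, as it is for every data type in Table 2-1.

```c
#include <stdint.h>

/* Returns 1 if effective address `ea` is a multiple of `boundary`.
 * Assumes `boundary` is a power of two (1, 2, 4, or 8 here), so the
 * test reduces to checking that the low-order address bits are zero. */
int is_aligned(uint32_t ea, uint32_t boundary)
{
    return (ea & (boundary - 1u)) == 0u;
}
```

For example, is_aligned(ea, 4) tests that the two least-significant address bits are 00, which is the word-alignment condition. The same call answers whether a 12-byte data item at ea is word aligned.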

Some instructions require aligned memory operands. Also, alignment can affect performance. For single-register memory access instructions, the best performance is obtained when memory operands are aligned.

Alignment and Endian Storage Control

The endian storage-control attribute (E) does not affect how the processor handles operand alignment. Data alignment is handled identically for accesses to big-endian and little-endian memory regions. No special alignment exceptions occur when accessing data in little-endian memory regions. However, alignment exceptions that apply to big-endian memory accesses also apply to little-endian memory accesses.

Performance Effects of Operand Alignment

The performance of accesses varies depending on the following parameters:

• Operand size

• Operand alignment

• Boundary crossing:

-None

-Cache block

-Page

To obtain the best performance across the widest range of PowerPC embedded-environment implementations and PowerPC Book-E processor implementations, programmers should assume the alignment performance effects described in Table 2-2. This table applies to both big-endian and little-endian accesses. Table 2-2 also applies to PowerPC processors running in the default big-endian mode. However, those same processors suffer further performance degradation when running in PowerPC little-endian mode.

Table 2-2: Performance Effects of Operand Alignment

Operand                             Boundary Crossing
Size             Byte Alignment     None       Cache Block       Page
Byte             1                  Optimal    Not Applicable    Not Applicable
Halfword         2                  Optimal    Not Applicable    Not Applicable
                 1                  Good       Good              Poor
Word             4                  Optimal    Not Applicable    Not Applicable
                 <4                 Good       Good              Poor
Multiple Word    4                  Good       Good              Good
Byte String      1                  Good       Good              Poor

Note: Assumes both pages have identical storage-control attributes. Performance is poor otherwise.

Alignment Exceptions

Misalignment occurs when addresses are not evenly divided by the data-object size. The PPC405 automatically handles misalignments within word boundaries and across word boundaries, generally at a cost in performance. Some instructions cause an alignment exception if their operand is not properly aligned, as shown in Table 2-3.

Table 2-3: Instructions Causing Alignment Exceptions

Mnemonic                 Condition
dcbz                     EA is in non-cacheable or write-through memory.
dcread, lwarx, stwcx.    EA is not word aligned.

Cache-control instructions ignore the four least-significant bits of the EA. No alignment restrictions are placed on an EA when executing a cache-control instruction. However, certain storage-control attributes can cause an alignment exception to occur when a cache-control instruction is executed. If data-address translation is disabled (MSR[DR]=0) and a dcbz instruction references a non-cacheable memory region, or the memory region uses a write-through caching policy, an alignment exception occurs. The alignment exception allows the operating system to emulate the write-through caching policy. See Alignment Interrupt (0x0600), page 510 for more information.

Instruction Conventions

Instruction Forms

Opcode tables and instruction listings often contain information regarding the instruction form. This information refers to the type of format used to encode the instruction. Grouping instructions by format is useful for programmers who must deal directly with machine-level code, particularly programmers who write assemblers and disassemblers.

The formats used for the instructions of the PowerPC embedded-environment architecture are shown in Instructions Grouped by Form, page 792. The Instruction Set Information, page 797 also shows the form used by each instruction, listed alphabetically by mnemonic.

Instruction Classes

PowerPC instructions belong to one of the following three classes:

• Defined

• Illegal

• Reserved

An instruction class is determined by examining the primary opcode, and the extended opcode if one exists. If the opcode and extended opcode combination does not specify a defined instruction or reserved instruction, the instruction is illegal. Although the definitions of these terms are consistent among PowerPC processor implementations, the assignment of these classifications is not. For example, an instruction specific to 64-bit implementations is considered defined for 64-bit implementations but illegal for 32-bit implementations.

In future versions of the PowerPC architecture, instruction encodings that are now illegal or reserved can become defined (by being added to the architecture) or reserved (by being assigned a special purpose in an implementation).

Boundedly Undefined

The results of executing an instruction are said to be boundedly undefined if those results could be achieved by executing an arbitrary sequence of instructions, starting in the machine state prior to executing the given instruction. Boundedly-undefined results for an instruction can vary between implementations and between different executions on the same implementation.

Defined Instruction Class

Defined instructions contain all the instructions defined by the PowerPC architecture. Defined instructions are guaranteed to be supported by all implementations of the PowerPC architecture. The only exceptions are the instructions defined only for 64-bit implementations, instructions defined only for 32-bit implementations, and instructions defined only for embedded implementations. A PowerPC processor can invoke the illegal-instruction error handler (through the program-interrupt handler) when an unimplemented instruction is encountered, allowing emulation of the instruction in software.

A defined instruction can have preferred forms and invalid forms as described in the following sections.

Preferred Instruction Forms

A preferred form of a defined instruction is one in which the instruction executes in an efficient manner. Any form other than the preferred form can take significantly longer to execute. The following instructions have preferred forms:

• Load-multiple and store-multiple instructions

• Load-string and store-string instructions

• OR-immediate instruction (preferred form of no-operation)

Invalid Instruction Forms

An invalid form of a defined instruction is one in which one or more operands are coded incorrectly and in a manner that can be deduced only by examining the instruction encoding (primary and extended opcodes). For example, coding a value of 1 in a reserved bit (normally cleared to 0) produces an invalid instruction form.

The following instructions have invalid forms:

• Branch-conditional instructions

• Load with update and store with update instructions

• Load multiple instructions

• Load string instructions

• Integer compare instructions

On the PPC405, attempting to execute an invalid instruction form generally yields a boundedly-undefined result, although in some cases a program exception (illegal-instruction error) can occur.

Optional Instructions

The PowerPC architecture allows implementations to optionally support some defined instructions. The PPC405 does not implement the following instructions:

• Floating-point instructions

• External-control instructions (eciwx, ecowx)

• Invalidate TLB entry (tlbie)

Illegal Instruction Class

Illegal instructions are grouped into the following categories:

• Unused primary opcodes. The following primary opcodes are defined as illegal but can be defined by future extensions to the architecture:

1, 5, 6, 56, 57, 60, 61

• Unused extended opcodes. Unused extended opcodes can be derived from information in Instructions Sorted by Opcode, page 781. The following primary opcodes have unused extended opcodes:

19, 31, 59, 63

• An instruction consisting entirely of zeros is guaranteed to be an illegal instruction. This increases the probability that an attempt to execute data or uninitialized memory causes an illegal-instruction error. If only the primary opcode consists of all zeros, the instruction is considered a reserved instruction, as described in the following section.

An attempt to execute an illegal instruction causes an illegal-instruction error (program exception). With the exception of an instruction consisting entirely of zeros, illegal instructions are available for future addition to the PowerPC architecture.

Reserved Instruction Class

Reserved instructions are allocated to specific implementation-dependent purposes not defined by the PowerPC architecture. An attempt to execute an unimplemented reserved instruction causes an illegal-instruction error (program exception). The following types of instructions are included in this class:

• Instructions for the POWER architecture that have not been included in the PowerPC architecture.

• Implementation-specific instructions used to conform to the PowerPC architecture specification. For example, load data-TLB entry (tlbld) and load instruction-TLB entry (tlbli) instructions in the PowerPC 603™.

• The instruction with primary opcode 0, when the instruction does not consist entirely of binary zeros.

• Any other implementation-specific instruction not defined by the PowerPC architecture.
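The primary-opcode rules given in the illegal and reserved class descriptions can be sketched as a first-stage classifier in C. This is an illustrative sketch, not a decoder: the enum and function names are hypothetical, the primary opcode is assumed to occupy the six most-significant bits of the instruction word (the standard PowerPC encoding), and any word that is not settled by its primary opcode alone is reported as needing further decode of its extended opcode.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for a first-stage classification. */
enum opcode_class {
    CLASS_ILLEGAL_ALL_ZERO,       /* instruction word is entirely zero  */
    CLASS_RESERVED_OPCODE_ZERO,   /* primary opcode 0, word not all zero */
    CLASS_ILLEGAL_UNUSED_PRIMARY, /* primary opcode 1, 5, 6, 56, 57, 60, 61 */
    CLASS_NEEDS_FURTHER_DECODE    /* depends on the rest of the encoding */
};

/* Assumption: primary opcode is instruction bits 0:5 (the top six bits). */
uint32_t primary_opcode(uint32_t insn)
{
    return insn >> 26;
}

enum opcode_class classify_by_primary_opcode(uint32_t insn)
{
    static const uint8_t unused_primary[] = {1, 5, 6, 56, 57, 60, 61};
    uint32_t op = primary_opcode(insn);

    if (insn == 0u)
        return CLASS_ILLEGAL_ALL_ZERO;
    if (op == 0u)
        return CLASS_RESERVED_OPCODE_ZERO;
    for (size_t i = 0; i < sizeof unused_primary; i++) {
        if (op == unused_primary[i])
            return CLASS_ILLEGAL_UNUSED_PRIMARY;
    }
    return CLASS_NEEDS_FURTHER_DECODE;
}
```

The all-zero check comes first because such a word also has primary opcode 0 but belongs to the illegal class rather than the reserved class.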

PowerPC Embedded-Environment Instructions

To support functions required in embedded-system applications, the PowerPC embedded-environment architecture defines instructions that are not part of the PowerPC architecture. Table 2-4 lists the instructions specific to the PPC405 and other PowerPC embedded-environment family implementations. From the standpoint of the PowerPC architecture, these instructions are part of the reserved class and are implementation dependent. Programs using these instructions are not portable to implementations that do not support the PowerPC embedded-environment architecture.

In the table, the syntax “[o]” indicates the instruction has an overflow-enabled form that updates XER[OV,SO] as well as a non-overflow-enabled form. The syntax “[.]” indicates the instruction has a record form that updates CR[CR0] as well as a non-record form. The headings “defined” and “allocated”, as they are used in Table 2-4, are described in the following section, PowerPC Book-E Instruction Classes.

Table 2-4: PowerPC Embedded-Environment Instructions

Defined (Book-E)        Allocated (Book-E)
mfdcr     tlbre         dccci         macchw[o][.]      nmacchw[o][.]
mtdcr     tlbsx[.]      dcread        macchws[o][.]     nmacchws[o][.]
rfci      tlbwe         iccci         macchwsu[o][.]    nmachhw[o][.]
wrtee                   icread        macchwu[o][.]     nmachhws[o][.]
wrteei                  mulchw[.]     machhw[o][.]      nmaclhw[o][.]
                        mulchwu[.]    machhws[o][.]     nmaclhws[o][.]
                        mulhhw[.]     machhwsu[o][.]
                        mulhhwu[.]    machhwu[o][.]
                        mullhw[.]     maclhw[o][.]
                        mullhwu[.]    maclhws[o][.]
                                      maclhwsu[o][.]
                                      maclhwu[o][.]

PowerPC Book-E Instruction Classes

The PowerPC Book-E architecture defines four instruction classes:

• Defined

• Allocated

• Reserved

• Preserved

Referring to Table 2-4, the first two columns indicate which PPC405 instructions are part of the defined instruction class and are guaranteed support in PowerPC Book-E processor implementations. The last three columns indicate which PPC405 instructions are part of the allocated instruction class. Support of these instructions by PowerPC Book-E processors is implementation-dependent.

Defined Book-E Instruction Class

The defined instruction class consists of all instructions defined by the PowerPC Book-E architecture. In general, defined instructions are guaranteed to be supported by a PowerPC Book-E processor as specified by the architecture, either within the processor implementation itself or within emulation software supported by the operating system.

Allocated Book-E Instruction Class

The allocated instruction class contains the set of instructions used for implementation-dependent and application-specific use, outside the scope of the PowerPC Book-E architecture.

Reserved Book-E Instruction Class

The reserved instruction class consists of all instruction primary opcodes (and associated extended opcodes, if applicable) that do not belong to either the defined class or the allocated class.

Preserved Book-E Instruction Class

The preserved instruction class is provided to support backward compatibility with previous generations of this architecture.

Chapter 3

User Programming Model

This chapter describes the processor resources and instructions available to all programs running on the PPC405, whether they are running in user mode or privileged mode. These resources and instructions are referred to as the user-programming model, which is a subset of the privileged-programming model. Applications are typically restricted to running in user mode. System software runs in privileged mode, has access to all registers and processor resources, and can execute all instructions supported by the PPC405. System software typically creates a context (execution environment) that protects itself and other applications from the effects of an errant application program.

The remaining chapters in this book generally describe aspects of the privileged-programming model and are not relevant to application programmers. There are two exceptions:

• Chapter 5, Memory-System Management, describes cache management features available to both system and application programs.

• Chapter 8, Timer Resources, describes the time base, which can be read by application programs.

User Registers

Figure 3-1 shows the user registers supported by the PPC405, all of which are available to software running in user mode and privileged mode. In the PPC405, all user registers are 32 bits wide, except for the time base as described in Time Base, page 524. Floating-point registers are not supported by the PPC405.

Register Group                               Register          Number
General-Purpose Registers                    r0 through r31
Condition Register                           CR
Fixed-Point Exception Register               XER               SPR 0x001
Link Register                                LR                SPR 0x008
Count Register                               CTR               SPR 0x009
User-SPR General-Purpose Register            USPRG0            SPR 0x100
SPR General-Purpose Registers (read only)    SPRG4             SPR 0x104
                                             SPRG5             SPR 0x105
                                             SPRG6             SPR 0x106
                                             SPRG7             SPR 0x107
Time-Base Registers (read only)              TBL               TBR 0x10C
                                             TBU               TBR 0x10D

Figure 3-1: PPC405 User Registers

Most registers in the PPC405 are special-purpose registers, or SPRs. SPRs control the operation of debug facilities, timers, interrupts, storage control attributes, and other processor resources. All SPRs can be accessed explicitly using the move to special-purpose register (mtspr) and move from special-purpose register (mfspr) instructions. See Special-Purpose Register Instructions, page 424 for more information on these instructions. A few registers are accessed as a by-product of executing certain instructions. For example, some branch instructions access and update the link register.

The PPC405 SPRs in the user-programming model are shown in Figure 3-1. The SPR number (SPRN) for each SPR is shown above the corresponding register. See Appendix A, Special-Purpose Registers, page 770 for a complete list of all SPRs (user and privileged) supported by the PPC405.

Simplified instruction mnemonics are available for the mtspr and mfspr instructions for some SPRs. See Special-Purpose Registers, page 830 for more information.

General-Purpose Registers (GPRs)

The PPC405 contains thirty-two 32-bit general-purpose registers (GPRs), numbered r0 through r31, as shown in Figure 3-2. Data from memory are read into GPRs using load instructions and the contents of GPRs are written to memory using store instructions. Most integer instructions use the GPRs for source and destination operands.

0 31

Figure 3-2: General Purpose Registers (R0-R31)

Condition Register (CR)

The condition register (CR) is a 32-bit register that reflects the result of certain instructions and provides a mechanism for testing and conditional branching. The bits in the CR are grouped into eight 4-bit fields, CR0–CR7, as shown in Figure 3-3. The bits within an arbitrary CRn field are shown in Figure 3-4. In this figure, the bit positions shown are relative positions within the field rather than absolute positions within the CR register.

Bits:     0:3    4:7    8:11   12:15  16:19  20:23  24:27  28:31
Field:    CR0    CR1    CR2    CR3    CR4    CR5    CR6    CR7

Figure 3-3: Condition Register (CR)

Bit:      0      1      2      3
Name:     LT     GT     EQ     SO

Figure 3-4: CRn Field

In the PPC405, the CR fields are modified in the following ways:

• The mtcrf instruction can update specific fields in the CR from a GPR.

• The mcrxr instruction can update a CR field with the contents of XER[0:3].

• The mcrf instruction can copy one CR field into another CR field.

• The condition-register logical instructions can update specific bits in the CR.

• The integer-arithmetic instructions can update CR0 to reflect their result.

• The integer-compare instructions can update a specific CR field to reflect their result.

Conditional-branch instructions can test bits in the CR and use the results of such a test as the branch condition.
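Because CR bit 0 is the most-significant bit of the register, extracting a CRn field from a 32-bit CR image takes a shift that depends on the field number. The following C sketch shows one way to do this. The macro and function names are assumptions made for this example and are not defined by the architecture.

```c
#include <stdint.h>

/* Bit masks within a 4-bit CRn field (relative bit 0 is the field's MSB). */
#define CR_LT 0x8u  /* relative bit 0 */
#define CR_GT 0x4u  /* relative bit 1 */
#define CR_EQ 0x2u  /* relative bit 2 */
#define CR_SO 0x1u  /* relative bit 3 */

/* Extract field CRn (n = 0..7) from a 32-bit image of the CR.
 * CR0 occupies CR bits 0:3, the four most-significant bits. */
uint32_t cr_field(uint32_t cr, unsigned n)
{
    return (cr >> (28u - 4u * n)) & 0xFu;
}
```

A caller can then test a condition with an expression such as (cr_field(cr, 0) & CR_EQ) != 0.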

CR0 Field

The CR0 field is updated to reflect the result of an integer instruction if the Rc opcode field (record bit) is set to 1. The addic., andi., and andis. instructions also update CR0 to reflect the result they produce. For all of these instructions, CR0 is updated as follows:

• The instruction result is interpreted as a signed integer and algebraically compared to 0. The first three bits of CR0 (CR0[0:2]) are updated to reflect the result of the algebraic comparison.

• The fourth bit of CR0 (CR0[3]) is copied from XER[SO].

The CR0 bits are interpreted as described in Table 3-1. If any portion of the result is undefined, the value written into CR0[0:2] is undefined.
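The CR0 update rule for record-form instructions can be modeled as a small C function. This is a sketch of the rule as stated above; the name cr0_from_result and the representation of XER[SO] as an int flag are assumptions for illustration.

```c
#include <stdint.h>

/* Compute the 4-bit CR0 value for a 32-bit instruction result.
 * Field layout (MSB first): LT, GT, EQ, SO. */
uint32_t cr0_from_result(int32_t result, int xer_so)
{
    uint32_t cr0;

    /* Signed comparison of the result with zero sets exactly one of
     * LT, GT, EQ. */
    if (result < 0)
        cr0 = 0x8u;      /* LT */
    else if (result > 0)
        cr0 = 0x4u;      /* GT */
    else
        cr0 = 0x2u;      /* EQ */

    /* The fourth bit is a copy of XER[SO]. */
    if (xer_so)
        cr0 |= 0x1u;     /* SO */

    return cr0;
}
```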

Table 3-1: CR0-Field Bit Settings

Bit  Name  Function                         Description
0    LT    Negative                         This bit is set when the result is negative, otherwise it is cleared.
           0: Result is not negative.
           1: Result is negative.
1    GT    Positive                         This bit is set when the result is positive (and not zero), otherwise it is cleared.
           0: Result is not positive.
           1: Result is positive.
2    EQ    Zero                             This bit is set when the result is zero, otherwise it is cleared.
           0: Result is not equal to zero.
           1: Result is equal to zero.
3    SO    Summary overflow                 This is a copy of the final state of XER[SO] at the completion of the instruction.
           0: No overflow occurred.
           1: Overflow occurred.

CR1 Field

In PowerPC® implementations that support floating-point operations, the CR1 field can be updated by the processor to reflect the result of those operations. Because the PPC405 does not support floating-point operations in hardware, CR1 is not updated in this manner.

CRn Fields (Compare Instructions)

Any one of the eight CRn fields (including CR0 and CR1) can be updated to reflect the result of a compare instruction. The CRn-field bits are interpreted as described in Table 3-2.

Table 3-2: CRn-Field Bit Settings

Bit  Name  Function                         Description
0    LT    Less than                        This bit is set when rA < SIMM or rB (signed comparison), or rA < UIMM or rB (unsigned comparison), otherwise it is cleared.
           0: rA is not less than.
           1: rA is less than.
1    GT    Greater than                     This bit is set when rA > SIMM or rB (signed comparison), or rA > UIMM or rB (unsigned comparison), otherwise it is cleared.
           0: rA is not greater than.
           1: rA is greater than.
2    EQ    Equal to                         This bit is set when rA = SIMM or rB (signed comparison), or rA = UIMM or rB (unsigned comparison), otherwise it is cleared.
           0: rA is not equal.
           1: rA is equal.
3    SO    Summary overflow                 This is a copy of the final state of XER[SO] at the completion of the instruction.
           0: No overflow occurred.
           1: Overflow occurred.
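The difference between signed and unsigned comparison matters whenever the most-significant bit of an operand is set. The C sketch below models how a CRn field value could be derived in each case. The function names are hypothetical, and the second operand stands for whichever of SIMM, UIMM, or rB the compare instruction uses.

```c
#include <stdint.h>

/* Combine the comparison outcome with XER[SO] into a 4-bit CRn value
 * (MSB first: LT, GT, EQ, SO). */
static uint32_t crn_value(int less, int greater, int xer_so)
{
    uint32_t field = less ? 0x8u : (greater ? 0x4u : 0x2u);
    return xer_so ? (field | 0x1u) : field;
}

/* Signed comparison: operands are interpreted as signed integers. */
uint32_t compare_signed(int32_t ra, int32_t operand, int xer_so)
{
    return crn_value(ra < operand, ra > operand, xer_so);
}

/* Unsigned comparison: operands are interpreted as unsigned integers. */
uint32_t compare_unsigned(uint32_t ra, uint32_t operand, int xer_so)
{
    return crn_value(ra < operand, ra > operand, xer_so);
}
```

With rA holding 0xFFFF_FFFF and an operand of 1, the signed comparison reports LT (rA is -1) while the unsigned comparison reports GT.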

Fixed-Point Exception Register (XER)

The fixed-point exception register (XER) is a 32-bit register that reflects the result of arithmetic operations that have resulted in an overflow or carry. This register is also used to indicate the number of bytes to be transferred by load/store string indexed instructions.

Figure 3-5 shows the format of the XER. The bits in the XER are defined as shown in Table 3-3.

Bits:     0      1      2      3:24        25:31
Field:    SO     OV     CA     Reserved    TBC

Figure 3-5: Fixed Point Exception Register (XER)

Table 3-3: Fixed Point Exception Register (XER) Bit Definitions

Bit    Name  Function                     Description
0      SO    Summary overflow             SO is set to 1 whenever an instruction (except mtspr) sets the overflow bit (XER[OV]). Once set, the SO bit remains set until it is cleared to 0 by an mtspr instruction (specifying the XER) or an mcrxr instruction. SO can be cleared to 0 and OV set to 1 using an mtspr instruction.
             0: No overflow occurred.
             1: Overflow occurred.
1      OV    Overflow                     OV can be modified by instructions when the overflow-enable bit in the instruction encoding is set (OE=1). Add, subtract, and negate instructions set OV=1 if the carry out from the result msb is not equal to the carry out from the result msb + 1. Otherwise, they clear OV=0. Multiply and divide set OV=1 if the result cannot be represented in 32 bits. mtspr can be used to set OV=1, and mtspr and mcrxr can be used to clear OV=0.
             0: No overflow occurred.
             1: Overflow occurred.
2      CA    Carry                        CA can be modified by add-carrying, subtract-from-carrying, add-extended, and subtract-from-extended instructions. These instructions set CA=1 when there is a carry out from the result msb. Otherwise, they clear CA=0. Shift-right algebraic instructions set CA=1 if any 1 bits are shifted out of a negative operand. Otherwise, they clear CA=0. mtspr can be used to set CA=1, and mtspr and mcrxr can be used to clear CA=0.
             0: Carry did not occur.
             1: Carry occurred.
3:24         Reserved
25:31  TBC   Transfer-byte count          TBC is modified using the mtspr instruction. It specifies the number of bytes to be transferred by a load-string word indexed (lswx) or store-string word indexed (stswx) instruction.

The XER is an SPR with an address of 1 (0x001) and can be read and written using the mfspr and mtspr instructions. The mcrxr instruction can be used to move XER[0:3] into one of the eight CR fields.
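The XER bit positions and the carry and overflow rules can be illustrated with a C model of a 32-bit add that records both CA and OV. This is a hedged sketch: the mask names and the function add_record_carry_overflow are assumptions for this example, and the function models only an add-carrying operation with the overflow-enable bit set (OE=1).

```c
#include <stdint.h>

/* XER field masks; XER bit 0 is the most-significant bit. */
#define XER_SO  0x80000000u  /* bit 0: summary overflow    */
#define XER_OV  0x40000000u  /* bit 1: overflow            */
#define XER_CA  0x20000000u  /* bit 2: carry               */
#define XER_TBC 0x0000007Fu  /* bits 25:31: transfer-byte count */

/* Model of a 32-bit add that updates CA and OV in *xer. */
uint32_t add_record_carry_overflow(uint32_t a, uint32_t b, uint32_t *xer)
{
    uint32_t sum = a + b;

    /* Carry out of the msb: the unsigned sum wrapped around. */
    int ca = sum < a;

    /* Signed overflow: both operands have the same sign and the sum's
     * sign differs. This is equivalent to the carry out of the msb
     * differing from the carry out of the next lower bit. */
    int ov = (int)((~(a ^ b) & (a ^ sum)) >> 31);

    *xer &= ~(XER_OV | XER_CA);
    if (ca)
        *xer |= XER_CA;
    if (ov)
        *xer |= XER_OV | XER_SO;  /* SO is sticky: set here, never cleared here */

    return sum;
}
```

Because SO is only ever set in this model, it stays set across later additions that do not overflow, mirroring the sticky behavior described in Table 3-3.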

Link Register (LR)

The link register (LR) is a 32-bit register that is used by branch instructions, generally for the purpose of subroutine linkage. Two types of branch instructions use the link register:

• Branch-conditional to link-register (bclrx) instructions read the branch-target address from the LR.

• Branch instructions with the link-register update-option enabled load the LR with the effective address of the instruction following the branch instruction. The link-register update-option is enabled when the branch-instruction LK opcode field (bit 31) is set to 1.

The format of LR is shown in Figure 3-6.

0 31

Branch Address

Figure 3-6: Link Register (LR)

The LR is an SPR with an address of 8 (0x008) and can be read and written using the mfspr and mtspr instructions. It is possible for the processor to prefetch instructions along the target path specified by the LR provided the LR is loaded sufficiently ahead of the branch to link-register instruction, giving branch-prediction hardware time to calculate the branch address.

The two least-significant bits (LR[30:31]) can be written with any value. However, those bits are ignored and assumed to have a value of 0 when the LR is used as a branch-target address.
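The two uses of the LR can be summarized in a short C sketch. The function names are hypothetical. The value 4 reflects the fact that every instruction is one word, so the instruction following a branch is at the branch address plus four.

```c
#include <stdint.h>

/* Value loaded into the LR by a branch with the link-register
 * update option enabled (LK = 1): the address of the next instruction. */
uint32_t link_address(uint32_t branch_ea)
{
    return branch_ea + 4u;
}

/* Branch-target address taken from the LR: the two least-significant
 * bits (LR[30:31]) are ignored and treated as 0. */
uint32_t branch_target_from_lr(uint32_t lr)
{
    return lr & ~UINT32_C(0x3);
}
```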

Some PowerPC processors implement a software-invisible link-register stack for performance reasons. Although the PPC405 processor does not implement such a stack, certain programming conventions should be followed so that software running on multiple PowerPC processors can benefit from this stack. See Link-Register Stack, page 371 for more information.

Count Register (CTR)

The count register (CTR) is a 32-bit register that can be used by branch instructions in the following two ways:

• The CTR can hold a loop count that is decremented by a conditional-branch instruction with an appropriately coded BO opcode field. The value in the CTR wraps to 0xFFFF_FFFF if the value in the register is 0 prior to the decrement. See Conditional Branch Control, page 367 for information on encoding the BO opcode field.

• The CTR can hold the branch-target address used by branch-conditional to count-register (bcctrx) instructions.

The format of CTR is shown in Figure 3-7.

Bits 0:31: Count

Figure 3-7: Count Register (CTR)

The CTR is an SPR with an address of 9 (0x009) and can be read and written using the mfspr and mtspr instructions. It is possible for the processor to prefetch instructions along the target path specified by the CTR provided the CTR is loaded sufficiently ahead of the branch to count-register instruction, giving branch-prediction hardware time to calculate the branch address.

The two least-significant bits (CTR[30:31]) can be written with any value. However, those bits are ignored and assumed to have a value of 0 when the CTR is used as a branch-target address.
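The two CTR behaviors described above (the wrap on decrement and the ignored low-order bits) can be illustrated with a minimal Python sketch. This is a software model only, not PPC405 code; the function names are hypothetical and chosen for illustration.

```python
MASK32 = 0xFFFF_FFFF


def decrement_ctr(ctr: int) -> int:
    """Decrement a 32-bit CTR value; a value of 0 wraps to 0xFFFF_FFFF."""
    return (ctr - 1) & MASK32


def register_branch_target(reg: int) -> int:
    """Form a branch target from the LR or CTR: bits 30:31 are treated as 0."""
    return reg & ~0x3 & MASK32


print(hex(decrement_ctr(0)))
print(hex(register_branch_target(0x0000_1003)))
```

The same masking applies to the LR when it supplies the branch-target address.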

User-SPR General-Purpose Register

The user-SPR general-purpose register (USPRG0) is a 32-bit register that can be used by application software for any purpose. The value stored in this register does not have an effect on the operation of the PPC405 processor.

The format of USPRG0 is shown in Figure 3-8.


User Registers

Bits 0:31: General-Purpose Application-Software Data

Figure 3-8: User SPR General-Purpose Register (USPRG0)

The USPRG0 is an SPR with an address of 256 (0x100) and can be read and written using the mfspr and mtspr instructions.

SPR General-Purpose Registers

The SPR general-purpose registers (SPRG0–SPRG7) are 32-bit registers that can be used by system software for any purpose. Four of the registers (SPRG4–SPRG7) are available from user mode with read-only access. Application software can read the contents of SPRG4– SPRG7, but cannot modify them. The values stored in these registers do not affect the operation of the PPC405 processor.

The format of all SPRGn registers is shown in Figure 3-9.

Bits 0:31: General-Purpose System-Software Data

Figure 3-9: SPR General-Purpose Registers (SPRG4–SPRG7)

The SPRGn registers are SPRs with the following addresses:

• SPRG4—260 (0x104).

• SPRG5—261 (0x105).

• SPRG6—262 (0x106).

• SPRG7—263 (0x107).

These registers can be read using the mfspr instruction. In privileged mode, system software accesses these registers using different SPR numbers (see page 432).

Time-Base Registers

The time base is a 64-bit incrementing counter implemented as two 32-bit registers. The time-base upper register (TBU) holds time-base bits 0:31, and the time-base lower register (TBL) holds time-base bits 32:63. Figure 3-10 shows the format of the time base.

TBU, bits 0:31: Time Base [0:31]

TBL, bits 0:31: Time Base [32:63]

Figure 3-10: Time-Base Register

The TBU and TBL registers are SPRs with user-mode read access and privileged-mode write access. Reading the time-base registers requires use of the mftb instruction with the following addresses:

• TBU—269 (0x10D).

• TBL—268 (0x10C).

See Time Base, page 524, for information on using the time base.
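Because the time base is read as two separate 32-bit halves, TBL can carry into TBU between the two reads. The Python sketch below models one common way to obtain a consistent 64-bit value: read TBU, read TBL, then read TBU again and retry if it changed. The retry loop is an assumption about typical usage, not a sequence taken from this section. The mftb argument and the FakeTimeBase class are hypothetical stand-ins for the hardware.

```python
from typing import Callable

TBL_SPRN = 268  # 0x10C, as listed above
TBU_SPRN = 269  # 0x10D, as listed above


def read_time_base(mftb: Callable[[int], int]) -> int:
    """Return a consistent 64-bit time-base value from two 32-bit reads."""
    while True:
        upper = mftb(TBU_SPRN)
        lower = mftb(TBL_SPRN)
        # If TBU changed, TBL wrapped between the reads; try again.
        if mftb(TBU_SPRN) == upper:
            return (upper << 32) | lower


class FakeTimeBase:
    """Hypothetical counter that advances by one on every read."""

    def __init__(self, start: int) -> None:
        self.value = start

    def mftb(self, sprn: int) -> int:
        current = self.value
        self.value = (self.value + 1) & 0xFFFF_FFFF_FFFF_FFFF
        if sprn == TBU_SPRN:
            return current >> 32
        if sprn == TBL_SPRN:
            return current & 0xFFFF_FFFF
        raise ValueError("not a time-base register number")
```

In the model, a read that straddles a TBL wrap is detected and repeated, so the returned value never pairs an old upper half with a new lower half.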


Exception Summary

An exception is an event that can be caused by a number of sources, including:

• Error conditions arising from instruction execution.

• Internal timer resources.

• Internal debug resources.

• External peripherals.

When an exception occurs, the processor can interrupt the currently executing program so that system software can deal with the exception condition. The action taken by an interrupt includes saving the processor context and transferring control to a predetermined exception-handler address operating under a new context. When the interrupt handler completes execution, it can return to the interrupted program by executing a return-from-interrupt instruction.

Exceptions are handled by privileged software. The exception mechanism is described in Chapter 7, Exceptions and Interrupts. Following is a list of exceptions that can be caused by the execution of an instruction in user mode.

• Data-Storage Exception.

An attempt to access data in memory that results in a memory-protection violation causes the data-storage interrupt handler to be invoked.

• Instruction-Storage Exception.

An attempt to access instructions in memory that results in a memory-protection violation causes the instruction-storage interrupt handler to be invoked.

• Alignment Exception.

An attempt to access memory with an invalid effective-address alignment (for the specific instruction) causes the alignment-interrupt handler to be invoked.

• Program Exception.

Three different types of interrupt handlers can be invoked when a program exception occurs: illegal instruction, privileged instruction, and system trap. The conditions causing a program interrupt include:

- An attempt to execute an illegal instruction causes the illegal-instruction interrupt handler to be invoked.

- An attempt to execute an optional instruction not implemented by the PPC405 causes the illegal-instruction interrupt handler to be invoked.

- An attempt by a user-level program to execute a supervisor-level instruction causes the privileged-instruction interrupt handler to be invoked.

- An attempt to execute a defined instruction with an invalid form causes either the illegal-instruction interrupt handler or the privileged-instruction interrupt handler to be invoked.

- Executing a trap instruction can cause the system-trap interrupt handler to be invoked.

• Floating-Point Unavailable Exception.

On processors that support floating-point instructions, executing such instructions when the floating-point unit is disabled (MSR[FP]=0) invokes the floating-point-unavailable interrupt handler.

• System-Call Exception.

The execution of an sc instruction causes the system-call interrupt handler to be invoked. The interrupt handler can be used to call a system-service routine.

• Data TLB-Miss Exception.


If data translation is enabled, an attempt to access data in memory when a valid TLB entry is not present causes the data TLB-miss interrupt handler to be invoked.

• Instruction TLB-Miss Exception.

If instruction translation is enabled, an attempt to access instructions in memory when a valid TLB entry is not present causes the instruction TLB-miss interrupt handler to be invoked.

Other exceptions can occur during user-mode program execution that are not directly caused by instruction execution. These are also described in Chapter 7:

• Machine-check exceptions.

• Exceptions caused by external devices.

• Exceptions caused by a timer.

• Debug exceptions.

Branch and Flow-Control Instructions

Branch instructions redirect program flow by altering the next-instruction address nonsequentially. Branches unconditionally or conditionally alter program flow forward or backward using either an absolute address or an address relative to the branch-instruction address. Branches calculate the target address using the contents of the CTR, LR, or fields within the branch instruction. Optionally, a branch-return address can be automatically loaded into the LR by setting the LK instruction-opcode bit to 1. This option is useful for specifying the return address for subroutine calls and causes the address of the instruction following the branch to be loaded in the LR. Branches are used for all non-sequential program flow including jumps, loops, calls and returns.

Branch-conditional instructions redirect program flow if a tested condition is true. These instructions can test a bit value within the CR, the value of the CTR, or both. Condition-register logical instructions are provided to set up the tests for branch-conditional instructions.

Conditional Branch Control

With branch-conditional instructions, the BO opcode field specifies the branch-control conditions and how the branch affects the CTR. The BO field can specify a test of the CR and it can specify that the CTR be decremented and tested. The BO field can also be initialized to reverse the default prediction performed by the processor. The bits within the BO field are defined as shown in Table 3-4.

Table 3-4: BO Field Bit Definitions

BO Bit | Description
BO[0] | CR Test Control. 0: Test the CR bit specified by the BI opcode field for the value indicated by BO[1]. 1: Do not test the CR.
BO[1] | CR Test Value. 0: Test for CR[BI]=0. 1: Test for CR[BI]=1.
BO[2] | CTR Test Control. 0: Decrement CTR by one, and test whether CTR satisfies the condition specified by BO[3]. 1: Do not change or test CTR.
BO[3] | CTR Test Value. 0: Test for CTR≠0. 1: Test for CTR=0.
BO[4] | Branch Prediction Reversal. 0: Apply standard branch prediction. 1: Reverse the standard branch prediction.
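The BO bit definitions in Table 3-4 can be read as a small decision procedure. The following Python sketch is a software model of that procedure under the stated bit definitions, not a description of the hardware. The function name and argument order are hypothetical. Bit 0 is the most-significant bit, following the numbering used in this manual.

```python
from typing import Tuple

MASK32 = 0xFFFF_FFFF


def evaluate_branch(bo: int, bi: int, cr: int, ctr: int) -> Tuple[bool, int]:
    """Return (branch_taken, new_ctr) for a 5-bit BO, 5-bit BI, 32-bit CR and CTR."""
    # BO[0] is the most-significant of the five BO bits.
    bo0, bo1, bo2, bo3 = ((bo >> shift) & 1 for shift in (4, 3, 2, 1))

    if bo2 == 0:
        # Decrement the CTR, then test it as selected by BO[3].
        ctr = (ctr - 1) & MASK32
        ctr_ok = (ctr == 0) if bo3 else (ctr != 0)
    else:
        ctr_ok = True  # CTR is neither changed nor tested

    if bo0 == 0:
        # Test CR[BI] against the value in BO[1]; CR bit 0 is the MSB.
        cr_ok = ((cr >> (31 - bi)) & 1) == bo1
    else:
        cr_ok = True  # CR is not tested

    return (ctr_ok and cr_ok), ctr
```

BO[4] does not appear in the model because it affects only branch prediction, not whether the branch is taken.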

The 5-bit BI opcode field in branch-conditional instructions specifies which of the 32 bits in the CR is used in the branch-condition test. For example, if BI=0b01010, CR[10] is used in the test.
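Since the CR is organized as eight 4-bit fields, a BI value can also be read as a field number and a bit offset within that field. The short Python sketch below shows the arithmetic; the function name is hypothetical.

```python
from typing import Tuple


def locate_cr_bit(bi: int) -> Tuple[int, int]:
    """Map a 5-bit BI value to (CR field number, bit offset within the field)."""
    if not 0 <= bi <= 31:
        raise ValueError("BI is a 5-bit field")
    return divmod(bi, 4)


print(locate_cr_bit(0b01010))
```

For the example above, BI=0b01010 selects CR bit 10, which is bit 2 of CR field 2.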

In some encodings of the BO field, certain BO bits are ignored. Ignored bits can be assigned a meaning in future extensions of the PowerPC architecture and should be cleared to 0. Valid BO field encodings are shown in Table 3-5. In this table, z indicates the ignored bits that should be cleared to 0. The y bit (BO[4]) specifies the branch-prediction behavior for the instruction as described in Specifying Branch-Prediction Behavior, page 370.

Table 3-5: Valid BO Opcode-Field Encoding

BO[0:4] | Description
0000y | Decrement the CTR. Branch if the decremented CTR≠0 and CR[BI]=0.
0001y | Decrement the CTR. Branch if the decremented CTR=0 and CR[BI]=0.
001zy | Branch if CR[BI]=0.
0100y | Decrement the CTR. Branch if the decremented CTR≠0 and CR[BI]=1.
0101y | Decrement the CTR. Branch if the decremented CTR=0 and CR[BI]=1.
011zy | Branch if CR[BI]=1.
1z00y | Decrement the CTR. Branch if the decremented CTR≠0.
1z01y | Decrement the CTR. Branch if the decremented CTR=0.
1z1zz | Branch always.

Branch Instructions

The following sections describe the branch instructions defined by the PowerPC architecture. A number of simplified mnemonics are defined for the branch instructions. See Branch Instructions, page 821 for more information.

Branch Unconditional

Table 3-6 lists the PowerPC unconditional branch instructions. These branches specify a 26-bit signed displacement to the branch-target address by appending 0b00 to the 24-bit LI instruction field. The displacement value gives unconditional branches the ability to cover an address range of ±32 MB.


Table 3-6: Branch-Unconditional Instructions

Mnemonic | Name | Operation | Operand Syntax
b | Branch | Branch to relative address. | tgt_addr
ba | Branch Absolute | Branch to absolute address. | tgt_addr
bl | Branch and Link | Branch to relative address. LR is updated with the address of the instruction following the branch. | tgt_addr
bla | Branch Absolute and Link | Branch to absolute address. LR is updated with the address of the instruction following the branch. | tgt_addr

Branch Conditional

Table 3-7 lists the PowerPC branch-conditional instructions. The BO field specifies the condition tested by the branch, as shown in Table 3-5, page 368. The BI field specifies the CR bit used in the test. These branches specify a 16-bit signed displacement to the branch-target address by appending 0b00 to the 14-bit BD instruction field. The displacement value gives conditional branches the ability to cover an address range of ±32 KB.

Table 3-7: Branch-Conditional Instructions

Mnemonic | Name | Operation | Operand Syntax
bc | Branch Conditional | Branch-conditional to relative address. | BO,BI,tgt_addr
bca | Branch Conditional Absolute | Branch-conditional to absolute address. | BO,BI,tgt_addr
bcl | Branch Conditional and Link | Branch-conditional to relative address. LR is updated with the address of the instruction following the branch. | BO,BI,tgt_addr
bcla | Branch Conditional Absolute and Link | Branch-conditional to absolute address. LR is updated with the address of the instruction following the branch. | BO,BI,tgt_addr

Branch Conditional to Link Register

Table 3-8 lists the PowerPC branch-conditional to link-register instructions. The BO field specifies the condition tested by the branch, as shown in Table 3-5, page 368. The BI field specifies the CR bit used in the test. The branch-target address is read from the LR, with LR[30:31] cleared to zero to form a word-aligned address. Using the 32-bit LR as a branch target gives these branches the ability to cover the full 4 GB address range.

Table 3-8: Branch-Conditional to Link-Register Instructions

Mnemonic | Name | Operation | Operand Syntax
bclr | Branch Conditional to Link Register | Branch-conditional to address in LR. | BO,BI
bclrl | Branch Conditional to Link Register and Link | Branch-conditional to address in LR. LR is updated with the address of the instruction following the branch. | BO,BI


Branch Conditional to Count Register

Table 3-9 lists the PowerPC branch-conditional to count-register instructions. The BO field specifies the condition tested by the branch, as shown in Table 3-5, page 368. The BI field specifies the CR bit used in the test. The branch-target address is read from the CTR, with CTR[30:31] cleared to zero to form a word-aligned address. Using the 32-bit CTR as a branch target gives these branches the ability to cover the full 4 GB address range.

Table 3-9: Branch-Conditional to Count-Register Instructions

Mnemonic | Name | Operation | Operand Syntax
bcctr | Branch Conditional to Count Register | Branch-conditional to address in CTR. | BO,BI
bcctrl | Branch Conditional to Count Register and Link | Branch-conditional to address in CTR. LR is updated with the address of the instruction following the branch. | BO,BI

Branch Prediction

Conditional branches alter program flow based on the value of bits in the CR. If a condition is met by the CR bits, the branch instruction alters the next-instruction address nonsequentially. Otherwise, the next-sequential instruction following the branch is executed. When the processor encounters a conditional branch, it scans the execution pipelines to determine whether an instruction in progress can affect the CR bit tested by the branch. If no such instruction is found, the branch can be resolved immediately by checking the bit in the CR and taking the action defined by the branch instruction.

However, if a CR-altering instruction is detected, the branch is considered unresolved until the CR-altering instruction completes execution and writes its result to the CR. Prior to that time, the processor can predict how the branch is resolved. First, the processor uses special dynamic prediction hardware to analyze instruction flow and branch history to predict resolution of the current branch. If branches are predicted correctly, performance improvements can be realized because instruction execution does not stall waiting for the branch to be resolved. The PowerPC architecture provides software with the ability to override (reverse) the dynamic prediction using a static prediction hint encoded in the instruction opcode. This can be useful when it is known at compile time that a branch is likely to behave contrary to what the processor expects. The use of static prediction is described in the next section, Specifying Branch-Prediction Behavior.

When a prediction is made, instructions are fetched from the predicted execution path. If the processor determines the prediction was incorrect after the CR-altering instruction completes execution, all instructions fetched as a result of the prediction are discarded by the processor. Instruction fetch is restarted along the correct path. If the prediction was correct, instruction fetch and execution proceed normally along the predicted (and now resolved) path.

Branch prediction is most effective when the branch-target address is computed well in advance of resolving the branch. If a branch instruction contains immediate addressing operands, the processor can compute the branch-target address ahead of branch resolution. If the branch instruction uses the LR or CTR for addressing, it is important that the register is loaded by software sufficiently ahead of the branch instruction.

Specifying Branch-Prediction Behavior

All PowerPC processors predict whether a conditional branch is taken using the following default rules:

• For the bcx instruction with a negative value in the displacement operand, the branch is predicted taken.

• For all other branch-conditional instructions (bcx with a non-negative value in the displacement operand, bclrx, or bcctrx), the branch is predicted not taken.


Algorithmically, a branch is predicted taken if:

((BO[0] ∧ BO[2]) ∨ s) = 1

where s is the sign bit of the displacement operand, if the instruction has a displacement operand (bit 16 of the branch-conditional instruction encoding).

When the result of the above equation is 0, the branch is predicted not-taken and the processor speculatively fetches instructions that sequentially follow the branch instruction.

Examining the above equation, BO[0] ∧ BO[2]=1 only when the conditional branch tests nothing, meaning the branch is always taken. In this case, the processor predicts the branch as taken.

If the conditional branch tests anything (BO[0] ∧ BO[2]=0), s controls the prediction. In the bclrx and bcctrx instructions, bit 16 (s) is reserved and always 0. In this case those instructions are predicted not-taken.

Only the bcx instructions can specify a displacement value. The bcx instructions are commonly used at the end of loops to control the number of times a loop is executed. Here, the branch is taken every time the loop is executed except the last time, so a branch should normally be predicted as taken. Because the branch target is at the beginning of the loop, the branch displacement is negative and s=1, so the processor predicts the branch as taken. Forward branches have a positive displacement and are predicted not-taken.

When the y bit (BO[4]) is cleared to 0, the default branch-prediction behavior described above is followed by the processor. Setting the y bit to 1 reverses the above behavior. For the branch-always encoding (BO[0]=1 and BO[2]=1), branch prediction cannot be reversed (no y bit is recognized).

The sign of the displacement operand (s) is used as described above even when the target is an absolute address. The default value for the y bit should be 0. Compilers can set this bit if they determine that the prediction corresponding to y=1 is more likely to be correct than the prediction corresponding to y=0. Compilers that do not statically predict branches should always clear the y bit.
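The prediction rules in this section, including the y-bit reversal, can be summarized in a few lines. The Python sketch below is an illustrative model of the rules as stated here; the function name is hypothetical. The s argument is the sign bit of the displacement operand, and callers are assumed to pass 0 for bclrx and bcctrx.

```python
def predicted_taken(bo: int, s: int) -> bool:
    """Model the static prediction for a 5-bit BO field and displacement sign bit s."""
    bo0 = (bo >> 4) & 1
    bo2 = (bo >> 2) & 1
    y = bo & 1
    if bo0 and bo2:
        # Branch always: predicted taken, and no y bit is recognized.
        return True
    # Default prediction follows s; y=1 reverses it.
    return bool(s ^ y)
```

For example, a loop-closing branch with a negative displacement (s=1) and y=0 is predicted taken, and the same branch with y=1 is predicted not taken.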

Link-Register Stack

Some processor implementations keep a stack (history) of the LR values most recently used by branch-and-link instructions. Those processors use this software-invisible stack to predict the target address of nested-subroutine returns. Although the PPC405 processor does not implement such a stack, the following programming conventions should be followed so that software running on multiple PowerPC processors can benefit from this stack.

In the following examples, let A, B, and Glue represent subroutine labels:

• When obtaining the address of the next instruction, use the following form of branch-and-link:

bcl 20,31,$+4

• Loop counts:

Keep loop counts in the CTR, and use one of the branch-conditional instructions to decrement the count and to control branching (for example, branching back to the start of a loop if the decremented CTR value is nonzero).

• Computed “go to”, case statements, etc.:

Use the CTR to hold the branch-target address, and use the bcctr instruction with the link register option disabled (LK=0) to branch to the selected address.

• Direct subroutine linkage, where A calls B and B returns to A:

- A calls B: use a branch instruction with the link-register option enabled (LK=1).

- B returns to A: use the bclr instruction with the link-register option disabled (LK=0). The return address is in, or can be restored to, the LR.

• Indirect subroutine linkage, where A calls Glue, Glue calls B, and B returns to A rather than to Glue.

Such a calling sequence is common in linkage code where the subroutine that the programmer wants to call, B, is in a different module than the caller, A. The binder inserts “glue” code to mediate the branch:

- A calls Glue: use a branch instruction that sets the LR with the link-register option enabled (LK=1).

- Glue calls B: write the address of B in the CTR, and use the bcctr instruction with the link-register option disabled (LK=0).

- B returns to A: use the bclr instruction with the link-register option disabled (LK=0). The return address is in, or can be restored to, the LR.
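The reason for these conventions can be seen in a toy model of a link-register stack. The Python sketch below is hypothetical: it assumes a stack that pushes on every branch with LK=1 and pops on every bclr with LK=0, which is one plausible design and not a description of any particular processor. All class, method, and address names are chosen for illustration.

```python
class LinkStackModel:
    """Hypothetical model of a software-invisible link-register stack."""

    def __init__(self) -> None:
        self.stack = []

    def branch_and_link(self, return_address: int) -> None:
        # Any branch with LK=1 pushes the address of the following instruction.
        self.stack.append(return_address)

    def bcctr(self) -> None:
        # Branch to CTR with LK=0: the stack is left untouched.
        pass

    def bclr(self, actual_target: int) -> bool:
        # Branch to LR with LK=0: pop a prediction and compare it with the target.
        predicted = self.stack.pop() if self.stack else None
        return predicted == actual_target


def indirect_call_following_convention() -> bool:
    model = LinkStackModel()
    model.branch_and_link(0x1004)  # A (at 0x1000) calls Glue with LK=1
    model.bcctr()                  # Glue reaches B through the CTR, LK=0
    return model.bclr(0x1004)      # B returns to A


def indirect_call_breaking_convention() -> bool:
    model = LinkStackModel()
    model.branch_and_link(0x1004)  # A calls Glue with LK=1
    model.bclr(0x3000)             # Glue jumps to B through the LR instead
    return model.bclr(0x1004)      # B returns to A, but the stack is now empty
```

In this model the return from B is predicted correctly only when Glue reaches B through the CTR, which matches the indirect-linkage convention above.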

Branch-Target Address Calculation

Branch instructions compute the effective address (EA) of the next instruction using the following addressing modes:

• Branch to relative (conditional and unconditional).

• Branch to absolute (conditional and unconditional).

• Branch to link register (conditional only).

• Branch to count register (conditional only).

Instruction addresses are always assumed to be word aligned. PowerPC processors ignore the two low-order bits of the generated branch-target address.


Branch to Relative

Instructions that use branch-to-relative addressing generate the next-instruction address by appending 0b00 to the immediate-displacement operand (LI) and then sign-extending the result. That result is added to the current-instruction address to produce the next-instruction address. Branches using this addressing mode must have the absolute-addressing option disabled by clearing the AA instruction field (bit 30) to 0. The link-register update option is enabled by setting the LK instruction field (bit 31) to 1. This option causes the effective address of the instruction following the branch instruction to be loaded into the LR.

Figure 3-11 shows how the branch-target address is generated when using the branch-to-relative addressing mode.

[Diagram: the instruction encoding (bit positions 0, 6, 30, and 31 marked) supplies a sign-extended displacement that is combined with the 32-bit current instruction address to produce the 32-bit branch target address.]

Figure 3-11: Branch-to-Relative Addressing
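The branch-to-relative calculation can be modeled in a few lines of Python. This is a sketch under the field positions given above (LI in bits 6:29, AA in bit 30, LK in bit 31, with bit 0 as the most-significant bit). The function names and the sample instruction words in the usage note are illustrative assumptions.

```python
from typing import Optional, Tuple

MASK32 = 0xFFFF_FFFF


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def relative_branch(instr: int, cia: int) -> Tuple[int, Optional[int]]:
    """Return (branch target, LR update or None) for an unconditional relative branch."""
    li = (instr >> 2) & 0x00FF_FFFF  # bits 6:29
    aa = (instr >> 1) & 1            # bit 30
    lk = instr & 1                   # bit 31
    if aa:
        raise ValueError("AA=1 selects absolute addressing")
    # Append 0b00 to LI, sign-extend the 26-bit result, add to the current address.
    target = (cia + sign_extend(li << 2, 26)) & MASK32
    lr_update = (cia + 4) & MASK32 if lk else None
    return target, lr_update
```

With a current-instruction address of 0x1000, the word 0x4BFFFFFC (LI = -1, LK=0) yields a target of 0x0FFC, and 0x48000009 (LI = 2, LK=1) yields a target of 0x1008 with 0x1004 loaded into the LR.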

Branch-Conditional to Relative

If the branch conditions are met, instructions that use branch-conditional to relative addressing generate the next-instruction address by appending 0b00 to the immediate-displacement operand (BD) and sign-extending the result. That result is added to the current-instruction address to produce the next-instruction address. Branches using this addressing mode must have the absolute-addressing option disabled by clearing the AA instruction field (bit 30) to 0. The link-register update option is enabled by setting the LK instruction field (bit 31) to 1. This option causes the effective address of the instruction following the branch instruction to be loaded into the LR.

Figure 3-12 shows how the branch-target address is generated when using the branch-conditional to relative addressing mode.

[Diagram: the instruction encoding (bit positions 0, 6, 11, 16, 30, and 31 marked; fields BO, BI, and BD). If the condition is met, the sign-extended BD displacement is combined with the current instruction address to produce the branch target address. Otherwise, the next sequential instruction address is used.]

Figure 3-12: Branch-Conditional to Relative Addressing
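The conditional form differs from the unconditional one in the displacement width (14-bit BD rather than 24-bit LI) and in falling through to the next sequential instruction when the condition is not met. The Python sketch below models this for the relative mode only. The function names are hypothetical, and the condition itself is passed in as a flag rather than evaluated from BO and BI.

```python
from typing import Tuple

MASK32 = 0xFFFF_FFFF


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def decode_bo_bi(instr: int) -> Tuple[int, int]:
    """Extract BO (bits 6:10) and BI (bits 11:15) from the instruction word."""
    return (instr >> 21) & 0x1F, (instr >> 16) & 0x1F


def conditional_relative_next_address(instr: int, cia: int, condition_met: bool) -> int:
    """Return the next-instruction address for a branch-conditional to relative."""
    if (instr >> 1) & 1:
        raise ValueError("AA=1 selects absolute addressing")
    if not condition_met:
        return (cia + 4) & MASK32        # next sequential instruction
    bd = (instr >> 2) & 0x3FFF           # bits 16:29
    return (cia + sign_extend(bd << 2, 16)) & MASK32
```

As an illustration, the word 0x4200FFF8 decodes to BO=0b10000 and BI=0 with a displacement of -8, so at address 0x2000 a taken branch goes to 0x1FF8 and a not-taken branch continues at 0x2004.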


Branch to Absolute

Instructions that use branch-to-absolute addressing generate the next-instruction address by appending 0b00 to the immediate-displacement operand (LI) and sign-extending the result. Branches using this addressing mode must have the absolute-addressing option enabled by setting the AA instruction field (bit 30) to 1. The link-register update option is enabled by setting the LK instruction field (bit 31) to 1. This option causes the effective address of the instruction following the branch instruction to be loaded into the LR.

Figure 3-13 shows how the branch-target address is generated when using the branch-to-absolute addressing mode.

[Diagram: the instruction encoding (bit positions 0, 6, 30, and 31 marked) supplies a sign-extended displacement that is used directly as the 32-bit branch target address.]

Figure 3-13: Branch-to-Absolute Addressing
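One consequence of sign-extending the 26-bit value is that absolute branches can reach only the lowest 32 MB and the highest 32 MB of the 4 GB address space. The Python sketch below illustrates this by computing the target for a given 24-bit LI value; the function names are hypothetical.

```python
MASK32 = 0xFFFF_FFFF


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def absolute_target(li: int) -> int:
    """Return the 32-bit branch target for a 24-bit LI field with AA=1."""
    # Append 0b00, sign-extend the 26-bit result, and use it as the address.
    return sign_extend((li & 0x00FF_FFFF) << 2, 26) & MASK32


print(hex(absolute_target(0x7FFFFF)))
print(hex(absolute_target(0x800000)))
```

The largest positive LI value maps to 0x01FF_FFFC and the most negative maps to 0xFE00_0000, which bound the two reachable regions.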

Branch-Conditional to Absolute

If the branch conditions are met, instructions that use branch-conditional to absolute addressing generate the next-instruction address by appending 0b00 to the immediate-displacement operand (BD) and sign-extending the result. Branches using this addressing mode must have the absolute-addressing option enabled by setting the AA instruction field (bit 30) to 1. The link-register update option is enabled by setting the LK instruction field (bit 31) to 1. This option causes the effective address of the instruction following the branch instruction to be loaded into the LR.

Figure 3-14 shows how the branch-target address is generated when using the branch-conditional to absolute-addressing mode.

[Diagram: the instruction encoding (bit positions 0, 6, 11, 16, 30, and 31 marked; fields BO, BI, and BD). If the condition is met, the sign-extended BD displacement is used as the branch target address. Otherwise, the next sequential instruction address is used.]

Figure 3-14: Branch-Conditional to Absolute Addressing


Branch-Conditional to Link Register

If the branch conditions are met, the branch-conditional to link-register instruction generates the next-instruction address by reading the contents of the LR and clearing the two low-order bits to zero. The link-register update option is enabled by setting the LK instruction field (bit 31) to 1. This option causes the effective address of the instruction following the branch instruction to be loaded into the LR.

Figure 3-15 shows how the branch-target address is generated when using the branch-conditional to link-register addressing mode.

[Diagram: the instruction encoding (bit positions 0, 6, 11, 16, and 31 marked; fields BO and BI, followed by 00000 and 16). If the condition is met, the branch target address is formed from the LR. Otherwise, the next sequential instruction address is used.]

Figure 3-15: Branch-Conditional to Link-Register Addressing

Branch-Conditional to Count Register

If the branch conditions are met, the branch-conditional to count-register instruction generates the next-instruction address by reading the contents of the CTR and clearing the two low-order bits to zero. The link-register update option is enabled by setting the LK instruction field (bit 31) to 1. This option causes the effective address of the instruction following the branch instruction to be loaded into the LR.

Figure 3-16 shows how the branch-target address is generated when using the branch-conditional to count-register addressing mode.

[Diagram: the instruction encoding (bit positions 0, 6, 11, 16, and 31 marked; fields BO and BI, followed by 00000 and 528). If the condition is met, the branch target address is formed from the CTR. Otherwise, the next sequential instruction address is used.]

Figure 3-16: Branch-Conditional to Count-Register Addressing


Condition-Register Logical Instructions

Table 3-10 lists the PowerPC condition-register logical instructions. The condition-register logical instructions perform logical operations on any two bits within the CR and store the result of the operation in any CR bit. The move condition-register field instruction is used to move any CR field (each field comprising four bits) to any other CR-field location. All of these instructions are considered flow-control instructions because they are generally used to set up conditions for testing by the branch-conditional instructions and to reduce the number of branches in a code sequence. Simplified mnemonics are defined for the condition-register logical instructions. See CR-Logical Instructions, page 828 for more information.

In Table 3-10, the instruction-operand fields crbA, crbB, and crbD all specify a single bit within the CR. The instruction-operand fields crfD and crfS specify a 4-bit field within the CR.

Table 3-10: Condition-Register Logical Instructions

Mnemonic | Name | Operation | Operand Syntax
crand | Condition Register AND | CR-bit crbA is ANDed with CR-bit crbB and the result is stored in CR-bit crbD. | crbD,crbA,crbB
crandc | Condition Register AND with Complement | CR-bit crbA is ANDed with the complement of CR-bit crbB and the result is stored in CR-bit crbD. | crbD,crbA,crbB
creqv | Condition Register Equivalent | CR-bit crbA is XORed with CR-bit crbB and the complemented result is stored in CR-bit crbD. | crbD,crbA,crbB
crnand | Condition Register NAND | CR-bit crbA is ANDed with CR-bit crbB and the complemented result is stored in CR-bit crbD. | crbD,crbA,crbB
crnor | Condition Register NOR | CR-bit crbA is ORed with CR-bit crbB and the complemented result is stored in CR-bit crbD. | crbD,crbA,crbB
cror | Condition Register OR | CR-bit crbA is ORed with CR-bit crbB and the result is stored in CR-bit crbD. | crbD,crbA,crbB
crorc | Condition Register OR with Complement | CR-bit crbA is ORed with the complement of CR-bit crbB and the result is stored in CR-bit crbD. | crbD,crbA,crbB
crxor | Condition Register XOR | CR-bit crbA is XORed with CR-bit crbB and the result is stored in CR-bit crbD. | crbD,crbA,crbB
mcrf | Move Condition Register Field | CR-field crfS is copied into CR-field crfD. No other CR fields are modified. | crfD,crfS
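The operations in Table 3-10 can be modeled on a 32-bit integer that stands for the CR. The Python sketch below is illustrative only; the helper names and the dictionary of operations are hypothetical. CR bit 0 is the most-significant bit and CR field 0 occupies bits 0:3, following the bit numbering used in this manual.

```python
MASK32 = 0xFFFF_FFFF

# One entry per CR-logical mnemonic; each maps two source bits to a result bit.
CR_OPS = {
    "crand": lambda a, b: a & b,
    "crandc": lambda a, b: a & (b ^ 1),
    "creqv": lambda a, b: (a ^ b) ^ 1,
    "crnand": lambda a, b: (a & b) ^ 1,
    "crnor": lambda a, b: (a | b) ^ 1,
    "cror": lambda a, b: a | b,
    "crorc": lambda a, b: a | (b ^ 1),
    "crxor": lambda a, b: a ^ b,
}


def get_cr_bit(cr: int, n: int) -> int:
    """Return CR bit n, where bit 0 is the most-significant bit."""
    return (cr >> (31 - n)) & 1


def cr_logical(op: str, cr: int, crbD: int, crbA: int, crbB: int) -> int:
    """Apply one CR-logical operation and return the updated CR value."""
    result = CR_OPS[op](get_cr_bit(cr, crbA), get_cr_bit(cr, crbB))
    mask = 1 << (31 - crbD)
    return (cr | mask) if result else (cr & ~mask & MASK32)


def mcrf(cr: int, crfD: int, crfS: int) -> int:
    """Copy 4-bit CR field crfS into field crfD, leaving other fields unchanged."""
    shift_s = 28 - 4 * crfS
    shift_d = 28 - 4 * crfD
    field = (cr >> shift_s) & 0xF
    return (cr & ~(0xF << shift_d) & MASK32) | (field << shift_d)
```

Using the same bit for all three operands shows two useful special cases: creqv always sets the destination bit, and crxor always clears it.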

System Call

Table 3-11 lists the PowerPC system-call instruction. The sc instruction is a user-level instruction that can be used by a user-mode program to transfer control to a privileged-mode program (typically a system-service routine). Executing the sc instruction causes a system-call exception to occur. See System-Call Interrupt (0x0C00), page 514 for more information on the operation of this instruction.

Table 3-11: System-Call Instruction

Mnemonic | Name | Operation | Operand Syntax
sc | System Call | Causes a system-call exception to occur. | None

System Trap

Table 3-12 lists the PowerPC system-trap instructions. System-trap instructions are normally used by software-debug applications to set breakpoints. These instructions test for a specified set of conditions and cause a program exception to occur if any of the conditions are met. If the tested conditions are not met, instruction execution continues normally with the instruction following the system-trap instruction (a program exception does not occur). The system-trap handler can be called from the program-interrupt handler when it is determined that a system-trap instruction caused the exception. See Program Interrupt (0x0700), page 511 for more information on program exceptions caused by the system-trap instructions.

Trap instructions can also be used to cause a debug exception. See Trap-Instruction Debug Event, page 546 for more information.

Simplified mnemonics are defined for the system-trap instructions. See Trap Instructions, page 832 for more information.

Table 3-12: System-Trap Instructions

Mnemonic | Name | Operation | Operand Syntax
tw | Trap Word | The contents of rA are compared with rB. A program exception occurs if the comparison meets any test condition enabled by the TO operand. | TO,rA,rB
twi | Trap Word Immediate | The contents of rA are compared with the sign-extended SIMM operand. A program exception occurs if the comparison meets any test condition enabled by the TO operand. | TO,rA,SIMM

The TO operand field in the system-trap instructions specifies the test conditions performed on the remaining two operands. Multiple test conditions can be set simultaneously, expanding the number of possible conditions that can cause the trap (program exception). If all bits in the TO operand field are set, the trap always occurs because one of the trap conditions is always met. The bits within the TO field are defined as shown in Table 3-13.

Table 3-13: TO Field Bit Definitions

TO Bit | Description
TO[0] | Less-than arithmetic comparison. 0 = Ignore trap condition. 1 = Trap if first operand is arithmetically less-than second operand.
TO[1] | Greater-than arithmetic comparison. 0 = Ignore trap condition. 1 = Trap if first operand is arithmetically greater-than second operand.
TO[2] | Equal-to arithmetic comparison. 0 = Ignore trap condition. 1 = Trap if first operand is arithmetically equal-to second operand.
TO[3] | Less-than unsigned comparison. 0 = Ignore trap condition. 1 = Trap if first operand is less-than second operand.
TO[4] | Greater-than unsigned comparison. 0 = Ignore trap condition. 1 = Trap if first operand is greater-than second operand.
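The TO-field evaluation described above can be sketched in Python. This is an illustrative model only, not part of the architecture definition: the function names are hypothetical, and treating TO[0] as the most-significant bit of the 5-bit field is an assumption based on the big-endian bit numbering used in this manual.

```python
def to_signed32(value):
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def trap_taken(to, first, second):
    """Return True if any condition enabled in the 5-bit TO field is met.

    Assumption: TO[0] is the most-significant bit of the field.
    """
    s1, s2 = to_signed32(first), to_signed32(second)
    u1, u2 = first & 0xFFFFFFFF, second & 0xFFFFFFFF
    conditions = [
        s1 < s2,   # TO[0]: arithmetic less-than
        s1 > s2,   # TO[1]: arithmetic greater-than
        u1 == u2,  # TO[2]: equal-to
        u1 < u2,   # TO[3]: unsigned less-than
        u1 > u2,   # TO[4]: unsigned greater-than
    ]
    return any(met and (to >> (4 - bit)) & 1
               for bit, met in enumerate(conditions))
```

With all five bits set (TO = 0b11111) the function returns True for every operand pair, which matches the statement that the trap then always occurs.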

Integer Load and Store Instructions

The integer load and store instructions move data between the general-purpose registers and memory. Several types of loads and stores are supported by the PowerPC instruction set:

• Load and zero

• Load algebraic

• Store

• Load with byte reverse and store with byte reverse

• Load multiple and store multiple

• Load string and store string

• Memory synchronization instructions

Memory accesses performed by the load and store instructions can occur out of order. Synchronizing instructions are provided to enforce strict memory-access ordering. See Synchronizing Instructions, page 424 for more information.

In general, the PowerPC architecture defines a sequential-execution model. When a store instruction modifies an instruction-memory location, software synchronization is required to ensure subsequent instruction fetches from that location obtain the modified version of the instruction. See Self-Modifying Code, page 467 for more information.


Operand-Address Calculation

Integer load and store instructions generate effective addresses using one of three addressing modes: register-indirect with immediate index, register-indirect with index, or register indirect. These addressing modes are described in the following sections. For some instructions, update forms that load the calculated effective address into rA are also provided.

In the PPC405 processor, loads and stores to unaligned addresses can suffer from performance degradation. Refer to Performance Effects of Operand Alignment, page 353 for more information.

In register-indirect with immediate-index addressing, load and store instructions contain a signed, 16-bit immediate index (d operand) and a general-purpose register operand, rA. The index is sign-extended to 32 bits and added to the contents of rA to generate the effective address. If the rA instruction field is 0 (specifying r0), a value of zero, rather than the contents of r0, is added to the sign-extended immediate index. The option to specify rA or 0 is shown in the instruction description as (rA|0).

Figure 3-17 shows how an effective address is generated when using register-indirect with immediate-index addressing.

Figure 3-17: Register-Indirect with Immediate-Index Addressing

In register-indirect with index addressing, load and store instructions contain two general-purpose register operands, rA and rB. The contents of these two registers are added to generate the effective address. If the rA instruction field is 0 (specifying r0), a value of zero, rather than the contents of r0, is added to rB. The option to specify rA or 0 is shown in the instruction description as (rA|0).

Figure 3-18 shows how an effective address is generated when using register-indirect with index addressing.

Figure 3-18: Register-Indirect with Index Addressing

Only load-string and store-string instructions can use register-indirect addressing. This mode uses only the contents of the general-purpose register specified by the rA operand as the effective address. A zero in the rA operand causes an effective address of zero to be generated, rather than using the contents of r0. The option to specify rA or 0 is shown in the instruction descriptions as (rA|0).

Figure 3-19 shows how an effective address is generated when using register-indirect addressing.

Figure 3-19: Register-Indirect Addressing
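The three effective-address calculations can be summarized with a short Python sketch. It is a simplified model under stated assumptions: the register file is a hypothetical 32-entry list named gpr, the function names are illustrative, and the result is assumed to wrap to 32 bits.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(d):
    """Sign-extend the 16-bit d operand."""
    d &= 0xFFFF
    return d - 0x10000 if d & 0x8000 else d


def ea_immediate_index(gpr, ra, d):
    """Register-indirect with immediate index: EA = (rA|0) + d."""
    base = 0 if ra == 0 else gpr[ra]
    return (base + sign_extend16(d)) & MASK32


def ea_index(gpr, ra, rb):
    """Register-indirect with index: EA = (rA|0) + (rB)."""
    base = 0 if ra == 0 else gpr[ra]
    return (base + gpr[rb]) & MASK32


def ea_register_indirect(gpr, ra):
    """Register-indirect: EA = (rA|0)."""
    return 0 if ra == 0 else gpr[ra] & MASK32
```

In each case a 0 in the rA field selects the constant zero, so the contents of r0 never contribute to the address through rA.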

Load Instructions

Integer-load instructions read an operand from memory and store it in a GPR destination register, rD. Each type of load is characterized by what it does with the unused high-order bits in rD when the operand size is less than a word (32 bits). Load-and-zero instructions clear the unused high-order bits in rD to zero. Load-algebraic instructions fill the unused high-order bits in rD with a copy of the most-significant bit in the operand.
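The difference between the two kinds of load can be illustrated with a small Python sketch. The function names and the size_bits parameter are hypothetical and serve only to model the treatment of the high-order bits of a 32-bit rD.

```python
def load_and_zero(operand, size_bits):
    """Keep the low size_bits of the operand; the high-order bits are 0."""
    return operand & ((1 << size_bits) - 1)


def load_algebraic(operand, size_bits):
    """Fill the high-order bits of the 32-bit result with the operand's
    most-significant bit."""
    low_mask = (1 << size_bits) - 1
    operand &= low_mask
    if operand & (1 << (size_bits - 1)):
        operand |= 0xFFFFFFFF & ~low_mask
    return operand
```

For the halfword 0x8001, the load-and-zero model yields 0x0000_8001 and the load-algebraic model yields 0xFFFF_8001.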

Load-with-update instructions are provided, but the following two rules apply:

• rA must not be equal to 0. If rA = 0, the instruction form is invalid.

• rA must not be equal to rD. If rA = rD, the instruction form is invalid.

In the PPC405, the above invalid instruction forms produce a boundedly-undefined result. In other PowerPC implementations, those forms can cause a program exception.

Load Byte and Zero

Table 3-14 lists the PowerPC load byte and zero instructions. These instructions load a byte from memory into the lower-eight bits of rD and clear the upper-24 bits of rD to 0.

Table 3-14: Load Byte and Zero Instructions

Mnemonic | Name | Addressing Mode | Operand Syntax
lbz | Load Byte and Zero | Register-indirect with immediate index. EA = (rA|0) + d | rD,d(rA)
lbzu | Load Byte and Zero with Update | Register-indirect with immediate index. EA = (rA) + d; rA ← EA; rA ≠ 0, rA ≠ rD | rD,d(rA)
lbzx | Load Byte and Zero Indexed | Register-indirect with index. EA = (rA|0) + (rB) | rD,rA,rB
lbzux | Load Byte and Zero with Update Indexed | Register-indirect with index. EA = (rA) + (rB); rA ← EA; rA ≠ 0, rA ≠ rD | rD,rA,rB

Load Halfword and Zero

Table 3-15 lists the PowerPC load halfword and zero instructions. These instructions load a halfword from memory into the lower-16 bits of rD and clear the upper-16 bits of rD to 0.

Table 3-15: Load Halfword and Zero Instructions

Mnemonic | Name | Addressing Mode | Operand Syntax
lhz | Load Halfword and Zero | Register-indirect with immediate index. EA = (rA|0) + d | rD,d(rA)
lhzu | Load Halfword and Zero with Update | Register-indirect with immediate index. EA = (rA) + d; rA ← EA; rA ≠ 0, rA ≠ rD | rD,d(rA)
lhzx | Load Halfword and Zero Indexed | Register-indirect with index. EA = (rA|0) + (rB) | rD,rA,rB
lhzux | Load Halfword and Zero with Update Indexed | Register-indirect with index. EA = (rA) + (rB); rA ← EA; rA ≠ 0, rA ≠ rD | rD,rA,rB

Load Word and Zero

Table 3-16 lists the PowerPC load word and zero instructions. These instructions load a word from memory into rD.

Table 3-16: Load-Word and Zero Instructions

Mnemonic | Name | Addressing Mode | Operand Syntax
lwz | Load Word and Zero | Register-indirect with immediate index. EA = (rA|0) + d | rD,d(rA)
lwzu | Load Word and Zero with Update | Register-indirect with immediate index. EA = (rA) + d; rA ← EA; rA ≠ 0, rA ≠ rD | rD,d(rA)
lwzx | Load Word and Zero Indexed | Register-indirect with index. EA = (rA|0) + (rB) | rD,rA,rB
lwzux | Load Word and Zero with Update Indexed | Register-indirect with index. EA = (rA) + (rB); rA ← EA; rA ≠ 0, rA ≠ rD | rD,rA,rB

Load Halfword Algebraic

Table 3-17 lists the PowerPC load halfword algebraic instructions. These instructions load a halfword from memory into the lower-16 bits of rD. The upper-16 bits of rD are filled with a copy of the most-significant bit (bit 16) of the operand.

Table 3-17: Load Halfword Algebraic Instructions

Mnemonic | Name | Addressing Mode | Operand Syntax
lha | Load Halfword Algebraic | Register-indirect with immediate index. EA = (rA|0) + d | rD,d(rA)
lhau | Load Halfword Algebraic with Update | Register-indirect with immediate index. EA = (rA) + d; rA ← EA; rA ≠ 0, rA ≠ rD | rD,d(rA)
lhax | Load Halfword Algebraic Indexed | Register-indirect with index. EA = (rA|0) + (rB) | rD,rA,rB
lhaux | Load Halfword Algebraic with Update Indexed | Register-indirect with index. EA = (rA) + (rB); rA ← EA; rA ≠ 0, rA ≠ rD | rD,rA,rB

Store Instructions

Integer-store instructions read an operand from a GPR source register, rS, and write it into memory. Store-with-update instructions are provided, but the following two rules apply:

• rA must not be equal to 0. If rA = 0, the instruction form is invalid.

• If rS = rA, rS is written to memory first, and then the effective address is loaded into rS.

In the PPC405, the above invalid instruction form produces a boundedly-undefined result. In other PowerPC implementations, that form can cause a program exception.

Store Byte

Table 3-18 lists the PowerPC store byte instructions. These instructions store the lower-eight bits of rS into the specified byte location in memory.

Table 3-18: Store Byte Instructions

Mnemonic | Name | Addressing Mode | Operand Syntax
stb | Store Byte | Register-indirect with immediate index. EA = (rA|0) + d | rS,d(rA)
stbu | Store Byte with Update | Register-indirect with immediate index. EA = (rA) + d; rA ← EA; rA ≠ 0 | rS,d(rA)
stbx | Store Byte Indexed | Register-indirect with index. EA = (rA|0) + (rB) | rS,rA,rB
stbux | Store Byte with Update Indexed | Register-indirect with index. EA = (rA) + (rB); rA ← EA; rA ≠ 0 | rS,rA,rB

Store Halfword

Table 3-19 lists the PowerPC store halfword instructions. These instructions store the lower-16 bits of rS into the specified halfword location in memory.

Table 3-19: Store Halfword Instructions

Mnemonic | Name | Addressing Mode | Operand Syntax
sth | Store Halfword | Register-indirect with immediate index. EA = (rA|0) + d | rS,d(rA)
sthu | Store Halfword with Update | Register-indirect with immediate index. EA = (rA) + d; rA ← EA; rA ≠ 0 | rS,d(rA)
sthx | Store Halfword Indexed | Register-indirect with index. EA = (rA|0) + (rB) | rS,rA,rB
sthux | Store Halfword with Update Indexed | Register-indirect with index. EA = (rA) + (rB); rA ← EA; rA ≠ 0 | rS,rA,rB

Store Word

Table 3-20 lists the PowerPC store word instructions. These instructions store the entire contents of rS into the specified word location in memory.

Table 3-20: Store Word Instructions

Mnemonic | Name | Addressing Mode | Operand Syntax
stw | Store Word | Register-indirect with immediate index. EA = (rA|0) + d | rS,d(rA)
stwu | Store Word with Update | Register-indirect with immediate index. EA = (rA) + d; rA ← EA; rA ≠ 0 | rS,d(rA)
stwx | Store Word Indexed | Register-indirect with index. EA = (rA|0) + (rB) | rS,rA,rB
stwux | Store Word with Update Indexed | Register-indirect with index. EA = (rA) + (rB); rA ← EA; rA ≠ 0 | rS,rA,rB

Load and Store with Byte-Reverse Instructions

Table 3-21 lists the PowerPC load and store with byte-reverse instructions. Figure 3-20 shows (using big-endian memory) how bytes are moved between memory and the GPRs for each of the byte-reverse instructions. When an lhbrx instruction is executed, the unloaded bytes in rD are cleared to 0.

When used in a system operating with the default big-endian byte order, these instructions have the effect of loading and storing data in little-endian order. Likewise, when used in a system operating with little-endian byte order, these instructions have the effect of loading and storing data in big-endian order. For more information about big-endian and little-endian byte ordering, see Byte Ordering, page 349.

Table 3-21: Load and Store with Byte-Reverse Instructions

Mnemonic | Name | Addressing Mode | Operand Syntax
lhbrx | Load Halfword Byte-Reverse Indexed | Register-indirect with index. EA = (rA|0) + (rB) | rD,rA,rB
lwbrx | Load Word Byte-Reverse Indexed | Register-indirect with index. EA = (rA|0) + (rB) | rD,rA,rB
sthbrx | Store Halfword Byte-Reverse Indexed | Register-indirect with index. EA = (rA|0) + (rB) | rS,rA,rB
stwbrx | Store Word Byte-Reverse Indexed | Register-indirect with index. EA = (rA|0) + (rB) | rS,rA,rB

lwbrx: memory word Byte 0, Byte 1, Byte 2, Byte 3 is loaded into rD as Byte 3, Byte 2, Byte 1, Byte 0.
stwbrx: rS contents Byte 0, Byte 1, Byte 2, Byte 3 are stored to the memory word as Byte 3, Byte 2, Byte 1, Byte 0.
lhbrx: memory halfword Byte 0, Byte 1 is loaded into rD as 0000_0000, 0000_0000, Byte 1, Byte 0.
sthbrx: Byte 2 and Byte 3 of rS are stored to the memory halfword as Byte 3, Byte 2.

Figure 3-20: Load and Store with Byte-Reverse Instructions
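The byte movement performed by the byte-reverse instructions can be modeled in a few lines of Python. This is a sketch under assumptions: memory is a hypothetical bytearray named mem addressed from 0 in big-endian order, and the functions borrow the instruction mnemonics only for readability.

```python
def lwbrx(mem, ea):
    """Load a word so that the byte at EA lands in the low-order byte of rD."""
    return int.from_bytes(mem[ea:ea + 4], "little")


def lhbrx(mem, ea):
    """Load a byte-reversed halfword; the upper 16 bits of the result are 0."""
    return int.from_bytes(mem[ea:ea + 2], "little")


def stwbrx(mem, ea, rs):
    """Store rS so that its low-order byte is written at EA."""
    mem[ea:ea + 4] = (rs & 0xFFFFFFFF).to_bytes(4, "little")


def sthbrx(mem, ea, rs):
    """Store the low-order halfword of rS with its two bytes swapped."""
    mem[ea:ea + 2] = (rs & 0xFFFF).to_bytes(2, "little")
```

Reading the bytes 0x11, 0x22, 0x33, 0x44 with the lwbrx model gives 0x4433_2211, which is the little-endian interpretation of a word stored in big-endian memory.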

Load and Store Multiple Instructions

Table 3-22 lists the PowerPC load and store multiple instructions and their operation. Figure 3-21 shows how bytes are moved between memory and the GPRs for each of these instructions.

These instructions are used to move blocks of data between memory and the GPRs. When the load multiple word instruction (lmw) is executed, rD through r31 are loaded with n consecutive words from memory, where n=32-rD. For the lmw instruction, if rA is in the range of registers to be loaded, or if rD=0, the instruction form is invalid. When the store multiple word instruction (stmw) is executed, the n consecutive words in rS through r31 are stored into memory, where n=32-rS.
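The register and memory ranges touched by lmw and stmw can be sketched as follows. The model assumes a hypothetical 32-entry list gpr, a bytearray mem in big-endian order, and an already computed effective address ea; it does not check for invalid instruction forms.

```python
def lmw(gpr, mem, rd, ea):
    """Load n = 32 - rd consecutive words into rD through r31."""
    for i in range(32 - rd):
        start = ea + 4 * i
        gpr[rd + i] = int.from_bytes(mem[start:start + 4], "big")


def stmw(gpr, mem, rs, ea):
    """Store the n = 32 - rs words in rS through r31 to consecutive memory."""
    for i in range(32 - rs):
        start = ea + 4 * i
        mem[start:start + 4] = (gpr[rs + i] & 0xFFFFFFFF).to_bytes(4, "big")
```

With rD = 29, for example, n = 3 and the words at EA, EA + 4, and EA + 8 are loaded into r29, r30, and r31.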

Table 3-22: Load and Store Multiple Instructions

Mnemonic | Name | Addressing Mode | Operand Syntax
lmw | Load Multiple Word | Register-indirect with immediate index. EA = (rA|0) + d | rD,d(rA)
stmw | Store Multiple Word | Register-indirect with immediate index. EA = (rA|0) + d | rS,d(rA)

lmw: memory words Word 0 through Word n-1, located at EA through EA + 4(n-1), are loaded into rD through r31.
stmw: Word 0 through Word n-1, held in rS through r31, are stored to memory at EA through EA + 4(n-1).

Figure 3-21: Load and Store Multiple Instructions

Load and Store String Instructions

Table 3-23 lists the PowerPC load and store string instructions and their addressing modes. See the individual instruction listings in Chapter 11, Instruction Set for more information on their operation and restrictions on the instruction forms.

Table 3-23: Load and Store String Instructions

Mnemonic | Name | Addressing Mode | Operand Syntax
lswi | Load String Word Immediate | Register-indirect. EA = (rA|0) | rD,rA,NB
lswx | Load String Word Indexed | Register-indirect with index. EA = (rA|0) + (rB) | rD,rA,rB
stswi | Store String Word Immediate | Register-indirect. EA = (rA|0) | rS,rA,NB
stswx | Store String Word Indexed | Register-indirect with index. EA = (rA|0) + (rB) | rS,rA,rB

These instructions are used to move up to 32 consecutive bytes of data between memory and the GPRs without concern for alignment. The instructions can be used for short moves between arbitrary memory locations or for long moves between misaligned memory fields. Performance of these instructions is degraded if the leading and/or trailing bytes are not aligned on a word boundary (see Performance Effects of Operand Alignment, page 353 for more information).

The immediate forms of the instructions take the byte count, n, from the NB instruction field. If NB=0, then n=32. The indexed forms take the byte count from XER[25:31]. Unlike the immediate forms, if XER[25:31]=0, then n=0. For the lswx instruction, the contents of rD are undefined if n=0.

The n bytes are loaded into and stored from registers beginning with the most-significant register byte. For loads, any unfilled low-order register bytes are cleared to 0. The sequence of registers loaded or stored wraps through r0 if necessary. Figure 3-22 shows an example of the string-instruction operation.
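The byte-count and register-wrapping rules for the immediate-form load can be sketched in Python. The model is illustrative only: gpr is a hypothetical 32-entry list, mem is a bytearray, ea is the already computed effective address, and restrictions on the instruction form are not checked.

```python
def lswi(gpr, mem, rd, ea, nb):
    """Load n bytes starting at EA into rD, rD+1, ... (wrapping through r0)."""
    n = 32 if nb == 0 else nb            # NB = 0 selects a 32-byte move
    reg = rd
    for offset in range(0, n, 4):
        chunk = bytes(mem[ea + offset:ea + min(offset + 4, n)])
        # Bytes fill the register from the most-significant byte down;
        # unfilled low-order bytes are cleared to 0.
        gpr[reg] = int.from_bytes(chunk.ljust(4, b"\x00"), "big")
        reg = (reg + 1) % 32             # r31 is followed by r0
```

Loading six bytes with rD = 31 fills r31 completely and places the remaining two bytes in the high-order half of r0.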

Load string example: memory bytes Byte 0 through Byte n-1, located at EA through EA + (n-1), are loaded four bytes per register, most-significant byte first. Unfilled low-order bytes of the last register are cleared to 0000_0000.
Store string example: register bytes Byte 0 through Byte n-1 are stored to memory at EA through EA + (n-1).

Figure 3-22: Load and Store String Instructions

Integer Instructions

Integer instructions operate on the contents of GPRs. They use the GPRs (and sometimes immediate values coded in the instruction) as source operands. Results are written into GPRs. These instructions do not operate on memory locations. Integer instructions treat the source operands as signed integers unless the instruction is explicitly identified as performing an unsigned operation. For example, the multiply high-word unsigned (mulhwu) and divide-word unsigned (divwu) instructions interpret both operands as unsigned integers.

The following types of integer instructions are supported by the PowerPC architecture:

• Arithmetic Instructions

• Logical Instructions

• Compare Instructions

• Rotate Instructions

• Shift Instructions

The arithmetic, shift, and rotate instructions can update and/or read bits from the XER. Those instructions, plus the integer-logical instructions, can also update bits in the CR. Unless otherwise noted, when XER and/or CR are updated, they reflect the value written to the destination register. XER and CR can be updated by the integer instructions in the following ways:

• The XER[CA] bit is updated to reflect the carry out of bit 0 in the result.

• The XER[OV] bit is set or cleared to reflect a result overflow. When XER[OV] is set, XER[SO] is also set to reflect a summary overflow. XER[SO] can only be cleared using the mtspr and mcrxr instructions. Instructions that update these bits have the overflow-enable (OE) bit set to 1 in the instruction encoding. This is indicated by the “o” suffix in the instruction mnemonic.

• Bits in CR0 (CR[0:3]) are updated to reflect a signed comparison of the result to zero. Instructions that update CR0 have the record (Rc) bit set to 1 in the instruction encoding. This is indicated by the “.” suffix in the instruction mnemonic. See CR0 Field, page 361, for information on how these bits are updated.

Instructions that update XER[OV] or XER[CA] can delay the execution of subsequent instructions. See Fixed-Point Exception Register (XER), page 363 for more information on these register bits.
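The way these status bits are derived can be illustrated for a 32-bit addition with the Python sketch below. It computes every value that the carry, overflow, and record logic could produce; which of them an instruction actually writes depends on its form. The CR0 bit order used here (LT, GT, EQ, SO, from most-significant to least-significant) is an assumption taken from the general PowerPC definition rather than from this section, and the function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def add_flags(a, b, so_in=0):
    """Return (result, CA, OV, SO, CR0) for the 32-bit sum a + b."""
    a &= MASK32
    b &= MASK32
    total = a + b
    result = total & MASK32
    ca = total >> 32                       # carry out of bit 0
    # Signed overflow: the result's sign differs from both operands' signs.
    ov = ((a ^ result) & (b ^ result)) >> 31
    so = so_in | ov                        # SO is sticky once set
    signed = result - 0x100000000 if result >> 31 else result
    cr0 = ((int(signed < 0) << 3) | (int(signed > 0) << 2)
           | (int(signed == 0) << 1) | so)
    return result, ca, ov, so, cr0
```

Adding 1 to 0x7FFF_FFFF overflows without a carry, while adding 1 to 0xFFFF_FFFF produces a carry without an overflow.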

Arithmetic Instructions

The integer-arithmetic instructions support addition, subtraction, multiplication, and division between operands in the GPRs and in some cases between GPRs and signed-immediate values.


Integer-Addition Instructions

Table 3-24 shows the PowerPC integer-addition instructions. The instructions in this table are grouped by the type of addition operation they perform. For each type of instruction shown, the “Operation” column indicates the addition-operation performed, and on an instruction-by-instruction basis, how the XER and CR registers are updated (if at all). “SIMM” indicates an immediate value that is sign-extended prior to being used in the operation.

The add-extended instructions can be used to perform addition on integers larger than 32 bits. For example, assume a 64-bit integer i is represented by the register pair r3:r4, where r3 contains the most-significant 32 bits of i, and r4 contains the least-significant 32 bits. The 64-bit integer j is similarly represented by the register pair r5:r6. The 64-bit result i+j=r (represented by the pair r7:r8) is produced by pairing adde with addc as follows:

addc r8,r6,r4 ! Add the least-significant words and record a carry.
adde r7,r5,r3 ! Add the most-significant words, using previous carry.
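The carry propagation in the addc and adde pairing can be checked with a small Python model. The helper functions are hypothetical stand-ins for the two instructions; each returns the 32-bit result together with the value written to XER[CA].

```python
MASK32 = 0xFFFFFFFF


def addc(ra, rb):
    """Model of addc: (rA) + (rB), returning (result, carry)."""
    total = (ra & MASK32) + (rb & MASK32)
    return total & MASK32, total >> 32


def adde(ra, rb, ca):
    """Model of adde: (rA) + (rB) + XER[CA], returning (result, carry)."""
    total = (ra & MASK32) + (rb & MASK32) + ca
    return total & MASK32, total >> 32


def add64(i, j):
    """Add two 64-bit values held in the pairs r3:r4 and r5:r6."""
    r3, r4 = (i >> 32) & MASK32, i & MASK32
    r5, r6 = (j >> 32) & MASK32, j & MASK32
    r8, ca = addc(r6, r4)        # least-significant words, record carry
    r7, ca = adde(r5, r3, ca)    # most-significant words plus carry
    return (r7 << 32) | r8
```

The result wraps at 64 bits, so adding 1 to 0xFFFF_FFFF_FFFF_FFFF yields 0 in this model.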

Table 3-24: Integer-Addition Instructions

Mnemonic | Name | Operation | Operand Syntax

Add Instructions: rD is loaded with the sum (rA) + (rB).
add | Add | XER and CR0 are not updated. | rD,rA,rB
add. | Add and Record | CR0 is updated to reflect the result. | rD,rA,rB
addo | Add with Overflow Enabled | XER[OV,SO] are updated to reflect the result. | rD,rA,rB
addo. | Add with Overflow Enabled and Record | XER[OV,SO] and CR0 are updated to reflect the result. | rD,rA,rB

Add-Carrying Instructions: rD is loaded with the sum (rA) + (rB).
addc | Add Carrying | XER[CA] is updated to reflect the result. | rD,rA,rB
addc. | Add Carrying and Record | XER[CA] and CR0 are updated to reflect the result. | rD,rA,rB
addco | Add Carrying with Overflow Enabled | XER[CA,OV,SO] are updated to reflect the result. | rD,rA,rB
addco. | Add Carrying with Overflow Enabled and Record | XER[CA,OV,SO] and CR0 are updated to reflect the result. | rD,rA,rB

Add-Immediate Instructions: rD is loaded with the sum (rA|0) + SIMM.
addi | Add Immediate | XER and CR0 are not updated. | rD,rA,SIMM
addic | Add Immediate Carrying | XER[CA] is updated to reflect the result. | rD,rA,SIMM
addic. | Add Immediate Carrying and Record | XER[CA] and CR0 are updated to reflect the result. | rD,rA,SIMM

Add Immediate-Shifted Instructions: rD is loaded with the sum (rA|0) + (SIMM || 0x0000).
addis | Add Immediate Shifted | XER and CR0 are not updated. | rD,rA,SIMM

Add-Extended Instructions: rD is loaded with the sum (rA) + (rB) + XER[CA].
adde | Add Extended | XER[CA] is updated to reflect the result. | rD,rA,rB
adde. | Add Extended and Record | XER[CA] and CR0 are updated to reflect the result. | rD,rA,rB
addeo | Add Extended with Overflow Enabled | XER[CA,OV,SO] are updated to reflect the result. | rD,rA,rB
addeo. | Add Extended with Overflow Enabled and Record | XER[CA,OV,SO] and CR0 are updated to reflect the result. | rD,rA,rB

Add to Minus-One-Extended Instructions: rD is loaded with the sum (rA) + XER[CA] + 0xFFFF_FFFF.
addme | Add to Minus One Extended | XER[CA] is updated to reflect the result. | rD,rA
addme. | Add to Minus One Extended and Record | XER[CA] and CR0 are updated to reflect the result. | rD,rA
addmeo | Add to Minus One Extended with Overflow Enabled | XER[CA,OV,SO] are updated to reflect the result. | rD,rA
addmeo. | Add to Minus One Extended with Overflow Enabled and Record | XER[CA,OV,SO] and CR0 are updated to reflect the result. | rD,rA

Add to Zero-Extended Instructions: rD is loaded with the sum (rA) + XER[CA].
addze | Add to Zero Extended | XER[CA] is updated to reflect the result. | rD,rA
addze. | Add to Zero Extended and Record | XER[CA] and CR0 are updated to reflect the result. | rD,rA
addzeo | Add to Zero Extended with Overflow Enabled | XER[CA,OV,SO] are updated to reflect the result. | rD,rA
addzeo. | Add to Zero Extended with Overflow Enabled and Record | XER[CA,OV,SO] and CR0 are updated to reflect the result. | rD,rA

Integer-Subtraction Instructions

Table 3-25 shows the PowerPC integer-subtraction instructions. The instructions in this table are grouped by the type of subtraction operation they perform. For each type of instruction shown, the “Operation” column indicates the subtraction-operation performed. The column also shows, on an instruction-by-instruction basis, how the XER and CR registers are updated (if at all). The subtraction operation is expressed as addition so that the two’s-complement operation is clear. “SIMM” indicates an immediate value that is sign-extended prior to being used in the operation.

The integer-subtraction instructions subtract the second operand (rA) from the third operand (rB). Simplified mnemonics are provided with a more familiar operand ordering, whereby the third operand is subtracted from the second. Simplified mnemonics are also defined for the addi instruction to provide a subtract-immediate operation. See Subtract Instructions, page 831 for more information.

The subtract-from extended instructions can be used to perform subtraction on integers larger than 32 bits. For example, assume a 64-bit integer i is represented by the register pair r3:r4, where r3 contains the most-significant 32 bits of i, and r4 contains the least-significant 32 bits. The 64-bit integer j is similarly represented by the register pair r5:r6. The 64-bit result i−j=r (represented by the pair r7:r8) is produced by pairing subfe with subfc as follows:

subfc r8,r6,r4 ! Subtract the least-significant words and record a carry.
subfe r7,r5,r3 ! Subtract the most-significant words, using previous carry.
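The 64-bit subtraction sequence can be modeled in Python using the sums ¬(rA) + (rB) + 1 for subfc and ¬(rA) + (rB) + XER[CA] for subfe. The helper functions are hypothetical stand-ins for the instructions and return the 32-bit result together with the carry.

```python
MASK32 = 0xFFFFFFFF


def subfc(ra, rb):
    """Model of subfc: not(rA) + (rB) + 1, returning (result, carry)."""
    total = (~ra & MASK32) + (rb & MASK32) + 1
    return total & MASK32, total >> 32


def subfe(ra, rb, ca):
    """Model of subfe: not(rA) + (rB) + XER[CA], returning (result, carry)."""
    total = (~ra & MASK32) + (rb & MASK32) + ca
    return total & MASK32, total >> 32


def sub64(i, j):
    """Compute i - j with i in r3:r4 and j in r5:r6."""
    r3, r4 = (i >> 32) & MASK32, i & MASK32
    r5, r6 = (j >> 32) & MASK32, j & MASK32
    r8, ca = subfc(r6, r4)       # r8 = r4 - r6, carry is 0 on a borrow
    r7, ca = subfe(r5, r3, ca)   # r7 = r3 - r5 - (1 - carry)
    return (r7 << 32) | r8
```

A carry of 0 from the low-order subtraction acts as a borrow into the high-order word, which is why 0x1_0000_0000 minus 1 yields 0x0000_0000_FFFF_FFFF.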

Table 3-25: Integer-Subtraction Instructions

Mnemonic | Name | Operation | Operand Syntax

Subtract-From Instructions: rD is loaded with the sum ¬(rA) + (rB) + 1.
subf | Subtract from | XER and CR0 are not updated. | rD,rA,rB
subf. | Subtract from and Record | CR0 is updated to reflect the result. | rD,rA,rB
subfo | Subtract from with Overflow Enabled | XER[OV,SO] are updated to reflect the result. | rD,rA,rB
subfo. | Subtract from with Overflow Enabled and Record | XER[OV,SO] and CR0 are updated to reflect the result. | rD,rA,rB

Subtract-From Carrying Instructions: rD is loaded with the sum ¬(rA) + (rB) + 1.
subfc | Subtract from Carrying | XER[CA] is updated to reflect the result. | rD,rA,rB
subfc. | Subtract from Carrying and Record | XER[CA] and CR0 are updated to reflect the result. | rD,rA,rB
subfco | Subtract from Carrying with Overflow Enabled | XER[CA,OV,SO] are updated to reflect the result. | rD,rA,rB
subfco. | Subtract from Carrying with Overflow Enabled and Record | XER[CA,OV,SO] and CR0 are updated to reflect the result. | rD,rA,rB

Subtract-From Immediate Instructions: rD is loaded with the sum ¬(rA) + SIMM + 1.
subfic | Subtract from Immediate Carrying | XER[CA] is updated to reflect the result. | rD,rA,SIMM

Subtract-From Extended Instructions: rD is loaded with the sum ¬(rA) + (rB) + XER[CA].
subfe | Subtract from Extended | XER[CA] is updated to reflect the result. | rD,rA,rB
subfe. | Subtract from Extended and Record | XER[CA] and CR0 are updated to reflect the result. | rD,rA,rB
subfeo | Subtract from Extended with Overflow Enabled | XER[CA,OV,SO] are updated to reflect the result. | rD,rA,rB
subfeo. | Subtract from Extended with Overflow Enabled and Record | XER[CA,OV,SO] and CR0 are updated to reflect the result. | rD,rA,rB

Subtract-From Minus-One-Extended Instructions: rD is loaded with the sum ¬(rA) + XER[CA] + 0xFFFF_FFFF.
subfme | Subtract from Minus One Extended | XER[CA] is updated to reflect the result. | rD,rA
subfme. | Subtract from Minus One Extended and Record | XER[CA] and CR0 are updated to reflect the result. | rD,rA
subfmeo | Subtract from Minus One Extended with Overflow Enabled | XER[CA,OV,SO] are updated to reflect the result. | rD,rA
subfmeo. | Subtract from Minus One Extended with Overflow Enabled and Record | XER[CA,OV,SO] and CR0 are updated to reflect the result. | rD,rA

Subtract-From Zero-Extended Instructions: rD is loaded with the sum ¬(rA) + XER[CA].
subfze | Subtract from Zero Extended | XER[CA] is updated to reflect the result. | rD,rA
subfze. | Subtract from Zero Extended and Record | XER[CA] and CR0 are updated to reflect the result. | rD,rA
subfzeo | Subtract from Zero Extended with Overflow Enabled | XER[CA,OV,SO] are updated to reflect the result. | rD,rA
subfzeo. | Subtract from Zero Extended with Overflow Enabled and Record | XER[CA,OV,SO] and CR0 are updated to reflect the result. | rD,rA

Negation Instructions

Table 3-26 shows the PowerPC integer-negation instructions. Negation takes the operand specified by rA and writes the two’s-complement equivalent in rD. For each instruction shown, the “Operation” column indicates (on an instruction-by-instruction basis) how the XER and CR registers are updated (if at all).
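Negation can be expressed as the sum ¬(rA) + 1, as the short Python sketch below illustrates. The overflow rule in the sketch (overflow only when rA holds 0x8000_0000) is an assumption drawn from general PowerPC behavior and is not stated in this section; the function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def neg(ra):
    """Return (result, overflow) for the two's-complement negation of rA."""
    ra &= MASK32
    result = ((~ra & MASK32) + 1) & MASK32
    # Assumption: the most-negative value has no positive counterpart,
    # so negating it is the only case that overflows.
    overflow = 1 if ra == 0x80000000 else 0
    return result, overflow
```

Negating 1 gives 0xFFFF_FFFF, and negating 0 gives 0.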

March 2002 Release www.xilinx.com 393 Virtex-II Pro™ Platform FPGA Documentation 1-800-255-7778

Table 3-26: Negation Instructions

| Mnemonic | Name | Operation | Operand Syntax |
|---|---|---|---|
| Negation Instructions | | rD is loaded with the sum ¬(rA) + 1. | rD,rA |
| neg | Negate | XER and CR0 are not updated. | |
| neg. | Negate and Record | CR0 is updated to reflect the result. | |
| nego | Negate with Overflow Enabled | XER[OV,SO] are updated to reflect the result. | |
| nego. | Negate with Overflow Enabled and Record | XER[OV,SO] and CR0 are updated to reflect the result. | |

Multiply Instructions

Table 3-27 shows the PowerPC integer-multiply instructions. Multiplication of two 32-bit values can result in a 64-bit result. The multiply low-word instructions are used with the multiply high-word instructions to calculate the full 64-bit product. For each type of instruction shown, the “Operation” column indicates the multiplication-operation performed. The column also shows, on an instruction-by-instruction basis, how the XER and CR registers are updated (if at all). “SIMM” indicates an immediate value that is sign-extended prior to being used in the operation.

Table 3-27: Multiply Instructions

| Mnemonic | Name | Operation | Operand Syntax |
|---|---|---|---|
| Multiply Low-Word Instructions | | rD is loaded with the low-32 bits of the product (rA) × (rB). | rD,rA,rB |
| mullw | Multiply Low Word | XER and CR0 are not updated. | |
| mullw. | Multiply Low Word and Record | CR0 is updated to reflect the result. | |
| mullwo | Multiply Low Word with Overflow Enabled | XER[OV,SO] are updated to reflect the result. | |
| mullwo. | Multiply Low Word with Overflow Enabled and Record | XER[OV,SO] and CR0 are updated to reflect the result. | |
| Multiply Low-Word Immediate Instructions | | rD is loaded with the low-32 bits of the product (rA) × SIMM. | rD,rA,SIMM |
| mulli | Multiply Low Immediate | XER and CR0 are not updated. | |
| Multiply High-Word Instructions | | rD is loaded with the high-32 bits of the product (rA) × (rB). | rD,rA,rB |
| mulhw | Multiply High Word | XER and CR0 are not updated. | |
| mulhw. | Multiply High Word and Record | CR0 is updated to reflect the result. | |
| Multiply High-Word Unsigned Instructions | | rD is loaded with the high-32 bits of the product (rA) × (rB). The contents of rA and rB are interpreted as unsigned integers. | rD,rA,rB |
| mulhwu | Multiply High Word Unsigned | XER and CR0 are not updated. | |
| mulhwu. | Multiply High Word Unsigned and Record | CR0 is updated to reflect the result. | |
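The way the low-word and high-word multiplies combine into a full 64-bit product can be sketched with a small Python model. This is an illustration only, not PPC405 code: the helper names (to_signed32, mullw, mulhw, mulhwu, full_product_unsigned) are hypothetical, and registers are modeled as plain 32-bit integers.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Interpret a 32-bit pattern as a signed two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def mullw(ra, rb):
    """Low 32 bits of the product; the same for signed and unsigned operands."""
    return ((ra & MASK32) * (rb & MASK32)) & MASK32

def mulhw(ra, rb):
    """High 32 bits of the signed 64-bit product."""
    return ((to_signed32(ra) * to_signed32(rb)) >> 32) & MASK32

def mulhwu(ra, rb):
    """High 32 bits of the unsigned 64-bit product."""
    return (((ra & MASK32) * (rb & MASK32)) >> 32) & MASK32

def full_product_unsigned(ra, rb):
    """Concatenate the high and low words into the 64-bit unsigned product."""
    return (mulhwu(ra, rb) << 32) | mullw(ra, rb)
```

For example, with (rA) = 0xFFFF_FFFF and (rB) = 2, the model gives a low word of 0xFFFF_FFFE in both interpretations, while the high word is 1 when the operands are unsigned and 0xFFFF_FFFF when they are signed (−1 × 2 = −2).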


Divide Instructions

Table 3-28 shows the PowerPC integer-divide instructions. Only the low-32 bits of the quotient are returned. The remainder is not supplied as a result of executing these instructions. For each type of instruction shown, the “Operation” column indicates the divide-operation performed. The column also shows, on an instruction-by-instruction basis, how the XER and CR registers are updated (if at all).

Table 3-28: Divide Instructions

| Mnemonic | Name | Operation | Operand Syntax |
|---|---|---|---|
| Divide-Word Instructions | | rD is loaded with the low-32 bits of the 64-bit quotient (rA) ÷ (rB). | rD,rA,rB |
| divw | Divide Word | XER and CR0 are not updated. | |
| divw. | Divide Word and Record | CR0 is updated to reflect the result. | |
| divwo | Divide Word with Overflow Enabled | XER[OV,SO] are updated to reflect the result. | |
| divwo. | Divide Word with Overflow Enabled and Record | XER[OV,SO] and CR0 are updated to reflect the result. | |
| Divide-Word Unsigned Instructions | | rD is loaded with the low-32 bits of the 64-bit quotient (rA) ÷ (rB). The contents of rA and rB are interpreted as unsigned integers. | rD,rA,rB |
| divwu | Divide Word Unsigned | XER and CR0 are not updated. | |
| divwu. | Divide Word Unsigned and Record | CR0 is updated to reflect the result. | |
| divwuo | Divide Word Unsigned with Overflow Enabled | XER[OV,SO] are updated to reflect the result. | |
| divwuo. | Divide Word Unsigned with Overflow Enabled and Record | XER[OV,SO] and CR0 are updated to reflect the result. | |
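Because the divide instructions return only the quotient, software that needs a remainder typically computes it as dividend − quotient × divisor (a divide followed by a multiply and a subtract). The Python sketch below models that sequence. It assumes behavior taken from the PowerPC architecture rather than from this section: the signed quotient is truncated toward zero, and the result is undefined for division by zero and for 0x8000_0000 ÷ −1 (modeled here as None). The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Interpret a 32-bit pattern as a signed two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def divw(ra, rb):
    """Signed quotient, truncated toward zero; None models an undefined result."""
    a, b = to_signed32(ra), to_signed32(rb)
    if b == 0 or (a == -(1 << 31) and b == -1):
        return None  # assumption: architecture leaves rD undefined here
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return q & MASK32

def remainder(ra, rb):
    """Remainder recovered as (rA) - quotient * (rB), all in 32-bit arithmetic."""
    q = divw(ra, rb)
    if q is None:
        return None
    return (ra - q * rb) & MASK32
```

With (rA) = −7 and (rB) = 2, the model returns a quotient of −3 and a remainder of −1, both as 32-bit patterns.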

Logical Instructions

The logical instructions perform bit operations on the 32-bit operands. If an immediate value is specified as an operand, the processor either zero-extends or left-shifts it prior to performing the operation, depending on the instruction. If the instruction has the record (Rc) bit set to 1 in the instruction encoding, CR0 (CR[0:3]) is updated to reflect the result of the operation. A set Rc bit is indicated by the “.” suffix in the instruction mnemonic.

The logical instructions do not update any bits in the XER register.

In the operand syntax for logical instructions, the rA operand specifies a destination register rather than a source register. rS is used to specify one of the source registers.

AND and NAND Instructions

Table 3-29 shows the PowerPC AND and NAND instructions. For each type of instruction shown, the “Operation” column indicates the Boolean operation performed. The column also shows, on an instruction-by-instruction basis, whether the CR0 field is updated.


Table 3-29: AND and NAND Instructions

| Mnemonic | Name | Operation | Operand Syntax |
|---|---|---|---|
| AND Instructions | | rA is loaded with the logical result (rS) AND (rB). | rA,rS,rB |
| and | AND | CR0 is not updated. | |
| and. | AND and Record | CR0 is updated to reflect the result. | |
| AND-Immediate Instructions | | rA is loaded with the logical result (rS) AND UIMM. | rA,rS,UIMM |
| andi. | AND Immediate and Record | CR0 is updated to reflect the result. | |
| AND Immediate-Shifted Instructions | | rA is loaded with the logical result (rS) AND (UIMM \|\| 0x0000). | rA,rS,UIMM |
| andis. | AND Immediate Shifted and Record | CR0 is updated to reflect the result. | |
| AND with Complement Instructions | | rA is loaded with the logical result (rS) AND ¬(rB). | rA,rS,rB |
| andc | AND with Complement | CR0 is not updated. | |
| andc. | AND with Complement and Record | CR0 is updated to reflect the result. | |
| NAND Instructions | | rA is loaded with the logical result ¬((rS) AND (rB)). | rA,rS,rB |
| nand | NAND | CR0 is not updated. | |
| nand. | NAND and Record | CR0 is updated to reflect the result. | |

OR and NOR Instructions

Table 3-30 shows the PowerPC OR and NOR instructions. For each type of instruction shown, the “Operation” column indicates the Boolean operation performed. The column also shows, on an instruction-by-instruction basis, whether the CR0 field is updated.

Simplified mnemonics are provided for some common operations that use the OR and NOR instructions, such as move register and complement (not) register. See Other Simplified Mnemonics, page 834 for more information.

Table 3-30: OR and NOR Instructions

| Mnemonic | Name | Operation | Operand Syntax |
|---|---|---|---|
| NOR Instructions | | rA is loaded with the logical result ¬((rS) OR (rB)). | rA,rS,rB |
| nor | NOR | CR0 is not updated. | |
| nor. | NOR and Record | CR0 is updated to reflect the result. | |
| OR Instructions | | rA is loaded with the logical result (rS) OR (rB). | rA,rS,rB |
| or | OR | CR0 is not updated. | |
| or. | OR and Record | CR0 is updated to reflect the result. | |
| OR-Immediate Instructions | | rA is loaded with the logical result (rS) OR UIMM. | rA,rS,UIMM |
| ori | OR Immediate | CR0 is not updated. | |
| OR Immediate-Shifted Instructions | | rA is loaded with the logical result (rS) OR (UIMM \|\| 0x0000). | rA,rS,UIMM |
| oris | OR Immediate Shifted | CR0 is not updated. | |
| OR with Complement Instructions | | rA is loaded with the logical result (rS) OR ¬(rB). | rA,rS,rB |
| orc | OR with Complement | CR0 is not updated. | |
| orc. | OR with Complement and Record | CR0 is updated to reflect the result. | |

XOR and Equivalence Instructions

Table 3-31 shows the PowerPC XOR and equivalence (XNOR) instructions. For each type of instruction shown, the “Operation” column indicates the Boolean operation performed. The column also shows, on an instruction-by-instruction basis, whether the CR0 field is updated.

Table 3-31: XOR and Equivalence Instructions

| Mnemonic | Name | Operation | Operand Syntax |
|---|---|---|---|
| Equivalence Instructions | | rA is loaded with the logical result ¬((rS) XOR (rB)). | rA,rS,rB |
| eqv | Equivalent | CR0 is not updated. | |
| eqv. | Equivalent and Record | CR0 is updated to reflect the result. | |
| XOR Instructions | | rA is loaded with the logical result (rS) XOR (rB). | rA,rS,rB |
| xor | XOR | CR0 is not updated. | |
| xor. | XOR and Record | CR0 is updated to reflect the result. | |
| XOR-Immediate Instructions | | rA is loaded with the logical result (rS) XOR UIMM. | rA,rS,UIMM |
| xori | XOR Immediate | CR0 is not updated. | |
| XOR Immediate-Shifted Instructions | | rA is loaded with the logical result (rS) XOR (UIMM \|\| 0x0000). | rA,rS,UIMM |
| xoris | XOR Immediate Shifted | CR0 is not updated. | |
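The difference between the immediate and immediate-shifted forms of the logical instructions can be illustrated with a short Python model. The plain immediate forms zero-extend the 16-bit UIMM, while the shifted forms use UIMM || 0x0000 (UIMM in the upper halfword). The function names below are hypothetical stand-ins for the instructions, and the CR0 update performed by andis. is not modeled.

```python
MASK32 = 0xFFFFFFFF

def zero_extend(uimm):
    """16-bit immediate zero-extended to 32 bits."""
    return uimm & 0xFFFF

def shifted(uimm):
    """UIMM || 0x0000: the immediate placed in the upper halfword."""
    return (uimm & 0xFFFF) << 16

def ori(rs, uimm):
    return (rs | zero_extend(uimm)) & MASK32

def oris(rs, uimm):
    return (rs | shifted(uimm)) & MASK32

def andis(rs, uimm):
    return rs & shifted(uimm) & MASK32

def xoris(rs, uimm):
    return (rs ^ shifted(uimm)) & MASK32
```

In this model, applying oris with 0x1234 to a register holding zero and then ori with 0x5678 produces 0x1234_5678, which shows how the two forms together cover all 32 bits of a register.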

Sign-Extension Instructions

Table 3-32 shows the sign-extension instructions. These instructions sign-extend the value in the rS register and write the result in the rA register. For each type of instruction shown, the “Operation” column indicates the operation performed. The column also shows, on an instruction-by-instruction basis, whether the CR0 field is updated.


Table 3-32: Sign-Extension Instructions

| Mnemonic | Name | Operation | Operand Syntax |
|---|---|---|---|
| Extend-Sign Byte Instructions | | rA[24:31] is loaded with (rS[24:31]). The remaining bits rA[0:23] are each loaded with a copy of (rS[24]). | rA,rS |
| extsb | Extend Sign Byte | CR0 is not updated. | |
| extsb. | Extend Sign Byte and Record | CR0 is updated to reflect the result. | |
| Extend-Sign Halfword Instructions | | rA[16:31] is loaded with (rS[16:31]). The remaining bits rA[0:15] are each loaded with a copy of (rS[16]). | rA,rS |
| extsh | Extend Sign Halfword | CR0 is not updated. | |
| extsh. | Extend Sign Halfword and Record | CR0 is updated to reflect the result. | |
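The sign-extension operations can be modeled in a few lines of Python. In the PowerPC bit numbering used here, rS[24:31] is the least-significant byte and rS[16:31] is the least-significant halfword. The function names are hypothetical, and CR0 updates are not modeled.

```python
def extsb(rs):
    """Copy rS[24:31] and replicate rS[24] (the byte's sign bit) into bits 0:23."""
    byte = rs & 0xFF
    return (byte | 0xFFFFFF00) if (byte & 0x80) else byte

def extsh(rs):
    """Copy rS[16:31] and replicate rS[16] (the halfword's sign bit) into bits 0:15."""
    half = rs & 0xFFFF
    return (half | 0xFFFF0000) if (half & 0x8000) else half
```

Note that the upper bits of the source are ignored: a source of 0x1234_5680 sign-extends as a byte to 0xFFFF_FF80.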

Count Leading-Zeros Instructions

Table 3-33 shows the count leading-zeros instructions. These instructions count the number of consecutive zero bits in the rS register starting at bit 0. The count result is written to the rA register. For each type of instruction shown, the “Operation” column indicates the operation performed. The column also shows, on an instruction-by-instruction basis, whether the CR0 field is updated.

Table 3-33: Count Leading-Zeros Instructions

| Mnemonic | Name | Operation | Operand Syntax |
|---|---|---|---|
| Count Leading-Zeros Instructions | | rA is loaded with a count of leading zeros in rS. | rA,rS |
| cntlzw | Count Leading Zeros Word | CR0 is not updated. | |
| cntlzw. | Count Leading Zeros Word and Record | CR0 is updated to reflect the result. CR0[LT] is always cleared to 0. | |
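A minimal Python model of the count, using a hypothetical function name, makes the result range explicit. Bit 0 is the most-significant bit, so the count is 32 minus the position of the highest set bit.

```python
MASK32 = 0xFFFFFFFF

def cntlzw(rs):
    """Number of consecutive zero bits starting at bit 0 (the most-significant bit)."""
    rs &= MASK32
    return 32 - rs.bit_length()
```

The result always lies between 0 and 32 inclusive, so it is never negative when viewed as a signed value. This is consistent with CR0[LT] always being cleared by cntlzw.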

Compare Instructions

The integer-compare instructions support algebraic and logical comparisons between operands in the GPRs and between GPRs and immediate values. Immediate values are signed in algebraic comparisons and unsigned in logical comparisons.

All compare instructions have four operands. The first operand, crfD, specifies the field in the CR register that is updated with the comparison result. The left-most three bits in the CR field are updated to reflect a less-than, greater-than, or equal comparison. The fourth (least-significant) bit is updated with a copy of XER[SO]. The crfD operand can be omitted if the comparison results are written to CR0. See CRn Fields (Compare Instructions), page 362 for more information on the CR fields.

The second operand specifies the operand length. This is referred to as the “L” bit in the compare-instruction encoding. When using the compare instructions on 32-bit PowerPC implementations like the PPC405, this bit must always be coded as 0. It cannot be omitted from the standard instruction syntax. Simplified mnemonics are provided that omit this operand. See Compare Instructions, page 828 for more information.

The last two operands specify the quantities to be compared (the contents of a register and a register or immediate value).
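The effect of algebraic versus logical comparison on the four CR-field bits can be sketched in Python. This is an illustrative model with hypothetical function names. The CR field is returned as a tuple (LT, GT, EQ, SO), and the XER[SO] value is passed in as a parameter.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Interpret a 32-bit pattern as a signed two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def cr_field(a, b, xer_so):
    """Build the 4-bit CR field (LT, GT, EQ, SO) for the comparison of a with b."""
    return (int(a < b), int(a > b), int(a == b), xer_so)

def cmp_algebraic(ra, rb, xer_so=0):
    """Operands compared as signed integers."""
    return cr_field(to_signed32(ra), to_signed32(rb), xer_so)

def cmp_logical(ra, rb, xer_so=0):
    """Operands compared as unsigned integers."""
    return cr_field(ra & MASK32, rb & MASK32, xer_so)
```

The same bit patterns can compare differently: 0xFFFF_FFFF is less than 1 algebraically (it represents −1) but greater than 1 logically.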


Algebraic-Comparison Instructions

Table 3-34 shows the PowerPC algebraic-comparison instructions. During comparison, both operands are treated as signed integers. If a comparison is made with a signed-immediate value (SIMM), that value is sign-extended by the processor prior to performing the comparison.

Table 3-34: Algebraic-Comparison Instructions

| Mnemonic | Name | Operation | Operand Syntax |
|---|---|---|---|
| cmp | Compare | crfD[LT,GT,EQ] are loaded with the result of algebraically comparing (rA) with (rB). CR[SO] is loaded with a copy of XER[SO]. | crfD,0,rA,rB |
| cmpi | Compare Immediate | crfD[LT,GT,EQ] are loaded with the result of algebraically comparing (rA) with SIMM. CR[SO] is loaded with a copy of XER[SO]. | crfD,0,rA,SIMM |

Logical-Comparison Instructions

Table 3-35 shows the PowerPC logical-comparison instructions. During comparison, both operands are treated as unsigned integers. If a comparison is made with an unsigned-immediate value (UIMM), that value is zero-extended by the processor prior to performing the comparison.

Table 3-35: Logical-Comparison Instructions

| Mnemonic | Name | Operation | Operand Syntax |
|---|---|---|---|
| cmpl | Compare Logical | crfD[LT,GT,EQ] are loaded with the result of logically comparing (rA) with (rB). CR[SO] is loaded with a copy of XER[SO]. | crfD,0,rA,rB |
| cmpli | Compare Logical Immediate | crfD[LT,GT,EQ] are loaded with the result of logically comparing (rA) with UIMM. CR[SO] is loaded with a copy of XER[SO]. | crfD,0,rA,UIMM |

Rotate Instructions

Rotate instructions operate on 32-bit data in the GPRs, returning the result in a second GPR. These instructions rotate data to the left, in the direction from least-significant bit to most-significant bit. Bits rotated out of the most-significant bit (bit 0) are rotated into the least-significant bit (bit 31). Programmers can achieve apparent right rotation using these left-rotation instructions by specifying a rotation amount of 32−n, where n is the number of bits to rotate right.

If the rotate instruction has the record (Rc) bit set to 1 in the instruction encoding, CR0 (CR[0:3]) is updated to reflect the result of the operation. A set Rc bit is indicated by the “.” suffix in the instruction mnemonic. Rotate instructions do not update any bits in the XER register.

In the operand syntax for rotate instructions, the rA operand specifies the destination register rather than a source register. rS is used to specify the source register.

Simplified mnemonics using the rotate instructions are provided for easy coding of extraction, insertion, left or right justification, and other bit-manipulation operations. See Rotate and Shift Instructions, page 829 for more information.
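The equivalence between a right rotation by n bits and a left rotation by 32−n bits can be checked with a short Python sketch. The function names are hypothetical, and the rotation amount is reduced modulo 32, matching a 5-bit rotate field.

```python
MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    """Rotate a 32-bit value left by n bits (n taken modulo 32)."""
    n &= 31
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32

def rotr32_via_left(x, n):
    """Apparent right rotation by n, expressed as a left rotation by 32 - n."""
    return rotl32(x, 32 - n)
```

For instance, rotating 0x1234_5678 right by 8 bits is obtained as a left rotation by 24 bits and yields 0x7812_3456.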

Mask Generation

The rotate instructions write their results into the destination register under the control of a mask specified in the rotate-instruction encoding. The mask is used to write or insert a partial result into the destination register.

Rotate masks are 32 bits long. Two instruction-opcode fields are used to specify the mask: MB and ME. MB is a 5-bit field specifying the starting bit position of the mask and ME is a 5-bit field specifying the ending bit position of the mask. The mask consists of all 1’s from MB to ME inclusive and all 0’s elsewhere. If MB > ME, the string of 1’s wraps around from bit 31 to bit 0. In this case, 0’s are found from ME to MB exclusive. The generation of an all-zero mask is not possible.

The function of the MASK(MB,ME) generator is summarized as:

    if MB ≤ ME then
        mask[MB:ME] = 1’s
        mask[all remaining bits] = 0’s
    else
        mask[MB:31] = 1’s
        mask[0:ME] = 1’s
        mask[all remaining bits] = 0’s

Figure 3-23 shows the generated mask for both cases.

    MB ≤ ME:  bits 0 to MB−1 = 0 0 . . . 0,  bits MB to ME = 1 1 . . . 1,  bits ME+1 to 31 = 0 0 . . . 0
    MB > ME:  bits 0 to ME = 1 1 . . . 1,  bits ME+1 to MB−1 = 0 0 . . . 0,  bits MB to 31 = 1 1 . . . 1

Figure 3-23: Rotate Mask Generation
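The mask generator can be expressed as a short Python function. This is a sketch with a hypothetical function name, using the PowerPC convention that bit 0 is the most-significant bit. It treats MB = ME as the non-wrapping case, which yields a single-bit mask.

```python
MASK32 = 0xFFFFFFFF

def mask(mb, me):
    """Return the 32-bit rotate mask for mask-begin mb and mask-end me (0..31)."""
    from_mb = MASK32 >> mb                        # 1's in bits mb..31
    through_me = (MASK32 << (31 - me)) & MASK32   # 1's in bits 0..me
    if mb <= me:
        return from_mb & through_me               # contiguous run mb..me
    return from_mb | through_me                   # run wraps from bit 31 to bit 0
```

For MB=16 and ME=23 the model returns 0x0000_FF00. For MB=24 and ME=7 the run of 1’s wraps and the result is 0xFF00_00FF. No combination of MB and ME produces an all-zero mask.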

Rotate Left then AND-with-Mask Instructions

Table 3-36 shows the PowerPC rotate left then AND-with-mask instructions. For each type of instruction shown, the “Operation” column indicates the rotate operation performed. The column also shows, on an instruction-by-instruction basis, whether the CR0 field is updated.

Table 3-36: Rotate Left then AND-with-Mask Instructions

| Mnemonic | Name | Operation | Operand Syntax |
|---|---|---|---|
| Rotate Left then AND-with-Mask Immediate Instructions | | rA is loaded with the masked result of left-rotating (rS) the number of bits specified by SH. The mask is specified by operands MB and ME. | rA,rS,SH,MB,ME |
| rlwinm | Rotate Left Word Immediate then AND with Mask | CR0 is not updated. | |
| rlwinm. | Rotate Left Word Immediate then AND with Mask and Record | CR0 is updated to reflect the result. | |
| Rotate Left then AND-with-Mask Instructions | | rA is loaded with the masked result of left-rotating (rS) the number of bits specified by (rB). The mask is specified by operands MB and ME. | rA,rS,rB,MB,ME |
| rlwnm | Rotate Left Word then AND with Mask | CR0 is not updated. | |
| rlwnm. | Rotate Left Word then AND with Mask and Record | CR0 is updated to reflect the result. | |

These instructions left rotate GPR contents and logically AND the result with the mask prior to writing it into the destination GPR. The destination register contains the rotated result in the unmasked bit positions (mask bits with 1’s), and 0’s in the masked bit positions (mask bits with 0’s). Rotation amounts are specified using an immediate field in the instruction (the SH opcode field) or using a value in a register.

Figure 3-24 shows an example of a rotate left then AND-with-mask immediate instruction. In this example, the rotation amount is 16 bits as specified by the SH field in the instruction. The mask specifies an unmasked byte in bit positions 16:23 (MB=16, ME=23) and masks all other bit positions. The example shows the original contents of the destination register, rA, and the source register, rS. rS is left-rotated 16 bits and the result is written to rA after ANDing with the mask. This has the effect of extracting byte 0 from rS (rS[0:7]) and placing it in byte 2 of rA (rA[16:23]).

    rA (original):             0xFF 0xEE 0xDD 0xCC
    rS:                        0x88 0x77 0x66 0x55
    rS rotated by SH=16 bits:  0x66 0x55 0x88 0x77
    Mask (MB=16, ME=23):       0000_0000_0000_0000_1111_1111_0000_0000
    rA (result):               0x00 0x00 0x88 0x00

Figure 3-24: Rotate Left then AND-with-Mask Immediate Example
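The example in Figure 3-24 can be reproduced with a small Python model of the rotate-then-AND operation. The function names are hypothetical and the CR0 update of the record form is not modeled. The helper mask() follows the MASK(MB,ME) description given under Mask Generation.

```python
MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    """Rotate a 32-bit value left by n bits (n taken modulo 32)."""
    n &= 31
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32

def mask(mb, me):
    """Rotate mask with 1's from bit mb through bit me (bit 0 is the MSB)."""
    from_mb = MASK32 >> mb
    through_me = (MASK32 << (31 - me)) & MASK32
    return (from_mb & through_me) if mb <= me else (from_mb | through_me)

def rlwinm(rs, sh, mb, me):
    """Rotate (rS) left by sh bits, then AND with MASK(mb, me)."""
    return rotl32(rs, sh) & mask(mb, me)

# Values from Figure 3-24: byte 0 of rS lands in byte 2 of the result.
result = rlwinm(0x88776655, 16, 16, 23)
```

Here result is 0x0000_8800, matching the final rA contents shown in the figure. The original rA contents do not influence the outcome.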

Rotate Left then Mask-Insert Instructions

Table 3-37 shows the PowerPC rotate left then mask-insert instructions. For each type of instruction shown, the “Operation” column indicates the rotate operation performed. The column also shows, on an instruction-by-instruction basis, whether the CR0 field is updated.

Table 3-37: Rotate Left then Mask-Insert Instructions

| Mnemonic | Name | Operation | Operand Syntax |
|---|---|---|---|
| Rotate Left then Mask-Insert Immediate Instructions | | The masked result of left-rotating (rS) the number of bits specified by SH is inserted into rA. The mask is specified by operands MB and ME. | rA,rS,SH,MB,ME |
| rlwimi | Rotate Left Word Immediate then Mask Insert | CR0 is not updated. | |
| rlwimi. | Rotate Left Word Immediate then Mask Insert and Record | CR0 is updated to reflect the result. | |

These instructions left rotate GPR contents and insert the results into the destination GPR under control of the mask. The destination register contains the rotated result in the unmasked bit positions (mask bits with 1’s) and the original contents of the destination register in the masked bit positions (mask bits with 0’s). Rotation amounts are specified using an immediate field in the instruction (the SH opcode field).

Figure 3-25 shows an example of a rotate left then mask-insert immediate instruction. In this example, the rotation amount is 16 bits as specified by the SH field in the instruction. The mask specifies an unmasked byte in bit positions 16:23 (MB=16, ME=23) and masks all other bit positions. The example shows the original contents of the destination register, rA, and the source register, rS. rS is rotated 16 bits and the result is inserted into rA after ANDing with the mask. This has the effect of extracting byte 0 from rS (rS[0:7]) and inserting it into byte 2 of rA (rA[16:23]), leaving all remaining bytes in rA unmodified.

    rA (original):             0xFF 0xEE 0xDD 0xCC
    rS:                        0x88 0x77 0x66 0x55
    rS rotated by SH=16 bits:  0x66 0x55 0x88 0x77
    Mask (MB=16, ME=23):       0000_0000_0000_0000_1111_1111_0000_0000
    rA (result):               0xFF 0xEE 0x88 0xCC

Figure 3-25: Rotate Left then Mask-Insert Immediate Example
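The mask-insert behavior in Figure 3-25 differs from the AND-with-mask case only in what happens to the masked bit positions. They keep the original rA contents instead of being cleared. A Python sketch with hypothetical function names, and CR0 updates not modeled, is:

```python
MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    """Rotate a 32-bit value left by n bits (n taken modulo 32)."""
    n &= 31
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32

def mask(mb, me):
    """Rotate mask with 1's from bit mb through bit me (bit 0 is the MSB)."""
    from_mb = MASK32 >> mb
    through_me = (MASK32 << (31 - me)) & MASK32
    return (from_mb & through_me) if mb <= me else (from_mb | through_me)

def rlwimi(ra, rs, sh, mb, me):
    """Insert rotated (rS) into rA where the mask is 1; keep rA where it is 0."""
    m = mask(mb, me)
    return (rotl32(rs, sh) & m) | (ra & ~m & MASK32)

# Values from Figure 3-25.
result = rlwimi(0xFFEEDDCC, 0x88776655, 16, 16, 23)
```

Here result is 0xFFEE_88CC: only byte 2 of rA is replaced, as the figure shows.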


Shift Instructions

Shift instructions operate on 32-bit data in the GPRs and return the result in a GPR. Both logical and algebraic shifts are provided:

• Logical left-shift instructions shift bits from the direction of least-significant bit to most-significant bit. Bits shifted out of bit 0 are lost. The vacated bit positions on the right are filled with zeros.

• Logical right-shift instructions shift bits from the direction of most-significant bit to least-significant bit. Bits shifted out of bit 31 are lost. The vacated bit positions on the left are filled with zeros.

• Algebraic right-shift instructions shift bits from the direction of most-significant bit to least-significant bit. Bits shifted out of bit 31 are lost. The vacated bit positions on the left are filled with a copy of the original bit 0 (the value prior to starting the shift).

If the shift instruction has the record (Rc) bit set to 1 in the instruction encoding, CR0 (CR[0:3]) is updated to reflect the result of the operation. A set Rc bit is indicated by the “.” suffix in the instruction mnemonic. Algebraic right-shift instructions update XER[CA] to reflect the result of the operation but the other shift instructions do not modify XER[CA]. XER[OV,SO] are not modified by any shift instructions.

In the operand syntax for shift instructions, the rA operand specifies the destination register rather than a source register. rS is used to specify the source register.

Simplified mnemonics using the rotate instructions are provided for coding of logical shift-left immediate and logical shift-right immediate operations. See Rotate and Shift Instructions, page 829 for more information.

Logical-Shift Instructions

Table 3-38 shows the PowerPC logical-shift instructions. For each type of instruction shown, the “Operation” column indicates the shift operation performed. The column also shows, on an instruction-by-instruction basis, whether the CR0 field is updated. XER is not updated by these instructions.

Table 3-38: Logical-Shift Instructions

| Mnemonic | Name | Operation | Operand Syntax |
|---|---|---|---|
| Shift-Left-Logical Instructions | | rA is loaded with the result of logically left-shifting (rS) the number of bits specified by (rB). | rA,rS,rB |
| slw | Shift Left Word | CR0 is not updated. | |
| slw. | Shift Left Word and Record | CR0 is updated to reflect the result. | |
| Shift-Right-Logical Instructions | | rA is loaded with the result of logically right-shifting (rS) the number of bits specified by (rB). | rA,rS,rB |
| srw | Shift Right Word | CR0 is not updated. | |
| srw. | Shift Right Word and Record | CR0 is updated to reflect the result. | |

Figure 3-26 shows two examples of logical-shift operations. The top example shows a left shift of seven bits, and the bottom example shows a right shift of seven bits. As is seen in these examples, bits shifted out of the register are lost and vacated bits are filled with zeros.

    Left shift by 7 bits
        Source:            1000_0111_0110_0101_0100_0011_0010_0001
        Bits shifted out:  1000_011
        Result:            1011_0010_1010_0001_1001_0000_1000_0000

    Right shift by 7 bits
        Source:            1000_0111_0110_0101_0100_0011_0010_0001
        Bits shifted out:  010_0001
        Result:            0000_0001_0000_1110_1100_1010_1000_0110

Figure 3-26: Logical-Shift Examples
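The two cases in Figure 3-26 can be reproduced with a Python model of the logical shifts. The function names are hypothetical. The model also assumes a detail from the PowerPC architecture that is not stated in this section: the shift amount is taken from the low-order six bits of rB, and amounts of 32 through 63 produce a result of zero.

```python
MASK32 = 0xFFFFFFFF

def slw(rs, rb):
    """Logical left shift; vacated bits on the right are filled with zeros."""
    n = rb & 0x3F  # assumption: shift amount is the low six bits of rB
    return ((rs & MASK32) << n) & MASK32 if n < 32 else 0

def srw(rs, rb):
    """Logical right shift; vacated bits on the left are filled with zeros."""
    n = rb & 0x3F
    return (rs & MASK32) >> n if n < 32 else 0

SOURCE = 0x87654321  # 1000_0111_0110_0101_0100_0011_0010_0001
left = slw(SOURCE, 7)
right = srw(SOURCE, 7)
```

The model gives 0xB2A1_9080 for the left shift and 0x010E_CA86 for the right shift, which are the bit patterns shown in the figure.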

Algebraic-Shift Instructions

Table 3-39 shows the PowerPC algebraic-shift instructions. For each type of instruction shown, the “Operation” column indicates the shift operation performed. The column also shows, on an instruction-by-instruction basis, whether the CR0 field is updated. XER[CA] is always updated by these instructions to reflect the result.

The shift-right-algebraic instructions can be followed by an addze instruction to implement a divide-by-2^n operation. See Multiple-Precision Shifts, page 840, for more information.

Table 3-39: Algebraic-Shift Instructions

| Mnemonic | Name | Operation | Operand Syntax |
|---|---|---|---|
| Shift-Right-Algebraic Immediate Instructions | | rA is loaded with the result of algebraically right-shifting (rS) the number of bits specified by SH. | rA,rS,SH |
| srawi | Shift Right Algebraic Word Immediate | CR0 is not updated. XER[CA] is updated to reflect the result. | |
| srawi. | Shift Right Algebraic Word Immediate and Record | CR0 and XER[CA] are updated to reflect the result. | |
| Shift-Right-Algebraic Instructions | | rA is loaded with the result of algebraically right-shifting (rS) the number of bits specified by (rB). | rA,rS,rB |
| sraw | Shift Right Algebraic Word | CR0 is not updated. XER[CA] is updated to reflect the result. | |
| sraw. | Shift Right Algebraic Word and Record | CR0 and XER[CA] are updated to reflect the result. | |

Figure 3-27 shows an example of an algebraic-shift operation. In this example, a shift of seven bits is performed. Bits shifted out of the least-significant register bit are lost and vacated bits on the left side are filled with a copy of the original bit 0 (prior to the shift). In this example, the original value of bit 0 is 0b1.

    Algebraic right shift by 7 bits
        Source:            1000_0111_0110_0101_0100_0011_0010_0001
        Bits shifted out:  010_0001
        Result:            1111_1111_0000_1110_1100_1010_1000_0110

Figure 3-27: Algebraic-Shift Example
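The algebraic shift and its use for signed division by a power of two can be sketched in Python. The function names are hypothetical. The rule used for XER[CA] is taken from the PowerPC architecture rather than from this section: CA is set when the source is negative and at least one 1 bit is shifted out. Adding CA to the shifted result (the role of addze) rounds the quotient toward zero.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x):
    """Interpret a 32-bit pattern as a signed two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def srawi(rs, sh):
    """Return (result, CA) for an algebraic right shift of (rS) by sh bits."""
    sh &= 31
    value = to_signed32(rs)
    result = (value >> sh) & MASK32          # Python's >> replicates the sign
    shifted_out = rs & ((1 << sh) - 1)
    ca = int(value < 0 and shifted_out != 0)  # assumption: architected CA rule
    return result, ca

def divide_by_power_of_two(rs, sh):
    """srawi followed by addze: signed division by 2**sh, rounded toward zero."""
    result, ca = srawi(rs, sh)
    return (result + ca) & MASK32
```

With the source value from Figure 3-27 (0x8765_4321) and a shift of 7, the model returns 0xFF0E_CA86 with CA = 1. Dividing −7 by 2 this way yields −3, whereas the shift alone would give −4.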

Multiply-Accumulate Instruction-Set Extensions

The PPC405 supports an integer multiply-accumulate instruction-set extension that provides functions usable by certain computationally intensive applications, such as those that implement DSP algorithms. These instructions comply with the architectural requirements for auxiliary-processor units (APUs) defined by the PowerPC embedded-environment architecture and the PowerPC Book-E architecture. They are considered implementation-dependent instructions and are not part of the PowerPC architecture, the PowerPC embedded-environment architecture, or the PowerPC Book-E architecture. Programs that use these instructions are not portable to all PowerPC implementations.

The multiply-accumulate instruction-set extensions include multiply-accumulate instructions, negative multiply-accumulate instructions, and multiply-halfword instructions.

Modulo and Saturating Arithmetic

The multiply-accumulate and negative multiply-accumulate instructions produce a 33-bit intermediate result. The method used to store this result in the 32-bit destination register depends on whether the instruction performs modulo arithmetic or saturating arithmetic.

With modulo-arithmetic instructions, the most-significant bit in the intermediate result is discarded and the low-32 bits of this result are stored in the destination register.

With saturating-arithmetic instructions, the low 32 bits of the intermediate result are stored in the destination register if the intermediate result does not overflow 32 bits. However, if the intermediate result overflows what is representable in 32 bits, the instruction loads the nearest representable value into the destination register. For the various instruction forms, these results are:

• Signed arithmetic: if the result exceeds 2^31 − 1 (> 0x7FFF_FFFF), the instruction loads the destination register with 2^31 − 1.

• Signed arithmetic: if the result is less than −2^31 (< 0x8000_0000), the instruction loads the destination register with −2^31.

• Unsigned arithmetic: if the result exceeds 2^32 − 1 (> 0xFFFF_FFFF), the instruction loads the destination register with 2^32 − 1.

Multiply-Accumulate Instructions

Multiply-Accumulate Cross-Halfword to Word Instructions

Table 3-40 shows the PPC405 integer multiply-accumulate cross-halfword to word instructions.

These instructions take the lower halfword of the first source operand (rA[16:31]) and multiply it with the upper halfword of the second source operand (rB[0:15]), producing a 32-bit product. The product is signed or unsigned, depending on the instruction. This product is added to the value in the destination register, rD, producing a 33-bit intermediate result. Generally, rD is loaded with the lower-32 bits of the 33-bit intermediate result. However, if the instruction performs saturating arithmetic and the intermediate result overflows, rD is loaded with the nearest representable value (see Modulo and Saturating Arithmetic, above).

For each type of instruction shown in Table 3-40, the “Operation” column indicates the multiply-accumulate operation performed. The column also shows, on an instruction-by-instruction basis, how the XER and CR registers are updated (if at all).

Table 3-40: Multiply-Accumulate Cross-Halfword to Word Instructions

| Mnemonic | Name | Operation | Operand Syntax |
|---|---|---|---|
| Multiply-Accumulate Cross-Halfword to Word Modulo Signed Instructions | | rD is added to the signed product (rA[16:31]) × (rB[0:15]), producing a 33-bit result. The low-32 bits of this result are stored in rD. | rD,rA,rB |
| macchw | Multiply Accumulate Cross Halfword to Word Modulo Signed | XER and CR0 are not updated. | |
| macchw. | Multiply Accumulate Cross Halfword to Word Modulo Signed and Record | CR0 is updated to reflect the result. | |
| macchwo | Multiply Accumulate Cross Halfword to Word Modulo Signed with Overflow Enabled | XER[OV,SO] are updated to reflect the result. | |
| macchwo. | Multiply Accumulate Cross Halfword to Word Modulo Signed with Overflow Enabled and Record | XER[OV,SO] and CR0 are updated to reflect the result. | |
| Multiply-Accumulate Cross-Halfword to Word Saturate Signed Instructions | | rD is added to the signed product (rA[16:31]) × (rB[0:15]), producing a 33-bit result. If the result does not overflow, the low-32 bits of this result are stored in rD. Otherwise, the nearest-representable value is stored in rD. | rD,rA,rB |
| macchws | Multiply Accumulate Cross Halfword to Word Saturate Signed | XER and CR0 are not updated. | |
| macchws. | Multiply Accumulate Cross Halfword to Word Saturate Signed and Record | CR0 is updated to reflect the result. | |
| macchwso | Multiply Accumulate Cross Halfword to Word Saturate Signed with Overflow Enabled | XER[OV,SO] are updated to reflect the result. | |
| macchwso. | Multiply Accumulate Cross Halfword to Word Saturate Signed with Overflow Enabled and Record | XER[OV,SO] and CR0 are updated to reflect the result. | |
| Multiply-Accumulate Cross-Halfword to Word Saturate Unsigned Instructions | | rD is added to the unsigned product (rA[16:31]) × (rB[0:15]), producing a 33-bit result. If the result does not overflow, the low-32 bits of this result are stored in rD. Otherwise, the nearest-representable value is stored in rD. | rD,rA,rB |
| macchwsu | Multiply Accumulate Cross Halfword to Word Saturate Unsigned | XER and CR0 are not updated. | |
| macchwsu. | Multiply Accumulate Cross Halfword to Word Saturate Unsigned and Record | CR0 is updated to reflect the result. | |
| macchwsuo | Multiply Accumulate Cross Halfword to Word Saturate Unsigned with Overflow Enabled | XER[OV,SO] are updated to reflect the result. | |
| macchwsuo. | Multiply Accumulate Cross Halfword to Word Saturate Unsigned with Overflow Enabled and Record | XER[OV,SO] and CR0 are updated to reflect the result. | |
| Multiply-Accumulate Cross-Halfword to Word Modulo Unsigned Instructions | | rD is added to the unsigned product (rA[16:31]) × (rB[0:15]), producing a 33-bit result. The low-32 bits of this result are stored in rD. | rD,rA,rB |
| macchwu | Multiply Accumulate Cross Halfword to Word Modulo Unsigned | XER and CR0 are not updated. | |
| macchwu. | Multiply Accumulate Cross Halfword to Word Modulo Unsigned and Record | CR0 is updated to reflect the result. | |
| macchwuo | Multiply Accumulate Cross Halfword to Word Modulo Unsigned with Overflow Enabled | XER[OV,SO] are updated to reflect the result. | |
| macchwuo. | Multiply Accumulate Cross Halfword to Word Modulo Unsigned with Overflow Enabled and Record | XER[OV,SO] and CR0 are updated to reflect the result. | |

Figure 3-28 shows the operation of the integer multiply-accumulate cross-halfword to

word instructions.

[Figure: rA[16:31] is multiplied by rB[0:15]. The product is added to rD (bits 0:31) to form the 33-bit intermediate result (bits 0:32), from which rD (bits 0:31) is loaded.]

Figure 3-28: Multiply-Accumulate Cross-Halfword to Word Operation
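The cross-halfword operation described above can be sketched in software. The following is a minimal Python model, not PPC405 code, of the unsigned modulo and unsigned saturating forms. The function names (`macchwu_model`, `macchwsu_model`) and helpers are hypothetical and chosen only for illustration, and the sketch assumes that the nearest representable value on unsigned overflow is 0xFFFFFFFF. It models only the value written to rD, not the XER or CR0 updates.

```python
MASK32 = 0xFFFFFFFF

def low_halfword(reg):
    """Return rX[16:31], the low-order 16 bits (PowerPC numbers bit 0 as the MSB)."""
    return reg & 0xFFFF

def high_halfword(reg):
    """Return rX[0:15], the high-order 16 bits."""
    return (reg >> 16) & 0xFFFF

def macchwu_model(rd, ra, rb):
    """Modulo unsigned: keep the low 32 bits of the 33-bit intermediate result."""
    intermediate = rd + low_halfword(ra) * high_halfword(rb)  # fits in 33 bits
    return intermediate & MASK32

def macchwsu_model(rd, ra, rb):
    """Saturate unsigned: clamp to 0xFFFFFFFF (assumed) if the result overflows."""
    intermediate = rd + low_halfword(ra) * high_halfword(rb)
    return min(intermediate, MASK32)
```

With rd = 0xFFFFFFFF, ra = 0x12340002, and rb = 0x00035678, the product is 2 × 3 = 6, so the modulo form wraps around to 5 while the saturating form stays at 0xFFFFFFFF.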

Multiply-Accumulate High-Halfword to Word Instructions

Table 3-41 shows the PPC405 multiply-accumulate high-halfword to word instructions. These instructions multiply the high halfword of both source operands, rA[0:15] and rB[0:15], producing a 32-bit product. The product is signed or unsigned, depending on the instruction. This product is added to the value in the destination register, rD, producing a 33-bit intermediate result. Generally, rD is loaded with the lower 32 bits of the 33-bit intermediate result. However, if the instruction performs saturating arithmetic and the intermediate result overflows, rD is loaded with the nearest representable value (see Modulo and Saturating Arithmetic, page 405).

For each type of instruction shown in Table 3-41, the “Operation” column indicates the multiply-accumulate operation performed. The column also shows, on an instruction-by-instruction basis, how the XER and CR registers are updated (if at all).
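As an illustration of the description above, the following Python sketch models the signed high-halfword multiply-accumulate, with an optional saturating mode. It is a software model under stated assumptions, not PPC405 code: the names `to_signed` and `machhw_model` are hypothetical, and the saturation bounds (0x7FFFFFFF on positive overflow, 0x80000000 on negative overflow) are assumed to be the nearest representable signed 32-bit values.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def machhw_model(rd, ra, rb, saturate=False):
    """Return the 32-bit value written to rD for a signed high-halfword MAC."""
    # rA[0:15] and rB[0:15] are the high-order halfwords of the source registers.
    product = to_signed(ra >> 16, 16) * to_signed(rb >> 16, 16)
    intermediate = to_signed(rd, 32) + product  # 33-bit intermediate result
    if saturate:
        # Assumed clamp to the signed 32-bit range on overflow.
        intermediate = max(-(1 << 31), min(intermediate, (1 << 31) - 1))
    return intermediate & 0xFFFFFFFF  # modulo form keeps the low 32 bits
```

For example, with rd = 0x7FFFFFFF, ra = 0x00020000, and rb = 0x00030000, the product is 6. The modulo form wraps to 0x80000005, whereas the saturating form holds rD at 0x7FFFFFFF.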

Table 3-41: Multiply-Accumulate High-Halfword to Word Instructions

Columns: Mnemonic, Name, Operation, Operand Syntax

Multiply-Accumulate High-Halfword to Word Modulo Signed Instructions

The four instructions in this group share the following operation and operand syntax.

Operation: rD is added to the signed product (rA[0:15]) × (rB[0:15]), producing a 33-bit result. The low 32 bits of this result are stored in rD.

Operand syntax: rD,rA,rB

machhw      Multiply Accumulate High Halfword to Word Modulo Signed
            XER and CR0 are not updated.

machhw.     Multiply Accumulate High Halfword to Word Modulo Signed and Record
            CR0 is updated to reflect the result.

machhwo     Multiply Accumulate High Halfword to Word Modulo Signed with Overflow Enabled
            XER[OV,SO] are updated to reflect the result.

machhwo.    Multiply Accumulate High Halfword to Word Modulo Signed with Overflow Enabled and Record
            XER[OV,SO] and CR0 are updated to reflect the result.
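The record and overflow-enabled forms differ only in which status bits they update. The Python sketch below models the combined form (overflow enabled and record) for the signed modulo case. The function name and return layout are hypothetical. The flag semantics are assumptions based on the general PowerPC conventions rather than on the table itself: OV is set when the 33-bit intermediate result cannot be represented as a signed 32-bit value, SO is sticky, CR0[LT,GT,EQ] reflect a signed comparison of the stored result with zero, and CR0[SO] is a copy of XER[SO].

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def machhwo_record_model(rd, ra, rb, so_in=False):
    """Return (result, xer, cr0) for the overflow-enabled, record form."""
    product = to_signed(ra >> 16, 16) * to_signed(rb >> 16, 16)
    intermediate = to_signed(rd, 32) + product   # 33-bit intermediate result
    result = intermediate & 0xFFFFFFFF           # modulo: low 32 bits go to rD
    # Assumed: OV flags an intermediate result outside the signed 32-bit range.
    ov = not (-(1 << 31) <= intermediate < (1 << 31))
    so = so_in or ov                             # SO is sticky once set
    signed_result = to_signed(result, 32)
    cr0 = {
        "LT": signed_result < 0,
        "GT": signed_result > 0,
        "EQ": signed_result == 0,
        "SO": so,
    }
    return result, {"OV": ov, "SO": so}, cr0
```

Under these assumptions, the plain form corresponds to discarding both flag sets, the record-only form to keeping `cr0`, and the overflow-enabled form to keeping the XER bits.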
