© 1994 MIPS Technologies, Inc. All Rights Reserved.
RESTRICTED RIGHTS LEGEND
Use, duplication, or disclosure of the technical data contained in this
document by the Government is subject to restrictions as set forth in
subdivision (c) (1) (ii) of the Rights in Technical Data and Computer
Software clause at DFARS 52.227-7013 and/or in similar or successor
clauses in the FAR, or in the DOD or NASA FAR Supplement.
Unpublished rights reserved under the Copyright Laws of the United
States. Contractor/manufacturer is MIPS Technologies, Inc., 2011 N.
Shoreline Blvd., Mountain View, CA 94039-7311.
RISCompiler, RISC/os, R2000, R6000, R4000, and R4400 are trademarks of
MIPS Technologies, Inc. MIPS and R3000 are registered trademarks of
MIPS Technologies, Inc.
IBM 370 is a registered trademark of International Business Machines.
VAX is a registered trademark of Digital Equipment Corporation.
iAPX is a registered trademark of Intel Corporation.
MC68000 is a registered trademark of Motorola Inc.
UNIX is a registered trademark in the United States and other countries,
licensed exclusively through X/Open Company, Ltd.
MIPS Technologies, Inc.
2011 North Shoreline
Mountain View, California 94039-7311
Acknowledgments for the First Edition
First of all, special thanks go to Duk Chun for his patient help in supplying and
verifying the content of this manual; that this manual is technically correct is, in a
very large part, directly attributable to him.
Thanks also to the following people for supplying portions of this book: Shabbir
Latif, for, among other things, the exception handler flow charts, the description
of the output buffer edge-control logic, and the interrupts; once again, Duk Chun,
for his paper on R4000 processor synchronization support; Paul Ries, for
confirming the accuracy of sections describing the memory management and the
caches; John Mashey, for verifying the R4000 processor actually does employ the
64-bit architecture; Dave Ditzel, for raising the issue in the first place; and Mike
Gupta, for substantiating various aspects of the errata. Finally, thanks to Ed
Reidenbach for supplying a large portion of the parity and ECC sections of this
manual, and Michael Ngo for checking their accuracy.
Thanks also to the following folks for their technical assistance: Andy Keane,
Keith Garrett, Viggy Mokkarala, Charles Price, Ali Moayedian, George Hsieh,
Peter Fu, Stephen Przybylski, Michael Woodacre, and Earl Killian. Also to be
thanked are the people at fvn@world.std.com: Bill Tuthill, Barry Shein, Bob
Devine, and Alan Marr, for helping place RISC in a pecuniary perspective. Also,
thanks to the following people at the mystery_train@swim2birds news group: toma,
dan_sears, jharris@garnet, tut@cairo (again), and elvis@dalkey(mateo_b). Their night-
for-day netversations, fueled by caffeine, concerning the viability of the
cyberpsykinetic compute-core model helped form an important basis of this book.
On the editorial front, thanks once again to Ms. Robin Cowan, of the Consortium
of Editorial Arts for her labors in editing this manual. Thanks to Evelyn Spire for
slaving over that bottomless black well we refer to as an “Index.” Thanks also,
once again, to Karen Gettman, and Lisa Iarkowski at Prentice-Hall for their help.
On the artistic side, thanks to Jeanne Simonian, of the Creative department here
at Silicon Graphics, for the book cover design; and thanks to Pam Flanders for
providing MarCom tactical support.
Have we missed anyone? If so, here is where we apologize for doing so.
Joe Heinrich
April 1, 1993
Mt. View, California
MIPS R4000 Microprocessor User's Manual
Acknowledgments for the Second Edition
Thanks go to Shabbir Latif, from whose errata the major part of this second
edition is derived. Thanks also to Charlie Price for, among other things, making
available his revision of the ISA.
On the production side, thanks to Kay Maitz, Beth Fraker, Molly Castor, Lynnea
Humphries, and Claudia Lohnes for their assistance at the center of the hurricane.
Joe Heinrich
joeh@sgi.com
April 1, 1994
Mt. View, California
Preface
This book describes the MIPS R4000 and R4400 family of RISC
microprocessors (also referred to in this book as processor).
Overview of the Contents
Chapter 1 is a discussion (including the historical context) of RISC
development in general, and the R4000 microprocessor in particular.
Chapter 2 is an overview of the CPU instruction set.
Chapter 3 describes the operation of the R4000 instruction execution
pipeline, including the basic operation of the pipeline and
interruptions that are caused by interlocks and exceptions.
Chapter 4 describes the memory management system including
address mapping and address spaces, virtual memory, the translation
lookaside buffer (TLB), and the System Control Processor (CP0).
Chapter 5 describes the exception processing resources of the R4000
processor. It includes an overview of the CPU exception handling
process and describes the format and use of each CPU exception
handling register.
Chapter 6 describes the Floating-Point Unit (FPU), a coprocessor for
the CPU that extends the CPU instruction set to perform floating-point
arithmetic operations. This chapter lists the FPU registers and
instructions.
Chapter 7 describes the FPU exception processing.
Chapter 8 describes the signals that pass between the R4000 processor
and other components in a system. The signals discussed include the
System interface, the Clock/Control interface, the Secondary Cache
interface, the Interrupt interface, the Initialization interface, and the
JTAG interface.
Chapter 9 describes in more detail the Initialization interface, which
includes the boot modes for the processor, as well as system resets.
Chapter 10 describes the clocks used in the R4000 processor, as well as
the processor status reporting mechanism.
Chapter 11 discusses cache memory, including the operation of the
primary and secondary caches, and cache coherency in a
multiprocessor system.
Chapter 12 describes the System interface, which allows the processor
access to external resources such as memory and input/output (I/O).
It also allows an external agent access to the internal resources of the
processor, such as the secondary cache.
Chapter 13 describes the Secondary Cache interface, including read
and write cycle timing. This chapter also discusses the interface buses
and signals.
Chapter 14 describes the Joint Test Action Group (JTAG) interface.
The JTAG boundary scan mechanism tests the interconnections
between the R4000 processor, the printed circuit board to which it is
mounted, and other components on the board.
Chapter 15 describes the single nonmaskable processor interrupt,
along with the six hardware and two software processor interrupts.
Chapter 16 describes the error checking and correcting (ECC)
mechanisms of the R4000 processor.
Appendix A describes the R4000 CPU instructions, in both 32- and 64-bit
modes. The instruction list is given in alphabetical order.
Appendix B describes the R4000 FPU instructions, listed
alphabetically.
Appendix C describes sub-block ordering, a nonsequential method of
retrieving data.
Appendix D describes the output buffer and the ∆i/∆t control
mechanism.
Appendix E describes the passive components that make up the
phase-locked loop (PLL).
Appendix F describes Coprocessor 0 hazards.
Appendix G describes the R4000 pinout.
A Note on Style
A brief note on some of the stylistic conventions used in this book: bits,
fields, and registers of interest from a software perspective are
italicized (such as Config register); signal names of more importance
from a hardware point of view are rendered in bold (such as Reset*).
A range of bits uses a colon as a separator; for instance, (15:0)
represents the 16-bit range that runs from bit 0, inclusive, through bit
15. (In some places an ellipsis may be used in place of the colon for
visibility: (15...0).)
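The bit-range notation maps directly onto a shift-and-mask operation. The short Python sketch below is illustrative only: the function name bits and the sample value are assumptions made for this example and do not come from the R4000 documentation.

```python
def bits(value, hi, lo):
    """Return the field value(hi:lo), with both end bits included."""
    if hi < lo or lo < 0:
        raise ValueError("expected hi >= lo >= 0")
    width = hi - lo + 1                      # (15:0) is 16 bits wide
    return (value >> lo) & ((1 << width) - 1)


word = 0x12345678                # arbitrary sample value
low_half = bits(word, 15, 0)     # the 16-bit range (15:0)
top_six = bits(word, 31, 26)     # a 6-bit range at the top of a 32-bit word
```

Here (15:0) selects the low 16 bits of the sample word, and (31:26) selects its six most significant bits.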
Preface to the Second Edition
Changes From the First Edition
The second edition of this book incorporates certain low-level changes
and technical additions, but retains a substantive identity with the
original version.
Changes from the first edition are indicated by left-margin vertical
rules.
Getting MIPS Documents On-Line
MIPS documents (including an electronic version of the errata) are
available on-line, through the File Transfer Protocol (FTP). To
retrieve them, follow the steps below. The text you are to type is
shown in Courier Bold font; the computer’s responses are
shown in Courier Regular font.
1. First, place yourself in the directory on your system within which
you want to store the retrieved files. Do this by typing:
    cd <directory_you_want_file_to_be_in>
2. Access the MIPS document server, sgigate, through FTP by
typing:
    ftp sgigate.sgi.com
3. The server tells you when you are connected for FTP by
responding:
    Connected to sgigate.sgi.com.
4. Next (after some announcements) the server asks you to log in by
requesting a name and then a password.
    Name (sgigate.sgi.com:<login_name>):
5. Log in by typing anonymous for your name and your electronic
mail address for your password.
    Name (sgigate.sgi.com:<login_name>): anonymous
    331 Guest login ok, type your name as password.
    Password: your_email_address
6. The system indicates you have successfully logged in by
supplying an FTP prompt:
    ftp>
7. Go to the pub/doc directory by typing:
    ftp> cd pub/doc
8. You can take a look at the contents of the doc directory by listing
them:
    ftp> ls
9. You will find several R4000-related subdirectories, such as R4200,
R4400, and R4600. When you find the subdirectory you want, cd
into that subdirectory and retrieve the file you want by typing:
    get <filename>
This copies the file from sgigate back to your system.
10. When you have retrieved the files you want, exit from ftp by
typing:
    ftp> quit
11. If the file was encoded for transmission, you must decode it, after
retrieval, by typing:
    uudecode <filename>
12. If the file was compressed for transmission, you must uncompress
it, after retrieval, by typing:
    uncompress <filename>
13. If the file was archived with tar, you must extract it, after
retrieval, by typing:
    tar xvof <filename>
Table of Contents
Preface
Overview of the Contents
A Note on Style
Preface to the Second Edition
Changes From the First Edition
Getting MIPS Documents On-Line
1 Introduction
Benefits of RISC Design
Pinout of R4000PC
Pinout of R4000MC/SC Package Pinout
Index
1 Introduction
Historically, the evolution of computer architectures has been dominated
by families of increasingly complex central processors. Under market
pressures to preserve existing software, complex instruction set computer
(CISC) architectures evolved by the accretion of microcode and
increasingly intricate instruction sets. This intricacy in architecture was
itself driven by the need to support high-level languages and operating
systems, as advances in semiconductor technology made it possible to
fabricate integrated circuits of greater and greater complexity. And at that
time it seemed self-evident to designers that architectures should continue
to become more and more complex as technological advances made such
VLSI designs possible.
In recent years, however, reduced instruction set computer (RISC)
architectures have implemented a different model for the interaction
between hardware, firmware, and software. RISC concepts emerged from
a statistical analysis of the way in which software actually uses processor
resources: dynamic measurement of system kernels and object modules
generated by optimizing compilers showed that the simplest instructions
were used most often—even in the code for CISC machines.
Correspondingly, complex instructions often went unused because their
single way of performing a complex operation rarely matched the precise
needs of a high-level language.
RISC architecture eliminates microcode routines and turns low-level
control of the machine over to software. The RISC approach is not new,
but its application has become more prevalent in recent years, due to the
increasing use of high-level languages, the development of compilers that
are able to optimize at the microcode level, and dramatic advances in
semiconductor memory and packaging. It is now feasible to replace
relatively slow microcode ROM with faster RAM that is organized as an
instruction cache. Machine control resides in this instruction cache that is,
in effect, customized on-the-fly: the instruction stream generated by
system- and compiler-generated code provides a precise fit between the
requirements of high-level software and the low-level capabilities of the
hardware.
Reducing or simplifying the instruction set was not the primary goal of
RISC architecture; it is a pleasant side effect of techniques used to gain the
highest performance possible from available technology. Thus, the term
reduced instruction set computers is a bit misleading; it is the push for
performance that really drives and shapes RISC designs.
1.1 Benefits of RISC Design
Some benefits that result from RISC design techniques are not directly
attributable to the drive to increase performance, but are a result of the
basic reduction in complexity—a simpler design allows both chip-area
resources and human resources to be applied to features that enhance
performance. Some of these benefits are described below.
Shorter Design Cycle
The architectures of RISC processors can be implemented more quickly
than their CISC counterparts: it is easier to fabricate and debug a
streamlined, simplified architecture with no microcode than a complex
architecture that uses microcode. CISC processors have such a long
design cycle that they may not be completely debugged by the time they
are technologically obsolete. The shorter time required to design and
implement RISC processors allows them to make use of the best available
technologies.
Effective Utilization of Chip Area
The simplicity of RISC processors also frees scarce chip geography for
performance-critical resources such as larger register files, translation
lookaside buffers (TLBs), coprocessors, and fast multiply and divide units.
Such resources help RISC processors obtain an even greater performance
edge.
User (Programmer) Benefits
Simplicity in architecture also helps the user by providing a uniform
instruction set that is easier to use. This allows a closer correlation
between the instruction count and the cycle count, making it easier to
measure code optimization activities.
Advanced Semiconductor Technologies
Each new VLSI technology is introduced with tight limits on the number
of transistors that fit on each chip. Since the simplicity of a RISC processor
allows it to be implemented in fewer transistors than its CISC counterpart,
the first computers capable of exploiting these new VLSI technologies
have been using and will continue to use RISC architecture.
Optimizing Compilers
RISC architecture is designed so that the compilers, not assembly
languages, have the optimal working environment. RISC philosophy
assumes that high-level language programming is used, which contradicts
the older CISC philosophy that assumes assembly language programming
is of primary importance.
The trend toward high-level language instructions has led to the
development of more efficient compilers to convert high-level language
instructions to machine code. Primary measures of compiler efficiency are
the compactness of its generated code and the shortness of its execution
time.
During the development of more efficient compilers, analysis of
instruction streams revealed that the greatest amount of time was spent
executing simple instructions and performing load and store operations,
while the more complex instructions were used less frequently. It was also
learned that compilers produce code that is often a narrow subset of the
processor instruction set architecture (ISA). A compiler works more
efficiently with instructions that perform simple, well-defined operations
and generate minimal side-effects. Compilers tend not to use complex
instructions and features; the more complex, powerful instructions are
either too difficult for the compiler to employ or do not
precisely fit high-level language requirements.
Thus, a natural match exists between RISC architectures and efficient,
optimizing compilers. This match makes it easier for compilers to
generate the most effective sequences of machine instructions to
accomplish tasks defined by the high-level language.
MIPS RISCompiler Language Suite
Some compiler products are derived from disparate sources and
consequently do not fit together very well. Instead of treating each
language’s compiler as a separate entity, the MIPS RISCompiler
language suite shares common elements across the entire family of
compilers. In this way the language suite offers both tight integration and
broad language coverage.
The MIPS language suite supports:
•industry-standard front ends for the following languages (C,
FORTRAN, Pascal)
•a common intermediate language, offering an efficient way to
add language front ends over time
•all of the back end optimization and code generation
•the same object format and calling conventions
•mixed-language programs
•debugging of programs written in all languages, including
mixtures
This language suite approach yields high-quality compilers for all
languages, since common elements make up the majority of each of the
language products. In addition, this approach provides the ability to
develop and execute multi-language programs, promoting flexibility in
development, avoiding the necessity of recoding proven program
segments, and protecting the user’s software investment. The common
back-end also exports optimizing and code-generating improvements
immediately throughout the language suite, thereby reducing
maintenance.
1.2 Compatibility
The R4000 processor provides complete application software
compatibility with the MIPS R2000, R3000, and R6000 processors.
Although the MIPS processor architecture has evolved in response to a
compromise between software and hardware resources in the computer
system, the R4000 processor implements the MIPS ISA for user-mode
programs. This guarantees that user programs conforming to the ISA
execute on any MIPS hardware implementation.
1.3 Processor General Features
This section briefly describes the programming model, the memory
management unit (MMU), and the caches in the R4000 processor. A more
detailed description is given in succeeding sections.
•Full 32-bit and 64-bit Operations. The R4000 processor
contains 32 general purpose 64-bit registers. (When operating
as a 32-bit processor, the general purpose registers are 32 bits
wide.) All instructions are 32 bits wide.
•Efficient Pipeline. The superpipeline design of the processor
results in an execution rate approaching one instruction per
cycle. Pipeline stalls and exceptional events are handled
precisely and efficiently.
•MMU. The R4000 processor uses an on-chip TLB that provides
rapid virtual-to-physical address translation.
•Cache Control. The R4000 primary instruction and data caches
reside on-chip, and can each hold 8 Kbytes. In the R4400
processor, the primary caches can each hold 16 Kbytes.
Architecturally, each primary cache can be increased to hold up
to 32 Kbytes. An off-chip secondary cache (R4000SC and
R4000MC processors only) can hold from 128 Kbytes to 4
Mbytes. All processor cache control logic, including the
secondary cache control logic, is on-chip.
•Floating-Point Unit. The FPU is located on-chip and
implements the ANSI/IEEE standard 754-1985.
1.4 R4000 Processor Configurations
The R4000 processor† is packaged in three different configurations. All
processors are implemented in sub-1-micron CMOS technology.
•R4000PC is designed for cost-sensitive systems such as
inexpensive desktop systems and high-end embedded
controllers. It is packaged in a 179-pin PGA, and does not
support a secondary cache.
•R4000SC is designed for high-performance uniprocessor
systems. It is packaged in a 447-pin LGA/PGA and includes
integrated control for large secondary caches built from
standard SRAMs.
•R4000MC is designed for large cache-coherent multiprocessor
systems. It is packaged in a 447-pin LGA/PGA and, in addition
to the features of R4000SC, includes support for a wide variety
of bus designs and cache-coherency mechanisms.
Table 1-1 lists the features in each of the three configurations (X indicates
the feature is present). R4400 processor enhancements are described in the
section following.
1.5 R4400 Processor Enhancements
In addition to the features contained in the R4000 processor, the R4400
processor has the following enhancements:
•fully functional Status pins (described in Chapter 10)
•Master/Checker mode (described in Chapter 16)
•larger primary caches (described in Processor General Features,
in this chapter)
•uncached store buffer (described in Chapter 3)
•divide-by-6 and divide-by-8 modes (described in Chapter 10)
•cache error bit, EW, added to the CacheErr register (described in
Chapter 5).
† Features of the R4400 processor that differ from the R4000 processor are noted throughout
this book; for instance, R4400 processor enhancements are listed in the next section.
Otherwise, references to the R4000 processor may be taken to include the R4400 processor.
Table 1-1 R4000 Features
Feature                       R4000PC   R4000SC   R4000MC
Primary Cache States
  Valid                          X         X         X
  Shared                         -         -         X
  Clean Exclusive                -         X         X
  Dirty Exclusive                X         X         X
Secondary Cache Interface        -         X         X
Secondary Cache States
  Valid                          X         X         X
  Shared                         -         -         X
  Dirty Shared                   -         -         X
  Clean Exclusive                -         X         X
  Dirty Exclusive                X         X         X
Multiprocessing                  -         -         X
Cache Coherency Attributes
  Uncached                       X         X         X
  Noncoherent                    X         X         X
  Sharable                       -         -         X
  Update                         -         -         X
  Exclusive                      -         -         X
Packages
  PGA (179-pin)                  X         -         -
  PGA (447-pin)                  -         X         X
1.6 R4000 Processor
This section describes the following:
•the 64-bit architecture of the R4000 processor
•the superpipeline design of the CPU instruction pipeline
(described in detail in Chapter 3)
•an overview of the System interface (described in detail in
Chapter 12)
•an overview of the CPU registers (detailed in Chapters 4 and 5)
and CPU instruction set (detailed in Chapter 2 and Appendix
A)
•data formats and byte ordering
•the System Control Coprocessor, CP0, and the floating-point
unit, CP1
•caches and memory, including a description of primary and
secondary caches, the memory management unit (MMU), the
translation lookaside buffer (TLB), and the Secondary Cache
interface (described in more detail in Chapters 4 and 11). The
Secondary Cache interface is detailed in Chapter 13.
64-bit Architecture
The natural mode of operation for the R4000 processor is as a 64-bit
microprocessor; however, 32-bit applications maintain compatibility even
when the processor operates as a 64-bit processor.
The R4000 processor provides the following:
•64-bit on-chip floating-point unit (FPU)
•64-bit integer arithmetic logic unit (ALU)
•64-bit integer registers
•64-bit virtual address space
•64-bit system bus
Figure 1-1 is a block diagram of the R4000 processor internals.
[Figure 1-1: block diagram. Labeled blocks: 64-bit System Bus; System Control; CP0 (Exception/Control Registers, Memory Management Registers, Translation Lookaside Buffers); S-cache Control; P-cache Control; Data Cache; Instruction Cache; CPU (CPU Registers, ALU, Load Aligner/Store Driver, Integer Multiplier/Divider, Address Unit, PC Incrementer); FPU (FPU Registers, Pipeline Bypass, FP Multiplier, FP Divider, FP Add, Convert, Square Root); Pipeline Control.]
Figure 1-1 R4000 Processor Internal Block Diagram
Superpipeline Architecture
The R4000 processor exploits instruction parallelism by using an eight-stage
superpipeline which places no restrictions on the instruction issued.
Under normal circumstances, two instructions are issued each cycle.
The internal pipeline of the R4000 processor operates at twice the
frequency of the master clock, as discussed in Chapter 3. The processor
achieves high throughput by pipelining cache accesses, shortening
register access times, implementing virtual-indexed primary caches, and
allowing the latency of functional units to span more than one pipeline
clock cycle.
System Interface
The R4000 processor supports a 64-bit System interface that can be used to construct
uniprocessor systems with a direct DRAM interface—with or without a
secondary cache—or cache-coherent multiprocessor systems. The System
interface includes:
•a 64-bit multiplexed address and data bus
•8 check bits
•a 9-bit parity-protected command bus
•8 handshake signals
The interface is capable of transferring data between the processor and
memory at a peak rate of 400 Mbytes/second, when running at 50 MHz.
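The peak figure follows from the bus width and the clock rate. The Python sketch below reproduces the arithmetic under two assumptions that the text does not state explicitly: one 64-bit transfer completes per clock at peak, and a Mbyte is counted as one million bytes.

```python
BUS_WIDTH_BITS = 64          # width of the multiplexed address and data bus
CLOCK_HZ = 50_000_000        # 50 MHz, the rate quoted in the text

# Assumption: at peak, one full-width transfer completes every clock.
bytes_per_transfer = BUS_WIDTH_BITS // 8
peak_bytes_per_second = bytes_per_transfer * CLOCK_HZ

# Assumption: 1 Mbyte = 1,000,000 bytes.
peak_mbytes_per_second = peak_bytes_per_second // 1_000_000
```

Eight bytes per transfer at 50 million transfers per second gives the 400 Mbytes/second quoted above.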
CPU Register Overview
The central processing unit (CPU) provides the following registers:
•32 general purpose registers
•a Program Counter (PC) register
•2 registers that hold the results of integer multiply and divide
operations (HI and LO).
Floating-point unit (FPU) registers are described in Chapter 6.
CPU registers can be either 32 bits or 64 bits wide, depending on the R4000
processor mode of operation.
Figure 1-2 shows the CPU registers.
[Figure 1-2: register diagram showing the General Purpose Registers (r0 through r31), the Multiply and Divide Registers (HI and LO), and the Program Counter (PC). Each register is drawn 64 bits wide, with bit ranges (63:32) and (31:0) marked. Register width depends on mode of operation: 32-bit or 64-bit.]
Figure 1-2 CPU Registers
Two of the CPU general purpose registers have assigned functions:
•r0 is hardwired to a value of zero, and can be used as the target
register for any instruction whose result is to be discarded. r0
can also be used as a source when a zero value is needed.
•r31 is the link register used by Jump and Link instructions. It
should not be used by other instructions.
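The behavior of r0 can be pictured with a toy register-file model. The Python sketch below is a simplified illustration, not a description of the hardware: the class RegisterFile, its methods, and the sample values are all hypothetical.

```python
class RegisterFile:
    """Toy model of the 32 general purpose registers (hypothetical class)."""

    def __init__(self, width=64):
        self.mask = (1 << width) - 1     # 64-bit or 32-bit register width
        self._regs = [0] * 32

    def read(self, n):
        return self._regs[n]

    def write(self, n, value):
        if n == 0:
            return                       # r0 is hardwired to zero: result discarded
        self._regs[n] = value & self.mask


regs = RegisterFile()
regs.write(0, 1234)           # targets r0, so the value is thrown away
regs.write(31, 0x80000008)    # r31 holding a made-up return address
zero = regs.read(0)           # r0 as a source always supplies zero
```

Writes to register 0 are silently dropped, which is what lets r0 serve both as a discard target and as a constant-zero source.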
The CPU has three special purpose registers:
•PC — Program Counter register
•HI — Multiply and Divide register higher result
•LO — Multiply and Divide register lower result
The two Multiply and Divide registers (HI, LO) store:
•the product of integer multiply operations, or
•the quotient (in LO) and remainder (in HI) of integer divide
operations
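The way results land in HI and LO can be sketched for unsigned 32-bit operands. The Python functions below are an illustrative model only. The names multu and divu mirror the mnemonics, but the code assumes 32-bit mode and does not model division by zero or signed operands.

```python
MASK32 = 0xFFFFFFFF


def multu(rs, rt):
    """Unsigned 32-bit multiply: return (HI, LO), the two halves of the product."""
    product = (rs & MASK32) * (rt & MASK32)      # up to 64 bits wide
    return (product >> 32) & MASK32, product & MASK32


def divu(rs, rt):
    """Unsigned 32-bit divide: return (HI, LO) = (remainder, quotient)."""
    rs &= MASK32
    rt &= MASK32
    if rt == 0:
        # Assumption: the divide-by-zero case is outside the scope of this sketch.
        raise ZeroDivisionError("division by zero is not modeled here")
    return rs % rt, rs // rt


hi, lo = multu(0xFFFFFFFF, 2)    # product 0x1_FFFF_FFFE needs both registers
rem, quo = divu(17, 5)           # quotient goes to LO, remainder to HI
```

The multiply example shows why two registers are needed: the product of two 32-bit values can occupy up to 64 bits.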
The R4000 processor has no Program Status Word (PSW) register as such;
this is covered by the Status and Cause registers incorporated within the
System Control Coprocessor (CP0). CP0 registers are described later in
this chapter.
CPU Instruction Set Overview
Each CPU instruction is 32 bits long. As shown in Figure 1-3, there are
three instruction formats:
•immediate (I-type)
•jump (J-type)
•register (R-type)
Format              Fields (bit positions)
I-Type (Immediate)  op (31:26), rs (25:21), rt (20:16), immediate (15:0)
J-Type (Jump)       op (31:26), target (25:0)
R-Type (Register)   op (31:26), rs (25:21), rt (20:16), rd (15:11), sa (10:6), funct (5:0)
Figure 1-3 CPU Instruction Formats
Each format contains a number of different instructions, which are
described further in this chapter. Fields of the instruction formats are
described in Chapter 2.
Instruction decoding is greatly simplified by limiting the number of
formats to these three. This limitation means that the more complicated
(and less frequently used) operations and addressing modes can be
synthesized by the compiler, using sequences of these same simple
instructions.
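Because only three formats exist, a decoder can extract every field with fixed shifts and masks. The Python sketch below assumes the field positions of Figure 1-3: op in bits 31:26, rs in 25:21, rt in 20:16, rd in 15:11, sa in 10:6, funct in 5:0, immediate in 15:0, and target in 25:0. The function names and the sample field values are hypothetical and chosen only for illustration.

```python
def field(word, hi, lo):
    """Extract bits (hi:lo) of a 32-bit word, inclusive at both ends."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode(word):
    """Split an instruction word into the fields used by all three formats."""
    return {
        "op": field(word, 31, 26),         # present in every format
        "rs": field(word, 25, 21),         # I-type and R-type
        "rt": field(word, 20, 16),         # I-type and R-type
        "rd": field(word, 15, 11),         # R-type only
        "sa": field(word, 10, 6),          # R-type only
        "funct": field(word, 5, 0),        # R-type only
        "immediate": field(word, 15, 0),   # I-type only
        "target": field(word, 25, 0),      # J-type only
    }


def encode_i(op, rs, rt, immediate):
    """Pack an I-type word from its four fields (helper for this demo)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)


word = encode_i(op=9, rs=4, rt=5, immediate=0x1234)   # arbitrary field values
fields = decode(word)
```

A real decoder would examine op first and then interpret only the fields that belong to the selected format. Extracting all of them at once keeps the sketch short.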
The instruction set can be further divided into the following groupings:
•Load and Store instructions move data between memory and
general registers. They are all immediate (I-type) instructions,
since the only addressing mode supported is base register plus
16-bit, signed immediate offset.
•Computational instructions perform arithmetic, logical, shift,
multiply, and divide operations on values in registers. They
include register (R-type, in which both the operands and the
result are stored in registers) and immediate (I-type, in which
one operand is a 16-bit immediate value) formats.
•Jump and Branch instructions change the control flow of a
program. Jumps are always made to a paged, absolute address
formed by combining a 26-bit target address with the high-order
bits of the Program Counter (J-type format) or register
address (R-type format). Branches have 16-bit offsets relative
to the program counter (I-type). Jump And Link instructions
save their return address in register 31.
•Coprocessor instructions perform operations in the
coprocessors. Coprocessor load and store instructions are
I-type.
•Coprocessor 0 (system coprocessor) instructions perform
operations on CP0 registers to control the memory
management and exception handling facilities of the processor.
These are listed in Table 1-18.
•Special instructions perform system calls and breakpoint
operations. These instructions are always R-type.
•Exception instructions cause a branch to the general exception-handling
vector based upon the result of a comparison. These
instructions occur in both R-type (both the operands and the
result are registers) and I-type (one operand is a 16-bit
immediate value) formats.
Chapter 2 provides a more detailed summary and Appendix A gives a
complete description of each instruction.
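The address calculations named in the groupings above (base plus signed offset for loads and stores, a 26-bit target combined with the high-order bits of the PC for jumps, and PC-relative offsets for branches) can be sketched as follows. This Python model assumes 32-bit addressing. It also assumes that jump targets and branch offsets are word-scaled (shifted left by two) and that pc is the address of the instruction following the jump or branch. These details follow the usual MIPS definition but are not stated in this section, so treat them as assumptions and consult Appendix A for the authoritative description. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm):
    """Interpret a 16-bit field as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def load_store_address(base, offset16):
    """Base register plus 16-bit signed immediate offset."""
    return (base + sign_extend16(offset16)) & MASK32


def jump_target(pc, target26):
    """Combine a 26-bit target with the high-order bits of the PC.

    Assumes the target is a word index, so it is shifted left by 2 and
    the top 4 bits of pc fill the remaining positions.
    """
    return (pc & 0xF0000000) | ((target26 & 0x03FFFFFF) << 2)


def branch_target(pc, offset16):
    """PC-relative branch with a word-scaled 16-bit signed offset."""
    return (pc + (sign_extend16(offset16) << 2)) & MASK32


addr = load_store_address(0x1000, 0xFFFC)    # offset field encodes -4
jump = jump_target(0x80000004, 0x100)
back = branch_target(0x80000004, 0xFFFF)     # offset field encodes -1 word
```

The jump form explains the phrase "paged, absolute address": the target replaces the low-order bits of the PC, so a jump can only reach addresses that share the PC's high-order bits.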
Tables 1-2 through 1-17 list CPU instructions common to MIPS R-Series
processors, along with those instructions that are extensions to the
instruction set architecture. The extensions result in code space
reductions, multiprocessor support, and improved performance in
operating system kernel code sequences—for instance, in situations where
run-time bounds-checking is frequently performed. Table 1-18 lists CP0
instructions.
Table 1-2 CPU Instruction Set: Load and Store Instructions
OpCode    Description
LB        Load Byte
LBU       Load Byte Unsigned
LH        Load Halfword
LHU       Load Halfword Unsigned
LW        Load Word
LWL       Load Word Left
LWR       Load Word Right
SB        Store Byte
SH        Store Halfword
SW        Store Word
SWL       Store Word Left
SWR       Store Word Right
Table 1-3 CPU Instruction Set: Arithmetic Instructions (ALU Immediate)
OpCode    Description
ADDI      Add Immediate
ADDIU     Add Immediate Unsigned
SLTI      Set on Less Than Immediate
SLTIU     Set on Less Than Immediate Unsigned
ANDI      AND Immediate
ORI       OR Immediate
XORI      Exclusive OR Immediate
LUI       Load Upper Immediate
Table 1-4 CPU Instruction Set: Arithmetic (3-Operand, R-Type)
OpCode    Description
ADD       Add
ADDU      Add Unsigned
SUB       Subtract
SUBU      Subtract Unsigned
SLT       Set on Less Than
SLTU      Set on Less Than Unsigned
AND       AND
OR        OR
XOR       Exclusive OR
NOR       NOR
Table 1-5 CPU Instruction Set: Multiply and Divide Instructions
OpCode    Description
MULT      Multiply
MULTU     Multiply Unsigned
DIV       Divide
DIVU      Divide Unsigned
MFHI      Move From HI
MTHI      Move To HI
MFLO      Move From LO
MTLO      Move To LO
Table 1-6 CPU Instruction Set: Jump and Branch Instructions
OpCode    Description
J         Jump
JAL       Jump And Link
JR        Jump Register
JALR      Jump And Link Register
BEQ       Branch on Equal
BNE       Branch on Not Equal
BLEZ      Branch on Less Than or Equal to Zero
BGTZ      Branch on Greater Than Zero
BLTZ      Branch on Less Than Zero
BGEZ      Branch on Greater Than or Equal to Zero
BLTZAL    Branch on Less Than Zero And Link
BGEZAL    Branch on Greater Than or Equal to Zero And Link
Table 1-7 CPU Instruction Set: Shift Instructions
OpCode    Description
SLL       Shift Left Logical
SRL       Shift Right Logical
SRA       Shift Right Arithmetic
SLLV      Shift Left Logical Variable
SRLV      Shift Right Logical Variable
SRAV      Shift Right Arithmetic Variable
Table 1-8 CPU Instruction Set: Coprocessor Instructions
OpCode    Description
LWCz      Load Word to Coprocessor z
SWCz      Store Word from Coprocessor z
MTCz      Move To Coprocessor z
MFCz      Move From Coprocessor z
CTCz      Move Control to Coprocessor z
CFCz      Move Control From Coprocessor z
COPz      Coprocessor Operation z
BCzT      Branch on Coprocessor z True
BCzF      Branch on Coprocessor z False
Table 1-9 CPU Instruction Set: Special Instructions
OpCode    Description
SYSCALL   System Call
BREAK     Break
Table 1-10 Extensions to the ISA: Load and Store Instructions
OpCode    Description
LD        Load Doubleword
LDL       Load Doubleword Left
LDR       Load Doubleword Right
LL        Load Linked
LLD       Load Linked Doubleword
LWU       Load Word Unsigned
SC        Store Conditional
SCD       Store Conditional Doubleword
SD        Store Doubleword
SDL       Store Doubleword Left
SDR       Store Doubleword Right
SYNC      Sync
Table 1-11 Extensions to the ISA: Arithmetic Instructions (ALU Immediate)
Table 1-13 Extensions to the ISA: Branch Instructions
OpCode    Description
BEQL      Branch on Equal Likely
BNEL      Branch on Not Equal Likely
BLEZL     Branch on Less Than or Equal to Zero Likely
BGTZL     Branch on Greater Than Zero Likely
BLTZL     Branch on Less Than Zero Likely
BGEZL     Branch on Greater Than or Equal to Zero Likely
BLTZALL   Branch on Less Than Zero And Link Likely
BGEZALL   Branch on Greater Than or Equal to Zero And Link Likely
BCzTL     Branch on Coprocessor z True Likely
BCzFL     Branch on Coprocessor z False Likely
Table 1-14 Extensions to the ISA: Arithmetic Instructions (3-operand, R-type)
Table 1-15 Extensions to the ISA: Shift Instructions
OpCode    Description
DSLL      Doubleword Shift Left Logical
DSRL      Doubleword Shift Right Logical
DSRA      Doubleword Shift Right Arithmetic
DSLLV     Doubleword Shift Left Logical Variable
DSRLV     Doubleword Shift Right Logical Variable
DSRAV     Doubleword Shift Right Arithmetic Variable
DSLL32    Doubleword Shift Left Logical + 32
DSRL32    Doubleword Shift Right Logical + 32
DSRA32    Doubleword Shift Right Arithmetic + 32
Table 1-16 Extensions to the ISA: Exception Instructions
OpCode    Description
TGE       Trap if Greater Than or Equal
TGEU      Trap if Greater Than or Equal Unsigned
TLT       Trap if Less Than
TLTU      Trap if Less Than Unsigned
TEQ       Trap if Equal
TNE       Trap if Not Equal
TGEI      Trap if Greater Than or Equal Immediate
TGEIU     Trap if Greater Than or Equal Immediate Unsigned
TLTI      Trap if Less Than Immediate
TLTIU     Trap if Less Than Immediate Unsigned
TEQI      Trap if Equal Immediate
TNEI      Trap if Not Equal Immediate
Table 1-17 Extensions to the ISA: Coprocessor Instructions
OpCode    Description
DMFCz     Doubleword Move From Coprocessor z
DMTCz     Doubleword Move To Coprocessor z
LDCz      Load Double Coprocessor z
SDCz      Store Double Coprocessor z
Table 1-18 CP0 Instructions
OpCode    Description
DMFC0     Doubleword Move From CP0
DMTC0     Doubleword Move To CP0
MTC0      Move to CP0
MFC0      Move from CP0
TLBR      Read Indexed TLB Entry
TLBWI     Write Indexed TLB Entry
TLBWR     Write Random TLB Entry
TLBP      Probe TLB for Matching Entry
CACHE     Cache Operation
ERET      Exception Return
Data Formats and Addressing
The R4000 processor uses four data formats: a 64-bit doubleword, a 32-bit
word, a 16-bit halfword, and an 8-bit byte. Byte ordering within each of
the larger data formats—halfword, word, doubleword—can be
configured in either big-endian or little-endian order. Endianness refers
to the location of byte 0 within the multi-byte data structure. Figures 1-4
and 1-5 show the ordering of bytes within words and the ordering of
words within multiple-word structures for the big-endian and little-endian
conventions.
When the R4000 processor is configured as a big-endian system, byte 0 is
the most-significant (leftmost) byte, thereby providing compatibility with
MC68000 and IBM 370 conventions. Figure 1-4 shows this
configuration.
Word                Bit #
Address             31..24  23..16  15..8   7..0
12 (higher address) 12      13      14      15
8                   8       9       10      11
4                   4       5       6       7
0 (lower address)   0       1       2       3
Figure 1-4 Big-Endian Byte Ordering
When configured as a little-endian system, byte 0 is always the
least-significant (rightmost) byte, which is compatible with iAPX x86 and DEC
VAX conventions. Figure 1-5 shows this configuration.
Word                Bit #
Address             31..24  23..16  15..8   7..0
12 (higher address) 15      14      13      12
8                   11      10      9       8
4                   7       6       5       4
0 (lower address)   3       2       1       0
Figure 1-5 Little-Endian Byte Ordering
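The two byte orderings can be illustrated with a short sketch. It is not part of the R4000 documentation; the helper name `byte_at` and the sample value are assumptions chosen for illustration, and Python's built-in `int.to_bytes` stands in for memory.

```python
def byte_at(value: int, byte_number: int, size: int, big_endian: bool) -> int:
    """Return the byte stored at offset `byte_number` when `value` is laid
    out in memory as a `size`-byte item under the chosen byte order."""
    order = "big" if big_endian else "little"
    return value.to_bytes(size, order)[byte_number]


word = 0x0A0B0C0D
# Big-endian: byte 0 holds the most-significant byte of the word.
print(hex(byte_at(word, 0, 4, big_endian=True)))
# Little-endian: byte 0 holds the least-significant byte of the word.
print(hex(byte_at(word, 0, 4, big_endian=False)))
```

In both cases the register value is the same; only the address at which each byte lands differs.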
In this text, bit 0 is always the least-significant (rightmost) bit; thus, bit
designations are always little-endian (although no instructions explicitly
designate bit positions within words).
Figures 1-6 and 1-7 show little-endian and big-endian byte ordering in
doublewords.
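Bit #     63..56  55..48  47..40  39..32  31..24  23..16  15..8   7..0
Byte #    7       6       5       4       3       2       1       0
Byte 7 is the most-significant byte and byte 0 the least-significant byte.
The figure also marks a word, a halfword, and a byte within the doubleword,
with the bits in a byte numbered 7 (left) through 0 (right).
Figure 1-6 Little-Endian Data in a Doubleword
Bit #     63..56  55..48  47..40  39..32  31..24  23..16  15..8   7..0
Byte #    0       1       2       3       4       5       6       7
Byte 0 is the most-significant byte and byte 7 the least-significant byte.
The figure also marks a word, a halfword, and a byte within the doubleword,
with the bits in a byte numbered 7 (left) through 0 (right).
Figure 1-7 Big-Endian Data in a Doubleword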
The CPU uses byte addressing for halfword, word, and doubleword
accesses with the following alignment constraints:
•Halfword accesses must be aligned on an even byte boundary
(0, 2, 4...).
•Word accesses must be aligned on a byte boundary divisible by
four (0, 4, 8...).
•Doubleword accesses must be aligned on a byte boundary
divisible by eight (0, 8, 16...).
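The alignment constraints above amount to a divisibility test on the byte address. The following is a minimal sketch; the names `ACCESS_SIZE` and `is_aligned` are illustrative and do not come from the manual.

```python
# Size in bytes of each access that carries an alignment constraint.
ACCESS_SIZE = {"halfword": 2, "word": 4, "doubleword": 8}


def is_aligned(address: int, access: str) -> bool:
    """True when `address` satisfies the alignment rule for `access`."""
    return address % ACCESS_SIZE[access] == 0


print(is_aligned(6, "halfword"))    # even byte boundary
print(is_aligned(6, "word"))        # 6 is not divisible by four
print(is_aligned(16, "doubleword")) # divisible by eight
```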
The following special instructions load and store words that are not
aligned on 4-byte (word) or 8-byte (doubleword) boundaries:
LWL    LWR    SWL    SWR
LDL    LDR    SDL    SDR
These instructions are used in pairs to provide addressing of misaligned
words. Addressing misaligned data incurs one additional instruction
cycle over that required for addressing aligned data.
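As an illustration of how such a pair cooperates, the sketch below models LWL followed by LWR for a big-endian configuration. It is a simplified software model, not the hardware implementation: the function names, the `bytes` object standing in for memory, and the omission of 64-bit sign extension of the result are all assumptions made for brevity.

```python
MASK32 = 0xFFFFFFFF


def aligned_word(memory: bytes, address: int) -> int:
    """Read the aligned big-endian word that contains `address`."""
    base = address & ~3
    return int.from_bytes(memory[base:base + 4], "big")


def lwl(memory: bytes, address: int, reg: int) -> int:
    """Load Word Left: merge the bytes from `address` to the end of its
    aligned word into the high-order end of `reg`."""
    shift = 8 * (address & 3)
    keep = (1 << shift) - 1  # low-order register bits left unchanged
    return ((aligned_word(memory, address) << shift) & MASK32) | (reg & keep)


def lwr(memory: bytes, address: int, reg: int) -> int:
    """Load Word Right: merge the bytes from the start of the aligned word
    up to `address` into the low-order end of `reg`."""
    shift = 8 * (3 - (address & 3))
    keep = MASK32 & ~(MASK32 >> shift)  # high-order register bits left unchanged
    return (reg & keep) | (aligned_word(memory, address) >> shift)


def load_unaligned_word(memory: bytes, address: int) -> int:
    """Pair LWL at `address` with LWR at `address + 3`."""
    return lwr(memory, address + 3, lwl(memory, address, 0))


memory = bytes(range(8))  # byte n of memory holds the value n
print(hex(load_unaligned_word(memory, 3)))  # bytes 3, 4, 5, 6
```

The same pairing works for an address that happens to be aligned, in which case each instruction of the pair supplies the whole word.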
Figures 1-8 and 1-9 show the access of a misaligned word that has byte
address 3.
Bit #              31..24  23..16  15..8   7..0
Higher address     4       5       6
Lower address                              3
Figure 1-8 Big-Endian Misaligned Word Addressing
Bit #              31..24  23..16  15..8   7..0
Higher address             6       5       4
Lower address      3
Figure 1-9 Little-Endian Misaligned Word Addressing
Coprocessors (CP0-CP2)
The MIPS ISA defines three coprocessors (designated CP0 through CP2):
•Coprocessor 0 (CP0) is incorporated on the CPU chip and
supports the virtual memory system and exception handling.
CP0 is also referred to as the System Control Coprocessor.
•Coprocessor 1 (CP1) is reserved for the on-chip, floating-point
coprocessor, the FPU.
•Coprocessor 2 (CP2) is reserved for future definition by MIPS.
CP0 and CP1 are described in the sections that follow.
System Control Coprocessor, CP0
CP0 translates virtual addresses into physical addresses and manages
exceptions and transitions between kernel, supervisor, and user states.
CP0 also controls the cache subsystem, as well as providing diagnostic
control and error recovery facilities.
The CP0 registers shown in Figure 1-10 and described in Table 1-19
manipulate the memory management and exception handling capabilities
of the CPU.
[Figure 1-10: the 32 CP0 registers (numbers 0 through 31), each marked as
used for exception processing, used for memory management, or reserved.
Register names and numbers are listed in Table 1-19.]
Figure 1-10 R4000 CP0 Registers
Table 1-19 System Control Coprocessor (CP0) Register Definitions
Number  Register   Description
0       Index      Programmable pointer into TLB array
1       Random     Pseudorandom pointer into TLB array (read only)
2       EntryLo0   Low half of TLB entry for even virtual address (VPN)
3       EntryLo1   Low half of TLB entry for odd virtual address (VPN)
4       Context    Pointer to kernel virtual page table entry (PTE) in 32-bit addressing mode
5       PageMask   TLB Page Mask
6       Wired      Number of wired TLB entries
7       -          Reserved
8       BadVAddr   Bad virtual address
9       Count      Timer Count
10      EntryHi    High half of TLB entry
11      Compare    Timer Compare
12      SR         Status register
13      Cause      Cause of last exception
14      EPC        Exception Program Counter
15      PRId       Processor Revision Identifier
16      Config     Configuration register
17      LLAddr     Load Linked Address
18      WatchLo    Memory reference trap address low bits
19      WatchHi    Memory reference trap address high bits
20      XContext   Pointer to kernel virtual PTE table in 64-bit addressing mode
21–25   -          Reserved
26      ECC        Secondary-cache error checking and correcting (ECC) and Primary parity
27      CacheErr   Cache Error and Status register
28      TagLo      Cache Tag register
29      TagHi      Cache Tag register
30      ErrorEPC   Error Exception Program Counter
31      -          Reserved
Floating-Point Unit (FPU), CP1
The MIPS floating-point unit (FPU) is designated CP1; the FPU extends
the CPU instruction set to perform arithmetic operations on floating-point
values. The FPU, with associated system software, fully conforms to the
requirements of ANSI/IEEE Standard 754–1985, IEEE Standard for Binary
Floating-Point Arithmetic.
The FPU features include:
•Full 64-bit Operation. The FPU can contain either 16 or 32
64-bit registers to hold single-precision or double-precision
values. The FPU also includes a 32-bit Status/Control register
that provides access to all IEEE-Standard exception handling
capabilities.
•Load and Store Instruction Set. Like the CPU, the FPU uses a
load- and store-based instruction set. Floating-point operations
are started in a single cycle and their execution overlaps other
fixed-point or floating-point operations.
•Tightly-coupled Coprocessor Interface. The FPU is on the
CPU chip, and appears to the programmer as a simple
extension of the CPU (accessed as CP1). Together, the CPU and
FPU form a tightly-coupled unit with a seamless integration of
floating-point and fixed-point instruction sets. Since each unit
receives and executes instructions in parallel, some floating-point
instructions can execute at the same rate (two
instructions per cycle) as fixed-point instructions.
Memory Management System (MMU)
The R4000 processor has a 36-bit physical addressing range of 64 Gbytes.
However, since it is rare for systems to implement a physical memory
space this large, the CPU provides a logical expansion of memory space by
translating addresses composed in the large virtual address space into
available physical memory addresses. The R4000 processor supports the
following two addressing modes:
•32-bit mode, in which the virtual address space is divided into
2 Gbytes per user process and 2 Gbytes for the kernel.
•64-bit mode, in which the virtual address is expanded to
1 Tbyte (2^40 bytes) of user virtual address space.
A detailed description of these address spaces is given in Chapter 4.
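The sizes quoted above follow directly from the address widths. The short calculation below restates them; the constant names are illustrative, and the 31-bit split is inferred from the 2-Gbyte figures quoted above for 32-bit mode.

```python
GBYTE = 2**30
TBYTE = 2**40

physical_space = 2**36   # 36-bit physical address
user_space_32 = 2**31    # 2 Gbytes per user process in 32-bit mode
kernel_space_32 = 2**31  # 2 Gbytes for the kernel in 32-bit mode
user_space_64 = 2**40    # user virtual address space in 64-bit mode

print(physical_space // GBYTE, "Gbytes of physical address space")
print((user_space_32 + kernel_space_32) // GBYTE, "Gbytes of 32-bit virtual space")
print(user_space_64 // TBYTE, "Tbyte of 64-bit user virtual space")
```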
The Translation Lookaside Buffer (TLB)
Virtual memory mapping is assisted by a translation lookaside buffer,
which caches virtual-to-physical address translations. This fully
associative, on-chip TLB contains 48 entries, each of which maps a pair of
variable-sized pages ranging from 4 Kbytes to 16 Mbytes, in multiples of
four.
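Reading "in multiples of four" as each page size being four times the previous one, the available sizes can be enumerated as below. The variable names are illustrative, and the maximum-coverage figure is simple arithmetic on the numbers quoted above rather than a value stated in this manual.

```python
KBYTE = 1024
MBYTE = 1024 * KBYTE

# 4 Kbytes, then repeatedly multiplied by four up to 16 Mbytes.
page_sizes = [4 * KBYTE * 4**step for step in range(7)]

TLB_ENTRIES = 48
PAGES_PER_ENTRY = 2
max_mapped = TLB_ENTRIES * PAGES_PER_ENTRY * page_sizes[-1]

print([size // KBYTE for size in page_sizes], "Kbytes")
print(max_mapped // MBYTE, "Mbytes mapped when every entry uses the largest page")
```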
Instruction TLB
The R4000 processor has a two-entry instruction TLB (ITLB) which assists
in instruction address translation. The ITLB is completely invisible to
software and exists only to increase performance.
Joint TLB
An address translation value is tagged with the most-significant bits of its
virtual address (the number of these bits depends upon the size of the
page) and a per-process identifier. If there is no matching entry in the TLB,
an exception is taken and software refills the on-chip TLB from a page
table resident in memory; this TLB is referred to as the joint TLB (JTLB)
because it contains both data and instructions jointly. The JTLB entry to
be rewritten is selected at random.
Operating Modes
The R4000 processor has three operating modes:
•User mode
•Supervisor mode
•Kernel mode
The manner in which memory addresses are translated or mapped depends
on the operating mode of the CPU; this is described in Chapter 4.
Cache Memory Hierarchy
To achieve a high performance in uniprocessor and multiprocessor
systems, the R4000 processor supports a two-level cache memory
hierarchy that increases memory access bandwidth and reduces the
latency of load and store instructions. This hierarchy consists of on-chip
instruction and data caches, together with an optional external secondary
cache that varies in size from 128 Kbytes to 4 Mbytes.
The secondary cache is assumed to consist of one bank of industry-standard
static RAM (SRAM) with output enables, arranged as a
quadword (128-bit) data array, with a 25-bit-wide tag array. Check fields
are added to both data and tag arrays to improve data integrity.
The secondary cache can be configured as a joint cache, or split into
separate instruction and data caches. The maximum secondary cache size
is 4 Mbytes; the minimum secondary cache size is 128 Kbytes for a joint
cache, or 256 Kbytes total for split instruction/data caches. The secondary
cache is direct mapped, and is addressed with the lower part of the
physical address.
Primary and secondary caches are described in more detail in Chapter 11.
Primary Caches
The R4000 processor incorporates separate on-chip primary instruction
and data caches to fill the high-performance pipeline. Each cache has its
own 64-bit data path, and each can be accessed in parallel.
The R4000 processor primary caches hold from 8 Kbytes to 32 Kbytes; the
R4400 processor primary caches are fixed at 16 Kbytes.
Cache accesses can occur up to twice each cycle. This provides the integer
and floating-point units with an aggregate bandwidth of 1.6 Gbytes per
second at a MasterClock frequency of 50 MHz.
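The 1.6-Gbyte figure can be reproduced with a short calculation, under two assumptions that this section does not state explicitly: that "cycle" here means a PCycle (PClock runs at twice the MasterClock frequency, as described in Chapter 3), and that a Gbyte is counted as 10^9 bytes.

```python
master_clock_hz = 50_000_000
pclock_hz = 2 * master_clock_hz  # assumption: PClock is twice MasterClock
bytes_per_access = 64 // 8       # each cache has a 64-bit data path
accesses_per_pcycle = 2          # accesses can occur up to twice each cycle

bandwidth_bytes_per_second = pclock_hz * accesses_per_pcycle * bytes_per_access
print(bandwidth_bytes_per_second / 1e9, "Gbytes per second")
```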
Secondary Cache Interface
The R4000SC (secondary cache) and R4000MC (multiprocessor) versions
of the processor allow connection to an optional secondary cache. These
processors provide all of the secondary cache control circuitry, including
error checking and correcting (ECC) protection, on chip.
The Secondary Cache interface includes:
•a 128-bit data bus
•a 25-bit tag bus
•an 18-bit address bus
•SRAM control signals
The 128-bit-wide data bus is designed to minimize cache miss penalties,
and allow the use of standard low-cost SRAM in secondary cache.
2 CPU Instruction Set Summary
This chapter is an overview of the central processing unit (CPU)
instruction set; refer to Appendix A for detailed descriptions of individual
CPU instructions.
An overview of the floating-point unit (FPU) instruction set is in
Chapter 6; refer to Appendix B for detailed descriptions of individual FPU
instructions.
2.1 CPU Instruction Formats
Each CPU instruction consists of a single 32-bit word, aligned on a word
boundary. There are three instruction formats—immediate (I-type), jump
(J-type), and register (R-type)—as shown in Figure 2-1. The use of a small
number of instruction formats simplifies instruction decoding, allowing
the compiler to synthesize more complicated (and less frequently used)
operations and addressing modes from these three formats as needed.
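I-Type (Immediate)   op (31..26) | rs (25..21) | rt (20..16) | immediate (15..0)
J-Type (Jump)        op (31..26) | target (25..0)
R-Type (Register)    op (31..26) | rs (25..21) | rt (20..16) | rd (15..11) | sa (10..6) | funct (5..0)
op          6-bit operation code
rs          5-bit source register specifier
rt          5-bit target (source/destination) register or branch
            condition
immediate   16-bit immediate value, branch displacement or
            address displacement
target      26-bit jump target address
rd          5-bit destination register specifier
sa          5-bit shift amount
funct       6-bit function field
Figure 2-1 CPU Instruction Formats
In the MIPS architecture, coprocessor instructions are
implementation-dependent; see Appendix A for details of individual
Coprocessor 0 instructions.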
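Because every instruction is a single 32-bit word with fields at fixed positions, decoding reduces to shifts and masks. The sketch below extracts every field of Figure 2-1 from one word; the function name `decode` and the dictionary layout are illustrative assumptions, and which fields are meaningful depends on the format selected by the op field.

```python
def decode(instruction: int) -> dict:
    """Split a 32-bit instruction word into the fields of Figure 2-1.

    All fields are extracted unconditionally; a real decoder would first
    examine `op` to decide which of them apply.
    """
    return {
        "op": (instruction >> 26) & 0x3F,    # bits 31..26
        "rs": (instruction >> 21) & 0x1F,    # bits 25..21
        "rt": (instruction >> 16) & 0x1F,    # bits 20..16
        "rd": (instruction >> 11) & 0x1F,    # bits 15..11 (R-type)
        "sa": (instruction >> 6) & 0x1F,     # bits 10..6  (R-type)
        "funct": instruction & 0x3F,         # bits 5..0   (R-type)
        "immediate": instruction & 0xFFFF,   # bits 15..0  (I-type)
        "target": instruction & 0x3FFFFFF,   # bits 25..0  (J-type)
    }


r_type = decode(0x00221821)  # op 0, rs 1, rt 2, rd 3, funct 0x21
i_type = decode(0x8C220004)  # op 0x23, rs 1, rt 2, immediate 4
j_type = decode(0x08000100)  # op 2, target 0x100
print(r_type["rd"], hex(i_type["op"]), hex(j_type["target"]))
```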
Load and Store Instructions
Load and store are immediate (I-type) instructions that move data
between memory and the general registers. The only addressing mode
that load and store instructions directly support is base register plus
16-bit signed immediate offset.
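The single addressing mode can be sketched as follows. The function names and the `address_bits` parameter are illustrative; the mask simply wraps the result to the chosen address width.

```python
def sign_extend_16(value: int) -> int:
    """Interpret the low 16 bits of `value` as a signed quantity."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(base: int, offset16: int, address_bits: int = 32) -> int:
    """Base register plus sign-extended 16-bit immediate offset."""
    return (base + sign_extend_16(offset16)) & ((1 << address_bits) - 1)


print(hex(effective_address(0x1000, 0x0004)))  # positive offset
print(hex(effective_address(0x1000, 0xFFFC)))  # offset of -4
```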
Scheduling a Load Delay Slot
A load instruction that does not allow its result to be used by the
instruction immediately following is called a delayed load instruction. The
instruction slot immediately following this delayed load instruction is
referred to as the load delay slot.
In the R4000 processor, the instruction immediately following a load
instruction can use the contents of the loaded register; however, in such
cases hardware interlocks insert additional real cycles. Consequently,
scheduling load delay slots can be desirable, both for performance and
R-Series processor compatibility. However, the scheduling of load delay
slots is not absolutely required.
Defining Access Types
Access type indicates the size of an R4000 processor data item to be loaded
or stored, set by the load or store instruction opcode. Access types are
defined in Appendix A.
Regardless of access type or byte ordering (endianness), the address given
specifies the low-order byte in the addressed field. For a big-endian
configuration, the low-order byte is the most-significant byte; for a
little-endian configuration, the low-order byte is the least-significant
byte.†
The access type, together with the three low-order bits of the address,
define the bytes accessed within the addressed doubleword (shown in
Table 2-1). Only the combinations shown in Table 2-1 are permissible;
other combinations cause address error exceptions. See Appendix A for
individual descriptions of CPU load and store instructions.
† Data formats are described in Chapter 1.
Table 2-1 Byte Access within a Doubleword
Access Type       Low-Order   Bytes Accessed
Mnemonic          Address     Big endian        Little endian
(Value)           Bits 2 1 0  (63----31----0)   (63----31----0)
Doubleword (7)         0 0 0  0 1 2 3 4 5 6 7   7 6 5 4 3 2 1 0
Septibyte (6)          0 0 0  0 1 2 3 4 5 6       6 5 4 3 2 1 0
                       0 0 1    1 2 3 4 5 6 7   7 6 5 4 3 2 1
Sextibyte (5)          0 0 0  0 1 2 3 4 5           5 4 3 2 1 0
                       0 1 0      2 3 4 5 6 7   7 6 5 4 3 2
Quintibyte (4)         0 0 0  0 1 2 3 4               4 3 2 1 0
                       0 1 1        3 4 5 6 7   7 6 5 4 3
Word (3)               0 0 0  0 1 2 3                   3 2 1 0
                       1 0 0          4 5 6 7   7 6 5 4
Triplebyte (2)         0 0 0  0 1 2                       2 1 0
                       0 0 1    1 2 3                   3 2 1
                       1 0 0          4 5 6       6 5 4
                       1 0 1            5 6 7   7 6 5
Halfword (1)           0 0 0  0 1                           1 0
                       0 1 0      2 3                   3 2
                       1 0 0          4 5           5 4
                       1 1 0              6 7   7 6
Byte (0)               0 0 0  0                               0
                       0 0 1    1                           1
                       0 1 0      2                       2
                       0 1 1        3                   3
                       1 0 0          4               4
                       1 0 1            5           5
                       1 1 0              6       6
                       1 1 1                7   7
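The rule encoded in Table 2-1 can be sketched as a lookup: an access type with value n touches n + 1 consecutive bytes starting at the byte selected by the three low-order address bits, and only the listed starting positions are allowed. The names below are illustrative, and a Python `ValueError` stands in for the address error exception.

```python
# Permitted values of the three low-order address bits, per access type value.
PERMITTED_LOW_BITS = {
    7: (0,),                # Doubleword
    6: (0, 1),              # Septibyte
    5: (0, 2),              # Sextibyte
    4: (0, 3),              # Quintibyte
    3: (0, 4),              # Word
    2: (0, 1, 4, 5),        # Triplebyte
    1: (0, 2, 4, 6),        # Halfword
    0: tuple(range(8)),     # Byte
}


def bytes_accessed(access_type: int, address: int) -> list:
    """Return the byte numbers touched within the addressed doubleword."""
    low = address & 0b111
    if low not in PERMITTED_LOW_BITS[access_type]:
        raise ValueError("address error: combination not listed in Table 2-1")
    return list(range(low, low + access_type + 1))


print(bytes_accessed(3, 0x1004))  # word in the upper half of the doubleword
print(bytes_accessed(2, 0x1005))  # triplebyte starting at byte 5
```

The byte numbers are the same for both byte orderings; the table only displays them in opposite directions because byte 0 sits at opposite ends of the doubleword.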
Computational Instructions
Computational instructions can be either in register (R-type) format, in
which both operands are registers, or in immediate (I-type) format, in
which one operand is a 16-bit immediate.
Computational instructions perform the following operations on register
values:
•arithmetic
•logical
•shift
•multiply
•divide
These operations fit in the following four categories of computational
instructions:
•ALU Immediate instructions
•three-operand register-type instructions
•shift instructions
•multiply and divide instructions
64-bit Operations
When operating in 64-bit mode, 32-bit operands must be sign extended.
The result of operations that use incorrectly sign-extended 32-bit values is
unpredictable.
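Sign extension copies bit 31 of the 32-bit operand into bits 63 through 32 of the 64-bit register. A minimal sketch, with illustrative function names:

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def sign_extend_32(value: int) -> int:
    """Return the 64-bit register image of a 32-bit operand."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        value |= 0xFFFFFFFF00000000  # replicate bit 31 into bits 63..32
    return value


def is_sign_extended_32(register: int) -> bool:
    """True when a 64-bit register holds a correctly sign-extended 32-bit value."""
    register &= MASK64
    return sign_extend_32(register) == register


print(hex(sign_extend_32(0x80000000)))
print(is_sign_extended_32(0x0000000080000000))  # upper bits do not match bit 31
```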
Cycle Timing for Multiply and Divide Instructions
Any multiply instruction in the integer pipeline is transferred to the
multiplier as remaining instructions continue through the pipeline; the
product of the multiply instruction is saved in the HI and LO registers.
If the multiply instruction is followed by an MFHI or MFLO before the
product is available, the pipeline interlocks until this product does become
available.
Table 2-2 gives the execution time for integer multiply and divide
operations. The “Total Cycles” column gives the total number of cycles
required to execute the instruction. The “Overlap” column gives the
number of cycles that overlap other CPU operations; that is, the number of
cycles required between the present instruction and a subsequent MFHI or
MFLO without incurring an interlock. If this value is zero, the operation
is not performed in parallel with any other CPU operation.
For more information about computational instructions, refer to the
individual instruction as described in Appendix A.
Jump and Branch Instructions
Jump and branch instructions change the control flow of a program. All
jump and branch instructions occur with a delay of one instruction: that is,
the instruction immediately following the jump or branch (this is known
as the instruction in the delay slot) always executes while the target
instruction is being fetched from storage.
Overview of Jump Instructions
Subroutine calls in high-level languages are usually implemented with
Jump or Jump and Link instructions, both of which are J-type instructions.
In J-type format, the 26-bit target address shifts left 2 bits and combines
with the high-order 4 bits of the current program counter to form an
absolute address.
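The J-type address formation described above can be sketched for a 32-bit program counter. The function name is illustrative, and the sketch follows the wording of this section in taking the high-order bits from the current program counter.

```python
def jump_target(pc: int, target26: int) -> int:
    """Combine the high-order 4 bits of `pc` with the 26-bit target
    field shifted left by 2 bits."""
    return (pc & 0xF0000000) | ((target26 & 0x3FFFFFF) << 2)


print(hex(jump_target(0x80001000, 0x0000100)))
```

Because only 28 low-order bits come from the instruction, a jump of this kind stays within the current 256-Mbyte-aligned region; this is why large cross-page jumps use the register forms described next.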
Returns, dispatches, and large cross-page jumps are usually implemented
with the Jump Register or Jump and Link Register instructions. Both are
R-type instructions that take the 32-bit or 64-bit byte address contained in
one of the general purpose registers.
For more information about jump instructions, refer to the individual
instruction as described in Appendix A.
†
Overview of Branch Instructions
All branch instruction target addresses are computed by adding the
address of the instruction in the delay slot to the 16-bit offset (shifted left
2 bits and sign-extended to 32 bits). All branches occur with a delay of one
instruction.
If a conditional branch likely is not taken, the instruction in the delay slot
is nullified.
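The branch target computation can be sketched in the same way for 32-bit addresses. The function names are illustrative; the delay-slot address is taken to be the branch address plus 4, since each instruction occupies one word.

```python
def sign_extend_16(value: int) -> int:
    """Interpret the low 16 bits of `value` as a signed quantity."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(branch_address: int, offset16: int) -> int:
    """Delay-slot address plus the sign-extended offset shifted left 2 bits."""
    delay_slot_address = branch_address + 4
    displacement = sign_extend_16(offset16) * 4  # shift left 2 bits
    return (delay_slot_address + displacement) & 0xFFFFFFFF


print(hex(branch_target(0x1000, 0x0003)))  # forward branch
print(hex(branch_target(0x1000, 0xFFFF)))  # offset -1 branches to the branch itself
```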
For more information about branch instructions, refer to the individual
instruction as described in Appendix A.
† Taken branches have a 3 cycle penalty in this implementation. See Chapter 3 for more
information.
Special Instructions
Special instructions allow the software to initiate traps; they are always
R-type. For more information about special instructions, refer to the
individual instruction as described in Appendix A.
Exception Instructions
Exception instructions are extensions to the MIPS ISA. For more
information about exception instructions, refer to the individual
instruction as described in Appendix A.
Coprocessor Instructions
Coprocessor instructions perform operations in their respective
coprocessors. Coprocessor loads and stores are I-type, and coprocessor
computational instructions have coprocessor-dependent formats.
Individual coprocessor instructions are described in Appendices A (for
CP0) and B (for the FPU, CP1).
CP0 instructions perform operations specifically on the System Control
Coprocessor registers to manipulate the memory management and
exception handling facilities of the processor. Appendix A details CP0
instructions.
3 The CPU Pipeline
This chapter describes the basic operation of the CPU pipeline, which
includes descriptions of the delay instructions (instructions that follow a
branch or load instruction in the pipeline), interruptions to the pipeline
flow caused by interlocks and exceptions, and the R4400 implementation of an
uncached store buffer.
The FPU pipeline is described in Chapter 6.
3.1 CPU Pipeline Operation
The CPU has an eight-stage instruction pipeline; each stage takes one
PCycle (one cycle of PClock, which runs at twice the frequency of
MasterClock). Thus, the execution of each instruction takes at least eight
PCycles (four MasterClock cycles). An instruction can take longer—for
example, if the required data is not in the cache, the data must be retrieved
from main memory.
Once the pipeline has been filled, eight instructions are executed
simultaneously. Figure 3-1 shows the eight stages of the instruction
pipeline; the next section describes the pipeline stages.
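The overlap of eight instructions can be modeled with a few lines of code. This is an idealized sketch that assumes one instruction enters the pipeline per PCycle and that no stalls or slips occur; the names `STAGES`, `stage_at`, and `in_flight` are illustrative.

```python
STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]


def stage_at(instruction_index: int, pcycle: int):
    """Stage occupied by an instruction during `pcycle`, or None when it is
    not in the pipeline. Instruction n enters IF during PCycle n."""
    position = pcycle - instruction_index
    return STAGES[position] if 0 <= position < len(STAGES) else None


def in_flight(pcycle: int, issued: int) -> dict:
    """Map each instruction still in the pipeline to its current stage."""
    stages = {i: stage_at(i, pcycle) for i in range(issued)}
    return {i: stage for i, stage in stages.items() if stage is not None}


# Once the pipeline is full, eight instructions are active in every PCycle.
print(in_flight(7, issued=20))
```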
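[Figure 3-1: eight instructions overlapped in the eight-deep pipeline, each
entering one PCycle after its predecessor (two PCycles per MasterClock
cycle); in the current CPU cycle every stage is occupied.]
IF IS RF EX DF DS TC WB
   IF IS RF EX DF DS TC WB
      IF IS RF EX DF DS TC WB
         IF IS RF EX DF DS TC WB
            IF IS RF EX DF DS TC WB
               IF IS RF EX DF DS TC WB
                  IF IS RF EX DF DS TC WB
                     IF IS RF EX DF DS TC WB
Figure 3-1 Instruction Pipeline Stages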
3.2 CPU Pipeline Stages
This section describes each of the eight pipeline stages:
•IF - Instruction Fetch, First Half
•IS - Instruction Fetch, Second Half
•RF - Register Fetch
•EX - Execution
•DF - Data Fetch, First Half
•DS - Data Fetch, Second Half
•TC - Tag Check
•WB - Write Back
IF - Instruction Fetch, First Half
During the IF stage, the following occurs:
•Branch logic selects an instruction address and the instruction
cache fetch begins.
IS - Instruction Fetch, Second Half
During the IS stage, the instruction cache fetch and the virtual-to-physical
address translation are completed.
RF - Register Fetch
During the RF stage, the following occurs:
•The instruction decoder (IDEC) decodes the instruction and
checks for interlock conditions.
•The instruction cache tag is checked against the page frame
number obtained from the ITLB.
•Any required operands are fetched from the register file.
EX - Execution
During the EX stage, one of the following occurs:
•The arithmetic logic unit (ALU) performs the arithmetic or
logical operation for register-to-register instructions.
•The ALU calculates the data virtual address for load and store
instructions.
•The ALU determines whether the branch condition is true and
calculates the virtual branch target address for branch
instructions.
DF - Data Fetch, First Half
During the DF stage, one of the following occurs:
•The data cache fetch and the data virtual-to-physical
translation begins for load and store instructions.
•The branch instruction address translation and translation
lookaside buffer (TLB)† update begins for branch instructions.
•No operations are performed during the DF, DS, and TC stages
for register-to-register instructions.
DS - Data Fetch, Second Half
During the DS stage, one of the following occurs:
•The data cache fetch and data virtual-to-physical translation
are completed for load and store instructions. The Shifter
aligns data to its word or doubleword boundary.
•The branch instruction address translation and TLB update are
completed for branch instructions.
TC - Tag Check
For load and store instructions, the cache performs the tag check during
the TC stage. The physical address from the TLB is checked against the
cache tag to determine if there is a hit or a miss.
† The TLB is described in Chapter 4.
WB - Write Back
For register-to-register instructions, the instruction result is written back
to the register file during the WB stage. Branch instructions perform no
operation during this stage.
Figure 3-2 shows the activities occurring during each ALU pipeline stage,
for load, store, and branch instructions.
3.3 Branch Delay
The CPU pipeline has a branch delay of three cycles and a load delay of
two cycles. The three-cycle branch delay is a result of the branch
comparison logic operating during the EX pipeline stage of the branch,
producing an instruction address that is available in the IF stage, four
instructions later.
Figure 3-3 illustrates the branch delay.
branch    IF IS RF EX DF DS TC WB
             IF IS RF EX DF DS TC WB         (branch delay instruction 1)
                IF IS RF EX DF DS TC WB      (branch delay instruction 2)
                   IF IS RF EX DF DS TC WB   (branch delay instruction 3)
target                IF IS RF EX DF DS TC WB
Figure 3-3 CPU Pipeline Branch Delay
3.4 Load Delay
The completion of a load at the end of the DS pipeline stage produces an
operand that is available for the EX pipeline stage of the third subsequent
instruction.
Figure 3-4 shows the load delay of two pipeline stages.
load      IF IS RF EX DF DS TC WB
             IF IS RF EX DF DS TC WB         (load delay instruction 1)
                IF IS RF EX DF DS TC WB      (load delay instruction 2)
f(load)            IF IS RF EX DF DS TC WB
Figure 3-4 CPU Pipeline Load Delay
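Both delays follow from the positions of the stages involved. In the idealized sketch below (one instruction issued per PCycle, no stalls; all names are illustrative), a result produced at the end of one stage is usable by a later instruction no sooner than the following PCycle, so the number of intervening instructions equals the difference between the two stage positions.

```python
STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]


def delay_instructions(producer_stage: str, consumer_stage: str) -> int:
    """Number of instructions between a producer and the first instruction
    whose `consumer_stage` can use a result from `producer_stage`."""
    return STAGES.index(producer_stage) - STAGES.index(consumer_stage)


# A branch resolves in EX; the target address is then used by an IF stage.
print(delay_instructions("EX", "IF"))  # branch delay
# A load completes in DS; the operand is then used by an EX stage.
print(delay_instructions("DS", "EX"))  # load delay
```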
3.5 Interlock and Exception Handling
Smooth pipeline flow is interrupted when cache misses or exceptions
occur, or when data dependencies are detected. Interruptions handled
using hardware, such as cache misses, are referred to as interlocks, while
those that are handled using software are called exceptions.
As shown in Figure 3-5, all interlock and exception conditions are
collectively referred to as faults.
Faults
    Exceptions (software)
    Interlocks (hardware)
        Stalls
        Slips
Figure 3-5 Interlocks, Exceptions, and Faults
There are two types of interlocks:
•stalls, which are resolved by halting the pipeline
•slips, which require one part of the pipeline to advance while
another part of the pipeline is held static
At each cycle, exception and interlock conditions are checked for all active
instructions.
Because each exception or interlock condition corresponds to a particular
pipeline stage, a condition can be traced back to the particular instruction
in the exception/interlock stage, as shown in Figure 3-6. For instance, an
Illegal Instruction (II) exception is raised in the execution (EX) stage.
Tables 3-1 and 3-2 describe the pipeline interlocks and exceptions listed in
Figure 3-6.
[Figure 3-6: stall, slip, and exception conditions aligned under the pipeline
stage (IF IS RF EX DF DS TC WB) in which each is detected; every stage spans
one PCycle.]
Stalls*:     ITM, ICM, CPBE, DCM, SXT, WA, STI
Slips:       LDI, MultB, DivB, MDOne, ShSlip, FCBsy
Exceptions:  ITLB, Intr, OVF, DTLB, DBE, IBE, FPE, TLBMod, Watch, IVACoh,
             ExTrap, DVACoh, II, DECCErr, BP, NMI, SC, Reset, CUn, IECCErr
*MP stalls can occur at any stage; they are not associated with any instruction or pipe stage
Figure 3-6 Correspondence of Pipeline Stage to Interlock Condition
Table 3-2 Pipeline Interlocks
Interlock  Description
ITM        Instruction TLB Miss
ICM        Instruction Cache Miss
CPBE       Coprocessor Possible Exception
SXT        Integer Sign Extend
STI        Store Interlock
DCM        Data Cache Miss
WA         Watch Address Exception
LDI        Load Interlock
MultB      Multiply Unit Busy
DivB       Divide Unit Busy
MDOne      Mult/Div One Cycle Slip
ShSlip     Var Shift or Shift > 32 bits
FCBsy      FP Busy
Exception Conditions
When an exception condition occurs, the relevant instruction and all those
that follow it in the pipeline are cancelled. Accordingly, any stall
conditions and any later exception conditions that may have referenced
this instruction are inhibited; there is no benefit in servicing stalls for a
cancelled instruction.
After instruction cancellation, a new instruction stream begins, starting
execution at a predefined exception vector. System Control Coprocessor
registers are loaded with information that identifies the type of exception
and auxiliary information such as the virtual address at which translation
exceptions occur.
Stall Conditions
Often, a stall condition is only detected after parts of the pipeline have
advanced using incorrect data; this is called a pipeline overrun. When a stall
condition is detected, all eight instructions (each in a different stage of the
pipeline) are frozen at once. In this stalled state, no pipeline stages can
advance until the interlock condition is resolved.
Once the interlock is removed, the restart sequence begins two cycles
before the pipeline resumes execution. The restart sequence reverses the
pipeline overrun by inserting the correct information into the pipeline.
Slip Conditions
When a slip condition is detected, pipeline stages that must advance to
resolve the dependency continue to be retired (completed), while
dependent stages are held until the required data is available.
External Stalls
External stalls are another class of interlock. An external stall originates
outside the processor and is not referenced to a particular pipeline stage.
This interlock is not affected by exceptions.
Interlock and Exception Timing
To prevent interlock and exception handling from adversely affecting the
processor cycle time, the R4000 processor uses both logic and circuit
pipeline techniques to reduce critical timing paths. Interlock and
exception handling have the following effects on the pipeline:
• In some cases, the processor pipeline must be backed up
(reversed and started over again from a prior stage) to recover
from interlocks.
• In some cases, interlocks are serviced for instructions that will
be aborted due to an exception.
These two cases are discussed below.
Backing Up the Pipeline
An example of pipeline back-up occurs in a data cache miss, in which the
late detection of the miss causes a subsequent instruction to compute an
incorrect result.
When this occurs, not only must the cache miss be serviced but the EX
stage of the dependent instruction must be re-executed before the pipeline
can be restarted. Figure 3-7 illustrates this procedure; a minus (–) after
the pipeline stage descriptor (for instance, EX–) indicates the operation
produced an incorrect result, while a plus (+) indicates the successful
re-execution of that operation.
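[Figure: pipeline timing diagram for a load followed by a dependent ALU instruction. The cycle row shows Run cycles, then Stl (stall) cycles while the data cache miss is serviced, then the restart cycles Rst2 and Rst1, then Run cycles again. The load repeats its DF, DS, and TC stages after the stall; the ALU instruction shows EX- before the stall and RF, EX+ after it.]
Figure 3-7 Pipeline Overrun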
Aborting an Instruction Subsequent to an Interlock
The interaction between an integer overflow and an instruction cache miss
is an example of an interlock being serviced for an instruction that is
subsequently aborted.
In this case, pipelining the overflow exception handling into the DF stage
allows an instruction cache miss to occur on the next immediate
instruction. Figure 3-8 illustrates this; aborted instructions are indicated
with an asterisk (*).
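[Figure: pipeline timing diagram. An ALU instruction detects an overflow (OVF) while the next instruction takes an instruction cache miss (ICM). The cycle row shows Run cycles, then Stl (stall) cycles while the instruction cache miss is serviced, then the restart cycles Rst2 and Rst1, then Run cycles again. The instructions are marked with an asterisk (*) at WB to show that they are aborted.]
Figure 3-8 Instruction Cache Miss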
Even though the line brought in by the instruction cache could have been
replaced by a line of the exception handler, no performance loss occurs,
since the instruction cache miss would have been serviced anyway, after
returning from the exception handler. Handling of the exception is done
in this fashion because the frequency of an exception occurring is, by
definition, relatively low.
Pipelining the Exception Handling
Pipelining of interlock and exception handling is done by pipelining the
logical resolution of possible fault conditions with the buffering and
distributing of the pipeline control signals.
In particular, a half clock period is provided for buffering and distributing
the run control signal; during this time the logic evaluation to produce run
for the next cycle begins. Figure 3-9 shows this process for a sequence of
loads.
[Figure: timing for three successive loads (Load1, Load2, Load3) passing through the DF, DS, TC, and WB stages, drawn against clock phases 1 and 2. Each load has its own tag check (TagCk), Resolve, and Buffer steps.]
Figure 3-9 Pipelining of Interlock and Exception Handling
The decision whether or not to advance the pipeline is derived from these
three rules:
• All possible fault-causing events, such as cache misses,
translation exceptions, load interlocks, etc., must be
individually evaluated.
• The fault to be serviced is selected, based on a predefined
priority as determined by the pipeline stage of the asserted
faults.
• Pipeline advance control signals are buffered and distributed.
Figure 3-10 illustrates this process.
[Figure: four successive Run cycles, each divided into phases 1 and 2, with an Evaluate, Resolve, Buffer sequence shown for each of three advance decisions.]
Figure 3-10 Pipeline Advance Decision
Special Cases
In some instances, the pipeline control state machine is bypassed. This
occurs due to performance considerations or to correctness
considerations, which are described in the following sections.
Performance Considerations
A performance consideration occurs when there is a cache load miss. By
bypassing the pipeline state machine, it is possible to eliminate up to two
cycles of load miss latency. Two techniques, address acceleration and
address prediction, increase performance.
Address Acceleration
Address acceleration bypasses a potential cache miss address. It is relatively
straightforward to perform this bypass since sending the cache miss
address to the secondary cache has no negative impact even if a
subsequent exception nullifies the effect of this cache access. Power is
wasted when the miss is inhibited by some fault, but this is a minor effect.
Address Prediction
Another technique used to reduce miss latency is the automatic increment
and transmission of instruction miss addresses following an instruction
cache miss. This form of latency reduction is called address prediction: the
subsequent instruction miss address is predicted to be a simple increment
of the previous miss address. Figure 3-11 shows a cache miss in which the
cache miss address is changed based on the detection of the miss.
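[Figure: pipeline timing diagram for a load that misses the cache. The cycle row shows Run cycles followed by Stl (stall) cycles; the address row shows the cache index; the restart row shows Rst1, Rst2, and Rst3. The load passes through IF, IS, RF, EX, DF, DS, TC, then repeats DF, DS, TC before WB.]
Figure 3-11 Load Address Bypassing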
Correctness Considerations
An example in which bypassing is necessary to guarantee correctness is a
cache write.
3.6 R4400 Processor Uncached Store Buffer
The R4400 processor contains an uncached store buffer to improve the
performance of uncached stores over that available from an R4000
processor. In the R4000 processor, when an uncached store reaches the
write-back (WB) stage in the CPU pipeline, the CPU must stall until the
store is sent off-chip. In the R4400 processor, a single-entry buffer holds
this uncached WB-stage data on the chip without stalling the pipeline.
If a second uncached store reaches the WB stage in the R4400 processor
before the first uncached store has been moved off-chip, the CPU stalls
until the store buffer completes the first uncached store. To avoid this
stall, the compiler can insert seven instruction cycles between the two
uncached stores, as shown in Figure 3-12. A single instruction that
requires seven cycles to complete could be used in place of the seven No
Operation (NOP) instructions.
Figure 3-12 Pipeline Sequence for Back-to-Back Uncached Stores
If the two uncached stores execute within a loop, the two killed
instructions that are part of the loop branch latency are included in the
count of seven intervening cycles. Figure 3-13 shows the four NOP
instructions that need to be scheduled in this case.
Loop:   SW    R2, (R3)    # uncached store
        NOP
        NOP
        NOP
        B     Loop        # branch to loop
        NOP
        killed            # branch latency
        killed            # branch latency
Figure 3-13 Back-to-Back Uncached Stores in a Loop
The timing requirements of the System interface govern the latency
between uncached stores; back-to-back stores can be sent across the
interface at a maximum rate of one store for every four external cycles. If
the R4400 processor is programmed to run in divide-by-2 mode (for more
information about divided clock, see the description of SClock in Chapter
10), an uncached store can occur every eight pipeline cycles. If a larger
clock divisor is used, more pipeline cycles are required for each store.
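The relationship between the clock divisor and the spacing of uncached stores can be sketched as simple arithmetic. The following Python fragment is only an illustration: the function names are hypothetical, and it assumes that one external cycle lasts exactly `clock_divisor` pipeline cycles, which matches the divide-by-2 figure of eight pipeline cycles given in the text.

```python
# Stated in the text: at most one uncached store per four external cycles.
EXTERNAL_CYCLES_PER_STORE = 4


def pipeline_cycles_per_uncached_store(clock_divisor: int) -> int:
    """Pipeline cycles from one uncached store to the next.

    Assumption (not stated in the manual): an external cycle is exactly
    clock_divisor pipeline cycles long.
    """
    if clock_divisor < 1:
        raise ValueError("clock divisor must be a positive integer")
    return EXTERNAL_CYCLES_PER_STORE * clock_divisor


def filler_cycles_between_stores(clock_divisor: int) -> int:
    """Instruction cycles to place between two uncached stores to avoid a stall."""
    return pipeline_cycles_per_uncached_store(clock_divisor) - 1


if __name__ == "__main__":
    for divisor in (2, 3, 4):
        print(divisor,
              pipeline_cycles_per_uncached_store(divisor),
              filler_cycles_between_stores(divisor))
```

For divide-by-2 mode this gives the seven intervening instruction cycles described for Figures 3-12 and 3-13.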
CAUTION: The R4000 processor always had strongly ordered
execution; however, with the addition of the uncached store buffer in
the R4400 there is a potential for out-of-order execution (described in
the section of the same name in Chapter 11, and in Uncached Loads or
Stores in Chapter 12).
4 Memory Management
The MIPS R4000 processor provides a full-featured memory management
unit (MMU) that uses an on-chip translation lookaside buffer (TLB) to
translate virtual addresses into physical addresses.
This chapter describes the processor virtual and physical address spaces,
the virtual-to-physical address translation, the operation of the TLB in
making these translations, and those System Control Coprocessor (CP0)
registers that provide the software interface to the TLB.
4.1 Translation Lookaside Buffer (TLB)
Mapped virtual addresses are translated into physical addresses using an
on-chip TLB.† The TLB is a fully associative memory that holds 48 entries,
which provide mapping to 48 odd/even page pairs (96 pages). When
address mapping is indicated, each TLB entry is checked simultaneously
for a match with the virtual address that is extended with an ASID stored
in the EntryHi register.
The size of a mapped page ranges from 4 Kbytes to 16 Mbytes, increasing
by factors of 4: that is, 4K, 16K, 64K, 256K, 1M, 4M, and 16M.
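As a quick check of the page-size progression, the seven sizes and their offset widths can be generated as follows. This is an illustrative sketch; the names `PAGE_SIZES` and `offset_bits` are not taken from the manual.

```python
# Seven page sizes, each four times the previous one: 4K, 16K, ..., 16M.
PAGE_SIZES = [4 * 1024 * 4 ** i for i in range(7)]


def offset_bits(page_size: int) -> int:
    """Number of address bits needed for the offset within a page."""
    if page_size not in PAGE_SIZES:
        raise ValueError("unsupported page size")
    return page_size.bit_length() - 1


if __name__ == "__main__":
    for size in PAGE_SIZES:
        print(f"{size:>10} bytes -> {offset_bits(size)} offset bits")
```

The offset width grows by two bits per step, from 12 bits for a 4-Kbyte page to 24 bits for a 16-Mbyte page.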
Hits and Misses
If there is a virtual address match, or hit, in the TLB, the physical page
number is extracted from the TLB and concatenated with the offset to form
the physical address (see Figure 4-1).
If no match occurs (TLB miss), an exception is taken and software refills
the TLB from the page table resident in memory. Software can write over
a selected TLB entry or use a hardware mechanism to write into a random
entry.
Multiple Matches
If more than one entry in the TLB matches the virtual address being
translated, the operation is undefined. To prevent permanent damage to
the part, the TLB may be disabled if more than several entries match. The
TLB-Shutdown (TS) bit in the Status register is set to 1 if the TLB is
disabled.
† There are virtual-to-physical address translations that occur outside of the TLB. For
example, addresses in the kseg0 and kseg1 spaces are unmapped translations. In these
spaces the physical address is derived by subtracting the base address of the space from
the virtual address.
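The unmapped translation described in this footnote can be sketched for 32-bit mode as follows. The base addresses (0x8000 0000 for kseg0, 0xA000 0000 for kseg1) and the 512-Mbyte segment size are taken from the MIPS kernel address map rather than from this section, so treat them as assumptions here; the function name is hypothetical.

```python
KSEG0_BASE = 0x8000_0000   # assumed base of kseg0 (unmapped)
KSEG1_BASE = 0xA000_0000   # assumed base of kseg1 (unmapped)
KSEG_SIZE = 0x2000_0000    # assumed size of each segment: 512 Mbytes


def unmapped_to_physical(va: int) -> int:
    """Translate a 32-bit kseg0/kseg1 virtual address without using the TLB."""
    for base in (KSEG0_BASE, KSEG1_BASE):
        if base <= va < base + KSEG_SIZE:
            # Physical address = virtual address minus the segment base.
            return va - base
    raise ValueError("address is not in kseg0 or kseg1")


if __name__ == "__main__":
    print(hex(unmapped_to_physical(0x8000_1000)))
    print(hex(unmapped_to_physical(0xA000_1000)))
```

Both segments map onto the same low physical addresses, so the two calls in the example produce the same result.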
4.2 Address Spaces
This section describes the virtual and physical address spaces and the
manner in which virtual addresses are converted or “translated” into
physical addresses in the TLB.
Virtual Address Space
The processor virtual address can be either 32 or 64 bits wide,† depending
on whether the processor is operating in 32-bit or 64-bit mode.
• In 32-bit mode, addresses are 32 bits wide. The maximum user
process size is 2 gigabytes (2^31 bytes).
• In 64-bit mode, addresses are 64 bits wide. The maximum user
process size is 1 terabyte (2^40 bytes).
Figure 4-1 shows the translation of a virtual address into a physical
address.
[Figure: a virtual address (ASID, VPN, Offset) is compared with a TLB entry (G, ASID, VPN, PFN); the resulting physical address consists of the PFN and the Offset.]
1. The virtual address (VA), represented by the virtual page number (VPN), is compared with the tag in the TLB.
2. If there is a match, the page frame number (PFN), representing the upper bits of the physical address (PA), is output from the TLB.
3. The Offset, which does not pass through the TLB, is then concatenated to the PFN.
Figure 4-1 Overview of a Virtual-to-Physical Address Translation
† Figure 4-8 shows the 32-bit and 64-bit versions of the processor TLB entry.
As shown in Figures 4-2 and 4-3, the virtual address is extended with an
8-bit address space identifier (ASID), which reduces the frequency of TLB
flushing when switching contexts. This 8-bit ASID is in the CP0 EntryHi
register, described later in this chapter. The Global bit (G) is in the EntryLo0
and EntryLo1 registers, described later in this chapter.
Physical Address Space
Using a 36-bit address, the processor physical address space encompasses
64 gigabytes. The section following describes the translation of a virtual
address to a physical address.
Virtual-to-Physical Address Translation
Converting a virtual address to a physical address begins by comparing
the virtual address from the processor with the virtual addresses in the
TLB; there is a match when the virtual page number (VPN) of the address
is the same as the VPN field of the entry, and either:
• the Global (G) bit of the TLB entry is set, or
• the ASID field of the virtual address is the same as the ASID
field of the TLB entry.
This match is referred to as a TLB hit. If there is no match, a TLB Miss
exception is taken by the processor and software is allowed to refill the
TLB from a page table of virtual/physical addresses in memory.
If there is a virtual address match in the TLB, the physical address is
output from the TLB and concatenated with the Offset, which represents
an address within the page frame space. The Offset does not pass through
the TLB.
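The match rule and the concatenation step can be sketched in a few lines of Python. This is a simplified model, not the hardware: it assumes a fixed 4-Kbyte page size and one page per entry, whereas the real TLB maps odd/even page pairs and supports several page sizes. The names `TLBEntry`, `TLBMiss`, and `translate` are hypothetical.

```python
from dataclasses import dataclass
from typing import List

PAGE_BITS = 12  # assumption: 4-Kbyte pages only


class TLBMiss(Exception):
    """Raised when no entry matches; software would refill the TLB."""


@dataclass
class TLBEntry:
    vpn: int          # virtual page number (tag)
    asid: int         # address space identifier
    global_bit: bool  # G bit: match regardless of ASID
    pfn: int          # page frame number


def translate(entries: List[TLBEntry], va: int, current_asid: int) -> int:
    """Return the physical address for va, or raise TLBMiss."""
    vpn = va >> PAGE_BITS
    offset = va & ((1 << PAGE_BITS) - 1)  # the Offset bypasses the TLB
    for entry in entries:
        if entry.vpn == vpn and (entry.global_bit or entry.asid == current_asid):
            # TLB hit: concatenate the PFN with the untranslated Offset.
            return (entry.pfn << PAGE_BITS) | offset
    raise TLBMiss(hex(va))


if __name__ == "__main__":
    tlb = [TLBEntry(vpn=0x12345, asid=7, global_bit=False, pfn=0xABCDE)]
    print(hex(translate(tlb, 0x1234_5678, current_asid=7)))
```

The loop stands in for the simultaneous comparison performed by the fully associative hardware; a software model has no need to reproduce that parallelism.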
Virtual-to-physical translation is described in greater detail throughout
the remainder of this chapter; Figure 4-20, at the end of this chapter, is a
flow diagram of the process.
The next two sections describe the 32-bit and 64-bit address translations.
32-bit Mode Address Translation
Figure 4-2 shows the virtual-to-physical-address translation of a 32-bit
mode address.
• The top portion of Figure 4-2 shows a virtual address with a
12-bit, or 4-Kbyte, page size, labelled Offset. The remaining 20
bits of the address represent the VPN, and index the 1M-entry
page table.
• The bottom portion of Figure 4-2 shows a virtual address with
a 24-bit, or 16-Mbyte, page size, labelled Offset. The remaining
8 bits of the address represent the VPN, and index the 256-entry
page table.
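[Figure 4-2: virtual address with 1M (2^20) 4-Kbyte pages. The 8-bit ASID occupies bits 39:32, the VPN is 20 bits (20 bits = 1M pages), and the Offset is 12 bits (bits 11:0). Bits 31, 30 and 29 of the virtual address select user, supervisor, or kernel address spaces.]
64-bit Mode Address Translation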
Figure 4-3 shows the virtual-to-physical-address translation of a 64-bit
mode address. This figure illustrates the two extremes in the range of
possible page sizes: a 4-Kbyte page (12 bits) and a 16-Mbyte page (24 bits).
• The top portion of Figure 4-3 shows a virtual address with a
12-bit, or 4-Kbyte, page size, labelled Offset. The remaining 28
bits of the address represent the VPN, and index the 256M-entry
page table.
• The bottom portion of Figure 4-3 shows a virtual address with
a 24-bit, or 16-Mbyte, page size, labelled Offset. The remaining
16 bits of the address represent the VPN, and index the 64K-entry
page table.
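The VPN and page-table sizes quoted for Figures 4-2 and 4-3 follow directly from the address width and the offset width. A minimal sketch, assuming a 32-bit translated address in 32-bit mode and a 40-bit translated address in 64-bit mode (the function names are hypothetical):

```python
def split_virtual_address(va: int, offset_bits: int, va_bits: int):
    """Split the low va_bits of va into (VPN, Offset) for a given page size."""
    if offset_bits not in range(12, 25, 2):
        raise ValueError("offset must be 12, 14, ..., 24 bits")
    va &= (1 << va_bits) - 1
    return va >> offset_bits, va & ((1 << offset_bits) - 1)


def page_table_entries(offset_bits: int, va_bits: int) -> int:
    """Number of entries in a page table indexed by the VPN."""
    return 1 << (va_bits - offset_bits)


if __name__ == "__main__":
    print(page_table_entries(12, 32))   # 32-bit mode, 4-Kbyte pages
    print(page_table_entries(24, 32))   # 32-bit mode, 16-Mbyte pages
    print(page_table_entries(12, 40))   # 64-bit mode, 4-Kbyte pages
    print(page_table_entries(24, 40))   # 64-bit mode, 16-Mbyte pages
```

The four calls correspond to the 1M-entry, 256-entry, 256M-entry, and 64K-entry page tables mentioned in the text.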
[Figure 4-3: virtual address with 256M (2^28) 4-Kbyte pages. The 8-bit ASID occupies bits 71:64, the VPN is 28 bits (bits 39:12; 28 bits = 256M pages), and the Offset is 12 bits (bits 11:0). Bits 62 and 63 of the virtual address select user, supervisor, or kernel address spaces.]
The processor has three operating modes that function in both 32- and
64-bit operations:
• User mode
• Supervisor mode
• Kernel mode
These modes are described in the next three sections.
User Mode Operations
In User mode, a single, uniform virtual address space, labelled User
segment, is available; its size is:
• 2 Gbytes (2^31 bytes) in 32-bit mode (useg)
• 1 Tbyte (2^40 bytes) in 64-bit mode (xuseg)
Figure 4-4 shows User mode virtual address space.
32-bit*
0x8000 0000 through 0xFFFF FFFF                        Address Error
0x0000 0000 through 0x7FFF FFFF                        2 GB Mapped (useg)
64-bit
0x0000 0100 0000 0000 through 0xFFFF FFFF FFFF FFFF    Address Error
0x0000 0000 0000 0000 through 0x0000 00FF FFFF FFFF    1 TB Mapped (xuseg)
Figure 4-4 User Mode Virtual Address Space
*NOTE: The R4000 uses 64-bit addresses internally. When the kernel
is running in Kernel mode, it initializes registers before switching
modes, and saves (or restores, whichever is appropriate) register
values on context switches. In 32-bit mode, a valid address must be a
32-bit signed number, where bits 63:32 = bit 31. In normal operation
it is not possible for a 32-bit User-mode program to produce invalid
addresses. However, although it would be an error, it is possible for a
Kernel-mode program to erroneously place a value that is not a 32-bit
signed number into a 64-bit register, in which case the User-mode
program generates an invalid address.
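The validity rule in this note (bits 63:32 must all equal bit 31) is a sign-extension check, which can be sketched as follows. The function name is hypothetical, and the register is modelled as an unsigned 64-bit integer.

```python
MASK64 = (1 << 64) - 1


def is_valid_32bit_mode_address(reg64: int) -> bool:
    """True if the 64-bit register value is a sign-extended 32-bit number."""
    reg64 &= MASK64
    bit31 = (reg64 >> 31) & 1
    upper = reg64 >> 32  # bits 63:32
    # Every upper bit must be a copy of bit 31.
    return upper == (0xFFFF_FFFF if bit31 else 0)


if __name__ == "__main__":
    print(is_valid_32bit_mode_address(0x0000_0000_7FFF_FFFF))
    print(is_valid_32bit_mode_address(0x0000_0000_8000_0000))
```

A value such as 0x0000 0000 8000 0000 fails the check because bit 31 is set while bits 63:32 are zero.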
The User segment starts at address 0 and the current active user process
resides in either useg (in 32-bit mode) or xuseg (in 64-bit mode). The TLB
identically maps all references to useg/xuseg from all modes, and controls
cache accessibility.†
The processor operates in User mode when the Status register contains the
following bit-values:
• KSU bits = 10 (binary)
• EXL = 0
• ERL = 0
In conjunction with these bits, the UX bit in the Status register selects
between 32- or 64-bit User mode addressing as follows:
• when UX = 0, 32-bit useg space is selected and TLB misses are
handled by the 32-bit TLB refill exception handler
• when UX = 1, 64-bit xuseg space is selected and TLB misses are
handled by the 64-bit XTLB refill exception handler
Table 4-1 lists the characteristics of the two user mode segments, useg and
xuseg.
Table 4-1 32-bit and 64-bit User Mode Segments
Address Bit Values     KSU  EXL  ERL  UX   Segment Name  Address Range                                        Segment Size
32-bit, A(31) = 0      10   0    0    0    useg          0x0000 0000 through 0x7FFF FFFF                      2 Gbyte (2^31 bytes)
64-bit, A(63:40) = 0   10   0    0    1    xuseg         0x0000 0000 0000 0000 through 0x0000 00FF FFFF FFFF  1 Tbyte (2^40 bytes)
(KSU, EXL, ERL, and UX are Status register bit values; KSU is given in binary.)
† The cached (C) field in a TLB entry determines whether the reference is cached; see Figure
4-8.
32-bit User Mode (useg)
In User mode, when UX = 0 in the Status register, User mode addressing
is compatible with the 32-bit addressing model shown in Figure 4-4, and a
2-Gbyte user address space is available, labelled useg.
All valid User mode virtual addresses have their most-significant bit
cleared to 0; any attempt to reference an address with the most-significant
bit set while in User mode causes an Address Error exception.
The system maps all references to useg through the TLB, and bit settings
within the TLB entry for the page determine the cacheability of a reference.
64-bit User Mode (xuseg)
In User mode, when UX = 1 in the Status register, User mode addressing is
extended to the 64-bit model shown in Figure 4-4. In 64-bit User mode, the
processor provides a single, uniform address space of 2^40 bytes, labelled
xuseg.
All valid User mode virtual addresses have bits 63:40 equal to 0; an
attempt to reference an address with bits 63:40 not equal to 0 causes an
Address Error exception.
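The two User mode address checks can be summarized in a short sketch. The function name is hypothetical; the 32-bit case treats the address as an unsigned 32-bit value and the 64-bit case as an unsigned 64-bit value.

```python
def user_address_is_valid(va: int, ux: int) -> bool:
    """True if va is a legal User mode address for the given UX setting.

    ux = 0: useg, bit 31 must be 0.
    ux = 1: xuseg, bits 63:40 must all be 0.
    A False result corresponds to an Address Error exception.
    """
    if ux:
        return (va >> 40) == 0
    return ((va >> 31) & 1) == 0


if __name__ == "__main__":
    print(user_address_is_valid(0x7FFF_FFFF, ux=0))
    print(user_address_is_valid(0x8000_0000, ux=0))
    print(user_address_is_valid(0x0000_00FF_FFFF_FFFF, ux=1))
    print(user_address_is_valid(0x0000_0100_0000_0000, ux=1))
```

The boundary values used in the example are the segment limits listed in Table 4-1.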
Supervisor Mode Operations
Supervisor mode is designed for layered operating systems in which a
true kernel runs in R4000 Kernel mode, and the rest of the operating
system runs in Supervisor mode.
The processor operates in Supervisor mode when the Status register
contains the following bit-values:
• KSU = 01 (binary)
• EXL = 0
• ERL = 0
In conjunction with these bits, the SX bit in the Status register selects
between 32- or 64-bit Supervisor mode addressing:
• when SX = 0, 32-bit supervisor space is selected and TLB
misses are handled by the 32-bit TLB refill exception handler
• when SX = 1, 64-bit supervisor space is selected and TLB
misses are handled by the 64-bit XTLB refill exception handler
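The mode selection described for User and Supervisor modes can be sketched as a decode of the Status register. The bit positions used here (EXL = bit 1, ERL = bit 2, KSU = bits 4:3, UX = bit 5, SX = bit 6, KX = bit 7) and the Kernel mode rule (KSU = 00, or EXL or ERL set) come from the Status register definition elsewhere in the manual, not from this section, so treat them as assumptions; the function names are hypothetical.

```python
EXL = 1 << 1       # assumed bit position
ERL = 1 << 2       # assumed bit position
KSU_SHIFT = 3      # assumed: KSU occupies bits 4:3
UX = 1 << 5        # assumed bit position
SX = 1 << 6        # assumed bit position
KX = 1 << 7        # assumed bit position


def operating_mode(status: int) -> str:
    """Return 'user', 'supervisor', or 'kernel' for a Status register value."""
    if status & (EXL | ERL):
        return "kernel"  # assumption: exception/error level forces Kernel mode
    ksu = (status >> KSU_SHIFT) & 0b11
    if ksu == 0b10:
        return "user"
    if ksu == 0b01:
        return "supervisor"
    if ksu == 0b00:
        return "kernel"
    raise ValueError("KSU = 11 is not a defined mode in this sketch")


def uses_64bit_addressing(status: int) -> bool:
    """True if the current mode's UX/SX/KX bit selects 64-bit addressing."""
    select = {"user": UX, "supervisor": SX, "kernel": KX}[operating_mode(status)]
    return bool(status & select)


if __name__ == "__main__":
    print(operating_mode(0b10 << KSU_SHIFT))
    print(operating_mode(0b01 << KSU_SHIFT))
```

Only the bit that belongs to the current mode is consulted, so setting UX has no effect while the processor is in Supervisor mode.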
Figure 4-5 shows Supervisor mode address mapping. Table 4-2 lists the
characteristics of the supervisor mode segments; descriptions of the
address spaces follow.
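32-bit*
0xE000 0000 through 0xFFFF FFFF                        Address error
0xC000 0000 through 0xDFFF FFFF                        0.5 GB Mapped (sseg)
0xA000 0000 through 0xBFFF FFFF                        Address error
0x8000 0000 through 0x9FFF FFFF                        Address error
0x0000 0000 through 0x7FFF FFFF                        2 GB Mapped (suseg)
64-bit
0xFFFF FFFF E000 0000 through 0xFFFF FFFF FFFF FFFF    Address error
0xFFFF FFFF C000 0000 through 0xFFFF FFFF DFFF FFFF    0.5 GB Mapped (csseg)
0x4000 0100 0000 0000 through 0xFFFF FFFF BFFF FFFF    Address error
0x4000 0000 0000 0000 through 0x4000 00FF FFFF FFFF    1 TB Mapped (xsseg)
0x0000 0100 0000 0000 through 0x3FFF FFFF FFFF FFFF    Address error
0x0000 0000 0000 0000 through 0x0000 00FF FFFF FFFF    1 TB Mapped (xsuseg)
Figure 4-5 Supervisor Mode Address Space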
*NOTE: The R4000 uses 64-bit addresses internally. In 32-bit mode,
a valid address must be a 32-bit signed number, where bits 63:32 = bit
31. In normal operation it is not possible for a 32-bit Supervisor-mode
program to create an invalid address through arithmetic operations.
However, 32-bit-mode Supervisor programs must not create addresses
using base register+offset calculations that produce a 32-bit
2's-complement overflow; specifically, there are two prohibited cases:
• offset with bit 15 = 0 and base register with bit 31 = 0, but (base register+offset) bit 31 = 1
• offset with bit 15 = 1 and base register with bit 31 = 1, but (base register+offset) bit 31 = 0
Using this invalid address produces an undefined result.
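The two prohibited cases amount to a signed 32-bit overflow test on the base register plus the sign-extended 16-bit offset. A minimal sketch, with hypothetical function names, that flags exactly those two cases:

```python
def sign_extend_16(offset16: int) -> int:
    """Interpret the low 16 bits as a signed 2's-complement value."""
    offset16 &= 0xFFFF
    return offset16 - 0x1_0000 if offset16 & 0x8000 else offset16


def base_plus_offset_overflows(base: int, offset16: int) -> bool:
    """True if base + offset hits one of the two prohibited cases."""
    base &= 0xFFFF_FFFF
    total = (base + sign_extend_16(offset16)) & 0xFFFF_FFFF
    offset_bit15 = (offset16 >> 15) & 1
    base_bit31 = base >> 31
    sum_bit31 = total >> 31
    # Case 1: both operands non-negative, result negative.
    # Case 2: both operands negative, result non-negative.
    return offset_bit15 == base_bit31 and sum_bit31 != base_bit31


if __name__ == "__main__":
    print(base_plus_offset_overflows(0x7FFF_FFFF, 0x0001))
    print(base_plus_offset_overflows(0x8000_0000, 0xFFFF))
    print(base_plus_offset_overflows(0x0000_1000, 0x0010))
```

When the offset and base have different signs, the sum cannot overflow, which is why only the two same-sign cases are listed.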