MIPS Technologies R4000 User Manual

MIPS R4000 Microprocessor
User’s Manual
Second Edition
Joe Heinrich
© 1994 MIPS Technologies, Inc. All Rights Reserved.
RESTRICTED RIGHTS LEGEND Use, duplication, or disclosure of the technical data contained in this
RISCompiler, RISC/os, R2000, R6000, R4000, and R4400 are trademarks of MIPS Technologies, Inc. MIPS and R3000 are registered trademarks of MIPS Technologies, Inc.
IBM 370 is a registered trademark of International Business Machines. VAX is a registered trademark of Digital Equipment Corporation. iAPX is a registered trademark of Intel Corporation. MC68000 is a registered trademark of Motorola Inc. UNIX is a registered trademark in the United States and other countries,
licensed exclusively through X/Open Company, Ltd.
MIPS Technologies, Inc. 2011 North Shoreline Mountain View, California 94039-7311
Acknowledgments for the First Edition
First of all, special thanks go to Duk Chun for his patient help in supplying and verifying the content of this manual; that this manual is technically correct is, in a very large part, directly attributable to him.
Thanks also to the following people for supplying portions of this book: Shabbir Latif, for, among other things, the exception handler flow charts, the description of the output buffer edge-control logic, and the interrupts; once again, Duk Chun, for his paper on R4000 processor synchronization support; Paul Ries, for confirming the accuracy of sections describing the memory management and the caches; John Mashey, for verifying the R4000 processor actually does employ the 64-bit architecture; Dave Ditzel, for raising the issue in the first place; and Mike Gupta, for substantiating various aspects of the errata.
Finally, thanks to Ed Reidenbach for supplying a large portion of the parity and ECC sections of this manual, and Michael Ngo for checking their accuracy.
Thanks also to the following folks for their technical assistance: Andy Keane, Keith Garrett, Viggy Mokkarala, Charles Price, Ali Moayedian, George Hsieh, Peter Fu, Stephen Przybylski, Michael Woodacre, and Earl Killian.
Also to be thanked are the people at fvn@world.std.com: Bill Tuthill, Barry Shein, Bob Devine, and Alan Marr, for helping place RISC in a pecuniary perspective. Also, thanks to the following people at the mystery_train@swim2birds news group: toma, dan_sears, jharris@garnet, tut@cairo (again), and elvis@dalkey(mateo_b). Their night-for-day netversations, fueled by caffeine, concerning the viability of the cyberpsykinetic compute-core model helped form an important basis of this book.
On the editorial front, thanks once again to Ms. Robin Cowan of the Consortium of Editorial Arts for her labors in editing this manual. Thanks to Evelyn Spire for slaving over that bottomless black well we refer to as an “Index.” Thanks also, once again, to Karen Gettman and Lisa Iarkowski at Prentice-Hall for their help.
On the artistic side, thanks to Jeanne Simonian, of the Creative department here at Silicon Graphics, for the book cover design; and thanks to Pam Flanders for providing MarCom tactical support.
Have we missed anyone? If so, here is where we apologize for doing so.
Joe Heinrich
April 1, 1993
Mt. View, California
MIPS R4000 Microprocessor User's Manual iii
Acknowledgments for the Second Edition
Thanks go to Shabbir Latif, from whose errata the major part of this second edition is derived. Thanks also to Charlie Price for, among other things, making available his revision of the ISA.
On the production side, thanks to Kay Maitz, Beth Fraker, Molly Castor, Lynnea Humphries, and Claudia Lohnes for their assistance at the center of the hurricane.
Joe Heinrich
joeh@sgi.com
April 1, 1994
Mt. View, California
Preface
This book describes the MIPS R4000 and R4400 family of RISC microprocessors (also referred to in this book as processor).
Overview of the Contents
Chapter 1 is a discussion (including the historical context) of RISC development in general, and the R4000 microprocessor in particular.
Chapter 2 is an overview of the CPU instruction set.
Chapter 3 describes the operation of the R4000 instruction execution pipeline, including the basic operation of the pipeline and interruptions that are caused by interlocks and exceptions.
Chapter 4 describes the memory management system, including address mapping and address spaces, virtual memory, the translation lookaside buffer (TLB), and the System Control Coprocessor (CP0).
Chapter 5 describes the exception processing resources of the R4000 processor. It includes an overview of the CPU exception handling process and describes the format and use of each CPU exception handling register.
Chapter 6 describes the Floating-Point Unit (FPU), a coprocessor for the CPU that extends the CPU instruction set to perform floating-point arithmetic operations. This chapter lists the FPU registers and instructions.
Chapter 7 describes the FPU exception processing.
Chapter 8 describes the signals that pass between the R4000 processor and other components in a system. The signals discussed include the System interface, the Clock/Control interface, the Secondary Cache interface, the Interrupt interface, the Initialization interface, and the JTAG interface.
Chapter 9 describes in more detail the Initialization interface, which includes the boot modes for the processor, as well as system resets.
Chapter 10 describes the clocks used in the R4000 processor, as well as the processor status reporting mechanism.
Chapter 11 discusses cache memory, including the operation of the primary and secondary caches, and cache coherency in a multiprocessor system.
Chapter 12 describes the System interface, which allows the processor access to external resources such as memory and input/output (I/O). It also allows an external agent access to the internal resources of the processor, such as the secondary cache.
Chapter 13 describes the Secondary Cache interface, including read and write cycle timing. This chapter also discusses the interface buses and signals.
Chapter 14 describes the Joint Test Action Group (JTAG) interface. The JTAG boundary scan mechanism tests the interconnections between the R4000 processor, the printed circuit board to which it is mounted, and other components on the board.
Chapter 15 describes the single nonmaskable processor interrupt, along with the six hardware and two software processor interrupts.
Chapter 16 describes the error checking and correcting (ECC) mechanisms of the R4000 processor.
Appendix A describes the R4000 CPU instructions, in both 32- and 64-bit modes. The instruction list is given in alphabetical order.
Appendix B describes the R4000 FPU instructions, listed alphabetically.
Appendix C describes sub-block ordering, a nonsequential method of retrieving data.
Appendix D describes the output buffer and the Δi/Δt control mechanism.
Appendix E describes the passive components that make up the phase-locked loop (PLL).
Appendix F describes Coprocessor 0 hazards.
Appendix G describes the R4000 pinout.
A Note on Style
A brief note on some of the stylistic conventions used in this book: bits, fields, and registers of interest from a software perspective are italicized (such as Config register); signal names of more importance from a hardware point of view are rendered in bold (such as Reset*).
A range of bits uses a colon as a separator; for instance, (15:0) represents the 16-bit range that runs from bit 0, inclusive, through bit 15. (In some places an ellipsis may be used in place of the colon for visibility: (15...0).)
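In software terms, the (hi:lo) range notation corresponds to a shift-and-mask operation. The following Python sketch illustrates the idea; the helper name bit_range is hypothetical and is not part of this manual or of any MIPS toolchain.

```python
def bit_range(value: int, hi: int, lo: int) -> int:
    """Return the field value(hi:lo), shifted so that bit `lo` lands at bit 0.

    Hypothetical helper for illustration only. Both bounds are inclusive,
    matching the (15:0) notation, so the field is hi - lo + 1 bits wide.
    """
    if hi < lo or lo < 0:
        raise ValueError("expected hi >= lo >= 0")
    width = hi - lo + 1
    mask = (1 << width) - 1          # `width` low-order one bits
    return (value >> lo) & mask


word = 0x12345678
low_half = bit_range(word, 15, 0)    # the 16-bit range (15:0)
high_half = bit_range(word, 31, 16)  # the 16-bit range (31:16)
print(hex(low_half), hex(high_half))  # prints: 0x5678 0x1234
```

Because both ends of the range are inclusive, (15:0) covers 16 bits rather than 15, which is why the width is computed as hi - lo + 1.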
Preface to the Second Edition
Changes From the First Edition
The second edition of this book incorporates certain low-level changes and technical additions, but retains a substantive identity with the original version.
Changes from the first edition are indicated by left-margin vertical rules.
Getting MIPS Documents On-Line
MIPS documents (including an electronic version of the errata) are available on-line through the File Transfer Protocol (FTP). To retrieve them, follow the steps below. The text you are to type is shown in Courier Bold font; the computer’s responses are shown in Courier Regular font.
1. First, place yourself in the directory on your system within which you want to store the retrieved files. Do this by typing:
cd <directory_you_want_file_to_be_in>
2. Access the MIPS document server, sgigate, through FTP by typing:
ftp sgigate.sgi.com
3. The server tells you when you are connected for FTP by responding:
Connected to sgigate.sgi.com.
4. Next (after some announcements) the server asks you to log in by requesting a name and then a password.
Name (sgigate.sgi.com:<login_name>):
5. Log in by typing anonymous for your name and your electronic mail address for your password.
Name (sgigate.sgi.com:<login_name>): anonymous
331 Guest login ok, type your name as password.
Password: your_email_address
6. The system indicates you have successfully logged in by supplying an FTP prompt:
ftp>
7. Go to the pub/doc directory by typing:
ftp> cd pub/doc
8. You can take a look at the contents of the doc directory by listing them:
ftp> ls
9. You will find several R4000-related subdirectories, such as R4200, R4400, and R4600. When you find the subdirectory you want, cd into that subdirectory and retrieve the file you want by typing:
get <filename>
This copies the file from sgigate back to your system.
10. When you have retrieved the files you want, exit from ftp by typing:
ftp> quit
11. If the file was encoded for transmission, you must decode it, after retrieval, by typing:
uudecode <filename>
12. If the file was compressed for transmission, you must uncompress it, after retrieval, by typing:
uncompress <filename>
13. If you tarred the file, type:
tar xvof <filename>
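For readers who would rather script the retrieval than type it, steps 2 through 10 can be sketched with Python's standard ftplib module. This is only an illustration under stated assumptions: the function name fetch_document and its parameters are hypothetical, the ftp_factory argument exists solely so the function can be tried without a network connection, and nothing here implies that the server named in the steps above is still reachable.

```python
import ftplib


def fetch_document(host, subdirectory, filename, email,
                   local_path=None, ftp_factory=ftplib.FTP):
    """Retrieve one file by anonymous FTP, mirroring steps 2 through 10.

    Hypothetical helper for illustration. `ftp_factory` defaults to the
    standard ftplib.FTP class and can be replaced by a stand-in object.
    Returns the listing of pub/doc (step 8).
    """
    if local_path is None:
        local_path = filename
    ftp = ftp_factory(host)                          # step 2: ftp <host>
    try:
        ftp.login(user="anonymous", passwd=email)    # step 5: anonymous login
        ftp.cwd("pub/doc")                           # step 7: cd pub/doc
        listing = ftp.nlst()                         # step 8: ls
        ftp.cwd(subdirectory)                        # step 9: cd <subdirectory>
        with open(local_path, "wb") as out:          # step 9: get <filename>
            ftp.retrbinary("RETR " + filename, out.write)
    finally:
        ftp.quit()                                   # step 10: quit
    return listing


# Example call (not executed here; host and names are taken from the steps above):
# fetch_document("sgigate.sgi.com", "R4400", "<filename>", "your_email_address")
```

The sketch stops at step 10. Decoding, uncompressing, and untarring (steps 11 through 13) remain separate commands run on the retrieved file.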
Table of Contents
Preface
Overview of the Contents
A Note on Style
Preface to the Second Edition
Changes From the First Edition
Getting MIPS Documents On-Line
1 Introduction
Benefits of RISC Design
Shorter Design Cycle
Effective Utilization of Chip Area
User (Programmer) Benefits
Advanced Semiconductor Technologies
Optimizing Compilers
MIPS RISCompiler Language Suite
Compatibility
Processor General Features
R4000 Processor Configurations
R4400 Processor Enhancements
R4000 Processor
64-bit Architecture
Superpipeline Architecture
System Interface
CPU Register Overview
CPU Instruction Set Overview
Data Formats and Addressing
Coprocessors (CP0-CP2)
System Control Coprocessor, CP0
Floating-Point Unit (FPU), CP1
Memory Management System (MMU)
The Translation Lookaside Buffer (TLB)
Operating Modes
Cache Memory Hierarchy
Primary Caches
Secondary Cache Interface
2 CPU Instruction Set Summary
CPU Instruction Formats
Load and Store Instructions
Scheduling a Load Delay Slot
Defining Access Types
Computational Instructions
64-bit Operations
Cycle Timing for Multiply and Divide Instructions
Jump and Branch Instructions
Overview of Jump Instructions
Overview of Branch Instructions
Special Instructions
Exception Instructions
Coprocessor Instructions
3 The CPU Pipeline
CPU Pipeline Operation
CPU Pipeline Stages
Branch Delay
Load Delay
Interlock and Exception Handling
Exception Conditions
Stall Conditions
Slip Conditions
External Stalls
Interlock and Exception Timing
Backing Up the Pipeline
Aborting an Instruction Subsequent to an Interlock
Pipelining the Exception Handling
Special Cases
Performance Considerations
Correctness Considerations
R4400 Processor Uncached Store Buffer
4 Memory Management
Translation Lookaside Buffer (TLB)
Hits and Misses
Multiple Matches
Address Spaces
Virtual Address Space
Physical Address Space
Virtual-to-Physical Address Translation
32-bit Mode Address Translation
64-bit Mode Address Translation
Operating Modes
User Mode Operations
Supervisor Mode Operations
Kernel Mode Operations
System Control Coprocessor
Format of a TLB Entry
CP0 Registers
Index Register (0)
Random Register (1)
EntryLo0 (2), and EntryLo1 (3) Registers
PageMask Register (5)
Wired Register (6)
EntryHi Register (CP0 Register 10)
Processor Revision Identifier (PRId) Register (15)
Config Register (16)
Load Linked Address (LLAddr) Register (17)
Cache Tag Registers [TagLo (28) and TagHi (29)]
Virtual-to-Physical Address Translation Process
TLB Misses
TLB Instructions
5 CPU Exception Processing
How Exception Processing Works
Exception Processing Registers
Context Register (4)
Bad Virtual Address Register (BadVAddr) (8)
Count Register (9)
Compare Register (11)
Status Register (12)
Status Register Format
Status Register Modes and Access States
Status Register Reset
Cause Register (13)
Exception Program Counter (EPC) Register (14)
WatchLo (18) and WatchHi (19) Registers
XContext Register (20)
Error Checking and Correcting (ECC) Register (26)
Cache Error (CacheErr) Register (27)
Error Exception Program Counter (Error EPC) Register (30)
Processor Exceptions
Exception Types
Reset Exception Process
Cache Error Exception Process
Soft Reset and NMI Exception Process
General Exception Process
Exception Vector Locations
Priority of Exceptions
Reset Exception
Soft Reset Exception
Address Error Exception
TLB Exceptions
TLB Refill Exception
TLB Invalid Exception
TLB Modified Exception
Cache Error Exception
Virtual Coherency Exception
Bus Error Exception
Integer Overflow Exception
Trap Exception
System Call Exception
Breakpoint Exception
Reserved Instruction Exception
Coprocessor Unusable Exception
Floating-Point Exception
Watch Exception
Interrupt Exception
Exception Handling and Servicing Flowcharts
6 Floating-Point Unit
Overview
FPU Features
FPU Programming Model
Floating-Point General Registers (FGRs)
Floating-Point Registers
Floating-Point Control Registers
Implementation and Revision Register (FCR0)
Control/Status Register (FCR31)
Accessing the Control/Status Register
IEEE Standard 754
Control/Status Register FS Bit
Control/Status Register Condition Bit
Control/Status Register Cause, Flag, and Enable Fields
Control/Status Register Rounding Mode Control Bits
Floating-Point Formats
Binary Fixed-Point Format
Floating-Point Instruction Set Overview
Floating-Point Load, Store, and Move Instructions
Transfers Between FPU and Memory
Transfers Between FPU and CPU
Load Delay and Hardware Interlocks
Data Alignment
Endianness
Floating-Point Conversion Instructions
Floating-Point Computational Instructions
Branch on FPU Condition Instructions
Floating-Point Compare Operations
FPU Instruction Pipeline Overview
Instruction Execution
Instruction Execution Cycle Time
Scheduling FPU Instructions
FPU Pipeline Overlapping
Instruction Scheduling Constraints
Instruction Latency, Repeat Rate, and Pipeline Stage Sequences
Resource Scheduling Rules
7 Floating-Point Exceptions
Exception Types
Exception Trap Processing
Flags
FPU Exceptions
Inexact Exception (I)
Invalid Operation Exception (V)
Division-by-Zero Exception (Z)
Overflow Exception (O)
Underflow Exception (U)
Unimplemented Instruction Exception (E)
Saving and Restoring State
Trap Handlers for IEEE Standard 754 Exceptions
8 R4000 Processor Signal Descriptions
System Interface Signals
Clock/Control Interface Signals
Secondary Cache Interface Signals
Interrupt Interface Signals
JTAG Interface Signals
Initialization Interface Signals
Signal Summary
9 Initialization Interface
Functional Overview
Reset Signal Description
Power-on Reset
Cold Reset
Warm Reset
Initialization Sequence
Boot-Mode Settings
10
Clock Interface
Signal Terminology..................................................................................................228
Basic System Clocks.................................................................................................229
MasterClock..........................................................................................................229
MasterOut .............................................................................................................229
SyncIn/SyncOut................................................................................................... 229
PClock....................................................................................................................229
SClock.................................................................................................................... 230
TClock....................................................................................................................230
RClock.................................................................................................................... 230
PClock-to-SClock Division .................................................................................230
System Timing Parameters..................................................................................... 233
Alignment to SClock............................................................................................ 233
Alignment to MasterClock .................................................................................233
Phase-Locked Loop (PLL)................................................................................... 233
Connecting Clocks to a Phase-Locked System.....................................................234
Connecting Clocks to a System without Phase Locking.....................................235
Connecting to a Gate-Array Device ..................................................................235
Connecting to a CMOS Logic System............................................................... 238
Processor Status Outputs ........................................................................................ 241
11
Cache Organization, Operation, and Coherency
Memory Organization............................................................................................. 244
Overview of Cache Operations.............................................................................. 245
R4000 Cache Description......................................................................................... 246
Secondary Cache Size..........................................................................................248
Variable-Length Cache Lines ............................................................................. 248
Cache Organization and Accessibility..............................................................248
Organization of the Primary Instruction Cache (I-Cache)......................... 249
Organization of the Primary Data Cache (D-Cache)..................................250
Accessing the Primary Caches.......................................................................251
Organization of the Secondary Cache.......................................................... 252
Accessing the Secondary Cache.....................................................................254
Cache States............................................................................................................... 255
Primary Cache States...........................................................................................256
Secondary Cache States....................................................................................... 256
Mapping States Between Caches....................................................................... 257
Cache Line Ownership............................................................................................ 258
Cache Write Policy...................................................................................................259
Cache State Transition Diagrams...........................................................................260
Cache Coherency Overview ................................................................................... 264
Cache Coherency Attributes...............................................................................264
Uncached ..........................................................................................................265
Noncoherent.....................................................................................................265
Sharable.............................................................................................................265
Update...............................................................................................................265
Exclusive ........................................................................................................... 266
Cache Operation Modes......................................................................................266
Secondary-Cache Mode..................................................................................266
No-Secondary-Cache Mode........................................................................... 266
Strong Ordering ...................................................................................................267
An Example of Strong Ordering....................................................................267
Testing for Strong Ordering...........................................................................267
Restarting the Processor .................................................................................268
Maintaining Coherency on Loads and Stores......................................................269
Manipulation of the Cache by an External Agent............................................... 270
Invalidate...............................................................................................................270
Update ...................................................................................................................270
Snoop ..................................................................................................................... 270
Intervention...........................................................................................................271
Coherency Conflicts.................................................................................................271
How Coherency Conflicts Arise ........................................................................ 272
Processor Coherent Read Requests...............................................................272
Processor Invalidate or Update Requests ....................................................273
External Coherency Requests ........................................................................274
System Implications of Coherency Conflicts................................................... 275
System Model...................................................................................................276
Load...................................................................................................................278
Store...................................................................................................................278
Processor Coherent Read Request and Read Response.............................278
Processor Invalidate........................................................................................ 279
Processor Write................................................................................................ 279
Handling Coherency Conflicts........................................................................... 280
Coherent Read Conflicts.................................................................................280
Coherent Write Conflicts................................................................................281
Invalidate Conflicts .........................................................................................282
Sample Cycle: Coherent Read Request.............................................................283
R4000 Processor Synchronization Support........................................................... 286
Test-and-Set (Spinlock) .......................................................................................286
Counter..................................................................................................................288
LL and SC..............................................................................................................289
Examples Using LL and SC................................................................................ 290
12
System Interface
Terminology..............................................................................................................294
System Interface Description..................................................................................294
Interface Buses......................................................................................................295
Address and Data Cycles ...............................................................................296
Issue Cycles ......................................................................................................296
Handshake Signals..............................................................................................298
System Interface Protocols......................................................................................299
Master and Slave States....................................................................................... 299
Moving from Master to Slave State...................................................................300
External Arbitration............................................................................................. 300
Uncompelled Change to Slave State .................................................................301
Processor and External Requests ...........................................................................302
Rules for Processor Requests.............................................................................. 303
Processor Requests...............................................................................................304
Processor Read Request..................................................................................306
Processor Write Request.................................................................................307
Processor Invalidate Request.........................................................................308
Processor Update Request..............................................................................310
Clusters..............................................................................................................311
External Requests.................................................................................................313
External Read Request.................................................................................... 316
External Write Request................................................................................... 316
External Invalidate Request ........................................................................... 316
External Update Request................................................................................316
External Snoop Request..................................................................................317
External Intervention Request....................................................................... 317
Read Response .................................................................................................317
Handling Requests...................................................................................................318
Load Miss..............................................................................................................318
Secondary-Cache Mode..................................................................................320
No-Secondary-Cache Mode........................................................................... 320
Store Miss..............................................................................................................321
Secondary-Cache Mode..................................................................................323
No-Secondary-Cache Mode........................................................................... 325
Store Hit.................................................................................................................326
Secondary-Cache Mode..................................................................................326
No-Secondary-Cache Mode........................................................................... 326
Uncached Loads or Stores ..................................................................................326
CACHE Operations............................................................................................. 327
Load Linked Store Conditional Operation....................................................... 327
Processor and External Request Protocols............................................................329
Processor Request Protocols...............................................................................330
Processor Read Request Protocol.................................................................. 330
Processor Write Request Protocol................................................................. 333
Processor Invalidate and Update Request Protocol ................................... 335
Processor Null Write Request Protocol........................................................ 336
Processor Cluster Request Protocol .............................................................. 337
Processor Request and Cluster Flow Control.............................................. 338
External Request Protocols.................................................................................341
External Arbitration Protocol......................................................................... 342
External Read Request Protocol ....................................................................343
External Null Request Protocol .....................................................................344
External Write Request Protocol ...................................................................347
External Invalidate and Update Request Protocols....................................348
External Intervention Request Protocol .......................................................349
External Snoop Request Protocol.................................................................. 352
Read Response Protocol..................................................................................354
Data Rate Control.....................................................................................................356
Data Transfer Patterns......................................................................................... 356
Secondary Cache Transfers ................................................................................357
Secondary Cache Write Cycle Time.................................................................. 358
Independent Transmissions on the SysAD Bus ..............................................359
System Interface Endianness..............................................................................360
System Interface Cycle Time...................................................................................361
Cluster Request Spacing .....................................................................................361
Release Latency.................................................................................................... 362
External Request Response Latency.................................................................. 363
System Interface Commands and Data Identifiers.............................................. 364
Command and Data Identifier Syntax..............................................................364
System Interface Command Syntax ..................................................................365
Read Requests .................................................................................................. 366
Write Requests .................................................................................................367
Null Requests................................................................................................... 369
Invalidate Requests .........................................................................................370
Update Requests.............................................................................................. 370
Intervention and Snoop Requests .................................................................372
System Interface Data Identifier Syntax ........................................................... 374
Coherent Data ..................................................................................................374
Noncoherent Data............................................................................................374
Data Identifier Bit Definitions........................................................................ 375
System Interface Addresses....................................................................................377
Addressing Conventions ....................................................................................377
Sequential and Subblock Ordering....................................................................378
Processor Internal Address Map............................................................................ 378
13
Secondary Cache Interface
Data Transfer Rates..................................................................................................380
Duplicating Signals..................................................................................................380
Accessing a Split Secondary Cache........................................................................381
SCDChk Bus..............................................................................................................381
SCTAG Bus................................................................................................................ 381
Operation of the Secondary Cache Interface........................................................ 382
Read Cycles...........................................................................................................383
4-Word Read Cycle.......................................................................................... 383
8-Word Read Cycle.......................................................................................... 384
Notes on a Secondary Cache Read Cycle.....................................................384
Write Cycles..........................................................................................................385
4-Word Write Cycle......................................................................................... 385
8-Word Write Cycle......................................................................................... 386
Notes on a Secondary Cache Write Cycle....................................................387
14
JTAG Interface
What Boundary Scanning Is ................................................................................... 390
Signal Summary .......................................................................................................391
JTAG Controller and Registers............................................................................... 392
Instruction Register..............................................................................................392
Bypass Register.....................................................................................................393
Boundary-Scan Register......................................................................................394
Test Access Port (TAP)........................................................................................395
TAP Controller.................................................................................................396
Controller Reset ...............................................................................................396
Controller States...............................................................................................396
Implementation-Specific Details............................................................................400
15
R4000 Processor Interrupts
Hardware Interrupts................................................................................................ 402
Nonmaskable Interrupt (NMI)...............................................................................402
Asserting Interrupts.................................................................................................402
16
Error Checking and Correcting
Error Checking in the Processor.............................................................................408
Types of Error Checking.....................................................................................408
Parity Error Detection.....................................................................................408
SECDED ECC Code......................................................................................... 409
Error Checking Operation .................................................................................. 412
System Interface...............................................................................................412
Secondary Cache Data Bus.............................................................................412
System Interface and Secondary Cache Data Bus....................................... 412
Secondary Cache Tag Bus...............................................................................413
System Interface Command Bus ...................................................................413
SECDED ECC Matrices for Data and Tag Buses.............................................414
ECC Check Bits..................................................................................................... 414
Data ECC Generation.......................................................................................... 415
Detecting Data Transmission Errors................................................................. 418
Single Data Bit ECC Error ..............................................................................420
Single Check Bit ECC Error............................................................................ 421
Double Data Bit ECC Errors........................................................................... 422
Three Data Bit ECC Errors .............................................................................423
Four Data Bit ECC Errors ............................................................................... 424
Tag ECC Generation............................................................................................425
Summary of ECC Operations............................................................................. 426
R4400 Master/Checker Mode.................................................................................430
Connecting a System in Lock Step ....................................................................431
Master-Listener Configuration ..........................................................................432
Cross-Coupled Checking Configuration..........................................................433
Fault Detection .....................................................................................................435
Reset Operation....................................................................................................436
Fault History.........................................................................................................436
A
CPU Instruction Set Details
B
FPU Instruction Set Details
C
Subblock Ordering
Sequential Ordering.................................................................................................C-2
Subblock Ordering................................................................................................... C-2
D
Output Buffer Δi/Δt Control Mechanism
Mode Bits...................................................................................................................D-1
Delay Times............................................................................................................... D-2
E
PLL Passive Components
F
Coprocessor 0 Hazards
G
R4000 Pinouts
Pinout of R4000PC....................................................................................................G-2
Pinout of R4000MC/SC Package ..............................................................G-5
Index
1
Introduction
Historically, the evolution of computer architectures has been dominated by families of increasingly complex central processors. Under market pressure to preserve existing software, complex instruction set computer (CISC) architectures evolved by the accretion of microcode and increasingly intricate instruction sets. This intricacy was itself driven by the need to support high-level languages and operating systems, as advances in semiconductor technology made it possible to fabricate integrated circuits of ever greater complexity. At the time, it seemed self-evident to designers that architectures should continue to grow more complex as technological advances made such VLSI designs possible.
In recent years, however, reduced instruction set computer (RISC) architectures have implemented a different model for the interaction between hardware, firmware, and software. RISC concepts emerged from a statistical analysis of the way in which software actually uses processor resources: dynamic measurement of system kernels and object modules generated by optimizing compilers showed that the simplest instructions were used most often, even in the code for CISC machines. Correspondingly, complex instructions often went unused because their single way of performing a complex operation rarely matched the precise needs of a high-level language.
RISC architecture eliminates microcode routines and turns low-level control of the machine over to software. The RISC approach is not new, but its application has become more prevalent in recent years, due to the increasing use of high-level languages, the development of compilers that are able to optimize at the microcode level, and dramatic advances in semiconductor memory and packaging. It is now feasible to replace relatively slow microcode ROM with faster RAM that is organized as an instruction cache. Machine control resides in this instruction cache that is, in effect, customized on-the-fly: the instruction stream generated by system- and compiler-generated code provides a precise fit between the requirements of high-level software and the low-level capabilities of the hardware.
Reducing or simplifying the instruction set is not the primary goal of RISC architecture; it is a pleasant side effect of techniques used to gain the highest performance possible from available technology. Thus, the term "reduced instruction set computer" is a bit misleading; it is the push for performance that really drives and shapes RISC designs.
1.1 Benefits of RISC Design
Some benefits that result from RISC design techniques are not directly attributable to the drive to increase performance, but are a result of the basic reduction in complexity—a simpler design allows both chip-area resources and human resources to be applied to features that enhance performance. Some of these benefits are described below.
Shorter Design Cycle
The architectures of RISC processors can be implemented more quickly than their CISC counterparts: it is easier to fabricate and debug a streamlined, simplified architecture with no microcode than a complex architecture that uses microcode. CISC processors have such a long design cycle that they may not be completely debugged by the time they are technologically obsolete. The shorter time required to design and implement RISC processors allows them to make use of the best available technologies.
Effective Utilization of Chip Area
The simplicity of RISC processors also frees scarce chip geography for performance-critical resources such as larger register files, translation lookaside buffers (TLBs), coprocessors, and fast multiply and divide units. Such resources help RISC processors obtain an even greater performance edge.
User (Programmer) Benefits
Simplicity in architecture also helps the user by providing a uniform instruction set that is easier to use. This allows a closer correlation between the instruction count and the cycle count, making it easier to measure the effect of code optimizations.
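The correlation between instruction count and cycle count can be illustrated with a small sketch. It assumes a hypothetical uniform cycles-per-instruction (CPI) figure plus a fixed number of stall cycles; the function name and all numbers are illustrative assumptions, not R4000 timing data.

```python
def estimate_cycles(instruction_count, cpi=1.0, stall_cycles=0):
    """Estimate total cycles as instruction_count * cpi + stall_cycles.

    All parameters are hypothetical inputs chosen by the caller; a real
    processor's CPI depends on the pipeline, caches, and instruction mix.
    """
    if instruction_count < 0 or cpi <= 0 or stall_cycles < 0:
        raise ValueError("counts must be non-negative and cpi positive")
    return instruction_count * cpi + stall_cycles


# Compare a routine before and after a (made-up) optimization that
# removes 200 instructions but leaves the stall cycles unchanged.
before = estimate_cycles(1000, cpi=1.0, stall_cycles=50)
after = estimate_cycles(800, cpi=1.0, stall_cycles=50)
saved = before - after
```

When the CPI is close to uniform, the saving in cycles tracks the saving in instructions directly, which is the property the paragraph above describes.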
Advanced Semiconductor Technologies
Each new VLSI technology is introduced with tight limits on the number of transistors that fit on each chip. Since the simplicity of a RISC processor allows it to be implemented in fewer transistors than its CISC counterpart, the first computers capable of exploiting these new VLSI technologies have been using and will continue to use RISC architecture.
Optimizing Compilers
RISC architecture is designed so that compilers, not assembly language programmers, have the optimal working environment. RISC philosophy assumes that high-level language programming is used, in contrast to the older CISC philosophy, which assumes that assembly language programming is of primary importance.
The trend toward high-level language programming has led to the development of more efficient compilers to convert high-level language statements to machine code. The primary measures of compiler efficiency are the compactness of the generated code and the shortness of its execution time.
During the development of more efficient compilers, analysis of instruction streams revealed that the greatest amount of time was spent executing simple instructions and performing load and store operations, while the more complex instructions were used less frequently. It was also learned that compilers produce code that often uses only a narrow subset of the processor instruction set architecture (ISA). A compiler works more efficiently with instructions that perform simple, well-defined operations and generate minimal side effects. Compilers rarely use complex instructions and features; the more complex, powerful instructions are either too difficult for the compiler to employ or do not precisely fit high-level language requirements.
Thus, a natural match exists between RISC architectures and efficient, optimizing compilers. This match makes it easier for compilers to generate the most effective sequences of machine instructions to accomplish tasks defined by the high-level language.
MIPS RISCompiler Language Suite
Some compiler products are derived from disparate sources and consequently do not fit together very well. Instead of treating each language’s compiler as a separate entity, the MIPS RISCompiler language suite shares common elements across the entire family of compilers. In this way the language suite offers both tight integration and broad language coverage.
The MIPS language suite supports:
industry-standard front ends for the following languages (C, FORTRAN, Pascal)
a common intermediate language, offering an efficient way to add language front ends over time
all of the back end optimization and code generation
the same object format and calling conventions
mixed-language programs
debugging of programs written in all languages, including mixtures
This language suite approach yields high-quality compilers for all languages, since common elements make up the majority of each of the language products. In addition, this approach provides the ability to develop and execute multi-language programs, promoting flexibility in development, avoiding the necessity of recoding proven program segments, and protecting the user’s software investment. The common back-end also exports optimizing and code-generating improvements immediately throughout the language suite, thereby reducing maintenance.
1.2 Compatibility
The R4000 processor provides complete application software compatibility with the MIPS R2000, R3000, and R6000 processors. Although the MIPS processor architecture has evolved in response to a compromise between software and hardware resources in the computer system, the R4000 processor implements the MIPS ISA for user-mode programs. This guarantees that user programs conforming to the ISA execute on any MIPS hardware implementation.
1.3 Processor General Features
This section briefly describes the programming model, the memory management unit (MMU), and the caches in the R4000 processor. A more detailed description is given in succeeding sections.
Full 32-bit and 64-bit Operations. The R4000 processor contains 32 general purpose 64-bit registers. (When operating as a 32-bit processor, the general purpose registers are 32 bits wide.) All instructions are 32 bits wide.
Efficient Pipeline. The superpipeline design of the processor results in an execution rate approaching one instruction per cycle. Pipeline stalls and exceptional events are handled precisely and efficiently.
MMU. The R4000 processor uses an on-chip TLB that provides rapid virtual-to-physical address translation.
Cache Control. The R4000 primary instruction and data caches reside on-chip, and can each hold 8 Kbytes. In the R4400 processor, the primary caches can each hold 16 Kbytes. Architecturally, each primary cache can be increased to hold up to 32 Kbytes. An off-chip secondary cache (R4000SC and R4000MC processors only) can hold from 128 Kbytes to 4 Mbytes. All processor cache control logic, including the secondary cache control logic, is on-chip.
Floating-Point Unit. The FPU is located on-chip and implements the ANSI/IEEE standard 754-1985.
1.4 R4000 Processor Configurations
The R4000 processor† is packaged in three different configurations. All processors are implemented in sub-1-micron CMOS technology.
R4000PC is designed for cost-sensitive systems such as inexpensive desktop systems and high-end embedded controllers. It is packaged in a 179-pin PGA, and does not support a secondary cache.
R4000SC is designed for high-performance uniprocessor systems. It is packaged in a 447-pin LGA/PGA and includes integrated control for large secondary caches built from standard SRAMs.
R4000MC is designed for large cache-coherent multiprocessor systems. It is packaged in a 447-pin LGA/PGA and, in addition to the features of R4000SC, includes support for a wide variety of bus designs and cache-coherency mechanisms.
Table 1-1 lists the features in each of the three configurations (X indicates the feature is present). R4400 processor enhancements are described in the section following.
1.5 R4400 Processor Enhancements
In addition to the features contained in the R4000 processor, the R4400 processor has the following enhancements:
fully functional Status pins (described in Chapter 10)
Master/Checker mode (described in Chapter 16)
larger primary caches (described in Processor General Features, in this chapter)
uncached store buffer (described in Chapter 3)
divide-by-6 and divide-by-8 modes (described in Chapter 10)
cache error bit, EW, added to the CacheErr register (described in Chapter 5).
† Features of the R4400 processor that differ from the R4000 processor are noted throughout this book; for instance, R4400 processor enhancements are listed in the next section. Otherwise, references to the R4000 processor may be taken to include the R4400 processor.
Table 1-1 R4000 Features
Feature                        R4000PC   R4000SC   R4000MC
Primary Cache States
  Valid                           X         X         X
  Shared                                              X
  Clean Exclusive                           X         X
  Dirty Exclusive                 X         X         X
Secondary Cache Interface                   X         X
Secondary Cache States
  Valid                           X         X         X
  Shared                                              X
  Dirty Shared                                        X
  Clean Exclusive                           X         X
  Dirty Exclusive                 X         X         X
Multiprocessing                                       X
Cache Coherency Attributes
  Uncached                        X         X         X
  Noncoherent                     X         X         X
  Sharable                                            X
  Update                                              X
  Exclusive                                           X
Packages
  PGA (179-pin)                   X
  PGA (447-pin)                             X         X
1.6 R4000 Processor
This section describes the following:
the 64-bit architecture of the R4000 processor
the superpipeline design of the CPU instruction pipeline (described in detail in Chapter 3)
an overview of the System interface (described in detail in Chapter 12)
an overview of the CPU registers (detailed in Chapters 4 and 5) and CPU instruction set (detailed in Chapter 2 and Appendix A)
data formats and byte ordering
the System Control Coprocessor, CP0, and the floating-point unit, CP1
caches and memory, including a description of primary and secondary caches, the memory management unit (MMU), the translation lookaside buffer (TLB), and the Secondary Cache interface (described in more detail in Chapters 4 and 11). The Secondary Cache interface is detailed in Chapter 13.
64-bit Architecture
The natural mode of operation for the R4000 processor is as a 64-bit microprocessor; however, 32-bit applications maintain compatibility even when the processor operates as a 64-bit processor.
The R4000 processor provides the following:
64-bit on-chip floating-point unit (FPU)
64-bit integer arithmetic logic unit (ALU)
64-bit integer registers
64-bit virtual address space
64-bit system bus
Figure 1-1 is a block diagram of the R4000 processor internals.
[Figure 1-1 is a block diagram. Its labeled blocks are: the 64-bit System Bus; System Control; S-cache Control; P-cache Control; the Data Cache and the Instruction Cache; CP0 (Exception/Control Registers, Memory Management Registers, Translation Lookaside Buffers); the CPU (CPU Registers, ALU, Load Aligner/Store Driver, Integer Multiplier/Divider, Address Unit, PC Incrementer); the FPU (FPU Registers, Pipeline Bypass, FP Multiplier, FP Divider, FP Add, Convert, Square Root); and Pipeline Control.]
Figure 1-1 R4000 Processor Internal Block Diagram
Superpipeline Architecture
The R4000 processor exploits instruction parallelism by using an eight-stage superpipeline which places no restrictions on the instruction issued. Under normal circumstances, two instructions are issued each cycle.
The internal pipeline of the R4000 processor operates at twice the frequency of the master clock, as discussed in Chapter 3. The processor achieves high throughput by pipelining cache accesses, shortening register access times, implementing virtual-indexed primary caches, and allowing the latency of functional units to span more than one pipeline clock cycle.
System Interface
The R4000 processor supports a 64-bit System interface that can be used to construct uniprocessor systems with a direct DRAM interface (with or without a secondary cache) or cache-coherent multiprocessor systems. The System interface includes:
a 64-bit multiplexed address and data bus
8 check bits
a 9-bit parity-protected command bus
8 handshake signals
The interface is capable of transferring data between the processor and memory at a peak rate of 400 Mbytes/second, when running at 50 MHz.
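The peak rate quoted above follows from the bus width and the clock frequency. The following Python sketch reproduces that arithmetic, assuming one full 64-bit data transfer on every 50 MHz cycle and counting data bits only (not the check bits); the function name is illustrative and does not come from this manual.

```python
def peak_transfer_rate(bus_width_bits, clock_hz):
    """Peak bytes per second, assuming one full-width transfer every clock cycle."""
    return (bus_width_bits // 8) * clock_hz

# 64-bit multiplexed address/data bus at 50 MHz
rate = peak_transfer_rate(64, 50_000_000)
print(rate // 1_000_000, "Mbytes/second")  # prints: 400 Mbytes/second
```

This is a peak figure only; sustained rates depend on the bus protocol described in Chapter 12.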
CPU Register Overview
The central processing unit (CPU) provides the following registers:
32 general purpose registers
a Program Counter (PC) register
2 registers that hold the results of integer multiply and divide operations (HI and LO).
Floating-point unit (FPU) registers are described in Chapter 6. CPU registers can be either 32 bits or 64 bits wide, depending on the R4000 processor mode of operation. Figure 1-2 shows the CPU registers.
[Figure 1-2 shows the General Purpose Registers (r0, r1, r2, ... r29, r30, r31), the Multiply and Divide Registers (HI and LO), and the Program Counter (PC). Each register is drawn with bits 63..32 and bits 31..0. Register width depends on mode of operation: 32-bit or 64-bit.]
Figure 1-2 CPU Registers
Two of the CPU general purpose registers have assigned functions:
r0 is hardwired to a value of zero, and can be used as the target register for any instruction whose result is to be discarded. r0 can also be used as a source when a zero value is needed.
r31 is the link register used by Jump and Link instructions. It should not be used by other instructions.
The CPU has three special purpose registers:
PC — Program Counter register
HI — Multiply and Divide register higher result
LO — Multiply and Divide register lower result
The two Multiply and Divide registers (HI, LO) store:
the product of integer multiply operations, or
the quotient (in LO) and remainder (in HI) of integer divide operations
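The use of HI and LO can be illustrated with a small Python model. This is a sketch under stated assumptions, not the hardware definition: it models only the unsigned 32-bit forms (MULTU and DIVU), the function names are hypothetical, and the split of the product into upper and lower halves follows the "higher result" and "lower result" register descriptions above. Appendix A remains the authoritative description.

```python
MASK32 = 0xFFFFFFFF

def multu(rs, rt):
    """Model of an unsigned 32-bit multiply: returns (HI, LO)."""
    product = (rs & MASK32) * (rt & MASK32)   # up to 64 bits wide
    return product >> 32, product & MASK32    # HI = upper half, LO = lower half

def divu(rs, rt):
    """Model of an unsigned 32-bit divide: returns (HI, LO).

    A zero divisor raises ZeroDivisionError here; this sketch does not
    model what the processor does in that case.
    """
    rs, rt = rs & MASK32, rt & MASK32
    return rs % rt, rs // rt                  # HI = remainder, LO = quotient

print(multu(0xFFFFFFFF, 2))  # (1, 4294967294)
print(divu(17, 5))           # (2, 3)
```

A program would then read the two halves with the MFHI and MFLO instructions listed in Table 1-5.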
The R4000 processor has no Program Status Word (PSW) register as such; this is covered by the Status and Cause registers incorporated within the System Control Coprocessor (CP0). CP0 registers are described later in this chapter.
CPU Instruction Set Overview
Each CPU instruction is 32 bits long. As shown in Figure 1-3, there are three instruction formats:
immediate (I-type)
jump (J-type)
register (R-type)
I-Type (Immediate)
  Bits:   31-26   25-21   20-16   15-0
  Field:  op      rs      rt      immediate
J-Type (Jump)
  Bits:   31-26   25-0
  Field:  op      target
R-Type (Register)
  Bits:   31-26   25-21   20-16   15-11   10-6   5-0
  Field:  op      rs      rt      rd      sa     funct
Figure 1-3 CPU Instruction Formats
Each format contains a number of different instructions, which are described further in this chapter. Fields of the instruction formats are described in Chapter 2.
Instruction decoding is greatly simplified by limiting the number of formats to these three. This limitation means that the more complicated (and less frequently used) operations and addressing modes can be synthesized by the compiler, using sequences of these same simple instructions.
The instruction set can be further divided into the following groupings:
Load and Store instructions move data between memory and general registers. They are all immediate (I-type) instructions, since the only addressing mode supported is base register plus 16-bit, signed immediate offset.
Computational instructions perform arithmetic, logical, shift, multiply, and divide operations on values in registers. They include register (R-type, in which both the operands and the result are stored in registers) and immediate (I-type, in which one operand is a 16-bit immediate value) formats.
Jump and Branch instructions change the control flow of a program. Jumps are always made to a paged, absolute address formed by combining a 26-bit target address with the high-order bits of the Program Counter (J-type format) or register address (R-type format). Branches have 16-bit offsets relative to the program counter (I-type). Jump And Link instructions save their return address in register 31.
Coprocessor instructions perform operations in the coprocessors. Coprocessor load and store instructions are I-type.
Coprocessor 0 (system coprocessor) instructions perform operations on CP0 registers to control the memory management and exception handling facilities of the processor. These are listed in Table 1-18.
Special instructions perform system calls and breakpoint operations. These instructions are always R-type.
Exception instructions cause a branch to the general exception-handling vector based upon the result of a comparison. These instructions occur in both R-type (both the operands and the result are registers) and I-type (one operand is a 16-bit immediate value) formats.
Chapter 2 provides a more detailed summary and Appendix A gives a complete description of each instruction.
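The address calculations for jumps and branches described in the list above can be sketched in Python. This is an illustration under assumptions rather than a definition: it models 32-bit addresses only, it assumes that the 26-bit target and the 16-bit offset are word counts (shifted left by two bits) and that both are applied relative to the address of the instruction following the jump or branch, and all function names are hypothetical. Appendix A gives the exact behavior of each instruction.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def jump_target(pc, target26):
    """J-type: high-order PC bits combined with the 26-bit target field."""
    next_pc = (pc + 4) & MASK32                      # instruction after the jump
    return (next_pc & 0xF0000000) | ((target26 & 0x3FFFFFF) << 2)

def branch_target(pc, offset16):
    """I-type branch: signed 16-bit word offset relative to the next instruction."""
    return (pc + 4 + (sign_extend16(offset16) << 2)) & MASK32

print(hex(jump_target(0x80001000, 0x100)))   # 0x80000400
print(hex(branch_target(0x1000, 0xFFFF)))    # 0x1000 (offset of -1 word)
```

Because only the low 28 bits come from the instruction, a J-type jump stays inside the 256-Mbyte region selected by the high-order PC bits, which is the sense in which the target is "paged".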
Tables 1-2 through 1-17 list CPU instructions common to MIPS R-Series processors, along with those instructions that are extensions to the instruction set architecture. The extensions result in code space reductions, multiprocessor support, and improved performance in operating system kernel code sequences—for instance, in situations where run-time bounds-checking is frequently performed. Table 1-18 lists CP0 instructions.
Table 1-2 CPU Instruction Set: Load and Store Instructions
OpCode    Description
LB        Load Byte
LBU       Load Byte Unsigned
LH        Load Halfword
LHU       Load Halfword Unsigned
LW        Load Word
LWL       Load Word Left
LWR       Load Word Right
SB        Store Byte
SH        Store Halfword
SW        Store Word
SWL       Store Word Left
SWR       Store Word Right
Table 1-3 CPU Instruction Set: Arithmetic Instructions (ALU Immediate)
OpCode    Description
ADDI      Add Immediate
ADDIU     Add Immediate Unsigned
SLTI      Set on Less Than Immediate
SLTIU     Set on Less Than Immediate Unsigned
ANDI      AND Immediate
ORI       OR Immediate
XORI      Exclusive OR Immediate
LUI       Load Upper Immediate
Table 1-4 CPU Instruction Set: Arithmetic (3-Operand, R-Type)
OpCode    Description
ADD       Add
ADDU      Add Unsigned
SUB       Subtract
SUBU      Subtract Unsigned
SLT       Set on Less Than
SLTU      Set on Less Than Unsigned
AND       AND
OR        OR
XOR       Exclusive OR
NOR       NOR
Table 1-5 CPU Instruction Set: Multiply and Divide Instructions
OpCode    Description
MULT      Multiply
MULTU     Multiply Unsigned
DIV       Divide
DIVU      Divide Unsigned
MFHI      Move From HI
MTHI      Move To HI
MFLO      Move From LO
MTLO      Move To LO
Table 1-6 CPU Instruction Set: Jump and Branch Instructions
OpCode    Description
J         Jump
JAL       Jump And Link
JR        Jump Register
JALR      Jump And Link Register
BEQ       Branch on Equal
BNE       Branch on Not Equal
BLEZ      Branch on Less Than or Equal to Zero
BGTZ      Branch on Greater Than Zero
BLTZ      Branch on Less Than Zero
BGEZ      Branch on Greater Than or Equal to Zero
BLTZAL    Branch on Less Than Zero And Link
BGEZAL    Branch on Greater Than or Equal to Zero And Link
Table 1-7 CPU Instruction Set: Shift Instructions
OpCode    Description
SLL       Shift Left Logical
SRL       Shift Right Logical
SRA       Shift Right Arithmetic
SLLV      Shift Left Logical Variable
SRLV      Shift Right Logical Variable
SRAV      Shift Right Arithmetic Variable
Table 1-8 CPU Instruction Set: Coprocessor Instructions
OpCode    Description
LWCz      Load Word to Coprocessor z
SWCz      Store Word from Coprocessor z
MTCz      Move To Coprocessor z
MFCz      Move From Coprocessor z
CTCz      Move Control to Coprocessor z
CFCz      Move Control From Coprocessor z
COPz      Coprocessor Operation z
BCzT      Branch on Coprocessor z True
BCzF      Branch on Coprocessor z False
Table 1-9 CPU Instruction Set: Special Instructions
OpCode    Description
SYSCALL   System Call
BREAK     Break
Table 1-10 Extensions to the ISA: Load and Store Instructions
OpCode    Description
LD        Load Doubleword
LDL       Load Doubleword Left
LDR       Load Doubleword Right
LL        Load Linked
LLD       Load Linked Doubleword
LWU       Load Word Unsigned
SC        Store Conditional
SCD       Store Conditional Doubleword
SD        Store Doubleword
SDL       Store Doubleword Left
SDR       Store Doubleword Right
SYNC      Sync
Table 1-11 Extensions to the ISA: Arithmetic Instructions (ALU Immediate)
OpCode    Description
DADDI     Doubleword Add Immediate
DADDIU    Doubleword Add Immediate Unsigned
Table 1-12 Extensions to the ISA: Multiply and Divide Instructions
OpCode    Description
DMULT     Doubleword Multiply
DMULTU    Doubleword Multiply Unsigned
DDIV      Doubleword Divide
DDIVU     Doubleword Divide Unsigned
Table 1-13 Extensions to the ISA: Branch Instructions
OpCode    Description
BEQL      Branch on Equal Likely
BNEL      Branch on Not Equal Likely
BLEZL     Branch on Less Than or Equal to Zero Likely
BGTZL     Branch on Greater Than Zero Likely
BLTZL     Branch on Less Than Zero Likely
BGEZL     Branch on Greater Than or Equal to Zero Likely
BLTZALL   Branch on Less Than Zero And Link Likely
BGEZALL   Branch on Greater Than or Equal to Zero And Link Likely
BCzTL     Branch on Coprocessor z True Likely
BCzFL     Branch on Coprocessor z False Likely
Table 1-14 Extensions to the ISA: Arithmetic Instructions (3-operand, R-type)
OpCode    Description
DADD      Doubleword Add
DADDU     Doubleword Add Unsigned
DSUB      Doubleword Subtract
DSUBU     Doubleword Subtract Unsigned
Table 1-15 Extensions to the ISA: Shift Instructions
OpCode    Description
DSLL      Doubleword Shift Left Logical
DSRL      Doubleword Shift Right Logical
DSRA      Doubleword Shift Right Arithmetic
DSLLV     Doubleword Shift Left Logical Variable
DSRLV     Doubleword Shift Right Logical Variable
DSRAV     Doubleword Shift Right Arithmetic Variable
DSLL32    Doubleword Shift Left Logical + 32
DSRL32    Doubleword Shift Right Logical + 32
DSRA32    Doubleword Shift Right Arithmetic + 32
Table 1-16 Extensions to the ISA: Exception Instructions
OpCode    Description
TGE       Trap if Greater Than or Equal
TGEU      Trap if Greater Than or Equal Unsigned
TLT       Trap if Less Than
TLTU      Trap if Less Than Unsigned
TEQ       Trap if Equal
TNE       Trap if Not Equal
TGEI      Trap if Greater Than or Equal Immediate
TGEIU     Trap if Greater Than or Equal Immediate Unsigned
TLTI      Trap if Less Than Immediate
TLTIU     Trap if Less Than Immediate Unsigned
TEQI      Trap if Equal Immediate
TNEI      Trap if Not Equal Immediate
Table 1-17 Extensions to the ISA: Coprocessor Instructions
OpCode    Description
DMFCz     Doubleword Move From Coprocessor z
DMTCz     Doubleword Move To Coprocessor z
LDCz      Load Double Coprocessor z
SDCz      Store Double Coprocessor z
Table 1-18 CP0 Instructions
OpCode    Description
DMFC0     Doubleword Move From CP0
DMTC0     Doubleword Move To CP0
MTC0      Move to CP0
MFC0      Move from CP0
TLBR      Read Indexed TLB Entry
TLBWI     Write Indexed TLB Entry
TLBWR     Write Random TLB Entry
TLBP      Probe TLB for Matching Entry
CACHE     Cache Operation
ERET      Exception Return
Data Formats and Addressing
The R4000 processor uses four data formats: a 64-bit doubleword, a 32-bit word, a 16-bit halfword, and an 8-bit byte. Byte ordering within each of the larger data formats (halfword, word, doubleword) can be configured in either big-endian or little-endian order. Endianness refers to the location of byte 0 within the multi-byte data structure. Figures 1-4 and 1-5 show the ordering of bytes within words and the ordering of words within multiple-word structures for the big-endian and little-endian conventions.
When the R4000 processor is configured as a big-endian system, byte 0 is the most-significant (leftmost) byte, thereby providing compatibility with MC68000 and IBM 370 conventions. Figure 1-4 shows this configuration.
                        Byte number held in bits
Word Address            31-24   23-16   15-8   7-0
12 (higher address)      12      13      14     15
 8                        8       9      10     11
 4                        4       5       6      7
 0 (lower address)        0       1       2      3
Figure 1-4 Big-Endian Byte Ordering
When configured as a little-endian system, byte 0 is always the least-significant (rightmost) byte, which is compatible with iAPX x86 and DEC VAX conventions. Figure 1-5 shows this configuration.
                        Byte number held in bits
Word Address            31-24   23-16   15-8   7-0
12 (higher address)      15      14      13     12
 8                       11      10       9      8
 4                        7       6       5      4
 0 (lower address)        3       2       1      0
Figure 1-5 Little-Endian Byte Ordering
In this text, bit 0 is always the least-significant (rightmost) bit; thus, bit designations are always little-endian (although no instructions explicitly designate bit positions within words).
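The two byte orderings can be demonstrated in a few lines of Python. This is an illustrative sketch only: the sample value is arbitrary, and the standard `int.to_bytes` method stands in for a store of a 32-bit word to memory under each convention.

```python
value = 0x0A0B0C0D  # arbitrary 32-bit word; 0x0A is its most-significant byte

big = value.to_bytes(4, "big")        # memory image, lowest address first
little = value.to_bytes(4, "little")

# Byte 0 (the lowest-addressed byte) under each convention:
print(hex(big[0]))     # 0xa  -> most-significant byte
print(hex(little[0]))  # 0xd  -> least-significant byte

# Bit numbering does not change: bit 0 is the least-significant bit either way.
print(value & 1)       # 1
```

The same word therefore produces two different memory images, which is why data shared between big-endian and little-endian systems must be byte-swapped.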
Figures 1-6 and 1-7 show little-endian and big-endian byte ordering in doublewords.
Bit #    63-56   55-48   47-40   39-32   31-24   23-16   15-8   7-0
Byte #     7       6       5       4       3       2       1      0
Byte 7 is the most-significant byte and byte 0 is the least-significant byte. The figure also marks a word, a halfword, and a byte within the doubleword, and numbers the bits in a byte from 7 down to 0.
Figure 1-6 Little-Endian Data in a Doubleword
Bit #    63-56   55-48   47-40   39-32   31-24   23-16   15-8   7-0
Byte #     0       1       2       3       4       5       6      7
Byte 0 is the most-significant byte and byte 7 is the least-significant byte. The figure also marks a word, a halfword, and a byte within the doubleword, and numbers the bits in a byte from 7 down to 0.
Figure 1-7 Big-Endian Data in a Doubleword
The CPU uses byte addressing for halfword, word, and doubleword accesses with the following alignment constraints:
Halfword accesses must be aligned on an even byte boundary (0, 2, 4...).
Word accesses must be aligned on a byte boundary divisible by four (0, 4, 8...).
Doubleword accesses must be aligned on a byte boundary divisible by eight (0, 8, 16...).
The following special instructions load and store words that are not aligned on 4-byte (word) or 8-byte (doubleword) boundaries:
LWL LWR SWL SWR LDL LDR SDL SDR
These instructions are used in pairs to provide addressing of misaligned words. Addressing misaligned data incurs one additional instruction cycle over that required for addressing aligned data.
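The pairing can be illustrated with a simplified Python model of LWL and LWR for a big-endian configuration. This is a sketch under assumptions, not the instruction definition: it models 32-bit registers only, memory is a plain `bytes` object, and the function names are hypothetical. Appendix A defines the instructions exactly.

```python
MASK32 = 0xFFFFFFFF

def lwl_big_endian(reg, mem, addr):
    """Load bytes from addr to the end of its aligned word into the left of reg."""
    word_end = (addr & ~3) + 4
    data = mem[addr:word_end]
    shift = 8 * (4 - len(data))               # low-order bits left untouched
    loaded = int.from_bytes(data, "big") << shift
    kept = reg & ((1 << shift) - 1)
    return (loaded | kept) & MASK32

def lwr_big_endian(reg, mem, addr):
    """Load bytes from the start of the aligned word through addr into the right of reg."""
    word_start = addr & ~3
    data = mem[word_start:addr + 1]
    low_mask = (1 << (8 * len(data))) - 1
    return (reg & ~low_mask & MASK32) | int.from_bytes(data, "big")

mem = bytes(range(16))                 # byte n of memory holds the value n
reg = lwl_big_endian(0, mem, 3)        # picks up byte 3
reg = lwr_big_endian(reg, mem, 3 + 3)  # picks up bytes 4, 5, 6
print(hex(reg))                        # 0x3040506
```

The first instruction is given the address of the first byte of the misaligned word and the second is given the address of its last byte, so together they assemble bytes 3 through 6 in one register.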
Figures 1-8 and 1-9 show the access of a misaligned word that has byte address 3.
                   Byte number held in bits
                   31-24   23-16   15-8   7-0
Higher address       4       5       6
Lower address                               3
Figure 1-8 Big-Endian Misaligned Word Addressing
                   Byte number held in bits
                   31-24   23-16   15-8   7-0
Higher address               6       5      4
Lower address        3
Figure 1-9 Little-Endian Misaligned Word Addressing
Coprocessors (CP0-CP2)
The MIPS ISA defines three coprocessors (designated CP0 through CP2):
Coprocessor 0 (CP0) is incorporated on the CPU chip and supports the virtual memory system and exception handling. CP0 is also referred to as the System Control Coprocessor.
Coprocessor 1 (CP1) is reserved for the on-chip, floating-point coprocessor, the FPU.
Coprocessor 2 (CP2) is reserved for future definition by MIPS.
CP0 and CP1 are described in the sections that follow.
System Control Coprocessor, CP0
CP0 translates virtual addresses into physical addresses and manages exceptions and transitions between kernel, supervisor, and user states. CP0 also controls the cache subsystem, as well as providing diagnostic control and error recovery facilities.
The CP0 registers shown in Figure 1-10 and described in Table 1-19 manipulate the memory management and exception handling capabilities of the CPU.
[Figure 1-10 shows the 32 CP0 registers by name and number (0 through 31), with each register marked as used for Exception Processing, used for Memory Management, or Reserved. The register names and numbers are listed in Table 1-19.]
Figure 1-10 R4000 CP0 Registers
Table 1-19 System Control Coprocessor (CP0) Register Definitions
Number   Register    Description
0        Index       Programmable pointer into TLB array
1        Random      Pseudorandom pointer into TLB array (read only)
2        EntryLo0    Low half of TLB entry for even virtual address (VPN)
3        EntryLo1    Low half of TLB entry for odd virtual address (VPN)
4        Context     Pointer to kernel virtual page table entry (PTE) in 32-bit addressing mode
5        PageMask    TLB Page Mask
6        Wired       Number of wired TLB entries
7        -           Reserved
8        BadVAddr    Bad virtual address
9        Count       Timer Count
10       EntryHi     High half of TLB entry
11       Compare     Timer Compare
12       SR          Status register
13       Cause       Cause of last exception
14       EPC         Exception Program Counter
15       PRId        Processor Revision Identifier
16       Config      Configuration register
17       LLAddr      Load Linked Address
18       WatchLo     Memory reference trap address low bits
19       WatchHi     Memory reference trap address high bits
20       XContext    Pointer to kernel virtual PTE table in 64-bit addressing mode
21–25    -           Reserved
26       ECC         Secondary-cache error checking and correcting (ECC) and Primary parity
27       CacheErr    Cache Error and Status register
28       TagLo       Cache Tag register
29       TagHi       Cache Tag register
30       ErrorEPC    Error Exception Program Counter
31       -           Reserved
Floating-Point Unit (FPU), CP1
The MIPS floating-point unit (FPU) is designated CP1; the FPU extends the CPU instruction set to perform arithmetic operations on floating-point values. The FPU, with associated system software, fully conforms to the requirements of ANSI/IEEE Standard 754–1985, IEEE Standard for Binary Floating-Point Arithmetic.
The FPU features include:
Full 64-bit Operation. The FPU can contain either 16 or 32 64-bit registers to hold single-precision or double-precision values. The FPU also includes a 32-bit Status/Control register that provides access to all IEEE-Standard exception handling capabilities.
Load and Store Instruction Set. Like the CPU, the FPU uses a load- and store-based instruction set. Floating-point operations are started in a single cycle and their execution overlaps other fixed-point or floating-point operations.
Tightly-coupled Coprocessor Interface. The FPU is on the CPU chip, and appears to the programmer as a simple extension of the CPU (accessed as CP1). Together, the CPU and FPU form a tightly-coupled unit with a seamless integration of floating-point and fixed-point instruction sets. Since each unit receives and executes instructions in parallel, some floating-point instructions can execute at the same rate (two instructions per cycle) as fixed-point instructions.
Memory Management System (MMU)
The R4000 processor has a 36-bit physical addressing range of 64 Gbytes. However, since it is rare for systems to implement a physical memory space this large, the CPU provides a logical expansion of memory space by translating addresses composed in the large virtual address space into available physical memory addresses. The R4000 processor supports the following two addressing modes:
32-bit mode, in which the virtual address space is divided into 2 Gbytes per user process and 2 Gbytes for the kernel.
64-bit mode, in which the virtual address is expanded to 1 Tbyte (2^40 bytes) of user virtual address space.
A detailed description of these address spaces is given in Chapter 4.
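The sizes quoted in this section are all powers of two. The short Python sketch below checks the arithmetic, assuming binary units (1 Gbyte = 2^30 bytes, 1 Tbyte = 2^40 bytes); the constant names are illustrative.

```python
GBYTE = 2**30
TBYTE = 2**40

physical_space = 2**36         # 36-bit physical address
user_space_32bit_mode = 2**31  # half of a 32-bit virtual address space
user_space_64bit_mode = 2**40  # user virtual address space in 64-bit mode

print(physical_space // GBYTE)         # 64 (Gbytes)
print(user_space_32bit_mode // GBYTE)  # 2  (Gbytes)
print(user_space_64bit_mode // TBYTE)  # 1  (Tbyte)
```

In 64-bit mode the user virtual space is therefore 16 times larger than the 64-Gbyte physical space, which is why translation through the TLB is still required.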
The Translation Lookaside Buffer (TLB)
Virtual memory mapping is assisted by a translation lookaside buffer, which caches virtual-to-physical address translations. This fully-associative, on-chip TLB contains 48 entries, each of which maps a pair of variable-sized pages ranging from 4 Kbytes to 16 Mbytes, in multiples of four.
Instruction TLB
The R4000 processor has a two-entry instruction TLB (ITLB) which assists in instruction address translation. The ITLB is completely invisible to software and exists only to increase performance.
Joint TLB
An address translation value is tagged with the most-significant bits of its virtual address (the number of these bits depends upon the size of the page) and a per-process identifier. If there is no matching entry in the TLB, an exception is taken and software refills the on-chip TLB from a page table resident in memory; this TLB is referred to as the joint TLB (JTLB) because it contains both data and instructions jointly. The JTLB entry to be rewritten is selected at random.
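The matching rule described above can be sketched as a small Python model. It is a deliberately simplified illustration, not the hardware algorithm: the entry fields (`page_size`, `vpn2`, `asid`, `even_base`, `odd_base`) are hypothetical names chosen for this example, physical pages are represented by their base addresses, and details covered in Chapter 4 are omitted. Each entry maps an even/odd pair of pages, and page sizes grow by factors of four from 4 Kbytes to 16 Mbytes.

```python
# The seven page sizes: 4 KB, 16 KB, 64 KB, 256 KB, 1 MB, 4 MB, 16 MB
PAGE_SIZES = [4096 * 4**k for k in range(7)]

def tlb_lookup(entries, vaddr, asid):
    """Return the physical address for vaddr, or None on a TLB miss."""
    for entry in entries:
        size = entry["page_size"]
        # Tag = high-order virtual address bits above the page pair, plus process id
        if entry["asid"] == asid and entry["vpn2"] == vaddr // (2 * size):
            odd = (vaddr // size) & 1          # which page of the pair
            base = entry["odd_base"] if odd else entry["even_base"]
            return base + vaddr % size
    return None  # miss: software would refill the TLB from the page table

entries = [{"page_size": 4096, "vpn2": 1, "asid": 5,
            "even_base": 0x10000, "odd_base": 0x80000}]
print(hex(tlb_lookup(entries, 0x2010, asid=5)))  # 0x10010 (even page)
print(hex(tlb_lookup(entries, 0x3234, asid=5)))  # 0x80234 (odd page)
print(tlb_lookup(entries, 0x3234, asid=6))       # None (different process)
```

The number of tag bits compared shrinks as the page size grows, which matches the statement that the number of tagged bits depends on the size of the page.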
Operating Modes
The R4000 processor has three operating modes:
User mode
Supervisor mode
Kernel mode
The manner in which memory addresses are translated or mapped depends on the operating mode of the CPU; this is described in Chapter 4.
Cache Memory Hierarchy
To achieve a high performance in uniprocessor and multiprocessor systems, the R4000 processor supports a two-level cache memory hierarchy that increases memory access bandwidth and reduces the latency of load and store instructions. This hierarchy consists of on-chip instruction and data caches, together with an optional external secondary cache that varies in size from 128 Kbytes to 4 Mbytes.
The secondary cache is assumed to consist of one bank of industry-standard static RAM (SRAM) with output enables, arranged as a quadword (128-bit) data array, with a 25-bit-wide tag array. Check fields are added to both data and tag arrays to improve data integrity.
The secondary cache can be configured as a joint cache, or split into separate instruction and data caches. The maximum secondary cache size is 4 Mbytes; the minimum secondary cache size is 128 Kbytes for a joint cache, or 256 Kbytes total for split instruction/data caches. The secondary cache is direct mapped, and is addressed with the lower part of the physical address.
Primary and secondary caches are described in more detail in Chapter 11.
Primary Caches
The R4000 processor incorporates separate on-chip primary instruction and data caches to fill the high-performance pipeline. Each cache has its own 64-bit data path, and each can be accessed in parallel.
The R4000 processor primary caches hold from 8 Kbytes to 32 Kbytes; the R4400 processor primary caches are fixed at 16 Kbytes.
Cache accesses can occur up to twice each cycle. This provides the integer and floating-point units with an aggregate bandwidth of 1.6 Gbytes per second at a MasterClock frequency of 50 MHz.
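One reading of the aggregate figure that is consistent with the numbers given above is two primary caches, each with a 64-bit data path, each accessed twice per MasterClock cycle. The Python sketch below checks that arithmetic; the breakdown is an assumption made for illustration, not a statement taken from this manual, and the variable names are hypothetical.

```python
primary_caches = 2                 # instruction cache and data cache
data_path_bytes = 64 // 8          # each cache has its own 64-bit data path
accesses_per_cycle = 2             # "up to twice each cycle"
master_clock_hz = 50_000_000       # 50 MHz MasterClock

aggregate = primary_caches * data_path_bytes * accesses_per_cycle * master_clock_hz
print(aggregate / 1e9, "Gbytes per second")  # prints: 1.6 Gbytes per second
```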
Secondary Cache Interface
The R4000SC (secondary cache) and R4000MC (multiprocessor) versions of the processor allow connection to an optional secondary cache. These processors provide all of the secondary cache control circuitry, including error checking and correcting (ECC) protection, on chip.
The Secondary Cache interface includes:
a 128-bit data bus
a 25-bit tag bus
an 18-bit address bus
SRAM control signals
The 128-bit-wide data bus is designed to minimize cache miss penalties, and allow the use of standard low-cost SRAM in secondary cache.
Chapter 2
CPU Instruction Set Summary
This chapter is an overview of the central processing unit (CPU) instruction set; refer to Appendix A for detailed descriptions of individual CPU instructions.
An overview of the floating-point unit (FPU) instruction set is in Chapter 6; refer to Appendix B for detailed descriptions of individual FPU instructions.
2.1 CPU Instruction Formats
Each CPU instruction consists of a single 32-bit word, aligned on a word boundary. There are three instruction formats—immediate (I-type), jump (J-type), and register (R-type)—as shown in Figure 2-1. The use of a small number of instruction formats simplifies instruction decoding, allowing the compiler to synthesize more complicated (and less frequently used) operations and addressing modes from these three formats as needed.
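Because every instruction is one 32-bit word with fixed field positions, decoding reduces to shifts and masks. The Python sketch below extracts every field of the three formats from a word, using the bit positions given for Figure 2-1 (op in bits 31-26, rs in 25-21, rt in 20-16, rd in 15-11, sa in 10-6, funct in 5-0, immediate in 15-0, target in 25-0). The function name and the dictionary layout are illustrative choices, and the sample words are assembled from arbitrary field values rather than taken from this manual.

```python
def decode_fields(word):
    """Split a 32-bit instruction word into the fields of all three formats."""
    return {
        "op":        (word >> 26) & 0x3F,   # 6-bit operation code
        "rs":        (word >> 21) & 0x1F,   # 5-bit source register
        "rt":        (word >> 16) & 0x1F,   # 5-bit target register
        "rd":        (word >> 11) & 0x1F,   # 5-bit destination register (R-type)
        "sa":        (word >> 6) & 0x1F,    # 5-bit shift amount (R-type)
        "funct":     word & 0x3F,           # 6-bit function field (R-type)
        "immediate": word & 0xFFFF,         # 16-bit immediate (I-type)
        "target":    word & 0x3FFFFFF,      # 26-bit jump target (J-type)
    }

# An R-type word built from sample field values: rs=1, rt=2, rd=3, funct=0x21
r_word = (0 << 26) | (1 << 21) | (2 << 16) | (3 << 11) | (0 << 6) | 0x21
fields = decode_fields(r_word)
print(fields["rs"], fields["rt"], fields["rd"], hex(fields["funct"]))  # 1 2 3 0x21
```

Which of the overlapping fields is meaningful depends on the format, which is selected by the op field (and, for R-type instructions, the funct field).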
I-Type (Immediate)
  Bits    31-26  25-21  20-16  15-0
  Field   op     rs     rt     immediate
J-Type (Jump)
  Bits    31-26  25-0
  Field   op     target
R-Type (Register)
  Bits    31-26  25-21  20-16  15-11  10-6  5-0
  Field   op     rs     rt     rd     sa    funct
op          6-bit operation code
rs          5-bit source register specifier
rt          5-bit target (source/destination) register or branch condition
immediate   16-bit immediate value, branch displacement or address displacement
target      26-bit jump target address
rd          5-bit destination register specifier
sa          5-bit shift amount
funct       6-bit function field
Figure 2-1 CPU Instruction Formats
In the MIPS architecture, coprocessor instructions are implementation-dependent; see Appendix A for details of individual Coprocessor 0 instructions.
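The field boundaries in Figure 2-1 can be sketched as plain shifts and masks. The following Python fragment is an illustration only: the function name decode_fields and the dictionary layout are assumptions made for this example, and the sketch extracts every field of all three formats without deciding which format a given opcode actually uses.

```python
def decode_fields(word):
    """Split a 32-bit instruction word into the fields named in Figure 2-1.

    Hypothetical helper: it returns the fields of all three formats at once;
    choosing the right format for a given opcode is left to the caller.
    """
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("instruction must be a 32-bit word")
    return {
        "op": (word >> 26) & 0x3F,     # bits 31-26, all formats
        "rs": (word >> 21) & 0x1F,     # bits 25-21, I-type and R-type
        "rt": (word >> 16) & 0x1F,     # bits 20-16, I-type and R-type
        "rd": (word >> 11) & 0x1F,     # bits 15-11, R-type
        "sa": (word >> 6) & 0x1F,      # bits 10-6,  R-type
        "funct": word & 0x3F,          # bits 5-0,   R-type
        "immediate": word & 0xFFFF,    # bits 15-0,  I-type
        "target": word & 0x3FFFFFF,    # bits 25-0,  J-type
    }


# Build an I-type word from arbitrary field values and read them back.
word = (0x09 << 26) | (4 << 21) | (5 << 16) | 0x1234
fields = decode_fields(word)
```

Because each format reuses the same bit positions for op, rs, and rt, one set of masks serves all three layouts.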
Load and Store Instructions
Load and store are immediate (I-type) instructions that move data between memory and the general registers. The only addressing mode that load and store instructions directly support is base register plus 16-bit signed immediate offset.
Scheduling a Load Delay Slot
A load instruction that does not allow its result to be used by the instruction immediately following is called a delayed load instruction. The instruction slot immediately following this delayed load instruction is referred to as the load delay slot.
In the R4000 processor, the instruction immediately following a load instruction can use the contents of the loaded register; in such cases, however, hardware interlocks insert additional real cycles. Consequently, scheduling load delay slots can be desirable, both for performance and for R-Series processor compatibility, although it is not strictly required.
Defining Access Types
Access type indicates the size of an R4000 processor data item to be loaded or stored, set by the load or store instruction opcode. Access types are defined in Appendix A.
Regardless of access type or byte ordering (endianness), the address given specifies the low-order byte in the addressed field. For a big-endian configuration, the low-order byte is the most-significant byte; for a little-endian configuration, the low-order byte is the least-significant byte.
The access type, together with the three low-order bits of the address, define the bytes accessed within the addressed doubleword (shown in Table 2-1). Only the combinations shown in Table 2-1 are permissible; other combinations cause address error exceptions. See Appendix A for individual descriptions of CPU load and store instructions.
† Data formats are described in Chapter 1.
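Table 2-1 Byte Access within a Doubleword
Byte lanes are listed from bit 63 (left) to bit 0 (right); a dash marks a lane that is not accessed.
Access Type        Low-Order      Bytes Accessed     Bytes Accessed
Mnemonic (Value)   Address Bits   Big endian         Little endian
                   2 1 0          (63----31----0)    (63----31----0)
Doubleword (7)     0 0 0          0 1 2 3 4 5 6 7    7 6 5 4 3 2 1 0
Septibyte (6)      0 0 0          0 1 2 3 4 5 6 -    - 6 5 4 3 2 1 0
                   0 0 1          - 1 2 3 4 5 6 7    7 6 5 4 3 2 1 -
Sextibyte (5)      0 0 0          0 1 2 3 4 5 - -    - - 5 4 3 2 1 0
                   0 1 0          - - 2 3 4 5 6 7    7 6 5 4 3 2 - -
Quintibyte (4)     0 0 0          0 1 2 3 4 - - -    - - - 4 3 2 1 0
                   0 1 1          - - - 3 4 5 6 7    7 6 5 4 3 - - -
Word (3)           0 0 0          0 1 2 3 - - - -    - - - - 3 2 1 0
                   1 0 0          - - - - 4 5 6 7    7 6 5 4 - - - -
Triplebyte (2)     0 0 0          0 1 2 - - - - -    - - - - - 2 1 0
                   0 0 1          - 1 2 3 - - - -    - - - - 3 2 1 -
                   1 0 0          - - - - 4 5 6 -    - 6 5 4 - - - -
                   1 0 1          - - - - - 5 6 7    7 6 5 - - - - -
Halfword (1)       0 0 0          0 1 - - - - - -    - - - - - - 1 0
                   0 1 0          - - 2 3 - - - -    - - - - 3 2 - -
                   1 0 0          - - - - 4 5 - -    - - 5 4 - - - -
                   1 1 0          - - - - - - 6 7    7 6 - - - - - -
Byte (0)           0 0 0          0 - - - - - - -    - - - - - - - 0
                   0 0 1          - 1 - - - - - -    - - - - - - 1 -
                   0 1 0          - - 2 - - - - -    - - - - - 2 - -
                   0 1 1          - - - 3 - - - -    - - - - 3 - - -
                   1 0 0          - - - - 4 - - -    - - - 4 - - - -
                   1 0 1          - - - - - 5 - -    - - 5 - - - - -
                   1 1 0          - - - - - - 6 -    - 6 - - - - - -
                   1 1 1          - - - - - - - 7    7 - - - - - - -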
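The rule behind Table 2-1 can be sketched as a lookup: the access type value is one less than the number of bytes, and only the listed low-order address bits are legal for each type. In the Python fragment below, the names VALID_OFFSETS and bytes_accessed are assumptions made for illustration, and the ValueError merely stands in for the address error exception the processor would take.

```python
# Legal values of address bits 2..0 for each access type value (Table 2-1).
VALID_OFFSETS = {
    7: (0,),                       # Doubleword
    6: (0, 1),                     # Septibyte
    5: (0, 2),                     # Sextibyte
    4: (0, 3),                     # Quintibyte
    3: (0, 4),                     # Word
    2: (0, 1, 4, 5),               # Triplebyte
    1: (0, 2, 4, 6),               # Halfword
    0: (0, 1, 2, 3, 4, 5, 6, 7),   # Byte
}


def bytes_accessed(access_type, address):
    """Return the byte numbers touched within the addressed doubleword."""
    offset = address & 0x7                      # three low-order address bits
    if offset not in VALID_OFFSETS[access_type]:
        # Stand-in for the address error exception described in the text.
        raise ValueError("combination not permitted by Table 2-1")
    return list(range(offset, offset + access_type + 1))


triple = bytes_accessed(2, 0x1005)   # Triplebyte at low-order bits 101
```

The same byte numbers apply to both byte orderings; endianness only changes which bit lanes of the doubleword those bytes occupy.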
Computational Instructions
Computational instructions can be either in register (R-type) format, in which both operands are registers, or in immediate (I-type) format, in which one operand is a 16-bit immediate.
Computational instructions perform the following operations on register values:
arithmetic
logical
shift
multiply
divide
These operations fit in the following four categories of computational instructions:
ALU Immediate instructions
three-Operand Register-Type instructions
shift instructions
multiply and divide instructions
64-bit Operations
When operating in 64-bit mode, 32-bit operands must be sign-extended. The result of an operation that uses an incorrectly sign-extended 32-bit value is unpredictable.
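As an illustration of what a correctly sign-extended 32-bit operand looks like in a 64-bit register, the following Python sketch models the register as an unsigned 64-bit integer. The helper names are assumptions made for this example and do not correspond to any instruction.

```python
MASK64 = (1 << 64) - 1


def sign_extend_32_to_64(value):
    """Copy bit 31 of a 32-bit value into bits 63..32."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        value |= 0xFFFFFFFF00000000
    return value


def is_sign_extended_32(reg64):
    """True if the 64-bit register image holds a sign-extended 32-bit value."""
    reg64 &= MASK64
    return sign_extend_32_to_64(reg64) == reg64


negative = sign_extend_32_to_64(0x80000000)
```

A register image such as 0x0000000080000000 fails the check: bit 31 is set but bits 63 through 32 are clear, which is the kind of operand the text warns about.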
Cycle Timing for Multiply and Divide Instructions
Any multiply instruction in the integer pipeline is transferred to the multiplier as remaining instructions continue through the pipeline; the product of the multiply instruction is saved in the HI and LO registers.
If the multiply instruction is followed by an MFHI or MFLO before the product is available, the pipeline interlocks until this product does become available.
Table 2-2 gives the execution time for integer multiply and divide operations. The “Total Cycles” column gives the total number of cycles required to execute the instruction. The “Overlap” column gives the number of cycles that overlap other CPU operations; that is, the number of cycles required between the present instruction and a subsequent MFHI or MFLO without incurring an interlock. If this value is zero, the operation is not performed in parallel with any other CPU operation.
Table 2-2 Multiply/Divide Instruction Cycle Timing
Instruction   Total Cycles   Overlap
MULT          12             10
MULTU         12             10
DIV           75             0
DIVU          75             0
DMULT         20             18
DMULTU        20             18
DDIV          139            0
DDIVU         139            0
For more information about computational instructions, refer to the individual instruction as described in Appendix A.
Jump and Branch Instructions
Jump and branch instructions change the control flow of a program. All jump and branch instructions occur with a delay of one instruction: that is, the instruction immediately following the jump or branch (this is known as the instruction in the delay slot) always executes while the target instruction is being fetched from storage.
Overview of Jump Instructions
Subroutine calls in high-level languages are usually implemented with Jump or Jump and Link instructions, both of which are J-type instructions. In J-type format, the 26-bit target address shifts left 2 bits and combines with the high-order 4 bits of the current program counter to form an absolute address.
Returns, dispatches, and large cross-page jumps are usually implemented with the Jump Register or Jump and Link Register instructions. Both are R-type instructions that take the 32-bit or 64-bit byte address contained in one of the general purpose registers.
For more information about jump instructions, refer to the individual instruction as described in Appendix A.
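The J-type address formation described above can be sketched as follows for a 32-bit address. The function name jump_target is an assumption for this example, and pc stands for whatever program counter value Appendix A specifies for the instruction in question; the sketch only shows the shift and the combination with the high-order 4 bits.

```python
def jump_target(pc, target26):
    """Form a 32-bit absolute address from a 26-bit J-type target field.

    The target is shifted left 2 bits (giving a 28-bit byte offset) and
    combined with the high-order 4 bits of pc.
    """
    low28 = (target26 & 0x3FFFFFF) << 2
    return (pc & 0xF0000000) | low28


dest = jump_target(0x80001000, 0x0000100)
```

Because only the low 28 bits come from the instruction, a J-type jump cannot leave the current 256-Mbyte-aligned region, which is one reason the register-based jump instructions are used for larger jumps.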
Overview of Branch Instructions
All branch instruction target addresses are computed by adding the address of the instruction in the delay slot to the 16-bit offset (shifted left 2 bits and sign-extended to 32 bits). All branches occur with a delay of one instruction.
If a conditional Branch Likely instruction is not taken, the instruction in the delay slot is nullified.
For more information about branch instructions, refer to the individual instruction as described in Appendix A.
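The branch target calculation described above can be sketched in Python for 32-bit addresses. The function name branch_target is an assumption for this example, and the sketch assumes the delay slot instruction sits 4 bytes after the branch, since every instruction is a single 32-bit word.

```python
def branch_target(branch_pc, offset16):
    """Delay-slot address plus the 16-bit offset, shifted left 2 and sign-extended."""
    delay_slot = (branch_pc + 4) & 0xFFFFFFFF   # instruction after the branch
    disp = (offset16 & 0xFFFF) << 2             # 18-bit byte displacement
    if disp & 0x20000:                          # sign bit of the shifted offset
        disp -= 0x40000
    return (delay_slot + disp) & 0xFFFFFFFF


forward = branch_target(0x1000, 0x0003)    # three instructions past the delay slot
backward = branch_target(0x1000, 0xFFFF)   # offset of -1 targets the branch itself
```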
† Taken branches have a 3 cycle penalty in this implementation. See Chapter 3 for more information.
Special Instructions
Special instructions allow the software to initiate traps; they are always R-type. For more information about special instructions, refer to the individual instruction as described in Appendix A.
Exception Instructions
Exception instructions are extensions to the MIPS ISA. For more information about exception instructions, refer to the individual instruction as described in Appendix A.
Coprocessor Instructions
Coprocessor instructions perform operations in their respective coprocessors. Coprocessor loads and stores are I-type, and coprocessor computational instructions have coprocessor-dependent formats.
Individual coprocessor instructions are described in Appendices A (for CP0) and B (for the FPU, CP1).
CP0 instructions perform operations specifically on the System Control Coprocessor registers to manipulate the memory management and exception handling facilities of the processor. Appendix A details CP0 instructions.
3 The CPU Pipeline
This chapter describes the basic operation of the CPU pipeline, including the delay instructions (instructions that follow a branch or load instruction in the pipeline), the interruptions to pipeline flow caused by interlocks and exceptions, and the R4400 implementation of an uncached store buffer.
The FPU pipeline is described in Chapter 6.
3.1 CPU Pipeline Operation
The CPU has an eight-stage instruction pipeline; each stage takes one PCycle (one cycle of PClock, which runs at twice the frequency of MasterClock). Thus, the execution of each instruction takes at least eight PCycles (four MasterClock cycles). An instruction can take longer—for example, if the required data is not in the cache, the data must be retrieved from main memory.
Once the pipeline has been filled, eight instructions are executed simultaneously. Figure 3-1 shows the eight stages of the instruction pipeline; the next section describes the pipeline stages.
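[Figure: eight instructions, each started one PCycle after the previous one; one MasterClock cycle spans two PCycles. In the current CPU cycle (rightmost column of the first row) all eight stages are occupied.]
Instruction 1   IF IS RF EX DF DS TC WB
Instruction 2      IF IS RF EX DF DS TC WB
Instruction 3         IF IS RF EX DF DS TC WB
Instruction 4            IF IS RF EX DF DS TC WB
Instruction 5               IF IS RF EX DF DS TC WB
Instruction 6                  IF IS RF EX DF DS TC WB
Instruction 7                     IF IS RF EX DF DS TC WB
Instruction 8                        IF IS RF EX DF DS TC WB
Figure 3-1 Instruction Pipeline Stages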
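The overlap shown in Figure 3-1 can be modelled with a few lines of Python. This is a sketch under simplifying assumptions that are not part of the manual: one instruction enters IF every PCycle and nothing stalls or slips. The names stage_of and in_flight are hypothetical.

```python
STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]


def stage_of(instr_index, pcycle):
    """Stage occupied by an instruction during a PCycle, or None if not in the pipe.

    Instruction n is assumed to enter IF in PCycle n (no stalls, no slips).
    """
    position = pcycle - instr_index
    if 0 <= position < len(STAGES):
        return STAGES[position]
    return None


def in_flight(pcycle, issued):
    """Map each instruction active in the given PCycle to its stage."""
    active = {}
    for index in range(issued):
        stage = stage_of(index, pcycle)
        if stage is not None:
            active[index] = stage
    return active


snapshot = in_flight(7, issued=20)   # first PCycle in which the pipeline is full
```

In PCycle 7 of this model, instruction 0 is in WB and instruction 7 is in IF, so eight instructions are in progress at once.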
3.2 CPU Pipeline Stages
This section describes each of the eight pipeline stages:
IF - Instruction Fetch, First Half
IS - Instruction Fetch, Second Half
RF - Register Fetch
EX - Execution
DF - Data Fetch, First Half
DS - Data Fetch, Second Half
TC - Tag Check
WB - Write Back
IF - Instruction Fetch, First Half
During the IF stage, the following occurs:
Branch logic selects an instruction address and the instruction cache fetch begins.
The instruction translation lookaside buffer (ITLB) begins the virtual-to-physical address translation.
IS - Instruction Fetch, Second Half
During the IS stage, the instruction cache fetch and the virtual-to-physical address translation are completed.
RF - Register Fetch
During the RF stage, the following occurs:
The instruction decoder (IDEC) decodes the instruction and checks for interlock conditions.
The instruction cache tag is checked against the page frame number obtained from the ITLB.
Any required operands are fetched from the register file.
EX - Execution
During the EX stage, one of the following occurs:
The arithmetic logic unit (ALU) performs the arithmetic or logical operation for register-to-register instructions.
The ALU calculates the data virtual address for load and store instructions.
The ALU determines whether the branch condition is true and calculates the virtual branch target address for branch instructions.
DF - Data Fetch, First Half
During the DF stage, one of the following occurs:
The data cache fetch and the data virtual-to-physical translation begin for load and store instructions.
The branch instruction address translation and translation lookaside buffer (TLB)† update begin for branch instructions.
No operations are performed during the DF, DS, and TC stages for register-to-register instructions.
DS - Data Fetch, Second Half
During the DS stage, one of the following occurs:
The data cache fetch and data virtual-to-physical translation are completed for load and store instructions. The Shifter aligns data to its word or doubleword boundary.
The branch instruction address translation and TLB update are completed for branch instructions.
TC - Tag Check
For load and store instructions, the cache performs the tag check during the TC stage. The physical address from the TLB is checked against the cache tag to determine if there is a hit or a miss.
† The TLB is described in Chapter 4.
WB - Write Back
For register-to-register instructions, the instruction result is written back to the register file during the WB stage. Branch instructions perform no operation during this stage.
Figure 3-2 shows the activities occurring during each ALU pipeline stage, for load, store, and branch instructions.
[Figure: activity chart with one column per pipeline stage (IF IS RF EX DF DS TC WB, each divided into clock phases 1 and 2) and one row for each of IFetch and Decode, ALU, Load/Store, and Branch. The abbreviations used in the chart are listed below.]
IC1     Instruction cache access stage 1
IC2     Instruction cache access stage 2
ITLB1   Instruction address translation stage 1
ITLB2   Instruction address translation stage 2
ITC     Instruction tag check
IDEC    Instruction decode
RF      Register operand fetch
ALU     Operation
DVA     Data virtual address calculation
DC1     Data cache access stage 1
DC2     Data cache access stage 2
LSA     Data load or store align
JTLB1   Data/Instruction address translation stage 1
JTLB2   Data/Instruction address translation stage 2
DTC     Data tag check
IVA     Instruction virtual address calculation
WB      Write back to register file
Figure 3-2 CPU Pipeline Activities
3.3 Branch Delay
The CPU pipeline has a branch delay of three cycles and a load delay of two cycles. The three-cycle branch delay is a result of the branch comparison logic operating during the EX pipeline stage of the branch, producing an instruction address that is available in the IF stage, four instructions later.
Figure 3-3 illustrates the branch delay.
branch            IF IS RF EX DF DS TC WB
delay slot 1         IF IS RF EX DF DS TC WB
delay slot 2            IF IS RF EX DF DS TC WB
delay slot 3               IF IS RF EX DF DS TC WB
target                        IF IS RF EX DF DS TC WB
(three branch delay instructions separate the branch from its target)
Figure 3-3 CPU Pipeline Branch Delay
3.4 Load Delay
The completion of a load at the end of the DS pipeline stage produces an operand that is available for the EX pipeline stage of the third subsequent instruction.
Figure 3-4 shows the load delay of two pipeline stages.
load              IF IS RF EX DF DS TC WB
delay slot 1         IF IS RF EX DF DS TC WB
delay slot 2            IF IS RF EX DF DS TC WB
f(load)                    IF IS RF EX DF DS TC WB
(two load delay instructions separate the load from the first instruction that uses its result)
Figure 3-4 CPU Pipeline Load Delay
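Both delays follow from the same arithmetic on stage positions, which the Python sketch below works through. It assumes an ideal pipeline in which each instruction starts one cycle after the previous one; the function name delay_instructions is hypothetical.

```python
STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]


def delay_instructions(produced_in, needed_in):
    """Instructions that must separate a producer from its first consumer.

    The producer's result is ready at the end of stage produced_in; the
    consumer needs it at the start of stage needed_in. A consumer issued k
    instructions later reaches needed_in in cycle k + index(needed_in), which
    must be later than index(produced_in), so k is at least the difference
    of the two indexes plus one, and k - 1 instructions sit in between.
    """
    gap = STAGES.index(produced_in) - STAGES.index(needed_in)
    return max(0, gap)


branch_delay = delay_instructions("EX", "IF")   # branch resolved in EX, target fetched in IF
load_delay = delay_instructions("DS", "EX")     # load data ready after DS, used in EX
```

The results match the three-cycle branch delay and two-cycle load delay stated in the text.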
3.5 Interlock and Exception Handling
Smooth pipeline flow is interrupted when cache misses or exceptions occur, or when data dependencies are detected. Interruptions handled using hardware, such as cache misses, are referred to as interlocks, while those that are handled using software are called exceptions.
As shown in Figure 3-5, all interlock and exception conditions are collectively referred to as faults.
Faults
  Software: Exceptions
  Hardware: Interlocks
    Stalls
    Slips
Figure 3-5 Interlocks, Exceptions, and Faults
There are two types of interlocks:
stalls, which are resolved by halting the pipeline
slips, which require one part of the pipeline to advance while another part of the pipeline is held static
At each cycle, exception and interlock conditions are checked for all active instructions.
Because each exception or interlock condition corresponds to a particular pipeline stage, a condition can be traced back to the particular instruction in the exception/interlock stage, as shown in Figure 3-6. For instance, an Illegal Instruction (II) exception is raised in the execution (EX) stage.
Tables 3-1 and 3-2 describe the pipeline interlocks and exceptions listed in Figure 3-6.
[Figure: for each pipeline stage (IF IS RF EX DF DS TC WB, each spanning clock phases 1 and 2), the stall, slip, and exception conditions associated with that stage. The conditions shown in each row are listed below.]
Stall*       ITM, ICM, CPBE, SXT, STI, DCM, WA
Slip         LDI, MultB, DivB, MDOne, ShSlip, FCBsy
Exceptions   ITLB, Intr, IBE, IVACoh, II, BP, SC, CUn, IECCErr, OVF, FPE, ExTrap, DTLB, TLBMod, DBE, Watch, DVACoh, DECCErr, NMI, Reset
*MP stalls can occur at any stage; they are not associated with any instruction or pipe stage
Figure 3-6 Correspondence of Pipeline Stage to Interlock Condition
Table 3-1 Pipeline Exceptions
Exception   Description
ITLB        Instruction Translation or Address Exception
Intr        External Interrupt
IBE         IBus Error
IVACoh      IVA Coherent
II          Illegal Instruction
BP          Breakpoint
SC          System Call
CUn         Coprocessor Unusable
IECCErr     Instruction ECC Error
OVF         Integer Overflow
FPE         FP Interrupt
ExTrap      EX Stage Traps
DTLB        Data Translation or Address Exception
TLBMod      TLB Modified
DBE         Data Bus Error
Watch       Memory Reference Address Compare
DVACoh      DVA Coherent
DECCErr     Data ECC Error
NMI         Non-maskable Interrupt
Reset       Reset
Table 3-2 Pipeline Interlocks
Interlock   Description
ITM         Instruction TLB Miss
ICM         Instruction Cache Miss
CPBE        Coprocessor Possible Exception
SXT         Integer Sign Extend
STI         Store Interlock
DCM         Data Cache Miss
WA          Watch Address Exception
LDI         Load Interlock
MultB       Multiply Unit Busy
DivB        Divide Unit Busy
MDOne       Mult/Div One Cycle Slip
ShSlip      Var Shift or Shift > 32 bits
FCBsy       FP Busy
Exception Conditions
When an exception condition occurs, the relevant instruction and all those that follow it in the pipeline are cancelled. Accordingly, any stall conditions and any later exception conditions that may have referenced this instruction are inhibited; there is no benefit in servicing stalls for a cancelled instruction.
After instruction cancellation, a new instruction stream begins, starting execution at a predefined exception vector. System Control Coprocessor registers are loaded with information that identifies the type of exception and auxiliary information such as the virtual address at which translation exceptions occur.
Stall Conditions
Often, a stall condition is only detected after parts of the pipeline have advanced using incorrect data; this is called a pipeline overrun. When a stall condition is detected, all eight instructions, each in a different stage of the pipeline, are frozen at once. In this stalled state, no pipeline stages can advance until the interlock condition is resolved.
Once the interlock is removed, the restart sequence begins two cycles before the pipeline resumes execution. The restart sequence reverses the pipeline overrun by inserting the correct information into the pipeline.
Slip Conditions
When a slip condition is detected, pipeline stages that must advance to resolve the dependency continue to be retired (completed), while dependent stages are held until the required data is available.
External Stalls
External stalls are another class of interlock. An external stall originates outside the processor and is not referenced to a particular pipeline stage. This interlock is not affected by exceptions.
Interlock and Exception Timing
To prevent interlock and exception handling from adversely affecting the processor cycle time, the R4000 processor uses both logic and circuit pipeline techniques to reduce critical timing paths. Interlock and exception handling have the following effects on the pipeline:
In some cases, the processor pipeline must be backed up (reversed and started over again from a prior stage) to recover from interlocks.
In some cases, interlocks are serviced for instructions that will be aborted, due to an exception.
These two cases are discussed below.
Backing Up the Pipeline
An example of pipeline back-up occurs in a data cache miss, in which the late detection of the miss causes a subsequent instruction to compute an incorrect result.
When this occurs, not only must the cache miss be serviced but the EX stage of the dependent instruction must be re-executed before the pipeline can be restarted. Figure 3-7 illustrates this procedure; a minus (–) after the pipeline stage descriptor (for instance, EX–) indicates the operation produced an incorrect result, while a plus (+) indicates the successful re-execution of that operation.
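[Figure: timing diagram of a load that misses in the data cache, followed by a dependent ALU instruction. The pipeline state per cycle and the stage sequence of each instruction are listed below.]
Cycle     Run Run Run Run Run Run Run Stl Stl Stl Stl Stl Run Run Run Run Run
Restart   Rst2 Rst1
Load      IF IS RF EX DF DS TC DF DS TC WB
          IF IS RF EX DF DS DF DS TC WB
          IF IS RF EX DF DF DS TC WB
ALU       IF IS RF EX- RF EX+ DF DS TC WB
          IF IS RF EX DF DS TC WB
Figure 3-7 Pipeline Overrun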
Aborting an Instruction Subsequent to an Interlock
The interaction between an integer overflow and an instruction cache miss is an example of an interlock being serviced for an instruction that is subsequently aborted.
In this case, pipelining the overflow exception handling into the DF stage allows an instruction cache miss to occur on the immediately following instruction. Figure 3-8 illustrates this; aborted instructions are indicated with an asterisk (*).
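[Figure: timing diagram of an ALU instruction that overflows (OVF), followed by an instruction that misses in the instruction cache (ICM). The pipeline state per cycle and the stage sequence of each instruction are listed below.]
Cycle     Run Run Run Run Stl Stl Stl Stl Stl Run Run Run Run Run Run Run
Stall     InstrCacheMiss
Restart   Rst2 Rst1
ALU       IF IS RF EX DF DS TC WB*                (OVF)
          IF IS RF IF IS RF EX DF DS TC WB*       (ICM)
          IF IS IF IS RF EX DF DS TC WB*
          IF IF IS RF EX DF DS TC WB*
Figure 3-8 Instruction Cache Miss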
Even though the line brought in by the instruction cache could have been replaced by a line of the exception handler, no performance loss occurs, since the instruction cache miss would have been serviced anyway, after returning from the exception handler. Handling of the exception is done in this fashion because the frequency of an exception occurring is, by definition, relatively low.
Pipelining the Exception Handling
Pipelining of interlock and exception handling is done by pipelining the logical resolution of possible fault conditions with the buffering and distributing of the pipeline control signals.
In particular, a half clock period is provided for buffering and distributing the run control signal; during this time the logic evaluation to produce run for the next cycle begins. Figure 3-9 shows this process for a sequence of loads.
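[Figure: three consecutive loads, each offset by one clock cycle (two phases). During the TC stage and the first half of the WB stage of each load, the tag check (TagCk), fault resolution (Resolve), and run-signal buffering (Buffer) steps follow one another.]
Load1:   DF DS TC WB    TagCk Resolve Buffer
Load2:      DF DS TC WB    TagCk Resolve Buffer
Load3:         DF DS TC WB    TagCk Resolve Buffer
Figure 3-9 Pipelining of Interlock and Exception Handling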
The decision whether or not to advance the pipeline is derived from these three rules:
All possible fault-causing events, such as cache misses, translation exceptions, load interlocks, etc., must be individually evaluated.
The fault to be serviced is selected, based on a predefined priority as determined by the pipeline stage of the asserted faults.
Pipeline advance control signals are buffered and distributed.
Figure 3-10 illustrates this process.
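[Figure: four consecutive Run cycles, each divided into clock phases 1 and 2. An Evaluate, Resolve, Buffer sequence begins in each cycle and overlaps the sequence started in the previous cycle.]
Cycle   Run Run Run Run
        Evaluate Resolve Buffer
           Evaluate Resolve Buffer
              Evaluate Resolve Buffer
Figure 3-10 Pipeline Advance Decision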
Special Cases
In some instances, the pipeline control state machine is bypassed. This occurs due to performance considerations or to correctness considerations, which are described in the following sections.
Performance Considerations
A performance consideration occurs when there is a cache load miss. By bypassing the pipeline state machine, it is possible to eliminate up to two cycles of load miss latency. Two techniques, address acceleration and address prediction, increase performance.
Address Acceleration
Address acceleration bypasses a potential cache miss address. It is relatively straightforward to perform this bypass since sending the cache miss address to the secondary cache has no negative impact even if a subsequent exception nullifies the effect of this cache access. Power is wasted when the miss is inhibited by some fault, but this is a minor effect.
Address Prediction
Another technique used to reduce miss latency is the automatic increment and transmission of instruction miss addresses following an instruction cache miss. This form of latency reduction is called address prediction: the subsequent instruction miss address is predicted to be a simple increment of the previous miss address. Figure 3-11 shows a cache miss in which the cache miss address is changed based on the detection of the miss.
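[Figure: timing diagram of a load that misses in the cache; the cache index is driven on the address lines when the miss is detected. The pipeline state per cycle and the stage sequence of the load are listed below.]
Cycle     Run Run Run Run Run Run Run Stl Stl Stl Stl Stl Stl Stl Stl Run
Address   Cache Index
Restart   Rst1 Rst2 Rst3
Load      IF IS RF EX DF DS TC DF DS TC WB
Figure 3-11 Load Address Bypassing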
Correctness Considerations
An example in which bypassing is necessary to guarantee correctness is a cache write.
3.6 R4400 Processor Uncached Store Buffer
The R4400 processor contains an uncached store buffer to improve the performance of uncached stores over that available from an R4000 processor. In the R4000 processor, when an uncached store reaches the write-back (WB) stage in the CPU pipeline, the CPU must stall until the store is sent off-chip. In the R4400 processor, a single-entry buffer holds this uncached WB-stage data on the chip without stalling the pipeline.
If a second uncached store reaches the WB stage in the R4400 processor before the first uncached store has been moved off-chip, the CPU stalls until the store buffer completes the first uncached store. To avoid this stall, the compiler can insert seven instruction cycles between the two uncached stores, as shown in Figure 3-12. A single instruction that requires seven cycles to complete could be used in place of the seven No Operation (NOP) instructions.
SW   R2, (R3)   # uncached store
NOP             # NOP 1
NOP             # NOP 2
NOP             # NOP 3
NOP             # NOP 4
NOP             # NOP 5
NOP             # NOP 6
NOP             # NOP 7
SW   R2, (R3)   # uncached store
Figure 3-12 Pipeline Sequence for Back-to-Back Uncached Stores
If the two uncached stores execute within a loop, the two killed instructions that are part of the loop branch latency count toward the seven intervening cycles. Figure 3-13 shows the four NOP instructions that need to be scheduled in this case.
Loop: SW   R2, (R3)   # uncached store
      NOP
      NOP
      NOP
      B    Loop       # branch to loop
      NOP
      killed          # branch latency
      killed          # branch latency
Figure 3-13 Back-to-Back Uncached Stores in a Loop
The timing requirements of the System interface govern the latency between uncached stores; back-to-back stores can be sent across the interface at a maximum rate of one store for every four external cycles. If the R4400 processor is programmed to run in divide-by-2 mode (for more information about divided clock, see the description of SClock in Chapter 10), an uncached store can occur every eight pipeline cycles. If a larger clock divisor is used, more pipeline cycles are required for each store.
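The spacing rule can be turned into a small calculation. The Python sketch below reproduces the divide-by-2 figures given in the text (eight pipeline cycles per store, seven intervening cycles, four NOPs inside a loop); applying the same formula to other divisors is an extrapolation made for this example, and the function names are hypothetical.

```python
EXTERNAL_CYCLES_PER_STORE = 4   # maximum uncached store rate stated in the text


def pcycles_per_uncached_store(clock_divisor):
    """Pipeline cycles between the starts of two back-to-back uncached stores.

    Assumes each external cycle lasts clock_divisor pipeline cycles.
    """
    return EXTERNAL_CYCLES_PER_STORE * clock_divisor


def nops_between_stores(clock_divisor, in_loop=False):
    """NOPs to schedule between two uncached stores to avoid the stall."""
    gap = pcycles_per_uncached_store(clock_divisor) - 1   # the store itself takes one slot
    if in_loop:
        gap -= 3   # the branch plus the two killed instructions fill three slots
    return max(0, gap)


straight_line = nops_between_stores(2)
loop_body = nops_between_stores(2, in_loop=True)
```

In the loop case the count includes the NOP in the branch delay slot, which is why Figure 3-13 shows three NOPs before the branch and one after it.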
CAUTION: The R4000 processor always had a strongly-ordered execution; however, with the addition of the uncached store buffer in the R4400 there is a potential for out-of-order execution (described in the section of the same name in Chapter 11, and Uncached Loads or Stores in Chapter 12).
4 Memory Management
The MIPS R4000 processor provides a full-featured memory management unit (MMU) which uses an on-chip translation lookaside buffer (TLB) to translate virtual addresses into physical addresses.
This chapter describes the processor virtual and physical address spaces, the virtual-to-physical address translation, the operation of the TLB in making these translations, and those System Control Coprocessor (CP0) registers that provide the software interface to the TLB.
4.1 Translation Lookaside Buffer (TLB)
Mapped virtual addresses are translated into physical addresses using an on-chip TLB.† The TLB is a fully associative memory that holds 48 entries, which provide mapping to 48 odd/even page pairs (96 pages). When address mapping is indicated, each TLB entry is checked simultaneously for a match with the virtual address that is extended with an ASID stored in the EntryHi register.
Page sizes range from 4 Kbytes to 16 Mbytes, increasing by factors of 4: 4K, 16K, 64K, 256K, 1M, 4M, and 16M.
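The seven page sizes form a geometric series, which the following Python sketch enumerates along with the number of offset bits each size implies. The names PAGE_SIZES and offset_bits are assumptions made for this example.

```python
# 4 Kbytes multiplied by successive powers of 4, up to 16 Mbytes.
PAGE_SIZES = [4 * 1024 * 4 ** n for n in range(7)]


def offset_bits(page_size):
    """Number of low-order address bits that select a byte within the page."""
    if page_size not in PAGE_SIZES:
        raise ValueError("not one of the seven supported page sizes")
    return page_size.bit_length() - 1   # page sizes are powers of two


smallest = offset_bits(PAGE_SIZES[0])
largest = offset_bits(PAGE_SIZES[-1])
```

Each step up in page size adds two bits to the offset and removes two bits from the virtual page number.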
Hits and Misses
If there is a virtual address match, or hit, in the TLB, the physical page number is extracted from the TLB and concatenated with the offset to form the physical address (see Figure 4-1).
If no match occurs (TLB miss), an exception is taken and software refills the TLB from the page table resident in memory. Software can write over a selected TLB entry or use a hardware mechanism to write into a random entry.
Multiple Matches
If more than one entry in the TLB matches the virtual address being translated, the operation is undefined. To prevent permanent damage to the part, the TLB may be disabled if more than several entries match. The TLB-Shutdown (TS) bit in the Status register is set to 1 if the TLB is disabled.
† There are virtual-to-physical address translations that occur outside of the TLB. For example, addresses in the kseg0 and kseg1 spaces are unmapped translations. In these spaces the physical address is derived by subtracting the base address of the space from the virtual address.
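The subtraction described in the footnote can be sketched as follows for 32-bit mode. The base addresses and the 512-Mbyte segment size used here are not given in this section; they are the standard 32-bit MIPS values for kseg0 and kseg1 and are stated as assumptions of the example, as is the function name.

```python
# Assumed 32-bit mode segment bases (not stated in this section).
KSEG0_BASE = 0x80000000
KSEG1_BASE = 0xA0000000
KSEG_SIZE = 0x20000000   # 512 Mbytes per segment


def unmapped_to_physical(virtual_address):
    """Translate a kseg0 or kseg1 address by subtracting the segment base."""
    for base in (KSEG0_BASE, KSEG1_BASE):
        if base <= virtual_address < base + KSEG_SIZE:
            return virtual_address - base
    raise ValueError("address is not in kseg0 or kseg1; it is translated by the TLB or invalid")


physical = unmapped_to_physical(0xA0001000)
```

Under these assumptions, the kseg0 address 0x80001000 and the kseg1 address 0xA0001000 both refer to physical address 0x00001000.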
4.2 Address Spaces
This section describes the virtual and physical address spaces and the manner in which virtual addresses are converted or “translated” into physical addresses in the TLB.
Virtual Address Space
The processor virtual address can be either 32 or 64 bits wide,† depending on whether the processor is operating in 32-bit or 64-bit mode.
In 32-bit mode, addresses are 32 bits wide. The maximum user process size is 2 gigabytes (2^31).
In 64-bit mode, addresses are 64 bits wide. The maximum user process size is 1 terabyte (2^40).
Figure 4-1 shows the translation of a virtual address into a physical address.
[Figure: a virtual address made up of the G, ASID, VPN, and Offset fields is compared with a TLB entry holding G, ASID, VPN, and PFN; the physical address is the PFN followed by the Offset.]
1. Virtual address (VA) represented by the virtual page number (VPN) is compared with tag in TLB.
2. If there is a match, the page frame number (PFN) representing the upper bits of the physical address (PA) is output from the TLB.
3. The Offset, which does not pass through the TLB, is then concatenated to the PFN.
Figure 4-1 Overview of a Virtual-to-Physical Address Translation
† Figure 4-8 shows the 32-bit and 64-bit versions of the processor TLB entry.
As shown in Figures 4-2 and 4-3, the virtual address is extended with an 8-bit address space identifier (ASID), which reduces the frequency of TLB flushing when switching contexts. This 8-bit ASID is in the CP0 EntryHi register, and the Global bit (G) is in the EntryLo0 and EntryLo1 registers; these registers are described later in this chapter.
Physical Address Space
Using a 36-bit address, the processor physical address space encompasses 64 gigabytes. The following section describes the translation of a virtual address to a physical address.
Virtual-to-Physical Address Translation
Converting a virtual address to a physical address begins by comparing the virtual address from the processor with the virtual addresses in the TLB; there is a match when the virtual page number (VPN) of the address is the same as the VPN field of the entry, and either:
the Global (G) bit of the TLB entry is set, or
the ASID field of the virtual address is the same as the ASID field of the TLB entry.
This match is referred to as a TLB hit. If there is no match, a TLB Miss exception is taken by the processor and software is allowed to refill the TLB from a page table of virtual/physical addresses in memory.
If there is a virtual address match in the TLB, the physical address is output from the TLB and concatenated with the Offset, which represents an address within the page frame space. The Offset does not pass through the TLB.
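The match rule and the concatenation step can be sketched in Python. This is a deliberately reduced model, not the processor's TLB: it assumes a single fixed page size, ignores the odd/even page pairing of real entries, and uses a hypothetical TlbEntry type and TlbMiss exception introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class TlbEntry:
    vpn: int            # virtual page number tag
    asid: int           # address space identifier tag
    global_bit: bool    # G bit: match regardless of ASID
    pfn: int            # page frame number


class TlbMiss(Exception):
    """Stand-in for the TLB Miss exception that lets software refill the TLB."""


def translate(entries, virtual_address, asid, offset_bits=12):
    """Return the physical address for virtual_address, or raise TlbMiss."""
    vpn = virtual_address >> offset_bits
    offset = virtual_address & ((1 << offset_bits) - 1)
    hits = [e for e in entries
            if e.vpn == vpn and (e.global_bit or e.asid == asid)]
    if not hits:
        raise TlbMiss(hex(virtual_address))
    if len(hits) > 1:
        raise RuntimeError("multiple matching entries: operation is undefined")
    # The offset bypasses the TLB and is appended to the PFN unchanged.
    return (hits[0].pfn << offset_bits) | offset


tlb = [TlbEntry(vpn=0x00400, asid=7, global_bit=False, pfn=0x12345),
       TlbEntry(vpn=0x7FFFF, asid=0, global_bit=True, pfn=0x00ABC)]
physical = translate(tlb, 0x00400ABC, asid=7)
```

The second entry matches for every ASID because its G bit is set, while the first matches only when the current ASID is 7.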
Virtual-to-physical translation is described in greater detail throughout the remainder of this chapter; Figure 4-20, at the end of this chapter, is a flow diagram of the process.
The next two sections describe the 32-bit and 64-bit address translations.
32-bit Mode Address Translation
Figure 4-2 shows the virtual-to-physical-address translation of a 32-bit mode address.
The top portion of Figure 4-2 shows a virtual address with a 12-bit, or 4-Kbyte, page size, labelled Offset. The remaining 20 bits of the address represent the VPN, and index the 1M-entry page table.
The bottom portion of Figure 4-2 shows a virtual address with a 24-bit, or 16-Mbyte, page size, labelled Offset. The remaining 8 bits of the address represent the VPN, and index the 256-entry page table.
Virtual Address with 1M (2^20) 4-Kbyte pages
  Bits    39-32   31-12                  11-0
  Field   ASID    VPN                    Offset
  Width   8       20 (20 bits = 1M pages) 12
Virtual Address with 256 (2^8) 16-Mbyte pages
  Bits    39-32   31-24                  23-0
  Field   ASID    VPN                    Offset
  Width   8       8 (8 bits = 256 pages)  24
Bits 31, 30 and 29 of the virtual address select user, supervisor, or kernel address spaces.
In both cases the VPN undergoes virtual-to-physical translation in the TLB to produce the PFN, and the Offset is passed unchanged to physical memory.
36-bit Physical Address
  Bits    35-0
  Field   PFN followed by Offset
Figure 4-2 32-bit Mode Virtual Address Translation
64-bit Mode Address Translation
Figure 4-3 shows the virtual-to-physical-address translation of a 64-bit mode address. This figure illustrates the two extremes in the range of possible page sizes: a 4-Kbyte page (12 bits) and a 16-Mbyte page (24 bits).
The top portion of Figure 4-3 shows a virtual address with a 12-bit, or 4-Kbyte, page size, labelled Offset. The remaining 28 bits of the address represent the VPN, and index the 256M-entry page table.
The bottom portion of Figure 4-3 shows a virtual address with a 24-bit, or 16-Mbyte, page size, labelled Offset. The remaining 16 bits of the address represent the VPN, and index the 64K-entry page table.
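The page-table sizes quoted above follow from the number of VPN bits, which is the translated address width minus the Offset width. A minimal C sketch of that arithmetic follows; the function name is hypothetical, and the 40-bit width for 64-bit mode is inferred from the 28-bit VPN plus 12-bit Offset stated in the text.

```c
#include <stdint.h>

/* Number of page-table entries indexed by the VPN:
 * 2^(va_bits - offset_bits). Assumes va_bits > offset_bits
 * and a difference smaller than 64. */
static uint64_t page_table_entries(unsigned va_bits, unsigned offset_bits)
{
    return UINT64_C(1) << (va_bits - offset_bits);
}
```

With these assumptions, page_table_entries(40, 12) yields 256M entries and page_table_entries(40, 24) yields 64K entries; the 32-bit mode cases (32, 12) and (32, 24) yield 1M and 256 entries.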
Figure 4-3 64-bit Mode Virtual Address Translation
[Figure: Top panel, virtual address with 256M (2^28) 4-Kbyte pages: ASID (8 bits, bits 71:64), a 24-bit upper field (bits 63:40) marked "0 or -1", VPN (28 bits = 256M pages, bits 39:12), Offset (12 bits, bits 11:0). Bottom panel, virtual address with 64K (2^16) 16-Mbyte pages: ASID (8 bits, bits 71:64), a 24-bit upper field (bits 63:40) marked "0 or -1", VPN (16 bits = 64K pages, bits 39:24), Offset (24 bits, bits 23:0). In both panels the TLB performs the virtual-to-physical translation, producing the PFN of the 36-bit physical address (bits 35:0), and the Offset is passed unchanged to physical memory.]
Bits 62 and 63 of the virtual address select user, supervisor, or kernel address spaces.
Operating Modes
The processor has three operating modes that function in both 32- and 64-bit operations:
User mode
Supervisor mode
Kernel mode
These modes are described in the next three sections.
User Mode Operations
In User mode, a single, uniform virtual address space, labelled User segment, is available; its size is:
2 Gbytes (2^31 bytes) in 32-bit mode (useg)
1 Tbyte (2^40 bytes) in 64-bit mode (xuseg)
Figure 4-4 shows User mode virtual address space.
Figure 4-4 User Mode Virtual Address Space
[Figure: address maps; each range runs from the first address up to, but not including, the second unless it ends at the top of the address space.
32-bit*:
0x 8000 0000 to 0x FFFF FFFF: Address Error
0x 0000 0000 up to 0x 8000 0000: 2 GB Mapped (useg)
64-bit:
0x 0000 0100 0000 0000 to 0x FFFF FFFF FFFF FFFF: Address Error
0x 0000 0000 0000 0000 up to 0x 0000 0100 0000 0000: 1 TB Mapped (xuseg)]
*NOTE: The R4000 uses 64-bit addresses internally. When the kernel is running in Kernel mode, it initializes registers before switching modes, and saves (or restores, whichever is appropriate) register values on context switches. In 32-bit mode, a valid address must be a 32-bit signed number, where bits 63:32 = bit 31. In normal operation it is not possible for a 32-bit User-mode program to produce invalid addresses. However, although it would be an error, it is possible for a Kernel-mode program to erroneously place a value that is not a 32-bit signed number into a 64-bit register, in which case the User-mode program generates an invalid address.
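The validity condition in this note (bits 63:32 equal to bit 31) can be expressed as a small C check. This is an illustrative sketch; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when the 64-bit register value is a sign-extended 32-bit number,
 * that is, when bits 63:32 all equal bit 31. */
static bool is_valid_32bit_mode_address(uint64_t reg)
{
    uint64_t upper = reg >> 32;
    bool bit31 = ((reg >> 31) & 1) != 0;
    return upper == (bit31 ? UINT64_C(0xFFFFFFFF) : UINT64_C(0));
}

/* Builds the valid 64-bit form of a 32-bit address by sign extension. */
static uint64_t sign_extend_32(uint32_t addr)
{
    if (addr & UINT32_C(0x80000000))
        return UINT64_C(0xFFFFFFFF00000000) | addr;
    return addr;
}
```

For example, 0xFFFFFFFF80000000 passes the check, while 0x0000000080000000 does not, because its upper 32 bits are 0 although bit 31 is 1.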
The User segment starts at address 0 and the current active user process resides in either useg (in 32-bit mode) or xuseg (in 64-bit mode). The TLB identically maps all references to useg/xuseg from all modes, and controls cache accessibility.
The processor operates in User mode when the Status register contains the following bit-values:
KSU bits = 10 (binary)
EXL = 0
ERL = 0
In conjunction with these bits, the UX bit in the Status register selects between 32- or 64-bit User mode addressing as follows:
when UX = 0, 32-bit useg space is selected and TLB misses are handled by the 32-bit TLB refill exception handler
when UX = 1, 64-bit xuseg space is selected and TLB misses are handled by the 64-bit XTLB refill exception handler
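The User mode conditions above can be sketched as a decode of the Status register value. The bit positions used here (EXL = bit 1, ERL = bit 2, KSU = bits 4:3, UX = bit 5) are not given in this section; they are assumed from the Status register format described elsewhere in this manual, and the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed Status register bit positions (see the lead-in). */
#define STATUS_EXL      (UINT32_C(1) << 1)
#define STATUS_ERL      (UINT32_C(1) << 2)
#define STATUS_KSU_MASK (UINT32_C(3) << 3)
#define STATUS_KSU_USER (UINT32_C(2) << 3) /* KSU = 10 (binary) */
#define STATUS_UX       (UINT32_C(1) << 5)

/* User mode: KSU = 10 (binary), EXL = 0, ERL = 0. */
static bool in_user_mode(uint32_t status)
{
    return (status & STATUS_KSU_MASK) == STATUS_KSU_USER
        && (status & STATUS_EXL) == 0
        && (status & STATUS_ERL) == 0;
}

/* True when User mode TLB misses go to the 64-bit XTLB refill handler
 * (UX = 1); false selects the 32-bit TLB refill handler (UX = 0). */
static bool user_uses_xtlb_refill(uint32_t status)
{
    return in_user_mode(status) && (status & STATUS_UX) != 0;
}
```

Under these assumptions a Status value of 0x10 selects 32-bit useg addressing in User mode, and 0x30 selects 64-bit xuseg addressing.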
Table 4-1 lists the characteristics of the two user mode segments, useg and xuseg.
Table 4-1 32-bit and 64-bit User Mode Segments
Address Bit Values | Status Register Bit Values (KSU, EXL, ERL, UX) | Segment Name | Address Range | Segment Size
32-bit, A(31) = 0 | 10 (binary), 0, 0, 0 | useg | 0x0000 0000 through 0x7FFF FFFF | 2 Gbyte (2^31 bytes)
64-bit, A(63:40) = 0 | 10 (binary), 0, 0, 1 | xuseg | 0x0000 0000 0000 0000 through 0x0000 00FF FFFF FFFF | 1 Tbyte (2^40 bytes)
† The cached (C) field in a TLB entry determines whether the reference is cached; see Figure 4-8.
32-bit User Mode (useg)
In User mode, when UX = 0 in the Status register, User mode addressing is compatible with the 32-bit addressing model shown in Figure 4-4, and a 2-Gbyte user address space is available, labelled useg.
All valid User mode virtual addresses have their most-significant bit cleared to 0; any attempt to reference an address with the most-significant bit set while in User mode causes an Address Error exception.
The system maps all references to useg through the TLB, and bit settings within the TLB entry for the page determine the cacheability of a reference.
64-bit User Mode (xuseg)
In User mode, when UX = 1 in the Status register, User mode addressing is extended to the 64-bit model shown in Figure 4-4. In 64-bit User mode, the processor provides a single, uniform address space of 2^40 bytes, labelled xuseg.
All valid User mode virtual addresses have bits 63:40 equal to 0; an attempt to reference an address with bits 63:40 not equal to 0 causes an Address Error exception.
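The two User mode address checks described above (useg and xuseg) can be sketched as a single C predicate. The function name is hypothetical; a false result marks the cases in which the processor would raise an Address Error exception. For the 32-bit case the sketch also requires bits 63:32 to be 0, on the assumption that a valid 32-bit mode address is sign-extended from bit 31.

```c
#include <stdbool.h>
#include <stdint.h>

/* ux mirrors the UX bit of the Status register. */
static bool user_address_ok(uint64_t vaddr, bool ux)
{
    if (ux) {
        /* xuseg: bits 63:40 must all be 0. */
        return (vaddr >> 40) == 0;
    }
    /* useg: bit 31 must be 0, and so must its sign extension (bits 63:32). */
    return (vaddr >> 31) == 0;
}
```

For example, 0x7FFF FFFF is the highest valid useg address, and 0x0000 00FF FFFF FFFF is the highest valid xuseg address.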
Supervisor Mode Operations
Supervisor mode is designed for layered operating systems in which a true kernel runs in R4000 Kernel mode, and the rest of the operating system runs in Supervisor mode.
The processor operates in Supervisor mode when the Status register contains the following bit-values:
KSU = 01 (binary)
EXL = 0
ERL = 0
In conjunction with these bits, the SX bit in the Status register selects between 32- or 64-bit Supervisor mode addressing:
when SX = 0, 32-bit supervisor space is selected and TLB misses are handled by the 32-bit TLB refill exception handler
when SX = 1, 64-bit supervisor space is selected and TLB misses are handled by the 64-bit XTLB refill exception handler
Figure 4-5 shows Supervisor mode address mapping. Table 4-2 lists the characteristics of the supervisor mode segments; descriptions of the address spaces follow.
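Figure 4-5 Supervisor Mode Address Space
[Figure: address maps, listed from highest to lowest address; each range runs from the first address up to, but not including, the second unless it ends at the top of the address space.
32-bit*:
0x E000 0000 to 0x FFFF FFFF: Address error
0x C000 0000 up to 0x E000 0000: 0.5 GB Mapped (sseg)
0x A000 0000 up to 0x C000 0000: Address error
0x 8000 0000 up to 0x A000 0000: Address error
0x 0000 0000 up to 0x 8000 0000: 2 GB Mapped (suseg)
64-bit:
0x FFFF FFFF E000 0000 to 0x FFFF FFFF FFFF FFFF: Address error
0x FFFF FFFF C000 0000 up to 0x FFFF FFFF E000 0000: 0.5 GB Mapped (csseg)
0x 4000 0100 0000 0000 up to 0x FFFF FFFF C000 0000: Address error
0x 4000 0000 0000 0000 up to 0x 4000 0100 0000 0000: 1 TB Mapped (xsseg)
0x 0000 0100 0000 0000 up to 0x 4000 0000 0000 0000: Address error
0x 0000 0000 0000 0000 up to 0x 0000 0100 0000 0000: 1 TB Mapped (xsuseg)]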
*NOTE: The R4000 uses 64-bit addresses internally. In 32-bit mode, a valid address must be a 32-bit signed number, where bits 63:32 = bit 31. In normal operation it is not possible for a 32-bit Supervisor-mode program to create an invalid address through arithmetic operations. However, 32-bit-mode Supervisor programs must not create addresses using base register+offset calculations that produce a 32-bit 2's-complement overflow; specifically, there are two prohibited cases:
offset with bit 15 = 0 and base register with bit 31 = 0, but (base register+offset) bit 31 = 1
offset with bit 15 = 1 and base register with bit 31 = 1, but (base register+offset) bit 31 = 0
Using this invalid address produces an undefined result.
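The two prohibited cases amount to a signed 32-bit overflow test on base register+offset, where the 16-bit offset is sign-extended before the addition. A minimal C sketch follows; the function name is hypothetical, and the code only detects the condition rather than modelling what the processor does with the resulting address.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when base + sign-extended offset overflows in 32-bit
 * 2's-complement arithmetic, covering both prohibited cases:
 *   offset bit 15 = 0, base bit 31 = 0, sum bit 31 = 1
 *   offset bit 15 = 1, base bit 31 = 1, sum bit 31 = 0 */
static bool base_offset_overflows_32(uint32_t base, int16_t offset)
{
    uint32_t off = (uint32_t)(int32_t)offset; /* sign-extend to 32 bits */
    uint32_t sum = base + off;                /* wraps modulo 2^32 */
    uint32_t base_sign = base >> 31;
    uint32_t off_sign = off >> 31;            /* equals offset bit 15 */
    uint32_t sum_sign = sum >> 31;

    return base_sign == off_sign && sum_sign != base_sign;
}
```

For example, a base of 0x7FFF FFFF with an offset of 1, or a base of 0x8000 0000 with an offset of -1, falls into a prohibited case.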