MIPS Technologies R4000 User Manual

MIPS R4000 Microprocessor

User’s Manual

Second Edition

Joe Heinrich

RESTRICTED RIGHTS LEGEND Use, duplication, or disclosure of the technical data contained in this document by the Government is subject to restrictions as set forth in subdivision (c) (1) (ii) of the Rights in Technical Data and Computer Software clause at DFARS 52.227-7013 and/or in similar or successor clauses in the FAR, or in the DOD or NASA FAR Supplement. Unpublished rights reserved under the Copyright Laws of the United States. Contractor/manufacturer is MIPS Technologies, Inc., 2011 N. Shoreline Blvd., Mountain View, CA 94039-7311.

RISCompiler, RISC/os, R2000, R6000, R4000, and R4400 are trademarks of MIPS Technologies, Inc. MIPS and R3000 are registered trademarks of MIPS Technologies, Inc.

IBM 370 is a registered trademark of International Business Machines. VAX is a registered trademark of Digital Equipment Corporation. iAPX is a registered trademark of Intel Corporation. MC68000 is a registered trademark of Motorola Inc. UNIX is a registered trademark in the United States and other countries, licensed exclusively through X/Open Company, Ltd.

MIPS Technologies, Inc. 2011 North Shoreline Mountain View, California 94039-7311

Acknowledgments for the First Edition

First of all, special thanks go to Duk Chun for his patient help in supplying and verifying the content of this manual; that this manual is technically correct is, in a very large part, directly attributable to him.

Thanks also to the following people for supplying portions of this book: Shabbir Latif, for, among other things, the exception handler flow charts, the description of the output buffer edge-control logic, and the interrupts; once again, Duk Chun, for his paper on R4000 processor synchronization support; Paul Ries, for confirming the accuracy of sections describing the memory management and the caches; John Mashey, for verifying the R4000 processor actually does employ the 64-bit architecture; Dave Ditzel, for raising the issue in the first place; and Mike Gupta, for substantiating various aspects of the errata.

Finally, thanks to Ed Reidenbach for supplying a large portion of the parity and ECC sections of this manual, and Michael Ngo for checking their accuracy.

Thanks also to the following folks for their technical assistance: Andy Keane, Keith Garrett, Viggy Mokkarala, Charles Price, Ali Moayedian, George Hsieh, Peter Fu, Stephen Przybylski, Michael Woodacre, and Earl Killian.

Also to be thanked are the people at fvn@world.std.com: Bill Tuthill, Barry Shein, Bob Devine, and Alan Marr, for helping place RISC in a pecuniary perspective.

Also, thanks to the following people at the mystery_train@swim2birds news group: toma, dan_sears, jharris@garnet, tut@cairo (again), and elvis@dalkey(mateo_b). Their night-for-day netversations, fueled by caffeine, concerning the viability of the cyberpsykinetic compute-core model helped form an important basis of this book.

On the editorial front, thanks once again to Ms. Robin Cowan, of the Consortium of Editorial Arts, for her labors in editing this manual. Thanks to Evelyn Spire for slaving over that bottomless black well we refer to as an “Index.” Thanks also, once again, to Karen Gettman and Lisa Iarkowski at Prentice-Hall for their help.

On the artistic side, thanks to Jeanne Simonian, of the Creative department here at Silicon Graphics, for the book cover design; and thanks to Pam Flanders for providing MarCom tactical support.

Have we missed anyone? If so, here is where we apologize for doing so.

Joe Heinrich

April 1, 1993

Mt. View, California

MIPS R4000 Microprocessor User's Manual iii

Acknowledgments for the Second Edition

Thanks go to Shabbir Latif, from whose errata the major part of this second edition is derived. Thanks also to Charlie Price for, among other things, making available his revision of the ISA.

On the production side, thanks to Kay Maitz, Beth Fraker, Molly Castor, Lynnea Humphries, and Claudia Lohnes for their assistance at the center of the hurricane.

Joe Heinrich

joeh@sgi.com

April 1, 1994

Mt. View, California

Preface

This book describes the MIPS R4000 and R4400 family of RISC microprocessors (also referred to in this book as the processor).

Overview of the Contents

Chapter 1 is a discussion (including the historical context) of RISC development in general, and the R4000 microprocessor in particular.

Chapter 2 is an overview of the CPU instruction set.

Chapter 3 describes the operation of the R4000 instruction execution pipeline, including the basic operation of the pipeline and interruptions that are caused by interlocks and exceptions.

Chapter 4 describes the memory management system including address mapping and address spaces, virtual memory, the translation lookaside buffer (TLB), and the System Control Processor (CP0).

Chapter 5 describes the exception processing resources of the R4000 processor. It includes an overview of the CPU exception handling process and describes the format and use of each CPU exception handling register.

Chapter 6 describes the Floating-Point Unit (FPU), a coprocessor for the CPU that extends the CPU instruction set to perform floating-point arithmetic operations. This chapter lists the FPU registers and instructions.

Chapter 7 describes the FPU exception processing.

Chapter 8 describes the signals that pass between the R4000 processor and other components in a system. The signals discussed include the System interface, the Clock/Control interface, the Secondary Cache interface, the Interrupt interface, the Initialization interface, and the JTAG interface.

Chapter 9 describes in more detail the Initialization interface, which includes the boot modes for the processor, as well as system resets.

Chapter 10 describes the clocks used in the R4000 processor, as well as the processor status reporting mechanism.

Chapter 11 discusses cache memory, including the operation of the primary and secondary caches, and cache coherency in a multiprocessor system.

Chapter 12 describes the System interface, which allows the processor access to external resources such as memory and input/output (I/O). It also allows an external agent access to the internal resources of the processor, such as the secondary cache.

Chapter 13 describes the Secondary Cache interface, including read and write cycle timing. This chapter also discusses the interface buses and signals.

Chapter 14 describes the Joint Test Action Group (JTAG) interface. The JTAG boundary scan mechanism tests the interconnections between the R4000 processor, the printed circuit board to which it is mounted, and other components on the board.

Chapter 15 describes the single nonmaskable processor interrupt, along with the six hardware and two software processor interrupts.

Chapter 16 describes the error checking and correcting (ECC) mechanisms of the R4000 processor.

Appendix A describes the R4000 CPU instructions, in both 32- and 64-bit modes. The instruction list is given in alphabetical order.

Appendix B describes the R4000 FPU instructions, listed alphabetically.

Appendix C describes sub-block ordering, a nonsequential method of retrieving data.

Appendix D describes the output buffer and the ∆i/∆t control mechanism.

Appendix E describes the passive components that make up the phase-locked loop (PLL).

Appendix F describes Coprocessor 0 hazards.

Appendix G describes the R4000 pinout.

A Note on Style

A brief note on some of the stylistic conventions used in this book: bits, fields, and registers of interest from a software perspective are italicized (such as Config register); signal names of more importance from a hardware point of view are rendered in bold (such as Reset*).

A range of bits uses a colon as a separator; for instance, (15:0) represents the 16-bit range that runs from bit 0 through bit 15, inclusive. (In some places an ellipsis may be used in place of the colon for visibility: (15...0).)
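The bit-range notation corresponds to a shift-and-mask operation in software. The following is a minimal sketch, assuming bit 0 is the least-significant bit; the helper name bit_field and the sample value are illustrative and do not come from this manual.

```python
def bit_field(value: int, high: int, low: int) -> int:
    """Return bits (high:low) of value, both ends inclusive.

    Assumes bit 0 is the least-significant bit.
    """
    if high < low or low < 0:
        raise ValueError("expected high >= low >= 0")
    width = high - low + 1          # (15:0) spans 16 bits
    mask = (1 << width) - 1         # width low-order ones
    return (value >> low) & mask


# Hypothetical 32-bit sample value, chosen only for illustration.
word = 0x12345678
low_half = bit_field(word, 15, 0)    # the range (15:0)
high_half = bit_field(word, 31, 16)  # the range (31:16)
```

With this helper, a field written as (high:low) in the text is read with bit_field(value, high, low).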

Preface to the Second Edition

Changes From the First Edition

The second edition of this book incorporates certain low-level changes and technical additions, but retains a substantive identity with the original version.

Changes from the first edition are indicated by left-margin vertical rules.

Getting MIPS Documents On-Line

MIPS documents (including an electronic version of the errata) are available on-line, through the File Transfer Protocol (FTP). To retrieve them, follow the steps below. The text you are to type is shown in Courier Bold font; the computer’s responses are shown in Courier Regular font.

1. First, place yourself in the directory on your system within which you want to store the retrieved files. Do this by typing:

cd <directory_you_want_file_to_be_in>

2. Access the MIPS document server, sgigate, through FTP by typing:

ftp sgigate.sgi.com

3. The server tells you when you are connected for FTP by responding:

Connected to sgigate.sgi.com.

4. Next (after some announcements) the server asks you to log in by requesting a name and then a password.

Name (sgigate.sgi.com:<login_name>):

5. Log in by typing anonymous for your name and your electronic mail address for your password.

Name (sgigate.sgi.com:<login_name>): anonymous

331 Guest login ok, type your name as password.

Password: your_email_address

6. The system indicates you have successfully logged in by supplying an FTP prompt:

ftp>

7. Go to the pub/doc directory by typing:

ftp> cd pub/doc

8. You can take a look at the contents of the doc directory by listing them:

ftp> ls

9. You will find several R4000-related subdirectories, such as R4200, R4400, and R4600. When you find the subdirectory you want, cd into that subdirectory and retrieve the file you want by typing:

get <filename>

This copies the file from sgigate back to your system.

10. When you have retrieved the files you want, exit from ftp by typing:

ftp> quit

11. If the file was encoded for transmission, you must decode it, after retrieval, by typing:

uudecode <filename>

12. If the file was compressed for transmission, you must uncompress it, after retrieval, by typing:

uncompress <filename>

13. If the file is a tar archive, extract it by typing:

tar xvof <filename>
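The retrieval steps above (login, change directory, list, get, quit) can also be scripted. The following is a minimal sketch using Python's standard ftplib module; the function name fetch_document and its parameters are hypothetical, the subdirectory and filename must be supplied by the reader, and the server named in this manual may no longer be reachable. The decoding steps (11 through 13) are not covered.

```python
import ftplib
import os


def fetch_document(ftp, email, subdir, filename, dest_dir="."):
    """Retrieve one file over an already-connected FTP session.

    ftp is an ftplib.FTP instance (or any object with the same methods).
    Returns the local path of the retrieved file and the pub/doc listing.
    """
    ftp.login(user="anonymous", passwd=email)  # step 5: anonymous login
    ftp.cwd("pub/doc")                         # step 7: go to pub/doc
    listing = ftp.nlst()                       # step 8: list the contents
    ftp.cwd(subdir)                            # step 9: enter a subdirectory
    dest_path = os.path.join(dest_dir, filename)
    with open(dest_path, "wb") as out:
        # Equivalent of "get <filename>", done as a binary transfer.
        ftp.retrbinary(f"RETR {filename}", out.write)
    ftp.quit()                                 # step 10: leave ftp
    return dest_path, listing


# Example call (not executed here; host taken from step 2, other values
# are placeholders the reader must replace):
#   session = ftplib.FTP("sgigate.sgi.com")
#   fetch_document(session, "you@example.com", "<subdirectory>", "<filename>")
```

Passing the session object in, rather than connecting inside the function, keeps the sketch usable against any server and makes it possible to exercise the logic without a network connection.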

Table of Contents

Preface

Overview of the Contents...................................................................................vii

A Note on Style ....................................................................................................ix

Preface to the Second Edition

Changes From the First Edition.........................................................................xi

Getting MIPS Documents On-Line.................................................................... xi

Introduction

Benefits of RISC Design...........................................................................................2

Shorter Design Cycle........................................................................................... 3

Effective Utilization of Chip Area ..................................................................... 3

User (Programmer) Benefits...............................................................................3

Advanced Semiconductor Technologies.......................................................... 3

Optimizing Compilers.........................................................................................4

MIPS RISCompiler Language Suite ..................................................................5

Compatibility............................................................................................................ 6

Processor General Features..................................................................................... 6

R4000 Processor Configurations ............................................................................7

R4400 Processor Enhancements............................................................................. 7

R4000 Processor........................................................................................................9

64-bit Architecture ............................................................................................... 9

Superpipeline Architecture ................................................................................11

System Interface................................................................................................... 11

CPU Register Overview......................................................................................12

CPU Instruction Set Overview...........................................................................14

Data Formats and Addressing........................................................................... 24

Coprocessors (CP0-CP2)..................................................................................... 27

System Control Coprocessor, CP0.................................................................27

Floating-Point Unit (FPU), CP1 ..................................................................... 30

Memory Management System (MMU).............................................................31

The Translation Lookaside Buffer (TLB)......................................................31

Operating Modes.............................................................................................32

Cache Memory Hierarchy.............................................................................. 32

Primary Caches................................................................................................33

Secondary Cache Interface............................................................................. 33

CPU Instruction Set Summary

CPU Instruction Formats ........................................................................................36

Load and Store Instructions ...............................................................................37

Scheduling a Load Delay Slot........................................................................37

Defining Access Types....................................................................................37

Computational Instructions................................................................................39

64-bit Operations .............................................................................................39

Cycle Timing for Multiply and Divide Instructions................................... 40

Jump and Branch Instructions ...........................................................................41

Overview of Jump Instructions ..................................................................... 41

Overview of Branch Instructions ..................................................................41

Special Instructions..............................................................................................42

Exception Instructions......................................................................................... 42

Coprocessor Instructions ....................................................................................42

The CPU Pipeline

CPU Pipeline Operation..........................................................................................44

CPU Pipeline Stages................................................................................................. 45

Branch Delay.............................................................................................................48

Load Delay ................................................................................................................48

Interlock and Exception Handling......................................................................... 49

Exception Conditions .......................................................................................... 52

Stall Conditions....................................................................................................53

Slip Conditions.....................................................................................................53

External Stalls ....................................................................................................... 53

Interlock and Exception Timing ........................................................................53

Backing Up the Pipeline .................................................................................54

Aborting an Instruction Subsequent to an Interlock..................................55

Pipelining the Exception Handling...................................................................56

Special Cases.........................................................................................................58

Performance Considerations.......................................................................... 58

Correctness Considerations............................................................................58

R4400 Processor Uncached Store Buffer ............................................................... 59

Memory Management

Translation Lookaside Buffer (TLB) ......................................................................62

Hits and Misses .................................................................................................... 62

Multiple Matches .................................................................................................62

Address Spaces.........................................................................................................63

Virtual Address Space.........................................................................................63

Physical Address Space....................................................................................... 64

Virtual-to-Physical Address Translation..........................................................64

32-bit Mode Address Translation......................................................................65

64-bit Mode Address Translation......................................................................66

Operating Modes .................................................................................................67

User Mode Operations...................................................................................67

Supervisor Mode Operations........................................................................69

Kernel Mode Operations ............................................................................... 73

System Control Coprocessor ..................................................................................80

Format of a TLB Entry.........................................................................................81

CP0 Registers........................................................................................................84

Index Register (0).............................................................................................85

Random Register (1)........................................................................................86

EntryLo0 (2), and EntryLo1 (3) Registers.....................................................87

PageMask Register (5)..................................................................................... 87

Wired Register (6)............................................................................................88

EntryHi Register (CP0 Register 10)...............................................................89

Processor Revision Identifier (PRId) Register (15)......................................89

Config Register (16).........................................................................................90

Load Linked Address (LLAddr) Register (17) ............................................93

Cache Tag Registers [TagLo (28) and TagHi (29)]...................................... 93

Virtual-to-Physical Address Translation Process............................................ 95

TLB Misses............................................................................................................ 97

TLB Instructions...................................................................................................97

CPU Exception Processing

How Exception Processing Works......................................................................... 100

Exception Processing Registers..............................................................................101

Context Register (4) .............................................................................................102

Bad Virtual Address Register (BadVAddr) (8)................................................103

Count Register (9) ................................................................................................ 103

Compare Register (11).........................................................................................104

Status Register (12)...............................................................................................105

Status Register Format....................................................................................105

Status Register Modes and Access States..................................................... 109

Status Register Reset .......................................................................................110

Cause Register (13) ..............................................................................................110

Exception Program Counter (EPC) Register (14) ............................................ 112

WatchLo (18) and WatchHi (19) Registers ....................................................... 113

XContext Register (20)......................................................................................... 114

Error Checking and Correcting (ECC) Register (26)....................................... 115

Cache Error (CacheErr) Register (27)................................................................116

Error Exception Program Counter (Error EPC) Register (30)........................118

Processor Exceptions ...............................................................................................119

Exception Types................................................................................................... 119

Reset Exception Process..................................................................................120

Cache Error Exception Process......................................................................120

Soft Reset and NMI Exception Process......................................................... 121

General Exception Process .............................................................................121

Exception Vector Locations................................................................................ 122

Priority of Exceptions..........................................................................................123

Reset Exception ....................................................................................................124

Soft Reset Exception ............................................................................................125

Address Error Exception..................................................................................... 127

TLB Exceptions.....................................................................................................128

TLB Refill Exception........................................................................................129

TLB Invalid Exception.....................................................................................130

TLB Modified Exception.................................................................................131

Cache Error Exception......................................................................................... 132

Virtual Coherency Exception ............................................................................. 133

Bus Error Exception.............................................................................................134

Integer Overflow Exception ...............................................................................135

Trap Exception .....................................................................................................136

System Call Exception.........................................................................................137

Breakpoint Exception ..........................................................................................138

Reserved Instruction Exception.........................................................................139

Coprocessor Unusable Exception......................................................................140

Floating-Point Exception.....................................................................................141

Watch Exception ..................................................................................................142

Interrupt Exception.............................................................................................. 143

Exception Handling and Servicing Flowcharts ...................................................144

Floating-Point Unit

Overview................................................................................................................... 152

FPU Features.............................................................................................................153

FPU Programming Model.......................................................................................154

Floating-Point General Registers (FGRs).......................................................... 154

Floating-Point Registers......................................................................................156

Floating-Point Control Registers .......................................................................157

Implementation and Revision Register, (FCR0)..............................................158

Control/Status Register (FCR31)....................................................................... 159

Accessing the Control/Status Register......................................................... 160

IEEE Standard 754 ........................................................................................... 161

Control/Status Register FS Bit....................................................................... 161

Control/Status Register Condition Bit.........................................................161

Control/Status Register Cause, Flag, and Enable Fields...........................161

Control/Status Register Rounding Mode Control Bits..............................163

Floating-Point Formats............................................................................................164

Binary Fixed-Point Format...................................................................................... 166

Floating-Point Instruction Set Overview.............................................................. 167

Floating-Point Load, Store, and Move Instructions........................................169

Transfers Between FPU and Memory........................................................... 169

Transfers Between FPU and CPU..................................................................169

Load Delay and Hardware Interlocks.......................................................... 169

Data Alignment................................................................................................ 170

Endianness........................................................................................................170

Floating-Point Conversion Instructions............................................................170

Floating-Point Computational Instructions..................................................... 170

Branch on FPU Condition Instructions............................................................. 170

Floating-Point Compare Operations.................................................................171

FPU Instruction Pipeline Overview.......................................................................172

Instruction Execution ..........................................................................................172

Instruction Execution Cycle Time .....................................................................173

Scheduling FPU Instructions.............................................................................. 175

FPU Pipeline Overlapping.................................................................................. 175

Instruction Scheduling Constraints ..............................................................176

Instruction Latency, Repeat Rate, and Pipeline Stage Sequences.............181

Resource Scheduling Rules ............................................................................ 182

Floating-Point Exceptions

Exception Types........................................................................................................188

Exception Trap Processing......................................................................................189

Flags ...........................................................................................................................190

FPU Exceptions......................................................................................................... 192

Inexact Exception (I)............................................................................................ 192

Invalid Operation Exception (V)........................................................................ 193

Division-by-Zero Exception (Z).........................................................................194

Overflow Exception (O)...................................................................................... 194

Underflow Exception (U).................................................................................... 195

Unimplemented Instruction Exception (E) ...................................................... 196

Saving and Restoring State ..................................................................................... 197

Trap Handlers for IEEE Standard 754 Exceptions............................................... 198

R4000 Processor Signal Descriptions

System Interface Signals..........................................................................................201

Clock/Control Interface Signals ............................................................................203

Secondary Cache Interface Signals........................................................................ 205

Interrupt Interface Signals ......................................................................................207

JTAG Interface Signals............................................................................................. 207

Initialization Interface Signals................................................................................208

Signal Summary .......................................................................................................209

Page 21

Initialization Interface

Functional Overview ...............................................................................................214

Reset Signal Description.......................................................................................... 215

Power-on Reset..................................................................................................... 216

Cold Reset .............................................................................................................217

Warm Reset...........................................................................................................217

Initialization Sequence.............................................................................................218

Boot-Mode Settings..................................................................................................222

Clock Interface

Signal Terminology..................................................................................................228

Basic System Clocks.................................................................................................229

MasterClock..........................................................................................................229

MasterOut .............................................................................................................229

SyncIn/SyncOut................................................................................................... 229

PClock....................................................................................................................229

SClock.................................................................................................................... 230

TClock....................................................................................................................230

RClock.................................................................................................................... 230

PClock-to-SClock Division .................................................................................230

System Timing Parameters..................................................................................... 233

Alignment to SClock............................................................................................ 233

Alignment to MasterClock .................................................................................233

Phase-Locked Loop (PLL)................................................................................... 233

Connecting Clocks to a Phase-Locked System.....................................................234

Connecting Clocks to a System without Phase Locking.....................................235

Connecting to a Gate-Array Device ..................................................................235

Connecting to a CMOS Logic System............................................................... 238

Processor Status Outputs ........................................................................................ 241

Page 22

Cache Organization, Operation, and Coherency

Memory Organization............................................................................................. 244

Overview of Cache Operations.............................................................................. 245

R4000 Cache Description......................................................................................... 246

Secondary Cache Size..........................................................................................248

Variable-Length Cache Lines ............................................................................. 248

Cache Organization and Accessibility..............................................................248

Organization of the Primary Instruction Cache (I-Cache)......................... 249

Organization of the Primary Data Cache (D-Cache)..................................250

Accessing the Primary Caches.......................................................................251

Organization of the Secondary Cache.......................................................... 252

Accessing the Secondary Cache.....................................................................254

Cache States............................................................................................................... 255

Primary Cache States...........................................................................................256

Secondary Cache States....................................................................................... 256

Mapping States Between Caches....................................................................... 257

Cache Line Ownership............................................................................................ 258

Cache Write Policy...................................................................................................259

Cache State Transition Diagrams...........................................................................260

Cache Coherency Overview ................................................................................... 264

Cache Coherency Attributes...............................................................................264

Uncached ..........................................................................................................265

Noncoherent.....................................................................................................265

Sharable.............................................................................................................265

Update...............................................................................................................265

Exclusive ........................................................................................................... 266

Cache Operation Modes......................................................................................266

Secondary-Cache Mode..................................................................................266

No-Secondary-Cache Mode........................................................................... 266

Strong Ordering ...................................................................................................267

An Example of Strong Ordering....................................................................267

Testing for Strong Ordering...........................................................................267

Restarting the Processor .................................................................................268

Maintaining Coherency on Loads and Stores......................................................269

Manipulation of the Cache by an External Agent............................................... 270

Invalidate...............................................................................................................270

Update ...................................................................................................................270

Page 23

Snoop ..................................................................................................................... 270

Intervention...........................................................................................................271

Coherency Conflicts.................................................................................................271

How Coherency Conflicts Arise ........................................................................ 272

Processor Coherent Read Requests...............................................................272

Processor Invalidate or Update Requests ....................................................273

External Coherency Requests ........................................................................274

System Implications of Coherency Conflicts................................................... 275

System Model...................................................................................................276

Load...................................................................................................................278

Store...................................................................................................................278

Processor Coherent Read Request and Read Response.............................278

Processor Invalidate........................................................................................ 279

Processor Write................................................................................................ 279

Handling Coherency Conflicts........................................................................... 280

Coherent Read Conflicts.................................................................................280

Coherent Write Conflicts................................................................................281

Invalidate Conflicts .........................................................................................282

Sample Cycle: Coherent Read Request.............................................................283

R4000 Processor Synchronization Support........................................................... 286

Test-and-Set (Spinlock) .......................................................................................286

Counter..................................................................................................................288

LL and SC..............................................................................................................289

Examples Using LL and SC................................................................................ 290

Page 24

System Interface

Terminology..............................................................................................................294

System Interface Description..................................................................................294

Interface Buses......................................................................................................295

Address and Data Cycles ...............................................................................296

Issue Cycles ......................................................................................................296

Handshake Signals..............................................................................................298

System Interface Protocols......................................................................................299

Master and Slave States....................................................................................... 299

Moving from Master to Slave State...................................................................300

External Arbitration............................................................................................. 300

Uncompelled Change to Slave State .................................................................301

Processor and External Requests ...........................................................................302

Rules for Processor Requests.............................................................................. 303

Processor Requests...............................................................................................304

Processor Read Request..................................................................................306

Processor Write Request.................................................................................307

Processor Invalidate Request.........................................................................308

Processor Update Request..............................................................................310

Clusters..............................................................................................................311

External Requests.................................................................................................313

External Read Request.................................................................................... 316

External Write Request................................................................................... 316

External Invalidate Request ........................................................................... 316

External Update Request................................................................................316

External Snoop Request..................................................................................317

External Intervention Request....................................................................... 317

Read Response .................................................................................................317

Handling Requests...................................................................................................318

Load Miss..............................................................................................................318

Secondary-Cache Mode..................................................................................320

No-Secondary-Cache Mode........................................................................... 320

Store Miss..............................................................................................................321

Secondary-Cache Mode..................................................................................323

No-Secondary-Cache Mode........................................................................... 325

Store Hit.................................................................................................................326

Secondary-Cache Mode..................................................................................326

Page 25

No-Secondary-Cache Mode........................................................................... 326

Uncached Loads or Stores ..................................................................................326

CACHE Operations............................................................................................. 327

Load Linked Store Conditional Operation....................................................... 327

Processor and External Request Protocols............................................................329

Processor Request Protocols...............................................................................330

Processor Read Request Protocol.................................................................. 330

Processor Write Request Protocol................................................................. 333

Processor Invalidate and Update Request Protocol ................................... 335

Processor Null Write Request Protocol........................................................ 336

Processor Cluster Request Protocol .............................................................. 337

Processor Request and Cluster Flow Control.............................................. 338

External Request Protocols.................................................................................341

External Arbitration Protocol......................................................................... 342

External Read Request Protocol ....................................................................343

External Null Request Protocol .....................................................................344

External Write Request Protocol ...................................................................347

External Invalidate and Update Request Protocols....................................348

External Intervention Request Protocol .......................................................349

External Snoop Request Protocol.................................................................. 352

Read Response Protocol..................................................................................354

Data Rate Control.....................................................................................................356

Data Transfer Patterns......................................................................................... 356

Secondary Cache Transfers ................................................................................357

Secondary Cache Write Cycle Time.................................................................. 358

Independent Transmissions on the SysAD Bus ..............................................359

System Interface Endianness..............................................................................360

System Interface Cycle Time...................................................................................361

Cluster Request Spacing .....................................................................................361

Release Latency.................................................................................................... 362

External Request Response Latency.................................................................. 363

System Interface Commands and Data Identifiers.............................................. 364

Command and Data Identifier Syntax..............................................................364

System Interface Command Syntax ..................................................................365

Read Requests .................................................................................................. 366

Write Requests .................................................................................................367

Null Requests................................................................................................... 369

Invalidate Requests .........................................................................................370

Page 26

Update Requests.............................................................................................. 370

Intervention and Snoop Requests .................................................................372

System Interface Data Identifier Syntax ........................................................... 374

Coherent Data ..................................................................................................374

Noncoherent Data............................................................................................374

Data Identifier Bit Definitions........................................................................ 375

System Interface Addresses....................................................................................377

Addressing Conventions ....................................................................................377

Sequential and Subblock Ordering....................................................................378

Processor Internal Address Map............................................................................ 378

Secondary Cache Interface

Data Transfer Rates..................................................................................................380

Duplicating Signals..................................................................................................380

Accessing a Split Secondary Cache........................................................................381

SCDChk Bus..............................................................................................................381

SCTAG Bus................................................................................................................ 381

Operation of the Secondary Cache Interface........................................................ 382

Read Cycles...........................................................................................................383

4-Word Read Cycle.......................................................................................... 383

8-Word Read Cycle.......................................................................................... 384

Notes on a Secondary Cache Read Cycle.....................................................384

Write Cycles..........................................................................................................385

4-Word Write Cycle......................................................................................... 385

8-Word Write Cycle......................................................................................... 386

Notes on a Secondary Cache Write Cycle....................................................387

Page 27

JTAG Interface

What Boundary Scanning Is ................................................................................... 390

Signal Summary .......................................................................................................391

JTAG Controller and Registers............................................................................... 392

Instruction Register..............................................................................................392

Bypass Register.....................................................................................................393

Boundary-Scan Register......................................................................................394

Test Access Port (TAP)........................................................................................395

TAP Controller.................................................................................................396

Controller Reset ...............................................................................................396

Controller States...............................................................................................396

Implementation-Specific Details............................................................................400

R4000 Processor Interrupts

Hardware Interrupts................................................................................................ 402

Nonmaskable Interrupt (NMI)...............................................................................402

Asserting Interrupts.................................................................................................402

Page 28

Error Checking and Correcting

Error Checking in the Processor.............................................................................408

Types of Error Checking.....................................................................................408

Parity Error Detection.....................................................................................408

SECDED ECC Code......................................................................................... 409

Error Checking Operation .................................................................................. 412

System Interface...............................................................................................412

Secondary Cache Data Bus.............................................................................412

System Interface and Secondary Cache Data Bus....................................... 412

Secondary Cache Tag Bus...............................................................................413

System Interface Command Bus ...................................................................413

SECDED ECC Matrices for Data and Tag Buses.............................................414

ECC Check Bits..................................................................................................... 414

Data ECC Generation.......................................................................................... 415

Detecting Data Transmission Errors................................................................. 418

Single Data Bit ECC Error ..............................................................................420

Single Check Bit ECC Error............................................................................ 421

Double Data Bit ECC Errors........................................................................... 422

Three Data Bit ECC Errors .............................................................................423

Four Data Bit ECC Errors ............................................................................... 424

Tag ECC Generation............................................................................................425

Summary of ECC Operations............................................................................. 426

R4400 Master/Checker Mode.................................................................................430

Connecting a System in Lock Step ....................................................................431

Master-Listener Configuration ..........................................................................432

Cross-Coupled Checking Configuration..........................................................433

Fault Detection .....................................................................................................435

Reset Operation....................................................................................................436

Fault History.........................................................................................................436

Page 29

CPU Instruction Set Details

FPU Instruction Set Details

Subblock Ordering

Sequential Ordering.................................................................................................C-2

Subblock Ordering................................................................................................... C-2

Output Buffer ∆i/∆t Control Mechanism

Mode Bits...................................................................................................................D-1

Delay Times............................................................................................................... D-2

PLL Passive Components

Coprocessor 0 Hazards

R4000 Pinouts

Pinout of R4000PC....................................................................................................G-2

Pinout of R4000MC/SC Package Pinout ..............................................................G-5

Index

Page 30

Page 31

Introduction

Historically, the evolution of computer architectures has been dominated by families of increasingly complex central processors. Under market pressures to preserve existing software, complex instruction set computer (CISC) architectures evolved by the accretion of microcode and increasingly intricate instruction sets. This intricacy in architecture was itself driven by the need to support high-level languages and operating systems, as advances in semiconductor technology made it possible to fabricate integrated circuits of greater and greater complexity. And at that time it seemed self-evident to designers that architectures should continue to become more and more complex as technological advances made such VLSI designs possible.

Page 32

Chapter 1

In recent years, however, reduced instruction set computer (RISC) architectures have implemented a different model for the interaction between hardware, firmware, and software. RISC concepts emerged from a statistical analysis of the way in which software actually uses processor resources: dynamic measurement of system kernels and object modules generated by optimizing compilers showed that the simplest instructions were used most often, even in the code for CISC machines. Correspondingly, complex instructions often went unused because their single way of performing a complex operation rarely matched the precise needs of a high-level language.

RISC architecture eliminates microcode routines and turns low-level control of the machine over to software. The RISC approach is not new, but its application has become more prevalent in recent years, due to the increasing use of high-level languages, the development of compilers that are able to optimize at the microcode level, and dramatic advances in semiconductor memory and packaging. It is now feasible to replace relatively slow microcode ROM with faster RAM that is organized as an instruction cache. Machine control resides in this instruction cache, which is, in effect, customized on the fly: the instruction stream, made up of system- and compiler-generated code, provides a precise fit between the requirements of high-level software and the low-level capabilities of the hardware.

Reducing or simplifying the instruction set was not the primary goal of RISC architecture; it is a pleasant side effect of techniques used to gain the highest performance possible from available technology. Thus, the term reduced instruction set computers is a bit misleading; it is the push for performance that really drives and shapes RISC designs.

1.1 Benefits of RISC Design

Some benefits that result from RISC design techniques are not directly attributable to the drive to increase performance, but are a result of the basic reduction in complexity—a simpler design allows both chip-area resources and human resources to be applied to features that enhance performance. Some of these benefits are described below.

Page 33

Shorter Design Cycle

The architectures of RISC processors can be implemented more quickly than their CISC counterparts: it is easier to fabricate and debug a streamlined, simplified architecture with no microcode than a complex architecture that uses microcode. CISC processors have such a long design cycle that they may not be completely debugged by the time they are technologically obsolete. The shorter time required to design and implement RISC processors allows them to make use of the best available technologies.

Effective Utilization of Chip Area

The simplicity of RISC processors also frees scarce chip geography for performance-critical resources such as larger register files, translation lookaside buffers (TLBs), coprocessors, and fast multiply and divide units. Such resources help RISC processors obtain an even greater performance edge.

User (Programmer) Benefits

Simplicity in architecture also helps the user by providing a uniform instruction set that is easier to use. This uniformity allows a closer correlation between the instruction count and the cycle count, making it easier to measure the effect of code optimizations.
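
As a rough illustration of the correlation between instruction count and cycle count, the sketch below estimates a cycle count from per-class instruction counts. The instruction classes, the counts, and the cycle costs are hypothetical values chosen for the example; they are not R4000 timings, and the function name `estimate_cycles` is likewise an assumption.

```python
def estimate_cycles(instruction_counts, cycles_per_class):
    """Estimate total cycles from per-class instruction counts.

    Both arguments are dicts keyed by an instruction-class name.
    """
    return sum(count * cycles_per_class[name]
               for name, count in instruction_counts.items())


# Hypothetical dynamic instruction counts for one program run.
counts = {"alu": 600, "load": 200, "store": 100, "branch": 100}

# Assumed cost models (illustrative only, not measured on any processor).
uniform_costs = {name: 1 for name in counts}                    # one cycle each
uneven_costs = {"alu": 1, "load": 2, "store": 2, "branch": 3}   # varying costs

instruction_count = sum(counts.values())
print("instructions:", instruction_count)
print("cycles, uniform costs:", estimate_cycles(counts, uniform_costs))
print("cycles, uneven costs:", estimate_cycles(counts, uneven_costs))
```

With a uniform cost of one cycle per instruction, the estimate equals the instruction count, so removing an instruction removes one cycle. Under the uneven cost model the same instruction count says much less about running time.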

Advanced Semiconductor Technologies

Each new VLSI technology is introduced with tight limits on the number of transistors that fit on each chip. Since the simplicity of a RISC processor allows it to be implemented in fewer transistors than its CISC counterpart, the first computers capable of exploiting these new VLSI technologies have used, and will continue to use, RISC architecture.

Page 34

Optimizing Compilers

RISC architecture is designed so that the compilers, not assembly languages, have the optimal working environment. RISC philosophy assumes that high-level language programming is used, which contradicts the older CISC philosophy that assumes assembly language programming is of primary importance.

The trend toward high-level language instructions has led to the development of more efficient compilers to convert high-level language instructions to machine code. Primary measures of compiler efficiency are the compactness of its generated code and the shortness of its execution time.

During the development of more efficient compilers, analysis of instruction streams revealed that the greatest amount of time was spent executing simple instructions and performing load and store operations, while the more complex instructions were used less frequently. It was also learned that compilers produce code that is often a narrow subset of the processor instruction set architecture (ISA). A compiler works more efficiently with instructions that perform simple, well-defined operations and generate minimal side-effects. Compilers do not use complex instructions and features; the more complex, powerful instructions are either too difficult for the compiler to employ or those instructions do not precisely fit high-level language requirements.

Thus, a natural match exists between RISC architectures and efficient, optimizing compilers. This match makes it easier for compilers to generate the most effective sequences of machine instructions to accomplish tasks defined by the high-level language.

Page 35

MIPS RISCompiler Language Suite

Some compiler products are derived from disparate sources and consequently do not fit together very well. Instead of treating each language’s compiler as a separate entity, the MIPS RISCompiler language suite shares common elements across the entire family of compilers. In this way the language suite offers both tight integration and broad language coverage.

The MIPS language suite supports:

• industry-standard front ends for the following languages (C, FORTRAN, Pascal)

• a common intermediate language, offering an efficient way to add language front ends over time

• all of the back end optimization and code generation

• the same object format and calling conventions

• mixed-language programs

• debugging of programs written in all languages, including mixtures

This language suite approach yields high-quality compilers for all languages, since common elements make up the majority of each of the language products. In addition, this approach provides the ability to develop and execute multi-language programs, promoting flexibility in development, avoiding the necessity of recoding proven program segments, and protecting the user’s software investment. The common back-end also exports optimizing and code-generating improvements immediately throughout the language suite, thereby reducing maintenance.

Page 36

1.2 Compatibility

The R4000 processor provides complete application software compatibility with the MIPS R2000, R3000, and R6000 processors. Although the MIPS processor architecture has evolved in response to a compromise between software and hardware resources in the computer system, the R4000 processor implements the MIPS ISA for user-mode programs. This guarantees that user programs conforming to the ISA execute on any MIPS hardware implementation.

1.3 Processor General Features

This section briefly describes the programming model, the memory management unit (MMU), and the caches in the R4000 processor. A more detailed description is given in succeeding sections.

• Full 32-bit and 64-bit Operations. The R4000 processor contains 32 general purpose 64-bit registers. (When operating as a 32-bit processor, the general purpose registers are 32 bits wide.) All instructions are 32 bits wide.

• Efficient Pipeline. The superpipeline design of the processor results in an execution rate approaching one instruction per cycle. Pipeline stalls and exceptional events are handled precisely and efficiently.

• MMU. The R4000 processor uses an on-chip TLB that provides rapid virtual-to-physical address translation.

• Cache Control. The R4000 primary instruction and data caches reside on-chip, and can each hold 8 Kbytes. In the R4400 processor, the primary caches can each hold 16 Kbytes. Architecturally, each primary cache can be increased to hold up to 32 Kbytes. An off-chip secondary cache (R4000SC and R4000MC processors only) can hold from 128 Kbytes to 4 Mbytes. All processor cache control logic, including the secondary cache control logic, is on-chip.

• Floating-Point Unit. The FPU is located on-chip and implements the ANSI/IEEE standard 754-1985.

Page 37

1.4 R4000 Processor Configurations

The R4000 processor† is packaged in three different configurations. All processors are implemented in sub-1-micron CMOS technology.

• R4000PC is designed for cost-sensitive systems such as inexpensive desktop systems and high-end embedded controllers. It is packaged in a 179-pin PGA, and does not support a secondary cache.

• R4000SC is designed for high-performance uniprocessor systems. It is packaged in a 447-pin LGA/PGA and includes integrated control for large secondary caches built from standard SRAMs.

• R4000MC is designed for large cache-coherent multiprocessor systems. It is packaged in a 447-pin LGA/PGA and, in addition to the features of R4000SC, includes support for a wide variety of bus designs and cache-coherency mechanisms.

Table 1-1 lists the features in each of the three configurations (X indicates the feature is present). R4400 processor enhancements are described in the section following.

1.5 R4400 Processor Enhancements

In addition to the features contained in the R4000 processor, the R4400 processor has the following enhancements:

• fully functional Status pins (described in Chapter 10)

• Master/Checker mode (described in Chapter 16)

• larger primary caches (described in Processor General Features, in this chapter)

• uncached store buffer (described in Chapter 3)

• divide-by-6 and divide-by-8 modes (described in Chapter 10)

• cache error bit, EW, added to the CacheErr register (described in Chapter 5).

† Features of the R4400 processor that differ from the R4000 processor are noted throughout this book; for instance, R4400 processor enhancements are listed in the next section. Otherwise, references to the R4000 processor may be taken to include the R4400 processor.

Page 38

Table 1-1 R4000 Features

Feature R4000PC R4000SC R4000MC

Primary Cache States
Valid XX X
Shared X
Clean Exclusive XX
Dirty Exclusive XX X

Secondary Cache Interface X X

Secondary Cache States
Valid XX X
Shared X
Dirty Shared X
Clean Exclusive XX
Dirty Exclusive XX X

Multiprocessing X

Cache Coherency Attributes
Uncached XX X
Noncoherent XX X
Sharable X
Update X
Exclusive X

Packages
PGA (179-pin) X
PGA (447-pin) XX

Page 39

1.6 R4000 Processor

This section describes the following:

• the 64-bit architecture of the R4000 processor

• the superpipeline design of the CPU instruction pipeline (described in detail in Chapter 3)

• an overview of the System interface (described in detail in Chapter 12)

• an overview of the CPU registers (detailed in Chapters 4 and 5) and CPU instruction set (detailed in Chapter 2 and Appendix A)

• data formats and byte ordering

• the System Control Coprocessor, CP0, and the floating-point unit, CP1

• caches and memory, including a description of primary and secondary caches, the memory management unit (MMU), the translation lookaside buffer (TLB), and the Secondary Cache interface (described in more detail in Chapters 4 and 11). The Secondary Cache interface is detailed in Chapter 13.

64-bit Architecture

The natural mode of operation for the R4000 processor is as a 64-bit microprocessor; however, 32-bit applications maintain compatibility even when the processor operates as a 64-bit processor.

The R4000 processor provides the following:

• 64-bit on-chip floating-point unit (FPU)

• 64-bit integer arithmetic logic unit (ALU)

• 64-bit integer registers

• 64-bit virtual address space

• 64-bit system bus

Figure 1-1 is a block diagram of the R4000 processor internals.

Page 40

[Block diagram. Blocks shown: 64-bit System Bus; System Control; S-cache Control; P-cache Control; Data Cache; Instruction Cache; CP0 (Exception/Control Registers, Memory Management Registers, Translation Lookaside Buffers); CPU (CPU Registers, ALU, Load Aligner/Store Driver, Integer Multiplier/Divider, Address Unit, PC Incrementer); FPU (FPU Registers, Pipeline Bypass, FP Multiplier, FP Divider, FP Add, Convert, Square Root); Pipeline Control]

Figure 1-1 R4000 Processor Internal Block Diagram

Page 41

Superpipeline Architecture

The R4000 processor exploits instruction parallelism by using an eight-stage superpipeline which places no restrictions on the instruction issued. Under normal circumstances, two instructions are issued each cycle.

The internal pipeline of the R4000 processor operates at twice the frequency of the master clock, as discussed in Chapter 3. The processor achieves high throughput by pipelining cache accesses, shortening register access times, implementing virtual-indexed primary caches, and allowing the latency of functional units to span more than one pipeline clock cycle.

System Interface

The R4000 processor supports a 64-bit System interface that can be used to construct uniprocessor systems with a direct DRAM interface (with or without a secondary cache) or cache-coherent multiprocessor systems. The System interface includes:

• a 64-bit multiplexed address and data bus

• 8 check bits

• a 9-bit parity-protected command bus

• 8 handshake signals

The interface is capable of transferring data between the processor and memory at a peak rate of 400 Mbytes/second, when running at 50 MHz.
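The peak rate quoted above follows from the bus width and the clock frequency. The short Python sketch below reproduces the arithmetic; it assumes one full-width data transfer per clock cycle, which is an idealized reading of "peak rate", and the function name is only illustrative.

```python
# Peak System interface transfer rate, using the figures quoted in the text.
BUS_WIDTH_BITS = 64        # width of the multiplexed address and data bus
CLOCK_HZ = 50_000_000      # 50 MHz

def peak_bytes_per_second(bus_width_bits: int, clock_hz: int) -> int:
    """Assumes one full-width data transfer on every clock cycle."""
    return (bus_width_bits // 8) * clock_hz

peak = peak_bytes_per_second(BUS_WIDTH_BITS, CLOCK_HZ)
print(peak // 1_000_000, "Mbytes/second")
```

Sustained rates in a real system would be lower, since the same bus also carries addresses.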

Page 42

CPU Register Overview

The central processing unit (CPU) provides the following registers:

• 32 general purpose registers

• a Program Counter (PC) register

• 2 registers that hold the results of integer multiply and divide operations (HI and LO).

Floating-point unit (FPU) registers are described in Chapter 6. CPU registers can be either 32 bits or 64 bits wide, depending on the R4000 processor mode of operation. Figure 1-2 shows the CPU registers.

[Register diagram. Groups shown: General Purpose Registers (r0 through r31); Multiply and Divide Registers; Program Counter]

Figure 1-2 CPU Registers

Page 43

Two of the CPU general purpose registers have assigned functions:

• r0 is hardwired to a value of zero, and can be used as the target register for any instruction whose result is to be discarded. r0 can also be used as a source when a zero value is needed.

• r31 is the link register used by Jump and Link instructions. It should not be used by other instructions.

The CPU has three special purpose registers:

• PC — Program Counter register

• HI — Multiply and Divide register higher result

• LO — Multiply and Divide register lower result

The two Multiply and Divide registers (HI, LO) store:

• the product of integer multiply operations, or

• the quotient (in LO) and remainder (in HI) of integer divide operations
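The way HI and LO share a result can be modeled in a few lines of Python. This is a sketch for 32-bit unsigned operands only; the function names `multu` and `divu` simply echo the instruction mnemonics, and the behavior for a zero divisor is deliberately left out because this section does not describe it.

```python
MASK32 = 0xFFFFFFFF

def multu(rs: int, rt: int) -> tuple[int, int]:
    """Unsigned 32-bit multiply: returns (HI, LO) halves of the 64-bit product."""
    product = (rs & MASK32) * (rt & MASK32)
    return (product >> 32) & MASK32, product & MASK32

def divu(rs: int, rt: int) -> tuple[int, int]:
    """Unsigned 32-bit divide: returns (HI, LO) = (remainder, quotient)."""
    rs &= MASK32
    rt &= MASK32
    if rt == 0:
        raise ValueError("zero divisor is not modeled in this sketch")
    return rs % rt, rs // rt

hi, lo = multu(0xFFFFFFFF, 2)
print(hex(hi), hex(lo))      # upper and lower words of the product
print(divu(17, 5))           # (remainder, quotient)
```

Signed operations follow the same HI/LO split but need sign handling that is omitted here.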

The R4000 processor has no Program Status Word (PSW) register as such; this is covered by the Status and Cause registers incorporated within the System Control Coprocessor (CP0). CP0 registers are described later in this chapter.

Page 44

CPU Instruction Set Overview

Each CPU instruction is 32 bits long. As shown in Figure 1-3, there are three instruction formats:

• immediate (I-type)

• jump (J-type)

• register (R-type)

Format              Fields (bit positions)
I-Type (Immediate)  op (31-26), rs (25-21), rt (20-16), immediate (15-0)
J-Type (Jump)       op (31-26), target (25-0)
R-Type (Register)   op (31-26), rs (25-21), rt (20-16), rd (15-11), sa (10-6), funct (5-0)

Figure 1-3 CPU Instruction Formats

Each format contains a number of different instructions, which are described further in this chapter. Fields of the instruction formats are described in Chapter 2.

Instruction decoding is greatly simplified by limiting the number of formats to these three. This limitation means that the more complicated (and less frequently used) operations and addressing modes can be synthesized by the compiler, using sequences of these same simple instructions.

Page 45

The instruction set can be further divided into the following groupings:

• Load and Store instructions move data between memory and general registers. They are all immediate (I-type) instructions, since the only addressing mode supported is base register plus 16-bit, signed immediate offset.

• Computational instructions perform arithmetic, logical, shift, multiply, and divide operations on values in registers. They include register (R-type, in which both the operands and the result are stored in registers) and immediate (I-type, in which one operand is a 16-bit immediate value) formats.

• Jump and Branch instructions change the control flow of a program. Jumps are always made to a paged, absolute address formed by combining a 26-bit target address with the high-order bits of the Program Counter (J-type format) or register address (R-type format). Branches have 16-bit offsets relative to the program counter (I-type). Jump And Link instructions save their return address in register 31.

• Coprocessor instructions perform operations in the coprocessors. Coprocessor load and store instructions are I-type.

• Coprocessor 0 (system coprocessor) instructions perform operations on CP0 registers to control the memory management and exception handling facilities of the processor. These are listed in Table 1-18.

• Special instructions perform system calls and breakpoint operations. These instructions are always R-type.

• Exception instructions cause a branch to the general exception-handling vector based upon the result of a comparison. These instructions occur in both R-type (both the operands and the result are registers) and I-type (one operand is a 16-bit immediate value) formats.

Chapter 2 provides a more detailed summary and Appendix A gives a complete description of each instruction.
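The two address calculations mentioned in the groupings above can be sketched in Python. The sketch assumes 32-bit addresses. It also assumes the usual MIPS convention that the 26-bit jump target is shifted left two bits (instructions are word-aligned) before being combined with the upper four bits of the Program Counter; that detail is not spelled out in this section. The function names are illustrative.

```python
ADDR_MASK = 0xFFFFFFFF  # assumption: 32-bit addressing

def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def load_store_address(base: int, offset16: int) -> int:
    """Base register plus 16-bit signed immediate offset."""
    return (base + sign_extend16(offset16)) & ADDR_MASK

def jump_target(pc: int, target26: int) -> int:
    """Combine the high-order PC bits with the word-aligned 26-bit target."""
    return (pc & 0xF0000000) | ((target26 & 0x03FFFFFF) << 2)

print(hex(load_store_address(0x1000, 0xFFFC)))   # offset 0xFFFC is -4
print(hex(jump_target(0x80001000, 0x400)))
```

Because the upper PC bits are reused unchanged, a J-type jump stays within the current 256-Mbyte region under these assumptions, which is what "paged" refers to.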

Page 46

Tables 1-2 through 1-17 list CPU instructions common to MIPS R-Series processors, along with those instructions that are extensions to the instruction set architecture. The extensions result in code space reductions, multiprocessor support, and improved performance in operating system kernel code sequences—for instance, in situations where run-time bounds-checking is frequently performed. Table 1-18 lists CP0 instructions.

Table 1-2 CPU Instruction Set: Load and Store Instructions

OpCode  Description
LB      Load Byte
LBU     Load Byte Unsigned
LH      Load Halfword
LHU     Load Halfword Unsigned
LW      Load Word
LWL     Load Word Left
LWR     Load Word Right
SB      Store Byte
SH      Store Halfword
SW      Store Word
SWL     Store Word Left
SWR     Store Word Right

Table 1-3 CPU Instruction Set: Arithmetic Instructions (ALU Immediate)

OpCode  Description
ADDI    Add Immediate
ADDIU   Add Immediate Unsigned
SLTI    Set on Less Than Immediate
SLTIU   Set on Less Than Immediate Unsigned
ANDI    AND Immediate
ORI     OR Immediate
XORI    Exclusive OR Immediate
LUI     Load Upper Immediate

Page 47

Table 1-4 CPU Instruction Set: Arithmetic (3-Operand, R-Type)

OpCode  Description
ADD     Add
ADDU    Add Unsigned
SUB     Subtract
SUBU    Subtract Unsigned
SLT     Set on Less Than
SLTU    Set on Less Than Unsigned
AND     AND
OR      OR
XOR     Exclusive OR
NOR     NOR

Table 1-5 CPU Instruction Set: Multiply and Divide Instructions

OpCode  Description
MULT    Multiply
MULTU   Multiply Unsigned
DIV     Divide
DIVU    Divide Unsigned
MFHI    Move From HI
MTHI    Move To HI
MFLO    Move From LO
MTLO    Move To LO

Page 48

Table 1-6 CPU Instruction Set: Jump and Branch Instructions

OpCode  Description
J       Jump
JAL     Jump And Link
JR      Jump Register
JALR    Jump And Link Register
BEQ     Branch on Equal
BNE     Branch on Not Equal
BLEZ    Branch on Less Than or Equal to Zero
BGTZ    Branch on Greater Than Zero
BLTZ    Branch on Less Than Zero
BGEZ    Branch on Greater Than or Equal to Zero
BLTZAL  Branch on Less Than Zero And Link
BGEZAL  Branch on Greater Than or Equal to Zero And Link

Table 1-7 CPU Instruction Set: Shift Instructions

OpCode  Description
SLL     Shift Left Logical
SRL     Shift Right Logical
SRA     Shift Right Arithmetic
SLLV    Shift Left Logical Variable
SRLV    Shift Right Logical Variable
SRAV    Shift Right Arithmetic Variable

Page 49

Table 1-8 CPU Instruction Set: Coprocessor Instructions

OpCode  Description
LWCz    Load Word to Coprocessor z
SWCz    Store Word from Coprocessor z
MTCz    Move To Coprocessor z
MFCz    Move From Coprocessor z
CTCz    Move Control to Coprocessor z
CFCz    Move Control From Coprocessor z
COPz    Coprocessor Operation z
BCzT    Branch on Coprocessor z True
BCzF    Branch on Coprocessor z False

Table 1-9 CPU Instruction Set: Special Instructions

OpCode   Description
SYSCALL  System Call
BREAK    Break

Page 50

Table 1-10 Extensions to the ISA: Load and Store Instructions

OpCode  Description
LD      Load Doubleword
LDL     Load Doubleword Left
LDR     Load Doubleword Right
LL      Load Linked
LLD     Load Linked Doubleword
LWU     Load Word Unsigned
SC      Store Conditional
SCD     Store Conditional Doubleword
SD      Store Doubleword
SDL     Store Doubleword Left
SDR     Store Doubleword Right
SYNC    Sync

Table 1-11 Extensions to the ISA: Arithmetic Instructions (ALU Immediate)

OpCode  Description
DADDI   Doubleword Add Immediate
DADDIU  Doubleword Add Immediate Unsigned

Table 1-12 Extensions to the ISA: Multiply and Divide Instructions

OpCode  Description
DMULT   Doubleword Multiply
DMULTU  Doubleword Multiply Unsigned
DDIV    Doubleword Divide
DDIVU   Doubleword Divide Unsigned

Page 51

Table 1-13 Extensions to the ISA: Branch Instructions

OpCode   Description
BEQL     Branch on Equal Likely
BNEL     Branch on Not Equal Likely
BLEZL    Branch on Less Than or Equal to Zero Likely
BGTZL    Branch on Greater Than Zero Likely
BLTZL    Branch on Less Than Zero Likely
BGEZL    Branch on Greater Than or Equal to Zero Likely
BLTZALL  Branch on Less Than Zero And Link Likely
BGEZALL  Branch on Greater Than or Equal to Zero And Link Likely
BCzTL    Branch on Coprocessor z True Likely
BCzFL    Branch on Coprocessor z False Likely

Table 1-14 Extensions to the ISA: Arithmetic Instructions (3-operand, R-type)

OpCode  Description
DADD    Doubleword Add
DADDU   Doubleword Add Unsigned
DSUB    Doubleword Subtract
DSUBU   Doubleword Subtract Unsigned

Page 52

Table 1-15 Extensions to the ISA: Shift Instructions

OpCode  Description
DSLL    Doubleword Shift Left Logical
DSRL    Doubleword Shift Right Logical
DSRA    Doubleword Shift Right Arithmetic
DSLLV   Doubleword Shift Left Logical Variable
DSRLV   Doubleword Shift Right Logical Variable
DSRAV   Doubleword Shift Right Arithmetic Variable
DSLL32  Doubleword Shift Left Logical + 32
DSRL32  Doubleword Shift Right Logical + 32
DSRA32  Doubleword Shift Right Arithmetic + 32

Table 1-16 Extensions to the ISA: Exception Instructions

OpCode  Description
TGE     Trap if Greater Than or Equal
TGEU    Trap if Greater Than or Equal Unsigned
TLT     Trap if Less Than
TLTU    Trap if Less Than Unsigned
TEQ     Trap if Equal
TNE     Trap if Not Equal
TGEI    Trap if Greater Than or Equal Immediate
TGEIU   Trap if Greater Than or Equal Immediate Unsigned
TLTI    Trap if Less Than Immediate
TLTIU   Trap if Less Than Immediate Unsigned
TEQI    Trap if Equal Immediate
TNEI    Trap if Not Equal Immediate

Page 53

Table 1-17 Extensions to the ISA: Coprocessor Instructions

OpCode  Description
DMFCz   Doubleword Move From Coprocessor z
DMTCz   Doubleword Move To Coprocessor z
LDCz    Load Double Coprocessor z
SDCz    Store Double Coprocessor z

Table 1-18 CP0 Instructions

OpCode  Description
DMFC0   Doubleword Move From CP0
DMTC0   Doubleword Move To CP0
MTC0    Move to CP0
MFC0    Move from CP0
TLBR    Read Indexed TLB Entry
TLBWI   Write Indexed TLB Entry
TLBWR   Write Random TLB Entry
TLBP    Probe TLB for Matching Entry
CACHE   Cache Operation
ERET    Exception Return

Page 54

Data Formats and Addressing

The R4000 processor uses four data formats: a 64-bit doubleword, a 32-bit word, a 16-bit halfword, and an 8-bit byte. Byte ordering within each of the larger data formats (halfword, word, doubleword) can be configured in either big-endian or little-endian order. Endianness refers to the location of byte 0 within the multi-byte data structure. Figures 1-4 and 1-5 show the ordering of bytes within words and the ordering of words within multiple-word structures for the big-endian and little-endian conventions.

When the R4000 processor is configured as a big-endian system, byte 0 is the most-significant (leftmost) byte, thereby providing compatibility with MC68000 and IBM 370 conventions. Figure 1-4 shows this configuration.

Word Address   Byte numbers at bits 31-24, 23-16, 15-8, 7-0
12 (higher)    12  13  14  15
8               8   9  10  11
4               4   5   6   7
0 (lower)       0   1   2   3

Figure 1-4 Big-Endian Byte Ordering

When configured as a little-endian system, byte 0 is always the least-significant (rightmost) byte, which is compatible with iAPX x86 and DEC VAX conventions. Figure 1-5 shows this configuration.

Word Address   Byte numbers at bits 31-24, 23-16, 15-8, 7-0
12 (higher)    15  14  13  12
8              11  10   9   8
4               7   6   5   4
0 (lower)       3   2   1   0

Figure 1-5 Little-Endian Byte Ordering

Page 55

In this text, bit 0 is always the least-significant (rightmost) bit; thus, bit designations are always little-endian (although no instructions explicitly designate bit positions within words).
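The two byte orderings can be illustrated with Python's built-in integer-to-bytes conversion. The sketch below is not R4000-specific; it takes one 32-bit value and lists its bytes in order of increasing byte address under each convention.

```python
# A 32-bit word whose bytes, from most to least significant, are 0x00 0x01 0x02 0x03.
value = 0x00010203

big = value.to_bytes(4, "big")        # byte 0 (lowest address) is most significant
little = value.to_bytes(4, "little")  # byte 0 (lowest address) is least significant

print(list(big))     # bytes in increasing address order, big-endian
print(list(little))  # bytes in increasing address order, little-endian

# Either byte sequence decodes back to the same value when read with its own convention.
assert int.from_bytes(big, "big") == value
assert int.from_bytes(little, "little") == value
```

In both cases the value itself, and the numbering of bits within it, is unchanged; only the assignment of byte addresses differs.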

Figures 1-6 and 1-7 show little-endian and big-endian byte ordering in doublewords.

Bit #    63-56  55-48  47-40  39-32  31-24  23-16  15-8  7-0
Byte #     7      6      5      4      3      2     1     0

Figure 1-6 Little-Endian Data in a Doubleword

Bit #    63-56  55-48  47-40  39-32  31-24  23-16  15-8  7-0
Byte #     0      1      2      3      4      5     6     7

Figure 1-7 Big-Endian Data in a Doubleword

In both figures the most-significant byte is at the left (bits 63-56) and the least-significant byte is at the right (bits 7-0). Within each byte, bits are numbered 7 through 0, with bit 0 the least significant.

Page 56

The CPU uses byte addressing for halfword, word, and doubleword accesses with the following alignment constraints:

• Halfword accesses must be aligned on an even byte boundary (0, 2, 4...).

• Word accesses must be aligned on a byte boundary divisible by four (0, 4, 8...).

• Doubleword accesses must be aligned on a byte boundary divisible by eight (0, 8, 16...).
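The alignment constraints listed above amount to requiring that the byte address be a multiple of the access size. A minimal Python sketch, with a hypothetical helper name `is_aligned`:

```python
# Access sizes in bytes for each data format named in the text.
ACCESS_SIZES = {"byte": 1, "halfword": 2, "word": 4, "doubleword": 8}

def is_aligned(address: int, access: str) -> bool:
    """True if address is a multiple of the size of the given access type."""
    return address % ACCESS_SIZES[access] == 0

for addr in (0, 2, 3, 4, 8):
    print(addr, {kind: is_aligned(addr, kind) for kind in ACCESS_SIZES})
```

An address such as 3 passes only the byte check, which is the case the unaligned load and store instructions are meant to handle.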

The following special instructions load and store words that are not aligned on 4-byte (word) or 8-byte (doubleword) boundaries:

LWL LWR SWL SWR LDL LDR SDL SDR

These instructions are used in pairs to provide addressing of misaligned words. Addressing misaligned data incurs one additional instruction cycle over that required for addressing aligned data.

Figures 1-8 and 1-9 show the access of a misaligned word that has byte address 3.

Bit #            31-24  23-16  15-8  7-0
Higher Address     4      5     6     -
Lower Address      -      -     -     3

Figure 1-8 Big-Endian Misaligned Word Addressing

Bit #            31-24  23-16  15-8  7-0
Higher Address     -      6     5     4
Lower Address      3      -     -     -

Figure 1-9 Little-Endian Misaligned Word Addressing

Page 57

Coprocessors (CP0-CP2)

The MIPS ISA defines three coprocessors (designated CP0 through CP2):

• Coprocessor 0 (CP0) is incorporated on the CPU chip and supports the virtual memory system and exception handling. CP0 is also referred to as the System Control Coprocessor.

• Coprocessor 1 (CP1) is reserved for the on-chip, floating-point coprocessor, the FPU.

• Coprocessor 2 (CP2) is reserved for future definition by MIPS.

CP0 and CP1 are described in the sections that follow.

System Control Coprocessor, CP0

CP0 translates virtual addresses into physical addresses and manages exceptions and transitions between kernel, supervisor, and user states. CP0 also controls the cache subsystem, as well as providing diagnostic control and error recovery facilities.

The CP0 registers shown in Figure 1-10 and described in Table 1-19 manipulate the memory management and exception handling capabilities of the CPU.

Page 58

[Register diagram. The CP0 registers are shown by number (0 through 31) and name, each marked as used for exception processing, used for memory management, or reserved; the names and numbers are listed in Table 1-19.]

Figure 1-10 R4000 CP0 Registers

Page 59

Table 1-19 System Control Coprocessor (CP0) Register Definitions

Number  Register  Description
0       Index     Programmable pointer into TLB array
1       Random    Pseudorandom pointer into TLB array (read only)
2       EntryLo0  Low half of TLB entry for even virtual address (VPN)
3       EntryLo1  Low half of TLB entry for odd virtual address (VPN)
4       Context   Pointer to kernel virtual page table entry (PTE) in 32-bit addressing mode
5       PageMask  TLB Page Mask
6       Wired     Number of wired TLB entries
7       —         Reserved
8       BadVAddr  Bad virtual address
9       Count     Timer Count
10      EntryHi   High half of TLB entry
11      Compare   Timer Compare
12      SR        Status register
13      Cause     Cause of last exception
14      EPC       Exception Program Counter
15      PRId      Processor Revision Identifier
16      Config    Configuration register
17      LLAddr    Load Linked Address
18      WatchLo   Memory reference trap address low bits
19      WatchHi   Memory reference trap address high bits
20      XContext  Pointer to kernel virtual PTE table in 64-bit addressing mode
21–25   —         Reserved
26      ECC       Secondary-cache error checking and correcting (ECC) and Primary parity
27      CacheErr  Cache Error and Status register
28      TagLo     Cache Tag register
29      TagHi     Cache Tag register
30      ErrorEPC  Error Exception Program Counter
31      —         Reserved

Page 60

Floating-Point Unit (FPU), CP1

The MIPS floating-point unit (FPU) is designated CP1; the FPU extends the CPU instruction set to perform arithmetic operations on floating-point values. The FPU, with associated system software, fully conforms to the requirements of ANSI/IEEE Standard 754–1985, IEEE Standard for Binary Floating-Point Arithmetic.

The FPU features include:

• Full 64-bit Operation. The FPU can contain either 16 or 32 64-bit registers to hold single-precision or double-precision values. The FPU also includes a 32-bit Status/Control register that provides access to all IEEE-Standard exception handling capabilities.

• Load and Store Instruction Set. Like the CPU, the FPU uses a load- and store-based instruction set. Floating-point operations are started in a single cycle and their execution overlaps other fixed-point or floating-point operations.

• Tightly-coupled Coprocessor Interface. The FPU is on the CPU chip, and appears to the programmer as a simple extension of the CPU (accessed as CP1). Together, the CPU and FPU form a tightly-coupled unit with a seamless integration of floating-point and fixed-point instruction sets. Since each unit receives and executes instructions in parallel, some floating-point instructions can execute at the same rate (two instructions per cycle) as fixed-point instructions.

Page 61

Memory Management System (MMU)

The R4000 processor has a 36-bit physical addressing range of 64 Gbytes. However, since it is rare for systems to implement a physical memory space this large, the CPU provides a logical expansion of memory space by translating addresses composed in the large virtual address space into available physical memory addresses. The R4000 processor supports the following two addressing modes:

• 32-bit mode, in which the virtual address space is divided into 2 Gbytes per user process and 2 Gbytes for the kernel.

• 64-bit mode, in which the virtual address is expanded to 1 Tbyte (2^40 bytes) of user virtual address space.

A detailed description of these address spaces is given in Chapter 4.
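The sizes quoted in this section are all powers of two, and the arithmetic can be checked directly. The Python sketch below uses binary units (1 Gbyte taken as 2^30 bytes, 1 Tbyte as 2^40 bytes), which is the assumption under which the quoted figures are exact.

```python
GBYTE = 2**30
TBYTE = 2**40

physical_range = 2**36     # 36-bit physical address
user_space_32 = 2**31      # half of a 32-bit virtual address space
user_space_64 = 2**40      # user virtual address space in 64-bit mode

print(physical_range // GBYTE, "Gbytes of physical address range")
print(user_space_32 // GBYTE, "Gbytes per user process in 32-bit mode")
print(user_space_64 // TBYTE, "Tbyte of user space in 64-bit mode")
```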

The Translation Lookaside Buffer (TLB)

Virtual memory mapping is assisted by a translation lookaside buffer, which caches virtual-to-physical address translations. This fully associative, on-chip TLB contains 48 entries, each of which maps a pair of variable-sized pages ranging from 4 Kbytes to 16 Mbytes, with each page size four times the previous one.
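The available page sizes can be enumerated by starting at 4 Kbytes and multiplying by four until 16 Mbytes is reached. A short Python sketch (the function name `page_sizes` is illustrative):

```python
KBYTE = 1024
MBYTE = 1024 * KBYTE

def page_sizes(smallest: int = 4 * KBYTE, largest: int = 16 * MBYTE, factor: int = 4) -> list[int]:
    """List page sizes from smallest to largest, each a factor of four apart."""
    sizes = []
    size = smallest
    while size <= largest:
        sizes.append(size)
        size *= factor
    return sizes

sizes = page_sizes()
print([s // KBYTE for s in sizes], "Kbytes")

TLB_ENTRIES = 48
PAGES_PER_ENTRY = 2        # each entry maps a pair of pages
print(TLB_ENTRIES * PAGES_PER_ENTRY, "pages mapped by a full TLB")
```

This gives seven page sizes, and a full TLB maps 96 pages at a time.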

Instruction TLB

The R4000 processor has a two-entry instruction TLB (ITLB) which assists in instruction address translation. The ITLB is completely invisible to software and exists only to increase performance.

Joint TLB

An address translation value is tagged with the most-significant bits of its virtual address (the number of these bits depends upon the size of the page) and a per-process identifier. If there is no matching entry in the TLB, an exception is taken and software refills the on-chip TLB from a page table resident in memory; this TLB is referred to as the joint TLB (JTLB) because it contains both data and instructions jointly. The JTLB entry to be rewritten is selected at random.

Page 62

Operating Modes

The R4000 processor has three operating modes:

• User mode

• Supervisor mode

• Kernel mode

The manner in which memory addresses are translated or mapped depends on the operating mode of the CPU; this is described in Chapter 4.

Cache Memory Hierarchy

To achieve a high performance in uniprocessor and multiprocessor systems, the R4000 processor supports a two-level cache memory hierarchy that increases memory access bandwidth and reduces the latency of load and store instructions. This hierarchy consists of on-chip instruction and data caches, together with an optional external secondary cache that varies in size from 128 Kbytes to 4 Mbytes.

The secondary cache is assumed to consist of one bank of industry-standard static RAM (SRAM) with output enables, arranged as a quadword (128-bit) data array, with a 25-bit-wide tag array. Check fields are added to both data and tag arrays to improve data integrity.

The secondary cache can be configured as a joint cache, or split into separate instruction and data caches. The maximum secondary cache size is 4 Mbytes; the minimum secondary cache size is 128 Kbytes for a joint cache, or 256 Kbytes total for split instruction/data caches. The secondary cache is direct mapped, and is addressed with the lower part of the physical address.
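In a direct-mapped cache, each physical address maps to exactly one cache line, selected by the low-order address bits. The Python sketch below shows the general technique only; the line size is a parameter chosen for illustration (this section does not state it), and the function name is hypothetical.

```python
def direct_mapped_lookup(physical_address: int, cache_size: int, line_size: int) -> tuple[int, int, int]:
    """Split an address into (tag, line index, byte offset) for a direct-mapped cache.

    cache_size and line_size are in bytes and are assumed to be powers of two.
    """
    offset = physical_address % line_size
    index = (physical_address % cache_size) // line_size
    tag = physical_address // cache_size
    return tag, index, offset

CACHE_SIZE = 128 * 1024    # minimum joint secondary cache size from the text
LINE_SIZE = 32             # illustrative assumption, not taken from this section

a = direct_mapped_lookup(0x00012340, CACHE_SIZE, LINE_SIZE)
b = direct_mapped_lookup(0x00012340 + CACHE_SIZE, CACHE_SIZE, LINE_SIZE)
print(a, b)   # same index and offset, different tags
```

Two addresses that differ by a multiple of the cache size compete for the same line, and the stored tag tells them apart.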

Primary and secondary caches are described in more detail in Chapter 11.

Page 63

Primary Caches

The R4000 processor incorporates separate on-chip primary instruction and data caches to fill the high-performance pipeline. Each cache has its own 64-bit data path, and each can be accessed in parallel.

The R4000 processor primary caches hold from 8 Kbytes to 32 Kbytes; the R4400 processor primary caches are fixed at 16 Kbytes.

Cache accesses can occur up to twice each cycle. This provides the integer and floating-point units with an aggregate bandwidth of 1.6 Gbytes per second at a MasterClock frequency of 50 MHz.
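One reading that is consistent with the quoted aggregate bandwidth: two primary caches, each with a 64-bit data path, each accessed twice per MasterClock cycle. The Python sketch below reproduces that arithmetic; how the figure is composed is an inference from the numbers in this section, not a statement taken from it.

```python
CACHES = 2                  # primary instruction cache and primary data cache
DATA_PATH_BYTES = 64 // 8   # each cache has its own 64-bit data path
ACCESSES_PER_CYCLE = 2      # cache accesses can occur up to twice each cycle
MASTER_CLOCK_HZ = 50_000_000

aggregate = CACHES * DATA_PATH_BYTES * ACCESSES_PER_CYCLE * MASTER_CLOCK_HZ
print(aggregate / 1e9, "Gbytes per second")
```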

Secondary Cache Interface

The R4000SC (secondary cache) and R4000MC (multiprocessor) versions of the processor allow connection to an optional secondary cache. These processors provide all of the secondary cache control circuitry, including error checking and correcting (ECC) protection, on chip.

The Secondary Cache interface includes:

• a 128-bit data bus

• a 25-bit tag bus

• an 18-bit address bus

• SRAM control signals


The 128-bit-wide data bus is designed to minimize cache miss penalties, and allow the use of standard low-cost SRAM in secondary cache.


CPU Instruction Set Summary

This chapter is an overview of the central processing unit (CPU) instruction set; refer to Appendix A for detailed descriptions of individual CPU instructions.

An overview of the floating-point unit (FPU) instruction set is in Chapter 6; refer to Appendix B for detailed descriptions of individual FPU instructions.


Chapter 2

2.1 CPU Instruction Formats

Each CPU instruction consists of a single 32-bit word, aligned on a word boundary. There are three instruction formats—immediate (I-type), jump (J-type), and register (R-type)—as shown in Figure 2-1. The use of a small number of instruction formats simplifies instruction decoding, allowing the compiler to synthesize more complicated (and less frequently used) operations and addressing modes from these three formats as needed.

I-Type (Immediate): op (bits 31-26), rs (bits 25-21), rt (bits 20-16), immediate (bits 15-0)

J-Type (Jump): op (bits 31-26), target (bits 25-0)

R-Type (Register): op (bits 31-26), rs (bits 25-21), rt (bits 20-16), rd (bits 15-11), sa (bits 10-6), funct (bits 5-0)

• op - 6-bit operation code

• rs - 5-bit source register specifier

• rt - 5-bit target (source/destination) register or branch condition

• immediate - 16-bit immediate value, branch displacement or address displacement

• target - 26-bit jump target address

• rd - 5-bit destination register specifier

• sa - 5-bit shift amount

• funct - 6-bit function field

Figure 2-1 CPU Instruction Formats

In the MIPS architecture, coprocessor instructions are implementation-dependent; see Appendix A for details of individual Coprocessor 0 instructions.
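The field positions in Figure 2-1 can be expressed as shifts and masks. The following is a minimal sketch; the function name `decode_fields` is illustrative and not part of any MIPS tool. It extracts every field of all three formats from a 32-bit word. Deciding which format actually applies requires the opcode tables in Appendix A and is not attempted here.

```python
def decode_fields(word):
    """Split a 32-bit instruction word into the fields named in Figure 2-1."""
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("instruction must be a 32-bit word")
    return {
        "op": (word >> 26) & 0x3F,        # bits 31-26, all formats
        "rs": (word >> 21) & 0x1F,        # bits 25-21, I-type and R-type
        "rt": (word >> 16) & 0x1F,        # bits 20-16, I-type and R-type
        "rd": (word >> 11) & 0x1F,        # bits 15-11, R-type
        "sa": (word >> 6) & 0x1F,         # bits 10-6, R-type
        "funct": word & 0x3F,             # bits 5-0, R-type
        "immediate": word & 0xFFFF,       # bits 15-0, I-type
        "target": word & 0x03FFFFFF,      # bits 25-0, J-type
    }
```

For example, `decode_fields(0x00221820)` yields op 0, rs 1, rt 2, rd 3, sa 0, and funct 0x20 when the word is read as an R-type instruction.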


Load and Store Instructions

Load and store are immediate (I-type) instructions that move data between memory and the general registers. The only addressing mode that load and store instructions directly support is base register plus 16-bit signed immediate offset.

Scheduling a Load Delay Slot

A load instruction that does not allow its result to be used by the instruction immediately following is called a delayed load instruction. The instruction slot immediately following this delayed load instruction is referred to as the load delay slot.

In the R4000 processor, the instruction immediately following a load instruction can use the contents of the loaded register; however, in such cases hardware interlocks insert additional real cycles. Consequently, scheduling load delay slots can be desirable, both for performance and for R-Series processor compatibility. However, the scheduling of load delay slots is not absolutely required.

Defining Access Types

Access type indicates the size of an R4000 processor data item to be loaded or stored, set by the load or store instruction opcode. Access types are defined in Appendix A.

Regardless of access type or byte ordering (endianness), the address given specifies the low-order byte in the addressed field. For a big-endian configuration, the low-order byte is the most-significant byte; for a little-endian configuration, the low-order byte is the least-significant byte.

†

The access type, together with the three low-order bits of the address, define the bytes accessed within the addressed doubleword (shown in Table 2-1). Only the combinations shown in Table 2-1 are permissible; other combinations cause address error exceptions. See Appendix A for individual descriptions of CPU load and store instructions.
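The rule in the preceding paragraph can be sketched as a lookup. The code below is an illustration only: the names `PERMITTED_OFFSETS` and `bytes_accessed` are hypothetical, the permitted combinations are transcribed from Table 2-1, and a Python `ValueError` stands in for the address error exception.

```python
# Permitted low-order address bits for each access type value (Table 2-1).
PERMITTED_OFFSETS = {
    7: (0,),                      # doubleword
    6: (0, 1),                    # septibyte
    5: (0, 2),                    # sextibyte
    4: (0, 3),                    # quintibyte
    3: (0, 4),                    # word
    2: (0, 1, 4, 5),              # triplebyte
    1: (0, 2, 4, 6),              # halfword
    0: (0, 1, 2, 3, 4, 5, 6, 7),  # byte
}


def bytes_accessed(access_type, address, big_endian):
    """Byte numbers accessed, listed from the bit-63 end to the bit-0 end."""
    offset = address & 0x7                 # three low-order address bits
    if offset not in PERMITTED_OFFSETS[access_type]:
        raise ValueError("address error: combination not in Table 2-1")
    count = access_type + 1                # access type value is size minus one
    numbers = list(range(offset, offset + count))
    return numbers if big_endian else numbers[::-1]
```

A word access (value 3) at an address ending in binary 100 touches bytes 4 to 7 in either byte ordering; only the order in which they appear across bits 63 to 0 differs.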

† Data formats are described in Chapter 1.


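Table 2-1 Byte Access within a Doubleword

Bytes accessed are listed in register order, from bit 63 (left) to bit 0 (right).

Access Type Mnemonic (Value)   Low Order Address Bits (2 1 0)   Big endian   Little endian
Doubleword (7)    0 0 0    0 1 2 3 4 5 6 7    7 6 5 4 3 2 1 0
Septibyte (6)     0 0 0    0 1 2 3 4 5 6      6 5 4 3 2 1 0
Septibyte (6)     0 0 1    1 2 3 4 5 6 7      7 6 5 4 3 2 1
Sextibyte (5)     0 0 0    0 1 2 3 4 5        5 4 3 2 1 0
Sextibyte (5)     0 1 0    2 3 4 5 6 7        7 6 5 4 3 2
Quintibyte (4)    0 0 0    0 1 2 3 4          4 3 2 1 0
Quintibyte (4)    0 1 1    3 4 5 6 7          7 6 5 4 3
Word (3)          0 0 0    0 1 2 3            3 2 1 0
Word (3)          1 0 0    4 5 6 7            7 6 5 4
Triplebyte (2)    0 0 0    0 1 2              2 1 0
Triplebyte (2)    0 0 1    1 2 3              3 2 1
Triplebyte (2)    1 0 0    4 5 6              6 5 4
Triplebyte (2)    1 0 1    5 6 7              7 6 5
Halfword (1)      0 0 0    0 1                1 0
Halfword (1)      0 1 0    2 3                3 2
Halfword (1)      1 0 0    4 5                5 4
Halfword (1)      1 1 0    6 7                7 6
Byte (0)          0 0 0    0                  0
Byte (0)          0 0 1    1                  1
Byte (0)          0 1 0    2                  2
Byte (0)          0 1 1    3                  3
Byte (0)          1 0 0    4                  4
Byte (0)          1 0 1    5                  5
Byte (0)          1 1 0    6                  6
Byte (0)          1 1 1    7                  7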


Computational Instructions

Computational instructions can be either in register (R-type) format, in which both operands are registers, or in immediate (I-type) format, in which one operand is a 16-bit immediate.

Computational instructions perform the following operations on register values:

• arithmetic

• logical

• shift

• multiply

• divide

These operations fit in the following four categories of computational instructions:

• ALU Immediate instructions

• three-Operand Register-Type instructions

• shift instructions

• multiply and divide instructions


64-bit Operations

When operating in 64-bit mode, 32-bit operands must be sign extended. The result of operations that use incorrectly sign-extended 32-bit values is unpredictable.
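As an illustration of what sign extension means for a 64-bit register, the following sketch models registers as Python integers. The helper names are hypothetical; the code shows the arithmetic only, not how the processor implements it.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def sign_extend_32(value):
    """Return the 64-bit register image of a 32-bit operand."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:             # bit 31 set: copy it into bits 63-32
        value |= 0xFFFFFFFF00000000
    return value


def is_sign_extended_32(register):
    """True if bits 63-32 of a 64-bit value are all copies of bit 31."""
    register &= MASK64
    return sign_extend_32(register) == register
```

For instance, the 32-bit value 0x80000000 must appear in a 64-bit register as 0xFFFFFFFF80000000; the zero-extended form 0x0000000080000000 is not a valid 32-bit operand.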


Cycle Timing for Multiply and Divide Instructions

Any multiply instruction in the integer pipeline is transferred to the multiplier as remaining instructions continue through the pipeline; the product of the multiply instruction is saved in the HI and LO registers.

If the multiply instruction is followed by an MFHI or MFLO before the product is available, the pipeline interlocks until this product does become available.

Table 2-2 gives the execution time for integer multiply and divide operations. The “Total Cycles” column gives the total number of cycles required to execute the instruction. The “Overlap” column gives the number of cycles that overlap other CPU operations; that is, the number of cycles required between the present instruction and a subsequent MFHI or MFLO without incurring an interlock. If this value is zero, the operation is not performed in parallel with any other CPU operation.

Table 2-2 Multiply/Divide Instruction Cycle Timing

Instruction   Total Cycles   Overlap
MULT          12             10
MULTU         12             10
DIV           75             0
DIVU          75             0
DMULT         20             18
DMULTU        20             18
DDIV          139            0
DDIVU         139            0
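The "Overlap" column can be read as a scheduling budget. The sketch below is an illustration only (the names `TIMING` and `mfhi_mflo_interlocks` are hypothetical): it encodes Table 2-2 and reports whether an MFHI or MFLO placed a given number of cycles after the operation would interlock. It does not estimate how long the interlock lasts.

```python
# instruction: (total cycles, overlap), transcribed from Table 2-2
TIMING = {
    "MULT": (12, 10), "MULTU": (12, 10),
    "DIV": (75, 0), "DIVU": (75, 0),
    "DMULT": (20, 18), "DMULTU": (20, 18),
    "DDIV": (139, 0), "DDIVU": (139, 0),
}


def mfhi_mflo_interlocks(instruction, cycles_between):
    """True if MFHI/MFLO issued after cycles_between cycles would interlock."""
    _total, overlap = TIMING[instruction]
    return cycles_between < overlap
```

With this reading, a compiler would need ten cycles of independent work after MULT to avoid an interlock; for the divide instructions the overlap is zero because, as stated above, they do not run in parallel with other CPU operations.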

For more information about computational instructions, refer to the individual instruction as described in Appendix A.


Jump and Branch Instructions

Jump and branch instructions change the control flow of a program. All jump and branch instructions occur with a delay of one instruction: that is, the instruction immediately following the jump or branch (this is known as the instruction in the delay slot) always executes while the target instruction is being fetched from storage.

Overview of Jump Instructions

Subroutine calls in high-level languages are usually implemented with Jump or Jump and Link instructions, both of which are J-type instructions. In J-type format, the 26-bit target address shifts left 2 bits and combines with the high-order 4 bits of the current program counter to form an absolute address.
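The J-type address formation described above can be written out as follows. This is a sketch for 32-bit addresses as the paragraph describes them; the function name `jump_target` is illustrative.

```python
def jump_target(pc, target26):
    """Absolute address from a 26-bit J-type target and a 32-bit PC."""
    shifted = (target26 & 0x03FFFFFF) << 2    # 26-bit target shifted left 2 bits
    return (pc & 0xF0000000) | shifted        # combined with high-order 4 PC bits
```

Because only the low 28 bits come from the instruction, a jump of this kind stays within the 256-Mbyte region selected by the upper four bits of the program counter; this is why the register forms are used for larger jumps.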

Returns, dispatches, and large cross-page jumps are usually implemented with the Jump Register or Jump and Link Register instructions. Both are R-type instructions that take the 32-bit or 64-bit byte address contained in one of the general purpose registers.

For more information about jump instructions, refer to the individual instruction as described in Appendix A.


†

Overview of Branch Instructions

All branch instruction target addresses are computed by adding the address of the instruction in the delay slot to the 16-bit offset (shifted left 2 bits and sign-extended to 32 bits). All branches occur with a delay of one instruction.
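The branch target computation can be sketched in the same way. The code assumes 32-bit addresses and that the delay slot is the word immediately after the branch (instructions are 32-bit words); the function name `branch_target` is illustrative.

```python
def branch_target(branch_address, offset16):
    """Target of a branch at branch_address with a 16-bit offset field."""
    delay_slot = (branch_address + 4) & 0xFFFFFFFF   # instruction after the branch
    displacement = (offset16 & 0xFFFF) << 2          # offset shifted left 2 bits
    if displacement & 0x20000:                       # sign bit of the 18-bit value
        displacement -= 0x40000                      # sign-extend
    return (delay_slot + displacement) & 0xFFFFFFFF
```

An offset field of 0xFFFF (that is, -1) therefore branches back to the branch instruction itself, and an offset of 0 targets the delay slot instruction.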

If a conditional Branch Likely instruction is not taken, the instruction in the delay slot is nullified.

For more information about branch instructions, refer to the individual instruction as described in Appendix A.

† Taken branches have a 3-cycle penalty in this implementation. See Chapter 3 for more information.


Special Instructions

Special instructions allow the software to initiate traps; they are always R-type. For more information about special instructions, refer to the individual instruction as described in Appendix A.

Exception Instructions

Exception instructions are extensions to the MIPS ISA. For more information about exception instructions, refer to the individual instruction as described in Appendix A.

Coprocessor Instructions

Coprocessor instructions perform operations in their respective coprocessors. Coprocessor loads and stores are I-type, and coprocessor computational instructions have coprocessor-dependent formats.

Individual coprocessor instructions are described in Appendices A (for CP0) and B (for the FPU, CP1).

CP0 instructions perform operations specifically on the System Control Coprocessor registers to manipulate the memory management and exception handling facilities of the processor. Appendix A details CP0 instructions.


The CPU Pipeline

This chapter describes the basic operation of the CPU pipeline, including the delay instructions (instructions that follow a branch or load instruction in the pipeline), interruptions to the pipeline flow caused by interlocks and exceptions, and the R4400 implementation of an uncached store buffer.

The FPU pipeline is described in Chapter 6.


Chapter 3

3.1 CPU Pipeline Operation

The CPU has an eight-stage instruction pipeline; each stage takes one PCycle (one cycle of PClock, which runs at twice the frequency of MasterClock). Thus, the execution of each instruction takes at least eight PCycles (four MasterClock cycles). An instruction can take longer—for example, if the required data is not in the cache, the data must be retrieved from main memory.
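The relationship between PCycles, MasterClock cycles, and time can be checked with a few lines of arithmetic. The function name is hypothetical, and the 50 MHz MasterClock used in the usage note is only an example value.

```python
PIPELINE_STAGES = ("IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB")


def minimum_latency(masterclock_hz):
    """Return (PCycles, MasterClock cycles, seconds) for one instruction."""
    pclock_hz = 2 * masterclock_hz        # PClock runs at twice MasterClock
    pcycles = len(PIPELINE_STAGES)        # one PCycle per pipeline stage
    return pcycles, pcycles / 2, pcycles / pclock_hz
```

At a 50 MHz MasterClock this gives eight PCycles, four MasterClock cycles, and 80 ns as the minimum time for one instruction to pass through the pipeline.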

Once the pipeline has been filled, eight instructions are executed simultaneously. Figure 3-1 shows the eight stages of the instruction pipeline; the next section describes the pipeline stages.

(Diagram: an instruction passing through the eight stages IF IS RF EX DF DS TC WB, one stage per PCycle and two PCycles per MasterClock cycle, with successive instructions overlapped eight deep.)

Figure 3-1 Instruction Pipeline Stages


3.2 CPU Pipeline Stages

This section describes each of the eight pipeline stages:

• IF - Instruction Fetch, First Half

• IS - Instruction Fetch, Second Half

• RF - Register Fetch

• EX - Execution

• DF - Data Fetch, First Half

• DS - Data Fetch, Second Half

• TC - Tag Check

• WB - Write Back

IF - Instruction Fetch, First Half

During the IF stage, the following occurs:

• Branch logic selects an instruction address and the instruction cache fetch begins.

• The instruction translation lookaside buffer (ITLB) begins the virtual-to-physical address translation.


IS - Instruction Fetch, Second Half

During the IS stage, the instruction cache fetch and the virtual-to-physical address translation are completed.

RF - Register Fetch

During the RF stage, the following occurs:

• The instruction decoder (IDEC) decodes the instruction and checks for interlock conditions.

• The instruction cache tag is checked against the page frame number obtained from the ITLB.

• Any required operands are fetched from the register file.


EX - Execution

During the EX stage, one of the following occurs:

• The arithmetic logic unit (ALU) performs the arithmetic or logical operation for register-to-register instructions.

• The ALU calculates the data virtual address for load and store instructions.

• The ALU determines whether the branch condition is true and calculates the virtual branch target address for branch instructions.

DF - Data Fetch, First Half

During the DF stage, one of the following occurs:

• The data cache fetch and the data virtual-to-physical translation begins for load and store instructions.

• The branch instruction address translation and translation lookaside buffer (TLB)† update begins for branch instructions.

• No operations are performed during the DF, DS, and TC stages for register-to-register instructions.

DS - Data Fetch, Second Half

During the DS stage, one of the following occurs:

• The data cache fetch and data virtual-to-physical translation are completed for load and store instructions. The Shifter aligns data to its word or doubleword boundary.

• The branch instruction address translation and TLB update are completed for branch instructions.

TC - Tag Check

For load and store instructions, the cache performs the tag check during the TC stage. The physical address from the TLB is checked against the cache tag to determine if there is a hit or a miss.

† The TLB is described in Chapter 4.


WB - Write Back

For register-to-register instructions, the instruction result is written back to the register file during the WB stage. Branch instructions perform no operation during this stage.

Figure 3-2 shows the activities occurring during each ALU pipeline stage, for load, store, and branch instructions.

(Diagram: the activities listed below, laid out against the two clock phases of each pipeline stage, IF IS RF EX DF DS TC WB, with rows for instruction fetch and decode, ALU, load/store, and branch instructions.)

• IC1 - Instruction cache access stage 1

• IC2 - Instruction cache access stage 2

• ITLB1 - Instruction address translation stage 1

• ITLB2 - Instruction address translation stage 2

• ITC - Instruction tag check

• IDEC - Instruction decode

• RF - Register operand fetch

• ALU - Operation

• DVA - Data virtual address calculation

• DC1 - Data cache access stage 1

• DC2 - Data cache access stage 2

• LSA - Data load or store align

• JTLB1 - Data/Instruction address translation stage 1

• JTLB2 - Data/Instruction address translation stage 2

• DTC - Data tag check

• IVA - Instruction virtual address calculation

• WB - Write back to register file

Figure 3-2 CPU Pipeline Activities


3.3 Branch Delay

The CPU pipeline has a branch delay of three cycles and a load delay of two cycles. The three-cycle branch delay is a result of the branch comparison logic operating during the EX pipeline stage of the branch, producing an instruction address that is available in the IF stage, four instructions later.

Figure 3-3 illustrates the branch delay.

(Diagram: a branch instruction and its target, each passing through IF IS RF EX DF DS TC WB; the three instructions issued between them are the three branch delay instructions.)

Figure 3-3 CPU Pipeline Branch Delay

3.4 Load Delay

The completion of a load at the end of the DS pipeline stage produces an operand that is available for the EX pipeline stage of the third subsequent instruction.

Figure 3-4 shows the load delay of two pipeline stages.

(Diagram: a load instruction passing through IF IS RF EX DF DS TC WB, followed by two load delay instructions and then f(load), the first instruction able to use the loaded operand.)

Figure 3-4 CPU Pipeline Load Delay
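Both delay counts follow from the stage order. The sketch below assumes an idealized pipeline in which one instruction issues per cycle with no interlocks, so an instruction issued in cycle n occupies stage number s in cycle n + s; the function names are illustrative.

```python
STAGES = ("IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB")


def cycle_of(stage, issue_cycle=0):
    """Cycle in which an instruction issued at issue_cycle occupies stage."""
    return issue_cycle + STAGES.index(stage)


def branch_delay_instructions():
    # The target address is produced in the branch's EX stage, so the target's
    # IF stage can begin in the following cycle.
    target_issue = cycle_of("EX") + 1
    return target_issue - 1      # instructions issued between branch and target


def load_delay_instructions():
    # Load data is complete at the end of DS; the first consumer is the
    # instruction whose EX stage falls in the following cycle.
    consumer_issue = cycle_of("DS") + 1 - STAGES.index("EX")
    return consumer_issue - 1    # instructions issued between load and consumer
```

Under this model the target is the fourth instruction after the branch (three delay instructions), and the first consumer is the third instruction after the load (two delay instructions), matching Sections 3.3 and 3.4.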


3.5 Interlock and Exception Handling

Smooth pipeline flow is interrupted when cache misses or exceptions occur, or when data dependencies are detected. Interruptions handled using hardware, such as cache misses, are referred to as interlocks, while those that are handled using software are called exceptions.

As shown in Figure 3-5, all interlock and exception conditions are collectively referred to as faults.

(Diagram: Faults divide into Exceptions, which are handled by software, and Interlocks, which are handled by hardware; Interlocks divide into Stalls and Slips.)

Figure 3-5 Interlocks, Exceptions, and Faults

There are two types of interlocks:

• stalls, which are resolved by halting the pipeline

• slips, which require one part of the pipeline to advance while another part of the pipeline is held static

At each cycle, exception and interlock conditions are checked for all active instructions.

Because each exception or interlock condition corresponds to a particular pipeline stage, a condition can be traced back to the particular instruction in the exception/interlock stage, as shown in Figure 3-6. For instance, an Illegal Instruction (II) exception is raised in the execution (EX) stage.

Tables 3-1 and 3-2 describe the pipeline interlocks and exceptions listed in Figure 3-6.


(Diagram: interlock and exception conditions placed under the pipeline stage, IF IS RF EX DF DS TC WB, in which each is detected, grouped by state as follows.)

• Stall* - ITM, ICM, CPBE, DCM, SXT, WA, STI

• Slip - LDI, MultB, DivB, MDOne, ShSlip, FCBsy

• Exceptions - ITLB, Intr, OVF, DTLB, DBE, IBE, FPE, TLBMod, Watch, IVACoh, ExTrap, DVACoh, II, DECCErr, BP, NMI, SC, Reset, CUn, IECCErr

* MP stalls can occur at any stage; they are not associated with any instruction or pipe stage.

Figure 3-6 Correspondence of Pipeline Stage to Interlock Condition


Table 3-1 Pipeline Exceptions

Exception   Description
ITLB        Instruction Translation or Address Exception
Intr        External Interrupt
IBE         IBus Error
IVACoh      IVA Coherent
II          Illegal Instruction
BP          Breakpoint
SC          System Call
CUn         Coprocessor Unusable
IECCErr     Instruction ECC Error
OVF         Integer Overflow
FPE         FP Interrupt
ExTrap      EX Stage Traps
DTLB        Data Translation or Address Exception
TLBMod      TLB Modified
DBE         Data Bus Error
Watch       Memory Reference Address Compare
DVACoh      DVA Coherent
DECCErr     Data ECC Error
NMI         Non-maskable Interrupt
Reset       Reset


Table 3-2 Pipeline Interlocks

Interlock   Description
ITM         Instruction TLB Miss
ICM         Instruction Cache Miss
CPBE        Coprocessor Possible Exception
SXT         Integer Sign Extend
STI         Store Interlock
DCM         Data Cache Miss
WA          Watch Address Exception
LDI         Load Interlock
MultB       Multiply Unit Busy
DivB        Divide Unit Busy
MDOne       Mult/Div One Cycle Slip
ShSlip      Var Shift or Shift > 32 bits
FCBsy       FP Busy

Exception Conditions

When an exception condition occurs, the relevant instruction and all those that follow it in the pipeline are cancelled. Accordingly, any stall conditions and any later exception conditions that may have referenced this instruction are inhibited; there is no benefit in servicing stalls for a cancelled instruction.

After instruction cancellation, a new instruction stream begins, starting execution at a predefined exception vector. System Control Coprocessor registers are loaded with information that identifies the type of exception and auxiliary information such as the virtual address at which translation exceptions occur.


Stall Conditions

Often, a stall condition is only detected after parts of the pipeline have advanced using incorrect data; this is called a pipeline overrun. When a stall condition is detected, all eight instructions, each in a different stage of the pipeline, are frozen at once. In this stalled state, no pipeline stages can advance until the interlock condition is resolved.

Once the interlock is removed, the restart sequence begins two cycles before the pipeline resumes execution. The restart sequence reverses the pipeline overrun by inserting the correct information into the pipeline.

Slip Conditions

When a slip condition is detected, pipeline stages that must advance to resolve the dependency continue to be retired (completed), while dependent stages are held until the required data is available.

External Stalls

External stall is another class of interlocks. An external stall originates outside the processor and is not referenced to a particular pipeline stage. This interlock is not affected by exceptions.


Interlock and Exception Timing

To prevent interlock and exception handling from adversely affecting the processor cycle time, the R4000 processor uses both logic and circuit pipeline techniques to reduce critical timing paths. Interlock and exception handling have the following effects on the pipeline:

• In some cases, the processor pipeline must be backed up (reversed and started over again from a prior stage) to recover from interlocks.

• In some cases, interlocks are serviced for instructions that will be aborted, due to an exception.

These two cases are discussed below.


Backing Up the Pipeline

An example of pipeline back-up occurs in a data cache miss, in which the late detection of the miss causes a subsequent instruction to compute an incorrect result.

When this occurs, not only must the cache miss be serviced but the EX stage of the dependent instruction must be re-executed before the pipeline can be restarted. Figure 3-7 illustrates this procedure; a minus (–) after the pipeline stage descriptor (for instance, EX–) indicates the operation produced an incorrect result, while a plus (+) indicates the successful re-execution of that operation.

(Diagram: Run and stall (Stl) cycles with restart cycles Rst2 and Rst1; a load that repeats its DF, DS, and TC stages after the miss, and a dependent ALU instruction shown as IF IS RF EX- RF EX+ DF DS TC WB.)

Figure 3-7 Pipeline Overrun


Aborting an Instruction Subsequent to an Interlock

The interaction between an integer overflow and an instruction cache miss is an example of an interlock being serviced for an instruction that is subsequently aborted.

In this case, pipelining the overflow exception handling into the DF stage allows an instruction cache miss to occur on the immediately following instruction. Figure 3-8 illustrates this; aborted instructions are indicated with an asterisk (*).

(Diagram: Run and stall (Stl) cycles with restart cycles Rst2 and Rst1; an ALU instruction that raises OVF, followed by instructions of which one takes an instruction cache miss (ICM); all are marked WB* as aborted.)

Figure 3-8 Instruction Cache Miss

Even though the line brought in by the instruction cache could have been replaced by a line of the exception handler, no performance loss occurs, since the instruction cache miss would have been serviced anyway, after returning from the exception handler. Handling of the exception is done in this fashion because the frequency of an exception occurring is, by definition, relatively low.


Pipelining the Exception Handling

Pipelining of interlock and exception handling is done by pipelining the logical resolution of possible fault conditions with the buffering and distributing of the pipeline control signals.

In particular, a half clock period is provided for buffering and distributing the run control signal; during this time the logic evaluation to produce run for the next cycle begins. Figure 3-9 shows this process for a sequence of loads.

(Diagram: three successive loads, Load1 to Load3, each passing through DF DS TC WB, with the TagCk, Resolve, and Buffer steps for each load shown against clock phases 1 and 2.)

Figure 3-9 Pipelining of Interlock and Exception Handling


The decision whether or not to advance the pipeline is derived from these three rules:

• All possible fault-causing events, such as cache misses, translation exceptions, load interlocks, etc., must be individually evaluated.

• The fault to be serviced is selected, based on a predefined priority as determined by the pipeline stage of the asserted faults.

• Pipeline advance control signals are buffered and distributed.

Figure 3-10 illustrates this process.

(Diagram: for successive Run cycles, the Evaluate, Resolve, and Buffer steps shown against clock phases 1 and 2.)

Figure 3-10 Pipeline Advance Decision


Special Cases

In some instances, the pipeline control state machine is bypassed. This occurs due to performance considerations or to correctness considerations, which are described in the following sections.

Performance Considerations

A performance consideration occurs when there is a cache load miss. By bypassing the pipeline state machine, it is possible to eliminate up to two cycles of load miss latency. Two techniques, address acceleration and address prediction, increase performance.

Address Acceleration

Address acceleration bypasses a potential cache miss address. It is relatively straightforward to perform this bypass since sending the cache miss address to the secondary cache has no negative impact even if a subsequent exception nullifies the effect of this cache access. Power is wasted when the miss is inhibited by some fault, but this is a minor effect.

Address Prediction

Another technique used to reduce miss latency is the automatic increment and transmission of instruction miss addresses following an instruction cache miss. This form of latency reduction is called address prediction: the subsequent instruction miss address is predicted to be a simple increment of the previous miss address. Figure 3-11 shows a cache miss in which the cache miss address is changed based on the detection of the miss.

(Diagram: Run and stall (Stl) cycles with restart cycles Rst1, Rst2, and Rst3, the cache index address, and a load shown as IF IS RF EX DF DS TC DF DS TC WB.)

Figure 3-11 Load Address Bypassing

Correctness Considerations

An example in which bypassing is necessary to guarantee correctness is a cache write.


3.6 R4400 Processor Uncached Store Buffer

The R4400 processor contains an uncached store buffer to improve the performance of uncached stores over that available from an R4000 processor. When an uncached store reaches the write-back (WB) stage in the CPU pipeline, the CPU must stall until the store is sent off-chip. In the R4400 processor, a single-entry buffer stores this uncached WB-stage data on the chip without stalling the pipeline.

If a second uncached store reaches the WB stage in the R4400 processor before the first uncached store has been moved off-chip, the CPU stalls until the store buffer completes the first uncached store. To avoid this stall, the compiler can insert seven instruction cycles between the two uncached stores, as shown in Figure 3-12. A single instruction that requires seven cycles to complete could be used in place of the seven No Operation (NOP) instructions.

SW R2, (R3) # uncached store
NOP # NOP 1
NOP # NOP 2
NOP # NOP 3
NOP # NOP 4
NOP # NOP 5
NOP # NOP 6
NOP # NOP 7
SW R2, (R3) # uncached store

Figure 3-12 Pipeline Sequence for Back-to-Back Uncached Stores

If the two uncached stores execute within a loop, the two killed instructions which are part of the loop branch latency are included in the count of seven interpolated cycles. Figure 3-13 shows the four NOP instructions that need to be scheduled in this case.


Loop: SW R2, (R3) # uncached store
NOP
NOP
NOP
B Loop # branch to loop
NOP
killed # branch latency
killed # branch latency

Figure 3-13 Back-to-Back Uncached Stores in a Loop

The timing requirements of the System interface govern the latency between uncached stores; back-to-back stores can be sent across the interface at a maximum rate of one store for every four external cycles. If the R4400 processor is programmed to run in divide-by-2 mode (for more information about divided clock, see the description of SClock in Chapter 10), an uncached store can occur every eight pipeline cycles. If a larger clock divisor is used, more pipeline cycles are required for each store.
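The spacing rule can be turned into a small calculation. This sketch assumes that one external cycle lasts as many pipeline cycles as the clock divisor, and that the store itself occupies one of the pipeline cycles in each interval; both assumptions are inferred from the divide-by-2 example rather than stated in the text, and the function names are hypothetical.

```python
EXTERNAL_CYCLES_PER_STORE = 4   # maximum System interface rate, from the text


def pipeline_cycles_per_uncached_store(clock_divisor):
    """Minimum pipeline cycles from one uncached store to the next."""
    return EXTERNAL_CYCLES_PER_STORE * clock_divisor


def filler_cycles_needed(clock_divisor):
    """Instruction cycles to place between two uncached stores."""
    return pipeline_cycles_per_uncached_store(clock_divisor) - 1
```

For a divisor of 2 this gives eight pipeline cycles per store and seven filler cycles, which agrees with the seven NOP instructions in Figure 3-12.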

CAUTION: The R4000 processor always had a strongly-ordered execution; however, with the addition of the uncached store buffer in the R4400 there is a potential for out-of-order execution (described in the section of the same name in Chapter 11, and Uncached Loads or Stores in Chapter 12).


Memory Management

The MIPS R4000 processor provides a full-featured memory management unit (MMU) which uses an on-chip translation lookaside buffer (TLB) to translate virtual addresses into physical addresses.

This chapter describes the processor virtual and physical address spaces, the virtual-to-physical address translation, the operation of the TLB in making these translations, and those System Control Coprocessor (CP0) registers that provide the software interface to the TLB.


Chapter 4

4.1 Translation Lookaside Buffer (TLB)

Mapped virtual addresses are translated into physical addresses using an on-chip TLB.† The TLB is a fully associative memory that holds 48 entries, which provide mapping to 48 odd/even page pairs (96 pages). When address mapping is indicated, each TLB entry is checked simultaneously for a match with the virtual address that is extended with an ASID stored in the EntryHi register.

Page sizes range from 4 Kbytes to 16 Mbytes, each size being four times the previous one; that is, 4K, 16K, 64K, 256K, 1M, 4M, and 16M.
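The seven page sizes form a geometric sequence with ratio 4, which the following sketch enumerates. The names are illustrative, and the offset width is simply the base-2 logarithm of the page size.

```python
# 4 Kbytes multiplied by 4 six times: 4K, 16K, 64K, 256K, 1M, 4M, 16M
PAGE_SIZES = [4096 * 4 ** k for k in range(7)]


def offset_bits(page_size):
    """Number of address bits that select a byte within a page."""
    if page_size not in PAGE_SIZES:
        raise ValueError("not a supported page size")
    return page_size.bit_length() - 1
```

The offset within a page therefore ranges from 12 bits for a 4-Kbyte page to 24 bits for a 16-Mbyte page, growing by two bits per step.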

Hits and Misses

If there is a virtual address match, or hit, in the TLB, the physical page number is extracted from the TLB and concatenated with the offset to form the physical address (see Figure 4-1).

If no match occurs (TLB miss), an exception is taken and software refills the TLB from the page table resident in memory. Software can write over a selected TLB entry or use a hardware mechanism to write into a random entry.

Multiple Matches

If more than one entry in the TLB matches the virtual address being translated, the operation is undefined. To prevent permanent damage to the part, the TLB may be disabled if more than several entries match. The TLB-Shutdown (TS) bit in the Status register is set to 1 if the TLB is disabled.

† There are virtual-to-physical address translations that occur outside of the TLB. For example, addresses in the kseg0 and kseg1 spaces are unmapped translations. In these spaces the physical address is derived by subtracting the base address of the space from the virtual address.
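The unmapped translation described in the footnote amounts to a subtraction. The sketch below uses the conventional 32-bit MIPS base addresses for kseg0 and kseg1 and a segment size of 512 Mbytes; these values are not given in this section and should be checked against the address-space descriptions later in this chapter. The function name is hypothetical.

```python
# Assumed 32-bit mode values; verify against the kernel-mode address maps.
KSEG0_BASE = 0x80000000
KSEG1_BASE = 0xA0000000
KSEG_SIZE = 0x20000000          # 512 Mbytes per segment


def unmapped_physical_address(vaddr):
    """Physical address for a kseg0 or kseg1 virtual address."""
    for base in (KSEG0_BASE, KSEG1_BASE):
        if base <= vaddr < base + KSEG_SIZE:
            return vaddr - base     # subtract the base address of the space
    raise ValueError("address is not in kseg0 or kseg1")
```

With these values, the kseg0 address 0x80001000 and the kseg1 address 0xA0001000 both refer to physical address 0x1000.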


4.2 Address Spaces

This section describes the virtual and physical address spaces and the manner in which virtual addresses are converted or “translated” into physical addresses in the TLB.

Virtual Address Space

The processor virtual address can be either 32 or 64 bits wide,† depending on whether the processor is operating in 32-bit or 64-bit mode.

• In 32-bit mode, addresses are 32 bits wide. The maximum user process size is 2 gigabytes (2^31 bytes).

• In 64-bit mode, addresses are 64 bits wide. The maximum user process size is 1 terabyte (2^40 bytes).

Figure 4-1 shows the translation of a virtual address into a physical address.

1. The virtual address (VA), represented by the virtual page number (VPN), is compared with the tag in the TLB.

2. If there is a match, the page frame number (PFN) representing the upper bits of the physical address (PA) is output from the TLB.

3. The Offset, which does not pass through the TLB, is then concatenated to the PFN.

(Diagram: the virtual address consists of the ASID, VPN, and Offset fields; the matching TLB entry supplies the PFN, and the physical address is the PFN followed by the Offset.)

Figure 4-1 Overview of a Virtual-to-Physical Address Translation

† Figure 4-8 shows the 32-bit and 64-bit versions of the processor TLB entry.


As shown in Figures 4-2 and 4-3, the virtual address is extended with an 8-bit address space identifier (ASID), which reduces the frequency of TLB flushing when switching contexts. This 8-bit ASID is in the CP0 EntryHi register, described later in this chapter. The Global bit (G) is in the EntryLo0 and EntryLo1 registers, also described later in this chapter.

Physical Address Space

Using a 36-bit address, the processor physical address space encompasses 64 gigabytes (2^36 bytes). The following section describes the translation of a virtual address to a physical address.

Virtual-to-Physical Address Translation

Converting a virtual address to a physical address begins by comparing the virtual address from the processor with the virtual addresses in the TLB; there is a match when the virtual page number (VPN) of the address is the same as the VPN field of the entry, and either:

• the Global (G) bit of the TLB entry is set, or

• the ASID field of the virtual address is the same as the ASID field of the TLB entry.

This match is referred to as a TLB hit. If there is no match, a TLB Miss exception is taken by the processor and software is allowed to refill the TLB from a page table of virtual/physical addresses in memory.

If there is a virtual address match in the TLB, the physical address is output from the TLB and concatenated with the Offset, which represents an address within the page frame space. The Offset does not pass through the TLB.
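The match rule above can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not a model of the actual hardware: each entry here maps a single fixed 4-Kbyte page, and the names `TLBEntry`, `TLBMiss`, and `translate` are hypothetical.

```python
from collections import namedtuple

PAGE_SHIFT = 12  # 4-Kbyte pages assumed for this sketch

# Simplified entry: VPN tag, ASID, Global bit, page frame number.
TLBEntry = namedtuple("TLBEntry", "vpn asid global_bit pfn")


class TLBMiss(Exception):
    """Stands in for the TLB Miss exception taken by the processor."""


def translate(tlb, vaddr, asid):
    """Translate vaddr for the given ASID, or raise TLBMiss."""
    vpn = vaddr >> PAGE_SHIFT
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    for entry in tlb:
        # Hit: the VPN matches, and either G is set or the ASIDs agree.
        if entry.vpn == vpn and (entry.global_bit or entry.asid == asid):
            # The Offset bypasses the TLB and is joined to the PFN.
            return (entry.pfn << PAGE_SHIFT) | offset
    raise TLBMiss(hex(vaddr))
```

In this sketch, an entry with the Global bit set matches under any ASID, while a non-global entry matches only the process whose ASID it carries; software handling `TLBMiss` would correspond to the refill step.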

Virtual-to-physical translation is described in greater detail throughout the remainder of this chapter; Figure 4-20, at the end of this chapter, is a flow diagram of the process.

The next two sections describe the 32-bit and 64-bit address translations.


32-bit Mode Address Translation

Figure 4-2 shows the virtual-to-physical-address translation of a 32-bit mode address.

• The top portion of Figure 4-2 shows a virtual address with a 12-bit, or 4-Kbyte, page size, labelled Offset. The remaining 20 bits of the address represent the VPN, and index the 1M-entry page table.

• The bottom portion of Figure 4-2 shows a virtual address with a 24-bit, or 16-Mbyte, page size, labelled Offset. The remaining 8 bits of the address represent the VPN, and index the 256-entry page table.
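The two field splits described above can be sketched in Python as follows. The function name and its parameters are illustrative and do not come from the manual.

```python
def split_virtual_address(vaddr, offset_bits, va_bits=32):
    """Split a virtual address into (VPN, Offset) for a given page size.

    offset_bits is 12 for 4-Kbyte pages and 24 for 16-Mbyte pages.
    """
    vaddr &= (1 << va_bits) - 1                # keep only the address bits
    offset = vaddr & ((1 << offset_bits) - 1)  # low bits: Offset
    vpn = vaddr >> offset_bits                 # remaining high bits: VPN
    return vpn, offset


# 4-Kbyte pages: 20-bit VPN, 12-bit Offset
vpn, offset = split_virtual_address(0x12345678, 12)
print(hex(vpn), hex(offset))  # 0x12345 0x678

# 16-Mbyte pages: 8-bit VPN, 24-bit Offset
vpn, offset = split_virtual_address(0x12345678, 24)
print(hex(vpn), hex(offset))  # 0x12 0x345678
```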

[Figure 4-2, top: Virtual Address with 1M (2^20) 4-Kbyte pages. Fields: ASID (8 bits), VPN (20 bits = 1M pages), Offset (12 bits, bits 11:0).]

[Figure 4-2, bottom: Virtual Address with 256 (2^8) 16-Mbyte pages. Fields: ASID (8 bits), VPN (8 bits = 256 pages), Offset (24 bits).]

[In both cases the VPN undergoes virtual-to-physical translation in the TLB to produce the PFN, and the Offset is passed unchanged to physical memory; together they form the 36-bit physical address (bits 35:0). Bits 31, 30 and 29 of the virtual address select user, supervisor, or kernel address spaces.]

Figure 4-2 32-bit Mode Virtual Address Translation


64-bit Mode Address Translation

Figure 4-3 shows the virtual-to-physical-address translation of a 64-bit mode address. This figure illustrates the two extremes in the range of possible page sizes: a 4-Kbyte page (12 bits) and a 16-Mbyte page (24 bits).

• The top portion of Figure 4-3 shows a virtual address with a 12-bit, or 4-Kbyte, page size, labelled Offset. The remaining 28 bits of the address represent the VPN, and index the 256M-entry page table.

• The bottom portion of Figure 4-3 shows a virtual address with a 24-bit, or 16-Mbyte, page size, labelled Offset. The remaining 16 bits of the address represent the VPN, and index the 64K-entry page table.
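The entry counts quoted above follow directly from the field widths. The short Python sketch below reproduces the arithmetic, assuming the 40 translated address bits implied by the text (28 + 12, or 16 + 24); the constant and function names are illustrative.

```python
VA_BITS_32 = 32  # translated address bits in 32-bit mode
VA_BITS_64 = 40  # translated address bits in 64-bit mode (28 + 12 = 16 + 24)


def page_table_entries(va_bits, offset_bits):
    """Number of pages (and so page table entries) for a given page size."""
    vpn_bits = va_bits - offset_bits
    return 1 << vpn_bits


M = 1 << 20  # 1M
K = 1 << 10  # 1K

print(page_table_entries(VA_BITS_64, 12) // M)  # 256 (256M entries)
print(page_table_entries(VA_BITS_64, 24) // K)  # 64 (64K entries)
```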

[Figure 4-3, top: Virtual Address with 256M (2^28) 4-Kbyte pages. Fields: ASID (bits 71:64), bits 63:62, bits 61:40 (0 or -1), VPN (28 bits = 256M pages), Offset (bits 11:0).]

[Figure 4-3, bottom: Virtual Address with 64K (2^16) 16-Mbyte pages. Fields: ASID (bits 71:64), bits 63:62, bits 61:40 (0 or -1), VPN (16 bits = 64K pages), Offset (bits 23:0).]

[In both cases the VPN undergoes virtual-to-physical translation in the TLB to produce the PFN, and the Offset is passed unchanged to physical memory; together they form the 36-bit physical address (bits 35:0). Bits 62 and 63 of the virtual address select user, supervisor, or kernel address spaces.]

Figure 4-3 64-bit Mode Virtual Address Translation


Operating Modes

The processor has three operating modes that function in both 32- and 64-bit operations:

• User mode

• Supervisor mode

• Kernel mode

These modes are described in the next three sections.

User Mode Operations

In User mode, a single, uniform virtual address space, labelled User segment, is available; its size is:

• 2 Gbytes (2^31 bytes) in 32-bit mode (useg)

• 1 Tbyte (2^40 bytes) in 64-bit mode (xuseg)

Figure 4-4 shows User mode virtual address space.

[Figure 4-4, 32-bit* column: 0x 0000 0000 up to 0x 8000 0000 is the 2 GB Mapped segment useg; 0x 8000 0000 through 0x FFFF FFFF is Address Error.]

[Figure 4-4, 64-bit column: 0x 0000 0000 0000 0000 up to 0x 0000 0100 0000 0000 is the 1 TB Mapped segment xuseg; 0x 0000 0100 0000 0000 through 0x FFFF FFFF FFFF FFFF is Address Error.]

Figure 4-4 User Mode Virtual Address Space

*NOTE: The R4000 uses 64-bit addresses internally. When the kernel is running in Kernel mode, it initializes registers before switching modes, and saves (or restores, whichever is appropriate) register values on context switches. In 32-bit mode, a valid address must be a 32-bit signed number, where bits 63:32 = bit 31. In normal operation it is not possible for a 32-bit User-mode program to produce invalid addresses. However, although it would be an error, it is possible for a Kernel-mode program to erroneously place a value that is not a 32-bit signed number into a 64-bit register, in which case the User-mode program generates an invalid address.
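The validity condition in this note (bits 63:32 equal to bit 31) can be expressed as a short check. The Python below is a sketch with illustrative function names; it treats a register as an unsigned 64-bit integer.

```python
def is_valid_32bit_address(reg64):
    """True if bits 63:32 of a 64-bit register value all equal bit 31."""
    reg64 &= (1 << 64) - 1
    upper = reg64 >> 32
    bit31 = (reg64 >> 31) & 1
    return upper == (0xFFFF_FFFF if bit31 else 0)


def sign_extend_32(value32):
    """Extend a 32-bit value to 64 bits by copying bit 31 into bits 63:32."""
    value32 &= 0xFFFF_FFFF
    if value32 & 0x8000_0000:
        return value32 | 0xFFFF_FFFF_0000_0000
    return value32
```

For example, 0x0000 0000 8000 0000 fails the check because bit 31 is set while bits 63:32 are zero; the sign-extended form of the same 32-bit value, 0xFFFF FFFF 8000 0000, passes.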


The User segment starts at address 0 and the current active user process resides in either useg (in 32-bit mode) or xuseg (in 64-bit mode). The TLB identically maps all references to useg/xuseg from all modes, and controls cache accessibility.†

The processor operates in User mode when the Status register contains the following bit-values:

• KSU bits = 10

• EXL = 0

• ERL = 0

In conjunction with these bits, the UX bit in the Status register selects between 32- or 64-bit User mode addressing as follows:

• when UX = 0, 32-bit useg space is selected and TLB misses are handled by the 32-bit TLB refill exception handler

• when UX = 1, 64-bit xuseg space is selected and TLB misses are handled by the 64-bit XTLB refill exception handler

Table 4-1 lists the characteristics of the two user mode segments, useg and xuseg.

Table 4-1 32-bit and 64-bit User Mode Segments

Address Bit Values      Status Register Bit Values       Segment  Address Range                                        Segment Size
                        KSU (binary)  EXL  ERL  UX       Name
32-bit, A(31) = 0       10            0    0    0        useg     0x0000 0000 through 0x7FFF FFFF                      2 Gbyte (2^31 bytes)
64-bit, A(63:40) = 0    10            0    0    1        xuseg    0x0000 0000 0000 0000 through 0x0000 00FF FFFF FFFF  1 Tbyte (2^40 bytes)

† The cached (C) field in a TLB entry determines whether the reference is cached; see Figure 4-8.


32-bit User Mode (useg)

In User mode, when UX = 0 in the Status register, User mode addressing is compatible with the 32-bit addressing model shown in Figure 4-4, and a 2-Gbyte user address space is available, labelled useg.

All valid User mode virtual addresses have their most-significant bit cleared to 0; any attempt to reference an address with the most-significant bit set while in User mode causes an Address Error exception.

The system maps all references to useg through the TLB, and bit settings within the TLB entry for the page determine the cacheability of a reference.

64-bit User Mode (xuseg)

In User mode, when UX = 1 in the Status register, User mode addressing is extended to the 64-bit model shown in Figure 4-4. In 64-bit User mode, the processor provides a single, uniform address space of 2^40 bytes, labelled xuseg.

All valid User mode virtual addresses have bits 63:40 equal to 0; an attempt to reference an address with bits 63:40 not equal to 0 causes an Address Error exception.
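The two User mode address checks (bit 31 clear for useg, bits 63:40 zero for xuseg) can be sketched as follows. `AddressError` and `check_user_address` are hypothetical names; in 32-bit mode the sketch looks only at the low 32 bits of the address.

```python
class AddressError(Exception):
    """Stands in for the Address Error exception."""


def check_user_address(vaddr, ux):
    """Return the segment name for a valid User mode address.

    ux is the UX bit of the Status register: 0 selects useg, 1 selects xuseg.
    Raises AddressError for an address outside the selected segment.
    """
    if ux:
        # xuseg: bits 63:40 must all be 0.
        if (vaddr >> 40) != 0:
            raise AddressError(hex(vaddr))
        return "xuseg"
    # useg: the most-significant bit (bit 31) must be 0.
    if (vaddr >> 31) & 1:
        raise AddressError(hex(vaddr))
    return "useg"
```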

Supervisor Mode Operations

Supervisor mode is designed for layered operating systems in which a true kernel runs in R4000 Kernel mode, and the rest of the operating system runs in Supervisor mode.

The processor operates in Supervisor mode when the Status register contains the following bit-values:

• KSU = 01

• EXL = 0

• ERL = 0

In conjunction with these bits, the SX bit in the Status register selects between 32- or 64-bit Supervisor mode addressing:

• when SX = 0, 32-bit supervisor space is selected and TLB misses are handled by the 32-bit TLB refill exception handler

• when SX = 1, 64-bit supervisor space is selected and TLB misses are handled by the 64-bit XTLB refill exception handler
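The User and Supervisor mode conditions given so far can be combined into one decoding sketch. The bit positions used below (EXL at bit 1, ERL at bit 2, KSU at bits 4:3, UX at bit 5, SX at bit 6) are assumptions taken from the Status register layout, which is not shown in this section and should be checked against the register description. Any combination not covered by the two lists above is reported as "other".

```python
# Assumed Status register bit positions (not given in this section).
EXL_BIT = 1 << 1
ERL_BIT = 1 << 2
KSU_SHIFT = 3
KSU_MASK = 0b11
UX_BIT = 1 << 5
SX_BIT = 1 << 6


def decode_mode(status):
    """Return (mode, refill_handler) for a Status register value."""
    if status & (EXL_BIT | ERL_BIT):
        return ("other", None)       # EXL or ERL set: neither User nor Supervisor
    ksu = (status >> KSU_SHIFT) & KSU_MASK
    if ksu == 0b10:
        mode, wide = "user", bool(status & UX_BIT)
    elif ksu == 0b01:
        mode, wide = "supervisor", bool(status & SX_BIT)
    else:
        return ("other", None)
    # 64-bit addressing uses the XTLB refill handler, 32-bit the TLB refill handler.
    return (mode, "XTLB refill" if wide else "TLB refill")
```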


Figure 4-5 shows Supervisor mode address mapping. Table 4-2 lists the characteristics of the supervisor mode segments; descriptions of the address spaces follow.

[Figure 4-5, 32-bit* column, regions listed from low to high by starting address:
0x 0000 0000: 2 GB Mapped (suseg)
0x 8000 0000: Address error
0x A000 0000: Address error
0x C000 0000: 0.5 GB Mapped (sseg)
0x E000 0000 through 0x FFFF FFFF: Address error]

[Figure 4-5, 64-bit column, regions listed from low to high by starting address:
0x 0000 0000 0000 0000: 1 TB Mapped (xsuseg)
0x 0000 0100 0000 0000: Address error
0x 4000 0000 0000 0000: 1 TB Mapped (xsseg)
0x 4000 0100 0000 0000: Address error
0x FFFF FFFF C000 0000: 0.5 GB Mapped (csseg)
0x FFFF FFFF E000 0000 through 0x FFFF FFFF FFFF FFFF: Address error]

Figure 4-5 Supervisor Mode Address Space

*NOTE: The R4000 uses 64-bit addresses internally. In 32-bit mode, a valid address must be a 32-bit signed number, where bits 63:32 = bit 31. In normal operation it is not possible for a 32-bit Supervisor-mode program to create an invalid address through arithmetic operations. However, 32-bit-mode Supervisor programs must not create addresses using base register+offset calculations that produce a 32-bit 2's-complement overflow; specifically, there are two prohibited cases:

• offset with bit 15 = 0 and base register with bit 31 = 0, but (base register+offset) bit 31 = 1

• offset with bit 15 = 1 and base register with bit 31 = 1, but (base register+offset) bit 31 = 0

Using this invalid address produces an undefined result.
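The two prohibited cases amount to a signed 32-bit overflow in the base register+offset sum. A hedged Python sketch of the check follows; it assumes the offset is a 16-bit immediate that is sign-extended before the addition, and the function names are illustrative.

```python
def sign_extend_16(offset):
    """Interpret a 16-bit offset as a signed value."""
    offset &= 0xFFFF
    return offset - 0x1_0000 if offset & 0x8000 else offset


def overflows_32bit(base, offset):
    """True if base + offset falls into either prohibited case."""
    base &= 0xFFFF_FFFF
    total = (base + sign_extend_16(offset)) & 0xFFFF_FFFF
    base31 = base >> 31          # sign of the base register
    off15 = (offset >> 15) & 1   # sign of the offset
    sum31 = total >> 31          # sign of the result
    # Overflow: both operands share a sign, but the sum has the other sign.
    return base31 == off15 and sum31 != base31
```

For example, a base of 0x7FFF FFF0 with an offset of 0x0020 matches the first case, and a base of 0x8000 0000 with an offset of 0xFFFF (that is, -1) matches the second.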
