MIPS MIPS32 74K, MIPS32 74Kf, MIPS32 74Kc Programming Manual

Programming the MIPS32® 74K™ Core
Family
Document Number: MD00541
Revision 02.14
March 30, 2011
Unpublished rights (if any) reserved under the copyright laws of the United States of America and other countries.
This document contains information that is proprietary to MIPS Tech, LLC, a Wave Computing company (“MIPS”) and MIPS’ affiliates as applicable. Any copying, reproducing, modifying or use of this information (in whole or in part) that is not expressly permitted in writing by MIPS or MIPS’ affiliates as applicable or an authorized third party is strictly prohibited. At a minimum, this information is protected under unfair competition and copyright laws. Violations thereof may result in criminal penalties and fines. Any document provided in source format (i.e., in a modifiable form such as in FrameMaker or Microsoft Word format) is subject to use and distribution restrictions that are independent of and supplemental to any and all confidentiality restrictions. UNDER NO CIRCUMSTANCES MAY A DOCUMENT PROVIDED IN SOURCE FORMAT BE DISTRIBUTED TO A THIRD PARTY IN SOURCE FORMAT WITHOUT THE EXPRESS WRITTEN PERMISSION OF MIPS (AND MIPS’ AFFILIATES AS APPLICABLE) reserve the right to change the information contained in this document to improve function, design or otherwise.
MIPS and MIPS’ affiliates do not assume any liability arising out of the application or use of this information, or of any error or omission in such information. Any warranties, whether express, statutory, implied or otherwise, including but not limited to the implied warranties of merchantability or fitness for a particular purpose, are excluded. Except as expressly provided in any written license agreement from MIPS or an authorized third party, the furnishing of this document does not give recipient any license to any intellectual property rights, including any patent rights, that cover the information in this document.
The information contained in this document shall not be exported, reexported, transferred, or released, directly or indirectly, in violation of the law of any country or international law, regulation, treaty, Executive Order, statute, amendments or supplements thereto. Should a conflict arise regarding the export, reexport, transfer, or release of the information contained in this document, the laws of the United States of America shall be the governing law.
The information contained in this document constitutes one or more of the following: commercial computer software, commercial computer software documentation or other commercial items. If the user of this information, or any related documentation of any kind, including related technical data or manuals, is an agency, department, or other entity of the United States government ("Government"), the use, duplication, reproduction, release, modification, disclosure, or transfer of this information, or any related documentation of any kind, is restricted in accordance with Federal Acquisition Regulation 12.212 for civilian agencies and Defense Federal Acquisition Regulation Supplement 227.7202 for military agencies. The use of this information by the Government is further restricted in accordance with the terms of the license agreement(s) and/or applicable contract terms and conditions covering this information from MIPS Technologies or an authorized third party.
MIPS, MIPS I, MIPS II, MIPS III, MIPS IV, MIPS V, MIPSr3, MIPS32, MIPS64, microMIPS32, microMIPS64, MIPS-3D, MIPS16, MIPS16e, MIPS-Based, MIPSsim, MIPSpro, MIPS-VERIFIED, Aptiv logo, microAptiv logo, interAptiv logo, microMIPS logo, MIPS Technologies logo, MIPS-VERIFIED logo, proAptiv logo, 4K, 4Kc, 4Km, 4Kp, 4KE, 4KEc, 4KEm, 4KEp, 4KS, 4KSc, 4KSd, M4K, M14K, 5K, 5Kc, 5Kf, 24K, 24Kc, 24Kf, 24KE, 24KEc, 24KEf, 34K, 34Kc, 34Kf, 74K, 74Kc, 74Kf, 1004K, 1004Kc, 1004Kf, 1074K, 1074Kc, 1074Kf, R3000, R4000, R5000, Aptiv, ASMACRO, Atlas, "At the core of the user experience.", BusBridge, Bus Navigator, CLAM, CorExtend, CoreFPGA, CoreLV, EC, FPGA View, FS2, FS2 FIRST SILICON SOLUTIONS logo, FS2 NAVIGATOR, HyperDebug, HyperJTAG, IASim, iFlowtrace, interAptiv, JALGO, Logic Navigator, Malta, MDMX, MED, MGB, microAptiv, microMIPS, Navigator, OCI, PDtrace, the Pipeline, proAptiv, Pro Series, SEAD-3, SmartMIPS, SOC-it, and YAMON are trademarks or registered trademarks of MIPS and MIPS’ affiliates as applicable in the United States and other countries.
All other trademarks referred to herein are the property of their respective owners.
WƌŽŐƌĂŵŵŝŶŐ ƚŚĞ D/W^ϯϮΠ ϳϰ<Ρ ŽƌĞ &ĂŵŝůLJ ZĞǀŝƐŝŽŶ ϬϮϭϰ
Table of Contents
Chapter 1: Introduction........................................................................................................................11
1.1: Chapters of this manual............................................................................................................................. 12
1.2: Conventions............................................................................................................................................... 12
1.3: 74K™ core features................................................................................................................................... 13
1.4: A brief guide to the 74K core implementation ........................................................................................ 14
1.4.1: Notes on pipeline overview diagram (Figure 1.1):............................................................................ 14
1.4.2: Branches and branch delays............................................................................................................17
1.4.3: Loads and load-to-use delays.......................................................................................................... 18
1.4.4: Queues, Resource limits and Consequences.................................................................................. 19
Chapter 2: Initialization and identity...................................................................................................21
2.1: Probing your CPU - Config CP0 registers.................................................................................................21
2.1.1: The Config register........................................................................................................................... 22
2.1.2: The Config1-2 registers.................................................................................................................... 23
2.1.3: The Config3 register......................................................................................................................... 24
2.1.4: The Config6 register......................................................................................................................... 25
2.1.5: CPU-specific configuration — Config7.............................................................................................26
2.2: PRId register — identifying your CPU type ............................................................................................... 26
Chapter 3: Memory map, caching, reads, writes and translation ....................................................29
3.1: The memory map ...................................................................................................................................... 29
3.2: Fixed mapping option................................................................................................................................30
3.3: Reads, writes and synchronization............................................................................................................ 30
3.3.1: Read/write ordering and cache/memory data queues in the 74K core......................................... 30
3.3.2: The “sync” instruction in 74K family cores....................................................................................31
3.3.3: Write gathering and “write buffer flushing” in 74K family cores..................................................... 32
3.4: Caches ...................................................................................................................................................... 32
3.4.1: The L2 cache option.........................................................................................................................32
3.4.2: Cacheability options......................................................................................................................... 33
3.4.3: Uncached accelerated writes........................................................................................................... 34
3.4.4: The cache instruction and software cache management.................................................................34
3.4.5: Cache instructions and CP0 cache tag/data registers.....................................................................35
3.4.6: L1 Cache instruction timing..............................................................................................................37
3.4.7: L2 cache instruction timing............................................................................................................... 37
3.4.8: Cache management when writing instructions - the “synci” instruction ........................................... 37
3.4.9: Cache aliases...................................................................................................................................38
3.4.10: Cache locking.................................................................................................................................39
3.4.11: Cache initialization and tag/data registers ..................................................................................... 39
3.4.12: L23TagLo Regiser..........................................................................................................................40
3.4.13: L23DataLo Register.......................................................................................................................40
3.4.14: L23DataHi Register........................................................................................................................40
3.4.15: TagLo registers in special modes .................................................................................................. 41
3.4.16: Parity error exception handling and the CacheErr register............................................................41
3.4.17: ErrCtl register................................................................................................................................. 42
3.5: Bus error exception ................................................................................................................................... 43
3.6: Scratchpad memory/SPRAM..................................................................................................................... 44
3.7: Common Device Memory Map..................................................................................................................46
3 Programming the MIPS32® 74K™ Core Family, Revision 02.14
3.8: The TLB and translation............................................................................................................................47
3.8.1: A TLB entry......................................................................................................................................47
3.8.2: Live translation and micro-TLBs....................................................................................................... 48
3.8.3: Reading and writing TLB entries: Index, Random and Wired..........................................................48
3.8.4: Reading and writing TLB entries - EntryLo0-1, EntryHi and PageMask registers............................ 49
3.8.5: TLB initialization and duplicate entries.............................................................................................50
3.8.6: TLB exception handlers — BadVaddr, Context, and ContextConfig registers.................................51
Chapter 4: Programming the 74K™ core in user mode....................................................................55
4.1: User-mode accessible “Hardware registers”.............................................................................................55
4.2: Prefetching data ........................................................................................................................................ 56
4.3: Using “synci” when writing instructions...................................................................................................... 56
4.4: The multiplier.............................................................................................................................................57
4.5: Tuning software for the 74K family pipeline ........................................................................................... 58
4.5.1: Cache delays and mitigating their effect..........................................................................................58
4.5.2: Branch delay slot..............................................................................................................................59
4.6: Tuning floating-point..................................................................................................................................59
4.7: Branch misprediction delays...................................................................................................................... 60
4.8: Load delayed by (unrelated) recent store.................................................................................................. 60
4.9: Minimum load-miss penalty.......................................................................................................................60
4.10: Data dependency delays.........................................................................................................................61
4.10.1: More complicated dependencies ................................................................................................... 64
4.11: Advice on tuning instruction sequences (particularly DSP).....................................................................65
4.12: Multiply/divide unit and timings................................................................................................................ 65
Chapter 5: Kernel-mode (OS) programming and Release 2 of the MIPS32® Architecture............67
5.1: Hazard barrier instructions ........................................................................................................................ 67
5.2: MIPS32® Architecture Release 2 - enhanced interrupt system(s)............................................................ 68
5.2.1: Traditional MIPS® interrupt signalling and priority........................................................................... 69
5.2.2: VI mode - multiple entry points, interrupt signalling and priority.......................................................70
5.2.3: External Interrupt Controller (EIC) mode.......................................................................................... 70
5.3: Exception Entry Points .............................................................................................................................. 71
5.3.1: Summary of exception entry points..................................................................................................72
5.4: Shadow registers....................................................................................................................................... 73
5.5: Saving Power ............................................................................................................................................ 75
5.6: The HWREna register - Control user rdhwr access .................................................................................. 75
Chapter 6: Floating point unit..............................................................................................................77
6.1: Data representation...................................................................................................................................77
6.2: Basic instruction set................................................................................................................................... 78
6.3: Floating point loads and stores.................................................................................................................. 79
6.4: Setting up the FPU and the FPU control registers .................................................................................... 79
6.4.1: IEEE options .................................................................................................................................... 79
6.4.2: FPU “unimplemented” exceptions (and how to avoid them)............................................................79
6.4.3: FPU control register maps ............................................................................................................... 80
6.5: FPU pipeline and instruction timing...........................................................................................................82
6.5.1: FPU register dependency delays..................................................................................................... 84
6.5.2: Delays caused by long-latency instructions looping in the M1 stage............................................... 84
6.5.3: Delays on FP load and store instructions......................................................................................... 84
6.5.4: Delays when main pipeline waits for FPU to decide not to take an exception................................. 84
6.5.5: Delays when main pipeline waits for FPU to accept an instruction..................................................85
6.5.6: Delays on mfc1/mtc1 instructions .................................................................................................... 85
Programming the MIPS32® 74K™ Core Family, Revision 02.14 4
6.5.7: Delays caused by dependency on FPU status register fields.......................................................... 85
6.5.8: Slower operation in MIPS I™ compatibility mode............................................................................85
Chapter 7: The MIPS32® DSP ASE .....................................................................................................87
7.1: Features provided by the MIPS® DSP ASE..............................................................................................87
7.2: The DSP ASE control register...................................................................................................................88
7.2.1: DSP accumulators ........................................................................................................................... 89
7.3: Software detection of the DSP ASE..........................................................................................................89
7.4: DSP instructions........................................................................................................................................90
7.4.1: Hints in instruction names................................................................................................................ 90
7.4.2: Arithmetic - 64-bit............................................................................................................................. 91
7.4.3: Arithmetic - saturating and/or SIMD Types......................................................................................91
7.4.4: Bit-shifts - saturating and/or SIMD types.......................................................................................... 91
7.4.5: Comparison and “conditional-move” operations on SIMD types......................................................91
7.4.6: Conversions to and from SIMD types .............................................................................................. 92
7.4.7: Multiplication - SIMD types with result in GP register ...................................................................... 92
7.4.8: Multiply Q15s from paired-half and accumulate...............................................................................93
7.4.9: Load with register + register address............................................................................................... 93
7.4.10: DSPControl register access........................................................................................................... 93
7.4.11: Accumulator access instructions....................................................................................................94
7.4.12: Dot products and building blocks for complex multiplication..........................................................94
7.4.13: Other DSP ASE instructions .......................................................................................................... 95
7.5: Macros and typedefs for DSP instructions ................................................................................................ 95
7.6: Almost Alphabetically-ordered table of DSP ASE instructions..................................................................96
7.7: DSP ASE instruction timing.....................................................................................................................100
Chapter 8: 74K™ core features for debug and profiling.................................................................102
8.1: EJTAG on-chip debug unit ...................................................................................................................... 102
8.1.1: Debug communications through JTAG..........................................................................................103
8.1.2: Debug mode...................................................................................................................................103
8.1.3: Exceptions in debug mode.............................................................................................................104
8.1.4: Single-stepping .............................................................................................................................. 104
8.1.5: The “dseg” memory decode region................................................................................................ 104
8.1.6: EJTAG CP0 registers, particularly Debug......................................................................................106
8.1.7: The DCR (debug control) memory-mapped register......................................................................108
8.1.8: The DebugVectorAddr memory-mapped register..........................................................................110
8.1.9: JTAG-accessible registers.............................................................................................................110
8.1.10: Fast Debug Channel....................................................................................................................112
8.1.11: EJTAG breakpoint registers......................................................................................................... 115
8.1.12: Understanding breakpoint conditions...........................................................................................117
8.1.13: Imprecise debug breaks...............................................................................................................118
8.1.14: PC Sampling with EJTAG............................................................................................................118
8.1.15: JTAG-accessible and memory-mapped PDtrace TCB Registers ................................................ 119
8.2: PDtrace™ instruction trace facility........................................................................................................... 121
8.2.1: 74K core-specific fields in PDtrace™ JTAG-accessible registers..................................................121
8.2.2: CP0 registers for the PDtrace™ logic............................................................................................123
8.2.3: JTAG triggers and local control through TraceIBPC/TraceDBPC..................................................125
8.2.4: UserTraceData1 reg and UserTraceData2 reg.............................................................................. 126
8.2.5: Summary of when trace happens .................................................................................................. 126
8.3: CP0 Watchpoints..................................................................................................................................... 128
8.3.1: The WatchLo0-3 registers..............................................................................................................128
8.3.2: The WatchHi0-3 registers .............................................................................................................. 128
8.4: Performance counters.............................................................................................................................129
5 Programming the MIPS32® 74K™ Core Family, Revision 02.14
8.4.1: Reading the event table.................................................................................................................130
Appendix A: References ....................................................................................................................135
Appendix B: CP0 register summary and reference.........................................................................137
B.1: Miscellaneous CP0 register descriptions................................................................................................140
B.1.1: Status register................................................................................................................................141
B.1.2: The UserLocal register .................................................................................................................. 143
B.1.3: Exception control: Cause and EPC registers................................................................................. 143
B.1.3.1: The Cause register............................................................................................................... 143
B.1.4: The EPC register...........................................................................................................................145
B.1.5: Count and Compare ...................................................................................................................... 145
B.2: Registers for CPU Configuration............................................................................................................. 145
B.2.1: The Config7 register......................................................................................................................145
B.3: Registers for Cache Diagnostics............................................................................................................. 148
B.3.1: Different views of ITagLo/DTagLo.................................................................................................148
B.3.2: Dual (virtual and physical) tags in the 74K core D-cache — DTagHi register...............................149
B.3.3: Pre-decode information in the I-cache - the ITagHi Register......................................................... 149
B.3.4: The DDataLo, IDataHi and IDataLo registers................................................................................ 150
B.3.5: The ErrorEPC register...................................................................................................................150
Appendix C: MIPS® Architecture quick-reference sheet(s) ...........................................................151
C.1: General purpose register numbers and names ...................................................................................... 151
C.2: User-level changes with Release 2 of the MIPS32® Architecture.......................................................... 151
C.2.1: Release 2 of the MIPS32® Architecture - new instructions for user-mode ................................... 151
C.2.2: Release 2 of the MIPS32® Architecture - Hardware registers from user mode............................ 152
C.3: FPU changes in Release 2 of the MIPS32® Architecture....................................................................... 153
Appendix D: Revision History ...........................................................................................................155
Programming the MIPS32® 74K™ Core Family, Revision 02.14 6
List of Figures
Figure 1.1: Overview of The 74K™ Pipeline........................................................................................................... 14
Figure 2.1: Fields in the Config Register................................................................................................................. 22
Figure 2.2: Fields in the Config1 Register............................................................................................................... 23
Figure 2.3: Fields in the Config2 Register............................................................................................................... 23
Figure 2.4: Config3 Register Format....................................................................................................................... 24
Figure 2.5: Config6 Register Format....................................................................................................................... 25
Figure 2.6: Fields in the PRId Register...................................................................................................................26
Figure 3.1: Fields in the encoding of a cache instruction........................................................................................ 34
Figure 3.2: Fields in the TagLo Registers ..............................................................................................................39
Figure 3.3: L23TagLo Register Format................................................................................................................... 40
Figure 3.4: L23DataLo Register Format..................................................................................................................40
Figure 3.5: L23DataHi Register Format..................................................................................................................41
Figure 3.6: Fields in the CacheErr Register ...........................................................................................................41
Figure 3.7: Fields in the ErrCtl Register.................................................................................................................. 43
Figure 3.8: SPRAM (scratchpad RAM) configuration information in TagLo............................................................ 45
Figure 3-9: Fields in the CDMMBase Register........................................................................................................46
Figure 3.10: Fields in the Access Control and Status (ACSR) Register ................................................................. 47
Figure 3.11: Fields in a 74K™ core TLB entry........................................................................................................ 48
Figure 3.12: Fields in the EntryHi and PageMask registers.................................................................................... 49
Figure 3.13: Fields in the EntryLo0-1 registers.......................................................................................................50
Figure 3.14: Fields in the Context register when Config3CTXTC=0 and Config3SM=0.........................................51
Figure 3.15: Fields in the Context register when Config3CTXTC=1 or Config3SM=1............................................ 52
Figure 3.16: Fields in the ContextConfig register................................................................................................... 53
Figure 5.1: Fields in the IntCtl Register................................................................................................................... 69
Figure 5.2: Fields in the EBase Register.................................................................................................................72
Figure 5.3: Fields in the SRSCtl Register ............................................................................................................... 73
Figure 5.4: Fields in the SRSMap Register............................................................................................................. 74
Figure 5.5: Fields in the HWREna Register............................................................................................................75
Figure 6.1: How floating point numbers are stored in a register ............................................................................ 78
Figure 6.2: Fields in the FIR register....................................................................................................................... 80
Figure 6.3: Floating point control/status register and alternate views..................................................................... 81
Figure 6.4: Overview of the FPU pipeline .............................................................................................................. 83
Figure 7.1: Fields in the DSPControl Register........................................................................................................88
Figure 8.1: Fields in the EJTAG CP0 Debug register ........................................................................................... 107
Figure 8.2: Exception cause bits in the debug register.........................................................................................108
Figure 8.3: Debug register - exception-pending flags...........................................................................................108
Figure 8.4: Fields in the memory-mapped DCR (debug control) register ............................................................. 109
Figure 8.5: Fields in the memory-mapped DCR (debug control) register ............................................................. 110
Figure 8.6: IFields in the JTAG-accessible Implementation register..................................................................... 110
Figure 8.7: Fields in the JTAG-accessible EJTAG_CONTROL register...............................................................111
Figure 8.8: Fast Debug Channel........................................................................................................................... 113
Figure 8.9: Fields in the FDC Access Control and Status (FDACSR) Register....................................................113
Figure 8.10: Fields in the FDC Config (FDCFG) Register.....................................................................................114
Figure 8.11: Fields in the FDC Status (FDSTAT) Register...................................................................................114
Figure 8.12: Fields in the FDC Receive (FDRX) Register.....................................................................................115
Figure 8.13: Fields in the FDC Transmit (FDTXn) Registers................................................................................115
Figure 8.14: Fields in the IBS/DBS (EJTAG breakpoint status) registers.............................................................116
7 Programming the MIPS32® 74K™ Core Family, Revision 02.14
Figure 8.15: Fields in the hardware breakpoint control registers (IBCn, DBCn)...................................................117
Figure 8.16: Fields in the TCBCONTROLE register ............................................................................................. 122
Figure 8.17: Fields in the TCBCONFIG register ................................................................................................... 123
Figure 8.18: Fields in the TraceControl Register .................................................................................................. 123
Figure 8.19: Fields in the TraceControl2 Register ................................................................................................ 123
Figure 8.20: Fields in the TraceControl3 register.................................................................................................. 123
Figure 8.21: Fields in the TraceIBPC/TraceDBPC registers................................................................................. 125
Figure 8.22: Fields in the WatchLo0-3 Register.................................................................................................... 128
Figure 8.23: Fields in the WatchHi0-3 Register .................................................................................................... 128
Figure 8.24: Fields in the PerfCtl0-3 Register....................................................................................................... 129
Figure B.1: Fields in the Status Register...............................................................................................................141
Figure B.2: Fields in the Cause Register .............................................................................................................. 143
Figure B.3: Fields in the TagLo-WST Register ..................................................................................................... 148
Figure B.4: Fields in the TagLo-DAT Register......................................................................................................149
Figure B.5: Fields in the DTagHi Register.............................................................................................................149
Figure B.6: Fields in the ITagHi Register..............................................................................................................149
Programming the MIPS32® 74K™ Core Family, Revision 02.14 8
List of Tables
Table 2.1: Roles of Config registers........................................................................................................................ 21
Table 2.2: 74K® core releases and PRId[Revision] fields...................................................................................26
Table 3.1: Basic MIPS32® architecture memory map............................................................................................29
Table 3.2: Fixed memory mapping..........................................................................................................................30
Table 3.3: Cache Code Values...............................................................................................................................34
Table 3.4: Operations on a cache line available with the cache instruction............................................................ 36
Table 3.1: Caches and their CP0 cache tag/data registers.....................................................................................37
Table 3.5: L23DataLo Register Field Description ................................................................................................... 40
Table 3.6: L23DataHi Register Field Description.................................................................................................... 41
Table 3.7: Recommended ContextConfig Values................................................................................................... 53
Table 4.1: Hints for “pref” instructions..................................................................................................................... 57
Table 4.2: Register eager consumer delays.......................................................................................................62
Table 4.3: Producer register delays.................................................................................................................... 63
Table 5.1: All Exception entry points....................................................................................................................... 73
Table 6.1: FPU (co-processor 1) control registers..................................................................................................80
Table 6.2: Long-latency FP instructions.................................................................................................................. 84
Table 7.1: Mask bits for instructions accessing the DSPControl register................................................................93
Table 7.2: DSP instructions in alphabetical order...................................................................................................96
Table 8.1: JTAG instructions for the EJTAG unit..................................................................................................103
Table 8.2: EJTAG debug memory region map (“dseg”)........................................................................................ 105
Table 8.3: Fields in the JTAG-accessible EJTAG_CONTROL register ................................................................ 111
Table 8.4: FDC Register Mapping.........................................................................................................................113
Table 8.5: Mapping TCB Registers in drseg........................................................................................................119
Table 8.6: Fields in the TCBCONTROLA register.................................................................................................122
Table 8.7: Fields in the TCBCONTROLB register.................................................................................................122
Table 8.8: Performance Counter Event Codes in the PerfCtl0-3[Event] field. ...................................................... 131
Table B.1: Register index by name....................................................................................................................... 137
Table B.2: CP0 registers by number..................................................................................................................... 138
Table B.3: CP0 Registers Grouped by Function................................................................................................... 140
Table B.4: Encoding privilege level in Status[UM,SM].......................................................................................... 142
Table B.5: Values found in Cause[ExcCode]........................................................................................................ 144
Table B.6: Fields in the Config7 Register..............................................................................................................146
Table C.1: Conventional names of registers with usage mnemonics ................................................................... 151
Table C.2: Release 2 of the MIPS32® Architecture - new instructions................................................................. 152
9 Programming the MIPS32® 74K™ Core Family, Revision 02.14
Programming the MIPS32® 74K™ Core Family, Revision 02.14 10
Chapter 1
Introduction
The MIPS32® 74Kcore is the first member of a family of synthesizable CPU cores launched in 2007, and offers the highest performance yet from a synthesizable core. It does this by issuing two instructions simultaneously (where possible) and by using a long pipeline to enable relatively high frequency operation. Conventional high-throughput designs of this type are slowed by dependencies between consecutive instructions, so 74K family cores use out-of- order execution to work around short-term dependencies and keep the pipeline full.
74K Cores offer better performance in the same process compared to MIPS Technologies’mid-range24K® family, at the cost of a larger and more complex core.
Intended Audience
This document is for programmers who are already familiar with the MIPS® architecture and who can read MIPS assembler language (if that’s not you yet, you’d probably benefit from reading a generic MIPS book - see Appendix
A, “References” on page 135).
More precisely, you should definitely be reading this manual if you have an OS, compiler, or low-level application which already runs on some earlier MIPS CPU, and you want to adapt it to the 74K core. So this document concen­trates on where a MIPS 74K family core behaves differently from its predecessors. That’s either:
Behavior which is not completely specified by Release 2 of the MIPS32® architecture: these either concern priv-
ileged operation, or are timing-related.
Behavior which was standardized only in the recent Release 2 of the MIPS32 specification (and not in previous
versions). All Release 2 features are formally documented in [MIPS32]1, and [MIPS32V1] describes the main
changes added by Release 2.
But the summary is too brief to program from, and the details are widely spread; so you’ll find a reminder of the
changes here. Changes to user-privilege instructions are found in Appendix C, “MIPS® Architecture quick-
reference sheet(s)” on page 151, and changes to kernel-privilege (OS) instructions and facilities are detailed in
Chapter 5, “Kernel-mode (OS) programming and Release 2 of the MIPS32® Architecture” on page 67.
Details of timing, relevant to engineers optimizing code (and that very small audience of compiler writers), found
in Section 4.5 “Tuning software for the 74K‘ family pipeline”.
This manual is distinct from the [SUM] reference manual: that is a CPU reference organized from a hardware view­point. If you need to write processor subsystem diagnostics, this manual will not be enough! If you want a very care­ful corner-cases-included delineation of exactly what an instruction does, you’ll need [MIPS32]... and so on.
For readability, some MIPS32 material is repeated here, particularly where a reference would involve a large excur­sion for the reader for a small saving for the author. Appendices mention every user-level-programming difference any active MIPS software engineer is likely to notice when programming the 74K core.
1. References (in square brackets) are listed in Appendix A, “References” on page 135.
Programming the MIPS32® 74K™ Core Family, Revision 02.14 11
All 74K cores are able to run programs encoded with the MIPS16e™ instruction set extension - which makes the binary significantly smaller, with some trade-off in performance. MIPS16e code is rarely seen - it’s almost exclu­sively produced by compilers, and in a debugger view is pretty much a subset of the regular MIPS32 instruction set ­so you’ll find no further mention of it in this manual; please refer to [MIPS16e].
The document is arranged functionally: very approximately, the features are described in the order they’d come into play in a system as it bootstraps itself and prepares for business. But a lot of the CPU-specific data is presented in co­processor zero (“CP0”) registers, so you’ll find a cross-referenced list of 74K core CP0 registers in Appendix B, “CP0
register summary and reference” on page 137.
1.1 Chapters of this manual
Chapter 2, “Initialization and identity” on page 21: what happens from power-up? boot ROM material, but a
good place to cover how you recognize hardware options and configure software-controlled ones.
Chapter 3, “Memory map, caching, reads, writes and translation” on page 29: everything about memory
accesses.
Chapter 4, “Programming the 74K™ core in user mode” on page 55: features relevant to user-level program-
ming; instruction timing and tuning, hardware registers, prefetching.
1.1 Chapters of this manual
Chapter 5, “Kernel-mode (OS) programming and Release 2 of the MIPS32® Architecture” on page 67: 74K-
core-specific information about privileged mode programming.
Chapter 6, “Floating point unit” on page 77: the 74K core’s floating point unit, available on models called
74Kf™.
Chapter 7, “The MIPS32® DSP ASE” on page 87: A brief summary of the MIPS DSP ASE (revision 2), avail-
able on members of the 74K core family.
Chapter 8, “74K™ core features for debug and profiling” on page 102: the debug unit, performance counters and
watchpoints.
Appendix A, “References” on page 135: more reading to broaden your knowledge.
Appendix B, “CP0 register summary and reference” on page 137: all the registers, and references back into the
main text.
Appendix C, “MIPS® Architecture quick-reference sheet(s)” on page 151: a few reference sheets, and some
notes on what was new in MIPS32 and its second release.
1.2 Conventions
Instruction mnemonics are in bold monospace; register names in small monospace. Register fields are shown after the register name in square brackets, so the interrupt enable bit in the status register appears as Status[IE].
CP0 register numbers are denoted by n.s, where “n” is the register number (between 0-31) and “s” is the “select” field (0-7). If the select field is omitted, it’s zero. A select field of “x” denotes all eight potential select numbers.
In this book most registers are described in context, spread through various sections, so there are cross-referenced tables to help you find specific registers. To find a register by name, look in Table B.1, then look up the CP0 number
Programming the MIPS32® 74K™ Core Family, Revision 02.14 12
Introduction
in Table B.2 and you will find a link to the register description (a hotlink if you’re reading on-screen, and a reference including page number if you’re reading paper).
Register diagrams in this book are found in the list of figures. Register fields may show a background color, coded to distinguish different types of fields:
read-write read-only reserved,
Numeric values below the field diagram show the post-reset value for a field which is reset to a known value.
1.3 74K™ core features
All 74K family cores conform to Release 2 of the MIPS32 architecture. You may have the following options:
I- and D-Caches: 4-way set associative; I-cache may be 0 Kbytes, 16Kbytes, 32Kbytes or 64Kbytes in size. D-
cache may be 0 Kbytes, 16Kbytes, 32Kbytes or 64Kbytes in size. 32Kbyte caches are likely to be the most pop-
ular; 64Kbyte caches will involve some cost in frequency in most processes. The D-cache may even be entirely
omitted, when the system is fitted with high-speed memory on the cache interface (scratchpad RAM or SPRAM:
see Section 3.6 “Scratchpad memory/SPRAM”.)
The caches are virtually indexed but physically tagged (the D-cache also keeps a virtual tag which is used to save
a little time, but the final hit/miss decision is always checked with the physical tag). Optionally (but usually) the
32K and 64K
which explains some software-visible effects. The option is selected when the “cache wrapper” was defined for
the 74K core in your design and shows up as the Config7[AR] bit. L2 (secondary) cache: you can configure your
74K core with MIPS Technologies’ L2 cache between 128Kbyte and 1Mbyte in size. Full details are in “MIPS®
PDtrace™ Interface and Trace Control Block Specification”, MIPS Technologies document MD00439. Current
revision is 4.30: you need revision 4 or greater to get multithreading trace information. [L2CACHE], but pro-
gramming information is in Section 3.4 “Caches” of this manual.
2
D-cache configurations can be made free of cache aliases — see Section 3.4.9, "Cache aliases",
always zero
unused software-only write has
unusual effect.
Fast multiplier: 1-per-clock repeat rate for 32×32 multiply and multiply/accumulate.
DSP ASE: this instruction set extension adds a lot of new computational instructions with a fixed-point math unit
crafted to speed up popular signal-processing algorithms, which form a large part of the computational load for
voice and imaging applications. Some of these functions do two math operations at once on two 16-bit values
held in one 32-bit register. 74K family cores support Revision 2 of the DSP ASE.
There’s a guide to the DSP ASE in Chapter 7, “The MIPS32® DSP ASE” on page 87 and the full manual is
[MIPSDSP].
Floating point unit (FPU): if fitted, this is a 64-bit unit (with 64-bit load/store operations), which most often runs
at half or two-thirds the clock rate of the integer unit (you can build the system to run the FPU at the same clock
rate as the integer core, but it will then limit the speed of the whole CPU).
The “CorExtend®” instruction set extension: is available on all 74K CPUs. [CorExtend] defines a hardware
interface which makes it relatively straightforward to add logic to implement new computational (register-to-reg-
ister) instructions in your CPU, using predefined instruction encodings. It’s matched by a set of software tools
2. Note that a 4-way set associative cache of 16Kbyte or less (assuming a 4Kbyte minimum page size) can’t suffer
from aliases.
13 Programming the MIPS32® 74K™ Core Family, Revision 02.14
1.4 A brief guide to the 74K core implementation
which allow users to create assembly language mnemonics and C macros for the new instructions. But there’s
very little about the CorExtend ASE in this manual.
1.4 A brief guide to the 74K core implementation
The 74K family is based around a long (14-19 stage) pipeline with dual issue, and executes instructions out-of-order to maintain progress around short-term dependencies. The longer pipeline allows for a higher frequency than can be reached by 24K® family cores (in a comparable process), and the more sophisticated instruction scheduling means that the 74K core also gets more work done per cycle.
Long-pipeline CPUs can trip up on dependencies (they need a result from a previous instruction), on branches (they don’t know where to fetch the next instructions until the branch instruction is substantially complete), and on loads (even on cache hits, the data cannot be available for some number of instructions). Earlier MIPS Technologies cores had no real trouble with dependencies (dependent instructions, in almost all cases, can run in consecutive cycles). That’s not so in the longer-pipeline 74K core, and its key trick to get around dependencies is out-of-order execution. But the techniques used to deal with branches and loads still include branch prediction, non-blocking loads and late writes — all familiar from MIPS Technologies’ 24K and 34K® core families.
Figure 1.1 Overview of The 74K™ Pipeline
74K pipeline stages
I-cache
x4
IT
IFU
speculative fetch
BHT
ID IS
x4
IB
IDU
issue
DD
DR DS
DM
reg file
completion buffers
ALU AGEN
x2
out-of-order execution
AF
EM EA
rename
map
D-cache
AC
EC
AB
ES
EB
AM
cache miss data updates
GRU
x2
in-order
completion
memory pipeline
loads. stores, etc
external
read/write
WB
GC
read data
1.4.1 Notes on pipeline overview diagram (Figure 1.1):
Although this diagram is considerably simpler (and further abstracted from reality) than those in [SUM], there is still a lot to digest. Rectangles and circles with a thick outline are major functional units — the rectangles are the active
Programming the MIPS32® 74K™ Core Family, Revision 02.14 14
Introduction
units and each has a phrase (in italics) summarizing what it does. The three-letter acronyms match those found in the detailed descriptions, and the pipeline stage names used in the detailed descriptions are across the top. Tosimplify the picture the integer multiply unit and the (optional) floating point unit have been omitted — once you figure out what’s going on, they shouldn’t be too hard to put back. So:
The 74K core’s instruction fetch unit (“IFU”) is semi-autonomous. It’s 128 bits wide, and handles four instruc-
Issue: the IDU (“instruction decode/dispatch unit”) keeps its own queue of instructions and tries to find two of
tions at a bite.
The IFU works a bit like a dog being taken for a walk. It rushes on ahead as long as the lead will stretch (the IFU,
processing instructions four at a time, can rapidly get ahead). Even though you’re in charge, your dog likes to go
first - and so it is with the IFU. Like a dog, the IFU guesses where you want to go, strongly influenced by the way
you usually go. If you make an unexpected turn there is a brief hiatus while the dog comes back and gets up front
again.
The IFU has a queue to keep instructions in when it’s running ahead of the rest of the CPU. This kind of design is
called a “decoupled” IFU.
them which can be issued in parallel. The instruction set is strictly divided into AGEN instructions (loads, stores,
prefetch, cacheops; conditional moves, branches and jumps) and ALU (everything else). If all else is good, the
IDU can issue one instruction of each type in every cycle. Instructions are marked with their place in the program
sequence, but are not necessarily issued in order. An instruction may leapfrog ahead of program order in the
IDU’s queue, if all the data it needs is ready (or at least will be ready by the time it’s needed).
Instructions which execute ahead of time can’t write data to real registers — that would disrupt the operation of
their program predecessors, which might execute later. It may turn out that such an instruction shouldn’t haverun
at all if there was a mispredicted branch, or an earlier-in-program-order instruction took an exception. Instead,
each instruction is assigned a completion buffer (CB) entry to receive its result. The CB entry also keeps informa-
tion about the instruction and where it came from. An instruction which is dependent on this one for a source reg-
ister value but runs soon afterward can get its data from the CB. CB-resident values can be found through the
rename map; that map is indexed by register number and points to the CB reserved by the instruction which will
write or has written a register value.
out-of-orderexecution:theeffectoftheaboveisthatinstructions are issued in “dataflow” order,asdeterminedby
their dependencies on register values produced by other instructions. Up to 32 instructions can be somewhere
between available for issue and completed in the 74K core — those instructions are often said to be in flight. The
32 possible instructions correspond to 32 CB entries — 14 for AGEN instructions, 18 for ALU instructions.
Inside the “execution” box the AGEN and ALU instructions proceed strictly through two internally-pipelined
units of the same names. The two pipelines are in lockstep, and are kept that way. This sounds rigid, but is help-
ful. When the IDU issues an instruction, it does not have to know that an instruction’s data is ready “right now”:
it’s enough that the instruction producing that data is far enough along either execution pipeline. When no other
progress can be made its probably best to think of the IDU issuing a “no-op” or “bubble” into either or both pipe-
lines.
Most of the time the execution pipelines just keep running — the IDU tries to detect any reason why an instruc-
tion cannot run through either the AGEN or ALU pipe.When dependent instructions run close together, the data
doesn’t have time to go into a register or CB entry and be read out again. Instead it can flow down a dedicated
bypass connection between two particular pipestages — a routine trick used in pipelined logic. In the 74K core
there are bypasses interconnecting the AGEN and ALU pipelines, as well as within each pipeline. But whereas
pipeline multiplexing in a conventional design is controlled by comparing register numbers, in 74K cores we
compare completion buffer entry IDs.
15 Programming the MIPS32® 74K™ Core Family, Revision 02.14
1.4 A brief guide to the 74K core implementation
There are a few simple instructions where the ALU produces its results in one clock (they’re listed in Table 4.3),
but most ALU instructions require two clocks: so, in the 74K core, dependent ALU instructions cannot usually
be run back-to-back. This would have a catastrophic effect on the performance of an in-order CPU, because
many instructions are dependent on their immediate predecessor. But an out-of-order CPU will run just fine,
because there are also a reasonable number of cases where an instruction is not dependent on its immediate pre-
decessor, so the pipeline can find something to run. The CPU will slow down if fed with a sequence of relatively
long-latency instructions each of which is dependent on its predecessor, of course. For example, in the AGEN
pipeline it takes four cycles to turn a load address into load data (assuming a cache hit). So chasing a chain of
pointers through memory will take at least four cycles per pointer.
Optimistic issue: any instruction which is issued may yet not run to completion (there might be an exception on
an earlier-in-program instruction, for example). But some instructions are issued even though they are directly
dependent on something we’renot sure about — they’reissued optimistically. The most common example is that
instructions dependent on load data are issued as if we were confident the load will hit in the L1 cache.
Sometimes it turns out we were wrong. Notably, sometimes the load we’re dependent on suffers a cache miss. In
this case the hardware does the simplest thing: rather than attempt to single out the now unviable instruction, we
take a redirect on the load-value-consuming instruction we issued optimistically — that is, we discard all work
on that instruction and its successors, and ask the front end of the pipeline to start again from scratch, re-fetching
the instruction from the I-cache.
In-order completion: at the end of the execution unit we take the oldest in-flight instruction (with luck, the sec-
ond-oldest too) and, if it’s results are ready, we graduate3one or two instructions (“GRU” stands for “graduation
unit”). Before we do that, we make a last minute check for exceptions: if one of the proposed graduates has
encountered a condition which should cause an exception it will be carrying that information with it, we discard
that instruction and do a redirect to the start of the appropriate exception handler. On successful graduation the
instruction’s results are copied from its CB entry back to a real CPU register, and it’s finished.
Because instruction effects aren’t “publicly” visible until graduation, our out-of-order CPU appears to the pro-
grammer to be running sequentially just like any other MIPS32-compliant CPU.
More details about out-of-order execution
That’s the basic flow. But the dual-issue, out-of-order design has some subtle points which can affect how programs run:
Mispredicted branches and redirects: because of the long pipeline, the 74K core relies very heavily on good
branch prediction. When the IFU guesses wrong about a conditional branch, or can’t compute the target for a
jump-register instruction, that’s detected somewhere down the AGEN pipeline (usually the “EC” stage). By then
we’ll have done a minimum of 12 cycles of work on the wrong path.
Whenever a branch is resolved the prediction result is sent back to the IFU to maintain its history table. For most
branches, the prediction result is sent back at the same time as we resolve the branch, which means that a few
branches which don’t graduate can affect the branch history. That’s OK, it was only a heuristic.
Exceptions: can’t be resolved until we’re committed to running an instruction and have completed all its prede-
cessors. So they’re resolved only at graduation. That posts an exception handler address down to the front of a
pipe, clearing out all prefetched and speculatively-executed instructions in the process. There will be at least 19
3. Curiously,the alternative word to “graduation” (for an instruction being committed in an out-of-order design) is“retirement”: a rather different stage of one’s career. I guess that from a software point of view we’reglad that the instruction is now grown up and real, while the hardware is now ready to wave goodbye to it.
Programming the MIPS32® 74K™ Core Family, Revision 02.14 16
Introduction
Loads and Stores: the L1 cache lookup happens inside the out-of-order execution pipeline. But only loads which
cycles between the point where the exception is processed in the graduation unit and the time when the first instruction of the exception handler graduates.
hit in the L1 cache are complete when they graduate. Other loads and stores graduate and then start actions in the memory pipeline. It’s probably fairly obvious how a store can be “stored” — so long as the hardware keeps a note of the address and data of the store, the cache/memory update can be done later. On the 74K core, even a write into the L1 cache is deferred until after graduation. While the write is pending, the cache hardware has to keep a note in case some later instruction wants to load the same value before we’ve completed the write; but that’s familiar technology.
It’s less obvious that we can allow load instructions which L1-miss to graduate. But on the 74K core, loads are non-blocking — a load executes, and results in data being loaded into a GP register at some time in the future. Any later instruction which reads the register value must wait until the load data has arrived. So load instructions are allowed to graduate regardless of how far away their data is. Once the instruction graduates its CB entry must be given back, so data arriving for a graduated load is sent directly to the register file.
There’s another key reason why we did this: with only L1 accesses done out-of-order, loads and stores only become visible outside the CPU after they graduate, so there’s no worry about other parts of the system seeing unexpected effects from speculative instructions.
An instruction which depends on a load which misses will (unless it was a long, long way behind in instruction sequence) have to wait. Most often the consuming instruction will become a candidate for issue before we know whether the load hit in the L1 cache. In this case the dependent instruction is issued: we’re optimists, hoping for a hit. If a consuming instruction reaches graduation and finds the load missed, we must do a “redirect”, re-fetch­ing the consuming instruction and everything later in program order). Next time the consuming instruction is an issue candidate, we’ll know the load has missed, and the consumer will not get issued until the load data has arrived. The redirect for the consuming instruction is quite expensive (19 or more cycles), but in most cases that overhead will be hidden in the time taken to return data for the cache miss.
Stores are less complicated. But since even the cache must not be updated until the store instruction graduates, the memory pipeline is used for writing the L1 cache too: even store L1-hits result in action in the memory pipe­line.
1.4.2 Branches and branch delays
The MIPS architecture defines that the instruction following a branch (the “branch delay slot” instruction) is always executed4. That means that the CPU has one instruction it knows will be executed while it’s figuring out where a
branch is going. But with the 74K core’s long pipeline we don’t finally know whether a conditional branch should be taken, and won’t have computed the target address for a jump-register,until about 8 stages down the pipeline. It’s bet­ter to guess (and pay the price when we’re wrong) than to wait to be certain. Several different tricks are used:
The decoupled IFU (the electronic dog) runs ahead of the rest of the CPU by fetching four instructions per clock.
Branch instructions are identified very early (in fact, they’re marked when instructions are fetched into the I–
cache). MIPS branch and jump instructions (at least those not dependent on register values) are easy to decode, and the IFU decodes them locally to calculate the target address.
4. That’s not quite accurate: there are special forms of conditional branches called “branch likely” which are defined to execute the branch delay slot instruction only when the branch is taken. Note that the “likely” part of the name has nothing to do with branch prediction; the 74K core’s branch prediction system treats the “likelies” just like any other branches. The dependency between a branch condition and the branch delay slot instruction is annoying to keep track of in an out-of-order machine, and MIPS would prefer you not to use branch-likely instructions.
17 Programming the MIPS32® 74K™ Core Family, Revision 02.14
1.4 A brief guide to the 74K core implementation
The IFU’s branch predictor guesses whether conditional branches will be taken or not - it’s not magic, it uses a BHT (a “Branch History Table”) of what happened to branches in the past, indexed by the low bits of the loca- tion of the branch instruction. This particular hardware is an example of Combined branch prediction (majority voting between three different algorithms, one of which is gshare; if you want to know, there’s a good wikipedia article whose topic name is “Branch Predictor”). The branch predictor is taking a good guess. It can seem sur­prising that the predictor makes no attempt to discover whether the history stored in a BHT slot is really that of the current branch, or another one which happened to share the same low address bits; we’re going to be wrong sometimes. It guesses correctly most of the time.
In this way the IFU can predict the next-instruction address and continue to run ahead.
When the IFU guesses wrong, it doesn’t know (the dog just rushes ahead until its owner reaches the fork). The
branch mispredict will be noticed once the branch instruction has been issued and carried through to the AGEN “EC” stage, and is executed in its full context (“resolved”). On detecting a mispredict, the CPU must discard the instructions based on the bad guess (which will not have graduated yet, so will not have changed any vital
machine state) and start fetching instructions from the correct target5. The exact penalty paid by a program which suffers a mispredict depends on how busy the execution unit is, and how early it resolves the branch; the mini­mum penalty is 12 cycles.
Even when we guess right, the branch target calculation in the IFU takes a little while to operate. A rapid
sequence of correctly-predicted branches can empty the queues, causing a program to run slower.
Jump-register instruction targets are unpredictable: the IFU has no knowledge of register data and can’t in gen-
eral anticipate it. But jump-register instructions are relatively rare, except for subroutine returns. In the MIPS ISA you return from subroutines using a jump-register instruction, jr $31 (register 31 is, by a strong conven­tion, used to hold the return address). So on every call instruction, the IFU pushes the return address onto a small
stack; and on every jr $31 it pops the value of the stack and uses that as its guess for the branch target6. We have no way of knowing the target of a jr instruction which uses a register other than $31. When we find
one of those, instruction fetch stops until the correct address is computed up in the AGEN pipeline, 12 or more clocks later.
1.4.3 Loads and load-to-use delays
Even short-pipeline MIPS CPUs can’t deliver load data to the immediately following instruction without a delay, even on a cache hit. Simple MIPS pipelines typically deliver the data one clock later: a one clock “load-to-use delay”. Compilers and programmers try to put some useful and non-dependent operation between the load and its first use.
The 74K core’s long pipeline means that a full D-cache hit takes four clocks to return the data, not two: that would be a three-clock “load-to-use delay”. A pair of loads dependent on each other (one fetches the other’s base address) must be issued at least four cycles apart (that’s optimistic, hoping-for-a-hit timing).
But the AGEN and ALU pipelines are “skewed”, with ALU results delivered a cycle later than AGEN results. That means that when an ALU operation is dependent on a load, it can be issued only three cycles after the load. There’s a price to pay: a load/store whose base address is computed by a preceding ALU instruction must be issued a clock
5. In “branch-likely” variants of conditional branch instructions a mispredict means we also did the wrong thing with the instruction in the branch delay slot. To fix that up, we need to refetch the branch itself, so the penalty is at least one cycle higher.
6. The return-stack guess will be wrong for subroutines containing nested calls deeper than the size of the return stack; but sub­routines high up the call tree are much more rarely executed, so this isn’t so bad.
Programming the MIPS32® 74K™ Core Family, Revision 02.14 18
Introduction
later than an ALU instruction with the same dependency — that’s usually a three cycle delay, because most ALU operations already take an extra clock to produce their result.
It’s like the skewed pipeline which experts in MIPS Technologies’ 24K® family might remember, and has the same motivation: ALU operations dependent on recent loads are more common than loads dependent on recent ALU oper­ations.
1.4.4 Queues, Resource limits and Consequences
Queues which can fill up include:
Cache refills in flight: Is dependent on the size of the “FSB” queue - this and other queues are described in more
detail under Section 3.3, "Reads, writes and synchronization". The CPU does not wait for a cache refill process — at least not until it needs data from the cache miss. But in practice most load data is used almost at once, so the CPU will stop very soon after a miss. As a result, you’re unlikely to ever have four refills in flight unless you are using prefetch or otherwise deliberately optimizing loops. If a series of aggressive prefetches miss often enough, the fourth outstanding load-miss will use the last FSB entry, preventing further loads from graduating and even­tually blocking up the whole CPU until the load data returns. It’s likely to be good practice for code making con­scious use of prefetches to ration itself to a number of operations slightly less than the size of the FSB.
Non-blocking loads to registers (nine): there are nine entries in the “LDQ”, each of which remembers one out-
standing load, and which register the data is destined to return to. Compiled code is unlikely to reach this limit. If you write carefully optimized code where you try to fill load-use delays (perhaps for data you think will not hit in the D-cache) you may hit this problem.
Lines evicted from the cache awaiting writeback (4+): writes are collected in the “WBB” queue. The 74K core’s
ability to write data will in almost all circumstances exceed the bandwidth available to memory; so a long enough burst of uncached or write-through writes will eventually slow to memory speed. Otherwise, you’re unlikely to suffer from this.
Queues in the coprocessor interface: the 74K core hides its out-of-order character from any coprocessors, so
coprocessor hardware need be no more complicated than it is for MIPS Technologies’24Kcore. The coprocessor hardware sees its instructions strictly in order. Each coprocessor instruction also makes its own way through the integer execution unit. Between the execution unit and coprocessor there are some queues which can fill up:
IOIQ (8 entries): instructions being issued — strictly in program order — to a coprocessor.
CBIDQ (8 entries): data being returned from a coprocessor by an instruction which writes a GP register. But prior to graduation the data goes back to a completion buffer (hence the queue acronym).
CLDQ (8 entries): track data being loaded to coprocessor registers (the job done for the GPRs by the LDQ above). CLDQ data isn’t necessarily provided in instruction sequence: in particular MIPS Technologies floating-point unit accepts FP load data as and when it arrives, making FP loads non-blocking.
The dispatch process stalls (flooding the ALU and AGEN pipes with bubbles) when there is no space in any of these queues.
19 Programming the MIPS32® 74K™ Core Family, Revision 02.14
1.4 A brief guide to the 74K core implementation
Programming the MIPS32® 74K™ Core Family, Revision 02.14 20
Chapter 2
Initialization and identity
What happens when the CPU is first powered up? These functions are perhaps more often associated with a ROM monitor than an OS.
2.1 Probing your CPU - Config CP0 registers
The four registers Config and Config1-3 are 32-bit CP0 registers which contain information about the CPU’s capa­bilities. Config1-3 are strictly read-only. The few writable fields in Config — notably Config[K0] — are there for historic compatibility, and are typically written once soon after bootstrap and never changed again.
The 74K core also defines Config7 for some implementation-specific settings (which most programmers will never use).
Broadly speaking the registers have these roles:
Table 2.1 Roles of Config registers
Config A mix of historical and CPU-dependent information, described in Figure 2.1 below. Some
fields are writable.
Config1 Read-only, strictly totheMIPS32architecture. Config1 shows the primary cache configuration Config2
Config3 Read-only, strictly to Release 2 of the [MIPS32] architecture.
Config6 Provides information about the presence of optional extensions to the base MIPS32 architec-
Config7 74K-core-specific, with both read-only and writable fields. It’s a strong convention that the
and basic CPU capabilities, while Config2showsinformationaboutL2 and L3 caches, if fitted (the L2 and the L3 cache is unavailable in 74K family cores). Shown in Figure 2.2 and Figure
2.3 below.
More CPU capability information.
ture in addition to those specified in Config2 and Config3.
writable fields should default to “expected” behavior, so beginners may simply leave these fields alone. The fields are described later, in Section B.2.1 “The Config7 register”.
While initializing your CPU, you might also want to look at the EBase register, which can be used to relocate your exception entry points: see Figure 5.2 and the text round it.
Programming the MIPS32® 74K™ Core Family, Revision 02.14 21
2.1 Probing your CPU - Config CP0 registers
2.1.1 The Config register
Figure 2.1 Fields in the Config Register
31 30 28 27 25 24 23 22 21 20 19 18 17 16 15 14 13 12 10 9 7 6 4 3 2 0
M K23 KU ISP DSP UDI SB 0 WC MM 0 BM BE AT AR MT 0 VI K0
12 2 01 0 1 02
In Figure 2.1:
M: reads 1 if Config1 is available (it always is).
K23, KU, K0: set the cacheability attributes of chunks of the memory map by writing these fields. All share a 3-bit
encoding with the cacheability field found in TLB entries, which is described in Table 3.3 in Section
3.4.2 “Cacheability options”.
Config[K0] sets the cacheability of kseg0, but it would be very unusual to make that anything other than cacheable
(on different, cache-coherent CPUs, it may want to be set to cacheable-coherent). The power-on value of this standard field is not mandated by the [MIPS32] architecture; but the 74K core follows the recommendation to set it to "2", making "kseg0" uncached. That can be surprising; early system initialization software typically re-writes it to "3" in order that kseg0 will be cached, as expected.
If your
74K core-based system uses fixed mapping instead of having a TLB, Config[K23] is for program addresses
0xC000.0000-0xFFFF.FFFF (the “kseg2” and “kseg3” areas), while Config[KU] is for program addresses 0x0000.0000-0x7FFF.FFFF (the “kuseg” area). If you have a TLB, these regions are mapped and these fields are unused (write only zeroes to them).
ISP, DSP: read 1 if I-side and/or D-side scratchpad (SPRAM) is fitted, see Section 3.6, "Scratchpad memory/
SPRAM".
(Don’t confuse this with the MIPS DSP ASE, whose presence is indicated by Config3[DDSP].)
UDI: reads 1 if your core implements user-defined "CorExtend" instructions. “CorExtend” is available on cores whose
name ends in "Pro".
SB: read-only "SimpleBE" bus mode indicator. If set, means that this core will only do simple partial-word transfers on
its OCP interface; that is, the only partial-word transfers will be byte, aligned half-word and aligned word.
If zero, it may generate partial-word transfers with an arbitrary set of bytes enabled (which some memory controllers may not like).
WC: Warning: this is a diagnostic/test field, not intended for customer use, and may vanish without notice from a
future version of the core. Set this 1 to make the Config1[IS] and Config1[DS] fields writable, which allows you to reduce the number of avail-
able L1 I- and D-cache ``sets per way'', and shrink the usable cache size. You'd never want to do this in a real system, but it is conceivable it might be useful for debug or performance analysis. If you have an L2 cache configured, then this makes Config2[SS] writable in the same way.
MM: writable: set 1 if you want writes resulting from separate store instructions in write-through mode merged into a
single (possibly burst) transaction at the interface. This has no affect on cache writebacks (which are always whole blocks together) or uncached writes (which are never merged).
Programming the MIPS32® 74K™ Core Family, Revision 02.14 22
Initialization and identity
BM: read-only - tells you whether your bus uses sequential or sub-block burst order; set by hardware to match your sys-
tem controller.
BE: reads 1 for big-endian, 0 for little-endian. AT: MIPS32 or MIPS64 compliance On 74K family cores it will read “0”, but the possible values are:
0 MIPS32 1 MIPS64 instruction set but MIPS32 address map 2 MIPS64 instruction set with full address map
AR: Architecture revision level. On 74K family cores it will read “1”, denoting release 2 of the MIPS32 specification.
0 MIPS32/MIPS64 Release 1 1 MIPS32/MIPS64 Release 2
MT: MMU type (all MIPS Technologies cores may be configured as type 1 or 3):
0 None 1 MIPS32/64 compliant TLB 2 “BAT” type 3 MIPS-standard fixed mapping
VI: 1 if the L1 I-cache is virtual (both indexed and tagged using virtual address). No contemporary MIPS Technologies
core has a virtual I-cache.
K0: as described in the notes above on Config[K23] etc, this field determines the cacheing behaviour of the fixed kseg0
memory region .
2.1.2 The Config1-2 registers
These two read-only registers tell you the size of the TLB, and the size and organization of L1, L2 and L3 caches (a zero “line size” is used to indicate a cache which isn’t there). They’re best described together.
Config1 has some fields which tell you about the presence of some of the older extensions to the base MIPS32 archi-
tecture are implemented on this core. These bits ran out, and other extensions are noted in Config3.
Figure 2.2 Fields in the Config1 Register
31 30 25 24 22 21 19 18 16 15 13 12 10 9 7 6 5 4 3 2 1 0
M MMUSize IS IL IA DS DL DA C2 MD PC WR CA EP FP
1434301111
Figure 2.3 Fields in the Config2 Register
31 30 28 27 24 23 20 19 16 15 13 12 11 8 7 4 3 0
M TU TS TL TA SU L2B SS SL SA
10 0 0 0 0 0
Config1[M]: continuation bit, 1 if Config2 is implemented.
23 Programming the MIPS32® 74K™ Core Family, Revision 02.14
2.1 Probing your CPU - Config CP0 registers
Config1[MMUSize]: the size of the TLB array (the array has MMUSize+1 entries). Config1[IS,IL,IA,DS,DL,DA]: for each cache this reports
S
Number of sets per way. Calculate as: 64 × 2
L
Line size. Zero means no cache at all, otherwise calculate as: 2 × 2
A Associativity/number of ways - calculate as A + 1
S
L
So if (IS, IL, IA) is (2,4,3) you have 256 sets/way, 32 bytes per line and 4-way set associative: that’s a 32Kbyte cache.
Config1[C2,FP]: 1 if coprocessor 2 or or an FPU (coprocessor 1) fitted, respectively. A coprocessor 2 would be a cus-
tomer-designed coprocessor.
Config1[MD]: 1 if MDMX ASE is implemented in the floating point unit (very unlikely for the 74K core). Config1[PC]: there is at least one performance counter implemented, see Section 8.4, "Performance counters". Config1[WR]: reads 1 because the 74K core always has watchpoint registers, see Section 8.3, "CP0 Watchpoints". Config1[CA]: reads 1 because the MIPS16e compressed-code instruction set is available (as it generally is on MIPS
Technologies cores).
Config1[EP]: reads 1 because an EJTAG debug unit is always provided, see Section 8.1, "EJTAG on-chip debug unit". Config2[M]: continuation bit, 1 if Config3 is implemented. Config2[TU]: implementation-specific bits related to tertiary cache, if fitted. Can be writable. Config2[TS,TL,TA]: tertiary cache size and shape - encoded just like Config1[IS,IL,IA] which see above. Config2[SU]: implementation-specific bits for secondary cache, if fitted. Can be writable. Config2[L2B]: Set to disable L2 cache (“bypass mode”). Setting this bit also forces Config2[SL] to 0 — most OS code
will conclude that there isn't an L2 cache on the system, which can be useful.
Writing this bit controls a signal out to the L2 cache hardware. However, reading it does not read back what you just wrote: it reflects the value of a signal sent back from the L2 cache. With MIPS Technologies' L2 cache logic, that feedback signal will reflect the value you just wrote, with some implementation-dependent delay (it's unlikely to be 100 cycles, but it could easily be more than 10). For more details refer to “MIPS® PDtrace™ Interface and Trace
Control Block Specification”, MIPS Technologies document MD00439. Current revision is 4.30: you need revision 4 or greater to get multithreading trace information. [L2CACHE].
Config2[SS,SL,SA]: secondary cache size and shape, encoded like Config1[IS,IL,IA] above.
2.1.3 The Config3 register
Config3 provides information about the presence of optional extensions to the base MIPS32 architecture. A few of
them were in Config2, but that ran out of bits.
Figure 2.4 Config3 Register Format
31 30 29 28 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
M 0 CMGCR ULRI 0 DSP2P DSPP
CTXTC
0 VEIC VInt SP
CDMM
MT SM TL
Programming the MIPS32® 74K™ Core Family, Revision 02.14 24
Initialization and identity
31 30 29 28 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
011100
Fields shown in Figure 2.4 include:
Config3[M]: continuation bit which is zero, because there is no Config4. Config3[CMCGR]: reads 1 if Global Control Register in the Coherence Manager are implemented and the
CMGCRBase register is present. Reads 0 otherwise Config3[ULRI]: reads 1 if the core implements the UserLocal register, typically used by software threads packages. DSP2P, DSPP: DSPP reads 1 if the MIPS DSP extension is implemented — as described in Chapter 7, “The
MIPS32® DSP ASE” on page 87. If so, DSP2P reads 1 if your CPU conforms to revision 2 of the DSP ASE — as
the 74K core does.
CTXTC: reads 1 when the ContextConfig register is implemented. The width of the BadVPN2 field in the Context
register depends on the contents of this register.
VEIC: read-only bit from the core input signal SI_EICPresent which should be set in the SoC to alert software to the
availability of an EIC-compatible interrupt controller, see Section 5.2, "MIPS32® Architecture Release 2 - enhanced
interrupt system(s)".
VInt: reads 1 when the 74K core can handle vectored interrupts. SP: reads 0 when the 74K core does not support sub-4Kbyte page sizes. CDMM: reads 0 when the 74K core does not support the Common Device Memory Map. SM: reads 0, the 74K core does not handle instructions from the "SmartMIPS" ASE. TL: reads 1 if your core is configured to do instruction trace.
2.1.4 The Config6 register
Config3 provides information about the presence of optional extensions to the base MIPS32 architecture in addition to
those specified in Config2 and Config3.
Figure 2.5 Config6 Register Format
31 15 14 13 12 10 9 8 7 2 1 0
0 SPCD SYND
SPCD disables performance counter clock shutdown. The primary use of this bit is to keep performance counters
alive when the core is in sleep mode.
SYND disables Synonym tag update. By default, all synonym load misses will opportunistically update the tag so
that subsequent loads will hit at lookup.
IFUPerfCtl NMRUP
NMRUD 0
JRCP JRCD
IFUPerfCtl encodes IFU events that provide debug and performance information for the IFU pipeline.
NMRUP indicates that a Not Most Recently Used JTLB replacement scheme is present.
25 Programming the MIPS32® 74K™ Core Family, Revision 02.14
2.2 PRId register — identifying your CPU type
NMRUD disables the Most Recently Used JTLB replacement scheme bit.
JRCP indicates that a JR Cache is implemented.
JRCD indicates that JR Cache Prediction is enabled.
2.1.5 CPU-specific configuration — Config7
Config7 is packed with implementation-specific fields. Most of the time, you leave them alone (a few of them might
sometimes need to be set as required by your SoC designer). So we’ve left these registers defined in the all-CP0
appendix, in Section B.2.1 “The Config7 register”.
2.2 PRId register — identifying your CPU type
This register identifies the CPU to software. It’s appropriately printed as part of the start-up display by any software
telling the world about the CPU on start-up; but when portable software is configuring itself around different CPU
attributes, it’s always preferable to sense those attributes directly — look in other Config registers, or perhaps use a
directed software probe.
Figure 2.6 Fields in the PRId Register
31 24 23 16 15 8 7 5 4 2 1 0
CoOpt CoID Imp
Major Minor Patch
1 0x97
Rev
PRId[CoOpt]: Whatever is specified by the SoC builder who synthesizes the core — refer to your SoC manual. It
should be a number between 0 and 127 — higher values are reserved by MIPS Technologies.
PRId[CoID]: Company ID, which in this case is “1” for MIPS Technologies Inc.: PRId[Imp]: Identifies the particular processor, which in this case is 0x97 for the 74K family. Any processor with differ-
ent CP0 features must have a new PRId field.
PRId[Rev]: The revision number of the core design, used to index entries in errata lists etc. By MIPS Technologies’
convention the revision field is divided into three subfields: a major and minor number; with a nonzero "patch" revi-
sion number is for a release with no functional change. Core licensees can consult [ERRATA] for authoritative infor-
mation about the revision IDs associated with releases of the 74K core.
The following incomplete and not up-to-date table of historical revisions is provided as a guide to program-
mers who don’t have [ERRATA] on hand:
Table 2.2 74K® core releases and PRId[Revision] fields
Release
Identifier
2_0_* 1.0.0 / 0x20 First (GA) release of the 34K core September 30, 2005 2_1_* 2.1.0 / 0x44 MR1 release. Bug fixes, 8KB cache support. March 10, 2006
Programming the MIPS32® 74K™ Core Family, Revision 02.14 26
PRId[Revision]
Maj.min.patch/hex Description Date
Initialization and identity
2_2_0 2.2.0 / 0x48 Allow up to 9 TCs, alias-free 64KB L1 D-cache option. August 31, 2006 2_2_1 2.2.1 / 0x49 Enable use of MIPS SOC-it® L2 Cache Controller. October 12, 2006 2_3_* 2.3.0 / 0x4c Less interlocks round cache instructions, relocatable
2_4_* 2.4.0 / 0x50 New UserLocal register,alias-proofI-cache hit-invalidate
2_5_* 2.5.0/0x54 Errata fixes January, 2009 1_1_* 1.1.0/0x24 Errata fixes January, 2009 1_2_* 1.2.0/0x28 Feature updates: improved low power support, fast debug
2_0_* 2.0.0 / 0x40 General availability of 24K core. March 19, 2004 3_0_* 3.0.0 / 0x60 COP2 option improvements. September 30, 2004 3_2_* 3.2.0 / 0x68 PDtrace available. March 18, 2005 3_4_* 3.4.0 / 0x6c ISPRAM (I-side scratchpad) option added June 30, 2005 3_5_* 3.5.0 / 0x74 8KB cache option December 30, 2005 3_6_* 3.6.0 / 0x78 L2 support., 64KB alias-free D-cache option, option to
3_7_* 3.7.0 / 0x7c Less interlocks round cache instructions, relocatable
4_0_* 4.0.0 / 0x80 New UserLocal register,alias-proofI-cache hit-invalidate
4_1_* 4.1.0/0x84 Errata fixes January, 2009 2_0_* 2.0.0 / 0x40 General availability of 24KE core. June 30, 2005 2_1_* 2.1.0 / 0x44 8KB cache option December 30, 2005 2_2_* 2.2.0 / 0x48 L2 support., 64KB alias-free D-cache option, option to
2_3_* 2.3.0 / 0x4c Less interlocks round cache instructions, relocatable
2_4_* 2.4.0 / 0x50 New UserLocal register,alias-proofI-cache hit-invalidate
2_5_0 2.5.0/0x54 Errata fixes January, 2009
1_0_* 1.0.0 / 0x20 Early-access release of 74K family RTL. January 31, 2007 2_0_0* 2.0.0 / 0x40 First generally-available release of 74K family core. May 11, 2007 2_1_0* 2.1.0 / 0x44 Can wait with interrupts disabled. October 31, 2007
Table 2.2 74K® core releases and PRId[Revision] fields
January 3, 2007
reset exception vector location.
October 31, 2007 operation, can wait with interrupts disabled, per-TC per­formance counters.
July, 2009 channel, on-chip PDtrace buffers
July 12, 2006 haveupto 8 outstanding cache misses(previousmaximum
4).
January 3, 2007 reset exception vector location.
October 31, 2007 operation, can wait with interrupts disabled.
July 12, 2006 haveupto 8 outstanding cache misses(previousmaximum
4).
January 3, 2007 reset exception vector location.
October 31, 2007 operation, can wait with interrupts disabled.
27 Programming the MIPS32® 74K™ Core Family, Revision 02.14
2.2 PRId register — identifying your CPU type
Programming the MIPS32® 74K™ Core Family, Revision 02.14 28
Chapter 3
Memory map, caching, reads, writes and translation
In this chapter:
Section 3.1, "The memory map": basic memory map of the system.
Section 3.3, "Reads, writes and synchronization"
Section 3.4, "Caches"
Section 3.6, "Scratchpad memory/SPRAM": optional on-chip, high-speed memory (particularly useful when dual-ported to the OCP interface).
Section 3.8, "The TLB and translation": how translation is done and supporting CP0 registers.
3.1 The memory map
A 74K core system can be configured with either a TLB (virtual memory translation unit) or a fixed memory map­ping.
A TLB-equipped sees the memory map described by the [MIPS32] architecture (which will be familiar to anyone who has used a 32-bit MIPS architecture CPU) and is summarized in Table 3.1. The TLB gives you access to a full 32-bit physical address on the system interface. More information about the TLB in Section 3.8, "The TLB and
translation".
Table 3.1 Basic MIPS32® architecture memory map
Segment Virtual range What happens to accesses here?
Name
kuseg 0x0000.0000-0x7FFF.FFFF The only region accessible to user-privilege programs.
Mapped by TLB entries.
kseg0 0x8000.0000-0x9FFF.FFFF a fixed-mapping window onto physical addresses
0x0000.0000-0x1FFF.FFFF. Almost invariably cache­able - but in fact other choices are available, and are selected by Config[K0], see Figure 2.1. Accessible only to kernel-privilege programs.
kseg1 0xA000.0000-0xBFFF.FFFF a fixed-mapping window onto the same physical
address range 0x0000.0000-0x1FFF.FFFF as “kseg0”
- but accesses here are uncached. Accessible only to kernel-privilege programs.
kseg2 0xC000.0000-0xDFFF.FFFF Mapped through TLB, accessible with supervisor or
sseg
kseg3 0xE000.0000-0xFFFF.FFFF Mapped through TLB, accessible only with kernel
kernel privilege (hence the alternate name).
privileges.
Programming the MIPS32® 74K™ Core Family, Revision 02.14 29
3.2 Fixed mapping option
With the fixed mapping option, virtual address ranges are hard-wired to particular physical address windows, and cacheability options are set through CP0 register fields as summarized in Table 3.2:
Table 3.2 Fixed memory mapping
Segment Virtual range Physical range Cacheability
Name bits from
kuseg 0x0000.0000-0x7FFF.FFFF 0x4000.0000-0xBFFF.FFFF Config[KU] kseg0 0x8000.0000-0x9FFF.FFFF 0x0000.0000-0x1FFF.FFFF Config[K0] kseg1 0xA000.0000-0xBFFF.FFFF 0x0000.0000-0x1FFF.FFFF (uncached)
kseg2/3 0xC000.0000-0xFFFF.FFFF 0xC000.0000-0xFFFF.FFFF Config[K23]
Even in fixed-mapping mode, the cache parity error status bit Status[ERL] still has the effect (required by the MIPS32 architecture) of taking over the normal mapping of “kuseg”; addresses in that range are used unmapped as physical addresses, and all accesses are uncached, until
Status[ERL] is cleared again.
3.3 Reads, writes and synchronization
3.2 Fixed mapping option
The MIPS architecture permits implementations a fair amount of freedom as to the order in which loads and stores appear at the CPU interface. Most of the time anything goes: so long as the software behaves correctly, the MIPS architecture places few constraints on the order of reads and writes seen by some other agent in a system.
3.3.1 Read/write ordering and cache/memory data queues in the 74K core
To understand the timing of loads and stores (and sometimes instruction fetches), we need to say a little more about the internal construction of the 74K core. In order to maximize performance:
Loads are non-blocking: execution continues “through” a load instruction, and only stops when the program tries to use the GPR value it just loaded.
Writes are “posted”: a write from the core is put aside (the hardware stores both address and data) until the CPU can get access to the system interface and send it off. Even writes which hit in the cache are posted, occurring after the instruction graduates.
Cache refills are handled after the “missing” load has graduated: most of the time the CPU will quite soon get hung up on an instruction which needs the data from the miss, but this is not necessarily the case. The CPU runs on after the load instruction, with the memory pipeline logic remembering and handling the load completion.
All of these are implemented with “queues”, called the LDQ, WBB and FSB (for “fill/store buffer” — it’s used both for writes which hit and for refills after a cache miss) respectively. All the queues handle data first-come, first served. The WBB and FSB queues need to be snooped - a subsequent store to a location with a load pending had better not be allowed to go ahead until the original load data has reached the cache, for example. So each queue entry is tagged with the address of the data it contains.
An LDQ entry is required for every load that misses in the cache. This queue allows the CPU to keep running even though there are outstanding loads. When the load data is finally returned from the system, the LDQ and the main core logic act together to write this data into the correct GPR (which will then free up any instructions whose issue is blocked waiting for this data).
Programming the MIPS32® 74K™ Core Family, Revision 02.14 30
Loading...
+ 126 hidden pages