any event, you cannot reproduce any part of this document, in any form, without the express written
consent of PMC-Sierra, Inc.
PMC-2002227 (R2)
Disclaimer
None of the information contained in this document constitutes an express or implied warranty by PMC-Sierra, Inc. as to the sufficiency, fitness or suitability for a particular purpose of any such information or the
fitness, or suitability for a particular purpose, merchantability, performance, compatibility with other parts
or systems, of any of the products of PMC-Sierra, Inc., or any portion thereof, referred to in this document.
PMC-Sierra, Inc. expressly disclaims all representations and warranties of any kind regarding the contents
or use of the information, including, but not limited to, express and implied warranties of accuracy,
completeness, merchantability, fitness for a particular use, or non-infringement.
In no event will PMC-Sierra, Inc. be liable for any direct, indirect, special, incidental or consequential
damages, including, but not limited to, lost profits, lost business or lost data resulting from any use of or
reliance upon the information, whether or not PMC-Sierra, Inc. has been advised of the possibility of such
damage.
Trademarks
RM7000A and Fast Packet Cache are trademarks of PMC-Sierra, Inc.
Patents
The technology discussed is protected by one or more of the following Patents.
U.S. Patent Numbers
5,953,748, 5,953,748, 5,953,748
Relevant patent applications and other patents may also exist.
Contacting PMC-Sierra
PMC-Sierra, Inc.
8555 Baxter Place Burnaby, BC
Canada V5A 4V7
•Fully static CMOS design with dynamic power down logic
•RM5271 pin compatible, 304 pin TBGA package, 31x31 mm
Proprietary and Confidential to PMC-Sierra, Inc and for its Customer’s Internal Use 9
Document ID: PMC-2002227, Issue 2
RM7000A™ Microprocessor with On-Chip Secondary Cache Data Sheet
Released
2 Block Diagram
Figure 1 Block Diagram
[Figure: block diagram of the RM7000A. Labeled blocks: Primary Instruction Cache and Primary Data Cache (each 4-way set associative, with ITag/ITLB and DTag/DTLB); On-chip 256K Byte Secondary Cache (4-way set associative) with Secondary Tags for Sets A, B, C, and D; External Cache Controller; Store, Write, Read, Prefetch, Pad, and Address Buffers; Joint TLB; Coprocessor 0; System/Memory Control; PLL/Clocks; Instruction Dispatch Unit; PC Incrementer, Branch PC Adder, and Program Counter; Integer Register File; F Pipe (Adder, Shifter, Logicals, Int Mult, Div, Madd) and M Pipe (Adder, StAln/Sh, Logicals, Load Aligner) with their pipe registers; Integer Control; Floating-Point Register File, Load/Align, Packer/Unpacker, Comparator, MultAdd/Add/Sub/Cvt/Div/Sqrt, Multiplier Array, and Floating-Point Control. Labeled buses: A/D Bus, Pad Bus, D Bus, FA Bus, F-Pipe Bus, M-Pipe Bus, IVA, DVA.]
3 Description
PMC-Sierra’s RM7000A is a highly integrated symmetric superscalar microprocessor capable of
issuing two instructions each processor cycle. It has two high-performance 64-bit integer units as
well as a high-throughput, fully pipelined 64-bit floating point unit.
The RM7000A integrates 16 KB 4-way set associative instruction and data caches along with an
integrated 256 KB 4-way set associative secondary cache. The primary data and secondary caches are
write-back and non-blocking. An optional external tertiary cache provides high-performance
capability even in applications with very large data sets.
The memory management unit contains a 64/48-entry fully associative TLB and a 64-bit system
interface supporting multiple outstanding reads with out-of-order return and hardware prioritized
and vectored interrupts.
The RM7000A ideally suits high-end embedded control applications such as internetworking,
high-performance image manipulation, high-speed printing, and 3-D visualization. The RM7000A
is also applicable to the low end workstation market where its balanced integer and floating-point
performance and direct support for a large tertiary cache (up to 8 MB) provide outstanding
price/performance.
4 Hardware Overview
The RM7000A offers a high level of integration targeted at high-performance embedded
applications. The key elements of the RM7000A are described throughout this section.
4.1 CPU Registers
The RM7000A CPU contains 32 general purpose registers (GPR), two special purpose registers
for integer multiplication and division, and a program counter; there are no condition code bits.
Figure 2 shows the user visible state.
Figure 2 CPU Registers
General Purpose Registers (bits 63-0): 0, r1, r2, ..., r29, r30, r31
Multiply/Divide Registers (bits 63-0): HI, LO
Program Counter (bits 63-0): PC
4.2 Superscalar Dispatch
The RM7000A incorporates a superscalar dispatch unit that allows it to issue up to two
instructions per cycle. For purposes of instruction issue, the RM7000A defines four classes of
instructions: integer, load/store, branches, and floating-point. There are two logical pipelines, the
function, or F, pipeline and the memory, or M, pipeline. Note however that the M pipe can execute
integer as well as memory type instructions.
Table 1 Instruction Issue Rules
F Pipe                             M Pipe
one of:                            one of:
integer, branch, floating-point,   integer, load/store
integer mul, div
Figure 3 is a simplification of the pipeline section and illustrates the basics of the instruction issue
mechanism.
Figure 3 Instruction Issue Paradigm
[Figure: labeled blocks are Instruction Cache, Dispatch Unit, F Pipe IBus, M Pipe IBus, FP F Pipe, FP M Pipe, Integer F Pipe, and Integer M Pipe.]
The figure illustrates that one F pipe instruction and one M pipe instruction can be issued
concurrently but that two M pipe or two F pipe instructions cannot be issued. Table 2 specifies
more completely the instructions within each class.
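The class-based issue rule can be sketched in a few lines of Python. This is only an illustration of Table 1: the set names and the helper `can_dual_issue` are hypothetical, and the check ignores instruction dependencies and resource conflicts, which can also prevent dual issue.

```python
# Instruction classes each pipe accepts, taken from Table 1.
F_PIPE = {"integer", "branch", "floating-point", "integer mul/div"}
M_PIPE = {"integer", "load/store"}


def can_dual_issue(first: str, second: str) -> bool:
    """Return True if the two classes can occupy the F and M pipes together.

    Hypothetical helper: models only the class rule, not dependencies.
    """
    return (first in F_PIPE and second in M_PIPE) or (
        first in M_PIPE and second in F_PIPE
    )


# Example pairs: a floating-point operation with a load can pair,
# two loads cannot, because both need the M pipe.
fp_with_load = can_dual_issue("floating-point", "load/store")
two_loads = can_dual_issue("load/store", "load/store")
```

Two integer instructions can pair because either pipe accepts the integer class, whereas a branch and a floating-point instruction both require the F pipe.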
Table 2 Dual Issue Instruction Classes
integer            load/store        floating-point    branch
add, sub, or,      lw, sw, ld, sd,   fadd, fsub,       beq, bne,
xor, shift, etc.   ldc1, sdc1,       fmult, fmadd,     bCzT, bCzF, j,
                   mov, movc,        fdiv, fcmp,       etc.
                   fmov, etc.        fsqrt, etc.
4.3 Pipeline
The logical length of both the F and M pipelines is five stages with state committing in the register
write, or W, pipe stage. The physical length of the floating-point execution pipeline is actually
seven stages but this is completely transparent to the user.
Figure 4 shows instruction execution within the RM7000A when instructions are issuing
simultaneously down both pipelines. As illustrated in the figure, up to ten instructions can be
executing simultaneously. This figure presents a somewhat simplistic view of the processor's
operation since the out-of-order completion of loads, stores, and long latency floating-point
operations can result in there being even more instructions in process than what is shown.
Figure 4 Pipeline
[Figure: instructions I0 through I9 issue two per cycle, each pair one cycle behind the previous pair; every instruction passes through the pipe stage phases 1I 2I 1R 2R 1A 2A 1D 2D 1W 2W.]
1I-1R: Instruction cache access
2I: Instruction virtual to physical address translation
2R: Register file read, Bypass calculation, Instruction decode, Branch address calculation
1A: Issue or slip decision, Branch decision
1A: Data virtual address calculation
1A-2A: Integer add, logical, shift
2A: Store Align
2A-2D: Data cache access and load align
1D: Data virtual to physical address translation
2W: Register file write
Note that instruction dependencies, resource conflicts, and branches may result in some of the
instruction slots being occupied by NOPs.
4.4 Integer Unit
The RM7000A implements the MIPS IV Instruction Set Architecture. Additionally, the RM7000A
includes two implementation specific instructions not found in the baseline MIPS IV ISA, but that
are useful in the embedded market place. These instructions are integer multiply-accumulate
(MAD) and three-operand integer multiply (MUL).
The RM7000A integer unit includes thirty-two general purpose 64-bit registers, the HI/LO result
registers for two-operand integer multiply/divide operations, and the program counter, or PC.
There are two separate execution units, one of which can execute function (F) type instructions
and one which can execute memory (M) type instructions. Refer to Table 1 for the instruction issue
rules.
Note that integer multiply/divide instructions, as well as their corresponding MFHI and MFLO
instructions, can only be executed in the F type execution unit. Within each execution unit the
operational characteristics are the same as on previous MIPS designs with single cycle ALU
operations (add, sub, logical, shift), one cycle load delay, and an autonomous multiply/divide unit.
Register File
The RM7000A has thirty-two general purpose registers with register location 0 (r0) hard wired to
a zero value. These registers are used for scalar integer operations and address calculation. In order
to service the two integer execution units, the register file has four read ports and two write ports
and is fully bypassed both within and between the two execution units to minimize operation
latency in the pipeline.
4.5 ALU
The RM7000A has two complete integer ALUs each consisting of an integer adder/subtractor, a
logic unit, and a shifter. Table 3 shows the functions performed by the ALUs for each execution
unit. Each of these units is optimized to perform all operations in a single processor cycle.
Table 3 ALU Operations
Unit      F Pipe                      M Pipe
Adder     add, sub                    add, sub, data address add
Logic     logic, moves, zero shifts   logic, moves, zero shifts
          (nop)                       (nop)
Shifter   non zero shift              non zero shift, store align
4.6 Integer Multiply/Divide
The RM7000A has a single dedicated integer multiply/divide unit optimized for high-speed
multiply and multiply-accumulate operations. The multiply/divide unit resides in the F type
execution unit. Table 4 shows the performance of the multiply/divide unit on each operation.
Table 4 Integer Multiply/Divide Operations
Opcode           Operand Size   Latency   Repeat Rate   Stall Cycles
MULT/U, MAD/U    16 bit         4         3             0
                 32 bit         5         4             0
MUL              16 bit         4         3             2
                 32 bit         5         4             3
DMULT, DMULTU    any            9         8             0
DIV, DIVU        any            36        36            0
DDIV, DDIVU      any            68        68            0
The baseline MIPS IV ISA specifies that the results of a multiply or divide operation be placed in
the Hi and Lo registers. These values can then be transferred to the general purpose register file
using the Move-from-Hi and Move-from-Lo (MFHI/MFLO) instructions.
In addition to the baseline MIPS IV integer multiply instructions, the RM7000A also implements
the 3-operand multiply instruction, MUL. This instruction specifies that the multiply result go
directly to the integer register file rather than the Lo register. The portion of the multiply that
would have normally gone into the Hi register is discarded. For applications where it is known that
the upper half of the multiply result is not required, using the MUL instruction eliminates the
necessity of executing an explicit MFLO instruction.
The multiply-add instructions, MAD and MADU, multiply two operands and add the resulting
product to the current contents of the Hi and Lo registers. The multiply-accumulate operation is
the core primitive of almost all signal processing algorithms. Therefore, using the RM7000A
eliminates the need for a separate DSP engine in many embedded applications.
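As a rough illustration of the semantics described in this section, the following Python sketch models MULT (whose low half is then read with MFLO), the three-operand MUL, and MAD on 32-bit operands. The function names and the 32-bit-only model are assumptions made for this example; sign extension into the 64-bit registers and all timing are ignored.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mult(rs: int, rt: int) -> Tuple[int, int]:
    """MULT: signed 32x32 multiply; the 64-bit product is split into (hi, lo)."""
    product = to_signed(rs, 32) * to_signed(rt, 32)
    return (product >> 32) & MASK32, product & MASK32


def mul(rs: int, rt: int) -> int:
    """MUL: the low 32 bits go straight to a register; the Hi half is discarded."""
    return (to_signed(rs, 32) * to_signed(rt, 32)) & MASK32


def mad(hi: int, lo: int, rs: int, rt: int) -> Tuple[int, int]:
    """MAD: add the signed product to the 64-bit value held in Hi:Lo."""
    acc = to_signed((hi << 32) | lo, 64) + to_signed(rs, 32) * to_signed(rt, 32)
    return (acc >> 32) & MASK32, acc & MASK32


# A three-element dot product accumulated in Hi:Lo with repeated MAD.
hi, lo = 0, 0
for a, b in zip([1, 2, 3], [4, 5, 6]):
    hi, lo = mad(hi, lo, a, b)
```

In this model `mul(a, b)` always equals the `lo` value returned by `mult(a, b)`, which is why MUL can stand in for a MULT followed by MFLO when the upper half is not needed.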
4.7 Floating-Point Coprocessor
The RM7000A incorporates a high-performance fully pipelined floating-point coprocessor which
includes a floating-point register file and autonomous execution units for multiply/add/convert and
divide/square root. The floating-point coprocessor is a tightly coupled execution unit, decoding
and executing instructions in parallel with, and in the case of floating-point loads and stores, in
cooperation with the M pipe of the integer unit. The superscalar capabilities of the RM7000A
allow floating-point computation instructions to issue concurrently with integer instructions.
4.8Floating-Point Unit
The RM7000A floating-point execution unit supports single and double precision arithmetic, as
specified in the IEEE Standard 754. The execution unit is broken into a separate divide/square root
unit and a pipelined multiply/add unit. Overlap of divide/square root and multiply/add is
supported.
The RM7000A maintains fully precise floating-point exceptions while allowing both overlapped
and pipelined operations. Precise exceptions are extremely important in object-oriented
programming environments and highly desirable for debugging in any environment.
Floating-point operations include:
•add
•subtract
•multiply
•divide
•square root
•reciprocal
•reciprocal square root
•conditional moves
•conversion between fixed-point and floating-point format
•conversion between floating-point formats
•floating-point compare
Table 5 gives the latencies of the floating-point instructions in internal processor cycles.
The floating-point general register file (FGR) is made up of thirty-two 64-bit registers. With the
floating-point load and store double instructions, LDC1 and SDC1, the floating-point unit can
take advantage of the 64-bit wide data cache and issue a floating-point coprocessor load or store
doubleword instruction in every cycle.
The floating-point control register file contains two registers; one for determining configuration
and revision information for the coprocessor, and one for control and status information. These
registers are primarily used for diagnostic software, exception handling, state saving and restoring,
and control of rounding modes.
To support superscalar operations the FGR has four read ports and two write ports and is fully
bypassed to minimize operation latency in the pipeline. Three of the read ports and one write port
are used to support the combined multiply-add instruction while the fourth read and second write
port allows for concurrent floating-point load or store and conditional move operations.
4.10 System Control Coprocessor (CP0)
The system control coprocessor (CP0) is responsible for the virtual memory sub-system, the
exception control system, and the diagnostics capability of the processor.
For memory management support, the RM7000A CP0 is logically identical to the RM5200
Family. For interrupt exceptions and diagnostics, the RM7000A is a superset of the RM5200
Family, implementing additional features described in the following sections on Interrupts, Test/
Breakpoint registers, and Performance Counters.
The memory management unit controls the virtual memory system page mapping. It consists of an
instruction address translation buffer (ITLB), a data address translation buffer (DTLB), a Joint TLB
(JTLB), and coprocessor registers used by the virtual memory mapping sub-system.
4.11 System Control Coprocessor Registers
The RM7000A incorporates all CP0 registers internally. These registers provide the path through
which the virtual memory system’s page mapping is examined and modified, exceptions are
handled, and operating modes are controlled (kernel vs. user mode, interrupts enabled or disabled,
cache features). In addition, the RM7000A includes registers to implement a real-time cycle
counting facility, to aid in cache and system diagnostics, and to assist in data error detection.
To support the non-blocking caches and enhanced interrupt handling capabilities of the RM7000A,
both the data and control register spaces of CP0 are supported. In the data register space, which is
accessed using the MFC0 and MTC0 instructions, the RM7000A supports the same registers as
found in the RM5200 Family. In the control space, which is accessed by the previously unused
CTC0 and CFC0 instructions, the RM7000A supports five new registers. The first three of these
new 32-bit registers support the enhanced interrupt handling capabilities: Interrupt Control,
Interrupt Priority Level Lo (IPLLO), and Interrupt Priority Level Hi (IPLHI). These registers are
described further in the section on interrupt handling. Two other registers, Imprecise Error 1 and
Imprecise Error 2, have been added to help diagnose bus errors that occur on non-blocking
memory references.
Figure 5 shows the CP0 registers.
Figure 5 CP0 Registers
[Figure: the CP0 registers, grouped as used for memory management and used for exception processing, alongside the TLB (entries 0 through 47, with some entries protected from TLBWR). Register numbers from the figure are listed below.]
Register         Number
Index            0
Random           1
EntryLo0         2
EntryLo1         3
Context          4
PageMask         5
Wired            6
Info             7
BadVAddr         8
Count            9
EntryHi          10
Compare          11
Status           12
Cause            13
EPC              14
PRId             15
Config           16
LLAddr           17
Watch1           18
Watch2           19
XContext         20
Perf Ctr Cntrl   22
Watch Mask       24
Perf Counter     25
ECC              26
CacheErr         27
TagLo            28
TagHi            29
ErrorEPC         30
Control Space Registers
Register         Number
IPLLO            18
IPLHI            19
IntControl       20
Imp Error 1      26
Imp Error 2      27
4.12 Virtual to Physical Address Mapping
The RM7000A provides three modes of virtual addressing:
•user mode
•kernel mode
•supervisor mode
These modes allow system software to provide a secure environment for user processes. Bits in the
CP0 Status register determine which virtual addressing mode is used. In user mode, the RM7000A
provides a single, uniform virtual address space of 256 GB (2 GB in 32-bit mode).
When operating in the kernel mode, four distinct virtual address spaces, totalling 1024 GB (4 GB
in 32-bit mode), are simultaneously available and are differentiated by the high-order bits of the
virtual address.
The RM7000A processor also supports a supervisor mode in which the virtual address space is
256.5 GB (2.5 GB in 32-bit mode), divided into three regions based on the high-order bits of the
virtual address. Figure 6 shows the address space layout for 32-bit operations.
Figure 6 Kernel Mode Virtual Addressing (32-bit)
0xFFFFFFFF   Kernel virtual address space (kseg3)
0xE0000000   Mapped, 0.5GB
0xDFFFFFFF   Supervisor virtual address space (ksseg)
0xC0000000   Mapped, 0.5GB
0xBFFFFFFF   Uncached kernel physical address space (kseg1)
0xA0000000   Unmapped, 0.5GB
0x9FFFFFFF   Cached kernel physical address space (kseg0)
0x80000000   Unmapped, 0.5GB
0x7FFFFFFF   User virtual address space (kuseg)
0x00000000   Mapped, 2.0GB
When the RM7000A is configured for 64-bit addressing, the virtual address space layout is an
upward compatible extension of the 32-bit virtual address space layout.
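The 32-bit kernel mode layout of Figure 6 can be expressed as a small lookup. The sketch below is illustrative only: the table name `KERNEL_SEGMENTS_32` and the helper `kernel_segment` are hypothetical, and the lower bound of kuseg is taken to be 0x00000000.

```python
# (low address, high address, segment, attributes) following Figure 6.
KERNEL_SEGMENTS_32 = [
    (0x00000000, 0x7FFFFFFF, "kuseg", "mapped"),
    (0x80000000, 0x9FFFFFFF, "kseg0", "unmapped, cached"),
    (0xA0000000, 0xBFFFFFFF, "kseg1", "unmapped, uncached"),
    (0xC0000000, 0xDFFFFFFF, "ksseg", "mapped"),
    (0xE0000000, 0xFFFFFFFF, "kseg3", "mapped"),
]


def kernel_segment(vaddr: int) -> str:
    """Name the kernel-mode segment that holds a 32-bit virtual address."""
    if not 0 <= vaddr <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit address")
    for low, high, name, _attributes in KERNEL_SEGMENTS_32:
        if low <= vaddr <= high:
            return name
    raise AssertionError("unreachable: the segments cover the whole space")


# Total bytes covered by the five segments (should be the full 4 GB space).
total_bytes = sum(high - low + 1 for low, high, _n, _a in KERNEL_SEGMENTS_32)
```

The segment is decided entirely by the high-order address bits, so a hardware decoder needs only the top three bits of the 32-bit address.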
4.13 Joint TLB
For fast virtual-to-physical address translation, the RM7000A uses a large, fully associative TLB
that maps virtual pages to their corresponding physical addresses. As indicated by its name, the
JTLB is used for both instruction and data translations. The JTLB is organized as pairs of even/odd
entries, and maps a virtual address and address space identifier (ASID) into the large, 64 GB
physical address space. By default, the JTLB is configured as 48 pairs of even/odd entries. The
optional 64 even/odd entry configuration is set at boot time.
Two mechanisms are provided to assist in controlling the amount of mapped space and the
replacement characteristics of various memory regions. First, the page size can be configured, on a
per-entry basis, to use page sizes in the range of 4 KB to 16 MB (in 4x multiples). The CP0
PageMask register is loaded with the desired page size of a mapping, and that size is stored into the
TLB, along with the virtual address, when a new entry is written. Thus, operating systems can
create special purpose maps; for example, an entire frame buffer can be memory mapped using
only one TLB entry.
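To make the page size range concrete, the sketch below enumerates the sizes from 4 KB to 16 MB in 4x steps and picks the smallest one that covers a region. The helper name `smallest_page_for` and the 1024 x 768, 4 bytes per pixel frame buffer are hypothetical examples, and alignment requirements are ignored.

```python
KB = 1024
MB = 1024 * KB

# 4 KB to 16 MB in 4x multiples gives seven page sizes.
PAGE_SIZES = [4 * KB * 4**step for step in range(7)]


def smallest_page_for(region_bytes: int) -> int:
    """Smallest supported page size that covers region_bytes in one page."""
    for size in PAGE_SIZES:
        if size >= region_bytes:
            return size
    raise ValueError("region is larger than the 16 MB maximum page size")


# Hypothetical frame buffer: 1024 x 768 pixels at 4 bytes per pixel (3 MB).
frame_buffer_bytes = 1024 * 768 * 4
frame_buffer_page = smallest_page_for(frame_buffer_bytes)
```

For this example the 3 MB frame buffer fits in a single 4 MB page, whereas mapping it with 4 KB pages would need 768 separate pages.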
The second mechanism controls the replacement algorithm when a TLB miss occurs. The
RM7000A provides a random replacement algorithm to select a TLB entry to be written with a
new mapping. However, the processor also provides a mechanism whereby a system specific
number of mappings can be locked into the TLB, thereby avoiding random replacement. This
mechanism uses the CP0 Wired register and allows the operating system to guarantee that certain
pages are always mapped for performance reasons and to avoid a deadlock condition. This
mechanism also facilitates the design of real-time systems by allowing deterministic access to
critical software.
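A minimal model of the locking mechanism is sketched below, assuming the usual MIPS convention that entries numbered below the Wired value are excluded from random replacement. The function name `pick_replacement_entry`, the default of 48 entries, and the uniform random draw are assumptions of this example, not a description of the hardware's random source.

```python
import random


def pick_replacement_entry(wired: int, entries: int = 48, rng=random) -> int:
    """Pick the TLB entry a random refill may overwrite.

    Entries 0 .. wired-1 are treated as locked; the choice is uniform over
    the remaining entries (the uniform draw is an assumption of this sketch).
    """
    if not 0 <= wired < entries:
        raise ValueError("wired must leave at least one replaceable entry")
    return rng.randrange(wired, entries)


# With eight wired entries, only entries 8 through 47 are ever chosen.
candidates = {pick_replacement_entry(8) for _ in range(1000)}
```

Because the locked entries are never selected, translations placed there stay resident regardless of how many TLB misses occur elsewhere.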
The JTLB also contains information that controls the cache coherency protocol for each page.
Specifically, each page has attribute bits to determine whether the coherency algorithm is:
•uncached
•write-back
•write-through with write-allocate
•write-through without write-allocate
•write-back with secondary and tertiary bypass
Note that both of the write-through protocols bypass both the secondary and the tertiary caches
since neither of these caches supports writes of less than a complete cache line.
These protocols are used for both code and data on the RM7000A with data using write-back or
write-through depending on the application. The write-through modes support the same efficient
frame buffer handling as the RM5200 Family.
4.14 Instruction TLB
The RM7000A uses a 4-entry instruction TLB (ITLB). The ITLB offers the following advantages:
•Minimizes contention for the JTLB
•Eliminates the critical path of translating through a large associative array
•Allows instruction address and data address translations to occur in parallel
•Saves power
Each ITLB entry maps a 4 KB page. The ITLB improves performance by allowing instruction
address translation to occur in parallel with data address translation. When a miss occurs on an
instruction address translation by the ITLB, the least-recently used ITLB entry is filled from the
JTLB. The operation of the ITLB is completely transparent to the user.
4.15 Data TLB
The RM7000A uses a 4-entry data TLB (DTLB) for the same reasons cited above for the ITLB.
Each DTLB entry maps a 4 KB page. The DTLB improves performance by allowing data address
translation to occur in parallel with instruction address translation. When a miss occurs on a data
address translation, the DTLB is filled from the JTLB. The DTLB refill is pseudo-LRU; the least
recently used entry of the least recently used pair of entries is filled. The operation of the DTLB is
completely transparent to the user.
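The pseudo-LRU refill described above can be modeled with three bits of state: one bit naming the least recently used pair and one bit per pair naming its least recently used entry. The class below is a sketch under the assumption that the four entries are grouped as pairs (0, 1) and (2, 3); the class and method names are hypothetical.

```python
class PseudoLruDtlb:
    """Pseudo-LRU over 4 entries grouped as pairs (0, 1) and (2, 3)."""

    def __init__(self):
        self.lru_pair = 0        # pair touched least recently
        self.lru_entry = [0, 0]  # per pair: entry touched least recently

    def touch(self, entry: int) -> None:
        """Record a hit or refill on entry (0..3)."""
        pair, slot = divmod(entry, 2)
        self.lru_pair = 1 - pair
        self.lru_entry[pair] = 1 - slot

    def victim(self) -> int:
        """Entry to refill: the LRU entry of the LRU pair."""
        return 2 * self.lru_pair + self.lru_entry[self.lru_pair]


dtlb = PseudoLruDtlb()
for entry in (0, 1, 2, 3):
    dtlb.touch(entry)
first_victim = dtlb.victim()   # entry 0, which is also the true LRU entry
dtlb.touch(0)
second_victim = dtlb.victim()  # entry 2, although entry 1 is the true LRU
```

The second result shows why the scheme is only an approximation of LRU: after entry 0 is touched again, the victim comes from the other pair even though entry 1 has gone unused for longer.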
4.16 Cache Memory
The RM7000A contains integrated primary instruction and data caches that support single cycle
access, as well as a large unified secondary cache with a three cycle miss penalty from the primary
caches. Each primary cache has a 64-bit read path and a 128-bit write path. Both caches can be
accessed simultaneously. The primary caches provide the integer and floating-point units with an
aggregate bandwidth of 6.4 GB per second at an internal clock frequency of 400 MHz. During an
instruction or data primary cache refill, the secondary cache can provide a 64-bit datum every
cycle following the initial three cycle latency for a peak bandwidth of 3.6 GB per second. For
applications requiring even higher performance, the RM7000A also has a direct interface to a large
external tertiary cache.
4.17 Instruction Cache
The RM7000A has an integrated 16 KB, four-way set associative instruction cache that is virtually
indexed and physically tagged. The effective physical index eliminates the potential for virtual
aliases in the cache.
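One way to see why the index is effectively physical is to work out the cache geometry. The sketch below derives the set count and address bit widths from the stated size, associativity, and 32-byte line size, then compares them with the 12-bit offset of the smallest (4 KB) page. The helper `cache_geometry` and its dictionary keys are hypothetical names used only for this illustration.

```python
def cache_geometry(size_bytes: int, ways: int, line_bytes: int) -> dict:
    """Derive set count, way size and address bit widths for a cache."""
    sets = size_bytes // (ways * line_bytes)
    return {
        "sets": sets,
        "way_bytes": sets * line_bytes,
        "offset_bits": line_bytes.bit_length() - 1,
        "index_bits": sets.bit_length() - 1,
    }


primary = cache_geometry(16 * 1024, ways=4, line_bytes=32)
secondary = cache_geometry(256 * 1024, ways=4, line_bytes=32)

# Index plus offset bits fit inside the 12-bit offset of a 4 KB page, so the
# bits that index the primary cache are identical in the virtual and the
# physical address.
index_is_physical = primary["offset_bits"] + primary["index_bits"] <= 12
```

Each way of a 16 KB four-way cache is 4 KB, the same as the minimum page size, so no virtual address bits above the page offset take part in indexing.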
The data array portion of the instruction cache is 64 bits wide and protected by word parity while
the tag array holds a 24-bit physical address, 14 control bits, a valid bit, and a single parity bit.
By accessing 64 bits per cycle, the instruction cache is able to supply two instructions per cycle to
the superscalar dispatch unit. For signal processing, graphics, and other numerical code sequences
where a floating-point load or store and a floating-point computation instruction are being issued
together in a loop, the entire bandwidth available from the instruction cache is consumed by
instruction issue. For typical integer code mixes, where instruction dependencies and other
resource constraints restrict the level of parallelism that can be achieved, the extra instruction
cache bandwidth is used to fetch both the taken and non-taken branch paths to minimize the
overall penalty for branches.
A 32-byte (eight instruction) line size is used to maximize the communication efficiency between
the instruction cache and the secondary cache, tertiary cache, or memory system.
The RM7000A supports cache locking on a per line basis. The contents of each line of the cache
can be locked by setting a bit in the Tag RAM. Locking the line prevents its contents from being
overwritten by a subsequent cache miss. Refills occur only into unlocked cache lines. This
mechanism allows the programmer to lock critical code into the cache, thereby guaranteeing
deterministic behavior for the locked code sequence.
4.18 Data Cache
The RM7000A has an integrated 16 KB, four-way set associative data cache that is virtually
indexed and physically tagged. Line size is 32 bytes (8 words). The effective physical index
eliminates the potential for virtual aliases in the cache.
The data cache is non-blocking; that is, a miss in the data cache does not necessarily stall the
processor pipeline. As long as no instruction is encountered which is dependent on the data
reference which caused the miss, the pipeline continues to advance. Once there are two cache
misses outstanding, the processor stalls if it encounters another load or store instruction.
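The stall conditions in this paragraph reduce to a small predicate. The sketch below is a simplified reading of the text, not a cycle-accurate model; the function name `stalls` and its parameters are hypothetical.

```python
MAX_OUTSTANDING_MISSES = 2


def stalls(outstanding_misses: int, is_load_or_store: bool,
           depends_on_miss: bool) -> bool:
    """Return True if the next instruction would stall the pipeline."""
    # An instruction that needs data from a pending miss must wait for it.
    if depends_on_miss and outstanding_misses > 0:
        return True
    # With two misses already outstanding, another load or store stalls.
    return is_load_or_store and outstanding_misses >= MAX_OUTSTANDING_MISSES


independent_alu_op = stalls(2, is_load_or_store=False, depends_on_miss=False)
third_memory_op = stalls(2, is_load_or_store=True, depends_on_miss=False)
```

Independent arithmetic can therefore continue under two pending misses, while a third memory instruction waits.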
The data array portion of the data cache is 64 bits wide and protected by byte parity while the tag
array holds a 24-bit physical address, 3 control bits, a two-bit cache state field, and two parity bits.
The most commonly used write policy is write-back, which means that a store to a cache line does
not immediately cause memory to be updated. This increases system performance by reducing bus
traffic and eliminating the bottleneck of waiting for each store operation to finish before issuing a
subsequent memory operation. Software can, however, select write-through on a per-page basis
when appropriate, such as for frame buffers. Cache protocols supported for the data cache are as
follows:
1.Uncached
Reads to addresses in a memory area identified as uncached do not access the cache. Writes to
such addresses are written directly to main memory without updating the cache.
2.Write-back
Loads and instruction fetches first search the cache, reading the next memory hierarchy level
only if the desired data is not cache resident. On data store operations, the cache is first
searched to determine if the target address is cache resident. If it is resident, the cache contents
are updated and the cache line is marked for later write-back. If the cache lookup misses, the
target line is first brought into the cache, after which the write is performed as above.
3.Write-through with write allocate
Loads and instruction fetches first search the cache, reading from memory only if the desired
data is not cache resident; write-through data is never cached in the secondary or tertiary
caches. On data store operations, the cache is first searched to determine if the target address is
cache resident. If it is resident, the primary cache contents are updated and main memory is
written, leaving the write-back bit of the cache line unchanged; no writes occur to the
secondary or tertiary caches. If the cache lookup misses, the target line is first brought into the
cache, after which the write is performed as above.
4.Write-through without write allocate
Loads and instruction fetches first search the cache, reading from memory only if the desired
data is not cache resident; write-through data is never cached in the secondary or tertiary
caches. On data store operations, the cache is first searched to determine if the target address is
cache resident. If it is resident, the cache contents are updated and main memory is written,
leaving the write-back bit of the cache line unchanged; no writes occur to the secondary or
tertiary caches. If the cache lookup misses, only main memory is written.
5.Fast Packet Cache™ (Write-back with secondary and tertiary bypass)
Loads and instruction fetches first search the primary cache, reading from memory only if the
desired data is not resident; the secondary and tertiary caches are not searched. On data store
operations, the primary cache is first searched to determine if the target address is resident. If
it is resident, the cache contents are updated, and the cache line marked for later write-back. If
the cache lookup misses, the target line is first brought into the cache, after which the write is
performed as above.
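The store behavior of the five protocols listed above can be summarized in a short sketch. The policy strings and the action descriptions are hypothetical labels chosen for this illustration; the function covers data stores only and does not model where a fill is sourced from (the Fast Packet Cache protocol fills from memory, bypassing the secondary and tertiary caches).

```python
def store_actions(policy: str, primary_hit: bool) -> list:
    """List what a data store does under each cache protocol (simplified)."""
    if policy == "uncached":
        return ["write main memory"]
    if policy in ("write-back", "fast-packet-cache"):
        # Both allocate on a miss and defer the memory update.
        fill = [] if primary_hit else ["fill primary line"]
        return fill + ["update primary cache", "mark line for write-back"]
    if policy == "write-through-allocate":
        fill = [] if primary_hit else ["fill primary line"]
        return fill + ["update primary cache", "write main memory"]
    if policy == "write-through-no-allocate":
        if primary_hit:
            return ["update primary cache", "write main memory"]
        return ["write main memory"]
    raise ValueError(f"unknown policy: {policy}")


# A store miss: with write allocate the line is filled first, without it
# only main memory is written.
with_allocate = store_actions("write-through-allocate", primary_hit=False)
without_allocate = store_actions("write-through-no-allocate", primary_hit=False)
```

The two write-through protocols differ only on a store miss, which is the case the example isolates.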
Associated with the data cache is the store buffer. When the RM7000A executes a STORE
instruction, this single-entry buffer is written with the store data while the tag comparison is
performed. If the tag matches, then the data is written into the data cache in the next cycle that the
data cache is not accessed (the next non-load cycle). The store buffer allows the RM7000A to
execute a store every processor cycle and to perform back-to-back stores without penalty. In the
event of a store immediately followed by a load to the same address, a combined merge and cache
write occurs such that no penalty is incurred.
4.19 Secondary Cache
The RM7000A has an integrated 256 KB, four-way set associative, block write-back secondary
cache. The secondary cache has a 32-byte line size, a 64-bit bus width to match the system
interface and primary cache bus widths, and is protected with doubleword parity. The secondary
cache tag array holds a 20-bit physical address, 2 control bits, a three bit cache state field, and two
parity bits.
By integrating a secondary cache, the RM7000A is able to decrease the latency of a primary cache
miss without significantly increasing the number of pins and the amount of power required by the
processor. From a technology point of view, integrating a secondary cache leverages CMOS
technology by using silicon to build the structures that are most amenable to silicon technology;
building very dense, low power memory arrays rather than large power hungry I/O buffers.
Further benefits of an integrated secondary cache are flexibility in the cache organization and
management policies that are not practical with an external cache. Two previously mentioned
examples are the 4-way associativity and write-back cache protocol.
A third management policy for which integration affords flexibility is cache hierarchy
management. With multiple levels of cache, it is necessary to specify a policy for dealing with
cases where two cache lines at level n of the hierarchy could possibly be sharing an entry in level
n+1 of the hierarchy.
The RM7000A allows entries to be stored in the primary caches that do not necessarily have a
corresponding entry in the secondary; the RM7000A does not force the primaries to be a subset of
the secondary. For example, if primary cache line A is being filled and a cache line already exists
in the secondary for primary cache line B at the location where primary A’s line would reside, then
that secondary entry is replaced by an entry corresponding to primary cache line A and no action
occurs in the primary for cache line B. This operation creates the aforementioned scenario where
the primary cache line, which initially had a corresponding secondary entry, no longer has such an
entry. Such a primary line is called an orphan. In general, cache lines at level n+1 of the hierarchy
are called parents of level n’s children.
Another RM7000A cache management optimization occurs for the case of a secondary cache line
replacement where the secondary line is dirty and has a corresponding dirty line in the primary. In
this case, since it is permissible to leave the dirty line in the primary, it is not necessary to write the
secondary line back to main memory. Taking this scenario one step further, a final optimization
occurs when the aforementioned dirty primary line is replaced by another line and must be written
back. In this case it is written directly to memory, bypassing the secondary cache.
4.20 Secondary Caching Protocols
Unlike the primary data cache, the secondary cache supports only uncached and block write-back.
As noted earlier, cache lines managed with either of the write-through protocols are not placed in
the secondary cache. A new caching attribute, write-back with secondary and tertiary bypass,
allows the secondary and tertiary caches to be bypassed entirely. When this attribute is selected,
the secondary and tertiary caches are not filled on load misses and are not written on dirty write-backs from the primary cache.
4.21 Tertiary Cache
The RM7000A has direct support for an external tertiary cache. The tertiary cache is direct
mapped and block write-through with byte parity protection for data. The RM7000A tertiary cache
operates identically to the secondary cache of the RM527x while supporting additional size
increments to support 4 MB and 8 MB caches.
The tertiary interface uses the SysAD bus for data and tags while providing a separate bus, TcLine[17:0], for addresses, along with a number of tertiary cache specific control signals.
A tertiary read looks nearly identical to a standard processor read except that the tag chip enable
signal, TcTCE*, is asserted concurrently with ValidOut* and Release*, initiating a tag probe and
indicating to the external controller that a tertiary cache access is being performed. As a result, the
external controller monitors the tertiary hit signal, TcMatch. If a hit is indicated the controller
aborts the memory read and refrains from acquiring control of the system interface. Along with
TcTCE*, the processor also asserts the tag data enable signal, TcTDE*, which causes the tag
RAMs to latch the SysAD address internally for use as the replacement tag if a cache miss occurs.
On a tertiary miss, a refill is accomplished with a two signal handshake between the data output
enable signal, TcDOE*, which is deasserted by the controller, and the tag and data write enable
signal, TcCWE*, asserted by the processor. Figure 7 illustrates a tertiary cache hit followed by a
miss.
Figure 7 Tertiary Cache Hit and Miss
[Timing diagram. Signals shown: SysClock, SysAD, TcLine[17:0], TcWord[1:0], TcTCE*, TcMatch, TcDCE*, TcCWE*, and TcDOE*, with a Master row indicating whether the processor or the system drives the bus. The diagram shows a tertiary hit followed by a tertiary miss.]
Other capabilities of the tertiary interface include block write, tag invalidate, and tag probe. For
details of these transactions as well as detailed timing waveforms for all the tertiary cache
transactions, refer to the RM7000A Bus Interface Specification. The tertiary cache tag can easily
be implemented with standard components such as the Motorola MCM69T618.
The RM7000A cache attributes for the instruction, data, internal secondary, and optional external
tertiary caches are summarized in Table 6.
Table 6 Cache Attributes
Attribute              Instruction          Data                 Secondary            Tertiary
Size                   16KB                 16KB                 256KB                512K, 1M, 2M, 4M, or 8M
Associativity          4-way                4-way                4-way                direct mapped
Replacement Algorithm.
Line size              32 byte              32 byte              32 byte              32 byte
Index                  vAddr
Tag                    pAddr
Write policy           n.a.                 write-back, write-
Read policy            n.a.                 non-blocking (2
Read order             critical word first  critical word first  critical word first  critical word first
Write order            NA                   sequential           sequential           sequential
Miss restart
The RM7000A allows critical code or data fragments to be locked into the primary and secondary
caches. The user has complete control over the locking function. For instruction and data
fragments in the primary caches, locking is accomplished by setting either or both of the cache
lock enable bits and specifying the set in the CP0 ECC register, then executing either a load
instruction for data, or a Fill_I cache operation for instructions.
Only sets A and B within each cache can be locked. Locking within the secondary works
identically to the primaries using a separate secondary lock enable bit and the same set selection
field. As with the primaries, only sets A and B can be locked. Table 7 summarizes the cache
locking capabilities.
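As a minimal sketch of the locking setup described above, the following C fragment composes the lock-related bits of the CP0 ECC register using the bit positions listed in Table 7. The enum, macro, and function names are illustrative assumptions; the code that actually writes the CP0 register and then issues the load or Fill_I operation is not shown.

```c
#include <stdint.h>

/* Bit positions are those listed in Table 7. All names are illustrative. */
#define ECC_LOCK_SET_SELECT (1u << 28)  /* 0 selects set A, 1 selects set B */
#define ECC_LOCK_PRIMARY_I  (1u << 27)  /* primary instruction cache lock enable */
#define ECC_LOCK_PRIMARY_D  (1u << 26)  /* primary data cache lock enable */
#define ECC_LOCK_SECONDARY  (1u << 25)  /* secondary cache lock enable */

enum lock_cache { LOCK_PRIMARY_I, LOCK_PRIMARY_D, LOCK_SECONDARY };
enum lock_set   { LOCK_SET_A = 0, LOCK_SET_B = 1 };

/* Return the ECC register lock bits for one cache and one set (A or B). */
static uint32_t ecc_lock_bits(enum lock_cache cache, enum lock_set set)
{
    uint32_t bits = 0;

    switch (cache) {
    case LOCK_PRIMARY_I: bits = ECC_LOCK_PRIMARY_I; break;
    case LOCK_PRIMARY_D: bits = ECC_LOCK_PRIMARY_D; break;
    case LOCK_SECONDARY: bits = ECC_LOCK_SECONDARY; break;
    }
    if (set == LOCK_SET_B)
        bits |= ECC_LOCK_SET_SELECT;
    return bits;
}
```

Because the primaries and the secondary share the same set selection field, a single ECC value can enable locking in more than one cache only if all of them target the same set.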
Table 7 Cache Locking Control
Cache       Lock Enable   Set Select                  Activate
Primary I   ECC[27]       ECC[28]=0→A, ECC[28]=1→B    Fill_I
Primary D   ECC[26]       ECC[28]=0→A, ECC[28]=1→B    Load/Store
Secondary   ECC[25]       ECC[28]=0→A, ECC[28]=1→B    Fill_I or Load/Store
4.23 Cache Management
To improve the performance of critical data movement operations in the embedded environment,
the RM7000A significantly improves the speed of operation of certain critical cache management
operations. In particular, the speed of the Hit-Writeback-Invalidate and Hit-Invalidate cache
operations has been improved, in some cases by an order of magnitude, over that of other MIPS
processors. For example, Table 8 compares the RM7000A with the R4000 processor.
Table 8 Penalty Cycles
Operation                  Condition   Penalty (RM7000A)   Penalty (R4000)
Hit-Writeback-Invalidate   Miss        0                   7
                           Hit-Clean   3                   12
                           Hit-Dirty   3+n                 14+n
Hit-Invalidate             Miss        0                   7
                           Hit         2                   9
For the Hit-Dirty case of Hit-Writeback-Invalidate in Table 8 above, if the writeback buffer is full
from some previous cache eviction, then n is the number of cycles required to empty the writeback
buffer. If the buffer is empty then n is zero.
The penalty value in Table 8 is the number of processor cycles, beyond the one cycle required to
issue the instruction, that are required to implement the operation.
4.24 Primary Write Buffer
Writes to secondary cache or external memory, whether cache miss write-backs or stores to
uncached or write-through addresses, use the integrated primary write buffer. The write buffer
holds up to four 64-bit address and data pairs. The entire buffer is used for a data cache write-back
and allows the processor to proceed in parallel with memory update. For uncached and write-through
stores, the write buffer significantly increases performance by decoupling the SysAD bus
transfers from the instruction execution stream.
4.25 System Interface
The RM7000A provides a high-performance 64-bit system interface which is compatible with the
RM5200 Family. As an enhancement to the SysAD bus interface, the RM7000A allows half-integral
clock multipliers, thereby providing greater granularity when selecting pipeline and
system interface frequencies.
The SysAD interface consists of a 64-bit Address/Data bus with 8 check bits and a 9-bit command
bus. In addition, there are ten handshake signals and ten interrupt inputs. The interface is capable
of transferring data between the processor and memory at a peak rate of 1000 MB/sec with a 125
MHz SysClock.
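The peak rate quoted above follows directly from the bus width and clock frequency: a 64-bit bus moves 8 bytes per SysClock cycle. The short C sketch below reproduces that arithmetic; the function name is illustrative, and the result assumes one data transfer on every cycle with no idle cycles.

```c
#include <stdint.h>

/* Peak transfer rate in MB/sec for a bus of bus_bits data bits clocked at
 * sysclock_mhz, assuming one transfer per clock with no idle cycles. */
static uint32_t sysad_peak_mb_per_s(uint32_t bus_bits, uint32_t sysclock_mhz)
{
    return (bus_bits / 8u) * sysclock_mhz;
}
```

For the 64-bit SysAD bus at 125 MHz this yields 8 bytes x 125 MHz = 1000 MB/sec.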
Figure 8 shows a typical embedded system using the RM7000A. This example shows a system
with a bank of DRAMs, an optional tertiary cache, and an interface ASIC which provides DRAM
control as well as an I/O port.
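Figure 8 Typical Embedded System Block Diagram
[Block diagram. Blocks shown: RM7000A, DRAM, latch, optional tertiary cache (TcLine, etc.), Flash/Boot ROM, and a memory I/O controller with address and control outputs and a PCI bus. Buses shown: SysAD (72 bits) and SysCmd.]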
4.26 System Address/Data Bus
The 64-bit System Address Data (SysAD) bus is used to transfer addresses and data between the
RM7000A and the rest of the system. It is protected with an 8-bit parity check bus, SysADC[7:0].
The system interface is configurable to allow easy interfacing to memory and I/O systems of
varying frequencies. The data rate and the bus frequency at which the RM7000A transmits data to
the system interface are programmable at boot time via mode control bits. In addition, the rate at
which the processor receives data is fully controlled by the external device. Therefore, either a low
cost interface requiring no read or write buffering, or a faster, high-performance interface can be
designed to communicate with the RM7000A.
4.27 System Command Bus
The RM7000A interface has a 9-bit System Command bus, SysCmd[8:0]. The command bus
indicates whether the SysAD bus carries address or data information on a per-clock basis. If the
SysAD bus carries address, the SysCmd bus indicates the transaction type (for example, a read or
write). If the SysAD bus carries data, then the SysCmd bus contains information about the data
(for example, this is the last data word transmitted, or the data contains an error). The SysCmd bus
is bidirectional to support both processor requests and external requests to the RM7000A.
Processor requests are initiated by the RM7000A and responded to by an external device. External
requests are issued by an external device and require the RM7000A to respond.
The RM7000A supports one- to eight-byte transfers as well as 32-byte block transfers on the
SysAD bus. In the case of a sub-doubleword transfer, the 3 low-order address bits give the byte
address of the transfer, and the SysCmd bus indicates the number of bytes being transferred.
4.28 Handshake Signals
There are ten handshake signals on the system interface. Two of these, RdRdy* and WrRdy*, are
driven by an external device to indicate to the RM7000A whether it can accept a new read or write
transaction. The RM7000A samples these signals before deasserting the address on read and write
requests.
ExtRqst* and Release* are used to transfer control of the SysAD and SysCmd buses from the
processor to an external device. When an external device requires control of the bus, it asserts
ExtRqst*. The RM7000A responds by asserting Release* to release the system interface to slave
state.
PRqst* and PAck* are used to transfer control of the SysAD and SysCmd buses from the external
agent to the processor. These two pins have been added to the SysAD interface to support multiple
outstanding reads and facilitate non-blocking caches. When the processor needs to reacquire
control of the interface, it asserts PRqst*. The external device responds by asserting PAck* to
return control of the interface to the processor.
RspSwap* is also a new pin and is used by the external agent to indicate to the processor when it
is returning data out of order. For example, when there are two outstanding reads, the external
agent asserts RspSwap* when it is going to return the data for the second read before it returns the
data for the first read. RspSwap* must be asserted by the external agent two cycles ahead of when
it presents data so that the processor has time to switch to the correct address for writes into the
tertiary cache.
RdType is another new pin on the interface that indicates whether a read is an instruction read or a
data read. When asserted, the reference is an instruction read. When deasserted it is a data read.
RdType is only valid during valid address cycles.
ValidOut* and ValidIn* are used by the RM7000A and the external device respectively to
indicate that there is a valid command or data on the SysAD and SysCmd buses. The RM7000A
asserts ValidOut* when it is driving these buses with a valid command or data, and the external
device drives ValidIn* when it has control of the buses and is driving a valid command or data.
4.29 System Interface Operation
To support non-blocking caches and data prefetch instructions, the RM7000A allows two
outstanding reads. An external device may respond to read requests in whatever order it chooses
by using the response order indicator pin RspSwap*. No more than two read requests are
submitted to the external device. Support for multiple outstanding reads can be enabled or disabled
via a boot-time mode bit. Refer to Table 16 for a complete list of mode bits.
The RM7000A can issue read and write requests to an external device, while an external device
can issue null and write requests to the RM7000A.
For processor reads, the RM7000A asserts ValidOut* and simultaneously drives the address and
read command on the SysAD and SysCmd buses. If the system interface has RdRdy* asserted,
then the processor tristates its drivers and releases the system interface to slave state by asserting
Release*. The external device can then begin sending data to the RM7000A.
Figure 9 shows a processor block read request and the external agent read response for a system
with either no tertiary cache or a transaction where the tertiary is being bypassed.
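Figure 9 Processor Block Read
[Timing diagram. Signals shown: SysClock, SysAD, SysCmd, ValidOut*, ValidIn*, RdRdy*, WrRdy*, and Release*. SysAD carries Addr followed by Data0 through Data3; SysCmd carries Read followed by NData, NData, NData, and NEOD.]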
In Figure 9 the read latency is 4 cycles (ValidOut* to ValidIn*), and the response data pattern is
DDxxDD. Figure 10 shows a processor block write where the processor was programmed with
write-back data rate boot code 2, or DDxxDDxx.
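Data patterns such as DDxxDD can be read as one character per SysClock cycle, where D is a cycle that carries a doubleword and x is an idle cycle. The following C sketch, under that reading, counts the cycles needed to move a given number of doublewords when a pattern repeats; the function name and the treatment of a pattern without any D as an error (returning 0) are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Count bus cycles needed to transfer `doublewords` data items when the
 * given pattern repeats. 'D' marks a data cycle; any other character is an
 * idle cycle. Returns 0 if the pattern can never transfer data. */
static unsigned cycles_for_transfer(const char *pattern, unsigned doublewords)
{
    size_t len = strlen(pattern);
    unsigned cycles = 0;
    unsigned sent = 0;

    if (len == 0 || strchr(pattern, 'D') == NULL)
        return 0;

    while (sent < doublewords) {
        if (pattern[cycles % len] == 'D')
            sent++;
        cycles++;
    }
    return cycles;
}
```

A 32-byte block is four doublewords, so with the DDxxDDxx pattern the last doubleword arrives on the sixth cycle, which matches the DDxxDD shape of the block read response.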
Finally, Figure 11 shows a typical sequence resulting in two outstanding reads both with initial
tertiary cache accesses, as explained in the following sequence.
1. The processor issues a read which misses in the tertiary cache.
2. The external agent takes control of the bus in preparation for returning data to the processor.
3. The processor encounters another internal cache miss and therefore asserts PRqst* in order to
regain control of the bus.
4. The external agent pulses PAck*, returning control of the bus to the processor.
5. The processor issues a read for the second miss.
6. The second cycle also misses in the tertiary.
7. The RspSwap* pin is asserted to denote the out of order response. Not shown in the figure is
the completion of the data transfer for the second miss, or any of the data transfer for the first
miss.
8. The external agent retakes control of the bus and begins returning data (out of order) for the
second miss to the processor.
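Figure 10 Processor Block Write
[Timing diagram. Signals shown: SysClock, SysAD, SysCmd, ValidOut*, ValidIn*, RdRdy*, WrRdy*, and Release*. SysAD carries Addr followed by Data0 through Data3; SysCmd carries Write followed by NData, NData, NData, and NEOD.]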
Figure 11 Multiple Outstanding Reads
[Timing diagram. Signals shown: SysClock, SysAD, SysCmd, RspSwap*, ValidOut*, ValidIn*, Release*, PRqst*, PAck*, and TcMatch, with a Master row alternating between processor and system. Two reads are issued and both miss in the tertiary cache; the numbered callouts 1 through 8 in the figure correspond to the steps listed above.]
4.30 Data Prefetch
The RM7000A is the first PMC-Sierra design to support the MIPS IV integer data prefetch
(PREF) and floating-point data prefetch (PREFX) instructions. These instructions are used by
the compiler or by an assembly language programmer when it is known or suspected that an
upcoming data reference is going to miss in the cache. By appropriately placing a prefetch
instruction, the memory latency can be hidden under the execution of other instructions. In cases
where the execution of a prefetch instruction would cause a memory management or address error
exception the prefetch is treated as a NOP.
The “Hint” field of the data prefetch instruction is used to specify the action taken by the
instruction. The instruction can operate normally (that is, fetching data as if for a load operation) or
it can allocate and fill a cache line with zeroes on a primary data cache miss.
4.31 Enhanced Write Modes
The RM7000A implements two enhancements to the original R4000 write mechanism: Write
Reissue and Pipeline Writes. The original R4000 allowed a write on the SysAD bus every four
SysClock cycles. Hence for a non-block write, this meant that two out of every four cycles were
wait states.
Pipelined write mode eliminates these two wait states by allowing the processor to drive a new
write address onto the bus immediately after the previous data cycle. This allows for higher
SysAD bus utilization. However, at high frequencies the processor may drive a subsequent write
onto the bus prior to the time the external agent deasserts WrRdy*, indicating that it cannot
accept another write cycle. This can cause the cycle to be aborted.
Write reissue mode is an enhancement to pipelined write mode and allows the processor to reissue
aborted write cycles. If WrRdy* is deasserted during the issue phase of a write operation, the
cycle is aborted by the processor and reissued at a later time.
In write reissue mode, a rate of one write every two bus cycles can be achieved. Pipelined writes
have the same two bus cycle write repeat rate, but can issue one additional write following the
deassertion of WrRdy*.
4.32 External Requests
The RM7000A can respond to certain requests issued by an external device. These requests take
one of two forms: Write requests and Null requests. An external device executes a write request
when it wishes to update one of the processor’s writable resources such as the internal interrupt
register. A null request is executed when the external device wishes the processor to reassert
ownership of the processor external interface. Once the external device has acquired control of the
processor interface via ExtRqst*, it can execute a null request after completing an independent
transaction between itself and system memory in a system where memory is connected directly to
the SysAD bus. Normally this transaction would be a DMA read or write from the I/O system.
4.33 Test/Breakpoint Registers
To facilitate hardware and software debugging, the RM7000A incorporates a pair of Test/Breakpoint, or Watch, registers called Watch1 and Watch2. Each Watch register can be separately
enabled to watch for a load address, a store address, or an instruction address. All address
comparisons are done on physical addresses. An associated register, Watch Mask, has also been
added so that either or both of the Watch registers can compare against an address range rather
than a specific address. The range granularity is limited to a power of two.
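The range comparison can be modeled in C as follows. This is only a sketch: it assumes that a set bit in Watch Mask causes the corresponding address bit to be ignored, which this document does not state explicitly, and it takes the address field position (bits 35:2) from Table 9. The macro and function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Physical address bits 35:2, the Addr field of Watch1 and Watch2 (Table 9). */
#define WATCH_ADDR_FIELD 0x0000000FFFFFFFFCull

/* Return true if paddr falls in the range described by watch_addr and mask.
 * Assumption: a 1 in mask bits 31:2 means "ignore this address bit". */
static bool watch_hit(uint64_t paddr, uint64_t watch_addr, uint32_t mask)
{
    uint64_t ignore  = (uint64_t)(mask & 0xFFFFFFFCu);
    uint64_t compare = WATCH_ADDR_FIELD & ~ignore;

    return ((paddr ^ watch_addr) & compare) == 0;
}
```

Under this assumption a mask of 0xFC ignores address bits 7:2, so a Watch address of 0x1000 covers the 256-byte range 0x1000 through 0x10FF, which is consistent with the power-of-two granularity noted above.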
When enabled, a match of either Watch register results in an exception. If the Watch is enabled for
a load or store address then the exception is the Watch exception as defined for the R4000 by
Cause exception code twenty-three. If the Watch is enabled for instruction addresses then a newly
defined Instruction Watch exception is taken and the Cause code is sixteen. The Watch register
which caused the exception is indicated by the W2 and W1 bits of the Cause register (bits 26 and 25; see Table 11). Table 9 summarizes a Watch
operation.
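An exception handler can classify a Watch exception from the Cause register alone. The C sketch below uses the exception codes given above (twenty-three for load/store, sixteen for instruction) and the EXC, W1, and W2 bit positions shown in Table 11; the type and function names are illustrative, and reading the Cause register itself is not shown.

```c
#include <stdint.h>

#define CAUSE_EXC_SHIFT 2u          /* EXC field occupies bits 6..2 */
#define CAUSE_EXC_MASK  0x1Fu
#define CAUSE_W1_SHIFT  25u         /* W1 is bit 25, W2 is bit 26 */
#define EXC_INSTR_WATCH 16u
#define EXC_WATCH       23u

enum watch_kind { WATCH_NONE, WATCH_DATA, WATCH_INSTR };

/* Classify the exception recorded in a Cause register value. */
static enum watch_kind watch_kind_from_cause(uint32_t cause)
{
    uint32_t exc = (cause >> CAUSE_EXC_SHIFT) & CAUSE_EXC_MASK;

    if (exc == EXC_WATCH)
        return WATCH_DATA;
    if (exc == EXC_INSTR_WATCH)
        return WATCH_INSTR;
    return WATCH_NONE;
}

/* Return a 2-bit set: bit 0 set if Watch1 matched, bit 1 set if Watch2 matched. */
static unsigned watch_regs_from_cause(uint32_t cause)
{
    return (cause >> CAUSE_W1_SHIFT) & 0x3u;
}
```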
Table 9 Watch Control Register
Register     Bit Field/Function
             63      62             61             60:36   35:2   1:0
Watch1, 2    Store   Load           Instr          0       Addr   0
             31:2    1              0
Watch Mask   Mask    Mask Watch 2   Mask Watch 1
Note that the W1 and W2 bits of the Cause register indicate which Watch register caused a particular Watch exception.
4.34 Performance Counters
To facilitate system tuning, the RM7000A implements a performance counter using two new CP0
registers, PerfCount and PerfControl. The PerfCount register is a 32-bit writable counter which
causes an interrupt when bit 31 is set. The PerfControl register is a 32-bit register containing a 5-bit
field which selects one of twenty-two event types as well as a handful of bits which control the
overall counting function. Note that only one event type can be counted at a time and that counting
can occur for user code, kernel code, or both. The event types and control bits are listed in Table
10.
09: Stall cycles
0A: Secondary cache misses
0B: Instruction cache misses
0C: Data cache misses
0D: Data TLB misses
0E: Instruction TLB misses
0F: Joint TLB instruction misses
10: Joint TLB data misses
11: Branches taken
12: Branches issued
13: Secondary cache writebacks
14: Primary cache writebacks
15: Dcache miss stall cycles (cycles where both cache miss tokens taken and a third address is
requested)
16: Cache misses
17: FP possible exception cycles
18: Slip cycles due to multiplier busy
19: Coprocessor 0 slip cycles
1A: Slip cycles due to pending non-blocking loads
1B: Write buffer full stall cycles
1C: Cache instruction stall cycles
1D: Multiplier stall cycles
1E: Stall cycles due to pending non-blocking loads - stall start of exception
7:5     Reserved (must be zero)
8       Count in Kernel Mode (0: Disable, 1: Enable)
9       Count in User Mode (0: Disable, 1: Enable)
10      Count Enable (0: Disable, 1: Enable)
31:11   Reserved (must be zero)
The performance counter interrupt only occurs when interrupts are enabled in the Status register,
IE=1, and the Interrupt Mask bit 13 (IM[13]) of the coprocessor 0 interrupt control register is set.
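A PerfControl value can be composed from the fields listed above. In the C sketch below, the control bits (8, 9, and 10) are taken from the table; placing the 5-bit event select in bits 4:0 is inferred from the reserved ranges 7:5 and 31:11 and should be treated as an assumption. The function name is illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define PERF_EVENT_MASK   0x1Fu       /* 5-bit event select, assumed bits 4:0 */
#define PERF_COUNT_KERNEL (1u << 8)   /* count in kernel mode */
#define PERF_COUNT_USER   (1u << 9)   /* count in user mode */
#define PERF_COUNT_ENABLE (1u << 10)  /* overall count enable */

/* Build a PerfControl value; reserved bits are left at zero. */
static uint32_t perf_control(unsigned event, bool kernel, bool user, bool enable)
{
    uint32_t value = event & PERF_EVENT_MASK;

    if (kernel)
        value |= PERF_COUNT_KERNEL;
    if (user)
        value |= PERF_COUNT_USER;
    if (enable)
        value |= PERF_COUNT_ENABLE;
    return value;
}
```

For example, counting data cache misses (event 0C) in both kernel and user mode with counting enabled gives 0x70C.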
Since the performance counter can be set up to count clock cycles, it can be used as either a second
timer, or a watchdog interrupt. A watchdog interrupt can be used as an aid in debugging system or
software “hangs.” Typically the software is set up to periodically update the count so that no
interrupt occurs. When a hang occurs the interrupt ultimately triggers, thereby breaking free from
the hang-up.
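Because the interrupt is raised when bit 31 of PerfCount becomes set, a watchdog timeout can be expressed as a preload value. The C sketch below assumes the counter increments by one per counted clock cycle and that software rewrites the preload value before the timeout expires; the function name is illustrative.

```c
#include <stdint.h>

#define PERFCOUNT_TRIGGER 0x80000000u  /* value at which bit 31 becomes set */

/* Return the PerfCount preload value that reaches bit 31 after `cycles`
 * increments. Timeouts longer than 2^31 cycles are clamped. */
static uint32_t perfcount_preload(uint32_t cycles)
{
    if (cycles > PERFCOUNT_TRIGGER)
        cycles = PERFCOUNT_TRIGGER;
    return PERFCOUNT_TRIGGER - cycles;
}
```

A watchdog loop would write this value to PerfCount at each checkpoint; if the software hangs and stops rewriting it, the counter eventually sets bit 31 and the interrupt is taken.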
4.35 Interrupt Handling
In order to provide better real time interrupt handling, the RM7000A provides an extended set of
hardware interrupts, each of which can be separately prioritized and separately vectored.
In addition to the standard six external interrupt pins, the RM7000A provides four more interrupt
pins for a total of ten external interrupts.
As described above, the performance counter is also a hardware interrupt source using INT[13].
Historically in the MIPS architecture, interrupt 7 (INT[7]) was used as the timer interrupt. The
RM7000A provides a separate interrupt, INT[12], for this purpose, thereby releasing INT[7] for
use as a pure external interrupt.
All interrupts (INT[13:0]), the Performance Counter, and the Timer, have corresponding interrupt
mask bits, IM[13..0], and interrupt pending bits, IP[13..0], in the Status, Interrupt Control, and
Cause registers. The bit assignments for the Interrupt Control and Cause registers are shown in
Table 11 and Table 12. The Status register has not changed from the RM5200 Family and is not
shown.
The IV bit in the Cause register is the global enable bit for the enhanced interrupt features. If this
bit is clear then interrupt operation is compatible with the RM5200 Family.
In the Interrupt Control register, the interrupt vector spacing is controlled by the Spacing field as
described below. The Interrupt Mask field (IM[15:8]) contains the interrupt mask for interrupts
eight through thirteen. IM[15:14] are reserved for future use.
The Timer Enable (TE) bit is used to gate the Timer Interrupt to the Cause Register. If TE is set to
0, the Timer Interrupt is not gated to IP[12]. If TE is set to 1, the Timer Interrupt is gated to IP[12].
The setting for Mode Bit 11 is used to determine if the Timer Interrupt replaces the external
Interrupt (Int[5]*) as an input to IP[7] in the Cause Register. If Mode Bit 11 is set to 1, the Timer
Interrupt is gated to IP[7].
In order to utilize both the external Interrupt (Int[5]*) and the internal Timer Interrupt, Mode Bit
11 must be set to 0, and TE must be set to 1. In this case, the Timer Interrupt will utilize IP[12], and
Int[5]* will utilize IP[7]. Please also reference the logic diagram for interrupt signals in the
RM7000 User Manual.
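The routing rules for the Timer Interrupt can be summarized as a small truth function. The C sketch below models only what is stated above (Mode Bit 11 selects the source of IP[7]; TE gates the timer to IP[12]); the structure and function names are illustrative, and no other interrupt logic is modeled.

```c
#include <stdbool.h>

/* Which sources reach IP[7] and IP[12] for a given configuration. */
struct timer_routing {
    bool timer_on_ip7;   /* Timer Interrupt drives IP[7] */
    bool int5_on_ip7;    /* external Int[5]* drives IP[7] */
    bool timer_on_ip12;  /* Timer Interrupt drives IP[12] */
};

static struct timer_routing route_timer(bool mode_bit_11, bool te)
{
    struct timer_routing r;

    r.timer_on_ip7  = mode_bit_11;   /* Mode Bit 11 = 1: timer replaces Int[5]* */
    r.int5_on_ip7   = !mode_bit_11;  /* Mode Bit 11 = 0: Int[5]* keeps IP[7] */
    r.timer_on_ip12 = te;            /* TE = 1: timer gated to IP[12] */
    return r;
}
```

With Mode Bit 11 set to 0 and TE set to 1, both interrupt sources are usable: Int[5]* on IP[7] and the timer on IP[12].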
The Interrupt Control register uses IM13 to enable the Performance Counter Control.
Priority of the interrupts is set via two new coprocessor 0 registers called Interrupt Priority Level
Lo (IPLLO) and Interrupt Priority Level Hi (IPLHI).
Table 11 Cause Register
Bits    31   30   29,28   27   26   25   24   23..8       7   6..2   0,1
Field   BD   0    CE      0    W2   W1   IV   IP[15..0]   0   EXC    0
Table 12 Interrupt Control Register
Bits    31..16   15..8       7    6..5   4..0
Field   0        IM[15..8]   TE   0      Spacing
Table 13 IPLLO Register
Bits    31..28   27..24   23..20   19..16   15..12   11..8   7..4   3..0
Field   IPL7     IPL6     IPL5     IPL4     IPL3     IPL2    IPL1   IPL0
Table 14 IPLHI Register
Bits    31..28   27..24   23..20   19..16   15..12   11..8   7..4   3..0
Field   0        0        IPL13    IPL12    IPL11    IPL10   IPL9   IPL8
In the IPLLO and IPLHI registers, each interrupt is represented by a four-bit field, thereby
allowing each interrupt to be programmed with a priority level from 0 to 13 inclusive. The
priorities can be set in any manner, including having all the priorities set exactly the same. Priority
0 is the highest level and priority 15 the lowest. The format of the priority level registers is shown
in Table 13 and Table 14 above. The priority level registers are located in the coprocessor 0 control
register space.
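The field layout in Table 13 and Table 14 places interrupt n in IPLLO when n is 0 through 7 and in IPLHI when n is 8 through 13, at bit offset 4 x (n mod 8). The C sketch below updates a software copy of the two registers accordingly. The structure and function names are illustrative; the sketch accepts any value that fits the four-bit field and leaves the choice of valid priority levels to the caller.

```c
#include <stdint.h>

/* Software copies of the two priority level registers. */
struct ipl_regs {
    uint32_t lo;  /* IPLLO: IPL0 through IPL7 */
    uint32_t hi;  /* IPLHI: IPL8 through IPL13 */
};

/* Set the 4-bit priority field for interrupt `irq` (0..13).
 * Returns 0 on success, -1 if irq or level is out of range. */
static int ipl_set(struct ipl_regs *regs, unsigned irq, unsigned level)
{
    uint32_t *reg;
    unsigned shift;

    if (irq > 13u || level > 15u)
        return -1;

    reg   = (irq < 8u) ? &regs->lo : &regs->hi;
    shift = (irq % 8u) * 4u;
    *reg  = (*reg & ~(0xFu << shift)) | ((uint32_t)level << shift);
    return 0;
}
```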
In addition to programmable priority levels, the RM7000A also permits the spacing between
interrupt vectors to be programmed. For example, the minimum spacing between two adjacent
vectors is 0x20 while the maximum is 0x200. This programmability allows the user to either set up
the vectors as jumps to the actual interrupt routines or, if interrupt latency is paramount, to include
the entire interrupt routine at one vector. Table 15 illustrates the complete set of vector spacing
selections along with the coding as required in the Interrupt Control register bits 4:0.
In general, the active interrupt priority, combined with the spacing setting, generates a vector offset
which is then added to the interrupt base address of 0x200 to generate the interrupt exception
offset. This offset is then added to the exception base to produce the final interrupt vector address.
Table 15 Interrupt Vector Spacing
ICR[4..0]   Spacing
0x0         0x000
0x1         0x020
0x2         0x040
0x4         0x080
0x8         0x100
0x10        0x200
others      reserved
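The vector address calculation can be sketched in C as follows. The spacing decode follows Table 15. The text does not give the exact rule for combining priority and spacing, so the sketch assumes the vector offset is the active priority level multiplied by the spacing; the function names and the example exception base are likewise assumptions.

```c
#include <stdint.h>

#define INTERRUPT_BASE_OFFSET 0x200u  /* interrupt base address within the exception area */

/* Decode the Spacing field of the Interrupt Control register (Table 15).
 * Returns the spacing in bytes, or -1 for a reserved encoding. */
static int32_t spacing_from_icr(unsigned code)
{
    switch (code & 0x1Fu) {
    case 0x00: return 0x000;
    case 0x01: return 0x020;
    case 0x02: return 0x040;
    case 0x04: return 0x080;
    case 0x08: return 0x100;
    case 0x10: return 0x200;
    default:   return -1;
    }
}

/* Assumed rule: vector = exception base + 0x200 + priority * spacing. */
static uint32_t interrupt_vector(uint32_t exc_base, unsigned priority, uint32_t spacing)
{
    return exc_base + INTERRUPT_BASE_OFFSET + (uint32_t)priority * spacing;
}
```

For example, with an exception base of 0x80000000 (an example value only), priority 3, and a spacing of 0x20, the sketch yields 0x80000260.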
4.36 Standby Mode
The RM7000A provides a means to reduce the amount of power consumed by the internal core
when the CPU is not performing any useful operations. This state is known as Standby Mode.
Executing the WAIT instruction enables interrupts and causes the processor to enter Standby
Mode. If the SysAD bus is currently idle when the WAIT instruction completes the W pipe stage,
the internal processor clock stops, thereby freezing the pipeline. The phase lock loop, or PLL,
internal timer/counter, and the "wake up" input pins: INT[9:0]*, NMI*, ExtRqst*, Reset*, and ColdReset* continue to operate in their normal fashion.
If the SysAD bus is not idle when the WAIT instruction completes the W pipe stage, then the
WAIT is treated as a NOP until the bus operation is completed. Once the processor is in Standby,
any interrupt, including the internally generated timer interrupt, causes the processor to exit
Standby and resume operation where it left off. The WAIT instruction is typically inserted in the
idle loop of the operating system or real time executive.
4.37 JTAG Interface
The RM7000A interface supports JTAG boundary scan in conformance with IEEE 1149.1. The
JTAG interface is useful for checking the integrity of the processor’s pin connections.
4.38 Boot-Time Options
The RM7000A operating modes are initialized at power-up by the boot-time mode control
interface. The serial boot-time mode control interface operates at a very low frequency (SysClock
divided by 256), allowing the initialization information to be kept in a low cost EPROM or system
interface ASIC.
4.39 Boot-Time Modes
The boot-time serial mode stream is defined in Table 16. Bit 0 is presented to the processor as the
first bit in the stream when VccOK is de-asserted. Bit 255 is the last bit transferred.
Table 16 Boot Time Mode Stream
Mode bit   Description
0          reserved (must be zero)
17:16      System configuration identifiers - software
5 Pin Descriptions
The following is a list of control, data, clock, tertiary cache, interrupt, and miscellaneous pins of
the RM7000A.
Table 17 System Interface
Pin Name      Type      Description
ExtRqst*      Input     External request
    Signals that the system interface is submitting an external request.
Release*      Output    Release interface
    Signals that the processor is releasing the system interface to slave
    state.
RdRdy*        Input     Read Ready
    Signals that an external agent can now accept a processor read.
WrRdy*        Input     Write Ready
    Signals that an external agent can now accept a processor write
    request.
ValidIn*      Input     Valid Input
    Signals that an external agent is now driving a valid address or data on
    the SysAD bus and a valid command or data identifier on the SysCmd
    bus.
ValidOut*     Output    Valid output
    Signals that the processor is now driving a valid address or data on the
    SysAD bus and a valid command or data identifier on the SysCmd bus.
PRqst*        Output    Processor Request
    When asserted this signal requests that control of the system interface
    be returned to the processor. This is enabled by Mode Bit 26.
PAck*         Input     Processor Acknowledge
    When asserted, in response to PRqst*, this signal indicates to the
    processor that it has been granted control of the system interface.
RspSwap*      Input     Response Swap
    RspSwap* is used by the external agent to signal the processor when it
    is about to return a memory reference out of order; i.e., of two
    outstanding memory references, the data for the second reference is
    being returned ahead of the data for the first reference. In order that the
    processor will have time to switch the address to the tertiary cache, this
    signal must be asserted a minimum of two cycles prior to the data itself
    being presented. Note that this signal works as a toggle; i.e., for each
    cycle that it is held asserted the order of return is reversed. By default,
    anytime the processor issues a second read it is assumed that the
    reads will be returned in order; i.e., no action is required if the reads are
    indeed returned in order. This is enabled by Mode Bit 26.
RdType        Output    Read Type
    During the address cycle of a read request, RdType indicates whether
    the read request is an instruction read or a data read.
SysAD(63:0)   Input/Output    System address/data bus
    A 64-bit address and data bus for communication between the
    processor and an external agent.
SysADC(7:0)   Input/Output    System address/data check bus
    An 8-bit bus containing parity check bits for the SysAD bus during data
    cycles.
SysCmd(8:0)   Input/Output    System command/data identifier bus
    A 9-bit bus for command and data identifier transmission between the
    processor and an external agent.
SysCmdP       Input/Output    System Command/Data Identifier Bus Parity
    For the RM7000A, unused on input and zero on output.
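The toggle behaviour described for RspSwap* can be illustrated with a small model. This is only a sketch of the rule "for each cycle that it is held asserted the order of return is reversed"; the function name and the labels "first" and "second" are hypothetical, and the two-cycle lead-time requirement is not modelled.

```python
def return_order(asserted_cycles: int) -> tuple:
    """Order in which two outstanding reads are returned.

    Starts from the default in-order assumption and reverses the order
    once for every cycle RspSwap* is held asserted.
    """
    if asserted_cycles < 0:
        raise ValueError("cycle count cannot be negative")
    order = ("first", "second")       # default: reads return in order
    for _ in range(asserted_cycles):
        order = order[::-1]           # each asserted cycle toggles the order
    return order


if __name__ == "__main__":
    for cycles in range(4):
        print(cycles, return_order(cycles))
```

In this model an external agent that wants a single swap holds the signal asserted for an odd number of cycles; an even number restores the default order.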
Table 18 Clock/Control Interface
Pin Name      Type      Description
SysClock      Input     System clock
    Master clock input used as the system interface reference clock. All
    output timings are relative to this input clock. Pipeline operation
    frequency is derived by multiplying this clock up by the factor selected
    during boot initialization.
VccP          Input     Vcc for PLL
    Quiet VccInt for the internal phase locked loop. Must be connected to
    VccInt through a filter circuit.
VssP          Input     Vss for PLL
    Quiet Vss for the internal phase locked loop. Must be connected to
    VssInt through a filter circuit.
Table 19 Tertiary Cache Interface
Pin Name      Type            Description
TcCLR*        Output          Tertiary Cache Block Clear
    Requests that all valid bits be cleared in the Tag RAMs. Many RAMs
    may not support a block clear; therefore the block clear capability is not
    required for the cache to operate.
TcCWE*(1:0)   Output          Tertiary Cache Write Enable
    Asserted to cause a write to the cache. Two identical signals are
    provided to balance the capacitive load relative to the remaining cache
    interface signals.
TcDCE*(1:0)   Output          Tertiary Cache Data RAM Chip Enable
    When asserted this signal causes the data RAMs to read out their
    contents. Two identical signals are provided to balance the capacitive
    load relative to the remaining cache interface signals.
TcDOE*        Input           Tertiary Cache Data RAM Output Enable
    When asserted this signal causes the data RAMs to drive data onto
    their I/O pins. This signal is monitored by the processor to determine
    when to drive the data RAM write enable in a tertiary cache miss refill
    sequence.
TcLine(17:0)  Output          Tertiary Cache Line Index
TcMatch       Input           Tertiary Cache Tag Match
    This signal is asserted by the cache Tag RAMs when a match occurs
    between the value on its data inputs and the contents of the addressed
    location in the RAM.
TcTCE*        Output          Tertiary Cache Tag RAM Chip Enable
    When asserted this signal will cause either a probe or a write of the Tag
    RAMs depending on the state of the Tag RAMs write enable signal.
    This signal is monitored by the external agent and indicates to it that a
    tertiary cache access is occurring.
TcTDE*        Output          Tertiary Cache Tag RAM Data Enable
    When asserted this signal causes the value on the data inputs of the
    Tag RAM to be latched into the RAM. If a refill of the RAM is necessary,
    this latched value will be written into the Tag RAM array. Latching the
    Tag allows a shared address/data bus to be used without incurring a
    penalty to re-present the Tag during the refill sequence.
TcTOE*        Output          Tertiary Cache Tag RAM Output Enable
    When asserted this signal causes the Tag RAMs to drive data onto their
    I/O pins.
TcWord(1:0)   Input/Output    Tertiary Cache Double Word Index
    Driven by the processor on cache hits and by the external agent on
    cache miss refills.
TcValid       Input/Output    Tertiary Cache Valid
    This signal is driven by the processor as appropriate to make a cache
    line valid or invalid. On Tag read operations the Tag RAM will drive this
    signal to indicate the line state.
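The widths of TcLine(17:0) and TcWord(1:0) bound the amount of data RAM the interface can address. The sketch below works that out and splits a byte address into the two indices. It assumes each double word is 8 bytes (matching the 64-bit SysAD bus), that the cache is direct mapped, and that the index fields are taken from consecutive low-order address bits; none of these mappings is stated in the pin table, so treat the bit positions as hypothetical.

```python
LINE_INDEX_BITS = 18        # TcLine(17:0)
WORD_INDEX_BITS = 2         # TcWord(1:0)
BYTES_PER_DOUBLEWORD = 8    # assumption: 64-bit double words


def max_data_ram_bytes() -> int:
    """Largest data RAM addressable with the two index buses."""
    lines = 1 << LINE_INDEX_BITS
    doublewords_per_line = 1 << WORD_INDEX_BITS
    return lines * doublewords_per_line * BYTES_PER_DOUBLEWORD


def split_address(addr: int) -> tuple:
    """Return (line index, double word index, byte offset).

    Assumed layout: bits 2:0 byte offset, bits 4:3 TcWord, bits 22:5 TcLine.
    """
    byte_offset = addr & (BYTES_PER_DOUBLEWORD - 1)
    word = (addr >> 3) & ((1 << WORD_INDEX_BITS) - 1)
    line = (addr >> 5) & ((1 << LINE_INDEX_BITS) - 1)
    return line, word, byte_offset


if __name__ == "__main__":
    print(max_data_ram_bytes() // (1024 * 1024), "MB addressable")
    print(split_address(0x28))
```

With these assumptions the two buses address four double words per line across 262,144 lines.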
Table 20 Interrupt Interface
Pin Name      Type      Description
Int*(9:0)     Input     Interrupt
    Ten general processor interrupts, bit-wise ORed with bits 9:0 of the
    interrupt register.
NMI*          Input     Non-maskable interrupt
    Non-maskable interrupt, ORed with bit 15 of the interrupt register (bit 6
    in R5000 compatibility mode).
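The ORing of the interrupt pins with the interrupt register can be sketched as a bit-mask operation. The model below assumes the ten Int* pins map to bits 9:0 of the same interrupt register that NMI* maps to at bit 15, and it takes "asserted" flags as inputs rather than electrical levels (the pins themselves are active low). The function name is hypothetical and R5000 compatibility mode is not modelled.

```python
INT_PIN_MASK = 0x3FF     # Int*(9:0) -> bits 9:0
NMI_BIT = 1 << 15        # NMI* -> bit 15 (native mode)


def effective_interrupt_bits(register: int,
                             int_asserted: int,
                             nmi_asserted: bool) -> int:
    """Combine register contents with the external pins.

    int_asserted: bit n set means pin Int*(n) is asserted.
    """
    value = register | (int_asserted & INT_PIN_MASK)
    if nmi_asserted:
        value |= NMI_BIT
    return value


if __name__ == "__main__":
    print(hex(effective_interrupt_bits(0x0000, 0b101, False)))
    print(hex(effective_interrupt_bits(0x0200, 0, True)))
```

Because the combination is a plain OR, a bit already set in the register stays set regardless of the pin state.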
Table 21 JTAG Interface
Pin Name      Type      Description
JTDI          Input     JTAG data in
    JTAG serial data in.
JTCK          Input     JTAG clock input
    JTAG serial clock input.
JTDO          Output    JTAG data out
    JTAG serial data out.
JTMS          Input     JTAG command
    JTAG command signal, signals that the incoming serial data is
    command data.
Table 22 Initialization Interface
Pin Name      Type      Description
BigEndian     Input     Big Endian / Little Endian Control
    Allows the system to change the processor addressing mode without
    rewriting the mode ROM.
VccOK         Input     Vcc is OK
    When asserted, this signal indicates to the RM7000A that the VccInt
    power supply has been above the recommended value for more than
    100 milliseconds and will remain stable. The assertion of VccOK
    initiates the reading of the boot-time mode control serial stream.
ColdReset*    Input     Cold Reset
    This signal must be asserted for a power on reset or a cold reset.
    ColdReset* must be de-asserted synchronously with SysClock.
Reset*        Input     Reset
    This signal must be asserted for any reset sequence. It may be
    asserted synchronously or asynchronously for a cold reset, or
    synchronously to initiate a warm reset. Reset* must be de-asserted
    synchronously with SysClock.
ModeClock     Output    Boot Mode Clock
    Serial boot-mode data clock output at the system clock frequency
    divided by 256.
ModeIn        Input     Boot Mode Data In
    Serial boot-mode data input.
6 Absolute Maximum Ratings (Note 1)
Symbol    Rating                                  Limits                    Unit
VTERM     Terminal Voltage with respect to VSS    –0.5 (Note 2) to +3.9     V
TCASE     Operating Temperature
            Commercial                            0 to +85                  °C
            Industrial                            –40 to +85                °C
TSTG      Storage Temperature                     –55 to +125               °C
IIN       DC Input Current                        ±20 (Note 3)              mA
IOUT      DC Output Current                       ±20 (Note 4)              mA
Notes
1. Stresses greater than those listed under ABSOLUTE MAXIMUM RATINGS may cause
permanent damage to the device. This is a stress rating only and functional operation of the
device at these or any other conditions above those indicated in the operational sections of
this specification is not implied. Exposure to absolute maximum rating conditions for
extended periods may affect reliability.
2. VIN minimum = -2.0 V for pulse width less than 15 ns. VIN should not exceed 3.9 Volts.
3. When VIN < 0V or VIN > VccIO.
4. Not more than one output should be shorted at a time. Duration of the short should not
exceed 30 seconds.
7 Recommended Operating Conditions
Grade        CPU Speed       Temperature              Vss   VccInt           VccIO                                VccP
Commercial   300 - 350 MHz   0°C to +85°C (Case)      0V    1.65V ± 50 mV    3.3 V ± 150 mV or 2.5 V ± 200 mV     1.65V ± 50 mV
Commercial   400 MHz         0°C to +70°C (Case)      0V    1.8V ± 50 mV     3.3 V ± 150 mV or 2.5 V ± 200 mV     1.8V ± 50 mV
Industrial   350 MHz         -40°C to +85°C (Case)    0V    1.65V ± 50 mV    3.3 V ± 150 mV or 2.5 V ± 200 mV     1.65V ± 50 mV
Notes
1. VccIO should not exceed VccInt by greater than 2.0 V during the power-up sequence.
2. Applying a logic high state to any I/O pin before VccInt becomes stable is not recommended.
3. As specified in IEEE 1149.1 (JTAG), the JTMS pin must be held high during reset to avoid
entering JTAG test mode. Refer to the RM7000A Family Users Manual, Appendix E.
4. VccP must be connected to VccInt through a passive filter circuit. See RM7000 Family User’s
Manual for recommended circuit.
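The supply limits above lend themselves to a simple bring-up check. The following sketch tests a measured VccInt against a nominal value with the ±50 mV tolerance and applies the power-up rule from Note 1 (VccIO must not exceed VccInt by more than 2.0 V). Voltages are handled as integer millivolts to avoid floating-point edge effects; the function names are illustrative and not part of any PMC-Sierra tool.

```python
VCCINT_TOLERANCE_MV = 50       # ± 50 mV from the table
MAX_IO_OVER_CORE_MV = 2000     # Note 1: VccIO - VccInt <= 2.0 V at power-up


def vccint_in_range(measured_mv: int, nominal_mv: int) -> bool:
    """True if the core supply is within ± 50 mV of its nominal value."""
    return abs(measured_mv - nominal_mv) <= VCCINT_TOLERANCE_MV


def power_up_step_ok(vccio_mv: int, vccint_mv: int) -> bool:
    """True if this instant of the power-up ramp respects Note 1."""
    return vccio_mv - vccint_mv <= MAX_IO_OVER_CORE_MV


if __name__ == "__main__":
    print(vccint_in_range(1700, 1650))     # upper edge of the 1.65 V window
    print(power_up_step_ok(3300, 0))       # I/O rail up before the core rail
    print(power_up_step_ok(3300, 1650))    # both rails at nominal
```

The second call shows why rail sequencing matters: bringing a 3.3 V I/O rail fully up while VccInt is still at 0 V violates the 2.0 V limit.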
8 DC Electrical Characteristics
(VccIO = 3.15V - 3.45V)
Parameter    Minimum         Maximum         Conditions
VOL                          0.2V            |IOUT| = 100 µA
VOH          VccIO - 0.2V
VOL                          0.4V            |IOUT| = 2 mA
VOH          2.4V
VIL          -0.3V           0.8V
VIH          2.0V            VccIO + 0.3V
IIN                          ±15 µA          VIN = 0
                             ±15 µA          VIN = VccIO
(VccIO = 2.3V - 2.7V)
Parameter    Minimum         Maximum         Conditions
VOL                          0.2V            |IOUT| = 100 µA
VOH          2.1V
VOL                          0.4V            |IOUT| = 1 mA
VOH          2.0V
VOL                          0.7V            |IOUT| = 2 mA
VOH          1.7V
VIL          -0.3V           0.7V
VIH          1.7V            VccIO + 0.3V
IIN                          ±15 µA          VIN = 0
                             ±15 µA          VIN = VccIO
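One way to use these DC levels is to compute the static noise margin when one RM7000A-style output drives an input with the same thresholds. The sketch below does this for the fixed-voltage rows, reading the tables as VOL/VOH pairs per load current (the pairing is an interpretation of the table layout). Values are in millivolts and the dictionary keys are illustrative labels.

```python
# (VOL max, VOH min) in mV per load condition, and (VIL max, VIH min) in mV.
LEVELS_MV = {
    "3.3V @ 2 mA":   {"vol": 400, "voh": 2400, "vil": 800, "vih": 2000},
    "2.5V @ 100 uA": {"vol": 200, "voh": 2100, "vil": 700, "vih": 1700},
    "2.5V @ 1 mA":   {"vol": 400, "voh": 2000, "vil": 700, "vih": 1700},
    "2.5V @ 2 mA":   {"vol": 700, "voh": 1700, "vil": 700, "vih": 1700},
}


def noise_margins_mv(levels: dict) -> tuple:
    """Return (low margin, high margin) in mV.

    Low margin  = VIL(max) - VOL(max)
    High margin = VOH(min) - VIH(min)
    """
    return levels["vil"] - levels["vol"], levels["voh"] - levels["vih"]


if __name__ == "__main__":
    for name, levels in LEVELS_MV.items():
        low, high = noise_margins_mv(levels)
        print(f"{name:14s} low margin {low:4d} mV, high margin {high:4d} mV")
```

Under this reading, a 2.5 V output loaded to the full 2 mA leaves no static margin against the input thresholds, which suggests keeping DC loads light in 2.5 V systems.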
9 Power Consumption
                                                                        CPU Speed
                                                               300 MHz        350 MHz        400 MHz
Parameter                 Conditions                           Max (Note 1)   Max (Note 1)   Max (Note 1)
VccInt Power   standby                                         255            300            370
(mWatts)       active     Maximum with no FPU operation        2350           2750           3200
                          (Note 2)
                          Maximum worst case instruction mix   2500           3000           4000
Notes
1. Worst case supply voltage (maximum VccInt) with worst case temperature (maximum TCase).
2. Dhrystone 2.1 instruction mix.
3. I/O supply power is application dependent, but typically <20% of VccInt.
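For supply sizing, the power figures can be converted into an approximate core current and a total power budget. The sketch below divides the VccInt power by the supply voltage and adds the "typically <20%" I/O allowance from Note 3, read here as 20% of the VccInt power. The 1.85 V figure used in the example is the 400 MHz VccInt nominal plus its 50 mV tolerance; treating the table maximum as occurring at that voltage follows Note 1, but the helper names are hypothetical.

```python
IO_POWER_FRACTION = 0.20   # Note 3: I/O power typically < 20% of VccInt power


def core_current_a(vccint_power_mw: float, vccint_v: float) -> float:
    """Approximate VccInt supply current in amperes (I = P / V)."""
    return (vccint_power_mw / 1000.0) / vccint_v


def total_power_budget_mw(vccint_power_mw: float) -> float:
    """VccInt power plus the typical upper allowance for I/O power."""
    return vccint_power_mw * (1.0 + IO_POWER_FRACTION)


if __name__ == "__main__":
    p_max_400 = 4000.0     # mW, worst case instruction mix at 400 MHz
    print(f"core current ~ {core_current_a(p_max_400, 1.85):.2f} A")
    print(f"total budget ~ {total_power_budget_mw(p_max_400):.0f} mW")
```

The result is an estimate for regulator and thermal planning only, since the I/O contribution is application dependent.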
10 AC Electrical Characteristics
10.1 Capacitive Load Deration
Parameter      Symbol    Min    Max    Units
Load Derate    CLD       -      2      ns/25pF
10.2 Clock Parameters
                                                                           CPU Speed
                                                              300 MHz        350 MHz        400 MHz
Parameter                   Symbol      Test Conditions       Min    Max     Min    Max     Min     Max     Units
SysClock High               tSCHigh     Transition ≤ 5ns      3      -       3      -       3       -       ns
SysClock Low                tSCLow      Transition ≤ 5ns      3      -       3      -       3       -       ns
SysClock Frequency                                            33.3   100     33.3   117     33.13   125     MHz
SysClock Period             tSCP                              10     30      8.5    30      8       30      ns
Clock Jitter for SysClock   tJitterIn                         -      ±150    -      ±150    -       ±150    ps
SysClock Rise Time          tSCRise                           -      2       -      2       -       2       ns
SysClock Fall Time          tSCFall                           -      2       -      2       -       2       ns
ModeClock Period            tModeCKP                          -      256     -      256     -       256     tSCP
JTAG Clock Period           tJTAGCKP                          4      -       4      -       4       -       tSCP
Note: Operation of the RM7000A is only guaranteed with the Phase Lock Loop Enabled.
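The load deration figure and the clock limits can be combined in a quick timing estimate. The sketch below converts a SysClock frequency to a period and derates an output delay by 2 ns per additional 25 pF of load. It assumes the deration is linear, that it is applied relative to a 50 pF reference load, and that no credit is taken for lighter loads; the base delay passed in is a placeholder, not a data sheet value.

```python
DERATE_NS_PER_STEP = 2.0     # Load Derate: 2 ns per 25 pF
DERATE_STEP_PF = 25.0
REFERENCE_LOAD_PF = 50.0     # assumption: timings quoted at a 50 pF load


def sysclock_period_ns(frequency_mhz: float) -> float:
    """SysClock period in ns for a frequency in MHz."""
    return 1000.0 / frequency_mhz


def derated_delay_ns(base_delay_ns: float, load_pf: float) -> float:
    """Output delay adjusted for capacitive load above the reference.

    Assumption: linear deration, no credit below the reference load.
    """
    extra_pf = max(0.0, load_pf - REFERENCE_LOAD_PF)
    return base_delay_ns + DERATE_NS_PER_STEP * extra_pf / DERATE_STEP_PF


if __name__ == "__main__":
    print(sysclock_period_ns(100.0), "ns period at 100 MHz")
    # 5.0 ns is a hypothetical base output delay used only for illustration.
    print(derated_delay_ns(5.0, 75.0), "ns at 75 pF")
```

A designer could compare the derated delay plus the receiver setup time against the SysClock period to check the bus timing budget.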
10.3 System Interface Parameters
Parameter (Note 1)          Symbol    Test Conditions
Data Output (Notes 2, 3)    tDO       mode14..13 = 10 (fastest) (Note 6)
                                      mode14..13 = 01 (slowest)
Data Setup (Note 4)         tDS       trise = see above table
Data Hold (Note 4)          tDH       tfall = see above table
Notes
1. Timings are measured from 0.425 x VccIO of clock to 0.425 x VccIO of signal for 3.3V I/O.
Timings are measured from 0.48 x VccIO of clock to 0.48 x VccIO of signal for 2.5V I/O.
2. Capacitive load for all maximum output timings is 50 pF. Minimum output timings are for
theoretical no load conditions (untested).
3. Data Output timing applies to all signal pins whether tristate I/O or output only.
4. Setup and Hold parameters apply to all signal pins whether tristate I/O or input only.
5. Only mode 14:13 = 10 is tested and guaranteed.
6. Data shown is for 3.3 V I/O. For 2.5 V I/O derate all times by 0.5 ns.
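Note 1 defines the measurement point as a fraction of VccIO, so the absolute threshold depends on the I/O supply. The sketch below computes it for both I/O voltage options; the nominal supply values come from the recommended operating conditions, and the function name is illustrative.

```python
# Fraction of VccIO at which timings are measured (Note 1).
THRESHOLD_FRACTION = {
    "3.3V": 0.425,
    "2.5V": 0.48,
}


def measurement_threshold_v(io_standard: str, vccio_v: float) -> float:
    """Voltage at which clock and signal edges are timed."""
    return THRESHOLD_FRACTION[io_standard] * vccio_v


if __name__ == "__main__":
    print(f"3.3 V I/O: {measurement_threshold_v('3.3V', 3.3):.4f} V")
    print(f"2.5 V I/O: {measurement_threshold_v('2.5V', 2.5):.4f} V")
```

Scope or logic-analyzer thresholds would need to be set near these values for bench measurements to be comparable with the data sheet timings.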