use. In any event, you cannot reproduce any part of this document, in any form, without the
express written consent of PMC-Sierra, Inc.
PMC-2002175 (R1)
Disclaimer
None of the information contained in this document constitutes an express or implied warranty by
PMC-Sierra, Inc. as to the sufficiency, fitness or suitability for a particular purpose of any such
information or the fitness, or suitability for a particular purpose, merchantability, performance,
compatibility with other parts or systems, of any of the products of PMC-Sierra, Inc., or any
portion thereof, referred to in this document. PMC-Sierra, Inc. expressly disclaims all
representations and warranties of any kind regarding the contents or use of the information,
including, but not limited to, express and implied warranties of accuracy, completeness,
merchantability, fitness for a particular use, or non-infringement.
In no event will PMC-Sierra, Inc. be liable for any direct, indirect, special, incidental or
consequential damages, including, but not limited to, lost profits, lost business or lost data
resulting from any use of or reliance upon the information, whether or not PMC-Sierra, Inc. has
been advised of the possibility of such damage.
Trademarks
RM7000 and Fast Packet Cache are trademarks of PMC-Sierra, Inc.
Contacting PMC-Sierra
PMC-Sierra, Inc.
105-8555 Baxter Place Burnaby, BC
Canada V5A 4V7
•I&D Test/Break-point (Watch) registers for emulation & debug
•Performance counter for system and software tuning & debug
•Fourteen fully prioritized vectored interrupts - 10 external, 2 internal, 2 software
•Fully static CMOS design with dynamic power down logic
•RM5271 pin compatible, 304 pin TBGA package, 31x31 mm
MAD/MADU) and three-
Proprietary and Confidential to PMC-Sierra, Inc and for its Customer’s Internal Use 9
Document ID: PMC-2002175, Issue 1
2 Block Diagram
Figure 1 Block Diagram
[Figure 1: block diagram. Labeled blocks: Primary Instruction Cache and Primary Data Cache (each
4-way set associative) with ITag, DTag, ITLB, and DTLB; on-chip 256K Byte Secondary Cache (4-way
set associative) with Secondary Tags Sets A through D; External Cache Controller; Store, Write,
Read, Prefetch, Pad, and Address Buffers; floating-point Load/Align, Register File,
Packer/Unpacker, Comparator, MultAdd/Add/Sub/Cvt/Div/Sqrt unit, and Floating-Point Control;
Integer Register File, F Pipe and M Pipe Adder, Shifter, and Logicals, Load Aligner, StAln/Sh,
Int Mult/Div/Madd, Multiplier Array, and Integer Control; F Pipe and M Pipe Registers;
Coprocessor 0 System/Memory Control and Joint TLB; Instruction Dispatch Unit; Program Counter,
PC Incrementer, and Branch PC Adder; PLL/Clocks. Labeled buses: A/D Bus, Pad Bus, D Bus, FA Bus,
F-Pipe Bus, M-Pipe Bus, IVA, DVA, ITLB Virtual, DTLB Virtual.]
RM7000™ Microprocessor with On-Chip Secondary Cache Datasheet
Released
3 Description
PMC-Sierra’s RM7000 is a highly integrated symmetric superscalar microprocessor capable of
issuing two instructions each processor cycle. It has two high-performance 64-bit integer units as
well as a high-throughput, fully pipelined 64-bit floating-point unit. To keep its multiple execution
units running efficiently, the RM7000 integrates not only 16 KB 4-way set associative instruction
and data caches but backs them up with an integrated 256 KB 4-way set associative secondary cache as
well. For maximum efficiency, the data and secondary caches are write-back and non-blocking. An
optional external tertiary cache provides high-performance capability even in applications having
very large data sets.
An RM5200 Family compatible, operating system friendly memory management unit with a 64/48-entry
fully associative TLB and a high-performance 64-bit system interface supporting multiple
outstanding reads with out-of-order return and hardware prioritized and vectored interrupts round
out the main features of the processor.
The RM7000 is ideally suited for high-end embedded control applications such as
internetworking, high-performance image manipulation, high-speed printing, and 3-D
visualization. The RM7000 is also applicable to the low end workstation market where its
balanced integer and floating-point performance and direct support for a large tertiary cache (up to
8 MB) provide outstanding price/performance.
4 Hardware Overview
The RM7000 offers a high level of integration targeted at high-performance embedded
applications. The key elements of the RM7000 are briefly described below.
4.1 CPU Registers
Like all MIPS ISA processors, the RM7000 CPU has a simple, clean user visible state consisting
of 32 general purpose registers (GPR), two special purpose registers for integer multiplication and
division, and a program counter; there are no condition code bits. Figure 2 shows the user visible
state.
Figure 2 CPU Registers
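[Figure 2: register diagram. General Purpose Registers: r0 (shown as 0) through r31, each
spanning bits 63 to 0. Multiply/Divide Registers: HI and LO, each spanning bits 63 to 0.
Program Counter: PC, spanning bits 63 to 0.]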
4.2 Superscalar Dispatch
The RM7000 has an efficient symmetric superscalar dispatch unit which allows it to issue up to
two instructions per cycle. For purposes of instruction issue, the RM7000 defines four classes of
instructions: integer, load/store, branches, and floating-point. There are two logical pipelines, the
function, or F, pipeline and the memory, or M, pipeline. Note however that the M pipe can execute
integer as well as memory type instructions.
Table 1 Instruction Issue Rules
F Pipe                              M Pipe
one of:                             one of:
integer, branch, floating-point,    integer, load/store
integer mul, div
Figure 3 is a simplification of the pipeline section and illustrates the basics of the instruction issue
mechanism.
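The issue rules in Table 1 can be sketched as a small lookup. This is a minimal illustration only: the dictionary layout, the class-name strings, and the function name `can_dual_issue` are assumptions made for the example, and the sketch ignores data dependencies and resource conflicts, which also affect issue on the real device.

```python
# Pipes able to execute each instruction class, as listed in Table 1.
# The string keys and this dictionary layout are illustrative assumptions.
PIPES_FOR_CLASS = {
    "integer": {"F", "M"},
    "load/store": {"M"},
    "branch": {"F"},
    "floating-point": {"F"},
    "integer mul/div": {"F"},
}


def can_dual_issue(first, second):
    """Return True if one instruction can use the F pipe while the other uses the M pipe.

    Only the pipe-assignment rule is modeled; dependencies between the
    two instructions are not checked.
    """
    a = PIPES_FOR_CLASS[first]
    b = PIPES_FOR_CLASS[second]
    return ("F" in a and "M" in b) or ("M" in a and "F" in b)
```

Under this model, two integer instructions can pair because either pipe accepts them, while two load/store instructions cannot, since both need the M pipe.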
Figure 3 Instruction Issue Paradigm
[Figure 3: block diagram. Labeled elements: Instruction Cache, Dispatch Unit, F Pipe IBus,
M Pipe IBus, FP F Pipe, FP M Pipe, Integer F Pipe, Integer M Pipe.]
The figure illustrates that one F pipe instruction and one M pipe instruction can be issued
concurrently but that two M pipe or two F pipe instructions cannot be issued. Table 2 specifies
more completely the instructions within each class.
Table 2 Dual Issue Instruction Classes
integer                load/store               floating-point        branch
add, sub, or, xor,     lw, sw, ld, sd, ldc1,    fadd, fsub, fmult,    beq, bne, bCzT,
shift, etc.            sdc1, mov, movc,         fmadd, fdiv, fcmp,    bCzF, j, etc.
                       fmov, etc.               fsqrt, etc.
The symmetric superscalar capability of the RM7000, in combination with its low latency integer
execution units and high-throughput fully pipelined floating-point execution unit, provides
unparalleled price/performance in computationally intensive embedded applications.
4.3 Pipeline
The logical length of both the F and M pipelines is five stages with state committing in the register
write, or W, pipe stage. The physical length of the floating-point execution pipeline is actually
seven stages but this is completely transparent to the user.
Figure 4 shows instruction execution within the RM7000 when instructions are issuing
simultaneously down both pipelines. As illustrated in the figure, up to ten instructions can be
executing simultaneously. This figure presents a somewhat simplistic view of the processor's
operation, however, since the out-of-order completion of loads, stores, and long latency
floating-point operations can result in there being even more instructions in process than what is shown.
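The figure of ten instructions follows from two instructions entering a five-stage pipeline every cycle. The sketch below counts instructions in flight under idealized assumptions: one pair issues every cycle, nothing stalls, and every instruction takes exactly five stages. The stage letters, `ISSUE_WIDTH`, and the function names are illustrative, not taken from the datasheet.

```python
STAGES = ["I", "R", "A", "D", "W"]  # five logical pipe stages
ISSUE_WIDTH = 2                     # one F pipe and one M pipe instruction per cycle


def in_flight(cycle, total_instructions):
    """Count instructions occupying a pipe stage during the given cycle (0-based).

    Assumes instruction n issues in cycle n // ISSUE_WIDTH and never stalls.
    """
    count = 0
    for n in range(total_instructions):
        start = n // ISSUE_WIDTH
        if start <= cycle < start + len(STAGES):
            count += 1
    return count


def max_in_flight(total_instructions):
    """Return the peak number of simultaneously executing instructions."""
    last_cycle = (total_instructions - 1) // ISSUE_WIDTH + len(STAGES) - 1
    return max(in_flight(c, total_instructions) for c in range(last_cycle + 1))
```

With a long enough instruction stream the peak settles at `ISSUE_WIDTH * len(STAGES)`, which is ten. The model does not capture out-of-order completion, so it cannot show the higher counts the text mentions.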
Figure 4 Pipeline
[Figure 4: pipeline timing diagram. Instructions I0 through I9 are shown, each passing through the
stage phases 1I 2I 1R 2R 1A 2A 1D 2D 1W 2W; a marker indicates the width of one cycle.]
1I-1R: Instruction cache access
2I:    Instruction virtual to physical address translation
2R:    Register file read, Bypass calculation, Instruction decode, Branch address calculation
1A:    Issue or slip decision, Branch decision
1A:    Data virtual address calculation
1A-2A: Integer add, logical, shift
2A:    Store Align
2A-2D: Data cache access and load align
1D:    Data virtual to physical address translation
2W:    Register file write
Note that instruction dependencies, resource conflicts, and branches result in some of the
instruction slots being occupied by NOPs.
4.4 Integer Unit
Like the RM5200 Family, the RM7000 implements the MIPS IV Instruction Set Architecture,
and is therefore fully upward compatible with applications that run on processors such as the
R4650 and R4700 that implement the earlier generation MIPS III Instruction Set Architecture.
Additionally, the RM7000 includes two implementation specific instructions not found in the
baseline MIPS IV ISA, but that are useful in the embedded marketplace. Described in detail in a
later section, these instructions are integer multiply-accumulate and three-operand integer
multiply.
The RM7000 integer unit includes thirty-two general purpose 64-bit registers, the HI/LO result
registers for the two-operand integer multiply/divide operations, and the program counter, or PC.
There are two separate execution units, one of which can execute function, or F, type instructions
and one which can execute memory, or M, type instructions. See above for a description of the
instruction types and the issue rules. As a special case, integer multiply/divide instructions as well
as their corresponding
MFHI and MFLO instructions can only be executed in the F type
execution unit. Within each execution unit the operational characteristics are the same as on
previous MIPS designs with single cycle ALU operations (add, sub, logical, shift), one cycle load
delay, and an autonomous multiply/divide unit.
Register File
The RM7000 has thirty-two general purpose registers with register location 0 (r0) hard wired to a
zero value. These registers are used for scalar integer operations and address calculation. In order
to service the two integer execution units, the register file has four read ports and two write ports
and is fully bypassed both within and between the two execution units to minimize operation
latency in the pipeline.
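The hard-wired zero register can be illustrated with a small model. This is a sketch under stated assumptions: the class name `IntegerRegisterFile` and its methods are hypothetical, and the port structure and bypass network described above are not modeled.

```python
MASK64 = (1 << 64) - 1  # registers are 64 bits wide


class IntegerRegisterFile:
    """Thirty-two 64-bit registers with r0 fixed at zero (illustrative model)."""

    def __init__(self):
        self._regs = [0] * 32

    def read(self, index):
        # r0 is never written, so it always reads as zero.
        return self._regs[index]

    def write(self, index, value):
        # Writes to r0 are silently discarded; other values are kept to 64 bits.
        if index != 0:
            self._regs[index] = value & MASK64
```

Because writes to r0 are dropped, an instruction can name r0 as a destination to discard a result, or as a source to supply a constant zero.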
4.5 ALU
The RM7000 has two complete integer ALUs each consisting of an integer adder/subtractor, a
logic unit, and a shifter. Table 3 shows the functions performed by the ALUs for each execution
unit. Each of these units is optimized to perform all operations in a single processor cycle.
Table 3 ALU Operations
Unit      F Pipe                            M Pipe
Adder     add, sub                          add, sub, data address add
Logic     logic, moves, zero shifts (nop)   logic, moves, zero shifts (nop)
Shifter   non zero shift                    non zero shift, store align
4.6 Integer Multiply/Divide
The RM7000 has a single dedicated integer multiply/divide unit optimized for high-speed multiply
and multiply-accumulate operations. The multiply/divide unit resides in the F type execution unit.
Table 4 shows the performance of the multiply/divide unit on each operation.
Table 4 Integer Multiply/Divide Operations
Opcode           Operand Size   Latency   Repeat Rate   Stall Cycles
MULT/U, MAD/U    16 bit         4         3             0
                 32 bit         5         4             0
MUL              16 bit         4         3             2
                 32 bit         5         4             3
DMULT, DMULTU    any            9         8             0
DIV, DIVD        any            36        36            0
DDIV, DDIVU      any            68        68            0
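As a rough way to use the Table 4 figures, the sketch below estimates the cycles needed for a run of back-to-back, independent operations of one type. It assumes the usual reading of the columns: latency is the cycles until the first result, and repeat rate is the cycles between successive starts. The stall-cycle column is deliberately left out, and the dictionary keys and function name are illustrative assumptions.

```python
# (latency, repeat rate) in processor cycles, as read from Table 4.
# Keys are (opcode group, operand size); the key format is an assumption.
MULDIV_TIMING = {
    ("MULT/U, MAD/U", "16 bit"): (4, 3),
    ("MULT/U, MAD/U", "32 bit"): (5, 4),
    ("MUL", "16 bit"): (4, 3),
    ("MUL", "32 bit"): (5, 4),
    ("DMULT, DMULTU", "any"): (9, 8),
    ("DIV, DIVD", "any"): (36, 36),
    ("DDIV, DDIVU", "any"): (68, 68),
}


def cycles_for_sequence(opcode, size, count):
    """Estimate cycles until the last of `count` back-to-back operations completes."""
    if count <= 0:
        return 0
    latency, repeat = MULDIV_TIMING[(opcode, size)]
    # The first result appears after `latency`; each later one adds `repeat`.
    return latency + (count - 1) * repeat
```

For example, eight 16-bit MAD operations come to 4 + 7 × 3 = 25 cycles in this model, which is an estimate rather than a guaranteed figure for the device.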
The baseline MIPS IV ISA specifies that the results of a multiply or divide operation be placed in
the Hi and Lo registers. These values can then be transferred to the general purpose register file
using the Move-from-Hi and Move-from-Lo (MFHI/MFLO) instructions.
In addition to the baseline MIPS IV integer multiply instructions, the RM7000 also implements the
3-operand multiply instruction, MUL. This instruction specifies that the multiply result go directly
to the integer register file rather than the Lo register. The portion of the multiply that would have
normally gone into the Hi register is discarded. For applications where it is known that the upper
half of the multiply result is not required, using the MUL instruction eliminates the necessity of
executing an explicit MFLO instruction.
Also included in the RM7000 are the multiply-add instructions MAD/MADU. These instructions
multiply two operands and add the resulting product to the current contents of the Hi and Lo
registers. The multiply-accumulate operation is the core primitive of almost all signal processing
algorithms, allowing the RM7000 to eliminate the need for a separate DSP engine in many
embedded applications.
By pipelining the multiply-accumulate function and dynamically determining the size of the input
operands, the RM7000 is able to maximize throughput while still using an area efficient
implementation.
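The behavior of MUL and MAD described above can be sketched in software. The register-level details here are assumptions rather than statements from this section: operands are treated as signed 32-bit values, and Hi and Lo are treated as the upper and lower 32-bit halves of a 64-bit accumulator. The function names are illustrative.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mul(rs, rt):
    """Three-operand MUL: keep the low half of the product, discard the upper half."""
    return to_signed(to_signed(rs, 32) * to_signed(rt, 32), 32)


def mad(hi, lo, rs, rt):
    """MAD: add rs * rt to the accumulator held in Hi and Lo; return the new (hi, lo)."""
    acc = ((hi & MASK32) << 32) | (lo & MASK32)
    acc = (acc + to_signed(rs, 32) * to_signed(rt, 32)) & MASK64
    return to_signed(acc >> 32, 32), to_signed(acc, 32)


def dot(xs, ys):
    """Dot product built from repeated MAD operations, as in a signal processing loop."""
    hi, lo = 0, 0
    for x, y in zip(xs, ys):
        hi, lo = mad(hi, lo, x, y)
    return (hi << 32) | (lo & MASK32)
```

A loop such as `dot` is the pattern the text has in mind: one multiply-accumulate per element, with the running sum staying in Hi and Lo until the end.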
4.7 Floating-Point Coprocessor
The RM7000 incorporates a high-performance fully pipelined floating-point coprocessor which
includes a floating-point register file and autonomous execution units for multiply/add/convert and
divide/square root. The floating-point coprocessor is a tightly coupled co-execution unit, decoding
and executing instructions in parallel with, and in the case of floating-point loads and stores, in
cooperation with the M pipe of the integer unit. As described earlier, the superscalar capabilities of
the RM7000 allow floating-point computation instructions to issue concurrently with integer
instructions.
4.8 Floating-Point Unit
The RM7000 floating-point execution unit supports single and double precision arithmetic, as
specified in the IEEE Standard 754. The execution unit is broken into a separate divide/square root
unit and a pipelined multiply/add unit. Overlap of divide/square root and multiply/add is
supported.
The RM7000 maintains fully precise floating-point exceptions while allowing both overlapped
and pipelined operations. Precise exceptions are extremely important in object-oriented
programming environments and highly desirable for debugging in any environment.
The floating-point unit’s operation set includes floating-point add, subtract, multiply,
multiply-add, divide, square root, reciprocal, reciprocal square root, conditional moves, conversion
between fixed-point and floating-point format, conversion between floating-point formats, and
floating-point compare. Table 5 gives the latencies of the floating-point instructions in internal processor
cycles.
4.9 Floating-Point General Register File
The floating-point general register file, FGR, is made up of thirty-two 64-bit registers. With the
floating-point load and store double instructions, LDC1 and SDC1, the floating-point unit can
take advantage of the 64-bit wide data cache and issue a floating-point coprocessor load or store
doubleword instruction in every cycle.
The floating-point control register file contains two registers; one for determining configuration
and revision information for the coprocessor and one for control and status information. These
registers are primarily used for diagnostic software, exception handling, state saving and restoring,
and control of rounding modes.
To support superscalar operations, the FGR has four read ports and two write ports, and is fully
bypassed to minimize operation latency in the pipeline. Three of the read ports and one write port
are used to support the combined multiply-add instruction while the fourth read and second write
port allows a concurrent floating-point load or store and conditional moves.
4.10 System Control Coprocessor (CP0)
The system control coprocessor (CP0) in the MIPS architecture is responsible for the virtual
memory sub-system, the exception control system, and the diagnostics capability of the processor.
In the MIPS architecture, the system control coprocessor (and thus the kernel software) is
implementation dependent. For memory management, the RM7000 CP0 is logically identical to
that of the RM5200 Family and R5000. For interrupt exceptions and diagnostics, the RM7000 is a
superset of the RM5200 Family and R5000, implementing additional features described later in the
sections on Interrupts, the Test/Breakpoint facility, and the Performance Counter facility.
The memory management unit controls the virtual memory system page mapping. It consists of an
instruction address translation buffer (ITLB), a data address translation buffer (DTLB), a Joint
TLB (JTLB), and coprocessor registers used by the virtual memory mapping sub-system.