any event, you cannot reproduce any part of this document, in any form, without the express written
consent of PMC-Sierra, Inc.
PMC-2002227 (R2)
Disclaimer
None of the information contained in this document constitutes an express or implied warranty by PMC-Sierra, Inc. as to the sufficiency, fitness or suitability for a particular purpose of any such information or the
fitness, or suitability for a particular purpose, merchantability, performance, compatibility with other parts
or systems, of any of the products of PMC-Sierra, Inc., or any portion thereof, referred to in this document.
PMC-Sierra, Inc. expressly disclaims all representations and warranties of any kind regarding the contents
or use of the information, including, but not limited to, express and implied warranties of accuracy,
completeness, merchantability, fitness for a particular use, or non-infringement.
In no event will PMC-Sierra, Inc. be liable for any direct, indirect, special, incidental or consequential
damages, including, but not limited to, lost profits, lost business or lost data resulting from any use of or
reliance upon the information, whether or not PMC-Sierra, Inc. has been advised of the possibility of such
damage.
Trademarks
RM7000A and Fast Packet Cache are trademarks of PMC-Sierra, Inc.
Patents
The technology discussed is protected by one or more of the following Patents.
U.S. Patent Numbers
5,953,748
Relevant patent applications and other patents may also exist.
Contacting PMC-Sierra
PMC-Sierra, Inc.
8555 Baxter Place Burnaby, BC
Canada V5A 4V7
•Fully static CMOS design with dynamic power down logic
•RM5271 pin compatible, 304 pin TBGA package, 31x31 mm
Proprietary and Confidential to PMC-Sierra, Inc and for its Customer’s Internal Use 9
Document ID: PMC-2002227, Issue 2
RM7000A™ Microprocessor with On-Chip Secondary Cache Data Sheet
Released
2 Block Diagram
Figure 1 Block Diagram
[Figure: block diagram of the RM7000A. Blocks shown: Primary Instruction Cache (4-way set associative) with ITag and ITLB; Primary Data Cache (4-way set associative) with DTag and DTLB; on-chip 256K byte Secondary Cache (4-way set associative) with Secondary Tags for Sets A, B, C, and D; External Cache Controller; Store Buffer, Write Buffer, Read Buffer, Pad Buffer, Address Buffer, and Prefetch Buffer; Joint TLB; Coprocessor 0; System/Memory Control; PLL/Clocks; Instruction Dispatch Unit; PC Incrementer, Branch PC Adder, and Program Counter; F Pipe Register and M Pipe Register; Integer Register File; Integer Control; F pipe Adder, Shifter, Logicals, and Int Mult, Div, Madd; M pipe Adder, StAln/Sh, Logicals, and Load Aligner; Floating-Point Control, Floating-Point Register File, Floating-Point Load/Align, Packer/Unpacker, Comparator, Multiplier Array, and Floating-Point MultAdd, Add, Sub, Cvt, Div, Sqrt. Buses and signals shown: A/D Bus, Pad Bus, D Bus, FA Bus, F-Pipe Bus, M-Pipe Bus, IVA, DVA, ITLB Virtual, DTLB Virtual.]
3 Description
PMC-Sierra’s RM7000A is a highly integrated symmetric superscalar microprocessor capable of
issuing two instructions each processor cycle. It has two high-performance 64-bit integer units as
well as a high-throughput, fully pipelined 64-bit floating point unit.
The RM7000A integrates 16 KB 4-way set associative instruction and data caches along with an
integrated 256 KB 4-way set associative secondary cache. The primary data and secondary caches are
write-back and non-blocking. An optional external tertiary cache provides high-performance
capability even in applications with very large data sets.
The memory management unit contains a 64/48-entry fully associative TLB. The 64-bit system
interface supports multiple outstanding reads with out-of-order return, and interrupts are hardware
prioritized and vectored.
The RM7000A ideally suits high-end embedded control applications such as internetworking,
high-performance image manipulation, high-speed printing, and 3-D visualization. The RM7000A
is also applicable to the low-end workstation market, where its balanced integer and floating-point
performance and direct support for a large tertiary cache (up to 8 MB) provide outstanding
price/performance.
4 Hardware Overview
The RM7000A offers a high level of integration targeted at high-performance embedded
applications. The key elements of the RM7000A are described throughout this section.
4.1 CPU Registers
The RM7000A CPU contains 32 general purpose registers (GPR), two special purpose registers
for integer multiplication and division, and a program counter; there are no condition code bits.
Figure 2 shows the user visible state.
Figure 2 CPU Registers
[Figure: three register groups, each 64 bits wide (bit 63 down to bit 0). General Purpose Registers: 0, r1, r2, ..., r29, r30, r31. Multiply/Divide Registers: HI, LO. Program Counter: PC.]
4.2 Superscalar Dispatch
The RM7000A incorporates a superscalar dispatch unit that allows it to issue up to two
instructions per cycle. For purposes of instruction issue, the RM7000A defines four classes of
instructions: integer, load/store, branches, and floating-point. There are two logical pipelines, the
function, or F, pipeline and the memory, or M, pipeline. Note however that the M pipe can execute
integer as well as memory type instructions.
Table 1 Instruction Issue Rules
F Pipe                              M Pipe
one of:                             one of:
integer, branch, floating-point,    integer, load/store
integer mul, div
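The issue rules in Table 1 can be sketched as a small pairing check. This is an illustrative model only, not a description of the dispatch logic: the class names are taken from Table 1, while `can_dual_issue` and the two set names are hypothetical, and the check ignores data dependencies and resource conflicts, which can also prevent dual issue.

```python
# Instruction classes each pipe accepts, copied from Table 1.
F_PIPE = {"integer", "branch", "floating-point", "integer mul/div"}
M_PIPE = {"integer", "load/store"}


def can_dual_issue(first, second):
    """Return True if one instruction can go to the F pipe while the
    other goes to the M pipe. Hypothetical helper for illustration;
    dependencies and resource conflicts are not modeled."""
    return (first in F_PIPE and second in M_PIPE) or (
        first in M_PIPE and second in F_PIPE
    )
```

Under this model two integer instructions can pair, since both pipes accept the integer class, while two load/store instructions or a branch plus a floating-point instruction cannot, since each pair competes for the same pipe.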
Figure 3 is a simplification of the pipeline section and illustrates the basics of the instruction issue
mechanism.
Figure 3 Instruction Issue Paradigm
[Figure: the Instruction Cache feeds the Dispatch Unit, which drives the F Pipe IBus and the M Pipe IBus. The buses feed four units: FP F Pipe, FP M Pipe, Integer F Pipe, and Integer M Pipe.]
The figure illustrates that one F pipe instruction and one M pipe instruction can be issued
concurrently but that two M pipe or two F pipe instructions cannot be issued. Table 2 specifies
more completely the instructions within each class.
Table 2 Dual Issue Instruction Classes
integer            load/store         floating-point     branch
add, sub, or,      lw, sw, ld, sd,    fadd, fsub,        beq, bne,
xor, shift, etc.   ldc1, sdc1,        fmult, fmadd,      bCzT, bCzF, j,
                   mov, movc,         fdiv, fcmp,        etc.
                   fmov, etc.         fsqrt, etc.
4.3 Pipeline
The logical length of both the F and M pipelines is five stages with state committing in the register
write, or W, pipe stage. The physical length of the floating-point execution pipeline is actually
seven stages but this is completely transparent to the user.
Figure 4 shows instruction execution within the RM7000A when instructions are issuing
simultaneously down both pipelines. As illustrated in the figure, up to ten instructions can be
executing simultaneously. This figure presents a somewhat simplistic view of the processor's
operation since the out-of-order completion of loads, stores, and long latency floating-point
operations can result in there being even more instructions in process than what is shown.
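The figure of ten instructions follows from five logical pipe stages times two pipelines. A minimal sketch of that arithmetic, assuming a full two-instruction issue group enters every cycle and nothing stalls or slips (the function name and parameters are hypothetical):

```python
def in_flight(cycle, stages=5, pipes=2):
    """Instructions in the logical pipeline at a given cycle (0-based).

    Assumes one full issue group enters per cycle and no stalls, which
    is an idealization; out-of-order completion is not modeled.
    """
    groups = min(cycle + 1, stages)  # issue groups still in the pipeline
    return groups * pipes
```

The count ramps up by two per cycle and saturates at `stages * pipes` once the first group reaches the W stage.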
Figure 4 Pipeline
[Figure: pipeline timing diagram. Instructions I0 through I9 are shown in flight, each passing through the phases 1I 2I 1R 2R 1A 2A 1D 2D 1W 2W. A marker labeled "one cycle" indicates the width of one processor cycle.]
1I-1R:  Instruction cache access
2I:     Instruction virtual to physical address translation
2R:     Register file read, Bypass calculation, Instruction decode, Branch address calculation
1A:     Issue or slip decision, Branch decision
1A:     Data virtual address calculation
1A-2A:  Integer add, logical, shift
2A:     Store Align
2A-2D:  Data cache access and load align
1D:     Data virtual to physical address translation
2W:     Register file write
Note that instruction dependencies, resource conflicts, and branches may result in some of the
instruction slots being occupied by NOPs.
4.4 Integer Unit
The RM7000A implements the MIPS IV Instruction Set Architecture. Additionally, the RM7000A
includes two implementation specific instructions not found in the baseline MIPS IV ISA, but that
are useful in the embedded market place. These instructions are integer multiply-accumulate
(MAD) and three-operand integer multiply (MUL).
The RM7000A integer unit includes thirty-two general purpose 64-bit registers, the HI/LO result
registers for two-operand integer multiply/divide operations, and the program counter, or PC.
There are two separate execution units, one of which can execute function (F) type instructions
and one which can execute memory (M) type instructions. Refer to Table 1 for the instruction issue
rules.
Note that integer multiply/divide instructions, as well as their corresponding MFHI and MFLO
instructions, can only be executed in the F type execution unit. Within each execution unit the
operational characteristics are the same as on previous MIPS designs with single cycle ALU
operations (add, sub, logical, shift), one cycle load delay, and an autonomous multiply/divide unit.
Register File
The RM7000A has thirty-two general purpose registers with register location 0 (r0) hard wired to
a zero value. These registers are used for scalar integer operations and address calculation. In order
to service the two integer execution units, the register file has four read ports and two write ports
and is fully bypassed both within and between the two execution units to minimize operation
latency in the pipeline.
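The hard-wired zero register can be illustrated with a short behavioral model. This is a sketch only: the `RegisterFile` class and its method names are hypothetical, and the model says nothing about the read/write ports or bypass network described above.

```python
MASK64 = (1 << 64) - 1  # registers are 64 bits wide


class RegisterFile:
    """Behavioral model of 32 general purpose registers with r0 fixed at 0."""

    def __init__(self):
        self._regs = [0] * 32

    def read(self, index):
        return self._regs[index]

    def write(self, index, value):
        # Writes to r0 are silently dropped, so r0 always reads as zero.
        if index != 0:
            self._regs[index] = value & MASK64
```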
4.5 ALU
The RM7000A has two complete integer ALUs each consisting of an integer adder/subtractor, a
logic unit, and a shifter. Table 3 shows the functions performed by the ALUs for each execution
unit. Each of these units is optimized to perform all operations in a single processor cycle.
Table 3 ALU Operations
Unit      F Pipe                      M Pipe
Adder     add, sub                    add, sub, data address add
Logic     logic, moves, zero shifts   logic, moves, zero shifts
          (nop)                       (nop)
Shifter   non zero shift              non zero shift, store align
4.6 Integer Multiply/Divide
The RM7000A has a single dedicated integer multiply/divide unit optimized for high-speed
multiply and multiply-accumulate operations. The multiply/divide unit resides in the F type
execution unit. Table 4 shows the performance of the multiply/divide unit on each operation.
Table 4 Integer Multiply/Divide Operations
Opcode            Operand Size   Latency   Repeat Rate   Stall Cycles
MULT/U, MAD/U     16 bit         4         3             0
                  32 bit         5         4             0
MUL               16 bit         4         3             2
                  32 bit         5         4             3
DMULT, DMULTU     any            9         8             0
DIV, DIVU         any            36        36            0
DDIV, DDIVU       any            68        68            0
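As a rough way to read Table 4, the sketch below estimates the cycles needed for a run of back-to-back operations of one kind. It rests on assumptions the table does not spell out: that the operations are independent, that the repeat rate is the minimum spacing between successive issues, and that the stall cycles column can be ignored. The dictionary keys and the function name are hypothetical.

```python
# (latency, repeat rate) in processor cycles, copied from Table 4.
TIMING = {
    ("MULT", "16 bit"): (4, 3),
    ("MULT", "32 bit"): (5, 4),
    ("MUL", "16 bit"): (4, 3),
    ("MUL", "32 bit"): (5, 4),
    ("DMULT", "any"): (9, 8),
    ("DIV", "any"): (36, 36),
    ("DDIV", "any"): (68, 68),
}


def cycles_for_sequence(opcode, size, count):
    """Estimate cycles until the last of `count` independent operations
    completes: the first result arrives after `latency` cycles and each
    later one `repeat` cycles after its predecessor (an assumed model)."""
    if count <= 0:
        return 0
    latency, repeat = TIMING[(opcode, size)]
    return latency + (count - 1) * repeat
```

For example, under this model four independent 32-bit MULT operations take 5 + 3 × 4 = 17 cycles.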
The baseline MIPS IV ISA specifies that the results of a multiply or divide operation be placed in
the Hi and Lo registers. These values can then be transferred to the general purpose register file
using the Move-from-Hi and Move-from-Lo (MFHI/MFLO) instructions.
In addition to the baseline MIPS IV integer multiply instructions, the RM7000A also implements
the 3-operand multiply instruction, MUL. This instruction specifies that the multiply result go
directly to the integer register file rather than the Lo register. The portion of the multiply that
would have normally gone into the Hi register is discarded. For applications where it is known that
the upper half of the multiply result is not required, using the MUL instruction eliminates the
necessity of executing an explicit MFLO instruction.
The multiply-add instructions, MAD and MADU, multiply two operands and add the resulting
product to the current contents of the Hi and Lo registers. The multiply-accumulate operation is
the core primitive of almost all signal processing algorithms. Therefore, using the RM7000A
eliminates the need for a separate DSP engine in many embedded applications.
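The difference between MULT followed by MFLO, the three-operand MUL, and MAD can be sketched with a behavioral model. This is an illustration under stated assumptions, not the hardware definition: operands are treated as signed 32-bit values, Hi and Lo are modeled as 32-bit halves of a 64-bit accumulator, and MUL is assumed to leave Hi and Lo untouched, which this document does not state. The class and method names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(value):
    """Interpret the low 32 bits of value as a signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


class MulDivUnit:
    """Behavioral sketch of the Hi/Lo registers for 32-bit signed operands."""

    def __init__(self):
        self.hi = 0
        self.lo = 0

    def mult(self, rs, rt):
        # MULT: the full 64-bit product is split across Hi and Lo.
        product = to_signed32(rs) * to_signed32(rt)
        self.hi = (product >> 32) & MASK32
        self.lo = product & MASK32

    def mflo(self):
        # MFLO: move Lo to a general purpose register.
        return self.lo

    def mul(self, rs, rt):
        # MUL: low half goes straight to the destination register and
        # the upper half is discarded (Hi/Lo left untouched: assumption).
        return (to_signed32(rs) * to_signed32(rt)) & MASK32

    def mad(self, rs, rt):
        # MAD: add the product to the current contents of Hi and Lo.
        acc = (self.hi << 32) | self.lo
        acc = (acc + to_signed32(rs) * to_signed32(rt)) & MASK64
        self.hi = acc >> 32
        self.lo = acc & MASK32
```

A dot product, the typical signal processing use, clears Hi and Lo once and then issues one `mad` per pair of operands, reading the sum from Lo (and Hi, if the sum exceeds 32 bits) at the end.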
4.7 Floating-Point Coprocessor
The RM7000A incorporates a high-performance fully pipelined floating-point coprocessor which
includes a floating-point register file and autonomous execution units for multiply/add/convert and
divide/square root. The floating-point coprocessor is a tightly coupled execution unit, decoding
and executing instructions in parallel with, and in the case of floating-point loads and stores, in
cooperation with the M pipe of the integer unit. The superscalar capabilities of the RM7000A
allow floating-point computation instructions to issue concurrently with integer instructions.
4.8 Floating-Point Unit
The RM7000A floating-point execution unit supports single and double precision arithmetic, as
specified in the IEEE Standard 754. The execution unit is broken into a separate divide/square root
unit and a pipelined multiply/add unit. Overlap of divide/square root and multiply/add is
supported.
The RM7000A maintains fully precise floating-point exceptions while allowing both overlapped
and pipelined operations. Precise exceptions are extremely important in object-oriented
programming environments and highly desirable for debugging in any environment.
Floating-point operations include:
•add
•subtract
•multiply
•divide
•square root
•reciprocal
•reciprocal square root
•conditional moves
•conversion between fixed-point and floating-point format
•conversion between floating-point formats
•floating-point compare
Table 5 gives the latencies of the floating-point instructions in internal processor cycles.