use. In any event, you cannot reproduce any part of this document, in any form, without the
express written consent of PMC-Sierra, Inc.
PMC-2010145 (P1)
Disclaimer
None of the information contained in this document constitutes an express or implied warranty by
PMC-Sierra, Inc. as to the sufficiency, fitness or suitability for a particular purpose of any such
information or the fitness, or suitability for a particular purpose, merchantability, performance,
compatibility with other parts or systems, of any of the products of PMC-Sierra, Inc., or any
portion thereof, referred to in this document. PMC-Sierra, Inc. expressly disclaims all
representations and warranties of any kind regarding the contents or use of the information,
including, but not limited to, express and implied warranties of accuracy, completeness,
merchantability, fitness for a particular use, or non-infringement.
In no event will PMC-Sierra, Inc. be liable for any direct, indirect, special, incidental or
consequential damages, including, but not limited to, lost profits, lost business or lost data
resulting from any use of or reliance upon the information, whether or not PMC-Sierra, Inc. has
been advised of the possibility of such damage.
Trademarks
RM7000A and Fast Packet Cache are trademarks of PMC-Sierra, Inc.
Patents
The technology discussed is protected by one or more of the following Patents:
U.S. Patent Numbers 5,953,748; 5,606,683; 5,760,620.
Relevant patent applications and other patents may also exist.
Contacting PMC-Sierra
PMC-Sierra, Inc.
105-8555 Baxter Place Burnaby, BC
Canada V5A 4V7
•Fully static CMOS design with dynamic power down logic
Proprietary and Confidential to PMC-Sierra, Inc and for its Customer’s Internal Use 9
Document ID: PMC-2010145, Issue 2
2 Block Diagram
Figure 1 Block Diagram
RM7065A™ Microprocessor with On-Chip Secondary Cache Data Sheet
Preliminary
[Figure: blocks shown include the primary instruction cache and primary data cache (each 4-way set
associative) with ITag/DTag, the 256KB 4-way set associative secondary cache with secondary tags
(Set A, Set B), ITLB, DTLB and joint TLB, the instruction dispatch unit, prefetch buffer, integer
register file, F pipe and M pipe (adder, shifter, logicals, load aligner), integer multiply/divide
(Int Mult, Div, Madd), the floating-point unit (register file, load/align, packer/unpacker,
comparator, multiplier array, MultAdd/Add/Sub/Cvt/Div/Sqrt), store, write and read buffers,
Coprocessor 0, system/memory control, pad and address buffers, and PLL/clocks.]
3 Description
PMC-Sierra’s RM7065A is a highly integrated symmetric superscalar microprocessor capable of
issuing two instructions each processor cycle. It has two high-performance 64-bit integer units as
well as a high-throughput, fully pipelined 64-bit floating point unit.
The RM7065A integrates 16 KB 4-way set associative instruction and data caches along with an
integrated 256 KB 4-way set associative secondary cache. The primary data and secondary caches are
write-back and non-blocking.
The memory management unit contains a 64/48-entry fully associative TLB. The RM7065A also
provides a 64-bit system interface supporting multiple outstanding reads with out-of-order return,
along with hardware prioritized and vectored interrupts.
The RM7065A is ideally suited to high-end embedded control applications such as internetworking,
high-performance image manipulation, high-speed printing, and 3-D visualization. The RM7065A
is also applicable to the low-end workstation market, where its balanced integer and floating-point
performance provides outstanding price/performance.
4 Hardware Overview
The RM7065A offers a high level of integration targeted at high-performance embedded
applications. The key elements of the RM7065A are described throughout this section.
4.1 CPU Registers
The RM7065A CPU contains 32 general purpose registers (GPR), two special purpose registers
for integer multiplication and division, and a program counter; there are no condition code bits.
Figure 2 shows the user visible state.
Figure 2 CPU Registers
[Figure: the general purpose registers r0 through r31 (r0 shown as 0), the multiply/divide
registers HI and LO, and the program counter PC, each drawn 64 bits wide (bits 63 to 0).]
4.2 Superscalar Dispatch
The RM7065A incorporates a superscalar dispatch unit that allows it to issue up to two
instructions per cycle. For purposes of instruction issue, the RM7065A defines four classes of
instructions: integer, load/store, branches, and floating-point. There are two logical pipelines, the
function, or F, pipeline and the memory, or M, pipeline. Note, however, that the M pipe can execute
integer as well as memory type instructions.
Table 1 Instruction Issue Rules
F Pipe                             M Pipe
one of:                            one of:
integer, branch, floating-point,   integer, load/store
integer mul, div
Figure 3 is a simplification of the pipeline section and illustrates the basics of the instruction issue
mechanism.
Figure 3 Instruction Issue Paradigm
[Figure: blocks shown are the instruction cache, the dispatch unit, the F pipe IBus, the M pipe
IBus, and the FP and integer F and M pipes.]
The figure illustrates that one F pipe instruction and one M pipe instruction can be issued
concurrently but that two M pipe or two F pipe instructions cannot be issued. Table 2 specifies
more completely the instructions within each class.
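The issue rules in Table 1 can be sketched as a small check. This is only an illustrative model, not PMC-Sierra code: the class-name strings and the function `can_dual_issue` are hypothetical, and the sketch looks only at pipe assignment, ignoring data dependencies and other resource conflicts.

```python
# Hypothetical model of Table 1: which instruction classes each pipe accepts.
# The string names are illustrative labels, not identifiers from the data sheet.
F_PIPE = {"integer", "branch", "floating-point", "integer mul/div"}
M_PIPE = {"integer", "load/store"}


def can_dual_issue(first: str, second: str) -> bool:
    """Return True if one instruction can go to the F pipe and the other to the M pipe."""
    for cls in (first, second):
        if cls not in F_PIPE | M_PIPE:
            raise ValueError(f"unknown instruction class: {cls}")
    # One instruction must fit the F pipe while the other fits the M pipe.
    return (first in F_PIPE and second in M_PIPE) or (
        first in M_PIPE and second in F_PIPE
    )
```

Under this model two integer instructions can issue together because the M pipe also accepts integer instructions, while two load/store instructions or two branches cannot.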
Table 2 Dual Issue Instruction Classes
integer            load/store         floating-point     branch
add, sub, or,      lw, sw, ld, sd,    fadd, fsub,        beq, bne,
xor, shift, etc.   ldc1, sdc1,        fmult, fmadd,      bCzT, bCzF, j,
                   mov, movc,         fdiv, fcmp,        etc.
                   fmov, etc.         fsqrt, etc.
4.3 Pipeline
The logical length of both the F and M pipelines is five stages with state committing in the register
write, or W, pipe stage. The physical length of the floating-point execution pipeline is actually
seven stages but this is completely transparent to the user.
Figure 4 shows instruction execution within the RM7065A when instructions are issuing
simultaneously down both pipelines. As illustrated in the figure, up to ten instructions can be
executing simultaneously. This figure presents a somewhat simplistic view of the processor's
operation since the out-of-order completion of loads, stores, and long latency floating-point
operations can result in even more instructions being in process than are shown.
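The figure of ten instructions follows from two pipelines of five logical stages each. A minimal sketch of that arithmetic, assuming ideal dual issue every cycle with no slips or NOPs; the stage letters come from the phase labels in Figure 4, while `stage_of` and `in_flight` are hypothetical helper names:

```python
STAGES = ["I", "R", "A", "D", "W"]  # five logical stages, W is register write
PIPES = 2                           # F pipe and M pipe


def stage_of(instr: int, cycle: int):
    """Stage occupied by instruction number `instr` at `cycle`, or None.

    Assumes instructions 2k and 2k+1 issue together in cycle k (ideal case).
    """
    idx = cycle - instr // PIPES
    return STAGES[idx] if 0 <= idx < len(STAGES) else None


def in_flight(cycle: int, total_instrs: int) -> int:
    """Count the instructions that occupy some pipeline stage at `cycle`."""
    return sum(stage_of(i, cycle) is not None for i in range(total_instrs))
```

With a long enough instruction stream, `in_flight` starts at 2 in cycle 0 and settles at 10 from cycle 4 onward, which matches the count quoted above for the simplified view.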
Figure 4 Pipeline
[Figure: instructions I0 through I9 issuing down both pipelines, each row showing the phases
1I 2I 1R 2R 1A 2A 1D 2D 1W 2W, with the span of one cycle marked.]
1I-1R:   Instruction cache access
2I:      Instruction virtual to physical address translation
2R:      Register file read, Bypass calculation, Instruction decode, Branch address calculation
1A:      Issue or slip decision, Branch decision
1A:      Data virtual address calculation
1A-2A:   Integer add, logical, shift
2A:      Store Align
2A-2D:   Data cache access and load align
1D:      Data virtual to physical address translation
2W:      Register file write
Note that instruction dependencies, resource conflicts, and branches may result in some of the
instruction slots being occupied by NOPs.
4.4 Integer Unit
The RM7065A implements the MIPS IV Instruction Set Architecture. Additionally, the RM7065A
includes two implementation specific instructions not found in the baseline MIPS IV ISA, but that
are useful in the embedded marketplace. These instructions are integer multiply-accumulate
(MAD) and three-operand integer multiply (MUL).
The RM7065A integer unit includes thirty-two general purpose 64-bit registers, the HI/LO result
registers for two-operand integer multiply/divide operations, and the program counter, or PC.
There are two separate execution units, one of which can execute function (F) type instructions
and one of which can execute memory (M) type instructions. Refer to Table 1 for the instruction
issue rules.
Note that integer multiply/divide instructions, as well as their corresponding MFHI and MFLO
instructions, can only be executed in the F type execution unit. Within each execution unit the
operational characteristics are the same as on previous MIPS designs with single cycle ALU
operations (add, sub, logical, shift), one cycle load delay, and an autonomous multiply/divide unit.
Register File
The RM7065A has thirty-two general purpose registers with register location 0 (r0) hardwired to
a zero value. These registers are used for scalar integer operations and address calculation. In order
to service the two integer execution units, the register file has four read ports and two write ports
and is fully bypassed both within and between the two execution units to minimize operation
latency in the pipeline.
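The behavior of r0 can be illustrated with a small software model. This is a sketch for illustration only; the class `RegisterFile` and its methods are hypothetical names, and the model does not represent the read/write ports or the bypass network.

```python
MASK64 = (1 << 64) - 1  # the general purpose registers are 64 bits wide


class RegisterFile:
    """Toy model of thirty-two 64-bit registers with r0 fixed at zero."""

    def __init__(self):
        self._regs = [0] * 32

    def read(self, n: int) -> int:
        return self._regs[n]

    def write(self, n: int, value: int) -> None:
        # Writes to r0 are dropped, so r0 always reads back as zero.
        if n != 0:
            self._regs[n] = value & MASK64
```

For example, after `rf.write(0, 5)` a call to `rf.read(0)` still returns 0, while writes to any other register are stored truncated to 64 bits.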
4.5 ALU
The RM7065A has two complete integer ALUs each consisting of an integer adder/subtractor, a
logic unit, and a shifter. Table 3 shows the functions performed by the ALUs for each execution
unit. Each of these units is optimized to perform all operations in a single processor cycle.
Table 3 ALU Operations
Unit      F Pipe                      M Pipe
Adder     add, sub                    add, sub, data address add
Logic     logic, moves, zero shifts   logic, moves, zero shifts
          (nop)                       (nop)
Shifter   non zero shift              non zero shift, store align
4.6 Integer Multiply/Divide
The RM7065A has a single dedicated integer multiply/divide unit optimized for high-speed
multiply and multiply-accumulate operations. The multiply/divide unit resides in the F type
execution unit. Table 4 shows the performance of the multiply/divide unit on each operation.
Table 4 Integer Multiply/Divide Operations
Opcode           Operand Size   Latency   Repeat Rate   Stall Cycles
MULT/U, MAD/U    16 bit         4         3             0
                 32 bit         5         4             0
MUL              16 bit         4         3             2
                 32 bit         5         4             3
DMULT, DMULTU    any            9         8             0
DIV, DIVU        any            36        36            0
DDIV, DDIVU      any            68        68            0
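The Table 4 figures can be used for rough cycle estimates. The sketch below assumes that the repeat rate is the number of cycles between successive issues of the same operation and that the latency is the number of cycles until the result is available; the data sheet text here does not define the columns, so treat that reading as an assumption. The dictionary layout and the function `back_to_back_cycles` are hypothetical.

```python
# (latency, repeat rate, stall cycles) per (opcode, operand size in bits).
# A size of None stands for "any" in Table 4. Only some rows are listed.
MULDIV_TIMING = {
    ("MULT", 16): (4, 3, 0),
    ("MULT", 32): (5, 4, 0),
    ("MAD", 16): (4, 3, 0),
    ("MAD", 32): (5, 4, 0),
    ("MUL", 16): (4, 3, 2),
    ("MUL", 32): (5, 4, 3),
    ("DMULT", None): (9, 8, 0),
    ("DIV", None): (36, 36, 0),
    ("DDIV", None): (68, 68, 0),
}


def back_to_back_cycles(opcode: str, size, count: int) -> int:
    """Estimate cycles until the last of `count` back-to-back operations completes.

    Assumed model: a new operation starts every `repeat` cycles and the last
    one needs `latency` cycles to produce its result.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    latency, repeat, _stall = MULDIV_TIMING[(opcode, size)]
    return latency + (count - 1) * repeat
```

Under these assumptions, eight consecutive 16-bit MAD operations take 4 + 7 × 3 = 25 cycles, while a single DIV takes 36.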
The baseline MIPS IV ISA specifies that the results of a multiply or divide operation be placed in
the Hi and Lo registers. These values can then be transferred to the general purpose register file
using the Move-from-Hi and Move-from-Lo (MFHI/MFLO) instructions.
In addition to the baseline MIPS IV integer multiply instructions, the RM7065A also implements
the 3-operand multiply instruction, MUL. This instruction specifies that the multiply result go
directly to the integer register file rather than the Lo register. The portion of the multiply that
would have normally gone into the Hi register is discarded. For applications where it is known that
the upper half of the multiply result is not required, using the MUL instruction eliminates the
necessity of executing an explicit MFLO instruction.
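The difference between the two-operand multiply and the 3-operand MUL can be sketched in software. This is an illustrative model only: it assumes signed 32-bit operands and a 64-bit product split into 32-bit Hi and Lo halves, and it does not model sign extension into the 64-bit registers. The function names `mult` and `mul` are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a signed two's complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mult(rs: int, rt: int):
    """Two-operand multiply: return (hi, lo), the halves of the 64-bit product."""
    product = (to_signed32(rs) * to_signed32(rt)) & MASK64
    return (product >> 32) & MASK32, product & MASK32


def mul(rs: int, rt: int) -> int:
    """3-operand multiply: keep the Lo half as the result, discard the Hi half."""
    _hi, lo = mult(rs, rt)
    return lo
```

In this model `mult(0x10000, 0x10000)` yields `(1, 0)`, so the same operands passed to `mul` return 0: the upper half is lost, which is why MUL is appropriate only when that half is known not to be needed.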
The multiply-add instructions, MAD and MADU, multiply two operands and add the resulting
product to the current contents of the Hi and Lo registers. The multiply-accumulate operation is
the core primitive of almost all signal processing algorithms. Therefore, using the RM7065A
eliminates the need for a separate DSP engine in many embedded applications.
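A dot product shows why multiply-accumulate is the basic step of many signal processing loops. The sketch below is a software model under stated assumptions, not a description of the hardware: it treats Hi and Lo as the upper and lower 32 bits of one 64-bit accumulator, uses signed 32-bit operands as for MAD, and wraps on overflow. The names `mad` and `dot_product` are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a signed two's complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mad(hi: int, lo: int, rs: int, rt: int):
    """Add rs * rt to the accumulator held in (hi, lo); return the new (hi, lo)."""
    acc = ((hi & MASK32) << 32) | (lo & MASK32)
    acc = (acc + to_signed32(rs) * to_signed32(rt)) & MASK64
    return acc >> 32, acc & MASK32


def dot_product(xs, ys):
    """Accumulate the products of paired elements, one mad step per pair."""
    hi, lo = 0, 0
    for x, y in zip(xs, ys):
        hi, lo = mad(hi, lo, x, y)
    return hi, lo
```

For instance, `dot_product([1, 2, 3], [4, 5, 6])` returns `(0, 32)`; each loop iteration corresponds to one multiply-accumulate step, with no separate add or move between products.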
4.7 Floating-Point Coprocessor
The RM7065A incorporates a high-performance fully pipelined floating-point coprocessor which
includes a floating-point register file and autonomous execution units for multiply/add/convert and
divide/square root. The floating-point coprocessor is a tightly coupled execution unit, decoding
and executing instructions in parallel with, and in the case of floating-point loads and stores, in
cooperation with the M pipe of the integer unit. The superscalar capabilities of the RM7065A
allow floating-point computation instructions to issue concurrently with integer instructions.
4.8 Floating-Point Unit
The RM7065A floating-point execution unit supports single and double precision arithmetic, as
specified in the IEEE Standard 754. The execution unit is broken into a separate divide/square root
unit and a pipelined multiply/add unit. Overlap of divide/square root and multiply/add is
supported.
The RM7065A maintains fully precise floating-point exceptions while allowing both overlapped
and pipelined operations. Precise exceptions are extremely important in object-oriented
programming environments and highly desirable for debugging in any environment.
Floating-point operations include:
•add
•subtract
•multiply
•divide
•square root
•reciprocal
•reciprocal square root
•conditional moves
•conversion between fixed-point and floating-point format
•conversion between floating-point formats
•floating-point compare
Table 5 gives the latencies of the floating-point instructions in internal processor cycles.