– 260 Dhrystone MIPS at 200MHz
– 100 peak MFLOP/s at 200MHz
– Two-way set associative caches
– Simple 5-stage pipeline
◆
High level of integration
– 64-bit, 200 MHz integer CPU
– 64-bit floating-point unit
– 16KB instruction cache
– 16KB data cache
– Flexible MMU with large, fully associative TLB
◆
Low-power operation
– 3.3V power supply, for the “RV” part
– 5V power supply, for the “R” part
– Dynamic power management
– Standby mode reduces internal power
◆
Fully software & pin-compatible with 40XX Processor Family
◆
Available in 179-pin PGA or 208-pin QFP
IDT79R4700
◆
Available at 80-200MHz, with mode-bit-dependent output
clock frequencies
◆
64GB physical address space
◆
Processor family for a wide variety of embedded
applications
– LAN switches
– Routers
– Color printers
Description
The IDT79R4700 64-bit RISC Microprocessor is both software and
pin-compatible with the R4XXX processor family. With 64-bit processing
capabilities, the R4700 provides more computational power and data
movement bandwidth than is delivered to typical embedded systems by
32-bit processors.
The R4700 is upwardly software compatible with the IDT79R3000
microprocessor family, including the IDT RISController™ 79R3051™,
R3052™, R3041™, and R3081™, as well as the R4640™, R4650™,
RC64474/475™, and R5000™. An array of development tools facilitates
rapid development of R4700-based systems, allowing a variety of
customers access to the MIPS Open Architecture philosophy.
Block Diagram
[Block diagram: SysAD interface with system/memory control and address, store, write, and read buffers; instruction and data caches (sets A and B, with tags); instruction, data, and joint TLBs; Coprocessor 0; integer register file, integer/address adder, shifter/store aligner, logic unit, load aligner, branch adder, PC incrementer, and integer control; floating-point register file, unpacker/packer, add/sub/convert/divide/square-root unit, shared floating-point/integer multiplier, integer divide, and floating-point control; phase-lock loop and clocks]
The IDT logo is a registered trademark and RC32134, RC32364, RC64145, RC64474, RC64475, RC4650, RC4640, RC4600, RC4700, RC3081, RC3052, RC3051, RC3041, RISController, and RISCore are trademarks of Integrated Device Technology, Inc.
2001 Integrated Device Technology, Inc.
1 of 25, April 10, 2001
DSC 9096
This data sheet provides an overview of the R4700’s CPU features
and architecture. A more detailed description of this processor is
provided in the IDT79R4700 RISC Processor Hardware User’s Manual,
available from Integrated Device Technology (IDT). Information on
development support, application notes and complementary products
is available on the IDT Web site www.idt.com or through your local IDT
sales representative.
Note: Throughout this data sheet and any other IDT materials for this
device, the R4700 indicates a 5V part; RV4700 designates a reduced
voltage (3V) part; and the RC4700 reflects either.
Hardware Overview
The RC4700 processor family brings a high level of integration
designed for high-performance computing. The R4700’s key elements
are briefly described below. A more detailed explanation of each
subsystem is available in the user’s manual.
Figure 1 The RC4700 CP0 Registers. Register numbers: Index 0, Random 1, EntryLo0 2, EntryLo1 3, Context 4, PageMask 5, Wired 6, BadVAddr 8, Count 9, EntryHi 10, Compare 11, Status 12, Cause 13, EPC 14, PRId 15, Config 16, LLAddr 17, XContext 20, ECC 26, CacheErr 27, TagLo 28, TagHi 29, ErrorEPC 30. The figure also shows the TLB, with some entries protected from TLBWR.
Pipeline
The RC4700 uses a simple 5-stage pipeline, similar to the pipeline
structure implemented in the IDT79R32364. This pipeline’s simplicity
allows the RC4700 to be lower cost and lower power than super-scalar
or super-pipelined processors. The pipeline stages are shown in Figure
3 on page 3.
Integer Execution Engine
The R4700 implements the MIPS-III Instruction Set Architecture and
is upwardly compatible with applications that run on earlier generation
parts.
Implementation of the MIPS-III architecture results in 64-bit operations, better code density, greater multi-processing support, improved
performance for commonly used code sequences in operating system
kernels and faster execution of floating-point intensive applications. All
resource dependencies are made transparent to the programmer,
ensuring transportability among implementations of the MIPS instruction
set architecture.
The MIPS integer unit implements a load/store architecture with
single-cycle ALU operations (logical, shift, add, sub) and an autonomous multiply/divide unit. Register resources include:
◆
32 general-purpose orthogonal integer registers
◆
HI/LO result registers, for the integer multiply/divide unit
◆
Program counter
In addition, the on-chip floating-point co-processor adds 32 floating-point
registers and a floating-point control/status register.
Register File
The R4700 has 32 general-purpose registers (shown in Figure 2).
These registers are used for scalar integer operations and address
calculation. The register file consists of two read ports and one write
port and is fully bypassed to minimize operation latency in the pipeline.
Figure 2 R4700 CPU Registers: general-purpose registers r0 through r31, multiply/divide registers HI and LO, and the Program Counter (PC), each 64 bits wide (bits 63 to 0).
ALU
The RC4700 ALU consists of the integer adder and logic unit. The
adder performs address calculations in addition to arithmetic operations,
and the logic unit performs all logical and shift operations. Each of these
units is highly optimized and can perform an operation in a single pipeline cycle.
Integer Multiply/Divide
To perform integer multiply and divide operations, the RC4700 uses
the floating-point unit. The results of the operation are placed in the HI
and LO registers. The values can then be transferred to the general-purpose
register file using the MFHI/MFLO instructions. To avoid an
interlock or stall, a minimum number of internal processor cycles must
elapse between an integer multiply or divide and a subsequent MFHI or
MFLO instruction.
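The HI/LO convention can be made concrete with a small Python model of the doubleword forms. It assumes standard MIPS-III semantics, which this data sheet does not spell out: DMULTU leaves the high 64 bits of the 128-bit product in HI and the low 64 bits in LO, and DDIVU leaves the quotient in LO and the remainder in HI. The function names are hypothetical, and the model ignores the cycle timing discussed above.

```python
MASK64 = (1 << 64) - 1

def dmultu(rs: int, rt: int) -> tuple[int, int]:
    """Return (HI, LO) for an unsigned 64 x 64 -> 128-bit multiply (assumed DMULTU semantics)."""
    product = (rs & MASK64) * (rt & MASK64)
    return (product >> 64) & MASK64, product & MASK64

def ddivu(rs: int, rt: int) -> tuple[int, int]:
    """Return (HI, LO) = (remainder, quotient) for an unsigned 64-bit divide (assumed DDIVU semantics)."""
    rs &= MASK64
    rt &= MASK64
    if rt == 0:
        # The architectural result is undefined; the model simply refuses.
        raise ZeroDivisionError("divide by zero")
    return rs % rt, rs // rt

hi, lo = dmultu(MASK64, MASK64)
print(hex(hi), hex(lo))  # MFHI would read hi, MFLO would read lo
```

Python’s unbounded integers keep the model short; a C version would need a 64 x 64 to 128-bit helper.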
The RC4700 incorporates a complete floating-point co-processor on
chip and includes a floating-point register file and execution units. The
floating-point co-processor forms a “seamless” interface with the integer
unit, decoding and executing instructions in parallel with the integer unit.
Floating-Point Units
The RC4700 floating-point execution units support single and double
precision arithmetic, as specified in the IEEE Standard 754. The execution unit is separated into a multiply unit and a combined add/convert/
divide/square root unit. Overlap of multiplies and add/subtract is
supported. The multiplier is partially pipelined, allowing a new multiply to
begin every four cycles.
The RC4700 maintains fully precise floating-point exceptions while
allowing both overlapped and pipelined operations. Precise exceptions
are extremely important in mission-critical environments and highly
desirable for debugging in any environment.
The floating-point unit’s operation set includes floating-point add,
subtract, multiply, divide, square root, conversion between fixed-point
and floating-point format, conversion among floating-point formats and
floating-point compare. These operations comply with the IEEE Standard 754.
Table 1 lists the latencies of some of the floating-point instructions in
internal processor cycles. Note that multiplies are pipelined so that a
new multiply can be initiated every four pipeline cycles.
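The 100 peak MFLOP/s figure in the feature list is consistent with this multiply issue rate if each multiply is overlapped with one add or subtract. That pairing is an assumption made for illustration, not a statement from this data sheet; the helper name is hypothetical.

```python
def peak_mflops(clock_mhz: float, mul_issue_interval: int,
                overlapped_ops_per_mul: int) -> float:
    """Peak MFLOP/s when a new multiply starts every `mul_issue_interval`
    cycles and `overlapped_ops_per_mul` other FP operations overlap each one."""
    multiplies_per_us = clock_mhz / mul_issue_interval
    return multiplies_per_us * (1 + overlapped_ops_per_mul)

print(peak_mflops(200, 4, 1))  # 100.0
```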
Floating-Point General Register File
The floating-point register file is made up of thirty-two 64-bit registers. With the LDC1 and SDC1 instructions the floating-point unit can
take advantage of the 64-bit wide data cache and issue a co-processor
load or store doubleword instruction in every cycle.
The floating-point control register space contains two registers: one
for determining configuration and revision information for the coprocessor and one for control and status information. These are primarily
involved with diagnostic software, exception handling, state saving and
restoring, and control of rounding modes.
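The rounding-mode control mentioned above can be illustrated with a small decode/encode sketch. It assumes the standard MIPS layout, in which the control/status register is FCR31 and its RM field occupies bits 1:0 (0 = nearest, 1 = toward zero, 2 = toward +infinity, 3 = toward -infinity). This data sheet does not list the bit layout, so treat it as an assumption to confirm in the user’s manual; the function names are hypothetical.

```python
# Assumed FCR31 rounding-mode (RM) encoding, following the usual MIPS convention.
ROUNDING_MODES = {
    0: "round to nearest (RN)",
    1: "round toward zero (RZ)",
    2: "round toward +infinity (RP)",
    3: "round toward -infinity (RM)",
}

def rounding_mode(fcr31: int) -> str:
    """Decode the RM field, bits 1:0 of the control/status register."""
    return ROUNDING_MODES[fcr31 & 0b11]

def with_rounding_mode(fcr31: int, rm: int) -> int:
    """Return a new register value with only the RM field replaced."""
    if rm not in ROUNDING_MODES:
        raise ValueError("RM must be 0, 1, 2, or 3")
    return (fcr31 & ~0b11) | rm

value = with_rounding_mode(0x0100_0000, 1)
print(hex(value), rounding_mode(value))
```

Changing only the RM bits leaves the other control and status fields as they were, which is the usual read-modify-write pattern for such registers.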
System Control Co-processor (CP0)
The system control co-processor in the MIPS architecture is responsible for the virtual memory sub-system, the exception control system
and the diagnostics capability of the processor. In the MIPS architecture, the system control co-processor (and thus the kernel software) is
implementation dependent.
System Control Co-Processor Registers
The RC4700 incorporates all system control co-processor (CP0)
registers on-chip. These registers (shown in Figure 1 on page 2)
provide the path through which the virtual memory system’s page
mapping is examined and changed, exceptions are handled, and operating modes are controlled (kernel vs. user mode, interrupts enabled or
disabled, cache features). In addition, the RC4700 includes registers
that implement a real-time cycle counting facility, aid in cache diagnostic
testing, and assist in data error detection.
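The cycle-counting facility is built around the Count (9) and Compare (11) registers listed in Figure 1. Assuming the usual MIPS behavior (Count is a free-running 32-bit counter, and a timer interrupt is raised when Count equals Compare), the sketch below computes the next Compare value for a periodic tick with 32-bit wraparound. The rate at which Count advances is a parameter here, not a figure from this data sheet, and the function names are hypothetical.

```python
MASK32 = 0xFFFF_FFFF

def ticks_for_period(period_us: float, count_rate_mhz: float) -> int:
    """Count increments in one timer period; count_rate_mhz is an assumed input."""
    return round(period_us * count_rate_mhz)

def next_compare(count_now: int, ticks: int) -> int:
    """Compare value reached `ticks` increments after `count_now` (32-bit wrap)."""
    return (count_now + ticks) & MASK32

def deadline_reached(count_now: int, compare: int) -> bool:
    """True once Count has reached or passed Compare. The 32-bit difference is
    treated as signed, so the test stays correct across wraparound as long as
    the deadline is less than 2**31 increments away."""
    return ((count_now - compare) & MASK32) < 0x8000_0000

# Hypothetical example: a 1 ms tick with Count advancing at 100 MHz.
ticks = ticks_for_period(1000, 100)
print(ticks, hex(next_compare(0xFFFF_FF00, ticks)))
```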
To establish a secure environment for user processing, the RC4700
provides the user, supervisor, and kernel modes of virtual addressing,
available to system software. Bits in the Status register determine which
virtual addressing mode is used.
While in user mode, the RC4700 provides a single, uniform virtual
address space of 256GB (2GB for 32-bit address mode). When operating in the kernel mode, four distinct virtual address spaces—totalling
1024GB (4GB in 32-bit address mode)—are simultaneously available
and are differentiated by the high-order bits of the virtual address.
The RC4700 processor also supports a supervisor mode in which the
virtual address space is 256.5GB (2.5GB in 32-bit address mode),
divided into three regions that are based on the high-order bits of the
virtual address. If the RC4700 is configured for 64-bit virtual addressing,
the virtual address space layout is an upwardly compatible extension of
the 32-bit virtual address space layout. Figure 4 on page 5 shows the
address space layout for the 32-bit virtual address operation.
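The 32-bit totals quoted above for user, supervisor, and kernel modes (2GB, 2.5GB, and 4GB) can be reproduced from a segment map. The boundaries and segment names below follow the conventional MIPS R4000-family 32-bit layout; they are an assumption here, chosen because they reproduce those totals, and Figure 4 remains the authoritative layout. The function names are hypothetical.

```python
# Conventional MIPS R4000-family 32-bit segment map (assumed, not from this data sheet).
# Each entry: (name, first address, last address, modes that may access it).
SEGMENTS = [
    ("kuseg/suseg/useg", 0x0000_0000, 0x7FFF_FFFF, {"user", "supervisor", "kernel"}),
    ("kseg0 (unmapped)", 0x8000_0000, 0x9FFF_FFFF, {"kernel"}),
    ("kseg1 (unmapped, uncached)", 0xA000_0000, 0xBFFF_FFFF, {"kernel"}),
    ("ksseg/sseg", 0xC000_0000, 0xDFFF_FFFF, {"supervisor", "kernel"}),
    ("kseg3", 0xE000_0000, 0xFFFF_FFFF, {"kernel"}),
]

GB = 1 << 30

def segment_of(vaddr: int) -> str:
    """Name of the segment containing a 32-bit virtual address."""
    for name, first, last, _ in SEGMENTS:
        if first <= vaddr <= last:
            return name
    raise ValueError("not a 32-bit virtual address")

def accessible_bytes(mode: str) -> int:
    """Total virtual address space visible in the given mode."""
    return sum(last - first + 1 for _, first, last, modes in SEGMENTS if mode in modes)

for mode in ("user", "supervisor", "kernel"):
    print(mode, accessible_bytes(mode) / GB, "GB")  # 2.0, 2.5, 4.0
```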
Memory Management Unit (MMU)
The memory management unit controls the virtual memory system
page mapping. It consists of an instruction address translation buffer
(the ITLB), a data address translation buffer (the DTLB), a Joint TLB (the
JTLB), and co-processor registers used for the virtual memory mapping
sub-system.
Instruction TLB (ITLB)
The RC4700 also incorporates a two-entry instruction TLB. Each
entry maps a 4KB page. The instruction TLB improves performance by
allowing instruction address translation to occur in parallel with data
address translation. When a miss occurs on an instruction address
translation, the least-recently used ITLB entry is filled from the JTLB.
The operation of the ITLB is invisible to the user.
Data TLB (DTLB)
The RC4700 also incorporates a four-entry data TLB. Each entry
maps a 4KB page. The data TLB improves performance by allowing
data address translation to occur in parallel with instruction address
translation. When a miss occurs on a data address translation, the DTLB
is filled from the JTLB. The DTLB refill is pseudo-LRU: the least recently
used entry of the least recently used half is filled. The operation of the
DTLB is invisible to the user.
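The DTLB refill rule ("the least recently used entry of the least recently used half") can be modeled in a few lines. This is a sketch under assumptions: the class name is hypothetical, the four entries are split into halves {0, 1} and {2, 3}, and recency is tracked with one bit per half plus one bit selecting the older half (a standard tree pseudo-LRU). That is one plausible reading of the text rather than the documented hardware, and the JTLB lookup that supplies each refill is not modeled.

```python
class PseudoLruDtlb:
    """Hypothetical 4-entry DTLB model: on a miss, refill the least recently
    used entry of the least recently used half (tree pseudo-LRU)."""

    def __init__(self) -> None:
        self.entries = [None] * 4    # virtual page numbers held in slots 0-3
        self.lru_half = 0            # 0: slots 0-1 are older, 1: slots 2-3 are older
        self.lru_in_half = [0, 0]    # per half, which slot (0 or 1) is older

    def _touch(self, slot: int) -> None:
        half, way = divmod(slot, 2)
        self.lru_half = 1 - half           # the other half is now the older one
        self.lru_in_half[half] = 1 - way   # the other slot in this half is older

    def victim(self) -> int:
        return 2 * self.lru_half + self.lru_in_half[self.lru_half]

    def lookup(self, vpn: int) -> bool:
        """Return True on a hit. On a miss, refill a slot (the JTLB lookup
        that would supply the mapping is not modeled) and return False."""
        if vpn in self.entries:
            self._touch(self.entries.index(vpn))
            return True
        slot = self.victim()
        self.entries[slot] = vpn
        self._touch(slot)
        return False

demo = PseudoLruDtlb()
for vpn in (10, 11, 12, 13, 10, 14):
    demo.lookup(vpn)
print(demo.entries)  # [10, 12, 14, 13]: page 11 was the pseudo-LRU victim
```

With only three bits of state, the result matches true LRU in this short trace, but the two policies can diverge on longer access patterns.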
Joint TLB (JTLB)
For fast virtual-to-physical address decoding, the RC4700 uses a
large, fully associative TLB that maps 96 virtual pages to their corresponding physical addresses. The TLB is organized as 48 pairs of even-odd entries and maps a virtual address and address space identifier into
the large, 64GB physical address space.
Two mechanisms are provided to assist in controlling the amount of
mapped space and the replacement characteristics of various memory
regions. First, the page size can be configured, on a per-entry basis, to
map pages from 4KB to 16MB in steps of a factor of four (4KB, 16KB, 64KB,
256KB, 1MB, 4MB, or 16MB). A CP0 register (PageMask) is loaded with the
page size of a mapping, and that size is entered into the
TLB when a new entry is written. Thus, operating systems can provide
special purpose maps; for example, a typical frame buffer can be
memory mapped using only one TLB entry.
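Because page sizes step by a factor of four and each JTLB entry maps an even/odd pair, the smallest page size that lets a single entry cover a region is easy to compute. The sketch assumes the R4000-family PageMask encoding (mask bits starting at bit 13, so 4KB pages use a mask of 0), which this data sheet does not give; check it against the user’s manual. The function names and the 8MB frame buffer are hypothetical, and the sketch ignores the address alignment a real mapping needs.

```python
# 4KB, 16KB, 64KB, 256KB, 1MB, 4MB, 16MB: each size is four times the previous one.
PAGE_SIZES = [(4 << 10) << (2 * k) for k in range(7)]

def page_mask(page_size: int) -> int:
    """PageMask value for a page size, assuming the R4000-family encoding."""
    if page_size not in PAGE_SIZES:
        raise ValueError("page size must be 4KB..16MB in steps of 4x")
    return (2 * page_size - 1) & ~0x1FFF

def pair_span(page_size: int) -> int:
    """Bytes mapped by one JTLB entry (its even page plus its odd page)."""
    return 2 * page_size

def smallest_page_for(region_bytes: int) -> int:
    """Smallest page size whose even/odd pair covers the region with one entry."""
    for size in PAGE_SIZES:
        if pair_span(size) >= region_bytes:
            return size
    raise ValueError("region is larger than one JTLB entry can map")

# Hypothetical example: an 8MB frame buffer.
size = smallest_page_for(8 << 20)
print(size >> 20, "MB pages, PageMask =", hex(page_mask(size)))
```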
The second mechanism controls the replacement algorithm when a
TLB miss occurs. The RC4700 provides a random replacement algorithm to select a TLB entry to be written with a new mapping; however,
the processor provides a mechanism whereby a system-specific number
of mappings can be locked into the TLB and avoid being randomly
replaced. This facilitates the design of real-time systems by allowing
deterministic access to critical software.
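The locking mechanism can be sketched as a victim picker that never selects the locked entries. The sketch assumes the R4000-family convention, not stated in this data sheet, in which the Wired register (number 6 in Figure 1) holds the count of locked entries at the bottom of the 48 JTLB entry pairs. Python’s random module stands in for the hardware Random register, and the function name is hypothetical.

```python
import random

JTLB_ENTRIES = 48  # 48 even/odd entry pairs, per the Joint TLB description

def pick_victim(wired: int, rng: random.Random) -> int:
    """Choose a JTLB entry to overwrite, never touching entries 0..wired-1."""
    if not 0 <= wired < JTLB_ENTRIES:
        raise ValueError("wired must leave at least one replaceable entry")
    return rng.randrange(wired, JTLB_ENTRIES)

rng = random.Random(4700)
victims = {pick_victim(8, rng) for _ in range(1000)}
print(min(victims) >= 8, max(victims) <= 47)  # True True
```

Locked mappings (for example, interrupt handlers in a real-time system) therefore never take a TLB refill penalty once written.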
The joint TLB also contains information to control the cache coherency protocol for each page. Specifically, each page has attribute bits to
determine whether the coherency algorithm is uncached, non-coherent
write-back, non-coherent write-through write-allocate or non-coherent
write-through no write-allocate. Non-coherent write-back is typically
used for both code and data on the RC4700; however, hardware-based
cache coherency is not supported.
To keep the RC4700’s high-performance pipeline full and operating
efficiently, the RC4700 incorporates on-chip instruction and data caches
that can be accessed in a single processor cycle. Each cache has its
own 64-bit data path and can be accessed in parallel.
Instruction Cache
The RC4700 incorporates a two-way set associative on-chip instruction cache. This virtually indexed, physically tagged cache is 16KB in
size and is protected with word parity.
Because the cache is virtually indexed, the virtual-to-physical
address translation occurs in parallel with the cache access, further
increasing performance by allowing these two operations to occur simultaneously. The tag holds a 24-bit physical address and valid bit and is
parity protected.
The instruction cache is 64-bits wide and can be refilled or accessed
in a single processor cycle. For a peak instruction bandwidth of 800MB/sec
at 200MHz, instruction fetches require only 32 bits per cycle. To
reduce power dissipation, sequential accesses take advantage of the
64-bit fetch. To minimize the cache miss penalty, cache miss refill writes
use 64 bits-per-cycle, and to maximize cache performance, the line size
is eight instructions (32 bytes).
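The cache parameters determine how an address is split. The sketch below derives the split for a 16KB, two-way cache with 32-byte lines (parameters shared by the instruction and data caches): each way holds 8KB, or 256 lines, so the set index comes from virtual address bits 12:5. Bit 12 lies above the 4KB page offset, so the index is not fully determined by the untranslated address bits. The sketch also checks that a 24-bit tag above a 4KB page offset spans 36 physical address bits, matching the 64GB physical address space. Treating the tag as physical address bits 35:12 is an inference from those numbers, and the function name is hypothetical.

```python
CACHE_BYTES = 16 * 1024
WAYS = 2
LINE_BYTES = 32
PAGE_BYTES = 4 * 1024
TAG_BITS = 24

SETS = CACHE_BYTES // (WAYS * LINE_BYTES)          # lines per way
OFFSET_BITS = (LINE_BYTES - 1).bit_length()        # byte-within-line bits
INDEX_BITS = (SETS - 1).bit_length()               # set-index bits
PAGE_OFFSET_BITS = (PAGE_BYTES - 1).bit_length()   # untranslated address bits

def split_address(vaddr: int, paddr: int) -> tuple[int, int, int]:
    """Return (set index, byte offset, tag) for one cache access:
    index and offset come from the virtual address, the tag from the physical."""
    offset = vaddr & (LINE_BYTES - 1)
    index = (vaddr >> OFFSET_BITS) & (SETS - 1)
    tag = (paddr >> PAGE_OFFSET_BITS) & ((1 << TAG_BITS) - 1)
    return index, offset, tag

print(SETS, OFFSET_BITS, INDEX_BITS)                # 256 5 8
print(2 ** (PAGE_OFFSET_BITS + TAG_BITS) >> 30)     # 64 (GB of physical space)
```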
Data Cache
For fast, single-cycle data access, the RC4700 includes a 16KB on-chip data cache that is two-way set associative with a fixed 32-byte
(eight-word) line size.
The data cache is protected with byte parity and its tag is protected
with a single parity bit. It is virtually indexed and physically tagged to
allow simultaneous address translation and data cache access.
The normal write policy is write-back, which means that a store to a
cache line does not immediately cause memory to be updated. This
increases system performance by reducing bus traffic and eliminating
the bottleneck of waiting for each store operation to finish before issuing
a subsequent memory operation. Software can, however, select write-through on a per-page basis when it is appropriate, such as for frame
buffers.
Associated with the data cache is the store buffer. When the RC4700
executes a Store instruction, this single-entry buffer gets written with the
store data while the tag comparison is performed. If the tag matches,
then the data is written into the data cache in the next cycle that the data
cache is not accessed (the next non-load cycle). The store buffer allows
the R4700 to execute a store instruction every processor cycle and to
perform back-to-back stores without penalty.
The data cache can provide 8 bytes each clock cycle, for a peak
bandwidth of 1.6 GB/sec.
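The peak bandwidth figures quoted for the caches are bytes per cycle multiplied by the clock rate, as the short sketch below shows. The helper name is hypothetical, and MB here means decimal megabytes, as in the data sheet’s figures.

```python
def peak_mb_per_s(bytes_per_cycle: int, clock_mhz: float) -> float:
    """Peak bandwidth in MB/s for one transfer of `bytes_per_cycle` per clock."""
    return float(bytes_per_cycle * clock_mhz)

print(peak_mb_per_s(4, 200))  # instruction fetch, 32 bits per cycle: 800.0
print(peak_mb_per_s(8, 200))  # data cache, 8 bytes per cycle: 1600.0, i.e. 1.6GB/sec
```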
System Interface
The RC4700 supports a 64-bit system interface. This interface operates from two clocks, TClock[1:0] and RClock[1:0], which the RC4700 provides at a division of its internal clock.
The system interface consists of a 64-bit Address/Data bus with
eight check bits and a 9-bit command bus protected with parity. In addition, there are eight handshake signals and six interrupt inputs. The
interface has a simple timing specification and is capable of transferring
data between the processor and memory at a peak rate of 500MB/sec
with a 67MHz bus.
System Address/Data Bus
The 64-bit System Address Data (SysAD) bus is used to transfer
addresses and data between the RC4700 and the rest of the system. It
is protected with an 8-bit parity check bus, SysADC.
The system interface is configurable to allow easier interfacing to
memory and I/O systems of varying frequencies. The data rate and the
bus frequency at which the RC4700 transmits data to the system interface are programmable via boot-time mode control bits. Also, the rate at
which the processor receives data is fully controlled by the external
device. Therefore, either a low-cost interface requiring no read or write
buffering or a faster, high-performance interface can be designed to
communicate with the RC4700. The system designer thus has the flexibility to make these price/performance trade-offs.
System Command Bus
The RC4700 interface has a 9-bit System Command (SysCmd) bus.
The command bus indicates whether the SysAD bus carries an address
or data. If the SysAD carries an address, then the SysCmd bus also
indicates what type of transaction is to take place (for example, a read
or write). If the SysAD carries data, then the SysCmd bus also gives
information about the data (for example, this is the last data word transmitted, or the cache state of this data line is clean exclusive). The
SysCmd bus is bidirectional to support both processor requests and
external requests to the RC4700. Processor requests are initiated by
the RC4700 and responded to by an external device. External requests
are issued by an external device and require the RC4700 to respond.
The RC4700 supports one- to eight-byte transfers and block transfers on the
SysAD bus. In the case of a sub-doubleword transfer, the low-order
three address bits give the byte address of the transfer, and the
SysCmd bus indicates the number of bytes being transferred.
Write Buffer
Writes to external memory, whether they are cache miss writebacks or stores to uncached or write-through addresses, use the on-chip
write buffer. The write buffer holds a maximum of four 64-bit address and
64-bit data pairs. The entire buffer is used for a data cache writeback
and allows the processor to proceed in parallel with memory updates.
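The sub-doubleword rule described under System Command Bus can be illustrated with a small check: the low-order three address bits select the starting byte, and the byte count must keep the transfer inside one aligned doubleword. This sketch checks only that containment; the exact set of legal size/offset combinations, and which physical SysAD byte lanes the offsets occupy under big- or little-endian operation, are defined in the user’s manual and not modeled. The function name is hypothetical.

```python
def transfer_bytes(address: int, size: int) -> list[int]:
    """Byte offsets (0-7) within the addressed doubleword that a
    sub-doubleword SysAD transfer of `size` bytes would carry."""
    if not 1 <= size <= 8:
        raise ValueError("sub-doubleword transfers carry 1 to 8 bytes")
    start = address & 0b111          # low-order three address bits
    if start + size > 8:
        raise ValueError("transfer would cross a doubleword boundary")
    return list(range(start, start + size))

print(transfer_bytes(0x1003, 2))  # bytes 3 and 4 of the doubleword at 0x1000
```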
Handshake Signals
There are six handshake signals on the system interface. Two of
these, RdRdy* and WrRdy*, are used by an external device to indicate to
the RC4700 whether it can accept a new read or write transaction. The
RC4700 samples these signals before deasserting the address on read
and write requests.
ExtRqst* and Release* are used to transfer control of the SysAD and
SysCmd buses between the processor and an external device. When
an external device needs to control the interface, it asserts ExtRqst*.
The RC4700 responds by asserting Release* to release the system
interface to slave state.
ValidOut* and ValidIn* are used by the RC4700 and the external
device respectively to indicate that there is a valid command or data on
the SysAD and SysCmd buses. The RC4700 asserts ValidOut* when it
is driving these buses with a valid command or data, and the external
device drives ValidIn* when it has control of the buses and is driving a
valid command or data.
Non-overlapping System Interface
The RC4700 bus uses a non-overlapping system interface. This
means that only one processor request may be outstanding at a time
and that the request must be serviced by an external device before the
RC4700 issues another request. The RC4700 can issue read and write
requests to an external device, and an external device can issue read
and write requests to the RC4700.
For a processor read transaction, the RC4700 asserts ValidOut* and
simultaneously drives the address and read command on the SysAD
and SysCmd buses. If the system interface has RdRdy* asserted, then
the processor tristates its drivers and releases the system interface to
slave state by asserting Release*. The external device can then begin
sending the data.
Figure 5 on page 10 shows a processor block read request and the
external agent read response. The read latency is four cycles (ValidOut*
to ValidIn*), and the response data pattern is DDxxDD. Figure 6 on
page 10 shows a processor block write.
Write Reissue and Pipeline Write
The RC4700 implements additional write protocols designed to improve
performance; this implementation doubles the effective write bandwidth.
Write re-issue has a repeat rate of two cycles per write. A write issues if
WrRdy* was asserted two cycles earlier and is still asserted in the issue
cycle; if it is no longer asserted, the last write is re-issued. Pipelined writes
have the same two-cycle-per-write repeat rate but can issue one additional
write after WrRdy* deasserts. For other writes, they follow the same issue
rule as R4x00 mode.
External Requests
The RC4700 responds to requests issued by an external device. The
requests can take several forms. An external device may need to supply
data in response to an RC4700 read request, or it may need to gain
control over the system interface bus to access other resources that
may be on that bus. It may also issue requests to the processor, such as
a request for the RC4700 to write to the RC4700 interrupt register. The
RC4700 supports Write, Null, and Read Response external requests.
Boot-Time Options
Fundamental operational modes for the processor are initialized by
the boot-time mode control interface. The boot-time mode control interface is a serial interface operating at a very low frequency (MasterClock
divided by 256). The low-frequency operation allows the initialization
information to be kept in a low-cost serial EEPROM; alternatively, the
20-or-so bits could be generated by the system interface ASIC or a
simple PAL.
Immediately after the VCCOK signal is asserted, the processor reads a
serial bit stream of 256 bits to initialize all fundamental operational modes.
After initialization is complete, the processor continues to drive the serial
clock output, but no further initialization bits are read.
Boot-Time Modes
The boot-time serial mode stream is defined in Table 3. Bit 0 is the
first bit presented to the processor when VCCOK is asserted; bit 255 is the
last.
JTAG Interface
The RC4700 supports the JTAG interface pins, with the serial input
connected to the serial output. Boundary scan is not supported.
Power Management¹
CP0 is also used to control power management for the RC4700 through
standby mode, which can be used to reduce the power consumption of the
internal core of the CPU. Standby mode is entered by executing the WAIT
instruction with the SysAD bus idle and is exited by an interrupt.
Standby Mode Operations
The RC4700 provides a means to reduce the amount of power
consumed by the internal core when the CPU would otherwise not be
performing any useful operations. This is known as “Standby Mode.”
Entering Standby Mode
Executing the WAIT instruction enables interrupts and enters
Standby mode. When the WAIT instruction finishes the W pipe-stage, if
the SysAD bus is currently idle, the internal clocks shut down, freezing
the pipeline. The PLL, the internal timer, some of the input pin clocks
(Int[5:0]*, NMI*, ExtReq*, Reset*, and ColdReset*), and the output
clocks (TClock[1:0], RClock[1:0], SyncOut, Modeclock, and MasterOut)
continue to run. If the conditions are not met when the WAIT instruction
finishes the W pipe-stage (for example, if the SysAD bus is not idle), the
WAIT is treated as a NOP.
Once the CPU is in Standby Mode, any interrupt, including the
internally generated timer interrupt, causes the CPU to exit Standby
Mode.
1. The R4700 implements advanced power management to substantially
reduce the average power dissipation of the device. This operation is described
in the R4700 Microprocessor Hardware User’s Manual.
Thermal Considerations
The RC4700 uses special packaging techniques to improve the
thermal properties of high-speed processors. The RC4700 is packaged
using cavity-down packaging in a 179-pin PGA package and a 208-lead
QFP package. These packages effectively dissipate the power of the
CPU, increasing device reliability.
The R4700 is guaranteed in a case temperature range of 0°C to +85°C.
The type of package, speed (power) of the device, and airflow conditions
affect the equivalent ambient temperature conditions that will meet
this specification.
The equivalent allowable ambient temperature, $T_A$, can be calculated
using the thermal resistance from case to ambient, $\theta_{CA}$, of the given
package. The following equation relates ambient and case temperatures:
$$T_A = T_C - P \cdot \theta_{CA}$$
where P is the maximum power consumption at hot temperature,
calculated by using the maximum I