500 MHz PowerPC 7410 Daughtercard
AltiVec Parallel Vector Technology
Scalable from Two to Hundreds of Processors
1K CFFT in 20 µs on Each Processor
125 MHz Memory System with Prefetch and ECC
Advanced DMA Engine for Chained Submatrix Moves
267 MB/s RACE++ Switch Fabric Interconnect
Fast L2 Cache (250 MHz)
Embedded computing reaches a new level of performance with
RACE++® Series PowerPC® 7410 daughtercards from Mercury
Computer Systems.
Each PowerPC 7410 daughtercard contains two 500 MHz MPC7410
microprocessors with AltiVec™ technology. These unique
microprocessors combine a modern superscalar RISC architecture
with an AltiVec parallel vector execution unit.
The AltiVec vector processing unit revolutionizes the
performance of computationally intensive applications such as
image and signal processing. Each vector unit can operate in
parallel on up to four floating-point numbers or up to sixteen
8-bit integers. This dramatically accelerates vector arithmetic
and provides greater application performance on smaller, less
power-hungry processors.
AltiVec technology also represents a leap in simplifying the
programming required to achieve high performance. Whereas
previous DSP-based systems required handcrafted assembly
language code for optimal performance, easy-to-use extensions
to the C language provide a direct mapping to AltiVec
instructions. This permits developers to program more
productively in a higher-level language, even for critical
sections of code.
Optimized Performance
To keep the processor fed with ample data,
increased emphasis is placed on the
memory system and communications fabric
that deliver data to the processor. Each
compute node on the 500 MHz PowerPC
7410 daughtercard has a dedicated 267 MB/s fabric
interface and a maximum memory speed of 125 MHz. By maximizing the
performance of the memory and the fabric
interface to the processor, Mercury has
optimized RACE++ compute nodes for
processing continuous streams of data.
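The balance between computation and data delivery can be checked with the figures quoted in this data sheet. The sketch below assumes single-precision complex samples (8 bytes per point) and 1 MB = 10^6 bytes; the function and macro names are illustrative only and are not part of any Mercury software.

```c
#include <assert.h>
#include <stddef.h>

/* Figures quoted in this data sheet. */
#define FABRIC_BYTES_PER_SEC 267e6  /* 267 MB/s, assuming 1 MB = 10^6 bytes */
#define CFFT_1K_SECONDS      20e-6  /* 1K CFFT in 20 microseconds */

/* Assumption: single-precision complex samples, 8 bytes per point. */
#define BYTES_PER_COMPLEX_POINT 8

/* Seconds needed to move `points` complex samples one way over the fabric. */
static double fabric_transfer_seconds(size_t points)
{
    assert(points > 0);
    return (double)(points * BYTES_PER_COMPLEX_POINT) / FABRIC_BYTES_PER_SEC;
}

/* Ratio of one-way transfer time for 1K points to the quoted CFFT time. */
static double transfer_to_compute_ratio(void)
{
    return fabric_transfer_seconds(1024) / CFFT_1K_SECONDS;
}
```

Under these assumptions, moving one 1K-point data set across the fabric takes about 30.7 µs, roughly 1.5 times the 20 µs quoted for the transform itself, which illustrates why the memory system and fabric receive as much attention as the processor.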
AltiVec in RACE++ Computers
The computational power of RACE++
Series systems is built from compute nodes
composed of processors, memory, and
interfaces to the RACE++ interconnect.
PowerPC 7410 daughtercards each contain
two compute nodes.
Each compute node (CN) consists of an
MPC7410 microprocessor with AltiVec
technology, level 2 (L2) cache, synchronous
DRAM (SDRAM), and a Mercury-designed
ASIC. This CN ASIC contains architectural
advancements that enhance concurrency
between arithmetic and I/O operations.
[Figure: PowerPC 7410 Daughtercard Architecture]
With the huge increase in processing
performance brought by AltiVec, most
applications are no longer CPU-limited.
Mercury can configure systems with hundreds of compute
nodes, communicating over the second-generation RACE++
switch fabric interconnect. Merging RACE++ and AltiVec
technology provides embedded computers with unprecedented computational power.
AltiVec Vector Processing Unit
The AltiVec vector processing unit operates on 128 bits of
data concurrently with the other PowerPC execution units.
AltiVec instructions may be interleaved with other PowerPC
instructions without any penalty such as a context switch. The
128-bit wide execution unit can be used to operate on four
floating-point numbers, four 32-bit integers, eight 16-bit
integers, or sixteen 8-bit integers simultaneously.
AltiVec instructions are carried out by one of two AltiVec
sub-units. The Vector arithmetic logic unit handles the
vector fixed-point and vector floating-point operations. Two
floating-point operations are possible in a single cycle with the
vector multiply-add instruction and the vector negative
multiply-subtract instruction.
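As a minimal sketch of the vector multiply-add described above, the routine below computes y = a * x + y four floats at a time. It uses the standard `vec_ld`, `vec_madd`, and `vec_st` operations from `altivec.h` when the compiler targets AltiVec and falls back to a scalar loop otherwise. The function name `multiply_add` and its alignment and length requirements are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ALTIVEC__)
#include <altivec.h>
#endif

/* y[i] = a[i] * x[i] + y[i] for n floats.
 * Assumptions for this sketch: n is a multiple of 4 and all three
 * pointers are 16-byte aligned, as vec_ld and vec_st expect. */
static void multiply_add(const float *a, const float *x, float *y, size_t n)
{
    assert(n % 4 == 0);
    assert((((uintptr_t)a | (uintptr_t)x | (uintptr_t)y) % 16) == 0);
#if defined(__ALTIVEC__)
    for (size_t i = 0; i < n; i += 4) {
        vector float va = vec_ld(0, a + i);      /* load four floats */
        vector float vx = vec_ld(0, x + i);
        vector float vy = vec_ld(0, y + i);
        vec_st(vec_madd(va, vx, vy), 0, y + i);  /* four multiply-adds */
    }
#else
    /* Scalar fallback so the sketch also builds on other targets. */
    for (size_t i = 0; i < n; i++)
        y[i] = a[i] * x[i] + y[i];
#endif
}
```

If each of the four lanes completes a multiply and an add every cycle, a 500 MHz processor would peak at 4 × 2 × 500 MHz = 4 GFLOPS; sustained rates depend on how well the memory system keeps the vector unit supplied.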
The Permute sub-unit incorporates a crossbar network to
perform 16 individual byte moves in a single cycle. This
capability can be used for simple tasks such as converting the
"endian-ness" of data or for more complicated tasks such as
byte interleaving, dynamic address alignment, or accelerating
small look-up tables.
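The endian conversion mentioned above can be illustrated with a portable software model of a 16-byte permute. In this sketch each control byte selects one byte from the 32-byte concatenation of two source vectors, which mirrors the behavior of the AltiVec `vec_perm` operation. The names `permute16`, `SWAP32_CONTROL`, and `swap_endian_words` are hypothetical, and the loop stands in for what the hardware crossbar does in a single cycle.

```c
#include <stdint.h>

/* Software model of a 16-byte permute: the low 5 bits of each control
 * byte pick one byte from the 32-byte concatenation of a and b. */
static void permute16(const uint8_t a[16], const uint8_t b[16],
                      const uint8_t control[16], uint8_t out[16])
{
    uint8_t tmp[16];
    for (int i = 0; i < 16; i++) {
        uint8_t sel = control[i] & 0x1F;
        tmp[i] = (sel < 16) ? a[sel] : b[sel - 16];
    }
    /* Copy through a temporary so out may alias a or b. */
    for (int i = 0; i < 16; i++)
        out[i] = tmp[i];
}

/* Control vector that reverses the bytes within each 32-bit word. */
static const uint8_t SWAP32_CONTROL[16] = {
    3, 2, 1, 0,  7, 6, 5, 4,  11, 10, 9, 8,  15, 14, 13, 12
};

/* Convert four 32-bit words between big-endian and little-endian. */
static void swap_endian_words(const uint8_t in[16], uint8_t out[16])
{
    permute16(in, in, SWAP32_CONTROL, out);
}
```

Changing only the control vector turns the same routine into a byte interleave or a 16-entry table look-up, which is why a single permute unit covers all the tasks listed above.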
PowerPC RISC Architecture
In addition to the AltiVec execution unit, the MPC7410
contains a floating-point unit and two integer units that can
operate concurrently with the AltiVec unit. Data and instructions are fed through two on-chip, 32-Kbyte, eight-way
set-associative caches that enhance performance of both
vector and scalar code.
Each PowerPC 7410 CN also includes a fully pipelined
backside L2 cache operating at 250 MHz. This high-
performance cache system provides quick access to data
previously loaded from memory but too large to fit into the
on-chip cache.
Compute Node ASIC
The CN ASIC, included in each compute node, acts as both
a memory controller and as a network interface to the
RACE++ switch fabric interconnect. The CN ASIC includes
an enhanced DMA controller, a high-performance memory
system with error checking and correcting, metering logic,
and a RACE++ interface. By combining memory control
and network interface into a single chip, Mercury's compute
node provides the highest performance with the lowest power
consumption and highest reliability.
High-Performance Memory System
Mercury's high-performance memory subsystem allows the
memory to reach the intrinsic limits of its performance
capability with:
125 MHz Synchronous DRAM
Prefetch Buffers: bring sequential data to the ASIC ahead
of explicit requests by the processor. These prefetch
buffers greatly improve the performance of the CN in
vector operations such as those used in DSP applications.
FIFO Buffers: efficiently overlap accesses to SDRAM from
the local processor and the RACEway interconnect.
The PowerPC CN contains error-correcting circuitry for
improved data integrity. One-bit errors are corrected on the
fly, and multi-bit errors generate an interrupt error condition.
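This data sheet does not state which error-correcting code the CN ASIC uses. As an illustration of the behavior described above (single-bit errors corrected, double-bit errors detected and reported), the sketch below implements a textbook extended Hamming code over a 4-bit value. Real memory controllers typically protect much wider words; all names here are hypothetical.

```c
#include <stdint.h>

enum ecc_status { ECC_OK, ECC_CORRECTED, ECC_UNCORRECTABLE };

/* Returns 1 if v has an odd number of set bits. */
static unsigned parity8(uint8_t v)
{
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return v & 1u;
}

/* Encode 4 data bits into an 8-bit codeword.
 * Bit positions 1..7 hold a Hamming(7,4) word (check bits at 1, 2, 4);
 * bit 0 holds an overall parity bit that makes total parity even. */
static uint8_t ecc_encode(uint8_t nibble)
{
    unsigned d1 = nibble & 1u, d2 = (nibble >> 1) & 1u;
    unsigned d3 = (nibble >> 2) & 1u, d4 = (nibble >> 3) & 1u;
    unsigned p1 = d1 ^ d2 ^ d4;   /* covers positions 3, 5, 7 */
    unsigned p2 = d1 ^ d3 ^ d4;   /* covers positions 3, 6, 7 */
    unsigned p4 = d2 ^ d3 ^ d4;   /* covers positions 5, 6, 7 */
    uint8_t code = (uint8_t)(p1 << 1 | p2 << 2 | d1 << 3 | p4 << 4 |
                             d2 << 5 | d3 << 6 | d4 << 7);
    return (uint8_t)(code | parity8(code));
}

/* Decode a codeword, correcting one flipped bit or flagging two. */
static enum ecc_status ecc_decode(uint8_t code, uint8_t *nibble)
{
    unsigned syndrome = 0;
    enum ecc_status status = ECC_OK;

    /* XOR of the positions of all set bits is 0 for a valid word. */
    for (unsigned pos = 1; pos < 8; pos++)
        if (code & (1u << pos))
            syndrome ^= pos;

    if (parity8(code)) {
        /* Odd overall parity: one bit flipped. The syndrome names its
         * position; syndrome 0 means the parity bit itself. */
        code ^= (uint8_t)(1u << syndrome);
        status = ECC_CORRECTED;
    } else if (syndrome != 0) {
        /* Even parity but nonzero syndrome: two bits flipped. */
        return ECC_UNCORRECTABLE;
    }

    *nibble = (uint8_t)(((code >> 3) & 1u) | ((code >> 5) & 1u) << 1 |
                        ((code >> 6) & 1u) << 2 | ((code >> 7) & 1u) << 3);
    return status;
}
```

In this model, a caller would treat `ECC_CORRECTED` as a silent on-the-fly fix and `ECC_UNCORRECTABLE` as the case that raises the error interrupt.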
Enhanced DMA Controller
Each CN has an advanced DMA controller to support
RACEway transfers at 267 MB/s with chaining and striding.
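Chaining and striding can be pictured with a small software model. The descriptor layout below is hypothetical and is not the CN ASIC's register format: each descriptor describes one strided block (a submatrix) and links to the next, and a `memcpy` loop stands in for the hardware engine.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor for one strided block move. */
struct dma_desc {
    const unsigned char *src;     /* first byte of the source block */
    unsigned char *dst;           /* first byte of the destination block */
    size_t row_bytes;             /* contiguous bytes per row */
    size_t rows;                  /* number of rows to move */
    size_t src_stride;            /* bytes between row starts in the source */
    size_t dst_stride;            /* bytes between row starts in the destination */
    const struct dma_desc *next;  /* next descriptor in the chain, or NULL */
};

/* Software stand-in for the DMA engine: walk the chain and copy each
 * strided block. Returns the total number of bytes moved. */
static size_t dma_run_chain(const struct dma_desc *d)
{
    size_t moved = 0;
    for (; d != NULL; d = d->next) {
        for (size_t r = 0; r < d->rows; r++)
            memcpy(d->dst + r * d->dst_stride,
                   d->src + r * d->src_stride,
                   d->row_bytes);
        moved += d->rows * d->row_bytes;
    }
    return moved;
}
```

For example, extracting a 2 × 2 submatrix from a 4 × 4 byte matrix uses `row_bytes = 2`, `rows = 2`, and `src_stride = 4`; linking a second descriptor through `next` queues another submatrix move without further processor involvement in this model.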
[Figure: MPC7410 Data and Instruction Flow]
[Figure: Compute Node ASIC Architecture]