• 850-MHz and 1-GHz Device: 0°C to 100°C
– Extended Temperature:
(Selectable at Boot-Time)
• Two 1x Serial RapidIO® Links, v1.2 Compliant
– 1.25-, 2.5-, 3.125-Gbps Link Rates
– Message Passing and DirectIO Support
– Error Management Extensions and
Congestion Control
• One 1.8-V Inter-Integrated Circuit (I2C) Bus
• Two 1.8-V McBSPs
Please be aware that an important notice concerning availability, standard warranty, and use in critical applications of Texas
Instruments semiconductor products and disclaimers thereto appears at the end of this data sheet.
All trademarks are the property of their respective owners.
PRODUCTION DATA information is current as of publication date.
Products conform to specifications per the terms of the Texas
Instruments standard warranty. Production processing does not
necessarily include testing of all parameters.
(1)
Note: Advance Information is presented in this document for
the C6474 1.2-GHz extended temperature device.
• SmartReflex™ Class 0 - 0.9-V to 1.2-V Adaptive
Core Voltage
• 1.8-V, 1.1-V I/Os
1.1 CUN/GUN/ZUN BGA Package (Bottom View)
The devices are designed for a package temperature range of 0°C to 100°C (commercial temperature
range; 1-GHz device), -40°C to 100°C (extended temperature range; 1-GHz device), 0°C to 95°C
(commercial temperature range; 850-MHz and 1.2-GHz device), and -40°C to 95°C (extended temperature
range; 1.2-GHz device). A heatsink is required so that this range is not exceeded.
NOTE
Advance Information is presented in this document for the C6474 1.2-GHz extended
temperature device.
The TMS320C64x+ DSPs (including the TMS320C6474 device) are the highest-performance multicore
DSP generation in the TMS320C6000™ DSP platform.
The C6474 device is based on the third-generation high-performance, advanced VelociTI™
very-long-instruction-word (VLIW) architecture developed by Texas Instruments (TI).
The C64x+™ devices are upward code-compatible from previous devices that are part of the C6000™
DSP platform.
1.2.1 Core Processor
Based on 65-nm process technology and 3.6 GHz of total raw DSP processing power with performance of
up to 28,800 million instructions per second (MIPS) [or 28,800 million 16-bit MACs per second (MMACS)], the C6474 device
offers cost-effective solutions to high-performance DSP programming challenges with three independent
DSP subsystems. The DSP possesses the operational flexibility of high-speed controllers and numerical
capability of array processors.
The C64x+ DSP core employs eight functional units, two register files, and two data paths. Like the earlier
C6000 devices, two of these functional units are multipliers or .M units. Each C64x+ .M unit doubles the
multiply throughput versus the C64x core by performing four 16-bit x 16-bit multiply-accumulates (MACs)
every clock cycle. Thus, eight 16-bit x 16-bit MACs can be executed every cycle on the C64x+ core. At a
1.2-GHz rate, this means 9600 16-bit MACs can occur every microsecond (9600 million MACs per second). Moreover, each multiplier
on the C64x+ core can compute one 32-bit x 32-bit MAC or four 8-bit x 8-bit MACs every clock cycle.
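The throughput figures quoted above follow from simple arithmetic, which the short Python sketch below reproduces. The constant and function names are illustrative assumptions, not TI identifiers; the input values (three cores, a 1.2-GHz clock, eight functional units, two .M units performing four 16-bit MACs each) are taken from this data sheet.

```python
# Illustrative arithmetic only. All names are hypothetical; the values come
# from the device description in this data sheet.
CORES = 3                 # three independent DSP subsystems
CLOCK_MHZ = 1200          # 1.2-GHz device
FUNCTIONAL_UNITS = 8      # each unit can issue one instruction per cycle
M_UNITS = 2               # .M1 and .M2
MACS_PER_M_UNIT = 4       # four 16-bit x 16-bit MACs per .M unit per cycle


def peak_mips(cores: int = CORES, clock_mhz: int = CLOCK_MHZ) -> int:
    """Peak millions of instructions per second across the given cores."""
    return cores * clock_mhz * FUNCTIONAL_UNITS


def peak_mmacs(cores: int = CORES, clock_mhz: int = CLOCK_MHZ) -> int:
    """Peak millions of 16-bit MACs per second across the given cores."""
    return cores * clock_mhz * M_UNITS * MACS_PER_M_UNIT


if __name__ == "__main__":
    print(peak_mips())           # whole device, in MIPS
    print(peak_mmacs(cores=1))   # one core, in millions of MACs per second
    print(peak_mmacs())          # whole device, in millions of MACs per second
```

Since one million MACs per second equals one MAC per microsecond, the single-core result can also be read as the number of MACs per microsecond.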
SPRS552F–OCTOBER 2008–REVISED JULY 2010
The C6474 DSP integrates a large amount of on-chip memory organized as a three-level memory system.
The level-1 program and data memories on the device are 32 KB each. This memory can be configured as mapped
RAM, cache, or some combination of the two. When configured as cache, L1 program (L1P) is a
direct-mapped cache, whereas L1 data (L1D) is a two-way set-associative cache. The level-3 (L3) ROM is
64 KB in the device. The C64x+ megamodule also has a 32-bit peripheral configuration (CFG) port, an
internal DMA (IDMA) controller, a system component with reset/boot control, and a free-running 32-bit
timer for time stamp.
The C64x+ DSP core has a complete set of development tools which includes: a new C compiler, an
assembly optimizer to simplify programming and scheduling, and a Windows® debugger interface for
visibility into source code execution.
The DMA switch fabric provides enhanced on-chip connectivity between the DSP cores and the
peripherals and accelerators.
1.2.2 Peripherals
The peripheral set includes: an inter-integrated circuit bus module (I2C); two multichannel buffered serial
ports (McBSPs) each at 100 Mbps; six 64-bit general-purpose timers (also configurable as twelve 32-bit
timers); 16 general-purpose input/output ports (GPIO) with programmable interrupt/event generation
modes; a 1000-Mbps Ethernet media access controller (EMAC), which provides an efficient interface
between the C6474 DSP core processor and the network; a management data input/output (MDIO)
module (also part of EMAC), which controls PHY configuration and status monitoring; a frame
synchronization (FSYNC) module, which synchronizes DMA transactions; a semaphore hardware block
(Semaphore), which allows access to shared resources with unique interrupts to each of the cores to
identify when that core has acquired the resource; and a 16-/32-bit DDR2 SDRAM interface.
The I2C port allows the DSP to easily control peripheral devices and communicate with a host processor.
The device includes two Serial RapidIO® (SRIO) links with rates of 1.25 Gbps, 2.5 Gbps, or 3.125 Gbps.
This high-bandwidth peripheral is used for point-to-point inter-device communication and may connect the
TCI6487/8 device to other DSPs, ASICs, or switches on the same board or across the backplane. This
dramatically improves system performance and reduces system cost for applications that include multiple
DSPs on a board such as video and telecom infrastructures and medical/imaging. The SRIO also provides
alarm, interrupt, and messaging events.
The device includes the SerDes-based antenna interface (AIF) capable of up to 3.072 Gbps operation per
link. The AIF comprises six high-speed serial links, compliant to OBSAI RP3 and CPRI standards. The
antenna interface is used to connect the backplane for antenna data transmission and reception. Each link
of the AIF includes a differential receive and transmit signal pair.
1.2.3 Accelerators
The device has two high-performance embedded coprocessors [enhanced Viterbi Decoder Coprocessor
(VCP2) and enhanced turbo decoder coprocessor (TCP2)] that significantly speed up channel-decoding
operations on-chip. The VCP2 operating at CPU clock divided-by-3 can decode over 694 7.95-Kbps
adaptive multi-rate (AMR) [K=9, R=1/3] voice channels. The VCP2 supports constraint lengths K = 5, 6, 7,
8, and 9, rates R = 3/4, 1/2, 1/3, and 1/5, and flexible polynomials, while generating hard decisions or soft
decisions. The TCP2 operating at CPU clock divided-by-3 can decode up to fifty 384-Kbps or eight
2-Mbps turbo encoded channels (assuming 6 iterations). The TCP2 implements the max*log-map
algorithm and is designed to support all polynomials and rates required by third-generation partnership
projects (3GPP and 3GPP2), with fully programmable frame length and turbo interleaver. Decoding
parameters such as the number of iterations and stopping criteria are also programmable.
Communications between the VCP2/TCP2 and the CPU are carried out through the EDMA3 controller.
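To put the channel counts above in perspective, the following Python sketch converts them to aggregate decoded bit rates and computes the coprocessor clock from the divide-by-3 relationship. The function names are hypothetical, and the sketch assumes that 1 Kbps means 1000 bits per second and that a 2-Mbps channel is 2000 Kbps; it is back-of-the-envelope arithmetic, not a performance model of the VCP2 or TCP2.

```python
# Back-of-the-envelope arithmetic; function names are illustrative assumptions.

def aggregate_kbps(channels: int, rate_kbps: float) -> float:
    """Total decoded bit rate, in Kbps, for a number of equal-rate channels."""
    return channels * rate_kbps


def coprocessor_clock_mhz(cpu_clock_mhz: float, divider: int = 3) -> float:
    """Coprocessor clock when it runs at the CPU clock divided by `divider`."""
    return cpu_clock_mhz / divider


if __name__ == "__main__":
    print(aggregate_kbps(694, 7.95))    # VCP2: 694 AMR voice channels
    print(aggregate_kbps(50, 384))      # TCP2: fifty 384-Kbps channels
    print(aggregate_kbps(8, 2000))      # TCP2: eight 2-Mbps channels
    print(coprocessor_clock_mhz(1200))  # coprocessor clock for a 1.2-GHz CPU
```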
Table 2-1 provides an overview of the C6474 DSP. The table shows significant features of the C6474
device, including the capacity of on-chip RAM, the peripherals, the CPU frequency, and the package type
with pin count.
Table 2-1. Characteristics of the C6474 Processor
HARDWARE FEATURES                                     C6474
Peripherals                                           DDR2 Memory Controller (32-bit bus width) [1.8 V I/O]
(Not all peripheral pins are available at the         (clock memory = DDRREFCLK(N|P))
same time. For more detail, see
Product Status (1)
Product Preview (PP), Advance Information (AI), or
Production Data (PD)
Device Part Numbers
(For more details on C64x+ DSP part numbering, see Figure 2-11)
(1) PRODUCTION DATA information is current as of publication date. Products conform to specifications per the terms of Texas
Instruments standard warranty. Production processing does not necessarily include testing of all parameters. Note: Advance Information
is presented in this document for the C6474 1.2-GHz extended temperature device.
2.2 CPU (DSP Core) Description
The C64x+ central processing unit (CPU) consists of eight functional units, two register files, and two data
paths as shown in Figure 2-1. The two general-purpose register files (A and B) each contain 32 (thirty-two)
32-bit registers for a total of 64 registers. The general-purpose registers can be used for data or can be
data address pointers. The data types supported include packed 8-bit data, 32-bit data, 40-bit data, and
64-bit data. Values larger than 32 bits, such as 40-bit-long or 64-bit-long values, are stored in register
pairs, with the 32 LSBs of data placed in an even register and the remaining 8 or 32 MSBs in the next
upper register (which is always an odd-numbered register).
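The register-pair convention described above can be illustrated with a small Python sketch. The function names are hypothetical; the sketch only models how a 40-bit or 64-bit value divides into the 32 LSBs (even register) and the remaining 8 or 32 MSBs (odd register), and says nothing about how the hardware or the compiler allocates registers.

```python
# Minimal model of storing a 40-bit or 64-bit value in an even/odd register
# pair. Function names are illustrative assumptions.

def split_to_register_pair(value: int, width: int) -> tuple[int, int]:
    """Return (even_register, odd_register) for an unsigned value.

    The even register receives the 32 LSBs; the odd register receives the
    remaining 8 MSBs (40-bit value) or 32 MSBs (64-bit value).
    """
    if width not in (40, 64):
        raise ValueError("width must be 40 or 64")
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the requested width")
    even = value & 0xFFFFFFFF
    odd = value >> 32
    return even, odd


def join_register_pair(even: int, odd: int) -> int:
    """Rebuild the original value from the even (LSB) and odd (MSB) halves."""
    return (odd << 32) | even


if __name__ == "__main__":
    even, odd = split_to_register_pair(0x123456789A, 40)
    print(hex(even), hex(odd))
```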
The eight functional units (.M1, .L1, .D1, .S1, .M2, .L2, .D2, and .S2) are each capable of executing one
instruction every clock cycle. The .M functional units perform all multiply operations. The .S and .L units
perform a general set of arithmetic, logical, and branch functions. The .D units primarily load data from
memory to the register file and store results from the register file into memory.
Table 2-1 (continued): Product Status is PD; Device Part Number is TMS320C6474CUN/GUN/ZUN.
The C64x+ CPU extends the performance of the C64x core through enhancements and new features.
Each C64x+ .M unit can perform one of the following each clock cycle: one 32 x 32 bit multiply, two 16 x
16 bit multiplies, two 16 x 32 bit multiplies, four 8 x 8 bit multiplies, four 8 x 8 multiplies with add
operations and four 16 x 16 multiplies with add/subtract capabilities (including a complex multiply). There
is also support for Galois field multiplication for 8-bit and 32-bit data. Many communications algorithms
such as FFTs and modems require complex multiplication. The complex multiply (CMPY) instruction takes
four 16-bit inputs and produces a 32-bit real and a 32-bit imaginary output. There are also complex
multiplies with rounding capability that produce one 32-bit packed output that contains 16-bit real and
16-bit imaginary values. The 32 x 32 bit multiply instructions provide the extended precision necessary for
audio and other high-precision algorithms on a variety of signed and unsigned 32-bit data types.
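As a rough illustration of the complex multiply described above, the Python sketch below computes the 32-bit real and imaginary outputs from four signed 16-bit inputs. It is a behavioral model under stated assumptions, not the CMPY instruction definition: the function names are hypothetical, operand packing is ignored, and clamping the result to the signed 32-bit range is an assumption made so that the one overflowing input combination stays representable.

```python
# Behavioral sketch of a 16-bit complex multiply with 32-bit outputs.
# Names and the saturation choice are assumptions, not the CMPY specification.
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1
INT32_MIN, INT32_MAX = -(1 << 31), (1 << 31) - 1


def saturate32(x: int) -> int:
    """Clamp x to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, x))


def complex_multiply_16(a_re: int, a_im: int, b_re: int, b_im: int) -> tuple[int, int]:
    """Return (real, imag) of (a_re + j*a_im) * (b_re + j*b_im)."""
    for v in (a_re, a_im, b_re, b_im):
        if not INT16_MIN <= v <= INT16_MAX:
            raise ValueError("inputs must be signed 16-bit values")
    real = a_re * b_re - a_im * b_im
    imag = a_re * b_im + a_im * b_re
    return saturate32(real), saturate32(imag)


if __name__ == "__main__":
    print(complex_multiply_16(1, 2, 3, 4))  # (1 + 2j) * (3 + 4j)
```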
The .L or arithmetic logic unit now incorporates the ability to do parallel add/subtract operations on a pair
of common inputs. Versions of this instruction exist to work on 32-bit data or on pairs of 16-bit data
performing dual 16-bit add and subtracts in parallel. There are also saturated forms of these instructions.
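The parallel add/subtract behavior described above can be modeled in a few lines of Python. This is a sketch of the non-saturated forms only, with wraparound arithmetic assumed; the function names are hypothetical and do not correspond to instruction mnemonics.

```python
# Model of a parallel add/subtract on a pair of common inputs.
# Wraparound (modular) arithmetic is assumed; names are illustrative.

def add_sub_32(a: int, b: int) -> tuple[int, int]:
    """Return (a + b, a - b) as unsigned 32-bit words with wraparound."""
    return (a + b) & 0xFFFFFFFF, (a - b) & 0xFFFFFFFF


def add_sub_dual16(a: int, b: int) -> tuple[int, int]:
    """Treat a and b as two packed 16-bit lanes each.

    Returns (sums, differences), each a 32-bit word holding two 16-bit lanes.
    """
    a_hi, a_lo = (a >> 16) & 0xFFFF, a & 0xFFFF
    b_hi, b_lo = (b >> 16) & 0xFFFF, b & 0xFFFF
    sums = (((a_hi + b_hi) & 0xFFFF) << 16) | ((a_lo + b_lo) & 0xFFFF)
    diffs = (((a_hi - b_hi) & 0xFFFF) << 16) | ((a_lo - b_lo) & 0xFFFF)
    return sums, diffs


if __name__ == "__main__":
    sums, diffs = add_sub_dual16(0x00050003, 0x00020001)
    print(hex(sums), hex(diffs))
```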
The C64x+ core enhances the .S unit in several ways. In the C64x core, dual 16-bit MIN2 and MAX2
comparisons were only available on the .L units. On the C64x+ core, they are also available on the .S unit,
which increases the performance of algorithms that do searching and sorting. Finally, to increase data
packing and unpacking throughput, the .S unit allows sustained high performance for the quad 8-bit/16-bit
and dual 16-bit instructions. Unpack instructions prepare 8-bit data for parallel 16-bit operations. Pack
instructions return parallel results to output precision including saturation support.
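The unpack and pack steps described above can be sketched as follows. The Python model assumes four unsigned 8-bit lanes in a 32-bit word, with lane 0 in the least significant byte, and packs signed intermediate results back to unsigned 8 bits with saturation; the function names and the lane ordering are assumptions made for illustration.

```python
# Sketch of unpacking 8-bit data for wider arithmetic and packing the results
# back with saturation. Lane order and names are illustrative assumptions.

def unpack_u8x4(word: int) -> list[int]:
    """Split a 32-bit word into four unsigned 8-bit lanes (lane 0 = LSB)."""
    return [(word >> shift) & 0xFF for shift in (0, 8, 16, 24)]


def pack_sat_u8x4(values: list[int]) -> int:
    """Pack four integers into one 32-bit word, saturating each to 0..255."""
    if len(values) != 4:
        raise ValueError("exactly four lanes are required")
    word = 0
    for lane, v in enumerate(values):
        clamped = max(0, min(255, v))
        word |= clamped << (8 * lane)
    return word


if __name__ == "__main__":
    lanes = unpack_u8x4(0x04030201)
    scaled = [v * 100 - 150 for v in lanes]  # arithmetic at wider precision
    print(hex(pack_sat_u8x4(scaled)))
```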
Other new features include:
• SPLOOP - a small instruction buffer in the CPU that aids in creation of software pipelining loops where
multiple iterations of a loop are executed in parallel. The SPLOOP buffer reduces the code size
associated with software pipelining. Furthermore, loops in the SPLOOP buffer are fully interruptible.
• Compact Instructions - The native instruction size of the C6000 devices is 32 bits. Many common
instructions such as MPY, AND, OR, ADD, and SUB can be expressed as 16 bits if the C64x+
compiler can restrict the code to use certain registers in the register file. This compression is
performed by the code generation tools.
• Instruction Set Enhancements - As noted above, there are new instructions such as 32-bit
multiplications, complex multiplications, packing, sorting, bit manipulation, and 32-bit Galois field
multiplication.
• Exception Handling - Intended to aid the programmer in isolating bugs. The C64x+ CPU is able to
detect and respond to exceptions, both from internally detected sources (such as illegal op-codes) and
from system events (such as watchdog timer expiration).
• Privilege - Defines user and supervisor modes of operation, allowing the operating system to give a
basic level of protection to sensitive resources. Local memory is divided into multiple pages, each with
read, write, and execute permissions.
• Time-Stamp Counter - Primarily targeted for real-time operating system (RTOS) robustness, a
free-running time-stamp counter is implemented in the CPU that is not sensitive to system stalls.
For more details on the C64x+ CPU and its enhancements over the C64x architecture, see the following
documents:
• TMS320C64x/C64x+ DSP CPU and Instruction Set Reference Guide (literature number SPRU732)
• TMS320C64x+ DSP Cache User's Guide (literature number SPRU862)
• TMS320C64x+ Megamodule Reference Guide (literature number SPRU871)
• TMS320C64x to TMS320C64x+ CPU Migration Guide (literature number SPRAA84)
A. On .M unit, dst2 is 32 MSB.
B. On .M unit, dst1 is 32 LSB.
C. On C64x CPU .M unit, src2 is 32 bits; on C64x+ CPU .M unit, src2 is 64 bits.
D. On .L and .S units, odd dst connects to odd register files and even dst connects to even register files.
Figure 2-1. TMS320C64x+™ CPU (DSP Core) Data Path
2.3 Memory Map Summary
Table 2-2 shows the memory map address of the C6474 device. For more information about the registers
in these address ranges, click on the links in the table. The external memory configuration register
address ranges in the C6474 device begin at the hex address location 0x7000 for DDR2 Memory
Controller.
The boot sequence is a process by which the DSP's internal memory is loaded with program and data
sections. The DSP's internal registers are programmed with predetermined values. The boot sequence is
started automatically after each power-on reset, warm reset, and system reset. A local reset to an
individual C64x+ Megamodule should not affect the state of the hardware boot controller on the device.
For more details on the initiators of the resets, see Section 7.7, Reset Controller.
The C6474 device supports several boot processes that begin execution at the ROM base address, which
contains the bootloader code necessary to support various device boot modes. The boot processes are
software driven, using the BOOTMODE[3:0] device configuration inputs to determine the software
configuration that must be completed.
2.4.1 Boot Modes Supported
The device supports several boot processes, which leverage the internal boot ROM. Most boot processes
are software driven, using the BOOTMODE[3:0] device configuration inputs to determine the software
configuration that must be completed. From a hardware perspective, there are three possible boot modes:
• No Boot (BOOTMODE[3:0] = 0000b)
With no boot, the CPU executes directly from the internal L2 RAM located at address 0x80 0000.
Note: Device operations are undefined if invalid code is located at address 0x80 0000. This boot mode
is a hardware boot mode.
• Public ROM Boot
The C64x+ Megamodule Core 0 is released from reset and begins executing from the L3 ROM base
address. C64x+ Megamodule Core 0 is responsible for performing the boot process (e.g., from I2C
ROM, Ethernet, or RapidIO), after which C64x+ Megamodule Core 0 brings the other C64x+
Megamodule cores out of reset by setting to 1 the EVTPULSE4 bit (bit 4) of the C64x+ Megamodule
Core 0's EVTASRT register. This process is valid only once: writing 1, then writing 1 again will not
bring Cores 1 and 2 out of reset again. Then, the C64x+ Megamodule Core 0 begins execution from the
entry address defined in the boot table. The C64x+ Megamodule Cores 1 and 2 begin execution from
the base address of their L2 RAMs.
The boot process performed by C64x+ Megamodule Core 0 in public ROM boot is determined by the
BOOTMODE[3:0] value in the DEVSTAT register. C64x+ Megamodule Core 0 reads this value, and then
executes the associated boot process in software.
BOOT MODE                         BOOTMODE[3:0]   DESCRIPTION
No Boot                           0000b           No Boot (BOOTMODE[3:0] = 0000b)
I2C Master Boot A                 0001b           Slave I2C address is 0x50. C64x+ Megamodule Core 0 configures I2C, acts as a
                                                  master to the I2C bus and copies data from an I2C EEPROM or a device acting as an
                                                  I2C slave to the DSP using a predefined boot table format. The destination address
                                                  and length are contained within the boot table. After boot table copy is complete, the
                                                  C64x+ Megamodule Core 0 brings the other C64x+ Megamodule Cores out of reset
                                                  by setting to 1 the EVTPULSE4 bit (bit 4) of the C64x+ Megamodule Core 0 EVTASRT
                                                  register.
I2C Master Boot B                 0010b           Similar to I2C boot A except the slave I2C address is 0x51.
I2C Slave Boot                    0011b           The C64x+ Megamodule Core 0 configures I2C and acts as a slave and will accept
                                                  data and code section packets through the I2C interface. It is required that an I2C
                                                  master is present in the system.
EMAC Master Boot                  0100b           TI Ethernet Boot. C64x+ Megamodule Core 0 configures EMAC0 and EDMA, if
EMAC Slave Boot                   0101b           required, and brings the code image into the internal on-chip memory via the protocol
EMAC Forced-Mode Boot             0110b           defined by the boot method (EMAC bootloader). After initializing the on-chip memory
                                                  to the known state, C64x+ Megamodule Core 0 brings the other C64x+ Megamodule
                                                  Cores out of reset.
Reserved                          0111b           Reserved
Serial RapidIO Boot (Config 0)    1000b           The C64x+ Megamodule Core 0 configures the SRIO and an external host loads the
Serial RapidIO Boot (Config 1)    1001b           application via SRIO peripheral, using directIO protocol. A doorbell interrupt is used to
Serial RapidIO Boot (Config 2)    1010b           indicate that the code has been loaded. For more details on the Serial RapidIO
Serial RapidIO Boot (Config 3)    1011b           configurations, see Table 2-4.
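The boot mode encodings in the table above lend themselves to a simple lookup, sketched here in Python for use in host-side tooling or documentation checks. The function and constant names are hypothetical. The sketch takes the 4-bit BOOTMODE[3:0] value directly, because the position of that field within the DEVSTAT register is not given in this section, and it returns None for encodings the table does not list.

```python
# Lookup of boot mode names from the BOOTMODE[3:0] value, as listed in the
# boot mode table. Names are illustrative; unlisted encodings return None.
from typing import Optional

BOOT_MODES = {
    0b0000: "No Boot",
    0b0001: "I2C Master Boot A",
    0b0010: "I2C Master Boot B",
    0b0011: "I2C Slave Boot",
    0b0100: "EMAC Master Boot",
    0b0101: "EMAC Slave Boot",
    0b0110: "EMAC Forced-Mode Boot",
    0b0111: "Reserved",
    0b1000: "Serial RapidIO Boot (Config 0)",
    0b1001: "Serial RapidIO Boot (Config 1)",
    0b1010: "Serial RapidIO Boot (Config 2)",
    0b1011: "Serial RapidIO Boot (Config 3)",
}

# EVTPULSE4 is bit 4 of the Core 0 EVTASRT register, per the text above.
EVTPULSE4_MASK = 1 << 4


def boot_mode_name(bootmode: int) -> Optional[str]:
    """Return the boot mode name for a 4-bit BOOTMODE[3:0] value."""
    if not 0 <= bootmode <= 0b1111:
        raise ValueError("BOOTMODE[3:0] is a 4-bit field")
    return BOOT_MODES.get(bootmode)


if __name__ == "__main__":
    print(boot_mode_name(0b0001))
    print(hex(EVTPULSE4_MASK))
```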
C64x+ Megamodule Core 0 configures Serial RapidIO and EDMA, if required, and brings the code image
into the internal on-chip memory via the protocol defined by the boot method (SRIO bootloader) and then
C64x+ Megamodule Core 0 brings the other C64x+ Megamodule Cores out of reset. Note that SRIO boot
modes are only supported on port 0.
Table 2-4. Serial RapidIO (SRIO) Supported Boot Modes
Any of the boot modes can be used to download a second-level bootloader. A second-level bootloader
allows for any level of customization to current boot methods as well as the definition of a completely
customized boot.
The terminal functions table (Table 2-5) identifies the external signal names, the pin type (I, O, O/Z, or
I/O/Z), whether the pin has any internal pullup/pulldown resistors, and the signal function description.
Megamodule Core 0, C64x+ Megamodule Core 1, and C64x+ Megamodule Core
2, respectively. NMIs are edge-driven (rising edge). Any noise on the NMI pin
may trigger an NMI interrupt; therefore, if the NMI pin is not used, it is
recommended that the NMI pin be grounded rather than relying on the IPD.
System Clock Input to Antenna Interface and main PLL (Main PLL optional vs
ALTCORECLK)
Alternate Core Clock Input to main PLL (vs SYSCLK)
DDR Reference Clock Input to DDR PLL
System Clock Output to be used as a general purpose output clock for debug
purposes
SIGNAL DESCRIPTION
(1) I = Input, O = Output, Z = High impedance, S = Supply voltage, GND = Ground, A = Analog signal
(2) IPD = internal pulldown, IPU = internal pullup. All internal pullups and pulldowns are 100 µA.
GPIO5 is mapped to L2_CONFIG, a reserved bootstrap pin that should be pulled up to DV
during bootstrap
GPIO[7:6] are not multiplexed
GPIO[11:8] are mapped to DEVNUM[3:0]
(see Section 2.4.1, Boot Modes Supported)
GPIO[15:12] are not multiplexed
MISCELLANEOUS
Voltage Control Outputs to variable core power supply (open-drain buffers)
Note: These pins must be externally pulled up. For more information, see the
TMS320C6474 Hardware Design Guide application report (literature number