PRODUCTION DATA information is current as of publication date.
Products conform to specifications per the terms of Texas Instruments
standard warranty. Production processing does not necessarily include
testing of all parameters.
Revision History
This data manual revision history highlights the technical changes made to the SPRS241C device-specific data
manual to make it an SPRS241D revision.
Scope: Applicable updates to the C64x device family, specifically relating to the TMS320C6418 device, have
been incorporated.
Added the device-specific information supporting the TMS320C6418 silicon revision 1.1 device, which is now in
the production data (PD) stage of development (see ADDS/CHANGES/DELETES).
OSCV: Changed Description from “Power for crystal oscillator (1.2 V), Do not connect to board power 1.4 V;” to “Power for crystal
oscillator (1.2 V), Do not connect to board power CVDD;”
• 0.13-µm/6-Level Cu Metal Process (CMOS)
• 3.3-V I/Os, 1.4-V Internal (-600)
• 3.3-V I/Os, 1.2-V Internal (A-500)
VelociTI.2, VelociTI, and TMS320C64x are trademarks of Texas Instruments.
All trademarks are the property of their respective owners.
† IEEE Standard 1149.1-1990 Standard-Test-Access Port and Boundary Scan Architecture.
August 2004 − Revised January 2006 SPRS241D
2 Functional Overview
2.1 GTS and ZTS BGA Packages (Bottom View)
[Figure: ball map of the GTS and ZTS 288-pin ball grid array (BGA) packages, bottom view; rows A−AB, columns 1−22.]
Figure 2−1. GTS and ZTS BGA Packages (Bottom View)
2.2 Description
The TMS320C64x DSPs (including the TMS320C6418 device) are the highest-performance fixed-point
DSP generation in the TMS320C6000 DSP platform. The TMS320C6418 (C6418) device is based on the
second-generation high-performance, advanced VelociTI very-long-instruction-word (VLIW) architecture
(VelociTI.2) developed by Texas Instruments (TI). The high-performance, lower-cost C6418 DSP enables
customers to reduce system costs for telecom, software radio, Digital Terrestrial Television Broadcasting
(DTTB), and digital Broadcast Satellite/Communication Satellite (BS/CS) applications. The C64x is a
code-compatible member of the C6000 DSP platform.
With performance of up to 4800 million instructions per second (MIPS) at a clock rate of 600 MHz, the C6418
device offers cost-effective solutions to high-performance DSP programming challenges. The C6418 DSP
possesses the operational flexibility of high-speed controllers and the numerical capability of array
processors. The C64x DSP core processor has 64 general-purpose registers of 32-bit word length and eight
highly independent functional units—two multipliers for a 32-bit result and six arithmetic logic units (ALUs)—
with VelociTI.2 extensions. The VelociTI.2 extensions in the eight functional units include new instructions
to accelerate the performance in video and imaging applications and extend the parallelism of the VelociTI
architecture. The C6418 can produce four 16-bit multiply-accumulates (MACs) per cycle for a total of
2400 million MACs per second (MMACS), or eight 8-bit MACs per cycle for a total of 4800 MMACS. The C6418
DSP also has application-specific hardware logic, on-chip memory, and additional on-chip peripherals similar
to the other C6000 DSP platform devices.
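The peak figures above are simply the clock rate multiplied by the number of operations issued per cycle. A minimal sketch of that arithmetic follows; it assumes every unit issues on every cycle, so the results are theoretical peaks rather than sustained throughput for real code.

```c
#include <assert.h>

/* Per-cycle issue widths quoted in the text. */
enum {
    INSTR_PER_CYCLE = 8, /* eight functional units */
    MAC16_PER_CYCLE = 4, /* four 16-bit MACs (two per .M unit) */
    MAC8_PER_CYCLE  = 8  /* eight 8-bit MACs (four per .M unit) */
};

/* Peak rate in millions of operations per second for a clock in MHz.
 * Assumes full issue on every cycle (an upper bound). */
static unsigned peak_mops(unsigned clock_mhz, unsigned ops_per_cycle)
{
    assert(clock_mhz > 0u);
    return clock_mhz * ops_per_cycle;
}
```

At 600 MHz this gives 4800 MIPS, 2400 MMACS (16-bit), and 4800 MMACS (8-bit); the same arithmetic at 500 MHz (A-500) gives 4000 MIPS.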
The C6418 device has a high-performance embedded coprocessor [Viterbi Decoder Coprocessor (VCP)] that
significantly speeds up channel-decoding operations on-chip. The VCP, operating at the CPU clock divided by 4,
can decode over 500 7.95-Kbps adaptive multi-rate (AMR) [K = 9, R = 1/3] voice channels. The VCP supports
constraint lengths K = 5, 6, 7, 8, and 9, rates R = 1/2, 1/3, and 1/4, and flexible polynomials, while generating
hard decisions or soft decisions. Communications between the VCP and the CPU are carried out through the
EDMA controller.
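The VCP figures above can be checked with a few lines of arithmetic. In the sketch below, `vcp_params_supported` is a hypothetical helper, not a TI API; it only encodes the ranges listed in the text (K = 5 to 9; R = 1/2, 1/3, or 1/4, expressed by its denominator). The aggregate-rate helper multiplies channel count by per-channel bit rate, so 500 AMR channels at 7.95 kbps come to about 3.98 Mbit/s of voice data.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical check of a requested Viterbi configuration against the
 * listed VCP ranges. rate_denominator is n in R = 1/n. */
static bool vcp_params_supported(unsigned constraint_len, unsigned rate_denominator)
{
    bool k_ok = constraint_len >= 5u && constraint_len <= 9u;
    bool r_ok = rate_denominator >= 2u && rate_denominator <= 4u;
    return k_ok && r_ok;
}

/* VCP clock (MHz) when it runs at the CPU clock divided by 4. */
static unsigned vcp_clock_mhz(unsigned cpu_mhz)
{
    assert(cpu_mhz > 0u);
    return cpu_mhz / 4u;
}

/* Aggregate bit rate (bits/s) for `channels` channels of channel_bps each. */
static unsigned long aggregate_bps(unsigned channels, unsigned long channel_bps)
{
    return channels * channel_bps;
}
```

For example, `vcp_clock_mhz(600)` yields a 150-MHz VCP clock, and `aggregate_bps(500, 7950)` yields 3,975,000 bits/s.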
The C6418 uses a two-level cache-based architecture and has a powerful and diverse set of peripherals. The
Level 1 program cache (L1P) is a 128-Kbit direct-mapped cache and the Level 1 data cache (L1D) is a 128-Kbit
2-way set-associative cache. The Level 2 memory/cache (L2) consists of a 4-Mbit memory space that is
shared between program and data space. L2 memory can be configured as mapped memory, cache (up to
256K bytes), or combinations of the two. The peripheral set includes: two multichannel audio serial
ports (McASPs); two inter-integrated circuit bus modules (I2Cs); two multichannel buffered serial ports
(McBSPs); three 32-bit general-purpose timers; a user-configurable 16-bit or 32-bit host-port interface
(HPI16/HPI32); a 16-pin general-purpose input/output port (GP0) with programmable interrupt/event
generation modes; and a 32-bit glueless external memory interface (EMIFA), which is capable of interfacing
to synchronous and asynchronous memories and peripherals.
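The cache and memory sizes above are given in bits. The sketch below converts them to bytes and shows how much of L2 stays as mapped SRAM for a given L2 cache size. The list of selectable cache sizes (0, 32K, 64K, 128K, and 256K bytes) is an assumption based on typical C64x L2MODE settings and should be confirmed against SPRU610; only the 4-Mbit total and the 256K-byte cache maximum come from the text.

```c
#include <assert.h>
#include <stdio.h>

#define L1_CACHE_BYTES (128u * 1024u / 8u)       /* 128 Kbit = 16K bytes */
#define L2_TOTAL_BYTES (4u * 1024u * 1024u / 8u) /* 4 Mbit = 512K bytes */
#define L2_MAX_CACHE   (256u * 1024u)            /* up to 256K bytes of cache */

/* Assumed L2 cache sizes in bytes; check SPRU610 for the actual L2MODE list. */
static const unsigned l2_cache_options[] = {
    0u, 32u * 1024u, 64u * 1024u, 128u * 1024u, 256u * 1024u
};

/* Bytes of L2 left as mapped SRAM when cache_bytes are used as cache.
 * Returns 0 for a size above the 256K-byte maximum (a legal split
 * always leaves at least 256K bytes mapped, so 0 signals an error). */
static unsigned l2_mapped_bytes(unsigned cache_bytes)
{
    if (cache_bytes > L2_MAX_CACHE)
        return 0u;
    return L2_TOTAL_BYTES - cache_bytes;
}

/* Prints the mapped-SRAM size for each assumed cache option. */
static void print_l2_splits(void)
{
    for (size_t i = 0; i < sizeof l2_cache_options / sizeof l2_cache_options[0]; ++i)
        printf("L2 cache %3uK -> mapped SRAM %3uK\n",
               l2_cache_options[i] / 1024u,
               l2_mapped_bytes(l2_cache_options[i]) / 1024u);
}
```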
Each McASP port supports one transmit and one receive clock zone, with six serial data pins that can be
individually allocated to either of the two zones. The serial port supports time-division multiplexing on each pin
from 2 to 32 time slots. The C6418 has sufficient bandwidth to support all six serial data pins transmitting a
192-kHz stereo signal. Serial data in each zone may be transmitted and received on multiple serial data pins
simultaneously and formatted in a multitude of variations on the Philips Inter-IC Sound (I2S) format.
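The bandwidth statement above can be made concrete with the usual TDM bit-clock formula: bit clock = frame rate × slots per frame × bits per slot. The sketch assumes 32-bit time slots and treats a stereo signal as two slots per frame at the sample rate; the slot width is an illustrative assumption, not a value given in this section.

```c
#include <assert.h>
#include <stdbool.h>

/* The McASP supports 2 to 32 TDM time slots per frame on each pin. */
static bool tdm_slots_valid(unsigned slots)
{
    return slots >= 2u && slots <= 32u;
}

/* Serial bit clock (Hz) on one data pin: one frame per sample period,
 * `slots` time slots per frame, `slot_bits` bits per slot. */
static unsigned long tdm_bit_clock_hz(unsigned long frame_rate_hz,
                                      unsigned slots, unsigned slot_bits)
{
    assert(tdm_slots_valid(slots));
    return frame_rate_hz * slots * slot_bits;
}

/* Aggregate serial data rate (bits/s) over `pins` data pins that share
 * one clock zone and frame format. */
static unsigned long aggregate_serial_bps(unsigned pins, unsigned long bit_clock_hz)
{
    return pins * bit_clock_hz;
}
```

Under these assumptions, a 192-kHz stereo stream needs a 12.288-MHz bit clock per pin, or 73.728 Mbit/s across all six pins.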
In addition, the McASP transmitter may be programmed to output multiple S/PDIF, IEC60958, AES-3, or CP-430
encoded data channels simultaneously, with a single RAM containing the full implementation of user data and
channel status fields.
McASP also provides extensive error-checking and recovery features, such as the bad clock detection circuit
for each high-frequency master clock which verifies that the master clock is within a programmed frequency
range.
The I2C ports on the TMS320C6418 allow the DSP to easily control peripheral devices and communicate
with a host processor. In addition, the standard multichannel buffered serial port (McBSP) may be used to
communicate with serial peripheral interface (SPI) mode peripheral devices.
TMS320C6000 and C6000 are trademarks of Texas Instruments.
The C6418 has a complete set of development tools, which includes a new C compiler, an assembly optimizer
to simplify programming and scheduling, and a Windows debugger interface for visibility into source code
execution.
2.3 Device Characteristics
Table 2−1 provides an overview of the C6418 DSP. The table shows significant features of the C6418 device,
including the capacity of on-chip RAM, the peripherals, the CPU frequency, and the package type with pin
count.
Table 2−1. Characteristics of the C6418 Processor
HARDWARE FEATURES | | C6418
Peripherals (Not all peripheral pins are available at the same time. For more detail, see the Device Configuration section.)
  EMIFA (32-bit bus width)† | clock source = AECLKIN, CLKOUT4, or CLKOUT6 | 1
  EDMA | 64 independent channels | 1
  McASPs | use Peripheral Clock and AUXCLK | 2
  I2Cs | use Peripheral Clock | 2
  HPI | 32- or 16-bit user selectable | 1 (HPI16 or HPI32)
  McBSPs | | 2
  GP0§ | | 16
CPU ID + CPU Rev ID | Control Status Register (CSR.[31:16]) | 0x0C01
JTAG BSDL_ID | JTAGID register (address location: 0x01B3F008) | 0x0007902F
Frequency | MHz | 500 (A-500), 600 (-600)
Cycle Time | ns | 2 (A-500), 1.67 (-600)
Voltage | Core (V) | 1.2 (A-500), 1.4 (-600)
Voltage | I/O (V) | 3.3
PLL Options | CLKIN frequency multiplier | x1, x5 − x12, x16, x18, x19 − x22, x24
BGA Package | 23 x 23 mm | 288-Pin Flip-Chip Plastic BGA (GTS and ZTS)
Process Technology | µm | 0.13 µm
Product Status‡ | | PD
† On this C64x device, the rated EMIF speed affects only the SDRAM interface on the EMIF. For more detailed information, see the EMIF device
speed portion of this data sheet.
‡ PRODUCTION DATA information is current as of publication date. Products conform to specifications per the terms of Texas Instruments standard
warranty. Production processing does not necessarily include testing of all parameters.
§ GP0[15:8] pins are muxed with the HPI HD[15:8] pins, and GP0[2:1] pins are muxed with CLKOUT6 and CLKOUT4, respectively.
[Figure: functional block diagram; surviving labels include OSCILLATOR and PLL, Boot Configuration, and Power-Down Logic.]
Figure 2−2. Functional Block Diagram
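The two ID values in Table 2−1 can be decoded into their fields. This sketch assumes CSR[31:24] holds the CPU ID and CSR[23:16] the revision ID (so CSR.[31:16] = 0x0C01 reads as CPU ID 0x0C, revision 0x01), and that JTAGID follows the IEEE 1149.1 IDCODE layout: [31:28] version, [27:12] part number, [11:1] manufacturer, [0] = 1. The helper names are hypothetical; reading the raw registers on the target is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an IEEE 1149.1 IDCODE value. */
typedef struct {
    unsigned version;      /* bits 31:28 */
    unsigned part_number;  /* bits 27:12 */
    unsigned manufacturer; /* bits 11:1  */
    unsigned lsb;          /* bit 0, always 1 for a valid IDCODE */
} jtag_idcode;

/* Assumed CSR layout: CPU ID in [31:24], revision ID in [23:16]. */
static unsigned csr_cpu_id(uint32_t csr) { return (csr >> 24) & 0xFFu; }
static unsigned csr_rev_id(uint32_t csr) { return (csr >> 16) & 0xFFu; }

static jtag_idcode decode_idcode(uint32_t id)
{
    jtag_idcode f;
    f.version      = (id >> 28) & 0xFu;
    f.part_number  = (id >> 12) & 0xFFFFu;
    f.manufacturer = (id >> 1) & 0x7FFu;
    f.lsb          = id & 1u;
    return f;
}
```

Applied to 0x0007902F, this layout gives version 0, part number 0x0079, manufacturer code 0x017, and bit 0 set. On hardware, the raw values would come from the CSR and from address 0x01B3F008.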
2.4 CPU (DSP Core) Description
The CPU fetches VelociTI advanced very-long instruction words (VLIWs) (256 bits wide) to supply up to
eight 32-bit instructions to the eight functional units during every clock cycle. The VelociTI VLIW architecture
features controls by which all eight units do not have to be supplied with instructions if they are not ready to
execute. The first bit (the least significant bit) of every 32-bit instruction determines whether the next instruction
belongs to the same execute packet as the current one, or whether it is executed in the following clock cycle as
part of the next execute packet. Fetch packets are always 256 bits wide; however, the execute packets can vary
in size. The variable-length execute packets are a key memory-saving feature, distinguishing the C64x CPUs from
other VLIW architectures. The C64x VelociTI.2 extensions add enhancements to the TMS320C62x DSP
VelociTI architecture. These enhancements include:
• Register file enhancements
• Data path extensions
• Quad 8-bit and dual 16-bit extensions with data flow enhancements
• Additional functional unit hardware
• Increased orthogonality of the instruction set
• Additional instructions that reduce code size and increase register flexibility
The CPU features two sets of functional units. Each set contains four units and a register file. One set contains
functional units .L1, .S1, .M1, and .D1; the other set contains units .D2, .M2, .S2, and .L2. The two register
files each contain 32 32-bit registers for a total of 64 general-purpose registers. In addition to supporting the
packed 16-bit and 32-/40-bit fixed-point data types found in the C62x VelociTI VLIW architecture, the
C64x register files also support packed 8-bit data and 64-bit fixed-point data types. The two sets of functional
units, along with two register files, compose sides A and B of the CPU [see the functional block and CPU (DSP
core) diagram, and Figure 2−3]. The four functional units on each side of the CPU can freely share the 32
registers belonging to that side. Additionally, each side features a “data cross path”—a single data bus
connected to all the registers on the other side, by which the two sets of functional units can access data from
the register files on the opposite side. The C64x CPU pipelines data-cross-path accesses over multiple clock
cycles. This allows the same register to be used as a data-cross-path operand by multiple functional units in
the same execute packet. All functional units in the C64x CPU can access operands via the data cross path.
A register file can service all the functional units on its own side of the CPU in a single clock cycle. On the
C64x CPU, a delay clock is introduced whenever an instruction attempts to read a register via a data cross path
if that register was updated in the previous clock cycle.
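The cross-path delay rule above can be modeled in a few lines. This is an illustrative model, not a cycle-accurate simulator: registers are numbered 0−63 (A0−A31 as 0−31, B0−B31 as 32−63), each execute packet records which registers are updated in its cycle and which registers it reads through a cross path, and a packet costs one delay clock if any cross-path read hits a register updated in the previous cycle. Real write timing depends on each instruction's latency, which the model ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified per-cycle view of one execute packet, as 64-bit masks. */
typedef struct {
    unsigned long long written;     /* registers updated in this cycle */
    unsigned long long xpath_reads; /* registers read via cross path 1X/2X */
} exec_packet;

/* Mask bit for register number reg (0-63). */
static unsigned long long reg_bit(unsigned reg)
{
    assert(reg < 64u);
    return 1ULL << reg;
}

/* Counts the one-cycle delays the stated rule would insert. */
static unsigned count_xpath_stalls(const exec_packet *pkts, size_t n)
{
    unsigned stalls = 0;
    for (size_t i = 1; i < n; ++i)
        if (pkts[i].xpath_reads & pkts[i - 1].written)
            ++stalls; /* at most one delay per packet */
    return stalls;
}
```

For example, if one packet updates A0 and the next reads A0 through the cross path from side B, the model counts one delay; a later packet reading A0 again, with no update in between, incurs none.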
In addition to the C62x DSP fixed-point instructions, the C64x DSP includes a comprehensive collection
of quad 8-bit and dual 16-bit instruction set extensions. These VelociTI.2 extensions allow the C64x CPU
to operate directly on packed data to streamline data flow and increase instruction set efficiency.
Another key feature of the C64x CPU is the load/store architecture, where all instructions operate on registers
(as opposed to data in memory). Two sets of data-addressing units (.D1 and .D2) are responsible for all data
transfers between the register files and the memory. The data address driven by the .D units allows data
addresses generated from one register file to be used to load or store data to or from the other register file.
The C64x .D units can load and store bytes (8 bits), half-words (16 bits), and words (32 bits) with a single
instruction. With the new data path extensions, the C64x .D units can also load and store doublewords (64 bits)
with a single instruction. Furthermore, the non-aligned load and store instructions allow the .D units to access
words and doublewords on any byte boundary. The C64x CPU supports a variety of indirect addressing modes
using either linear- or circular-addressing with 5- or 15-bit offsets. All instructions are conditional, and most
can access any one of the 64 registers. Some registers, however, are singled out to support specific
addressing modes or to hold the condition for conditional instructions (if the condition is not automatically
“true”).
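Circular addressing, mentioned above, wraps an address within a power-of-two block. The sketch below assumes the block-size rule described in the TMS320C6000 CPU and Instruction Set Reference Guide (block size = 2^(N+1) bytes, where N is the 5-bit BK field of the addressing mode register); only the low log2(size) address bits wrap, so a circular buffer must be aligned to its size. Offsets here are already in bytes; on the CPU, an offset is scaled by the access size.

```c
#include <assert.h>
#include <stdint.h>

/* Block size in bytes for a BK field value N (assumed rule: 2^(N+1)).
 * N is limited here so the result fits in 32 bits. */
static uint32_t circ_block_size(unsigned bk_field)
{
    assert(bk_field <= 30u);
    return (uint32_t)1u << (bk_field + 1u);
}

/* Adds a byte offset to addr with wraparound inside the aligned block:
 * the bits below the block size wrap, the upper bits stay fixed. */
static uint32_t circ_add(uint32_t addr, int32_t byte_offset, uint32_t block_size)
{
    uint32_t mask = block_size - 1u;
    uint32_t low  = (addr + (uint32_t)byte_offset) & mask;
    return (addr & ~mask) | low;
}
```

With a 256-byte block based at 0x1000, stepping 8 bytes from 0x10FC wraps to 0x1004, and stepping back 4 bytes from 0x1000 wraps to 0x10FC.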
TMS320C62x and C62x are trademarks of Texas Instruments.
The two .M functional units perform all multiplication operations. Each of the C64x .M units can perform two
16 × 16-bit multiplies or four 8 × 8-bit multiplies per clock cycle. The .M unit can also perform 16 × 32-bit multiply
operations, dual 16 × 16-bit multiplies with add/subtract operations, and quad 8 × 8-bit multiplies with add
operations. In addition to standard multiplies, the C64x .M units include bit-count, rotate, Galois field multiplies,
and bidirectional variable shift hardware.
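To make the "quad 8 × 8-bit multiplies with add" operation concrete, here is a portable C reference for what it computes: four unsigned bytes packed in each 32-bit word are multiplied lane by lane and the four products summed. On the C64x .M unit this kind of operation is performed by a single instruction; the plain-C version below (function name hypothetical) is only for illustration and testing.

```c
#include <assert.h>
#include <stdint.h>

/* Reference quad 8 x 8-bit multiply with add on packed unsigned bytes. */
static uint32_t dot4_u8(uint32_t a, uint32_t b)
{
    uint32_t sum = 0;
    for (unsigned lane = 0; lane < 4u; ++lane) {
        uint32_t x = (a >> (8u * lane)) & 0xFFu;
        uint32_t y = (b >> (8u * lane)) & 0xFFu;
        sum += x * y; /* each product <= 255 * 255, so the sum fits */
    }
    return sum;
}
```

For example, `dot4_u8(0x01020304, 0x01010101)` returns 10 (1 + 2 + 3 + 4).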
The two .S and .L functional units perform a general set of arithmetic, logical, and branch functions with results
available every clock cycle. The arithmetic and logical functions on the C64x CPU include single 32-bit, dual
16-bit, and quad 8-bit operations.
The processing flow begins when a 256-bit-wide instruction fetch packet is fetched from a program memory.
The 32-bit instructions destined for the individual functional units are “linked” together by “1” bits in the least
significant bit (LSB) position of the instructions. The instructions that are “chained” together for simultaneous
execution (up to eight in total) compose an execute packet. A “0” in the LSB of an instruction breaks the chain,
effectively placing the instructions that follow it in the next execute packet. A C64x DSP device enhancement
now allows execute packets to cross fetch-packet boundaries. In the TMS320C62x/TMS320C67x DSP
devices, if an execute packet crosses the fetch-packet boundary (256 bits wide), the assembler places it in
the next fetch packet, while the remainder of the current fetch packet is padded with NOP instructions. In the
C64x DSP device, this execute-packet boundary restriction has been removed, eliminating the NOPs that would
otherwise pad the fetch packet and thus decreasing the overall code size. The number of execute
packets within a fetch packet can vary from one to eight. Execute packets are dispatched to their respective
functional units at the rate of one per clock cycle and the next 256-bit fetch packet is not fetched until all the
execute packets from the current fetch packet have been dispatched. After decoding, the instructions
simultaneously drive all active functional units for a maximum execution rate of eight instructions every clock
cycle. While most results are stored in 32-bit registers, they can be subsequently moved to memory as bytes,
half-words, or doublewords. All load and store instructions are byte-, half-word-, word-, or
doubleword-addressable.
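The chaining rule described above can be sketched as a simple scan: an instruction whose LSB is 1 chains the next instruction into the same execute packet, and an LSB of 0 ends the packet. Because the C64x lets execute packets cross fetch-packet boundaries, the scan ignores the 8-word (256-bit) boundaries entirely. The function name and interface are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Splits a stream of 32-bit instruction words into execute packets using
 * the LSB chaining rule. Stores up to max_packets packet sizes in sizes[]
 * and returns the total number of complete packets found (which may
 * exceed max_packets). A trailing chain whose last LSB is 1 is not counted. */
static size_t split_execute_packets(const uint32_t *insn, size_t n,
                                    unsigned *sizes, size_t max_packets)
{
    size_t count = 0;
    unsigned current = 0;
    for (size_t i = 0; i < n; ++i) {
        ++current;
        if ((insn[i] & 1u) == 0u) {   /* LSB clear: packet ends here */
            assert(current <= 8u);    /* at most eight per execute packet */
            if (count < max_packets)
                sizes[count] = current;
            ++count;
            current = 0;
        }
    }
    return count;
}
```

Ten words with LSBs 1,1,0,0,1,1,1,1,1,0 split into packets of 3, 1, and 6 instructions; the last packet spans words 4 through 9 and so crosses the fetch-packet boundary after word 7.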
For more details on the C64x CPU functional units enhancements, see the following documents:
• TMS320C6000 CPU and Instruction Set Reference Guide (literature number SPRU189)
• TMS320C64x Technical Overview (literature number SPRU395)
TMS320C67x is a trademark of Texas Instruments.
[Figure: CPU data paths. Data Path A: Register File A (A0−A31) serving functional units .L1, .S1, .M1, and .D1, with load data paths LD1a/LD1b, store data paths ST1a/ST1b (32 LSBs and 32 MSBs), and address path DA1. Data Path B: Register File B (B0−B31) serving .L2, .S2, .M2, and .D2, with LD2a/LD2b, ST2a/ST2b, and DA2. Cross paths 1X and 2X connect each side to the opposite register file; a Control Register File is also shown.]
NOTE A: For the .M functional units, the long dst is 32 MSBs and the dst is 32 LSBs.
Figure 2−3. TMS320C64x CPU (DSP Core) Data Paths
2.5 Memory Map Summary
Table 2−2 shows the memory map address ranges of the C6418 device. Internal memory is always located
at address 0 and can be used as both program and data memory. The external memory address ranges in
the C6418 device begin at the hex address location 0x8000 0000 for EMIFA.
Table 2−2. TMS320C6418 Memory Map Summary
MEMORY BLOCK DESCRIPTION | BLOCK SIZE (BYTES)
Internal RAM (L2) [C6418] | 512K
Reserved [C6418] | 512K
Reserved | 15M
Reserved | 8M
External Memory Interface A (EMIFA) Registers | 256K
L2 Registers | 256K
HPI Registers | 256K
McBSP 0 Registers | 256K
McBSP 1 Registers | 256K
Timer 0 Registers | 256K
Timer 1 Registers | 256K
Interrupt Selector Registers | 256K
EDMA RAM and EDMA Registers | 256K
Reserved | 512K
Timer 2 Registers | 256K
GP0 Registers | 256K minus 4K
Device Configuration Registers | 4K
I2C0 Data and Control Registers | 16K
I2C1 Data and Control Registers | 16K
Reserved | 16K
McASP0 Control Registers | 16K
McASP1 Control Registers | 16K
Reserved | 176K
VCP Control Registers | 128K
Reserved | 128K
Emulation | 256K
Reserved | 528K
Reserved | 3.5M
QDMA Registers | 52
Reserved | 928M minus 52
McBSP 0 Data | 64M
McBSP 1 Data | 64M
Reserved | 64M
McASP0 Data | 1M
McASP1 Data | 1M
Reserved | 62M
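Because internal memory starts at address 0 and the blocks are contiguous, start addresses can be derived by accumulating block sizes. The sketch below does this for the rows of Table 2−2 up to the VCP Control Registers; the derived starts agree with register addresses quoted elsewhere in this section (EMIFA at 0x0180 0000, L2 registers at 0x0184 0000, EDMA parameters near 0x01A0 0000, JTAGID at 0x01B3 F008). Later rows are deliberately omitted, and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define KB 1024u
#define MB (1024u * 1024u)

typedef struct { const char *name; uint32_t size; } mem_block;

/* Rows of Table 2-2, in address order, up to the VCP Control Registers. */
static const mem_block blocks[] = {
    {"Internal RAM (L2)", 512 * KB}, {"Reserved", 512 * KB},
    {"Reserved", 15 * MB}, {"Reserved", 8 * MB},
    {"EMIFA Registers", 256 * KB}, {"L2 Registers", 256 * KB},
    {"HPI Registers", 256 * KB}, {"McBSP 0 Registers", 256 * KB},
    {"McBSP 1 Registers", 256 * KB}, {"Timer 0 Registers", 256 * KB},
    {"Timer 1 Registers", 256 * KB}, {"Interrupt Selector Registers", 256 * KB},
    {"EDMA RAM and EDMA Registers", 256 * KB}, {"Reserved", 512 * KB},
    {"Timer 2 Registers", 256 * KB}, {"GP0 Registers", 256 * KB - 4 * KB},
    {"Device Configuration Registers", 4 * KB},
    {"I2C0 Data and Control Registers", 16 * KB},
    {"I2C1 Data and Control Registers", 16 * KB}, {"Reserved", 16 * KB},
    {"McASP0 Control Registers", 16 * KB}, {"McASP1 Control Registers", 16 * KB},
    {"Reserved", 176 * KB}, {"VCP Control Registers", 128 * KB},
};

/* Returns the name of the block containing addr, or NULL if addr lies
 * beyond the rows listed above. Start addresses accumulate from 0. */
static const char *lookup_block(uint32_t addr)
{
    uint32_t start = 0;
    for (size_t i = 0; i < sizeof blocks / sizeof blocks[0]; ++i) {
        if (addr - start < blocks[i].size)
            return blocks[i].name;
        start += blocks[i].size;
    }
    return NULL;
}
```

External EMIFA memory, starting at 0x8000 0000, lies outside the rows modeled here.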
Figure 2−4 shows the detail of the L2 architecture on the TMS320C6418 device. For more information on the
L2MODE bits, see the cache configuration (CCFG) register bit field descriptions in the TMS320C64x Two-Level Internal Memory Reference Guide (literature number SPRU610).
Table 2−3 through Table 2−21 identify the peripheral registers for the C6418 device by their register names,
acronyms, and hex address or hex address range. For more detailed information on the register contents, bit
names and their descriptions, see the specific peripheral reference guide listed in the TMS320C6000 DSP Peripherals Overview Reference Guide (literature number SPRU190).
Table 2−3. EMIFA Registers
HEX ADDRESS RANGE | ACRONYM | REGISTER NAME | COMMENTS
0180 0000 | GBLCTL | EMIFA global control |
0180 0004 | CECTL1 | EMIFA CE1 space control |
0180 0008 | CECTL0 | EMIFA CE0 space control |
0180 000C | − | Reserved |
0180 0010 | CECTL2 | EMIFA CE2 space control |
0180 0014 | CECTL3 | EMIFA CE3 space control |
0180 0018 | SDCTL | EMIFA SDRAM control |
0180 001C | SDTIM | EMIFA SDRAM refresh control |
0180 0020 | SDEXT | EMIFA SDRAM extension |
0180 0024 − 0180 003C | − | Reserved |
0180 0040 | PDTCTL | Peripheral device transfer (PDT) control |
0180 0044 | CESEC1 | EMIFA CE1 space secondary control |
0180 0048 | CESEC0 | EMIFA CE0 space secondary control |
0180 004C | − | Reserved |
0180 0050 | CESEC2 | EMIFA CE2 space secondary control |
0180 0054 | CESEC3 | EMIFA CE3 space secondary control |
0180 0058 − 0183 FFFF | − | Reserved |
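The addresses in Table 2−3 map naturally onto C constants. In the sketch below the addresses are taken from the table, while the `EMIFA_` prefix, the `REG32` access macro, and `emifa_cectl_addr` are hypothetical names for illustration. Note that CE1 precedes CE0 and 0x0C is reserved, so the CE control registers are not at base + 4 × n.

```c
#include <assert.h>
#include <stdint.h>

#define EMIFA_BASE   0x01800000u
#define EMIFA_GBLCTL (EMIFA_BASE + 0x00u)
#define EMIFA_CECTL1 (EMIFA_BASE + 0x04u)
#define EMIFA_CECTL0 (EMIFA_BASE + 0x08u)
#define EMIFA_CECTL2 (EMIFA_BASE + 0x10u)
#define EMIFA_CECTL3 (EMIFA_BASE + 0x14u)
#define EMIFA_SDCTL  (EMIFA_BASE + 0x18u)
#define EMIFA_SDTIM  (EMIFA_BASE + 0x1Cu)
#define EMIFA_SDEXT  (EMIFA_BASE + 0x20u)
#define EMIFA_PDTCTL (EMIFA_BASE + 0x40u)
#define EMIFA_CESEC1 (EMIFA_BASE + 0x44u)
#define EMIFA_CESEC0 (EMIFA_BASE + 0x48u)
#define EMIFA_CESEC2 (EMIFA_BASE + 0x50u)
#define EMIFA_CESEC3 (EMIFA_BASE + 0x54u)

/* Target-only memory-mapped access; dereferencing these addresses on a
 * host computer is undefined behavior. */
#define REG32(addr) (*(volatile uint32_t *)(uintptr_t)(addr))

/* Address of the CE space control register for CE n (0-3), via a lookup
 * because the table order is CE1, CE0, reserved, CE2, CE3. */
static uint32_t emifa_cectl_addr(unsigned ce)
{
    static const uint32_t addr[4] = {
        EMIFA_CECTL0, EMIFA_CECTL1, EMIFA_CECTL2, EMIFA_CECTL3
    };
    assert(ce < 4u);
    return addr[ce];
}
```

On the target, `REG32(EMIFA_SDCTL)` would read the SDRAM control register; its bit fields are described in the EMIF reference guide.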
Table 2−4. L2 Cache Registers (C64x)
HEX ADDRESS RANGE | ACRONYM | REGISTER NAME | COMMENTS
0184 0000 | CCFG | Cache configuration register |
0184 0004 − 0184 0FFC | − | Reserved |
0184 1000 | EDMAWEIGHT | L2 EDMA access control register |
01A0 13E0 − 01A0 13F7 | − | Reload/link parameters for Event 148 (6 words) |
01A0 13F8 − 01A0 13FF | − | Scratch pad area (2 words) |
01A0 1400 − 01A3 FFFF | − | Reserved |
† The C6418 device has 213 EDMA parameter sets in total: 64 event/reload parameter sets and 149 reload-only parameter sets
[six (6) words each] that can be used to reload/link EDMA transfers.
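The parameter-set addresses follow from the set size. This sketch assumes the parameter RAM starts at 0x01A0 0000 (the start of the "EDMA RAM and EDMA Registers" block in Table 2−2), that each set is six 32-bit words (24 bytes), that sets 0−63 belong to events 0−63, and that reload-only set r (0−148) sits at index 64 + r. These assumptions reproduce the rows above: the last reload set occupies 0x01A0 13E0 through 0x01A0 13F7 and the scratch pad begins at 0x01A0 13F8. Names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PARAM_BASE      0x01A00000u /* assumed parameter RAM start */
#define PARAM_SET_BYTES (6u * 4u)   /* six 32-bit words per set */
#define NUM_EVENT_SETS  64u
#define NUM_RELOAD_SETS 149u

/* Address of parameter set `index` (0-212). */
static uint32_t param_set_addr(unsigned index)
{
    assert(index < NUM_EVENT_SETS + NUM_RELOAD_SETS);
    return PARAM_BASE + index * PARAM_SET_BYTES;
}

/* Address of reload-only set r (0-148), assumed to follow the event sets. */
static uint32_t reload_set_addr(unsigned r)
{
    assert(r < NUM_RELOAD_SETS);
    return param_set_addr(NUM_EVENT_SETS + r);
}
```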
Parameters for Event 0 (6 words) or Reload/Link Parameters for other Event
Reload/Link Parameters for other Event 0−15