    • Six ALUs (32-/40-Bit), Each Supports Single 32-Bit, Dual 16-Bit, or Quad 8-Bit Arithmetic per Clock Cycle
    • Two Multipliers Support Four 16 x 16-Bit Multiplies (32-Bit Results) per Clock Cycle or Eight 8 x 8-Bit Multiplies (16-Bit Results) per Clock Cycle
  – Load-Store Architecture With Non-Aligned Support
  – 64 32-Bit General-Purpose Registers
  – Instruction Packing Reduces Code Size
  – All Instructions Conditional
  – Additional C64x+™ Enhancements
    • Protected Mode Operation
    • Exceptions Support for Error Detection and Program Redirection
    • Hardware Support for Modulo Loop Operation
• C64x+ Instruction Set Features
  – Byte-Addressable (8-/16-/32-/64-Bit Data)
  – 8-Bit Overflow Protection
  – Bit-Field Extract, Set, Clear
  – Normalization, Saturation, Bit-Counting
  – Compact 16-Bit Instructions
  – Additional Instructions to Support Complex Multiplies
• C64x+ L1/L2 Memory Architecture
  – 32K-Byte L1P Program RAM/Cache (Direct Mapped)
  – 80K-Byte L1D Data RAM/Cache (2-Way Set-Associative)
  – 64K-Byte L2 Unified Mapped RAM/Cache (Flexible RAM/Cache Allocation)
• ARM926EJ-S Core
  – Support for 32-Bit and 16-Bit (Thumb® Mode) Instruction Sets
  – DSP Instruction Extensions and Single Cycle MAC
  – ARM® Jazelle® Technology
  – EmbeddedICE-RT™ Logic for Real-Time Debug
• ARM9 Memory Architecture
  – 16K-Byte Instruction Cache
  – 8K-Byte Data Cache
  – 16K-Byte RAM
  – 8K-Byte ROM
• Embedded Trace Buffer™ (ETB11™) With 4KB Memory for ARM9 Debug
• Endianness: Little Endian for ARM and DSP
• Video Imaging Co-Processor (VICP)
• Video Processing Subsystem
  – Front End Provides:
    • CCD and CMOS Imager Interface
    • BT.601/BT.656 Digital YCbCr 4:2:2 (8-/16-Bit) Interface
    • Preview Engine for Real-Time Image Processing
    • Glueless Interface to Common Video Decoders
    • Histogram Module
    • Auto-Exposure, Auto-White Balance and Auto-Focus Module
    • Resize Engine
      – Resize Images From 1/4x to 4x
      – Separate Horizontal/Vertical Control
Please be aware that an important notice concerning availability, standard warranty, and use in critical applications of Texas
Instruments semiconductor products and disclaimers thereto appears at the end of this data sheet.
All trademarks are the property of their respective owners.
PRODUCTION DATA information is current as of publication date.
Products conform to specifications per the terms of the Texas
Instruments standard warranty. Production processing does not
necessarily include testing of all parameters.
The TMS320DM6446 (also referenced as DM6446) leverages TI’s DaVinci™ technology to meet the
networked media encode and decode application processing needs of next-generation embedded devices.
The DM6446 enables OEMs and ODMs to quickly bring to market devices featuring robust operating
systems support, rich user interfaces, high processing performance, and long battery life through the
maximum flexibility of a fully integrated mixed processor solution.
The dual-core architecture of the DM6446 provides benefits of both DSP and Reduced Instruction Set
Computer (RISC) technologies, incorporating a high-performance TMS320C64x+™ DSP core and an
ARM926EJ-S core.
The ARM926EJ-S is a 32-bit RISC processor core that performs 32-bit or 16-bit instructions and
processes 32-bit, 16-bit, or 8-bit data. The core uses pipelining so that all parts of the processor and
memory system can operate continuously.
The ARM core incorporates:
•A coprocessor 15 (CP15) and protection module
•Data and program Memory Management Units (MMUs) with translation look-aside buffers (TLBs).
•Separate 16K-byte instruction and 8K-byte data caches. Both are four-way associative with virtual
index virtual tag (VIVT).
The TMS320C64x+™ DSPs are the highest-performance fixed-point DSP generation in the
TMS320C6000™ DSP platform. They are based on an enhanced version of the second-generation
high-performance, advanced very-long-instruction-word (VLIW) architecture developed by Texas
Instruments (TI), making these DSP cores an excellent choice for digital media applications. The C64x is a
code-compatible member of the C6000™ DSP platform. The TMS320C64x+ DSP is an enhancement of
the C64x™ DSP with added functionality and an expanded instruction set.
SPRS283H–DECEMBER 2005–REVISED SEPTEMBER 2010
Any reference to the C64x™ DSP or C64x™ CPU also applies, unless otherwise noted, to the C64x+™
DSP and C64x+™ CPU, respectively.
With performance of up to 6480 million instructions per second (MIPS) at a clock rate of 810 MHz, the
C64x+ core offers solutions to high-performance DSP programming challenges. The DSP core possesses
the operational flexibility of high-speed controllers and the numerical capability of array processors. The
C64x+ DSP core processor has 64 general-purpose registers of 32-bit word length and eight highly
independent functional units—two multipliers for a 32-bit result and six arithmetic logic units (ALUs). The
eight functional units include instructions to accelerate the performance in video and imaging applications.
The DSP core can produce four 16-bit multiply-accumulates (MACs) per cycle for a total of 3240 million
MACs per second (MMACS), or eight 8-bit MACs per cycle for a total of 6480 MMACS. For more details
on the C64x+ DSP, see the TMS320C64x/C64x+ DSP CPU and Instruction Set Reference Guide
(literature number SPRU732).
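The peak figures quoted above follow directly from the clock rate and the number of operations issued per cycle. The short Python sketch below reproduces that arithmetic; the function and constant names are illustrative only and are not part of any TI tool.

```python
# Peak-throughput arithmetic for the C64x+ core, using only figures quoted in
# the text. All names here are illustrative assumptions.

FUNCTIONAL_UNITS = 8       # eight functional units, one instruction each per cycle
MACS_16BIT_PER_CYCLE = 4   # four 16-bit MACs per cycle
MACS_8BIT_PER_CYCLE = 8    # eight 8-bit MACs per cycle


def peak_rate_millions(clock_mhz: int, ops_per_cycle: int) -> int:
    """Return the peak number of operations per second, in millions."""
    return clock_mhz * ops_per_cycle


if __name__ == "__main__":
    clock_mhz = 810
    print("MIPS:          ", peak_rate_millions(clock_mhz, FUNCTIONAL_UNITS))
    print("16-bit MMACS:  ", peak_rate_millions(clock_mhz, MACS_16BIT_PER_CYCLE))
    print("8-bit MMACS:   ", peak_rate_millions(clock_mhz, MACS_8BIT_PER_CYCLE))
```

These are peak rates: they assume every functional unit issues on every cycle, which real code rarely sustains.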
The DM6446 also has application-specific hardware logic, on-chip memory, and additional on-chip
peripherals similar to the other C6000 DSP platform devices. The DM6446 core uses a two-level
cache-based architecture. The Level 1 program cache (L1P) is a 256K-bit direct-mapped cache and the
Level 1 data cache (L1D) is a 640K-bit 2-way set-associative cache. The Level 2 memory/cache (L2)
consists of a 512K-bit memory space that is shared between program and data space. L2 memory can
be configured as mapped memory, cache, or combinations of the two.
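The sizes in this paragraph are given in K-bits, whereas the feature lists elsewhere in this document quote K-bytes. A minimal conversion sketch, assuming 8 bits per byte and using illustrative names, shows that the two sets of figures agree:

```python
# Convert the memory sizes quoted in K-bits to K-bytes (8 bits per byte).
# The dictionary keys and helper name are illustrative assumptions.

MEMORY_SIZES_KBIT = {"L1P": 256, "L1D": 640, "L2": 512}


def kbit_to_kbyte(kbit: int) -> int:
    """Return the size in K-bytes for a size given in K-bits."""
    return kbit // 8


sizes_kbyte = {name: kbit_to_kbyte(kbit) for name, kbit in MEMORY_SIZES_KBIT.items()}

if __name__ == "__main__":
    for name, kbyte in sizes_kbyte.items():
        print(f"{name}: {kbyte}K bytes")
```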
The peripheral set includes: 2 configurable video ports; a 10/100 Mb/s Ethernet MAC (EMAC) with a
Management Data Input/Output (MDIO) module; an inter-integrated circuit (I2C) Bus interface; one audio
serial port (ASP); 2 64-bit general-purpose timers each configurable as 2 independent 32-bit timers;
1 64-bit watchdog timer; up to 71 pins of general-purpose input/output (GPIO) with programmable
interrupt/event generation modes, multiplexed with other peripherals; 3 UARTs with hardware
handshaking support on 1 UART; 3 pulse width modulator (PWM) peripherals; and 2 external memory
interfaces: an asynchronous external memory interface (EMIFA) for slower memories/peripherals, and a
higher speed synchronous memory interface for DDR2.
The DM6446 device includes a Video Processing Subsystem (VPSS) with two configurable video/imaging
peripherals: 1 Video Processing Front-End (VPFE) input used for video capture, 1 Video Processing
Back-End (VPBE) output with imaging co-processor (VICP) used for display.
The Video Processing Front-End (VPFE) is comprised of a CCD Controller (CCDC), a Preview Engine
(Previewer), Histogram Module, Auto-Exposure/White Balance/Focus Module (H3A), and Resizer. The
CCDC is capable of interfacing to common video decoders, CMOS sensors, and Charge Coupled Devices
(CCDs). The Previewer is a real-time image processing engine that takes raw imager data from a CMOS
sensor or CCD and converts from an RGB Bayer Pattern to YUV4:2:2. The Histogram and H3A modules
provide statistical information on the raw color data for use by the DM6446. The Resizer accepts image
data for separate horizontal and vertical resizing from 1/4x to 4x in increments of 256/N, where N is
between 64 and 1024.
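The Resizer ratio of 256/N can be tabulated directly. The sketch below assumes N is an integer in the stated range; the helper names are hypothetical and do not correspond to any Resizer register or driver API.

```python
# Resizer scaling ratio as described in the text: ratio = 256 / N, 64 <= N <= 1024.
# Assumption: N is an integer. Function names are hypothetical.
from fractions import Fraction

N_MIN, N_MAX = 64, 1024


def resize_ratio(n: int) -> Fraction:
    """Return the exact scaling ratio 256/N for a valid N."""
    if not N_MIN <= n <= N_MAX:
        raise ValueError(f"N must be between {N_MIN} and {N_MAX}, got {n}")
    return Fraction(256, n)


def nearest_n(target: Fraction) -> int:
    """Return the in-range N whose ratio 256/N is closest to inverting the target."""
    n = round(Fraction(256) / target)
    return max(N_MIN, min(N_MAX, n))


if __name__ == "__main__":
    for n in (64, 128, 256, 512, 1024):
        print(f"N = {n:4d} -> ratio {resize_ratio(n)}")
```

N = 64 gives the 4x upper limit and N = 1024 gives the 1/4x lower limit, so the two ends of the range match the resize limits quoted above.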
The Video Processing Back-End (VPBE) is comprised of an On-Screen Display Engine (OSD) and a
Video Encoder (VENC). The OSD engine is capable of handling 2 separate video windows and 2 separate
OSD windows. Other configurations include 2 video windows, 1 OSD window, and 1 attribute window
allowing up to 8 levels of alpha blending. The VENC provides four analog DACs that run at 54 MHz,
providing a means for composite NTSC/PAL video, S-Video, and/or Component video output. The VENC
also provides up to 24 bits of digital output to interface to RGB888 devices. The digital output is capable of
8/16-bit BT.656 output and/or CCIR.601 with separate horizontal and vertical syncs.
The Ethernet Media Access Controller (EMAC) provides an efficient interface between the DM644x and
the network. The DM6446 EMAC supports both 10Base-T and 100Base-TX, or 10 Mbits/second (Mbps)
and 100 Mbps in either half- or full-duplex mode, with hardware flow control and quality of service (QOS)
support.
www.ti.com
The Management Data Input/Output (MDIO) module continuously polls all 32 MDIO addresses in order to
enumerate all PHY devices in the system. Once a PHY candidate has been selected by the ARM, the
MDIO module transparently monitors its link state by reading the PHY status register. Link change events
are stored in the MDIO module and can optionally interrupt the ARM, allowing the ARM to poll the link
status of the device without continuously performing costly MDIO accesses.
The HPI, I2C, SPI, USB2.0, and VLYNQ ports allow DM6446 to easily control peripheral devices and/or
communicate with host processors. The DM6446 also provides multimedia card support, MMC/SD, with
SDIO support.
The DM6446 also includes a Video/Imaging Co-processor (VICP) to offload many video and imaging
processing tasks from the DSP core, making more DSP MIPS available for common video and imaging
algorithms. For more information on the VICP enhanced codecs, such as H.264 and MPEG4, please
contact your nearest TI sales representative.
The rich peripheral set provides the ability to control external peripheral devices and communicate with
external processors. For details on each of the peripherals, see the related sections later in this document
and the associated peripheral reference guides listed in Section 2.8.3.1, Related Documentation From Texas Instruments.
The DM6446 has a complete set of development tools for both the ARM and DSP. These include C
compilers, a DSP assembly optimizer to simplify programming and scheduling, and a Windows™
debugger interface for visibility into source code execution.
This data manual revision history highlights the technical changes made to the SPRS283G device-specific
data manual to make it an SPRS283H revision.
Scope: Added information/data on silicon revision 2.3.
Applicable updates to the DM644x device family, specifically relating to the TMS320DM6446 device, have
been incorporated.
Revision History
NOTE: Page numbers for previous revisions may differ from page numbers in the current version.
TMS320DM6446 Revision History
SEE | ADDITIONS/MODIFICATIONS/DELETIONS
Global |
Section 2.1, Device Characteristics | Table 2-1, Characteristics of the Processor:
Section 2.8.3.1, Related Documentation From Texas Instruments |
Section 3.3.1.1, BOOTCFG Register Description |
Section 3.3.3, DSP Boot |
Section 3.5.1, Switched Central Resource (SCR) Bus Priorities | Figure 3-6, MSTPRI1 Register:
Section 3.5.4, PINMUX0 Register Description | Figure 3-7, PINMUX0 Register:
Section 6.3.1.3, DM6446 Power and Clock Domains | Figure 6-6, PLL1 and PLL2 Clock Domain Block Diagram:
Section 6.6.3, Clock PLL Electrical Data/Timing (Input and Output Clocks) | Table 6-19, Switching Characteristics Over Recommended Operating Conditions for CLK_OUT1:
Section 6.10.1.2, EMIFA Electrical Data/Timing | Table 6-35, Switching Characteristics Over Recommended Operating Conditions for Asynchronous Memory Cycles for EMIFA Module:
Table 2-1 provides an overview of the TMS320DM6446 SoC. The table shows significant features of the
device, including the capacity of on-chip RAM, peripherals, internal peripheral bus frequency relative to the
C64x+ DSP, and the package type with pin count.
Table 2-1. Characteristics of the Processor
HARDWARE FEATURES | DM6446
Peripherals (Not all peripherals pins are available at the same time. For more details, see Section 3, Device Configurations.)
  DDR2 Memory Controller | DDR2 (16/32-bit bus width)
  Asynchronous EMIF (EMIFA) |
  Flash Cards | MMC/SD with secure data input/output (SDIO)
  EDMA3 |
  Timers | … separate 32-bit timers)
  UART | 3 (one with RTS and CTS flow control)
  SPI | 1 (supports 2 slave devices)
  I2C | 1 (Master/Slave)
  Audio Serial Port [ASP] | 1
  10/100 Ethernet MAC with Management Data … |
On-Chip Memory |
CPU ID + CPU Rev ID | Control Status Register (CSR.[31:16]) | 0x1000
C64x+ Megamodule | Revision ID Register (MM_REVID[15:0]) | 0x0000 (Silicon Revisions 1.3 and earlier)
CPU Frequency (MHz) | DM6446 - 810: DSP 810 MHz, ARM 405 MHz
                    | DM6446 - 594: DSP 594 MHz, ARM 297 MHz
                    | DM6446A - 513: DSP 513 MHz, ARM 256.5 MHz
Cycle Time (ns) | DM6446 - 810: DSP 1.23 ns, ARM 2.47 ns
                | DM6446 - 594: DSP 1.68 ns, ARM 3.37 ns
                | DM6446A - 513: DSP 1.95 ns, ARM 3.90 ns
Voltage, Core (V) | 1.3 V (-810); 1.2 V (-594, A-513)
Voltage, I/O (V) | 1.8 V, 3.3 V (-810, -594, A-513)
PLL Options, CLKIN frequency multiplier (27 MHz reference) | x1 (Bypass), x30 (-810); x1 (Bypass), x22 (-594, A-513)
BGA Package, 16 x 16 mm | 361-Pin BGA (ZWT), ball finish SnAgCu
Process Technology (µm) | 0.09 µm
Product Status (1): Product Preview (PP), Advance Information (AI), or Production Data (PD) | PD
(1) PRODUCTION DATA information is current as of publication date. Products conform to specifications per the terms of Texas
Instruments standard warranty. Production processing does not necessarily include testing of all parameters.
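The frequency, cycle-time, and PLL entries in Table 2-1 are related by simple arithmetic. The sketch below cross-checks them: cycle time is the reciprocal of frequency, and the DSP clock is the 27 MHz reference times the PLL multiplier. The function names are illustrative assumptions, and the multiplier check covers only the -810 and -594 entries.

```python
# Cross-check of the clock figures in Table 2-1.
# Names are illustrative; only values quoted in the table are used.

CLKIN_MHZ = 27  # reference clock listed with the PLL options


def pll_output_mhz(multiplier: int) -> int:
    """Return the PLL output frequency for a given CLKIN multiplier."""
    return CLKIN_MHZ * multiplier


def cycle_time_ns(freq_mhz: float) -> float:
    """Return the clock period in nanoseconds, rounded to two decimals."""
    return round(1000.0 / freq_mhz, 2)


if __name__ == "__main__":
    print("x30 ->", pll_output_mhz(30), "MHz")
    print("x22 ->", pll_output_mhz(22), "MHz")
    for dsp_mhz, arm_mhz in ((810, 405), (594, 297), (513, 256.5)):
        print(f"DSP {dsp_mhz} MHz: {cycle_time_ns(dsp_mhz)} ns, "
              f"ARM {arm_mhz} MHz: {cycle_time_ns(arm_mhz)} ns")
```

In every row of the table the ARM frequency is half the DSP frequency, which is why the ARM cycle times are twice the DSP cycle times.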
2.2 Device Compatibility
The ARM926EJ-S RISC CPU is compatible with other ARM9 CPUs from ARM Holdings plc.
The C64x+ DSP core is code-compatible with the C6000™ DSP platform and supports features of the
C64x DSP family.
2.3 ARM Subsystem
The ARM Subsystem is designed to give the ARM926EJ-S (ARM9) master control of the device. In
general, the ARM is responsible for configuration and control of the device, including the DSP Subsystem,
the VPSS Subsystem, and a majority of the peripherals and external memories.
The ARM Subsystem includes the following features:
•ARM926EJ-S RISC processor
•ARMv5TEJ (32/16-bit) instruction set
•Little endian
•Co-Processor 15 (CP15)
•MMU
•16KB Instruction cache
•8KB Data cache
•Write Buffer
•16KB Internal RAM (32-bit-wide access)
•8KB Internal ROM (ARM bootloader for non-EMIFA boot options)
•Embedded Trace Module and Embedded Trace Buffer (ETM/ETB)
The ARM Subsystem integrates the ARM926EJ-S processor. The ARM926EJ-S processor is a member of
the ARM9 family of general-purpose microprocessors. This processor is targeted at multi-tasking applications
where full memory management, high performance, low die size, and low power are all important. The
ARM926EJ-S processor supports the 32-bit ARM and 16-bit Thumb instruction sets, enabling the user to
trade off between high performance and high code density. Specifically, the ARM926EJ-S processor
supports the ARMv5TEJ instruction set, which includes features for efficient execution of Java byte codes,
providing Java performance similar to a Just-in-Time (JIT) Java interpreter, but without the associated code
overhead.
The ARM926EJ-S processor supports the ARM debug architecture and includes logic to assist in both
hardware and software debug. The ARM926EJ-S processor has a Harvard architecture and provides a
complete high performance subsystem, including:
•ARM926EJ-S integer core
•CP15 system control coprocessor
•Memory Management Unit (MMU)
•Separate instruction and data Caches
•Write buffer
•Separate instruction and data Tightly-Coupled Memories (TCMs) [internal RAM] interfaces
•Separate instruction and data AHB bus interfaces
•Embedded Trace Module and Embedded Trace Buffer (ETM/ETB)
For more complete details on the ARM9, refer to the ARM926EJ-S Technical Reference Manual, available
at http://www.arm.com.
2.3.2 CP15
The ARM926EJ-S system control coprocessor (CP15) is used to configure and control instruction and
data caches, Tightly-Coupled Memories (TCMs), Memory Management Unit (MMU), and other ARM
subsystem functions. The CP15 registers are programmed using the MRC and MCR ARM instructions,
when the ARM is in a privileged mode such as supervisor or system mode.
2.3.3 MMU
The ARM926EJ-S MMU provides virtual memory features required by operating systems such as Linux®,
Windows® CE, Ultron®, ThreadX®, etc. A single set of two-level page tables stored in main memory is
used to control the address translation, permission checks, and memory region attributes for both data and
instruction accesses. The MMU uses a single unified Translation Lookaside Buffer (TLB) to cache the
information held in the page tables. The MMU features are:
•Standard ARM architecture v4 and v5 MMU mapping sizes, domains and access protection scheme.
•Invalidate TLB entry, selected by MVA, using CP15 register 8
•Lockdown of TLB entries, using CP15 register 10
2.3.4 Caches and Write Buffer
The Instruction Cache is 16KB and the Data Cache is 8KB. Additionally, the caches have the following
features:
•Virtual index, virtual tag, and addressed using the Modified Virtual Address (MVA)
•Four-way set associative, with a cache line length of eight words per line (32-bytes per line) and with
two dirty bits in the Dcache
•Dcache supports write-through and write-back (or copy back) cache operation, selected by memory
region using the C and B bits in the MMU translation tables.
•Critical-word first cache refilling
•Cache lockdown registers enable control over which cache ways are used for allocation on a line fill,
providing a mechanism for both lockdown, and controlling cache corruption
•Dcache stores the Physical Address TAG (PA TAG) corresponding to each Dcache entry in the TAG
RAM for use during the cache line write-backs, in addition to the Virtual Address TAG stored in the
TAG RAM. This means that the MMU is not involved in Dcache write-back operations, removing the
possibility of TLB misses related to the write-back address.
•Cache maintenance operations provide efficient invalidation of the entire Dcache or Icache, regions of
the Dcache or Icache, and regions of virtual memory.
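The cache sizes, associativity, and line length given above determine the number of lines and sets in each cache. The following sketch works through that arithmetic; the function names are illustrative and nothing here models the actual cache controller.

```python
# Cache geometry derived from the figures in the text:
# 16KB Icache, 8KB Dcache, four-way set associative, 32-byte lines.
# Function names are illustrative assumptions.

WAYS = 4
LINE_BYTES = 32  # eight 32-bit words per line


def cache_lines(size_bytes: int) -> int:
    """Return the total number of cache lines."""
    return size_bytes // LINE_BYTES


def cache_sets(size_bytes: int) -> int:
    """Return the number of sets (each set holds one line per way)."""
    return size_bytes // (WAYS * LINE_BYTES)


if __name__ == "__main__":
    for name, size in (("Icache", 16 * 1024), ("Dcache", 8 * 1024)):
        print(f"{name}: {cache_lines(size)} lines, {cache_sets(size)} sets")
```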
The write buffer is used for all writes to a noncacheable bufferable region, a write-through region, and write
misses to a write-back region. A separate buffer is incorporated in the Dcache for holding write-back data for
cache line evictions or cleaning of dirty cache lines. The main write buffer has a 16-word data buffer and a
four-address buffer. The Dcache write-back buffer has eight data word entries and a single address entry.
2.3.5 Tightly Coupled Memory (TCM)
ARM internal RAM is provided for storing real-time and performance-critical code/data and the Interrupt
Vector table. ARM internal ROM enables non-EMIFA boot options, such as NAND and UART. The RAM
and ROM memories are interfaced to the ARM926EJ-S via the tightly coupled memory interface that provides
for separate instruction and data bus connections. Since the ARM TCM does not allow instructions on the
D-TCM bus or data on the I-TCM bus, an arbiter is included so that both data and instructions can be
stored in the internal RAM/ROM. The arbiter also allows accesses to the RAM/ROM from extra-ARM
sources (e.g., EDMA3 or other masters). The ARM926EJ-S has built-in DMA support for direct accesses
to the ARM internal memory from a non-ARM master. Because of the time-critical nature of the TCM link
to the ARM internal memory, all accesses from non-ARM devices are treated as DMA transfers.
Instruction and Data accesses are differentiated via accessing different memory map regions, with the
instruction region from 0x0000 through 0x7FFF and data from 0x8000 through 0xFFFF. The instruction
region at 0x0000 and data region at 0x8000 map to the same physical 16KB TCM RAM. Placing the
instruction region at 0x0000 is necessary to allow the ARM Interrupt Vector table to be placed at 0x0000,
as required by the ARM architecture. The internal 16-KB RAM is split into two physical banks of 8KB
each, which allows simultaneous instruction and data accesses to be accomplished if the code and data
are in separate banks.
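The address decoding described above can be sketched as follows. The region boundaries and the aliasing of the 16KB RAM come from the text; the assumption that the two 8KB banks are the lower and upper halves of the RAM, and all function names, are illustrative only and are not confirmed by this document.

```python
# Sketch of the ARM internal memory decode described in the text.
# From the text: instruction region 0x0000-0x7FFF, data region 0x8000-0xFFFF,
# and both region bases map to the same physical 16KB RAM.
# ASSUMPTION: bank 0 is the lower 8KB of the RAM and bank 1 the upper 8KB.

INSTR_BASE, INSTR_END = 0x0000, 0x7FFF
DATA_BASE, DATA_END = 0x8000, 0xFFFF
RAM_SIZE = 16 * 1024
BANK_SIZE = 8 * 1024


def decode_tcm(addr: int):
    """Return (region, ram_offset, bank); offset and bank are None outside the RAM."""
    if INSTR_BASE <= addr <= INSTR_END:
        region, offset = "instruction", addr - INSTR_BASE
    elif DATA_BASE <= addr <= DATA_END:
        region, offset = "data", addr - DATA_BASE
    else:
        raise ValueError("address is outside the ARM internal memory regions")
    if offset >= RAM_SIZE:
        return region, None, None  # beyond the 16KB RAM; not modeled here
    return region, offset, offset // BANK_SIZE


def no_bank_conflict(instr_addr: int, data_addr: int) -> bool:
    """Return True when a fetch and a data access fall in different RAM banks."""
    _, _, instr_bank = decode_tcm(instr_addr)
    _, _, data_bank = decode_tcm(data_addr)
    return None not in (instr_bank, data_bank) and instr_bank != data_bank


if __name__ == "__main__":
    print(decode_tcm(0x0000))   # vector table location, instruction view
    print(decode_tcm(0x8000))   # same RAM byte, data view
    print(no_bank_conflict(0x0100, 0xA100))
```

Under these assumptions, placing code in one 8KB half and data in the other is what allows the simultaneous instruction and data accesses the text describes.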
2.3.6 Advanced High-Performance Bus (AHB)
The ARM Subsystem uses the AHB port of the ARM926EJ-S to connect the ARM to the Config bus and
the external memories. Arbiters are employed to arbitrate access to the separate D-AHB and I-AHB by the
Config Bus and the external memories bus.
2.3.7 Embedded Trace Macrocell (ETM) and Embedded Trace Buffer (ETB)
To support real-time trace, the ARM926EJ-S processor provides an interface to enable connection of an
Embedded Trace Macrocell (ETM). The ARM926EJ-S Subsystem in the DM6446 also includes the
Embedded Trace Buffer (ETB). The ETM consists of two parts:
•Trace Port provides real-time trace capability for the ARM9.
•Triggering facilities provide trigger resources, which include address and data comparators, counters,
and sequencers.
The DM6446 trace port is not pinned out and is instead only connected to the Embedded Trace Buffer.
The ETB has a 4KB buffer memory. ETB-enabled debug tools are required to read/interpret the captured
trace data.
2.3.8 ARM Memory Mapping
The ARM memory map is shown in Section 2.5, Memory Map Summary, of this document. The ARM has
access to memories shown in the following sections.
2.3.8.1 ARM Internal Memories
The ARM has access to the following ARM internal memories:
•16KB ARM Internal RAM on TCM interface, logically separated into two 8KB pages to allow
simultaneous access on any given cycle if there are separate accesses for code (I-TCM bus) and data
(D-TCM) to the different memory regions.
•8KB ARM Internal ROM
2.3.8.2 External Memories
The ARM has access to the following external memories:
The ARM9 has access to all of the peripherals on the DM6446 device with the exception of the VICP.
2.3.10 PLL Controller (PLLC)
The ARM Subsystem includes the PLL Controller. The PLL Controller contains a set of registers for
configuring DM6446’s two internal PLLs (PLL1 and PLL2). The PLL Controller provides the following
configuration and control:
•PLL Bypass Mode
•Set PLL multiplier parameters
•Set PLL divider parameters
•PLL power down
•Oscillator power down
The PLLs are briefly described in this document in Section 6.6, Clock PLLs. For more detailed information
on the PLLs and PLL Controller register descriptions, see the TMS320DM644x DMSoC ARM Subsystem Reference Guide (literature number SPRUE14).
2.3.11 Power and Sleep Controller (PSC)
The ARM Subsystem includes the Power and Sleep Controller (PSC). Through register settings
accessible by the ARM9, the PSC provides two levels of power savings: peripheral/module clock gating
and power domain shut-off. Brief details on the PSC are given in Section 6.3, Power Supplies. For more
detailed information and complete register descriptions for the PSC, see the TMS320DM644x DMSoC ARM Subsystem Reference Guide (literature number SPRUE14).
2.3.12 ARM Interrupt Controller (AINTC)
The ARM Interrupt Controller (AINTC) accepts device interrupts and maps them to either the ARM’s IRQ
(interrupt request) or FIQ (fast interrupt request). The ARM Interrupt Controller is briefly described in
Section 6.7, Interrupts, of this document. For detailed information on the ARM Interrupt Controller, see the
TMS320DM644x DMSoC ARM Subsystem Reference Guide (literature number SPRUE14).
2.3.13 System Module
The ARM Subsystem includes the System module. The System module consists of a set of registers for
configuring and controlling a variety of system functions. For details and register descriptions for the
System module, see Section 3, Device Configurations, and the TMS320DM644x DMSoC ARM Subsystem Reference Guide (literature number SPRUE14).
2.3.14 Power Management
DM6446 has several means of managing power consumption. There is extensive use of clock gating,
which reduces the power used by global device clocks and individual peripheral clocks. Clock
management can be utilized to reduce clock frequencies in order to reduce switching power. For more
details on power management techniques, see Section 3 (Device Configurations), Section 6 (Peripheral and Electrical Specifications), and the TMS320DM644x DMSoC ARM Subsystem Reference Guide
(literature number SPRUE14).
DM6446 gives the programmer full flexibility to use any and all of the previously mentioned capabilities to
customize an optimal power management strategy. Several typical power management scenarios are
described in the following sections.
The DSP Subsystem includes the following features:
•C64x+ DSP CPU
•32KB L1 Program (L1P)/Cache (up to 32KB)
•80KB L1 Data (L1D)/Cache (up to 32KB)
•64KB Unified Mapped RAM/Cache (L2)
•Little endian
2.4.1 C64x+ DSP CPU Description
The C64x+ Central Processing Unit (CPU) consists of eight functional units, two register files, and two
data paths as shown in Figure 2-1. The two general-purpose register files (A and B) each contain
32 32-bit registers for a total of 64 registers. The general-purpose registers can be used for data or can be
data address pointers. The data types supported include packed 8-bit data, packed 16-bit data, 32-bit
data, 40-bit data, and 64-bit data. Values larger than 32 bits, such as 40-bit-long or 64-bit-long values are
stored in register pairs, with the 32 LSBs of data placed in an even register and the remaining 8 or
32 MSBs in the next upper register (which is always an odd-numbered register).
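The register-pair convention described above can be illustrated with a short sketch. It treats values as unsigned bit patterns and uses hypothetical helper names; it does not model any particular instruction.

```python
# Splitting a 40-bit or 64-bit value across a register pair, as described in
# the text: the 32 LSBs go to the even register, the remaining 8 or 32 MSBs
# go to the next (odd) register. Helper names are illustrative assumptions.

MASK32 = 0xFFFFFFFF


def split_into_register_pair(value: int, width: int):
    """Return (even_register, odd_register) for an unsigned 40- or 64-bit value."""
    if width not in (40, 64):
        raise ValueError("width must be 40 or 64")
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the requested width")
    even = value & MASK32   # 32 least-significant bits
    odd = value >> 32       # remaining 8 or 32 most-significant bits
    return even, odd


def join_register_pair(even: int, odd: int) -> int:
    """Rebuild the wide value from its even/odd register halves."""
    return (odd << 32) | (even & MASK32)


if __name__ == "__main__":
    even, odd = split_into_register_pair(0x1122334455667788, 64)
    print(f"even = 0x{even:08X}, odd = 0x{odd:08X}")
```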
The eight functional units (.M1, .L1, .D1, .S1, .M2, .L2, .D2, and .S2) are each capable of executing one
instruction every clock cycle. The .M functional units perform all multiply operations. The .S and .L units
perform a general set of arithmetic, logical, and branch functions. The .D units primarily load data from
memory to the register file and store results from the register file into memory.
The C64x+ CPU extends the performance of the C64x core through enhancements and new features.
Each C64x+ .M unit can perform one of the following each clock cycle: one 32 x 32 bit multiply, one 16 x
32 bit multiply, two 16 x 16 bit multiplies, two 16 x 32 bit multiplies, two 16 x 16 bit multiplies with
add/subtract capabilities, four 8 x 8 bit multiplies, four 8 x 8 bit multiplies with add operations, and four
16 x 16 multiplies with add/subtract capabilities (including a complex multiply). There is also support for
Galois field multiplication for 8-bit and 32-bit data. Many communications algorithms such as FFTs and
modems require complex multiplication. The complex multiply (CMPY) instruction takes four 16-bit inputs
and produces a 32-bit real and a 32-bit imaginary output. There are also complex multiplies with rounding
capability that produce one 32-bit packed output that contains 16-bit real and 16-bit imaginary values. The
32 x 32 bit multiply instructions provide the extended precision necessary for audio and other
high-precision algorithms on a variety of signed and unsigned 32-bit data types.
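The arithmetic behind a complex multiply of four 16-bit inputs can be sketched as below. This shows only the mathematics, (a + jb)(c + jd) = (ac - bd) + j(ad + bc); it is not a model of the CMPY instruction itself. Operand packing, overflow behavior, and the rounding variants are not covered, and the function name is hypothetical.

```python
# Mathematical sketch of a complex multiply on signed 16-bit components.
# ASSUMPTION: plain integer arithmetic; overflow and saturation are not modeled.

INT16_MIN, INT16_MAX = -32768, 32767


def complex_multiply(a_re: int, a_im: int, b_re: int, b_im: int):
    """Return (real, imag) of (a_re + j*a_im) * (b_re + j*b_im)."""
    for v in (a_re, a_im, b_re, b_im):
        if not INT16_MIN <= v <= INT16_MAX:
            raise ValueError("inputs must be signed 16-bit values")
    real = a_re * b_re - a_im * b_im
    imag = a_re * b_im + a_im * b_re
    return real, imag


if __name__ == "__main__":
    print(complex_multiply(1, 2, 3, 4))
```

Each output is a sum of two 16 x 16 products, which is why the text describes 32-bit real and imaginary results.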
The .L unit (Arithmetic Logic Unit) now incorporates the ability to do parallel add/subtract operations on a
pair of common inputs. Versions of this instruction exist to work on 32-bit data or on pairs of 16-bit data
performing dual 16-bit add and subtracts in parallel. There are also saturated forms of these instructions.
The C64x+ core enhances the .S unit in several ways. In the C64x core, dual 16-bit MIN2 and MAX2
comparisons were only available on the .L units. On the C64x+ core they are also available on the .S unit
which increases the performance of algorithms that do searching and sorting. Finally, to increase data
packing and unpacking throughput, the .S unit allows sustained high performance for the quad 8-bit/16-bit
and dual 16-bit instructions. Unpack instructions prepare 8-bit data for parallel 16-bit operations. Pack
instructions return parallel results to output precision including saturation support.
•SPLOOP - A small instruction buffer in the CPU that aids in creation of software pipelining loops where
multiple iterations of a loop are executed in parallel. The SPLOOP buffer reduces the code size
associated with software pipelining. Furthermore, loops in the SPLOOP buffer are fully interruptible.
•Compact Instructions - The native instruction size for the C6000 devices is 32 bits. Many common
instructions such as MPY, AND, OR, ADD, and SUB can be expressed as 16 bits if the C64x+
compiler can restrict the code to use certain registers in the register file. This compression is
performed by the code generation tools.
•Instruction Set Enhancement - As noted above, there are new instructions such as 32-bit
multiplications, complex multiplications, packing, sorting, bit manipulation, and 32-bit Galois field
multiplication.
•Exceptions Handling - Intended to aid the programmer in isolating bugs. The C64x+ CPU is able to
detect and respond to exceptions, both from internally detected sources (such as illegal op-codes) and
from system events (such as a watchdog timer expiration).
•Privilege - Defines user and supervisor modes of operation, allowing the operating system to give a
basic level of protection to sensitive resources. Local memory is divided into multiple pages, each with
read, write, and execute permissions.
•Time-Stamp Counter - Primarily targeted for Real-Time Operating System (RTOS) robustness, a
free-running time-stamp counter is implemented in the CPU which is not sensitive to system stalls.
For more details on the C64x+ CPU and its enhancements over the C64x architecture, see the following
documents:
•TMS320C64x/C64x+ DSP CPU and Instruction Set Reference Guide (literature number SPRU732)
•TMS320C64x Technical Overview (literature number SPRU395)
A. On .M unit, dst2 is 32 MSB.
B. On .M unit, dst1 is 32 LSB.
C. On C64x CPU .M unit, src2 is 32 bits; on C64x+ CPU .M unit, src2 is 64 bits.
D. On .L and .S units, odd dst connects to odd register files and even dst connects to even register files.
Figure 2-1. TMS320C64x+™ CPU (DSP Core) Data Paths
The DSP memory map is shown in Section 2.5, Memory Map Summary. Configuration of the control
registers for DDR2, EMIFA, and ARM Internal RAM is supported by the ARM. The DSP has access to
memories shown in the following sections.
2.4.2.1 ARM Internal Memories
The DSP has access to the 16KB ARM Internal RAM on the ARM D-TCM interface (i.e., data only).
2.4.2.2 External Memories
The DSP has access to the following External memories:
•DDR2 Synchronous DRAM
•Asynchronous EMIF / NOR Flash
2.4.2.3 DSP Internal Memories
The DSP has access to the following DSP memories:
•L2 RAM
•L1P RAM
•L1D RAM
2.4.2.4 C64x+ CPU
The C64x+ core uses a two-level cache-based architecture. The Level 1 Program cache (L1P) is a 32-KB
direct-mapped cache and the Level 1 Data cache (L1D) is an 80-KB 2-way set-associative cache. The Level 2
memory/cache (L2) consists of a 64-KB memory space that is shared between program and data space.
L2 memory can be configured as mapped memory, cache, or a combination of both.
Table 2-2 shows a memory map of the C64x+ CPU cache registers for the device.
Memory Attribute Registers for EMIFA/VLYNQ Shadow 0x4200 0000 0x4FFF FFFF
2.4.3 Peripherals
The DSP has controllability for the following peripherals:
•VICP
•EDMA3
•ASP
•2 Timers (Timer 0 and Timer 1) that can each be configured as 1 64-bit or 2 32-bit timers
2.4.4 DSP Interrupt Controller
The DSP Interrupt Controller accepts device interrupts and appropriately maps them to the DSP’s
available interrupts. The DSP Interrupt Controller is briefly described in Section 6.7, Interrupts, of this
document. For more detailed information on the DSP Interrupt Controller, see the TMS320C64x/C64x+ DSP CPU and Instruction Set Reference Guide (literature number SPRU732).
2.5 Memory Map Summary
Table 2-3 shows the memory map address ranges of the device. Table 2-4 depicts the expanded map of
the Configuration Space (0x0180 0000 through 0x0FFF FFFF). The device has multiple on-chip memories
associated with its two processors and various subsystems. To help simplify software development a
unified memory map is used where possible to maintain a consistent view of device resources across all
bus masters.
| 0x2000 0000 | 0x2000 7FFF | 32K | DDR2 Control Registers | DDR2 Control Registers | DDR2 Control Registers | DDR2 Control Registers |
| 0x2000 8000 | 0x41FF FFFF | 544M-32k | Reserved | Reserved | Reserved |
(1) HPI's access to the configuration bus peripherals is limited to the power and sleep controller registers, PLL1 and PLL2 registers, and
HPI configuration registers.
(2) EMIFA shadow memory starting at 0x4200 0000 is physically the same memory as location 0x0200 0000. Memory range 0x0200 0000
through 0x09FF FFFF should only be used by C64x+ for data accesses. Memory range 0x4200 0000 through 0x4FFF FFFF can be
used by C64x+ for both code execution and data accesses.
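The aliasing described in footnote (2) can be pictured as a fixed offset between the two address windows. The following is a minimal C sketch, assuming the alias is a linear 1:1 mapping (shadow address = data-window address + 0x4000 0000) over the range 0x0200 0000 through 0x09FF FFFF. The macro and function names are hypothetical and do not come from any TI header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Address ranges quoted in footnote (2); all names are illustrative only. */
#define EMIFA_DATA_BASE    0x02000000u  /* C64x+ data accesses only          */
#define EMIFA_DATA_END     0x09FFFFFFu
#define EMIFA_SHADOW_BASE  0x42000000u  /* C64x+ code execution and data     */

/* True if addr lies inside the data-only EMIFA window. */
static bool emifa_in_data_window(uint32_t addr)
{
    return addr >= EMIFA_DATA_BASE && addr <= EMIFA_DATA_END;
}

/* Translate a data-window address to its shadow alias.
 * ASSUMPTION: the alias is linear, i.e. a constant offset of
 * EMIFA_SHADOW_BASE - EMIFA_DATA_BASE = 0x40000000. */
static uint32_t emifa_to_shadow(uint32_t addr)
{
    assert(emifa_in_data_window(addr));
    return addr + (EMIFA_SHADOW_BASE - EMIFA_DATA_BASE);
}
```

Under that assumption, code that must execute from EMIFA would be linked or referenced through the shadow addresses, while plain data accesses could use either window.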
Pin multiplexing is used extensively to accommodate the largest number of peripheral functions in
the smallest possible package. Pin multiplexing is controlled using a combination of hardware
configuration at device reset and software programmable register settings. For more information on pin
muxing, see Section 3.5.2, Multiplexed Pin Configurations, of this document.
2.6.1 Pin Map (Bottom View)
Figure 2-2 through Figure 2-5 show the bottom view of the package pin assignments in four quadrants (A,
The terminal functions tables (Table 2-5 through Table 2-30) identify the external signal names, the
associated pin (ball) numbers along with the mechanical package designator, the pin type, whether the pin
has any internal pullup or pulldown resistors, and a functional pin description. For more detailed
information on device configuration, peripheral selection, and multiplexed/shared pins, see Section 3, Device Configurations, of this data manual.
Table 2-5. BOOT Terminal Functions

| SIGNAL NAME | NO. | TYPE (1) | OTHER (2) (3) | DESCRIPTION |
|---|---|---|---|---|
| BOOT | | | | |
| COUT0/B3/BTSEL0 | A16 | I/O/Z | IPD, DVDD18 | These pins are multiplexed between ARM boot mode and the VPBE. At reset, the boot mode inputs BTSEL0 and BTSEL1 are sampled to determine the ARM boot configuration. See below for the boot modes set by these inputs. See Section 3.3, Bootmode, for more details. After reset, these are video encoder outputs COUT0 and COUT1, or RGB666/888 Blue output data bits 3 and 4 B3/B4. |
| COUT1/B4/BTSEL1 | B16 | I/O/Z | IPD, DVDD18 | |
| COUT2/B5/EM_WIDTH | A17 | I/O/Z | IPD, DVDD18 | This pin is multiplexed between EMIFA and the VPBE. At reset, the input state is sampled to set the EMIFA data bus width (EM_WIDTH). For an 8-bit-wide EMIFA data bus, EM_WIDTH = 0. For a 16-bit-wide EMIFA data bus, EM_WIDTH = 1. After reset, it is video encoder output COUT2 or RGB666/888 Blue output data bit 5 B5. |
| COUT3/B6/DSP_BT | B17 | I/O/Z | IPD, DVDD18 | This pin is multiplexed between DSP boot and the VPBE. At reset, the input state is sampled to set the DSP boot source DSP_BT. The DSP is booted by the ARM when DSP_BT=0. The DSP boots from EMIFA when DSP_BT=1. After reset, it is video encoder output COUT3 or RGB666/888 Blue data bit 6 output B6. |
| YOUT0/G5/AEAW0 | D15 | I/O/Z | IPD, DVDD18 | These pins are multiplexed between EMIFA and the VPBE. At reset, the input states of AEAW[4:0] are sampled to set the EMIFA address bus width. See Section 3.4.2, Peripheral Selection at Device Reset, for details. After reset, these are video encoder outputs YOUT[0:4] or RGB666/888 Red and Green data bit outputs G5, G6, G7, R3, and R4. |
| YOUT1/G6/AEAW1 | D16 | I/O/Z | IPD, DVDD18 | |
| YOUT2/G7/AEAW2 | D17 | I/O/Z | IPD, DVDD18 | |
| YOUT3/R3/AEAW3 | D18 | I/O/Z | IPD, DVDD18 | |
| YOUT4/R4/AEAW4 | E15 | I/O/Z | IPD, DVDD18 | |

| BTSEL1 | BTSEL0 | ARM Boot Mode |
|---|---|---|
| 0 | 0 | ARM ROM Boot (NAND, SPI) [default] |
| 0 | 1 | ARM EMIFA Boot (NOR) |
| 1 | 0 | ARM ROM Boot (HPI) |
| 1 | 1 | ARM ROM Boot (UART0) |

(1) I = Input, O = Output, Z = High impedance, S = Supply voltage, GND = Ground, A = Analog signal
(2) IPD = Internal pulldown, IPU = Internal pullup. (To pull up a signal to the opposite supply rail, a 1-kΩ resistor should be used.)
(3) Specifies the operating I/O supply voltage for each signal
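The reset-time pin encodings in Table 2-5 can be summarized as a small decoder. The following is a minimal C sketch, assuming the four pin levels have already been obtained as 0 or 1 by some means not shown here. The type and function names are hypothetical and are not part of any TI software library.

```c
#include <assert.h>
#include <stdbool.h>

/* ARM boot modes selected by BTSEL1:BTSEL0, per Table 2-5. */
typedef enum {
    ARM_BOOT_ROM_NAND_SPI = 0, /* BTSEL1:BTSEL0 = 00 (default) */
    ARM_BOOT_EMIFA_NOR    = 1, /* BTSEL1:BTSEL0 = 01           */
    ARM_BOOT_ROM_HPI      = 2, /* BTSEL1:BTSEL0 = 10           */
    ARM_BOOT_ROM_UART0    = 3  /* BTSEL1:BTSEL0 = 11           */
} arm_boot_mode_t;

/* Decoded view of the boot-related pins (illustrative structure). */
typedef struct {
    arm_boot_mode_t arm_boot;    /* from BTSEL1 and BTSEL0                   */
    unsigned emifa_bus_bits;     /* 8 when EM_WIDTH = 0, 16 when EM_WIDTH = 1 */
    bool dsp_boots_from_emifa;   /* true when DSP_BT = 1; false means the
                                    ARM boots the DSP                        */
} boot_config_t;

/* Decode the pin levels (each 0 or 1) sampled at reset. */
static boot_config_t decode_boot_pins(unsigned btsel1, unsigned btsel0,
                                      unsigned em_width, unsigned dsp_bt)
{
    assert(btsel1 <= 1u && btsel0 <= 1u && em_width <= 1u && dsp_bt <= 1u);

    boot_config_t cfg;
    cfg.arm_boot = (arm_boot_mode_t)((btsel1 << 1) | btsel0);
    cfg.emifa_bus_bits = em_width ? 16u : 8u;
    cfg.dsp_boots_from_emifa = (dsp_bt != 0u);
    return cfg;
}
```

Because all of these pins have internal pulldowns (IPD), leaving them unconnected would correspond to `decode_boot_pins(0, 0, 0, 0)`: ARM ROM boot, an 8-bit EMIFA data bus, and the DSP booted by the ARM.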
| TYPE (1) | OTHER (2) | DESCRIPTION |
|---|---|---|
| OSCILLATOR, PLL | | |
| | DVDD18 | Crystal input MXI for MX oscillator (system oscillator, typically 27 MHz). If a crystal input is not used, but instead a physical clock-in source is supplied, this is the external oscillator clock input. |
| | DVDD18 | Crystal output for MX oscillator. If a crystal input is not used, but instead a physical clock-in source is supplied, MXO should be left as a No Connect. |
| | (3) | 1.8-V power supply for MX oscillator. If a crystal input is not used, but instead a physical clock-in source is supplied, MXVDD should still be connected to the 1.8-V power supply. |
| | (3) | Ground for MX oscillator. If a crystal input is not used, but instead a physical clock-in source is supplied, MXVSS should still be connected to ground. |
| | DVDD18 | Crystal input for M24 oscillator (24 MHz for USB). If a crystal input is not used, but instead a physical clock-in source is supplied, this is the external oscillator clock input. When the USB peripheral is not used, M24XI should be left as a No Connect. |
| | DVDD18 | Crystal output for M24 oscillator. If a crystal input is not used, but instead a physical clock-in source is supplied, M24XO should be left as a No Connect. When the USB peripheral is not used, M24XO should be left as a No Connect. |
| | (3) | 1.8-V power supply for M24 oscillator. If a crystal input is not used, but instead a physical clock-in source is supplied, M24VDD should still be connected to the 1.8-V power supply. When the USB peripheral is not used, M24VDD should be connected to the 1.8-V power supply. |
| | (3) | Ground for M24 oscillator. If a crystal input is not used, but instead a physical clock-in source is supplied, M24VSS should still be connected to ground. When the USB peripheral is not used, M24VSS should be connected to ground. |
| | (3) | 1.8-V power supply for PLLs (system). |

(1) I = Input, O = Output, Z = High impedance, S = Supply voltage, GND = Ground, A = Analog signal
(2) Specifies the operating I/O supply voltage for each signal
(3) For more information, see Section 5.2, Recommended Operating Conditions.
Table 2-7. Clock Generator Terminal Functions

| SIGNAL NAME | NO. | TYPE (1) | OTHER (2) | DESCRIPTION |
|---|---|---|---|---|
| CLOCK GENERATOR | | | | |
| CLK_OUT0/GPIO48 | K1 | I/O/Z | DVDD18 | This pin is multiplexed between the PLL1 clock generator and GPIO. For the PLL1 clock generator, it is clock output CLK_OUT0. This is configurable for 13.5 MHz or 27 MHz clock outputs. |
| CLK_OUT1/TIM_IN/GPIO49 | E19 | I/O/Z | DVDD18 | This pin is multiplexed between the USB clock generator, timer, and GPIO. For the USB clock generator, it is clock output CLK_OUT1. This is configurable for 12 MHz or 24 MHz clock outputs. |

(1) I = Input, O = Output, Z = High impedance, S = Supply voltage, GND = Ground, A = Analog signal
(2) Specifies the operating I/O supply voltage for each signal
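As a quick illustration of the selectable rates listed in Table 2-7, the following C sketch checks whether a requested frequency is one of the two options documented for each clock output. The enum and function names are hypothetical. How the rate is actually selected in hardware is not covered by this table and is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The two clock outputs described in Table 2-7 (illustrative names). */
typedef enum { CLK_OUT0, CLK_OUT1 } clk_out_t;

/* True if freq_hz is one of the rates the table lists for this output:
 * CLK_OUT0: 13.5 MHz or 27 MHz; CLK_OUT1: 12 MHz or 24 MHz. */
static bool clk_out_rate_supported(clk_out_t out, uint32_t freq_hz)
{
    switch (out) {
    case CLK_OUT0:
        return freq_hz == 13500000u || freq_hz == 27000000u;
    case CLK_OUT1:
        return freq_hz == 12000000u || freq_hz == 24000000u;
    }
    assert(!"unknown clock output"); /* not reached for valid enum values */
    return false;
}
```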
Table 2-8. RESET and JTAG Terminal Functions

| SIGNAL NAME | NO. | TYPE (1) | OTHER (2) (3) | DESCRIPTION |
|---|---|---|---|---|
| RESET | L4 | I | | This is the active low global reset input. |
| TMS | E6 | I | | JTAG test-port mode select input |
| TDO | B5 | O/Z | –, DVDD18 | JTAG test-port data output |
| TDI | A5 | I | IPU, DVDD18 | JTAG test-port data input |
| TCK | A6 | I | IPU, DVDD18 | JTAG test-port clock input |
| RTCK | B6 | O/Z | –, DVDD18 | JTAG test-port return clock output |
| TRST | D7 | I | IPD, DVDD18 | JTAG test-port reset. For IEEE 1149.1 JTAG compatibility, see the IEEE 1149.1 JTAG compatibility statement portion of this data manual (Section 6.25, IEEE 1149.1 JTAG). |
| EMU1 | C6 | I/O/Z | IPU, DVDD18 | Emulation pin 1 |
| EMU0 | D6 | I/O/Z | IPU, DVDD18 | Emulation pin 0 |

(1) I = Input, O = Output, Z = High impedance, S = Supply voltage, GND = Ground, A = Analog signal
(2) IPD = Internal pulldown, IPU = Internal pullup. (To pull up a signal to the opposite supply rail, a 1-kΩ resistor should be used.)
(3) Specifies the operating I/O supply voltage for each signal
Table 2-9. EMIFA Terminal Functions

| SIGNAL NAME | NO. | TYPE (1) | OTHER (2) (3) | DESCRIPTION |
|---|---|---|---|---|
| EMIFA BOOT CONFIGURATION | | | | |
| COUT2/B5/EM_WIDTH | A17 | I/O/Z | IPD, DVDD18 | This pin is multiplexed between EMIFA and the VPBE. At reset, the input state is sampled to set the EMIFA data bus width (EM_WIDTH). For an 8-bit-wide EMIFA data bus, EM_WIDTH = 0. For a 16-bit-wide EMIFA data bus, EM_WIDTH = 1. After reset, it is video encoder output COUT2 or RGB666/888 Blue output data bit 5 B5. |
| COUT3/B6/DSP_BT | B17 | I/O/Z | IPD, DVDD18 | This pin is multiplexed between DSP boot and the VPBE. At reset, the input state is sampled to set the DSP boot source DSP_BT. The DSP is booted by the ARM when DSP_BT=0. The DSP boots from EMIFA when DSP_BT=1. After reset, it is video encoder output COUT3 or RGB666/888 Blue data bit 6 output B6. |
| YOUT0/G5/AEAW0 | D15 | I/O/Z | IPD, DVDD18 | These pins are multiplexed between EMIFA and the VPBE. At reset, the input states of AEAW[4:0] are sampled to set the EMIFA address bus width. See Section 3.4.2, Peripheral Selection at Device Reset, for details. After reset, these are video encoder outputs YOUT[0:4] or RGB666/888 Red and Green data bit outputs G5, G6, G7, R3, and R4. |
| YOUT1/G6/AEAW1 | D16 | I/O/Z | IPD, DVDD18 | |
| YOUT2/G7/AEAW2 | D17 | I/O/Z | IPD, DVDD18 | |
| YOUT3/R3/AEAW3 | D18 | I/O/Z | IPD, DVDD18 | |
| YOUT4/R4/AEAW4 | E15 | I/O/Z | IPD, DVDD18 | |
| EMIFA FUNCTIONAL PINS: ASYNC / NOR | | | | |
| EM_CS2/HCS | C2 | I/O/Z | DVDD18 | This pin is multiplexed between EMIFA and HPI. For EMIFA, this pin is Chip Select 2 output EM_CS2 for use with asynchronous memories (i.e., NOR flash) or NAND flash. This is the chip select for the default boot and ROM boot modes. |
| EM_CS3 | B1 | I/O/Z | DVDD18 | For EMIFA, this pin is Chip Select 3 output EM_CS3 for use with asynchronous memories (i.e., NOR flash) or NAND flash. |

(1) I = Input, O = Output, Z = High impedance, S = Supply voltage, GND = Ground, A = Analog signal
(2) IPD = Internal pulldown, IPU = Internal pullup. (To pull up a signal to the opposite supply rail, a 1-kΩ resistor should be used.)
(3) Specifies the operating I/O supply voltage for each signal
| SIGNAL NAME | NO. | TYPE (1) | OTHER (2) (3) | DESCRIPTION |
|---|---|---|---|---|
| EM_CS4/GPIO9/VLYNQ_SCRUN | T2 | I/O/Z | DVDD18 | This pin is multiplexed between EMIFA, GPIO, and VLYNQ. For EMIFA, it is Chip Select 4 output EM_CS4 for use with asynchronous memories (i.e., NOR flash) or NAND flash. |
| EM_CS5/GPIO8/VLYNQ_CLOCK | T1 | I/O/Z | DVDD18 | This pin is multiplexed between EMIFA, GPIO, and VLYNQ. For EMIFA, it is Chip Select 5 output EM_CS5 for use with asynchronous memories (i.e., NOR flash) or NAND flash. |
| EM_R/W/INTRQ/HR/W | G3 | I/O/Z | DVDD18 | This pin is multiplexed between EMIFA, ATA/CF, and HPI. For EMIFA, it is read/write output EM_R/W. |
| EM_WAIT/(RDY/BSY)/IORDY/HRDY | F1 | I/O/Z | IPU, DVDD18 | This pin is multiplexed between EMIFA (NAND/SmartMedia/xD), ATA/CF, and HPI. For EMIFA, it is wait state extension input EM_WAIT. |
| EM_OE/(RE)/(IORD)/DIOR/HDS1 | H4 | I/O/Z | DVDD18 | This pin is multiplexed between EMIFA (NAND/SmartMedia/xD), ATA/CF, and HPI. For EMIFA, it is output enable output EM_OE. |
| EM_WE/(WE)/(IOWR)/DIOW/HDS2 | G2 | I/O/Z | DVDD18 | This pin is multiplexed between EMIFA (NAND/SmartMedia/xD), ATA/CF, and HPI. For NAND/SmartMedia/xD or EMIFA, it is write enable output EM_WE. |
| EM_BA[0]/DA0/HINT | J3 | I/O/Z | IPD, DVDD18 | This pin is multiplexed between EMIFA, ATA/CF, and HPI. For EMIFA, this is the Bank Address 0 output (EM_BA[0]). When connected to an 8-bit asynchronous memory, this pin is the lowest order bit of the byte address. When connected to a 16-bit asynchronous memory, this pin has the same function as EMIF address pin 22 (EM_A[22]). |
| EM_BA[1]/DA1/GPIO52 | H2 | I/O/Z | DVDD18 | This pin is multiplexed between EMIFA, ATA/CF, and GPIO. For EMIFA, this is the Bank Address 1 output EM_BA[1]. When connected to a 16-bit asynchronous memory, this pin is the lowest order bit of the 16-bit word address. When connected to an 8-bit asynchronous memory, this pin is the 2nd bit of the address. |
| EM_A[21]/GPIO10/VLYNQ_TXD0 | T3 | I/O/Z | DVDD18 | This pin is multiplexed between EMIFA, GPIO, and VLYNQ. For EMIFA, it is address bit 21 output EM_A[21]. |
| EM_A[20]/GPIO11/VLYNQ_RXD0 | R3 | I/O/Z | DVDD18 | This pin is multiplexed between EMIFA, GPIO, and VLYNQ. For EMIFA, it is address bit 20 output EM_A[20]. |
| EM_A[19]/GPIO12/VLYNQ_TXD1 | R4 | I/O/Z | DVDD18 | This pin is multiplexed between EMIFA, GPIO, and VLYNQ. For EMIFA, it is address bit 19 output EM_A[19]. |
| EM_A[18]/GPIO13/VLYNQ_RXD1 | P5 | I/O/Z | DVDD18 | This pin is multiplexed between EMIFA, GPIO, and VLYNQ. For EMIFA, it is address bit 18 output EM_A[18]. |
| EM_A[17]/GPIO14/VLYNQ_TXD2 | R2 | I/O/Z | DVDD18 | This pin is multiplexed between EMIFA, GPIO, and VLYNQ. For EMIFA, it is address bit 17 output EM_A[17]. |
| EM_A[16]/GPIO15/VLYNQ_RXD2 | R5 | I/O/Z | DVDD18 | This pin is multiplexed between EMIFA, GPIO, and VLYNQ. For EMIFA, it is address bit 16 output EM_A[16]. |
| EM_A[15]/GPIO16/VLYNQ_TXD3 | P3 | I/O/Z | DVDD18 | This pin is multiplexed between EMIFA, GPIO, and VLYNQ. For EMIFA, it is address bit 15 output EM_A[15]. |
| EM_A[14]/GPIO17/VLYNQ_RXD3 | P4 | I/O/Z | DVDD18 | This pin is multiplexed between EMIFA, GPIO, and VLYNQ. For EMIFA, it is address bit 14 output EM_A[14]. |