•Six ALUs (32-/40-Bit), Each Supports
Single 32-Bit, Dual 16-Bit, or Quad 8-Bit
Arithmetic per Clock Cycle
•Two Multipliers Support Four 16 x 16-Bit
Multiplies (32-Bit Results) per Clock
Cycle or Eight 8 x 8-Bit Multiplies (16-Bit
Results) per Clock Cycle
– Load-Store Architecture With Non-Aligned
Support
– 64 32-Bit General-Purpose Registers
– Instruction Packing Reduces Code Size
– All Instructions Conditional
– Additional C64x+™ Enhancements
•Protected Mode Operation
•Exceptions Support for Error Detection
and Program Redirection
•Hardware Support for Modulo Loop
Operation
• C64x+ Instruction Set Features
– Byte-Addressable (8-/16-/32-/64-Bit Data)
– 8-Bit Overflow Protection
– Bit-Field Extract, Set, Clear
– Normalization, Saturation, Bit-Counting
– Compact 16-Bit Instructions
– Additional Instructions to Support Complex
Multiplies
• C64x+ L1/L2 Memory Architecture
– 32K-Byte L1P Program RAM/Cache (Direct
Mapped)
– 80K-Byte L1D Data RAM/Cache (2-Way
Set-Associative)
– 64K-Byte L2 Unified Mapped RAM/Cache
(Flexible RAM/Cache Allocation)
• ARM926EJ-S Core
– Support for 32-Bit and 16-Bit (Thumb®
Mode) Instruction Sets
– DSP Instruction Extensions and Single Cycle
MAC
– ARM® Jazelle® Technology
– Embedded ICE-RT™ Logic for Real-Time
Debug
• ARM9 Memory Architecture
– 16K-Byte Instruction Cache
– 8K-Byte Data Cache
– 16K-Byte RAM
– 8K-Byte ROM
• Embedded Trace Buffer™ (ETB11™) With 4KB
Memory for ARM9 Debug
• Endianness: Little Endian for ARM and DSP
• Video Imaging Co-Processor (VICP)
• Video Processing Subsystem
– Front End Provides:
•CCD and CMOS Imager Interface
•BT.601/BT.656 Digital YCbCr 4:2:2
(8-/16-Bit) Interface
•Preview Engine for Real-Time Image
Processing
•Glueless Interface to Common Video
Decoders
•Histogram Module
•Auto-Exposure, Auto-White Balance, and
Auto-Focus Module
•Resize Engine
– Resize Images From 1/4x to 4x
– Separate Horizontal/Vertical Control
Please be aware that an important notice concerning availability, standard warranty, and use in critical applications of Texas
Instruments semiconductor products and disclaimers thereto appears at the end of this data sheet.
All trademarks are the property of their respective owners.
PRODUCTION DATA information is current as of publication date.
Products conform to specifications per the terms of the Texas
Instruments standard warranty. Production processing does not
necessarily include testing of all parameters.
• Audio Serial Port (ASP)
– I2S
– AC97 Audio Codec Interface
– Standard Voice Codec Interface (AIC12)
(Progressive)
•Digital Output
– 8-/16-bit YUV or up to 24-Bit RGB
– HD Resolution
– Up to Two Video Windows
• 10/100 Mb/s Ethernet MAC (EMAC)
– IEEE 802.3 Compliant
– Media Independent Interface (MII)
• VLYNQ™ Interface (FPGA Interface)
• Host Port Interface (HPI) with 16-Bit
The TMS320DM6441 (also referenced as DM6441) leverages TI’s DaVinci™ technology to meet the
networked media encode and decode application processing needs of next-generation embedded devices.
The DM6441 enables OEMs and ODMs to quickly bring to market devices featuring robust operating
systems support, rich user interfaces, high processing performance, and long battery life through the
maximum flexibility of a fully integrated mixed processor solution.
The dual-core architecture of the DM6441 provides benefits of both DSP and Reduced Instruction Set
Computer (RISC) technologies, incorporating a high-performance TMS320C64x+ DSP core and an
ARM926EJ-S core.
The ARM926EJ-S is a 32-bit RISC processor core that performs 32-bit or 16-bit instructions and
processes 32-bit, 16-bit, or 8-bit data. The core uses pipelining so that all parts of the processor and
memory system can operate continuously.
The ARM core incorporates:
•A coprocessor 15 (CP15) and protection module
•Data and program memory management units (MMUs) with translation look-aside buffers.
•Separate 16K-byte instruction and 8K-byte data caches. Both are four-way associative with virtual
index virtual tag (VIVT).
The TMS320C64x+™ DSPs are the highest-performance fixed-point DSP generation in the
TMS320C6000™ DSP platform. The C64x+ is based on an enhanced version of the second-generation
high-performance, advanced very-long-instruction-word (VLIW) architecture developed by Texas
Instruments (TI), making these DSP cores an excellent choice for digital media applications. The C64x is a
code-compatible member of the C6000™ DSP platform. The TMS320C64x+ DSP is an enhancement of
the C64x DSP with added functionality and an expanded instruction set.
SPRS359E–SEPTEMBER 2006–REVISED AUGUST 2010
Any reference to the C64x DSP or C64x CPU also applies, unless otherwise noted, to the C64x+ DSP and
C64x+ CPU, respectively.
With performance of up to 4104 million instructions per second (MIPS) at a clock rate of 513 MHz, the
C64x+ core offers solutions to high-performance DSP programming challenges. The DSP core possesses
the operational flexibility of high-speed controllers and the numerical capability of array processors. The
C64x+ DSP core processor has 64 general-purpose registers of 32-bit word length and eight highly
independent functional units—two multipliers for a 32-bit result and six arithmetic logic units (ALUs). The
eight functional units include instructions to accelerate the performance in video and imaging applications.
The DSP core can produce four 16-bit multiply-accumulates (MACs) per cycle for a total of 2052 million
MACs per second (MMACS), or eight 8-bit MACs per cycle for a total of 4104 MMACS. For more details
on the C64x+ DSP, see the TMS320C64x/C64x+ DSP CPU and Instruction Set Reference Guide
(literature number SPRU732).
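The peak figures quoted above follow directly from the clock rate and the number of operations issued per cycle. The short sketch below reproduces that arithmetic; the helper name `peak_rate` is illustrative only, and the results are theoretical peaks, not sustained throughput.

```python
# Figures taken from the text above; names are illustrative, not TI identifiers.
CLOCK_MHZ = 513          # DSP clock rate quoted in the text
FUNCTIONAL_UNITS = 8     # each unit can issue one instruction per cycle


def peak_rate(per_cycle: int, clock_mhz: int = CLOCK_MHZ) -> int:
    """Return a peak rate in millions of operations per second."""
    return per_cycle * clock_mhz


peak_mips = peak_rate(FUNCTIONAL_UNITS)   # eight instructions per cycle
mmacs_16bit = peak_rate(4)                # four 16-bit MACs per cycle
mmacs_8bit = peak_rate(8)                 # eight 8-bit MACs per cycle

print(peak_mips, mmacs_16bit, mmacs_8bit)  # 4104 2052 4104
```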
The DM6441 also has application-specific hardware logic, on-chip memory, and additional on-chip
peripherals similar to the other C6000 DSP platform devices. The DM6441 core uses a two-level
cache-based architecture. The Level 1 program cache (L1P) is a 256K-bit direct mapped cache and the
Level 1 data cache (L1D) is a 640K-bit 2-way set-associative cache. The Level 2 memory/cache (L2)
consists of a 512K-bit memory space that is shared between program and data space. L2 memory can
be configured as mapped memory, cache, or combinations of the two.
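The memory sizes in this paragraph are given in K-bits, while the feature list and the DSP subsystem description use K-bytes. As a quick cross-check, the conversion can be sketched as follows (the dictionary and function names are illustrative):

```python
# Sizes in K-bits as quoted in the paragraph above.
SIZES_KBIT = {"L1P": 256, "L1D": 640, "L2": 512}


def kbits_to_kbytes(kbits: int) -> int:
    """Convert a size in K-bits to K-bytes (8 bits per byte)."""
    return kbits // 8


sizes_kbyte = {name: kbits_to_kbytes(k) for name, k in SIZES_KBIT.items()}
print(sizes_kbyte)  # {'L1P': 32, 'L1D': 80, 'L2': 64}
```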
The peripheral set includes: two configurable video ports; a 10/100 Mb/s Ethernet MAC (EMAC) with a
management data input/output (MDIO) module; an inter-integrated circuit (I2C) bus interface; one audio
serial port (ASP); two 64-bit general-purpose timers each configurable as two independent 32-bit timers;
one 64-bit watchdog timer; up to 71 pins of general-purpose input/output (GPIO) with programmable
interrupt/event generation modes, multiplexed with other peripherals; three UARTs with hardware
handshaking support on one UART; three pulse width modulator (PWM) peripherals; and two external
memory interfaces: an asynchronous external memory interface (EMIFA) for slower memories/peripherals,
and a higher speed synchronous memory interface for DDR2.
The DM6441 device includes a video processing subsystem (VPSS) with two configurable video/imaging
peripherals: one video processing front-end (VPFE) input used for video capture, one video processing
back-end (VPBE) output with imaging coprocessor (VICP) used for display.
The video processing front-end (VPFE) consists of a CCD controller (CCDC), a preview engine
(previewer), histogram module, auto-exposure/white balance/focus module (H3A), and resizer. The CCDC
is capable of interfacing to common video decoders, CMOS sensors, and charge coupled devices (CCDs).
The previewer is a real-time image processing engine that takes raw imager data from a CMOS sensor or
CCD and converts from an RGB Bayer pattern to YUV4:2:2. The histogram and H3A modules provide
statistical information on the raw color data for use by the DM6441. The resizer accepts image data for
separate horizontal and vertical resizing from 1/4x to 4x in increments of 256/N, where N is between 64
and 1024.
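The 256/N relationship can be illustrated with a small sketch. It assumes only what the text states (ratio = 256/N, with N from 64 to 1024, set separately for each axis); the function names and the truncating rounding of the output size are assumptions for illustration, not a description of the hardware.

```python
from fractions import Fraction


def resize_ratio(n: int) -> Fraction:
    """Return the resize ratio 256/N for a valid N (64..1024)."""
    if not 64 <= n <= 1024:
        raise ValueError("N must be between 64 and 1024")
    return Fraction(256, n)


def output_size(width: int, height: int, n_h: int, n_v: int) -> tuple:
    """Apply independent horizontal and vertical ratios (truncation assumed)."""
    return int(width * resize_ratio(n_h)), int(height * resize_ratio(n_v))


print(resize_ratio(64), resize_ratio(1024))   # 4 1/4
print(output_size(720, 480, 512, 256))        # (360, 480)
```

The two extremes of N give the 4x and 1/4x limits quoted in the text, and N = 256 leaves an axis unscaled.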
The video processing back-end (VPBE) consists of an on-screen display engine (OSD) and a video
encoder (VENC). The OSD engine is capable of handling two separate video windows and two separate
OSD windows. Other configurations include two video windows, one OSD window, and one attribute
window allowing up to eight levels of alpha blending. The VENC provides four analog DACs that run at
54 MHz, providing a means for composite NTSC/PAL video, S-Video, and/or component video output. The
VENC also provides up to 24 bits of digital output to interface to RGB888 devices. The digital output is
capable of 8/16-bit BT.656 output and/or CCIR.601 with separate horizontal and vertical syncs. VFocus
(part of the VPBE functionality and operationally (e.g., 16-bit multiplexed address/data) is also provided.
The Ethernet media access controller (EMAC) provides an efficient interface between the DM6441 and the
network. The DM6441 EMAC supports both 10Base-T and 100Base-TX, or 10 Mbits/second (Mbps) and
100 Mbps in either half- or full-duplex mode, with hardware flow control and quality of service (QOS)
support.
www.ti.com
The management data input/output (MDIO) module continuously polls all 32 MDIO addresses in order to
enumerate all PHY devices in the system. Once a PHY candidate has been selected by the ARM, the
MDIO module transparently monitors its link state by reading the PHY status register. Link change events
are stored in the MDIO module and can optionally interrupt the ARM, allowing the ARM to poll the link
status of the device without continuously performing costly MDIO accesses.
The HPI, I2C, SPI, USB2.0, and VLYNQ ports allow DM6441 to easily control peripheral devices and/or
communicate with host processors. The DM6441 also provides Memory Stick/Memory Stick PRO card
support, MMC/SD with SDIO support, and a universal serial bus (USB).
The DM6441 also includes a video/imaging coprocessor (VICP) to offload many video and imaging
processing tasks from the DSP core, making more DSP MIPS available for common video and imaging
algorithms. For more information on the VICP enhanced codecs, such as H.264 and MPEG4, please
contact your nearest TI sales representative.
The rich peripheral set provides the ability to control external peripheral devices and communicate with
external processors. For details on each of the peripherals, see the related sections later in this document
and the associated peripheral reference guides listed in Section 2.8.3.1, Related Documentation From Texas Instruments.
The DM6441 has a complete set of development tools for both the ARM and DSP. These include C
compilers, a DSP assembly optimizer to simplify programming and scheduling, and a Windows™
debugger interface for visibility into source code execution.
This data manual revision history highlights the technical changes made to the SPRS359D device-specific
data manual to make it an SPRS359E revision.
Scope: Added information/data on silicon revision 2.3.
Applicable updates to the DM644x device family, specifically relating to the TMS320DM6441 device, have
been incorporated.
Revision History
NOTE: Page numbers for previous revisions may differ from page numbers in the current version.
•Added "In host boot mode, the ARM is the master and controls the reset and boot of the C64x+ ..."
paragraph
SEE | ADDITIONS/MODIFICATIONS/DELETIONS
Section 3.5.1, Switched Central Resource (SCR) Bus Priorities
Table 3-12, DM6441 Default Bus Master Priorities:
•Added, for clarity, ", DMA_PRI bit fields" to the VPSSP Default Priority Level description [Cleared
Documentation Feedback Issue]
•Added "[For more detailed information ..." statement to the VPSSP, EDMATC0P, EDMATC1P, and
C64X+_DMAP rows
•Added "(MSTPRI1 Register)" to the HPIP row
•Removed VICPP row with Default Priority Level of 4
Figure 3-6, MSTPRI1 Register:
•Updated/changed the bit field of bits 22:20 from "RESERVED" to "HPIP"
•Updated/changed the default value of bits 22:20 from "R-100" to "R/W-100"
Section 3.5.4, PINMUX0 Register Description
•"The PINMUX0 pin multiplexing register controls which peripheral is given ownership ..." paragraph:
–Updated/changed "... ownership over shared pins among EMAC, CCD, LCD, RGB888, RGB666,
ATA, VLYNQ, EMIFA, and GPIO peripherals" to "... ownership over shared pins among EMAC,
CCD, LCD, RGB888, RGB666, ATA, VLYNQ, EMIFA, HPI, and GPIO peripherals"
Figure 3-7, PINMUX0 Register:
•Bits 4–0: Updated/changed "R/W-LLLL" to "R/W-LLLLL"
•Updated/changed footnote from "For proper DM6441 device operation, always write a value of '0' to
RSV bits 30 and 29" to "For proper DM6441 device operation, always write a value of '0' to
RSV bit 30"
Table 3-14, PINMUX0 Register Field Descriptions:
•Updated/changed the description of Bit 29 (HPIEN) [Cleared Documentation Feedback Issue]
Section 6.6.3, Clock PLL Electrical Data/Timing (Input and Output Clocks)
Table 6-19, Switching Characteristics Over Recommended Operating Conditions for CLK_OUT1:
•Removed "For proper DM6441 device operation, always write a value of '0' to RSV bit 9" footnote
•Bits 15–13: Updated/changed "R-0000 00" to "R-000"
•"Measured under the following conditions:" footnote:
–Added "For more details on core and I/O activity, as well as information relevant to board power
supply design, see the TMS320DM6441 Power Consumption Summary application report
(literature number SPRAAU3)."
•PLL Controller 2: Updated/changed "PLLDIV1 (/1)" to "PLLDIV1 (/10)"
•Updated/changed "EDMA" to "EDMA3"
•Updated/changed address range "0x01C4 1004 through 0x01C4 1014" to "Reserved"
•Updated/changed address range "0x01C4 1100 through 0x01C4 111F" to "Reserved"
•Updated/changed address range "0x01C4 1308 through 0x01C4 17FF" to "Reserved"
•Updated/changed the pins specified in the Z Group [Cleared Documentation Feedback Issue]
•Parameter 1 (tC): Added "ns" in UNIT column
Section 6.10.1.2, EMIFA Electrical Data/Timing
Table 6-35, Switching Characteristics Over Recommended Operating Conditions for Asynchronous
Memory Cycles for EMIFA Module:
Table 2-1 provides an overview of the TMS320DM6441 SoC. The table shows significant features of the
device, including the capacity of on-chip RAM, peripherals, internal peripheral bus frequency relative to the
C64x+ DSP, and the package type with pin count.
Table 2-1. Characteristics of the Processor
HARDWARE FEATURES | DM6441
Peripherals (Not all peripheral pins are available at the same time. For more details, see Section 3, Device Configurations.)
DDR2 Memory Controller | DDR2 (16/32-bit bus width)
Asynchronous EMIF (EMIFA)
Flash Cards
EDMA3
Timers | separate 32-bit timers)
On-Chip Memory
CPU ID + CPU Rev ID | Control Status Register (CSR.[31:16]) | 0x1000
C64x+ Megamodule | Revision ID Register (MM_REVID[15:0]) | 0x0000 (Silicon Revision 1.3 and earlier)
Cycle Time | ns | DSP 2.47 ns, ARM 4.94 ns at 1.05 V; DSP 1.9 ns, ARM 3.9 ns at 1.2 V
Voltage | Core (V) | 1.05 V, 1.2 V
Voltage | I/O (V) | 1.8 V, 3.3 V
PLL Options | CLKIN frequency multiplier (27 MHz reference) | x1 (bypass), x15 (1.05 V), x19 (1.2 V)
BGA Package | 16 x 16 mm | 361-pin BGA (ZWT), ball finish SnAgCu
Process Technology | µm | 0.09 µm
Product Status (1) | Product Preview (PP), Advance Information (AI), Production Data (PD) | PD
(1) PRODUCTION DATA information is current as of publication date. Products conform to specifications per the terms of the Texas
Instruments standard warranty. Production processing does not necessarily include testing of all parameters.
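The PLL options and cycle times in Table 2-1 are mutually consistent, which a short calculation can confirm. The sketch below assumes the DSP clock is the 27 MHz reference times the PLL multiplier and that the ARM runs at half the DSP clock; the half-rate relationship is inferred from the listed cycle times rather than stated in the table, and the function name is illustrative.

```python
REF_MHZ = 27.0  # CLKIN reference frequency from Table 2-1


def clocks(multiplier: int) -> dict:
    """Derive clock rates (MHz) and cycle times (ns) for a PLL multiplier."""
    dsp_mhz = REF_MHZ * multiplier
    arm_mhz = dsp_mhz / 2  # assumption: ARM clock is half the DSP clock
    return {
        "dsp_mhz": dsp_mhz,
        "arm_mhz": arm_mhz,
        "dsp_cycle_ns": 1000.0 / dsp_mhz,
        "arm_cycle_ns": 1000.0 / arm_mhz,
    }


low = clocks(15)   # x15 option, listed for 1.05 V
high = clocks(19)  # x19 option, listed for 1.2 V

print(low["dsp_mhz"], round(low["dsp_cycle_ns"], 2), round(low["arm_cycle_ns"], 2))
print(high["dsp_mhz"], round(high["dsp_cycle_ns"], 1), round(high["arm_cycle_ns"], 1))
```

With the x19 multiplier the DSP clock comes out at 513 MHz, matching the clock rate used for the peak MIPS figure in the device description.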
2.2 Device Compatibility
The ARM926EJ-S RISC CPU is compatible with other ARM9 CPUs from ARM Holdings plc.
The C64x+ DSP core is code-compatible with the C6000™ DSP platform and supports features of the
C64x DSP family.
2.3 ARM Subsystem
The ARM subsystem is designed to give the ARM926EJ-S (ARM9) master control of the device. In
general, the ARM is responsible for configuration and control of the device, including the DSP subsystem,
the VPSS subsystem, and a majority of the peripherals and external memories.
The ARM subsystem includes the following features:
•ARM926EJ-S RISC processor
•ARMv5TEJ (32/16-bit) instruction set
•Little endian
•Coprocessor 15 (CP15)
•MMU
•16KB instruction cache
•8KB data cache
•Write buffer
•16KB internal RAM (32-bit-wide access)
•8KB internal ROM (ARM bootloader for non-EMIFA boot options)
•Embedded trace module and embedded trace buffer (ETM/ETB)
The ARM subsystem integrates the ARM926EJ-S processor, a member of the
ARM9 family of general-purpose microprocessors. This processor is targeted at multi-tasking applications
where full memory management, high performance, low die size, and low power are all important. The
ARM926EJ-S processor supports the 32-bit ARM and 16-bit Thumb instruction sets, enabling the user to
trade off between high performance and high code density. Specifically, the ARM926EJ-S processor
supports the ARMv5TEJ instruction set, which includes features for efficient execution of Java byte codes,
providing Java performance similar to a just-in-time (JIT) Java interpreter, but without the associated code
overhead.
The ARM926EJ-S processor supports the ARM debug architecture and includes logic to assist in both
hardware and software debug. The ARM926EJ-S processor has a Harvard architecture and provides a
complete high performance subsystem, including:
•ARM926EJ-S integer core
•CP15 system control coprocessor
•Memory management unit (MMU)
•Separate instruction and data caches
•Write buffer
•Separate instruction and data tightly-coupled memories (TCMs) [internal RAM] interfaces
•Separate instruction and data AHB bus interfaces
•Embedded trace module and embedded trace buffer (ETM/ETB)
For more complete details on the ARM9, refer to the ARM926EJ-S Technical Reference Manual, available
at http://www.arm.com.
2.3.2 CP15
The ARM926EJ-S system control coprocessor (CP15) is used to configure and control instruction and
data caches, tightly-coupled memories (TCMs), memory management unit (MMU), and other ARM
subsystem functions. The CP15 registers are programmed using the MRC and MCR ARM instructions,
when the ARM is in a privileged mode such as supervisor or system mode.
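To make the MRC/MCR mechanism more concrete, the sketch below builds the 32-bit ARM instruction word for a CP15 access from its fields. The field layout is the standard ARM coprocessor register-transfer encoding; the function name is illustrative, and the example register (c1, c0, the control register) is chosen here for illustration and is not taken from this data manual.

```python
def encode_cp15(read: bool, rt: int, crn: int, crm: int,
                opc1: int = 0, opc2: int = 0, cond: int = 0xE) -> int:
    """Encode MRC (read=True) or MCR (read=False) for coprocessor 15.

    cond defaults to 0xE (always execute).
    """
    for value, limit in ((rt, 16), (crn, 16), (crm, 16),
                         (opc1, 8), (opc2, 8), (cond, 16)):
        if not 0 <= value < limit:
            raise ValueError("field out of range")
    return ((cond << 28) | (0b1110 << 24) | (opc1 << 21) |
            (int(read) << 20) | (crn << 16) | (rt << 12) |
            (15 << 8) | (opc2 << 5) | (1 << 4) | crm)


# MRC p15, 0, r0, c1, c0, 0  (read CP15 register c1 into r0)
mrc_word = encode_cp15(True, rt=0, crn=1, crm=0)
# MCR p15, 0, r0, c1, c0, 0  (write r0 to CP15 register c1)
mcr_word = encode_cp15(False, rt=0, crn=1, crm=0)

print(hex(mrc_word), hex(mcr_word))  # 0xee110f10 0xee010f10
```

The two words differ only in bit 20, which selects the transfer direction. In practice these instructions are emitted by the assembler or by compiler intrinsics rather than encoded by hand.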
2.3.3 MMU
The ARM926EJ-S MMU provides virtual memory features required by operating systems such as Linux™,
Windows® CE, Ultron®, ThreadX®, etc. A single set of two-level page tables stored in main memory is used
to control the address translation, permission checks, and memory region attributes for both data and
instruction accesses. The MMU uses a single unified translation lookaside buffer (TLB) to cache the
information held in the page tables. The MMU features are:
•Standard ARM architecture v4 and v5 MMU mapping sizes, domains and access protection scheme.
The size of the instruction cache is 16KB, data cache is 8KB. Additionally, the caches have the following
features:
•Virtual index, virtual tag, and addressed using the modified virtual address (MVA)
•Four-way set associative, with a cache line length of eight words per line (32-bytes per line) and with
two dirty bits in the Dcache
•Dcache supports write-through and write-back (or copy back) cache operation, selected by memory
region using the C and B bits in the MMU translation tables.
•Critical-word first cache refilling
•Cache lockdown registers enable control over which cache ways are used for allocation on a line fill,
providing a mechanism for both lockdown, and controlling cache corruption
•Dcache stores the physical address TAG (PA TAG) corresponding to each Dcache entry in the TAG
RAM for use during the cache line write-backs, in addition to the virtual address TAG stored in the
TAG RAM. This means that the MMU is not involved in Dcache write-back operations, removing the
possibility of TLB misses related to the write-back address.
•Cache maintenance operations provide efficient invalidation of the entire Dcache or Icache, regions of
the Dcache or Icache, and regions of virtual memory.
The write buffer is used for all writes to a noncacheable bufferable region, write-through region, and write
misses to a write-back region. A separate buffer is incorporated in the Dcache for holding write-back data for
cache line evictions or cleaning of dirty cache lines. The main write buffer has a 16-word data buffer and a
four-address buffer. The Dcache write-back buffer has eight data word entries and a single address entry.
2.3.5 Tightly Coupled Memory (TCM)
ARM internal RAM is provided for storing real-time and performance-critical code/data and the interrupt
vector table. ARM internal ROM enables non-EMIFA boot options, such as NAND and UART. The RAM
and ROM memories are interfaced to the ARM926EJ-S via the tightly coupled memory interface that provides
for separate instruction and data bus connections. Since the ARM TCM does not allow instructions on the
D-TCM bus or data on the I-TCM bus, an arbiter is included so that both data and instructions can be
stored in the internal RAM/ROM. The arbiter also allows accesses to the RAM/ROM from extra-ARM
sources (e.g., EDMA3 or other masters). The ARM926EJ-S has built-in DMA support for direct accesses
to the ARM internal memory from a non-ARM master. Because of the time-critical nature of the TCM link
to the ARM internal memory, all accesses from non-ARM devices are treated as DMA transfers.
Instruction and data accesses are differentiated via accessing different memory map regions, with the
instruction region from 0x0000 through 0x7FFF and data from 0x8000 through 0xFFFF. The instruction
region at 0x0000 and data region at 0x8000 map to the same physical 16K-byte TCM RAM. Placing the
instruction region at 0x0000 is necessary to allow the ARM interrupt vector table to be placed at 0x0000,
as required by the ARM architecture. The internal 16K-byte RAM is split into two physical banks of 8KB
each, which allows simultaneous instruction and data accesses to be accomplished if the code and data
are in separate banks.
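The aliasing and banking described above can be sketched as an address decode. The base addresses, the 16K-byte RAM size, and the two 8KB banks come from the text; the assumption that bank 0 occupies the lower 8KB of the RAM, and all function names, are illustrative only.

```python
RAM_SIZE = 16 * 1024    # physical TCM RAM size
BANK_SIZE = 8 * 1024    # two physical banks of 8KB each
I_BASE = 0x0000         # RAM as seen in the instruction region
D_BASE = 0x8000         # same RAM as seen in the data region


def locate(addr: int) -> tuple:
    """Map an ARM address to (bus, RAM offset, bank) for the internal RAM."""
    if I_BASE <= addr < I_BASE + RAM_SIZE:
        bus, offset = "I-TCM", addr - I_BASE
    elif D_BASE <= addr < D_BASE + RAM_SIZE:
        bus, offset = "D-TCM", addr - D_BASE
    else:
        raise ValueError("address is outside the internal RAM windows")
    return bus, offset, offset // BANK_SIZE  # assumption: bank 0 is the lower 8KB


def can_access_simultaneously(code_addr: int, data_addr: int) -> bool:
    """True when code and data fall in different physical banks."""
    return locate(code_addr)[2] != locate(data_addr)[2]


print(locate(0x0100), locate(0x8100))            # same offset, different bus
print(can_access_simultaneously(0x0100, 0xA000))  # True
print(can_access_simultaneously(0x0100, 0x8100))  # False
```

Placing code in one bank and its working data in the other is what allows the instruction and data accesses to proceed in the same cycle.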
2.3.6 Advanced High-performance Bus (AHB)
The ARM subsystem uses the AHB port of the ARM926EJ-S to connect the ARM to the config bus and
the external memories. Arbiters are employed to arbitrate access to the separate D-AHB and I-AHB by the
config bus and the external memories bus.
2.3.7 Embedded Trace Macrocell (ETM) and Embedded Trace Buffer (ETB)
To support real-time trace, the ARM926EJ-S processor provides an interface to enable connection of an
embedded trace macrocell (ETM). The ARM926EJ-S subsystem in the DM6441 also includes the
embedded trace buffer (ETB). The ETM consists of two parts:
•Trace port provides real-time trace capability for the ARM9.
•Triggering facilities provide trigger resources, which include address and data comparators, counter,
and sequencers.
The DM6441 trace port is not pinned out and is instead only connected to the embedded trace buffer. The
ETB has a 4K-byte buffer memory. ETB-enabled debug tools are required to read/interpret the captured
trace data.
2.3.8 ARM Memory Mapping
The ARM memory map is shown in Section 2.5, Memory Map Summary, of this document. The ARM has
access to memories shown in the following sections.
2.3.8.1 ARM Internal Memories
The ARM has access to the following ARM internal memories:
•16KB ARM internal RAM on TCM interface, logically separated into two 8-KB pages to allow
simultaneous access on any given cycle if there are separate accesses for code (I-TCM bus) and data
(D-TCM) to the different memory regions.
•8KB ARM internal ROM
2.3.8.2 External Memories
The ARM has access to the following external memories:
•DDR2 synchronous DRAM
•Asynchronous EMIF / NOR flash / NAND flash
•ATA/CF
•Flash card devices:
– MMC/SD with SDIO
– Memory Stick/Memory Stick PRO
– xD
– SmartMedia
2.3.8.3 DSP Memories
The ARM has access to the following DSP memories:
•L2 RAM
•L1P RAM
•L1D RAM
2.3.8.4 VICP Registers and Memories
The ARM has access to the registers and memories of the video/imaging coprocessor (VICP) subsystem.
DM6441 ARM and DSP integration features are as follows:
•DSP visibility from ARM’s memory map, see Section 2.5, Memory Map Summary, for details
•Boot modes for DSP - see Device Configurations section, Section 3.3.3, DSP Boot, for details
•ARM control of DSP boot / reset - see Device Configurations section, Section 3.3.2, ARM Boot, for
details
•ARM control of DSP isolation and powerdown / powerup - see Section 3, Device Configurations, for
details
•ARM & DSP Interrupts - see Section 6.7.1, ARM CPU Interrupts, and Section 6.7.2, DSP Interrupts, for
details
2.3.9 Peripherals
The ARM9 has access to all of the peripherals on the DM6441 device with the exception of the VICP.
2.3.10 PLL Controller (PLLC)
The ARM subsystem includes the PLL controller. The PLL controller contains a set of registers for
configuring DM6441’s two internal PLLs (PLL1 and PLL2). The PLL controller provides the following
configuration and control:
•PLL bypass mode
•Set PLL multiplier parameters
•Set PLL divider parameters
•PLL power down
•Oscillator power down
The PLLs are briefly described in this document in Section 6.6, Clock PLLs. For more detailed information
on the PLLs and PLL Controller register descriptions, see the TMS320DM644x DMSoC ARM Subsystem Reference Guide (literature number SPRUE14).
2.3.11 Power and Sleep Controller (PSC)
The ARM subsystem includes the power and sleep controller (PSC). Through register settings accessible
by the ARM9, the PSC provides two levels of power savings: peripheral/module clock gating and power
domain shut-off. Brief details on the PSC are given in Section 6.3, Power Supplies. For more detailed
information and complete register descriptions for the PSC, see the TMS320DM644x DMSoC ARM Subsystem Reference Guide (literature number SPRUE14).
2.3.12 ARM Interrupt Controller (AINTC)
The ARM interrupt controller (AINTC) accepts device interrupts and maps them to either the ARM’s IRQ
(interrupt request) or FIQ (fast interrupt request). The ARM interrupt controller is briefly described in this
document in the Interrupts section. For detailed information on the ARM interrupt controller, see the
TMS320DM644x DMSoC ARM Subsystem Reference Guide (literature number SPRUE14).
2.3.13 System Module
The ARM subsystem includes the system module. The system module consists of a set of registers for
configuring and controlling a variety of system functions. For details and register descriptions for the
system module, see Section 3, Device Configurations, and see the TMS320DM644x DMSoC ARM Subsystem Reference Guide (literature number SPRUE14).
DM6441 has several means of managing power consumption. There is extensive use of clock gating,
which reduces the power used by global device clocks and individual peripheral clocks. Clock
management can be utilized to reduce clock frequencies in order to reduce switching power. For more
details on power management techniques, see Section 3, Device Configurations, Section 6, Peripheral and Electrical Specifications, and see the TMS320DM644x DMSoC ARM Subsystem Reference Guide
(literature number SPRUE14).
DM6441 gives the programmer full flexibility to use any and all of the previously mentioned capabilities to
customize an optimal power management strategy. Several typical power management scenarios are
described in the following sections.
2.4 DSP Subsystem
The DSP subsystem includes the following features:
•C64x+ DSP CPU
•32KB L1 program (L1P)/cache (up to 32KB)
•80KB L1 data (L1D)/cache (up to 32KB)
•64KB unified mapped RAM/cache (L2)
•Little endian
2.4.1 C64x+ DSP CPU Description
The C64x+ central processing unit (CPU) consists of eight functional units, two register files, and two data
paths as shown in Figure 2-1. The two general-purpose register files (A and B) each contain 32 32-bit
registers for a total of 64 registers. The general-purpose registers can be used for data or can be data
address pointers. The data types supported include packed 8-bit data, packed 16-bit data, 32-bit data,
40-bit data, and 64-bit data. Values larger than 32 bits, such as 40-bit-long or 64-bit-long values are stored
in register pairs, with the 32 LSBs of data placed in an even register and the remaining eight or 32 MSBs
in the next upper register (which is always an odd-numbered register).
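The register-pair convention can be sketched as a simple split and join. The sketch treats values as unsigned bit patterns and follows the rule stated above (32 LSBs in the even register, remaining MSBs in the next odd register); the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def split_pair(value: int, bits: int) -> tuple:
    """Split a 40-bit or 64-bit pattern into (even_reg, odd_reg) contents."""
    if bits not in (40, 64):
        raise ValueError("only 40-bit and 64-bit values use register pairs")
    if not 0 <= value < (1 << bits):
        raise ValueError("value does not fit in the requested width")
    even = value & MASK32   # 32 LSBs go to the even register
    odd = value >> 32       # remaining 8 or 32 MSBs go to the odd register
    return even, odd


def join_pair(even: int, odd: int) -> int:
    """Rebuild the wide value from an even/odd register pair."""
    return (odd << 32) | even


even, odd = split_pair(0x123456789A, 40)
print(hex(even), hex(odd))            # 0x3456789a 0x12
print(hex(join_pair(even, odd)))      # 0x123456789a
```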
The eight functional units (.M1, .L1, .D1, .S1, .M2, .L2, .D2, and .S2) are each capable of executing one
instruction every clock cycle. The .M functional units perform all multiply operations. The .S and .L units
perform a general set of arithmetic, logical, and branch functions. The .D units primarily load data from
memory to the register file and store results from the register file into memory.
The C64x+ CPU extends the performance of the C64x core through enhancements and new features.
Each C64x+ .M unit can perform one of the following each clock cycle: one 32 x 32 bit multiply, one 16 x
32 bit multiply, two 16 x 16 bit multiplies, two 16 x 32 bit multiplies, two 16 x 16 bit multiplies with
add/subtract capabilities, four 8 x 8 bit multiplies, four 8 x 8 bit multiplies with add operations, and four
16 x 16 multiplies with add/subtract capabilities (including a complex multiply). There is also support for
Galois field multiplication for 8-bit and 32-bit data. Many communications algorithms such as FFTs and
modems require complex multiplication. The complex multiply (CMPY) instruction takes four 16-bit inputs
and produces a 32-bit real and a 32-bit imaginary output. There are also complex multiplies with rounding
capability that produce one 32-bit packed output containing 16-bit real and 16-bit imaginary values. The
32 x 32 bit multiply instructions provide the extended precision necessary for audio and other
high-precision algorithms on a variety of signed and unsigned 32-bit data types.
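As an arithmetic illustration of the complex multiply, the sketch below models the four-input, two-output operation described above: (a + jb)(c + jd) = (ac - bd) + j(ad + bc). It is a mathematical model only. The packing order (real part in the upper halfword) is an assumption, the function names are illustrative, and overflow handling at the extreme corner case is not modeled; see SPRU732 for the exact instruction behavior.

```python
def to_signed16(x: int) -> int:
    """Interpret the low 16 bits of x as a signed value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def unpack_complex(word: int) -> tuple:
    """Unpack a 32-bit word into (real, imag); real in upper half is assumed."""
    return to_signed16(word >> 16), to_signed16(word)


def pack_complex(real: int, imag: int) -> int:
    """Pack signed 16-bit real and imaginary parts into one 32-bit word."""
    return ((real & 0xFFFF) << 16) | (imag & 0xFFFF)


def complex_multiply(src1: int, src2: int) -> tuple:
    """Return (real, imag) products as full-precision integers."""
    a, b = unpack_complex(src1)
    c, d = unpack_complex(src2)
    return a * c - b * d, a * d + b * c


# (3 + 4j) * (1 - 2j) = 11 - 2j
print(complex_multiply(pack_complex(3, 4), pack_complex(1, -2)))  # (11, -2)
```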
The .L unit (arithmetic logic unit) now incorporates the ability to do parallel add/subtract operations on a
pair of common inputs. Versions of this instruction exist to work on 32-bit data or on pairs of 16-bit data
performing dual 16-bit add and subtracts in parallel. There are also saturated forms of these instructions.
The C64x+ core enhances the .S unit in several ways. In the C64x core, dual 16-bit MIN2 and MAX2
comparisons were only available on the .L units. On the C64x+ core they are also available on the .S unit
which increases the performance of algorithms that do searching and sorting. Finally, to increase data
packing and unpacking throughput, the .S unit allows sustained high performance for the quad 8-bit/16-bit
and dual 16-bit instructions. Unpack instructions prepare 8-bit data for parallel 16-bit operations. Pack
instructions return parallel results to output precision including saturation support.
Other new features include:
•SPLOOP - A small instruction buffer in the CPU that aids in creation of software pipelining loops where
multiple iterations of a loop are executed in parallel. The SPLOOP buffer reduces the code size
associated with software pipelining. Furthermore, loops in the SPLOOP buffer are fully interruptible.
•Compact instructions - The native instruction size for the C6000 devices is 32 bits. Many common
instructions such as MPY, AND, OR, ADD, and SUB can be expressed as 16 bits if the C64x+
compiler can restrict the code to use certain registers in the register file. This compression is
performed by the code generation tools.
•Instruction set enhancement - As noted above, there are new instructions such as 32-bit
multiplications, complex multiplications, packing, sorting, bit manipulation, and 32-bit Galois field
multiplication.
•Exceptions handling - Intended to aid the programmer in isolating bugs. The C64x+ CPU is able to
detect and respond to exceptions, both from internally detected sources (such as illegal op-codes) and
from system events (such as a watchdog time expiration).
•Privilege - Defines user and supervisor modes of operation, allowing the operating system to give a
basic level of protection to sensitive resources. Local memory is divided into multiple pages, each with
read, write, and execute permissions.
•Time-stamp counter - Primarily targeted for real-time operating system (RTOS) robustness, a
free-running time-stamp counter is implemented in the CPU which is not sensitive to system stalls.
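As an illustration of how software might use a free-running time-stamp counter, the following sketch computes elapsed cycles between two snapshots. It assumes the counter is read as two 32-bit halves that form one 64-bit value; the function names and the clock-rate parameter are hypothetical, and how the halves are actually read on the device is not shown here.

```c
#include <stdint.h>

/* Combine the two 32-bit halves of a counter snapshot into one 64-bit value.
 * Assumption: the counter is 64 bits wide and exposed as high/low words. */
uint64_t tsc_combine(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}

/* Cycles elapsed between two snapshots. Unsigned subtraction remains
 * correct even if the counter wrapped once between start and end. */
uint64_t tsc_elapsed(uint64_t start, uint64_t end)
{
    return end - start;
}

/* Convert a cycle count to whole microseconds for a caller-supplied
 * CPU clock rate in MHz (cpu_mhz must be nonzero). */
uint64_t tsc_cycles_to_us(uint64_t cycles, uint32_t cpu_mhz)
{
    return cycles / cpu_mhz;
}
```

Because the counter does not stop during system stalls, a difference computed this way reflects wall-clock cycles rather than executed instructions.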
For more details on the C64x+ CPU and its enhancements over the C64x architecture, see the following
documents:
•TMS320C64x/C64x+ DSP CPU and Instruction Set Reference Guide (literature number SPRU732)
•TMS320C64x Technical Overview (literature number SPRU395)
A. On .M unit, dst2 is 32 MSB.
B. On .M unit, dst1 is 32 LSB.
C. On C64x CPU .M unit, src2 is 32 bits; on C64x+ CPU .M unit, src2 is 64 bits.
D. On .L and .S units, odd dst connects to odd register files and even dst connects to even register files.
TMS320DM6441
SPRS359E–SEPTEMBER 2006–REVISED AUGUST 2010
www.ti.com
Figure 2-1. TMS320C64x+™ CPU (DSP Core) Data Paths
The DSP memory map is shown in Table 2-3. Configuration of the control registers for DDR2, EMIFA, and
ARM internal RAM is supported by the ARM. The DSP has access to memories shown in the following
sections.
2.4.2.1 ARM Internal Memories
The DSP has access to the 16KB ARM internal RAM on the ARM D-TCM interface (i.e., data only).
2.4.2.2 External Memories
The DSP has access to the following external memories:
•DDR2 synchronous DRAM
•Asynchronous EMIF / NOR Flash
2.4.2.3 DSP Internal Memories
The DSP has access to the following DSP memories:
•L2 RAM
•L1P RAM
•L1D RAM
2.4.2.4 C64x+ CPU
The C64x+ core uses a two-level cache-based architecture. The Level 1 program cache (L1P) is a 32-KB
direct-mapped cache and the Level 1 data cache (L1D) is an 80-KB 2-way set-associative cache. The Level 2
memory/cache (L2) consists of a 64 KB memory space that is shared between program and data space.
L2 memory can be configured as mapped memory, cache, or a combination of both.
Table 2-2 shows a memory map of the C64x+ CPU cache registers for the device.
Memory attribute registers for EMIFA/VLYNQ shadow 0x4200 0000 0x4FFF FFFF
2.4.3 Peripherals
The DSP can control the following peripherals:
•VICP
•EDMA3
•ASP
•Two Timers (Timer 0 and Timer 1) that can each be configured as one 64-bit or two 32-bit timers
2.4.4 DSP Interrupt Controller
The DSP interrupt controller accepts device interrupts and appropriately maps them to available DSP
interrupts. The DSP interrupt controller is briefly described in this document in the Interrupts section. For
more detailed information on the DSP interrupt controller, see the TMS320C64x/C64x+ DSP CPU and Instruction Set Reference Guide (literature number SPRU732).
2.5 Memory Map Summary
Table 2-3 shows the memory map address ranges of the device. Table 2-4 depicts the expanded map of
the configuration space (0x0180 0000 through 0x0FFF FFFF). The device has multiple on-chip memories
associated with its two processors and various subsystems. To help simplify software development a
unified memory map is used where possible to maintain a consistent view of device resources across all
bus masters.
(1) EMIFA shadow memory starting at 0x4200 0000 is physically the same memory as location 0x0200 0000. Memory range 0x0200 0000
through 0x09FF FFFF should only be used by C64x+ for data accesses. Memory range 0x4200 0000 through 0x4FFF FFFF can be
used by C64x+ for both code execution and data accesses.
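The footnote above implies that code intended to run from EMIFA on the C64x+ should be addressed through the shadow range. A minimal sketch of that address translation follows, assuming the shadow is a simple linear alias at a fixed offset from 0x0200 0000; the macro and function names are hypothetical and are not part of any TI header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Address ranges taken from the memory map footnote. */
#define EMIFA_DATA_BASE    0x02000000u
#define EMIFA_DATA_END     0x09FFFFFFu
#define EMIFA_SHADOW_BASE  0x42000000u
#define EMIFA_SHADOW_END   0x4FFFFFFFu

/* True if addr lies in the range the C64x+ should use for data accesses only. */
bool emifa_is_data_only(uint32_t addr)
{
    return addr >= EMIFA_DATA_BASE && addr <= EMIFA_DATA_END;
}

/* True if addr lies in the shadow range usable for code and data. */
bool emifa_is_shadow(uint32_t addr)
{
    return addr >= EMIFA_SHADOW_BASE && addr <= EMIFA_SHADOW_END;
}

/* Translate a data-only EMIFA address to its shadow alias.
 * Assumption: the alias is linear, so the offset within the region is kept.
 * Addresses outside the data-only range are returned unchanged. */
uint32_t emifa_to_shadow(uint32_t addr)
{
    if (!emifa_is_data_only(addr))
        return addr;
    return addr - EMIFA_DATA_BASE + EMIFA_SHADOW_BASE;
}
```

A loader could apply `emifa_to_shadow()` to an entry point before branching to it, so that instruction fetches by the C64x+ go through the range that permits code execution.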
Pin multiplexing is used extensively to accommodate the largest number of peripheral functions in
the smallest possible package. Pin multiplexing is controlled using a combination of hardware
configuration at device reset and software programmable register settings. For more information on pin
muxing, see Section 3.5.2, Multiplexed Pin Configurations, of this document.
2.6.1 Pin Map (Bottom View)
Figure 2-2 through Figure 2-5 show the bottom view of the package pin assignments in four quadrants (A,
The terminal functions tables (Table 2-5 through Table 2-30) identify the external signal names, the
associated pin (ball) numbers along with the mechanical package designator, the pin type, whether the pin
has any internal pullup or pulldown resistors, and a functional pin description. For more detailed
information on device configuration, peripheral selection, multiplexed/shared pin, and debugging
considerations, see Section 3, Device Configurations, of this data manual.
Table 2-5. BOOT Terminal Functions

SIGNAL NAME        NO.  TYPE(1)  OTHER(2)(3)  DESCRIPTION
BOOT
COUT0/B3/BTSEL0    A16  I/O/Z    IPD DVDD18   These pins are multiplexed between ARM boot mode and the VPBE. At
COUT1/B4/BTSEL1    B16  I/O/Z    IPD DVDD18   reset, the boot mode inputs BTSEL0 and BTSEL1 are sampled to
                                              determine the ARM boot configuration. See below for the boot modes
                                              set by these inputs. See Section 3.3, Bootmode, for more details.
                                              After reset, these are video encoder outputs COUT0 and COUT1, or
                                              RGB666/888 Blue output data bits 3 and 4 B3/B4.
                                              BTSEL1  BTSEL0  ARM Boot Mode
                                              0       0       ARM ROM boot (NAND, SPI) [default]
                                              0       1       ARM EMIFA boot (NOR)
                                              1       0       ARM ROM boot (HPI)
                                              1       1       ARM ROM boot (UART0)
COUT2/B5/EM_WIDTH  A17  I/O/Z    IPD DVDD18   This pin is multiplexed between EMIFA and the VPBE. At reset, the
                                              input state is sampled to set the EMIFA data bus width (EM_WIDTH).
                                              For an 8-bit-wide EMIFA data bus, EM_WIDTH = 0. For a 16-bit-wide
                                              EMIFA data bus, EM_WIDTH = 1.
                                              After reset, it is video encoder output COUT2 or RGB666/888 Blue
                                              output data bit 5 B5.
COUT3/B6/DSP_BT    B17  I/O/Z    IPD DVDD18   This pin is multiplexed between DSP boot and the VPBE. At reset, the
                                              input state is sampled to set the DSP boot source DSP_BT. The DSP is
                                              booted by the ARM when DSP_BT=0. The DSP boots from EMIFA when
                                              DSP_BT=1.
                                              After reset, it is video encoder output COUT3 or RGB666/888 Blue
                                              data bit 6 output B6.
YOUT0/G5/AEAW0     D15  I/O/Z    IPD DVDD18   These pins are multiplexed between EMIFA and the VPBE. At reset, the
YOUT1/G6/AEAW1     D16  I/O/Z    IPD DVDD18   input states of AEAW[4:0] are sampled to set the EMIFA address bus
YOUT2/G7/AEAW2     D17  I/O/Z    IPD DVDD18   width. See Section 3.4.2, Peripheral Selection at Device Reset, for
YOUT3/R3/AEAW3     D18  I/O/Z    IPD DVDD18   details.
YOUT4/R4/AEAW4     E15  I/O/Z    IPD DVDD18   After reset, these are video encoder outputs YOUT[0:4] or RGB666/888
                                              Red and Green data bit outputs G5, G6, G7, R3, and R4.

(1) I = Input, O = Output, Z = High impedance, S = Supply voltage, GND = Ground, A = Analog signal
(2) IPD = internal pulldown, IPU = internal pullup. (To pull up a signal to the opposite supply rail, a 1-kΩ resistor should be used.)
(3) Specifies the operating I/O supply voltage for each signal
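The reset-time sampling described in Table 2-5 can be summarized as a small decoding routine. The sketch below only models the mapping from sampled pin levels to the configuration the table describes; the type and function names are hypothetical, and reading the actual pin states is outside its scope.

```c
#include <stdint.h>

/* ARM boot modes selected by {BTSEL1, BTSEL0}, per Table 2-5. */
typedef enum {
    ARM_BOOT_ROM_NAND_SPI = 0,  /* 00: ARM ROM boot (NAND, SPI) [default] */
    ARM_BOOT_EMIFA_NOR    = 1,  /* 01: ARM EMIFA boot (NOR) */
    ARM_BOOT_ROM_HPI      = 2,  /* 10: ARM ROM boot (HPI) */
    ARM_BOOT_ROM_UART0    = 3   /* 11: ARM ROM boot (UART0) */
} arm_boot_mode_t;

typedef struct {
    arm_boot_mode_t arm_boot;           /* from BTSEL1:BTSEL0 */
    unsigned emifa_data_width_bits;     /* EM_WIDTH: 0 -> 8, 1 -> 16 */
    int dsp_boots_from_emifa;           /* DSP_BT: 0 -> booted by ARM, 1 -> EMIFA */
} boot_config_t;

/* Decode the sampled pin levels (each argument is 0 or 1). */
boot_config_t decode_boot_pins(unsigned btsel1, unsigned btsel0,
                               unsigned em_width, unsigned dsp_bt)
{
    boot_config_t cfg;
    cfg.arm_boot = (arm_boot_mode_t)(((btsel1 & 1u) << 1) | (btsel0 & 1u));
    cfg.emifa_data_width_bits = (em_width & 1u) ? 16u : 8u;
    cfg.dsp_boots_from_emifa = (int)(dsp_bt & 1u);
    return cfg;
}

/* Human-readable name for a boot mode, using the wording of Table 2-5. */
const char *arm_boot_mode_name(arm_boot_mode_t mode)
{
    switch (mode) {
    case ARM_BOOT_ROM_NAND_SPI: return "ARM ROM boot (NAND, SPI)";
    case ARM_BOOT_EMIFA_NOR:    return "ARM EMIFA boot (NOR)";
    case ARM_BOOT_ROM_HPI:      return "ARM ROM boot (HPI)";
    case ARM_BOOT_ROM_UART0:    return "ARM ROM boot (UART0)";
    }
    return "unknown";
}
```

Since all four pins have internal pulldowns, leaving them unconnected corresponds to `decode_boot_pins(0, 0, 0, 0)`: the default ARM ROM boot, an 8-bit EMIFA data bus, and a DSP that is booted by the ARM.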
NO.  TYPE(1)  OTHER(2)  DESCRIPTION
OSCILLATOR, PLL
              DVDD18    Crystal input MXI for MX oscillator (system oscillator, typically 27 MHz). If a crystal
                        input is not used, but instead a physical clock-in source is supplied, this is the
                        external oscillator clock input.
              DVDD18    Crystal output for MX oscillator. If a crystal input is not used, but instead a physical
                        clock-in source is supplied, MXO should be left as a No Connect.
              (3)       1.8-V power supply for MX oscillator. If a crystal input is not used, but instead a
                        physical clock-in source is supplied, MXVDD should still be connected to the 1.8-V
                        power supply.
              (3)       Ground for MX oscillator. If a crystal input is not used, but instead a physical
                        clock-in source is supplied, MXVSS should still be connected to ground.
              DVDD18    Crystal input for M24 oscillator (24 MHz for USB). If a crystal input is not used, but
                        instead a physical clock-in source is supplied, this is the external oscillator clock
                        input. When the USB peripheral is not used, M24XI should be left as a No Connect.
              DVDD18    Crystal output for M24 oscillator. If a crystal input is not used, but instead a physical
                        clock-in source is supplied, M24XO should be left as a No Connect. When the USB
                        peripheral is not used, M24XO should be left as a No Connect.
              (3)       1.8-V power supply for M24 oscillator. If a crystal input is not used, but instead a
                        physical clock-in source is supplied, M24VDD should still be connected to the 1.8-V
                        power supply. When the USB peripheral is not used, M24VDD should be connected
                        to the 1.8-V power supply.
F17  GND      (4)       Ground for M24 oscillator. If a crystal input is not used, but instead a physical
                        clock-in source is supplied, M24VSS should still be connected to ground. When the
                        USB peripheral is not used, M24VSS should be connected to ground.
M2   S        (4)       1.8-V power supply for PLLs (system).

(1) I = Input, O = Output, Z = High impedance, S = Supply voltage, GND = Ground, A = Analog signal
(2) Specifies the operating I/O supply voltage for each signal
(3) For more information, see Section 5.2, Recommended Operating Conditions.
(4) For more information, see Section 5.2, Recommended Operating Conditions.
Table 2-7. Clock Generator Terminal Functions

SIGNAL NAME             NO.  TYPE(1)  OTHER(2)  DESCRIPTION
CLOCK GENERATOR
CLK_OUT0/GPIO48         K1   I/O/Z    DVDD18    This pin is multiplexed between the PLL1 clock generator and GPIO.
                                                For the PLL1 clock generator, it is clock output CLK_OUT0. This is
                                                configurable for 13.5 MHz or 27 MHz clock outputs.
CLK_OUT1/TIM_IN/GPIO49  E19  I/O/Z    DVDD18    This pin is multiplexed between the USB clock generator, timer, and GPIO.
                                                For the USB clock generator, it is clock output CLK_OUT1. This is
                                                configurable for 12 MHz or 24 MHz clock outputs.

(1) I = Input, O = Output, Z = High impedance, S = Supply voltage, GND = Ground, A = Analog signal
(2) Specifies the operating I/O supply voltage for each signal
SIGNAL NAME  NO.  TYPE(1)  OTHER(2)(3)  DESCRIPTION
RESET
RESET        L4   I        IPU DVDD18   This is the active low global reset input.
JTAG
TMS          E6   I        IPU DVDD18   JTAG test-port mode select input
TDO          B5   O/Z      –   DVDD18   JTAG test-port data output
TDI          A5   I        IPU DVDD18   JTAG test-port data input
TCK          A6   I        IPU DVDD18   JTAG test-port clock input
RTCK         B6   O/Z      –   DVDD18   JTAG test-port return clock output
TRST         D7   I        IPD DVDD18   JTAG test-port reset. For IEEE 1149.1 JTAG compatibility, see the IEEE 1149.1
                                        JTAG compatibility statement portion of this data manual (Section 6.26, IEEE
                                        1149.1 JTAG).
EMU1         C6   I/O/Z    IPU DVDD18   Emulation pin 1
EMU0         D6   I/O/Z    IPU DVDD18   Emulation pin 0

(1) I = Input, O = Output, Z = High impedance, S = Supply voltage, GND = Ground, A = Analog signal
(2) IPD = internal pulldown, IPU = internal pullup. (To pull up a signal to the opposite supply rail, a 1-kΩ resistor should be used.)
(3) Specifies the operating I/O supply voltage for each signal
Table 2-9. EMIFA Terminal Functions
SIGNAL
NAMENO.
COUT2/sampled to set the EMIFA data bus width (EM_WIDTH). For an 8-bit-wide EMIFA
B5/A17I/O/Zdata bus, EM_WIDTH = 0. For a 16-bit-wide EMIFA data bus, EM_WIDTH = 1.
EM_WIDTHAfter reset, it is video encoder output COUT2 or RGB666/888 Blue output data bit 5
COUT3/sampled to set the DSP boot source DSP_BT. The DSP is booted by the ARM when
B6/B17I/O/ZDSP_BT=0. The DSP boots from EMIFA when DSP_BT=1.
DSP_BTAfter reset, it is video encoder output COUT3 or RGB666/888 Blue data bit 6 output
YOUT0/
G5/D15I/O/Z
AEAW0
YOUT1/
G6/D16I/O/Z
AEAW1
YOUT2/of AEAW[4:0] are sampled to set the EMIFA address bus width. See Section 3.4.2,
G7/D17I/O/ZPeripheral Selection at Device Reset, for details.
AEAW2After reset, these are video encoder outputs YOUT[0:4] or RGB666/888 Red and
YOUT3/
R3/D18I/O/Z
AEAW3
YOUT4/
R4/E15I/O/Z
AEAW4
(1) I = Input, O = Output, Z = High impedance, S = Supply voltage, GND = Ground, A = Analog signal
(2) IPD = Internal pulldown, IPU = Internal pullup. (To pull up a signal to the opposite supply rail, a 1-kΩ resistor should be used.)
(3) Specifies the operating I/O supply voltage for each signal