    • Six ALUs (32-/40-Bit), Each Supports Single 32-Bit, Dual 16-Bit, or Quad 8-Bit Arithmetic per Clock Cycle
    • Two Multipliers Support Four 16 x 16-Bit Multiplies (32-Bit Results) per Clock Cycle or Eight 8 x 8-Bit Multiplies (16-Bit Results) per Clock Cycle
  – Load-Store Architecture With Non-Aligned Support
  – 64 32-Bit General-Purpose Registers
  – Instruction Packing Reduces Code Size
  – All Instructions Conditional
  – Additional C64x+™ Enhancements
    • Protected Mode Operation
    • Exceptions Support for Error Detection and Program Redirection
    • Hardware Support for Modulo Loop Operation
• C64x+ Instruction Set Features
  – Byte-Addressable (8-/16-/32-/64-Bit Data)
  – 8-Bit Overflow Protection
  – Bit-Field Extract, Set, Clear
  – Normalization, Saturation, Bit-Counting
  – Compact 16-Bit Instructions
  – Additional Instructions to Support Complex Multiplies
• C64x+ L1/L2 Memory Architecture
  – 32K-Byte L1P Program RAM/Cache (Direct Mapped)
  – 80K-Byte L1D Data RAM/Cache (2-Way Set-Associative)
  – 64K-Byte L2 Unified Mapped RAM/Cache (Flexible RAM/Cache Allocation)
• ARM926EJ-S Core
  – Support for 32-Bit and 16-Bit (Thumb® Mode) Instruction Sets
  – DSP Instruction Extensions and Single Cycle MAC
  – ARM® Jazelle® Technology
  – EmbeddedICE-RT™ Logic for Real-Time Debug
• ARM9 Memory Architecture
  – 16K-Byte Instruction Cache
  – 8K-Byte Data Cache
  – 16K-Byte RAM
  – 8K-Byte ROM
• Embedded Trace Buffer™ (ETB11™) With 4-KB Memory for ARM9 Debug
• Endianness: Little Endian for ARM and DSP
• Video Processing Subsystem
  – Resize Engine Provides:
    • Resize Images From 1/4x to 4x
    • Separate Horizontal and Vertical Control
  – Back End Provides:
    • Hardware On-Screen Display (OSD)
    • Four 54-MHz DACs for a Combination of
      – Composite NTSC/PAL Video
      – Luma/Chroma Separate Video (S-video)
      – Component (YPbPr or RGB) Video (Progressive)
    • Digital Output
      – 8-/16-Bit YUV or up to 24-Bit RGB
      – HD Resolution
      – Up to 2 Video Windows
Please be aware that an important notice concerning availability, standard warranty, and use in critical applications of Texas
Instruments semiconductor products and disclaimers thereto appears at the end of this data sheet.
All trademarks are the property of their respective owners.
PRODUCTION DATA information is current as of publication date.
Products conform to specifications per the terms of the Texas
Instruments standard warranty. Production processing does not
necessarily include testing of all parameters.
The TMS320DM6443 (also referenced as DM6443) leverages TI’s DaVinci™ technology to meet the
networked media encode and decode application processing needs of next-generation embedded devices.
The DM6443 enables OEMs and ODMs to quickly bring to market devices featuring robust operating
systems support, rich user interfaces, high processing performance, and long battery life through the
maximum flexibility of a fully integrated mixed processor solution.
The dual-core architecture of the DM6443 provides benefits of both DSP and Reduced Instruction Set
Computer (RISC) technologies, incorporating a high-performance TMS320C64x+™ DSP core and an
ARM926EJ-S core.
The ARM926EJ-S is a 32-bit RISC processor core that performs 32-bit or 16-bit instructions and
processes 32-bit, 16-bit, or 8-bit data. The core uses pipelining so that all parts of the processor and
memory system can operate continuously.
The ARM core incorporates:
•A coprocessor 15 (CP15) and protection module
•Data and program Memory Management Units (MMUs) with translation lookaside buffers (TLBs).
•Separate 16K-byte instruction and 8K-byte data caches. Both are four-way associative with virtual
index virtual tag (VIVT).
The TMS320C64x+™ DSPs are the highest-performance fixed-point DSP generation in the
TMS320C6000™ DSP platform. The generation is based on an enhanced version of the second-generation
high-performance, advanced very-long-instruction-word (VLIW) architecture developed by Texas
Instruments (TI), making these DSP cores an excellent choice for digital media applications. The C64x is a
code-compatible member of the C6000™ DSP platform. The TMS320C64x+ DSP is an enhancement of
the C64x™ DSP with added functionality and an expanded instruction set.
SPRS282G–DECEMBER 2005–REVISED AUGUST 2010
Any reference to the C64x™ DSP or C64x™ CPU also applies, unless otherwise noted, to the C64x+™
DSP and C64x+™ CPU, respectively.
With performance of up to 4752 million instructions per second (MIPS) at a clock rate of 594 MHz, the
C64x+ core offers solutions to high-performance DSP programming challenges. The DSP core possesses
the operational flexibility of high-speed controllers and the numerical capability of array processors. The
C64x+ DSP core processor has 64 general-purpose registers of 32-bit word length and eight highly
independent functional units—two multipliers for a 32-bit result and six arithmetic logic units (ALUs). The
eight functional units include instructions to accelerate the performance in video and imaging applications.
The DSP core can produce four 16-bit multiply-accumulates (MACs) per cycle for a total of 2376 million
MACs per second (MMACS), or eight 8-bit MACs per cycle for a total of 4752 MMACS. For more details
on the C64x+ DSP, see the TMS320C64x/C64x+ DSP CPU and Instruction Set Reference Guide
(literature number SPRU732).
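The peak figures quoted above follow directly from the clock rate and the per-cycle issue width. The sketch below reproduces that arithmetic; the function and constant names are illustrative only and do not come from any TI tool.

```python
# Peak-rate arithmetic for the C64x+ core, using the figures quoted in the text.
CLOCK_MHZ = 594          # DSP clock rate
FUNCTIONAL_UNITS = 8     # each unit can issue one instruction per cycle


def peak_mips(clock_mhz: int, units: int = FUNCTIONAL_UNITS) -> int:
    """Peak instruction rate in MIPS: functional units times clock in MHz."""
    return clock_mhz * units


def peak_mmacs(clock_mhz: int, macs_per_cycle: int) -> int:
    """Peak multiply-accumulate rate in MMACS."""
    return clock_mhz * macs_per_cycle


mips = peak_mips(CLOCK_MHZ)            # eight instructions per cycle
mmacs_16bit = peak_mmacs(CLOCK_MHZ, 4)  # four 16-bit MACs per cycle
mmacs_8bit = peak_mmacs(CLOCK_MHZ, 8)   # eight 8-bit MACs per cycle
```

These are theoretical peaks; sustained throughput depends on how well the code keeps all eight functional units busy.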
The DM6443 also has application-specific hardware logic, on-chip memory, and additional on-chip
peripherals similar to the other C6000 DSP platform devices. The DM6443 core uses a two-level
cache-based architecture. The Level 1 program cache (L1P) is a 256K-bit direct mapped cache and the
Level 1 data cache (L1D) is a 640K-bit 2-way set-associative cache. The Level 2 memory/cache (L2)
consists of a 512K-bit memory space that is shared between program and data space. L2 memory can
be configured as mapped memory, cache, or combinations of the two.
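The memory sizes in this paragraph are given in K-bits, while other parts of this document quote the same memories in K-bytes. A small conversion sketch, with hypothetical names, shows that the two sets of figures agree:

```python
# Convert the K-bit sizes quoted above into K-bytes (8 bits per byte).
SIZES_KBIT = {"L1P": 256, "L1D": 640, "L2": 512}


def kbit_to_kbyte(kbits: int) -> int:
    """Return the size in K-bytes; reject sizes that are not whole bytes."""
    if kbits % 8 != 0:
        raise ValueError("size is not a whole number of K-bytes")
    return kbits // 8


sizes_kbyte = {name: kbit_to_kbyte(kbits) for name, kbits in SIZES_KBIT.items()}
```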
The peripheral set includes: 1 configurable video port; a 10/100 Mb/s Ethernet MAC (EMAC) with a
Management Data Input/Output (MDIO) module; an inter-integrated circuit (I2C) Bus interface; one audio
serial port (ASP); 2 64-bit general-purpose timers each configurable as 2 independent 32-bit timers;
1 64-bit watchdog timer; up to 71 pins of general-purpose input/output (GPIO) with programmable
interrupt/event generation modes, multiplexed with other peripherals; 3 UARTs with hardware
handshaking support on 1 UART; 3 pulse width modulator (PWM) peripherals; and 2 external memory
interfaces: an asynchronous external memory interface (EMIFA) for slower memories/peripherals, and a
higher speed synchronous memory interface for DDR2.
The DM6443 includes a Video Processing Sub-System (VPSS) that has a configurable Resizer and Video
Processing Back-End (VPBE) output used for display.
The Resizer accepts image data for separate horizontal and vertical resizing from 1/4x to 4x in increments
of 256/N, where N is between 64 and 1024.
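As a rough illustration of the 256/N relationship, the sketch below computes the scale factor for a given N and picks the N closest to a requested factor. It assumes N is an integer, which the text does not state explicitly, and the function names are hypothetical rather than part of any Resizer driver.

```python
import math
from fractions import Fraction

N_MIN, N_MAX = 64, 1024   # range of N given in the text


def resize_ratio(n: int) -> Fraction:
    """Scale factor 256/N for a given N (assumed to be an integer)."""
    if not N_MIN <= n <= N_MAX:
        raise ValueError("N must be between 64 and 1024")
    return Fraction(256, n)


def closest_n(target: float) -> int:
    """Pick the in-range N whose 256/N is closest to the requested factor."""
    if target <= 0:
        raise ValueError("scale factor must be positive")
    exact = 256 / target
    # 256/N is monotonic in N, so only the two neighbours of the exact
    # value (clamped to the valid range) need to be compared.
    candidates = {min(N_MAX, max(N_MIN, c))
                  for c in (math.floor(exact), math.ceil(exact))}
    return min(candidates, key=lambda n: abs(256 / n - target))
```

For example, N = 64 gives the 4x upper limit and N = 1024 gives the 1/4x lower limit, matching the range stated above.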
The Video Processing Back-End (VPBE) is comprised of an On-Screen Display Engine (OSD) and a
Video Encoder (VENC). The OSD engine is capable of handling 2 separate video windows and 2 separate
OSD windows. Other configurations include 2 video windows, 1 OSD window, and 1 attribute window
allowing up to 8 levels of alpha blending. The VENC provides four analog DACs that run at 54 MHz,
providing a means for composite NTSC/PAL video, S-Video, and/or Component video output. The VENC
also provides up to 24 bits of digital output to interface to RGB888 devices. The digital output is capable of
8/16-bit BT.656 output and/or CCIR.601 with separate horizontal and vertical syncs.
The Ethernet Media Access Controller (EMAC) provides an efficient interface between the DM644x and
the network. The DM6443 EMAC supports both 10Base-T and 100Base-TX, or 10 Mbits/second (Mbps)
and 100 Mbps in either half- or full-duplex mode, with hardware flow control and quality of service (QOS)
support.
The Management Data Input/Output (MDIO) module continuously polls all 32 MDIO addresses in order to
enumerate all PHY devices in the system. Once a PHY candidate has been selected by the ARM, the
MDIO module transparently monitors its link state by reading the PHY status register. Link change events
are stored in the MDIO module and can optionally interrupt the ARM, allowing the ARM to poll the link
status of the device without continuously performing costly MDIO accesses.
www.ti.com
The HPI, I2C, SPI, USB2.0, and VLYNQ ports allow the DM6443 to easily control peripheral devices and/or
communicate with host processors. The DM6443 also provides multimedia card support, MMC/SD, with
SDIO support.
The rich peripheral set provides the ability to control external peripheral devices and communicate with
external processors. For details on each of the peripherals, see the related sections later in this document
and the associated peripheral reference guides listed in Section 2.8.3.1, Related Documentation From Texas Instruments.
The DM6443 has a complete set of development tools for both the ARM and DSP. These include
C compilers, a DSP assembly optimizer to simplify programming and scheduling, and a Windows™
debugger interface for visibility into source code execution.
This data manual revision history highlights the technical changes made to the SPRS282F device-specific
data manual to make it an SPRS282G revision.
Scope: Added information/data on silicon revision 2.3.
Applicable updates to the DM64x device family, specifically relating to the TMS320DM6443 device, have
been incorporated.
Revision History
NOTE: Page numbers for previous revisions may differ from page numbers in the current version.
Section 6.13.2.3, VPBE Electrical Data/Timing
Table 6-52, Timing Requirements for VPBE CLK Inputs:
•Removed "VPFE Electrical Data/Timing" section
•Removed Parameter 1 [tc(PCLK)], Cycle time, PCLK
•Removed Parameter 2 [tw(PCLKH)], Pulse duration, PCLK high
•Removed Parameter 3 [tw(PCLKL)], Pulse duration, PCLK low
•Removed Parameter 4 [tt(PCLK)], Transition time, PCLK
Figure 6-46, VPBECLK Timing:
•Updated/changed figure title from "VPBE PCLK and VPBECLK Timing" to "VPBECLK Timing"
•Removed PCLK waveform
Table 6-53, Timing Requirements for VPBE Control Input With Respect to VPBECLK:
•Updated/changed table title from "Timing Requirements for VPBE Control Input With Respect to
PCLK and VPBECLK" to "Timing Requirements for VPBE Control Input With Respect to VPBECLK"
•Removed Parameter 9 [tsu(VCTLV-PCLK)], Setup time, VCTL valid before PCLK edge
•Removed Parameter 10 [th(PCLK-VCTLV)], Hold time, VCTL valid after PCLK edge
•Renumbered Parameter 27 as Parameter 9 [tsu(VCTLV-VPBECLK)], Setup time, VCTL valid before
VPBECLK rising edge
•Renumbered Parameter 28 as Parameter 10 [th(VPBECLK-VCTLV)], Hold time, VCTL valid after
VPBECLK rising edge
•Removed Parameter 33 [tsu(FIELD-PCLK)], Setup time, LCD_FIELD valid before PCLK edge
•Removed Parameter 34 [th(PCLK-FIELD)], Hold time, LCD_FIELD valid after PCLK edge
•Renumbered Parameter 35 as Parameter 33 [tsu(FIELD-VPBECLK)], Setup time, LCD_FIELD valid before
VPBECLK edge
•Renumbered Parameter 36 as Parameter 34 [th(VPBECLK-FIELD)], Hold time, LCD_FIELD valid after
VPBECLK edge
•Removed "PCLK may be configured ..." footnote
•Updated/changed "P = 1/(VCLKIN clock frequency) in ns ..." footnote
Figure 6-47, VPBE Input Timing With Respect to VPBECLK:
•Updated/changed figure title from "VPBE Input Timing With Respect to PCLK and VPBECLK" to
"VPBE Input Timing With Respect to VPBECLK"
•Removed VPBECLK waveform
•Renamed PCLK (Positive Edge Clocking) waveform as VPBECLK waveform
•Removed PCLK (Negative Edge Clocking) waveform
•Removed Parameters 27, 28, 35, and 36
Table 6-54, Switching Characteristics Over Recommended Operating Conditions for VPBE Control and
Data Output With Respect to VPBECLK:
•Updated/changed table title from "Switching Characteristics Over Recommended Operating
Conditions for VPBE Control and Data Output With Respect to PCLK and VPBECLK" to "Switching
Characteristics Over Recommended Operating Conditions for VPBE Control and Data Output With
Respect to VPBECLK"
Table 2-1 provides an overview of the TMS320DM6443 SoC. The table shows significant features of the
device, including the capacity of on-chip RAM, peripherals, internal peripheral bus frequency relative to the
C64x+ DSP, and the package type with pin count.
Table 2-1. Characteristics of the Processor
HARDWARE FEATURES                                               DM6443
Peripherals
(Not all peripheral pins are available at the same time. For more details, see Section 3, Device Configurations.)
    DDR2 Memory Controller                                      DDR2 (16/32-bit bus width)
    Asynchronous EMIF (EMIFA)
    Flash Cards                                                 MMC/SD with secure data input/output (SDIO)
    EDMA3
    Timers                                                      separate 32-bit timers)
    UART                                                        3 (one with RTS and CTS flow control)
    SPI                                                         1 (supports 2 slave devices)
    I2C                                                         1 (Master/Slave)
    Audio Serial Port [ASP]                                     1
    10/100 Ethernet MAC with Management Data
On-Chip Memory
CPU ID + CPU Rev ID
    Control Status Register (CSR.[31:16])                       0x1000
C64x+ Megamodule
    Revision ID Register (MM_REVID[15:0])                       0x0000 (Silicon Revision 1.3 and earlier)
Voltage
    Core (V)                                                    1.2 V (-594)
    I/O (V)                                                     1.8 V, 3.3 V
PLL Options
    CLKIN frequency multiplier (27 MHz reference)               x1 (Bypass), x22 (-594)
BGA Package
    16 x 16 mm                                                  357-Pin BGA (ZWT)
    ball finish SnAgCu
Process Technology
    µm                                                          0.09 µm
Product Status (1)
    Product Preview (PP), Advance Information (AI),
    or Production Data (PD)                                     PD
(1) PRODUCTION DATA information is current as of publication date. Products conform to specifications per the terms of Texas
Instruments standard warranty. Production processing does not necessarily include testing of all parameters.
2.2 Device Compatibility
The ARM926EJ-S RISC CPU is compatible with other ARM9 CPUs from ARM Holdings plc.
The C64x+ DSP core is code-compatible with the C6000™ DSP platform and supports features of the
C64x DSP family.
2.3 ARM Subsystem
The ARM Subsystem is designed to give the ARM926EJ-S (ARM9) master control of the device. In
general, the ARM is responsible for configuration and control of the device, including the DSP Subsystem,
the VPSS Subsystem, and a majority of the peripherals and external memories.
The ARM Subsystem includes the following features:
•ARM926EJ-S RISC processor
•ARMv5TEJ (32/16-bit) instruction set
•Little endian
•Co-Processor 15 (CP15)
•MMU
•16KB Instruction cache
•8KB Data cache
•Write Buffer
•16KB Internal RAM (32-bit-wide access)
•8KB Internal ROM (ARM bootloader for non-EMIFA boot options)
•Embedded Trace Module and Embedded Trace Buffer (ETM/ETB)
The ARM Subsystem integrates the ARM926EJ-S processor. The ARM926EJ-S processor is a member of the
ARM9 family of general-purpose microprocessors. This processor is targeted at multi-tasking applications
where full memory management, high performance, low die size, and low power are all important. The
ARM926EJ-S processor supports the 32-bit ARM and 16-bit Thumb instruction sets, enabling the user to
trade off between high performance and high code density. Specifically, the ARM926EJ-S processor
supports the ARMv5TEJ instruction set, which includes features for efficient execution of Java byte codes,
providing Java performance similar to that of a Just-in-Time (JIT) Java interpreter, but without the associated code
overhead.
The ARM926EJ-S processor supports the ARM debug architecture and includes logic to assist in both
hardware and software debug. The ARM926EJ-S processor has a Harvard architecture and provides a
complete high performance subsystem, including:
•ARM926EJ-S integer core
•CP15 system control coprocessor
•Memory Management Unit (MMU)
•Separate instruction and data Caches
•Write buffer
•Separate instruction and data Tightly-Coupled Memories (TCMs) [internal RAM] interfaces
•Separate instruction and data AHB bus interfaces
•Embedded Trace Module and Embedded Trace Buffer (ETM/ETB)
For more complete details on the ARM9, refer to the ARM926EJ-S Technical Reference Manual, available
at http://www.arm.com.
2.3.2 CP15
The ARM926EJ-S system control coprocessor (CP15) is used to configure and control instruction and
data caches, Tightly-Coupled Memories (TCMs), Memory Management Unit (MMU), and other ARM
subsystem functions. The CP15 registers are programmed using the MRC and MCR ARM instructions,
when the ARM is in a privileged mode such as supervisor or system mode.
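To make the MRC/MCR access path concrete, the sketch below builds the 32-bit ARM machine encoding of a coprocessor register transfer from its fields. The field layout is the standard ARM encoding for these instructions as understood here; the helper name and its parameters are hypothetical and not part of any TI or ARM tool.

```python
def encode_cp_transfer(read: bool, opc1: int, rd: int, crn: int, crm: int,
                       opc2: int = 0, cp_num: int = 15, cond: int = 0xE) -> int:
    """Encode MRC (read=True) or MCR (read=False) as a 32-bit ARM instruction.

    cond 0xE means "always"; cp_num 15 selects the system control coprocessor.
    """
    limits = {"opc1": (opc1, 7), "rd": (rd, 15), "crn": (crn, 15),
              "crm": (crm, 15), "opc2": (opc2, 7), "cp_num": (cp_num, 15),
              "cond": (cond, 15)}
    for name, (value, maximum) in limits.items():
        if not 0 <= value <= maximum:
            raise ValueError(f"{name} out of range")
    return ((cond << 28) | (0b1110 << 24) | (opc1 << 21) | (int(read) << 20)
            | (crn << 16) | (rd << 12) | (cp_num << 8) | (opc2 << 5)
            | (1 << 4) | crm)


# MRC p15, 0, r0, c1, c0, 0 : read the CP15 control register into r0
mrc_word = encode_cp_transfer(read=True, opc1=0, rd=0, crn=1, crm=0)
# MCR p15, 0, r0, c1, c0, 0 : write r0 back to the CP15 control register
mcr_word = encode_cp_transfer(read=False, opc1=0, rd=0, crn=1, crm=0)
```

In practice these instructions are written in assembly or compiler intrinsics; the encoder is only meant to show which fields select the coprocessor and register.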
2.3.3 MMU
The ARM926EJ-S MMU provides virtual memory features required by operating systems such as Linux®,
Windows® CE, Ultron®, ThreadX®, etc. A single set of two-level page tables stored in main memory is
used to control the address translation, permission checks and memory region attributes for both data and
instruction accesses. The MMU uses a single unified Translation Lookaside Buffer (TLB) to cache the
information held in the page tables. The MMU features are:
•Standard ARM architecture v4 and v5 MMU mapping sizes, domains and access protection scheme.
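As an illustration of a two-level lookup, the sketch below splits an address into a first-level index, a second-level index, and a page offset. It assumes the common ARM layout for 4KB small pages reached through a coarse second-level table (12-bit, 8-bit, and 12-bit fields); the text above does not spell out this layout, so treat it as an assumption, and the function name is hypothetical.

```python
from typing import Tuple


def split_small_page_address(mva: int) -> Tuple[int, int, int]:
    """Split a 32-bit modified virtual address for a 4KB small page.

    Assumed layout: bits [31:20] index the first-level table,
    bits [19:12] index the coarse second-level table,
    bits [11:0] are the offset within the page.
    """
    if not 0 <= mva <= 0xFFFFFFFF:
        raise ValueError("address must fit in 32 bits")
    first_level_index = (mva >> 20) & 0xFFF
    second_level_index = (mva >> 12) & 0xFF
    page_offset = mva & 0xFFF
    return first_level_index, second_level_index, page_offset
```

Other mapping sizes (for example 1MB sections) use a different split, which this sketch does not cover.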
The size of the Instruction Cache is 16KB, Data cache is 8KB. Additionally, the Caches have the following
features:
•Virtual index, virtual tag, and addressed using the Modified Virtual Address (MVA)
•Four-way set associative, with a cache line length of eight words per line (32-bytes per line) and with
two dirty bits in the Dcache
•Dcache supports write-through and write-back (or copy back) cache operation, selected by memory
region using the C and B bits in the MMU translation tables.
•Critical-word first cache refilling
•Cache lockdown registers enable control over which cache ways are used for allocation on a line fill,
providing a mechanism for both lockdown, and controlling cache corruption
•Dcache stores the Physical Address TAG (PA TAG) corresponding to each Dcache entry in the TAG
RAM for use during the cache line write-backs, in addition to the Virtual Address TAG stored in the
TAG RAM. This means that the MMU is not involved in Dcache write-back operations, removing the
possibility of TLB misses related to the write-back address.
•Cache maintenance operations provide efficient invalidation of the entire Dcache or Icache, regions of
the Dcache or Icache, and regions of virtual memory.
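The cache sizes, associativity, and line length listed above determine how many sets each cache has. The following sketch derives that geometry; the function name and the returned field names are illustrative only.

```python
def cache_geometry(size_bytes: int, ways: int = 4, line_bytes: int = 32) -> dict:
    """Derive line count, set count, and address field widths for a cache."""
    if size_bytes % (ways * line_bytes) != 0:
        raise ValueError("size must be a multiple of ways * line size")
    lines = size_bytes // line_bytes
    sets = lines // ways
    return {
        "lines": lines,
        "sets": sets,
        "offset_bits": line_bytes.bit_length() - 1,  # byte-within-line bits
        "index_bits": sets.bit_length() - 1,         # set-select bits
    }


icache = cache_geometry(16 * 1024)   # 16KB instruction cache
dcache = cache_geometry(8 * 1024)    # 8KB data cache
```

The bit widths assume power-of-two sizes, which holds for the 16KB and 8KB caches described here.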
The write buffer is used for all writes to a noncacheable bufferable region, write-through region and write
misses to a write-back region. A separate buffer is incorporated in the Dcache for holding write-back for
cache line evictions or cleaning of dirty cache lines. The main write buffer has a 16-word data buffer and a
four-address buffer. The Dcache write-back buffer has eight data word entries and a single address entry.
2.3.5 Tightly Coupled Memory (TCM)
ARM internal RAM is provided for storing real-time and performance-critical code/data and the Interrupt
Vector table. ARM internal ROM enables non-EMIFA boot options, such as NAND and UART. The RAM
and ROM memories are interfaced to the ARM926EJ-S via the tightly coupled memory interface that provides
for separate instruction and data bus connections. Since the ARM TCM does not allow instructions on the
D-TCM bus or data on the I-TCM bus, an arbiter is included so that both data and instructions can be
stored in the internal RAM/ROM. The arbiter also allows accesses to the RAM/ROM from extra-ARM
sources (e.g., EDMA3 or other masters). The ARM926EJ-S has built-in DMA support for direct accesses
to the ARM internal memory from a non-ARM master. Because of the time-critical nature of the TCM link
to the ARM internal memory, all accesses from non-ARM devices are treated as DMA transfers.
Instruction and Data accesses are differentiated via accessing different memory map regions, with the
instruction region from 0x0000 through 0x7FFF and data from 0x8000 through 0xFFFF. The instruction
region at 0x0000 and data region at 0x8000 map to the same physical 16KB TCM RAM. Placing the
instruction region at 0x0000 is necessary to allow the ARM Interrupt Vector table to be placed at 0x0000,
as required by the ARM architecture. The internal 16-KB RAM is split into two physical banks of 8KB
each, which allows simultaneous instruction and data accesses to be accomplished if the code and data
are in separate banks.
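The region and bank arrangement described above can be sketched as an address decoder. Two points are assumptions rather than statements from the text: that the 16KB RAM sits at the start of each 32KB region, and that the two 8KB banks are the lower and upper halves of that RAM. All names are hypothetical.

```python
REGION_SIZE = 0x8000        # each region spans 0x8000 bytes of address space
DATA_REGION_BASE = 0x8000   # data region starts here; instruction region at 0
RAM_SIZE = 16 * 1024        # physical TCM RAM (assumed to start each region)
BANK_SIZE = 8 * 1024        # assumed: bank 0 = lower half, bank 1 = upper half


def decode_tcm(addr: int):
    """Return (region, RAM offset, bank); offset and bank are None outside RAM."""
    if not 0x0000 <= addr <= 0xFFFF:
        raise ValueError("address outside the TCM regions")
    region = "data" if addr >= DATA_REGION_BASE else "instruction"
    offset = addr % REGION_SIZE
    if offset >= RAM_SIZE:
        return region, None, None   # not RAM under the stated assumption
    return region, offset, offset // BANK_SIZE


def banks_differ(i_addr: int, d_addr: int) -> bool:
    """True when an instruction fetch and a data access hit different RAM banks."""
    _, _, i_bank = decode_tcm(i_addr)
    _, _, d_bank = decode_tcm(d_addr)
    return None not in (i_bank, d_bank) and i_bank != d_bank
```

Under these assumptions, code placed in the lower 8KB and data placed in the upper 8KB can be accessed in the same cycle, which is the case the text describes.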
2.3.6 Advanced High-Performance Bus (AHB)
The ARM Subsystem uses the AHB port of the ARM926EJ-S to connect the ARM to the Config bus and
the external memories. Arbiters are employed to arbitrate access to the separate D-AHB and I-AHB by the
Config Bus and the external memories bus.
2.3.7 Embedded Trace Macrocell (ETM) and Embedded Trace Buffer (ETB)
To support real-time trace, the ARM926EJ-S processor provides an interface to enable connection of an
Embedded Trace Macrocell (ETM). The ARM926EJ-S Subsystem in the DM6443 also includes the
Embedded Trace Buffer (ETB). The ETM consists of two parts:
•Trace Port provides real-time trace capability for the ARM9.
•Triggering facilities provide trigger resources, which include address and data comparators, counter,
and sequencers.
The DM6443 trace port is not pinned out and is instead only connected to the Embedded Trace Buffer.
The ETB has a 4KB buffer memory. ETB enabled debug tools are required to read/interpret the captured
trace data.
2.3.8 ARM Memory Mapping
The ARM memory map is shown in Section 2.5, Memory Map Summary, of this document. The ARM has
access to memories shown in the following sections.
2.3.8.1 ARM Internal Memories
The ARM has access to the following ARM internal memories:
•16KB ARM Internal RAM on TCM interface, logically separated into two 8KB pages to allow
simultaneous access on any given cycle if there are separate accesses for code (I-TCM bus) and data
(D-TCM) to the different memory regions.
•8KB ARM Internal ROM
2.3.8.2 External Memories
The ARM has access to the following external memories:
The ARM9 has access to all of the peripherals on the DM6443 device.
2.3.10 PLL Controller (PLLC)
The ARM Subsystem includes the PLL Controller. The PLL Controller contains a set of registers for
configuring DM6443’s two internal PLLs (PLL1 and PLL2). The PLL Controller provides the following
configuration and control:
•PLL Bypass Mode
•Set PLL multiplier parameters
•Set PLL divider parameters
•PLL power down
•Oscillator power down
The PLLs are briefly described in this document in Section 6.6, Clock PLLs. For more detailed information
on the PLLs and PLL Controller register descriptions, see the TMS320DM644x DMSoC ARM Subsystem Reference Guide (literature number SPRUE14).
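As a simple check of the PLL options listed in Table 2-1 (a 27 MHz reference with x1 bypass or x22 multiplication), the sketch below computes the resulting frequency. It ignores any PLL dividers, and the function name is hypothetical; the real register programming sequence is described in the reference guide cited above.

```python
REFERENCE_MHZ = 27   # CLKIN reference frequency from Table 2-1


def pll_output_mhz(multiplier: int, bypass: bool = False,
                   reference_mhz: int = REFERENCE_MHZ) -> int:
    """Return the PLL output in MHz, ignoring any dividers (simplification)."""
    if multiplier < 1:
        raise ValueError("multiplier must be at least 1")
    return reference_mhz if bypass else reference_mhz * multiplier


dsp_clock_mhz = pll_output_mhz(22)                  # x22 option for the -594 device
bypass_clock_mhz = pll_output_mhz(22, bypass=True)  # bypass passes the reference through
```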
2.3.11 Power and Sleep Controller (PSC)
The ARM Subsystem includes the Power and Sleep Controller (PSC). Through register settings
accessible by the ARM9, the PSC provides two levels of power savings: peripheral/module clock gating
and power domain shut-off. Brief details on the PSC are given in Section 6.3, Power Supplies. For more
detailed information and complete register descriptions for the PSC, see the TMS320DM644x DMSoC ARM Subsystem Reference Guide (literature number SPRUE14).
2.3.12 ARM Interrupt Controller (AINTC)
The ARM Interrupt Controller (AINTC) accepts device interrupts and maps them to either the ARM’s IRQ
(interrupt request) or FIQ (fast interrupt request). The ARM Interrupt Controller is briefly described in this
document in the Interrupts section. For detailed information on the ARM Interrupt Controller, see the
TMS320DM644x DMSoC ARM Subsystem Reference Guide (literature number SPRUE14).
2.3.13 System Module
The ARM Subsystem includes the System module. The System module consists of a set of registers for
configuring and controlling a variety of system functions. For details and register descriptions for the
System module, see Section 3, Device Configurations, and see the TMS320DM644x DMSoC ARM Subsystem Reference Guide (literature number SPRUE14).
2.3.14 Power Management
DM6443 has several means of managing power consumption. There is extensive use of clock gating,
which reduces the power used by global device clocks and individual peripheral clocks. Clock
management can be utilized to reduce clock frequencies in order to reduce switching power. For more
details on power management techniques, see Section 3, Device Configurations, Section 6, Peripheral and Electrical Specifications, and see the TMS320DM644x DMSoC ARM Subsystem Reference Guide
(literature number SPRUE14).
DM6443 gives the programmer full flexibility to use any and all of the previously mentioned capabilities to
customize an optimal power management strategy. Several typical power management scenarios are
described in the following sections.
The DSP Subsystem includes the following features:
•C64x+ DSP CPU
•32KB L1 Program (L1P)/Cache (up to 32KB)
•80KB L1 Data (L1D)/Cache (up to 32KB)
•64KB Unified Mapped RAM/Cache (L2)
•Little endian
2.4.1 C64x+ DSP CPU Description
The C64x+ Central Processing Unit (CPU) consists of eight functional units, two register files, and two
data paths as shown in Figure 2-1. The two general-purpose register files (A and B) each contain
32 32-bit registers for a total of 64 registers. The general-purpose registers can be used for data or can be
data address pointers. The data types supported include packed 8-bit data, packed 16-bit data, 32-bit
data, 40-bit data, and 64-bit data. Values larger than 32 bits, such as 40-bit-long or 64-bit-long values are
stored in register pairs, with the 32 LSBs of data placed in an even register and the remaining 8 or
32 MSBs in the next upper register (which is always an odd-numbered register).
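The register-pair convention above can be sketched as a split of a 40-bit or 64-bit value into an even register (32 LSBs) and an odd register (remaining MSBs). Values are treated as unsigned bit patterns, and the function names are illustrative only.

```python
MASK32 = 0xFFFFFFFF


def to_register_pair(value: int, bits: int):
    """Split a 40- or 64-bit pattern into (even register, odd register)."""
    if bits not in (40, 64):
        raise ValueError("only 40-bit and 64-bit values use register pairs")
    if not 0 <= value < (1 << bits):
        raise ValueError("value does not fit in the requested width")
    even = value & MASK32    # 32 LSBs go to the even-numbered register
    odd = value >> 32        # remaining 8 or 32 MSBs go to the odd register
    return even, odd


def from_register_pair(even: int, odd: int) -> int:
    """Reassemble the original bit pattern from the two registers."""
    return (odd << 32) | (even & MASK32)
```

For a 40-bit value, only the low 8 bits of the odd register carry data.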
The eight functional units (.M1, .L1, .D1, .S1, .M2, .L2, .D2, and .S2) are each capable of executing one
instruction every clock cycle. The .M functional units perform all multiply operations. The .S and .L units
perform a general set of arithmetic, logical, and branch functions. The .D units primarily load data from
memory to the register file and store results from the register file into memory.
The C64x+ CPU extends the performance of the C64x core through enhancements and new features.
Each C64x+ .M unit can perform one of the following each clock cycle: one 32 x 32 bit multiply, one 16 x
32 bit multiply, two 16 x 16 bit multiplies, two 16 x 32 bit multiplies, two 16 x 16 bit multiplies with
add/subtract capabilities, four 8 x 8 bit multiplies, four 8 x 8 bit multiplies with add operations, and four
16 x 16 multiplies with add/subtract capabilities (including a complex multiply). There is also support for
Galois field multiplication for 8-bit and 32-bit data. Many communications algorithms such as FFTs and
modems require complex multiplication. The complex multiply (CMPY) instruction takes four 16-bit inputs
and produces a 32-bit real and a 32-bit imaginary output. There are also complex multiplies with rounding
capability that produce one 32-bit packed output that contains 16-bit real and 16-bit imaginary values. The
32 x 32 bit multiply instructions provide the extended precision necessary for audio and other
high-precision algorithms on a variety of signed and unsigned 32-bit data types.
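The arithmetic behind a complex multiply of two values with 16-bit real and imaginary parts is sketched below. This shows the mathematical operation only; it does not model the operand packing, rounding, or saturation behavior of the actual CMPY instructions, and the function name is hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def complex_multiply_16(a_re: int, a_im: int, b_re: int, b_im: int):
    """Multiply (a_re + j*a_im) by (b_re + j*b_im) for 16-bit signed inputs.

    Returns (real, imaginary) as plain Python integers; no saturation or
    rounding is applied, unlike a hardware implementation might.
    """
    for value in (a_re, a_im, b_re, b_im):
        if not INT16_MIN <= value <= INT16_MAX:
            raise ValueError("inputs must be 16-bit signed values")
    real = a_re * b_re - a_im * b_im
    imag = a_re * b_im + a_im * b_re
    return real, imag
```

Four 16 x 16 multiplies and two add/subtract steps are needed, which is why the .M unit's four-multiply capability matters for this operation.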
The .L unit (Arithmetic Logic Unit) now incorporates the ability to do parallel add/subtract operations on a
pair of common inputs. Versions of this instruction exist to work on 32-bit data or on pairs of 16-bit data
performing dual 16-bit add and subtracts in parallel. There are also saturated forms of these instructions.
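A minimal sketch of the dual 16-bit case follows: the sum and the difference of a pair of packed 32-bit inputs are computed lane by lane. It models the non-saturating (wraparound) behavior only, and the function names are hypothetical rather than actual instruction mnemonics.

```python
def _lanes(word: int):
    """Split a 32-bit word into its upper and lower 16-bit lanes."""
    return (word >> 16) & 0xFFFF, word & 0xFFFF


def _pack(hi: int, lo: int) -> int:
    """Pack two lanes into one 32-bit word, wrapping each lane to 16 bits."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


def dual16_addsub(a: int, b: int):
    """Return (packed sums, packed differences) of two packed 16-bit pairs."""
    a_hi, a_lo = _lanes(a)
    b_hi, b_lo = _lanes(b)
    sums = _pack(a_hi + b_hi, a_lo + b_lo)
    diffs = _pack(a_hi - b_hi, a_lo - b_lo)
    return sums, diffs
```

A saturated form would clamp each lane to the 16-bit signed range instead of wrapping.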
The C64x+ core enhances the .S unit in several ways. In the C64x core, dual 16-bit MIN2 and MAX2
comparisons were only available on the .L units. On the C64x+ core they are also available on the .S unit
which increases the performance of algorithms that do searching and sorting. Finally, to increase data
packing and unpacking throughput, the .S unit allows sustained high performance for the quad 8-bit/16-bit
and dual 16-bit instructions. Unpack instructions prepare 8-bit data for parallel 16-bit operations. Pack
instructions return parallel results to output precision including saturation support.
•SPLOOP - A small instruction buffer in the CPU that aids in creation of software pipelining loops where
multiple iterations of a loop are executed in parallel. The SPLOOP buffer reduces the code size
associated with software pipelining. Furthermore, loops in the SPLOOP buffer are fully interruptible.
•Compact Instructions - The native instruction size for the C6000 devices is 32 bits. Many common
instructions such as MPY, AND, OR, ADD, and SUB can be expressed as 16 bits if the C64x+
compiler can restrict the code to use certain registers in the register file. This compression is
performed by the code generation tools.
•Instruction Set Enhancement - As noted above, there are new instructions such as 32-bit
multiplications, complex multiplications, packing, sorting, bit manipulation, and 32-bit Galois field
multiplication.
•Exceptions Handling - Intended to aid the programmer in isolating bugs. The C64x+ CPU is able to
detect and respond to exceptions, both from internally detected sources (such as illegal op-codes) and
from system events (such as a watchdog timer expiration).
•Privilege - Defines user and supervisor modes of operation, allowing the operating system to give a
basic level of protection to sensitive resources. Local memory is divided into multiple pages, each with
read, write, and execute permissions.
•Time-Stamp Counter - Primarily targeted for Real-Time Operating System (RTOS) robustness, a
free-running time-stamp counter is implemented in the CPU which is not sensitive to system stalls.
For more details on the C64x+ CPU and its enhancements over the C64x architecture, see the following
documents:
•TMS320C64x/C64x+ DSP CPU and Instruction Set Reference Guide (literature number SPRU732)
•TMS320C64x Technical Overview (literature number SPRU395)
A. On .M unit, dst2 is 32 MSB.
B. On .M unit, dst1 is 32 LSB.
C. On C64x CPU .M unit, src2 is 32 bits; on C64x+ CPU .M unit, src2 is 64 bits.
D. On .L and .S units, odd dst connects to odd register files and even dst connects to even register files.
Figure 2-1. TMS320C64x+™ CPU (DSP Core) Data Paths
The DSP memory map is shown in Section 2.5, Memory Map Summary. Configuration of the control
registers for DDR2, EMIFA, and ARM Internal RAM is supported by the ARM. The DSP has access to
memories shown in the following sections.
2.4.2.1 ARM Internal Memories
The DSP has access to the 16KB ARM Internal RAM on the ARM D-TCM interface (i.e., data only).
2.4.2.2 External Memories
The DSP has access to the following External memories:
•DDR2 Synchronous DRAM
•Asynchronous EMIF / NOR Flash
2.4.2.3 DSP Internal Memories
The DSP has access to the following DSP memories:
•L2 RAM
•L1P RAM
•L1D RAM
2.4.2.4 C64x+ CPU
The C64x+ core uses a two-level cache-based architecture. The Level 1 Program cache (L1P) is a 32-KB
direct-mapped cache and the Level 1 Data cache (L1D) is an 80-KB 2-way set-associative cache. The Level 2
memory/cache (L2) consists of a 64 KB memory space that is shared between program and data space.
L2 memory can be configured as mapped memory, cache, or a combination of both.
Table 2-2 shows a memory map of the C64x+ CPU cache registers for the device.
Memory Attribute Registers for EMIFA/VLYNQ Shadow 0x4200 0000 0x4FFF FFFF
2.4.3 Peripherals
The DSP can control the following peripherals:
•EDMA3
•ASP
•2 Timers (Timer0 and Timer1) that can each be configured as 1 64-bit or 2 32-bit timers
2.4.4 DSP Interrupt Controller
The DSP Interrupt Controller accepts device interrupts and appropriately maps them to the DSP’s
available interrupts. The DSP Interrupt Controller is briefly described in this document in Section 6.7,
Interrupts. For more detailed information on the DSP Interrupt Controller, see the TMS320C64x/C64x+ DSP CPU and
Instruction Set Reference Guide (literature number SPRU732).
2.5 Memory Map Summary
Table 2-3 shows the memory map address ranges of the device. Table 2-4 depicts the expanded map of
the Configuration Space (0x0180 0000 through 0x0FFF FFFF). The device has multiple on-chip memories
associated with its two processors and various subsystems. To help simplify software development, a
unified memory map is used where possible to maintain a consistent view of device resources across all
bus masters.
0x2000 0000   0x2000 7FFF   32K        DDR2 Control Registers   DDR2 Control Registers   DDR2 Control Registers   DDR2 Control Registers
0x2000 8000   0x41FF FFFF   544M-32K   Reserved                 Reserved                 Reserved
(1) HPI's access to the configuration bus peripherals is limited to the power and sleep controller registers, PLL1 and PLL2 registers, and
HPI configuration registers.
(2) EMIFA shadow memory starting at 0x4200 0000 is physically the same memory as location 0x0200 0000. Memory range 0x0200 0000
through 0x09FF FFFF should only be used by C64x+ for data accesses. Memory range 0x4200 0000 through 0x4FFF FFFF can be
used by C64x+ for both code execution and data accesses.
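The aliasing described in footnote (2) can be illustrated with a short C sketch. The address constants come from the footnote; the helper name emifa_to_shadow is hypothetical, and the sketch assumes the shadow alias sits at a constant offset from the data-only range across its whole length, which the footnote states only for the base address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Address constants taken from footnote (2) of the memory map. */
#define EMIFA_DATA_BASE   0x02000000u /* start of data-only EMIFA range        */
#define EMIFA_DATA_END    0x09FFFFFFu /* end of data-only range (inclusive)    */
#define EMIFA_SHADOW_BASE 0x42000000u /* start of shadow range (code and data) */

/*
 * Hypothetical helper: translate an address in the data-only EMIFA range
 * into its shadow alias. ASSUMPTION: the alias is at a constant offset for
 * every address in the range, not only for the base address.
 * Returns false (and leaves *shadow untouched) if addr is outside the range.
 */
static bool emifa_to_shadow(uint32_t addr, uint32_t *shadow)
{
    assert(shadow != NULL);
    if (addr < EMIFA_DATA_BASE || addr > EMIFA_DATA_END) {
        return false;
    }
    *shadow = (addr - EMIFA_DATA_BASE) + EMIFA_SHADOW_BASE;
    return true;
}
```

Under that assumption, emifa_to_shadow(0x02000100u, &s) returns true and sets s to 0x42000100, the address the C64x+ would use to execute code from the same physical location.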
Pin multiplexing is used extensively to accommodate the largest number of peripheral functions in
the smallest possible package. Pin multiplexing is controlled using a combination of hardware
configuration at device reset and software programmable register settings. For more information on pin
muxing, see Section 3.5.2, Multiplexed Pin Configurations, of this document.
2.6.1 Pin Map (Bottom View)
Figure 2-2 through Figure 2-5 show the bottom view of the package pin assignments in four quadrants (A,
The terminal functions tables (Table 2-5 through Table 2-29) identify the external signal names, the
associated pin (ball) numbers along with the mechanical package designator, the pin type, whether the pin
has any internal pullup or pulldown resistors, and a functional pin description. For more detailed
information on device configuration, peripheral selection, and multiplexed/shared pins, see Section 3,
Device Configurations, of this data manual.
Table 2-5. BOOT Terminal Functions

SIGNAL NAME         NO.   TYPE (1)   OTHER (2) (3)
BOOT
COUT0/B3/BTSEL0     A16   I/O/Z      IPD DVDD18
COUT1/B4/BTSEL1     B16   I/O/Z      IPD DVDD18
    DESCRIPTION: These pins are multiplexed between ARM boot mode and the VPBE. At reset,
    the boot mode inputs BTSEL0 and BTSEL1 are sampled to determine the ARM boot
    configuration. See below for the boot modes set by these inputs. See Section 3.3,
    Bootmode, for more details. After reset, these are video encoder outputs COUT0 and
    COUT1, or RGB666/888 Blue output data bits 3 and 4 (B3/B4).

    BTSEL1   BTSEL0   ARM Boot Mode
    0        0        ARM ROM Boot (NAND, SPI) [default]
    0        1        ARM EMIFA Boot (NOR)
    1        0        ARM ROM Boot (HPI)
    1        1        ARM ROM Boot (UART0)

COUT2/B5/EM_WIDTH   A17   I/O/Z      IPD DVDD18
    DESCRIPTION: This pin is multiplexed between EMIFA and the VPBE. At reset, the input
    state is sampled to set the EMIFA data bus width (EM_WIDTH). For an 8-bit-wide EMIFA
    data bus, EM_WIDTH = 0. For a 16-bit-wide EMIFA data bus, EM_WIDTH = 1. After reset,
    it is video encoder output COUT2 or RGB666/888 Blue output data bit 5 (B5).

COUT3/B6/DSP_BT     B17   I/O/Z      IPD DVDD18
    DESCRIPTION: This pin is multiplexed between DSP boot and the VPBE. At reset, the
    input state is sampled to set the DSP boot source (DSP_BT). The DSP is booted by the
    ARM when DSP_BT = 0. The DSP boots from EMIFA when DSP_BT = 1. After reset, it is
    video encoder output COUT3 or RGB666/888 Blue output data bit 6 (B6).

YOUT0/G5/AEAW0      D15   I/O/Z
YOUT1/G6/AEAW1      D16   I/O/Z
YOUT2/G7/AEAW2      D17   I/O/Z
YOUT3/R3/AEAW3      D18   I/O/Z
YOUT4/R4/AEAW4      E15   I/O/Z
    DESCRIPTION: These pins are multiplexed between EMIFA and the VPBE. At reset, the
    input states of AEAW[4:0] are sampled to set the EMIFA address bus width. See
    Section 3.4.2, Peripheral Selection at Device Reset. After reset, these are video
    encoder outputs YOUT[0:4] or RGB666/888 Red and Green data bit outputs G5, G6, G7,
    R3, and R4.

(1) I = Input, O = Output, Z = High impedance, S = Supply voltage, GND = Ground, A = Analog signal
(2) IPD = Internal pulldown, IPU = Internal pullup. (To pull up a signal to the opposite supply rail, a 1-kΩ resistor should be used.)
(3) Specifies the operating I/O supply voltage for each signal
TYPE (1)   OTHER (2)   DESCRIPTION
OSCILLATOR, PLL
           DVDD18      Crystal input MXI for MX oscillator (system oscillator, typically 27 MHz).
                       If a crystal input is not used, but instead a physical clock-in source is
                       supplied, this is the external oscillator clock input.
           DVDD18      Crystal output for MX oscillator. If a crystal input is not used, but
                       instead a physical clock-in source is supplied, MXO should be left as a
                       No Connect.
           (3)         1.8-V power supply for MX oscillator. If a crystal input is not used, but
                       instead a physical clock-in source is supplied, MXVDD should still be
                       connected to the 1.8-V power supply.
           (3)         Ground for MX oscillator. If a crystal input is not used, but instead a
                       physical clock-in source is supplied, MXVSS should still be connected to
                       ground.
           DVDD18      Crystal input for M24 oscillator (24 MHz for USB). If a crystal input is
                       not used, but instead a physical clock-in source is supplied, this is the
                       external oscillator clock input. When the USB peripheral is not used,
                       M24XI should be left as a No Connect.
           DVDD18      Crystal output for M24 oscillator. If a crystal input is not used, but
                       instead a physical clock-in source is supplied, M24XO should be left as a
                       No Connect. When the USB peripheral is not used, M24XO should be left as
                       a No Connect.
           (3)         1.8-V power supply for M24 oscillator. If a crystal input is not used,
                       but instead a physical clock-in source is supplied, M24VDD should still
                       be connected to the 1.8-V power supply. When the USB peripheral is not
                       used, M24VDD should be connected to the 1.8-V power supply.
           (3)         Ground for M24 oscillator. If a crystal input is not used, but instead a
                       physical clock-in source is supplied, M24VSS should still be connected to
                       ground. When the USB peripheral is not used, M24VSS should be connected
                       to ground.
           (3)         1.8-V power supply for PLLs (system).

(1) I = Input, O = Output, Z = High impedance, S = Supply voltage, GND = Ground, A = Analog signal
(2) Specifies the operating I/O supply voltage for each signal
(3) For more information, see Section 5.2, Recommended Operating Conditions.
Table 2-7. Clock Generator Terminal Functions

SIGNAL NAME              NO.   TYPE (1)   OTHER (2)   DESCRIPTION
CLOCK GENERATOR
CLK_OUT0/GPIO48          K1    I/O/Z      DVDD18      This pin is multiplexed between the PLL1 clock generator
                                                      and GPIO. For the PLL1 clock generator, it is clock
                                                      output CLK_OUT0. This is configurable for 13.5 MHz or
                                                      27 MHz clock outputs.
CLK_OUT1/TIM_IN/GPIO49   E19   I/O/Z      DVDD18      This pin is multiplexed between the USB clock generator,
                                                      timer, and GPIO. For the USB clock generator, it is clock
                                                      output CLK_OUT1. This is configurable for 12 MHz or
                                                      24 MHz clock outputs.

(1) I = Input, O = Output, Z = High impedance, S = Supply voltage, GND = Ground, A = Analog signal
(2) Specifies the operating I/O supply voltage for each signal