• Six ALUs (32-/40-Bit), Each Supports Single 32-Bit, Dual 16-Bit, or Quad 8-Bit Arithmetic per Clock Cycle
• Two Multipliers Support Four 16 x 16-Bit Multiplies (32-Bit Results) per Clock Cycle or Eight 8 x 8-Bit Multiplies (16-Bit Results) per Clock Cycle
– Load-Store Architecture With Non-Aligned Support
– 64 32-Bit General-Purpose Registers
– Instruction Packing Reduces Code Size
– All Instructions Conditional
– Additional C64x+™ Enhancements
• Protected Mode Operation
• Exceptions Support for Error Detection and Program Redirection
• Hardware Support for Modulo Loop Operation
• C64x+ Instruction Set Features
– Byte-Addressable (8-/16-/32-/64-Bit Data)
– 8-Bit Overflow Protection
– Bit-Field Extract, Set, Clear
– Normalization, Saturation, Bit-Counting
– Compact 16-Bit Instructions
– Additional Instructions to Support Complex Multiplies
• C64x+ L1/L2 Memory Architecture
– 32K-Byte L1P Program RAM/Cache (Direct Mapped)
– 32K-Byte L1D Data RAM/Cache (2-Way Set-Associative)
– 128K-Byte L2 Unified Mapped RAM/Cache (Flexible RAM/Cache Allocation)
• ARM926EJ-S Core
– Support for 32-Bit and 16-Bit (Thumb® Mode) Instruction Sets
– DSP Instruction Extensions and Single Cycle MAC
– ARM® Jazelle® Technology
– EmbeddedICE-RT™ Logic for Real-Time Debug
• ARM9 Memory Architecture
– 16K-Byte Instruction Cache
– 8K-Byte Data Cache
– 32K-Byte RAM
– 8K-Byte ROM
• Embedded Trace Buffer™ (ETB11™) With 4KB Memory for ARM9 Debug
• Endianness: Little Endian for ARM and DSP
• Dual Programmable High-Definition Video Image Co-Processor (HDVICP) Engines
– Supports a Range of Encode, Decode, and Transcode Operations
• H.264, MPEG2, VC1, MPEG4 SP/ASP
• 99-/108-MHz Video Port Interface (VPIF)
– Two 8-Bit SD (BT.656), Single 16-Bit HD (BT.1120), or Single Raw (8-/10-/12-Bit) Video Capture Channels
– Two 8-Bit SD (BT.656) or Single 16-Bit HD (BT.1120) Video Display Channels
• Video Data Conversion Engine (VDCE)
– Horizontal and Vertical Downscaling
– Chroma Conversion (4:2:2↔4:2:0)
• Two Transport Stream Interface (TSIF) Modules (One Parallel/Serial and One Serial Only)
– TSIF for MPEG Transport Stream
– Simultaneous Synchronous or
• Applications:
– Video Encode/Decode/Transcode/Transrate
– Digital Media
– Networked Media Encode/Decode
– Video Imaging
– Video Infrastructure
– Video Conferencing

Please be aware that an important notice concerning availability, standard warranty, and use in critical applications of Texas Instruments semiconductor products and disclaimers thereto appears at the end of this data sheet.
All trademarks are the property of their respective owners.
PRODUCTION DATA information is current as of publication date. Products conform to specifications per the terms of the Texas Instruments standard warranty. Production processing does not necessarily include testing of all parameters.
The TMS320DM6467 (also referenced as DM6467) leverages TI’s DaVinci™ technology to meet the
networked media encode and decode digital media processing needs of next-generation embedded
devices.
The DM6467 enables OEMs and ODMs to quickly bring to market devices featuring robust operating
systems support, rich user interfaces, high processing performance, and long battery life through the
maximum flexibility of a fully integrated mixed processor solution.
The dual-core architecture of the DM6467 provides benefits of both DSP and Reduced Instruction Set
Computer (RISC) technologies, incorporating a high-performance TMS320C64x+ DSP core and an
ARM926EJ-S core.
The ARM926EJ-S is a 32-bit RISC processor core that executes 32-bit or 16-bit instructions and
processes 32-bit, 16-bit, or 8-bit data. The core uses pipelining so that all parts of the processor and
memory system can operate continuously.
The ARM core incorporates:
•A coprocessor 15 (CP15) and protection module
•Data and program Memory Management Units (MMUs) with translation look-aside buffers (TLBs).
•Separate 16K-byte instruction and 8K-byte data caches. Both are four-way set-associative with a virtual
index, virtual tag (VIVT).
The TMS320C64x+™ DSPs are the highest-performance fixed-point DSP generation in the
TMS320C6000™ DSP platform. They are based on an enhanced version of the second-generation
high-performance, advanced very-long-instruction-word (VLIW) architecture developed by Texas
Instruments (TI), making these DSP cores an excellent choice for digital media applications. The C64x is a
code-compatible member of the C6000™ DSP platform. The TMS320C64x+ DSP is an enhancement of
the C64x DSP with added functionality and an expanded instruction set.
SPRS403G–DECEMBER 2007–REVISED OCTOBER 2010
Any reference to the C64x DSP or C64x CPU also applies, unless otherwise noted, to the C64x+ DSP and
C64x+ CPU, respectively.
With performance of up to 5832 million instructions per second (MIPS) at a clock rate of 729 MHz, the
C64x+ core offers solutions to high-performance DSP programming challenges. The DSP core possesses
the operational flexibility of high-speed controllers and the numerical capability of array processors. The
C64x+ DSP core has 64 general-purpose registers of 32-bit word length and eight highly independent
functional units: two multipliers for a 32-bit result and six arithmetic logic units (ALUs). The eight
functional units include instructions to accelerate performance in video and imaging applications.
The DSP core can produce four 16-bit multiply-accumulates (MACs) per cycle or eight 8-bit MACs per
cycle. At 594 MHz this is 2376 million MACs per second (MMACS) for 16-bit data and 4752 MMACS for
8-bit data; at 729 MHz it is 2916 MMACS and 5832 MMACS, respectively. For more details
on the C64x+ DSP, see the TMS320C64x/C64x+ DSP CPU and Instruction Set Reference Guide
(literature number SPRU732).
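The throughput figures above are the clock rate multiplied by the number of operations the core can issue per cycle. The short C sketch below reproduces that arithmetic; the constant and function names are illustrative only and do not come from any TI header.

```c
#include <stdint.h>

/* Operations the C64x+ core can issue per clock cycle, as quoted in the text. */
enum {
    C64XP_INSNS_PER_CYCLE = 8, /* eight functional units           */
    C64XP_MAC16_PER_CYCLE = 4, /* four 16-bit multiply-accumulates */
    C64XP_MAC8_PER_CYCLE  = 8  /* eight 8-bit multiply-accumulates */
};

/* Peak rate in millions of operations per second:
 * clock rate (MHz) x operations issued per cycle. */
static uint32_t peak_mops(uint32_t clock_mhz, uint32_t ops_per_cycle)
{
    return clock_mhz * ops_per_cycle;
}
```

These are peak values that assume every unit issues on every cycle; sustained rates in real code are lower.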
The DM6467 also has application-specific hardware logic, on-chip memory, and additional on-chip
peripherals similar to the other C6000 DSP platform devices. The C64x+ DSP core uses a two-level
cache-based architecture. The Level 1 program cache (L1P) is a 256K-bit (32-KB) direct-mapped cache and the
Level 1 data cache (L1D) is a 256K-bit (32-KB) 2-way set-associative cache. The Level 2 memory/cache (L2)
consists of a 1024K-bit (128-KB) memory space that is shared between program and data space. L2 memory can
be configured as mapped memory, cache, or combinations of the two.
The peripheral set includes: a configurable video port; a 10/100/1000 Mb/s Ethernet MAC (EMAC) with a
Management Data Input/Output (MDIO) module; a 4-bit transmit/4-bit receive VLYNQ interface; an
inter-integrated circuit (I2C) bus interface; a multichannel audio serial port (McASP0) with four serializers; a
secondary multichannel audio serial port (McASP1) with a single transmit serializer; two 64-bit
general-purpose timers, each configurable as two independent 32-bit timers; one 64-bit watchdog timer; a
configurable 32-bit host port interface (HPI); up to 33 pins of general-purpose input/output (GPIO) with
programmable interrupt/event generation modes, multiplexed with other peripherals; three UART/IrDA/CIR
interfaces with modem interface signals on UART0; two pulse width modulator (PWM) peripherals; an
ATA/ATAPI-6 interface; a 33-MHz peripheral component interconnect (PCI); and two external memory
interfaces: an asynchronous external memory interface (EMIFA) for slower memories/peripherals, and a
higher-speed synchronous memory interface for DDR2.
The Ethernet Media Access Controller (EMAC) provides an efficient interface between the DM6467 and
the network. The DM6467 EMAC supports both 10Base-T and 100Base-TX (10 Mbits/second (Mbps)
and 100 Mbps) in either half- or full-duplex mode, and 1000Base-TX (1 Gbps) in full-duplex mode, with
hardware flow control and quality of service (QoS) support.
The Management Data Input/Output (MDIO) module continuously polls all 32 MDIO addresses in order to
enumerate all PHY devices in the system. Once a PHY candidate has been selected by the ARM, the
MDIO module transparently monitors its link state by reading the PHY status register. Link change events
are stored in the MDIO module and can optionally interrupt the ARM, allowing the ARM to poll the link
status of the device without continuously performing costly MDIO accesses.
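As a rough illustration of the enumeration and link-monitoring steps, the C sketch below picks the lowest-numbered PHY from a 32-bit mask in which bit n is set when a PHY answered at MDIO address n, and checks whether the selected PHY's link bit changed between two snapshots. The mask layout and all names are assumptions made for this example; the actual MDIO register names and bit assignments are defined in the device's EMAC/MDIO reference documentation.

```c
#include <stdint.h>

#define MDIO_NUM_ADDRS 32u  /* the MDIO module polls addresses 0..31 */
#define MDIO_NO_PHY    (-1) /* hypothetical "nothing found" result  */

/* Return the lowest MDIO address whose bit is set in alive_mask
 * (assumed meaning: bit n = "a PHY responded at address n"),
 * or MDIO_NO_PHY if no PHY responded. */
static int mdio_first_phy(uint32_t alive_mask)
{
    for (uint32_t addr = 0; addr < MDIO_NUM_ADDRS; ++addr) {
        if (alive_mask & (UINT32_C(1) << addr))
            return (int)addr;
    }
    return MDIO_NO_PHY;
}

/* Nonzero if the link bit of PHY 'phy' differs between two snapshots of a
 * hypothetical per-address link-status mask (one bit per MDIO address). */
static int mdio_link_changed(uint32_t old_link, uint32_t new_link, uint32_t phy)
{
    return (int)(((old_link ^ new_link) >> phy) & 1u);
}
```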
The PCI, HPI, I2C, SPI, USB2.0, and VLYNQ ports allow the DM6467 to easily control peripheral devices
and/or communicate with host processors.
The DM6467 also includes a High-Definition Video/Imaging Co-processor (HDVICP) and Video Data
Conversion Engine (VDCE) to offload many video and imaging processing tasks from the DSP core,
making more DSP MIPS available for common video and imaging algorithms. For more information on the
HDVICP enhanced codecs, such as H.264 and MPEG4, please contact your nearest TI sales
representative.
www.ti.com
The rich peripheral set provides the ability to control external peripheral devices and communicate with
external processors. For details on each of the peripherals, see the related sections later in this document
and the associated peripheral reference guides.
The DM6467 has a complete set of development tools for both the ARM and DSP. These include C
compilers, a DSP assembly optimizer to simplify programming and scheduling, and a Windows™
debugger interface for visibility into source code execution.
This data manual revision history highlights the technical changes made to the SPRS403F device-specific
data manual to make it an SPRS403G revision.
Scope: Applicable updates to the DM646x DMSoC device family, specifically relating to the
TMS320DM6467 device (all silicon revisions 3.0, 1.1, and 1.0), which is now in the production data (PD)
stage of development, have been incorporated.
SEE | ADDITIONS/MODIFICATIONS/DELETIONS
Global
•Added, for clarification, the device-specific DDR2 Memory Controller speeds: 297-MHz (-594) and 310.5-MHz (-729).
Section 1.2, Description
•First paragraph:
–Updated/Changed "... to meet the networked media encode and decode application processing needs ..." to "... to meet the networked media encode and decode digital media processing needs ..."
Section 3.5, Memory Map Summary
Table 3-3, Memory Map Summary:
•Added, for clarification, the device-specific DDR2 Memory Controller speeds: 297-MHz (-594) and 310.5-MHz (-729)
•Deleted C64x+ PCI Data access to address range 0x3000 0000 to 0x3FFF FFFF
Table 3-4, Configuration Memory Map Summary:
•Deleted C64x+ Timer2 access at address range 0x01C2 1C00 to 0x01C2 1FFF
•Deleted C64x+ PLL Controller1 access at address range 0x01C4 0800 to 0x01C4 0BFF
•Deleted C64x+ PLL Controller2 access at address range 0x01C4 0C00 to 0x01C4 0FFF
•Deleted HPI, PCI, and VLYNQ Master Peripheral Accessibility for address range 0x01D0 2000 to 0x01DF FFFF
•Deleted HPI, PCI, and VLYNQ Master Peripheral Accessibility for address range 0x01E0 0000 to 0x01FF FFFF
•Added associated "The DEV_CVDD core voltage value is device dependent (e.g., ..." footnote for clarification
Figure 7-9, 24-MHz Auxiliary Oscillator:
•Added associated "The AUX_CVDD core voltage value is device dependent (e.g., ..." footnote for clarification
•Added associated "The DEV_CVDD core voltage value is device dependent (e.g., ..." footnote for clarification
Figure 7-11, 1.8-V LVCMOS-Compatible Clock Input:
•Added associated "The AUX_CVDD core voltage value is device dependent (e.g., ..." footnote for clarification
•Added frequency parameter with MIN/MAX values for clarification
Section 7.17.2, Management Data Input/Output (MDIO) Electrical Data/Timing
Table 7-80, Timing Requirements for MDIO Input:
•Updated/Changed the MIN value for th(MDCLKH-MDIO), Hold time, MDIO data input valid after MDCLK high, from "10" to "0" ns.
Section 7.29, IEEE 1149.1 JTAG
•Deleted "For maximum reliability," from the "... DM6467 includes an internal pulldown (IPD) on the TRST pin ..." paragraph [Cleared Documentation Feedback Issue]
3 Device Overview
3.1 Device Characteristics
Table 3-1 provides an overview of the TMS320DM6467 SoC. The table shows significant features of the
device, including the capacity of on-chip RAM, peripherals, internal peripheral bus frequency relative to the
C64x+ DSP, and the package type with pin count.
Peripherals (Not all peripheral pins are available at the same time; for more detail, see the Device Configurations section.)
Multichannel Audio Serial Port (McASP) | one DIT transmit only with 1 serializer for S/PDIF
10/100/1000 Ethernet MAC with Management Data Input/Output (MDIO)
VLYNQ | 1
General-Purpose Input/Output Port (GPIO) | Up to 33 pins
PWM | 2 outputs
ATA | 1 (ATA/ATAPI-6)
PCI | 1 (32-bit, 33 MHz)
HPI | 1 (16-/32-bit multiplexed address/data)
VDCE
Clock Recovery Generator (CRGEN) | 1
Power Sleep Controller (PSC) | 1 (peripheral/module clock gating)
Configurable Video Port Interface (VPIF), 99-MHz (-594), 108-MHz (-729) | 1 16-bit Y/C capture channel or 1 8-/10-/12-bit raw video capture channel and 2 8-bit BT.656 display channels or
Transport Stream Interface (TSIF) | 1 with serial-only input and output
On-Chip Memory
CPU ID + CPU Rev ID | Control Status Register (CSR.[31:16]) | 0x1000
The ARM926EJ-S RISC CPU is compatible with other ARM9 CPUs from ARM Holdings plc.
The C64x+ DSP core is code-compatible with the C6000™ DSP platform and supports features of the
C64x DSP family.
3.3 ARM Subsystem
The ARM Subsystem is designed to give the ARM926EJ-S (ARM9) master control of the device. In
general, the ARM is responsible for configuration and control of the device, including the DSP Subsystem,
the VPSS Subsystem, and a majority of the peripherals and external memories.
The ARM Subsystem includes the following features:
•8KB Internal ROM (ARM bootloader for non-EMIFA boot options)
•Embedded Trace Macrocell and Embedded Trace Buffer (ETM/ETB)
•ARM Interrupt Controller
•PLL Controller
•Power and Sleep Controller (PSC)
•System Module
3.3.1 ARM926EJ-S RISC CPU
The ARM Subsystem integrates the ARM926EJ-S processor. The ARM926EJ-S processor is a member of the
ARM9 family of general-purpose microprocessors. This processor is targeted at multi-tasking applications
where full memory management, high performance, low die size, and low power are all important. The
ARM926EJ-S processor supports the 32-bit ARM and 16-bit Thumb instruction sets, enabling the user to
trade off between high performance and high code density. Specifically, the ARM926EJ-S processor
supports the ARMv5TEJ instruction set, which includes features for efficient execution of Java byte codes,
providing Java performance similar to that of a Just-In-Time (JIT) Java compiler, but without the associated
code overhead.
The ARM926EJ-S processor supports the ARM debug architecture and includes logic to assist in both
hardware and software debug. The ARM926EJ-S processor has a Harvard architecture and provides a
complete high performance subsystem, including:
•ARM926EJ-S integer core
•CP15 system control coprocessor
•Memory Management Unit (MMU)
•Separate instruction and data caches
•Write buffer
•Separate instruction and data Tightly-Coupled Memories (TCMs) [internal RAM] interfaces
•Separate instruction and data AHB bus interfaces
•Embedded Trace Macrocell and Embedded Trace Buffer (ETM/ETB)
For more complete details on the ARM9, see the ARM926EJ-S Technical Reference Manual, available
at http://www.arm.com.
3.3.2 CP15
The ARM926EJ-S system control coprocessor (CP15) is used to configure and control the instruction and
data caches, Tightly-Coupled Memories (TCMs), Memory Management Unit (MMU), and other ARM
subsystem functions. The CP15 registers are programmed using the MRC and MCR ARM instructions
when the ARM is in a privileged mode such as supervisor or system mode.
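The C sketch below shows what a CP15 access looks like in practice: an MRC instruction reads the c1 Control Register and an MCR writes it back (compiled only when targeting 32-bit ARM), and a portable helper decodes a few of the register's bits. The bit positions used (M = bit 0, C = bit 2, I = bit 12, V = bit 13) follow the ARM926EJ-S Technical Reference Manual as we understand it and should be verified there; the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Selected CP15 c1 Control Register bits (ARM926EJ-S, verify in the TRM). */
#define CP15_CTRL_M (1u << 0)  /* MMU enable               */
#define CP15_CTRL_C (1u << 2)  /* data cache enable        */
#define CP15_CTRL_I (1u << 12) /* instruction cache enable */
#define CP15_CTRL_V (1u << 13) /* high exception vectors   */

#if defined(__arm__)
/* MRC p15, 0, Rd, c1, c0, 0: read the Control Register.
 * Must execute in a privileged mode (e.g., supervisor or system). */
static inline uint32_t cp15_read_control(void)
{
    uint32_t val;
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 0" : "=r"(val));
    return val;
}

/* MCR p15, 0, Rd, c1, c0, 0: write the Control Register (privileged). */
static inline void cp15_write_control(uint32_t val)
{
    __asm__ volatile("mcr p15, 0, %0, c1, c0, 0" : : "r"(val) : "memory");
}
#endif

typedef struct {
    bool mmu;
    bool dcache;
    bool icache;
    bool high_vectors;
} cp15_ctrl_t;

/* Decode a Control Register value; portable, needs no CP15 access. */
static cp15_ctrl_t cp15_decode_control(uint32_t v)
{
    cp15_ctrl_t c;
    c.mmu          = (v & CP15_CTRL_M) != 0;
    c.dcache       = (v & CP15_CTRL_C) != 0;
    c.icache       = (v & CP15_CTRL_I) != 0;
    c.high_vectors = (v & CP15_CTRL_V) != 0;
    return c;
}
```

On the target, a read-modify-write such as `cp15_write_control(cp15_read_control() | CP15_CTRL_I)` would set the instruction-cache enable bit, provided the code runs in a privileged mode.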
3.3.3 MMU
The ARM926EJ-S MMU provides virtual memory features required by operating systems such as Linux®,
Windows® CE, Ultron®, ThreadX®, etc. A single set of two-level page tables stored in main memory is
used to control the address translation, permission checks and memory region attributes for both data and
instruction accesses. The MMU uses a single unified Translation Lookaside Buffer (TLB) to cache the
information held in the page tables. The MMU features are:
•Standard ARM architecture v4 and v5 MMU mapping sizes, domains and access protection scheme.
•Access permissions for large pages and small pages can be specified separately for each quarter of
the page (subpage permissions)
•Hardware page table walks
•Invalidate entire TLB, using CP15 register 8
•Invalidate TLB entry, selected by MVA, using CP15 register 8
•Lockdown of TLB entries, using CP15 register 10
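To make the two-level scheme and subpage permissions concrete, the C sketch below splits a modified virtual address (MVA) into its table indices for the case of a coarse second-level table mapping 4-KB small pages, and selects which of the four access-permission fields (AP0 to AP3) applies. The field boundaries follow the ARMv4/v5 page-table format as we understand it; sections, 1-KB tiny pages, and fine tables are not handled, and all names are illustrative.

```c
#include <stdint.h>

/* Index fields of an MVA for a coarse second-level table with 4-KB pages. */
typedef struct {
    uint32_t l1_index;    /* MVA[31:20]: entry in the 4096-entry first-level table */
    uint32_t l2_index;    /* MVA[19:12]: entry in a 256-entry coarse table         */
    uint32_t page_offset; /* MVA[11:0]:  byte offset within the 4-KB small page    */
    uint32_t subpage;     /* MVA[11:10]: which 1-KB quarter, i.e. AP0..AP3          */
} va_fields_t;

static va_fields_t split_small_page_va(uint32_t mva)
{
    va_fields_t f;
    f.l1_index    = mva >> 20;
    f.l2_index    = (mva >> 12) & 0xFFu;
    f.page_offset = mva & 0xFFFu;
    f.subpage     = (mva >> 10) & 0x3u;
    return f;
}

/* For a 64-KB large page, each 16-KB quarter has its own AP field,
 * selected by MVA[15:14]. */
static uint32_t large_page_subpage(uint32_t mva)
{
    return (mva >> 14) & 0x3u;
}
```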
3.3.4 Caches and Write Buffer
The Instruction Cache is 16KB and the Data Cache is 8KB. Additionally, the caches have the following
features:
•Virtual index, virtual tag, and addressed using the Modified Virtual Address (MVA)
•Four-way set associative, with a cache line length of eight words per line (32 bytes per line) and with
two dirty bits in the Dcache
•Dcache supports write-through and write-back (or copy back) cache operation, selected by memory
region using the C and B bits in the MMU translation tables.
•Critical-word first cache refilling
•Cache lockdown registers enable control over which cache ways are used for allocation on a line fill,
providing a mechanism for both lockdown, and controlling cache corruption
•Dcache stores the Physical Address TAG (PA TAG) corresponding to each Dcache entry in the TAG
RAM for use during the cache line write-backs, in addition to the Virtual Address TAG stored in the
TAG RAM. This means that the MMU is not involved in Dcache write-back operations, removing the
possibility of TLB misses related to the write-back address.
•Cache maintenance operations provide efficient invalidation of the entire Dcache or Icache, regions of
the Dcache or Icache, and regions of virtual memory.
The write buffer is used for all writes to a noncacheable bufferable region, writes to a write-through region, and write
misses to a write-back region. A separate buffer is incorporated in the Dcache for holding write-backs for
cache line evictions or cleaning of dirty cache lines. The main write buffer has a 16-word data buffer and a
four-address buffer. The Dcache write-back buffer has eight data word entries and a single address entry.
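Given the geometry stated above (16-KB instruction cache, 8-KB data cache, four ways, 32-byte lines), the number of sets and the set an address falls into follow directly. The C sketch below does that arithmetic; it is a model for reasoning about cache conflicts and line alignment, not a description of internal hardware details beyond what the text states. Names are illustrative.

```c
#include <stdint.h>

#define ARM9_LINE_BYTES 32u /* eight words per line     */
#define ARM9_WAYS        4u /* four-way set associative */

/* Number of sets = cache size / (ways x line size). */
static uint32_t cache_num_sets(uint32_t cache_bytes)
{
    return cache_bytes / (ARM9_WAYS * ARM9_LINE_BYTES);
}

/* Set selected by an MVA: line number modulo the number of sets. */
static uint32_t cache_set_index(uint32_t mva, uint32_t cache_bytes)
{
    return (mva / ARM9_LINE_BYTES) % cache_num_sets(cache_bytes);
}

/* Start address of the 32-byte line containing mva. */
static uint32_t cache_line_base(uint32_t mva)
{
    return mva & ~(ARM9_LINE_BYTES - 1u);
}
```

For the 16-KB instruction cache this gives 128 sets, so each way spans 4 KB and addresses 4 KB apart (for example 0x0040 and 0x1040) compete for the same set; the 8-KB data cache has 64 sets.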
ARM internal RAM is provided for storing real-time and performance-critical code/data and the Interrupt
Vector table. ARM internal ROM enables non-EMIFA boot options, such as NAND and UART. The RAM
and ROM memories are interfaced to the ARM926EJ-S via the tightly coupled memory interface that provides
for separate instruction and data bus connections. Since the ARM TCM does not allow instructions on the
D-TCM bus or data on the I-TCM bus, an arbiter is included so that both data and instructions can be
stored in the internal RAM/ROM. The arbiter also allows accesses to the RAM/ROM from extra-ARM
sources (e.g., EDMA or other masters). The ARM926EJ-S has built-in DMA support for direct accesses to
the ARM internal memory from a non-ARM master. Because of the time-critical nature of the TCM link to
the ARM internal memory, all accesses from non-ARM devices are treated as DMA transfers.
Instruction and data accesses are differentiated by the memory map region accessed: the instruction
region spans 0x0000 through 0x7FFF and the data region spans 0x10000 through 0x17FFF. The instruction
region at 0x0000 and data region at 0x10000 map to the same physical 32-KB TCM RAM. Placing the
instruction region at 0x0000 is necessary to allow the ARM Interrupt Vector table to be placed at 0x0000,
as required by the ARM architecture. The internal 32-KB RAM is split into two physical banks of 16KB
each, which allows simultaneous instruction and data accesses to be accomplished if the code and data
are in separate banks.
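The address aliasing and banking described above can be modeled in a few lines of C. The sketch assumes that bank 0 holds offsets 0x0000 to 0x3FFF and bank 1 holds offsets 0x4000 to 0x7FFF of the 32-KB RAM; the text only says the RAM is split into two 16-KB banks, so that assignment is an assumption. Function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define TCM_I_BASE  0x00000000u /* instruction alias: 0x0000-0x7FFF  */
#define TCM_D_BASE  0x00010000u /* data alias: 0x10000-0x17FFF       */
#define TCM_SIZE    0x8000u     /* 32 KB of physical RAM             */
#define TCM_BANK_SZ 0x4000u     /* two 16-KB banks (ordering assumed) */

/* Offset into the physical 32-KB RAM, or -1 if addr is in neither alias.
 * Unsigned wraparound makes addr - base huge when addr < base. */
static int32_t tcm_offset(uint32_t addr)
{
    if (addr - TCM_I_BASE < TCM_SIZE)
        return (int32_t)(addr - TCM_I_BASE);
    if (addr - TCM_D_BASE < TCM_SIZE)
        return (int32_t)(addr - TCM_D_BASE);
    return -1;
}

/* Physical bank (0 or 1) for addr, or -1 if addr is not TCM RAM. */
static int tcm_bank(uint32_t addr)
{
    int32_t off = tcm_offset(addr);
    return off < 0 ? -1 : (int)((uint32_t)off / TCM_BANK_SZ);
}

/* True if an instruction fetch at i_addr and a data access at d_addr hit
 * different banks, the condition the text gives for simultaneous access. */
static bool tcm_parallel_ok(uint32_t i_addr, uint32_t d_addr)
{
    int bi = tcm_bank(i_addr);
    int bd = tcm_bank(d_addr);
    return bi >= 0 && bd >= 0 && bi != bd;
}
```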
3.3.6 Advanced High-Performance Bus (AHB)
The ARM Subsystem uses the AHB port of the ARM926EJ-S to connect the ARM to the Config bus and
the external memories. Arbiters are employed to arbitrate access to the separate D-AHB and I-AHB by the
Config Bus and the external memories bus.
3.3.7 Embedded Trace Macrocell (ETM) and Embedded Trace Buffer (ETB)
To support real-time trace, the ARM926EJ-S processor provides an interface to enable connection of an
Embedded Trace Macrocell (ETM). The ARM926EJ-S Subsystem in the DM6467 also includes the
Embedded Trace Buffer (ETB). The ETM consists of two parts:
•Trace Port provides real-time trace capability for the ARM9.
•Triggering facilities provide trigger resources, which include address and data comparators, counter,
and sequencers.
The DM6467 trace port is not pinned out and is instead only connected to the Embedded Trace Buffer.
The ETB has a 4KB buffer memory. ETB-enabled debug tools are required to read and interpret the captured
trace data.
3.3.8 ARM Memory Mapping
The ARM memory map is shown in Section 3.5, Memory Map Summary of this document. The ARM has
access to memories shown in the following sections.
3.3.8.1 ARM Internal Memories
The ARM has access to the following ARM internal memories:
•32KB ARM Internal RAM on TCM interface, logically separated into two 16KB pages to allow
simultaneous access on any given cycle if there are separate accesses for code (I-TCM bus) and data
(D-TCM) to the different memory regions.
•8KB ARM Internal ROM
3.3.8.2 External Memories
The ARM has access to the following external memories:
DM6467 ARM and DSP integration features are as follows:
•DSP visibility from ARM’s memory map, see Section 3.5, Memory Map Summary, for details
•Boot Modes for DSP - see Device Configurations section, Section 4.4.1, DSP Boot, for details
•ARM control of DSP boot / reset - see Device Configurations section, Section 4.4.2.4, ARM Boot, for
details
•ARM control of DSP isolation and powerdown / powerup - see Section 4, Device Configurations, for
details
•ARM & DSP Interrupts - see Section 7.8.1, ARM CPU Interrupts, and Section 7.8.2, DSP Interrupts, for
details
3.3.9 Peripherals
The ARM9 has access to all of the peripherals on the DM6467 device.
3.3.10 PLL Controller (PLLC)
The ARM Subsystem includes the PLL Controller. The PLL Controller contains a set of registers for
configuring DM6467’s two internal PLLs (PLL1 and PLL2). The PLL Controller provides the following
configuration and control:
•PLL Bypass Mode
•Set PLL multiplier parameters
•Set PLL divider parameters
•PLL power down
•Oscillator power down
The PLLs are briefly described in this document in the Clocking section. For more detailed information on
the PLLs and PLL Controller register descriptions, see the TMS320DM646x DMSoC ARM Subsystem
Reference Guide (literature number SPRUEP9).
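The multiplier and divider settings combine as f_out = f_in × multiplier / divider. The C sketch below performs that arithmetic in kHz. The 27-MHz reference and the multiplier/divider pairs used in the example are assumptions, chosen only because they reproduce the 594-, 729-, 297-, and 310.5-MHz figures quoted elsewhere in this document; they are not taken from the PLL register descriptions, and the register encodings of the multiplier and divider fields are not modeled.

```c
#include <stdint.h>

/* f_out = f_in * multiplier / divider, in kHz.
 * A 64-bit intermediate avoids overflow for large multipliers. */
static uint32_t pll_out_khz(uint32_t f_in_khz, uint32_t multiplier, uint32_t divider)
{
    return (uint32_t)(((uint64_t)f_in_khz * multiplier) / divider);
}
```

For example, `pll_out_khz(27000, 22, 1)` returns 594000 and `pll_out_khz(27000, 23, 2)` returns 310500.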
3.3.11 Power and Sleep Controller (PSC)
The ARM Subsystem includes the Power and Sleep Controller (PSC). Through register settings
accessible by the ARM9, the PSC provides two levels of power savings: peripheral/module clock gating
and power domain shut-off. Brief details on the PSC are given in Section 7.3, Power Supplies. For more
detailed information and complete register descriptions for the PSC, see the TMS320DM646x DMSoC ARM Subsystem Reference Guide (literature number SPRUEP9).
3.3.12 ARM Interrupt Controller (AINTC)
The ARM Interrupt Controller (AINTC) accepts device interrupts and maps them to either the ARM’s IRQ
(interrupt request) or FIQ (fast interrupt request). The ARM Interrupt Controller is briefly described in this
document in the Interrupts section. For detailed information on the ARM Interrupt Controller, see the
TMS320DM646x DMSoC ARM Subsystem Reference Guide (literature number SPRUEP9).
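A minimal model of the IRQ/FIQ mapping is sketched below in C. It assumes, beyond what this document states, that each of 64 interrupt sources is assigned one of eight priority channels, that channels 0 and 1 are routed to FIQ and channels 2 to 7 to IRQ, and that a lower channel number means higher priority; confirm these details in SPRUEP9 before relying on them. All names are hypothetical.

```c
#include <stdint.h>

#define AINTC_NUM_INTS 64u /* assumed number of interrupt sources */

typedef enum { ARM_FIQ, ARM_IRQ } arm_irq_line_t;

/* Assumed routing: channels 0-1 -> FIQ, channels 2-7 -> IRQ. */
static arm_irq_line_t aintc_line_for_channel(uint32_t channel)
{
    return channel <= 1u ? ARM_FIQ : ARM_IRQ;
}

/* Return the pending interrupt with the highest priority (lowest channel,
 * ties broken by the lowest interrupt number), or -1 if none is pending.
 * channel[n] is the priority channel (0..7) assigned to interrupt n. */
static int aintc_highest_pending(uint64_t pending, const uint8_t channel[AINTC_NUM_INTS])
{
    int best = -1;
    uint32_t best_ch = 8u; /* worse than any valid channel */
    for (uint32_t n = 0; n < AINTC_NUM_INTS; ++n) {
        if (((pending >> n) & 1u) && channel[n] < best_ch) {
            best = (int)n;
            best_ch = channel[n];
        }
    }
    return best;
}
```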
The ARM Subsystem includes the System module. The System module consists of a set of registers for
configuring and controlling a variety of system functions. For details and register descriptions for the
System module, see Section 4, Device Configurations and see the TMS320DM646x DMSoC ARM Subsystem Reference Guide (literature number SPRUEP9).
3.3.14 Power Management
DM6467 has several means of managing power consumption. There is extensive use of clock gating,
which reduces the power used by global device clocks and individual peripheral clocks. Clock
management can be utilized to reduce clock frequencies in order to reduce switching power. For more
details on power management techniques, see Section 4, Device Configurations, Section 7, Peripheral and Electrical Specifications, and see the TMS320DM646x DMSoC ARM Subsystem Reference Guide
(literature number SPRUEP9).
DM6467 gives the programmer full flexibility to use any and all of the previously mentioned capabilities to
customize an optimal power management strategy. Several typical power management scenarios are
described in the following sections.
3.4 DSP Subsystem
The DSP Subsystem includes the following features:
•C64x+ DSP CPU
•32KB L1 Program (L1P)/Cache (up to 32KB)
•32KB L1 Data (L1D)/Cache (up to 32KB)
•128KB Unified Mapped RAM/Cache (L2)
•Little endian
3.4.1 C64x+ DSP CPU Description
The C64x+ Central Processing Unit (CPU) consists of eight functional units, two register files, and two
data paths as shown in Figure 3-1. The two general-purpose register files (A and B) each contain
32 32-bit registers for a total of 64 registers. The general-purpose registers can be used for data or as
data address pointers. The data types supported include packed 8-bit data, packed 16-bit data, 32-bit
data, 40-bit data, and 64-bit data. Values larger than 32 bits, such as 40-bit-long or 64-bit-long values are
stored in register pairs, with the 32 LSBs of data placed in an even register and the remaining 8 or
32 MSBs in the next upper register (which is always an odd-numbered register).
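The register-pair convention can be modeled in C as below: the 32 LSBs go to the even register and the remaining bits to the next (odd) register. For 40-bit values the sketch reads only the low 8 bits of the odd register and sign-extends from bit 39; that interpretation is a modeling choice for illustration, not a statement about how the hardware fills the unused upper bits. Names are hypothetical.

```c
#include <stdint.h>

/* Model of a register pair such as A1:A0. */
typedef struct {
    uint32_t even; /* bits 31..0                          */
    uint32_t odd;  /* bits 63..32 (or 39..32 for 40-bit)  */
} reg_pair_t;

static reg_pair_t pair_from_i64(int64_t v)
{
    reg_pair_t p;
    p.even = (uint32_t)v;                     /* 32 LSBs */
    p.odd  = (uint32_t)((uint64_t)v >> 32);   /* 32 MSBs */
    return p;
}

static int64_t i64_from_pair(reg_pair_t p)
{
    return (int64_t)(((uint64_t)p.odd << 32) | p.even);
}

/* 40-bit value: use the low 8 bits of the odd register as bits 39..32,
 * then sign-extend from bit 39. */
static int64_t i40_from_pair(reg_pair_t p)
{
    uint64_t raw = ((uint64_t)(p.odd & 0xFFu) << 32) | p.even;
    if (raw & (UINT64_C(1) << 39))
        raw |= ~UINT64_C(0) << 40;
    return (int64_t)raw;
}
```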
The eight functional units (.M1, .L1, .D1, .S1, .M2, .L2, .D2, and .S2) are each capable of executing one
instruction every clock cycle. The .M functional units perform all multiply operations. The .S and .L units
perform a general set of arithmetic, logical, and branch functions. The .D units primarily load data from
memory to the register file and store results from the register file into memory.
The C64x+ CPU extends the performance of the C64x core through enhancements and new features.
Each C64x+ .M unit can perform one of the following each clock cycle: one 32 x 32 bit multiply, one 16 x
32 bit multiply, two 16 x 16 bit multiplies, two 16 x 32 bit multiplies, two 16 x 16 bit multiplies with
add/subtract capabilities, four 8 x 8 bit multiplies, four 8 x 8 bit multiplies with add operations, or four
16 x 16 bit multiplies with add/subtract capabilities (including a complex multiply). There is also support for
Galois field multiplication for 8-bit and 32-bit data. Many communications algorithms such as FFTs and
modems require complex multiplication. The complex multiply (CMPY) instruction takes four 16-bit inputs
and produces a 32-bit real and a 32-bit imaginary output. There are also complex multiplies with rounding
capability that produce one 32-bit packed output that contains 16-bit real and 16-bit imaginary values. The
32 x 32 bit multiply instructions provide the extended precision necessary for audio and other
high-precision algorithms on a variety of signed and unsigned 32-bit data types.
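The arithmetic behind CMPY can be written as a scalar reference model in C. The sketch takes the real and imaginary parts as separate 16-bit fields rather than packed halfwords, because the packing order of the operands and results is defined in SPRU732 and is not reproduced here. Products are formed in 64 bits; the one input combination whose imaginary part overflows 32 bits (all four parts equal to -32768) is saturated, which is an assumption about the hardware behavior.

```c
#include <stdint.h>

typedef struct { int16_t re, im; } cplx16_t;
typedef struct { int32_t re, im; } cplx32_t;

/* Clamp a 64-bit intermediate to the signed 32-bit range. */
static int32_t sat32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* (a.re + j*a.im) * (b.re + j*b.im):
 *   real = a.re*b.re - a.im*b.im
 *   imag = a.re*b.im + a.im*b.re */
static cplx32_t cmpy_ref(cplx16_t a, cplx16_t b)
{
    int64_t re = (int64_t)a.re * b.re - (int64_t)a.im * b.im;
    int64_t im = (int64_t)a.re * b.im + (int64_t)a.im * b.re;
    cplx32_t r;
    r.re = sat32(re);
    r.im = sat32(im);
    return r;
}
```

For example, (3 + 4j)(1 + 2j) yields a real part of -5 and an imaginary part of 10.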
The .L unit (arithmetic logic unit) now incorporates the ability to do parallel add/subtract operations on a
pair of common inputs. Versions of this instruction exist to work on 32-bit data or on pairs of 16-bit data,
performing dual 16-bit add and subtract operations in parallel. There are also saturated forms of these instructions.
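The parallel add/subtract on a common pair of inputs can be modeled lane by lane. In the C sketch below each 32-bit word carries two signed 16-bit lanes; the function returns the packed lane-wise sums and differences, optionally saturating to the 16-bit range to mirror the saturated forms. How the two results map onto the instruction's destination registers is not modeled, and the names are illustrative rather than compiler intrinsics.

```c
#include <stdint.h>

/* Two signed 16-bit lanes packed in one 32-bit word. */
static int16_t lo16(uint32_t w) { return (int16_t)(w & 0xFFFFu); }
static int16_t hi16(uint32_t w) { return (int16_t)(w >> 16); }

/* Pack two lane values, keeping the low 16 bits of each (wraparound). */
static uint32_t pack16(int32_t hi, int32_t lo)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}

static int32_t sat16(int32_t v)
{
    return v > INT16_MAX ? INT16_MAX : (v < INT16_MIN ? INT16_MIN : v);
}

typedef struct { uint32_t sum, diff; } addsub2_t;

/* Lane-wise a+b and a-b; saturate != 0 models the saturated forms. */
static addsub2_t addsub2_ref(uint32_t a, uint32_t b, int saturate)
{
    int32_t sh = hi16(a) + hi16(b), sl = lo16(a) + lo16(b);
    int32_t dh = hi16(a) - hi16(b), dl = lo16(a) - lo16(b);
    if (saturate) {
        sh = sat16(sh); sl = sat16(sl);
        dh = sat16(dh); dl = sat16(dl);
    }
    addsub2_t r;
    r.sum  = pack16(sh, sl);
    r.diff = pack16(dh, dl);
    return r;
}
```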
The C64x+ core enhances the .S unit in several ways. In the C64x core, dual 16-bit MIN2 and MAX2
comparisons were only available on the .L units. On the C64x+ core they are also available on the .S unit,
which increases the performance of algorithms that do searching and sorting. Finally, to increase data
packing and unpacking throughput, the .S unit allows sustained high performance for the quad 8-bit/16-bit
and dual 16-bit instructions. Unpack instructions prepare 8-bit data for parallel 16-bit operations. Pack
instructions return parallel results to output precision including saturation support.
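The .S-unit operations described above are easy to express as scalar reference code, which can help when checking optimized routines. The C sketch below models dual 16-bit MAX2 and MIN2 (lane-wise signed comparison), an unpack of two bytes into 16-bit lanes, and a pack of four 16-bit lanes back to bytes with unsigned saturation. The lane layout and function names are assumptions for illustration; they are not the TI compiler intrinsics.

```c
#include <stdint.h>

static int16_t lane_hi(uint32_t w) { return (int16_t)(w >> 16); }
static int16_t lane_lo(uint32_t w) { return (int16_t)(w & 0xFFFFu); }
static uint32_t lanes(int16_t hi, int16_t lo)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint16_t)lo;
}

/* Lane-wise signed maximum / minimum of two packed 16-bit pairs. */
static uint32_t max2_ref(uint32_t a, uint32_t b)
{
    int16_t h = lane_hi(a) > lane_hi(b) ? lane_hi(a) : lane_hi(b);
    int16_t l = lane_lo(a) > lane_lo(b) ? lane_lo(a) : lane_lo(b);
    return lanes(h, l);
}

static uint32_t min2_ref(uint32_t a, uint32_t b)
{
    int16_t h = lane_hi(a) < lane_hi(b) ? lane_hi(a) : lane_hi(b);
    int16_t l = lane_lo(a) < lane_lo(b) ? lane_lo(a) : lane_lo(b);
    return lanes(h, l);
}

/* Unpack: zero-extend the two low bytes of w into two 16-bit lanes. */
static uint32_t unpack_lo_bytes(uint32_t w)
{
    return ((w & 0xFF00u) << 8) | (w & 0x00FFu);
}

/* Pack: four signed 16-bit lanes (two words) to four bytes, clamped to 0..255. */
static uint8_t satu8(int16_t v) { return v < 0 ? 0 : (v > 255 ? 255 : (uint8_t)v); }

static uint32_t pack_sat_u8(uint32_t hi_word, uint32_t lo_word)
{
    return ((uint32_t)satu8(lane_hi(hi_word)) << 24) |
           ((uint32_t)satu8(lane_lo(hi_word)) << 16) |
           ((uint32_t)satu8(lane_hi(lo_word)) << 8)  |
            (uint32_t)satu8(lane_lo(lo_word));
}
```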
Other new features include:
•SPLOOP - A small instruction buffer in the CPU that aids in creation of software pipelining loops where
multiple iterations of a loop are executed in parallel. The SPLOOP buffer reduces the code size
associated with software pipelining. Furthermore, loops in the SPLOOP buffer are fully interruptible.
•Compact Instructions - The native instruction size for the C6000 devices is 32 bits. Many common
instructions such as MPY, AND, OR, ADD, and SUB can be expressed as 16 bits if the C64x+
compiler can restrict the code to use certain registers in the register file. This compression is
performed by the code generation tools.
•Instruction Set Enhancement - As noted above, there are new instructions such as 32-bit
multiplications, complex multiplications, packing, sorting, bit manipulation, and 32-bit Galois field
multiplication.
•Exception Handling - Intended to aid the programmer in isolating bugs. The C64x+ CPU is able to
detect and respond to exceptions, both from internally detected sources (such as illegal op-codes) and
from system events (such as a watchdog timer expiration).
•Privilege - Defines user and supervisor modes of operation, allowing the operating system to give a
basic level of protection to sensitive resources. Local memory is divided into multiple pages, each with
read, write, and execute permissions.
•Time-Stamp Counter - Primarily targeted for Real-Time Operating System (RTOS) robustness, a
free-running time-stamp counter is implemented in the CPU which is not sensitive to system stalls.
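As an example of what the Galois field multiply support computes, the C sketch below multiplies two elements of GF(2^8) by shift-and-add with reduction modulo a caller-supplied polynomial. The polynomial 0x11D (x^8 + x^4 + x^3 + x^2 + 1), common in Reed-Solomon codecs, is used in the example as an assumption; on the device the field polynomial is configurable (see SPRU732). This is a bit-serial reference, not the single-instruction hardware path.

```c
#include <stdint.h>

/* Multiply a*b in GF(2^8) modulo 'poly' (9-bit polynomial, e.g. 0x11D). */
static uint8_t gf256_mul(uint8_t a, uint8_t b, uint16_t poly)
{
    uint16_t acc = 0; /* running product                  */
    uint16_t x = a;   /* a * x^i, kept reduced below 0x100 */
    while (b) {
        if (b & 1u)
            acc ^= x;        /* addition in GF(2^8) is XOR */
        b >>= 1;
        x <<= 1;             /* multiply by x               */
        if (x & 0x100u)
            x ^= poly;       /* reduce modulo the polynomial */
    }
    return (uint8_t)acc;
}
```

With the AES polynomial 0x11B the same routine reproduces the textbook results {57}·{13} = {FE} and {53}·{CA} = {01}.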
For more details on the C64x+ CPU and its enhancements over the C64x architecture, see the following
documents:
•TMS320C64x/C64x+ DSP CPU and Instruction Set Reference Guide (literature number SPRU732)
•TMS320C64x Technical Overview (literature number SPRU395)
A. On .M unit, dst2 is 32 MSB.
B. On .M unit, dst1 is 32 LSB.
C. On C64x CPU .M unit, src2 is 32 bits; on C64x+ CPU .M unit, src2 is 64 bits.
D. On .L and .S units, odd dst connects to odd register files and even dst connects to even register files.
TMS320DM6467
Figure 3-1. TMS320C64x+™ CPU (DSP Core) Data Paths
The DSP memory map is shown in Section 3.5, Memory Map Summary. Configuration of the control
registers for DDR2, EMIFA, and ARM Internal RAM is supported by the ARM. The DSP has access to
memories shown in the following sections.
3.4.2.1 ARM Internal Memories
The DSP has access to the 32KB ARM Internal RAM on the ARM D-TCM interface (i.e., data only).
3.4.2.2 External Memories
The DSP has access to the following external memories:
•DDR2 Synchronous DRAM
•Asynchronous EMIF / NOR Flash
•ATA
3.4.2.3 DSP Internal Memories
The DSP has access to the following DSP memories:
•L2 RAM
•L1P RAM
•L1D RAM
3.4.2.4 C64x+ CPU
The C64x+ core uses a two-level cache-based architecture. The Level 1 Program memory/cache (L1P)
consists of a 32 KB memory space that can be configured as mapped memory or direct-mapped cache. The
Level 1 Data memory/cache (L1D) consists of 32 KB that can be configured as mapped memory or 2-way
set-associative cache. The Level 2 memory/cache (L2) consists of a 128 KB RAM memory space that is
shared between program and data space. L2 memory can be configured as mapped memory, cache, or a
combination of both.
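Because the 128-KB L2 is shared between mapped RAM and cache, choosing a cache size fixes how much RAM remains directly addressable. The C sketch below performs that bookkeeping. The set of accepted cache sizes (0, 32, 64, and 128 KB) is an assumption made for this example; consult SPRU871 for the sizes the L2 controller actually supports and how they are selected. The function name is hypothetical.

```c
#include <stdint.h>

#define L2_TOTAL_KB 128u /* DM6467 DSP L2 size, per the text */

/* KB of L2 left as mapped RAM when cache_kb is configured as cache,
 * or -1 for a cache size this sketch does not accept. */
static int l2_mapped_kb(uint32_t cache_kb)
{
    switch (cache_kb) {
    case 0u:
    case 32u:
    case 64u:
    case 128u:
        return (int)(L2_TOTAL_KB - cache_kb);
    default:
        return -1;
    }
}
```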
Table 3-2 shows a memory map of the C64x+ CPU cache registers for the device.
Memory Attribute Registers for ARM TCM (corresponds to byte address 0x1000 0000 - 0x10FF FFFF)
Memory Attribute Registers for EMIFA (corresponds to byte address 0x4200 0000 - 0x49FF FFFF)
Memory Attribute Registers for VLYNQ (corresponds to byte address 0x4C00 0000 - 0x4FFF FFFF)
Memory Attribute Registers for DDR2 (corresponds to byte address 0x8000 0000 - 0xBFFF FFFF)
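Each Memory Attribute Register (MAR) covers a fixed 16-MB region, so the register controlling a given address can be computed rather than looked up. The C sketch below assumes the C64x+ megamodule layout described in SPRU871 (MAR n covers addresses n × 16 MB, and the MARs start at 0x0184 8000 with a 4-byte stride); check both values there before use. Names are hypothetical.

```c
#include <stdint.h>

#define MAR_BASE_ADDR    0x01848000u /* assumed address of MAR0 (see SPRU871) */
#define MAR_REGION_SHIFT 24u         /* one MAR per 16-MB region              */

/* MAR number that governs a byte address. */
static uint32_t mar_index(uint32_t byte_addr)
{
    return byte_addr >> MAR_REGION_SHIFT;
}

/* Address of MAR n in the DSP memory map (4-byte stride). */
static uint32_t mar_reg_addr(uint32_t n)
{
    return MAR_BASE_ADDR + 4u * n;
}

/* Number of MARs spanned by the inclusive range [start, end]. */
static uint32_t mar_count(uint32_t start, uint32_t end)
{
    return mar_index(end) - mar_index(start) + 1u;
}
```

Under these assumptions, the ARM TCM, EMIFA, VLYNQ, and DDR2 ranges listed above span 1, 8, 4, and 64 MARs, respectively.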
3.4.3 Peripherals
The DSP has access to, and control of, the following peripherals:
•HDVICP0/1
•EDMA
•McASP0/1
•2 Timers (Timer0 and Timer1) that can each be configured as 1 64-bit or 2 32-bit timers
3.4.4 DSP Interrupt Controller
The DSP Interrupt Controller accepts device interrupts and appropriately maps them to the DSP’s
available interrupts. The DSP Interrupt Controller is briefly described in this document in the Interrupts
section. For more details on the DSP Interrupt Controller, see the TMS320C64x+ DSP Megamodule
Reference Guide (literature number SPRU871).
3.5 Memory Map Summary
Table 3-3 shows the memory map address ranges of the device. Table 3-4 depicts the expanded map of
the Configuration Space (0x0180 0000 through 0x0FFF FFFF). The device has multiple on-chip memories
associated with its two processors and various subsystems. To help simplify software development, a
unified memory map is used where possible to maintain a consistent view of device resources across all
bus masters.
START | END | SIZE | ARM/EDMA | C64x+ | MASTER PERIPHERAL ACCESSIBILITY: HPI | PCI | VLYNQ
0x01C6 6000 | 0x01C6 67FF | 2K | ATA | | ✓ | ✓ | ✓
0x01C6 6800 | 0x01C6 6FFF | 2K | SPI | | ✓ | ✓ | ✓
0x01C6 7000 | 0x01C6 77FF | 2K | GPIO | | ✓ | ✓ | ✓
0x01C6 7800 | 0x01C6 7FFF | 2K | HPI | HPI | ✓ | ✓ | ✓
0x01C6 8000 | 0x01C7 FFFF | 96K | Reserved | Reserved | ✓ | ✓ | ✓
0x01C8 0000 | 0x01C8 0FFF | 4K | EMAC Control Registers | | ✓ | ✓ | ✓
0x01C8 1000 | 0x01C8 1FFF | 4K | EMAC Control Module Registers | | ✓ | ✓ | ✓
0x01C8 2000 | 0x01C8 3FFF | 8K | EMAC Control Module RAM | | ✓ | ✓ | ✓
0x01C8 4000 | 0x01C8 47FF | 2K | MDIO Control Registers | | ✓ | ✓ | ✓
0x01C8 4800 | 0x01D0 0FFF | 498K | Reserved | Reserved | ✓ | ✓ | ✓
0x01D0 1000 | 0x01D0 13FF | 1K | McASP0 Registers | McASP0 Registers | ✓ | ✓ | ✓
0x01D0 1400 | 0x01D0 17FF | 1K | McASP0 Data Port | McASP0 Data Port | ✓ | ✓ | ✓
0x01D0 1800 | 0x01D0 1BFF | 1K | McASP1 Registers | McASP1 Registers | ✓ | ✓ | ✓
0x01D0 1C00 | 0x01D0 1FFF | 1K | McASP1 Data Port | McASP1 Data Port | ✓ | ✓ | ✓
0x01D0 2000 | 0x01DF FFFF | 1016K | Reserved | Reserved | | |
0x01E0 0000 | 0x01FF FFFF | 2M | Reserved | Reserved | | |
0x0200 0000 | 0x021F FFFF | 2M | Reserved | Reserved | | |
0x0220 0000 | 0x023F FFFF | 2M | Reserved | Reserved | | |
0x0240 0000 | 0x0FFF FFFF | 220M | Reserved | Reserved | | |
(1) These peripherals have their own DMA engine or master port interface to the DMSoC system bus and do not use the EDMA for data transfers. The ✓ symbol indicates that the peripheral has a valid connection through the device switch fabric to the memory region identified in the EDMA access column.
(2) MPPA should be used to disable the hole. For more information on MPPA, see the TMS320C64x+ DSP Megamodule Reference Guide (SPRU871).
(3) The HPI's, PCI's, and VLYNQ's access to the configuration bus peripherals is limited; see Table 3-4, Configuration Memory Map Summary.
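Software that decodes bus addresses (for example, in a bus-error handler) often needs the reverse mapping from an address to a region name. The C sketch below transcribes the named, non-reserved rows of the table above into a lookup array and performs a linear search; it covers only those rows, and the type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Named regions from the table above (inclusive byte addresses). */
typedef struct {
    uint32_t start;
    uint32_t end;
    const char *name;
} mem_region_t;

static const mem_region_t k_regions[] = {
    { 0x01C66000u, 0x01C667FFu, "ATA" },
    { 0x01C66800u, 0x01C66FFFu, "SPI" },
    { 0x01C67000u, 0x01C677FFu, "GPIO" },
    { 0x01C67800u, 0x01C67FFFu, "HPI" },
    { 0x01C80000u, 0x01C80FFFu, "EMAC Control Registers" },
    { 0x01C81000u, 0x01C81FFFu, "EMAC Control Module Registers" },
    { 0x01C82000u, 0x01C83FFFu, "EMAC Control Module RAM" },
    { 0x01C84000u, 0x01C847FFu, "MDIO Control Registers" },
    { 0x01D01000u, 0x01D013FFu, "McASP0 Registers" },
    { 0x01D01400u, 0x01D017FFu, "McASP0 Data Port" },
    { 0x01D01800u, 0x01D01BFFu, "McASP1 Registers" },
    { 0x01D01C00u, 0x01D01FFFu, "McASP1 Data Port" },
};

/* Return the region name for addr, or NULL if addr is not listed. */
static const char *region_name(uint32_t addr)
{
    for (size_t i = 0; i < sizeof k_regions / sizeof k_regions[0]; ++i) {
        if (addr >= k_regions[i].start && addr <= k_regions[i].end)
            return k_regions[i].name;
    }
    return NULL;
}

/* Size in bytes of an inclusive address range. */
static uint32_t range_size(uint32_t start, uint32_t end)
{
    return end - start + 1u;
}
```

As a cross-check of the SIZE column, `range_size(0x01C84800, 0x01D00FFF)` evaluates to 509952 bytes, which is the 498K listed for that reserved range.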
3.6 Pin Assignments
Pin multiplexing is used extensively to accommodate the largest number of peripheral functions in
the smallest possible package. Pin multiplexing is controlled using a combination of hardware
configuration at device reset and software programmable register settings. For more information on pin
muxing, see Section 4.7, Multiplexed Pin Configurations, of this document.
3.6.1 Pin Map (Bottom View)
Figure 3-2 through Figure 3-7 show the bottom view of the package pin assignments in six quadrants (A,
The terminal functions tables (Table 3-5 through Table 3-32) identify the external signal names, the
associated pin (ball) numbers along with the mechanical package designator, the pin type, whether the pin
has any internal pullup or pulldown resistors, and a functional pin description. For more detailed
information on device configuration, peripheral selection, and multiplexed/shared pins, see the Device Configurations section of this data manual.