Please be aware that an important notice concerning availability, standard warranty, and use in critical applications of Texas
Instruments semiconductor products and disclaimers thereto appears at the end of this data sheet.
DSP/BIOS and XDS are trademarks of Texas Instruments.
All other trademarks are the property of their respective owners.
PRODUCT PREVIEW information concerns products in the formative or design phase of development. Characteristic data and other
specifications are design goals. Texas Instruments reserves the right to change or discontinue these products without notice.
English Data Sheet: SPRS647
Table 2-2. Characteristics of the Processor (continued)
HARDWARE FEATURES                 DM814x
McASP
McBSP                             1 (2 Data Pins, Transmit/Receive)
Controller Area Network (DCAN)    2
Serial ATA (SATA) 3.0 Gbps        1 (Supports 1 Hard Disk Drive)
RTC                               1
GPIO                              Up to 128 pins
Parallel Camera Interface (CAM)   1
Spin Lock Module                  1 (up to 128 H/W Semaphores)
Mailbox Module                    1 (with 12 Mailboxes)
On-Chip Memory - Size (Bytes)     1120KB RAM, 48KB ROM
On-Chip Memory - Organization     48KB Boot ROM
CPU ID + CPU Rev ID               Control Status Register (CSR.[31:16]): 0x1401
C674x Megamodule
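As a quick check of the CPU ID entry in Table 2-2, the following C sketch splits a raw CSR value into its ID fields. It assumes that CSR[31:24] holds the CPU ID and CSR[23:16] holds the revision ID, following the TMS320C674x DSP CPU and Instruction Set User's Guide. Under that assumption, 0x1401 reads as CPU ID 0x14, revision 0x01. The helper names are hypothetical. On target, the raw value would come from the CSR control register rather than from a function argument.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field layout (per the C674x CPU and Instruction Set User's Guide):
   CSR[31:24] = CPU ID, CSR[23:16] = CPU revision ID. */
#define CSR_CPU_ID_SHIFT 24u
#define CSR_REV_ID_SHIFT 16u
#define CSR_FIELD_MASK   0xFFu

/* CSR[31:16] value listed for this device in Table 2-2. */
#define DEVICE_CSR_ID    0x1401u

/* Extract the CPU ID field from a raw CSR value. */
static uint32_t csr_cpu_id(uint32_t csr) {
    return (csr >> CSR_CPU_ID_SHIFT) & CSR_FIELD_MASK;
}

/* Extract the CPU revision ID field from a raw CSR value. */
static uint32_t csr_rev_id(uint32_t csr) {
    return (csr >> CSR_REV_ID_SHIFT) & CSR_FIELD_MASK;
}

/* Nonzero when CSR[31:16] matches the Table 2-2 value; CSR[15:0] is ignored. */
static int csr_matches_device(uint32_t csr) {
    return (csr >> 16) == DEVICE_CSR_ID;
}
```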
2.4 ARM® Cortex™-A8 Microprocessor Unit (MPU) Subsystem Overview
The ARM® Cortex™-A8 Subsystem is designed to give the ARM Cortex-A8 master control of the device.
In general, the ARM Cortex-A8 is responsible for configuration and control of the various subsystems,
peripherals, and external memories.
The ARM Cortex-A8 Subsystem includes the following features:
•ARM Cortex-A8 RISC processor:
– ARMv7 ISA plus Thumb2™, JazelleX™, and Media Extensions
– NEON™ Floating-Point Unit
– Enhanced Memory Management Unit (MMU)
– Little Endian
– 32KB L1 Instruction Cache
– 32KB L1 Data Cache
– 512KB L2 Cache
•CoreSight Embedded Trace Module (ETM)
•ARM Cortex-A8 Interrupt Controller (AINTC)
•Embedded PLL Controller (PLL_ARM)
•64KB Internal RAM
•48KB Internal Public ROM
www.ti.com.cn
Figure 2-1 shows the ARM Cortex-A8 Subsystem for the device.
For more detailed information on the ARM Cortex-A8 Subsystem, see the ARM Cortex-A8 Subsystem
User's Guide (Literature Number: TBD).
2.4.1 ARM Cortex-A8 RISC Processor
The ARM Cortex-A8 Subsystem integrates the ARM Cortex-A8 processor. The ARM Cortex-A8 processor
is a member of the ARM Cortex family of general-purpose microprocessors. This processor is targeted at
multitasking applications where full memory management, high performance, small die size, and low power
are all important. The ARM Cortex-A8 processor supports the ARM debug architecture and includes logic
to assist in both hardware and software debug. The ARM Cortex-A8 processor has a Harvard architecture
and provides a complete high-performance subsystem, including:
•NEON™ 64-/128-bit Hybrid SIMD Engine for Multimedia
•Enhanced VFPv3 Floating-Point Coprocessor
•Enhanced Memory Management Unit (MMU)
•Separate Level-1 Instruction and Data Caches
•Integrated Level-2 Cache
•128-bit Interconnect to System Memories and Peripherals
•Embedded Trace Module (ETM)
2.4.2 Embedded Trace Module (ETM)
To support real-time trace, the ARM Cortex-A8 processor provides an interface to enable connection of an
embedded trace module (ETM). The ETM consists of two parts:
•The trace port, which provides real-time trace capability for the ARM Cortex-A8.
•Triggering facilities that provide trigger resources, which include address and data comparators,
counters, and sequencers.
The ARM Cortex-A8 trace port is not pinned out; instead, it is connected only to the system-level
Embedded Trace Buffer (ETB). The ETB has a 32KB buffer memory. ETB-enabled debug tools are
required to read and interpret the captured trace data.
ZHCS057–MARCH 2011
For more details on the ETM, see Section 8.5.2, Trace.
2.4.3 ARM Cortex-A8 Interrupt Controller (AINTC)
The ARM Cortex-A8 subsystem contains an interrupt controller (AINTC) that prioritizes all service requests
from the system peripherals and generates either an IRQ or an FIQ to the ARM Cortex-A8 processor. For more
details on the AINTC, see Section 7.5.1, ARM Cortex-A8 Interrupts.
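The prioritization described above can be pictured with a small software model. This is a sketch only. The following are assumptions made for illustration:

- a source count of 128;
- a lower priority value wins;
- ties go to the lower source number;
- each source has a per-source flag that steers it to IRQ or FIQ.

The names `aintc_model` and `aintc_select` are hypothetical, and the model does not describe the actual AINTC registers. See Section 7.5.1 for those.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SOURCES 128   /* hypothetical number of interrupt sources */

typedef struct {
    uint8_t pending[NUM_SOURCES];   /* 1 if the source is asserting a request */
    uint8_t priority[NUM_SOURCES];  /* assumption: lower value = higher priority */
    uint8_t to_fiq[NUM_SOURCES];    /* 1 = steer to FIQ, 0 = steer to IRQ */
} aintc_model;

/* Return the highest-priority pending source steered to the requested line
   (fiq != 0 selects FIQ, fiq == 0 selects IRQ), or -1 if none is pending.
   Ties are resolved in favor of the lower source number. */
static int aintc_select(const aintc_model *m, int fiq) {
    assert(m != 0);
    int best = -1;
    for (int i = 0; i < NUM_SOURCES; i++) {
        if (!m->pending[i] || ((m->to_fiq[i] != 0) != (fiq != 0)))
            continue;
        if (best < 0 || m->priority[i] < m->priority[best])
            best = i;
    }
    return best;
}
```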
2.4.4 ARM Cortex-A8 PLL (PLL_ARM)
The ARM Cortex-A8 subsystem contains an embedded PLL Controller (PLL_ARM) for generating the
subsystem’s clocks from the DEV Clock input. For more details on the PLL_ARM, see Section 7.4,
Clocking.
2.4.5 ARM MPU Interconnect
The ARM Cortex-A8 processor is connected through the arbiter to both an L3 interconnect port and a
DMM port. The DMM port is 128 bits wide and provides the ARM Cortex-A8 with direct access to the DDR
memories, while the L3 interconnect port is 64 bits wide and provides access to the remaining device
modules.
The C674x central processing unit (CPU) consists of eight functional units, two register files, and two data
paths, as shown in Figure 2-2. The two general-purpose register files (A and B) each contain 32 32-bit
registers, for a total of 64 registers. The general-purpose registers can be used for data or as data
address pointers. The supported data types include packed 8-bit data, packed 16-bit data, 32-bit data,
40-bit data, and 64-bit data. Values larger than 32 bits, such as 40-bit or 64-bit values, are stored
in register pairs, with the 32 LSBs of data placed in an even register and the remaining 8 or 32 MSBs in
the next upper register (which is always an odd-numbered register).
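The register-pair convention can be modeled in portable C. In this sketch, the `reg_pair` type and helper names are hypothetical. The sketch places the 32 LSBs in the even register and the upper 8 or 32 bits in the odd register. For 40-bit values, it keeps only bits 39:32 in the odd register. It does not model what the hardware does with the remaining odd-register bits.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a C674x register pair (e.g. A5:A4): the even register holds the
   32 LSBs, the odd register holds the remaining 8 (40-bit) or 32 (64-bit) MSBs. */
typedef struct {
    uint32_t even;  /* low 32 bits */
    uint32_t odd;   /* upper bits */
} reg_pair;

static reg_pair split64(uint64_t v) {
    reg_pair p = { (uint32_t)v, (uint32_t)(v >> 32) };
    return p;
}

static uint64_t join64(reg_pair p) {
    return ((uint64_t)p.odd << 32) | p.even;
}

/* Store a signed 40-bit value; only bits 39:32 land in the odd register. */
static reg_pair split40(int64_t v) {
    uint64_t u = (uint64_t)v;  /* two's-complement bit pattern */
    reg_pair p = { (uint32_t)u, (uint32_t)((u >> 32) & 0xFFu) };
    return p;
}

/* Rebuild the signed 40-bit value, sign-extending from bit 39. */
static int64_t join40(reg_pair p) {
    uint64_t u = ((uint64_t)(p.odd & 0xFFu) << 32) | p.even;
    int64_t raw = (int64_t)u;  /* u < 2^40, so this always fits */
    return (u & (1ULL << 39)) ? raw - (1LL << 40) : raw;
}
```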
The eight functional units (.M1, .L1, .D1, .S1, .M2, .L2, .D2, and .S2) are each capable of executing one
instruction every clock cycle. The .M functional units perform all multiply operations. The .S and .L units
perform a general set of arithmetic, logical, and branch functions. The .D units primarily load data from
memory to the register file and store results from the register file into memory.
The C674x CPU combines the performance of the C64x+ core with the floating-point capabilities of the
C67x+ core.
Each C674x .M unit can perform one of the following each clock cycle: one 32 x 32 bit multiply, one 16 x
32 bit multiply, two 16 x 16 bit multiplies, two 16 x 32 bit multiplies, two 16 x 16 bit multiplies with
add/subtract capabilities, four 8 x 8 bit multiplies, four 8 x 8 bit multiplies with add operations, or four 16
x 16 bit multiplies with add/subtract capabilities (including a complex multiply). There is also support for
Galois field multiplication for 8-bit and 32-bit data. Many communications algorithms, such as FFTs and
modems, require complex multiplication. The complex multiply (CMPY) instruction takes four 16-bit inputs
and produces a 32-bit real and a 32-bit imaginary output. There are also complex multiplies with rounding
capability that produce one 32-bit packed output containing 16-bit real and 16-bit imaginary values. The
32 x 32 bit multiply instructions provide the extended precision necessary for high-precision algorithms on
a variety of signed and unsigned 32-bit data types.
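The arithmetic behind the complex multiplies can be written out in plain C. This sketch shows the math only. It is not a bit-exact model of CMPY or its rounding variants. It makes the following assumptions:

- The real part is packed in bits 31:16 and the imaginary part in bits 15:0.
- 32-bit results saturate. Overflow is possible only when all four inputs are -32768.
- The rounded form adds 0x8000, keeps the upper 16 bits, and then saturates.

All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int32_t re; int32_t im; } cplx32;

/* Signed value of the low 16 bits of v (no implementation-defined casts). */
static int32_t s16(uint32_t v) {
    v &= 0xFFFFu;
    return v >= 0x8000u ? (int32_t)v - 0x10000 : (int32_t)v;
}

static int32_t sat32(int64_t v) {
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* (a_re + j*a_im) * (b_re + j*b_im); each operand is packed as re:im. */
static cplx32 cmul16(uint32_t a, uint32_t b) {
    int64_t ar = s16(a >> 16), ai = s16(a);
    int64_t br = s16(b >> 16), bi = s16(b);
    cplx32 r = { sat32(ar * br - ai * bi), sat32(ar * bi + ai * br) };
    return r;
}

/* Round a 32-bit part to 16 bits: add 0x8000, take floor(x / 2^16), saturate. */
static int32_t round_hi16(int32_t x) {
    int64_t t = (int64_t)x + 0x8000;
    int64_t q = t / 65536;
    if (t % 65536 != 0 && t < 0) q -= 1;  /* floor for negative values */
    if (q > INT16_MAX) q = INT16_MAX;
    if (q < INT16_MIN) q = INT16_MIN;
    return (int32_t)q;
}

/* Rounded variant: one word with 16-bit real (upper) and imaginary (lower). */
static uint32_t cmul16_round_packed(uint32_t a, uint32_t b) {
    cplx32 r = cmul16(a, b);
    uint32_t re = (uint32_t)round_hi16(r.re) & 0xFFFFu;
    uint32_t im = (uint32_t)round_hi16(r.im) & 0xFFFFu;
    return (re << 16) | im;
}
```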
The .L (arithmetic logic) unit now incorporates the ability to perform parallel add/subtract operations on a
pair of common inputs. Versions of this instruction exist to work on 32-bit data or on pairs of 16-bit data,
performing dual 16-bit additions and subtractions in parallel. There are also saturated forms of these instructions.
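A plain-C model of the paired add/subtract operation follows. It is loosely modeled on the ADDSUB-family instructions described in SPRUFE8. The instruction names are for orientation only, and the function names are hypothetical. The model ignores how results are placed in the destination register pair. It shows only the arithmetic: one call yields both a+b and a-b, with wrapping, saturating, and dual 16-bit lane variants.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int32_t sum; int32_t diff; } addsub_result;
typedef struct { uint32_t sum2; uint32_t diff2; } addsub2_result;

/* Reinterpret a 32-bit pattern as signed without implementation-defined casts. */
static int32_t as_s32(uint32_t u) {
    return u <= (uint32_t)INT32_MAX ? (int32_t)u
                                    : (int32_t)(u - 0x80000000u) + INT32_MIN;
}

static int32_t sat32(int64_t v) {
    return v > INT32_MAX ? INT32_MAX : (v < INT32_MIN ? INT32_MIN : (int32_t)v);
}

/* Sum and difference of one pair of 32-bit inputs, wrapping on overflow. */
static addsub_result addsub(int32_t a, int32_t b) {
    addsub_result r = { as_s32((uint32_t)a + (uint32_t)b),
                        as_s32((uint32_t)a - (uint32_t)b) };
    return r;
}

/* Saturating form: results clamp to the int32_t range. */
static addsub_result saddsub(int32_t a, int32_t b) {
    addsub_result r = { sat32((int64_t)a + b), sat32((int64_t)a - b) };
    return r;
}

/* Dual 16-bit form: each half-word lane is added and subtracted independently. */
static addsub2_result addsub2(uint32_t a, uint32_t b) {
    uint32_t hs = ((a >> 16) + (b >> 16)) & 0xFFFFu;
    uint32_t ls = ((a & 0xFFFFu) + (b & 0xFFFFu)) & 0xFFFFu;
    uint32_t hd = ((a >> 16) - (b >> 16)) & 0xFFFFu;
    uint32_t ld = ((a & 0xFFFFu) - (b & 0xFFFFu)) & 0xFFFFu;
    addsub2_result r = { (hs << 16) | ls, (hd << 16) | ld };
    return r;
}
```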
The C674x core enhances the .S unit in several ways. On previous cores, dual 16-bit MIN2 and MAX2
comparisons were available only on the .L units. On the C674x core, they are also available on the .S unit,
which increases the performance of algorithms that perform searching and sorting. Finally, to increase data
packing and unpacking throughput, the .S unit allows sustained high performance for the quad 8-bit/16-bit
and dual 16-bit instructions. Unpack instructions prepare 8-bit data for parallel 16-bit operations. Pack
instructions return parallel results to output precision, including saturation support.
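The following sketch shows, in portable C, what lane-wise MIN2/MAX2 comparisons and byte pack/unpack operations compute on 32-bit words holding packed 16-bit or 8-bit data. Function names are hypothetical. The following details are assumptions for illustration, not a specification of the .S-unit instructions:

- The high half-word comes first.
- The most significant byte comes first.
- Unpacking zero-extends each byte.

```c
#include <assert.h>
#include <stdint.h>

/* Signed value of the low 16 bits of v. */
static int32_t s16(uint32_t v) {
    v &= 0xFFFFu;
    return v >= 0x8000u ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* Pack two signed 16-bit lanes into one word (high lane first). */
static uint32_t pack2(int32_t hi, int32_t lo) {
    return (((uint32_t)hi & 0xFFFFu) << 16) | ((uint32_t)lo & 0xFFFFu);
}

/* Lane-wise signed 16-bit maximum / minimum, in the spirit of MAX2/MIN2. */
static uint32_t max2(uint32_t a, uint32_t b) {
    int32_t ah = s16(a >> 16), al = s16(a), bh = s16(b >> 16), bl = s16(b);
    return pack2(ah > bh ? ah : bh, al > bl ? al : bl);
}

static uint32_t min2(uint32_t a, uint32_t b) {
    int32_t ah = s16(a >> 16), al = s16(a), bh = s16(b >> 16), bl = s16(b);
    return pack2(ah < bh ? ah : bh, al < bl ? al : bl);
}

/* Unpack: zero-extend the two low bytes of v into two 16-bit lanes. */
static uint32_t unpack_lo_u8(uint32_t v) {
    return ((v & 0x0000FF00u) << 8) | (v & 0x000000FFu);
}

/* Clamp a signed value to the unsigned 8-bit range. */
static uint32_t sat_u8(int32_t x) {
    return x < 0 ? 0u : (x > 255 ? 255u : (uint32_t)x);
}

/* Pack four signed 16-bit lanes (a.hi, a.lo, b.hi, b.lo) into four
   saturated unsigned bytes, most significant byte first. */
static uint32_t pack_sat_u8(uint32_t a, uint32_t b) {
    return (sat_u8(s16(a >> 16)) << 24) | (sat_u8(s16(a)) << 16) |
           (sat_u8(s16(b >> 16)) << 8)  |  sat_u8(s16(b));
}
```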
Other new features include:
•SPLOOP - A small instruction buffer in the CPU that aids in the creation of software-pipelined loops, in which
multiple iterations of a loop are executed in parallel. The SPLOOP buffer reduces the code size
associated with software pipelining. Furthermore, loops in the SPLOOP buffer are fully interruptible.
•Compact Instructions - The native instruction size for C6000 devices is 32 bits. Many common
instructions, such as MPY, AND, OR, ADD, and SUB, can be encoded in 16 bits if the C674x
compiler can restrict the code to use certain registers in the register file. This compression is
performed by the code generation tools.
•Instruction Set Enhancement - As noted above, there are new instructions such as 32-bit
multiplications, complex multiplications, packing, sorting, bit manipulation, and 32-bit Galois field
multiplication.
•Exception Handling - Intended to aid the programmer in isolating bugs. The C674x CPU is able to
detect and respond to exceptions, both from internally detected sources (such as illegal op-codes) and
from system events (such as a watchdog timer expiration).
•Privilege - Defines user and supervisor modes of operation, allowing the operating system to give a
basic level of protection to sensitive resources. Local memory is divided into multiple pages, each with
read, write, and execute permissions.
•Time-Stamp Counter - Primarily targeted at Real-Time Operating System (RTOS) robustness, a
free-running time-stamp counter is implemented in the CPU; it is not sensitive to system stalls.
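Software typically reads the time-stamp counter as two 32-bit halves and forms a 64-bit cycle count. The sketch below makes two assumptions:

- The halves are available as the TSCL (low) and TSCH (high) control registers described in the C674x CPU guide.
- The halves are sampled consistently, low half first.

Reading the registers themselves is compiler-specific and is left out. The caller supplies the CPU frequency passed to `cycles_to_us`. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Combine the low (TSCL) and high (TSCH) halves into one 64-bit count. */
static uint64_t tsc_combine(uint32_t tscl, uint32_t tsch) {
    return ((uint64_t)tsch << 32) | tscl;
}

/* Cycles between two samples. Unsigned subtraction is modulo 2^64, so the
   result is correct even if the counter wrapped once between samples. */
static uint64_t tsc_elapsed(uint64_t start, uint64_t end) {
    return end - start;
}

/* Convert a cycle count to microseconds for a caller-supplied CPU clock,
   splitting the division to avoid overflowing the intermediate product. */
static uint64_t cycles_to_us(uint64_t cycles, uint32_t cpu_hz) {
    assert(cpu_hz != 0u);
    return (cycles / cpu_hz) * 1000000u + (cycles % cpu_hz) * 1000000u / cpu_hz;
}
```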
For more details on the C674x CPU and its enhancements over the C64x architecture, see the following
documents:
•TMS320C674x DSP CPU and Instruction Set User's Guide (literature number SPRUFE8)
•TMS320C674x DSP Megamodule Reference Guide (literature number SPRUFK5)
A. On .M unit, … is 32 MSB.
B. On .M unit, … is 32 LSB.
C. On C64x CPU .M unit, … is 32 bits; on C64x+ CPU .M unit, … is 64 bits.
D. On .L and .S units, … connects to odd register files and even … connects to even register files.
All C674x accesses through its MDMA port are directed through the DSP/EDMA MMU module, where
they are remapped to physical system addresses. This protects the ARM Cortex-A8 memory regions from
accidental corruption by C674x code and allows buffers to be allocated directly in user space without the
need for address translation between ARM and DSP applications.
In addition, accesses by EDMA TC0 and TC1 may optionally be routed through the DSP/EDMA MMU.
This allows EDMA channels 0 and 1 to be used by the DSP to perform transfers using only the known
virtual addresses of the associated buffers. The MMU_CFG register in the Control Module is used to
enable or disable use of the DSP/EDMA MMU by the EDMA TCs.
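Conceptually, the DSP/EDMA MMU replaces the page-number bits of each virtual address with a physical page number while keeping the in-page offset. The sketch below illustrates that idea with a hypothetical 4KB page size and a flat lookup table. The names `map_entry` and `mmu_translate` are illustrative. The real MMU's page sizes, table format, and TLB behavior are described in the MMU user's guide and are not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12u                        /* hypothetical 4KB pages */
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1u)  /* in-page offset bits */

typedef struct {
    uint32_t virt_page;  /* virtual page number seen by the DSP */
    uint32_t phys_page;  /* physical page number in the system map */
} map_entry;

/* Translate a DSP virtual address; returns false on a miss (a fault). */
static bool mmu_translate(const map_entry *tbl, size_t n,
                          uint32_t va, uint32_t *pa) {
    assert(pa != NULL);
    uint32_t vpn = va >> PAGE_SHIFT;
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].virt_page == vpn) {
            *pa = (tbl[i].phys_page << PAGE_SHIFT) | (va & PAGE_MASK);
            return true;
        }
    }
    return false;
}
```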
For more details on the DSP/EDMA MMU features, see the TBD Memory Management Unit (MMU) User's
Guide (Literature Number: TBD).
The Media Controller is responsible for managing the HDVPSS, HDVICP2, and ISS modules.
For more details on the Media Controller, see the TBD Subsystem User's Guide (Literature Number:
TBD).
2.8 HDVICP2 Overview
The HDVICP2 is a video encoder/decoder hardware accelerator supporting a range of encode, decode,
and transcode operations for most major video codec standards. The main video codec standards
supported in hardware are MPEG1/2/4 ASP/SP, H.264 BL/MP/HP, VC-1 SP/MP/AP, RV9/10, AVS-1.0,
and On2 VP6.2/VP7.
The HDVICP2 hardware accelerator is composed of the following elements:
•Motion estimation acceleration engine
•Loop filter acceleration engine
•Sequencer, including its memories and an interrupt controller
•Intra-prediction estimation engine
•Calculation engine
•Motion compensation engine
•Entropy coder/decoder
•Video Direct Memory Access (DMA)
•Synchronization boxes
•Shared L2 controller
•Local interconnect
For more details on the HDVICP2, see the HDVICP2 User's Guide (Literature Number: TBD).
The SGX530 is a vector/3D graphics accelerator for vector and three-dimensional (3D) graphics applications.
The SGX530 graphics accelerator efficiently processes several multimedia data types
concurrently:
•Pixel data
•Vertex data
•Video data
This is achieved through a multithreaded architecture that uses two levels of scheduling and data partitioning,
enabling zero-overhead task switching. The SGX530 also provides the following features:
•Advanced shader feature set - in excess of Microsoft VS3.0, PS3.0, and OpenGL 2.0
•Industry-standard API support - OpenGL ES 1.1 and 2.0, OpenVG v1.1
•Fine-grained task switching, load balancing, and power management
•Advanced geometry DMA driven operation for minimum CPU interaction
•Programmable high-quality image anti-aliasing
•POWERVR SGX core MMU for address translation from the core virtual address to the external
physical address (up to 4GB address range)
•Fully-virtualized memory addressing for OS operation in a unified memory architecture
•Advanced and standard 2D operations [e.g., vector graphics, block level transfers (BLTs), raster
operations (ROPs)]
For more details on the SSM, see the ARM Cortex-A8 Subsystem User's Guide (Literature Number: TBD).
2.10 Spinlock Module Overview
The Spinlock module provides hardware assistance for synchronizing the processes running on multiple
processors in the device:
•ARM Cortex-A8 processor
•C674x DSP
•Media Controller
The Spinlock module implements 128 spinlocks (or hardware semaphores) that provide an efficient way to
perform a lock operation on a device resource using a single read access, avoiding the need for a
read-modify-write bus transfer, which the programmable cores are not capable of.
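The single-read locking scheme can be illustrated with a software stand-in for the lock registers. The sketch assumes OMAP-style semantics:

- A read returns 0 if the lock was free, and the hardware atomically marks it taken.
- A read returns 1 if the lock was already taken.
- Writing 0 releases the lock.

Here the array `lock_state` only simulates the registers, so the code is not itself atomic. On hardware, the atomicity comes from the Spinlock module. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_SPINLOCKS 128u

/* Simulated lock registers: 0 = free, 1 = taken (assumed encoding). */
static uint32_t lock_state[NUM_SPINLOCKS];

/* Models the single read access; the hardware performs this atomically. */
static uint32_t lock_reg_read(unsigned id) {
    uint32_t old = lock_state[id];
    lock_state[id] = 1u;
    return old;
}

static void lock_reg_write(unsigned id, uint32_t v) {
    lock_state[id] = v;
}

/* One read, no read-modify-write: true if this caller acquired the lock. */
static bool spinlock_trylock(unsigned id) {
    assert(id < NUM_SPINLOCKS);
    return lock_reg_read(id) == 0u;
}

/* Release by writing 0 to the lock register. */
static void spinlock_unlock(unsigned id) {
    assert(id < NUM_SPINLOCKS);
    lock_reg_write(id, 0u);
}
```

A blocking acquire is then a loop such as `while (!spinlock_trylock(id)) { }`. As usual for spinlocks, the lock should be held only briefly.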
For more detailed information on the Spinlock Module, see the TBD User's Guide (Literature Number:
TBD).
The device Mailbox module facilitates communication between the ARM Cortex-A8, the C674x DSP, and the
Media Controller. It consists of twelve mailboxes, each supporting one-way communication between two of
these processors. The sender sends information to the receiver by writing a message to the mailbox
registers. Interrupt signaling is used to notify the receiver that a message has been queued or to notify the
sender about an overflow situation.
The Mailbox module supports the following features (see Figure 2-4):
•12 mailboxes
•Flexible mailbox-to-processor assignment scheme
•Four-message FIFO depth for each message queue
•32-bit message width
•Message reception and queue-not-full notification using interrupts
•Four interrupts (one to ARM Cortex-A8, one to C674x, and two to Media Controller)
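The queue behavior in the feature list can be pictured as a small ring buffer. That behavior covers:

- four 32-bit messages per queue;
- notification on reception and on queue-not-full;
- overflow reported to the sender.

This is a behavioral sketch with hypothetical names. It assumes that a message sent to a full FIFO is dropped and flagged. It does not model the mailbox registers or interrupt lines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MBOX_FIFO_DEPTH 4u   /* four-message FIFO per queue */

typedef struct {
    uint32_t msg[MBOX_FIFO_DEPTH];  /* 32-bit messages */
    unsigned head;                  /* index of the oldest message */
    unsigned count;                 /* messages currently queued */
    bool overflow;                  /* set when a send finds the FIFO full */
} mailbox;

/* Sender side: returns false and flags overflow if the FIFO is full. */
static bool mbox_send(mailbox *mb, uint32_t m) {
    assert(mb != 0);
    if (mb->count == MBOX_FIFO_DEPTH) {
        mb->overflow = true;
        return false;
    }
    mb->msg[(mb->head + mb->count) % MBOX_FIFO_DEPTH] = m;
    mb->count++;
    return true;
}

/* Receiver side: returns false if no message is queued. */
static bool mbox_receive(mailbox *mb, uint32_t *m) {
    assert(mb != 0 && m != 0);
    if (mb->count == 0u) return false;
    *m = mb->msg[mb->head];
    mb->head = (mb->head + 1u) % MBOX_FIFO_DEPTH;
    mb->count--;
    return true;
}

/* Conditions that would raise the "new message" and "queue not full" interrupts. */
static bool mbox_new_msg_pending(const mailbox *mb) { return mb->count > 0u; }
static bool mbox_not_full(const mailbox *mb) { return mb->count < MBOX_FIFO_DEPTH; }
```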
For more detailed information on the Mailbox Module, see the TBD User's Guide (Literature Number:
TBD).
The device has multiple on-chip memories associated with its two processors and various subsystems. To
help simplify software development, a unified memory map is used where possible to maintain a consistent
view of device resources across all bus masters.
2.12.1 L3 Memory Map
Table 2-3 shows the L3 memory map for all system masters (including Cortex-A8), except for the C674x
DSP. Table 2-4 shows the memory map for the C674x DSP.
(1) Addresses 0x1000_0000 to 0x10FF_FFFF are mapped to C674x internal addresses 0x0000_0000 to 0x00FF_FFFF.
(2) For more details on the DSP/EDMA MMU, see the TMS320DM814x DMSoC Memory Management Units (MMU) User's Guide
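Footnote (1) describes a fixed-offset alias between the L3 view and the C674x internal view. The following sketch converts addresses in both directions. It uses only the address ranges given in that note, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DSP_GLOBAL_BASE 0x10000000u  /* start of the L3 alias window */
#define DSP_GLOBAL_END  0x10FFFFFFu  /* end of the L3 alias window */
#define DSP_LOCAL_END   0x00FFFFFFu  /* end of the C674x internal range */

/* L3 (global) address -> C674x internal address; false if outside the window. */
static bool l3_to_dsp_local(uint32_t l3, uint32_t *local) {
    assert(local != 0);
    if (l3 < DSP_GLOBAL_BASE || l3 > DSP_GLOBAL_END) return false;
    *local = l3 - DSP_GLOBAL_BASE;
    return true;
}

/* C674x internal address -> L3 (global) address; false if out of range. */
static bool dsp_local_to_l3(uint32_t local, uint32_t *l3) {
    assert(l3 != 0);
    if (local > DSP_LOCAL_END) return false;
    *l3 = local + DSP_GLOBAL_BASE;
    return true;
}
```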
The L4 Fast Peripheral Domain and L4 Slow Peripheral Domain regions of the memory maps above are
broken out in Table 2-5 and Table 2-6.
For more details on the interconnect topology and connectivity across the L3 and L4 interconnects, see
Section 5, System Interconnect.
2.12.3.1 L4 Fast Peripheral Memory Map
Cortex-A8 and L3 Masters    C674x DSP
START (HEX)  END (HEX)      START (HEX)  END (HEX)      SIZE     DEVICE NAME
0x4A00_0000  0x4A00_07FF    0x0A00_0000  0x0A00_07FF    2KB      L4 Fast Configuration - Address/Protection
0x4A00_0800  0x4A00_0FFF    0x0A00_0800  0x0A00_0FFF    2KB      L4 Fast Configuration - Link Agent (LA)
0x4A00_1000  0x4A00_13FF    0x0A00_1000  0x0A00_13FF    1KB      L4 Fast Configuration - Initiator Port (IP0)
0x4A00_1400  0x4A00_17FF    0x0A00_1400  0x0A00_17FF    1KB      L4 Fast Configuration - Initiator Port (IP1)
0x4A00_1800  0x4A00_1FFF    0x0A00_1800  0x0A00_1FFF    2KB      Reserved
0x4A00_2000  0x4A07_FFFF    0x0A00_2000  0x0A07_FFFF    504KB    Reserved
0x4A08_0000  0x4A0F_FFFF    0x0A08_0000  0x0A0F_FFFF    512KB    Reserved
0x4A10_0000  0x4A10_7FFF    0x0A10_0000  0x0A10_7FFF    32KB     EMAC SW Peripheral Registers
0x4A10_8000  0x4A10_8FFF    0x0A10_8000  0x0A10_8FFF    4KB      EMAC SW Support Registers
0x4A14_0000  0x4A14_FFFF    0x0A14_0000  0x0A14_FFFF    64KB     SATA Peripheral Registers
0x4A15_0000  0x4A15_0FFF    0x0A15_0000  0x0A15_0FFF    4KB      SATA Support Registers
0x4A15_1000  0x4A17_FFFF    0x0A15_1000  0x0A17_FFFF    188KB    Reserved
0x4A18_0000  0x4A1A_1FFF    0x0A18_0000  0x0A1A_1FFF    136KB    Reserved
0x4A1A_2000  0x4A1A_3FFF    0x0A1A_2000  0x0A1A_3FFF    8KB      McASP3 Configuration Peripheral Registers
0x4A1A_4000  0x4A1A_4FFF    0x0A1A_4000  0x0A1A_4FFF    4KB      McASP3 Configuration Support Registers
0x4A1A_5000  0x4A1A_5FFF    0x0A1A_5000  0x0A1A_5FFF    4KB      McASP3 Data Peripheral Registers
0x4A1A_6000  0x4A1A_6FFF    0x0A1A_6000  0x0A1A_6FFF    4KB      McASP3 Data Support Registers
0x4A1A_8000  0x4A1A_9FFF    0x0A1A_8000  0x0A1A_9FFF    8KB      McASP4 Configuration Peripheral Registers
0x4A1A_A000  0x4A1A_AFFF    0x0A1A_A000  0x0A1A_AFFF    4KB      McASP4 Configuration Support Registers
0x4A1A_B000  0x4A1A_BFFF    0x0A1A_B000  0x0A1A_BFFF    4KB      McASP4 Data Peripheral Registers
0x4A1A_C000  0x4A1A_CFFF    0x0A1A_C000  0x0A1A_CFFF    4KB      McASP4 Data Support Registers
0x4A1A_D000  0x4A1A_DFFF    0x0A1A_D000  0x0A1A_DFFF    4KB      Reserved
0x4A1A_E000  0x4A1A_FFFF    0x0A1A_E000  0x0A1A_FFFF    8KB      McASP5 Configuration Peripheral Registers
0x4A1B_0000  0x4A1B_0FFF    0x0A1B_0000  0x0A1B_0FFF    4KB      McASP5 Configuration Support Registers
0x4A1B_1000  0x4A1B_1FFF    0x0A1B_1000  0x0A1B_1FFF    4KB      McASP5 Data Peripheral Registers
0x4A1B_2000  0x4A1B_2FFF    0x0A1B_2000  0x0A1B_2FFF    4KB      McASP5 Data Support Registers
0x4A1B_3000  0x4A1B_5FFF    0x0A1B_3000  0x0A1B_5FFF    12KB     Reserved
0x4A1B_6000  0x4A1B_6FFF    0x0A1B_6000  0x0A1B_6FFF    4KB      Reserved
0x4A1B_4000  0x4AFF_FFFF    0x0A1B_4000  0x0AFF_FFFF    14632KB  Reserved
(1) These regions are decoded internally by the Cortex™-A8 Subsystem and are not physically part of the L4 Slow. They are included here only
for reference when considering the Cortex™-A8 memory map. For masters other than the Cortex-A8, these regions are reserved.
(2) These regions are decoded internally by the Cortex™-A8 Subsystem and are not physically part of the L4 Slow. They are included here only
for reference when considering the Cortex™-A8 memory map. For masters other than the Cortex-A8, these regions are reserved.
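In every row of the L4 Fast Peripheral memory map, the C674x DSP address equals the Cortex-A8/L3 address minus 0x4000_0000. The sketch below applies that offset over 0x4A00_0000-0x4AFF_FFFF. It also looks up a few region names copied from the table. The names `l4f_a8_to_dsp` and `l4f_region_name` are hypothetical, and the region list is an excerpt, not the full map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define L4F_A8_BASE     0x4A000000u
#define L4F_A8_END      0x4AFFFFFFu
#define L4F_VIEW_OFFSET 0x40000000u  /* Cortex-A8/L3 view minus C674x view */

/* Cortex-A8/L3 address -> C674x DSP address for the L4 Fast domain. */
static bool l4f_a8_to_dsp(uint32_t a8, uint32_t *dsp) {
    assert(dsp != NULL);
    if (a8 < L4F_A8_BASE || a8 > L4F_A8_END) return false;
    *dsp = a8 - L4F_VIEW_OFFSET;
    return true;
}

typedef struct { uint32_t start; uint32_t end; const char *name; } l4f_region;

/* Excerpt of the table (Cortex-A8/L3 addresses). */
static const l4f_region l4f_regions[] = {
    { 0x4A100000u, 0x4A107FFFu, "EMAC SW Peripheral Registers" },
    { 0x4A108000u, 0x4A108FFFu, "EMAC SW Support Registers" },
    { 0x4A140000u, 0x4A14FFFFu, "SATA Peripheral Registers" },
    { 0x4A150000u, 0x4A150FFFu, "SATA Support Registers" },
};

/* Name of the excerpted region containing a8, or NULL if not listed. */
static const char *l4f_region_name(uint32_t a8) {
    for (size_t i = 0; i < sizeof l4f_regions / sizeof l4f_regions[0]; i++)
        if (a8 >= l4f_regions[i].start && a8 <= l4f_regions[i].end)
            return l4f_regions[i].name;
    return NULL;
}
```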
2.12.4 DDR DMM TILER Extended Addressing Map
The Tiler includes an additional 4GB of address range, enabled by a 33rd address bit, to access
the frame buffer in rotated and mirrored views. shows the details of the Tiler Extended Address Mapping.
The entirety of this additional range is accessible only to the HDVPSS and ISS subsystems. However,
other masters can access any single view through the 512MB Tiler region in the base 4GB
memory map.
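The sizes quoted for the Tiler follow from simple address arithmetic. A 33rd address bit (bit 32) doubles the addressable space, adding 2^32 bytes (4GB), which is eight times the 512MB Tiler window in the base map. The sketch below only splits an address on that bit. This document does not describe how views are laid out inside the extended range. The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TILER_EXT_BIT   32u                      /* the 33rd address bit */
#define TILER_EXT_SIZE  (1ULL << TILER_EXT_BIT)  /* 4GB additional range */
#define TILER_VIEW_SIZE (512ULL << 20)           /* 512MB Tiler region in the base map */

/* True when the address lies in the extended (bit 32 set) Tiler range. */
static bool tiler_is_extended(uint64_t addr) {
    return ((addr >> TILER_EXT_BIT) & 1u) != 0u;
}

/* Offset of the address within the 4GB extended range. */
static uint64_t tiler_ext_offset(uint64_t addr) {
    return addr & (TILER_EXT_SIZE - 1u);
}
```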