(EDMA3)
– Serial ATA (SATA) Controller
– DDR2/Mobile DDR Memory Controller
– Two Multimedia Card (MMC)/Secure Digital (SD) Card Interface
– LCD Controller
– Video Port Interface (VPIF)
– 10/100 Mb/s Ethernet MAC (EMAC)
– Programmable Real-Time Unit Subsystem
– Three Configurable UART Modules
– USB 1.1 OHCI (Host) With Integrated PHY
– USB 2.0 OTG Port With Integrated PHY
– One Multichannel Audio Serial Port
– Two Multichannel Buffered Serial Ports
• ARM926EJ-S Core
– 32-Bit and 16-Bit (Thumb®) Instructions
– DSP Instruction Extensions
– Single Cycle MAC
– ARM® Jazelle® Technology
– EmbeddedICE-RT™ for Real-Time Debug
• ARM9 Memory Architecture
– 16K-Byte Instruction Cache
– 16K-Byte Data Cache
– 8K-Byte RAM (Vector Table)
– 64K-Byte ROM
• C674x Instruction Set Features
– Superset of the C67x+™ and C64x+™ ISAs
– Up to 3648/2746 C674x MIPS/MFLOPS
– Byte-Addressable (8-/16-/32-/64-Bit Data)
– 8-Bit Overflow Protection
– Bit-Field Extract, Set, Clear
(1) Please be aware that an important notice concerning availability, standard warranty, and use in critical applications of Texas
Instruments semiconductor products and disclaimers thereto appears at the end of this data sheet.
(2) TMS320C6000, C6000 are trademarks of Texas Instruments.
(3) ARM926EJ-S is a trademark of ARM Limited.
ADVANCE INFORMATION concerns new products in the sampling
or preproduction phase of development. Characteristic data and other
specifications are subject to change without notice.
(EDMA3):
– 2 Channel Controllers
Support
– 64 General-Purpose Registers (32 Bit)
– Six ALU (32-/40-Bit) Functional Units
•Supports 32-Bit Integer, SP (IEEE Single
Precision/32-Bit) and DP (IEEE Double
Precision/64-Bit) Floating Point
•Supports up to Four SP Additions Per
Clock, Four DP Additions Every 2 Clocks
•Supports up to Two Floating Point (SP or
DP) Reciprocal Approximation (RCPxP)
and Square-Root Reciprocal
Approximation (RSQRxP) Operations Per
Cycle
– Two Multiply Functional Units
•Mixed-Precision IEEE Floating Point
Multiply Supported up to:
– 2 SP x SP -> SP Per Clock
– 2 SP x SP -> DP Every Two Clocks
– 2 SP x DP -> DP Every Three Clocks
– 2 DP x DP -> DP Every Four Clocks
•Fixed-Point Multiply Supports Two 32 x
32-Bit Multiplies, Four 16 x 16-Bit
Multiplies, or Eight 8 x 8-Bit Multiplies per
Clock Cycle, and Complex Multiplies
– Instruction Packing Reduces Code Size
– All Instructions Conditional
– Hardware Support for Modulo Loop Operation
– Protected Mode Operation
– Exceptions Support for Error Detection and Program Redirection
• Software Support
– TI DSP/BIOS™
– Chip Support Library and DSP Library
• 128K-Byte RAM Shared Memory
• 1.8V or 3.3V LVCMOS IOs (except for USB and DDR2 interfaces)
• Two External Memory Interfaces:
– EMIFA
•NOR (8-/16-Bit-Wide Data)
•NAND (8-/16-Bit-Wide Data)
•16-Bit SDRAM With 128 MB Address
Space
– DDR2/Mobile DDR Memory Controller
•16-Bit DDR2 SDRAM With 512 MB
Address Space or
•16-Bit mDDR SDRAM With 256 MB
Address Space
• Three Configurable 16550 type UART Modules:
– With Modem Control Signals
– 16-byte FIFO
– 16x or 13x Oversampling Option
• LCD Controller
• Two Serial Peripheral Interfaces (SPI) Each
With Multiple Chip-Selects
• Two Multimedia Card (MMC)/Secure Digital (SD)
Card Interface with Secure Data I/O (SDIO)
Interfaces
• Two Master/Slave Inter-Integrated Circuit (I2C
Bus™)
• One Host-Port Interface (HPI) With 16-Bit-Wide
Muxed Address/Data Bus For High Bandwidth
• Programmable Real-Time Unit Subsystem
(PRUSS)
– Two Independent Programmable Real-Time
Unit (PRU) Cores
•32-Bit Load/Store RISC architecture
•4K Byte instruction RAM per core
•512 Bytes data RAM per core
•PRU Subsystem (PRUSS) can be disabled
via software to save power
•Register 30 of each PRU is exported from
the subsystem in addition to the normal
R31 output of the PRU cores.
– Standard power management mechanism
•Clock gating
•Entire subsystem under a single PSC
clock gating domain
– Dedicated interrupt controller
• USB 1.1 OHCI (Host) With Integrated PHY (USB1)
• USB 2.0 OTG Port With Integrated PHY (USB0)
– USB 2.0 High-/Full-Speed Client
– USB 2.0 High-/Full-/Low-Speed Host
– End Point 0 (Control)
– End Points 1,2,3,4 (Control, Bulk, Interrupt or
ISOC) Rx and Tx
• One Multichannel Audio Serial Port:
– Two Clock Zones and 16 Serial Data Pins
– Supports TDM, I2S, and Similar Formats
– DIT-Capable
– FIFO buffers for Transmit and Receive
• Two Multichannel Buffered Serial Ports:
– Supports TDM, I2S, and Similar Formats
– AC97 Audio Codec Interface
– Telecom Interfaces (ST-Bus, H100)
– 128-channel TDM
– FIFO buffers for Transmit and Receive
• 10/100 Mb/s Ethernet MAC (EMAC):
– IEEE 802.3 Compliant
– MII Media Independent Interface
– RMII Reduced Media Independent Interface
– Management Data I/O (MDIO) Module
• Video Port Interface (VPIF):
– Two 8-bit SD (BT.656), Single 16-bit or Single
Raw (8-/10-/12-bit) Video Capture Channels
– Two 8-bit SD (BT.656), Single 16-bit Video
Display Channels
• Universal Parallel Port (uPP):
– High-Speed Parallel Interface to FPGAs and
Data Converters
– Data Width on Each of Two Channels is 8- to
16-bit Inclusive
– Single Data Rate or Dual Data Rate Transfers
– Supports Multiple Interfaces with START,
ENABLE and WAIT Controls
• Serial ATA (SATA) Controller:
– Supports SATA I (1.5 Gbps) and SATA II (3.0
Gbps)
– Supports all SATA Power Management
Features
– Hardware-Assisted Native Command
Queueing (NCQ) for up to 32 Entries
– Supports Port Multiplier and
Command-Based Switching
• Real-Time Clock With 32 kHz Oscillator and
Separate Power Rail
• Three 64-Bit General-Purpose Timers (Each
configurable as Two 32-Bit Timers)
• One 64-bit General-Purpose/Watchdog Timer
(Configurable as Two 32-bit General-Purpose
Timers)
The device is a low-power applications processor based on an ARM926EJ-S™ and a C674x DSP core. It
consumes significantly less power than other members of the TMS320C6000™ platform of DSPs.
The device enables OEMs and ODMs to quickly bring to market devices featuring robust operating
system support, rich user interfaces, and high processing performance through the maximum
flexibility of a fully integrated mixed-processor solution.
The dual-core architecture of the device provides benefits of both DSP and Reduced Instruction Set
Computer (RISC) technologies, incorporating a high-performance TMS320C674x DSP core and an
ARM926EJ-S core.
The ARM926EJ-S is a 32-bit RISC processor core that executes 32-bit or 16-bit instructions and
processes 32-bit, 16-bit, or 8-bit data. The core uses pipelining so that all parts of the processor and
memory system can operate continuously.
The ARM core has a coprocessor 15 (CP15), protection module, and data and program Memory
Management Units (MMUs) with translation look-aside buffers. It has separate 16K-byte instruction and
16K-byte data caches. Both are four-way associative with virtual index virtual tag (VIVT). The ARM core
also has an 8KB RAM (Vector Table) and 64KB ROM.
The device DSP core uses a two-level cache-based architecture. The Level 1 program cache (L1P) is a
32KB direct-mapped cache and the Level 1 data cache (L1D) is a 32KB 2-way set-associative cache. The
Level 2 memory/cache (L2) consists of a 256KB memory space that is shared between program and
data space. L2 memory can be configured as mapped memory, cache, or a combination of the two.
Although the DSP L2 is accessible by the ARM and other hosts in the system, an additional 128KB RAM
shared memory is available for use by other hosts without affecting DSP performance.
SPRS586B–JUNE 2009–REVISED AUGUST 2010
The peripheral set includes: a 10/100 Mb/s Ethernet MAC (EMAC) with a Management Data Input/Output
(MDIO) module; one USB2.0 OTG interface; one USB1.1 OHCI interface; two inter-integrated circuit (I2C)
bus interfaces; one multichannel audio serial port (McASP) with 16 serializers and FIFO buffers; two
multichannel buffered serial ports (McBSP) with FIFO buffers; two SPI interfaces with multiple chip
selects; four 64-bit general-purpose timers, each configurable as two 32-bit timers (one configurable as a
watchdog); a configurable 16-bit host port interface (HPI); up to 9 banks of 16 pins of general-purpose
input/output (GPIO) with programmable interrupt/event generation modes, multiplexed with other
peripherals; three UART interfaces (each with RTS and CTS); two enhanced high-resolution pulse width
modulator (eHRPWM) peripherals; three 32-bit enhanced capture (eCAP) module peripherals, which can be
configured as three capture inputs or three auxiliary pulse width modulator (APWM) outputs; and two
external memory interfaces: an asynchronous and SDRAM external memory interface (EMIFA) for slower
memories or peripherals, and a higher-speed DDR2/Mobile DDR controller.
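The GPIO organization described above (up to 9 banks of 16 pins) can be illustrated with a small sketch that maps a linear pin index to a bank and a bit position. The linear numbering scheme and the names gpio_loc_t and gpio_locate are assumptions made for illustration; they are not TI register or driver names.

```c
#include <stdint.h>

/* From the text: up to 9 banks of 16 pins each. */
#define GPIO_BANKS          9u
#define GPIO_PINS_PER_BANK  16u

/* Hypothetical helper type: where a linearly numbered pin lives. */
typedef struct {
    uint8_t  bank;  /* 0..8 */
    uint8_t  pin;   /* 0..15 within the bank */
    uint16_t mask;  /* single-bit mask for that pin */
} gpio_loc_t;

/* Assumes pins are numbered linearly starting at bank 0, pin 0.
 * Returns 0 on success, -1 if the index is out of range. */
static int gpio_locate(unsigned index, gpio_loc_t *out)
{
    if (index >= GPIO_BANKS * GPIO_PINS_PER_BANK) {
        return -1;
    }
    out->bank = (uint8_t)(index / GPIO_PINS_PER_BANK);
    out->pin  = (uint8_t)(index % GPIO_PINS_PER_BANK);
    out->mask = (uint16_t)(1u << out->pin);
    return 0;
}
```

Under that numbering assumption, index 37 falls in bank 2, pin 5, and any index of 144 or above is rejected.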
The Ethernet Media Access Controller (EMAC) provides an efficient interface between the device and a
network. The EMAC supports both 10Base-T and 100Base-TX, or 10 Mbits/second (Mbps) and 100 Mbps
in either half- or full-duplex mode. Additionally, a Management Data Input/Output (MDIO) interface is
available for PHY configuration. The EMAC supports both MII and RMII interfaces.
The SATA controller provides a high-speed interface to mass data storage devices. The SATA controller
supports both SATA I (1.5 Gbps) and SATA II (3.0 Gbps).
The Universal Parallel Port (uPP) provides a high-speed interface to many types of data converters,
FPGAs, or other parallel devices. The uPP supports programmable data widths from 8 to 16 bits on
each of two channels. Single-data-rate and double-data-rate transfers are supported, as well as START,
ENABLE, and WAIT signals to provide control for a variety of data converters.
A Video Port Interface (VPIF) is included, providing a flexible video input/output port.
The rich peripheral set provides the ability to control external peripheral devices and communicate with
external processors. For details on each of the peripherals, see the related sections later in this document
and the associated peripheral reference guides.
The device has a complete set of development tools for the ARM and DSP. These include C compilers, a
DSP assembly optimizer to simplify programming and scheduling, and a Windows™ debugger interface
for visibility into source code execution.
NOTE: This is a placeholder for the Revision History Table for future revisions of the document.
This data manual revision history highlights the changes made to the SPRS586A device-specific data
manual to make it an SPRS586B revision.
Table 2-1. Revision History
ADDITIONS/MODIFICATIONS/DELETIONS
Global - Added MPU Content
Global - Replaced all "CLKIN" references with "OSCIN"
Global - Updated td(SCSL_SPC)S min from P to 2P
Global - Made changes in the document to reflect the following detail.
"The DSP L2 ROM is used for boot purposes and cannot be programmed with application code".
Global - Updated the pin map graphic to fix typos.
Global -
•All instances of EMU[0] updated to EMU0
•All instances of EMU[1] updated to EMU1
•All instances of UART1_RTS updated to have an overbar
•All instances of UART2_RTS updated to have an overbar
•All instances of SPI1_SCS[0] updated to have an overbar
•All instances of EMA_CS[4] updated to have an overbar
•All instances of SPI1_ENA updated to have an overbar
•All instances of SATA_TXN updated to have an overbar
•All instances of LCD_AC_ENB_CS updated to have an overbar
•All instances of DDR_CS updated to have an overbar
•All instances of UHPI_HRDY updated to have an overbar
•All instances of UHPI_HDS1 updated to have an overbar
•All instances of UHPI_HCS updated to have an overbar
Added Table 3-3 C674x L1/L2 Memory Protection Registers
Added Section 3.10 Unused Pin Configurations
Added Section 6.6.3 - Dynamic Voltage and Frequency Scaling (DVFS)
Added Section 4.3 Pullup/Pulldown Resistors
Added Section 6.14.3 - SATA Unused Signal Configuration
Added sections - Section 6.14.2 - SATA Interface, Section 6.14.2.1 - SATA Interface Schematic, Section 6.14.2.2 - Compatible SATA
The following documents are available on the Internet at www.ti.com. Tip: Enter the literature number in
the search box provided at www.ti.com.
DSP Reference Guides
SPRUG82 TMS320C674x DSP Cache User's Guide. Explains the fundamentals of memory caches
and describes how the two-level cache-based internal memory architecture in the
TMS320C674x digital signal processor (DSP) can be efficiently used in DSP applications.
Shows how to maintain coherence with external memory, how to use DMA to reduce
memory latencies, and how to optimize your code to improve cache efficiency. The internal
memory architecture in the C674x DSP is organized in a two-level hierarchy consisting of a
dedicated program cache (L1P) and a dedicated data cache (L1D) on the first level.
Accesses by the CPU to these first level caches can complete without CPU pipeline
stalls. If the data requested by the CPU is not contained in cache, it is fetched from the next
lower memory level, L2 or external memory.
SPRUFE8 TMS320C674x DSP CPU and Instruction Set Reference Guide. Describes the CPU
architecture, pipeline, instruction set, and interrupts for the TMS320C674x digital signal
processors (DSPs). The C674x DSP is an enhancement of the C64x+ and C67x+ DSPs with
added functionality and an expanded instruction set.
www.ti.com
SPRUFK5 TMS320C674x DSP Megamodule Reference Guide. Describes the TMS320C674x digital
signal processor (DSP) megamodule. Included is a discussion on the internal direct memory
access (IDMA) controller, the interrupt controller, the power-down controller, memory
protection, bandwidth management, and the memory and cache.
Table 3-1. Characteristics of OMAP-L138 (continued)
HARDWARE FEATURES | OMAP-L138
Peripherals (Not all peripheral pins are available at the same time; for more detail, see the Device Configurations section.)
   | DDR2, 16-bit bus width, up to 150 MHz; Mobile DDR, 16-bit bus width, up to 133 MHz
  EMIFA | Asynchronous (8/16-bit bus width) RAM, Flash, 16-bit SDRAM, NOR, NAND
  Flash Card Interface | MMC and SD cards supported.
  EDMA3 | 64 independent channels, 16 QDMA channels, 2 channel controllers, 3 transfer controllers
  Timers | 4 64-Bit General Purpose (each configurable as 2 separate 32-bit timers, one configurable as Watch Dog)
  UART | 3 (each with RTS and CTS flow control)
  SPI | 2 (Each with one hardware chip select)
  I2C | 2 (both Master/Slave)
  Multichannel Audio Serial Port [McASP] | 1 (each with transmit/receive, FIFO buffer, 16 serializers)
  Multichannel Buffered Serial Port [McBSP] | 2 (each with transmit/receive, FIFO buffer, 16)
  10/100 Ethernet MAC with Management Data I/O | 1 (MII or RMII Interface)
   | 4 Single Edge, 4 Dual Edge Symmetric, or 2 Dual Edge Asymmetric Outputs
  USB 2.0 (USB0) | High-Speed OTG Controller with on-chip OTG PHY
  USB 1.1 (USB1) | Full-Speed OHCI (as host) with on-chip PHY
  General-Purpose Input/Output Port | 9 banks of 16-bit
  LCD Controller | 1
  SATA Controller | 1 (Supports both SATA I and SATA II)
  Universal Parallel Port (uPP) | 1
  Video Port Interface (VPIF) | 1 (video in and video out)
  PRU Subsystem (PRUSS) | 2 Programmable PRU Cores
On-Chip Memory
  Size (Bytes) | 488KB RAM
  Organization | DSP: 32KB L1 Program (L1P)/Cache (up to 32KB); 32KB L1 Data (L1D)/Cache (up to 32KB); 256KB Unified Mapped RAM/Cache (L2). DSP Memories can be made accessible to ARM, EDMA3, and other peripherals.
   | ARM: 16KB I-Cache; 16KB D-Cache; 8KB RAM (Vector Table); 64KB ROM
   | ADDITIONAL SHARED MEMORY: 128KB RAM
C674x CPU ID + CPU Rev ID | Control Status Register (CSR.[31:16]) | 0x1400
C674x Megamodule Revision | Revision ID Register (MM_REVID[15:0]) | 0x0000
JTAG BSDL_ID | DEVIDR0 Register | 0x0B7D_102F
CPU Frequency | MHz | 674x DSP 375 MHz (1.2V) or 456 MHz (1.3V); ARM926 375 MHz (1.2V) or 456 MHz (1.3V)
Voltage | Core (V) | 1.2 V nominal for 375 MHz version; 1.3 V nominal for 456 MHz version
   | I/O (V) | 1.8V or 3.3 V
Packages | 13 mm x 13 mm, 361-Ball 0.65 mm pitch, PBGA (ZCE); 16 mm x 16 mm, 361-Ball 0.80 mm pitch, PBGA (ZWT)
Product Status (1) | Product Preview (PP), Advance Information (AI), or Production Data (PD) | 375 MHz versions - PD; 456 MHz versions - AI
(1) ADVANCE INFORMATION concerns new products in the sampling or preproduction phase of development. Characteristic data and
other specifications are subject to change without notice. PRODUCTION DATA information is current as of publication date. Products
conform to specifications per the terms of the Texas Instruments standard warranty. Production processing does not necessarily include
testing of all parameters.
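The identification values in the table above can be unpacked into fields. The sketch below assumes the standard IEEE 1149.1 IDCODE layout for the JTAG ID (version in bits 31:28, part number in bits 27:12, manufacturer identity in bits 11:1) and the usual C6000 CSR layout (CPU ID in bits 31:24, revision ID in bits 23:16). The function and type names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical container for the IEEE 1149.1 IDCODE fields. */
typedef struct {
    uint8_t  version;       /* bits 31:28 */
    uint16_t part;          /* bits 27:12 */
    uint16_t manufacturer;  /* bits 11:1  */
} jtag_id_t;

/* Splits a 32-bit IDCODE value such as DEVIDR0 into its fields. */
static jtag_id_t jtag_decode(uint32_t idcode)
{
    jtag_id_t id;
    id.version      = (uint8_t)(idcode >> 28);
    id.part         = (uint16_t)((idcode >> 12) & 0xFFFFu);
    id.manufacturer = (uint16_t)((idcode >> 1) & 0x7FFu);
    return id;
}

/* Assumed CSR layout: CPU ID in bits 31:24, revision ID in bits 23:16. */
static uint8_t csr_cpu_id(uint32_t csr) { return (uint8_t)(csr >> 24); }
static uint8_t csr_rev_id(uint32_t csr) { return (uint8_t)((csr >> 16) & 0xFFu); }
```

With the table values, 0x0B7D_102F splits into version 0x0, part number 0xB7D1, and manufacturer identity 0x017, and CSR[31:16] = 0x1400 splits into CPU ID 0x14 and revision ID 0x00.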
3.3 Device Compatibility
The ARM926EJ-S RISC CPU is compatible with other ARM9 CPUs from ARM Holdings plc.
The C674x DSP core is code-compatible with the C6000™ DSP platform and supports features of both
the C64x+ and C67x+ DSP families.
3.4 ARM Subsystem
The ARM Subsystem includes the following features:
•ARM926EJ-S RISC processor
•ARMv5TEJ (32/16-bit) instruction set
•Little endian
•System Control Co-Processor 15 (CP15)
•MMU
•16KB Instruction cache
•16KB Data cache
•Write Buffer
•Embedded Trace Module and Embedded Trace Buffer (ETM/ETB)
•ARM Interrupt controller
3.4.1 ARM926EJ-S RISC CPU
The ARM Subsystem integrates the ARM926EJ-S processor. The ARM926EJ-S processor is a member of
the ARM9 family of general-purpose microprocessors. This processor is targeted at multi-tasking applications
where full memory management, high performance, low die size, and low power are all important. The
ARM926EJ-S processor supports the 32-bit ARM and 16-bit Thumb instruction sets, enabling the user to
trade off between high performance and high code density. Specifically, the ARM926EJ-S processor
supports the ARMv5TEJ instruction set, which includes features for efficient execution of Java byte codes,
providing Java performance similar to that of a Just-in-Time (JIT) Java interpreter, but without the
associated code overhead.
The ARM926EJ-S processor supports the ARM debug architecture and includes logic to assist in both
hardware and software debug. The ARM926EJ-S processor has a Harvard architecture and provides a
complete high performance subsystem, including:
•ARM926EJ-S integer core
•CP15 system control coprocessor
•Memory Management Unit (MMU)
•Separate instruction and data caches
•Write buffer
•Separate instruction and data (internal RAM) interfaces
•Separate instruction and data AHB bus interfaces
•Embedded Trace Module and Embedded Trace Buffer (ETM/ETB)
For more complete details on the ARM9, refer to the ARM926EJ-S Technical Reference Manual, available
at http://www.arm.com
3.4.2 CP15
The ARM926EJ-S system control coprocessor (CP15) is used to configure and control instruction and
data caches, the Memory Management Unit (MMU), and other ARM subsystem functions. The CP15 registers
are programmed using the MRC and MCR ARM instructions when the ARM is in a privileged mode such as
supervisor or system mode.
3.4.3 MMU
A single set of two-level page tables stored in main memory is used to control the address translation,
permission checks and memory region attributes for both data and instruction accesses. The MMU uses a
single unified Translation Lookaside Buffer (TLB) to cache the information held in the page tables. The
MMU features are:
•Standard ARM architecture v4 and v5 MMU mapping sizes, domains and access protection scheme.
•Access permissions for large pages and small pages can be specified separately for each quarter of
the page (subpage permissions)
•Hardware page table walks
•Invalidate entire TLB, using CP15 register 8
•Invalidate TLB entry, selected by MVA, using CP15 register 8
•Lockdown of TLB entries, using CP15 register 10
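The subpage permission scheme listed above (separate permissions for each quarter of a page) can be sketched as an index calculation. The page sizes used here, 4KB for small pages and 64KB for large pages, are the standard ARMv5 values and are not stated in this section; the function name is hypothetical.

```c
#include <stdint.h>

/* Standard ARMv5 page sizes (assumed; not given in the text above). */
#define SMALL_PAGE_BYTES  4096u
#define LARGE_PAGE_BYTES  65536u

/* Returns which quarter of the page (0..3) an address falls in,
 * i.e. which of the four subpage permission settings applies.
 * page_bytes must be a power of two. */
static unsigned subpage_index(uint32_t mva, uint32_t page_bytes)
{
    uint32_t offset = mva & (page_bytes - 1u);       /* offset inside the page */
    return (unsigned)(offset / (page_bytes / 4u));   /* quarter of the page */
}
```

For example, an address with offset 0xC00 inside a small page lands in subpage 3 (1KB quarters), while offset 0x8000 inside a large page lands in subpage 2 (16KB quarters).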
3.4.4 Caches and Write Buffer
The instruction cache and the data cache are each 16KB. Additionally, the caches have the following
features:
•Virtual index, virtual tag, and addressed using the Modified Virtual Address (MVA)
•Four-way set associative, with a cache line length of eight words per line (32-bytes per line) and with
two dirty bits in the Dcache
•Dcache supports write-through and write-back (or copy back) cache operation, selected by memory
region using the C and B bits in the MMU translation tables
•Critical-word first cache refilling
•Cache lockdown registers enable control over which cache ways are used for allocation on a line fill,
providing a mechanism for both lockdown, and controlling cache corruption
•Dcache stores the Physical Address TAG (PA TAG) corresponding to each Dcache entry in the TAG
RAM for use during the cache line write-backs, in addition to the Virtual Address TAG stored in the
TAG RAM. This means that the MMU is not involved in Dcache write-back operations, removing the
possibility of TLB misses related to the write-back address.
•Cache maintenance operations provide efficient invalidation of the entire Dcache or Icache, regions of
the Dcache or Icache, and regions of virtual memory.
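The cache geometry listed above (16KB, four-way set associative, 32-byte lines, indexed by the Modified Virtual Address) implies 128 sets per cache. The sketch below derives that figure and splits an MVA into byte offset, set index, and tag. The macro and function names are illustrative only, and the exact bit positions follow from the arithmetic rather than from a statement in this section.

```c
#include <stdint.h>

/* Geometry taken from the feature list above. */
#define CACHE_BYTES  16384u
#define CACHE_WAYS   4u
#define LINE_BYTES   32u
#define CACHE_SETS   (CACHE_BYTES / (CACHE_WAYS * LINE_BYTES))  /* 128 sets */

/* Byte offset within the 32-byte line (MVA bits 4:0). */
static uint32_t cache_offset(uint32_t mva) { return mva % LINE_BYTES; }

/* Set index (MVA bits 11:5 for 128 sets). */
static uint32_t cache_set(uint32_t mva) { return (mva / LINE_BYTES) % CACHE_SETS; }

/* Remaining upper bits, compared against the stored virtual tag. */
static uint32_t cache_tag(uint32_t mva) { return mva / (LINE_BYTES * CACHE_SETS); }
```

For the address 0x00012345 this gives offset 5, set 26, and tag 0x12.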
The write buffer is used for all writes to a noncacheable bufferable region, to a write-through region, and
for write misses to a write-back region. A separate buffer is incorporated in the Dcache for holding
write-back data for cache line evictions or cleaning of dirty cache lines. The main write buffer has a
16-word data buffer and a four-address buffer. The Dcache write-back buffer has eight data word entries
and a single address entry.
The ARM Subsystem uses the AHB port of the ARM926EJ-S to connect the ARM to the Config bus and
the external memories. Arbiters are employed to arbitrate access to the separate D-AHB and I-AHB by the
Config Bus and the external memories bus.
3.4.6 Embedded Trace Macrocell (ETM) and Embedded Trace Buffer (ETB)
To support real-time trace, the ARM926EJ-S processor provides an interface to enable connection of an
Embedded Trace Macrocell (ETM). The ARM926EJ-S Subsystem in the device also includes the
Embedded Trace Buffer (ETB). The ETM consists of two parts:
•Trace Port provides real-time trace capability for the ARM9.
•Triggering facilities provide trigger resources, which include address and data comparators, counter,
and sequencers.
The device trace port is not pinned out and is instead only connected to the Embedded Trace Buffer. The
ETB has a 4KB buffer memory. ETB-enabled debug tools are required to read and interpret the captured
trace data.
3.4.7 ARM Memory Mapping
By default, the ARM has access to most on-chip and off-chip memory areas, including the DSP internal
memories, EMIFA, DDR2, and the additional 128K-byte on-chip shared SRAM. Likewise, almost all of the
on-chip peripherals are accessible to the ARM by default.
See Table 3-4 for a detailed top level device memory map that includes the ARM memory space.
The C674x Central Processing Unit (CPU) consists of eight functional units, two register files, and two
data paths as shown in Figure 3-2. The two general-purpose register files (A and B) each contain
32 32-bit registers for a total of 64 registers. The general-purpose registers can be used for data or can be
data address pointers. The data types supported include packed 8-bit data, packed 16-bit data, 32-bit
data, 40-bit data, and 64-bit data. Values larger than 32 bits, such as 40-bit-long or 64-bit-long values are
stored in register pairs, with the 32 LSBs of data placed in an even register and the remaining 8 or
32 MSBs in the next upper register (which is always an odd-numbered register).
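The register-pair convention described above can be modeled in a few lines: the 32 LSBs go to the even register and the remaining 8 or 32 MSBs go to the next (odd) register. The type and function names below are hypothetical and exist only to illustrate the split.

```c
#include <stdint.h>

/* Hypothetical model of an even/odd register pair. */
typedef struct {
    uint32_t even;  /* holds the 32 LSBs */
    uint32_t odd;   /* holds the remaining 8 or 32 MSBs */
} reg_pair_t;

/* Splits a 64-bit value across the pair. */
static reg_pair_t pair_from_64(uint64_t v)
{
    reg_pair_t p;
    p.even = (uint32_t)(v & 0xFFFFFFFFu);
    p.odd  = (uint32_t)(v >> 32);
    return p;
}

/* Splits a 40-bit value: only 8 bits land in the odd register. */
static reg_pair_t pair_from_40(uint64_t v)
{
    reg_pair_t p;
    p.even = (uint32_t)(v & 0xFFFFFFFFu);
    p.odd  = (uint32_t)((v >> 32) & 0xFFu);
    return p;
}

/* Reassembles the 64-bit value from the pair. */
static uint64_t pair_to_64(reg_pair_t p)
{
    return ((uint64_t)p.odd << 32) | p.even;
}
```

For instance, 0x1122334455667788 is stored as 0x55667788 in the even register and 0x11223344 in the odd register.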
The eight functional units (.M1, .L1, .D1, .S1, .M2, .L2, .D2, and .S2) are each capable of executing one
instruction every clock cycle. The .M functional units perform all multiply operations. The .S and .L units
perform a general set of arithmetic, logical, and branch functions. The .D units primarily load data from
memory to the register file and store results from the register file into memory.
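As a rough worked example of the paragraph above: with eight functional units each able to execute one instruction per clock cycle, the peak instruction rate is eight times the CPU clock. The helper name is hypothetical; the 375 MHz and 456 MHz clock rates are the speed grades listed in the device characteristics table.

```c
#include <stdint.h>

/* Eight functional units: .M1 .L1 .D1 .S1 .M2 .L2 .D2 .S2 */
#define C674X_FUNCTIONAL_UNITS  8u

/* Peak instruction rate in MIPS for a given CPU clock in MHz,
 * assuming every unit issues one instruction on every cycle. */
static uint32_t peak_mips(uint32_t cpu_mhz)
{
    return C674X_FUNCTIONAL_UNITS * cpu_mhz;
}
```

At 456 MHz this gives 3648 MIPS, matching the peak figure quoted in the feature list; at 375 MHz it gives 3000 MIPS. Sustained rates depend on how well the code fills all eight units.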
The C674x CPU combines the performance of the C64x+ core with the floating-point capabilities of the
C67x+ core.
Each C674x .M unit can perform one of the following each clock cycle: one 32 x 32 bit multiply, one 16 x
32 bit multiply, two 16 x 16 bit multiplies, two 16 x 32 bit multiplies, two 16 x 16 bit multiplies with
add/subtract capabilities, four 8 x 8 bit multiplies, four 8 x 8 bit multiplies with add operations, and four
16 x 16 multiplies with add/subtract capabilities (including a complex multiply). There is also support for
Galois field multiplication for 8-bit and 32-bit data. Many communications algorithms such as FFTs and
modems require complex multiplication. The complex multiply (CMPY) instruction takes four 16-bit inputs
and produces a 32-bit real and a 32-bit imaginary output. There are also complex multiplies with rounding
capability that produce one 32-bit packed output that contains 16-bit real and 16-bit imaginary values. The
32 x 32 bit multiply instructions provide the extended precision necessary for high-precision algorithms on
a variety of signed and unsigned 32-bit data types.
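The arithmetic behind a complex multiply with four 16-bit inputs and 32-bit real and imaginary outputs can be sketched in plain C. This is a model of the mathematics only, not the CMPY instruction or a compiler intrinsic: the names are hypothetical, and clamping the result to the 32-bit range is a choice made here for the one overflow corner case, not documented hardware behavior.

```c
#include <stdint.h>

/* Hypothetical result type: 32-bit real and imaginary parts. */
typedef struct {
    int32_t re;
    int32_t im;
} cplx32_t;

/* Clamp a 64-bit intermediate to the int32_t range (modeling choice). */
static int32_t clamp32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* (a_re + j*a_im) * (b_re + j*b_im) with 16-bit signed inputs. */
static cplx32_t complex_mul16(int16_t a_re, int16_t a_im,
                              int16_t b_re, int16_t b_im)
{
    cplx32_t r;
    int64_t re = (int64_t)a_re * b_re - (int64_t)a_im * b_im;
    int64_t im = (int64_t)a_re * b_im + (int64_t)a_im * b_re;
    r.re = clamp32(re);
    r.im = clamp32(im);
    return r;
}
```

For example, (1 + 2j) multiplied by (3 + 4j) yields -5 + 10j.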
The .L unit (arithmetic logic unit) now incorporates the ability to do parallel add/subtract operations on a
pair of common inputs. Versions of this instruction exist to work on 32-bit data or on pairs of 16-bit data,
performing dual 16-bit adds and subtracts in parallel. There are also saturated forms of these instructions.
The C674x core enhances the .S unit in several ways. On the previous cores, dual 16-bit MIN2 and MAX2
comparisons were only available on the .L units. On the C674x core they are also available on the .S unit
which increases the performance of algorithms that do searching and sorting. Finally, to increase data
packing and unpacking throughput, the .S unit allows sustained high performance for the quad 8-bit/16-bit
and dual 16-bit instructions. Unpack instructions prepare 8-bit data for parallel 16-bit operations. Pack
instructions return parallel results to output precision including saturation support.
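The dual 16-bit MIN2 and MAX2 comparisons mentioned above operate on two signed 16-bit values packed into each 32-bit operand. The sketch below models that lane-wise behavior in portable C; it is an illustration of the operation, not the instruction encoding or a TI intrinsic, and the function names are hypothetical.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of x without relying on
 * implementation-defined narrowing conversions. */
static int32_t sext16(uint32_t x)
{
    x &= 0xFFFFu;
    return (x & 0x8000u) ? (int32_t)x - 0x10000 : (int32_t)x;
}

/* Pack two signed 16-bit lane results back into one 32-bit word. */
static uint32_t pack2(int32_t hi, int32_t lo)
{
    return (((uint32_t)hi & 0xFFFFu) << 16) | ((uint32_t)lo & 0xFFFFu);
}

/* Lane-wise signed maximum of two packed 16-bit pairs. */
static uint32_t max2_model(uint32_t a, uint32_t b)
{
    int32_t ah = sext16(a >> 16), al = sext16(a);
    int32_t bh = sext16(b >> 16), bl = sext16(b);
    return pack2(ah > bh ? ah : bh, al > bl ? al : bl);
}

/* Lane-wise signed minimum of two packed 16-bit pairs. */
static uint32_t min2_model(uint32_t a, uint32_t b)
{
    int32_t ah = sext16(a >> 16), al = sext16(a);
    int32_t bh = sext16(b >> 16), bl = sext16(b);
    return pack2(ah < bh ? ah : bh, al < bl ? al : bl);
}
```

With a = 0x0005FFFE (lanes 5 and -2) and b = 0xFFFF0003 (lanes -1 and 3), the maximum is 0x00050003 and the minimum is 0xFFFFFFFE. A search or sort loop can therefore compare two 16-bit elements per operation.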
Other new features include:
•SPLOOP - A small instruction buffer in the CPU that aids in creation of software pipelining loops where
multiple iterations of a loop are executed in parallel. The SPLOOP buffer reduces the code size
associated with software pipelining. Furthermore, loops in the SPLOOP buffer are fully interruptible.
•Compact Instructions - The native instruction size for the C6000 devices is 32 bits. Many common
instructions such as MPY, AND, OR, ADD, and SUB can be expressed as 16 bits if the C674x
compiler can restrict the code to use certain registers in the register file. This compression is
performed by the code generation tools.
•Instruction Set Enhancement - As noted above, there are new instructions such as 32-bit
multiplications, complex multiplications, packing, sorting, bit manipulation, and 32-bit Galois field
multiplication.
•Exception Handling - Intended to aid the programmer in isolating bugs. The C674x CPU is able to
detect and respond to exceptions, both from internally detected sources (such as illegal op-codes) and
from system events (such as a watchdog timer expiration).
•Privilege - Defines user and supervisor modes of operation, allowing the operating system to give a
basic level of protection to sensitive resources. Local memory is divided into multiple pages, each with
read, write, and execute permissions.
A. On .M unit, dst2 is 32 MSB.
B. On .M unit, dst1 is 32 LSB.
C. On C64x CPU .M unit, src2 is 32 bits; on C64x+ CPU .M unit, src2 is 64 bits.
D. On .L and .S units, odd dst connects to odd register files and even dst connects to even register files.
The DSP memory map is shown in Section 3.6.
By default, the DSP also has access to most on-chip and off-chip memory areas, with the exception of the
ARM RAM, ROM, and AINTC interrupt controller.
Additionally, the DSP megamodule includes the capability to limit access to its internal memories through
its SDMA port, without needing an external MPU.
3.5.2.1 ARM Internal Memories
The DSP does not have access to the ARM internal memory.
3.5.2.2 External Memories
The DSP has access to the following external memories:
•Asynchronous EMIF / SDRAM / NAND / NOR Flash (EMIFA)
•SDRAM (DDR2)
3.5.2.3 DSP Internal Memories
The DSP has access to the following DSP memories:
•L2 RAM
•L1P RAM
•L1D RAM
3.5.2.4 C674x CPU
The C674x core uses a two-level cache-based architecture. The Level 1 Program cache (L1P) is a 32 KB
direct-mapped cache and the Level 1 Data cache (L1D) is a 32 KB 2-way set-associative cache. The Level 2
memory/cache (L2) consists of a 256 KB memory space that is shared between program and data space.
L2 memory can be configured as mapped memory, cache, or a combination of both.
Table 3-2 shows a memory map of the C674x CPU cache registers for the device.
Pin multiplexing is used extensively to accommodate the largest number of peripheral functions in
the smallest possible package. Pin multiplexing is controlled using a combination of hardware
configuration at device reset and software-programmable register settings.
3.7.1 Pin Map (Bottom View)
The following graphics show the bottom view of the ZCE and ZWT package pin assignments in four
quadrants (A, B, C, and D). The pin assignments for both packages are identical.