Texas Instruments (TI) reserves the right to make changes to its products or to discontinue any
semiconductor product or service without notice, and advises its customers to obtain the latest
version of relevant information to verify, before placing orders, that the information being relied
on is current.
TI warrants performance of its semiconductor products and related software to the specifications
applicable at the time of sale in accordance with TI’s standard warranty. Testing and other quality
control techniques are utilized to the extent TI deems necessary to support this warranty.
Specific testing of all parameters of each device is not necessarily performed, except those
mandated by government requirements.
Certain applications using semiconductor products may involve potential risks of death,
personal injury, or severe property or environmental damage (“Critical Applications”).
TI SEMICONDUCTOR PRODUCTS ARE NOT DESIGNED, INTENDED, AUTHORIZED, OR
WARRANTED TO BE SUITABLE FOR USE IN LIFE-SUPPORT APPLICATIONS, DEVICES
OR SYSTEMS OR OTHER CRITICAL APPLICATIONS.
Inclusion of TI products in such applications is understood to be fully at the risk of the customer.
Use of TI products in such applications requires the written approval of an appropriate TI officer.
Questions concerning potential risk applications should be directed to TI through a local SC
sales office.
In order to minimize risks associated with the customer’s applications, adequate design and
operating safeguards should be provided by the customer to minimize inherent or procedural
hazards.
TI assumes no liability for applications assistance, customer product design, software
performance, or infringement of patents or services described herein. Nor does TI warrant or
represent that any license, either express or implied, is granted under any patent right, copyright,
mask work right, or other intellectual property right of TI covering or relating to any combination,
machine, or process in which such semiconductor products or services might be or are used.
Copyright 1999, Texas Instruments Incorporated
Preface

Read This First

About This Manual
This reference guide describes the on-chip peripherals of the TMS320C6000
digital signal processors (DSPs). Main topics are the program memory, the data
memory, the direct memory access (DMA) controller, the enhanced DMA controller (EDMA),
the host-port interface (HPI), the expansion bus, the external memory
interface (EMIF), the boot configuration, the multichannel buffered serial ports
(McBSPs), the timers, the interrupt selector and external interrupts, and the power-down modes.
The TMS320C62x (’C62x) and the TMS320C67x (’C67x) generations of digital
signal processors make up the TMS320C6000 platform of the TMS320
family of digital signal processors. The ’C62x devices are fixed-point DSPs,
and the ’C67x devices are floating-point DSPs. The TMS320C6000 (’C6000)
is the first DSP to use the VelociTI architecture, a high-performance,
advanced VLIW (very long instruction word) architecture. The VelociTI
architecture makes the ’C6x an excellent choice for multichannel, multifunction, and
high data rate applications.
Notational Conventions
This document uses the following conventions:
- Program listings, program examples, and names are shown in a special
  font. Here is a sample program listing:

      LDW   .D1   *A0,A1
      ADD   .L1   A1,A2,A3
      NOP   3
      MPY   .M1   A1,A4,A5

- Throughout this book, MSB means most significant bit, and LSB means
  least significant bit.
Registers are described throughout this book in register diagrams. Each
diagram shows a rectangle divided into fields that represent the fields of the
register. Each field is labeled with its name inside, its beginning and ending bit
numbers above, and its properties below. A legend explains the notation used
for the properties. For example:
[Example register diagram: fields such as FIELDA, FIELDB, and FIELDC, with bit
numbers in the range 31–16 above and properties such as R, +1; RW, +0; RC, +x;
R, +0; and HRW, +0 below]

Note: R = Readable by the CPU, W = Writeable by the CPU, +x = Value undefined after reset, +0 = Value is 0 after reset,
+1 = Value is 1 after reset, C = Clearable by the CPU, H = Reads/writes performed by the host
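The property legend lends itself to a mechanical reading of register values. The sketch below shows how a field described by such a diagram can be extracted with a shift and mask; the field names and bit positions used here are hypothetical, chosen only to illustrate the notation, and are not taken from any actual ’C6000 register:

```python
def get_field(reg_value, msb, lsb):
    """Extract bits msb..lsb (inclusive) from a 32-bit register value."""
    width = msb - lsb + 1
    mask = (1 << width) - 1
    return (reg_value >> lsb) & mask

# Hypothetical register layout, for illustration only:
# FIELDA in bits 31-25, FIELDB in bits 24-17.
reg = 0x80FE0000
fielda = get_field(reg, 31, 25)   # -> 0x40 (bit 31 set within the field)
fieldb = get_field(reg, 24, 17)   # -> 0x7F
```

The "+0"/"+1" annotations in a diagram give the value `get_field` would return immediately after reset.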
Related Documentation From Texas Instruments
The following documents describe the TMS320C6x family and related support
tools. To obtain a copy of any of these TI documents, call the Texas Instruments Literature Response Center at (800) 477–8924. When ordering, please
identify the book by its title and literature number.
TMS320C6000 Technical Brief (literature number SPRU197) gives an
introduction to the ’C6000 platform of digital signal processors, development tools, and third-party support.

TMS320C6000 CPU and Instruction Set Reference Guide (literature
number SPRU189) describes the ’C6000 CPU architecture, instruction
set, pipeline, and interrupts for these digital signal processors.

TMS320C6000 Programmer’s Guide (literature number SPRU198)
describes ways to optimize C and assembly code for the TMS320C6000
DSPs and includes application program examples.

TMS320C6000 Assembly Language Tools User’s Guide (literature number
SPRU186) describes the assembly language tools (assembler, linker,
and other tools used to develop assembly language code), assembler
directives, macros, common object file format, and symbolic debugging
directives for the ’C6000 generation of devices.

TMS320C6000 Optimizing C Compiler User’s Guide (literature number
SPRU187) describes the ’C6000 C compiler and the assembly optimizer.
This C compiler accepts ANSI standard C source code and produces assembly
language source code for the ’C6000 generation of devices. The
assembly optimizer helps you optimize your assembly code.

TMS320C6x C Source Debugger User’s Guide (literature number
SPRU188) tells you how to invoke the ’C6x simulator and emulator
versions of the C source debugger interface. This book discusses
various aspects of the debugger, including command entry, code
execution, data management, breakpoints, profiling, and analysis.
TMS320C6201, TMS320C6201B Digital Signal Processors Data Sheet
(literature number SPRS051) describes the features of the
TMS320C6201 and TMS320C6201B fixed-point DSPs and provides
pinouts, electrical specifications, and timings for the devices.

TMS320C6202 Digital Signal Processor Data Sheet (literature number
SPRS072) describes the features of the TMS320C6202 fixed-point DSP
and provides pinouts, electrical specifications, and timings for the device.

TMS320C6701 Digital Signal Processor Data Sheet (literature number
SPRS067) describes the features of the TMS320C6701 floating-point
DSP and provides pinouts, electrical specifications, and timings for the
device.

TMS320C6211 Digital Signal Processor Data Sheet (literature number
SPRS073) describes the features of the TMS320C6211 fixed-point DSP
and provides pinouts, electrical specifications, and timings for the device.

TMS320C6711 Digital Signal Processor Data Sheet (literature number
SPRS088) describes the features of the TMS320C6711 floating-point DSP
and provides pinouts, electrical specifications, and timings for the device.

Trademarks

320 Hotline On-line, VelociTI, and XDS510 are trademarks of Texas
Instruments Incorporated.

PC is a trademark of International Business Machines Corporation.

Solaris and SunOS are trademarks of Sun Microsystems, Inc.

SPI is a trademark of Motorola, Inc.

ST-BUS is a trademark of Mitel.

Windows and Windows NT are registered trademarks of Microsoft Corporation.
If You Need Assistance . . .
- World-Wide Web Sites
  TI Online: http://www.ti.com
  Semiconductor Product Information Center (PIC): http://www.ti.com/sc/docs/pic/home.htm
  DSP Solutions: http://www.ti.com/dsps
  320 Hotline On-line: http://www.ti.com/sc/docs/dsps/support.htm
- North America, South America, Central America
  Product Information Center (PIC): (972) 644-5580
  TI Literature Response Center U.S.A.: (800) 477-8924
  Software Registration/Upgrades: (214) 638-0333, Fax: (214) 638-7742
  U.S.A. Factory Repair/Hardware Upgrades: (281) 274-2285
  U.S. Technical Training Organization: (972) 644-5580
  DSP Hotline: Email: dsph@ti.com
  DSP Internet BBS via anonymous ftp to ftp://ftp.ti.com/pub/tms320bbs
- Europe, Middle East, Africa
  European Product Information Center (EPIC) Hotlines:
- Asia-Pacific
  Literature Response Center: +852 2 956 7288, Fax: +852 2 956 2200
  Hong Kong DSP Hotline: +852 2 956 7268, Fax: +852 2 956 1002
  Korea DSP Hotline: +82 2 551 2804, Fax: +82 2 551 2828
  Korea DSP Modem BBS: +82 2 551 2914
  Singapore DSP Hotline: Fax: +65 390 7179
  Taiwan DSP Hotline: +886 2 377 1450, Fax: +886 2 377 2718
  Taiwan DSP Modem BBS: +886 2 376 2592
  Taiwan DSP Internet BBS via anonymous ftp to ftp://dsp.ee.tit.edu.tw/pub/TI/
- Japan
  Product Information Center: 0120-81-0026 (in Japan), Fax: 0120-81-0036 (in Japan)
  DSP Hotline: 03-3769-8735 or (INTL) 813-3769-8735, Fax: 03-3457-7071 or (INTL) 813-3457-7071
  DSP BBS via Nifty-Serve: Type “Go TIASP”
- Documentation
When making suggestions or reporting errors in documentation, please include the following information that is on the title
page: the full title of the book, the publication date, and the literature number.
The TMS320C6000 (’C6000) platform of devices consists of the first off-the-shelf
digital signal processors (DSPs) to use advanced very long instruction
word (VLIW) architecture to achieve high performance through increased instruction-level
parallelism. The VelociTI advanced very long instruction word (VLIW) architecture
uses multiple execution units operating in parallel to execute multiple
instructions during a single clock cycle. Parallelism is the key to extremely high
performance, taking these DSPs well beyond the performance capabilities of
traditional designs.
This chapter introduces the TMS320 family of DSPs and the ’C6000 platform
of this family, and it describes the features, memory, and peripherals of the
’C6000 devices.
1.1 TMS320 Family Overview

The TMS320 family consists of fixed-point, floating-point, and multiprocessor
digital signal processors (DSPs). TMS320 DSPs are specifically designed for
real-time signal processing.

1.1.1 History of TMS320 DSPs

In 1982, Texas Instruments introduced the TMS32010, the first fixed-point
DSP in the TMS320 family. Before the end of the year, Electronic Products
magazine awarded the TMS32010 the title “Product of the Year.” Today, the
TMS320 family consists of these generations: ’C1x, ’C2x, ’C27x, ’C5x,
’C54x, and ’C55x fixed-point DSPs; ’C3x and ’C4x floating-point DSPs; and ’C8x
multiprocessor DSPs. Now there is a new generation of DSPs, the
TMS320C6000 platform, with performance and features that are reflective of
Texas Instruments’ commitment to lead the world in DSP solutions.
1.1.2 Typical Applications for the TMS320 Family
Table 1-1 lists some typical applications for the TMS320 family of DSPs. The
TMS320 DSPs offer adaptable approaches to traditional signal-processing
problems. They also support complex applications that often require multiple
operations to be performed simultaneously.
Table 1–1. Typical Applications for the TMS320 DSPs

Automotive: Adaptive ride control, antiskid brakes, cellular telephones,
digital radios, engine control, global positioning, navigation, vibration
analysis, voice commands

Consumer: Digital radios/TVs, educational toys, music synthesizers, pagers,
power tools, radar detectors, solid-state answering machines

Control: Disk drive control, engine control, laser printer control, motor
control, robotics control, servo control

General Purpose: Adaptive filtering, convolution, correlation, digital
filtering, fast Fourier transforms, Hilbert transforms, waveform generation,
windowing

Graphics/Imaging: Image processing

Industrial: Numeric control, power-line monitoring, robotics, security access

Instrumentation: Digital filtering, function generation, pattern matching,
phase-locked loops, seismic processing, spectrum analysis, transient analysis

Medical

Military: Missile guidance, navigation, radar processing, radio frequency
modems, secure communications, sonar processing

Telecommunications: 1200- to 56 600-bps modems, adaptive equalizers, ADPCM
transcoders, base stations, cellular telephones, channel multiplexing, data
encryption, digital PBXs, digital speech interpolation (DSI), digital
subscriber loop (xDSL), DTMF encoding/decoding, echo cancellation, faxing,
future terminals, line repeaters, personal communications systems (PCS),
personal digital assistants (PDA), speaker phones, spread spectrum
communications, video conferencing, X.25 packet switching

Voice/Speech
With a performance of up to 2000 million instructions per second (MIPS) and
an efficient C compiler, the TMS320C6000 DSPs give system architects unlimited
possibilities to differentiate their products from others. High performance,
ease of use, and affordable pricing make the TMS320C6000 platform the ideal
solution for multichannel, multifunction applications, such as:
- Pooled modems
- Wireless local loop base stations
- Remote access servers (RAS)
- Digital subscriber loop (DSL) systems
- Cable modems
- Multichannel telephony systems
The TMS320C6000 platform is also an ideal solution for exciting new applications, for example:
- Personalized home security with face and hand/fingerprint recognition
- Advanced cruise control with GPS navigation and accident avoidance
- Remote medical diagnostics
- Beam-forming base stations
- Virtual reality 3-D graphics
- Speech recognition
- Audio
- Radar
- Atmospheric modeling
- Finite element analysis
- Imaging (for example, fingerprint recognition, ultrasound, and MRI)
1.3 Features and Options of the TMS320C6000 Devices

The ’C6000 devices execute up to eight 32-bit instructions per cycle. The
device’s core CPU consists of 32 general-purpose registers of 32-bit word length
and eight functional units:
- Two multipliers
- Six arithmetic logic units (ALUs)
The ’C6000 generation has a complete set of optimized development tools,
including an efficient C compiler, an assembly optimizer for simplified
assembly-language programming and scheduling, and a Windows-based debugger
interface for visibility of source code execution characteristics.
Features of the ’C6000 devices include:
- Advanced VLIW CPU with eight functional units, including two multipliers
and six arithmetic units
J Executes up to eight instructions per cycle for up to ten times the per-
formance of other DSPs
J Allows designers to develop highly effective RISC-like code for rapid
development
- Instruction packing
J Gives code size equivalence for eight instructions executed serially or
in parallel
J Reduces code size, program fetches, and power consumption
- Conditional execution of all instructions
J Reduces costly branching
J Increases parallelism for higher sustained performance
- Efficient code execution on independent functional units
J Industry’s most efficient C compiler on DSP benchmark suite
J Industry’s first assembly optimizer for fast development and improved
parallelism
- 8/16/32-bit data support, providing efficient memory support for a variety
of applications
- 40-bit arithmetic options, which add extra precision for vocoders and other
computationally intensive applications
- Saturation and normalization, which provide support for key arithmetic op-
erations
- Field manipulation and instruction extract, set, clear, and bit counting,
which support common operations found in control and data manipulation
applications.
- Hardware support for IEEE single-precision and double-precision
  instructions (’C6701 only)
- Pin-compatible fixed-point and floating-point DSPs
For more information on features and options of the TMS320C6000, see the
TMS320C6000 CPU and Instruction Set Reference Guide.

1.4 Overview of TMS320C6000 Memory

The internal memory configuration varies between the different ’C6000 devices.
All devices include:

- Internal data/program memory
- Internal peripherals
- External memory accessed through the external memory interface (EMIF)
TMS320C6201/C6202/C6701: The ’C6201, ’C6202, and ’C6701 each have
separate data and program memories. The internal program memory can be
mapped into the CPU address space or operated as a program cache. A
256-bit-wide path is provided to the CPU to allow a continuous stream of
eight 32-bit instructions for maximum performance.
Data memory is accessed through the data memory controller, which controls
the following functions:
- CPU and direct memory access (DMA) controller accesses to the
  internal data memory, performing the necessary arbitration
- The CPU data access to the EMIF
- The CPU access to on-chip peripherals
The internal data memory is divided into 16-bit-wide banks. The data memory
controller performs arbitration between the CPU and the DMA controller
independently for each bank, allowing both sides of the CPU and the DMA
controller to access different memory locations simultaneously without
contention. The data memory controller supports configurable endianness.
The LENDIAN pin on the device selects the endianness of the device.
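The practical effect of the endianness selection can be illustrated with a short sketch. This models byte ordering in general rather than the LENDIAN pin itself: the same 32-bit value occupies its four byte addresses in opposite orders in the two modes.

```python
import struct

word = 0x11223344

little = struct.pack("<I", word)  # byte layout in little-endian mode
big    = struct.pack(">I", word)  # byte layout in big-endian mode

# In little-endian mode the least significant byte (0x44) sits at the
# lowest address; in big-endian mode the most significant byte (0x11) does.
assert little == b"\x44\x33\x22\x11"
assert big    == b"\x11\x22\x33\x44"
```

This is why, as Tables 2–5 through 2–8 later show, the register and memory contents after loads and stores depend on the selected endianness.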
TMS320C6211/C6711: The ‘C6211/C6711 is a cache-based architecture,
with separate level-one program and data caches. These cache spaces are
not included in the memory map and are enabled at all times. The level-one
caches are only accessible by the CPU.
The level-one program cache (L1P) controller interfaces the CPU to the L1P.
A 256-bit-wide path is provided to the CPU to allow a continuous stream
of eight 32-bit instructions for maximum performance.
The level-one data cache (L1D) controller provides the interface between the
CPU and the L1D. The L1D is a dual-ported memory, which allows simultaneous access by both sides of the CPU.
On a miss to either L1D or L1P, the request is passed to the L2 controller. The
L2 controller:

- Services CPU and enhanced direct memory access (EDMA) controller
  accesses to the internal memory and performs the necessary arbitration
- Services CPU data accesses to the EMIF
- Services CPU accesses to on-chip peripherals
- Sends requests to the EMIF for L2 data misses
The internal SRAM of the ’C6211/C6711 is a unified program and data memory
space. The L2 memory space may be configured as all memory-mapped
SRAM, all cache, or a combination of the two.
1.5 Overview of TMS320C6000 Peripherals

Peripherals available on the TMS320C6000 devices are shown in Table 1–2.
Table 1–2. TMS320C6000 Peripherals
Peripheral                      C6201  C6202  C6211  C6701  C6711
Direct memory access (DMA)
The user-accessible peripherals are configured via a set of memory-mapped
control registers. The peripheral bus controller performs the arbitration for
accesses to on-chip peripherals. The boot configuration logic is interfaced
through external signals only, and the power-down logic is accessed directly
by the CPU.
Figure 1–1 shows the peripherals in the block diagram for the TMS320C6201,
’C6202, and ’C6701 devices. Figure 1–2 shows a block diagram for the
TMS320C6211 and ’C6711 devices.
Figure 1–1. TMS320C6201/C6202/C6701 Block Diagram

[Block diagram: the CPU (instruction fetch, instruction dispatch, instruction
decode, in-circuit emulation, interrupt control, control registers; data path A
with the A register file and .L1/.S1/.M1/.D1 units; data path B with the B
register file and .L2/.S2/.M2/.D2 units) connects through the program bus to
the internal program memory/cache and its controller, and through the data bus
to the internal data memory and its controller. The DMA buses link the direct
memory access (DMA) controller to the external memory interface (EMIF), the
host port/expansion bus with its control registers, McBSP 0, McBSP 1,
timer 0, timer 1, boot configuration, PLL configuration, and power-down logic.]
Figure 1–2. TMS320C6211/C6711 Block Diagram

[Block diagram: the CPU (instruction fetch, instruction dispatch, instruction
decode, in-circuit emulation, interrupt control, control registers; data path 1
with the A register file and L1/S1/M1/D1 units; data path 2 with the B register
file and L2/S2/M2/D2 units) connects to the L1P cache (direct mapped, 4K bytes)
and the L1D cache (2-way set associative, 4K bytes), backed by the L2 memory
(4 banks, 64K bytes). The enhanced DMA controller links the L2 memory to the
external memory interface (EMIF), the host port interface (HPI), McBSP 0,
McBSP 1, timer 0, timer 1, and power-down logic.]
DMA Controller: The DMA controller transfers data between address ranges
in the memory map without intervention by the CPU. The DMA controller has
four programmable channels and a fifth auxiliary channel.
EDMA Controller: The EDMA controller performs the same functions as the
DMA controller. The EDMA has sixteen programmable channels, as well as
a RAM space to hold multiple configurations for future transfers.
HPI: The HPI is a parallel port through which a host processor can directly
access the CPU’s memory space. The host device has ease of access because
it is the master of the interface. The host and the CPU can exchange
information via internal or external memory. In addition, the host has direct
access to memory-mapped peripherals.
Expansion Bus: The expansion bus is a replacement for the HPI, as well as
an expansion of the EMIF. The expansion bus provides two distinct areas of
functionality (host port and I/O port), which can coexist in a system. The host
port of the expansion bus can operate in either asynchronous slave mode,
similar to the HPI, or in synchronous master/slave mode. This allows the
device to interface to a variety of host bus protocols. Synchronous FIFOs and
asynchronous peripheral I/O devices may interface to the expansion bus.
EMIF: The EMIF supports a glueless interface to several external devices, including:
- Synchronous burst SRAM (SBSRAM)
- Synchronous DRAM (SDRAM)
- Asynchronous devices, including SRAM, ROM, and FIFOs
- An external shared-memory device
Boot Configuration: The TMS320C62x and TMS320C67x provide a variety
of boot configurations that determine what actions the DSP performs after device reset to prepare for initialization. These include loading in code from an
external ROM space on the EMIF and loading code through the HPI/expansion bus from an external host.
McBSP: The multichannel buffered serial port (McBSP) is based on the standard
serial port interface found on the TMS320C2000 and ’C5000 platform devices.
In addition, the port can buffer serial samples in memory automatically
with the aid of the DMA/EDMA controller. It also has multichannel capability
compatible with the T1, E1, SCSA, and MVIP networking standards. Like its
predecessors, it provides:
- Full-duplex communication
- Double-buffered data registers that allow a continuous data stream
- Independent framing and clocking for receive and transmit
- Direct interface to industry-standard codecs, analog interface chips
(AICs), and other serially connected A/D and D/A devices
In addition, the McBSP has the following capabilities:
- Multichannel transmission and reception of up to 128 channels
- A wider selection of data sizes, including 8, 12, 16, 20, 24, and 32 bits
- µ-law and A-law companding
- 8-bit data transfers with LSB or MSB first
- Programmable polarity for both frame synchronization and data clocks
- Highly programmable internal clock and frame generation
Timer: The ’C6000 devices have two 32-bit general-purpose timers that are
used to:
- Time events
- Count events
- Generate pulses
- Interrupt the CPU
- Send synchronization events to the DMA/EDMA controller
Interrupt Selector: The ’C6000 peripheral set produces 14–16 interrupt
sources. The CPU has 12 interrupts available. The interrupt selector allows
you to choose which 12 interrupts your system needs. The interrupt selector
also allows you to change the polarity of external interrupt inputs.
Power-down: The power-down logic allows reduced clocking to reduce power
consumption. Most of the operating power of CMOS logic dissipates during
circuit switching from one logic state to another. By preventing some or all of
the chip’s logic from switching, you can realize significant power savings
without losing any data or operational context.
Chapter 2
TMS320C6201/C6701
Program and Data Memory
This chapter describes the program memory organization, the program
memory and cache modes, and access of program memory through the DMA
controller for the TMS320C6201/C6701.
The program memory controller, shown in Figure 2–1, performs the following
tasks:

- Services CPU and DMA controller requests to internal program memory and
  performs the necessary arbitration
- Services CPU requests to external memory through the external memory
  interface (EMIF)
- Manages the internal program memory when it is configured as cache
Figure 2–1. TMS320C6201/C6701 Program Memory Controller in the Block Diagram

[Block diagram of the ’C6201/C6701: the program memory controller sits between
the CPU core’s program fetch stage (followed by instruction dispatch,
instruction decode, and data paths 1 and 2) and the program memory/cache, with
paths to the EMIF and the DMA controller. Also shown are the data memory
controller and data memory, and the peripheral bus controller serving the
timers, interrupt selector, McBSPs, HPI control, DMA control, and EMIF control,
plus the host port, PLL, power-down, and boot configuration logic.]
2.2 Internal Program Memory

The internal program memory contains 64K bytes of RAM or, equivalently, 2K
256-bit fetch packets or 16K 32-bit instructions. The CPU, through the program
memory controller, has a single-cycle throughput, 256-bit-wide connection to
internal program memory.
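These three figures are equivalent by simple arithmetic, which the following sketch checks: 64K bytes split into 256-bit (32-byte) fetch packets gives 2K packets, each holding eight 32-bit instructions.

```python
TOTAL_BYTES = 64 * 1024        # 64K bytes of internal program RAM
FETCH_PACKET_BYTES = 256 // 8  # a fetch packet is 256 bits = 32 bytes
INSTRUCTION_BYTES = 32 // 8    # an instruction is 32 bits = 4 bytes

fetch_packets = TOTAL_BYTES // FETCH_PACKET_BYTES
instructions = TOTAL_BYTES // INSTRUCTION_BYTES

assert fetch_packets == 2 * 1024                      # 2K fetch packets
assert instructions == 16 * 1024                      # 16K instructions
assert FETCH_PACKET_BYTES // INSTRUCTION_BYTES == 8   # 8 instructions/packet
```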
2.2.1 Internal Program Memory Modes

The internal program memory can be used in any of four modes, which are
selected by the program cache control (PCC) field (bits 7–5) in the CPU
control and status register (CSR), as shown in Table 2–1. The modes are:
- Mapped: Depending on the memory map selected, the program memory
  is located at one of these addresses:

  J 0000 0000h–0000 FFFFh for map 1
  J 0140 0000h–0140 FFFFh for map 0

  In mapped mode, program fetches from the internal program memory address
  return the fetch packet at that address. In the other modes, CPU
  accesses to this address range return undefined data. Mapped mode is
  the default state of the internal program memory at reset. The CPU cannot
  access internal program memory through the data memory controller.
  (See Chapter 7, Boot Configuration, Reset, and Memory Map, for
  information about how to select the memory map.)
- Cache enabled: In cache enabled mode, any initial program fetch at an
  address causes a cache miss. In a cache miss, the fetch packet is loaded
  from the external memory interface (EMIF) and stored in the internal cache
  memory, one 32-bit instruction at a time. While the fetch packet is being
  loaded, the CPU is halted. The number of wait states incurred depends on
  the type of external memory used, the state of that memory, and any
  contention for the EMIF with other requests, such as the DMA controller
  or a CPU data access. Any subsequent read from a cached address causes a
  cache hit, and that fetch packet is sent to the CPU from the internal
  program memory without any wait states. Changing from mapped mode to
  cache enabled mode flushes the program cache. This mode transition is the
  only means to flush the cache.
- Cache freeze: During a cache freeze, the cache retains its current state.
  A program read of a frozen cache is identical to a read of an enabled cache
  except that, on a cache miss, the data read from the external memory
  interface is not stored in the cache. A subsequent read of the same address
  causes a cache miss, and the data is again fetched from external memory.
  Cache freeze ensures that critical program data is not overwritten in the
  cache.
- Cache bypass: When the cache is bypassed, any program read fetches
  data from external memory. The data is not stored in the cache memory.
  As in cache freeze, the cache retains its state in cache bypass. This mode
  ensures that external program data is being fetched.
Table 2–1. Internal Program Memory Mode Summary

Internal Program
Memory Mode       PCC Value   Description
Mapped            000         Cache disabled (default state at reset)
Cache enabled     010         Cache accessed and updated on reads
Cache freeze      011         Cache accessed but not updated on reads
Cache bypass      100         Cache not accessed or updated on reads
                  Other       Reserved
Note:

If you change the operation mode of the PMEMC, you should use the following
assembly routine to ensure correct operation of the PMEMC. This routine
enables the cache. To change the PMEMC operation mode to a state other
than cache enable, you should modify line four of the routine to correspond
to the value of PCC that you want moved into B5. For example, to put the
cache into mapped mode, 0000h should be moved into B5. The CPU registers
used in this example have no significance. Any of the registers A0–A15
or B0–B15 can be used in the program.

        .align  32
        MVC     .S2     CSR,B5        ; copy control status register
||      MVK     .S1     0xff1f,A5
        AND     .L1x    A5,B5,A5      ; clear PCC field of CSR value
||      MVK     .S2     0x0040,B5     ; set cache enable mask
        OR      .L2x    A5,B5,B5      ; set cache enable bit
        MVC     .S2     B5,CSR        ; update CSR to enable cache
        NOP     4
        NOP
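In higher-level terms, the routine above is a read-modify-write of the PCC field (CSR bits 7–5): the AND with 0xff1f clears bits 7–5, and the OR inserts the new mode bits. The sketch below illustrates the same bit manipulation; it is an illustration of the arithmetic only, and actual mode changes on the DSP must use the assembly routine:

```python
PCC_MASK = 0x00E0           # PCC occupies CSR bits 7-5 (~0xff1f in 16 bits)
PCC_CACHE_ENABLE = 0x0040   # PCC = 010 -> cache enabled
PCC_MAPPED = 0x0000         # PCC = 000 -> mapped mode

def set_pcc(csr, pcc_bits):
    """Clear the PCC field of a CSR value and insert the new mode bits."""
    return (csr & ~PCC_MASK & 0xFFFFFFFF) | pcc_bits

csr = 0x0100  # arbitrary CSR contents, for illustration only
assert set_pcc(csr, PCC_CACHE_ENABLE) & PCC_MASK == PCC_CACHE_ENABLE
assert set_pcc(csr, PCC_MAPPED) & PCC_MASK == PCC_MAPPED
```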
2.2.2 Cache Architecture

The architecture of the cache is direct mapped. The 64K-byte cache contains
2K fetch packets and, thus, 2K frames. The width of the cache (the frame size)
is 256 bits. Each frame in the cache is one fetch packet.
2.2.2.1 Cache Usage of CPU Address

Figure 2–2 shows how the cache uses the fetch packet address from the CPU:

- 5-bit fetch packet alignment: The five LSBs of the address are assumed to
  be 0 because all program fetch requests are aligned on fetch packet
  boundaries (eight words or 32 bytes).
- 11-bit block offset: Because the cache is direct mapped, any external
  address maps to only one of the 2K frames. Any two fetch packets that are
  separated by an integer multiple of 64K bytes map to the same frame.
  Thus, bits 15–5 of the CPU address create the 11-bit block offset that
  determines which of the 2K frames any particular fetch packet maps to.
- 10-bit tag: The cache assumes a maximum external address space of
  64M bytes (from 0000 0000h–03FF FFFFh). Thus, bits 25–16 of the address
  correspond to the tag that determines the original location of the
  fetch packet in external memory space. The cache also has a separate
  2K × 11 tag RAM that holds all the tags. Each address location in this RAM
  contains a 10-bit tag plus a valid bit that is used to record frame validity
  information.
Figure 2–2. Logical Mapping of Cache Address

    Bits 31–26: outside external range, assumed to be 0
    Bits 25–16: tag
    Bits 15–5:  block offset
    Bits 4–0:   fetch packet alignment, assumed to be 0
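The field split above can be sketched in C. The helper names are inventions for this example; the shift and mask values come directly from the bit positions in Figure 2–2:

```c
#include <assert.h>
#include <stdint.h>

/* Bits 15-5 select one of the 2K frames (11-bit block offset). */
static uint32_t cache_block_offset(uint32_t addr) { return (addr >> 5) & 0x7FFu; }

/* Bits 25-16 form the 10-bit tag stored in the tag RAM. */
static uint32_t cache_tag(uint32_t addr)          { return (addr >> 16) & 0x3FFu; }
```

Two fetch packets 64K bytes apart produce the same block offset but different tags, which is what makes them compete for the same frame.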
2.2.2.2 Cache Flush
A dedicated valid bit in each address location of the tag RAM indicates whether
the contents of the corresponding cache frame are valid. During a cache
flush, all of the valid bits are cleared to indicate that no cache frames hold
valid data. Cache flushes occur only at the transition of the internal program
memory from mapped mode to cache-enabled mode. You initiate this transition
by setting the cache enable pattern in the PCC field of the CPU control and
status register.
2.2.2.3 Frame Replacement
A cache miss is detected when the tag stored at the block offset of the fetch
packet address requested by the CPU does not match bits 25–16 of that
address, or when the valid bit at the block offset location is clear. If
enabled, the cache loads the fetch packet into the corresponding frame, sets
the valid bit, sets the tag to bits 25–16 of the requested address, and
delivers the fetch packet to the CPU after all eight instructions are
available.
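The miss-detection and frame-replacement policy can be modeled in software. This is an illustrative sketch only: the function name and the in-memory tag-RAM layout are inventions for the example, not the hardware implementation, and the fetch packet to fill on a miss is passed in rather than fetched over the EMIF:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_FRAMES 2048u          /* 2K frames */
#define FP_BYTES   32u            /* one 256-bit fetch packet per frame */

/* Model of one tag-RAM entry plus its frame: 10-bit tag, valid bit, data. */
typedef struct {
    uint16_t tag;                 /* bits 25-16 of the external address */
    uint8_t  valid;
    uint8_t  data[FP_BYTES];
} frame_t;

static frame_t cache_model[NUM_FRAMES];

/* Returns 1 on hit, 0 on miss. On a miss, performs frame replacement:
   the frame is overwritten with `fp`, the tag is updated, and the valid
   bit is set, exactly as section 2.2.2.3 describes. */
static int cache_fetch(uint32_t addr, const uint8_t fp[FP_BYTES], uint8_t *out)
{
    frame_t *f = &cache_model[(addr >> 5) & 0x7FFu];
    uint16_t tag = (uint16_t)((addr >> 16) & 0x3FFu);
    int hit = f->valid && f->tag == tag;
    if (!hit) {
        memcpy(f->data, fp, FP_BYTES);
        f->tag = tag;
        f->valid = 1;
    }
    memcpy(out, f->data, FP_BYTES);
    return hit;
}
```

A second request for the same address hits; a request 64K bytes away evicts the frame because it carries a different tag.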
TMS320C6201/C6701 Program and Data Memory
2.3 DMA Controller Access to Program Memory
The DMA controller can read and write internal program memory when the
memory is configured in mapped mode. The CPU always has priority over the
DMA controller for access to internal program memory, regardless of the value
of the PRI bit for that DMA channel. DMA controller accesses are postponed
until the CPU stops making requests. To avoid losing future requests that
occur after arbitration and while a DMA controller access is in progress, the
CPU incurs one wait state per DMA controller access. The maximum throughput
to the DMA is one access every other cycle. In a cache mode, a DMA controller
write is ignored by the program memory controller, and a read returns an
undefined value. For both DMA reads and writes in cache modes, the DMA
controller is signaled that its request has finished. At reset, the program
memory system is in mapped mode, allowing the DMA controller to boot load
code into the internal program memory.
See Chapter 7, TMS320C6000 Boot Modes, for more information on bootloading
code.
2.4 Data Memory Controller

As shown in Figure 2–3, the data memory controller connects:
- The CPU and direct memory access (DMA) controller to internal data
  memory, performing the necessary arbitration
- The CPU to the external memory interface (EMIF)
- The CPU to the on-chip peripherals through the peripheral bus controller

The peripheral bus controller performs arbitration between the CPU and the
DMA controller for the on-chip peripherals.
Figure 2–3. TMS320C6x Block Diagram

    [Block diagram: the CPU core (program fetch, instruction dispatch,
    instruction decode, data paths 1 and 2) connects through the program
    memory controller to the program memory/cache, and through the data
    memory controller to the data memory, the EMIF, and the peripheral bus
    controller. The peripheral bus controller serves the timers, interrupt
    selector, McSPs, host port (HPI control), boot configuration, PLL, and
    power down logic; the DMA controller and EMIF control complete the
    on-chip interconnect.]
2.5 Data Memory Access

The data memory controller services all CPU and DMA controller data requests
to internal data memory. Figure 2–4, Figure 2–5, and Figure 2–6 show the
directions of data flow and the master (requester) and slave (resource)
relationships between the modules:
- The CPU requests data reads and writes to:
  - Internal data memory
  - On-chip peripherals through the peripheral bus controller
  - The EMIF
- The DMA controller requests reads and writes to internal data memory.
- The CPU cannot access internal program memory through the data memory
  controller.
The CPU sends requests to the data memory controller through the two address
buses (DA1 and DA2). Store data is transmitted through the CPU data store
buses (ST1 and ST2), and load data is received through the CPU data load
buses (LD1 and LD2). CPU data requests are mapped, based on address, to the
internal data memory, the internal peripheral space (through the peripheral
bus controller), or the external memory interface. The data memory controller
also connects the DMA controller to the internal data memory and performs
arbitration between the CPU and the DMA controller.
2.6 Internal Data Memory Organization

The following sections describe the data memory organization of the ’C6201
and ’C6701 devices in the ’C6x generation of DSPs.
2.6.1 TMS320C6201 Revision 2
The 64K bytes of internal data RAM are organized as one block of 64K bytes
located from address 8000 0000h to 8000 FFFFh. This block is organized as
four 8K banks of 16-bit halfwords. Both the CPU and the DMA controller can
simultaneously access data that resides in different banks. This organization
allows the two CPU data ports, A and B, to simultaneously access neighboring
16-bit data elements inside the block without a resource conflict.
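The interleaving in Table 2–2 means consecutive halfwords fall in consecutive banks, so the bank a byte address hits is just address bits 2–1. A one-line helper (a hypothetical name, not a device API) makes the pattern concrete:

```c
#include <assert.h>
#include <stdint.h>

/* TMS320C6201 rev 2: one 64K-byte block, four banks of 16-bit halfwords.
   Bytes 2n and 2n+1 share a halfword, and halfwords rotate through the
   four banks, so the bank number is address bits 2-1. */
static unsigned bank_rev2(uint32_t byte_addr) { return (byte_addr >> 1) & 0x3u; }
```

For example, 8000 0000h and 8000 0008h land in bank 0, while 8000 0002h lands in bank 1, matching the columns of Table 2–2.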
Table 2–2. Data Memory Organization (TMS320C6201 Revision 2)

                    Bank 0                 Bank 1                 Bank 2                 Bank 3
    First address   8000 0000  8000 0001   8000 0002  8000 0003   8000 0004  8000 0005   8000 0006  8000 0007
                    8000 0008  8000 0009   8000 000A  8000 000B   8000 000C  8000 000D   8000 000E  8000 000F
                       ...        ...         ...        ...         ...        ...         ...        ...
                    8000 FFF0  8000 FFF1   8000 FFF2  8000 FFF3   8000 FFF4  8000 FFF5   8000 FFF6  8000 FFF7
    Last address    8000 FFF8  8000 FFF9   8000 FFFA  8000 FFFB   8000 FFFC  8000 FFFD   8000 FFFE  8000 FFFF

    Each bank is 16 bits wide, so two byte addresses fall in each bank per row.
Figure 2–4. Data Memory Controller Interconnect to Other Banks
(TMS320C6201 Revision 2)

    [Block diagram: sides A and B of the ’C6201 CPU issue DA1/DA2 addresses
    with ST1/ST2 store data and LD1/LD2 load data (32 bits each) to the data
    memory controller (DMEMC). The DMEMC connects through 16-bit paths to the
    four banks of the 64K-byte data memory (8000 0000h–8000 FFFFh), and
    through control paths to the peripheral bus controller, the external
    memory interface, and the DMA controller.]
2.6.2 TMS320C6201 Revision 3
The 64K bytes of internal data RAM are organized as two blocks of 32K bytes
located from address 8000 0000h to 8000 7FFFh and from 8000 8000h to
8000 FFFFh. The DMA controller, or side A and side B of the CPU, can
simultaneously access any portion of the internal memory without conflict
when using different blocks. Both blocks are organized as four 4K banks of
16-bit halfwords. Therefore, you do not have to consider the address within
a block if simultaneous accesses occur to different blocks; accesses to
different blocks never cause performance penalties. The CPU and the DMA
controller can still simultaneously access data that resides in different
banks within the same block without a performance penalty. To avoid
performance penalties, you have to pay attention to the address LSBs only
when the two accesses involve data in the same block. This organization also
allows the two CPU data ports, A and B, to simultaneously access neighboring
16-bit data elements inside a block without a resource conflict.
Table 2–3. Data Memory Organization (TMS320C6201 Revision 3)

Figure 2–5. Data Memory Controller Interconnect to Other Banks
(TMS320C6201 Revision 3)

    [Block diagram: as in Figure 2–4, sides A and B of the ’C6201 CPU connect
    through DA1/DA2, ST1/ST2, and LD1/LD2 (32 bits each) to the data memory
    controller (DMEMC), which also serves the peripheral bus controller, the
    external memory interface, and the DMA controller. Here the data memory
    is split into block 0 (32K bytes, 8000 0000h–8000 7FFFh) and block 1
    (32K bytes, 8000 8000h–8000 FFFFh), each with four 16-bit banks.]
2.6.3 TMS320C6701
The 64K bytes of internal data RAM are organized as two blocks of 32K bytes
located from address 8000 0000h to 8000 7FFFh and from 8000 8000h to
8000 FFFFh. Side A and side B of the CPU, or the DMA controller, can
simultaneously access any portion of the internal data memory without
conflict when using different blocks. Therefore, you do not have to consider
the address within a block if simultaneous accesses occur to different
blocks; accesses to different blocks never cause performance penalties. Both
blocks are organized as eight 2K banks of 16-bit halfwords. Both the CPU and
the DMA controller can still simultaneously access data that resides in
different banks within the same block without a performance penalty. To avoid
performance penalties, you have to pay attention to the address LSBs only
when the two accesses involve data in the same block. This organization also
allows the two CPU data ports, A and B, to simultaneously access neighboring
16-bit data elements inside the same block without a resource conflict.
Figure 2–6. Data Memory Controller Interconnect to Other Blocks (TMS320C6701)

    [Block diagram: sides A and B of the ’C6701 CPU connect through DA1/DA2
    addresses, ST1/ST2 store data, and LD1/LD2 load data buses to the data
    memory controller (DMEMC), which also serves the peripheral bus
    controller, the external memory interface, and the DMA controller. The
    data memory is split into block 0 (32K bytes, 8000 0000h–8000 7FFFh) and
    block 1 (32K bytes, 8000 8000h–8000 FFFFh), each with eight 16-bit banks
    (banks 0–7).]
2.6.4 Data Alignment
The following data alignment restrictions apply:

Doublewords: (’C6701 only) Doublewords are aligned on even 8-byte
(doubleword) boundaries and always start at a byte address where the three
LSBs are 0. Doublewords are used only on loads triggered by the LDDW
instruction. Store operations do not use doublewords.

Words: Words are aligned on even 4-byte (word) boundaries and always start
at a byte address where the two LSBs are 0. A word access requires two
adjacent 16-bit-wide banks.

Halfwords: Halfwords are aligned on even 2-byte (halfword) boundaries and
always start at a byte address where the LSB is 0. A halfword access requires
an entire 16-bit-wide bank.

Bytes: There are no alignment restrictions on byte accesses.
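The three alignment rules reduce to checking that the low address bits are zero. These predicates (hypothetical names, written here only to restate the rules) show the exact masks:

```c
#include <assert.h>
#include <stdint.h>

/* Doubleword (LDDW, 'C6701 only): three LSBs must be 0. */
static int is_dword_aligned(uint32_t a) { return (a & 0x7u) == 0; }

/* Word: two LSBs must be 0. */
static int is_word_aligned(uint32_t a)  { return (a & 0x3u) == 0; }

/* Halfword: the LSB must be 0. */
static int is_half_aligned(uint32_t a)  { return (a & 0x1u) == 0; }
```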
2.6.5 Dual CPU Accesses to Internal Memory
Both the CPU and the DMA controller can read and write 8-bit bytes, 16-bit
halfwords, and 32-bit words. The data memory controller performs arbitration
individually for each 16-bit bank. Although arbitration is performed on
16-bit-wide banks, the banks have byte enables to support byte-wide accesses.
However, a byte access prevents the entire 16 bits containing that byte from
being used simultaneously by another access.

As long as multiple requesters access data in separate banks, all accesses
are performed simultaneously with no penalty. Also, when two memory accesses
involve separate 32K-byte memory blocks, there are no memory conflicts,
regardless of the address. For multiple data accesses within the same block,
the memory organization also allows simultaneous accesses as long as they
involve different banks. In one CPU cycle, two simultaneous accesses to two
different internal memory banks occur without wait states. Two simultaneous
accesses to the same internal memory bank stall the entire CPU pipeline for
one CPU clock, providing two accesses in two CPU clocks. These rules apply
regardless of whether the accesses are loads or stores.
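The "different block, or different bank" rule can be sketched as a predicate. This model assumes the revision 3 organization (block select in address bit 15, bank in bits 2–1) and covers only single-bank accesses such as bytes and halfwords; a word access would span two banks and needs a wider check:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative conflict test for the 'C6201 rev 3 organization: two 32K-byte
   blocks (block select = address bit 15), four 16-bit banks per block
   (bank = address bits 2-1). Two single-bank accesses conflict only when
   they hit the same bank of the same block. */
static int accesses_conflict(uint32_t a, uint32_t b)
{
    uint32_t block_a = (a >> 15) & 1u, block_b = (b >> 15) & 1u;
    uint32_t bank_a  = (a >> 1) & 3u,  bank_b  = (b >> 1) & 3u;
    return block_a == block_b && bank_a == bank_b;
}
```

Accesses to 8000 0000h and 8000 8000h never conflict (different blocks), while 8000 0000h and 8000 0008h do (same block, both bank 0).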
Loads and stores from the same execute packet are seen by the data memory
controller during the same CPU cycle. Loads and stores from future or
previous CPU cycles do not cause wait states for the internal data memory
accesses in the current cycle. Thus, an internal data memory access causes
a wait state only when a conflict occurs between instructions in the same
fetch packet accessing the same 16-bit-wide bank. This conflict is an
internal memory conflict. The data memory controller stalls the CPU for one
CPU clock, serializes the accesses, and performs each access separately. In
prioritizing the two accesses, a load in parallel with a store always has
priority over the store: if both access the same resource (for example, the
EMIF, the peripheral bus, or an internal memory block), the load occurs
before the store. If both accesses are stores, the access from DA1 takes
precedence over the access from DA2. If both accesses are loads, the access
from DA2 takes precedence over the access from DA1. Figure 2–7 and
Figure 2–8 show which access conditions cause internal memory conflicts when
the CPU makes two data accesses (on DA1 and DA2).
Figure 2–7. Conflicting Internal Memory Accesses to the Same Block

Figure 2–8. Conflicting Internal Memory Accesses to the Same Block
(TMS320C6701)

    [Conflict matrices: each figure plots the DA1 access type (byte,
    halfword, word, or, on the ’C6701, doubleword) and its address bits 3–0
    against the DA2 access type and its address bits 3–0. A shaded entry
    marks a pair of accesses that touch the same 16-bit bank and therefore
    conflict.]

Note: Conflicts shown in shaded areas.
2.6.6 DMA Accesses to Internal Memory
The DMA controller can access any portion of one block of internal data
memory while the CPU is simultaneously accessing any portion of another
block. If both the CPU and the DMA controller are accessing the same block,
and portions of both accesses are to the same 16-bit bank, the DMA operation
can take place first or last, depending on the CPU/DMA priority settings.
You can use Figure 2–7 to determine DMA versus CPU conflicts: assume that
one axis represents the DMA access and the other represents the CPU access
from one CPU data port, then perform this analysis again for the other data
port. If both comparisons yield no conflict, there is no CPU/DMA internal
memory conflict. If either comparison yields a conflict, there is a CPU/DMA
internal memory conflict. In this case, the priority is resolved by the PRI
bit of the DMA channel, as described in Chapter 4, TMS320C6211 Two-Level
Internal Memory. If the DMA channel is configured as higher priority than the
CPU (PRI = 1), any CPU accesses are postponed until the DMA accesses finish,
and the CPU incurs a 1-CPU-clock wait state. If both CPU ports and the DMA
access the same memory block, the number of wait states increases to two. If
the DMA has multiple consecutive requests to the block required by the CPU,
the CPU is held off until all DMA accesses to the necessary blocks finish.
In contrast, if the CPU has higher priority (PRI = 0), the DMA access is
postponed until both CPU data ports stop accessing that bank. In this
configuration, a DMA access request never causes a wait state.
2.6.7 Data Endianness
Two standards for data ordering in byte-addressable microprocessors exist:
- Little-endian ordering, in which bytes are ordered from right to left, the
  most significant byte having the highest address
- Big-endian ordering, in which bytes are ordered from left to right, the
  most significant byte having the lowest address

Both the CPU and the DMA controller support a programmable endianness,
selected by the LENDIAN pin on the device: LENDIAN = 1 selects little endian,
and LENDIAN = 0 selects big endian. Byte ordering within word and halfword
data resident in memory is identical for little-endian and big-endian data.
Table 2–5 shows which bits of a data word in memory are loaded into which
bits of a destination register for all possible CPU data loads from big- or
little-endian data. The data in memory is assumed to be the same data that
is in the register results from the LDW instruction in the first row.
Table 2–7 and Table 2–8 show which bits of a register are stored in which
bits of a destination memory word for all possible CPU data stores from big-
and little-endian data. The data in the source register is assumed to be the
same data that is in the memory results from the STW instruction in the
first row.
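The two orderings can be illustrated with the memory word 0112 1970h used in the tables below. These helpers (invented names, modeling the two byte layouts in software rather than on real hardware) return the byte that a load of the lowest address would see under each ordering:

```c
#include <assert.h>
#include <stdint.h>

/* Little endian: the LSB sits at the lowest address, so a byte read of
   offset 0 sees the least significant byte of the word. */
static uint8_t first_byte_le(uint32_t w) { return (uint8_t)(w & 0xFFu); }

/* Big endian: the MSB sits at the lowest address, so a byte read of
   offset 0 sees the most significant byte of the word. */
static uint8_t first_byte_be(uint32_t w) { return (uint8_t)(w >> 24); }
```

For the word 0112 1970h, a byte load of the lowest address returns 70h on a little-endian device and 01h on a big-endian device.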
Table 2–5. Register Contents After Little-Endian or Big-Endian Data Loads

    [Table contents not reproduced in this extraction.]

Note: The contents of the word in data memory at location xxxx xx00 before
the ST instruction executes is 0112 1970h. The contents of the source
register is BA98 7654h. (Surviving store-table row, address bits 1:0 = 11:
big-endian memory result 0112 1954h; little-endian memory result 5412 1970h.)
2.7 Peripheral Bus

The peripherals are controlled by the CPU and the DMA controller through
accesses of control registers. The CPU and the DMA controller access these
registers through the peripheral data bus. The DMA controller accesses the
peripheral bus controller directly, whereas the CPU accesses it through the
data memory controller.
2.7.1 Byte and Halfword Access
The peripheral bus controller converts all peripheral bus accesses to word
accesses. On read accesses, both the CPU and the DMA controller can extract
the correct portions of the word to perform byte and halfword accesses
properly. Any side effects caused by a peripheral control register read occur
regardless of which bytes are read. In contrast, for byte or halfword writes,
the CPU and the DMA controller provide correct values only in the enabled
bytes; the values that are always correct are shown in Table 2–8. Undefined
results are written to the nonenabled bytes. If you are not concerned about
the values in the disabled bytes, this is acceptable. Otherwise, access the
peripheral registers only via word accesses.
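One safe pattern implied by the paragraph above is a word-wide read-modify-write: read the whole register, patch only the byte of interest, and write the whole word back. The helper below shows the bit manipulation; the register address and names are placeholders, not a real ’C6x peripheral map:

```c
#include <assert.h>
#include <stdint.h>

/* Replace byte `byte_num` (0 = least significant) of `word` with `value`,
   leaving the other three bytes untouched. A driver would wrap this in a
   volatile word read and word write of the peripheral register. */
static uint32_t set_byte_in_word(uint32_t word, unsigned byte_num, uint8_t value)
{
    uint32_t shift = byte_num * 8u;
    return (word & ~(0xFFu << shift)) | ((uint32_t)value << shift);
}
```

In a driver this would look like `tmp = *reg; *reg = set_byte_in_word(tmp, 1, 0x5A);` with `reg` a `volatile uint32_t *`, so the peripheral only ever sees full-word writes.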
Table 2–8. Memory Contents After Little-Endian or Big-Endian Data Stores

    [Table contents largely not reproduced in this extraction. For each
    address bits (1:0) value, the table lists the big-endian and
    little-endian memory results of each store; the surviving row for
    address bits 1:0 = 11 shows big-endian memory result ??????XX and
    little-endian memory result XX??????.]

Note: X indicates nybbles correctly written; ? indicates nybbles with
undefined values after the write.
2.7.2 CPU Wait States

Isolated peripheral bus controller accesses from the CPU cause six CPU wait
states. These wait states are inserted to allow pipeline registers to break
up the long on-chip paths between the CPU and the peripherals, as well as to
allow time for arbitration.
2.7.3 Arbitration Between the CPU and the DMA Controller

As shown in Figure 2–5 and Figure 2–6, the peripheral bus controller performs
arbitration between the CPU and the DMA controller for the peripheral bus.
As with internal data access, the PRI bits in the DMA controller determine
the priority between the CPU and the DMA controller. If a conflict occurs
between the CPU (via the data memory controller) and the DMA controller, the
lower-priority requester is held off until the higher-priority requester
completes all accesses to the peripheral bus controller. The peripheral bus
is arbitrated as a single resource, so the lower-priority requester is
blocked from accessing all peripherals, not just the one accessed by the
higher-priority requester.
Chapter 3
TMS320C6202 Program and Data Memory
This chapter describes the TMS320C6202 program memory and data memory
controllers. Program memory modes, including cache operation and bootload
operation, are discussed.

The TMS320C6202 program memory controller (PMEMC) provides all of the
functionality available in the TMS320C6201 revision 3. The PMEMC operates as
either a 128K-byte memory or a direct-mapped cache. In addition to the
memory/cache, the ’C6202 provides 128K bytes of memory that operates as a
memory-mapped block. To achieve this functionality, the block of program
memory has been expanded to 128K bytes, and a second 128K-byte block of
program memory has been added. These two blocks can be accessed
independently, allowing a program fetch from one block by the CPU to occur
in parallel with, and without interfering with, a DMA transfer to or from the
other block of program memory. Table 3–1 and Table 3–2 compare the internal
memory and cache configurations available on the current TMS320C6000
devices. Figure 3–1 shows a block diagram of the connections between the
’C6202 CPU, PMEMC, and memory blocks. The addresses shown in Figure 3–1 are
for operation in memory map mode 1.
Figure 3–1. TMS320C6202 Program Memory Controller Block Diagram

    [Block diagram: the ’C62x CPU program fetch issues a program address and
    receives 256-bit program data through the program memory controller
    (PMEMC). The PMEMC connects over 256-bit paths to block 0 (128K bytes,
    mapped, 0000 0000h–0001 FFFFh) and block 1 (128K bytes, cached or mapped,
    0002 0000h–0003 FFFFh), and also to the external memory interface and the
    DMA controller over the DMA bus.]
3.2 Memory Mapped Operation

When the PCC field of the CPU control status register is programmed for
mapped mode, both blocks of internal program RAM are mapped into internal
program space. Table 3–3 shows the address space for both blocks of RAM for
the map mode selected at device reset.
Table 3–3. Internal Program RAM Address Mapping in Memory Mapped Mode
In mapped mode, both the CPU and the DMA can access all locations in both
blocks of RAM. Any access outside of the address space that the internal RAM
is mapped to is forwarded to the EMIF. The DMA can access only one of the
two blocks of RAM at a time. The CPU and DMA can access the internal RAM
without interference as long as each accesses a different block. If the CPU
and DMA attempt to access the same block of RAM at the same time, the DMA is
stalled until the CPU completes its accesses to that block; after the CPU
access is complete, the DMA is allowed to access the RAM. The DMA cannot
cross between block 0 and block 1 in a single transfer. You must use separate
DMA transfers to cross block boundaries.
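Because a single transfer cannot cross the 128K-byte block boundary, a driver has to split a straddling request in two. The sketch below (hypothetical helper, assuming the mode 1 addresses of Figure 3–1 where each block is 0x20000 bytes) computes the length of the first transfer so that it ends exactly at the boundary:

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_BYTES 0x20000u    /* 128K bytes per program RAM block */

/* Length of the first DMA transfer for a request of `len` bytes starting at
   `start`: the remaining room in the current block, capped at `len`. A
   second transfer would then cover the rest from the next block boundary. */
static uint32_t first_chunk_len(uint32_t start, uint32_t len)
{
    uint32_t room = BLOCK_BYTES - (start & (BLOCK_BYTES - 1u));
    return len < room ? len : room;
}
```

For example, a 0x400-byte request starting at 0001 FF00h gets a first chunk of 0x100 bytes (ending at 0002 0000h), and the remaining 0x300 bytes go in a second transfer.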
3.3 Cache Operation

When the PCC field of the CPU control status register is programmed for one
of the cache modes, block 1 operates as a cache while block 0 remains mapped
into internal program space. Table 3–4 shows the addresses occupied by the
RAM that is not used for cache, for each map mode.
Table 3–4. Internal Program RAM Address Mapping in Cache Mode
The cache on the ’C6202 operates identically to the ’C6201 cache. Any CPU or
DMA access to the memory range that was occupied by the cache RAM returns
undefined results. As in mapped mode, simultaneous accesses to block 0 by the
CPU and DMA stall the DMA until the CPU has completed its access. A DMA
access to block 0 while the cache is flushed continues without stalling. The
CPU is halted during a cache flush. You must ensure that all DMA accesses to
block 1 have completed before the cache is enabled.
Note:
If you change the operation mode of the PMEMC, use the following assembly
routine to ensure correct operation of the PMEMC. This routine enables the
cache. To change the PMEMC operation mode to a state other than cache
enable, modify the fourth instruction of the routine (the MVK of the mask value)
to correspond to the value of PCC that you want moved into B5. For example,
to put the cache into mapped mode, 0000h should be moved into B5. The CPU
registers used in this example have no significance; any of the registers
A0–A15 or B0–B15 can be used in the program.

        .align  32
        MVC     .S2     CSR,B5         ; copy control status register
||      MVK     .S1     0xff1f,A5      ; mask to clear the PCC field
        AND     .L1x    A5,B5,A5       ; clear PCC field of CSR value
||      MVK     .S2     0x0040,B5      ; set cache enable mask
        OR      .L2x    A5,B5,B5       ; set cache enable bit
        MVC     .S2     B5,CSR         ; update CSR to enable cache
        NOP     4
        NOP
3.4 Bootload Operation

The ’C6202 bootload operates identically to that of the ’C6201 revision 3.
During ROM bootload, a 64K-byte block of data is transferred from the
beginning of CE1 to memory at address 0. During HPI bootload, the host can
read or write any internal or external memory location, including the entire
internal program space.
3.5 TMS320C6202 Data Memory Controller

The TMS320C6202 data memory controller (DMEMC) provides all of the
functionality available in the TMS320C6201 revision 3. The ’C6202 DMEMC
contains 128K bytes of RAM organized in two blocks of four banks each; each
bank is 16 bits wide. The DMEMC for the ’C6202 operates identically to the
’C6201 DMEMC: the DMA controller, or side A or side B of the CPU, can
simultaneously access two different banks without conflict. Figure 3–2 shows
a block diagram of the connections between the ’C6202 CPU, DMEMC, and memory
blocks. Table 3–5 shows the memory range occupied by each block of internal
data RAM.
Figure 3–2. TMS320C6202 Data Memory Controller Block Diagram
Chapter 4

TMS320C6211/C6711 Two-Level Internal Memory

The TMS320C6211/C6711 provides a two-level memory architecture for the
internal program and data buses. The first-level memory for both the internal
program and data buses is a 4K-byte cache, designated L1P for the program
cache and L1D for the data cache. The second-level memory is a 64K-byte
memory block that is shared by both the program and data memory buses,
designated L2.
Figure 4–1 illustrates how the L1P, L1D, and L2 are arranged in the
TMS320C6211/C6711. Figure 4–2 illustrates the bus connections between the
CPU, internal memories, and the enhanced DMA for the ’C6211.

The L1P, L1D, and L2 are controlled by a set of memory configuration
registers. The CPU can read and write the internal memory control registers;
the EDMA (and thus the HPI) can only read these registers. Table 4–3 lists
these control registers and their associated addresses. You should initialize
a memory attribute register by setting the appropriate MAR bits, and then
read that memory attribute register back before continuing execution to
ensure proper operation.
Table 4–3. Internal Memory Control Register Addresses

    Address      Mnemonic   Register Name
    0184 4000h   L2FBAR     L2 flush base address register
    0184 4004h   L2FWC      L2 flush word count register
    0184 4010h   L2CBAR     L2 clean base address register
    0184 4014h   L2CWC      L2 clean word count register
    0184 4020h   L1PFBAR    L1P flush base address register
    0184 4024h   L1PFWC     L1P flush word count register
    0184 4030h   L1DFBAR    L1D flush base address register
    0184 4034h   L1DFWC     L1D flush word count register
    0184 5000h   L2FLUSH    L2 flush register
    0184 5004h   L2CLEAN    L2 clean register
    0184 8200h   MAR0       Memory attribute register – Region 0
    0184 8204h   MAR1       Memory attribute register – Region 1
    0184 8208h   MAR2       Memory attribute register – Region 2
    0184 820Ch   MAR3       Memory attribute register – Region 3
    0184 8240h   MAR4       Memory attribute register – Region 4
    0184 8244h   MAR5       Memory attribute register – Region 5
    0184 8248h   MAR6       Memory attribute register – Region 6
    0184 824Ch   MAR7       Memory attribute register – Region 7
    0184 8280h   MAR8       Memory attribute register – Region 8
    0184 8284h   MAR9       Memory attribute register – Region 9
    0184 8288h   MAR10      Memory attribute register – Region 10
    0184 828Ch   MAR11      Memory attribute register – Region 11
    0184 82C0h   MAR12      Memory attribute register – Region 12
    0184 82C4h   MAR13      Memory attribute register – Region 13
    0184 82C8h   MAR14      Memory attribute register – Region 14
    0184 82CCh   MAR15      Memory attribute register – Region 15
TMS320C6211/C6711 Two-Level Internal Memory
4.3 L1P Description
The L1P is organized as a 64-line direct-mapped cache with a 64-byte
(two-fetch-packet) line size. The L1P data request size is one line; thus,
the six least significant bits of a requested address are ignored. The next
six bits of the address reference the set within the cache that the addressed
data maps to. The remaining bits of the address are used as a unique tag for
the requested data. Figure 4–4 illustrates how a 32-bit address is allocated
to provide the set index and tag data for the L1P.
Figure 4–4. L1P Address Allocation

    Bits 31–12: tag
    Bits 11–6:  set
    Bits 5–0:   offset
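The Figure 4–4 split can be restated in C (hypothetical helper names; shifts and masks follow directly from the 6-bit offset and 6-bit set fields):

```c
#include <assert.h>
#include <stdint.h>

/* L1P: 64-byte lines (6-bit offset), 64 sets (6-bit set index), 20-bit tag. */
static uint32_t l1p_set(uint32_t addr) { return (addr >> 6) & 0x3Fu; }
static uint32_t l1p_tag(uint32_t addr) { return addr >> 12; }
```

Since 64 sets × 64 bytes = 4K bytes, two addresses 4K bytes apart map to the same set and compete for the same line.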
A cache hit returns data to the CPU in a single cycle. Unlike on the
TMS320C6201, the L1P operates only as a cache and cannot be memory mapped.
The L1P does not support freeze or bypass modes. The only values allowed for
the program cache control (PCC) field in the CPU control and status register
(CSR) are 000b and 010b. All other values for PCC are reserved, as shown in
Table 4–4.

Any initial program fetch of an address causes a cache miss. The data is
requested from the L2 and stored in the internal cache memory. Any subsequent
read from a cached address causes a cache hit, and that data is loaded from
the L1P memory. Figure 4–5 illustrates the organization of a direct-mapped
cache.
Figure 4–5. L1P Direct Mapped Cache Diagram

    [Diagram: the set field of the address indexes the tag RAM and the cache
    data RAM in parallel. The tag read from the tag RAM is compared with the
    tag field of the address; on a match (hit) the cache data is delivered
    as program data, and on a mismatch (miss) the line is filled from the L2
    data path.]
There are two methods for user-controlled invalidation of data in the L1P.
Writing a 1 to the IP bit of the cache configuration register (CCFG)
invalidates all of the cache tags in the L1P tag RAM. This is a write-only
bit; a read of this bit always returns 0. Any CPU access to the L1P while the
invalidation is being processed stalls the CPU until the invalidation has
completed and the CPU request has been fetched. Figure 4–12 shows the format
of the CCFG register, and Table 4–6 describes the operation of this register.
The second method for invalidating the L1P requires the L1PFBAR and L1PFWC
registers. This method is useful for invalidating a block of data in the L1P.
You must first write a word-aligned address into the L1PFBAR. This value is
the starting address for the invalidation. The number of words to be
invalidated equals the value written into the L1PFWC register. The L1P
searches for and invalidates all lines whose external memory address falls
within the range from L1PFBAR to L1PFBAR + L1PFWC – 4. If L1PFBAR or L1PFWC
is not aligned to the L1P line size (16 words), all lines that contain any
address in the specified range are invalidated. Using this block invalidation
does not stall any pending CPU accesses. The block invalidation begins when
the L1PFWC is written; therefore, take care to ensure that the L1PFBAR
register is set up correctly prior to writing the L1PFWC. Figure 4–6 and
Figure 4–7 show the formats of the L1PFBAR and L1PFWC.
Figure 4–6. L1P Flush Base Address Register Fields (L1PFBAR)

    Bits 31–0: L1P flush base address (RW, +x)

Figure 4–7. L1P Flush Word Count Register Fields (L1PFWC)

    Bits 31–16: rsvd (R, +x)
    Bits 15–0:  L1P flush word count (RW, +x)
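A block invalidation sketch follows. The register addresses come from Table 4–3; treating the word count as 4 bytes per word and rounding the touched range out to whole 64-byte lines is an interpretation of the text above, not a statement from the hardware specification, and the line-count helper exists only to illustrate that rounding:

```c
#include <assert.h>
#include <stdint.h>

#define L1P_LINE_BYTES 64u               /* L1P line size: 16 words */

/* Number of L1P lines touched by a flush of `wc` words starting at `fbar`:
   every line containing any byte of [fbar, fbar + 4*wc) is invalidated,
   so unaligned ranges round outward to whole lines. */
static uint32_t l1p_lines_invalidated(uint32_t fbar, uint32_t wc)
{
    uint32_t start = fbar & ~(L1P_LINE_BYTES - 1u);
    uint32_t end   = (fbar + 4u * wc + L1P_LINE_BYTES - 1u) & ~(L1P_LINE_BYTES - 1u);
    return (end - start) / L1P_LINE_BYTES;
}

/* On the device, the operation is two register writes; writing L1PFWC last
   is what starts the invalidation (addresses from Table 4-3). Not called
   here, since the addresses are only valid on the part itself. */
static void l1p_block_invalidate(uint32_t fbar, uint32_t wc)
{
    volatile uint32_t *const L1PFBAR = (volatile uint32_t *)0x01844020u;
    volatile uint32_t *const L1PFWC  = (volatile uint32_t *)0x01844024u;
    *L1PFBAR = fbar;   /* base address first */
    *L1PFWC  = wc;     /* word count last: this write triggers the flush */
}
```

A 16-word flush starting on a line boundary touches exactly one line; the same 16 words starting mid-line touch two.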
4.4 L1D Description
The L1D is organized as a 64-set, 2-way set-associative cache with a 32-byte
line size. The two least significant bits of a requested address are ignored
by the L1D, since the smallest access size is a word. The next bit of the
address selects the correct word within a subline. Bits four and three select
one of the four 8-byte sublines in the addressed set. The next six bits
select the set within the cache that the addressed data maps to. The
remaining bits of the address are used as a unique tag for the requested
data. Figure 4–8 illustrates how a 32-bit address is allocated to provide
the word index, subline index, set index, and tag data for the L1D.
Figure 4–8. L1D Address Allocation

    Bits 31–11: tag
    Bits 10–5:  set
    Bits 4–3:   subline
    Bit 2:      word
    Bits 1–0:   offset
A cache hit returns data to the CPU in a single cycle. Operation on a cache
miss depends on the direction of the access. On a read miss, the L1D sends
a read request to the L2 to fetch the data. When the data is returned from
the L2, the L1D examines the set that the addressed data maps to in each way.
The L1D controller stores the new data into the way that was least recently
used (LRU). If the data in that way has been modified but the corresponding
address has not been updated (the cache line is dirty), that data is written
out to the L2. In this way, cached data that has been modified is not
discarded before it is updated at its original address. If two read misses
occur in the same cycle, they are serialized by the L1D so that only one
request is presented to the L2 at a time. On a write miss, the L1D sends the
write request to the L2; the data is not stored in the L1D. Write requests
from the L1D to the L2 are buffered. If a write request is still pending from
the L1D when a read miss occurs, this buffer is allowed to empty before the
read request is sent to the L2.
The L1D only operates as a cache and cannot be memory mapped. The L1D does not support freeze or bypass modes. The only values allowed for the data cache control (DCC) field in the CPU control and status register (CSR) are 000b and 010b. All other values for DCC are reserved, as shown in Table 4–5. Any initial load of an address causes a cache miss to occur. The data is loaded and stored in the internal cache memory. Any subsequent read from a cached address causes a cache hit, and that data is loaded from the internal cache memory. Figure 4–9 illustrates the organization of a 2-way set-associative cache.
Figure 4–9. L1D 2-Way Set Associative Cache Diagram
  (Diagram summary: the set field of the address indexes a tag RAM and the
  cache data RAM in each of the two ways; the stored tag from each way is
  compared against the address tag to select the data output, and misses
  are filled with data from the L2.)

TMS320C6211/C6711 Two-Level Internal Memory
There are two methods for user-controlled invalidation of data in the L1D. Writing a 1 to the ID bit of the cache configuration register (CCFG) invalidates all the cache tags in the L1D tag RAM. This is a write-only bit; a read of this bit returns a 0. Any CPU access to the L1D while the invalidation is being processed stalls until the invalidation has completed and the CPU request has been fetched.
The second method for invalidating the L1D uses the L1DFBAR and L1DFWC registers and is useful for invalidating a block of data in the L1D. You must first write a word-aligned address into the L1DFBAR. This value is used as the starting address for the invalidation. The number of words invalidated equals the value written into the L1DFWC register. The L1D searches for and invalidates all lines whose external memory address falls within the range from L1DFBAR to L1DFBAR+L1DFWC–4. The data in these lines is sent to the L2 to be stored in the original memory location. In this way, the L2 and external memory remain coherent with the data that is invalidated. If L1DFBAR or L1DFWC is not aligned to the L1D line size (8 words), all lines that contain data in the specified address range are invalidated. However, only those words that are contained in the range from L1DFBAR to L1DFBAR+L1DFWC–4 are saved to the L2. This block invalidation occurs in the background and does not stall any pending CPU accesses. The block invalidation begins when the L1DFWC is written; therefore, you should take care to ensure that the L1DFBAR register is set up correctly prior to writing the L1DFWC. This is the preferred method for writing data that has been cached in the L1D to the external memory space. Figure 4–10 and Figure 4–11 show the format for the L1DFBAR and L1DFWC.
Figure 4–10. L1D Flush Base Address Register Fields (L1DFBAR)
  Bits 31–0: L1D flush base address (RW, +x)

Figure 4–11. L1D Flush Word Count Register Fields (L1DFWC)
  Bits 31–16: rsvd (R, +x)
  Bits 15–0: L1D flush word count (RW, +x)
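The register sequence described above can be sketched as follows. This is an illustrative fragment, not driver code: the L1DFBAR/L1DFWC addresses are not listed in this excerpt, so the register pointers are supplied by the caller (hypothetical), and the ordering (base first, then count) follows the text:

```c
#include <stdint.h>

/* Write back and invalidate a block of L1D-cached data: program
 * L1DFBAR first, then start the background operation by writing
 * L1DFWC.  Writing the word count is what begins the invalidation. */
static void l1d_block_writeback_invalidate(volatile uint32_t *l1dfbar,
                                           volatile uint32_t *l1dfwc,
                                           uint32_t start_addr,
                                           uint32_t word_count)
{
    *l1dfbar = start_addr & ~0x3u; /* word-aligned base address      */
    *l1dfwc  = word_count;         /* starts the block invalidation  */
}
```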
4.5 L2 Description
The L2 is accessible from both the L1P and the L1D. On a cache miss from the L1P or L1D, the request is first sent to the L2 to be serviced. How the L2 services the request depends on the selected operation mode of the L2. Table 4–6 shows the supported operation modes for the L2. Figure 4–13 illustrates the division of the L2 memory space according to the L2 mode. Writing to the L2MODE field of the cache configuration register (CCFG) sets the L2 mode. Figure 4–12 shows the format for the CCFG register. Table 4–6 describes the operation of this register.
ID — Invalidate L1D
  ID = 0: Normal L1D operation
  ID = 1: All L1D lines invalidated
IP — Invalidate L1P
  IP = 0: Normal L1P operation
  IP = 1: All L1P lines invalidated
P — L2 Requestor Priority
  P = 0: CPU accesses prioritized over enhanced DMA accesses
  P = 1: Enhanced DMA accesses prioritized over CPU accesses
The reset value of the L2MODE field is 000b, thus the L2 RAM is configured
as 64K bytes of mapped memory at reset to support bootloading. Any L2 RAM
that is configured as cache is no longer in the memory map. For example, in
L2 Mode 010b, the address space from 0000 8000h to 0000 FFFFh is no
longer mapped. The associativity of the L2 cache RAM is a function of the L2
Mode. Each 16K byte block of RAM included in the cache adds one way to
the associativity. The line size for the L2 cache is 128 bytes. Figure 4–13
shows the cache associativity for each L2 Mode.
Figure 4–13. L2 Memory Configuration

  The L2 memory consists of four 16K-byte blocks with base addresses
  0000 0000h, 0000 4000h, 0000 8000h, and 0000 C000h.

  L2 mode   Configuration
  000b      All SRAM (all four blocks mapped)
  001b      3/4 SRAM, 1-way cache (one block)
  010b      1/2 SRAM, 2-way cache (two blocks)
  011b      1/4 SRAM, 3-way cache (three blocks)
  111b      4-way cache (all four blocks)
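The mode-to-configuration mapping can be captured in a small C sketch. The function and type names are illustrative (not a TI API); the mapping follows the rule in the text that each 16K-byte block moved into the cache adds one way of associativity:

```c
#include <stdint.h>

/* Resulting L2 split for a given L2MODE value. */
typedef struct {
    uint32_t sram_bytes; /* L2 left in the memory map     */
    uint32_t cache_ways; /* associativity of the L2 cache */
} l2_config_t;

static int l2_mode_config(uint32_t mode, l2_config_t *cfg)
{
    switch (mode) {
    case 0x0: cfg->cache_ways = 0; break; /* 000b: all SRAM         */
    case 0x1: cfg->cache_ways = 1; break; /* 001b: 3/4 SRAM, 1-way  */
    case 0x2: cfg->cache_ways = 2; break; /* 010b: 1/2 SRAM, 2-way  */
    case 0x3: cfg->cache_ways = 3; break; /* 011b: 1/4 SRAM, 3-way  */
    case 0x7: cfg->cache_ways = 4; break; /* 111b: all cache, 4-way */
    default:  return -1;                  /* reserved mode          */
    }
    /* Each 16K-byte block in the cache adds one way. */
    cfg->sram_bytes = (4u - cfg->cache_ways) * 16u * 1024u;
    return 0;
}
```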
4.5.1 L2 Interfaces

The L2 controller services requests from three different requestors: the L1P, the L1D, and the enhanced DMA (EDMA). Since the L1P only sends read requests, a single 256-bit data bus transfers data from the L2 to the L1P. The L1D-to-L2 interface consists of a 128-bit read bus from the L2 to the L1D and a 128-bit write bus from the L1D to the L2. The L2 transfers data to and from the EDMA through a 64-bit read bus and a 64-bit write bus.

4.5.2 L2 Operation

Each L1D access to the L2 memories takes two cycles. Since the line size of the L1D cache is twice the width of the bus between the cache and the L2, a miss to the L2 requires two accesses. Therefore, a miss from the L1D to the L2 takes four cycles to complete if the data is available in the L2. A miss from the L1P to the L2 completes in five cycles.
The L2 memories are organized as four 64-bit wide banks. Two accesses can be serviced at the same time if they do not use the same bank. Since the L1P data bus is 256 bits wide, an L1P access spans all four banks, so any L1P request that occurs at the same time as an L1D or EDMA request causes a bank collision and therefore a stall. Concurrent accesses by the L1D and EDMA buses to different banks can be serviced without stalling.
The priority bit (P) in the cache configuration register (CCFG) determines the priority when a bank collision occurs between requestors. If the P bit is cleared to 0, CPU accesses (L1P and L1D) are given priority over an EDMA request; any pending CPU request completes before the EDMA request is serviced. If this bit is set to 1, EDMA requests are prioritized over CPU accesses. When an L1P and an L1D access collide, the L1P request is always given priority.
When an L2 location is operating as mapped RAM, an access to that location behaves like an access to a standard RAM: a read request returns the value stored in that location, and a write request updates that location with the new data. When an L2 location is enabled as a cache, the operation is similar to the L1D cache. If a read request is made to the L2, the tag RAM for each of the cached blocks is searched for that address. If a tag hit occurs, that data is sent to the requestor. If the data is not in the L2, the requestor is stalled and the data is requested from the enhanced DMA. To fulfill an L1P request, the L2 controller must make eight 64-bit requests to the EDMA. Similarly, four requests to the EDMA are required to service an L1D request.
The L2 uses a least recently used (LRU) replacement strategy to replace old cached data with new data. To determine which cache line to replace, the address for the new data is used to calculate the set that the address maps to. Each external address maps to one set in each cache way. That set in each way is then interrogated to determine which way contains the least recently used data, and the new data is stored at that location. If the cache location to be replaced contains valid data, the previous data is evicted. An eviction occurs as follows. The L2 first polls the L1D to determine if the evicted address is also cached in the L1D. This is referred to as snooping the L1D. If data is returned from the L1D, it is written out to the EDMA. Then, both the L1D and L2 lines are invalidated. If the L1D does not cache the evicted address, the data in the L2 is written out to the EDMA. In this case, only the L2 line is invalidated. Finally, the requested data is stored in the L2 and sent to the requestor. This mechanism ensures that the CPU does not access stale data and that no data is lost. Figure 4–14 diagrams the decision process used by the L2 controller to service a data request from the CPU when the L2 is operating as a cache. The L2 performs evictions for read and write requests.
Figure 4–14. L2 Cache Data Request Flow Chart

  1) The CPU requests data. Is the data in the L2?
     - Yes: fetch the data from the L2 and send it to the CPU. Done.
     - No: determine the LRU location.
  2) Is valid data in the LRU location?
     - No: fetch the data from the EDMA, store it in the L2, and send it
       to the CPU. Done.
     - Yes: is the replaced data in the L1D?
       - Yes: write the replaced data from the L1D to the EDMA and
         invalidate the L1D line.
       - No: write the replaced data from the L2 to the EDMA.
       Invalidate the L2 line, fetch the data from the EDMA, store it in
       the L2, and send it to the CPU. Done.
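The eviction decision flow can be modeled in a short C sketch. This is a behavioral illustration of the ordering described in the text, not driver code; the state flags and type names are our stand-ins for hardware state that this excerpt does not expose:

```c
#include <stdbool.h>

/* Inputs: what the L2 controller learns while servicing a read. */
typedef struct {
    bool in_l2;          /* tag hit in the L2?                   */
    bool lru_line_valid; /* valid data in the LRU location?      */
    bool evicted_in_l1d; /* does the L1D also cache the victim?  */
} l2_state_t;

/* Outputs: the actions the controller takes. */
typedef struct {
    bool l1d_written_to_edma; /* victim data snooped from the L1D */
    bool l2_written_to_edma;  /* victim data written from the L2  */
    bool l1d_invalidated;
    bool l2_invalidated;
    bool fetched_from_edma;
} l2_actions_t;

static l2_actions_t l2_service_read(const l2_state_t *s)
{
    l2_actions_t a = {0};
    if (s->in_l2)                /* hit: data goes straight to the CPU */
        return a;
    if (s->lru_line_valid) {     /* miss with a valid victim: evict it */
        if (s->evicted_in_l1d) { /* snoop the L1D first                */
            a.l1d_written_to_edma = true;
            a.l1d_invalidated = true;
        } else {
            a.l2_written_to_edma = true;
        }
        a.l2_invalidated = true;
    }
    a.fetched_from_edma = true;  /* fill the line and answer the CPU   */
    return a;
}
```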
The memory attribute registers (MARs) can be programmed to turn on caching of each of the external chip enable (CE) spaces. In this way, you can perform single-word reads to externally mapped devices. Without this feature, any external read would always read an entire L2 line of data. Each of the four CE spaces is divided into four ranges, each of which maps to the least significant bit of a MAR register. If a MAR register is set, the corresponding address range is cached by the L2. At reset, the MAR registers are set to 0. To begin caching data in the L2, you must initialize the appropriate MAR register to 1. The MAR registers define cacheability for the EMIF only. Addresses accessed through the EMIF that are not defined by the MAR registers are always cacheable. Figure 4–15 shows the format for the MARs. Table 4–3 illustrates which address range each MAR bit enables for caching.
Figure 4–15. L2 CE Space Allocation Register Fields

  Each MAR register has the same format:
    Bits 31–1: rsvd (R, +x)
    Bit 0: CE space cache enable (RW, +0)

  Register   Bit 0 enables caching of
  MAR0       CE 0.0
  MAR1       CE 0.1
  MAR2       CE 0.2
  MAR3       CE 0.3
  MAR4       CE 1.0
  MAR5       CE 1.1
  MAR6       CE 1.2
  MAR7       CE 1.3
  MAR8       CE 2.0
  MAR9       CE 2.1
  MAR10      CE 2.2
  MAR11      CE 2.3
  MAR12      CE 3.0
  MAR13      CE 3.1
  MAR14      CE 3.2
  MAR15      CE 3.3
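Enabling caching of a CE range reduces to setting bit 0 of the corresponding MAR register. A minimal sketch, with the register pointer supplied by the caller since this excerpt does not list the MAR addresses (hypothetical):

```c
#include <stdint.h>

/* Enable L2 caching of one external CE range by setting bit 0 of the
 * corresponding MAR register (MARs reset to 0, i.e., not cached). */
static void mar_enable_caching(volatile uint32_t *mar)
{
    *mar |= 0x1u; /* bit 0 = cache enable; bits 31-1 are reserved */
}
```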
EDMA accesses are only allowed to L2 space that is configured as mapped RAM. When the EDMA makes a read request to the L2, the L2 snoops the data from the L1D and stalls the EDMA until a response is returned. If data that must be updated is returned, that data is placed in the L2 and the EDMA request proceeds. In this case, the L1D line is invalidated to maintain coherency. If the L1D does not return data to the L2, the data is read from the L2. The L2 does not snoop the L1P for data when an EDMA read request is received, because the CPU cannot modify data in the L1P, so its data cannot become incoherent. When the EDMA makes a write request to the L2, both the L1P and the L1D are snooped for the data. Both must be notified of the write because the L2 has no knowledge of the type of data being written by the EDMA, whether program or data. If the L1P responds that it is caching the addressed data, that line is invalidated and the data is written into the L2. Similarly, if the L1D is caching that address, that line in the L1D is invalidated and the data is written to the L2. By invalidating the lines in the L1P or the L1D, the correct data is fetched from the L2 on the next CPU request of that data.
4.5.4 L2 Invalidation

The methods for user-controlled invalidation of data in the L2 are similar to those for the L1P and the L1D. For the L2, however, there are two types of invalidation. The first type is an L2 flush. During a flush, the contents of the L2 are copied out through the enhanced DMA. As with an EDMA read or an L2 data eviction, the L1D is snooped for any modified (dirty) data that is being copied out by the flush. The second type of L2 invalidation is a clean. The clean operation copies data from the L2 through the EDMA to the external memory space and snoops data from the L1D. In addition, the clean operation invalidates any line in the L1P, L1D, or L2 that caches data that is copied to the external memory space.
To initiate an L2 flush of the entire L2 cache space, write a 1 to the F bit of the L2FLUSH register. This bit remains set to 1 until the flush is complete, at which time the register is cleared to 0 by the L2 controller. Figure 4–16 shows the fields of the L2FLUSH register. Table 4–8 describes the operation of the L2FLUSH register. Similarly, to initiate an L2 clean of the entire L2 cache space, set the C bit of the L2CLEAN register to 1. This bit remains set to 1 until the clean is complete, at which time the register is cleared to 0. Figure 4–17 shows the fields of the L2CLEAN register. Table 4–9 describes the operation of the L2CLEAN register.
Figure 4–16. L2 Flush Register Fields (L2FLUSH)
  Bits 31–1: rsvd (R, +x)
  Bit 0: F (RW, +0)

Table 4–8. L2 Flush Register Fields Description
  Field   Description
  F       Flush L2
          F = 0: Normal L2 operation
          F = 1: All L2 lines flushed

Figure 4–17. L2 Clean Register Fields (L2CLEAN)
  Bits 31–1: rsvd (R, +x)
  Bit 0: C (RW, +0)

Table 4–9. L2 Clean Register Fields Description
  Field   Description
  C       Clean L2
          C = 0: Normal L2 operation
          C = 1: All L2 lines cleaned
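The set-then-poll protocol for whole-cache flush and clean can be sketched as below. This is an illustrative fragment: the L2FLUSH/L2CLEAN addresses are not listed in this excerpt, so the register pointer is caller-supplied (hypothetical), and the same two helpers serve either register:

```c
#include <stdbool.h>
#include <stdint.h>

/* Start a whole-cache L2 flush (via L2FLUSH) or clean (via L2CLEAN)
 * by setting bit 0; the L2 controller clears the bit when finished. */
static void l2_global_op_start(volatile uint32_t *reg)
{
    *reg = 0x1u;               /* F or C bit: request the operation   */
}

static bool l2_global_op_done(volatile uint32_t *reg)
{
    return (*reg & 0x1u) == 0; /* bit cleared to 0 means complete     */
}
```

A caller would typically spin: `l2_global_op_start(reg); while (!l2_global_op_done(reg)) ;`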
It is also possible to flush and clean a range of addresses from the L2. To flush a range of addresses from the L2, write the word-aligned address for the start of the flush into the L2FBAR. The number of words to be flushed equals the value written into the L2FWC register. The L2 controller then searches all L2 cache blocks for all lines whose external memory address falls within the range from L2FBAR to L2FBAR+L2FWC–4 and copies that data through the EDMA to the external memory space. The L1D is snooped to ensure that the correct data is stored in the original memory location. The L2 flush occurs in the background and does not stall any pending CPU accesses. The flush begins when the L2FWC is written; therefore, you should take care to ensure that the L2FBAR register is set up correctly prior to writing the L2FWC. Figure 4–18 shows the fields in the L2FBAR register. Figure 4–19 shows the fields in the L2FWC register.
Figure 4–18. L2 Flush Base Address Register Fields (L2FBAR)
  Bits 31–0: L2 flush base address (RW, +x)

Figure 4–19. L2 Flush Word Count Register Fields (L2FWC)
  Bits 31–16: rsvd (R, +x)
  Bits 15–0: L2 flush word count (RW, +x)
To clean a range of addresses from the L2, write the word-aligned address for the start of the clean into the L2CBAR. The number of words to clean equals the value written into the L2CWC register. The L2 controller then searches all L2 cache blocks for all lines whose external memory address falls within the range from L2CBAR to L2CBAR+L2CWC–4 and copies that data through the EDMA to the external memory space. The L1D is snooped to ensure that the correct data is stored in the original memory location. In addition to snooping data from the L1D, any L1P or L1D lines that cache a cleaned address are invalidated. The L2 clean occurs in the background and does not stall any pending CPU accesses. The clean begins when the L2CWC is written; therefore, you should take care to ensure that the L2CBAR register is set up correctly prior to writing the L2CWC. If L2CBAR or L2CWC is not aligned to the L2 line size (32 words), all lines that contain the words specified are invalidated. However, only those words that are contained in the range from L2CBAR to L2CBAR+L2CWC–4 are saved to the external memory space. Figure 4–20 shows the fields in the L2CBAR register. Figure 4–21 shows the fields in the L2CWC register.
Figure 4–20. L2 Clean Base Address Register Fields (L2CBAR)
  Bits 31–0: L2 clean base address (RW, +x)

Figure 4–21. L2 Clean Word Count Register Fields (L2CWC)
  Bits 31–16: rsvd (R, +x)
  Bits 15–0: L2 clean word count (RW, +x)
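The block flush and block clean share the same base-then-count sequence, so one sketch covers both. As before, the register addresses are not listed in this excerpt, so the pointers are caller-supplied (hypothetical): pass L2FBAR/L2FWC for a flush or L2CBAR/L2CWC for a clean:

```c
#include <stdint.h>

/* Write back (flush) or write back and invalidate (clean) a range of
 * addresses from the L2: program the base address register first, then
 * start the background operation by writing the word count. */
static void l2_block_op(volatile uint32_t *bar, volatile uint32_t *wc,
                        uint32_t start_addr, uint32_t word_count)
{
    *bar = start_addr & ~0x3u; /* word-aligned start of the range     */
    *wc  = word_count;         /* writing the count begins the flush  */
                               /* or clean in the background          */
}
```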
If more than one block invalidation, block flush, or block clean is requested at
one time, the CPU is stalled until all are completed. For example, if an L1P
invalidate is being processed and you set up an L2 clean by writing to the
L2CWC register, the CPU is stalled until both the L1P invalidate and L2 clean
are complete.
Chapter 5
Direct Memory Access (DMA) Controller
This chapter describes the direct memory access channels and registers
available for the TMS320C6201/C6202/C6701 devices.
The direct memory access (DMA) controller transfers data between regions
in the memory map without intervention by the CPU. The DMA controller allows movement of data to and from internal memory, internal peripherals, or
external devices to occur in the background of CPU operation. The DMA controller has four independent programmable channels, allowing four different
contexts for DMA operation. In addition, a fifth (auxiliary) channel allows the
DMA controller to service requests from the host port interface (HPI). In discussing DMA operations, several terms are important:
- Read transfer: The DMA controller reads a data element from a source
location in memory.
- Write transfer: The DMA controller writes the data element that was read
during a read transfer to its destination in memory.
- Element transfer: The combined read and write transfer for a single data element.
- Frame transfer: Each DMA channel has an independently programmable
number of elements per frame. In completing a frame transfer, the DMA
controller moves all elements in a single frame.
- Block transfer: Each DMA channel also has an independently program-
mable number of frames per block. In completing a block transfer, the
DMA controller moves all frames that it has been programmed to move.
- Transmit element transfer: In split mode, a data element is read from the source address and written to the split destination address. See section 5.8 for details.
- Receive element transfer: In split mode, a data element is read from the split source address and written to the destination address. See section 5.8 for details.
The DMA controller has the following features:
- Background operation: The DMA controller operates independently of the
CPU.
- High throughput: Elements can be transferred at the CPU clock rate. See section 5.11 on page 5-35 for more information.
- Four channels: The DMA controller can keep track of the contexts of four independent block transfers. See section 5.2, DMA Registers, on page 5-5 for more information about saving the contexts of multiple block transfers.
- Auxiliary channel: This channel allows the host port to make requests into the CPU's memory space. The auxiliary channel requests may be prioritized relative to the other channels and the CPU.
- Split-channel operation: A single channel can be used to perform both the receive and transmit element transfers from or to a peripheral simultaneously, effectively acting like two DMA channels. See section 5.8 on page 5-28 for more information.
- Multiframe transfer: Each block transfer can consist of multiple frames of a programmable size. See section 5.5, Transfer Counting, for more information.
- Programmable priority: Each channel has independently programmable priority versus the CPU.
- Programmable address generation: Each channel's source and destination address registers can have configurable indexes for each read and write transfer. The address can remain constant, increment, decrement, or be adjusted by a programmable value. The programmable value allows an index for the last transfer in a frame distinct from that used for the preceding transfers. See section 5.7.1 on page 5-22 for more information.
- Full 32-bit address range: The DMA controller can access any region in
the memory map:
  - On-chip data memory
  - On-chip program memory when it is mapped into memory space rather than being used as cache
  - On-chip peripherals
  - External memory via the EMIF
  - Expansion memory via the expansion bus
- Programmable-width transfers: Each channel can be independently configured to transfer bytes, 16-bit halfwords, or 32-bit words. See section 5.7.3 on page 5-23 for more information.
- Autoinitialization: Once a block transfer is complete, a DMA channel can
automatically reinitialize itself for the next block transfer. See section 5.4.1
on page 5-13 for more information.
- Event synchronization: Each read, write, or frame transfer may be initiated
by selected events. See Section 5.6 on page 5-17 for more information.
- Interrupt generation: On completion of each frame transfer or block transfer,
as well as on various error conditions, each DMA channel can send an interrupt to the CPU. See section 5.10 on page 5-33 for more information.
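The programmable address generation described in the feature list can be illustrated with a behavioral sketch. The enum and parameter names here are ours (the actual DIR and INDEX register encodings appear in the channel control register descriptions); the model applies a distinct adjustment after the last element of each frame, as the text describes:

```c
#include <stdint.h>

/* Address direction modes, mirroring the four SRC/DST DIR behaviors:
 * constant, increment, decrement, or programmable index adjust. */
enum dir { DIR_NONE, DIR_INC, DIR_DEC, DIR_INDEX };

/* Compute the address after one block transfer of `frames` frames of
 * `elems_per_frame` elements each.  In DIR_INDEX mode, `elem_index`
 * is applied after each element except the last in a frame, which
 * uses `frame_index` instead. */
static uint32_t dma_final_addr(uint32_t addr, enum dir dir,
                               uint32_t elem_size,
                               int32_t elem_index, int32_t frame_index,
                               uint32_t elems_per_frame, uint32_t frames)
{
    for (uint32_t f = 0; f < frames; f++) {
        for (uint32_t e = 0; e < elems_per_frame; e++) {
            int last_in_frame = (e == elems_per_frame - 1);
            switch (dir) {
            case DIR_NONE:  break;
            case DIR_INC:   addr += elem_size; break;
            case DIR_DEC:   addr -= elem_size; break;
            case DIR_INDEX: /* distinct adjustment at a frame boundary */
                addr += (uint32_t)(last_in_frame ? frame_index
                                                 : elem_index);
                break;
            }
        }
    }
    return addr;
}
```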
Figure 5–1 shows the ’C6000 block diagram with the DMA-related components shaded.
Figure 5–1. DMA Controller Interconnect to TMS320C6201/C6202/C6701 Memory-Mapped Modules
  (Block diagram. Modules shown: timers, interrupt selector, McBSPs, HPI
  control, DMA control, and EMIF control on the peripheral bus; the
  peripheral bus controller; the DMA controller; the EMIF; the host
  port/expansion bus; PLL, power-down, and boot-configuration logic; the
  data memory and data memory controller; the program memory/cache and
  program memory controller; and the CPU core with program fetch,
  instruction dispatch, instruction decode, and data paths 1 and 2.)
5.2 DMA Registers
The DMA registers configure the operation of the DMA controller. Table 5–1
and Table 5–2 show how the DMA control registers are mapped in memory.
These registers include the DMA global data, count reload, index, and address
registers, as well as independent control registers for each channel.
Table 5–1. DMA Control Registers by Address

  Hex Byte Address   Register                            Section
  0184 0000          DMA channel 0 primary control       5.2.1
  0184 0004          DMA channel 2 primary control       5.2.1
  0184 0008          DMA channel 0 secondary control     5.10
  0184 000C          DMA channel 2 secondary control     5.10
  0184 0010          DMA channel 0 source address        5.7
  0184 0014          DMA channel 2 source address        5.7
  0184 0018          DMA channel 0 destination address   5.7
  0184 001C          DMA channel 2 destination address   5.7
  0184 0020          DMA channel 0 transfer counter      5.5
  0184 0024          DMA channel 2 transfer counter      5.5
  0184 0028          DMA global count reload register A  5.5
  0184 002C          DMA global count reload register B  5.5
  0184 0030          DMA global index register A         5.7.2
  0184 0034          DMA global index register B         5.7.2
  0184 0038          DMA global address register A       5.8
  0184 003C          DMA global address register B       5.8

Table 5–2. DMA Control Registers by Register Name

  Register                            Hex Byte Address   Section
  DMA channel 3 primary control       0184 0044          5.2.1
  DMA channel 3 secondary control     0184 004C          5.10
  DMA channel 3 source address        0184 0054          5.7
  DMA channel 3 destination address   0184 005C          5.7
  DMA channel 3 transfer counter      0184 0064          5.5
  DMA global address register A       0184 0038          5.8
  DMA global address register B       0184 003C          5.8
  DMA global address register C       0184 0068          5.8
  DMA global address register D       0184 006C          5.8
  DMA global count reload register A  0184 0028          5.5
  DMA global count reload register B  0184 002C          5.5
  DMA global index register A         0184 0030          5.7.2
  DMA global index register B         0184 0034          5.7.2
5.2.1 DMA Channel Control Registers

The DMA channel primary and secondary control registers (Figure 5–2 and Figure 5–3) contain fields that control each DMA channel independently. These fields are summarized in Table 5–3 and Table 5–4.
Table 5–3. DMA Channel Primary Control Register Field Descriptions

  Field                    Description                                               Section
  DST RELOAD, SRC RELOAD   Source/destination address reload for autoinitialization  5.4.1.1
                           SRC/DST RELOAD = 00b: do not reload during autoinitialization
                           SRC/DST RELOAD = 01b: use DMA global address register B as reload
                           SRC/DST RELOAD = 10b: use DMA global address register C as reload
                           SRC/DST RELOAD = 11b: use DMA global address register D as reload
  EMOD                     Emulation mode                                            5.13
                           EMOD = 0: DMA channel keeps running during an emulation halt
                           EMOD = 1: DMA channel pauses during an emulation halt
  FS                       Frame synchronization                                     5.6
                           FS = 0: disable
                           FS = 1: RSYNC event used to synchronize entire frame
  TCINT                    Transfer controller interrupt                             5.10
                           TCINT = 0: interrupt disabled
                           TCINT = 1: interrupt enabled
  PRI                      Priority mode: DMA versus CPU                             5.9
                           PRI = 0: CPU priority
                           PRI = 1: DMA priority
  WSYNC, RSYNC             Read transfer/write transfer synchronization              5.6
                           (R/W)SYNC = 00000b: no synchronization
                           (R/W)SYNC = other: sets synchronization event
Table 5–3. DMA Channel Primary Control Register Field Descriptions (Continued)

  Field              Description                                                      Section
  INDEX              Selects the DMA global data register to use as a programmable    5.7.2
                     index
                     INDEX = 0: use DMA global index register A
                     INDEX = 1: use DMA global index register B
  CNT RELOAD         Transfer counter reload for autoinitialization and multiframe    5.4.1.1
                     transfers
                     CNT RELOAD = 0: reload with DMA global count reload register A
                     CNT RELOAD = 1: reload with DMA global count reload register B
  SPLIT              Split-channel mode                                               5.8
                     SPLIT = 00b: split-channel mode disabled
                     SPLIT = 01b: split-channel mode enabled; use DMA global address register A as split address
                     SPLIT = 10b: split-channel mode enabled; use DMA global address register B as split address
                     SPLIT = 11b: split-channel mode enabled; use DMA global address register C as split address
  ESIZE              Element size                                                     5.7.3
  DST DIR, SRC DIR   Source/destination address modification after element transfers  5.7.1, 5.7.2
                     SRC/DST DIR = 00b: no modification
                     SRC/DST DIR = 01b: increment by element size in bytes
                     SRC/DST DIR = 10b: decrement by element size in bytes
                     SRC/DST DIR = 11b: adjust using DMA global index register selected by INDEX
  STATUS             Channel status                                                   5.4
                     STATUS = 00b: stopped
                     STATUS = 01b: running without autoinitialization
                     STATUS = 10b: paused
                     STATUS = 11b: running with autoinitialization
  START              Channel start control                                            5.4
                     START = 00b: stop
                     START = 01b: start without autoinitialization
                     START = 10b: pause
                     START = 11b: start with autoinitialization
Figure 5–3. DMA Channel Secondary Control Register

  Field                    Description                                        Section
  RSYNC STAT, WSYNC STAT   Read or write synchronization status               5.6.1
                           STAT = 0: synchronization is not received
                           STAT = 1: synchronization is received
  RSYNC CLR, WSYNC CLR     Read or write synchronization status clear;        5.6.1
                           read as 0, write 1 to clear the associated status
  DMAC EN                  DMAC pin control                                   5.12
                           DMAC EN = 000b: DMAC pin is held low
                           DMAC EN = 001b: DMAC pin is held high
                           DMAC EN = 010b: DMAC reflects RSYNC STAT
                           DMAC EN = 011b: DMAC reflects WSYNC STAT
                           DMAC EN = 100b: DMAC reflects FRAME COND
                           DMAC EN = 101b: DMAC reflects BLOCK COND
                           DMAC EN = other: reserved
The DMA channel secondary control register of the 'C6202 has been expanded to include three new fields: WSPOL, RSPOL, and FSIG. These fields add control to frame-synchronized data transfers. The 'C6202 secondary control register is shown in Figure 5–4; the new fields are shown in gray. Table 5–5 describes the possible configurations of the new fields.
Figure 5–4. TMS320C6202 Secondary Control Register

  Field          Description                                           Section
  WSPOL, RSPOL   Selects the polarity of an external sync event:       5.6.3
                 1 = active low, 0 = active high.
                 This field is valid only if EXT_INTx is selected.
  FSIG           Frame sync ignore. Setting FSIG = 1 causes the DMA    5.6.3
                 channel to ignore any event transitions during the
                 current burst. Synchronization is level-triggered
                 instead of edge-triggered.
5.3 Memory Map

The DMA controller assumes the device memory map shown in Chapter 10, Boot Configuration, Reset, and Memory Maps. Requests are sent to one of five resources:

- Expansion bus
- External memory interface
- Internal program memory
- Internal peripheral bus
- Internal data memory

The source address is assumed to point to one of these spaces throughout a block transfer. This constraint also applies to the destination address.