Texas Instruments and its subsidiaries (TI) reserve the right to make changes to their products
or to discontinue any product or service without notice, and advise customers to obtain the latest
version of relevant information to verify, before placing orders, that information being relied on
is current and complete. All products are sold subject to the terms and conditions of sale supplied
at the time of order acknowledgement, including those pertaining to warranty, patent
infringement, and limitation of liability.
TI warrants performance of its semiconductor products to the specifications applicable at the
time of sale in accordance with TI’s standard warranty. Testing and other quality control
techniques are utilized to the extent TI deems necessary to support this warranty. Specific testing
of all parameters of each device is not necessarily performed, except those mandated by
government requirements.
CERTAIN APPLICATIONS USING SEMICONDUCTOR PRODUCTS MAY INVOLVE
POTENTIAL RISKS OF DEATH, PERSONAL INJURY, OR SEVERE PROPERTY OR
ENVIRONMENTAL DAMAGE (“CRITICAL APPLICATIONS”). TI SEMICONDUCTOR
PRODUCTS ARE NOT DESIGNED, AUTHORIZED, OR WARRANTED TO BE SUITABLE FOR
USE IN LIFE-SUPPORT DEVICES OR SYSTEMS OR OTHER CRITICAL APPLICATIONS.
INCLUSION OF TI PRODUCTS IN SUCH APPLICATIONS IS UNDERSTOOD TO BE FULLY
AT THE CUSTOMER’S RISK.
In order to minimize risks associated with the customer’s applications, adequate design and
operating safeguards must be provided by the customer to minimize inherent or procedural
hazards.
TI assumes no liability for applications assistance or customer product design. TI does not
warrant or represent that any license, either express or implied, is granted under any patent right,
copyright, mask work right, or other intellectual property right of TI covering or relating to any
combination, machine, or process in which such semiconductor products or services might be
or are used. TI’s publication of information regarding any third party’s products or services does
not constitute TI’s approval, warranty or endorsement thereof.
Copyright 1999, Texas Instruments Incorporated
Preface
Read This First
About This Manual
This reference guide describes the CPU architecture, pipeline, instruction set,
and interrupts for the TMS320C6000 digital signal processors (DSPs). Unless
otherwise specified, all references to the ’C6000 refer to the TMS320C6000
platform of DSPs, ’C62x refers to the TMS320C62x fixed-point DSPs in the
’C6000 platform, and ’C67x refers to the TMS320C67x floating-point DSPs in
the ’C6000 platform.
How to Use This Manual
Use this manual as a reference for the architecture of the TMS320C6000 CPU.
First-time readers should read Chapter 1 for general information about TI
DSPs, the features of the ’C6000, and the applications for which the ’C6000
is best suited.
Read chapters 2, 5, 6, and 7 to grasp the concepts of the architecture. Chapters 3
and 4 contain detailed information about each instruction and are
best used as reference material; however, you may want to read sections 3.1
through 3.9 and sections 4.1 through 4.6 for general information about the
instruction set and to understand the instruction descriptions, then browse
through Chapter 3 and Chapter 4 to familiarize yourself with the instructions.
The following table gives chapter references for specific information:
If you are looking for information about:    Turn to these chapters:
Addressing modes                             Chapter 3, Instruction Set
                                             Chapter 4, Instruction Set
Conditional operations                       Chapter 3, Instruction Set
                                             Chapter 4, Instruction Set
Control registers                            Chapter 2, CPU architecture and data
To help you easily recognize instructions and parameters throughout the
book, instructions are in bold face and parameters are in italics (except
in program listings).
Portions of a syntax that are in bold face should be entered as shown; portions
of a syntax that are in italics describe the type of information that should be
entered. Here is an example of an instruction:
MPY src1,src2,dst
MPY is the instruction mnemonic. When you use MPY, you must supply
two source operands (src1 and src2) and a destination operand (dst) of
appropriate types as defined in Chapter 3, TMS320C62x/C67x Fixed-Point
Instruction Set.
Although the instruction mnemonic (MPY in this example) is in capital letters,
the ’C6x assembler is not case sensitive; it can assemble mnemonics
entered in either upper or lower case.
- Square brackets, [ and ], and parentheses, ( and ), are used to identify
optional items. If you use an optional item, you must specify the information
within brackets or parentheses; however, you do not enter the brackets or
parentheses themselves. Here is an example of an instruction that has optional items.
[label] EXTU (.unit) src2, csta, cstb, dst
The EXTU instruction is shown with a label and several parameters. The
[label] and the parameter (.unit) are optional. The parameters src2, csta,
cstb, and dst are not optional.
- Throughout this book MSB means most significant bit and LSB means
least significant bit.
- A special icon is used to indicate material that applies only to the
floating-point (’C67x) DSP.
Related Documentation From Texas Instruments
The following books describe the TMS320C6x generation and related support
tools. To obtain a copy of any of these TI documents, call the Texas Instruments
Literature Response Center at (800) 477–8924. When ordering, please
identify the book by its title and literature number.
TMS320C62x/C67x Technical Brief (literature number SPRU197) gives an
introduction to the ’C62x/C67x digital signal processors, development
tools, and third-party support.
TMS320C6201 Digital Signal Processor Data Sheet (literature number
SPRS051) describes the features of the TMS320C6201 and provides
pinouts, electrical specifications, and timings for the device.
TMS320C6202 Digital Signal Processor Data Sheet (literature number
SPRS072) describes the features of the TMS320C6202 fixed-point DSP
and provides pinouts, electrical specifications, and timings for the device.
TMS320C6211 Digital Signal Processor Data Sheet (literature number
SPRS073) describes the features of the TMS320C6211 fixed-point DSP
and provides pinouts, electrical specifications, and timings for the device.
TMS320C6701 Digital Signal Processor Data Sheet (literature number
SPRS067) describes the features of the TMS320C6701 floating-point
DSP and provides pinouts, electrical specifications, and timings for the
device.
TMS320C6000 Peripherals Reference Guide (literature number SPRU190)
describes common peripherals available on the TMS320C6000 digital
signal processors. This book includes information on the internal data
and program memories, the external memory interface (EMIF), the host
port, serial ports, direct memory access (DMA), clocking and phase-locked
loop (PLL), and the power-down modes.
TMS320C62x/C67x Programmer’s Guide (literature number SPRU198)
describes ways to optimize C and assembly code for the
TMS320C62x/C67x DSPs and includes application program examples.
TMS320C6000 Assembly Language Tools User’s Guide (literature number
SPRU186) describes the assembly language tools (assembler, linker,
and other tools used to develop assembly language code), assembler
directives, macros, common object file format, and symbolic debugging
directives for the ’C6000 generation of devices.
TMS320C6000 Optimizing C Compiler User’s Guide (literature number
SPRU187) describes the ’C6000 C compiler and the assembly optimizer.
This C compiler accepts ANSI standard C source code and produces
assembly language source code for the ’C6000 generation of devices. The
assembly optimizer helps you optimize your assembly code.
TMS320 Third-Party Support Reference Guide (literature number
SPRU052) alphabetically lists over 100 third parties that provide various
products that serve the family of TMS320 digital signal processors. A
myriad of products and applications are offered: software and hardware
development tools, speech recognition, image processing, noise
cancellation, modems, etc.
Trademarks
TI, XDS510, VelociTI, and 320 Hotline On-line are trademarks of Texas Instruments Incorporated.
Windows and Windows NT are registered trademarks of Microsoft Corporation.
If You Need Assistance
- World-Wide Web Sites
  TI Online                                         http://www.ti.com
  Semiconductor Product Information Center (PIC)    http://www.ti.com/sc/docs/pic/home.htm
  DSP Solutions                                     http://www.ti.com/dsps
  320 Hotline On-line                               http://www.ti.com/sc/docs/dsps/support.htm
- North America, South America, Central America
  Product Information Center (PIC)            (972) 644-5580
  TI Literature Response Center U.S.A.        (800) 477-8924
  Software Registration/Upgrades              (214) 638-0333    Fax: (214) 638-7742
  U.S.A. Factory Repair/Hardware Upgrades     (281) 274-2285
  U.S. Technical Training Organization        (972) 644-5580
  DSP Hotline                                 (281) 274-2320    Fax: (281) 274-2324    Email: dsph@ti.com
  DSP Modem BBS                               (281) 274-2323
  DSP Internet BBS via anonymous ftp to ftp://ftp.ti.com/pub/tms320bbs
- Europe, Middle East, Africa
  European Product Information Center (EPIC) Hotlines:
  Literature Response Center    +852 2 956 7288    Fax: +852 2 956 2200
  Hong Kong DSP Hotline         +852 2 956 7268    Fax: +852 2 956 1002
  Korea DSP Hotline             +82 2 551 2804     Fax: +82 2 551 2828
  Korea DSP Modem BBS           +82 2 551 2914
  Singapore DSP Hotline                            Fax: +65 390 7179
  Taiwan DSP Hotline            +886 2 377 1450    Fax: +886 2 377 2718
  Taiwan DSP Modem BBS          +886 2 376 2592
  Taiwan DSP Internet BBS via anonymous ftp to ftp://dsp.ee.tit.edu.tw/pub/TI/
- Japan
  Product Information Center    +0120-81-0026 (in Japan)                 Fax: +0120-81-0036 (in Japan)
                                +03-3457-0972 or (INTL) 813-3457-0972    Fax: +03-3457-1259 or (INTL) 813-3457-1259
  DSP Hotline                   +03-3769-8735 or (INTL) 813-3769-8735    Fax: +03-3457-7071 or (INTL) 813-3457-7071
  DSP BBS via Nifty-Serve       Type “Go TIASP”
- Documentation
  When making suggestions or reporting errors in documentation, please include the following information that is on the title
  page: the full title of the book, the publication date, and the literature number.
Note: When calling a Literature Response Center to order documentation, please specify the literature number of the
book.
Contents
Summarizes the features of the TMS320 family of products and presents typical applications.
Describes the TMS320C62x/C67x DSPs and lists their key features.
Describes the assembly language instructions that are common to both the TMS320C62x and
TMS320C67x, including examples of each instruction. Provides information about addressing
modes, resource constraints, parallel operations, and conditional operations.
Describes the TMS320C67x floating-point instruction set, including examples of each
instruction. Provides information about addressing modes and resource constraints.
The TMS320C6x generation of digital signal processors is part of the TMS320
family of digital signal processors (DSPs). The TMS320C62x devices are
fixed-point DSPs in the TMS320C6x generation, and the TMS320C67x
devices are floating-point DSPs in the TMS320C6x generation. The
TMS320C62x and TMS320C67x are code compatible and both use the
VelociTI architecture, a high-performance, advanced VLIW (very long
instruction word) architecture, making these DSPs excellent choices for multichannel and multifunction applications.
The VelociTI architecture of the ’C62x and ’C67x makes them the first off-the-shelf DSPs to use advanced VLIW to achieve high performance through
increased instruction-level parallelism. A traditional VLIW architecture
consists of multiple execution units running in parallel, performing multiple
instructions during a single clock cycle. Parallelism is the key to extremely high
performance, taking these DSPs well beyond the performance capabilities of
traditional superscalar designs. VelociTI is a highly deterministic architecture,
having few restrictions on how or when instructions are fetched, executed, or
stored. It is this architectural flexibility that is key to the breakthrough efficiency
levels of the ’C6x compiler. VelociTI’s advanced features include:
- Instruction packing: reduced code size
- All instructions can operate conditionally: flexibility of code
- Variable-width instructions: flexibility of data types
1.1 TMS320 Family Overview
The TMS320 family consists of fixed-point, floating-point, and multiprocessor
digital signal processors (DSPs). TMS320 DSPs have an architecture designed
specifically for real-time signal processing.
1.1.1 History of TMS320 DSPs
In 1982, Texas Instruments introduced the TMS32010, the first fixed-point
DSP in the TMS320 family. Before the end of the year, Electronic Products
magazine awarded the TMS32010 the title “Product of the Year”. Today, the
TMS320 family consists of many generations: ’C1x, ’C2x, ’C2xx, ’C5x, and
’C54x fixed-point DSPs; ’C3x and ’C4x floating-point DSPs; and ’C8x
multiprocessor DSPs. Now there is a new generation of DSPs, the TMS320C6x
generation, with performance and features that reflect Texas Instruments’
commitment to lead the world in DSP solutions.
1.1.2 Typical Applications for the TMS320 Family
Table 1–1 lists some typical applications for the TMS320 family of DSPs. The
TMS320 DSPs offer adaptable approaches to traditional signal-processing
problems. They also support complex applications that often require multiple
operations to be performed simultaneously.
Table 1–1. Typical Applications for the TMS320 DSPs

Automotive
  Adaptive ride control
  Antiskid brakes
  Cellular telephones
  Digital radios
  Engine control
  Global positioning
  Navigation
  Vibration analysis
  Voice commands
Consumer
  Digital radios/TVs
  Educational toys
  Music synthesizers
  Pagers
  Power tools
  Radar detectors
  Solid-state answering machines
Control
  Disk drive control
  Engine control
  Laser printer control
  Motor control
  Robotics control
  Servo control
General Purpose
  Adaptive filtering
  Convolution
  Correlation
  Digital filtering
  Fast Fourier transforms
  Hilbert transforms
  Waveform generation
  Windowing
Graphics/Imaging
Industrial
  Numeric control
  Power-line monitoring
  Robotics
  Security access
Instrumentation
  Digital filtering
  Function generation
  Pattern matching
  Phase-locked loops
  Seismic processing
  Spectrum analysis
  Transient analysis
Medical
Military
  Image processing
  Missile guidance
  Navigation
  Radar processing
  Radio frequency modems
  Secure communications
  Sonar processing
Telecommunications
  1200- to 56 600-bps modems
  Adaptive equalizers
  ADPCM transcoders
  Base stations
  Cellular telephones
  Channel multiplexing
  Data encryption
  Digital PBXs
  Digital speech interpolation (DSI)
  DTMF encoding/decoding
  Echo cancellation
  Faxing
  Future terminals
  Line repeaters
  Personal communications systems (PCS)
  Personal digital assistants (PDA)
  Speaker phones
  Spread spectrum communications
  Digital subscriber loop (xDSL)
  Video conferencing
  X.25 packet switching
Voice/Speech
1.2 Overview of the TMS320C6x Generation of Digital Signal Processors
With a performance of up to 1600 million instructions per second (MIPS) and
an efficient C compiler, the TMS320C6x DSPs give system architects unlimited possibilities to differentiate their products. High performance, ease of use,
and affordable pricing make the TMS320C6x generation the ideal solution for
multichannel, multifunction applications, such as:
- Pooled modems
- Wireless local loop base stations
- Beam-forming base stations
- Remote access servers (RAS)
- Digital subscriber loop (DSL) systems
- Cable modems
- Multichannel telephony systems
- Virtual reality 3-D graphics
- Speech recognition
- Audio
- Radar
- Atmospheric modeling
- Finite element analysis
- Imaging (examples: fingerprint recognition, ultrasound, and MRI)
The TMS320C6x generation is also an ideal solution for exciting new applications; for example:
- Personalized home security with face and hand/fingerprint recognition
- Advanced cruise control with global positioning systems (GPS) navigation
and accident avoidance
- Remote medical diagnostics
1.3 Features and Options of the TMS320C62x/C67x
The ’C62x devices operate at 200 MHz (5-ns cycle time). The ’C67x devices
operate at 167 MHz (6-ns cycle time). Both DSPs execute up to eight 32-bit
instructions every cycle. The device’s core CPU consists of 32 general-purpose
registers of 32-bit word length and eight functional units:
- Two multipliers
- Six ALUs
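The clock rates and issue width quoted above determine the peak instruction rates given elsewhere in this chapter. The following is a minimal sketch of that arithmetic; the function and constant names are illustrative only and are not part of any TI tool.

```python
# Peak instruction rate = instructions issued per cycle x clock frequency.
# Names below are hypothetical helpers used only for this illustration.
INSTRUCTIONS_PER_CYCLE = 8   # up to eight 32-bit instructions every cycle
INSTRUCTION_BITS = 32


def peak_mips(clock_mhz, per_cycle=INSTRUCTIONS_PER_CYCLE):
    """Return the peak rate in millions of instructions per second."""
    return per_cycle * clock_mhz


def cycle_time_ns(clock_mhz):
    """Return the clock period in nanoseconds."""
    return 1000.0 / clock_mhz


# Width of the group of instructions that can be issued in one cycle.
fetch_width_bits = INSTRUCTIONS_PER_CYCLE * INSTRUCTION_BITS

print(peak_mips(200))        # 'C62x at 200 MHz: 1600
print(peak_mips(167))        # 'C67x at 167 MHz: 1336
print(fetch_width_bits)      # 256
```

The 256-bit result matches the instruction-fetch width of the program memory port described in section 1.4.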
The ’C62x/C67x have a complete set of optimized development tools, including an efficient C compiler, an assembly optimizer for simplified assembly-language programming and scheduling, and a Windows-based debugger
interface for visibility into source code execution characteristics. A hardware
emulation board, compatible with the TI XDS510 emulator interface, is also
available. This tool complies with IEEE Standard 1149.1–1990, IEEE Standard Test Access Port and Boundary-Scan Architecture.
Features of the ’C62x/C67x include:
- Advanced VLIW CPU with eight functional units, including two multipliers
and six arithmetic units
  - Executes up to eight instructions per cycle for up to ten times the
    performance of typical DSPs
  - Allows designers to develop highly effective RISC-like code for fast
    development time
- Instruction packing
  - Gives code size equivalence for eight instructions executed serially or
    in parallel
  - Reduces code size, program fetches, and power consumption
- All instructions execute conditionally.
  - Reduces costly branching
  - Increases parallelism for higher sustained performance
- Code executes as programmed on independent functional units.
  - Industry’s most efficient C compiler on DSP benchmark suite
  - Industry’s first assembly optimizer for fast development and improved
    parallelization
- 8/16/32-bit data support, providing efficient memory support for a variety
  of applications
- 40-bit arithmetic options add extra precision for vocoders and other
  computationally intensive applications.
- Saturation and normalization provide support for key arithmetic
  operations.
- Field manipulation and instruction extract, set, clear, and bit counting
  support common operations found in control and data manipulation
  applications.
The ’C67x has these additional features:
- Peak 1336 MIPS at 167 MHz
- Peak 1G FLOPS at 167 MHz for single-precision operations
- Peak 250M FLOPS at 167 MHz for double-precision operations
- Peak 688M FLOPS at 167 MHz for multiply and accumulate operations
- Hardware support for single-precision (32-bit) and double-precision
(64-bit) IEEE floating-point operations
- 32 × 32-bit integer multiply with 32- or 64-bit result
A variety of memory and peripheral options are available for the ’C62x/C67x:
and other asynchronous memories for a broad range of external memory
requirements and maximum system performance
- 16-bit host port for access to ’C62x/C67x memory and peripherals
- Multichannel DMA controller
- Multichannel serial port(s)
- 32-bit timer(s)
1.4 TMS320C62x/C67x Architecture
Figure 1–1 is the block diagram for the TMS320C62x/C67x DSPs. The
’C62x/C67x devices come with program memory, which, on some devices,
can be used as a program cache. The devices also have varying sizes of data
memory. Peripherals such as a direct memory access (DMA) controller,
power-down logic, and external memory interface (EMIF) usually come with
the CPU, while peripherals such as serial ports and host ports are on only
certain devices. Check the data sheet for your device to determine the specific
peripheral configurations you have.
Figure 1–1. TMS320C62x/C67x Block Diagram
[Block diagram of the ’C62x/’C67x device. Blocks shown: program cache/program
memory (32-bit address, 256-bit data); the ’C62x/C67x CPU, containing program
fetch, instruction dispatch, instruction decode, data path A (register file A
with units .L1, .S1, .M1, .D1), data path B (register file B with units .L2,
.S2, .M2, .D2), control registers, control logic, test, emulation, and
interrupts; data cache/data memory (32-bit address; 8-, 16-, 32-bit data);
DMA, EMIF; power down; and additional peripherals (timers, serial ports, etc.).]
1.4.1 Central Processing Unit (CPU)
The ’C62x/C67x CPU, shaded in Figure 1–1, is common to all the ’C62x/C67x
devices. The CPU contains:
- Program fetch unit
- Instruction dispatch unit
- Instruction decode unit
- Two data paths, each with four functional units
- 32 32-bit registers
- Control registers
- Control logic
- Test, emulation, and interrupt logic
The program fetch, instruction dispatch, and instruction decode units can
deliver up to eight 32-bit instructions to the functional units every CPU clock
cycle. The processing of instructions occurs in each of the two data paths (A
and B), each of which contains four functional units (.L, .S, .M, and .D) and 16
32-bit general-purpose registers. The data paths are described in more detail
in Chapter 2, CPU Data Paths and Control. A control register file provides the
means to configure and control various processor operations. To understand
how instructions are fetched, dispatched, decoded, and executed in the data
path, see Chapter 5, TMS320C62x Pipeline, and Chapter 6, TMS320C67x
Pipeline.
1.4.2 Internal Memory
The ’C62x/C67x have a 32-bit, byte-addressable address space. Internal (on-chip)
memory is organized in separate data and program spaces. When off-chip
memory is used, these spaces are unified on most devices to a single
memory space via the external memory interface (EMIF).
The ’C62x/C67x have two 32-bit internal ports to access internal data memory.
The ’C62x/C67x have a single internal port to access internal program
memory, with an instruction-fetch width of 256 bits.
1.4.3 Peripherals
The following peripheral modules can complement the CPU on the
’C62x/C67x DSPs. Some devices have a subset of these peripherals but may
not have all of them.
- Serial ports
- Timers
- External memory interface (EMIF) that supports synchronous and
asynchronous SRAM and synchronous DRAM
- DMA controller
- Host-port interface
- Power-down logic that can halt CPU activity, peripheral activity, and
phase-locked loop (PLL) activity to reduce power consumption
Chapter 2
CPU Data Paths and Control
This chapter focuses on the CPU, providing information about the data paths
and control registers. The two register files and the data crosspaths are
described.
Figure 2–1 and Figure 2–2 show the components of the data paths for the ’C62x
and ’C67x, respectively. These components consist of:
- Two general-purpose register files (A and B)
- Eight functional units (.L1, .L2, .S1, .S2, .M1, .M2, .D1, and .D2)
2-1 August 1996
Figure 2–1. TMS320C62x CPU Data Paths
[Diagram. Data path A: register file A (A0–A15) with functional units .L1, .S1,
.M1, and .D1, load path LD1, store path ST1, and data address path DA1. Data
path B: register file B (B0–B15) with functional units .L2, .S2, .M2, and .D2,
load path LD2, store path ST2, and data address path DA2. Each unit has src1,
src2, and dst connections; the .L and .S units also have 8-bit long src and
long dst connections. The 1X and 2X cross paths and the control register file
are also shown.]
Figure 2–2. TMS320C67x CPU Data Paths
[Diagram. Data path A: register file A (A0–A15) with functional units .L1, .S1,
.M1, and .D1, load paths LD1 32 MSB and LD1 32 LSB, store path ST1, and data
address path DA1. Data path B: register file B (B0–B15) with functional units
.L2, .S2, .M2, and .D2, load paths LD2 32 LSB and LD2 32 MSB, store path ST2,
and data address path DA2. Each unit has src1, src2, and dst connections; the
.L and .S units also have 8-bit long src and long dst connections. The 1X and
2X cross paths and the control register file are also shown.]
2.1 General-Purpose Register Files
There are two general-purpose register files (A and B) in the ’C62x/C67x data
paths. Each of these files contains 16 32-bit registers (A0–A15 for file A and
B0–B15 for file B). The general-purpose registers can be used for data, data
address pointers, or condition registers.
The general-purpose register files support 32- and 40-bit fixed-point data. The
32-bit data can be contained in any general-purpose register. The ’C67x also
supports 32-bit single-precision and 64-bit double-precision data. The 40-bit
data is contained across two registers; the 32 LSBs of the data are placed in
an even register and the remaining eight MSBs are placed in the eight LSBs
of the next upper register (which is always an odd register). There are 16 valid
register pairs for 40-bit data, as shown in Table 2–1. In assembly language
syntax, the register pairs are denoted by a colon between the register names
and the odd register is specified first. The ’C67x also uses these register pairs
to hold 64-bit double-precision floating-point values. See Chapter 4 for more
information on double-precision floating-point values.
Table 2–1. 40-Bit/64-Bit Register Pairs

Register Files
A            B
A1:A0        B1:B0
A3:A2        B3:B2
A5:A4        B5:B4
A7:A6        B7:B6
A9:A8        B9:B8
A11:A10      B11:B10
A13:A12      B13:B12
A15:A14      B15:B14
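The pairing rule behind Table 2–1 (an even register plus the next higher odd register, written odd register first) can be expressed in a few lines. This is an illustrative sketch only; the helper names are hypothetical and do not correspond to any assembler feature.

```python
def is_valid_pair(odd, even):
    """True if register indices (odd, even) form a valid pair such as A1:A0."""
    return 0 <= even <= 14 and even % 2 == 0 and odd == even + 1


def pair_names(file_letter):
    """List the pair names for one register file, odd register first."""
    return [f"{file_letter}{even + 1}:{file_letter}{even}"
            for even in range(0, 16, 2)]


# Eight pairs per register file, 16 pairs in total.
all_pairs = pair_names("A") + pair_names("B")
print(all_pairs)
```

For example, `is_valid_pair(5, 4)` holds (A5:A4 or B5:B4), while `is_valid_pair(6, 5)` does not, because the low register of a pair must be even.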
Figure 2–3 illustrates the register storage scheme for 40-bit long data. Operations requiring a long input ignore the 24 MSBs of the odd register. Operations
producing a long result zero-fill the 24 MSBs of the odd register. The even
register is encoded in the opcode.
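As an illustration of the storage scheme just described, the sketch below models how a 40-bit value maps onto an odd:even register pair. It is a plain software model under the rules stated in the text (32 LSBs in the even register, 8 MSBs in the low byte of the odd register); the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK40 = 0xFFFFFFFFFF


def write_long(value):
    """Split a 40-bit value into (odd, even) 32-bit register contents."""
    value &= MASK40
    even = value & MASK32          # 32 LSBs go to the even register
    odd = (value >> 32) & 0xFF     # 8 MSBs; the upper 24 bits are zero-filled
    return odd, even


def read_long(odd, even):
    """Rebuild the 40-bit value; the 24 MSBs of the odd register are ignored."""
    return ((odd & 0xFF) << 32) | (even & MASK32)


odd, even = write_long(0x123456789A)
print(hex(odd), hex(even))                      # 0x12 0x3456789a
print(hex(read_long(0xFFFFFF12, 0x3456789A)))   # 0x123456789a
```

The second call shows that stale data in bits 31–8 of the odd register has no effect on a long read.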
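Figure 2–3. Storage Scheme for 40-Bit Data in a Register Pair
[Diagram. Read from registers: bits 31–8 of the odd register are ignored;
bits 7–0 of the odd register supply bits 39–32 of the 40-bit data, and bits
31–0 of the even register supply bits 31–0. Write to registers: bits 31–8 of
the odd register are zero-filled; bits 39–32 of the 40-bit data go to bits
7–0 of the odd register, and bits 31–0 go to the even register.]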
2.2 Functional Units
The eight functional units in the ’C62x/C67x data paths can be divided into two
groups of four; each functional unit in one data path is almost identical to the
corresponding unit in the other data path. The functional units are described
in Table 2–2.
Table 2–2. Functional Units and Operations Performed

.L unit (.L1, .L2)
  Fixed-point operations:
    32/40-bit arithmetic and compare operations
    Leftmost 1 or 0 bit counting for 32 bits
    Normalization count for 32 and 40 bits
    32-bit logical operations
  Floating-point operations:
    Arithmetic operations
    DP → SP, INT → DP, INT → SP conversion operations
.S unit (.S1, .S2)
  Fixed-point operations:
    32-bit arithmetic operations
    32/40-bit shifts and 32-bit bit-field operations
    32-bit logical operations
    Branches
    Constant generation
    Register transfers to/from the control register file (.S2 only)
  Floating-point operations:
    Compare
    Reciprocal and reciprocal square-root operations
    Absolute value operations
    SP → DP conversion operations
.M unit (.M1, .M2)
  Fixed-point operations:
    16 × 16 bit multiply operations
  Floating-point operations:
    32 × 32 bit fixed-point multiply operations
    Floating-point multiply operations
.D unit (.D1, .D2)
  Fixed-point operations:
    32-bit add, subtract, linear and circular address calculation
    Loads and stores with a 5-bit constant offset
    Loads and stores with 15-bit constant offset (.D2 only)
  Floating-point operations:
    Load doubleword with 5-bit constant offset

Note: Fixed-point operations are available on both the ’C62x and the ’C67x. Floating-point operations and 32-bit fixed-point
multiply are available only on the ’C67x.
Most data lines in the CPU support 32-bit operands, and some support long
(40-bit) operands. Each functional unit has its own 32-bit write port into a
general-purpose register file. All units ending in 1 (for example, .L1) write to
register file A and all units ending in 2 write to register file B. Each functional
unit has two 32-bit read ports for source operands src1 and src2. Four units
(.L1, .L2, .S1, and .S2) have an extra 8-bit-wide port for 40-bit long writes, as
well as an 8-bit input for 40-bit long reads. Because each unit has its own 32-bit
write port, all eight units can be used in parallel every cycle.
2.3 Register File Cross Paths
Each functional unit reads directly from and writes directly to the register file
within its own data path. That is, the .L1, .S1, .D1, and .M1 units write to register
file A and the .L2, .S2, .D2, and .M2 units write to register file B. The register
files are connected to the opposite-side register file’s functional units via the
1X and 2X cross paths. These cross paths allow functional units from one data
path to access a 32-bit operand from the opposite side’s register file. The 1X
cross path allows data path A’s functional units to read their source from register file B, and the 2X cross path allows data path B’s functional units to read their
source from register file A.
Six of the functional units have access to the opposite side’s register file via
a cross path. The .M1, .M2, .S1, and .S2 units’ src2 inputs are
multiplex-selectable between the cross path and the same-side register file.
The .L1 and .L2 units’ src1 and src2 inputs are also multiplex-selectable
between the cross path and the same-side register file.
Only two cross paths, 1X and 2X, exist in the ’C62x/C67x CPUs. This limits each
data path to one source read from the opposite register file per cycle, for a
total of two cross-path source reads per cycle.
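The cross-path limit can be pictured as a simple check on the instructions issued in one cycle. The sketch below is a simplified model built only from the rules stated in this section (six cross-capable units, at most one cross-path read per data path per cycle); it is not the assembler's actual resource checker, and the function name and input format are assumptions made for illustration.

```python
# Units that can read a source operand through a cross path (per this section).
CROSS_CAPABLE = {".L1", ".L2", ".S1", ".S2", ".M1", ".M2"}


def cross_paths_ok(packet):
    """Check one cycle's instructions against the cross-path limit.

    packet: iterable of (unit, uses_cross_path) tuples, e.g. (".L1", True).
    """
    reads = {"1": 0, "2": 0}   # 1X serves the side-1 units, 2X the side-2 units
    for unit, uses_cross in packet:
        if not uses_cross:
            continue
        if unit not in CROSS_CAPABLE:
            return False       # .D units have no cross-path source input
        reads[unit[-1]] += 1
    return all(count <= 1 for count in reads.values())


print(cross_paths_ok([(".L1", True), (".S2", True)]))   # True: one read on each path
print(cross_paths_ok([(".L1", True), (".S1", True)]))   # False: two reads on 1X
```

A real scheduler must also account for the other resource constraints described in Chapter 3; this model covers the cross paths only.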
2.4 Memory, Load, and Store Paths
There are two 32-bit paths for loading data from memory to the register file:
LD1 for register file A, and LD2 for register file B. The ’C67x also has a second
32-bit load path for both register files A and B, which allows the LDDW instruction to simultaneously load two 32-bit registers into side A and two 32-bit registers into side B. There are also two 32-bit paths, ST1 and ST2, for storing register values to memory from each register file. The store paths are shared with
the .L and .S long read paths.
2.5 Data Address Paths
The data address paths (DA1 and DA2 in Figure 2–1 and Figure 2–2) coming
out of the .D units allow data addresses generated from one register file to support loads and stores to memory from the other register file.
2.6 TMS320C62x/C67x Control Register File
One unit (.S2) can read from and write to the control register file, as shown in
Figure 2–1 and Figure 2–2. Table 2–3 lists the control registers contained in
the control register file and describes each. If more information is available on
a control register, the table lists where to look for that information. Each control
register is accessed by the MVC instruction. See the MVC instruction description
in Chapter 3, TMS320C62x/C67x Fixed-Point Instruction Set, for information
on how to use this instruction.

Table 2–3. Control Registers

Abbreviation  Register Name                        Description                                          Page
AMR           Addressing mode register             Specifies whether to use linear or circular          2-9
                                                   addressing for each of eight registers; also
                                                   contains sizes for circular addressing
CSR           Control status register              Contains the global interrupt enable bit, cache      2-11
                                                   control bits, and other miscellaneous control and
                                                   status bits
IFR           Interrupt flag register              Displays status of interrupts                        7-14
ISR           Interrupt set register               Allows you to set pending interrupts manually        7-14
ICR           Interrupt clear register             Allows you to clear pending interrupts manually      7-14
IER           Interrupt enable register            Allows enabling/disabling of individual interrupts   7-13
ISTP          Interrupt service table pointer      Points to the beginning of the interrupt service     7-8
                                                   table
IRP           Interrupt return pointer             Contains the address to be used to return from a     7-16
                                                   maskable interrupt
NRP           Nonmaskable interrupt return         Contains the address to be used to return from a     7-16
              pointer                              nonmaskable interrupt
PCE1          Program counter, E1 phase            Contains the address of the fetch packet that        2-12
                                                   contains the execute packet in the E1 pipeline
                                                   stage
2.6.1 Addressing Mode Register (AMR)
For each of the eight registers (A4–A7, B4–B7) that can perform linear or circular addressing, the AMR specifies the addressing mode. A 2-bit field for each
register selects the address modification mode: linear (the default) or circular
mode. With circular addressing, the field also specifies which BK (block size)
field to use for a circular buffer. In addition, the buffer must be aligned on a byte
boundary equal to the block size. The mode select fields and block size fields
are shown in Figure 2–4, and the mode select field encoding is shown in
Table 2–4.
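To illustrate how the mode select and block size fields combine into one register value, the following Python sketch composes an AMR value. The bit positions (2-bit mode fields for A4–A7 and B4–B7 in bits 15–0, BK0 in bits 20–16, BK1 in bits 25–21) and the mode encoding (00 linear, 01 circular using BK0, 10 circular using BK1) are assumptions standing in for Figure 2–4 and Table 2–4, and the helper names are hypothetical. The sample call reproduces the AMR value 0004 0001h that this guide quotes for Example 3–4.

```python
# Sketch only: field positions and mode encoding are assumptions standing in
# for Figure 2-4 and Table 2-4; verify them against the figure and table.
MODE_SHIFT = {"A4": 0, "A5": 2, "A6": 4, "A7": 6,
              "B4": 8, "B5": 10, "B6": 12, "B7": 14}
LINEAR, CIRC_BK0, CIRC_BK1 = 0b00, 0b01, 0b10


def make_amr(modes, bk0=0, bk1=0):
    """Compose a 32-bit AMR value from per-register modes and block size fields."""
    if not (0 <= bk0 < 32 and 0 <= bk1 < 32):
        raise ValueError("BK0 and BK1 are 5-bit fields")
    value = (bk1 << 21) | (bk0 << 16)
    for reg, mode in modes.items():
        if mode not in (LINEAR, CIRC_BK0, CIRC_BK1):
            raise ValueError(f"unsupported mode for {reg}")
        value |= mode << MODE_SHIFT[reg]
    return value


def block_size_bytes(bk):
    """Circular buffer size in bytes for a block size field value N: 2**(N + 1)."""
    return 1 << (bk + 1)


amr = make_amr({"A4": CIRC_BK0}, bk0=4)
print(f"{amr:08X}h, buffer of {block_size_bytes(4)} bytes")  # 00040001h, buffer of 32 bytes
```

The resulting value would then be written to the AMR with the MVC instruction.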
2.6.2 Control Status Register (CSR)
The CSR, shown in Figure 2–5, contains control and status bits. The functions
of the fields in the CSR are shown in Table 2–6. For the EN, PWRD, PCC, and
DCC fields, see your data sheet to see if your device supports the options that
these fields control and see the TMS320C6201/C6701 Peripherals Reference
Guide for more information on these options.
Figure 2–5. Control Status Register (CSR)
[Figure: 32-bit register layout. Bits 31–24 CPU ID, 23–16 Revision ID, 15–10 PWRD, 9 SAT, 8 EN, 7–5 PCC, 4–2 DCC, 1 PGIE, 0 GIE.]
Legend: R   Readable by the MVC instruction
        W   Writeable by the MVC instruction
        +x  Value undefined after reset
        +0  Value is zero after reset
        C   Clearable using the MVC instruction
Table 2–6. Control Status Register Field Descriptions
Bit Position | Width | Field Name | Function
31-24 | 8 | CPU ID | CPU ID; defines which CPU. CPU ID = 00b: indicates ’C62x, CPU ID = 10b: indicates ’C67x
23-16 | 8 | Revision ID | Revision ID; defines silicon revision of the CPU
15-10 | 6 | PWRD | Control power-down modes; the values are always read as zero. †
9 | 1 | SAT | The saturate bit, set when any unit performs a saturate, can be cleared only by the MVC instruction and can be set only by a functional unit. The set by a functional unit has priority over a clear (by the MVC instruction) if they occur on the same cycle. The saturate bit is set one full cycle (one delay slot) after a saturate occurs. This bit will not be modified by a conditional instruction whose condition is false.
8 | 1 | EN | Endian bit: 1 = little endian, 0 = big endian
7-5 | 3 | PCC | Program cache control mode †
4-2 | 3 | DCC | Data cache control mode †
1 | 1 | PGIE | Previous GIE (global interrupt enable); saves GIE when an interrupt is taken
0 | 1 | GIE | Global interrupt enable; enables (1) or disables (0) all interrupts except the reset interrupt and NMI (nonmaskable interrupt)
† See the TMS320C6201/C6701 Peripherals Reference Guide for more information.
CPU Data Paths and Control
2.6.3 E1 Phase Program Counter (PCE1)
The PCE1, shown in Figure 2–6, contains the 32-bit address of the fetch packet
that contains the execute packet in the E1 pipeline phase.
Figure 2–6. E1 Phase Program Counter (PCE1)
[Figure: 32-bit register layout. Bits 31–0 PCE1; R, W, +x.]
Legend: R   Readable by the MVC instruction
        W   Writeable by the MVC instruction
        +x  Value undefined after reset
2.7 TMS320C67x Extensions to the Control Register File
The ’C67x has three additional configuration registers to support floating-point
operations. The registers specify the desired floating-point rounding mode for
the .L and .M units. They also contain fields to warn if src1 and src2 are NaN
or denormalized numbers, and if the result overflows, underflows, is inexact,
infinite, or invalid. There are also fields to warn if a divide by 0 was performed,
or if a compare was attempted with a NaN source. Table 2–7 shows the additional registers used by the ’C67x. The OVER, UNDER, INEX, INVAL, DENn,
NANn, INFO, UNORD and DIV0 bits within these registers will not be modified
by a conditional instruction whose condition is false.
Table 2–7. Control Register File Extensions
Register Abbreviation | Register Name | Description | Page
FADCR | Floating-point adder configuration register | Specifies underflow mode, rounding mode, NaNs, and other exceptions for the .L unit. | 2-14
FAUCR | Floating-point auxiliary configuration register | Specifies underflow mode, rounding mode, NaNs, and other exceptions for the .S unit. | 2-16
FMCR | Floating-point multiplier configuration register | Specifies underflow mode, rounding mode, NaNs, and other exceptions for the .M unit. | 2-18
The floating-point adder configuration register (FADCR) contains fields that specify
underflow or overflow, the rounding mode, NaNs, denormalized numbers, and
inexact results for instructions that use the .L functional units. FADCR has a
set of fields specific to each of the .L units, .L1 and .L2. Figure 2–7 shows the
layout of FADCR. The functions of the fields in the FADCR are shown in
Table 2–8.
[Figure 2–7: layout of FADCR. The .L2 fields occupy bits 26–16 and the .L1 fields occupy bits 10–0. Within each group the fields are, from high to low bit: Rmode, UNDER, INEX, OVER, INFO, INVAL, DEN2, DEN1, NAN2, NAN1. Fields are marked R, W, +0.]
Legend: R   Readable by the MVC instruction
        W   Writeable by the MVC instruction
        +0  Value is zero after reset
Table 2–8. Floating-Point Adder Configuration Register Field Descriptions
Bit Position | Width | Field Name | Function
31–27 | 5 | | Reserved
26–25 | 2 | Rmode .L2 | Value 00: Round toward nearest representable floating-point number. Value 01: Round toward 0 (truncate). Value 10: Round toward infinity (round up). Value 11: Round toward negative infinity (round down).
24 | 1 | UNDER .L2 | Set to 1 when result underflows
23 | 1 | INEX .L2 | Set to 1 when result differs from what would have been computed had the exponent range and precision been unbounded; never set with INVAL
22 | 1 | OVER .L2 | Set to 1 when result overflows
21 | 1 | INFO .L2 | Set to 1 when result is signed infinity
20 | 1 | INVAL .L2 | Set to 1 when a signaling NaN (SNaN) is a source, NaN is a source in a floating-point to integer conversion, or when infinity is subtracted from infinity
19 | 1 | DEN2 .L2 | src2 is a denormalized number
18 | 1 | DEN1 .L2 | src1 is a denormalized number
17 | 1 | NAN2 .L2 | src2 is NaN
16 | 1 | NAN1 .L2 | src1 is NaN
15–11 | 5 | | Reserved
10–9 | 2 | Rmode .L1 | Value 00: Round toward nearest even representable floating-point number. Value 01: Round toward 0 (truncate). Value 10: Round toward infinity (round up). Value 11: Round toward negative infinity (round down).
8 | 1 | UNDER .L1 | Set to 1 when result underflows
7 | 1 | INEX .L1 | Set to 1 when result differs from what would have been computed had the exponent range and precision been unbounded; never set with INVAL
6 | 1 | OVER .L1 | Set to 1 when result overflows
5 | 1 | INFO .L1 | Set to 1 when result is signed infinity
4 | 1 | INVAL .L1 | Set to 1 when a signaling NaN (SNaN) is a source, NaN is a source in a floating-point to integer conversion, or when infinity is subtracted from infinity
3 | 1 | DEN2 .L1 | src2 is a denormalized number
2 | 1 | DEN1 .L1 | src1 is a denormalized number
1 | 1 | NAN2 .L1 | src2 is NaN
0 | 1 | NAN1 .L1 | src1 is NaN
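Because the .L1 and .L2 field groups in Table 2–8 have the same shape and differ only by a 16-bit offset, one routine can decode either group. The following Python sketch, with the hypothetical helper name decode_fadcr_unit and an arbitrary sample value, returns the rounding mode and the names of the status bits that are set.

```python
# Status bits in ascending bit order within one unit's group (Table 2-8).
FLAG_BITS = ["NAN1", "NAN2", "DEN1", "DEN2", "INVAL", "INFO", "OVER", "INEX", "UNDER"]
ROUNDING = {
    0b00: "nearest",
    0b01: "toward zero",
    0b10: "toward +infinity",
    0b11: "toward -infinity",
}
UNIT_OFFSET = {"L1": 0, "L2": 16}  # .L2 fields sit 16 bits above the .L1 fields


def decode_fadcr_unit(value, unit):
    """Decode the FADCR fields that belong to one .L unit ("L1" or "L2")."""
    group = (value >> UNIT_OFFSET[unit]) & 0x7FF  # 11 bits: Rmode plus 9 flags
    flags = [name for bit, name in enumerate(FLAG_BITS) if group & (1 << bit)]
    return {"rmode": ROUNDING[(group >> 9) & 0b11], "flags": flags}


# Sample value: .L2 rounds toward zero with OVER set; .L1 has INEX set.
sample = (0b01 << 25) | (1 << 22) | (1 << 7)
print(decode_fadcr_unit(sample, "L2"))  # {'rmode': 'toward zero', 'flags': ['OVER']}
print(decode_fadcr_unit(sample, "L1"))  # {'rmode': 'nearest', 'flags': ['INEX']}
```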
The floating-point auxiliary configuration register (FAUCR) contains fields that specify underflow or overflow, the rounding mode, NaNs, denormalized numbers, and
inexact results for instructions that use the .S functional units. FAUCR has a
set of fields specific to each of the .S units, .S1 and .S2. Figure 2–8 shows the
layout of FAUCR. The functions of the fields in the FAUCR are shown in
Table 2–9.
[Figure 2–8: layout of FAUCR. The .S2 fields occupy bits 26–16 and the .S1 fields occupy bits 10–0. Within each group the fields are, from high to low bit: DIV0, UNORD, UND, INEX, OVER, INFO, INVAL, DEN2, DEN1, NAN2, NAN1.]
Legend: W   Writeable by the MVC instruction
        +0  Value is zero after reset
Table 2–9. Floating-Point Auxiliary Configuration Register Field Descriptions
Bit Position | Width | Field Name | Function
31–27 | 5 | | Reserved
26 | 1 | DIV0 .S2 | Set to 1 when 0 is source to reciprocal operation
25 | 1 | UNORD .S2 | Set to 1 when NaN is a source to a compare operation
24 | 1 | UNDER .S2 | Set to 1 when result underflows
23 | 1 | INEX .S2 | Set to 1 when result differs from what would have been computed had the exponent range and precision been unbounded; never set with INVAL
22 | 1 | OVER .S2 | Set to 1 when result overflows
21 | 1 | INFO .S2 | Set to 1 when result is signed infinity
20 | 1 | INVAL .S2 | Set to 1 when a signaling NaN (SNaN) is a source, NaN is a source in a floating-point to integer conversion, or when infinity is subtracted from infinity
19 | 1 | DEN2 .S2 | src2 is a denormalized number
18 | 1 | DEN1 .S2 | src1 is a denormalized number
17 | 1 | NAN2 .S2 | src2 is NaN
16 | 1 | NAN1 .S2 | src1 is NaN
15–11 | 5 | | Reserved
10 | 1 | DIV0 .S1 | Set to 1 when 0 is source to reciprocal operation
9 | 1 | UNORD .S1 | Set to 1 when NaN is a source to a compare operation
8 | 1 | UNDER .S1 | Set to 1 when result underflows
7 | 1 | INEX .S1 | Set to 1 when result differs from what would have been computed had the exponent range and precision been unbounded; never set with INVAL
6 | 1 | OVER .S1 | Set to 1 when result overflows
5 | 1 | INFO .S1 | Set to 1 when result is signed infinity
4 | 1 | INVAL .S1 | Set to 1 when SNaN is a source, NaN is a source in a floating-point to integer conversion, or when infinity is subtracted from infinity
3 | 1 | DEN2 .S1 | src2 is a denormalized number
2 | 1 | DEN1 .S1 | src1 is a denormalized number
1 | 1 | NAN2 .S1 | src2 is NaN
0 | 1 | NAN1 .S1 | src1 is NaN
The floating-point multiplier configuration register (FMCR) contains fields that
specify underflow or overflow, the rounding mode, NaNs, denormalized numbers, and inexact results for instructions that use the .M functional units. FMCR
has a set of fields specific to each of the .M units, .M1 and .M2. Figure 2–9
shows the layout of FMCR. The functions of the fields in the FMCR are shown
in Table 2–10.
Table 2–10. Floating-Point Multiplier Configuration Register Field Descriptions
Bit Position | Width | Field Name | Function
26–25 | 2 | Rmode .M2 | Value 00: Round toward nearest even representable floating-point number. Value 01: Round toward 0 (truncate). Value 10: Round toward infinity (round up). Value 11: Round toward negative infinity (round down).
24 | 1 | UNDER .M2 | Set to 1 when result underflows
23 | 1 | INEX .M2 | Set to 1 when result differs from what would have been computed had the exponent range and precision been unbounded; never set with INVAL
22 | 1 | OVER .M2 | Set to 1 when result overflows
21 | 1 | INFO .M2 | Set to 1 when result is signed infinity
20 | 1 | INVAL .M2 | Set to 1 when SNaN is a source, NaN is a source in a floating-point to integer conversion, or when infinity is subtracted from infinity
19 | 1 | DEN2 .M2 | src2 is a denormalized number
18 | 1 | DEN1 .M2 | src1 is a denormalized number
17 | 1 | NAN2 .M2 | src2 is NaN
16 | 1 | NAN1 .M2 | src1 is NaN
10–9 | 2 | Rmode .M1 | Value 00: Round toward nearest even representable floating-point number. Value 01: Round toward 0 (truncate). Value 10: Round toward infinity (round up). Value 11: Round toward negative infinity (round down).
8 | 1 | UNDER .M1 | Set to 1 when result underflows
7 | 1 | INEX .M1 | Set to 1 when result differs from what would have been computed had the exponent range and precision been unbounded; never set with INVAL
6 | 1 | OVER .M1 | Set to 1 when result overflows
5 | 1 | INFO .M1 | Set to 1 when result is signed infinity
4 | 1 | INVAL .M1 | Set to 1 when SNaN is a source, NaN is a source in a floating-point to integer conversion, or when infinity is subtracted from infinity
3 | 1 | DEN2 .M1 | src2 is a denormalized number
2 | 1 | DEN1 .M1 | src1 is a denormalized number
1 | 1 | NAN2 .M1 | src2 is NaN
0 | 1 | NAN1 .M1 | src1 is NaN
Chapter 3
TMS320C62x/C67x Fixed-Point Instruction Set
The ’C62x and the ’C67x share an instruction set. All of the instructions valid
for the ’C62x are also valid for the ’C67x. However, because the ’C67x is a
floating-point device, there are some instructions that are unique to it and do
not execute on the fixed-point device. This chapter describes the assembly
language instructions that are common to both the ’C62x and ’C67x digital signal processors. Also described are parallel operations, conditional operations,
resource constraints, and addressing modes.
Instructions unique to the ’C67x (floating-point addition, subtraction, multiplication, and others) are described in Chapter 4.
Table 3–1 explains the symbols used in the fixed-point instruction descriptions.
Table 3–1. Fixed-Point Instruction Operation and Execution Notations
Symbol | Meaning
abs(x) | Absolute value of x
and | Bitwise AND
–a | Perform 2s-complement subtraction using the addressing mode defined by the AMR
+a | Perform 2s-complement addition using the addressing mode defined by the AMR
b y..z | Selection of bits y through z of bit string b
cond | Check for either creg equal to 0 or creg not equal to 0
creg | 3-bit field specifying a conditional register
cstn | n-bit constant field (for example, cst5)
int | 32-bit integer value
lmb0(x) | Leftmost 0 bit search of x
lmb1(x) | Leftmost 1 bit search of x
long | 40-bit integer value
lsbn or LSBn | n least significant bits (for example, lsb16)
msbn or MSBn | n most significant bits (for example, msb16)
nop | No operation
norm(x) | Leftmost nonredundant sign bit of x
not | Bitwise logical complement
or | Bitwise OR
op | Opfields
R | Any general-purpose register
scstn | n-bit signed constant field
sint | Signed 32-bit integer value
slong | Signed 40-bit integer value
slsb16 | Signed 16 LSB of register
smsb16 | Signed 16 MSB of register
–s | Perform 2s-complement subtraction and saturate the result to the result size if an overflow occurs
+s | Perform 2s-complement addition and saturate the result to the result size if an overflow occurs
ucstn | n-bit unsigned constant field (for example, ucst5)
uint | Unsigned 32-bit integer value
ulong | Unsigned 40-bit integer value
ulsb16 | Unsigned 16 LSB of register
umsb16 | Unsigned 16 MSB of register
x clear b,e | Clear a field in x, specified by b (beginning bit) and e (ending bit)
x ext l,r | Extract and sign-extend a field in x, specified by l (shift left value) and r (shift right value)
x extu l,r | Extract an unsigned field in x, specified by l (shift left value) and r (shift right value)
x set b,e | Set field in x to all 1s, specified by b (beginning bit) and e (ending bit)
xor | Bitwise exclusive OR
xsint | Signed 32-bit integer value that can optionally use cross path
xslsb16 | Signed 16 LSB of register that can optionally use cross path
xsmsb16 | Signed 16 MSB of register that can optionally use cross path
xuint | Unsigned 32-bit integer value that can optionally use cross path
xulsb16 | Unsigned 16 LSB of register that can optionally use cross path
xumsb16 | Unsigned 16 MSB of register that can optionally use cross path
→ | Assignment
+ | Addition
× | Multiplication
– | Subtraction
<< | Shift left
>>s | Shift right with sign extension
>>z | Shift right with a zero fill
3.2 Mapping Between Instructions and Functional Units
Table 3–2 shows the mapping between instructions and functional units and
Table 3–3 shows the mapping between functional units and instructions.
TMS320C62x/C67x Opcode Map
Table 3–4 and the instruction descriptions in this chapter explain the field syntaxes and values. The ’C62x and ’C67x opcodes are mapped in Figure 3–1.
Table 3–4. TMS320C62x/C67x Opcode Map Symbol Definitions
Symbol | Meaning
baseR | base address register
creg | 3-bit field specifying a conditional register
cst | constant
csta | constant a
cstb | constant b
dst | destination
h | MVK or MVKH bit
ld/st | load/store opfield
mode | addressing mode
offsetR | register offset
op | opfield, field within opcode that specifies a unique instruction
p | parallel execution
r | LDDW bit
rsv | reserved
s | select side A or B for destination
src2 | source 2
src1 | source 1
The execution of fixed-point instructions can be defined in terms of delay slots.
The number of delay slots is equivalent to the number of cycles required after
the source operands are read for the result to be available for reading. For a
single-cycle type instruction (such as ADD), source operands read in cycle i
produce a result that can be read in cycle i + 1. For a multiply instruction (MPY),
source operands read in cycle i produce a result that can be read in cycle i + 2.
Table 3–5 shows the number of delay slots associated with each type of instruction.
Delay slots are equivalent to an execution or result latency. All of the instructions that are common to the ’C62x and ’C67x have a functional unit latency
of 1. This means that a new instruction can be started on the functional unit
each cycle. Single-cycle throughput is another term for single-cycle functional
unit latency.
Table 3–5. Delay Slot and Functional Unit Latency Summary
Instruction Type | Delay Slots | Functional Unit Latency | Read Cycles† | Write Cycles† | Branch Taken†
NOP (no operation) | 0 | 1 | | |
Store | 0 | 1 | i | i |
Single cycle | 0 | 1 | i | i |
Multiply (16 × 16) | 1 | 1 | i | i + 1 |
Load | 4 | 1 | i | i, i + 4§ |
Branch | 5 | 1 | i‡ | | i + 5
† Cycle i is in the E1 pipeline phase.
‡ The branch to label, branch to IRP, and branch to NRP instructions do not read any registers.
§ The write on cycle i + 4 uses a separate write port from other .D unit instructions.
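The relationship between delay slots and the first cycle in which a result can be read is simple arithmetic: a result with d delay slots, whose operands are read in cycle i, can be read in cycle i + d + 1. The following Python sketch encodes the delay slot counts from Table 3–5 for the result-producing instruction types; the function name result_ready_cycle is hypothetical.

```python
# Delay slots per instruction type, from Table 3-5 (result-producing types only).
DELAY_SLOTS = {"single": 0, "multiply": 1, "load": 4}


def result_ready_cycle(kind, read_cycle):
    """First cycle in which another instruction can read the result.

    read_cycle is the cycle i in which the source operands are read (E1).
    """
    return read_cycle + DELAY_SLOTS[kind] + 1


for kind in ("single", "multiply", "load"):
    print(kind, result_ready_cycle(kind, 0))
# single 1
# multiply 2
# load 5
```

A scheduler built on this rule would avoid placing a consumer of a load result in any of the four cycles that follow the load.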
3.5 Parallel Operations
Instructions are always fetched eight at a time. This constitutes a fetch packet.
The basic format of a fetch packet is shown in Figure 3–2. Fetch packets are
aligned on 256-bit (8-word) boundaries.
Figure 3–2. Basic Format of a Fetch Packet
[Figure: eight 32-bit instructions, A through H, each spanning bits 31–0 with the p-bit in bit 0. The five LSBs of the byte address of each instruction are A = 00000b, B = 00100b, C = 01000b, D = 01100b, E = 10000b, F = 10100b, G = 11000b, H = 11100b.]
The execution of the individual instructions is partially controlled by a bit in
each instruction, the p-bit. The p-bit (bit 0) determines whether the instruction
executes in parallel with another instruction. The p-bits are scanned from left
to right (lower to higher address). If the p-bit of instruction i is 1, then instruction
i + 1 is to be executed in parallel with (in the same cycle as) instruction i.
If the p-bit of instruction i is 0, then instruction i + 1 is executed in the cycle after
instruction i. All instructions executing in parallel constitute an execute packet.
An execute packet can contain up to eight instructions. Each instruction in an
execute packet must use a different functional unit.
An execute packet cannot cross an 8-word boundary. Therefore, the last p-bit
in a fetch packet is always set to 0, and each fetch packet starts a new execute
packet. There are three types of p-bit patterns for fetch packets. These three
p-bit patterns result in the following execution sequences for the eight instructions:
- Fully serial
- Fully parallel
- Partially serial
Example 3–1 through Example 3–3 illustrate the conversion of a p-bit sequence into a cycle-by-cycle execution stream of instructions.
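The p-bit scan described above can be sketched in a few lines of Python. The function name execute_packets and the list-based representation of a fetch packet are illustrative assumptions; the grouping rule itself follows the text: a p-bit of 1 chains the next instruction into the same execute packet, and a p-bit of 0 closes the packet.

```python
def execute_packets(names, p_bits):
    """Group the eight instructions of a fetch packet into execute packets.

    names  -- instruction labels in address order (lowest address first)
    p_bits -- the p-bit (bit 0) of each instruction, in the same order
    """
    if len(names) != 8 or len(p_bits) != 8:
        raise ValueError("a fetch packet holds exactly eight instructions")
    if p_bits[-1] != 0:
        raise ValueError("the last p-bit in a fetch packet must be 0")
    packets, current = [], []
    for name, p in zip(names, p_bits):
        current.append(name)
        if p == 0:  # the next instruction starts a new execute packet
            packets.append(current)
            current = []
    return packets


names = list("ABCDEFGH")
print(len(execute_packets(names, [0] * 8)))              # fully serial: 8
print(len(execute_packets(names, [1] * 7 + [0])))        # fully parallel: 1
print(execute_packets(names, [0, 0, 1, 1, 0, 1, 1, 0]))  # partially serial
# [['A'], ['B'], ['C', 'D', 'E'], ['F', 'G', 'H']]
```

Each inner list executes in one cycle, so the partially serial pattern shown takes four cycles to issue.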
Example 3–1. Fully Serial p-Bit Pattern in a Fetch Packet
Note:Instructions C, D, and E do not use any of the same functional units, cross paths, or
other data path resources. This is also true for instructions F, G, and H.
3.5.1 Example Parallel Code
The || characters signify that an instruction is to execute in parallel with the previous instruction. The code for the fetch packet in Example 3–3 would be represented as this:
   instruction A
   instruction B
   instruction C
|| instruction D
|| instruction E
   instruction F
|| instruction G
|| instruction H
3.5.2 Branching Into the Middle of an Execute Packet
If a branch into the middle of an execute packet occurs, all instructions at lower
addresses are ignored. In Example 3–3, if a branch to the address containing
instruction D occurs, then only D and E execute. Even though instruction C is
in the same execute packet, it is ignored. Instructions A and B are also ignored
because they are in earlier execute packets. If your result depends on executing A, B, or C, the branch to the middle of the execute packet will produce an
erroneous result.
3.6 Conditional Operations
All instructions can be conditional. The condition is controlled by a 3-bit opcode
field (creg) that specifies the condition register tested, and a 1-bit field (z) that
specifies a test for zero or nonzero. The four MSBs of every opcode are creg
and z. The specified condition register is tested at the beginning of the E1 pipeline stage for all instructions. For more information on the pipeline, see Chapter 5,
TMS320C62x Pipeline, and Chapter 6, TMS320C67x Pipeline. If z = 1,
the test is for equality with zero. If z = 0, the test is for nonzero. The case of
creg = 0 and z = 0 is treated as always true to allow instructions to be executed
unconditionally. The creg field is encoded in the instruction opcode as shown
in Table 3–6.
Table 3–6. Registers That Can Be Tested by Conditional Operations
Conditional instructions are represented in code by using square brackets, [ ],
surrounding the condition register name. The following execute packet contains two ADD instructions in parallel. The first ADD is conditional on B0 being
nonzero. The second ADD is conditional on B0 being zero. The character ! indicates the inverse of the condition.
   [B0]  ADD .L1 A1,A2,A3
|| [!B0] ADD .L2 B1,B2,B3
The above instructions are mutually exclusive. This means that only one will
execute. If they are scheduled in parallel, mutually exclusive instructions are
constrained as described in section 3.7. If mutually exclusive instructions
share any resources as described in section 3.7, they cannot be scheduled in
parallel (put in the same execute packet), even though only one will execute.
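The creg and z test described in this section can be modeled as a small predicate. In the following Python sketch the function name condition_passes is hypothetical, and the creg value 1 used for B0 is only a placeholder for whatever nonzero encoding Table 3–6 assigns; the logic (z = 1 tests for zero, z = 0 tests for nonzero, creg = 0 with z = 0 is always true) follows the text.

```python
def condition_passes(creg, z, reg_value):
    """Return True if an instruction with this creg/z pair executes.

    creg      -- 3-bit condition register selector (0 means no register)
    z         -- 1 tests the register for zero, 0 tests for nonzero
    reg_value -- current contents of the selected condition register
    """
    if creg == 0 and z == 0:
        return True  # unconditional instruction
    if z == 1:
        return reg_value == 0
    return reg_value != 0


B0_CREG = 1  # placeholder selector for B0; see Table 3-6 for the real encoding
b0 = 5
print(condition_passes(B0_CREG, 0, b0))  # [B0]  ADD executes: True
print(condition_passes(B0_CREG, 1, b0))  # [!B0] ADD executes: False
```

Exactly one of the two ADD instructions passes for any value of B0, which is what makes the pair mutually exclusive.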
3.7 Resource Constraints
No two instructions within the same execute packet can use the same
resources. Also, no two instructions can write to the same register during the
same cycle. The following sections describe how an instruction can use each
of the resources.
3.7.1 Constraints on Instructions Using the Same Functional Unit
Two instructions using the same functional unit cannot be issued in the same
execute packet.
The following execute packet is invalid:
   ADD .S1 A0, A1, A2 ; \ .S1 is used for
|| SHR .S1 A3, 15, A4 ; / both instructions
The following execute packet is valid:
   ADD .L1 A0, A1, A2 ; \ Two different functional
|| SHR .S1 A3, 15, A4 ; / units are used
3.7.2 Constraints on Cross Paths (1X and 2X)
One unit (either a .S, .L, or .M unit) per data path, per execute packet, can read
a source operand from its opposite register file via the cross paths (1X and 2X).
For example, .S1 can read both of an instruction’s operands from the A register
file, or it can read one operand from the B register file using the 1X cross path
and the other from the A register file. This is denoted by an X following the unit
name in the instruction syntax.
Two instructions using the same cross path between register files cannot be
issued in the same execute packet, because there is only one path from A to
B and one path from B to A.
The following execute packet is invalid:
ADD.L1X A0,B1,A1 ; \ 1X cross path is used
|| MPY.M1X A4,B4,A5 ; / for both instructions
The following execute packet is valid:
ADD.L1X A0,B1,A1 ; \ Instructions use the 1X and
|| MPY.M2X B4,A4,B2 ; / 2X cross paths
The operand will come from a register file opposite of the destination if the x
bit in the instruction field is set (shown in the opcode map located in Figure 3–1
on page 3-10).
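The functional unit and cross path rules of sections 3.7.1 and 3.7.2 lend themselves to a mechanical check. The following Python sketch is an illustration only: the function name packet_conflicts and the convention of describing an execute packet by its list of unit specifiers (for example ".L1X") are assumptions, not part of the assembler.

```python
def packet_conflicts(units):
    """Report functional unit and cross path conflicts in one execute packet.

    units -- unit specifiers as written in assembly, e.g. ".S1" or ".M2X"
    """
    problems = []
    seen_units, seen_paths = set(), set()
    for spec in units:
        name = spec.upper().lstrip(".")
        uses_cross_path = name.endswith("X")
        unit = name[:-1] if uses_cross_path else name
        if unit in seen_units:
            problems.append(f"functional unit .{unit} used twice")
        seen_units.add(unit)
        if uses_cross_path:
            path = unit[-1] + "X"  # side 1 uses 1X, side 2 uses 2X
            if path in seen_paths:
                problems.append(f"cross path {path} used twice")
            seen_paths.add(path)
    return problems


print(packet_conflicts([".S1", ".S1"]))    # ['functional unit .S1 used twice']
print(packet_conflicts([".L1X", ".M1X"]))  # ['cross path 1X used twice']
print(packet_conflicts([".L1X", ".M2X"]))  # []
```

The three calls correspond to the invalid and valid execute packets shown in sections 3.7.1 and 3.7.2.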
3.7.3 Constraints on Loads and Stores
Load/store instructions can use an address pointer from one register file while
loading to or storing from the other register file. Two load/store instructions using a destination/source from the same register file cannot be issued in the
same execute packet. The address register must be on the same side as the
.D unit used.
The following execute packet is invalid:
LDW.D1 *A0,A1 ; \ .D2 unit must use the address
|| LDW.D2 *A2,B2 ; / register from the B register file
The following execute packet is valid:
LDW.D1 *A0,A1 ; \ Address registers from correct
|| LDW.D2 *B0,B2 ; / register files
Two loads and/or stores loading to and/or storing from the same register file
cannot be issued in the same execute packet.
The following execute packet is invalid:
LDW.D1 *A4,A5 ; \ Loading to and storing from the
|| STW.D2 A6,*B4 ; / same register file
The following execute packets are valid:
LDW.D1 *A4,B5 ; \ Loading to, and storing from
|| STW.D2 A6,*B4 ; / different register files
LDW.D1 *A0,B2 ; \ Loading to
|| LDW.D2 *B0,A1 ; / different register files
3.7.4 Constraints on Long (40-Bit) Data
Because the .S and .L units share a read register port for long source operands
and a write register port for long results, only one long result may be issued
per register file in an execute packet. All instructions with a long result on the
.S and .L units have zero delay slots. See section 2.1 on page 2-4 for the order
for long pairs.
The following execute packet is invalid:
ADD.L1 A5:A4,A1,A3:A2 ; \ Two long writes
|| SHL.S1 A8,A9,A7:A6 ; / on A register file
The following execute packet is valid:
ADD.L1 A5:A4,A1,A3:A2 ; \ One long write for
|| SHL.S2 B8,B9,B7:B6 ; / each register file
Because the .L and .S units share their long read port with the store port, operations that read a long value cannot be issued on the .L and/or .S units in
the same execute packet as a store.
The following execute packet is invalid:
ADD.L1 A5:A4,A1,A3:A2 ; \ Long read operation and a
|| STW.D1 A8,*A9 ; / store
The following execute packet is valid:
ADD.L1 A4,A1,A3:A2 ; \ No long read
|| STW.D1 A8,*A9 ; / with the store
3.7.5 Constraints on Register Reads
More than four reads of the same register cannot occur on the same cycle.
Conditional registers are not included in this count.
The following code sequences are invalid:
MPY.M1 A1,A1,A4 ; five reads of register A1
|| ADD.L1 A1,A1,A5
|| SUB.D1 A1,A2,A3
MPY.M1 A1,A1,A4 ; five reads of register A1
|| ADD.L1 A1,A1,A5
|| SUB.D2x A1,B2,B3
This code sequence is valid:
MPY .M1 A1,A1,A4 ; only four reads of A1
|| [A1] ADD .L1 A0,A1,A5
|| SUB .D1 A1,A2,A3
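The four-read limit can be checked by counting source operands across an execute packet. In the following Python sketch the function name over_read_registers and the tuple-based packet description are illustrative assumptions; condition registers are simply left out of the tuples, since they are not included in the count.

```python
from collections import Counter


def over_read_registers(packet_sources, limit=4):
    """Return the registers read more than `limit` times in one execute packet.

    packet_sources -- one tuple of source register names per instruction,
                      excluding any condition register
    """
    counts = Counter(reg for sources in packet_sources for reg in sources)
    return sorted(reg for reg, n in counts.items() if n > limit)


# MPY A1,A1,A4 || ADD A1,A1,A5 || SUB A1,A2,A3: five reads of A1
print(over_read_registers([("A1", "A1"), ("A1", "A1"), ("A1", "A2")]))  # ['A1']

# MPY A1,A1,A4 || [A1] ADD A0,A1,A5 || SUB A1,A2,A3: four reads of A1
print(over_read_registers([("A1", "A1"), ("A0", "A1"), ("A1", "A2")]))  # []
```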
3.7.6 Constraints on Register Writes
Two instructions cannot write to the same register on the same cycle. Two instructions with the same destination can be scheduled in parallel as long as
they do not write to the destination register on the same cycle. For example,
a MPY issued on cycle i followed by an ADD on cycle i + 1 cannot write to the
same register because both instructions write a result on cycle i + 1. Therefore,
the following code sequence is invalid unless a branch occurs after the MPY,
causing the ADD not to be issued.
MPY .M1 A0,A1,A2
ADD .L1 A4,A5,A2
However, this code sequence is valid:
MPY .M1 A0,A1,A2
|| ADD .L1 A4,A5,A2
Figure 3–3 shows different multiple-write conflicts. For example, ADD and
SUB in execute packet L1 write to the same register. This conflict is easily detectable.
MPY in packet L2 and ADD in packet L3 might both write to B2 simultaneously;
however, if a branch instruction causes the execute packet after L2 to be
something other than L3, a conflict would not occur. Thus, the potential conflict
in L2 and L3 might not be detected by the assembler. The instructions in L4
do not constitute a write conflict because they are mutually exclusive. In contrast, because the instructions in L5 may or may not be mutually exclusive, the
assembler cannot determine a conflict. If the pipeline does receive commands
to perform multiple writes to the same register, the result is undefined.
Figure 3–3. Examples of the Detectability of Write Conflicts by the Assembler
3.8 Addressing Modes
The addressing modes on the ’C62x and ’C67x are linear, circular using BK0,
and circular using BK1. The mode is specified by the addressing mode register, or AMR (defined in Chapter 2).
All registers can perform linear addressing. Only eight registers can perform
circular addressing: A4–A7 are used by the .D1 unit and B4–B7 are used by
the .D2 unit. No other units can perform circular addressing.
LDB(U)/LDH(U)/LDW, STB/STH/STW, ADDAB/ADDAH/ADDAW/ADDAD,
and SUBAB/SUBAH/SUBAW instructions all use the AMR to determine what
type of address calculations are performed for these registers.
3.8.1 Linear Addressing Mode
3.8.1.1 LD/ST Instructions
For load and store instructions, linear mode simply shifts the offsetR/cst operand
to the left by 2, 1, or 0 for word, halfword, or byte access, respectively, and
then performs an add or a subtract to baseR (depending on the operation specified).
3.8.1.2 ADDA/SUBA Instructions
For integer addition and subtraction instructions, linear mode simply shifts the
src1/cst operand to the left by 2, 1, or 0 for word, halfword, or byte data sizes,
respectively, and then performs the add or subtract specified.
3.8.2 Circular Addressing Mode
The BK0 and BK1 fields in the AMR specify block sizes for circular addressing.
See section 2.6.1, on page 2-9, for more information on the AMR.
3.8.2.1 LD/ST Instructions
After shifting offsetR/cst to the left by 2, 1, or 0 for LDW, LDH(U), or LDB(U),
respectively, an add or subtract is performed with the carry/borrow inhibited
between bits N and N + 1. Bits N + 1 to 31 of baseR remain unchanged. All
other carries/borrows propagate as usual. If you specify an offsetR/cst greater
than the circular buffer size, 2^(N + 1), the effective offsetR/cst is modulo the circular buffer size (see Example 3–4). The circular buffer size in the AMR is not
scaled; for example, a block size of 4 is 4 bytes, not 4 × data size (byte, halfword, word). So, to perform circular addressing on an array of 8 words, a size
of 32 should be specified, or N = 4. Example 3–4 shows a LDW performed with
register A4 in circular mode and BK0 = 4, so the buffer size is 32 bytes, 16 halfwords, or 8 words. The value put in the AMR for this example is 0004 0001h.
Example 3–4. LDW in Circular Mode
LDW .D1 *++A4[9],A1
         | Before LDW | 1 cycle after LDW | 5 cycles after LDW
A4       | 0000 0100h | 0000 0104h        | 0000 0104h
A1       | XXXX XXXXh | XXXX XXXXh        | 1234 5678h
mem 104h | 1234 5678h | 1234 5678h        | 1234 5678h
Note: 9h words is 24h bytes. 24h bytes is 4 bytes beyond the 32-byte (20h) boundary 100h–11Fh; thus, it is wrapped around to
(124h – 20h = 104h).
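The wraparound in the note above follows from inhibiting the carry between bits N and N + 1: only the low N + 1 bits of the address change, and the upper bits stay fixed. The following Python sketch models that arithmetic; the function name circular_update and the size-name dictionary are illustrative assumptions. The same routine applies to ADDAH, ADDAW, and ADDAB once the operand is scaled by the data size.

```python
SHIFT = {"word": 2, "halfword": 1, "byte": 0}  # left shift applied to the offset


def circular_update(base, index, size, bk, subtract=False):
    """Model a circular-mode address update.

    base  -- 32-bit base register value
    index -- unscaled offset (offsetR/cst or src1), in units of `size`
    size  -- "word", "halfword", or "byte"
    bk    -- block size field value N from the AMR (buffer is 2**(N + 1) bytes)
    """
    offset = index << SHIFT[size]
    mask = (1 << (bk + 1)) - 1          # bits 0..N take part in the add
    low = base & mask
    low = (low - offset if subtract else low + offset) & mask  # no carry past bit N
    return (base & ~mask & 0xFFFFFFFF) | low                   # bits N+1..31 unchanged


print(hex(circular_update(0x100, 9, "word", bk=4)))         # 0x104
print(hex(circular_update(0x100, 0x13, "halfword", bk=4)))  # 0x106
```

The first call reproduces the LDW case in Example 3–4; the second uses the ADDAH operands A4 = 0000 0100h and A1 = 0000 0013h and yields 0000 0106h.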
3.8.2.2 ADDA/SUBA Instructions
After shifting src1/cst to the left by 2, 1, or 0 for ADDAW, ADDAH, or ADDAB,
respectively, an add or a subtract is performed with the carry/borrow inhibited
between bits N and N + 1. Bits N + 1 to 31 (inclusive) of src2 remain unchanged.
All other carries/borrows propagate as usual. If you specify src1 greater than
the circular buffer size, 2^(N + 1), the effective src1 is modulo the circular
buffer size (see Example 3–5). The circular buffer size in the AMR is not
scaled; for example, a block size of 4 is 4 bytes, not 4 × data size (byte, halfword, word). So, to perform circular addressing on an array of 8 words, a size
of 32 should be specified, or N = 4. Example 3–5 shows an ADDAH performed
with register A4 in circular mode and BK0 = 4, so the buffer size is 32 bytes,
16 halfwords, or 8 words. The value put in the AMR for this example is
0004 0001h.
Example 3–5. ADDAH in Circular Mode
ADDAH .D1 A4,A1,A4
   | Before ADDAH | 1 cycle after ADDAH
A4 | 0000 0100h   | 0000 0106h
A1 | 0000 0013h   | 0000 0013h
Note: 13h halfwords is 26h bytes. 26h bytes is 6 bytes beyond the 32-byte (20h) boundary
100h–11Fh; thus, it is wrapped around to (126h – 20h = 106h).
3.8.3 Syntax for Load/Store Address Generation
The ’C62x and ’C67x CPUs have a load/store architecture, which means that
the only way to access data in memory is with a load or store instruction.
Table 3–7 shows the syntax of an indirect address to a memory location.
Sometimes a large offset is required for a load/store. In this case you can use
the B14 or B15 register as the base register, and use a 15-bit constant (ucst15)
as the offset.
Table 3–7. Indirect Address Generation for Load/Store
Addressing Type | No Modification of Address Register | Preincrement or Predecrement of Address Register | Postincrement or Postdecrement of Address Register
Register indirect | *R | *++R, *– –R | *R++, *R– –
Register relative | *+R[ucst5], *–R[ucst5] | *++R[ucst5], *– –R[ucst5] | *R++[ucst5], *R– –[ucst5]
Register relative with 15-bit constant offset | *+B14/B15[ucst15] | not supported | not supported
Base + index | *+R[offsetR], *–R[offsetR] | *++R[offsetR], *– –R[offsetR] | *R++[offsetR], *R– –[offsetR]
3.9 Individual Instruction Descriptions
This section gives detailed information on the fixed-point instruction set for the
’C62x and ’C67x. Each instruction presents the following information:
- Assembler syntax
- Functional units
- Operands
- Opcode
- Description
- Execution
- Instruction type
- Delay slots
- Functional Unit Latency
- Examples
The ADD instruction is used as an example to familiarize you with the way
each instruction is described. The example describes the kind of information
you will find in each part of the individual instruction description and where to
obtain more information.
Example Instruction
EXAMPLE
Syntax  EXAMPLE (.unit) src, dst
        .unit = .L1, .L2, .S1, .S2, .D1, .D2
src and dst indicate source and destination, respectively. The (.unit) dictates
which functional unit the instruction is mapped to (.L1, .L2, .S1, .S2, .M1, .M2,
.D1, or .D2).
A table is provided for each instruction that gives the opcode map fields, units
the instruction is mapped to, types of operands, and the opcode.
The opcode map, repeated from the summary figure on page 3-10, shows the
various fields that make up each instruction. These fields are described in
Table 3–4 on page 3-9.
There are instructions that can be executed on more than one functional unit.
Table 3–8 shows how this situation is documented for the ADD instruction.
This instruction has three opcode map fields: src1, src2, and dst. In the
seventh row, the operands have the types cst5, long, and long for src1, src2,
and dst, respectively. The ordering of these fields implies cst5 + long → long,
where + represents the operation being performed by the ADD. This operation
can be done on .L1 or .L2 (both are specified in the unit column). The s in front
of each operand signifies that src1 (scst5), src2 (slong), and dst (slong) are all
signed values.
In the third row, src1, src2, and dst are int, int, and long, respectively. The u in
front of each operand signifies that all operands are unsigned. Any operand
that begins with x can be read from a register file that is different from the
destination register file. The operand comes from the register file opposite the
destination if the x bit in the instruction is set (shown in the opcode map).
Table 3–8. Relationships Between Operands, Operand Size, Signed/Unsigned, Functional
Units, and Opfields for Example Instruction (ADD)

Opcode map field used...   For operand type...     Unit        Opfield    Mnemonic
src1, src2, dst            sint, xsint, sint       .L1, .L2    0000011    ADD
src1, src2, dst            sint, xsint, slong      .L1, .L2    0100011    ADD
src1, src2, dst            uint, xuint, ulong      .L1, .L2    0101011    ADDU
src1, src2, dst            xsint, slong, slong     .L1, .L2    0100001    ADD
src1, src2, dst            xuint, ulong, ulong     .L1, .L2    0101001    ADDU
src1, src2, dst            scst5, xsint, sint      .L1, .L2    0000010    ADD
src1, src2, dst            scst5, slong, slong     .L1, .L2    0100000    ADD
src1, src2, dst            sint, xsint, sint       .S1, .S2    000111     ADD
src1, src2, dst            scst5, xsint, sint      .S1, .S2    000110     ADD
src2, src1, dst            sint, sint, sint        .D1, .D2    010000     ADD
src2, src1, dst            sint, ucst5, sint       .D1, .D2    010010     ADD
Description  Instruction execution and its effect on the rest of the processor or memory
contents are described. Any constraints on the operands imposed by the processor or the
assembler are discussed. The description parallels and supplements the information given
by the execution block.
Execution for .L1, .L2 and .S1, .S2 Opcodes
if (cond) src1 + src2 → dst
else nop
Execution for .D1, .D2 Opcodes
if (cond) src2 + src1 → dst
else nop
The execution describes the processing that takes place when the instruction
is executed. The symbols are defined in Table 3–1 on page 3-2.
Pipeline  This section contains a table that shows the sources read from, the
destinations written to, and the functional unit used during each execution cycle of the
instruction.
Instruction Type  This section gives the type of instruction. See section 5.2 on page 5-11
for information about the pipeline execution of this type of instruction.
Delay Slots  This section gives the number of delay slots the instruction takes to execute.
See section 3.4 on page 3-12 for an explanation of delay slots.
Functional Unit Latency
This section gives the number of cycles that the functional unit is in use during
the execution of the instruction.
Example  Examples of instruction execution. If applicable, register and memory values
are given before and after instruction execution.
ABS
Integer Absolute Value With Saturation
Syntax  ABS (.unit) src2, dst
        .unit = .L1, .L2

Opcode map field used...   For operand type...   Unit        Opfield
src2, dst                  xsint, sint           .L1, .L2    0011010
src2, dst                  slong, slong          .L1, .L2    0111000

Opcode
Bits    31–29   28   27–23   22–18   17–13       12   11–5   4–2     1   0
Field   creg    z    dst     src2    0 0 0 0 0   x    op     1 1 0   s   p
Width   3       1    5       5       5           1    7      3       1   1

Description  The absolute value of src2 is placed in dst.
Execution  if (cond) abs(src2) → dst
           else nop
The absolute value of src2 when src2 is an sint is determined as follows:
1) If src2 ≥ 0, then src2 → dst
2) If src2 < 0 and src2 ≠ –2^31, then –src2 → dst
3) If src2 = –2^31, then 2^31 – 1 → dst
The absolute value of src2 when src2 is an slong is determined as follows:
1) If src2 ≥ 0, then src2 → dst
2) If src2 < 0 and src2 ≠ –2^39, then –src2 → dst
3) If src2 = –2^39, then 2^39 – 1 → dst

Pipeline
Pipeline Stage   E1
Read             src2
Written          dst
Unit in use      .L

Instruction Type  Single-cycle
Delay Slots  0

Example 1  ABS .L1 A1,A5
        Before instruction               1 cycle after instruction
A1      8000 4E3Dh   –2147463619         8000 4E3Dh   –2147463619
A5      XXXX XXXXh                       7FFF B1C3h   2147463619

Example 2  ABS .L1 A1,A5
        Before instruction               1 cycle after instruction
A1      3FF6 0010h   1073086480          3FF6 0010h   1073086480
A5      XXXX XXXXh                       3FF6 0010h   1073086480
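The three saturation cases for ABS can be checked with a short sketch. It models only the arithmetic described above, not the hardware; the function names and the bits parameter (32 for sint, 40 for slong) are hypothetical.

```python
# Sketch: absolute value with saturation, following the three cases listed above.
# The names to_signed/abs_sat and the 'bits' parameter are illustrative only.

def to_signed(word, bits=32):
    """Interpret an unsigned register image as a two's-complement value."""
    word &= (1 << bits) - 1
    return word - (1 << bits) if word >> (bits - 1) else word


def abs_sat(src2, bits=32):
    """Return |src2|, saturating the most negative value to the largest positive one."""
    most_negative = -(1 << (bits - 1))
    if src2 >= 0:
        return src2                      # case 1
    if src2 == most_negative:
        return (1 << (bits - 1)) - 1     # case 3: saturate
    return -src2                         # case 2


print(hex(abs_sat(to_signed(0x80004E3D))))
```

Applied to the register image 8000 4E3Dh from Example 1, the sketch yields 7FFF B1C3h.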
ADD(U)
Signed or Unsigned Integer Addition Without Saturation
Syntax  ADD (.unit) src1, src2, dst
        or
        ADDU (.L1 or .L2) src1, src2, dst
        or
        ADD (.D1 or .D2) src2, src1, dst
        .unit = .L1, .L2, .S1, .S2

Opcode map field used...   For operand type...     Unit        Opfield
src1, src2, dst            sint, xsint, sint       .L1, .L2    0000011
src1, src2, dst            sint, xsint, slong      .L1, .L2    0100011
src1, src2, dst            uint, xuint, ulong      .L1, .L2    0101011
src1, src2, dst            xsint, slong, slong     .L1, .L2    0100001
src1, src2, dst            xuint, ulong, ulong     .L1, .L2    0101001
src1, src2, dst            scst5, xsint, sint      .L1, .L2    0000010
src1, src2, dst            scst5, slong, slong     .L1, .L2    0100000
src1, src2, dst            sint, xsint, sint       .S1, .S2    000111
src1, src2, dst            scst5, xsint, sint      .S1, .S2    000110
src2, src1, dst            sint, sint, sint        .D1, .D2    010000
src2, src1, dst            sint, ucst5, sint       .D1, .D2    010010
Opcode  .L unit
Bits    31–29   28   27–23   22–18   17–13      12   11–5   4–2     1   0
Field   creg    z    dst     src2    src1/cst   x    op     1 1 0   s   p
Width   3       1    5       5       5          1    7      3       1   1

Opcode  .S unit
Bits    31–29   28   27–23   22–18   17–13      12   11–6   5–2       1   0
Field   creg    z    dst     src2    src1/cst   x    op     1 0 0 0   s   p
Width   3       1    5       5       5          1    6      4         1   1

Description for .L1, .L2 and .S1, .S2 Opcodes
src2 is added to src1. The result is placed in dst.
Execution for .L1, .L2 and .S1, .S2 Opcodes
if (cond) src1 + src2 → dst
else nop

Opcode  .D unit
Bits    31–29   28   27–23   22–18   17–13      12–7   6–2         1   0
Field   creg    z    dst     src2    src1/cst   op     1 0 0 0 0   s   p
Width   3       1    5       5       5          6      5           1   1

Description for .D1, .D2 Opcodes
src1 is added to src2. The result is placed in dst.
Execution for .D1, .D2 Opcodes
if (cond) src2 + src1 → dst
else nop

Pipeline
Pipeline Stage   E1
Read             src1, src2
Written          dst
Unit in use      .L, .S, or .D

Instruction Type  Single-cycle
Delay Slots  0
Example 6  ADD .D1 26,A1,A6
        Before instruction          1 cycle after instruction
A1      0000 325Ah   12890          0000 325Ah
A6      XXXX XXXXh                  0000 3274h   12916
ADDAB/ADDAH/ADDAW
Integer Addition Using Addressing Mode
Syntax  ADDAB (.unit) src2, src1, dst
        or
        ADDAH (.unit) src2, src1, dst
        or
        ADDAW (.unit) src2, src1, dst
        .unit = .D1 or .D2

Opcode map field used...   For operand type...   Unit        Opfield
src2, src1, dst            sint, sint, sint      .D1, .D2    byte: 110000
                                                             halfword: 110100
                                                             word: 111000
src2, src1, dst            sint, ucst5, sint     .D1, .D2    byte: 110010
                                                             halfword: 110110
                                                             word: 111010

Opcode
Bits    31–29   28   27–23   22–18   17–13      12–7   6–2         1   0
Field   creg    z    dst     src2    src1/cst   op     1 0 0 0 0   s   p
Width   3       1    5       5       5          6      5           1   1

Description  src1 is added to src2 using the addressing mode specified for src2. The
addition defaults to linear mode. However, if src2 is one of A4–A7 or B4–B7, the
mode can be changed to circular mode by writing the appropriate value to the
AMR (see section 2.6.1). src1 is left shifted by 1 or 2 for halfword and word data
sizes respectively. Byte, halfword, and word mnemonics are ADDAB,
ADDAH, and ADDAW, respectively. The result is placed in dst.
Execution  if (cond) src2 +a src1 → dst
           else nop

Pipeline
Pipeline Stage   E1
Read             src1, src2
Written          dst
Unit in use      .D

Instruction Type  Single-cycle
Delay Slots  0
Example 1  ADDAB .D1 A4,A2,A4
        Before instruction      1 cycle after instruction
A2      0000 000Bh              0000 000Bh
A4      0000 0100h              0000 0103h
AMR     0002 0001h              0002 0001h
BK0 = 2 → size = 8
A4 in circular addressing mode using BK0

Example 2  ADDAH .D1 A4,A2,A4
        Before instruction      1 cycle after instruction
A2      0000 000Bh              0000 000Bh
A4      0000 0100h              0000 0106h
AMR     0002 0001h              0002 0001h
BK0 = 2 → size = 8
A4 in circular addressing mode using BK0

Example 3  ADDAW .D1 A4,2,A4
        Before instruction      1 cycle after instruction
A4      0002 0000h              0002 0000h
AMR     0002 0001h              0002 0001h
BK0 = 2 → size = 8
A4 in circular addressing mode using BK0
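The three examples can be reproduced with a small behavioral sketch. It assumes that circular mode keeps the address bits above the block size unchanged and lets only the low N+1 bits wrap, which matches the results shown; the function name adda and the bk parameter (the BK0 or BK1 field value, or None for linear mode) are hypothetical.

```python
# Sketch: ADDAB/ADDAH/ADDAW in linear or circular mode.
# ASSUMPTION: in circular mode only the low (bk + 1) address bits change,
# so the result stays inside the 2^(bk+1)-byte block that contains src2.

SHIFT = {"ADDAB": 0, "ADDAH": 1, "ADDAW": 2}


def adda(mnemonic, src2, src1, bk=None):
    """Return src2 +a src1; bk is the block-size field, or None for linear mode."""
    offset = src1 << SHIFT[mnemonic]        # scale by the data size
    total = (src2 + offset) & 0xFFFFFFFF
    if bk is None:
        return total                        # linear addressing
    mask = (1 << (bk + 1)) - 1              # low bits allowed to wrap
    return (src2 & ~mask & 0xFFFFFFFF) | (total & mask)


print(hex(adda("ADDAB", 0x100, 0xB, bk=2)))
print(hex(adda("ADDAH", 0x100, 0xB, bk=2)))
print(hex(adda("ADDAW", 0x20000, 2, bk=2)))
```

With BK0 = 2 (an 8-byte block) the sketch gives 103h, 106h, and 2 0000h for Examples 1 through 3.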
ADDK
Integer Addition Using Signed 16-Bit Constant
Syntax  ADDK (.unit) cst, dst
        .unit = .S1 or .S2

Opcode map field used...   For operand type...   Unit
cst, dst                   scst16, uint          .S1, .S2

Opcode
Bits    31–29   28   27–23   22–7   6–2            1   0
Field   creg    z    dst     cst    (fixed bits)   s   p
Width   3       1    5       16     5              1   1

Description  A 16-bit signed constant is added to the dst register specified. The result is
placed in dst.
Execution  if (cond) cst + dst → dst
           else nop
ADD2
Two 16-Bit Integer Adds on Upper and Lower Register Halves
Syntax  ADD2 (.unit) src1, src2, dst
        .unit = .S1 or .S2

Opcode map field used...   For operand type...   Unit
src1, src2, dst            sint, xsint, sint     .S1, .S2

Opcode
Bits    31–29   28   27–23   22–18   17–13   12   11–6          5–2       1   0
Field   creg    z    dst     src2    src1    x    0 0 0 0 0 1   1 0 0 0   s   p
Width   3       1    5       5       5       1    6             4         1   1

Description  The upper and lower halves of the src1 operand are added to the upper and
lower halves of the src2 operand. Any carry from the lower half add does not
affect the upper half add.
Execution  if (cond) {
               ((lsb16(src1) + lsb16(src2)) and FFFFh) or
               ((msb16(src1) + msb16(src2)) << 16) → dst
           }
           else nop

Pipeline
Pipeline Stage   E1
Read             src1, src2
Written          dst
Unit in use      .S

Instruction Type  Single-cycle
Delay Slots  0

Example  ADD2 .S1X A1,B1,A2
        Before instruction            1 cycle after instruction
A1      0021 37E1h   33 14305         0021 37E1h
A2      XXXX XXXXh                    03BB 1C99h   955 7321
B1      039A E4B8h   922 58552        039A E4B8h
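The execution expression for ADD2 translates directly into a sketch. The function name add2 is hypothetical, and the upper-half sum is masked to 16 bits to model a 32-bit destination register.

```python
# Sketch: two independent 16-bit adds packed in one 32-bit word (ADD2).
# The carry out of the lower half is discarded and never reaches the upper half.

def add2(src1, src2):
    """Return the 32-bit result of adding upper and lower halves separately."""
    low = ((src1 & 0xFFFF) + (src2 & 0xFFFF)) & 0xFFFF
    high = (((src1 >> 16) + (src2 >> 16)) & 0xFFFF) << 16
    return high | low


print(hex(add2(0x002137E1, 0x039AE4B8)))
```

For the operands in the example (0021 37E1h and 039A E4B8h) the lower halves sum to 1 1C99h, the carry is dropped, and the result is 03BB 1C99h.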
AND
Bitwise AND
Syntax  AND (.unit) src1, src2, dst
        .unit = .L1 or .L2, .S1 or .S2

Opcode map field used...   For operand type...   Unit        Opfield
src1, src2, dst            uint, xuint, uint     .L1, .L2    1111011
src1, src2, dst            scst5, xuint, uint    .L1, .L2    1111010
src1, src2, dst            uint, xuint, uint     .S1, .S2    011111
src1, src2, dst            scst5, xuint, uint    .S1, .S2    011110

Opcode
.L unit form:
Bits    31–29   28   27–23   22–18   17–13      12   11–5   4–2     1   0
Field   creg    z    dst     src2    src1/cst   x    op     1 1 0   s   p
Width   3       1    5       5       5          1    7      3       1   1

.S unit form:
Bits    31–29   28   27–23   22–18   17–13      12   11–6   5–2       1   0
Field   creg    z    dst     src2    src1/cst   x    op     1 0 0 0   s   p
Width   3       1    5       5       5          1    6      4         1   1

Description  A bitwise AND is performed between src1 and src2. The result is placed in
dst. The scst5 operands are sign extended to 32 bits.
Execution  if (cond) src1 and src2 → dst
           else nop

Pipeline
Pipeline Stage   E1
Read             src1, src2
Written          dst
Unit in use      .L or .S

Instruction Type  Single-cycle
Delay Slots  0

Example 1  AND .L1X A1,B1,A2
        Before instruction      1 cycle after instruction
A1      F7A1 302Ah              F7A1 302Ah
A2      XXXX XXXXh              02A0 2020h
B1      02B6 E724h              02B6 E724h

Example 2  AND .L1 15,A1,A3
        Before instruction      1 cycle after instruction
A1      32E4 6936h              32E4 6936h
A3      XXXX XXXXh              0000 0006h
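Sign extension of the 5-bit constant matters when bit 4 of the constant is set, because the extended value then has all upper bits set. The following sketch illustrates this; the function names are hypothetical.

```python
# Sketch: AND with a 5-bit signed constant that is sign extended to 32 bits.

def sign_extend_5(cst):
    """Interpret the low 5 bits of cst as a two's-complement value."""
    cst &= 0x1F
    return cst - 0x20 if cst & 0x10 else cst


def and_cst5(cst5, src2):
    """Return (sign-extended cst5) AND src2 as a 32-bit unsigned result."""
    return (sign_extend_5(cst5) & 0xFFFFFFFF) & src2


print(hex(and_cst5(15, 0x32E46936)))     # Example 2: constant 15
print(hex(and_cst5(0x10, 0xFFFFFFFF)))   # constant -16 keeps the upper 28 bits
```

With the constant 15 only the low four bits of the register survive, as in Example 2; with the 5-bit pattern 10000b (–16) every bit except the low four is preserved.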
B
Branch Using a Displacement
Syntax  B (.unit) label
        .unit = .S1 or .S2

Opcode map field used...   For operand type...   Unit
cst                        scst21                .S1, .S2

Opcode
Bits    31–29   28   27–7   6–2         1   0
Field   creg    z    cst    0 0 1 0 0   s   p
Width   3       1    21     5           1   1

Description  A 21-bit signed constant specified by cst is shifted left by 2 bits and is added
to the address of the first instruction of the fetch packet that contains the
branch instruction. The result is placed in the program fetch counter (PFC).
The assembler/linker automatically computes the correct value for cst by the
following formula:
cst = (label – PCE1) >> 2
If two branches are in the same execute packet and both are taken, behavior
is undefined.
Two conditional branches can be in the same execute packet if one branch
uses a displacement and the other uses a register, IRP, or NRP. As long as only
one branch has a true condition, the code executes in a well-defined way.
Execution  if (cond) cst << 2 + PCE1 → PFC
           else nop
Notes:
1) PCE1 (program counter) represents the address of the first instruction
in the fetch packet in the E1 stage of the pipeline. PFC is the program
fetch counter.
2) The execute packets in the delay slots of a branch cannot be interrupted.
This is true regardless of whether the branch is taken.
3) See section 3.5.2 on page 3-15 for information on branching into the
middle of an execute packet.
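The displacement formula and the execution expression are inverses of each other, which a short sketch can confirm. The addresses used below are made-up illustration values, the function names are hypothetical, and the alignment and range checks are assumptions about what an assembler would reject rather than behavior stated here.

```python
# Sketch: computing the 21-bit displacement for B and recovering the target.
# ASSUMPTION: the label must be word aligned relative to PCE1 and the
# displacement must fit in a signed 21-bit field; the checks are illustrative.

def encode_cst(label, pce1):
    """cst = (label - PCE1) >> 2, checked against the signed 21-bit range."""
    diff = label - pce1
    if diff % 4:
        raise ValueError("target is not word aligned relative to PCE1")
    cst = diff >> 2
    if not -(1 << 20) <= cst < (1 << 20):
        raise ValueError("displacement does not fit in scst21")
    return cst


def branch_target(cst, pce1):
    """(cst << 2) + PCE1, the value placed in the PFC when the branch is taken."""
    return ((cst << 2) + pce1) & 0xFFFFFFFF


pce1 = 0x10000000                      # hypothetical fetch packet address
cst = encode_cst(0x10000020, pce1)     # hypothetical label address
print(cst, hex(branch_target(cst, pce1)))
```

Because PCE1 is the address of the fetch packet rather than of the branch itself, every branch in the same fetch packet to the same label gets the same cst.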
Pipeline
Pipeline Stage
Read
Written
Branch Taken
Unit in use
Instruction Type  Branch
Delay Slots  5
Table 3–9 gives the program counter values and actions for the following code
example.
B
Branch Using a Register
If two branches are in the same execute packet and are both taken, behavior
is undefined.
Two conditional branches can be in the same execute packet if one branch
uses a displacement and the other uses a register, IRP, or NRP. As long as
only one branch has a true condition, the code executes in a well-defined way.
Execution  if (cond) src2 → PFC
           else nop
Notes:
1) This instruction executes on .S2 only. PFC is the program fetch counter.
2) The execute packets in the delay slots of a branch cannot be interrupted.
This is true regardless of whether the branch is taken.

Pipeline
                              Target Instruction
Pipeline Stage   E1           PS   PW   PR   DP   DC   E1
Read             src2
Written
Branch Taken                                           ✓
Unit in use      .S2

Instruction Type  Branch
Delay Slots  5
Table 3–10 gives the program counter values and actions for the following
code example. In this example, the B10 register holds the value 1000 000Ch.
Branch Using an Interrupt Return Pointer
Description  IRP is placed in the PFC. This instruction also moves PGIE to GIE. PGIE is
unchanged.
If two branches are in the same execute packet and are both taken, behavior
is undefined.
Two conditional branches can be in the same execute packet if one branch
uses a displacement and the other uses a register, IRP, or NRP. As long as only
one branch has a true condition, the code executes in a well-defined way.
Execution  if (cond) IRP → PFC
           else nop
Notes:
1) This instruction executes on .S2 only. PFC is the program fetch counter.
2) Refer to the chapter on interrupts for more information on IRP, PGIE, and
GIE.
3) The execute packets in the delay slots of a branch cannot be interrupted.
This is true regardless of whether the branch is taken.

Pipeline
                              Target Instruction
Pipeline Stage   E1           PS   PW   PR   DP   DC   E1
Read             IRP
Written
Branch Taken                                           ✓
Unit in use      .S2

Instruction Type  Branch
Delay Slots  5
Table 3–11 gives the program counter values and actions for the following
code example.
Description  NRP is placed in the PFC. This instruction also sets NMIE. PGIE is unchanged.
If two branches are in the same execute packet and are both taken, behavior
is undefined.
Two conditional branches can be in the same execute packet if one branch
uses a displacement and the other uses a register, IRP, or NRP. As long as only
one branch has a true condition, the code executes in a well-defined way.
Execution  if (cond) NRP → PFC
           else nop
Notes:
1) This instruction executes on .S2 only. PFC is the program fetch counter.
2) Refer to the chapter on interrupts for more information on NRP and
NMIE.
3) The execute packets in the delay slots of a branch cannot be interrupted.
This is true regardless of whether the branch is taken.

Pipeline
                              Target Instruction
Pipeline Stage   E1           PS   PW   PR   DP   DC   E1
Read             NRP
Written
Branch Taken                                           ✓
Unit in use      .S2

Instruction Type  Branch
Delay Slots  5
Table 3–12 gives the program counter values and actions for the following
code example.