Analog Devices, Inc. reserves the right to change this product without
prior notice. Information furnished by Analog Devices is believed to be
accurate and reliable. However, no responsibility is assumed by Analog
Devices for its use; nor for any infringement of patents or other rights of
third parties which may result from its use. No license is granted by implication or otherwise under the patent rights of Analog Devices, Inc.
Trademark and Service Mark Notice
The Analog Devices logo, Blackfin, EZ-KIT Lite, SHARC, the SHARC
logo, TigerSHARC, and VisualDSP++ are registered trademarks of Analog
Devices, Inc.
All other brand and product names are trademarks or service marks of
their respective owners.
Contents
PREFACE
Purpose of This Manual xxiii
Data Access Options 5-38
Short Word Addressing of Single-Data in SISD Mode 5-39
Short Word Addressing of Single-Data in SIMD Mode 5-42
Short Word Addressing of Dual-Data in SISD Mode 5-44
Short Word Addressing of Dual-Data in SIMD Mode 5-46
32-Bit Normal Word Addressing of Single-Data in SISD Mode 5-48
32-Bit Normal Word Addressing of Single-Data in SIMD Mode 5-50
32-Bit Normal Word Addressing of Dual-Data in SISD Mode 5-52
32-Bit Normal Word Addressing of Dual-Data in SIMD Mode 5-54
Extended-Precision Normal Word Addressing of Single-Data 5-56
Extended-Precision Normal Word Addressing of Dual-Data in SISD
processors can use this manual, but should supplement it with other texts
(such as the appropriate hardware reference manuals and data sheets) that
describe your target architecture.
Manual Contents
This manual provides detailed information about the ADSP-2136x processor family in the following chapters:
•Chapter 1, “Introduction”
Provides an architectural overview of the ADSP-2136x processors.
•Chapter 2, “Processing Elements”
Describes the arithmetic/logic units (ALUs), multiplier/accumulator units, and shifter. The chapter also discusses data formats, data
types, and register files.
•Chapter 3, “Program Sequencer”
Describes the operation of the program sequencer, which controls
program flow by providing the address of the next instruction to be
executed. The chapter also discusses loops, subroutines, jumps,
interrupts, exceptions, and the IDLE instruction.
•Chapter 4, “Data Address Generators”
Describes the Data Address Generators (DAGs), addressing modes,
how to modify DAG and pointer registers, memory address alignment, and DAG instructions.
•Chapter 5, “Memory”
Describes aspects of processor memory including internal memory,
address and data bus structure, and memory accesses.
•Chapter 6, “JTAG Test Emulation Port”
Discusses the JTAG standard and how to use the ADSP-2136x
processors in a test environment. Includes boundary-scan architecture, instruction and boundary registers, and breakpoint control
registers.
•Chapter 7, “Timer”
Describes the three general-purpose timers that can be configured
in any of three modes: pulse width modulation, pulse width count
and capture, and external event watchdog modes.
•Chapter 8, “Instruction Set”
Provides reference information for the machine language opcodes
for the processor.
•Chapter 9, “Computations Reference”
Describes each compute operation in detail, including its assembly
language syntax and opcode field. Compute operations execute in
the multiplier, the ALU, and the shifter.
•Appendix A, “Instruction Set Quick Reference”
The instruction set summary provides a syntax summary for each
instruction and includes a cross reference to each instruction’s reference page.
•Appendix B, “Registers”
Provides register and bit descriptions for all of the registers that are
used to control the operation of the ADSP-2136x processor core.
This programming reference is a companion document to the
ADSP-2136x SHARC Processor Hardware Reference for the
ADSP-21362/3/4/5/6 Processors and the ADSP-2136x SHARC Processor Hardware Reference for the ADSP-21367/8/9 Processors.
What’s New in This Manual
This is revision 1.1 of the ADSP-2136x SHARC Processor Programming
Reference. The only changes for this revision are corrections to cross references
(and links in the online version of the book).
Technical or Customer Support
You can reach Analog Devices, Inc. Customer Support in the following
ways:
•Visit the Embedded Processing and DSP products Web site at
The following is the list of Analog Devices, Inc. processors supported in
VisualDSP++®.
TigerSHARC® (ADSP-TSxxx) Processors
The name TigerSHARC refers to a family of floating-point and fixed-point
[8-bit, 16-bit, and 32-bit] processors. VisualDSP++ currently supports the
following TigerSHARC families: ADSP-TS101 and ADSP-TS20x.
SHARC (ADSP-21xxx) Processors
The name SHARC refers to a family of high-performance, 32-bit,
floating-point processors that can be used in speech, sound, graphics, and
imaging applications. VisualDSP++ currently supports the following
SHARC families: ADSP-2106x, ADSP-2116x, ADSP-2126x, and
ADSP-2136x.
Blackfin® (ADSP-BFxxx) Processors
The name Blackfin refers to a family of 16-bit embedded processors.
VisualDSP++ currently supports the following Blackfin families:
ADSP-BF53x and ADSP-BF56x.
Product Information
You can obtain product information from the Analog Devices Web site,
from the product CD-ROM, or from the printed publications (manuals).
Analog Devices is online at
mation about a broad range of products—analog integrated circuits,
amplifiers, converters, and digital signal processors.
MyAnalog.com is a free feature of the Analog Devices Web site that allows
customization of a Web page to display only the latest information on
products you are interested in. You can also choose to receive weekly
e-mail notifications containing updates to the Web pages that meet your
interests. MyAnalog.com provides access to books, application notes, data
sheets, code examples, and more.
Registration
Visit www.myanalog.com to sign up. Click Register to use MyAnalog.com.
Registration takes about five minutes and serves as a means to select the
information you want to receive.
If you are already a registered user, just log on. Your user name is your
e-mail address.
Processor Product Information
For information on embedded processors and DSPs, visit our Web site at
www.analog.com/processors, which provides access to technical
publications, data sheets, application notes, product overviews, and product
announcements.
You may also obtain additional information about Analog Devices and its
products in any of the following ways.
Online documentation comprises the VisualDSP++ Help system, software
tools manuals, hardware tools manuals, processor manuals, the Dinkum
Abridged C++ library, and Flexible License Manager (FlexLM) network
license manager software documentation. You can easily search across the
entire VisualDSP++ documentation set for any topic of interest. For easy
printing, supplementary .PDF files of most manuals are also provided.
Each documentation file type is described as follows.
File           Description
.CHM           Help system files and manuals in Help format.
.HTM or .HTML  Dinkum Abridged C++ library and FlexLM network license manager software documentation. Viewing and printing the .HTML files requires a browser, such as Internet Explorer 4.0 (or higher).
.PDF           VisualDSP++ and processor manuals in Portable Documentation Format (PDF). Viewing and printing the .PDF files requires a PDF reader, such as Adobe Acrobat Reader (4.0 or higher).
If documentation is not installed on your system as part of the software
installation, you can add it from the VisualDSP++ CD-ROM at any time
by running the Tools installation. Access the online documentation from
the VisualDSP++ environment, Windows® Explorer, or the Analog
Devices Web site.
Accessing Documentation From VisualDSP++
From the VisualDSP++ environment:
•Access VisualDSP++ online Help from the Help menu’s Contents, Search, and Index commands.
•Open online Help from context-sensitive user interface items (toolbar buttons, menu commands, and windows).
Accessing Documentation From Windows
In addition to any shortcuts you may have constructed, there are many
ways to open VisualDSP++ online Help or the supplementary documentation from Windows.
Help system files (.CHM) are located in the Help folder, and .PDF files are
located in the Docs folder of your VisualDSP++ installation CD-ROM.
The Docs folder also contains the Dinkum Abridged C++ library and the
FlexLM network license manager software documentation.
Using Windows Explorer
•Double-click the vdsp-help.chm file, which is the master Help system, to access all the other .CHM files.
•Double-click any file that is part of the VisualDSP++ documentation set.
Select a processor family and book title. Download archive (.ZIP) files, one
for each manual. Use any archive management software, such as WinZip,
to decompress downloaded files.
Printed Manuals
For general questions regarding literature ordering, call the Literature
Center at 1-800-ANALOGD (1-800-262-5643) and follow the prompts.
VisualDSP++ Documentation Set
To purchase VisualDSP++ manuals, call 1-603-883-2430. The manuals
may be purchased only as a kit.
If you do not have an account with Analog Devices, you are referred to
Analog Devices distributors. For information on our distributors, log onto
To purchase EZ-KIT Lite® and In-Circuit Emulator (ICE) manuals, call
1-603-883-2430. The manuals may be ordered by title or by product
number located on the back cover of each manual.
Processor Manuals
Hardware reference and instruction set reference manuals may be ordered
through the Literature Center at 1-800-ANALOGD (1-800-262-5643),
or downloaded from the Analog Devices Web site. Manuals may be
ordered by title or by product number located on the back cover of each
manual.
Data Sheets
All data sheets (preliminary and production) may be downloaded from the
Analog Devices Web site. Only production (final) data sheets (Rev. 0, A,
B, C, and so on) can be obtained from the Literature Center at
1-800-ANALOGD (1-800-262-5643); they also can be downloaded from
the Web site.
To have a data sheet faxed to you, call the Analog Devices Faxback System
at 1-800-446-6212. Follow the prompts and a list of data sheet code
numbers will be faxed to you. If the data sheet you want is not listed,
check for it on the Web site.
Text conventions used in this manual are identified and described as
follows.
Example                     Description
Close command (File menu)   Titles in reference sections indicate the location of an item within the VisualDSP++ environment’s menu system (for example, the Close command appears on the File menu).
{this | that}               Alternative items in syntax descriptions appear within curly brackets and separated by vertical bars; read the example as this or that. One or the other is required.
[this | that]               Optional items in syntax descriptions appear within brackets and separated by vertical bars; read the example as an optional this or that.
[this,…]                    Optional item lists in syntax descriptions appear within brackets delimited by commas and terminated with an ellipsis; read the example as an optional comma-separated list of this.
.SECTION                    Commands, directives, keywords, and feature names are in text with letter gothic font.
filename                    Non-keyword placeholders appear in text with italic style format.
Note: For correct operation, ...
A Note: provides supplementary information on a related topic. In the
online version of this book, the word Note appears instead of this symbol.
Caution: Incorrect device operation may result if ...
Caution: Device damage may result if ...
A Caution: identifies conditions or inappropriate usage of the product that
could lead to undesirable results or product damage. In the online version of
this book, the word Caution appears instead of this symbol.
Warning: Injury to device users may result if ...
A Warning: identifies conditions or inappropriate usage of the product that
could lead to conditions that are potentially hazardous for device users. In
the online version of this book, the word Warning appears instead of this
symbol.
The ADSP-2136x processors are high performance 32-bit processors used
for medical imaging, communications, military, audio, test equipment,
3D graphics, speech recognition, motor control, imaging, and other applications. By adding on-chip SRAM, integrated I/O peripherals, and an
additional processing element for single-instruction, multiple-data
(SIMD) support, these processors build on the ADSP-21000 family processor core to form a complete system-on-a-chip.
The ADSP-2136x processors comprise two distinct groups, the
ADSP-21362/3/4/5/6 processors (see Figure 1-1 on page 1-3 and
Table 1-1 on page 1-11), and the ADSP-21367/8/9 processors (see
Figure 1-2 on page 1-4 and Table 1-2 on page 1-12). The groups are
differentiated by on-chip memories, peripheral choices, packaging, and
operating speeds. However, the core processor operates in the same way in
both groups, so this manual applies to both. Where differences exist
(such as external memory interfacing), they are noted.
For specific information on the peripherals associated with each group,
two manuals are available: the ADSP-2136x SHARC Processor Hardware
Reference for the ADSP-21362/3/4/5/6 Processors and the ADSP-2136x
SHARC Processor Hardware Reference for the ADSP-21367/8/9 Processors.
ADSP-2136x Design Advantages
A digital signal processor’s data format determines its ability to handle signals of differing precision, dynamic range, and signal-to-noise ratios.
Because floating-point math reduces both the need for scaling and the probability
of overflow, using a floating-point processor can ease algorithm and software development. The extent to which this is true depends on the
floating-point processor’s architecture. Consistency with IEEE workstation simulations and the elimination of scaling are clearly two ease-of-use
advantages. High level language programmability, large address spaces,
and wide dynamic range allow system development time to be spent on
algorithms and signal processing concerns, rather than assembly language
coding, code paging, and error handling. The ADSP-2136x processors are
highly integrated, 32-bit floating-point processors that provide many of
these design advantages.
The SHARC processor architecture balances a high performance processor
core with high performance program memory (PM), data memory (DM),
and input/output (I/O) buses. In the core, every instruction can execute in
a single cycle. The buses and instruction cache provide rapid, unimpeded
data flow to the core to maintain the execution rate.
Figure 1-1 shows a detailed block diagram of the processor, illustrating the
following architectural features:
•Two processing elements (PEx and PEy), each containing 32-bit
IEEE floating-point computation units—multiplier, arithmetic
logic unit (ALU), shifter, and data register file
•Program sequencer with related instruction cache, interval timer,
and data address generators (DAG1 and DAG2)
•Up to 3M bit on-chip SRAM
•IOP with integrated direct memory access (DMA) controller, serial
peripheral interface (SPI) compatible port, and serial ports
(SPORTs) for point-to-point multiprocessor communications.
•External port for interfacing to off-chip SDRAM
(ADSP-21367/8/9 processors) and configuring a shared memory
system with up to four other ADSP-21368 SHARC processors
•Parallel port for interfacing to off-chip memory and peripherals
(ADSP-21362/3/4/5/6 processors)
Figure 1-1 also shows the three on-chip buses of the ADSP-2136x
processors: the PM bus, DM bus, and I/O bus. The PM bus provides access to
either instructions or data. During a single cycle, these buses let the processor access two data operands from memory, access an instruction (from
the cache), and perform a DMA transfer.
The ADSP-2136x processors address the five central requirements for signal processing:
1. Fast, flexible arithmetic. The ADSP-21000 family processors
execute all instructions in a single cycle. They provide fast cycle times
and a complete set of arithmetic operations. The processor is IEEE
floating-point compatible and allows either interrupt on arithmetic
exception or latched status exception handling.
2. Unconstrained data flow. The ADSP-2136x processors have a
Super Harvard Architecture combined with a ten-port data register
file. For more information, see “Data Register File” on page 2-37.
In every cycle, the processor can write or read two operands to or
from the register file, supply two operands to the ALU, supply two
operands to the multiplier, and receive three results from the ALU
and multiplier. The processor’s 48-bit orthogonal instruction word
supports parallel data transfers and arithmetic operations in the
same instruction.
3. 40-Bit extended precision. The processor handles 32-bit IEEE
floating-point format, 32-bit integer and fractional formats
(twos-complement and unsigned), and extended-precision 40-bit
floating-point format. The processors carry extended precision
throughout their computation units, limiting intermediate data
truncation errors (up to 80 bits of precision are maintained during
multiply-accumulate operations).
4. Dual address generators. The processor has two data address
generators (DAGs) that provide immediate or indirect (pre- and
post-modify) addressing. Modulus, bit-reverse, and broadcast operations are supported with no constraints on data buffer placement.
5. Efficient program sequencing. In addition to zero-overhead loops,
the processor supports single-cycle setup and exit for loops. Loops
are both nestable (six levels in hardware) and interruptible. The
processors support both delayed and non-delayed branches.
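The bit-reverse operation listed under the dual address generators is what lets FFT code read or write data in bit-reversed order without software index shuffling. The following is a minimal C sketch of the index permutation only, assuming an index field of `bits` bits; the function name `bit_reverse` is hypothetical, and how the DAG hardware applies bit reversal to its address outputs is described in Chapter 4 and not modeled here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: reverse the low `bits` bits of `index`.
 * For an 8-point FFT (bits = 3), index 1 (001b) maps to 4 (100b). */
static uint32_t bit_reverse(uint32_t index, unsigned bits)
{
    assert(bits > 0 && bits <= 32);
    uint32_t result = 0;
    for (unsigned i = 0; i < bits; i++) {
        result = (result << 1) | (index & 1u); /* move the LSB of index into result */
        index >>= 1;
    }
    return result;
}
```

Applying `bit_reverse(k, 3)` to k = 0..7 yields the order 0, 4, 2, 6, 1, 5, 3, 7, the familiar input ordering of a radix-2 decimation-in-time FFT.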
ADSP-2136x Architectural Overview
The ADSP-2136x processors form a complete system-on-a-chip, integrating a large, high speed SRAM and I/O peripherals supported by a
dedicated I/O bus. The following sections summarize the features of each
functional block in the ADSP-2136x architecture, which appears in
The processor core consists of two processing elements (each with three
computation units and data register file), a program sequencer, two
DAGs, a timer, and an instruction cache. All processing occurs in the processor core.
Processing Elements
The processor core contains two processing elements: PEx and PEy. Each
element contains a data register file and three independent computation
units: an arithmetic logic unit (ALU), a multiplier with an 80-bit
fixed-point accumulator, and a shifter. To meet a wide variety of processing needs, the computation units process data in three formats: 32-bit
fixed-point, 32-bit floating-point, and 40-bit floating-point. The floating-point operations are single-precision IEEE-compatible. The 32-bit
floating-point format is the standard IEEE format, whereas the 40-bit
extended-precision format has eight additional least significant bits (LSBs)
of mantissa for greater accuracy.
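Because the 40-bit format differs from the 32-bit IEEE format only by eight extra mantissa LSBs, widening a 32-bit value amounts to appending eight zero bits. A minimal C sketch follows, assuming the 40-bit pattern is held in the low 40 bits of a `uint64_t` with the IEEE bits in its upper 32 bits; the function names are hypothetical, and any rounding the hardware may apply when narrowing is not modeled (this sketch simply truncates).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Widen an IEEE single to the 40-bit extended layout: same sign and
 * exponent, with eight extra mantissa LSBs set to zero. */
static uint64_t float_to_ext40(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* reinterpret the float's bit pattern */
    return (uint64_t)bits << 8;
}

/* Narrow back to 32 bits by discarding the eight extra LSBs (truncation). */
static float ext40_to_float(uint64_t ext)
{
    assert((ext >> 40) == 0);          /* only the low 40 bits are meaningful */
    uint32_t bits = (uint32_t)(ext >> 8);
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

For example, 1.0f (0x3F800000) widens to 0x3F80000000; any value in the eight low bits is lost again on narrowing.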
The ALU performs a set of arithmetic and logic operations on both
fixed-point and floating-point formats. The multiplier performs floating-point or fixed-point multiplication and fixed-point
multiply/accumulate or multiply/cumulative-subtract operations. The
shifter performs logical and arithmetic shifts, bit manipulation, bit-wise
field deposit and extraction, and exponent derivation operations on 32-bit
operands. These computation units complete all operations in a single
cycle; there is no computation pipeline. The output of any unit may serve
as the input of any unit on the next cycle. All units are connected in parallel, rather than serially. In a multifunction computation, the ALU and
multiplier perform independent, simultaneous operations.
Each processing element has a general-purpose data register file that transfers data between the computation units and the data buses and stores
intermediate results. A register file has two sets (primary and secondary) of
16 general-purpose registers each for fast context switching. All of the registers are 40 bits wide. The register file, combined with the core
processor’s Super Harvard Architecture, allows unconstrained data flow
between computation units and internal memory.
Primary processing element (PEx). PEx processes all computational
instructions whether the processor is in single-instruction, single-data
(SISD) or single-instruction, multiple-data (SIMD) mode. This element
corresponds to the computational units and register file in previous
ADSP-21000 family processors.
Secondary processing element (PEy). PEy processes each computational
instruction in lock-step with PEx, but only processes these instructions
when the processor is in SIMD mode. Because many operations are influenced by this mode, more information on SIMD is available in multiple
locations:
•For information on PEy operations, see “Processing Elements” on
page 2-1.
•For information on data addressing in SIMD mode, see “Addressing in SISD and SIMD Modes” on page 4-20.
•For information on data accesses in SIMD mode, see “SISD,
SIMD, and Broadcast Load Modes” on page 5-37.
•For information on SIMD programming, see Chapter 8, “Instruction Set,”
and Chapter 9, “Computations Reference.”
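The lock-step relationship between PEx and PEy can be pictured with a small C model: one instruction always updates PEx, and in SIMD mode the same operation also runs on PEy's own register file. This is a sketch under stated assumptions; the type `pe_regs` and function `exec_add` are hypothetical, register naming and the mode control that enables SIMD are covered in Chapter 2, and fixed-point overflow handling is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_REGS 16

/* Hypothetical model of one processing element's data register file. */
typedef struct {
    int32_t r[NUM_REGS];
} pe_regs;

/* Execute "Rd = Ra + Rb" as a single instruction.
 * PEx always executes it; PEy executes the same operation on its own
 * registers, in lock-step, only when simd is true. */
static void exec_add(pe_regs *pex, pe_regs *pey, bool simd,
                     unsigned d, unsigned a, unsigned b)
{
    assert(d < NUM_REGS && a < NUM_REGS && b < NUM_REGS);
    pex->r[d] = pex->r[a] + pex->r[b];
    if (simd)
        pey->r[d] = pey->r[a] + pey->r[b];
}
```

The point of the model is that SIMD doubles the work done per instruction without changing the instruction stream: the same operand register numbers select different data in each element.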
Program Sequence Control
Internal controls for program execution come from four functional blocks:
program sequencer, data address generators, core timer, and instruction
cache. Two dedicated address generators and a program sequencer supply
addresses for memory accesses. Together the sequencer and data address
generators allow computational operations to execute with maximum
efficiency since the computation units can be devoted exclusively to processing data. With its instruction cache, the ADSP-2136x processors can
simultaneously fetch an instruction from the cache and access two data
operands from memory. The DAGs also provide built-in support for
zero-overhead circular buffering.
Program sequencer. The program sequencer supplies instruction
addresses to program memory. It controls loop iterations and evaluates
conditional instructions. With an internal loop counter and loop stack,
the processors execute looped code with zero overhead. No explicit jump
instructions are required to loop or to decrement and test the counter. To
achieve a high execution rate while maintaining a simple programming
model, the processor employs a five-stage pipeline to process instructions:
fetch1, fetch2, decode, address, and execute. For more information, see
“Instruction Pipeline” on page 3-2.
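In the ideal case (no stalls, branches, or memory conflicts), each instruction enters fetch1 one cycle after its predecessor, so up to five instructions are in flight and N instructions finish in N + 4 cycles. The C sketch below models only that ideal overlap; the function names are hypothetical, and the stall and branch behavior described in Chapter 3 is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Stage names from the text: fetch1, fetch2, decode, address, execute. */
enum { NUM_STAGES = 5 };
static const char *const stage_name[NUM_STAGES] = {
    "fetch1", "fetch2", "decode", "address", "execute"
};

/* Ideal pipeline: instruction i enters fetch1 in cycle i and advances one
 * stage per cycle. Returns the stage index in `cycle`, or -1 if not in flight. */
static int stage_of(int instr, int cycle)
{
    int s = cycle - instr;
    return (s >= 0 && s < NUM_STAGES) ? s : -1;
}

/* Name of the stage occupied by `instr` in `cycle`, or NULL. */
static const char *stage_at(int instr, int cycle)
{
    int s = stage_of(instr, cycle);
    return s < 0 ? NULL : stage_name[s];
}

/* Cycles needed for n instructions to leave execute: n + (stages - 1). */
static int total_cycles(int n)
{
    assert(n > 0);
    return n + NUM_STAGES - 1;
}
```

For instance, in cycle 4 instruction 0 is in execute while instruction 4 is in fetch1; after the initial fill, one instruction completes every cycle.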
Data address generators. The DAGs provide memory addresses when data
is transferred between memory and registers. Dual data address generators
enable the processor to output simultaneous addresses for two operand
reads or writes. DAG1 supplies 32-bit addresses for accesses using the DM
bus. DAG2 supplies 32-bit addresses for memory accesses over the PM
bus.
Each DAG keeps track of up to eight address pointers, eight address modifiers, and, for circular buffering, eight base-address registers and eight
buffer-length registers. A pointer used for indirect addressing can be modified by a value in a specified register, either before (pre-modify) or after
(post-modify) the access. A length value may be associated with each
pointer to perform automatic modulo addressing for circular data buffers.
The circular buffers can be located at arbitrary boundaries in memory.
Each DAG register has a secondary register that can be activated for fast
context switching.
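The interaction of pointer (I), modifier (M), base (B), and length (L) values can be sketched in C. This model assumes that wraparound is applied on post-modify accesses, that the pointer starts inside the buffer, and that |M| < L; the type `dag_regs` and both functions are hypothetical, and the hardware's exact rules are given in Chapter 4.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one DAG register set: index (I), modify (M),
 * base (B), and length (L). L == 0 means linear (no circular buffer). */
typedef struct {
    int64_t i, m, b, l;
} dag_regs;

/* Pre-modify: the access uses I + M; I itself is left unchanged.
 * (This sketch applies no wraparound on pre-modify.) */
static int64_t dag_pre_modify(const dag_regs *d)
{
    return d->i + d->m;
}

/* Post-modify: the access uses I, then I is updated by M. With L != 0,
 * the new I is wrapped back into [B, B + L), giving modulo addressing.
 * Assumes I starts inside the buffer and |M| < L. */
static int64_t dag_post_modify(dag_regs *d)
{
    int64_t addr = d->i;
    int64_t next = d->i + d->m;
    if (d->l != 0) {
        assert(d->i >= d->b && d->i < d->b + d->l);
        assert(d->m < d->l && -d->m < d->l);
        if (next >= d->b + d->l)
            next -= d->l;           /* ran off the top: wrap to the start */
        else if (next < d->b)
            next += d->l;           /* ran off the bottom: wrap to the end */
    }
    d->i = next;
    return addr;
}
```

With B = 100, L = 4, I = 100, and M = 1, successive post-modify accesses use addresses 100, 101, 102, 103, 100, and so on, without any explicit bounds test in the program.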
Circular buffers allow efficient implementation of delay lines and other
data structures required in digital signal processing. They are also commonly used in digital filters and Fourier transforms. The DAGs
automatically handle address pointer wraparound, reducing overhead,
increasing performance, and simplifying implementation.
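As an illustration of a delay line kept in a circular buffer, here is a minimal FIR filter sketch in C. The modulo index arithmetic written out explicitly below is the work the DAGs would perform automatically in hardware; the type `fir_state`, the function `fir_step`, and the four-tap length are hypothetical choices for the example.

```c
#include <assert.h>
#include <stddef.h>

#define TAPS 4

/* Hypothetical FIR filter state: a delay line used as a circular buffer.
 * `pos` plays the role of a DAG index register. */
typedef struct {
    float delay[TAPS];
    size_t pos;       /* slot that receives the next input sample */
} fir_state;

/* Push one sample and return y[n] = sum over k of h[k] * x[n - k]. */
static float fir_step(fir_state *s, const float h[TAPS], float x)
{
    assert(s->pos < TAPS);
    s->delay[s->pos] = x;                       /* overwrite the oldest sample */
    float acc = 0.0f;
    size_t idx = s->pos;
    for (size_t k = 0; k < TAPS; k++) {
        acc += h[k] * s->delay[idx];            /* x[n - k] */
        idx = (idx == 0) ? TAPS - 1 : idx - 1;  /* step backward, wrapping */
    }
    s->pos = (s->pos + 1) % TAPS;               /* advance with wraparound */
    return acc;
}
```

No samples are ever moved in memory; only the index advances, which is why circular addressing keeps the per-sample cost constant regardless of filter length.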
Interrupts. The ADSP-2136x processors have three external hardware
interrupts. The processor also provides three general-purpose interrupts,
and a special interrupt for reset. The processor has internally-generated
interrupts for the timer, DMA controller operations, circular buffer overflow, stack overflows, arithmetic exceptions, and user-defined software
interrupts.
For the general-purpose interrupts and the internal timer interrupt, the
processor automatically stacks the arithmetic status (ASTATx) and mode
(MODE1) registers in parallel with the interrupt servicing, allowing 15
nesting levels of very fast service for these interrupts.
Context switch. Many of the processor’s registers have secondary registers
that can be activated during interrupt servicing for a fast context switch.
The data registers in the register file, the DAG registers, and the multiplier
result register all have secondary registers. The primary registers are active
at reset, while the secondary registers are activated by control bits in a
mode control register.
Timer. The core’s programmable interval timer provides periodic interrupt generation. When enabled, the timer decrements a 32-bit count
register every cycle. When this count register reaches zero, the
ADSP-2136x processors generate an interrupt and assert their timer
expired output. The count register is automatically reloaded from a 32-bit
period register and the countdown resumes immediately.
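The decrement, interrupt, and reload sequence described above can be modeled in a few lines of C. This is a simplified sketch: the type `core_timer` and function `timer_tick` are hypothetical, a `true` return value stands in for the timer interrupt, and exact hardware cycle accounting (for example, whether the reload consumes a cycle) is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the core interval timer: a 32-bit count register
 * that is reloaded from a 32-bit period register on reaching zero. */
typedef struct {
    uint32_t count;
    uint32_t period;
    bool enabled;
} core_timer;

/* Advance one cycle. Returns true in the cycle the count reaches zero
 * (standing in for the interrupt); the count is then reloaded and the
 * countdown continues immediately. */
static bool timer_tick(core_timer *t)
{
    if (!t->enabled)
        return false;
    assert(t->period > 0 && t->count > 0);
    if (--t->count == 0) {
        t->count = t->period;   /* automatic reload from the period register */
        return true;
    }
    return false;
}
```

In this model, a count and period of 3 produce one interrupt every three ticks, so nine ticks yield three interrupts with no software reload.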
Instruction cache. The program sequencer includes a 32-word instruction
cache that effectively provides three-bus operation for fetching an instruction and two data values. The cache is selective; only instructions whose
fetches conflict with data accesses using the PM bus are cached. This
caching allows full speed execution of core, looped operations such as digital filter multiply-accumulates, and FFT butterfly processing. For more
information on the cache, refer to “Using the Cache” on page 3-8.
Processor Internal Buses
The processor core has six buses: PM address, PM data, DM address, DM
data, I/O address, and I/O data. The PM bus is used to fetch instructions
from memory, but may also be used to fetch data. The DM bus can only
be used to fetch data from memory. The I/O bus is used solely by the IOP
to facilitate DMA transfers. In conjunction with the cache, this Super
Harvard Architecture allows the core to fetch an instruction and two
pieces of data in the same cycle that a data word is moved between memory and a peripheral. This architecture allows dual data fetches when the
instruction is supplied by the cache.
Bus capacities. The PM and DM address buses are both 32 bits wide,
while the PM and DM data buses are both 64 bits wide.
These two buses provide a path for the contents of any register in the processor to be transferred to any other register or to any data memory
location in a single cycle. When fetching data over the PM or DM bus, the
address comes from one of two sources: an absolute value specified in the
instruction (direct addressing) or the output of a data address generator
(indirect addressing). These two buses share the same port of the memory.
Each memory block also has a dedicated I/O address bus and I/O data bus
to let the I/O processor access internal memory for DMA without delaying the processor core (in the absence of memory block conflict). The I/O
address bus is 18 bits wide, and the I/O data bus is 32 bits wide.
Data transfers. Nearly every register in the processor core is classified as a
universal register (Ureg). Instructions allow the transfer of data between
any two universal registers or between a universal register and memory.
This support includes transfers between control registers, status registers,
and data registers in the register file. The PM bus connect registers
permit data to be passed between the 64-bit PM data bus and the 64-bit
DM data bus, or between the 40-bit register file and the PM data bus.
These registers contain hardware to handle the data width difference. For
more information, see “Processing Element Registers” on page B-22.
Processor Peripherals
The term processor peripherals refers to the multiple on-chip functional
blocks used to communicate with off-chip devices. The
ADSP-21362/3/4/5/6 peripherals include the JTAG, parallel, serial, SPI
ports, DAI components (PCG, timers, and IDP), and any external devices
that connect to the processor. The ADSP-21367/8/9 peripherals include the JTAG, external, serial, DAI components (PCG, timers,
and IDP), DPI components (two UARTs, two SPIs, three timers, and a
two-wire interface port), and any external devices that connect to the processor. For complete information on using peripherals, see the
ADSP-2136x SHARC Processor Hardware Reference for the
ADSP-21362/3/4/5/6 Processors or the ADSP-2136x SHARC Processor
Hardware Reference for the ADSP-21367/8/9 Processors.
Table 1-1 and Table 1-2 provide details on the various options available
from each processor group.
Table 1-1. ADSP-21362/3/4/5/6 SHARC Processor Features
Feature             ADSP-21362     ADSP-21363     ADSP-21364     ADSP-21365     ADSP-21366
SRC Performance     128dB          No SRC         140dB          128dB          128dB
Package Option (3)  136 Ball BGA   136 Ball BGA   136 Ball BGA   136 Ball BGA   136 Ball BGA
                    144 Lead LQFP  144 Lead LQFP  144 Lead LQFP  144 Lead LQFP  144 Lead LQFP
1 The ADSP-21365 provides the Digital Transmission Content Protection protocol, a proprietary
security protocol. Contact your Analog Devices sales office for more information.
2 Audio decoding algorithms include PCM, Dolby Digital EX, Dolby Prologic IIx, DTS 96/24,
Neo:6, DTS ES, MPEG2 AAC, MP3, and functions like bass management, delay, speaker equalization, graphic equalization, and more. Decoder/post-processor algorithm combination support
varies, depending upon the chip version and the system configurations. Please visit www.analog.com/SHARC for complete information.
3 Analog Devices offers these packages in lead (Pb) free versions.
Table 1-2. ADSP-21367/8/9 SHARC Processor Features
Feature                     ADSP-21367   ADSP-21368   ADSP-21369
RAM                         2M bit       2M bit       2M bit
ROM                         6M bit       6M bit (1)   6M bit (1)
Audio Decoders in ROM (2)   Yes          No           No
Pulse Width Modulation      Yes          Yes          Yes
S/PDIF                      Yes          Yes          Yes
Shared Memory               No           Yes          No
SRC Performance             128dB        140dB        128dB
Package Option (3)
Processor Speed             400 MHz      400 MHz      400 MHz
1 The ADSP-21368/21369 processors include a customer-definable ROM block. Please contact
your Analog Devices sales representative for additional details.
2 Audio decoding algorithms include PCM, Dolby Digital EX, Dolby
Prologic IIx, DTS 96/24, Neo:6, DTS ES, MPEG2 AAC, MPEG2 2channel, MP3, and functions like bass management, delay, speaker equalization, graphic equalization, and more. Decoder/post-processor algorithm combination support varies depending upon the chip version and the
system configurations. Please visit www.analog.com/SHARC for complete information.
3 Analog Devices offers these packages in lead (Pb) free versions.
Internal Memory (SRAM)
The individual ADSP-2136x products contain varying amounts of memory. For example, the ADSP-21362/3/4/5/6 processors provide 3M bits of
internal SRAM and 4M bits of internal ROM, organized into
four separate blocks. The memory and separate on-chip buses allow two
data transfers from the core and one from I/O, all in a single cycle.
All of the memory can be accessed as 16-, 32-, 48-, or 64-bit words. On
the ADSP-2136x processors, the memory can be configured as a maximum of 96K words of 32-bit data, 192K words of 16-bit data, 64K words
of 48-bit instructions (and 40-bit data), or combinations of different word
sizes up to 3.0M bit. For specific memory configurations, see the product
model specific data sheet.
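The maximum word counts quoted above follow from dividing the 3M-bit total by the word width, taking 1M bit as 2^20 bits: 3,145,728 / 32 = 98,304 (96K), / 16 = 196,608 (192K), and / 48 = 65,536 (64K). A small C helper reproduces this arithmetic; the names `SRAM_BITS` and `max_words` are hypothetical, and the sketch ignores per-block layout and mixed-width configurations, which the data sheet specifies.

```c
#include <assert.h>
#include <stdint.h>

/* Total on-chip SRAM from the text: 3M bit, with 1M bit = 2^20 bits. */
#define SRAM_BITS (3u * 1024u * 1024u)

/* Maximum number of words of one width if all SRAM is used for that width
 * (ignores per-block layout and mixed configurations). */
static uint32_t max_words(uint32_t width_bits)
{
    assert(width_bits == 16 || width_bits == 32 ||
           width_bits == 48 || width_bits == 64);
    return SRAM_BITS / width_bits;
}
```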
The processor also supports a 16-bit floating-point storage format, which
effectively doubles the amount of data that may be stored on chip. Conversion between the 32-bit floating-point and 16-bit floating-point
formats completes in a single instruction.
While each memory block can store combinations of code and data,
accesses are most efficient when one block stores data (using the DM bus
for transfers) and the other block stores instructions and data (using the
PM bus for transfers). Using the DM and PM buses in this way (with one
dedicated to each memory block) ensures single-cycle execution with two
data transfers. In this case, the instruction must be available in the cache.
The processor also maintains single-cycle execution when one of the data
operands is transferred to or from off chip, using the processor’s parallel
port.
In addition to the core’s programmable interval timer, the ADSP-2136x
processors have three programmable interval timers that generate periodic
interrupts. Each timer can be independently set to operate in one of three
modes:
•Pulse waveform generation mode
•Pulse width count/capture mode
•External event watchdog mode
Each timer has one bidirectional pin and four registers that implement its
mode of operation. These registers are a 7-bit configuration register, a
32-bit count register, a 32-bit period register, and a 32-bit pulse width
register. A single status register supports all three timers. A bit in each
timer’s configuration register enables or disables the corresponding timer
independently of the others.
JTAG Port
The JTAG port supports the IEEE 1149.1 Joint Test Action Group (JTAG)
standard for system test. This standard defines a method
for serially scanning the I/O status of each component in a system. Emulators use the JTAG port to monitor and control the processor during
emulation. Emulators using this port provide full-speed emulation with
access to inspect and modify memory, registers, and processor stacks.
JTAG-based emulation is non-intrusive and does not affect target system
loading or timing.
ROM-Based Security
For those devices with application code in the on-chip ROM, an optional
ROM security feature is included. This feature provides hardware support
for securing user software code by preventing unauthorized reading from
the enabled code. The processor does not boot-load any external code,
executing exclusively from internal ROM. The processor also is not freely
accessible via the JTAG port. Instead, a 64-bit key is assigned to the user.
This key must be scanned in through the JTAG or Test Access Port. The
device ignores a wrong key. Emulation features and external boot modes
are only available after the correct key is scanned.
Development Tools
The ADSP-2136x SHARC processors are supported by VisualDSP++, an
easy-to-use Integrated Development and Debugging Environment
(IDDE). VisualDSP++ allows you to manage projects from start to finish
from within a single, integrated interface. Because the project development and debug environments are integrated, you can move easily
between editing, building, and debugging activities.
Differences From Previous SHARC Processors
This section identifies differences between the ADSP-2136x processors
and previous SHARC processors: ADSP-21161, ADSP-21160,
ADSP-21060, ADSP-21061, ADSP-21062, and ADSP-21065L. Like the
ADSP-2116x family, the ADSP-2136x family is based on the original
ADSP-2106x SHARC family. The ADSP-2136x preserves much of the
ADSP-2106x architecture and is code compatible with the ADSP-21160,
while extending performance and functionality. For background information on SHARC and the ADSP-2106x family processors, see the
ADSP-2106x SHARC User’s Manual.
Computational bandwidth on the ADSP-2136x processors is significantly
greater than that on the ADSP-2106x processors. The increase comes
from raising the operational frequency and adding another processing element: ALU, shifter, multiplier, and register file. The new processing
element lets the processor process multiple data streams in parallel (SIMD
mode). The ADSP-2136x processors operate at up to 400 MHz using a
five-stage pipeline.
The program sequencer has several enhancements: new interrupt vector
table definitions, SIMD mode stack and conditional execution model, and
instruction decodes associated with new instructions. Interrupt vectors
have been added that detect illegal memory accesses. Also, mode stack and
mode mask support have been added to improve context switch time.
The data address generators are improved from previous architectures in
that DAG2 (for the PM bus) has the same addressing capability as DAG1
(for the DM bus). The DAG registers move 64 bits per cycle. Additionally, the DAGs support the new memory map and long word transfer
capability. Circular buffering on the ADSP-2136x processors can be
quickly disabled on interrupts and restored on the return. Data “broadcast”, from one memory location to both data register files, is determined
by appropriate index register usage.
Processor Internal Bus Enhancements
The PM, DM, and I/O data buses have increased from 32 bits on the
ADSP-2106x processors to 64 bits. Additional multiplexing and control
logic enable 16-, 32-, or 64-bit wide moves between both register files and
memory. The ADSP-2136x processors are capable of broadcasting a single
memory location to each of the register files in parallel. Also, the
ADSP-2136x processors permit register contents to be exchanged between
the two processing elements’ register files in a single cycle.
The ADSP-2136x processors’ memory maps differ from the memory map
of the ADSP-2106x processors. The system memory map on each processor
group supports double-word transfers each cycle, reflects extended internal memory capacity for derivative designs, and works with an updated
control register for SIMD support. The ADSP-2136x processor family
provides enough on-chip memory for several audio decoders.
JTAG Port Enhancements
The JTAG port differs from the JTAG port of the ADSP-2106x processors. The ADSP-2136x processors offer ROM-based security. These
security features prevent piracy of code and algorithms and prohibit
inspection of on-chip memory via the emulator or buses. The JTAG port
uses program controls to limit access to sensitive code in memory. An
assigned 64-bit key must be used to access protected memory regions.
The background telemetry channel (BTC) allows the emulator to feed
new data to the processor. It also gets updates from the processor in real
time. By using this function (that operates in the background), programmers can read and write data to a set of memory-mapped buffers that are
accessible by the emulator while the core is running.
Instruction Set Enhancements
The ADSP-2136x processors provide source code compatibility with the
previous SHARC processor family members, to the application assembly
source code level. All instructions, control registers, and system resources
available in the ADSP-2106x core programming model are also available
in the ADSP-2136x processors. Instructions, control registers, or other
facilities required to support the new feature set of the ADSP-2136x core
include:
•Code compatibility with the ADSP-21160 SIMD core
•Supersets of the ADSP-2106x programming model
•Reserved facilities in the ADSP-2106x programming model
•Symbol name changes between the ADSP-2106x and ADSP-2136x
processor programming models
These name changes can be managed through reassembly by using the
ADSP-2136x development tools to apply the ADSP-2136x symbol definitions header file and linker description file. While these changes have no
direct impact on existing core applications, system and I/O processor initialization code and control code do require modifications.
Although the porting of source code written for the ADSP-2106x family
to the ADSP-2136x has been simplified, code changes are required to take
full advantage of the new ADSP-2136x processor features. For more information, see “Instruction Set” in Chapter 8, Instruction Set, and
“Computations Reference” in Chapter 9, Computations Reference.
The processor’s processing elements (PEx and PEy) perform numeric processing for processor algorithms. Each processing element contains a data
register file and three computation units—an arithmetic/logic unit (ALU),
a multiplier, and a shifter. Computational instructions for these elements
include both fixed-point and floating-point operations, and each computational instruction executes in a single cycle.
The computational units in a processing element handle different types of
operations. The ALU performs arithmetic and logic operations on
fixed-point and floating-point data. The multiplier performs floating-point and fixed-point multiplication and executes fixed-point
multiply/add and multiply/subtract operations. The shifter computes logical shifts, arithmetic shifts, bit manipulation, field deposit, and field
extraction operations on 32-bit operands. The shifter can also derive
exponents.
Data flow paths through the computational units are arranged in parallel,
as shown in Figure 2-1. The output of any computational unit may serve
as the input of any computational unit on the next instruction cycle. Data
moving in and out of the computational units goes through a 10-port register file, consisting of 16 primary registers and 16 alternate registers. Two
ports on the register file connect to the PM and DM data buses, allowing
data transfer between the computational units and memory (and anything
else) connected to these buses.
The processor’s assembly language provides access to the data register files
in both processing elements. The syntax allows programs to move data to
and from these registers, specify a computation’s data format and provide
naming conventions for the registers, all at the same time. For information
on the data register names, see “Data Register File” on page 2-37.
Figure 2-1 provides a graphical guide to the other topics in this chapter.
First, a description of the MODE1 register shows how to set rounding, data
format, and other modes for the processing elements. The dashed box
indicates which components can be controlled by the MODE1 register. Next,
an examination of each computational unit provides details on operation
and a summary of computational instructions. Outside the computational
units, details on register files and data buses identify how to flow data for
computations. Finally, details on the processor’s advanced parallelism
reveal how to take advantage of multifunction instructions and single-instruction, multiple-data (SIMD) mode.
Numeric Formats
The processor supports the 32-bit single-precision floating-point data format defined in the IEEE Standard 754/854. In addition, the processor
supports an extended-precision version of the same format with eight
additional bits in the mantissa (40 bits total). The processor also supports
32-bit fixed-point formats—fractional and integer—which can be signed
(two’s-complement) or unsigned.
IEEE Single-Precision Floating-Point Data Format
The IEEE Standard 754/854 specifies a 32-bit single-precision floating-point format, shown in Figure 2-2. A number in this format consists
of a sign bit (s), a 24-bit significand, and an 8-bit unsigned-magnitude
exponent (e).
For normalized numbers, the significand consists of a 23-bit fraction, f,
and a “hidden” bit of 1 that is implicitly presumed to precede f22 in the
significand. The binary point is presumed to lie between this hidden bit
and f22. The least significant bit (LSB) of the fraction is f0; the LSB of the
exponent is e0.
The hidden bit effectively increases the precision of the floating-point significand to 24 bits from the 23 bits actually stored in the data format. It
also ensures that the significand of any number in the IEEE normalized
number format is always greater than or equal to one and less than two.
The unsigned exponent, e, can range from 1 to 254 (1 ≤ e ≤ 254) for normal
numbers in single-precision format. This exponent is biased by
+127 (254 ÷ 2). To calculate the true unbiased exponent, subtract 127
from e.
Figure 2-2. IEEE 32-Bit Single-Precision Floating-Point Format (bit 31: sign s; bits 30-23: exponent e7-e0; bits 22-0: fraction f22-f0; the hidden bit and binary point precede f22, giving 1.f22-0)
The IEEE Standard also provides several special data types in the single-precision floating-point format:
•An exponent value of 255 (all ones) with a non-zero fraction is a
not-a-number (NAN). NANs are usually used as flags for data flow
control, for the values of uninitialized variables, and for the results
of invalid operations such as 0 * ∞.
•Infinity is represented as an exponent of 255 and a zero fraction.
Because the sign bit is independent of the exponent and fraction, both
positive and negative infinity can be represented.
•Zero is represented by a zero exponent and a zero fraction. As with
infinity, both positive zero and negative zero can be represented.
The IEEE single-precision floating-point data types supported by the processor and their interpretations are summarized in Table 2-1.
Table 2-1. IEEE Single-Precision Floating-Point Data Types
Type       Exponent       Fraction    Value
NAN        255            Non-zero    Undefined
Infinity   255            0           (–1)^s Infinity
Normal     1 ≤ e ≤ 254    Any         (–1)^s (1.f22-0) 2^(e–127)
Zero       0              0           (–1)^s Zero
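To make Table 2-1 concrete, the following C sketch splits a 32-bit word into its s, e, and f fields, classifies it, and evaluates normal numbers as (–1)^s (1.f) 2^(e–127). It is a host-side illustration, not processor code. The type and function names are hypothetical, and the IEEE_OTHER category (zero exponent with a non-zero fraction, that is, a denormal) is an addition because Table 2-1 does not list that case.

```c
#include <assert.h>
#include <stdint.h>

/* Data types from Table 2-1; IEEE_OTHER covers e == 0 with a non-zero fraction. */
typedef enum { IEEE_ZERO, IEEE_NORMAL, IEEE_INFINITY, IEEE_NAN, IEEE_OTHER } ieee_type;

/* Fields of a single-precision word: s = bit 31, e = bits 30-23, f = bits 22-0. */
typedef struct { unsigned s; unsigned e; uint32_t f; } ieee_fields;

static ieee_fields ieee_split(uint32_t w)
{
    ieee_fields x = { (w >> 31) & 1u, (w >> 23) & 0xFFu, w & 0x7FFFFFu };
    return x;
}

static ieee_type ieee_classify(uint32_t w)
{
    ieee_fields x = ieee_split(w);
    if (x.e == 255) return x.f ? IEEE_NAN : IEEE_INFINITY;
    if (x.e == 0)   return x.f ? IEEE_OTHER : IEEE_ZERO;
    return IEEE_NORMAL;
}

/* 2^n by repeated doubling or halving (exact in double for these ranges). */
static double pow2i(int n)
{
    double r = 1.0;
    for (; n > 0; n--) r *= 2.0;
    for (; n < 0; n++) r *= 0.5;
    return r;
}

/* Value of a normal number: (-1)^s * (1.f) * 2^(e - 127). */
static double ieee_normal_value(uint32_t w)
{
    assert(ieee_classify(w) == IEEE_NORMAL);
    ieee_fields x = ieee_split(w);
    double significand = 1.0 + (double)x.f / 8388608.0;  /* hidden 1 plus f / 2^23 */
    double v = significand * pow2i((int)x.e - 127);       /* remove the +127 bias */
    return x.s ? -v : v;
}
```

For instance, 0x3F800000 decodes to 1.0, 0xC0490000 decodes to –3.140625, and 0x7F800000 classifies as infinity.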
Extended-Precision Floating-Point Format
The extended-precision floating-point format is 40 bits wide, with the
same 8-bit exponent as in the IEEE standard format but with a 32-bit significand. This format is shown in Figure 2-3. In all other respects, the
extended-precision floating-point format is the same as the IEEE standard
format.
Figure 2-3. 40-Bit Extended-Precision Floating-Point Format (bit 39: sign s; bits 38-31: exponent e7-e0; bits 30-0: fraction f30-f0; the hidden bit and binary point precede f30, giving 1.f30-0)
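The 40-bit format decodes the same way, with a 31-bit fraction instead of a 23-bit one. This sketch assumes the 40-bit word sits in the low 40 bits of a uint64_t and handles normal numbers only. It also shows that a single-precision word maps onto the upper 32 bits of the 40-bit format with eight zero LSBs, since the two formats share the sign bit and exponent. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED layout: s = bit 39, e = bits 38-31, f = bits 30-0 of a uint64_t. */
static double ext40_value(uint64_t w)
{
    assert((w >> 40) == 0);                      /* must fit in 40 bits */
    unsigned s = (unsigned)((w >> 39) & 1u);
    int e = (int)((w >> 31) & 0xFFu);
    uint64_t f = w & 0x7FFFFFFFull;              /* 31-bit fraction */
    assert(e >= 1 && e <= 254);                  /* normal numbers only */

    double v = 1.0 + (double)f / 2147483648.0;   /* hidden 1 plus f / 2^31 */
    for (; e > 127; e--) v *= 2.0;               /* scale by 2^(e - 127) */
    for (; e < 127; e++) v *= 0.5;
    return s ? -v : v;
}

/* A 32-bit IEEE word occupies the upper 32 bits, with the 8 LSBs zero. */
static uint64_t ieee32_to_ext40(uint32_t w)
{
    return (uint64_t)w << 8;
}
```

The eight extra fraction bits are what distinguish, for example, 0x3F80000001 (1 + 2^-31) from 0x3F80000000 (exactly 1.0); the 32-bit format cannot represent the former.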
The processor supports a 16-bit floating-point data type and provides conversion instructions for it. The short float data format has an 11-bit
mantissa with a 4-bit exponent plus sign bit, as shown in Figure 2-4. The
16-bit floating-point numbers reside in the lower 16 bits of the 32-bit
floating-point field.
Figure 2-4. 16-Bit Floating-Point Format (bit 15: sign s; bits 14-11: exponent e3-e0; bits 10-0: fraction f10-f0; the hidden bit and binary point precede f10, giving 1.f10-0)
Packing for Floating-Point Data
Two shifter instructions, FPACK and FUNPACK, perform the packing and
unpacking conversions between 32-bit floating-point words and 16-bit
floating-point words. The FPACK instruction converts a 32-bit IEEE
floating-point number to a 16-bit floating-point number. The FUNPACK
instruction converts 16-bit floating-point numbers back to 32-bit IEEE
floating-point. Each instruction executes in a single cycle. The results of
the FPACK and FUNPACK operations appear in Table 2-2 and Table 2-3.
Table 2-2. FPACK Operations
Condition          Result
135 < exp          Largest magnitude representation.
120 < exp ≤ 135    Exponent is the most significant bit (MSB) of the source exponent
                   concatenated with the three least significant bits (LSBs) of the
                   source exponent. The packed fraction is the rounded upper 11 bits
                   of the source fraction.
109 < exp ≤ 120    Exponent = 0. The packed fraction is the upper (source exponent – 110)
                   bits of the source fraction, prefixed by zeros and the “hidden” one.
                   The packed fraction is rounded.
exp < 110          Packed word is all zeros.
Notes: exp = source exponent; the sign bit remains the same in all cases.
Table 2-3. FUNPACK Operations
Condition          Result
0 < exp ≤ 15       Exponent is the 3 LSBs of the source exponent prefixed by the MSB of
                   the source exponent and four copies of the complement of the MSB.
                   The unpacked fraction is the source fraction with 12 zeros appended.
exp = 0            Exponent is (120 – N), where N is the number of leading zeros in the
                   source fraction. The unpacked fraction is the remainder of the source
                   fraction with zeros appended to pad it and the “hidden” one stripped
                   away.
Notes: exp = source exponent; the sign bit remains the same in all cases.
The short float type supports gradual underflow. This method sacrifices
precision for dynamic range. When packing a number which would have
underflowed, the exponent is set to zero and the mantissa (including
hidden 1) is right-shifted the appropriate amount. The packed result is a
denormal, which can be unpacked into a normal IEEE floating-point
number.
During the FPACK operation, an overflow sets the SV condition and
non-overflow clears it. During the FUNPACK operation, the SV condition is
cleared. The SZ and SS conditions are cleared by both instructions.
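The conditions in Table 2-2 and Table 2-3 can be expressed as a host-side C model, for example to prepare or check packed data off-line. This is a sketch, not Analog Devices code. fpack_model and funpack_model are hypothetical names. The tables say only that fractions are “rounded”, so round-to-nearest-even is an assumption; saturating a rounding carry out of the largest exponent is an assumption; and unpacking an all-zero magnitude to a signed zero is an assumption, because Table 2-3 does not cover that case. SV, SZ, and SS updates are not modeled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret between float and its bit pattern (host assumed to be IEEE 754). */
static uint32_t f32_bits(float x) { uint32_t u; memcpy(&u, &x, sizeof u); return u; }
static float bits_f32(uint32_t u) { float x; memcpy(&x, &u, sizeof x); return x; }

/* v >> n, rounded to nearest with ties to even (ASSUMED rounding rule). */
static uint32_t shift_round(uint32_t v, unsigned n)
{
    assert(n >= 1 && n <= 31);
    uint32_t q = v >> n, rem = v & ((1u << n) - 1u), half = 1u << (n - 1);
    if (rem > half || (rem == half && (q & 1u)))
        q++;
    return q;
}

/* Hypothetical model of FPACK per Table 2-2; the sign is always kept. */
static uint16_t fpack_model(float x)
{
    uint32_t u = f32_bits(x);
    uint32_t sign = (u >> 16) & 0x8000u;
    uint32_t e8 = (u >> 23) & 0xFFu, frac = u & 0x7FFFFFu;
    uint32_t packed;

    if (e8 > 135) {
        packed = 0x7FFFu;                              /* largest magnitude */
    } else if (e8 > 120) {
        uint32_t e4 = ((e8 >> 7) << 3) | (e8 & 7u);    /* MSB:3 LSBs, equal to e8 - 120 */
        packed = (e4 << 11) + shift_round(frac, 12);   /* a rounding carry bumps e4 */
        if (packed > 0x7FFFu)
            packed = 0x7FFFu;                          /* ASSUMED: saturate */
    } else if (e8 > 109) {
        /* Denormal: hidden one plus the upper (e8 - 110) fraction bits, exponent 0. */
        packed = shift_round(frac | 0x800000u, 133u - e8);
    } else {
        packed = 0;                                    /* all zeros */
    }
    return (uint16_t)(sign | packed);
}

/* Hypothetical model of FUNPACK per Table 2-3; the sign is always kept. */
static float funpack_model(uint16_t h)
{
    uint32_t sign = ((uint32_t)h & 0x8000u) << 16;
    uint32_t e4 = ((uint32_t)h >> 11) & 0xFu, f11 = (uint32_t)h & 0x7FFu;

    if (e4 != 0) {
        /* MSB, four copies of its complement, then the 3 LSBs: equal to e4 + 120. */
        uint32_t msb = e4 >> 3;
        uint32_t e8 = (msb << 7) | ((msb ? 0u : 0xFu) << 3) | (e4 & 7u);
        return bits_f32(sign | (e8 << 23) | (f11 << 12));   /* append 12 zeros */
    }
    if (f11 == 0)
        return bits_f32(sign);                 /* ASSUMED: signed zero */

    unsigned top = 10;                         /* position of the leading one */
    while (!(f11 & (1u << top)))
        top--;
    uint32_t n = 10u - top;                    /* N = leading zeros in the 11-bit field */
    uint32_t rest = f11 & ((1u << top) - 1u);  /* strip the hidden one */
    return bits_f32(sign | ((120u - n) << 23) | (rest << (23u - top)));
}
```

Under these rules the 4-bit exponent behaves like an exponent biased by 7: 1.0 packs to 0x3800, and 2^-17, the smallest packed denormal, packs to 0x0001 and unpacks back to 2^-17 exactly.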
Fixed-Point Formats
The processor supports two 32-bit fixed-point formats—fractional and
integer. In both formats, numbers can be signed (two’s-complement) or
unsigned. The four possible combinations are shown in Figure 2-5. In the
fractional format, there is an implied binary point to the left of the most
significant magnitude bit. In integer format, the binary point is understood to be to the right of the LSB. Note that the sign bit is negatively
weighted in a two’s-complement format.
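The four fixed-point interpretations differ only in where the binary point sits and whether the most significant bit carries a negative weight. The following C sketch (hypothetical names) evaluates one 32-bit pattern under each format. It uses double, which represents every 32-bit integer and every 32-bit binary fraction exactly.

```c
#include <assert.h>
#include <stdint.h>

/* The four fixed-point formats of Figure 2-5. */
typedef enum { FX_UNSIGNED_INT, FX_SIGNED_INT, FX_UNSIGNED_FRAC, FX_SIGNED_FRAC } fx_format;

/* Value of the 32-bit word w under the given format. */
static double fx_value(uint32_t w, fx_format fmt)
{
    double u = (double)w;                              /* all bits positively weighted */
    double s = u - ((w >> 31) ? 4294967296.0 : 0.0);   /* bit 31 weighted -2^31 */
    switch (fmt) {
    case FX_UNSIGNED_INT:  return u;                   /* point right of the LSB */
    case FX_SIGNED_INT:    return s;
    case FX_UNSIGNED_FRAC: return u / 4294967296.0;    /* point left of bit 31: w / 2^32 */
    case FX_SIGNED_FRAC:   return s / 2147483648.0;    /* point between bits 31 and 30 */
    }
    assert(!"unknown fixed-point format");
    return 0.0;
}
```

For example, 0x8000 0000 reads as –1.0 (signed fraction), 0.5 (unsigned fraction), –2^31 (signed integer), or 2^31 (unsigned integer).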
ALU outputs have the same width and data format as the inputs. The
multiplier, however, produces a 64-bit product from two 32-bit inputs. If
both operands are unsigned integers, the result is a 64-bit unsigned integer. If both operands are unsigned fractions, the result is a 64-bit unsigned
fraction. These formats are shown in Figure 2-6.
If one operand is signed and the other unsigned, the result is signed. If
both inputs are signed, the result is signed and automatically shifted left
one bit. The LSB becomes zero and bit 62 moves into the sign bit position. Normally bit 63 and bit 62 are identical when both operands are
signed. (The only exception is full-scale negative multiplied by itself.)
Thus, the left-shift normally removes a redundant sign bit, increasing the
precision of the most significant product (MSP). Also, if the data format is fractional, a single bit left-shift renormalizes the MSP to a fractional format.
The signed formats with and without left-shifting are shown in
Figure 2-7.
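The left-shift for signed fractional operands can be checked with ordinary 64-bit integer arithmetic. This sketch (hypothetical names) forms the exact 64-bit product of two 1.31 operands, shifts it left one bit, and takes the upper 32 bits as the MSP; rounding and the 80-bit accumulator are not modeled. The assertion encodes the statement that bits 63 and 62 agree except when full-scale negative is multiplied by itself.

```c
#include <assert.h>
#include <stdint.h>

/* Left-shifted 64-bit product of two signed 1.31 fractions (rounding not modeled). */
static uint64_t sfrac_product(int32_t x, int32_t y)
{
    uint64_t p = (uint64_t)((int64_t)x * (int64_t)y);   /* exact: |x * y| <= 2^62 */
    /* Bits 63 and 62 agree (redundant sign bit) unless both inputs are full-scale negative. */
    assert(((p >> 63) & 1u) == ((p >> 62) & 1u) || (x == INT32_MIN && y == INT32_MIN));
    return p << 1;                                      /* LSB becomes 0; bit 62 moves to bit 63 */
}

/* Most significant product (upper 32 bits), again a 1.31 fraction. */
static uint32_t sfrac_msp(int32_t x, int32_t y)
{
    return (uint32_t)(sfrac_product(x, y) >> 32);
}
```

Multiplying 0.5 by 0.5 (0x4000 0000 × 0x4000 0000) yields an MSP of 0x2000 0000, that is 0.25, while full-scale negative squared yields 0x8000 0000 (–1.0), the exception noted above.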
The multiplier has an 80-bit accumulator to allow the accumulation of
64-bit products. For more information on the multiplier and accumulator, see “Multiply Accumulator (Multiplier)” on page 2-22.
The MODE1 register controls the operating mode of the processing elements. Table B-2 on page B-5 lists the bits in the MODE1 register. The
following MODE1 bits control computational modes:
•Floating-point data format. Bit 16 (RND32) rounds floating-point
data to 32 bits (if 1) or rounds to 40 bits (if 0).
•Rounding mode. Bit 15 (TRUNC) rounds results with round-to-zero
(if 1) or round-to-nearest (if 0).
•ALU saturation. Bit 13 (ALUSAT) saturates results on positive or
negative fixed-point overflows (if 1) or returns unsaturated results (if 0).
In the default mode (RND32 bit=1), the multiplier and ALU support a single-precision floating-point format, which is specified in the IEEE
754/854 standard. For more information on this standard, see “Numeric
Formats” on page 2-2. This format is IEEE 754/854 compatible for
single-precision floating-point operations in all respects except:
•The processor does not provide inexact flags. The inexact flag is the
IEEE exception flag that signals an inexact result. The inexact exception
occurs if the rounded result of an operation is not identical to the
exact (infinitely precise) result. Thus, an inexact exception always
occurs when an overflow or an underflow occurs.
•NAN (Not-A-Number) inputs generate an invalid exception and
return a quiet NAN (all 1s).
•Denormal operands (denormalized, or tiny, numbers) flush
to zero when input to a computational unit and do not generate an
underflow exception. A denormal operand is a floating-point
operand with an absolute value too small to represent with full
precision in the significand. The denormal exception occurs if one
or more of the operands is a denormal number. This exception is
never regarded as an error.
•The processor supports round-to-nearest and round-toward-zero
modes, but does not support round to +infinity and round to
–infinity.
IEEE single-precision floating-point data uses a 23-bit mantissa with an
8-bit exponent plus sign bit. In this case, the computation unit sets the
eight LSBs of floating-point inputs to zeros before performing the operation. The mantissa of a result rounds to 23 bits (not including the hidden
bit), and the 8 LSBs of the 40-bit result clear to zeros to form a 32-bit
number, which is equivalent to the IEEE standard result.
In fixed-point to floating-point conversion, the rounding boundary is
always 40 bits, even if the
RND32 bit is set.
40-Bit Floating-Point Format
In extended-precision mode (RND32 bit=0), the processor supports a 40-bit
extended-precision floating-point mode, which has eight additional LSBs
of the mantissa and is compliant with the 754/854 standards. However,
results in this format are more precise than the IEEE single-precision standard specifies. Extended-precision floating-point data uses a 31-bit
mantissa with an 8-bit exponent plus a sign bit.
16-Bit Floating-Point Format (Short Word)
The processor supports a 16-bit floating-point storage format and provides instructions that convert the data for 40-bit computations. The
16-bit floating-point format uses an 11-bit mantissa with a 4-bit exponent
plus sign bit. The 16-bit data goes into bits 23 through 8 of a data register.
Two shifter instructions, FPACK and FUNPACK, perform the packing and
unpacking conversions between 32-bit floating-point words and 16-bit
floating-point words. The FPACK instruction converts a 32-bit IEEE floating-point number in a data register into a 16-bit floating-point number.
FUNPACK converts a 16-bit floating-point number in a data register to a
32-bit IEEE floating-point number. Each instruction executes in a single
cycle.
When 16-bit data is written to bits 23 through 8 of a data register, the
processor automatically extends the data into a 32-bit integer (bits 39
through 8). If the
SSE bit in MODE1 is set (1), the processor sign-extends the
upper 16 bits. If the SSE bit is cleared (0), the processor zeros the upper 16
bits.
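The extension rule can be sketched in C by modeling a 40-bit data register as the low 40 bits of a uint64_t. The function name is hypothetical, and treating bits 7-0 as zero is an assumption, consistent with the 32-bit fixed-point description in this section, where results written to the upper 32 bits zero the eight LSBs.

```c
#include <assert.h>
#include <stdint.h>

/* 40-bit register image after writing a 16-bit value to bits 23-8.
   sse models the MODE1 SSE bit; bits 7-0 are ASSUMED to be zero. */
static uint64_t short_word_to_register(uint16_t v, int sse)
{
    assert(sse == 0 || sse == 1);
    uint32_t ext = v;                      /* SSE = 0: upper 16 bits are zero */
    if (sse && (v & 0x8000u))
        ext |= 0xFFFF0000u;                /* SSE = 1: copy bit 15 into the upper 16 bits */
    return (uint64_t)ext << 8;             /* 32-bit value lands in bits 39-8 */
}
```

With SSE set, 0x8001 becomes 0xFFFF800100 in the 40-bit register; with SSE cleared, it becomes 0x0000800100.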
The 16-bit floating-point format supports gradual underflow. This
method sacrifices precision for dynamic range. When packing a number
that would have underflowed, the exponent clears to zero and the mantissa
(including a “hidden” 1) right-shifts the appropriate amount. The packed
result is a denormal, which can be unpacked into a normal IEEE floating-point number.
32-Bit Fixed-Point Format
The processor represents fixed-point numbers in 32 bits, occupying the 32
MSBs in 40-bit data registers. Fixed-point data may be fractional or integer numbers and unsigned or two’s-complement. Each computational unit
has limitations on how these formats may be mixed for a given operation.
All computational units read the upper 32 bits of data (inputs, operands)
from the 40-bit registers (ignoring the eight LSBs) and write results to the
upper 32 bits (zeroing the eight LSBs).
Rounding Mode
The TRUNC bit in the MODE1 register determines the rounding mode for all
ALU operations, all floating-point multiplies, and fixed-point multiplies
of fractional data. The processor supports two rounding modes—
round-toward-zero and round-toward-nearest. The rounding modes comply with the IEEE 754 standard and have the following definitions:
•Round-toward-zero (TRUNC bit=1). If the result before rounding is
not exactly representable in the destination format, the rounded
result is the number that is nearer to zero. This is equivalent to
truncation.
•Round-toward-nearest (TRUNC bit=0). If the result before rounding
is not exactly representable in the destination format, the rounded
result is the number that is nearer to the result before rounding. If
the result before rounding is exactly halfway between two numbers
in the destination format (differing by an LSB), the rounded result
is the number that has an LSB equal to zero.
Statistically, rounding up occurs as often as rounding down, so there is no
large sample bias. Because the maximum floating-point value is one LSB
less than the value that represents infinity, a result that is halfway between
the maximum floating-point value and infinity rounds to infinity in this
mode.
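The two rounding modes can be illustrated on the 40-bit to 32-bit narrowing described under the RND32 bit, where the eight LSBs of a 40-bit result are rounded off and cleared. The sketch below is a host-side model with a hypothetical name. Because the format is sign-magnitude and the exponent sits directly above the fraction, rounding the magnitude up is a plain integer increment, and a carry out of the fraction correctly increments the exponent; that carry is why a result halfway between the maximum value and infinity rounds to infinity. NAN and infinity inputs are not handled.

```c
#include <assert.h>
#include <stdint.h>

/* Narrow a 40-bit extended result (low 40 bits of a uint64_t) to the 32-bit format.
   trunc = 1: round-toward-zero; trunc = 0: round-toward-nearest, ties to even LSB. */
static uint64_t round_ext40_to_ieee32(uint64_t r40, int trunc)
{
    assert((r40 >> 40) == 0);
    uint32_t hi = (uint32_t)(r40 >> 8);      /* sign, exponent, upper 23 fraction bits */
    uint32_t lo = (uint32_t)(r40 & 0xFFu);   /* the 8 bits being discarded */
    if (!trunc && (lo > 0x80u || (lo == 0x80u && (hi & 1u))))
        hi++;                                /* round the magnitude up */
    return (uint64_t)hi << 8;                /* 8 LSBs cleared */
}
```

For 0x7F7FFFFF80 (the maximum single-precision value plus half an LSB), round-toward-nearest returns 0x7F80000000, which is infinity, while round-toward-zero returns 0x7F7FFFFF00.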
Although these rounding modes comply with standards set for floating-point data, they also apply to fixed-point multiplier operations on
fractional data. The same two rounding modes are supported, but only the
round-to-nearest operation is actually performed by the multiplier. Using
its local result register for fixed-point operations, the multiplier
rounds-to-zero by reading only the upper bits of the result and discarding
the lower bits.
Using Computational Status
The multiplier and ALU each provide exception information when executing floating-point operations. Each unit updates overflow, underflow,
and invalid operation flags in the processing element’s arithmetic status
(ASTATx and ASTATy) registers and sticky status (STKYx and STKYy) registers.
An underflow, overflow, or invalid operation from any unit also generates
a maskable interrupt. There are three ways to use floating-point exceptions from computations in program sequencing:
•Enable interrupts and use an interrupt service routine (ISR) to handle the exception condition immediately. This method is
appropriate if it is important to correct all exceptions as they occur.
•Use conditional instructions to test the exception flags in the
ASTATx or ASTATy registers after the instruction executes. This
method permits monitoring each instruction’s outcome.
•Use the bit test (BTST) instruction to examine exception flags in the
STKY register after a series of operations. If any flags are set, some of
the results are incorrect. Use this method when exception handling
is not critical.
More information on ASTAT and STKY status appears in the sections that
describe the computational units. For summaries relating instructions and
status bits, see Table 2-4, Table 2-5, Table 2-7, Table 2-9, and
Table 2-10.
Arithmetic Logic Unit (ALU)
The ALU performs arithmetic operations on fixed-point or floating-point
data and logical operations on fixed-point data. ALU fixed-point instructions operate on 32-bit fixed-point operands and output 32-bit
fixed-point results, and ALU floating-point instructions operate on 32-bit
or 40-bit floating-point operands and output 32-bit or 40-bit floating-point results. ALU instructions include:
•Floating-point addition, subtraction, add/subtract, average
•Fixed-point addition, subtraction, add/subtract, average
ALU instructions take one or two inputs: X input and Y input. These
inputs (known as operands) can be any data registers in the register file.
Most ALU operations return one result; in add/subtract operations, the
ALU operation returns two results; in compare operations, the ALU operation returns no result (only flags are updated). ALU results can be
returned to any location in the register file.
Because of the 5-stage pipeline in the ADSP-2136x processor core, the
operands are fetched before the results are written back. Therefore, the
ALU can read and write the same register file location in a single cycle. If
the ALU operation is fixed-point, the inputs are treated as 32-bit
fixed-point operands. The ALU transfers the upper 32 bits from the
source location in the register file. For fixed-point operations, the result(s)
are 32-bit fixed-point values. Some floating-point operations (LOGB, MANT
and FIX) can also yield fixed-point results.
The processor transfers fixed-point results to the upper 32 bits of the data
register and clears the lower eight bits of the register. The format of
fixed-point operands and results depends on the operation. In most arithmetic operations, there is no need to distinguish between integer and
fractional formats. Fixed-point inputs to operations such as scaling a floating-point value are treated as integers. For purposes of determining status
such as overflow, fixed-point arithmetic operands and results are treated as
two’s-complement numbers.
ALU Saturation
When the ALUSAT bit is set (=1) in the MODE1 register, the ALU is in saturation mode. In this mode, positive fixed-point overflows return the
maximum positive fixed-point number (0x7FFF FFFF), and negative
overflows return the most negative number (0x8000 0000).
When the ALUSAT bit is cleared (=0) in the MODE1 register, fixed-point
results that overflow are not saturated; the upper 32 bits of the result are
returned unaltered.
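A C sketch of the saturation rule for 32-bit fixed-point addition follows (hypothetical function name; other ALU operations and the status flags are not modeled). Overflow is detected in the usual two's-complement way: both inputs have the same sign and the wrapped sum has the opposite sign.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit fixed-point add; alusat models the MODE1 ALUSAT bit. */
static uint32_t alu_add_fixed(uint32_t x, uint32_t y, int alusat)
{
    assert(alusat == 0 || alusat == 1);
    uint32_t sum = x + y;                               /* wraps modulo 2^32 */
    uint32_t overflow = (~(x ^ y) & (x ^ sum)) >> 31;    /* 1 on two's-complement overflow */
    if (alusat && overflow)
        return (x >> 31) ? 0x80000000u : 0x7FFFFFFFu;   /* saturate toward the inputs' sign */
    return sum;                                         /* unaltered (wrapped) result */
}
```

With ALUSAT set, 0x7FFF FFFF + 1 returns 0x7FFF FFFF; with ALUSAT cleared, the same addition returns 0x8000 0000.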
ALU Status Flags
ALU operations update seven status flags in the processing element’s arithmetic status (ASTATx and ASTATy) registers. Table B-4 on page B-14 lists
the bits in these registers. The following bits in ASTATx or ASTATy registers
flag the ALU status (a 1 indicates the condition) of the most recent ALU
operation:
•ALU result zero or floating-point underflow, bit 0 (AZ)
•ALU overflow, bit 1 (AV)
•ALU result negative, bit 2 (AN)
•ALU fixed-point carry, bit 3 (AC)
•ALU X input sign for ABS, MANT operations, bit 4 (AS)
•ALU floating-point invalid operation, bit 5 (AI)
•Last ALU operation was a floating-point operation, bit 10 (AF)
•Compare accumulation register results of last eight compare operations, bits 31-24 (CACC)
ALU operations also update four sticky status flags in the processing element’s sticky status (STKYx and STKYy) registers. Table B-5 on page B-20
lists the bits in these registers. The following bits in STKYx or STKYy flag
the ALU status (a 1 indicates the condition). Once set, a sticky flag
remains high until explicitly cleared:
•ALU floating-point invalid operation, bit 5 (AIS)
Flag updates occur at the end of the cycle in which the status is generated
and are available on the next cycle. If a program writes the arithmetic status
register or sticky status register explicitly in the same cycle that the ALU is
performing an operation, the explicit write to the status register supersedes
any flag update from the ALU operation.
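To illustrate how AZ, AV, AN, and AC relate to a fixed-point addition, here is a hypothetical C model that returns those four flags at the bit positions listed above. It assumes ALUSAT = 0 (no saturation) and two's-complement operands, and it does not model AS, AI, AF, CACC, the sticky flags, or the timing of flag updates; the macro and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASTAT bit positions from the list above. */
#define ASTAT_AZ (1u << 0)   /* result zero */
#define ASTAT_AV (1u << 1)   /* overflow */
#define ASTAT_AN (1u << 2)   /* result negative */
#define ASTAT_AC (1u << 3)   /* carry */

/* Hypothetical flag model for a 32-bit fixed-point add with ALUSAT = 0. */
static uint32_t alu_add_flags(uint32_t x, uint32_t y, uint32_t *result)
{
    assert(result != NULL);
    uint64_t wide = (uint64_t)x + y;          /* keeps the carry out of bit 31 */
    uint32_t sum = (uint32_t)wide;
    uint32_t flags = 0;

    if (sum == 0)                     flags |= ASTAT_AZ;
    if ((~(x ^ y) & (x ^ sum)) >> 31) flags |= ASTAT_AV;
    if (sum >> 31)                    flags |= ASTAT_AN;
    if (wide >> 32)                   flags |= ASTAT_AC;

    *result = sum;
    return flags;
}
```

Adding 1 to 0xFFFF FFFF, for example, sets AZ and AC (the sum wraps to zero with a carry) but not AV, because the operands have opposite signs.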
ALU Instruction Summary
Table 2-4 and Table 2-5 list the ALU instructions and show how they
relate to ASTATx,y and STKYx,y flags. For more information on assembly
language syntax, see “Instruction Set” in Chapter 8, Instruction Set, and
“Computations Reference” in Chapter 9, Computations Reference. In
these tables, note the meaning of these symbols:
•Rn, Rx, Ry indicate any register file location; treated as fixed-point
•Fn, Fx, Fy indicate any register file location; treated as
floating-point
•* indicates that the flag may be set or cleared, depending on the
results of the instruction
•** indicates that the flag may be set (but not cleared), depending
on the results of the instruction
The multiplier performs fixed-point or floating-point multiplication and
fixed-point multiply/accumulate operations. Fixed-point multiply/accumulates are available with cumulative addition or cumulative subtraction.
Multiplier floating-point instructions operate on 32-bit or 40-bit floating-point operands and output 32-bit or 40-bit floating-point results.
Multiplier fixed-point instructions operate on 32-bit fixed-point data and
produce 80-bit results. Inputs are treated as fractional or integer, unsigned
or two’s-complement. Multiplier instructions include:
•Floating-point multiplication
•Fixed-point multiplication
•Fixed-point multiply/accumulate with addition, rounding optional
•Fixed-point multiply/accumulate with subtraction, rounding
optional
•Rounding multiplier result register
•Saturating multiplier result register
•Clearing multiplier result register
Multiplier Operation
The multiplier takes two inputs: X and Y. These inputs (also known as
operands) can be any data registers in the register file. The multiplier can
accumulate fixed-point results in the local multiplier result (MRF) registers
or write results back to the register file. The results in MRF can also be
rounded or saturated in separate operations. Floating-point multiplies
yield floating-point results, which the multiplier writes directly to the register file.
Because of the 5-stage pipeline in the ADSP-2136x processor core, the
operands are fetched before the results are written back. Therefore, the
multiplier can read and write the same register file location in a single
cycle.
For fixed-point multiplies, the multiplier reads the inputs from the upper
32 bits of the data registers. Fixed-point operands may be integer, fractional, or a mix of both formats. The format of the result matches the format of the
inputs. Each fixed-point operand may be either an unsigned number or a
two’s-complement number. If both inputs are fractional and signed, the
multiplier automatically shifts the result left one bit to remove the redundant sign bit. The register name(s) within the multiplier instruction
specify input data type(s)—Fx for floating-point and Rx for fixed-point.
Multiplier Result Register (Fixed-Point)
Fixed-point operations place 80-bit results in the multiplier’s foreground
MRF register or background MRB register, depending on which is active. For
more information on selecting the result register, see “Alternate (Secondary)
Data Registers” on page 2-39.
The location of a result in the MRF register’s 80-bit field depends on
whether the result is in fractional or integer format, as shown in
Figure 2-8. If the result is sent directly to a data register, the 32-bit result
with the same format as the input data is transferred, using bits 63-32 for
a fractional result or bits 31-0 for an integer result. The eight LSBs of the
40-bit register file location are zero-filled.
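The transfer of a fixed-point result from MRF to a data register can be sketched in C as follows. The mrf_t type and the make_mrf() and mrf_to_regfile() functions are hypothetical; they assume MRF is split into MRF2 (bits 79-64), MRF1 (bits 63-32), and MRF0 (bits 31-0), and they hold a 40-bit register file location in the low 40 bits of a uint64_t.

```c
#include <stdint.h>

/* Hypothetical model of the 80-bit MRF as its three parts. */
typedef struct {
    uint16_t mrf2;   /* bits 79-64 */
    uint32_t mrf1;   /* bits 63-32 */
    uint32_t mrf0;   /* bits 31-0  */
} mrf_t;

static mrf_t make_mrf(uint16_t mrf2, uint32_t mrf1, uint32_t mrf0)
{
    mrf_t mr = { mrf2, mrf1, mrf0 };
    return mr;
}

/* Returns the 40-bit register file value (in the low 40 bits). */
static uint64_t mrf_to_regfile(mrf_t mr, int fractional)
{
    /* Fractional results use bits 63-32; integer results use bits 31-0. */
    uint32_t word = fractional ? mr.mrf1 : mr.mrf0;
    /* The 32-bit value is left-justified; the eight LSBs are zero-filled. */
    return (uint64_t)word << 8;
}
```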
Fractional results can be rounded-to-nearest before being sent to the register file. If rounding is not specified, discarding bits 31-0 effectively
truncates a fractional result (rounds to zero). For more information on
rounding, see “Rounding Mode” on page 2-14.
Figure 2-8. Multiplier Fixed-Point Result Placement
The MRF register consists of the MRF2, MRF1, and MRF0 registers, which
can be individually read from or written to the register file. Each of these
registers has the same format. When data is read from MRF2, it is
sign-extended to 32 bits as shown in Figure 2-9. The processor zero-fills
the eight LSBs of the 40-bit register file location when data from
MRF2, MRF1, or MRF0 is written to the register file. When the processor writes
data into MRF2, MRF1, or MRF0 from the 32 MSBs of a register file location,
the eight LSBs are ignored. Data written to MRF1 is sign-extended to MRF2,
repeating the MSB of MRF1 in the 16 bits of MRF2. Data written to MRF0 is
not sign-extended.
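The read and write rules for the individual MRF registers can be modeled in C as shown below. This is a sketch under assumptions: mrf_t, make_mrf(), read_mrf2(), write_mrf1(), and write_mrf0() are hypothetical names, MRF2 is modeled as its 16 bits, and a 40-bit register file location is held in the low 40 bits of a uint64_t.

```c
#include <stdint.h>

typedef struct { uint16_t mrf2; uint32_t mrf1; uint32_t mrf0; } mrf_t;

static mrf_t make_mrf(uint16_t mrf2, uint32_t mrf1, uint32_t mrf0)
{
    mrf_t mr = { mrf2, mrf1, mrf0 };
    return mr;
}

/* Reading MRF2: sign-extend the 16-bit value to 32 bits, then left-justify
 * it in the 40-bit location with the eight LSBs zero-filled. */
static uint64_t read_mrf2(mrf_t mr)
{
    int32_t extended = (int16_t)mr.mrf2;          /* sign-extend 16 -> 32 */
    return (uint64_t)(uint32_t)extended << 8;
}

/* Writing MRF1: ignore the eight LSBs of the source and repeat the MSB
 * of the new MRF1 value in all 16 bits of MRF2. */
static mrf_t write_mrf1(mrf_t mr, uint64_t reg40)
{
    mr.mrf1 = (uint32_t)(reg40 >> 8);
    mr.mrf2 = (mr.mrf1 & 0x80000000u) ? 0xFFFFu : 0x0000u;
    return mr;
}

/* Writing MRF0: no sign extension, so MRF1 and MRF2 are unchanged. */
static mrf_t write_mrf0(mrf_t mr, uint64_t reg40)
{
    mr.mrf0 = (uint32_t)(reg40 >> 8);
    return mr;
}
```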
In addition to multiplication, fixed-point operations include accumulating,
rounding, and saturating fixed-point data. There are three MRF register
operations: clear (CLR), round (RND), and saturate (SAT).
The CLR operation (MRF=0) resets the specified MRF register to zero. Often,
it is best to perform this operation at the start of a multiply/accumulate
operation to remove the results of the previous operation.
The RND operation (MRF=RND MRF) applies only to fractional results;
integer results are not affected. This operation rounds the 80-bit MRF value
to nearest at bit 32, that is, at the MRF1-MRF0 boundary. Rounding a
fixed-point result occurs as part of a multiply or multiply/accumulate
operation or as an explicit operation on the MRF register. The rounded
result in MRF1 can be sent to the register file or back to the same MRF register. To round a fractional result to zero (truncation) instead of to nearest,
a program transfers the unrounded result from MRF1, discarding the lower
32 bits in MRF0.
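The difference between rounding at bit 32 and truncation can be sketched in C as follows. The function names are hypothetical, MRF2 is ignored (MRF1:MRF0 is treated as one 64-bit value), and, as an assumption, ties are rounded up by adding half an LSB; the hardware's exact tie handling and any carry into MRF2 are not modeled.

```c
#include <stdint.h>

/* Round to nearest at bit 32: add half an LSB of MRF1, then keep MRF1.
 * ASSUMPTION: ties round up; carries out of bit 63 are ignored. */
static uint32_t round_to_mrf1(uint64_t mrf1_mrf0)
{
    return (uint32_t)((mrf1_mrf0 + 0x80000000ull) >> 32);
}

/* Truncation: transfer MRF1 and discard the lower 32 bits in MRF0. */
static uint32_t truncate_to_mrf1(uint64_t mrf1_mrf0)
{
    return (uint32_t)(mrf1_mrf0 >> 32);
}
```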
The SAT operation (MRF=SAT MRF) sets MRF to a maximum value if the MRF
value has overflowed. Overflow occurs when the MRF value is greater than
the maximum value for the data format—unsigned or two’s-complement
and integer or fractional—as specified in the saturate instruction. The six
possible maximum values appear in Table 2-6. The result from MRF saturation can be sent to the register file or back to the same MRF register.
Table 2-6. Fixed-Point Format Maximum Values (Saturation)
Format                        Maximum Number (Hexadecimal)
                              MRF2   MRF1        MRF0
Unsigned fractional number    0000   FFFF FFFF   FFFF FFFF
Unsigned integer number       0000   0000 0000   FFFF FFFF
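The SAT operation for the two unsigned formats listed in Table 2-6 can be sketched in C as follows. This is a hypothetical model: mrf_t, make_mrf(), and the sat_*() functions are illustrative names, MRF is treated as an unsigned 80-bit quantity, and the two's-complement formats are not modeled.

```c
#include <stdint.h>

typedef struct { uint16_t mrf2; uint32_t mrf1; uint32_t mrf0; } mrf_t;

static mrf_t make_mrf(uint16_t mrf2, uint32_t mrf1, uint32_t mrf0)
{
    mrf_t mr = { mrf2, mrf1, mrf0 };
    return mr;
}

/* Unsigned fractional maximum: MRF2=0000, MRF1=FFFF FFFF, MRF0=FFFF FFFF.
 * Any nonzero bit in MRF2 means the value exceeds that maximum. */
static mrf_t sat_unsigned_fractional(mrf_t mr)
{
    if (mr.mrf2 != 0)
        mr = make_mrf(0x0000, 0xFFFFFFFFu, 0xFFFFFFFFu);
    return mr;
}

/* Unsigned integer maximum: MRF2=0000, MRF1=0000 0000, MRF0=FFFF FFFF.
 * Any nonzero bit in MRF2 or MRF1 means the value exceeds that maximum. */
static mrf_t sat_unsigned_integer(mrf_t mr)
{
    if (mr.mrf2 != 0 || mr.mrf1 != 0)
        mr = make_mrf(0x0000, 0x00000000u, 0xFFFFFFFFu);
    return mr;
}
```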
Multiplier Status Flags
Multiplier operations update four status flags in the processing element’s
arithmetic status registers (ASTATx and ASTATy). “Arithmetic Status Registers
(ASTATx and ASTATy)” on page B-12 lists the bits in these registers.
The bits in the ASTATx or ASTATy registers that indicate the multiplier status (a 1 indicates the condition) of the most recent multiplier operation
are:
•Multiplier result negative, bit 6 (MN)
•Multiplier overflow, bit 7 (MV)
•Multiplier underflow, bit 8 (MU)
•Multiplier floating-point invalid operation, bit 9 (MI)
Multiplier operations also update four “sticky” status flags in the processing element’s sticky status (STKYx and STKYy) registers. Table B-5 on
page B-20 lists the bits in these registers. Once set, a sticky flag remains
high until explicitly cleared. The bits in the STKYx or STKYy registers that
indicate multiplier status (a 1 indicates the condition) are:
•Multiplier fixed-point overflow, bit 6 (MOS)
•Multiplier floating-point overflow, bit 7 (MVS)
•Multiplier underflow, bit 8 (MUS)
•Multiplier floating-point invalid operation, bit 9 (MIS)
Flag updates occur at the end of the cycle in which the status is generated
and are available on the next cycle. If a program writes the arithmetic status register or sticky register explicitly in the same cycle that the multiplier
is performing an operation, the explicit write to
ASTAT or STKY supersedes
any flag update from the multiplier operation.
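The difference between the ASTAT flags (set or cleared by each operation) and the STKY flags (only ever set) can be sketched in C. The bit positions come from the lists above; the macro names follow the flag mnemonics, while flags_t, update_mult_flags(), and the mapping of an overflow to MOS for fixed-point and MVS for floating-point operations are assumptions for illustration.

```c
#include <stdint.h>

#define ASTAT_MN  (1u << 6)   /* multiplier result negative */
#define ASTAT_MV  (1u << 7)   /* multiplier overflow */
#define ASTAT_MU  (1u << 8)   /* multiplier underflow */
#define ASTAT_MI  (1u << 9)   /* multiplier floating-point invalid */
#define ASTAT_MULT_MASK (ASTAT_MN | ASTAT_MV | ASTAT_MU | ASTAT_MI)

#define STKY_MOS  (1u << 6)   /* sticky fixed-point overflow */
#define STKY_MVS  (1u << 7)   /* sticky floating-point overflow */
#define STKY_MUS  (1u << 8)   /* sticky underflow */
#define STKY_MIS  (1u << 9)   /* sticky floating-point invalid */

typedef struct { uint32_t astat; uint32_t stky; } flags_t;

/* Hypothetical flag update for one multiplier operation. 'status' holds the
 * ASTAT-format multiplier flags produced by the operation. */
static flags_t update_mult_flags(flags_t f, uint32_t status, int fixed_point)
{
    /* ASTAT multiplier bits reflect only the most recent operation. */
    f.astat = (f.astat & ~ASTAT_MULT_MASK) | (status & ASTAT_MULT_MASK);
    /* Sticky bits are ORed in and never cleared here. */
    if (status & ASTAT_MV) f.stky |= fixed_point ? STKY_MOS : STKY_MVS;
    if (status & ASTAT_MU) f.stky |= STKY_MUS;
    if (status & ASTAT_MI) f.stky |= STKY_MIS;
    return f;
}
```

An overflowing fixed-point multiply followed by a clean one leaves MV cleared in ASTAT but MOS still set in STKY.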
Multiplier Instruction Summary
Table 2-7 and Table 2-9 list the multiplier instructions and describe how
they relate to ASTATx,y and STKYx,y flags. For more information on
assembly language syntax, see “Instruction Set” in Chapter 8, Instruction
Set, and “Computations Reference” in Chapter 9, Computations Reference. In these tables, note the meaning of the following symbols:
•Rn, Rx, Ry indicate any register file location; treated as fixed-point
•Fn, Fx, Fy indicate any register file location; treated as
floating-point
•* indicates that the flag may be set or cleared, depending on results
of instruction
•** indicates that the flag may be set (but not cleared), depending
on results of instruction
•– indicates no effect
•The Input Mods column indicates the types of optional modifiers
that can be applied to the instruction inputs. For a list of modifiers,
see Table 2-8.
The shifter performs bit-wise operations on 32-bit fixed-point operands.
Shifter operations include:
•Shifts and rotates from off-scale left to off-scale right
•Bit manipulation operations, including bit set, clear, toggle, and
test
•Bit field manipulation operations, including extract and deposit
•Fixed-point/floating-point conversion operations, including exponent extract, number of leading 1s or 0s
Shifter Operation
The shifter takes one to three inputs: X, Y, and Z. The inputs (known as
operands) can be any register in the register file. Within a shifter instruction, the inputs serve as follows.
•The X input provides data that is operated on.
•The Y input specifies shift magnitudes, bit field lengths, or bit
positions.
•The Z input provides data that is operated on and updated.
In the following example, Rx is the X input, Ry is the Y input, and Rn is the
Z input. The shifter returns one output (Rn) to the register file.
Rn = Rn OR LSHIFT Rx BY Ry;
As shown in Figure 2-10, the shifter fetches input operands from the
upper 32 bits of a register file location (bits 39-8) or from an immediate
value in the instruction. Because of the 5-stage pipeline in the
ADSP-2136x processor core, the operands are fetched before the results
are written back. Therefore, the shifter can read and write the same register file location in a single cycle.
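The example instruction Rn = Rn OR LSHIFT Rx BY Ry can be sketched in C as follows. The function names are hypothetical, only the 32-bit fields are modeled, and the shift semantics are assumptions: a positive magnitude shifts left, a negative one shifts right, and a magnitude of 32 or more in either direction ("off-scale") shifts every bit out, giving 0.

```c
#include <stdint.h>

/* ASSUMED semantics for a logical shift by a signed magnitude. */
static uint32_t lshift(uint32_t x, int32_t magnitude)
{
    if (magnitude >= 32 || magnitude <= -32)
        return 0;                                  /* off-scale: all bits lost */
    return magnitude >= 0 ? x << magnitude : x >> -magnitude;
}

/* Rn is both the Z input (read) and the result (written). */
static uint32_t or_lshift(uint32_t rn, uint32_t rx, int32_t ry)
{
    return rn | lshift(rx, ry);
}
```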
The X input and Z input are always 32-bit fixed-point values. The Y input
is a 32-bit fixed-point value or an 8-bit field (shf8), positioned in the register file. These inputs appear in Figure 2-10.
Some shifter operations produce 8-bit or 6-bit results. As shown in
Figure 2-11, the shifter places these results in the shf8 field or the bit6
field and sign-extends the results to 32 bits. The shifter always returns a
32-bit result.
Figure 2-10. Register File Fields for Shifter Instructions (32-bit Y input or result in bits 39-8; 8-bit SHF8 Y input or result in bits 15-8)
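Extracting the 8-bit shf8 field from a register file location can be sketched in C. The function name is hypothetical, the 40-bit location is held in the low 40 bits of a uint64_t, the field position (bits 15-8, the low byte of the 32-bit field) is read from the bit labels of Figure 2-10, and treating shf8 as a two's-complement value that is sign-extended to 32 bits is an assumption.

```c
#include <stdint.h>

/* ASSUMPTION: shf8 occupies bits 15-8 and is a signed 8-bit value. */
static int32_t shf8_from_reg40(uint64_t reg40)
{
    uint8_t raw = (uint8_t)(reg40 >> 8);   /* low byte of the 32-bit field */
    return (int8_t)raw;                    /* sign-extend 8 -> 32 bits */
}
```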
The shifter supports bit field deposit and bit field extract instructions for
manipulating groups of bits within an input. The Y input for bit field
instructions specifies two 6-bit values, bit6 and len6, which are positioned
in the Ry register as shown in Figure 2-11. The shifter interprets bit6 and
len6 as positive integers. Bit6 is the starting bit position for the deposit or
extract, and len6 is the bit field length, which specifies how many bits are
deposited or extracted.
Figure 2-11. Register File Fields for FDEP, FEXT Instructions
Field deposit (FDEP) instructions take a group of bits from the input register
(starting at the LSB of the 32-bit integer field) and deposit the bits as
directed anywhere within the result register. The bit6 value specifies the
starting bit position for the deposit. Figure 2-13 shows how the inputs,
bit6 and len6, work in a field deposit instruction:
Rn = FDEP Rx BY Ry;
Figure 2-12 shows bit placement for a field deposit instruction.
Figure 2-13. Bit Field Deposit Instruction (Ry supplies len6, the number of bits to take from Rx starting at the LSB of its 32-bit field, and bit6, the starting bit position for the deposit in Rn, referenced from the LSB of the 32-bit field)
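A field deposit can be sketched in C as follows, with bit6 and len6 passed directly rather than packed into Ry as in Figure 2-11. The function name is hypothetical, and two behaviors are assumptions: bits of Rn outside the deposited field are cleared, and field bits that would land above bit 31 are discarded.

```c
#include <stdint.h>

/* Hypothetical model of Rn = FDEP Rx BY Ry. */
static uint32_t fdep(uint32_t rx, unsigned bit6, unsigned len6)
{
    bit6 &= 0x3F;                                   /* both are 6-bit fields */
    len6 &= 0x3F;
    /* Take len6 bits starting at the LSB of Rx. */
    uint64_t field = (uint64_t)rx & ((1ull << len6) - 1);
    /* Deposit them starting at bit6; anything above bit 31 is lost. */
    return (uint32_t)(field << bit6);
}
```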
Field extract (FEXT) instructions extract a group of bits as directed from
anywhere within the input register and place them in the result register,
aligned with the LSB of the 32-bit integer field. The bit6 value specifies
the starting bit position for the extract.
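A field extract can be sketched in C in the same way, again with bit6 and len6 passed directly. The function name is hypothetical, and the sketch assumes that the result bits above the extracted field are zero-filled and that positions above bit 31 of the input read as zero.

```c
#include <stdint.h>

/* Hypothetical model of a field extract using bit6 and len6. */
static uint32_t fext(uint32_t rx, unsigned bit6, unsigned len6)
{
    bit6 &= 0x3F;                                   /* both are 6-bit fields */
    len6 &= 0x3F;
    /* Move the field down so it is aligned with the LSB. */
    uint64_t shifted = (uint64_t)rx >> bit6;
    /* Keep len6 bits and zero everything above them. */
    return (uint32_t)(shifted & ((1ull << len6) - 1));
}
```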
Figure 2-14 shows bit placement for a field extract instruction.
Shifter operations update three status flags in the processing element’s
arithmetic status registers (ASTATx and ASTATy). Table B-4 on page B-14
lists the bits in these registers. The following bits in the ASTATx or ASTATy
registers indicate shifter status (a 1 indicates the condition) for the most
recent shifter operation:
•Shifter overflow of bits to left of MSB, bit 11 (SV)
•Shifter result zero, bit 12 (SZ)
•Shifter input sign for exponent extract only, bit 13 (SS)
A flag update occurs at the end of the cycle in which the status is generated and is available on the next cycle. If a program writes the arithmetic
status register explicitly in the same cycle that the shifter is performing an
operation, the explicit write to ASTAT supersedes any flag update caused by
the shift operation.
Each of the processor’s processing elements has a data register file, which
is a set of data registers that transfers data between the data buses and the
computational units. These registers also provide local storage for operands and results.
Each of the two register files consists of 16 primary registers and 16 alternate (secondary) registers. The data registers are 40 bits wide. Within these
registers, 32-bit data is left-justified. If an operation specifies a 32-bit data
transfer to these 40-bit registers, the eight LSBs are ignored on register
reads, and the LSBs are cleared to zeros on writes.
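The 32-bit transfer rules for the 40-bit data registers can be modeled in C. This is a sketch under the assumption that a data register is held in the low 40 bits of a uint64_t; the function and macro names are hypothetical.

```c
#include <stdint.h>

#define REG40_MASK 0xFFFFFFFFFFull   /* low 40 bits */

/* 32-bit write: left-justify the data (bits 39-8); the eight LSBs are cleared. */
static uint64_t reg40_write32(uint32_t value)
{
    return ((uint64_t)value << 8) & REG40_MASK;
}

/* 32-bit read: take bits 39-8; the eight LSBs are ignored. */
static uint32_t reg40_read32(uint64_t reg40)
{
    return (uint32_t)((reg40 & REG40_MASK) >> 8);
}
```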
Program memory data accesses and data memory accesses to and from the
register file(s) occur on the PM data bus and DM data bus, respectively.
One PM data bus access for each processing element and/or one DM data
bus access for each processing element can occur in one cycle. Transfers
between the register files and the DM or PM data buses can move up to
64 bits of valid data on each bus.
If an operation specifies the same register file location as both an input
and output, the 5-stage pipeline fetches the operands before the results are
written back. Therefore, the processor uses the old data as the operand,
before updating the location with the new result data. If writes to the same
location take place in the same cycle, only the write with higher precedence actually occurs. The processor determines precedence for the write
operation from the source of the data; from highest to lowest, the precedence is:
The data register file in Figure 2-1 on page 2-3 lists register names of
R0 through R15 within the PEx’s register file. When a program refers to these
registers as R0 through R15, the computational units treat the contents of
these registers as fixed-point data. To perform floating-point computations, refer to these registers as F0 through F15. For example, the following
instructions refer to the same registers, but direct the computational units
to perform different operations:
F0 = F1 * F2; /* floating-point multiply */
R0 = R1 * R2; /* fixed-point multiply */
The F and R prefixes on register names do not affect the 32-bit or 40-bit
data transfer; the naming convention only determines how the ALU, multiplier, and shifter treat the data.
To maintain compatibility with code written for previous SHARC processors, the assembly syntax accommodates references to the PEx and PEy
data registers.
Code may refer to the PEy data registers (S0 through S15) only in data
move instructions. The rules for using register names are:
•R0 through R15 and F0 through F15 refer to PEx registers for data
move and computational instructions, whether the processor is in
SISD or SIMD mode.
•R0 through R15 and F0 through F15 refer to both PEx and PEy registers
for computational instructions in SIMD mode.
•S0 through S15 refer to PEy registers for data move instructions,
whether the processor is in SISD or SIMD mode.
For more information on SISD and SIMD computational operations, see
“Secondary Processing Element (PEy)” on page 2-45. For more information
on ADSP-2136x assembly language, see “Instruction Set” in
Chapter 8, Instruction Set, and “Computations Reference” in Chapter 9,
Computations Reference.
Alternate (Secondary) Data Registers
Each register file has an alternate register set. To facilitate fast context
switching, the processor includes alternate register sets for data, results,
and data address generator registers. Bits in the MODE1 register control
when alternate registers become accessible. While inaccessible, the contents of alternate registers are not affected by processor operations. Note
that there is a one cycle latency from the time when writes are made to the
MODE1 register until an alternate register set can be accessed. The alternate
register sets for data and results are described in this section. For more
information on alternate data address generator registers, see “Alternate
(Secondary) DAG Registers” on page 4-6.
Bits in the MODE1 register can activate independent alternate data register
sets: the lower half (R0-R7 and S0-S7) and the upper half (R8-R15 and
S8-S15). To share data between contexts, a program places the data to be
shared in one half of either the current processing element’s register file or
the opposite processing element’s register file and activates the alternate
register set of the other half. For information on how to activate alternate
data registers, see the description of the MODE1 register later in this section.
Each multiplier has a foreground (MRF) and a background (MRB) results
register. A bit in the MODE1 register selects which
result register receives the result from the multiplier operation, swapping
which register is the current MRF or MRB. This swapping facilitates context
switching. Unlike other registers that have alternates, both MRF and MRB are
accessible at the same time. Fixed-point multiplies can accumulate results
in the MRF or MRB registers, without regard to the state of the MODE1 register. With this arrangement, code can use the result registers as primary
and alternate accumulators, or code can use these registers as two parallel
accumulators. This feature facilitates complex math.
The MODE1 register controls the access to alternate registers. Table B-2 on
page B-5 lists the bits in MODE1. The following bits in the MODE1 register
control alternate registers (a 1 enables the alternate set):
•Secondary registers for computational unit results, bit 2 (SRCU)
•Secondary registers for the hi register file, R8–R15 and S8–S15, bit 7
(SRRFH)
•Secondary registers for the lo register file, R0–R7 and S0–S7, bit 10
(SRRFL)
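The MODE1 bits listed above can be expressed as C masks, with a small helper showing which half of the data register file each bit controls. The macro names follow the bit mnemonics; uses_alternate() is a hypothetical helper, and the one-cycle latency after writing MODE1 is not modeled.

```c
#include <stdint.h>
#include <stdbool.h>

#define MODE1_SRCU  (1u << 2)    /* secondary computational-unit result registers */
#define MODE1_SRRFH (1u << 7)    /* secondary hi register file, R8-R15 and S8-S15 */
#define MODE1_SRRFL (1u << 10)   /* secondary lo register file, R0-R7 and S0-S7 */

/* Hypothetical helper: is data register n (0-15) currently mapped to its
 * alternate set? Registers 0-7 follow SRRFL; registers 8-15 follow SRRFH. */
static bool uses_alternate(uint32_t mode1, unsigned n)
{
    uint32_t bit = (n < 8) ? MODE1_SRRFL : MODE1_SRRFH;
    return (mode1 & bit) != 0;
}
```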
The following example demonstrates how code should handle the one
cycle of latency—from the instruction that sets the bit in the MODE1 register until the alternate registers may be accessed. Note that it is possible to
use any instruction that does not access the switching register file instead
of a NOP instruction.
BIT SET MODE1 SRRFL; /* activate alternate reg. file */
NOP; /* wait for access to alternates */
R0 = 7;
The processor supports multiple parallel (multifunction) computations by
using the parallel data paths within its computational units. These instructions complete in a single cycle, and they combine parallel operation of
the multiplier and the ALU or dual ALU functions. The multiple operations perform as if they were in corresponding single function
computations. Multifunction computations also handle flags in the same
way as the single function computations, except that in the dual add/subtract computation, the ALU flags from the two operations are ORed
together.
To work with the available data paths, the computational units constrain
which data registers hold the four input operands for multifunction computations. These constraints limit which registers may hold the X input
and Y input for the ALU and multiplier.
Figure 2-15 shows a computational unit and indicates which registers may
serve as X inputs and Y inputs for the ALU and multiplier. For example,
the X input to the ALU can only be R8, R9, R10 or R11. Note that the
shifter is gray in Figure 2-15 to indicate no shifter multifunction
operations.
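The register constraints can be checked with a short C sketch. The ranges are taken from this section: the ALU X input is limited to R8-R11, and the multifunction example that uses F3-0, F7-4, F11-8, and F15-12 implies multiplier X in R0-R3, multiplier Y in R4-R7, and ALU Y in R12-R15. The function names are hypothetical.

```c
#include <stdbool.h>

static bool in_range(int r, int lo, int hi)
{
    return r >= lo && r <= hi;
}

/* Hypothetical check of operand registers for a multiply-plus-ALU
 * multifunction computation. */
static bool multifunction_operands_ok(int mul_x, int mul_y, int alu_x, int alu_y)
{
    return in_range(mul_x, 0, 3) && in_range(mul_y, 4, 7) &&
           in_range(alu_x, 8, 11) && in_range(alu_y, 12, 15);
}
```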
Table 2-12, Table 2-13, Table 2-14, and Table 2-15 list the multifunction
computations. For more information on assembly language syntax,
see “Instruction Set” in Chapter 8, Instruction Set, and “Computations
Reference” in Chapter 9, Computations Reference. Table 2-11 provides
Fm = F3-0 * F7-4, Fa = F11-8 + F15-12, Fs = F11-8 – F15-12
Another type of multifunction operation available on the processor combines transfers between the results and data registers and transfers between
memory and data registers. These parallel operations complete in a single
cycle. For example, the processor can perform the following multiply and
parallel read of data memory:
MRF = MRF - R5 * R0, R6 = DM(I1,M2);
Or, the processor can perform the following result register transfer and
parallel read:
The ADSP-2136x processor contains two sets of computational units and
associated register files. As shown in Figure 2-16, these two processing elements (PEx and PEy) support SIMD operation.
The MODE1 register controls the operating mode of the processing
elements. Table B-2 on page B-5 lists the bits in MODE1. The PEYEN bit
(bit 21) in the MODE1 register enables or disables the PEy processing element.
When PEYEN is cleared (0), the ADSP-2136x processor operates in SISD
mode, using only PEx. When the PEYEN bit is set (1), the processor operates in SIMD mode, using the PEx and PEy processing elements. There is
a one cycle delay after PEYEN is set or cleared, before the change to or from
SIMD mode takes effect.
To support SIMD, the processor performs these parallel operations:
•Dispatches a single instruction to both processing elements’ computational units
•Loads two sets of data from memory, one for each processing
element
•Executes the same instruction simultaneously in both processing
elements
•Stores data results from the dual executions to memory
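The dispatch behavior can be sketched in C for a single fixed-point add, Rn = Rx + Ry. This is a behavioral model, not the hardware: pe_pair_t, pair(), and dispatch_add() are hypothetical, each register is modeled as a 32-bit value in PEx and its implicitly paired register in PEy, PEYEN is bit 21 of MODE1 as stated above, and the one-cycle delay after changing PEYEN is not modeled.

```c
#include <stdint.h>

#define MODE1_PEYEN (1u << 21)   /* enables PEy (SIMD mode) */

/* One register's contents in PEx and in PEy. */
typedef struct { uint32_t pex; uint32_t pey; } pe_pair_t;

static pe_pair_t pair(uint32_t pex, uint32_t pey)
{
    pe_pair_t p = { pex, pey };
    return p;
}

/* PEx always executes; PEy executes the same operation on its own data
 * only when PEYEN is set. 'old_n' is the destination's prior contents,
 * which PEy keeps in SISD mode. */
static pe_pair_t dispatch_add(uint32_t mode1, pe_pair_t x, pe_pair_t y, pe_pair_t old_n)
{
    pe_pair_t n = old_n;
    n.pex = x.pex + y.pex;
    if (mode1 & MODE1_PEYEN)
        n.pey = x.pey + y.pey;
    return n;
}
```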
Using the information here and in “Instruction Set” in Chapter 8,
Instruction Set, and “Computations Reference” in Chapter 9,
Computations Reference, programs can use the parallelism of SIMD mode
to double performance over similar algorithms running
in SISD (ADSP-2106x processor compatible) mode.
The two processing elements are symmetrical; each contains these functional blocks:
•ALU
•Multiplier primary and alternate result registers
•Shifter
•Data register file and alternate register file
Dual Compute Units Sets
The computational units (ALU, multiplier, and shifter) in PEx and PEy
are identical. The data bus connections for the dual computational units
permit asymmetric data moves to, from, and between the two processing
elements. Identical instructions execute on the PEx and PEy computational units; the difference is the data. The data registers for PEy
operations are identified (implicitly) from the PEx registers in the