Analog Devices, Inc. reserves the right to change this product without
prior notice. Information furnished by Analog Devices is believed to be
accurate and reliable. However, no responsibility is assumed by Analog
Devices for its use; nor for any infringement of patents or other rights of
third parties which may result from its use. No license is granted by
implication or otherwise under the patent rights of Analog Devices, Inc.
Trademark and Service Mark Notice
The Analog Devices logo, Blackfin, EZ-ICE, SHARC, TigerSHARC, the
TigerSHARC logo, and VisualDSP++ are registered trademarks of
Analog Devices, Inc.
Static Superscalar is a trademark of Analog Devices, Inc.
All other brand and product names are trademarks or service marks of
their respective owners.
PREFACE
Thank you for purchasing and developing systems using TigerSHARC®
processors from Analog Devices.
Purpose of This Manual
The ADSP-TS201 TigerSHARC Processor Hardware Reference contains
information about the DSP architecture and DSP system design for
TigerSHARC processors. These are 32-bit, fixed- and floating-point digital signal processors from Analog Devices for use in computing,
communications, and consumer applications.
The manual provides information on how the processor core and I/O
peripherals operate in the TigerSHARC processor’s architecture along
with reference information about I/O peripheral features.
Intended Audience
The primary audience for this manual is a system developer who is familiar with Analog Devices processors. This manual assumes that the
audience has a working knowledge of the appropriate processor architecture and microprocessor system design. Programmers who are unfamiliar
with Analog Devices processors can use this manual, but should supplement it with other texts (such as the appropriate programming reference
manuals and data sheets) that describe their target architecture.
•Chapter 1, Processor Architecture
This chapter provides an architectural overview of the
TigerSHARC processor.
•Chapter 2, Memory and Registers
This chapter defines the memory map of the ADSP-TS201
TigerSHARC processor. The memory space defines the location of
each element on the TigerSHARC processor.
•Chapter 3, SOC Interface
This chapter discusses clocking inputs, including the three different types of operating modes in which the ADSP-TS201
TigerSHARC processor can operate and the boot modes from
which the TigerSHARC processor initiates.
•Chapter 4, Timers
This chapter describes the timers of the ADSP-TS201
TigerSHARC processor.
•Chapter 5, Flags
This chapter describes the flags of the ADSP-TS201
TigerSHARC processor.
•Chapter 6, Interrupts
This chapter discusses the various types of interrupts supported by
the ADSP-TS201 TigerSHARC processor. Some of the interrupts
are generated internally or externally.
•Chapter 7, Direct Memory Access
This chapter describes how the ADSP-TS201 TigerSHARC processor’s on-chip DMA controller acts as a machine for transferring
data without core interruption.
•Chapter 8, External Port and SDRAM Interface
This chapter focuses on the external bus interface of the
ADSP-TS201 TigerSHARC processor, which includes the bus
arbitration logic and the external address, data and control buses,
and interface to SDRAM devices.
•Chapter 9, Link Ports
This chapter describes how link ports provide point-to-point communications between ADSP-TS201 TigerSHARC processors in a
system. The link ports can also be used to interface with any other
device that is designed to work in the same protocol.
•Chapter 10, JTAG Port and Test/Debug Interface
This chapter describes features of the ADSP-TS201 TigerSHARC
processor that are useful for performing software debugging and
services usually found in Operating System (OS) kernels.
•Chapter 11, System Design
This chapter describes system features of the ADSP-TS201
TigerSHARC processor. These include Power, Reset, Clock,
JTAG, and Booting, as well as pin descriptions and other system
level information.
This hardware reference is a companion document to the
ADSP-TS201 TigerSHARC Processor Programming Reference.
What’s New in This Manual
Revision 1.0 of the ADSP-TS201 TigerSHARC Processor Hardware
Reference differs in a number of ways from the revision 0.2 book. In
revision 1.0, the following additions and corrections have been made:
•The Processor Architecture chapter has replaced the previous Introduction chapter. This new chapter provides a more detailed road
map to the processor architecture and processor core mode
controls.
•The SOC Interface, Timers, and Flags chapters have been added. A
description of the operation of all I/O peripherals as bus masters or
slaves on the SOC bus has been added in all chapters.
•The Interrupts chapter has been re-ordered to provide more guidance on using interrupts, and a consolidated interrupt vector table
has been added.
•The System Design chapter has been expanded. Many of the
Engineer-to-Engineer (EE) Notes to which the revision 0.2 book
referred have been added to the revision 1.0 book. The topics
added include booting, system design guidelines, and thermal
design guidelines.
•The index has been enhanced.
•Errata reports against the revision 0.2 book have been corrected.
The following is the list of Analog Devices, Inc. processors supported in
VisualDSP++®.
TigerSHARC (ADSP-TSxxx) Processors
The name “TigerSHARC” refers to a family of floating-point and
fixed-point [8-bit, 16-bit, and 32-bit] processors. VisualDSP++ currently
supports the following TigerSHARC processors:
ADSP-TS101, ADSP-TS201, ADSP-TS202, and ADSP-TS203
SHARC® (ADSP-21xxx) Processors
The name “SHARC” refers to a family of high-performance, 32-bit,
floating-point processors that can be used in speech, sound, graphics, and
imaging applications. VisualDSP++ currently supports the following
SHARC processors:
You can obtain product information from the Analog Devices Web site,
from the product CD-ROM, or from the printed publications (manuals).
Analog Devices is online at www.analog.com. Our Web site provides information about a broad range of products—analog integrated circuits,
amplifiers, converters, and digital signal processors.
MyAnalog.com
MyAnalog.com is a free feature of the Analog Devices Web site that allows
customization of a Web page to display only the latest information on
products you are interested in. You can also choose to receive weekly
e-mail notifications containing updates to the Web pages that meet your
interests. MyAnalog.com provides access to books, application notes, data
sheets, code examples, and more.
Registration
Visit www.myanalog.com to sign up. Click Register to use MyAnalog.com.
Registration takes about five minutes and serves as a means to select the
information you want to receive.
If you are already a registered user, just log on. Your user name is your
e-mail address.
Processor Product Information
For information on embedded processors and DSPs, visit our Web site at
www.analog.com/processors, which provides access to technical publications,
data sheets, application notes, product overviews, and product
announcements.
You may also obtain additional information about Analog Devices and its
products in any of the following ways.
•E-mail questions or requests for information to
dsp.support@analog.com
•Fax questions or requests for information to
1-781-461-3010 (North America)
089/76 903-557 (Europe)
•Access the FTP Web site at
ftp ftp.analog.com or ftp 137.71.23.21
ftp://ftp.analog.com
Related Documents
The following publications that describe the ADSP-TS201 TigerSHARC
processor (and related processors) can be ordered from any Analog Devices
sales office:
•ADSP-TS201S TigerSHARC Embedded Processor Data Sheet
•ADSP-TS202S TigerSHARC Embedded Processor Data Sheet
•ADSP-TS203S TigerSHARC Embedded Processor Data Sheet
Online documentation comprises the VisualDSP++ Help system, software
tools manuals, hardware tools manuals, processor manuals, the Dinkum
Abridged C++ library, and Flexible License Manager (FlexLM) network
license manager software documentation. You can easily search across the
entire VisualDSP++ documentation set for any topic of interest. For easy
printing, supplementary .PDF files of most manuals are also provided.
Each documentation file type is described as follows.
File            Description
.CHM            Help system files and manuals in Help format.
.HTM or .HTML   Dinkum Abridged C++ library and FlexLM network license manager
                software documentation. Viewing and printing the .HTML files
                requires a browser, such as Internet Explorer 4.0 (or higher).
.PDF            VisualDSP++ and processor manuals in Portable Documentation
                Format (PDF). Viewing and printing the .PDF files requires a PDF
                reader, such as Adobe Acrobat Reader (4.0 or higher).
If documentation is not installed on your system as part of the software
installation, you can add it from the VisualDSP++ CD-ROM at any time
by running the Tools installation. Access the online documentation from
the VisualDSP++ environment, Windows® Explorer, or the Analog
Devices Web site.
Accessing Documentation From VisualDSP++
From the VisualDSP++ environment:
•Access VisualDSP++ online Help from the Help menu’s Contents, Search, and Index commands.
•Open online Help from context-sensitive user interface items (toolbar buttons, menu commands, and windows).
Accessing Documentation From Windows
In addition to any shortcuts you may have constructed, there are many
ways to open VisualDSP++ online Help or the supplementary documentation from Windows.
Help system files (.CHM) are located in the Help folder, and .PDF files are
located in the Docs folder of your VisualDSP++ installation CD-ROM.
The Docs folder also contains the Dinkum Abridged C++ library and the
FlexLM network license manager software documentation.
Using Windows Explorer
•Double-click the vdsp-help.chm file, which is the master Help system,
to access all the other .CHM files.
•Double-click any file that is part of the VisualDSP++ documentation set.
Select a processor family and book title. Download archive (.ZIP) files, one
for each manual. Use any archive management software, such as WinZip,
to decompress downloaded files.
Printed Manuals
For general questions regarding literature ordering, call the Literature
Center at 1-800-ANALOGD (1-800-262-5643) and follow the prompts.
VisualDSP++ Documentation Set
To purchase VisualDSP++ manuals, call 1-603-883-2430. The manuals
may be purchased only as a kit.
If you do not have an account with Analog Devices, you are referred to
Analog Devices distributors. For information on our distributors, log onto
To purchase EZ-KIT Lite™ and In-Circuit Emulator (ICE) manuals,
call 1-603-883-2430. The manuals may be ordered by title or by product
number located on the back cover of each manual.
Processor Manuals
Hardware reference and instruction set reference manuals may be ordered
through the Literature Center at 1-800-ANALOGD (1-800-262-5643),
or downloaded from the Analog Devices Web site. Manuals may be
ordered by title or by product number located on the back cover of each
manual.
Data Sheets
All data sheets (preliminary and production) may be downloaded from the
Analog Devices Web site. Only production (final) data sheets (Rev. 0, A,
B, C, and so on) can be obtained from the Literature Center at
1-800-ANALOGD (1-800-262-5643); they also can be downloaded from
the Web site.
To have a data sheet faxed to you, call the Analog Devices Faxback System
at 1-800-446-6212. Follow the prompts and a list of data sheet code
numbers will be faxed to you. If the data sheet you want is not listed,
check for it on the Web site.
Text conventions used in this manual are identified and described as
follows.
Example                     Description
Close command (File menu)   Titles in reference sections indicate the location of an item within the
                            VisualDSP++ environment’s menu system (for example, the Close
                            command appears on the File menu).
{this | that}               Alternative items in syntax descriptions appear within curly brackets
                            and separated by vertical bars; read the example as this or that. One
                            or the other is required.
[this | that]               Optional items in syntax descriptions appear within brackets and
                            separated by vertical bars; read the example as an optional this or that.
[this,…]                    Optional item lists in syntax descriptions appear within brackets
                            delimited by commas and terminated with an ellipsis; read the example
                            as an optional comma-separated list of this.
.SECTION                    Commands, directives, keywords, and feature names are in text with
                            letter gothic font.
filename                    Non-keyword placeholders appear in text with italic style format.
Note: For correct operation, ...
                            A Note: provides supplementary information on a related topic. In the
                            online version of this book, the word Note appears instead of this
                            symbol.
Caution: Incorrect device operation may result if ...
Caution: Device damage may result if ...
                            A Caution: identifies conditions or inappropriate usage of the product
                            that could lead to undesirable results or product damage. In the online
                            version of this book, the word Caution appears instead of this symbol.
Warning: Injury to device users may result if ...
                            A Warning: identifies conditions or inappropriate usage of the product
                            that could lead to conditions that are potentially hazardous for device
                            users. In the online version of this book, the word Warning appears
                            instead of this symbol.
Additional conventions, which apply only to specific chapters, may
The ADSP-TS201 TigerSHARC Processor Hardware Reference describes the
ADSP-TS201 TigerSHARC processor architecture and hardware system
support features. These descriptions provide the information required for
designing and configuring TigerSHARC processor systems.
As shown in Figure 1-1 and Figure 1-2, the processor architecture consists
of two divisions: the processor core (where instructions execute) and the
I/O peripherals (where data is stored and off-chip I/O is processed).
This chapter provides a high level description of the processor core and
peripherals architecture. More detailed descriptions appear in related
chapters. This chapter introduces the following section on processor
architecture:
The ADSP-TS201 processor is a 128-bit, high performance TigerSHARC
processor. The ADSP-TS201 processor sets a new standard of performance for digital signal processors, combining multiple computation units
for floating-point and fixed-point processing as well as very wide word
widths. The ADSP-TS201 processor maintains a ‘system-on-chip’ scalable
computing design philosophy, including 24M bits of on-chip DRAM, six
4K word caches (one per memory block), integrated I/O peripherals, a
host processor interface, DMA controllers, LVDS link ports, and shared
bus connectivity for glueless multiprocessing.
In addition to providing unprecedented performance in DSP applications
in raw MFLOPS and MIPS, the ADSP-TS201 processor boosts performance measures such as MFLOPS/Watt and MFLOPS/square inch in
multiprocessing applications.
As shown in Figure 1-1 and Figure 1-2, the processor has the following
architectural features:
•Dual computation blocks: X and Y – each consisting of a multiplier, ALU, CLU, shifter, and a 32-word register file
•Dual integer ALUs: J and K – each containing a 32-bit IALU and
32-word register file
•Program sequencer – Controls the program flow and contains an
instruction alignment buffer (IAB) and a branch target buffer
(BTB)
•Three 128-bit buses providing high bandwidth connectivity
between internal memory and the rest of the processor core (compute blocks, IALUs, program sequencer, and SOC interface)
•A 128-bit bus providing high bandwidth connectivity between
internal memory and external I/O peripherals (DMA, external
port, and link ports)
•External port interface including the host interface, SDRAM controller, static pipelined interface, four DMA channels, four LVDS
link ports (each with two DMA channels), and multiprocessing
support
•24M bits of internal memory organized as six 4M bit blocks—each
block containing 128K words x 32 bits; each block connects to the
crossbar through its own buffers and a 128K bit, 4-way set associative cache
•Debug features
•JTAG Test Access Port
Figure 1-3 illustrates a typical single processor system. A multiprocessor
system is illustrated in Figure 1-4 and is discussed later in “Multiprocessing”
on page 1-30.
The TigerSHARC processor includes several features that simplify system
development. The features lie in three key areas:
•Support of IEEE floating-point formats
•IEEE Standard 1149.1 Joint Test Action Group (JTAG) serial scan
path and on-chip emulation features
•Architectural features supporting high level languages and operating systems
The processor core is the part of the processor that executes instructions.
The following discussion provides some details on the processor core
architecture. For more information on the processor core, see related
chapters in the ADSP-TS201 TigerSHARC Processor Programming Reference. This section describes:
•“Compute Blocks” on page 1-10
•“Arithmetic Logic Unit (ALU)” on page 1-12
•“Communications Logic Unit (CLU)” on page 1-12
•“Multiply Accumulator (Multiplier)” on page 1-12
•“Bit Wise Barrel Shifter (Shifter)” on page 1-13
•“Integer Arithmetic Logic Unit (IALU)” on page 1-13
•“Program Sequencer” on page 1-15
•“Processor Core Controls” on page 1-17
High performance is facilitated by the ability to execute up to four 32-bit
wide instructions per cycle. The TigerSHARC processor uses a variation
of a Static Superscalar™ architecture to allow the programmer to specify
which instructions are executed in parallel in each cycle. The instructions
do not have to be aligned in memory so that program memory is not
wasted.
The 24M bit internal memory is divided into six 128K word memory
blocks. Each of the four internal address/data bus pairs connects to all of
the six memory blocks via a crossbar interface. The six memory blocks
support up to four accesses every cycle where each memory block can perform a 128-bit access in a cycle. Each block’s cache and prefetch
mechanism improve access performance of internal memory (embedded
DRAM).
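The memory figures quoted in this chapter are mutually consistent, which a few lines of arithmetic confirm. The sketch below is only a check on those figures, assuming K = 1024 and M = 1024K; the variable names are illustrative and are not register or symbol names from the processor.

```python
# Consistency check of the internal memory figures quoted in the text.
# Assumes K = 1024 and M = 1024 * 1024; all names are illustrative only.
K = 1024
M = K * K

BLOCKS = 6                  # internal memory blocks
WORDS_PER_BLOCK = 128 * K   # 32-bit words per block
WORD_BITS = 32

block_bits = WORDS_PER_BLOCK * WORD_BITS   # size of one block in bits
total_bits = BLOCKS * block_bits           # size of all internal memory

cache_bits = 128 * K                       # per-block cache size in bits
cache_words = cache_bits // WORD_BITS      # the same cache in 32-bit words

quad_word_bits = 4 * WORD_BITS             # one quad word access

print(block_bits // M, "Mbit per block")
print(total_bits // M, "Mbit total")
print(cache_words // K, "K words of cache per block")
print(quad_word_bits, "bits per quad word access")
```

The 128K bit cache and the 4K word cache mentioned in different places are therefore the same size expressed in different units.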
The external port cluster bus is 64 bits wide. The high I/O bandwidth
complements the high processing speeds of the core. To facilitate the high
clock rate, the ADSP-TS201 processor uses a pipelined external bus with
programmable pipeline depth for interprocessor communications and for
Synchronous Flow-through SRAM (SSRAM) and SDRAM.
The four LVDS link ports support point-to-point high bandwidth data
transfers. Each link port supports full-duplex communication.
The processor operates with a two cycle arithmetic pipeline. The branch
pipeline is four to ten cycles. A branch target buffer (BTB) is implemented
to reduce branch delay.
During compute intensive operations, one or both integer ALUs compute
or generate addresses for fetching up to two quad operands from two
memory blocks, while the program sequencer simultaneously fetches the
next quad instruction from a third memory block. In parallel, the computation units can operate on previously fetched operands while the
sequencer prepares for a branch.
While the core processor is doing the above, the DMA channels can be
replenishing the internal memories in the background with quad data
from either the external port or the link ports.
The processing core of the ADSP-TS201 processor reaches exceptionally
high DSP performance through using these features:
•Computation pipeline
•Dual computation units
•Execution of up to four instructions per cycle
•Access of up to eight words per cycle from memory
The two identical computation units support floating-point as well as
fixed-point arithmetic. These units (compute blocks) perform up to 6
floating-point or 24 fixed-point operations per cycle.
Each multiplier and ALU unit can execute four 16-bit fixed-point operations per cycle, using Single-Instruction, Multiple-Data (SIMD)
operation. This operation boosts performance of critical imaging and signal processing applications that use fixed-point data.
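As an illustration of what four 16-bit operations per instruction means, the following sketch models lane-wise addition on a 64-bit operand pair. It is a generic model with wraparound on overflow; the function names are hypothetical, and the processor's actual overflow handling is defined in the Programming Reference and is not modeled here.

```python
# Illustration of SIMD lane semantics: four independent 16-bit additions
# carried out on one 64-bit operand pair. This is a generic model with
# wraparound on overflow, not the processor's exact arithmetic.
LANE_BITS = 16
LANES = 4
LANE_MASK = (1 << LANE_BITS) - 1


def pack_lanes(values):
    """Pack four 16-bit values into one 64-bit integer, lane 0 in the LSBs."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack_lanes(word):
    """Split a 64-bit integer back into its four 16-bit lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def simd_add16(a, b):
    """Add lane by lane; a carry out of one lane never reaches the next."""
    sums = [(x + y) & LANE_MASK
            for x, y in zip(unpack_lanes(a), unpack_lanes(b))]
    return pack_lanes(sums)


a = pack_lanes([1, 2, 0xFFFF, 4])
b = pack_lanes([10, 20, 1, 40])
print(unpack_lanes(simd_add16(a, b)))
```

Lane 2 overflows in this example, and the lane above it is unaffected, which is the property that distinguishes a SIMD add from a plain 64-bit add.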
Compute Blocks
The TigerSHARC processor core contains two computation units called
compute blocks. Each compute block contains a register file and four independent computation units—an ALU, a CLU, a multiplier, and a shifter.
For meeting a wide variety of processing needs, the computation units
process data in several fixed- and floating-point formats.
These formats are listed here and shown in Figure 1-5:
•Fixed-point format
These include 64-bit long word, 32-bit normal word, 32-bit complex (16-bit real and 16-bit imaginary), 16-bit short word, and
8-bit byte word. For short word fixed-point arithmetic, quad parallel operations on quad-aligned data allow fast processing of array
data. Byte operations are also supported for octal-aligned data.
•Floating-point format
These include 32-bit normal word and 40-bit extended word.
Floating-point operations are single or extended precision. The
normal word floating-point format is the standard IEEE format,
and the 40-bit extended-precision format occupies a double word
(64 bits) with eight additional least significant bits (LSBs) of
mantissa for greater accuracy.
Each compute block has a general-purpose, multiport, 32-word data register file for transferring data between the computation units and the data
buses and storing intermediate results. All of these registers can be
accessed as single-, double-, or quad-aligned registers. For more information on the register file, see Chapter 2, “Compute Block Registers” in the
ADSP-TS201 TigerSHARC Processor Programming Reference.
Arithmetic Logic Unit (ALU)
The ALU performs arithmetic operations on fixed-point and floating-point data and logical operations on fixed-point data. The source and
destination of most ALU operations is the compute block register file. For
more information on ALU features, see Chapter 3, “ALU” in the
ADSP-TS201 TigerSHARC Processor Programming Reference.
Communications Logic Unit (CLU)
On the ADSP-TS201 processor, there is a special purpose compute unit
called the communications logic unit (CLU). The CLU instructions are
designed to support different algorithms used for communications applications. The algorithms that are supported by the CLU instructions are:
•Viterbi Decoding
•Turbo code Decoding
•Despreading for code division multiple access (CDMA) systems
•Cross correlations used for path search
For more information on CLU features, see Chapter 4, “CLU” in the
ADSP-TS201 TigerSHARC Processor Programming Reference.
Multiply Accumulator (Multiplier)
The multiplier performs fixed-point or floating-point multiplication and
fixed-point multiply/accumulate operations. The multiplier supports several data types in fixed- and floating-point formats. The floating-point
formats are float and float-extended, as in the ALU. The source and destination of most operations is the compute block register file.
The ADSP-TS201 processor’s multiplier supports complex multiply-accumulate operations. Complex numbers are represented by a pair of 16-bit
short words within a 32-bit word. The LSBs of the input operand represent
the real part, and the most significant bits (MSBs) of the input
operand represent the imaginary part. For more information on multiplier
features, see Chapter 5, “Multiplier” in the ADSP-TS201 TigerSHARC Processor Programming Reference.
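The complex operand layout described above can be sketched as a pair of pack and unpack helpers. This is a minimal illustration: the function names are hypothetical, and treating each 16-bit half as a two's-complement integer is an assumption made for the example.

```python
# Pack and unpack a complex value as described in the text: two 16-bit short
# words in one 32-bit word, real part in the LSBs, imaginary part in the MSBs.
# Treating each half as a two's-complement integer is an assumption here.


def to_signed16(x):
    """Interpret the low 16 bits of x as a two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def pack_complex(re, im):
    """Place im in bits 31..16 and re in bits 15..0."""
    return ((im & 0xFFFF) << 16) | (re & 0xFFFF)


def unpack_complex(word):
    """Return (real, imaginary) from a packed 32-bit word."""
    return to_signed16(word), to_signed16(word >> 16)


w = pack_complex(3, -2)
print(hex(w), unpack_complex(w))
```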
Bit Wise Barrel Shifter (Shifter)
The shifter performs logical and arithmetic shifts, bit manipulation, field
deposit, and field extraction. The shifter operates on one 64-bit, one or
two 32-bit, two or four 16-bit, and four or eight 8-bit fixed-point operands. Shifter operations include:
•Shifts and rotates from off-scale left to off-scale right
•Bit manipulation operations, including bit set, clear, toggle and
test
•Bit field manipulation operations, including field extract and
deposit, using register BFOTMP (which is internal to the shifter)
•Bit FIFO operations to support bit streams with fields of varying
length
•Support for ADSP-21000 DSP family compatible
fixed-point/floating-point conversion operations (such as exponent
extract, number of leading ones or zeros)
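For readers unfamiliar with field extract and deposit, the sketch below shows the generic operations on a 32-bit word. The function names and the (position, length) parameters are illustrative only; they are not the operand format of the shifter instructions, which is defined in the Programming Reference.

```python
# Generic bit field extract and deposit on a 32-bit word. The (position,
# length) parameters and function names are illustrative; they are not the
# operand format of the shifter instructions.
WORD_MASK = 0xFFFFFFFF


def field_extract(word, pos, length):
    """Return the `length`-bit field whose LSB is at bit `pos`."""
    return (word >> pos) & ((1 << length) - 1)


def field_deposit(word, value, pos, length):
    """Return `word` with the field at `pos` replaced by `value`."""
    mask = ((1 << length) - 1) << pos
    return ((word & ~mask) | ((value << pos) & mask)) & WORD_MASK


print(hex(field_extract(0xABCD1234, 8, 8)))
print(hex(field_deposit(0xABCD1234, 0xFF, 8, 8)))
```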
For more information on shifter features, see Chapter 6, “Shifter” in the
ADSP-TS201 TigerSHARC Processor Programming Reference.
Integer Arithmetic Logic Unit (IALU)
The IALUs can execute standard standalone ALU operations on IALU
register files. The IALUs also execute register load, store, and transfer
operations, providing memory addresses when data is transferred between
memory and registers. The processor has dual IALUs (the J-IALU and the
K-IALU) that enable simultaneous addresses for two transactions of up to
128 bits in parallel. The IALUs allow compute operations to execute with
maximum efficiency because the computation units can be devoted exclusively to processing data.
Each IALU has a multiport, 32-word register file. All IALU calculations
are executed in a single cycle. The IALUs support pre-modify with no
update and post-modify with update address generation. Circular data
buffers are implemented in hardware. The IALUs support the following
types of instructions:
•Regular IALU instructions
•Move Data instructions
•Load Data instructions
•Load/Store instructions with register update
•Load/Store instructions with immediate update
For indirect addressing (instructions with update), one of the registers in
the register file can be modified by another register in the file or by an
immediate 8- or 32-bit value, either before (pre-modify) or after
(post-modify) the access. For circular buffer addressing, a length value can
be associated with the first four registers to perform automatic modulo
addressing for circular data buffers; the circular buffers can be located at
arbitrary boundaries in memory. Circular buffers allow efficient implementation of delay lines and other data structures, which are commonly
used in digital filters and Fourier transformations. The ADSP-TS201 processor circular buffers automatically handle address pointer wraparounds,
reducing overhead and simplifying implementation.
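The wraparound behavior of a circular buffer with post-modify addressing can be sketched as follows. This is a software model of the address arithmetic only; `base`, `length`, and `modifier` are illustrative parameter names, not register names, and the example buffer deliberately starts at an unaligned address.

```python
# Post-modify addressing with circular (modulo) wraparound. `base`, `length`,
# and `modifier` are illustrative parameter names, not register names.


def post_modify_circular(index, modifier, base, length):
    """Return (address_used, next_index).

    The access uses the current index; the index is then updated and wrapped
    so that it stays inside [base, base + length).
    """
    address_used = index
    next_index = base + (index - base + modifier) % length
    return address_used, next_index


def walk(start, modifier, base, length, count):
    """Collect the addresses used by `count` successive accesses."""
    idx, out = start, []
    for _ in range(count):
        addr, idx = post_modify_circular(idx, modifier, base, length)
        out.append(addr)
    return out


print(walk(start=0x1003, modifier=2, base=0x1003, length=5, count=6))
```

A pre-modify access without update would instead use `index + modifier` as the address and leave the index unchanged.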
The IALUs also support bit reverse addressing, which is useful for the FFT
algorithm. Bit reverse addressing is implemented using a reverse carry
addition that is similar to regular additions, but the carry is taken from the
upper bits and is driven into lower bits.
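Reverse carry addition can be modeled in software by reversing the operand bits, adding normally, and reversing the result. The sketch below uses that equivalence to generate the access order for a small FFT; the function names are hypothetical, and stepping by half the transform size is the usual choice for this purpose rather than something stated in this manual.

```python
# Software model of reverse carry addition and the bit-reversed index
# sequence it produces. Function names are illustrative only.


def reverse_bits(x, bits):
    """Reverse the low `bits` bits of x."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (x & 1)
        x >>= 1
    return r


def reverse_carry_add(a, b, bits):
    """Add with the carry propagating from the MSB toward the LSB."""
    mask = (1 << bits) - 1
    total = (reverse_bits(a, bits) + reverse_bits(b, bits)) & mask
    return reverse_bits(total, bits)


def bit_reversed_sequence(n_points):
    """Index order for an n_points FFT; n_points must be a power of two."""
    bits = n_points.bit_length() - 1
    idx, seq = 0, []
    for _ in range(n_points):
        seq.append(idx)
        idx = reverse_carry_add(idx, n_points >> 1, bits)
    return seq


print(bit_reversed_sequence(8))
```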
The IALU provides flexibility in moving data as single, dual, or quad
words. Every instruction can execute with a throughput of one per cycle.
IALU instructions execute with a single cycle of latency. Normally, there
are no dependency delays between IALU instructions, but if there are,
four cycles of latency can occur.
For more information on IALU features, see Chapter 7, “IALU” in the
ADSP-TS201 TigerSHARC Processor Programming Reference.
Program Sequencer
The program sequencer supplies instruction addresses to memory and,
together with the IALUs, allows compute operations to execute with maximum efficiency. The sequencer supports efficient branching using the
branch target buffer (BTB), which reduces branch delays for conditional
and unconditional instructions. The two responsibilities of the sequencer
are to decode fetched instructions—separating the instruction slots of the
instruction line and sending each instruction to its execution unit (compute blocks, IALUs, or sequencer)—and to control the program flow. The
sequencer’s instructions divide into two types:
•Control flow instructions. These instructions are used to direct program
execution by means of jumps and to execute individual
instructions conditionally.
•Immediate extension instructions. These instructions are used to
extend the numeric fields used in immediate operands for the
sequencer and the IALU.
Control flow instructions divide into two types:
•Direct jumps and calls based on an immediate address operand
specified in the instruction encoding. For example, ‘if <cond> jump 100;’
always jumps to address 100, if the <cond> evaluates as true.
•Indirect jumps based on an address supplied by a register. The
instructions used for specifying conditional execution of a line are a
subcategory of indirect jumps. For example, ‘if <cond> cjmp;’ is a
jump to the address pointed to by the CJMP register.
Note: The control flow instruction must use the first instruction slot in
the instruction line.
Immediate extensions are associated with IALU or sequencer (control
flow) instructions. These instructions are not specified by the programmer, but are implied by the size of the immediate data used in the
instructions. The programmer must place the instruction that requires an
immediate extension in the first instruction slot and leave an empty
instruction slot in the line (use only three slots), so the assembler can place
the immediate extension in the second instruction slot of the instruction
line.
Note: Only one immediate extension may be in a single instruction line.
The ADSP-TS201 processor achieves its fast execution rate by means of a
ten-cycle pipeline.
Two stages of the sequencer’s pipeline actually execute in the computation
units. The computation units perform single cycle operations with a
two-cycle computation pipeline, meaning that results are available for use
two cycles after the operation is begun. Hardware causes a stall if a result is
not available in a given cycle (register dependency check). Up to two computation
instructions per compute block can be issued in each cycle,
instructing the ALU, multiplier, or shifter to perform independent, simultaneous operations.
The branch penalty in a deeply pipelined processor, such as the
ADSP-TS201 processor, can be compensated for by using a branch target
buffer (BTB) and branch prediction. The branch target address is stored
in the BTB. When the address of a jump instruction, which is predicted
by the user to be taken in most cases, is recognized (the tag address), the
corresponding jump address is read from the BTB and is used as the jump
address on the next cycle. Thus, the latency of a jump is reduced from five
to nine wasted cycles to zero wasted cycles. If this address is not stored in
the BTB, the instruction must be fetched from memory.
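The hit and miss behavior described above can be pictured with a toy model in which the BTB is a lookup table from the address of a jump instruction (the tag) to its target. This is a conceptual sketch only: the class name, the unbounded table, and the fixed miss penalty of five cycles (the low end of the range quoted above) are assumptions for illustration and do not describe the actual BTB organization.

```python
# Toy model of a branch target buffer: a lookup table from the address of a
# jump instruction (the tag) to its target address. Capacity, replacement
# policy, and the fixed miss penalty are assumptions made for illustration.


class ToyBTB:
    def __init__(self, miss_penalty=5):
        self.entries = {}
        self.miss_penalty = miss_penalty   # wasted cycles on a miss (assumed)

    def take_jump(self, jump_addr, target_addr):
        """Return the wasted cycles for a taken, predicted-taken jump."""
        if self.entries.get(jump_addr) == target_addr:
            return 0                       # hit: target is used next cycle
        self.entries[jump_addr] = target_addr
        return self.miss_penalty           # miss: target comes from memory


btb = ToyBTB()
# The same loop-closing jump executed four times in a row.
loop_costs = [btb.take_jump(0x200, 0x100) for _ in range(4)]
print(loop_costs)
```

In this model only the first pass through the loop pays the penalty, which is why the BTB matters most for short, frequently executed loops.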
Other instructions also use the BTB to speed up these types of branches.
These instructions are interrupt return, call return, and computed jump
instructions.
For more information on the sequencer, BTB, and immediate extensions,
see Chapter 8, “Program Sequencer” in the ADSP-TS201 TigerSHARC Processor Programming Reference.
Processor Core Controls
There are few modes and controls required for ADSP-TS201
TigerSHARC processor operation because most operational features are
specified as part of instruction syntax. This section describes operational
modes for the processor that are not controlled as part of instruction
syntax. These controls include clock domain controls, operating mode
controls, and boot mode controls.
Clock Domains
The processor uses calculated ratios of the input system clock (SCLK)
to operate as shown in Figure 1-6. The instruction execution rate is
equal to the processor core clock (CCLK). A PLL from SCLK generates
CCLK which is phase locked. The link port clocks (LxCLKOUT pins) are
generated from CCLK by a software programmable divisor, and the
SOC bus clock (SOCCLK) operates at 1/2 CCLK. Memory transfers to
external and link port buffers operate at the SOCCLK rate. The SCLK
also provides the clock input for the external bus interface and defines the
AC specification reference for the external bus signals. The external bus
interface runs at the SCLK frequency.
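[Figure 1-6: block diagram. SCLK drives the external interface and the PLL, whose ratio is set by SCLKRATx, to generate CCLK (instruction rate). CCLK divided by 2 gives SOCCLK (peripheral bus rate). CCLK divided by CR (SPD bits, LCTLx register) gives LxCLKOUT (link output rate).]
Figure 1-6. Clock Domains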
The SCLKRATx pins define the clock multiplication of SCLK to CCLK. For
information on the SCLKRATx pin settings and the maximum SCLK
frequency, see the ADSP-TS201 TigerSHARC Embedded Processor Data Sheet.
The clock domains particularly influence the performance of operations
that move data between domains over the processor’s SOC bus. For more
information, see “SOC Interface” on page 1-24 and “SOC Interface” on
page 3-1.
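The relationship between the clock domains can be worked through numerically. The following Python sketch is a minimal illustration: the function name is an assumption, and the 100 MHz SCLK and the multiplier of 5 are made-up example values. The valid SCLKRATx ratios and frequency limits come from the data sheet.

```python
def clock_domains(sclk_hz, sclk_multiplier):
    """Derive CCLK and SOCCLK from SCLK.

    sclk_multiplier stands for the ratio selected by the SCLKRATx pins;
    this sketch does not check it against the data sheet.
    """
    cclk_hz = sclk_hz * sclk_multiplier  # PLL output, instruction rate
    socclk_hz = cclk_hz / 2              # SOC bus clock runs at 1/2 CCLK
    return {"SCLK": sclk_hz, "CCLK": cclk_hz, "SOCCLK": socclk_hz}


# Illustrative values only, not taken from the data sheet.
clocks = clock_domains(sclk_hz=100_000_000, sclk_multiplier=5)
```

With these example inputs, the external bus interface runs at the SCLK value, instructions execute at the CCLK value, and transfers over the SOC bus run at half the CCLK value.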
Operation Modes
The TigerSHARC processor operates in one of three modes: user,
supervisor, and emulator. Regardless of the operation mode (emulator,
supervisor, or user), all instructions are executed normally. In user mode,
however, register access is limited.
The current operating mode of the TigerSHARC processor affects which
components of the processor are active and can be accessed. The mode
also affects which exceptions are taken and how they are handled. The
mode priorities from lowest priority to highest priority are: User Mode,
Supervisor Mode, and Emulator Mode.
User Mode
The user mode operation is used for algorithms and control code that does
not require the manipulation of system resources. Many system resources
are not accessible in user mode. If the TigerSHARC processor attempts to
access these resources, an exception occurs.
User mode is often used when running code under an operating system.
The operating system kernel runs in supervisor mode, but the user code is
restricted to user mode.
The registers that may be accessed by the core program in user mode are:
•Register groups 0x00 to 0x09 – compute block registers
•Register groups 0x0C to 0x0F – IALU registers
•Sequencer registers – CJMP, loop counter registers LC0 and LC1, and
the static flag register (SFREG)
All other registers cannot be written by the core program in user mode. An
attempt to write to a protected register causes an exception. These registers can still be accessed by another master (DMA, external host, and
others).
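The user-mode write rule above can be summarized as a small lookup. The following Python sketch is illustrative only: the function and constant names are assumptions, and it models only the permission check, not the exception that a protected write raises.

```python
# Register groups the core program may write in user mode.
USER_WRITABLE_GROUPS = set(range(0x00, 0x0A)) | set(range(0x0C, 0x10))

# Sequencer registers that remain writable in user mode.
USER_WRITABLE_SEQUENCER_REGS = {"CJMP", "LC0", "LC1", "SFREG"}


def user_mode_write_allowed(register_group, register_name=None):
    """Return True if a core write in user mode is permitted.

    register_group is the numeric register group; register_name is only
    needed for the individually listed sequencer registers.
    """
    if register_group in USER_WRITABLE_GROUPS:
        return True
    return register_name in USER_WRITABLE_SEQUENCER_REGS
```

A write for which this check returns False corresponds to the case in which the processor takes an exception.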
Supervisor Mode
Most code that is not intended to run under an operating system should
be designed to run in supervisor mode. Supervisor mode allows the program to access all processor resources. The TigerSHARC processor is in
supervisor mode when one of these two conditions is true:
•The NMOD bit in SQCTL is set. For more information, see “Sequencer
Control (SQCTL) Register” on page 2-29.
•An interrupt routine is executed, as indicated by a non-zero PMASK.
For more information, see “PMASK Register” on page 6-6.
Normally when the TigerSHARC processor is reset (via hardware or software), it goes into idle state. It exits the idle state to a running state when
an interrupt is issued. The interrupt puts the TigerSHARC processor into
supervisor mode.
If the NMOD bit in SQCTL is cleared, the processor enters user mode after
leaving an interrupt routine (unless it is nested inside another interrupt
routine).
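The conditions that select the operating mode can be written as a short decision function. This Python sketch is a simplified model under stated assumptions: the function name and arguments are hypothetical, the NMOD bit is passed as a boolean because its bit position is not given here, and emulator mode is represented by a flag.

```python
def operating_mode(nmod_set, pmask, in_emulation=False):
    """Return the operating mode implied by the NMOD bit and PMASK.

    nmod_set:     True when the NMOD bit in SQCTL is set.
    pmask:        current PMASK value; non-zero while an interrupt
                  routine is executing.
    in_emulation: True after an emulation exception (highest priority).
    """
    if in_emulation:
        return "emulator"
    if nmod_set or pmask != 0:
        return "supervisor"
    return "user"
```

For example, with NMOD cleared the model reports supervisor mode while PMASK is non-zero and user mode once the last interrupt routine has returned.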
Emulator Mode
Emulator mode is used when controlling the processor with an emulator
tool via the JTAG port. The TigerSHARC processor enters emulation
mode when an emulation exception is generated. An emulation exception
is generated after any one of these events:
•EMUTRAP instruction
•Watchpoint when programmed to cause emulation trap
•JTAG private instruction
•TMS rising while TEME bit in EMUCTL is set
Emulation exceptions are the highest priority exceptions or interrupts.
While the TigerSHARC processor is in emulator mode, the only source of
instructions is the EMUIR register. The EMUIR register is loaded via the
JTAG Test Access Port (TAP). When entering this mode, the external
JTAG controller (Analog EZ-ICE or other customer hardware) must be
enabled. For more information, see “Sequencer Control (SQCTL) Register”
on page 2-29.
When the emulation features are enabled and an emulation exception is
encountered, the TigerSHARC processor enters emulation mode. When
the TigerSHARC processor is operating in emulator mode, the only way it
can exit emulator mode is by executing a return from interrupt (RTI).
In emulator mode, the debug registers (register group 0x1B) can be
accessed only by move register-to-register or immediate data load instructions. These registers cannot be loaded from or stored to memory directly.
In emulation mode, the program access to the debug registers (register
groups 0x1B and 0x3D) may be executed only using the instruction:
Ureg = Ureg ;
The BTB, cycle count, performance monitors, and trace buffer are inactive
in emulation mode. Control flow instructions (jump, call, RDS, conditional,
and so on) and the IDLE instruction cannot be used in emulator mode. The
only exception is RTI (np), which can only be used without condition.
If the external reset pin is being asserted, and an external emulation exception is generated, the TigerSHARC processor core exits reset internally,
enters emulator mode, and waits to fetch the value of EMUIR, which is
loaded from the JTAG. In this case, only the core processor operates. The
external interfaces (external port, link ports, DMA, and others) are still
held in reset, although their internal registers can be accessed by the
instructions inserted via JTAG and
EMUIR. This feature enables an EZ-ICE
to initialize the TigerSHARC processor internal registers and memory
while it is still in reset and start its run from a known state.
The internal memory of the TigerSHARC processor can be boot loaded
from an 8-bit EPROM, a host processor, or a link port using a boot
mechanism at system power up. The TigerSHARC processor can also be run
(no boot) from a memory location at startup. Selection of the boot source
is controlled by the BMS strap (external) pins. For more information on
boot modes, see “Processor Booting Methods” on page 11-2.
Memory, Registers, and Buses
The on-chip memory consists of six blocks of 4M bits each. Each block is
128K words, thus providing high bandwidth sufficient to support both
computation units, the instruction stream and external I/O, even in very
intensive operations. The ADSP-TS201 processor provides access to program, two data operands, and a system access (over the SOC bus) to
different memory blocks without memory or bus constraints. The memory blocks can store instructions and data interchangeably.
Each memory block is organized as 128K words of 32 bits each. The
accesses are pipelined to meet one clock cycle access time needed by the
core, DMA, or by the external bus. Each access can be up to four words.
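The memory sizes quoted above follow from simple arithmetic, shown here as a short Python check. The constant names are chosen for illustration; the figures themselves come from the text.

```python
BLOCKS = 6                      # on-chip memory blocks
WORDS_PER_BLOCK = 128 * 1024    # 128K words per block
BITS_PER_WORD = 32              # each word is 32 bits
MAX_WORDS_PER_ACCESS = 4        # an access can be up to four words

bits_per_block = WORDS_PER_BLOCK * BITS_PER_WORD          # 4M bits
total_bits = BLOCKS * bits_per_block                      # 24M bits on chip
max_access_bits = MAX_WORDS_PER_ACCESS * BITS_PER_WORD    # 128 bits
```

The 128-bit maximum access width matches the width of the internal buses described in “Internal Buses”.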
The six memory blocks are a resource that must be shared between the
compute blocks, the IALUs, the sequencer, the external port, and the link
ports. In general, if during a particular cycle more than one unit in the
processor attempts to access the same memory, one of the competing units
is granted access, while the other is held off for further arbitration until
the following cycle—see “Bus Arbitration Protocol” on page 8-38. This
type of conflict only has a small impact on performance due to the very
high bandwidth afforded by the internal buses.
An important benefit of large on-chip memory is that by managing the
movement of data on and off chip with DMA, a system designer can realize high levels of determinism in execution time. Predictable and
deterministic execution time is a central requirement in DSP and
real-time systems.
Internal Buses
The processor core has three buses (I-bus, J-bus, and K-bus), each connected to all of the internal memory blocks via a crossbar interface. These
buses are 128 bits wide to allow up to four instructions, or four aligned
data words, to be transferred in each cycle on each bus. On-chip system
elements use the SOC bus and S-bus to access memory. Only one access to
each memory block is allowed in each cycle, so if the application accesses a
different memory segment for each purpose (instruction fetch, load/store
J-IALU and K-IALU instructions and external accesses), all transactions
can be executed with no stalls.
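The one-access-per-block rule can be illustrated with a small conflict check. This Python sketch is a simplified model, not the actual arbitration logic: the function name, the requester labels, and the block numbers are assumptions made for the example.

```python
from collections import Counter


def conflicting_blocks(accesses):
    """Return the memory blocks requested more than once in a single cycle.

    accesses maps a requester name to the memory block index it targets.
    An empty result means every access can proceed with no stall.
    """
    counts = Counter(accesses.values())
    return sorted(block for block, n in counts.items() if n > 1)


# Each requester uses a different block: no stall.
no_stall = conflicting_blocks(
    {"sequencer": 0, "J-IALU": 2, "K-IALU": 4, "SOC": 5}
)

# Both IALUs target block 2: one of them must wait.
stall = conflicting_blocks(
    {"sequencer": 0, "J-IALU": 2, "K-IALU": 2, "SOC": 5}
)
```

Placing instructions and each data stream in separate blocks keeps the result empty, which is the no-stall case described above.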
Internal Registers
Most registers of the ADSP-TS201 processor are classified as universal
registers (Ureg). Instructions are provided for transferring data between
any two Ureg registers, between a Ureg and memory, or for the immediate
load of a Ureg. This includes control registers and status registers, as well
as the data registers in the register files. These transfers occur with the
same timing as internal memory load/store. All registers can be accessed by
register-move instructions or by external master access (another
ADSP-TS201 on the same cluster bus or host), but only the core registers
can be accessed by load/store instructions or load-immediate instructions.
SOC Interface
A separate system on chip (SOC) bus connects the external interfaces
(external port, DMA, link ports, JTAG port, and others) to the memory
system via the SOC interface and S-bus. This bus has a 128-bit wide data
bus and a 32-bit wide address bus. It works at half the processor core clock
rate. All data transferred between internal memory or core, and external
port (cluster bus, link port, and others) passes through this bus.
As shown in Figure 1-7, the SOC bus and other buses operate at the speed
of the clock in their clock domain. Data that moves from one clock
domain to another (for example, a DMA to or from internal memory)
must pass through arbitration and synchronization at the border of each
domain. If multiple masters are requesting a bus, this arbitration can
become an important factor in system performance. For more information, see “Clock Domains” on page 1-17 and “SOC Interface” on
page 3-1.
Timers
The TigerSHARC processor has two general-purpose 64-bit timers—
Timer 0 and Timer 1. The timers are free-running counters that are
loaded with an initial value and give an indication when expiring. The
indication is normally an interrupt, but could also be an external pin
(TMR0E) for Timer 0. For more information, see “Timers” on page 4-1.
There are four input/output flag pins. Each pin can be individually
configured to be an input or output. When they are configured as input flag
pins, they can be used either as a condition or as a register value. (See
“Flag Control (FLAGREG) Register” on page 2-28.) After powerup reset,
FLAG3–0 are inputs where static 5 kΩ pull-up resistors hold them at logic
high value. For more information, see “Flags” on page 5-1.
Interrupts
The ADSP-TS201 processor has four general-purpose external interrupts,
IRQ3-0. The processor also has internally generated interrupts for the two
timers, DMA channels, link ports, arithmetic exceptions, multiprocessor
vector interrupts, and user-defined software interrupts. Interrupts can be
nested through instruction commands. Interrupts have a short latency and
do not abort currently executing instructions. Interrupts vector directly to
a user-supplied address in the interrupt table register file, removing the
overhead of a second branch. For more information, see “Interrupts” on
page 6-1.
Direct Memory Access
The TigerSHARC processor on-chip DMA controller allows zero-overhead data transfers without processor intervention. The TigerSHARC
processor can simultaneously fetch instructions and access two memories
for data without relying on data or instruction caches. The DMA controller operates independently of the processor core, supplying addresses for
internal and external memory access. The DMA channels, therefore, are
not part of the core processor from a programming point of view.
The processor core has four buses, each one connected to one of the internal bus masters (J-IALU, K-IALU, program sequencer, and SOC
interface). Each bus can connect to any memory block. These buses are
128 bits wide to allow up to four instructions, or four aligned data words,
to be transferred in each cycle on each bus. On-chip system elements also
use these buses to access memory. Only one access to each memory block
is allowed in each cycle, so DMA or external port transfers must compete
with core accesses on the same block. Because of the large bandwidth
available from each memory block, not all the memory bandwidth can be
used by the core units, which leaves some memory bandwidth available for
use by the processor’s DMA processes or by the bus interface to serve
other DSPs’ external cluster bus master transfers to the TigerSHARC processor’s memory.
Both code and data can be downloaded to the TigerSHARC processor
using DMA transfers, which can occur between the following.
•TigerSHARC processor internal memory and external memory,
external peripherals or a host processor
•External memory and external peripheral devices
•External memory and link ports or between two link ports
Six DMA channels (four external port DMA channels and two autoDMA
channels) are available on the TigerSHARC processor for data transfers
through the external port. Eight DMA channels are available for link data
transfers (two per link).
Asynchronous off-chip peripherals can control any one of four DMA
channels using DMA request lines (DMAR3–0). Other DMA features
include flyby (for channel 0 only), interrupt generation upon completion
of DMA transfers, and DMA chaining for automatically linked DMA
transfers.
For more information on DMA, see “Direct Memory Access” on page 7-1.
The TigerSHARC processor external port provides an interface to external
memory, to memory-mapped I/O, to host processor, and to additional
TigerSHARC processors. The external port performs external bus arbitration and supplies control signals to shared, global memory, SDRAM, and
I/O devices.
The external port cluster bus can be configured to be either 32 or 64 bits
wide. The high I/O bandwidth complements the high processing speeds of
the core. To facilitate the high clock rate, the TigerSHARC processor uses
a pipelined external bus protocol with programmable pipeline depth for
external memory access and host communication. For more information
on the external port, see “External Port and SDRAM Interface” on
page 8-1.
External Bus and Host Interface
The TigerSHARC processor external port (EP) provides an interface
between the core processor and the 32/64-bit parallel external bus. The
external port contains FIFOs that maintain the throughput of an external
bus that is shared with multiple processors and peripherals—each of which
may operate at speeds other than that of the core.
The most effective way to access external data in the TigerSHARC processor is through the DMA. This runs in the background, allowing the core
to continue processing while new data is read in or processed data is written out. Multiple DMA data streams can occur simultaneously, and the
use of FIFOs helps to maintain throughput in the system.
Burst accesses are provided, which allow a device on the bus to accept the
first address and then automatically increment that address as successive
data words arrive.
The TigerSHARC processor external port provides the processor interface
to off-chip memory and peripherals. The off-chip memory and peripherals
are included in the TigerSHARC processor unified address space. The
separate on-chip buses are multiplexed at the external port to create an
external system bus with a single 32-bit address bus and a single 64-bit
data bus. External memory and devices can be either 32 or 64 bits wide.
The TigerSHARC processor automatically packs external data into either
32-, 64-, or 128-bit word widths, the latter being more efficient for reducing memory access conflicts.
On-chip decoding of high order address lines (to generate memory block
select signals) facilitates addressing of external memory devices. Separate
control lines are also generated for simplified addressing of page mode
DRAM.
The TigerSHARC processor uses the address on the external port bus to
pipeline the data. This allows interfacing to synchronous DRAM and
speeds up interprocessor accesses. An option allows asynchronous operation for slower devices.
External data can be accessed by DMA channels or by the core. For core
accesses, the read latency can be significant—eight or more cycles. The
core provides I/O buffering by stalling if the data is accessed before the
data is loaded in a universal register (Ureg).
Programmable memory wait states permit peripherals with different
access, hold, and disable time requirements.
External shared memory resources are assigned between processors by
using semaphore operations.
The ADSP-TS201 processor, like the ADSP-TS101 processor, is designed
for multiprocessing applications. The primary multiprocessing architecture supported is a cluster of up to eight TigerSHARC processors that
share a common bus, a global memory, and an interface to either a host
processor or to other clusters. In large multiprocessing systems, this cluster
can be considered an element and connected in configurations such as toroid, mesh, tree, crossbar, or others. The user can provide a personal
interconnect method or use the on-chip communication ports.
The ADSP-TS201 processor includes the following multiprocessing
capabilities:
•On-chip bus arbitration for glueless multiprocessing
•Globally accessible internal memory and registers
•Semaphore support
•Powerful, in-circuit multiprocessing emulation
The TigerSHARC processor offers features tailored to multiprocessing
systems:
•The unified address space allows direct interprocessor accesses of
each TigerSHARC processor internal memory and resources.
•Distributed bus arbitration logic is included on chip for glueless
connection of systems containing up to eight TigerSHARC processors and a host processor.
•Bus arbitration rotates, except for host requests that always hold
the highest priority.
•Processor bus lock allows indivisible read-modify-write sequences
for semaphores.
•A vector interrupt capability is provided for interprocessor
commands.
•Broadcast writes allow simultaneous transmissions of data to all
TigerSHARC processors.
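The rotating arbitration with a fixed top priority for the host can be sketched as a round-robin selector. The Python below is a toy model under stated assumptions: the function name, the use of processor IDs 0 through 7 as the rotation order, and the rule that rotation starts after the most recent master are all illustrative choices, not the documented arbitration protocol.

```python
def next_bus_master(requests, last_master, host_request=False, num_processors=8):
    """Pick the next bus master in a toy rotating-priority scheme.

    requests:     set of processor IDs currently requesting the bus.
    last_master:  ID of the processor that most recently held the bus.
    host_request: True when the host requests the bus; the host always wins.
    Returns "host", a processor ID, or None when nobody is requesting.
    """
    if host_request:
        return "host"
    # Search in ID order, starting just after the previous master
    # (assumed rotation order).
    for offset in range(1, num_processors + 1):
        candidate = (last_master + offset) % num_processors
        if candidate in requests:
            return candidate
    return None
```

In this model, a processor that has just held the bus drops to the lowest priority, so every requester is eventually served unless the host keeps requesting.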
Host Interface
Connecting a host processor to a cluster of TigerSHARC processors is
simplified by the memory-mapped nature of the interface bus and the
availability of special host bus request signals.
A host that is able to access a pipelined memory interface can be easily
connected to the parallel TigerSHARC processor bus. All the internal
memory, Uregs, and resources within the TigerSHARC processor, such as
the DMA control registers and the internal memory, are accessible to the
host.
The host interface is through the TigerSHARC processor external address
and data bus, with additional lines being provided for host control. The
protocol is similar to the standard TigerSHARC processor pipelined bus
protocol.
The host becomes bus master of the cluster by asserting the Host Bus
Request (HBR) signal. Host Bus Grant (HBG) is returned by the
TigerSHARC processors when the current master grants the bus by asserting
HBG. The host interface is synchronous, and can be delayed a number of
cycles to allow slow host access. The host can also access external memory
directly.
All DMA channels are accessible to the host interface, allowing code and
data transfers to be accomplished with low software overhead. The host
can directly read and write the internal memory of the TigerSHARC processor and can access the DMA channel setup. Vector interrupt support is
provided for efficient execution of host commands and burst-mode
transfers.
The TigerSHARC processor has four link ports that provide four-bit
receive and four-bit transmit I/O capabilities in multiprocessing systems.
The link ports have the following characteristics.
•Link clock speed is selectable as either x1/4, x1/2, x2/3, or x1 of
internal clock frequency.
•Link port data is packed into 128-bit words for DMA transfer to
on- or off-chip memory.
•Each link port has its own buffer registers.
•Link port transfers are controlled by acknowledge handshaking.
•Link ports support full-duplex transfer and transfers to/from the
external port or other links.
For more information on the link ports, see “Link Ports” on page 9-1.
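The selectable link clock speeds can be tabulated for a given internal clock. This Python sketch is illustrative: the function name is an assumption and the 600 MHz internal clock is a made-up example value, not a specification.

```python
from fractions import Fraction

# Selectable link clock ratios relative to the internal clock frequency.
LINK_CLOCK_RATIOS = (Fraction(1, 4), Fraction(1, 2), Fraction(2, 3), Fraction(1, 1))


def link_clock_options(internal_clock_hz):
    """Return the link clock frequency for each selectable ratio."""
    return {str(ratio): internal_clock_hz * ratio for ratio in LINK_CLOCK_RATIOS}


# Illustrative internal clock frequency only.
options = link_clock_options(600_000_000)
```

Exact fractions are used so that the x2/3 setting does not pick up floating-point rounding error.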
JTAG Port and Debug Interface
The ADSP-TS201 processor supports the IEEE Standard 1149.1 Joint
Test Action Group (JTAG) port for system test. This standard defines a
method for serially scanning the I/O status of each component in a system. The JTAG serial port is also used by the TigerSHARC processor
emulator to gain access to the processor’s on-chip emulation features.
For more information, see “JTAG Port and Test/Debug Interface” on
Refer to the ADSP-TS201 TigerSHARC Processor Programming Reference
for more programming information. Information available in the programming reference includes:
•Detailed chapters on each computation unit (ALU, CLU, multiplier, and shifter), IALU, program sequencer, and embedded
DRAM (internal memory) operation
•Complete reference information on all processor instructions
•Complete reference information on instruction parallelism rules
•All available reference information on other programming topics
This chapter describes the ADSP-TS201 TigerSHARC processor memory
and register map. For information on using registers for computations and
memory for register loads and stores, see the ADSP-TS201 TigerSHARC Processor Programming Reference. For information on using registers for
configuring the TigerSHARC processor’s peripherals, use the applicable
chapters in this book.
The ADSP-TS201 TigerSHARC processor has six internal memory blocks
as shown in the memory map (Figure 2-1). Each memory block consists of
4M bits of memory space, and is configured as 128K words, each 32 bits
in width. There are four separate internal 128-bit data buses, each of which
can access any of the memory blocks. Memory blocks can store instructions
and data interchangeably, with one access per memory block per cycle. If
the programmer ensures that program and data are in different memory
blocks, then data access can occur at the same time as program fetch. The
I/O processor has its own internal bus, so the I/O processor does not compete with the core for use of an internal bus. Thus in one cycle, up to four
128-bit transfers can occur within the core (two data transfers, one program instruction transfer, and one I/O processor interface transfer).
The TigerSHARC processor’s 32-bit address bus provides an address space
of four gigawords. This address space is common to a cluster of
TigerSHARC processors that share the same cluster bus. This chapter
defines the memory map of each TigerSHARC processor in the system
and indicates where the memory space defines the location of each element. The zones in the memory space are made up of the following
regions.
•External memory bank space – the region for standard addressing
of off-chip memory, including SDRAM, bank 0 (MS0), bank 1 (MS1),
and Host (MSH)
•External multiprocessor space – the on-chip memory of all other
TigerSHARC processors connected in a multiprocessor system
•Internal address space – the region for standard internal addressing
The global memory map is shown in Figure 2-1.
In addition to the direct accesses (normal - one 32-bit word, long - two
32-bit words, and quad - four 32-bit words), several other efficient methods are available. These include Broadcast Write, Merged Distribution,
and Broadcast Distribution. Broadcast Write is an external write to other
DSPs in a multiprocessor cluster. Merged and Broadcast Distribution are
internal access methods. For additional information on memory access
methods, see “IALU” in the ADSP-TS201 TigerSHARC Processor Programming Reference.
The host address space is the space defined for the host when it is accessed
as a slave. When referring to this space, the pipelined or asynchronous
protocol is used according to the host bits in the SYSCON register—for
additional information on the SYSCON register see Figure 2-35 on
page 2-74 and Figure 2-36 on page 2-75. The backoff signal is also effective
on this zone; see “Back Off (BOFF) Pin” on page 8-48. The host
address is two gigawords and is divided into fields, as illustrated in
Table 2-1.
Table 2-1. Host Address Space
Bits        Name          Definition
ADDR30–0    Address       Address in host range
ADDR31      Host Select   Determines the type of address.
                          If ADDR31=1, the address is in host address space
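The two fields in Table 2-1 can be separated with a mask and a shift. The following Python sketch is a minimal illustration: the function and constant names are assumptions, and the sample address is arbitrary.

```python
HOST_SELECT_BIT = 31
HOST_ADDRESS_MASK = (1 << 31) - 1  # ADDR30-0


def decode_host_address(addr):
    """Split a 32-bit address into the Table 2-1 fields."""
    addr &= 0xFFFF_FFFF
    return {
        "host_select": (addr >> HOST_SELECT_BIT) & 1,  # ADDR31
        "address": addr & HOST_ADDRESS_MASK,           # address in host range
    }


def is_host_address(addr):
    """True when ADDR31 is 1, that is, the address is in host address space."""
    return decode_host_address(addr)["host_select"] == 1


fields = decode_host_address(0x8000_0010)  # arbitrary example address
```

The 31-bit address field spans 2^31 locations, which agrees with the two-gigaword size of the host address space.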
External Memory Bank Space
This memory space corresponds to off-chip memory and memory-mapped
I/O devices (SDRAM, I/O peripherals, and other standard memory
devices).
Normal external accesses are split into banks. One set of banks is for
SDRAM and is accessed using external memory select pins MSSD0, MSSD1,
MSSD2, and MSSD3; the access is executed in SDRAM protocol. Another set
of banks is identified by external memory select pins; the access protocol
is user-configurable as pipelined or slow device protocol, and the
parameters are defined by the SYSCON register value. See “External Bus
Interface Register Groups” on page 2-72.
The multiprocessing space maps the internal memory space of each
TigerSHARC processor in the cluster into any other TigerSHARC processor. This allows one processor to easily write to and read from other
processors in a multiprocessor system. Broadcast space allows write access
to all TigerSHARC processors in the cluster. Each TigerSHARC processor
in the cluster is identified by its ID. Valid processor ID values are 0
through 7.
The external multiprocessor address is divided into fields, as illustrated in
Table 2-3.
Table 2-3. Multiprocessor Space
Bits         Name      Definition
ADDR25–0     Address   Internal address space, as described in
                       “Internal Address Space” on page 2-7
ADDR30–26    MS/PRID   Memory Select: select an external/multiprocessor memory
The TigerSHARC processor’s own internal space (including Ureg registers)
can be accessed via the multiprocessing space (including broadcast
space) for write transactions only. There are no interlocks. Because this is
performed through the external bus, it should only be used in special cases
where data must pass through the TigerSHARC processor bus interface.
Internal Address Space
The internal address space corresponds to that processor’s own internal
address space, Ureg registers. The internal address space is illustrated in
Figure 2-1 on page 2-2.
Internal space is the space for transactions within the TigerSHARC processor and access to this memory space is not reflected on the cluster bus.
Internal address space is used to access the internal memory blocks, or any
of the Universal registers (Ureg). Universal registers are internal registers
that are mapped to the TigerSHARC processor memory map. Most software accessible registers are Ureg registers.
Access to Ureg registers as memory locations is only available through
multiprocessing space. Internal access to registers cannot be memory
mapped—in load/store instructions, for example, the address cannot point
to a register through the internal space. The DMA also cannot access a
register directly, although there is an exception—link receive DMA channels may write to other link transmit registers.
Frequently, this text refers to universal register groups and access techniques or restrictions that apply to particular groups. The processor’s
memory-mapped, universal registers are grouped according to their relationship to the processor’s architecture. For example, data registers
(XR31–0, YR31–0, or XYR31–0) for the compute blocks are in one register
group, while data registers (J30–0 or K30–0) for the integer ALUs (J-IALU
or K-IALU) are in another register group. Each register group corresponds
to a range of memory addresses because all registers in these groups are
memory mapped, universal registers.
The term universal (Ureg) register indicates several important features
about the register. First, the register is memory-mapped, indicating that
the register can be accessed by other processors in a multiprocessor system
through the register’s address. Second, the register contents can be loaded
from memory, stored to memory, or transferred to or from other Ureg
registers in the processor. Third, the register can be accessed as a single-,
dual-, or quad-register (see footnote 1).
Multiprocessor accesses to another processor’s Ureg registers use the memory addresses that appear in the register group tables as part of the address,
but must also add the offset for a particular processor. (See the memory
map in Figure 2-1 on page 2-2.) The following code examples show
multiprocessor accesses of another processor’s Ureg registers.
/* In the following multiprocessor register access example, one
processor in a multiprocessor system transfers the contents of
its XR1 register to processor P1’s XR0 register */
J0 = P1_OFFSET_LOC ;;
/* P1_OFFSET_LOC is 0x1400 0000; the location of processor P1 */
J1 = XR0_LOC ;;
/* XR0_LOC is 0x001E 0000; the location of register XR0 */
[J0 + J1] = XR1 ;;
/* transfers register XR1 data to register XR0 on proc. P1 */
Footnote 1: Some universal system registers (for example, sequencer
register group 0x1A, DMA control/status register group 0x23, external bus
interface register group 0x24, and JTAG register group 0x3D) are not
accessible as dual- or quad-registers.
For information on memory read and write instruction syntax, see the
“IALU Load, Store, and Transfer (Data Addressing) Operations” section
in Chapter 7, IALU, in the ADSP-TS201 TigerSHARC Processor Programming Reference.
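The address arithmetic behind the multiprocessor access example can be checked on a host machine. This Python sketch uses the two constants quoted in the example comments and the field boundaries from Table 2-3; the function name and the dictionary keys are assumptions made for illustration.

```python
P1_OFFSET_LOC = 0x1400_0000  # location of processor P1 (from the example)
XR0_LOC = 0x001E_0000        # location of register XR0 (from the example)

ADDRESS_MASK = (1 << 26) - 1  # ADDR25-0, internal address field
MS_PRID_MASK = 0x1F           # ADDR30-26, five bits


def split_multiprocessor_address(addr):
    """Split an address into the Table 2-3 fields."""
    return {
        "ms_prid": (addr >> 26) & MS_PRID_MASK,
        "address": addr & ADDRESS_MASK,
    }


# Equivalent of [J0 + J1] with J0 = P1_OFFSET_LOC and J1 = XR0_LOC.
target = P1_OFFSET_LOC + XR0_LOC
fields = split_multiprocessor_address(target)
```

The low 26 bits of the sum reproduce the register's own address, and the processor offset lands entirely in the MS/PRID field, so the two values can be added without overlap.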
Note: A processor may not make a read access of its own Ureg registers
using a multiprocessor access. Such a read access causes an illegal
access exception. A processor may make a multiprocessor write
access (such as a multiprocessor broadcast write) to its own Ureg
registers. For more information on exceptions, see “Handling
Exceptions” on page 6-15.
Load, store, or transfer Ureg register accesses within a processor use the
register’s name, not its memory-mapped address. The register names
appear in the register group tables. A register load access copies data from
the processor’s memory and places the data into the register. A register
store access copies data from the register and places the data into the
processor’s memory. A register transfer (or move) access copies data from one
register to another. The following code examples show single-register load,
store, and transfer accesses.
/* In the following single-register load, store, and transfer
access examples, data is copied to and from single, 32-bit
registers */
XR0 = 0x76543210 ;; /* loads XR0 with immediate 32-bit data */
XR4 = [J31 + 0x43] ;; /* loads XR4 from memory */
[J0 + J4] = YR0 ;; /* stores YR0 to memory */
XR7 = SQSTAT ;; /* transfers SQSTAT contents to XR7 */
For information on memory read and write instruction syntax, see the
“IALU Load, Store, and Transfer (Data Addressing) Operations” section
in Chapter 7, IALU, in the ADSP-TS201 TigerSHARC Processor Programming Reference.
Note: If using legal instruction parallelism, there can be up to two load,
store, or transfer accesses on an instruction line, greatly increasing
performance of routines that require loading values into many registers.
For information on instruction parallelism, see the
“Instruction Parallelism Rules” section in Chapter 1, Introduction,
in the ADSP-TS201 TigerSHARC Processor Programming Reference.
Unlike the single-register accesses in the previous example that use the
register’s name to access a single register, dual- and quad-register accesses
within a processor use a modified form of the register’s name to access
multiple, adjacent registers that are located on dual- or quad-word address
boundaries. The modification to the register’s name indicates a range of
registers, rather than a single register. For example, the dual-register name
XR1:0 indicates registers R1 and R0 in the X compute block, and the
quad-register name J31:28 indicates registers J31, J30, J29, and J28 in the
J-IALU. Alignment to a dual- or quad-word address boundary refers to the
register number (and address) of the lowest register in the group;
dual-register boundaries end in multiples of two (for example, R1:0, R3:2, R27:26);
quad-register boundaries end in multiples of four (for example, R3:0,
R7:4, R31:28). The following code examples show dual- and quad-register
load, store, and transfer accesses.
/* In the following dual- and quad-register load, store, and
transfer access examples data is copied to and from dual (64-bit)
and quad (128-bit) registers */
YR5:4 = L [J31 + 0x1ff0] ;; /* loads YR5:4 from memory */
XR3:0 = Q [K4 += K5] ;; /* loads XR3:0 from memory */
L [J0 + J2] = YR31:30 ;; /* stores YR31:30 to memory */
Q [K31 + K28] = YR7:4 ;; /* stores YR7:4 to memory */
For information on single-, dual-, and quad-register name instruction syntax, see the “Register File Registers” section in Chapter 2, Compute Block
Registers, in the ADSP-TS201 TigerSHARC Processor Programming
Reference.
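The alignment rule for dual- and quad-register names can be checked mechanically. The following Python sketch is a model for illustration only, not processor code; the helper name is_aligned_group is hypothetical, and the sketch assumes register numbers 0 through 31 written as "high:low".

```python
import re

# Hypothetical helper for illustration; not part of any Analog Devices tool.
_RANGE = re.compile(r"^(\d+):(\d+)$")

def is_aligned_group(reg_range):
    """Return True if 'high:low' names an aligned dual or quad register group.

    The group must span exactly two or four adjacent registers, and the
    lowest register number must be a multiple of the group size.
    """
    match = _RANGE.match(reg_range)
    if not match:
        return False
    high, low = int(match.group(1)), int(match.group(2))
    size = high - low + 1
    if size not in (2, 4) or not 0 <= low <= high <= 31:
        return False
    return low % size == 0
```

Under these assumptions, "27:26" and "31:28" are accepted, while "2:1" and "5:2" are rejected because their lowest register does not sit on a dual or quad boundary.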
Normally, a register load from memory or a register store to memory
access occurs between one register and a memory word of the same size—a
single-register load/store of a normal (32-bit) word, a dual-register
load/store of a long (64-bit) word, or a quad-register load/store of a quad
(128-bit) word. When one register is loaded/stored using a memory word
of the same size, the operation is called a normal read or write access. To
take advantage of single-instruction, multiple-data (SIMD) processing,
the processor’s bus architecture also supports types of accesses in which
the contents of one memory word can be loaded into two registers (broadcast read access) and supports types of accesses in which multiple registers
can be loaded/stored with different data using one memory word (merged read or write access). For information on access types and data placement,
see the “Normal, Merged, and Broadcast Memory Accesses” section in
Chapter 7, IALU, in the ADSP-TS201 TigerSHARC Processor Programming Reference.
Register groups and register quad relationships can have an important
influence on instruction parallelism. For information on instruction parallelism, see the “Instruction Parallelism Rules” section in
Chapter 1, Introduction, in the ADSP-TS201 TigerSHARC Processor Programming Reference.
Note that broadcast read accesses only occur between memory and
registers within a processor, while multiprocessor broadcast write
accesses only occur simultaneously across all processors in a multiprocessor system.
Universal (Ureg) Register Space
The register space is composed of 64 register groups with up to 32 registers in each group. Register groups are defined in the range 0x3F–0
(63–0), where groups 0x1F–0 are accessible by all transfer instructions
(load immediate, move register, load and store), and groups 0x3F–0x20
are accessible only by move register instructions and direct accesses of
other masters.
The register groups are:
•Compute block register file – groups 0x00–0x09
•IALU registers – groups 0x0C–0x0F
•Debug feature registers – groups 0x0A, 0x1B
•Program sequencer – group 0x1A
•Memory controller – groups 0x1E–0x1F
•DMA registers – groups 0x20 – 0x23
•External port control/status registers – group 0x24
•Link port registers – groups 0x25 – 0x27
•Interrupt controller – groups 0x38, 0x39, and 0x3A
•JTAG registers – group 0x3D
•AutoDMA registers – group 0x3F
•Others – Reserved. These registers must not be accessed by applications
since they could cause unexpected behavior by the
TigerSHARC processor.
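The group list and the access rule above can be captured in a small lookup. The Python sketch below is an illustrative model only; the names REGISTER_GROUPS and describe_group are hypothetical, and the group ranges are copied from the list above.

```python
# Register group ranges as listed in the text (inclusive bounds).
REGISTER_GROUPS = [
    (0x00, 0x09, "Compute block register file"),
    (0x0A, 0x0A, "Debug feature registers"),
    (0x0C, 0x0F, "IALU registers"),
    (0x1A, 0x1A, "Program sequencer"),
    (0x1B, 0x1B, "Debug feature registers"),
    (0x1E, 0x1F, "Memory controller"),
    (0x20, 0x23, "DMA registers"),
    (0x24, 0x24, "External port control/status registers"),
    (0x25, 0x27, "Link port registers"),
    (0x38, 0x3A, "Interrupt controller"),
    (0x3D, 0x3D, "JTAG registers"),
    (0x3F, 0x3F, "AutoDMA registers"),
]

def describe_group(group):
    """Return (group name, allowed access) for a register group number."""
    if not 0x00 <= group <= 0x3F:
        raise ValueError("register group must be in the range 0x00-0x3F")
    # Groups 0x00-0x1F accept all transfer instructions; groups 0x20-0x3F
    # accept only move register instructions and accesses by other masters.
    if group <= 0x1F:
        access = "all transfer instructions"
    else:
        access = "move register instructions and other masters only"
    for low, high, name in REGISTER_GROUPS:
        if low <= group <= high:
            return name, access
    return "Reserved", access
```

For example, describe_group(0x1A) reports the program sequencer group with access by all transfer instructions, and any group number not in the list is reported as reserved.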
There is a direct relationship between a Ureg register’s memory-mapped
address and the register group in which the register resides. (See
Figure 2-2.) Because register groups have different access restrictions, it is
always important to know the group to which a register belongs. When in
doubt, use the information in Figure 2-2 to determine the register group
number.
Figure 2-2. Register Group Number Versus Register Address (fields: Register Group Number bits, Register Address bits)
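As a rough illustration of this relationship, the Python sketch below splits a register address into a group number and a register index. The bit positions are an assumption, not taken from Figure 2-2: they are inferred from the sequencer register addresses in Table 2-13, where group 0x1A starts at 0x1E 0340, and from the statement that there are 64 groups of up to 32 registers. The function name split_ureg_address is hypothetical.

```python
def split_ureg_address(address):
    """Split a Ureg memory-mapped address into (group, register index).

    Assumption (inferred, not confirmed by the figure): address bits 10-5
    hold the register group number and bits 4-0 hold the register index
    within the group.
    """
    group = (address >> 5) & 0x3F   # 64 possible groups
    index = address & 0x1F          # up to 32 registers per group
    return group, index
```

Under this assumption, address 0x1E 0340 maps to group 0x1A, register 0, which matches the first entry of the sequencer register group.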
It is also important to note
whether the register resides in the processor core (internal Ureg registers)
or in the SOC interface (SOC Ureg registers). (See Figure 2-1 on
page 2-2.) Because register groups in the SOC interface have different
access restrictions than core registers, it is important to know where
the register resides.
Some of the registers do not use all 32 bits. The unused bits are
reserved. When writing to a register with reserved bits, these bits
must be written as zero. When reading a register with reserved
bits, the reserved bits may be of any value.
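One way to follow the reserved-bit rule in software is to mask every value before it is written and after it is read. The Python sketch below is a model for illustration only; the function names and the example mask are hypothetical, and the mask of defined bits for a real register must come from that register's bit description.

```python
WORD_MASK = 0xFFFFFFFF  # registers are 32 bits wide

def value_to_write(new_value, defined_mask):
    """Force all reserved bits to zero before a register write."""
    return new_value & defined_mask & WORD_MASK

def defined_bits(read_value, defined_mask):
    """Discard reserved bits after a register read; they may hold any value."""
    return read_value & defined_mask & WORD_MASK

# Hypothetical register with only bits 5-0 defined.
EXAMPLE_MASK = 0x0000003F
```

With the hypothetical EXAMPLE_MASK, writing 0xFFFF FFFF is reduced to 0x0000 003F, and a read value is compared only on its six defined bits.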
Each compute block register file is a 32-word register. There are two such
register files, one in each of the compute blocks (X and Y). The register
file is aliased in several groups, some of which are applicable for data transfer instructions only. Some groups point to compute block X and some to
block Y; those that point to both compute blocks in parallel actually point
to the same register. The X and Y registers are memory mapped Universal
(Ureg) registers.
The Compute Block register types of accesses are listed with their memory
mapped addresses in Table 2-4, Table 2-5, Table 2-6, Table 2-7, and
Table 2-8. Note that Ureg register file groups for alternate access are
only for load/store instructions (DAB, circular buffer). For more information about alternate access, see the ADSP-TS201 TigerSHARC Processor Programming Reference.
IVEN – Exception enable on invalid floating-point operation
OEN – Exception enable on overflow status
UEN – Exception enable on underflow status
Reserved
Figure 2-3. XSTAT/YSTAT (Upper) Register Bit Descriptions
Compute Block Status (XSTAT/YSTAT) Registers
The XSTAT and YSTAT registers are 32-bit compute block status registers
that record the state of the compute block status flags. Every flag is
updated when an instruction belonging to its computation unit is completed. The X/YSTAT registers (reset value = 0x0000 0000) appear in
Figure 2-3 and Figure 2-4. For more information on using the X/YSTAT
registers, see the “Compute Block Registers” chapter in the ADSP-TS201 TigerSHARC Processor Programming Reference.
ALU Registers
The Parallel Results registers (PR0 and PR1) are two 32-bit registers used
for sideways sum instructions. For more information regarding the PRx
registers, see “ALU” in the ADSP-TS201 TigerSHARC Processor Programming Reference.
Figure 2-4. XSTAT/YSTAT (Lower) Register Bit Descriptions
Multiplier Registers
The Multiplier Results registers (MR3–0 and MR4) are used as accumulators
for the different types of fixed-point Multiply-Accumulate instructions.
For more information regarding the MRx registers, see “Multiplier” in the
ADSP-TS201 TigerSHARC Processor Programming Reference. These registers are accessed by different multiplier instructions.
Shifter Registers
The Bit FIFO Overflow register (BFOTMP) is a 64-bit register used in the
PUTBITS instruction. For more information regarding the BFOTMP register,
see the PUTBITS instruction in the ADSP-TS201 TigerSHARC Processor
Programming Reference.
(THR3–0) registers, and/or the Communication Control (CMCTL) register.
Each register is 32 bits wide. For more information regarding these registers, see the “CLU” chapter in the ADSP-TS201 TigerSHARC Processor Programming Reference.
IALU Register Groups
Each IALU has two Ureg groups. The first group is the general-purpose
register file, which includes 32 normal-word registers. The second
group is the circular buffer register file which is used to specify parameters
for circular buffering. The IALU register groups are listed in Table 2-9,
Table 2-10, Table 2-11, and Table 2-12.
The circular buffer register files contain these registers:
•Registers 3–0: circular buffer base registers JB3–0 and KB3–0
•Registers 7–4: circular buffer length registers JL3–0 and KL3–0
The JSTAT (for the J-IALU) and KSTAT (for the K-IALU) status registers
record the state of the carry flags and are updated as a result of various
IALU instructions. When being written to, the JSTAT register can be
referred to as J31. When used as an operand in an IALU arithmetic, logical, or function operation, J31 is treated as zero. The J31/JSTAT
(K31/KSTAT) registers (reset value = 0x0000 0000) appear in Figure 2-5.
For information on using the J31/JSTAT (K31/KSTAT) registers, see the
“IALU” chapter of the ADSP-TS201 TigerSHARC Processor Programming Reference.
This 32-bit register group is dedicated to sequencer registers, interrupt
control, and status registers. See Table 2-13. All sequencer registers are
accessible by single-word access only. For more information regarding the
sequencer registers, see “Program Sequencer” in the ADSP-TS201 TigerSHARC Processor Programming Reference.
Table 2-13. Sequencer Register Group
Name       Description                          Address              Default
CJMP       Computed jump address ¹              0x1E 0340            Undefined
Reserved   N/A                                  0x1E 0341            N/A
RETI       Return from interrupt address        0x1E 0342            Undefined
RETIB      RETI alias, for interrupt nesting ²  0x1E 0343            Undefined
RETS       Return from subroutine address ³     0x1E 0344            Undefined
DBGE       Return from emulation address ³      0x1E 0345            Undefined
Reserved   N/A                                  0x1E 0346–0x1E 0347  N/A
LC0        Loop counter #0                      0x1E 0348            Undefined
LC1        Loop counter #1                      0x1E 0349            Undefined
Reserved   N/A                                  0x1E 034A–0x1E 034F  N/A
IVSW       Software exception                   0x1E 0350            Undefined
Reserved   N/A                                  0x1E 0351–0x1E 0353  N/A
FLAGREG    Flag control register                0x1E 0354            0x0000 0000
FLAGREGST  Flag register set                    0x1E 0355            Undefined
FLAGREGCL  Flag register clear                  0x1E 0356            Undefined
Reserved   N/A                                  0x1E 0357            N/A
SQCTL      Sequencer control register           0x1E 0358            0x0000 0004
SQCTLST    Sequencer control register set bits  0x1E 0359            Undefined
The sequencer is controlled and configured by writing to the SQCTL register.
The SQCTL register (reset value = 0x0000 0004) appears in Figure 2-7.
For information on using the SQCTL register, see the “Program Sequencer”
chapter of the ADSP-TS201 TigerSHARC Processor Programming Reference.
Static flags are used as static copies of conditions. When the programmer
wishes to keep a condition value after another instruction changes its
value, it can be copied into the SFREG register and used later as a condition. All
static condition flags are grouped into the SFREG register. The SFREG register (reset value = 0x0000 0000) appears in Figure 2-8. For information on
using the SFREG register, see the “Program Sequencer” chapter of the
ADSP-TS201 TigerSHARC Processor Programming Reference.
GSCF0 – IALU/global static flag 0
GSCF1 – IALU/global static flag 1
XSCF0 – X compute block static flag 0
XSCF1 – X compute block static flag 1
YSCF0 – Y compute block static flag 0
YSCF1 – Y compute block static flag 1
Reserved
Figure 2-8. SFREG (Lower) Register Bit Descriptions
The SQCTLST register is an alias used to write to SQCTL. When writing to this
address, the data written into SQCTL is the OR of the old register value and
the new data written into SQCTLST. A ‘1’ in any bit of the written data sets
the corresponding bit in SQCTL, while a ‘0’ in written data does not change
the bit value.
Sequencer Control Clear Bits (SQCTLCL) Register
The SQCTLCL register is an alias used to write to SQCTL. When writing to this
address, the data written into SQCTL is the AND of the old register value
and the new data written into SQCTLCL. This way a ‘0’ in any bit of the
written data clears the corresponding bit in SQCTL, while a ‘1’ in written
data does not change the bit value.
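The set and clear aliases can be modeled as simple bitwise operations on the current SQCTL value. The Python sketch below is an illustrative model of that behavior, not processor code; the function names are hypothetical, and the bit patterns in the usage note are arbitrary examples rather than documented SQCTL fields.

```python
WORD_MASK = 0xFFFFFFFF  # SQCTL is a 32-bit register

def write_sqctlst(sqctl, data):
    """Model a write to SQCTLST: new SQCTL is old value OR written data."""
    return (sqctl | data) & WORD_MASK

def write_sqctlcl(sqctl, data):
    """Model a write to SQCTLCL: new SQCTL is old value AND written data."""
    return sqctl & data & WORD_MASK
```

Starting from the reset value 0x0000 0004, writing 0x0000 0001 to SQCTLST gives 0x0000 0005, and then writing 0xFFFF FFFB to SQCTLCL clears bit 2 only, giving 0x0000 0001. Neither write needs a read-modify-write sequence on SQCTL itself.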
Sequencer Status (SQSTAT) Register
This is a read-only register that holds information about the current status
of the sequencer. The SQSTAT register (reset value = 0x0000 FF04) appears
in Figure 2-9 and Figure 2-10. For information on using the SQSTAT register, see the “Program Sequencer” chapter of the ADSP-TS201 TigerSHARC Processor Programming Reference.
00=BTB disabled
01=BTB enabled
10=reserved
11=BTB enabled, locking on
Reserved
EMUL – Emulation interrupt, status
SW – Exception (software interrupt), status
GIE – Global Interrupt Enable, nest
FLGx – Indicates input value of flag pins if pin is enabled as an input by its FLAGx_EN bit; bit 16 is FLAG0, bit 17 is FLAG1, bit 18 is FLAG2, bit 19 is FLAG3
Figure 2-9. SQSTAT (Upper) Register Bit Descriptions
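The FLGx bit positions listed for Figure 2-9 can be decoded from a value read from SQSTAT. The Python sketch below is an illustrative model only; the function name flag_pin_inputs is hypothetical, and it assumes the caller has already read the 32-bit SQSTAT value.

```python
def flag_pin_inputs(sqstat):
    """Extract the FLGx bits (bits 16-19) from a 32-bit SQSTAT value.

    Each result is meaningful only if the corresponding pin is enabled
    as an input by its FLAGx_EN bit.
    """
    return {
        "FLAG%d" % n: bool((sqstat >> (16 + n)) & 1)
        for n in range(4)
    }
```

For the reset value 0x0000 FF04, all four entries are False, since bits 16 through 19 are clear.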