Datasheet ADSP-TS101 Datasheet (ANALOG DEVICES)

ADSP-TS101 TigerSHARC® Processor
Programming Reference
Revision 1.1, February 2005
Part Number
82-001997-01
Copyright Information
© 2005 Analog Devices, Inc., ALL RIGHTS RESERVED. This document may not be reproduced in any form without prior, express written consent from Analog Devices, Inc.
Printed in the USA.
Disclaimer
Analog Devices, Inc. reserves the right to change this product without prior notice. Information furnished by Analog Devices is believed to be accurate and reliable. However, no responsibility is assumed by Analog Devices for its use; nor for any infringement of patents or other rights of third parties which may result from its use. No license is granted by implication or otherwise under the patent rights of Analog Devices, Inc.
Trademark and Service Mark Notice
The Analog Devices logo, Blackfin, EZ-ICE, EZ-KIT Lite, SHARC, TigerSHARC, the TigerSHARC logo, and VisualDSP++ are registered trademarks of Analog Devices, Inc.
SuperScalar is a trademark of Analog Devices, Inc.
All other brand and product names are trademarks or service marks of their respective owners.

CONTENTS

PREFACE

Purpose of This Manual ................................................................ xvii
Intended Audience ........................................................................ xvii
Manual Contents ......................................................................... xviii
What’s New in This Manual ........................................................... xix
Technical or Customer Support ....................................................... xx
Supported Processors ....................................................................... xx
Product Information ...................................................................... xxi
MyAnalog.com ........................................................................ xxii
Processor Product Information ................................................. xxii
Related Documents ............................................................... xxiii
Online Technical Documentation ........................................... xxiv
Accessing Documentation From VisualDSP++ ..................... xxv
Accessing Documentation From Windows ........................... xxv
Accessing Documentation From the Web ............................ xxvi
Printed Manuals ..................................................................... xxvi
VisualDSP++ Documentation Set ....................................... xxvi
Hardware Tools Manuals .................................................... xxvi
ADSP-TS101 TigerSHARC Processor Programming Reference iii
Processor Manuals ............................................................. xxvi
Data Sheets ...................................................................... xxvii
Conventions .............................................................................. xxviii

INTRODUCTION

DSP Architecture ......................................................................... 1-6
Compute Blocks ..................................................................... 1-8
Arithmetic Logic Unit (ALU) .............................................. 1-9
Multiply Accumulator (Multiplier) .................................... 1-11
Bit Wise Barrel Shifter (Shifter) ........................................ 1-11
Integer Arithmetic Logic Unit (IALU) ................................... 1-12
Program Sequencer ............................................................... 1-13
Quad Instruction Execution .............................................. 1-15
Relative Addresses for Relocation ...................................... 1-16
Nested Call and Interrupt ................................................. 1-16
Context Switching ............................................................ 1-16
Internal Memory and Other Internal Peripherals .................... 1-16
Internal Buses ................................................................... 1-17
Internal Transfer ............................................................... 1-18
Data Accesses ................................................................... 1-18
Quad Data Access ............................................................. 1-18
Booting ................................................................................ 1-19
Scalability and Multiprocessing ............................................. 1-19
Emulation and Test Support .................................................. 1-20
Instruction Line Syntax and Structure .......................................... 1-20
Instruction Notation Conventions ......................................... 1-22
Unconditional Execution Support .......................................... 1-23
Conditional Execution Support .............................................. 1-24
Instruction Parallelism Rules ....................................................... 1-24
General Restriction ................................................................ 1-36
Compute Block Instruction Restrictions ................................. 1-37
IALU Instruction Restrictions ................................................ 1-39
Sequencer Instruction Restrictions ......................................... 1-45

COMPUTE BLOCK REGISTERS

Register File Registers .................................................................... 2-5
Compute Block Selection ......................................................... 2-7
Register Width Selection ......................................................... 2-8
Operand Size and Format Selection ........................................ 2-10
Register File Syntax Summary ............................................... 2-13
Numeric Formats ........................................................................ 2-16
IEEE Single-Precision Floating-Point Data Format ................. 2-16
Extended Precision Floating-Point Format .............................. 2-19
Fixed-Point Formats .............................................................. 2-19
ALU
ALU Operations ........................................................................... 3-5
ALU Instruction Options ........................................................ 3-7
Signed/Unsigned Option .................................................... 3-8
Saturation Option .............................................................. 3-8
Extension (ABS) Option ..................................................... 3-9
Truncation Option ............................................................. 3-9
Return Zero (MAX/MIN) Option .................................... 3-10
Fractional/Integer Option ................................................. 3-11
ALU Execution Status ........................................................... 3-11
AN — ALU Negative ....................................................... 3-13
AV — ALU Overflow ....................................................... 3-13
AI — ALU Invalid ............................................................ 3-14
AC — ALU Carry ............................................................ 3-14
ALU Execution Conditions ................................................... 3-14
ALU Static Flags ................................................................... 3-15
ALU Examples ........................................................................... 3-16
Example Parallel Addition of Byte Data ................................. 3-18
Example Sideways Addition of Byte Data ............................... 3-19
Example Parallel Result (PR) Register Usage .......................... 3-19
CLU Examples ........................................................................... 3-21
CLU Data Types and Sizes .................................................... 3-22
TMAX Function ................................................................... 3-23
Trellis Function ..................................................................... 3-24
Despread Function ................................................................ 3-26
CLU Execution Status ........................................................... 3-27
ALU Instruction Summary .......................................................... 3-28

MULTIPLIER

Multiplier Operations ................................................................... 4-4
Multiplier Instruction Options ................................................ 4-8
Signed/Unsigned Option ................................................... 4-10
Fractional/Integer Option ................................................. 4-10
Saturation Option ............................................................. 4-11
Truncation Option ............................................................ 4-12
Clear/Round Option ......................................................... 4-14
Complex Conjugate Option .............................................. 4-16
Multiplier Result Overflow (MR4) Register ............................ 4-17
Multiplier Execution Status ................................................... 4-18
Multiplier Execution Conditions ............................................ 4-20
Multiplier Static Flags ............................................................ 4-21
Multiplier Examples .................................................................... 4-21
Multiplier Instruction Summary .................................................. 4-23

SHIFTER

Shifter Operations ......................................................................... 5-3
Logical Shift Operation ........................................................... 5-5
Arithmetic Shift Operation ...................................................... 5-6
Bit Manipulation Operations ................................................... 5-7
Bit Field Manipulation Operations .......................................... 5-8
Bit Field Conversion Operations ........................................... 5-11
Bit Stream Manipulation Operations ..................................... 5-11
Shifter Instruction Options ................................................... 5-14
Sign Extended Option ...................................................... 5-15
Zero Filled Option ........................................................... 5-15
Shifter Execution Status ........................................................ 5-15
Shifter Execution Conditions ................................................ 5-16
Shifter Static Flags ................................................................ 5-17
Shifter Examples ......................................................................... 5-17
Shifter Instruction Summary ....................................................... 5-19

IALU

IALU Operations .......................................................................... 6-5
IALU Arithmetic, Logical, and Function Operations ................ 6-5
IALU Instruction Options .................................................. 6-6
Integer Data ................................................................... 6-7
Signed/Unsigned Option ................................................ 6-8
Circular Buffer Option ................................................... 6-8
Bit Reverse Option ......................................................... 6-9
Computed Jump Option ................................................. 6-9
IALU Execution Status ..................................................... 6-10
JN/KN–IALU Negative ................................................ 6-11
JV/KV–IALU Overflow ................................................ 6-11
JC/KC–IALU Carry ...................................................... 6-11
IALU Execution Conditions .............................................. 6-12
IALU Static Flags .............................................................. 6-13
IALU Data Addressing and Transfer Operations ..................... 6-13
Direct and Indirect Addressing .......................................... 6-14
Normal, Merged, and Broadcast Memory Accesses ............. 6-16
Data Alignment Buffer (DAB) Accesses ............................. 6-23
Circular Buffer Addressing ................................................ 6-27
Bit Reverse Addressing ...................................................... 6-31
Universal Register Transfer Operations .............................. 6-35
Immediate Extension Operations ....................................... 6-36
IALU Examples ........................................................................... 6-37
IALU Instruction Summary ......................................................... 6-39

PROGRAM SEQUENCER

Sequencer Operations ................................................................... 7-7
Conditional Execution ........................................................... 7-12
Branching Execution ............................................................. 7-16
Looping Execution ................................................................ 7-19
Interrupting Execution .......................................................... 7-20
Instruction Pipeline Operations ................................................... 7-26
Instruction Alignment Buffer (IAB) ....................................... 7-31
Branch Target Buffer (BTB) ................................................... 7-34
Conditional Branch Effects on Pipeline .................................. 7-44
Dependency and Resource Effects on Pipeline ........................ 7-55
Stall From Compute Block Dependency ............................ 7-56
Stall from Bus Conflict ..................................................... 7-59
Stall From Compute Block Load Dependency ................... 7-62
Stall From IALU Load Dependency .................................. 7-63
Stall From Load (From External Memory) Dependency ..... 7-64
Stall From Conditional IALU Load Dependency ............... 7-64
Interrupt Effects on Pipeline ................................................. 7-66
Interrupt During Conditional Instruction ......................... 7-68
Interrupt During Interrupt Disable Instruction ................. 7-70
Exception Effects on Pipeline ................................................ 7-72
Sequencer Examples .................................................................... 7-72
Sequencer Instruction Summary .................................................. 7-76

INSTRUCTION SET

ALU Instructions .......................................................................... 8-2
Add/Subtract .......................................................................... 8-3
Add/Subtract With Carry/Borrow ............................................ 8-6
Average ................................................................................... 8-8
Absolute Value/Absolute Value of Sum or Difference .............. 8-10
Negate .................................................................................. 8-13
Maximum/Minimum ............................................................ 8-14
Viterbi Maximum/Minimum ................................................. 8-17
Increment/Decrement ........................................................... 8-20
Compare ............................................................................... 8-22
Clip ...................................................................................... 8-24
Sum ...................................................................................... 8-26
Ones Counting ...................................................................... 8-28
Parallel Result Register ........................................................... 8-29
Bit FIFO Increment .............................................................. 8-30
Parallel Absolute Value of Difference ...................................... 8-32
Sideways Sum ........................................................................ 8-34
Add/Subtract (Dual Operation) ............................................. 8-36
Pass ....................................................................................... 8-37
Logical AND/AND NOT/OR/XOR/NOT ............................ 8-38
Expand ................................................................................. 8-40
Compact ............................................................................... 8-45
Merge ................................................................................... 8-49
Add/Subtract (Floating-Point) ................................................ 8-51
Average (Floating-Point) ........................................................ 8-53
Maximum/Minimum (Floating-Point) ................................... 8-55
Absolute Value (Floating-Point) ............................................. 8-57
Negate (Floating-Point) ......................................................... 8-60
Compare (Floating-Point) ...................................................... 8-62
Floating- to Fixed-Point Conversion ...................................... 8-64
Fixed- to Floating-Point Conversion ...................................... 8-66
Floating-Point Normal to Extended Word Conversion ............ 8-68
Floating-Point Extended to Normal Word Conversion ............ 8-70
Clip (Floating-Point) ............................................................. 8-72
Copysign (Floating-Point) ..................................................... 8-74
Scale (Floating-Point) ............................................................ 8-76
Pass (Floating-Point) ............................................................. 8-78
Reciprocal (Floating-Point) ................................................... 8-80
Reciprocal Square Root (Floating-Point) ................................ 8-82
Mantissa (Floating-Point) ...................................................... 8-85
Logarithm (Floating-Point) ................................................... 8-87
Add/Subtract (Dual Operation, Floating-Point) ..................... 8-89
CLU Instructions ....................................................................... 8-91
Trellis Maximum (CLU) ........................................................ 8-92
Maximum (CLU) .................................................................. 8-99
Trellis Registers (CLU) ........................................................ 8-104
Despread (CLU) ................................................................. 8-106
Add/Compare/Select (CLU) ................................................ 8-113
Permute (Byte Word, CLU) ................................................. 8-117
Permute (Short Word, CLU) ............................................... 8-119
Multiplier Instructions .............................................................. 8-121
Multiply (Normal Word) ..................................................... 8-122
Multiply-Accumulate (Normal Word) .................................. 8-125
Multiply-Accumulate/Move (Dual Operation, Normal Word) ............ 8-130
Multiply (Quad-Short Word) .............................................. 8-138
Multiply-Accumulate (Quad-Short Word) ........................... 8-141
Multiply-Accumulate (Dual Operation, Quad-Short Word) ............. 8-146
Complex Multiply-Accumulate (Short Word) ....................... 8-152
Complex Multiply-Accumulate/Move (Dual Operation, Short Word) ..... 8-156
Multiply (Floating-Point, Normal/Extended Word) .............. 8-163
Multiplier Result Register .................................................... 8-165
Compact Multiplier Result .................................................. 8-171
Shifter Instructions ................................................................... 8-175
Arithmetic/Logical Shift ...................................................... 8-176
Rotate ................................................................................. 8-179
Field Extract ........................................................................ 8-181
Field Deposit ....................................................................... 8-183
Field/Bit Mask .................................................................... 8-185
Get Bits ............................................................................... 8-187
Put Bits ............................................................................... 8-189
Bit Test ............................................................................... 8-191
Bit Clear/Set/Toggle ............................................................ 8-192
Extract Leading Zeros .......................................................... 8-194
Extract Exponent ................................................................. 8-195
XSTAT/YSTAT Register ...................................................... 8-196
Block Floating-Point ............................................................ 8-197
BFOTMP Register .............................................................. 8-199
IALU (Integer) Instructions ....................................................... 8-200
Add/Subtract (Integer) ......................................................... 8-202
Add/Subtract With Carry/Borrow (Integer) .......................... 8-204
Average (Integer) ................................................................. 8-206
Compare (Integer) .............................................................. 8-208
Maximum/Minimum (Integer) ............................................ 8-210
Absolute Value (Integer) ...................................................... 8-212
Logical AND/AND NOT/OR/XOR/NOT (Integer) ............ 8-213
Arithmetic Shift/Logical Shift (Integer) ............................... 8-215
Left Rotate/Right Rotate (Integer) ....................................... 8-217
IALU (Load/Store/Transfer) Instructions ................................... 8-218
Universal Register Load (Data Addressing) ........................... 8-220
Universal Register Store (Data Addressing) .......................... 8-221
Data Register Load and DAB Operation (Data Addressing) ............ 8-222
Data Register Store (Data Addressing) ................................. 8-224
Universal Register Transfer .................................................. 8-226
Sequencer Instructions .............................................................. 8-228
Jump/Call ........................................................................... 8-230
Computed Jump/Call .......................................................... 8-232
Return (from Interrupt) ...................................................... 8-234
Reduce (Interrupt to Subroutine) ........................................ 8-236
If – Do (Conditional Execution) ......................................... 8-237
If – Else (Conditional Sequencing and Execution) ................ 8-238
Static Flag Registers ............................................................ 8-239
Idle ..................................................................................... 8-240
BTB Invalid ........................................................................ 8-241
Trap .................................................................................... 8-242
Emulator Trap ..................................................................... 8-243
No Operation ...................................................................... 8-244

QUICK REFERENCE

ALU Quick Reference .................................................................. A-2
Multiplier Quick Reference .......................................................... A-6
Shifter Quick Reference ............................................................... A-8
IALU Quick Reference ............................................................... A-10
Sequencer Quick Reference ........................................................ A-13

REGISTER/BIT DEFINITIONS

INSTRUCTION DECODE

Instruction Structure .................................................................... C-1
Compute Block Instruction Format .............................................. C-3
ALU Instructions .................................................................... C-4
ALU Fixed-Point, Arithmetic and Logical Instructions (CU=00) ...... C-5
ALU Fixed-Point, Data Conversion Instructions (CU=01) ............. C-7
ALU Floating-Point, Arithmetic and Logical Instructions (CU=01) ... C-10
CLU Instructions ............................................................. C-12
Multiplier Instructions ......................................................... C-14
Shifter Instructions ............................................................... C-18
Shifter Instructions Using Single Normal-Word Operands and Single Register ......... C-18
Shifter Instructions Using Single Long-Word or Dual Normal-Word Operands and Dual Register ........ C-19
Shifter Instructions Using Short or Byte Operands and Single or Dual Registers ........... C-20
Shifter Instructions Using Single Operand ......................... C-22
IALU (Integer) Instruction Format .............................................. C-24
IALU Move Instruction Format .................................................. C-25
IALU Load Data Instruction Format ........................................... C-27
IALU Load/Store Instruction Format .......................................... C-28
IALU Immediate Extension Format ............................................. C-32
Sequencer Instruction Format ..................................................... C-33
Sequencer Flow Control Instructions ..................................... C-33
Sequencer Direct Jump/Call Instruction Format .................... C-34
Sequencer Indirect Jump Instruction Format .......................... C-36
Condition Codes .................................................................. C-39
Compute Block Conditions .............................................. C-39
IALU Conditions ............................................................. C-40
Sequencer and External Conditions ................................... C-40
Sequencer Immediate Extension Format ...................................... C-41
Miscellaneous Instruction Format ............................................... C-42

INDEX

PREFACE
Thank you for purchasing and developing systems using TigerSHARC® processors from Analog Devices.
Purpose of This Manual
The ADSP-TS101 TigerSHARC Processor Programming Reference contains information about the DSP architecture and DSP assembly language for TigerSHARC processors. These are 32-bit, fixed- and floating-point digital signal processors from Analog Devices for use in computing, communications, and consumer applications.
The manual provides information on how assembly instructions execute on the TigerSHARC processor’s architecture along with reference information about DSP operations.
Intended Audience
The primary audience for this manual is a programmer who is familiar with Analog Devices processors. This manual assumes that the audience has a working knowledge of the appropriate processor architecture and instruction set. Programmers who are unfamiliar with Analog Devices processors can use this manual, but should supplement it with other texts (such as the appropriate hardware reference manuals and data sheets) that describe the target architecture.
Manual Contents
The manual consists of:
Chapter 1, “Introduction” Provides a general description of the DSP architecture, instruction slot/line syntax, and instruction parallelism rules.
Chapter 2, “Compute Block Registers” Provides a description of the compute block register file, register naming syntax, and numeric formats.
Chapter 3, “ALU” Provides a description of the arithmetic logic unit (ALU) and communications logic unit (CLU) operation, includes ALU/CLU instruction examples, and provides the ALU instruction summary.
Chapter 4, “Multiplier” Provides a description of the multiply-accumulator (multiplier) operation, includes multiplier instruction examples, and provides the multiplier instruction summary.
Chapter 5, “Shifter” Provides a description of the bit-wise barrel shifter (shifter) operation, includes shifter instruction examples, and provides the shifter instruction summary.
Chapter 6, “IALU” Provides a description of the integer arithmetic logic unit (IALU) and data alignment buffer (DAB) operation, includes IALU instruction examples, and provides the IALU instruction summary.
Chapter 7, “Program Sequencer” Provides a description of the program sequencer operation, the instruction alignment buffer (IAB), the branch target buffer (BTB), and the instruction pipeline. This chapter also includes a program sequencer instruction summary.
Chapter 8, “Instruction Set” Describes the ADSP-TS101 processor instruction set in detail, starting with an overview of the instruction line and instruction types.
Appendix A, “Quick Reference” Contains a concise description of the ADSP-TS101 processor assembly language. It is intended to be used as an assembly programming reference.
Appendix B, “Register/Bit Definitions” Provides register and bit name definitions to be used in ADSP-TS101 processor programs.
Appendix C, “Instruction Decode” Identifies operation codes (opcodes) for instructions. Use this appendix to learn how to construct opcodes.
This programming reference is a companion document to the ADSP-TS101 TigerSHARC Processor Hardware Reference.
What’s New in This Manual
Revision 1.1 of the ADSP-TS101 TigerSHARC Processor Programming Reference corrects and closes all open Tool Anomaly Reports (TARs) against this manual, adds figure titles that were missing, and updates Web site and contact numbers. These changes affect the preface, various chapters, appendices, and the index.
Technical or Customer Support
You can reach Analog Devices, Inc. Customer Support in any of the following ways:
Visit the Embedded Processing and DSP products Web site at
http://www.analog.com/processors/technicalSupport
E-mail tools questions to
dsptools.support@analog.com
E-mail processor questions to
embedded.support@analog.com dsp.support@analog.com
Phone questions to 1-800-ANALOGD
Contact your Analog Devices, Inc. local sales office or authorized distributor
Send questions by mail to:
Analog Devices, Inc.
One Technology Way
P.O. Box 9106
Norwood, MA 02062-9106
USA
Supported Processors
The following is the list of Analog Devices, Inc. processors supported in VisualDSP++®.
TigerSHARC (ADSP-TSxxx) Processors
The name “TigerSHARC” refers to a family of floating-point and fixed-point [8-bit, 16-bit, and 32-bit] processors. VisualDSP++ currently supports the following TigerSHARC processors:
ADSP-TS101, ADSP-TS201, ADSP-TS202, and ADSP-TS203
SHARC® (ADSP-21xxx) Processors
The name “SHARC” refers to a family of high-performance, 32-bit, floating-point processors that can be used in speech, sound, graphics, and imaging applications. VisualDSP++ currently supports the following SHARC processors:
ADSP-21020, ADSP-21060, ADSP-21061, ADSP-21062, ADSP-21065L, ADSP-21160, ADSP-21161, ADSP-21261, ADSP-21262, ADSP-21266, ADSP-21267, ADSP-21363, ADSP-21364, and ADSP-21365
Blackfin® (ADSP-BFxxx) Processors
The name “Blackfin” refers to a family of 16-bit, embedded processors. VisualDSP++ currently supports the following Blackfin processors:
ADSP-BF531, ADSP-BF532 (formerly ADSP-21532), ADSP-BF533, ADSP-BF535 (formerly ADSP-21535), ADSP-BF561, AD6532, and AD90747
Product Information
You can obtain product information from the Analog Devices Web site, from the product CD-ROM, or from the printed publications (manuals).
Analog Devices is online at www.analog.com. Our Web site provides infor­mation about a broad range of products—analog integrated circuits, amplifiers, converters, and digital signal processors.
MyAnalog.com
MyAnalog.com is a free feature of the Analog Devices Web site that allows customization of a Web page to display only the latest information on products you are interested in. You can also choose to receive weekly e-mail notifications containing updates to the Web pages that meet your interests. MyAnalog.com provides access to books, application notes, data sheets, code examples, and more.
Registration
Visit www.myanalog.com to sign up. Click Register to use MyAnalog.com. Registration takes about five minutes and serves as a means to select the information you want to receive.
If you are already a registered user, just log on. Your user name is your e-mail address.
Processor Product Information
For information on embedded processors and DSPs, visit the Analog Devices Web site at www.analog.com/processors, which provides access to technical publications, data sheets, application notes, product over­views, and product announcements.
You may also obtain additional information about Analog Devices and its products in any of the following ways.
E-mail questions or requests for information to
embedded.support@analog.com dsp.support@analog.com
Fax questions or requests for information to
1-781-461-3010 (North America) +49-89-76903-157 (Europe)
Access the FTP Web site at
ftp ftp.analog.com (or ftp 137.71.25.69) ftp://ftp.analog.com
Related Documents
The following publications that describe the ADSP-TS101 TigerSHARC processor (and related processors) can be ordered from any Analog Devices sales office:
ADSP-TS101S TigerSHARC Embedded Processor Data Sheet
ADSP-TS101 TigerSHARC Processor Hardware Reference
ADSP-TS101 TigerSHARC Processor Programming Reference
For information on product related development software and Analog Devices processors, see these publications:
VisualDSP++ User's Guide for TigerSHARC Processors
VisualDSP++ C/C++ Compiler and Library Manual for TigerSHARC Processors
VisualDSP++ Assembler and Preprocessor Manual for TigerSHARC Processors
VisualDSP++ Linker and Utilities Manual for TigerSHARC Processors
VisualDSP++ Kernel (VDK) User's Guide
Visit the Technical Library Web site to access all processor and tools manuals and data sheets:
http://www.analog.com/processors/technical_library
Online Technical Documentation
Online documentation comprises the VisualDSP++ Help system, software tools manuals, hardware tools manuals, processor manuals, the Dinkum Abridged C++ library, and Flexible License Manager (FlexLM) network license manager software documentation. You can easily search across the entire VisualDSP++ documentation set for any topic of interest. For easy printing, supplementary .PDF files of most manuals are also provided.
Each documentation file type is described as follows.
File   Description
.CHM   Help system files and manuals in Help format.
.HTM or .HTML   Dinkum Abridged C++ library and FlexLM network license manager software documentation. Viewing and printing the .HTML files requires a browser, such as Internet Explorer 4.0 (or higher).
.PDF   VisualDSP++ and processor manuals in Portable Documentation Format (PDF). Viewing and printing the .PDF files requires a PDF reader, such as Adobe Acrobat Reader (4.0 or higher).
If documentation is not installed on your system as part of the software installation, you can add it from the VisualDSP++ CD-ROM at any time by running the Tools installation. Access the online documentation from the VisualDSP++ environment, Windows® Explorer, or the Analog Devices Web site.
Accessing Documentation From VisualDSP++
From the VisualDSP++ environment:
Access VisualDSP++ online Help from the Help menu’s Contents, Search, and Index commands.
Open online Help from context-sensitive user interface items (tool­bar buttons, menu commands, and windows).
Accessing Documentation From Windows
In addition to any shortcuts you may have constructed, there are many ways to open VisualDSP++ online Help or the supplementary documenta­tion from Windows.
Help system files (.CHM) are located in the Help folder, and .PDF files are located in the Docs folder of your VisualDSP++ installation CD-ROM. The Docs folder also contains the Dinkum Abridged C++ library and the FlexLM network license manager software documentation.
Using Windows Explorer
Double-click the vdsp-help.chm file, which is the master Help sys­tem, to access all the other .CHM files.
Double-click any file that is part of the VisualDSP++ documenta­tion set.
Using the Windows Start Button
Access VisualDSP++ online Help by clicking the Start button and choosing Programs, Analog Devices, VisualDSP++, and VisualDSP++ Documentation.
Access the .PDF files by clicking the Start button and choosing Programs, Analog Devices, VisualDSP++, Documentation for Printing, and the name of the book.
Accessing Documentation From the Web
Download manuals at the following Web site:
http://www.analog.com/processors/technical_library
Select a processor family and book title. Download archive (.ZIP) files, one for each manual. Use any archive management software, such as WinZip, to decompress downloaded files.
Printed Manuals
For general questions regarding literature ordering, call the Literature Center at 1-800-ANALOGD (1-800-262-5643) and follow the prompts.
VisualDSP++ Documentation Set
To purchase VisualDSP++ manuals, call 1-603-883-2430. The manuals may be purchased only as a kit.
If you do not have an account with Analog Devices, you are referred to Analog Devices distributors. For information on our distributors, log onto
http://www.analog.com/salesdir.
Hardware Tools Manuals
To purchase EZ-KIT Lite® and In-Circuit Emulator (ICE) manuals, call 1-603-883-2430. The manuals may be ordered by title or by product number located on the back cover of each manual.
Processor Manuals
Hardware reference and instruction set reference manuals may be ordered through the Literature Center at 1-800-ANALOGD (1-800-262-5643), or downloaded from the Analog Devices Web site. Manuals may be ordered by title or by product number located on the back cover of each manual.
Data Sheets
All data sheets (preliminary and production) may be downloaded from the Analog Devices Web site. Only production (final) data sheets (Rev. 0, A, B, C, and so on) can be obtained from the Literature Center at 1-800-ANALOGD (1-800-262-5643); they also can be downloaded from the Web site.
To have a data sheet faxed to you, call the Analog Devices Faxback System at 1-800-446-6212. Follow the prompts and a list of data sheet code numbers will be faxed to you. If the data sheet you want is not listed, check for it on the Web site.
Conventions
Text conventions used in this manual are identified and described as follows.
Example   Description
Close command (File menu)   Titles in reference sections indicate the location of an item within the VisualDSP++ environment’s menu system (for example, the Close command appears on the File menu).
{this | that}   Alternative items in syntax descriptions appear within curly brackets and separated by vertical bars; read the example as this or that. One or the other is required.
[this | that]   Optional items in syntax descriptions appear within brackets and separated by vertical bars; read the example as an optional this or that.
[this,…]   Optional item lists in syntax descriptions appear within brackets delimited by commas and terminated with an ellipsis; read the example as an optional comma-separated list of this.
.SECTION   Commands, directives, keywords, and feature names are in text with letter gothic font.
filename   Non-keyword placeholders appear in text with italic style format.
Note: For correct operation, ...   A Note: provides supplementary information on a related topic. In the online version of this book, the word Note appears instead of this symbol.
Caution: Incorrect device operation may result if ... Caution: Device damage may result if ...   A Caution: identifies conditions or inappropriate usage of the product that could lead to undesirable results or product damage. In the online version of this book, the word Caution appears instead of this symbol.
Warning: Injury to device users may result if ...   A Warning: identifies conditions or inappropriate usage of the product that could lead to conditions that are potentially hazardous for device users. In the online version of this book, the word Warning appears instead of this symbol.
Note: Additional conventions, which apply only to specific chapters, may appear throughout this document.
1 INTRODUCTION
The ADSP-TS101 TigerSHARC Processor Programming Reference describes the Digital Signal Processor (DSP) architecture and instruction set. These descriptions provide the information required for programming TigerSHARC processor systems. This chapter introduces programming concepts for the DSP with the following information:
“DSP Architecture” on page 1-6
“Instruction Line Syntax and Structure” on page 1-20
“Instruction Parallelism Rules” on page 1-24
The TigerSHARC processor is a 128-bit, high performance, next generation version of the ADSP-2106x SHARC DSP. The TigerSHARC processor sets a new standard of performance for digital signal processors, combining multiple computation units for floating-point and fixed-point processing as well as very wide word widths. The TigerSHARC processor maintains a ‘system-on-a-chip’ scalable computing design philosophy, including 6M bits of on-chip SRAM, integrated I/O peripherals, a host processor interface, DMA controllers, link ports, and shared bus connectivity for glueless MDSP (Multi Digital Signal Processing).
In addition to providing unprecedented performance in DSP applications in raw MFLOPS and MIPS, the TigerSHARC processor boosts perfor­mance measures such as MFLOPS/Watt and MFLOPS/square inch in multiprocessing applications.
[Block diagram: the program sequencer (PC, BTB, IRQ, IAB, fetch address), the data address generation logic (integer J-IALU and K-IALU, each with a 32x32 register file), and computational blocks X and Y (each with a multiplier, ALU, shifter, 32x32 register file, and DAB), connected by 32-bit address and 128-bit data buses.]
Figure 1-1. ADSP-TS101 TigerSHARC Processor Core Diagram
As shown in Figure 1-1 and Figure 1-2, the processor has the following architectural features:
Dual computation blocks—X and Y—each consisting of a multi­plier, ALU, shifter, and a 32-word register file
Dual integer ALUs—J and K—each containing a 32-bit IALU and 32-word register file
[Block diagram: internal memory blocks M0, M1, and M2 (64K x 32 each, with 32-bit address and 128-bit data buses); the I/O processor (DMA controller, link port controller, control/status/TCBs and control/status/buffers); the JTAG port; the external port (SDRAM controller, multiprocessor interface, host interface, input FIFO, output buffer, output FIFO, cluster bus arbiter, with 32-bit address, 64-bit data, and control lines); and link ports L0 through L3.]
Figure 1-2. ADSP-TS101 TigerSHARC Processor Peripherals Diagram
Program sequencer—Controls the program flow and contains an instruction alignment buffer (IAB) and a branch target buffer (BTB)
Three 128-bit buses providing high bandwidth connectivity between all blocks
External port interface including the host interface, SDRAM con­troller, static pipelined interface, four DMA channels, four link ports (each with two DMA channels), and multiprocessing support
6M bits of internal memory organized as three blocks (M0, M1, and M2), each 16K rows deep and 128 bits wide (2M bits per block).
Debug features
JTAG Test Access Port
The TigerSHARC processor external port provides an interface to external memory, to memory-mapped I/O, to host processor, and to additional TigerSHARC processors. The external port performs external bus arbitra­tion and supplies control signals to shared, global memory and I/O devices.
Figure 1-3 illustrates a typical single-processor system. A multiprocessor system is illustrated in Figure 1-4 on page 1-6 and is discussed later in “Scalability and Multiprocessing” on page 1-19.
The TigerSHARC processor includes several features that simplify system development. The features lie in three key areas:
Support of IEEE floating-point formats
IEEE 1149.1 JTAG serial scan path and on-chip emulation features
Architectural features supporting high-level languages and operat­ing systems
The features of the TigerSHARC processor architecture that directly sup­port high-level language compilers and operating systems include:
Simple, orthogonal instruction allowing the compiler to efficiently use the multi-instruction slots
General-purpose data and IALU register files
32- and 40-bit floating-point and 8-, 16-, 32-, and 64-bit fixed­point native data types
[System diagram: an ADSP-TS101S processor connected over its address (ADDR31–0), data (DATA63–0), and control buses to a clock reference and to optional SDRAM memory, link devices (4 max), boot EPROM, memory, a host processor interface, and a DMA device.]
Figure 1-3. Single Processor Configuration
Large address space
Immediate address modify fields
Easily supported relocatable code and data
Fast save and restore of processor registers onto internal memory stacks
[System diagram: several TigerSHARC processors sharing a cluster bus with SDRAM memory, memory, devices, a host interface, and a bridge, and connected to one another through their link ports.]
Figure 1-4. Multiprocessing Cluster Configuration
DSP Architecture
As shown in Figure 1-1 on page 1-2 and Figure 1-2 on page 1-3, the DSP architecture consists of two divisions: the DSP core (where instructions execute) and the I/O peripherals (where data is stored and off-chip I/O is processed). The following discussion provides a high-level description of the DSP core and peripherals architecture. More detail on the core appears in other sections of this reference. For more information on I/O peripher­als, see the ADSP-TS101 TigerSHARC Processor Hardware Reference.
High performance is facilitated by the ability to execute up to four 32-bit wide instructions per cycle. The TigerSHARC processor uses a variation of a Static Superscalar™ architecture to allow the programmer to specify which instructions are executed in parallel in each cycle. Instructions do not have to be aligned in memory, so program memory is not wasted.
The 6M bit internal memory is divided into three 128-bit wide memory blocks. Each of the three internal address/data bus pairs connects to one of the three memory blocks. The three memory blocks can be used for triple accesses every cycle, where each memory block can access up to four 32-bit words in a cycle.
The external port cluster bus is 64 bits wide. The high I/O bandwidth complements the high processing speeds of the core. To facilitate the high clock rate, the TigerSHARC processor uses a pipelined external bus with programmable pipeline depth for interprocessor communications and for Synchronous SRAM and DRAM (SSRAM and SDRAM).
The four link ports support point-to-point high bandwidth data transfers. Link ports have hardware supported two-way communication.
The processor operates with a two cycle arithmetic pipeline. The branch pipeline is two to six cycles. A branch target buffer (BTB) is implemented to reduce branch delay. The two identical computation units support floating-point as well as fixed-point arithmetic.
During compute intensive operations, one or both integer ALUs compute or generate addresses for fetching up to two quad operands from two memory blocks, while the program sequencer simultaneously fetches the next quad instruction from the third memory block. In parallel, the com­putation units can operate on previously fetched operands while the sequencer prepares for a branch.
While the core processor is doing the above, the DMA channels can be replenishing the internal memories in the background with quad data from either the external port or the link ports.
The processing core of the TigerSHARC processor reaches exceptionally high DSP performance through using these features:
Computation pipeline
Dual computation units
Execution of up to four instructions per cycle
Access of up to eight words per cycle from memory
The two computation units (compute blocks) perform up to 6 floating­point or 24 fixed-point operations per cycle.
Each multiplier and ALU unit can execute four 16-bit fixed-point opera­tions per cycle, using Single-Instruction, Multiple-Data (SIMD) operation. This operation boosts performance of critical imaging and sig­nal processing applications that use fixed-point data.
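The SIMD behavior described above can be pictured with a small software model. The following Python sketch is illustrative only: the function names, the lane ordering (lane 0 in the low bits), and the wraparound (non-saturating) arithmetic are assumptions made for the example, not a description of the processor's instructions.

```python
# Illustrative model of four independent 16-bit lanes held in one 64-bit value.
# Lane order and wraparound behavior are assumptions of this sketch.
LANES = 4
LANE_BITS = 16
LANE_MASK = (1 << LANE_BITS) - 1


def pack_shorts(values):
    """Pack four 16-bit values into one integer, lane 0 in the low bits."""
    if len(values) != LANES:
        raise ValueError("expected exactly four lane values")
    word = 0
    for lane, value in enumerate(values):
        word |= (value & LANE_MASK) << (lane * LANE_BITS)
    return word


def unpack_shorts(word):
    """Split a packed value back into its four unsigned 16-bit lanes."""
    return [(word >> (lane * LANE_BITS)) & LANE_MASK for lane in range(LANES)]


def simd_add_shorts(a, b):
    """Add lane by lane; a carry out of one lane never reaches the next."""
    sums = [(x + y) & LANE_MASK
            for x, y in zip(unpack_shorts(a), unpack_shorts(b))]
    return pack_shorts(sums)
```

One call to `simd_add_shorts` stands for a single operation that produces four 16-bit results, which is the sense in which one unit performs four fixed-point operations per cycle.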
Compute Blocks
The TigerSHARC processor core contains two computation units called compute blocks. Each compute block contains a register file and three independent computation units: an ALU, a multiplier, and a shifter. For meeting a wide variety of processing needs, the computation units process data in several fixed- and floating-point formats listed here and shown in Figure 1-5:
Fixed-point format
These include 64-bit long word, 32-bit normal word, 16-bit short word, and 8-bit byte word. For short word fixed-point arithmetic, quad parallel operations on quad-aligned data allow fast processing of array data. Byte operations are also supported for octal-aligned data.
Floating-point format
These include 32-bit normal word and 40-bit extended word. Floating-point operations are single or extended precision. The normal word floating-point format is the standard IEEE format, and the 40-bit extended-precision format occupies a double word (64 bits) with eight additional LSBs of mantissa for greater accuracy.
Each compute block has a general-purpose, multi-port, 32-word data reg­ister file for transferring data between the computation units and the data buses and storing intermediate results. All of these registers can be accessed as single-, dual-, or quad-aligned registers. For more information on the register file, see “Compute Block Registers” on page 2-1.
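As a rough illustration of how the word formats above subdivide a 128-bit quad word, the following Python sketch splits one quad word into equal lanes. The function name and the choice of placing lane 0 in the least significant bits are assumptions made for the example.

```python
# Illustrative only: view a 128-bit quad word as byte, short, normal,
# or long words. Lane 0 is taken from the least significant bits.
QUAD_BITS = 128


def split_quad_word(quad, lane_bits):
    """Return the lanes of a 128-bit value for a given lane width in bits."""
    if lane_bits not in (8, 16, 32, 64):
        raise ValueError("lane width must be 8, 16, 32, or 64 bits")
    if not 0 <= quad < (1 << QUAD_BITS):
        raise ValueError("value does not fit in 128 bits")
    mask = (1 << lane_bits) - 1
    return [(quad >> shift) & mask for shift in range(0, QUAD_BITS, lane_bits)]
```

Splitting the same value with widths of 8, 16, 32, and 64 bits yields 16, 8, 4, and 2 lanes respectively.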
Arithmetic Logic Unit (ALU)
The ALU performs arithmetic operations on fixed-point and floating­point data and logical operations on fixed-point data. The source and des­tination of most ALU operations is the compute block register file.
On the ADSP-TS101 processor, the ALU includes a special sub-block, which is referred to as the communications logic unit (CLU). The CLU instructions are designed to support different algorithms used for commu­nications applications. The algorithms that are supported by the CLU instructions are:
Viterbi Decoding
Turbo-code Decoding
Despreading for code-division multiple access (CDMA) systems
[Diagram: a 128-bit data bus carrying four 32-bit data registers (see note 1), and the data types that fit in them: long word (64-bit, dual register), extended word (40-bit, dual register), normal word (32-bit, single register), short word (16-bit, two per single register), and byte word (8-bit, four per single register).]
Figure 1-5. Word Format Definitions
1 The TigerSHARC processor internal data buses are 128 bits (one quad word) wide. In a quad word, the DSP can move 16 byte words, 8 short words, 4 normal words, or 2 long words over the bus at the same time.
For more information on the ALU (and CLU features), see “ALU” on page 3-1.
Multiply Accumulator (Multiplier)
The multiplier performs fixed-point or floating-point multiplication and fixed-point multiply/accumulate operations. The multiplier supports sev­eral data types in fixed- and floating-point. The floating-point formats are float and float-extended, as in the ALU. The source and destination of most operations is the compute block register file.
The TigerSHARC processor’s multiplier supports complex multiply-accumulate operations. Complex numbers are represented by a pair of 16-bit short words within a 32-bit word. The least significant bits (LSBs) of the input operand represent the real part, and the most significant bits (MSBs) of the input operand represent the imaginary part.
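The packing described above (real part in the low 16 bits, imaginary part in the high 16 bits) can be sketched in Python as follows. The function names are hypothetical, and the accumulator here is an unbounded Python integer pair; the width, scaling, and rounding of the real accumulator are not modeled.

```python
# Illustrative model of a 16-bit complex pair packed into a 32-bit word.
# Real part in bits 15..0, imaginary part in bits 31..16, as in the text.
def to_signed16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def pack_complex(real, imag):
    """Pack signed 16-bit real and imaginary parts into one 32-bit word."""
    return ((imag & 0xFFFF) << 16) | (real & 0xFFFF)


def unpack_complex(word):
    """Return (real, imag) as signed integers from a packed 32-bit word."""
    return to_signed16(word), to_signed16(word >> 16)


def complex_mac(acc, a_word, b_word):
    """Return acc + a * b, where acc is a (real, imag) pair of integers."""
    ar, ai = unpack_complex(a_word)
    br, bi = unpack_complex(b_word)
    return acc[0] + ar * br - ai * bi, acc[1] + ar * bi + ai * br
```

For example, multiplying 3 - 2j by 1 + 4j and adding the product to a zero accumulator gives 11 + 10j.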
For more information on the multiplier, see “Multiplier” on page 4-1.
Bit Wise Barrel Shifter (Shifter)
The shifter performs logical and arithmetic shifts, bit manipulation, field deposit, and field extraction. The shifter operates on one 64-bit, one or two 32-bit, two or four 16-bit, and four or eight 8-bit fixed-point oper­ands. Shifter operations include:
Shifts and rotates from off-scale left to off-scale right
Bit manipulation operations, including bit set, clear, toggle and test
Bit field manipulation operations, including field extract and deposit, using register BFOTMP (which is internal to the shifter)
Bit FIFO operations to support bit streams with fields of varying length
Support for ADSP-2100 family compatible fixed-point/floating­point conversion operations (such as exponent extract, number of leading 1s or 0s)
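To make the field extract and deposit operations in the list above concrete, here is a minimal Python sketch on 32-bit values. The function names and the position/length parameters are hypothetical; the sketch does not reproduce the operand encoding of the actual shifter instructions.

```python
# Illustrative bit-field helpers on 32-bit values.
# position is the bit index of the field's least significant bit.
WORD_MASK = 0xFFFFFFFF


def field_extract(value, position, length, sign_extend=False):
    """Return the length-bit field of value that starts at position."""
    field = (value >> position) & ((1 << length) - 1)
    if sign_extend and field & (1 << (length - 1)):
        field -= 1 << length  # treat the field's top bit as a sign bit
    return field


def field_deposit(target, field, position, length):
    """Return target with the length-bit field at position replaced."""
    mask = ((1 << length) - 1) << position
    return ((target & ~mask) | ((field << position) & mask)) & WORD_MASK
```

Extracting bits 15..8 of 0xABCD1234 returns 0x12, and depositing 0x5 into bits 7..4 of 0xFFFFFFFF gives 0xFFFFFF5F.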
For more information on the shifter, see “Shifter” on page 5-1.
Integer Arithmetic Logic Unit (IALU)
The IALUs can execute standard standalone ALU operations on IALU register files. The IALUs also provide memory addresses when data is transferred between memory and registers. The DSP has dual IALUs (the J-IALU and the K-IALU) that enable simultaneous addresses for multiple operand reads or writes. The IALUs allow computational operations to execute with maximum efficiency because the computation units can be devoted exclusively to processing data.
Each IALU has a multiport, 32-word register file. Operations in the IALU are not pipelined. The IALUs support pre-modify with no update and post-modify with update address generation. Circular data buffers are implemented in hardware. The IALUs support the following types of instructions:
Regular IALU instructions
Move Data instructions
Load Data instructions
Load/Store instructions with register update
Load/Store instructions with immediate update
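The difference between the two address generation modes named above (pre-modify with no update, post-modify with update) can be sketched as follows. This is a simplified Python model; the register name `j0`, the dictionary used as a register file, and the list used as memory are assumptions made for the example.

```python
# Illustrative model of the two addressing modes.
# registers: dict mapping a register name to an address.
# memory: list indexed by address.
def load_premodify(memory, registers, base, modifier):
    """Access memory[base + modifier]; the base register is not updated."""
    return memory[registers[base] + modifier]


def load_postmodify(memory, registers, base, modifier):
    """Access memory[base], then add the modifier to the base register."""
    address = registers[base]
    value = memory[address]
    registers[base] = address + modifier
    return value
```

With `j0` holding 4, a pre-modify access with modifier 2 reads address 6 and leaves `j0` at 4, while a post-modify access reads address 4 and leaves `j0` at 6.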
For indirect addressing (instructions with update), one of the registers in the register file can be modified by another register in the file or by an immediate 8- or 32-bit value, either before (pre-modify) or after (post-modify) the access. For circular buffer addressing, a length value can be associated with the first four registers to perform automatic modulo addressing for circular data buffers; the circular buffers can be located at arbitrary boundaries in memory. Circular buffers allow efficient implementation of delay lines and other data structures, which are commonly used in digital filters and Fourier transformations. The TigerSHARC processor circular buffers automatically handle address pointer wraparounds, reducing overhead and simplifying implementation.
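The modulo addressing described above can be expressed in a few lines of Python. This is a sketch of the arithmetic only: the function name is hypothetical, and the explicit `base` parameter is an assumption of the example, used to show a buffer that starts at an arbitrary address.

```python
# Illustrative circular-buffer pointer update.
def circular_postmodify(index, modifier, base, length):
    """Return index + modifier wrapped into the range [base, base + length)."""
    return base + (index - base + modifier) % length
```

Stepping a pointer by 2 through a five-word buffer that starts at 0x103 visits 0x103, 0x105, 0x107, 0x104, 0x106, and then returns to 0x103, with no explicit wraparound test in the calling code.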
The IALUs also support bit reverse addressing, which is useful for the FFT algorithm. Bit reverse addressing is implemented using a reverse carry addition that is similar to regular additions, but the carry is taken from the upper bits and is driven into lower bits.
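The reverse carry addition described above can be modeled by reversing the operand bits, performing an ordinary addition, and reversing the result. The Python sketch below is illustrative; the function names are hypothetical and the buffer size is assumed to be a power of two.

```python
# Illustrative model of reverse-carry addition for bit-reversed addressing.
def reverse_bits(value, width):
    """Reverse the low width bits of value."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def reverse_carry_add(a, b, width):
    """Add a and b with the carry moving from the upper bits to the lower."""
    total = reverse_bits(a, width) + reverse_bits(b, width)
    return reverse_bits(total & ((1 << width) - 1), width)


def bit_reversed_sequence(n):
    """Return the n indices visited when stepping by n/2 with reverse carry."""
    width = n.bit_length() - 1  # n is assumed to be a power of two
    index, order = 0, []
    for _ in range(n):
        order.append(index)
        index = reverse_carry_add(index, n >> 1, width)
    return order
```

For an 8-point FFT, `bit_reversed_sequence(8)` returns `[0, 4, 2, 6, 1, 5, 3, 7]`, which is the bit-reversed ordering of the indices 0 through 7.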
The IALU provides flexibility in moving data as single-, dual-, or quad­words. Every instruction can execute with a throughput of one per cycle. IALU instructions execute with a single cycle of latency while computa­tion units have two cycles of latency. Normally, there are no dependency delays between IALU instructions, but if there are, three or four cycles of latency can occur.
For more information on the IALUs, see “IALU” on page 6-1.
Program Sequencer
The program sequencer supplies instruction addresses to memory and, together with the IALUs, allows computational operations to execute with maximum efficiency. The sequencer supports efficient branching using the branch target buffer (BTB), which reduces branch delays for condi­tional and unconditional instructions. The sequencer and IALU’s control flow instructions divide into two types:
Control flow instructions. These instructions are used to direct program execution by means of jumps and to execute individual instructions conditionally.
Immediate extension instructions. These instructions are used to extend the numeric fields used in immediate operands for the sequencer and the IALU.
Control flow instructions divide into two types:
Direct jumps and calls based on an immediate address operand specified in the instruction encoding. For example: ‘if <cond> jump 100;’ jumps to address 100 if <cond> evaluates as true.
Indirect jumps based on an address supplied by a register. The instructions used for specifying conditional execution of a line are a subcategory of indirect jumps. For example: ‘if <cond> cjmp;’ is a jump to the address pointed to by the CJMP register.
The TigerSHARC processor achieves its fast execution rate by means of an eight-cycle pipeline.
Two stages of the sequencer’s pipeline actually execute in the computation units. The computation units perform single-cycle operations with a two­cycle computation pipeline, meaning that results are available for use two cycles after the operation is begun. Hardware causes a stall if a result is not available in a given cycle (register dependency check). Up to two compu­tation instructions per compute block can be issued in each cycle, instructing the ALU, multiplier or shifter to perform independent, simul­taneous operations.
The TigerSHARC processor has four general-purpose external interrupts, IRQ3-0. The processor also has internally generated interrupts for the two timers, DMA channels, link ports, arithmetic exceptions, multiprocessor vector interrupts, and user-defined software interrupts. Interrupts can be nested through instruction commands. Interrupts have a short latency and do not abort currently executing instructions. Interrupts vector directly to a user-supplied address in the interrupt table register file, removing the overhead of a second branch.
Note: The control flow instruction must use the first instruction slot in the instruction line.
The branch penalty in a deeply pipelined processor such as the Tiger­SHARC processor can be compensated for by the use of a branch target buffer (BTB) and branch prediction. The branch target address is stored in the BTB. When the address of a jump instruction, which is predicted by the user to be taken in most cases, is recognized (the tag address), the corresponding jump address is read from the BTB and is used as the jump address on the next cycle. Thus the latency of a jump is reduced from three to six wasted cycles to zero wasted cycles. If this address is not stored in the BTB, the instruction must be fetched from memory.
Other instructions also use the BTB to speed up these types of branches. These instructions are interrupt return, call return, and computed jump instructions.
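The effect of the BTB on branch cost can be illustrated with a very small software model. The Python sketch below is a simplification under stated assumptions: the class name is hypothetical, the buffer has unlimited capacity, every jump is predicted taken and is taken, and the miss penalty is a parameter chosen from the three to six cycle range given above.

```python
# Illustrative model of a branch target buffer lookup.
# Capacity, replacement, and prediction details are not modeled.
class BranchTargetBuffer:
    def __init__(self, miss_penalty=3):
        self.entries = {}  # tag address -> stored jump address
        self.miss_penalty = miss_penalty

    def branch(self, tag_address, target_address):
        """Return the wasted cycles for one taken jump and record its target."""
        if self.entries.get(tag_address) == target_address:
            return 0  # hit: the stored address is used on the next cycle
        self.entries[tag_address] = target_address
        return self.miss_penalty  # miss: the target must be fetched
```

In a loop that takes the same jump ten times, only the first pass pays the penalty in this model, so the total cost is one miss penalty rather than ten.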
Immediate extensions are associated with IALU or sequencer (control flow) instructions. These instructions are not specified by the program­mer, but are implied by the size of the immediate data used in the instructions. The programmer must place the instruction that requires an immediate extension in the first instruction slot and leave an empty instruction slot in the line (use only three slots), so the assembler can place the immediate extension in the second instruction slot of the instruction line.
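The slot rules stated in this section (at most four slots per line, control flow in the first slot, at most one immediate extension per line, and a free slot left for it) can be collected into a small checker. The Python sketch below is a hypothetical helper written for illustration; it is not part of the assembler, the instruction texts are placeholders rather than valid assembly, and the full parallelism rules are covered elsewhere in this reference.

```python
# Illustrative checker for the slot rules described in this section.
from dataclasses import dataclass

MAX_SLOTS = 4


@dataclass
class Instruction:
    text: str
    is_control_flow: bool = False
    needs_immediate_extension: bool = False


def check_instruction_line(line):
    """Return a list of rule violations; an empty list means no violation."""
    errors = []
    extended = [slot for slot, ins in enumerate(line)
                if ins.needs_immediate_extension]
    if len(extended) > 1:
        errors.append("more than one immediate extension in the line")
    if any(slot != 0 for slot in extended):
        errors.append("immediate-extension instruction not in first slot")
    # Each immediate extension occupies one additional slot.
    if len(line) + len(extended) > MAX_SLOTS:
        errors.append("line needs more than four slots")
    for slot, ins in enumerate(line):
        if ins.is_control_flow and slot != 0:
            errors.append("control flow instruction not in first slot")
    return errors
```

A line of three instructions whose first instruction needs an extension passes the check, while the same line with a fourth instruction is rejected because no slot remains for the extension.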
Note: Only one immediate extension may be in a single instruction line.
For more information on the sequencer, BTB, and immediate extensions, see “Program Sequencer” on page 7-1.
Quad Instruction Execution
The TigerSHARC processor can execute up to four instructions per cycle from a single memory block, due to the 128-bit wide access per cycle. The ability to execute several instructions in a single cycle derives from a Static Superscalar architectural concept. This is not strictly a superscalar architecture because the instructions executed in each cycle are specified in the instruction by the programmer or by the compiler, and not by the chip hardware. There is also no instruction reordering. Register dependencies are, however, examined by the hardware and stalls are generated where appropriate. Code is fully compacted in memory and there are no alignment restrictions for instruction lines.
Relative Addresses for Relocation
Most instructions in the TigerSHARC processor support PC relative branches to allow code to be relocated easily. Also, most data references are register relative, which means they allow programs to access data blocks relative to a base register.
Nested Call and Interrupt
Nested call and interrupt return addresses (along with other registers as needed) are saved by specific instructions onto the on-chip memory stack, allowing more generality when used with high-level languages. Non­nested calls and interrupts do not need to save the return address in inter­nal memory, making these more efficient for short, non-nested routines.
Context Switching
The TigerSHARC processor provides the ability to save and restore up to eight registers per cycle onto a stack in two internal memory blocks when using load/store instructions. This fast save/restore capability permits effi­cient interrupts and fast context switching. It also allows the TigerSHARC processor to dispense with on-chip PC stack or alternate registers for regis­ter files or status registers.
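The save/restore rate above gives a simple lower bound on context-switch time. The sketch below only does that arithmetic; the helper name `min_save_cycles` is hypothetical, and the bound ignores stalls, memory conflicts, and any other instructions the routine needs.

```python
import math

REGISTERS_PER_CYCLE = 8  # from the text: up to eight registers saved or restored per cycle


def min_save_cycles(register_count):
    """Lower bound on the cycles needed to save register_count 32-bit registers."""
    return math.ceil(register_count / REGISTERS_PER_CYCLE)


# Assumed example: both compute block register files, 32 registers each (XR31-0, YR31-0).
print(min_save_cycles(64))
```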
Internal Memory and Other Internal Peripherals
The on-chip memory consists of three blocks of 2M bits each. Each block is 128 bits (four words) wide, thus providing high bandwidth sufficient to support both computation units, the instruction stream and external I/O, even in very intensive operations. The TigerSHARC processor provides access to program and two data operands without memory or bus constraints. The memory blocks can store instructions and data interchangeably.
Each memory block is organized as 64K words of 32 bits each. The accesses are pipelined to meet one clock cycle access time needed by the core, DMA, or by the external bus. Each access can be up to four words. Memories (and their associated buses) are a resource that must be shared between the compute blocks, the IALUs, the sequencer, the external port, and the link ports. In general, if during a particular cycle more than one unit in the processor attempts to access the same memory, one of the com­peting units is granted access, while the other is held off for further arbitration until the following cycle—see “Bus Arbitration Protocol” in the ADSP-TS101 TigerSHARC Processor Hardware Reference. This type of conflict only has a small impact on performance due to the very high bandwidth afforded by the internal buses.
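The memory figures quoted in this section are consistent with each other, which the following arithmetic sketch checks. The constant names are chosen here for illustration; the values come from the text.

```python
WORD_BITS = 32                 # one memory word
WORDS_PER_BLOCK = 64 * 1024    # each block is organized as 64K words
BLOCKS = 3                     # three on-chip memory blocks
BUS_BITS = 128                 # width of each block and of each internal bus

block_bits = WORDS_PER_BLOCK * WORD_BITS
words_per_access = BUS_BITS // WORD_BITS
total_mbits = BLOCKS * block_bits // (1024 * 1024)

# 64K words x 32 bits is 2M bits per block; a 128-bit access carries four words.
print(block_bits == 2 * 1024 * 1024, words_per_access, total_mbits)
```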
An important benefit of large on-chip memory is that by managing the movement of data on and off chip with DMA, a system designer can real­ize high levels of determinism in execution time. Predictable and deterministic execution time is a central requirement in DSP and real­time systems.
Internal Buses
The processor core has three buses, each one connected to one of the internal memories. These buses are 128 bits wide to allow up to four instructions, or four aligned data words, to be transferred in each cycle on each bus. On-chip system elements also use these buses to access memory. Only one access to each memory block is allowed in each cycle, so DMA or external port transfers must compete with core accesses on the same block. Because of the large bandwidth available from each memory block, not all the memory bandwidth can be used by the core units, which leaves some memory bandwidth available for use by the DSP’s DMA processes or by the bus interface to serve other DSPs’ bus master transfers to the DSP’s memory.
Internal Transfer
Most registers of the TigerSHARC processor are classified as universal reg­isters (Uregs). Instructions are provided for transferring data between any two Uregs, between a Ureg and memory, or for the immediate load of a Ureg. This includes control registers and status registers, as well as the data registers in the register files. These transfers occur with the same tim­ing as internal memory load/store.
Data Accesses
Each move instruction specifies the number of words accessed from each memory block. Two memory blocks can be accessed on each cycle because of the two IALUs. For a discussion of data and register widths and the syntax that specifies these accesses, see “Register File Registers” on page 2-5.
Quad Data Access
Instructions specify whether one, two or four words are to be loaded or stored. Quad words (a memory quad word consists of four 32-bit words, or 128 bits of data) can be aligned on a quad-word boundary and long words aligned on a long-word boundary. This, however, is not necessary when loading data to computation units because a data alignment buffer (DAB) automatically aligns quad words that are not aligned in memory.
Up to four data words from each memory block can be supplied to each computation unit, meaning that new data is not required on every cycle and leaving alternate cycles for I/O to the memories. This is beneficial in applications with high I/O requirements since it allows the I/O to occur without degrading core processor performance.
Booting
The internal memory of the TigerSHARC processor can be loaded from an 8-bit EPROM using a boot mechanism at system powerup. The DSP can also be booted using another master or through one of the link ports. Selection of the boot source is controlled by external pins. For informa­tion on booting the DSP, see the ADSP-TS101 TigerSHARC Processor Hardware Reference.
Scalability and Multiprocessing
The TigerSHARC processor, like the related Analog Devices product the SHARC DSP, is designed for multiprocessing applications. The primary multiprocessing architecture supported is a cluster of up to eight TigerSHARC processors that share a common bus, a global memory, and an interface to either a host processor or to other clusters. In large multiprocessing systems, this cluster can be considered an element and connected in configurations such as toroid, mesh, tree, crossbar, or others. The user can provide a personal interconnect method or use the on-chip communication ports.
The TigerSHARC processor improves on most of the multiprocessing capabilities of the SHARC DSP and enhances the data transfer band­width. These capabilities include:
On-chip bus arbitration for glueless multiprocessing
Globally accessible internal memory and registers
Semaphore support
Powerful, in-circuit multiprocessing emulation
Emulation and Test Support
The TigerSHARC processor supports the IEEE P1149.1 Joint Test Action Group (JTAG) standard for system test. This standard defines a method for serially scanning the I/O status of each component in a system. The JTAG serial port is also used by the TigerSHARC processor EZ-ICE® to gain access to the processor’s on-chip emulation features.
Instruction Line Syntax and Structure
The TigerSHARC processor is a static superscalar DSP processor that executes from one to four 32-bit instruction slots in an instruction line. With few exceptions, an instruction line executes with a throughput of one cycle in an eight-deep pipeline. Figure 1-6 shows the instruction slot and line structure.
There are some important things to note about the instruction slot and instruction line structure and how this structure relates to instruction execution.
Each instruction line consists of up to four 32-bit instruction slots.
Instruction slots are delimited with one semicolon “;”.
Instruction lines are terminated with two semicolons “;;”.
The up to four instructions on an instruction line are executed in parallel.
Every instruction slot consists of a 32-bit opcode.
Some instructions (such as immediate extensions) require two 32-bit opcodes (instruction slots) to execute.
Some instructions (program sequencer, conditional, and immediate extension) require specific instruction slots.
An instruction LINE consists of up to four instruction SLOTS.
Slot_1_instruction ; Slot_2_instruction ; Slot_3_instruction ; Slot_4_instruction ;;
Each instruction SLOT is delimited with one semicolon.
The instruction LINE is terminated with two semicolons.
The first two instruction SLOTS are special:
1. (if used) Conditional (if-do, if-else) or sequencer (jump or other) instructions must use SLOT 1.
2. (if used) Immediate extension instructions must use SLOT 2.
Figure 1-6. Instruction Line and Slot Structure
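The delimiter rules above can be illustrated with a short text-splitting sketch. The function name `split_instruction_lines` is hypothetical, and the sketch is deliberately naive: it ignores comments and does not account for the slot that an immediate extension occupies without appearing in the source text.

```python
def split_instruction_lines(source):
    """Split assembly text into instruction lines (ended by ';;') and slots (separated by ';')."""
    lines = []
    for chunk in source.split(";;"):
        slots = [slot.strip() for slot in chunk.split(";") if slot.strip()]
        if not slots:
            continue  # nothing between two terminators
        if len(slots) > 4:
            raise ValueError("an instruction line holds at most four slots")
        lines.append(slots)
    return lines


text = "XR3:0=Q[J5+=J9]; YR1:0=R3:2+R1:0;; J5=J9-J10;;"
parsed = split_instruction_lines(text)
print(parsed)
```

Here the text splits into one instruction line with two slots followed by one instruction line with a single slot.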
An instruction is a 32-bit word that activates one or more of the TigerSHARC processor’s execution units to carry out an operation. The DSP executes or stalls the instructions in the same instruction line together. Although the DSP fetches quad words from memory, instruction lines do not have to be aligned to quad-word boundaries. Regardless of size (one to four instructions), instruction lines follow one after the other in memory with a new instruction line beginning one word from where the previous instruction line ended. The end of an instruction line is identified by the most significant bit (MSB) in the instruction word.
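Because instruction lines are packed back to back, a reader of a memory image has to find line boundaries from the MSB of each word. The sketch below shows the idea. The polarity of the bit (`end_flag_value=1`) is an assumption, since the text above does not state it, and the opcode values are made up so that only the MSB matters.

```python
END_OF_LINE_BIT = 1 << 31  # MSB of a 32-bit instruction word


def group_into_lines(words, end_flag_value=1):
    """Group consecutive 32-bit words into instruction lines.

    end_flag_value is an assumption: the text says the MSB identifies the end
    of a line but does not say which polarity is used.
    """
    lines, current = [], []
    for word in words:
        current.append(word)
        msb = 1 if word & END_OF_LINE_BIT else 0
        if msb == end_flag_value:
            lines.append(current)
            current = []
    if current:
        lines.append(current)  # trailing words with no terminator seen
    return lines


# Hypothetical opcodes: only the MSB matters for this sketch.
image = [0x00000001, 0x80000002, 0x80000003, 0x00000004, 0x00000005, 0x80000006]
line_lengths = [len(line) for line in group_into_lines(image)]
print(line_lengths)
```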
Instruction Notation Conventions
The TigerSHARC processor assembly language is based on an algebraic syntax for ease of coding and readability. The syntax for TigerSHARC processor instructions selects the operation that the DSP executes and the mode in which the DSP executes the operation. Operations include com­putations, data movements, and program flow controls. Modes include Single-Instruction, Single-Data (SISD) versus Single-Instruction, Multi­ple-Data (SIMD) selection, data format selection, word size selection, enabling saturation, and enabling truncation. All controls on instruction execution are included in the DSP’s instruction syntax—there are no mode bits to set in control registers for this DSP.
This book presents instructions in summary format. This format presents all the selectable items and optional items available for an instruction. The conventions for these are:
this|that|other    Lists of items delimited with a vertical bar “|” indicate that syntax permits selection of one of the items. One item from the list must be selected. The vertical bar is not part of instruction syntax.
{option}    An item or a list of items enclosed within curly braces “{}” indicates an optional item. The item may be included or omitted. The curly braces are not part of instruction syntax.
() [] , ; ;;    Parentheses, square brackets, comma, semicolon, double semicolon, and other symbols are required items in the instruction syntax and must appear where shown in summary syntax, with one exception: empty parentheses (no options selected) may not appear in an instruction.
Rm Rmd Rmq    Register names are replaceable items in the summary syntax and appear in italics. Register names indicate that the syntax requires a single (Rm), double (Rmd), or quad (Rmq) register. For more information on register name syntax, compute block selection, and data format selection, see “Register File Registers” on page 2-5.
<imm#>    Immediate data (literal values) in the summary syntax appears as <imm#> with # indicating the bit width of the value.
For example, the following instruction in summary format:
{X|Y|XY}{S|B}Rs = MIN|MAX (Rm, Rn) {({U}{Z})} ;
could be coded as any of the following instructions:
XR3 = MIN (R2, R1) ;
YBR2 = MAX (R1, R0) (UZ) ;
XYSR2 = MAX (R3, R4) (U) ;
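To make the summary notation concrete, the sketch below enumerates every textual form that the MIN|MAX summary line permits for one fixed choice of registers. It is purely syntactic: the function name `expand_min_max` and the register defaults are hypothetical, and the sketch does not check whether the hardware places further limits on any combination.

```python
from itertools import product


def expand_min_max(rs="R3", rm="R2", rn="R1"):
    """List every form allowed by {X|Y|XY}{S|B}Rs = MIN|MAX (Rm, Rn) {({U}{Z})} ;"""
    blocks = ["", "X", "Y", "XY"]            # {X|Y|XY}: optional, at most one
    sizes = ["", "S", "B"]                   # {S|B}: optional, at most one
    ops = ["MIN", "MAX"]                     # MIN|MAX: exactly one is required
    options = ["", " (U)", " (Z)", " (UZ)"]  # {({U}{Z})}: empty parentheses are not allowed
    return [
        f"{block}{size}{rs} = {op} ({rm}, {rn}){opt} ;"
        for block, size, op, opt in product(blocks, sizes, ops, options)
    ]


forms = expand_min_max()
print(len(forms))
print(forms[0])
```

The count is 4 x 3 x 2 x 4 = 96 forms, and the first coded example above, XR3 = MIN (R2, R1) ;, is one of them.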
Unconditional Execution Support
The DSP supports unconditional execution of up to four instructions in parallel. This support lets programmers use simultaneous computations with data transfers and branching or looping. These operations can be combined with few restrictions. The following example code shows three instruction lines containing 2, 4, and 1 instruction slots each, respectively:
XR3:0=Q[J5+=J9]; YR1:0=R3:2+R1:0;;
XR3:0=Q[J5+=J9]; YR3:0=Q[K5+=K9]; XYR7:6=R3:2+R1:0; XYR8=R4*R5;;
J5=J9-J10;;
It is important to note that the above instructions execute unconditionally. Their execution does not depend on computation-based conditions. For a description of condition dependent (conditional) execution, see “Conditional Execution Support” on page 1-24.
Conditional Execution Support
All instructions can be executed conditionally (a mechanism also known as predicated execution). The condition field exists in one instruction slot in an instruction line, and all the remaining instructions in that line either execute or not, depending on the outcome of the condition.
In a conditional computational instruction, the execution of the entire instruction line can depend on the specified condition at the beginning of the instruction line. Conditional instructions take one of the following forms:
IF Condition; DO, Instruction; DO, Instruction; DO, Instruction ;;
IF Condition, Sequencer_Instruction; ELSE, Instruction; ELSE, Instruction; ELSE, Instruction ;;
This syntax permits up to three instructions to be controlled by a condi­tion. For more information, see “Conditional Execution” on page 7-12.
Instruction Parallelism Rules
The TigerSHARC processor executes from one to four 32-bit instructions per line. The compiler or programmer determines which instructions may execute in parallel in the same line prior to runtime (leading to the name Static Superscalar). The DSP architecture places several constraints on the application of different instructions and various instruction combinations.
Note that all the restrictions refer to combinations of instructions within the same line. There is no restriction of combinations between lines. There are, however, cases in which certain combinations between lines may cause stall cycles (see “Conditional Branch Effects on Pipeline” on page 7-44), mostly because of data conflicts (an operand of an instruction in line n+1 is the result of an instruction in line n, which is not ready when fetched).
Table 1-1 on page 1-29 and Table 1-2 on page 1-34 identify instruction parallelism rules for the TigerSHARC processor. The following sections provide more details on each type of constraint and accompany the details with examples:
“General Restriction” on page 1-36
“IALU Instruction Restrictions” on page 1-39
“Compute Block Instruction Restrictions” on page 1-37
“Sequencer Instruction Restrictions” on page 1-45
The instruction parallelism rules in Table 1-1 and Table 1-2 present the resource usage constraints for instructions that occupy instruction slots in the same instruction line. The horizontal axis lists resources—portions of the DSP architecture that are active during an instruction—and lists the number of resources that are available. The vertical axis lists instruction types—descriptive names for classes of instructions. For resources, a ‘1’ indicates that a particular instruction uses one unit of the resource, and a ‘2’ indicates that the instruction uses two units of the resource. Typical instructions of most classes are listed with the descriptive name for the instruction type.
It is important to note that Table 1-1 and Table 1-2 identify static restrictions for the TigerSHARC processor. Static restrictions are distinguished from dynamic restrictions, in that static restrictions can be resolved by the assembler. For example, the assembler flags the instruction XR3:0 = Q[J0 += 3];; because the modifier is not a multiple of 4; this is a static violation.
Dynamic restrictions cannot be resolved by the assembler because these restrictions represent runtime conditions, such as stray pointers. When the processor encounters a dynamic (runtime) violation, an exception is issued when the violation runs through the core. Whatever the case, the proces­sor does not arrive at a deadlock situation, although unpredictable results may be written into registers or memory.
As a dynamic restriction example, examine the instruction XR3:0 = Q[J0 += 4];;. Although this instruction looks correct to the assembler, it may violate hardware restrictions if J0 is not quad aligned. Because the assembler cannot predict what the code will do to J0 up to the point of this instruction, this violation is dynamic, since it occurs at runtime.
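The difference between the two kinds of restriction can be sketched as two checks that run at different times. Both function names are hypothetical, and the sketch assumes that addresses count 32-bit words, so that a quad-aligned address is a multiple of 4.

```python
QUAD_WORDS = 4  # a quad word is four 32-bit words


def static_check(modifier):
    """Assembler-style check: an immediate modifier is known before runtime."""
    return modifier % QUAD_WORDS == 0


def dynamic_check(pointer_value):
    """Runtime-style check: the pointer value is only known when the code runs."""
    return pointer_value % QUAD_WORDS == 0


# XR3:0 = Q[J0 += 3];; fails the static check; XR3:0 = Q[J0 += 4];; passes it.
print(static_check(3), static_check(4))
# The same instruction is then fine or not depending on what J0 holds at runtime.
print(dynamic_check(0x1000), dynamic_check(0x1002))
```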
Further, Table 1-1 and Table 1-2 cover restrictions that arise from the interaction of instructions that share a line, but mostly omit restrictions of single instructions. An example of the former occurs when two instructions attempt to use the same unit in the same line. An example of an individual instruction restriction is an attempt to use a register that is not valid for the instruction. For example, the instruction XR0 = CB[J5+=1];; is illegal because circular buffer accesses can only use IALU registers J0 through J3.
For most instruction types, you can locate the instruction in Table 1-1 or Table 1-2 and read across to find out the resources it uses. Resource usage for data movement instructions is more complicated to analyze. Resource usage for these instructions is calculated by adding together base, source, and destination resources. Base resources are determined by the type of move instruction. Move instructions are Ureg transfer (register to register), immediate load (immediate values to register), memory load (memory to register), and memory store (register to memory). Source resources are determined by the source register and are only applicable when the source itself is a register (Ureg transfer and stores). Destination resources may be of two types:
Address pointer in post-modify (for example, XR0 = [J0 += 2];;)
Destination register—only applicable when the destination is a reg­ister (Ureg transfer, memory loads and immediate loads)
If a particular combination of base, source, and destination uses more resources than are available, that combination is illegal. Consider, for example, the following instruction:
XR3:0 = Q[K31+0x40000];;
This is a memory load instruction, or specifically, a K-IALU load using a 32-bit offset. Reading across the table, the base resources used by the instruction are two slots in the line: the K-IALU instruction and the second instruction slot (for the immediate extension). The destination is XR3:0, which are X-compute block registers. The ‘X-Register File, Dreg = XR31–0’ line under ‘Ureg transfer and Store (Source Register) Resources’ in the table indicates that the instruction also uses an X-compute block port and an X-compute block input port.
The following Ureg transfer instruction provides another example:
XYR0=CJMP;;
This example uses the following resources:
One instruction slot
Base resources—an IALU instruction (no matter whether J-IALU or K-IALU) and the Ureg transfer resource (base resources) for the IALU instruction
Source resources—the sequencer I/O port
Destination resources—an X-compute block port, an X-compute block input port, a Y-compute block port, and a Y-compute block input port
By comparison, the instruction R3:0 = J7:4;; uses an instruction slot, an IALU slot (no matter whether J or K), the Ureg transfer slot, and the J-IALU input port and output port.
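The accounting described in this section amounts to summing resource use over the slots of a line and comparing the totals with what is available. The sketch below models only three resources. The dictionary names and the availability figures (four slots, two IALU instructions, one Ureg transfer or immediate load) are assumptions taken from a reading of the worked examples and restrictions in this chapter, not a transcription of Table 1-1.

```python
from collections import Counter

# Assumed subset of the "Resources Available" row of Table 1-1.
AVAILABLE = {"inst_slots": 4, "ialu_inst": 2, "ureg_xfer_or_imm_load": 1}

# Base resources per instruction, following the worked examples in the text.
UREG_TRANSFER = {"inst_slots": 1, "ialu_inst": 1, "ureg_xfer_or_imm_load": 1}
MEMORY_LOAD = {"inst_slots": 1, "ialu_inst": 1}


def over_subscribed(instructions):
    """Return the resources whose combined use in one line exceeds what is available."""
    used = Counter()
    for resources in instructions:
        used.update(resources)  # adds the counts of each resource
    return sorted(name for name, count in used.items() if count > AVAILABLE[name])


ok_line = over_subscribed([UREG_TRANSFER, MEMORY_LOAD])
two_transfers = over_subscribed([UREG_TRANSFER, UREG_TRANSFER])
three_ialu = over_subscribed([UREG_TRANSFER, MEMORY_LOAD, MEMORY_LOAD])
print(ok_line, two_transfers, three_ialu)
```

A fuller model would add the source and destination port resources for each register group in the same way.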
Table 1-1. Parallelism Rules for Register File, DAB, J/K-IALU, and Port Access Instructions
Resources (columns, in order): Inst. slots used; First inst. slot [1]; Second inst. slot [2]; IALU inst.; Imm. load or Ureg xfer; J-IALU; K-IALU; J-IALU-port I/O; K-IALU-port I/O; X-ports I/O [3]; X-ports input; X-ports output; X-DAB; Y-ports I/O [3]; Y-ports input; Y-ports output; Y-DAB; Seq.-port I/O; Ext. Port I/O; IOP-port I/O; Link Port I/O
Resources Available: 4112111112 2112 21111 3 3
Instruction type    Syntax    Resources used
IALU Arithmetic
J-IALU    Js = Jm Op Jn|Imm8    111
J-IALU, 32-bit immediate    Js = Jm Op Imm32    2111
K-IALU    Ks = Km Op Kn|Imm8    111
K-IALU, 32-bit immediate    Ks = Km Op Imm32    211 1
Data Move (resource total = instr. + Uregs)
Ureg Transfer    Ureg = Ureg    111
Immediate Load (resource total = instr. + Ureg)
Immediate 16-bit Load    Ureg = Imm16    111
Immediate 32-bit Load    Ureg = Imm32    2 111
Table 1-1. Parallelism Rules for Register File, DAB, J/K-IALU, and Port Access Instructions (Cont’d)
Instruction type    Syntax    Resources used
Memory Load (resource total = instr. + Ureg)
J-IALU Load    Ureg = [Jm +|+= Jn|imm8]    1111
J-IALU Load, 32-bit offset    Ureg = [Jm +|+= imm32]    2111
K-IALU Load    Ureg = [Km +|+= Kn|imm8]    111
K-IALU Load, 32-bit offset    Ureg = [Km +|+= imm32]    211 1
Memory Store (resource total = instr. + Ureg)
J-IALU Store    [Jm +|+= Jn|imm8] = Ureg    111
J-IALU Store, 32-bit offset    [Jm +|+= imm32] = Ureg    2111
K-IALU Store    [Km +|+= Kn|imm8] = Ureg    111
K-IALU Store, 32-bit offset    [Km +|+= imm32] = Ureg    211 1
Table 1-1. Parallelism Rules for Register File, DAB, J/K-IALU, and Port Access Instructions (Cont’d)
Instruction type    Syntax    Resources used
Ureg transfer and Store (Source Register) Resources
J-IALU    Ureg = J30–0|JB3–0|JL3–0    1
K-IALU    Ureg = K30–0|KB3–0|KL3–0    1
X-Register File    Dreg = XR31–0    11
Y-Register File    Dreg = YR31–0    11
XY-Register Files (SIMD)    Ureg = XYR31–0    1111
Sequencer [4]    Ureg = CJMP|RETI|RETS|…    1
External Port Control/Status [5]    Ureg = SYSCON|BUSLK|…    1
I/O Processor (DMA) [6]    Ureg = DCS0|DCD0|…    1
Link Port Control/Status/Buf. [7]    Ureg = LCTL0|LCTL1|…    1
Table 1-1. Parallelism Rules for Register File, DAB, J/K-IALU, and Port Access Instructions (Cont’d)
Instruction type    Syntax    Resources used
Ureg Transfer and Load (Destination Register) Resources
J-IALU    Ureg = J30–0|JB3–0|JL3–0    1
K-IALU    Ureg = K30–0|KB3–0|KL3–0    1
X-Register File    Dreg = XR31–0    11
Y-Register File    Dreg = YR31–0    11
XY-Register Files (SIMD)    Ureg = XYR31–0    11 11
Sequencer [4]    Ureg = CJMP|RETI|RETS|…    1
External Port Control/Status [5]    Ureg = SYSCON|BUSLK|…    1
I/O Processor (DMA) [6]    Ureg = DCS0|DCD0|…    1
Link Port Control/Status/Buf. [7]    Ureg = LCTL0|LCTL1|…    1
Table 1-1. Parallelism Rules for Register File, DAB, J/K-IALU, and Port Access Instructions (Cont’d)
Instruction type    Syntax    Resources used
Memory Load Ureg (Destination Register) Resources
X-Register File DAB/SDAB    XDreg = DAB q[addr]; XDreg = XR31–0    11 1
Y-Register File DAB/SDAB    YDreg = DAB q[addr]; YDreg = YR31–0    11 1
XY-Register Files DAB/SDAB    XYDreg = DAB q[addr]; XYDreg = XYR31–0    11 111 1
[1] If a conditional instruction is present on the instruction line, it must use the first instruction slot.
[2] If an immediate extension is present on the instruction line, it must use the second instruction slot.
[3] These resources are listed for informational purposes only. These constraints cannot be exceeded within the core.
[4] Complete list is all registers in register groups 0x1A, 0x38, and 0x39: CJMP, RETI, RETIB, RETS, DBGE, ILATSTL, ILATSTH, LC0, LC1, ILATL, ILATH, IMASKL, IMASKH, PMASKL, PMASKH, TIMER0L, TIMER0H, TIMER1L, TIMER1H, TMRIN0L, TMRIN0H, TMRIN1L, TMRIN1H, SQCTL, SQCTLST, SQCTLCL, SQSTAT, SFREG, ILATCLL, and ILATCLH.
[5] Complete list is all registers in register groups 0x24 and 0x3A: SYSCON, BUSLK, SDRCON, SYSTAT, SYSTATCL, BMAX, BMAXC, AUTODMA0, and AUTODMA1.
[6] Complete list is all registers in register groups 0x20 and 0x23: DCS0, DCD0, DCS1, DCD1, DCS2, DCD2, DCS3, DCD3, DCNT, DCNTST, DCNTCL, CSTAT, and DSTATC.
[7] Complete list is all registers in register groups 0x25 and 0x27: LBUFTX0, LBUFRX0, LBUFTX1, LBUFRX1, LBUFTX2, LBUFRX2, LBUFTX3, LBUFRX3, LCTL0, LCTL1, LCTL2, LCTL3, LSTAT0, LSTAT1, LSTAT2, and LSTAT3.
Table 1-2. Parallelism Rules for Compute Block and Sequencer Instructions
Resources (columns, in order): Inst. slots used; First inst. slot [1]; Second inst. slot [2]; X-Comp Block Inst.; X-ALU; X-Multiplier; X-Shifter; Y-Comp Block Inst.; Y-ALU; Y-Multiplier; Y-Shifter
Resources Available: 41121112111
Instruction type    Syntax    Resources used
Sequencer Instructions
Conditional Jump/Call, 16-bit offset    IF cond, JUMP|CALL Imm16    11
Conditional Jump/Call, 32-bit offset    IF cond, JUMP|CALL Imm32    211
Other Conditionals, Indirect Jumps, Static Flag Ops    1 1
X Compute Block Operations
X-ALU instruction, except quad output    XDreg = Dreg + Dreg    111
X-Multiplier instruction, except quad output    XDreg = Dreg * Dreg    111
X-Shifter instruction, except MASK, FDEP, STAT    111
X-ALU instruction with quad output (add_sub, EXPAND, MERGE)    1111
X-Multiplier instruction with quad output    1 1 1 1
X-Shifter instructions MASK, FDEP, XSTAT    12
Table 1-2. Parallelism Rules for Compute Block and Sequencer Instructions (Cont’d)
Instruction type    Syntax    Resources used
Y Compute Block Operations
Y-ALU instruction, except quad output    YDreg = Dreg + Dreg    111
Y-Multiplier instruction, except quad output    YDreg = Dreg * Dreg    111
Y-Shifter instruction, except MASK, FDEP, STAT    111
Y-ALU instruction with quad output (add_sub, EXPAND, MERGE)    1111
Y-Multiplier instruction with quad output    1 1 1 1
Y-Shifter instructions MASK, FDEP, YSTAT    12
X and Y Compute Block Operations (SIMD)
XY-ALU instruction, except quad output    XYDreg = Dreg + Dreg    11111
XY-Multiplier instruction, except quad output    XYDreg = Dreg * Dreg    1 1111
XY-Shifter instruction, except MASK, FDEP, STAT    11111
XY-ALU instruction with quad output (add_sub, EXPAND, MERGE)    1111111
XY-Multiplier instruction with quad output    1 1 1 1 1 1 1
XY-Shifter instructions MASK, FDEP, X/YSTAT    12 2
[1] If a conditional instruction is present on the instruction line, it must use the first instruction slot.
[2] If an immediate extension is present on the instruction line, it must use the second instruction slot.
General Restriction
There is a general restriction that applies to all types of instructions: Two instructions may not write to the same register. This restriction is checked statically by the assembler. For example:
XR0 = R1 + R2 ; XR0 = R5 * R6 ;; /* Invalid; these instructions cannot be on the same instruction line */
XR1 = R2 + R3 , XR1 = R2 - R3 ;; /* Invalid; add-subtract to the same register */
Consequently, a load instruction may not be targeted to a register that is updated in the same line by another instruction. For example:
XR0 = [J17 + 1] ; R0 = R3 * R8 ;; /* Invalid */
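The same-register rule can be sketched as a check over the destinations written in one line. This is a toy model under stated assumptions: the function names are hypothetical, only simple "dest = expr" slots with a single destination register are handled (no register ranges, stores, or data-type prefixes), and a destination without an X or Y prefix is treated as a write in both compute blocks.

```python
def destinations(slot):
    """Expand the destination of 'dest = expr' into per-compute-block register names."""
    dest = slot.split("=", 1)[0].strip().upper()
    if dest.startswith("XY"):
        return {"X" + dest[2:], "Y" + dest[2:]}
    if dest.startswith("R"):  # no block prefix: issued to both compute blocks
        return {"X" + dest, "Y" + dest}
    return {dest}


def has_write_conflict(line):
    """True if two slots of the line write the same register."""
    seen = set()
    for slot in line:
        targets = destinations(slot)
        if seen & targets:
            return True
        seen |= targets
    return False


same_block = has_write_conflict(["XR0 = R1 + R2", "XR0 = R5 * R6"])
load_and_compute = has_write_conflict(["XR0 = [J17 + 1]", "R0 = R3 * R8"])
different_blocks = has_write_conflict(["XR0 = R1 + R2", "YR0 = R1 + R2"])
print(same_block, load_and_compute, different_blocks)
```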
A load/store instruction that uses post-modify and update addressing cannot load the same register that is used as the index Jm/Km (pointer to memory). For example:
J0 = [J0 += 1] ;; /* Invalid; J0 cannot be used as both destination (Js) and index (Jm) in a post-modify (+=) load or store */
No instruction can write to the CJMP register in the same line as a CALL instruction (which also updates the CJMP register). For example:
if ALE, CALL label ; J6 = J0 + J1 (CJMP) ;; /* Invalid */
There are two types of loop counter updates, and combining them in the same line is illegal. For example:
IF LC0E; DO … ; LC0 = [J0 + J1] ;; /* Invalid */
Compute Block Instruction Restrictions
There are two compute blocks, and instructions can be issued to either or both.
Instructions in the format XRs = Rm op Rn are issued to the X-com­pute block
Instructions in the format YRs = Rm op Rn are issued to the Y-com­pute block
Instructions in the format Rs = Rm op Rn or XYRs = Rm op Rn are issued to both the X- and Y-compute blocks
The following conditions apply when issuing instructions to the compute blocks. Note that the assembler statically checks all of these restrictions.
Up to two instructions can be issued to each compute block (making that a maximum of four compute block instructions in one line). Note, however, that for this rule, the instructions of type Rs = Rm op Rn count as one instruction for each compute block. For example:
R0 = R1 + R2 ; R3 = R4 * R5 ;; /* Valid; a total of four instructions */
XR0 = R1 + R2 ; XR3 = R4 * R5 ; XR6 = LSHIFT R1 BY R7 ;; /* Invalid; three instructions to compute block X */
Only one instruction can be issued to each unit (ALU, multiplier, or shifter) in a cycle. Each of the two instructions must be issued to a different unit (ALU, multiplier or shifter). For example:
XR0 = R1 + R2 ; XR6 = R1 + R2 ;; /* Invalid */
XR0 = R1 + R2 ; YR0 = R1 + R2 ;; /* Valid */
When one of the shifter instructions listed below is executed, it must be the only instruction in that line for the particular compute block. The instructions are: FDEP, MASK, GETBITS, PUTBITS and access to XSTAT/YSTAT registers. For example:
XR0 += MASK R1 BY R2 ; XR6 = R1 + R2 ;; /* Invalid; three operand shifter instruction in same line with an ALU operation; both issued to compute block X */
Only one unit (ALU or multiplier) can use two result buses. A unit uses two result buses either when the result is quad word or when there are two results (dual ADD and SUB instructions, for example R0 = R1+R2, R5 = R1-R2;). Another instruction is allowed in the same line, as long as it is not a shifter instruction. For example:
R0 = R1 + R2 , R5 = R1 - R2 ; XR6 = R1 * R2 ;; /* Valid */
R0 = R1 + R2 , R5 = R1 - R2 ; XR6 = LSHIFT R1 BY R2 ;; /* Invalid; shifter instruction and two result ALU instruction */
R0 = R1 + R2 , R5 = R1 - R2 ; XR3:0 = MR3:0 ;; /* Invalid; two instructions using two buses */
There can be no other compute block instruction with Shifter load/store of X/YSTAT.
In the multiplier, the option CR (clear and set round bit) and the option I (integer, not fractional) may not be used in the same multiply-accumulate instruction.
The CR option of multiplier may be used only in these instructions:
MR3:2|MR1:0 +|-= Rm * Rn    32-bit fractional multiply-accumulate
MR3:0 +|-= Rmd * Rnd    Quad 16-bit fractional multiply-accumulate
MR3:2|MR1:0 += Rm ** Rn    Complex multiply-accumulate
Communications Logic Unit (CLU) register load instructions have the same restrictions as shifter instructions, with one exception—a CLU register load instruction can be executed in the same instruction line with another compute instruction that has a quad result.
All CLU instructions, except for load of CLU registers, follow the same rules as compute ALU instructions.
IALU Instruction Restrictions
There are four types of IALU instructions:
Memory load/store—for example: R0 = [J0 + 1] ;
IALU operations—for example: J0 = J1 + J2 ;
Load data—for example: R1 = 0xABCD ;
Ureg transfer—for example: XR0 = YR0 ;
These restrictions apply when issuing instructions to the IALU. Except for the load data restriction, the assembler flags all of these restrictions.
Up to one J-IALU and up to one K-IALU instruction can be issued in the same instruction line. For example:
R0 = [J0 += 1] ; R1 = [K0 += 1] ;; /* It’s recommended that J0 and K0 point to different memory blocks to avoid stall */
[J0 += 1] = XR0 ; [K0 += 1] = YR0 ;;
J0 = [J5 + 1] ; XR0 = [K6 + 1] ;;
R1 = 0xABCD ; R0 = [J0 += 1] ;; /* One load data instruction (in K-IALU) and one J-IALU operation */
XR0 = YR0 ; XR1 = [J0 += 1] ; YR1 = [K0 += 1] ;; /* Invalid; three IALU instructions */
XR0 = [J0 + 1] ; YR0 = [J1 + 1] ;; /* Invalid; both use the same IALU (J-IALU) */
XR0 = [J0 + 1] ; J5 = J1 + 1 ;; /* Invalid; both use the same IALU (J-IALU) */
Two accesses to the same memory address in the same line, when one of them is a store instruction, are liable to give unpredictable results.
Loading from external memory is only allowed to the compute block and IALU register files.
Reading from a multiprocessing broadcast zone is illegal.
Move register to register instruction: if one of the registers is a merged compute block register, the other may not be a compute block register. For example:
XYR1:0 = XR11:8 ; /* Invalid */
XR11:8 = XYR1:0 ; /* Invalid */
XYR1:0 = J11:8 ; /* Valid */
J11:8 = XYR1:0 ; /* Valid */
A line of instructions may contain at most one of either “load immediate data to register” or “Ureg to Ureg transfer” instructions. For example:
XR0 = YR0 ;; /* Valid */
XR5 = YR5 ; YR8 = [J3 + J6] ;; /* Valid */
R0 = 0xFFFFFFFF ;; /* Valid; one load immediate data and one immediate extension */
XR0 = YR0 ; J5 = 0xFFFF ;; /* Invalid; one Ureg to Ureg transfer and one load immediate data instruction */
XR0 = YR0 ; J0 = XR1 ;; /* Invalid; two Ureg to Ureg transfers */
R0 = 0xFFFF ; J1 = 0xFF ;; /* Invalid; two load immediate data instructions */
Access via DAB must be through a quad word load. It cannot be via “merged” Ureg groups. For example:
R3:0 = DAB Q[J0 += 4] ;; /* Valid; broadcast */
R1:0 = DAB Q[J0 += 4] ;; /* Invalid; merged */
DAB and circular buffer access to memory is allowed only with post-modify with update. For example:
XR1:0 = CB L[J2 + 2] ;; /* Invalid */
Register groups 0x20 to 0x3F can be accessed via Ureg transfer only.
In a register-to-register move, an XY register may not be used as source or destination of the transaction, unless it is both source and destination. For example:
R1:0 = R11:10 ;; /* Valid */
J1:0 = R11:10 ;; /* Invalid */
R3:0 = J3:0 ;; /* Invalid */
There can be up to two load instructions to the same compute block register file or up to one load to and one store from the same compute block register file. (A compute block register file has one input port and one input/output port.) If two store instructions are issued, neither of them is executed. For example:
[J0 + 1] = XR0 ; [K0 + 1] = XR1 ;; /* Invalid; attempts to use two output ports */
R0 = [J0 + 1] ; R1 = [K1 + 1] ;; /* Valid; uses two input ports in compute block X and Y */
R0 = [J0 + 1] ; [K1 + 1] = XR1 ;; /* Valid */
A Ureg transfer within the same compute register file cannot be used with any other store to that register file. For example:
XR3:0 = R7:4 ; [J17 + 2] = YR4 ;; /* Valid; different register files */
XR3:0 = R7:4 ; XR0 = [J17 + 2] ;; /* Valid; one Ureg trans. and one load to compute block X */
XR3:0 = R7:4 ; [J17 + 2] = XR4 ;; /* Invalid; one Ureg transfer and one store from compute block X */
R3:0 = R31:28 ;; /* Valid—SIMD Ureg transfer */
R3:0 = R31:28 ; [J17 + 2] = YR8;; /* Invalid—SIMD Ureg transfer (in both RFs) and store from compute block Y */
Only one DAB load per Compute Block is allowed. For example:
XR3:0 = DAB Q[J0 += 4] ; XR7:4 = DAB Q[K0 += 4] ;; /* Invalid */
XR3:0 = DAB Q[J0 += 4] ; YR7:4 = DAB Q[K0 += 4] ;; /* Valid */
Only one memory load/store to and from the same single port register files is allowed. The single port register files are:
J-IALU registers: groups 0xC and 0xE
K-IALU registers: groups 0xD and 0xF
Bus Control registers: groups 0x24 and 0x3A
Sequencer, Interrupt and BTB registers: groups 0x1A, 0x30–0x39, and 0x3B
Debug logic registers: groups 0x1B, 0x3D–0x3F
For example:
J0 = [J5 + 1] ; K0 = [K6 + 1] ;; /* Valid */
J0 = [J5 + 1] ; [K6 + 1] = K0 ;; /* Valid */
J0 = [J5 + 1] ; [K6 + 1] = J1 ;; /* Invalid; one load to J-IALU register file and one store from J-IALU register file */
Access to memory must be aligned to its size. For example, quad word access must be quad-word aligned. The long access must be aligned to an even address. This excludes load to compute block via DAB. In addition, the immediate address modifier must be a multiple of four in quad accesses and of two in long accesses. For example:
XR3:0 = Q[J0 += 3] ;; /* Invalid */
XR3:0 = Q[J0 += 4] ;; /* Valid */
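The alignment rule can be modeled in a few lines of Python. This is a minimal sketch, not assembler or hardware behavior: the function name `access_is_aligned`, the size labels, and the assumption that addresses count 32-bit words are all illustrative, and DAB loads (which the rule excludes) are not modeled.

```python
# Number of 32-bit words covered by each access type.
# The labels "normal", "L" and "Q" are hypothetical names for this sketch.
ACCESS_WORDS = {"normal": 1, "L": 2, "Q": 4}


def access_is_aligned(address, modifier, size="normal"):
    """Return True if both the address and the immediate modifier are
    multiples of the access size, measured in 32-bit words (assumed)."""
    words = ACCESS_WORDS[size]
    return address % words == 0 and modifier % words == 0


# Models XR3:0 = Q[J0 += 4] with J0 assumed to hold 0x1000.
print(access_is_aligned(0x1000, 4, "Q"))
# Models XR3:0 = Q[J0 += 3], which the text marks as invalid.
print(access_is_aligned(0x1000, 3, "Q"))
```

A check like this is only useful for immediate modifiers and known addresses; register-based modifiers would have to be checked at run time.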
A Ureg store instruction and an instruction that updates the same Ureg may not be issued in the same instruction line, because the store instruction may be stalled and by the time it progresses, the contents may have been modified by the update instruction. For example:
XR0 = R1 + R3 ; Q[J7 += 4] = XR3:0 ;; /* Invalid */
IF ALE, CALL label ; [J0 += 1] = CJMP ;; /* Invalid; CJMP is updated by the call instruction */
For the following J-IALU circular buffer or bit-reversed addressing operations, Jm (the index) may only be J0, J1, J2, or J3:
Js = Jm +|- Jn (CB)
Ureg = CB [L] [Q] (Jm +|+= Jn|Imm)
CB [L] [Q] (Jm +|+= Jn|Imm) = Ureg
Ureg = DAB [L] [Q] (Jm +|+= Jn|Imm)
Ureg = BR [L] [Q] (Jm +|+= Jn|Imm)
BR [L] [Q] (Jm +|+= Jn|Imm) = Ureg
The same restrictions apply to K-IALU instructions that use circular buffer or bit-reversed addressing operations.
On load or store instructions, the memory address may not be a register. That is, the address may not be a memory-mapped register address in the range of 0x180000 to 0x1FFFFF. For example:
Q[J2 + 0] = XR3:0 ;; /* Invalid if J2 is in the range of 0x180000 to 0x1FFFFF */
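Because this restriction depends on a run-time register value, it can be useful to check candidate addresses on the host before generating code. The following is a small sketch under that assumption; the constant and function names are hypothetical, and only the range quoted in the text is used.

```python
# Address range of the memory-mapped registers, as quoted in the text.
REGISTER_RANGE_FIRST = 0x180000
REGISTER_RANGE_LAST = 0x1FFFFF


def is_register_address(address):
    """Return True if the address falls in the memory-mapped register range,
    where a load or store address is not allowed to point."""
    return REGISTER_RANGE_FIRST <= address <= REGISTER_RANGE_LAST


print(is_register_address(0x180000))  # lowest address in the range
print(is_register_address(0x080000))  # an address below the range
```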
If one IALU is used to access the other IALU register, there may not be an immediate load instruction in the same line. For example:
Q[J2 + 0] = K3:0 ; XR0 = 100 ;; /* Invalid */
Q[K2 + 0] = K3:0 ; XR0 = 100 ;; /* Valid */
Sequencer Instruction Restrictions
There can be one sequencer instruction and one immediate extension per line, where the sequencer instruction can be jump, indirect jump, and other instructions. The assembler statically checks all of these restrictions:
The sequencer instruction must be the first instruction in the four-slot instruction line.
The immediate extension must be the second instruction in the four-slot instruction line.
The immediate extension is counted as one of the four instructions in the line.
There cannot be two instruction lines that end within the same quad-word boundary when both contain branch instructions with the predicted bit set. For example:
IF MLE, JUMP + 100 ;; /* begin address 100 */
IF NALE JUMP -50 ; XR0 = R5 + R6 ; J0 = J2 + J3 ; YR4 = [K3 + 40] ;; /* Valid; first instruction line ends on 1001; second instruction line ends on 1005 */
IF MLE, JUMP + 100 ;; /* begin address 100 */
IF NALE JUMP - 50 ;; /* Invalid; both lines within the same quad word */
For instruction SCFx += op Cond, there can be no operation between compute block static flags (XSF0/1, YSF0/1, and XYSF0/1) and non-compute block conditions.
2 COMPUTE BLOCK REGISTERS
The TigerSHARC processor core contains two compute blocks. Each compute block contains a register file and three independent computation units—an ALU, a multiplier, and a shifter. Because the execution of all computational instructions in the TigerSHARC DSP depends on the input and output data formats and on whether the instruction is executed on one compute block or both, it is important to understand how to use the TigerSHARC DSP’s compute block registers. This chapter describes the registers in the compute blocks, shows how the register name syntax controls data format and execution location, and defines the available data formats.
The DSP has two compute blocks—compute block X and compute block Y. Each block contains a register file and three independent computation units. The units are the ALU, multiplier, and shifter.
A general-purpose, multiport, 32-word data register file in each compute block transfers data between the computation units and the data buses and stores intermediate results. Figure 2-1 shows how each of the register files provides the interface between the internal buses and the computational units within the compute blocks.
As shown in Figure 2-1, data input to the register file passes through the data alignment buffer (DAB). The DAB is a two quad-word FIFO that provides aligned data for registers when dual- or quad-register loads receive misaligned data from memory. For more information on using the DAB, see “IALU” on page 6-1.
[Figure 2-1 (block diagram): compute block X and compute block Y each contain a DAB, a 32x32 register file, a multiplier, an ALU, and a shifter. The diagram labels 128-bit and 64-bit paths, including the connections to the data buses.]
Figure 2-1. Data Register Files in Compute Block X and Y
Within the compute block, there are two types of registers—memory-mapped registers and non-memory-mapped registers. The memory mapped registers in each of the compute blocks are the general-purpose data register file registers XR31–0 and YR31–0. Because these registers are memory mapped, they are accessible to external bus devices.
For operations within a single DSP, the distinction between memory-mapped and non-memory-mapped compute block registers is important because the memory-mapped registers are Universal registers (Ureg). The Ureg group of registers is available for many types of operations working with portions of the DSP’s core other than the portion of the core where the Ureg resides. The compute block Ureg registers can be used for additional operations unavailable to other Ureg registers. To distinguish the compute block register file registers from other Ureg registers, the XR31–0 and YR31–0 registers are also referred to as Data registers (Dreg).
For operations in a multiprocessing DSP system, it is very useful that 90% of the registers in the TigerSHARC processor are memory-mapped registers. The memory-mapped registers have absolute addresses associated with them, meaning that they can be accessed by other processors through multiprocessor space or accessed by any other bus masters in the system.
A DSP can access its own registers by using the multiprocessor memory space, but the DSP would have to tie up the external bus to access its own registers this way.
The compute blocks have a few registers that are non-memory mapped. These registers do not have absolute addresses associated with them. The non-memory-mapped registers are special registers that are dedicated for special instructions in each compute block. The unmapped registers in the compute blocks include:
Compute block status (XSTAT and YSTAT) registers
Parallel Result (XPR1–0 and YPR1–0) registers—ALU
Multiplier Result (XMR3–0 and YMR3–0) registers—Multiplier
Multiplier Result Overflow (XMR4 and YMR4) registers—Multiplier
Bit FIFO Overflow Temporary (XBFOTMP and YBFOTMP) registers—Shifter
[Figure 2-2 (bit-field diagram of bits 31–16; every bit is shown as 0). Fields named in the figure:]
MIS—Multiplier floating-pt. invalid op., sticky
MOS—Multiplier fixed-pt. overflow, sticky
MVS—Multiplier floating-pt. overflow, sticky
MUS—Multiplier floating-pt. underflow, sticky
AIS—ALU floating-pt. invalid op., sticky
AOS—ALU fixed-pt. overflow, sticky
AVS—ALU floating-pt. overflow, sticky
AUS—ALU floating-pt. underflow, sticky
IVEN—Invalid enable
OEN—Overflow enable
UEN—Underflow enable
Reserved (two fields)
Figure 2-2. XSTAT/YSTAT (Upper) Register Bit Descriptions
The non-memory-mapped registers serve special purposes in each compute block. The X/YSTAT registers (shown in Figure 2-2 and Figure 2-3) hold the status flags for each compute block. These flags are set or reset to indicate the status of an instruction’s execution in a compute block’s ALU, multiplier, and shifter. The X/YPR1–0 registers hold parallel results from the ALU’s SUM, ABS, VMAX, and VMIN instructions. The X/YMR3–0 registers optionally hold results from fixed-point multiply operations, and the X/YMR4 register holds overflow from those operations. The X/YBFOTMP registers temporarily store or return overflow from GETBITS and PUTBITS instructions.
[Figure 2-3 (bit-field diagram of bits 15–0; every bit is shown as 0). Fields named in the figure:]
AZ—ALU zero
AN—ALU negative
AV—ALU overflow
AC—ALU carry
MZ—Multiplier zero
MN—Multiplier negative
MV—Multiplier overflow
MU—Multiplier underflow
SZ—Shifter zero
SN—Shifter negative
BF—Block floating-point flags
AI—ALU floating-point invalid operation
MI—Multiplier floating-point invalid operation
TROV—Trellis overflow
TRSOV—Trellis overflow, sticky
Figure 2-3. XSTAT/YSTAT (Lower) Register Bit Descriptions
Register File Registers
The compute block X and Y register files contain thirty-two 32-bit registers, which serve as a compute block’s interface between DSP internal bus and the computational units. The register file registers—XR31–0 and YR31–0—are both universal registers (Ureg) and data registers (Dreg).
All inputs for computations come from the register file and all results are sent to the register file, except for fixed-point multiplies which can optionally be sent to the MR3–0 registers.
It is important to note that a register may be used once in an instruction slot, but the assembly syntax permits using registers multiple times within an instruction line (which contains up to four instruction slots). The register file registers are hardware interlocked, meaning that there is dependency checking during each computation to make sure the correct values are being used. When a computation accesses a register, the DSP performs a register check to make sure there are no other dependencies on that register. For more information on instruction lines and dependencies, see “Instruction Line Syntax and Structure” on page 1-20 and “Instruction Parallelism Rules” on page 1-24.
There are many ways to name registers in the TigerSHARC DSP’s assembly syntax. The register name syntax provides selection of many features of computational instructions. Using the register name syntax in an instruction, you can specify:
Compute block selection
Register width selection
Operand size selection
Data format selection
Figure 2-4 shows the parts of the register name syntax and the features that the syntax selects.
[Figure 2-4 (annotated diagram of the register name template ___R_). Labels in the figure:]
Register name
Register width selection (# or #:#)
Fixed- or floating-point data format selection (none or F)
Operand size selection (none, L, S, or B)
Compute block selection (none, X, Y, or XY)
{for result registers only}
Figure 2-4. Register File Register Name Syntax
The DSP’s assembly syntax also supports selection of integer or fractional and real or complex data types. These selections are provided as options to instructions and are not part of register file register name syntax.
Compute Block Selection
As shown in Figure 2-4, the assembly syntax for naming registers lets you select the compute block of the register with which you are working.
The X and Y register-name prefixes denote in which compute block the register resides: X = compute block X only, Y = compute block Y only, and XY (or no prefix) = both. The following ALU instructions provide some register name syntax examples.
XR0 = R1 + R2 ;; /* This instruction executes in block X */
This instruction uses registers XR0, XR1, and XR2.
YR1 = R5 + R6 ;; /* This instruction executes in block Y */
This instruction uses registers YR1, YR5, and YR6.
XYR0 = R0 + R2 ;; /* This instruction executes in block X & Y */
This instruction uses registers XR0, XR2, YR0, and YR2.
R0 = R22 + R3 ;; /* This instruction executes in block X & Y */
This instruction uses registers XR0, XR22, XR3, YR0, YR22, and YR3.
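The prefix expansion in the four examples above can be reproduced with a short Python model. This is an illustrative sketch only; `registers_used` is a hypothetical helper, not part of any Analog Devices tool, and it takes the register numbers as a plain list rather than parsing an instruction.

```python
def registers_used(prefix, numbers):
    """Expand a compute block prefix ('X', 'Y', 'XY' or '' for none) into the
    concrete register names an instruction touches."""
    blocks = {"X": ["X"], "Y": ["Y"], "XY": ["X", "Y"], "": ["X", "Y"]}[prefix]
    return [f"{block}R{n}" for block in blocks for n in numbers]


# Models R0 = R22 + R3 ;; (no prefix, so both compute blocks execute it).
print(registers_used("", [0, 22, 3]))
# ['XR0', 'XR22', 'XR3', 'YR0', 'YR22', 'YR3']

# Models XR0 = R1 + R2 ;; (compute block X only).
print(registers_used("X", [0, 1, 2]))
# ['XR0', 'XR1', 'XR2']
```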
Because the compute block prefix lets you select between executing the instruction in one or both compute blocks, this prefix provides the selection between Single-Instruction, Single-Data (SISD) execution and Single-Instruction, Multiple-Data (SIMD) execution. Using SIMD execution is a powerful way to optimize execution if the same algorithm is being used to process multiple channels of data.
It is important to note that SISD and SIMD are not modes that are turned on or off with some latency in the change. SISD and SIMD execution are always available as execution options simply through register name selection.
To represent optional items, instruction syntax definitions use curly braces { } around the item. To represent choices between items, instruction syntax definitions place a vertical bar | between items. The following syntax definition example and comparable instruction indicate the difference for compute block selection:
{X|Y|XY}Rs = Rm + Rn ;; /* the curly braces enclose options */ /* the vertical bars separate choices */
XYR0 = R1 + R0 ;; /* code, no curly braces — no vertical bars */
Register Width Selection
As shown in Figure 2-4 on page 2-7, the assembly syntax for naming registers lets you select the width of the register with which you are working.
Each individual register file register (XR31–0 and YR31–0) is 32 bits wide.
To support data sizes larger than a 32-bit word, the DSP’s assembly syntax lets you combine registers to hold larger words. The register name syntax for register width works as follows:
Rs, Rm, or Rn indicates a Single register containing a 32-bit word (or smaller).
For example, these are register names such as R1, XR2, and so on.
Rsd, Rmd, or Rnd indicates a Double register containing a 64-bit word (or smaller).
For example, these are register names such as R1:0, XR3:2, and so on. The lower register must be evenly divisible by two.
Rsq, Rmq, or Rnq indicates a Quad register containing a 128-bit word (or smaller).
For example, these are register names such as R3:0, XR7:4, and so on. The lowest register must be evenly divisible by 4.
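The width rules above (single, double with an even lower register, quad with a lower register divisible by four) can be checked mechanically. The sketch below is a model of those three rules only; the function name and the high/low argument convention are assumptions made for illustration.

```python
def is_valid_register_group(high, low):
    """Check a register name of the form R<high>:<low>.
    A single register is written here as high == low."""
    if not 0 <= low <= high <= 31:
        return False
    count = high - low + 1
    if count not in (1, 2, 4):
        return False  # only single, double and quad registers exist
    return low % count == 0  # lower register must be divisible by the width


print(is_valid_register_group(1, 0))  # models R1:0
print(is_valid_register_group(7, 4))  # models XR7:4
print(is_valid_register_group(2, 1))  # models R2:1, lower register is odd
```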
The combination of italic and code font in the register name syntax above indicates a user-substitutable value. Instruction syntax definitions use this convention to represent multiple register names. The following syntax definition example and comparable instruction indicates the difference for register width selection.
{X|Y|XY}Rsd = Rmd + Rnd ;; /* replaceable register names, italics are variables */
XR1:0 = R3:2 + R1:0 ;; /* code, no substitution */
Operand Size and Format Selection
As shown in Figure 2-4 on page 2-7, the assembly syntax for naming registers lets you select the operand size and fixed- or floating-point format of the data placed within the register with which you are working.
Single, double, and quad register file registers (Rs, Rsd, Rsq) hold operands (inputs and outputs) for instructions. Depending on the operand size and fixed- or floating-point format, there may be more than one operand in a register.
To select the operand size within a register file register, a register name prefix selects a size that is equal to or less than the size of the register. These operand size prefixes for fixed-point data work as follows.
B — Indicates Byte (8-bit) word data. The data in a single 32-bit register is treated as four 8-bit words. Example register names with byte word operands are BR1, BR1:0, and BR3:0.
S — Indicates Short (16-bit) word data. The data in a single 32-bit register is treated as two 16-bit words. Example register names with short word operands are SR1, SR1:0, and SR3:0.
None — Indicates Normal (32-bit) word data. Example register names with normal word operands are R0, R1:0, and R3:0.
L — Indicates Long (64-bit) word data. An example register name with a long word operand is LR1:0.
The B, S, and L options apply for ALU and Shifter operations. Operand size selection differs slightly for the multiplier. For more information, see “Multiplier Operations” on page 4-4.
To distinguish between fixed- and floating-point data, the register name prefix F indicates that the register contains floating-point data. The DSP supports the following floating-point data formats.
None — Indicates fixed-point data
FRs, FRm, or FRn (floating-point data in a single register) — Indicates normal (IEEE format, 32-bit) word data. An example register name with a normal word, floating-point operand is FR3.
FRsd, FRmd, or FRnd (floating-point data in a double register) — Indicates extended (40-bit) word data. An example register name with an extended word, floating-point operand is FR1:0.
It is important to note that the operand size influences the execution of the instruction. For example, SRsd = Rmd + Rnd;; is an addition of four short data operands, stored in two register pairs. An example of this type of instruction follows and has the results shown in Figure 2-5.
SR1:0 = R31:30 + R25:24;;
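[Figure 2-5 (diagram of registers R31:30, R25:24, and R1:0, each split into a high register and a low register with fields [31:16] and [15:0]). Each 16-bit field of the result is the sum of the corresponding input fields:]
R1[31:16] = R31[31:16] + R25[31:16]
R1[15:0] = R31[15:0] + R25[15:0]
R0[31:16] = R30[31:16] + R24[31:16]
R0[15:0] = R30[15:0] + R24[15:0]
Figure 2-5. Addition of Four Short Word Operands in Double Registers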
As shown in Figure 2-5, this instruction executes the operation on all 64 bits in this example. The operation is executed on every group of 16 bits separately.
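The field-by-field behavior can be modeled in Python by treating each register pair as one 64-bit integer. This is a sketch under stated assumptions: the helper name is hypothetical, the input values are made up, and overflow within a field simply wraps here, whereas the real ALU's overflow handling (for example saturation) is not modeled.

```python
MASK16 = 0xFFFF


def add_short_fields(a, b, width=64):
    """Add two packed values 16 bits at a time.
    Carries never cross a field boundary (wraparound assumed in this model)."""
    result = 0
    for shift in range(0, width, 16):
        field_sum = ((a >> shift) & MASK16) + ((b >> shift) & MASK16)
        result |= (field_sum & MASK16) << shift
    return result


# Hypothetical register contents: high register in bits 63-32, low in bits 31-0.
r31_30 = 0x0001_0002_0003_FFFF
r25_24 = 0x0010_0020_0030_0001
r1_0 = add_short_fields(r31_30, r25_24)
print(f"{r1_0:016x}")  # 0011002200330000
```

The lowest field shows why the operation is not a plain 64-bit add: 0xFFFF + 0x0001 wraps to 0x0000 without carrying into the neighboring field.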
Register File Syntax Summary
Data register file registers are used in computational instructions and memory load/store instructions. The syntax for those instructions is described in:
“ALU” on page 3-1
“Multiplier” on page 4-1
“Shifter” on page 5-1
The following ALU instruction syntax description shows the conventions that all syntax descriptions use for data register file names:
{X|Y|XY}{F}Rsd = Rmd + Rnd ;;
Where:
{X|Y|XY} — The X, Y, or XY (none is same as XY) prefix on the register name selects the compute block or blocks to execute the instruction. The curly braces around these items indicate they are optional, and the vertical bars indicate that only one may be chosen.
{F} — The F prefix on the register name selects floating-point format for the operation. Omitting the prefix selects fixed-point format.
Rsd — The result is a double register as indicated by the d. The register name takes the form R#:#, where the lower number is evenly divisible by two (as in R1:0).
Rmd, Rnd — The inputs are double registers. The m and n indicate that these must be different registers.
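The naming conventions summarized above can be captured in a small parser. The following Python sketch is an illustration, not the assembler's grammar: the regular expression, the function name `parse_register`, and the returned field names are assumptions, and the pattern accepts any combination of size and F prefixes without checking which combinations the assembler actually allows.

```python
import re

# Optional block prefix, optional operand size, optional F, then R# or R#:#.
NAME = re.compile(r"^(XY|X|Y)?([BSL])?(F)?R(\d+)(?::(\d+))?$")


def parse_register(name):
    """Split a register file register name into the selections it encodes."""
    match = NAME.match(name)
    if match is None:
        raise ValueError(f"not a register file register name: {name}")
    block, size, fmt, high, low = match.groups()
    high = int(high)
    low = high if low is None else int(low)
    if fmt:
        # Floating point: normal word in a single register, extended in a double.
        operand_bits = 32 if high == low else 40
    else:
        operand_bits = {"B": 8, "S": 16, None: 32, "L": 64}[size]
    return {
        "blocks": block or "XY",  # no prefix means both compute blocks
        "operand_bits": operand_bits,
        "floating_point": fmt == "F",
        "register_bits": 32 * (high - low + 1),
    }


print(parse_register("XSR3:2"))
print(parse_register("FR3"))
```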
Here are some examples of register naming. In Figure 2-6, the register name XBR3 indicates the operation uses four fixed-point 8-bit words in the X compute block R3 data register. In Figure 2-7, the register name XSR3 indicates the operation uses two fixed-point 16-bit words in the X compute block R3 data register. In Figure 2-8, the register name XR3 indicates the operation uses one fixed-point 32-bit word in the X compute block R3 data register. In Figure 2-8, the register name XFR3 indicates floating-point data.
XBR3 (Byte): four 8-bit fields in bits 31–24, 23–16, 15–8, and 7–0
Figure 2-6. Register R3 in Compute Block X, Treated as Byte Data
XSR3 (Short): two 16-bit fields in bits 31–16 and 15–0
Figure 2-7. Register R3 in Compute Block X, Treated as Short Data
XR3 or XFR3 (Normal): one 32-bit field in bits 31–0
Figure 2-8. Register R3 in Compute Block X, Treated as Normal Data
Here are additional examples of register naming. Figure 2-9, Figure 2-10, and Figure 2-11 show examples of operand size in double registers, which are similar to the examples in Figure 2-6, Figure 2-7, and Figure 2-8.
XBR3:2 (Byte): eight 8-bit fields in bits 63–56, 55–48, 47–40, 39–32, 31–24, 23–16, 15–8, and 7–0
Figure 2-9. Register R3:2 in Compute Block X, Treated as Byte Data
XSR3:2 (Short): four 16-bit fields in bits 63–48, 47–32, 31–16, and 15–0
Figure 2-10. Register R3:2 in Compute Block X, Treated as Short Data
XR3:2 (Normal): two 32-bit fields in bits 63–32 and 31–0
Figure 2-11. Register R3:2 in Compute Block X, Treated as Normal Data
The examples in Figure 2-12 and Figure 2-13 refer to two registers, but hold a single data word.
XFR3:2 (Extended): bits 63–40 not used; one 40-bit field in bits 39–0
Figure 2-12. Register R3:2 in Compute Block X, Treated as Extended (Floating-Point) Data
XLR3:2 (Long): one 64-bit field in bits 63–0
Figure 2-13. Register R3:2 in Compute Block X, Treated as Long Data
Numeric Formats
The DSP supports the 32-bit single-precision floating-point data format defined in the IEEE Standard 754/854. In addition, the DSP supports a 40-bit extended-precision version of the same format with eight additional bits in the mantissa. The DSP also supports 8-, 16-, 32-, and 64-bit fixed-point formats—fractional and integer—which can be signed (two’s-complement) or unsigned.
IEEE Single-Precision Floating-Point Data Format
IEEE Standard 754/854 specifies a 32-bit single-precision floating-point format, shown in Figure 2-14. A number in this format consists of a sign bit s, a 24-bit significand, and an 8-bit unsigned-magnitude exponent e.
For normalized numbers, the significand consists of a 23-bit fraction f and a hidden bit of 1 that is implicitly presumed to precede f22 in the significand. The binary point is presumed to lie between this hidden bit and f22. The least significant bit (LSB) of the fraction is f0; the LSB of the exponent is e0.
The hidden bit effectively increases the precision of the floating-point significand to 24 bits from the 23 bits actually stored in the data format. This bit also ensures that the significand of any number in the IEEE normalized number format is always greater than or equal to 1 and less than 2.
The unsigned exponent e can range between 1 ≤ e ≤ 254 for normal numbers in the single-precision format. This exponent is biased by +127 (254/2). To calculate the true unbiased exponent, 127 must be subtracted from e.
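[Figure 2-14 (bit-field diagram of FRs): bit 31 holds the sign s, bits 30–23 hold the exponent e7–e0, and bits 22–0 hold the fraction f22–f0. The hidden bit and the binary point precede f22, giving a significand of the form 1.f.]
Figure 2-14. IEEE 32-Bit Single-Precision Floating-Point Format (Normal Word)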
The IEEE standard also provides for several special data types in the single-precision floating-point format:
An exponent value of 255 (all ones) with a nonzero fraction is a Not-A-Number (NAN). NANs are usually used as flags for data flow control, for the values of uninitialized variables, and for the results of invalid operations such as 0 ∗ ∞.
Infinity is represented as an exponent of 255 and a zero fraction. Note that because the format includes a sign bit, both positive and negative Infinity can be represented.
Zero is represented by a zero exponent and a zero fraction. As with Infinity, both positive zero and negative zero can be represented.
The IEEE single-precision floating-point data types supported by the DSP and their interpretations are summarized in Table 2-1.
Table 2-1. IEEE Single-Precision Floating-Point Data Types
Type       Exponent       Fraction   Value
NAN        255            Nonzero    Undefined
Infinity   255            0          (–1)^s Infinity
Normal     1 ≤ e ≤ 254    Any        (–1)^s (1.f22–0) 2^(e–127)
Zero       0              0          (–1)^s Zero
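The rows of Table 2-1 translate directly into a decoder. The Python sketch below is a host-side model for illustration; `decode_single` is a hypothetical name, and patterns with a zero exponent and nonzero fraction (denormals) are returned as signed zero here as a modeling choice, which may not match the sign the hardware produces when it flushes them.

```python
import math


def decode_single(bits):
    """Interpret a 32-bit pattern using the sign, exponent and fraction
    fields of the normal word format."""
    sign = (bits >> 31) & 0x1
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    if e == 255:
        # Nonzero fraction is a NAN, zero fraction is signed Infinity.
        return math.nan if f else (-1) ** sign * math.inf
    if e == 0:
        # Zero; denormal patterns are treated as zero in this model.
        return (-1) ** sign * 0.0
    # Normal number: hidden bit of 1, exponent biased by +127.
    return (-1) ** sign * (1 + f / 2**23) * 2.0 ** (e - 127)


print(decode_single(0x3F800000))  # 1.0
print(decode_single(0xC0400000))  # -3.0
print(decode_single(0x7F800000))  # inf
```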
The TigerSHARC processor is compatible with the IEEE single-precision floating-point data format in all respects, except for:
The TigerSHARC processor does not provide inexact flags.
NAN inputs generate an invalid exception and return a quiet NAN.
Denormal operands are flushed to zero when input to a computation unit and do not generate an underflow exception. Any denormal or underflow result from an arithmetic operation is flushed to zero and an underflow exception is generated.
Round-to-nearest and round-towards-zero are supported. Round-to-±infinity modes are not supported.
Extended Precision Floating-Point Format
The extended precision floating-point format is 40 bits wide, with the same 8-bit exponent as in the standard format but with a 32-bit significand. This format is shown in Figure 2-15. In all other respects, the extended floating-point format is the same as the IEEE standard format.
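[Figure 2-15 (bit-field diagram of FRsd): bit 39 holds the sign s, bits 38–31 hold the exponent e7–e0, and bits 30–0 hold the fraction f30–f0. The hidden bit and the binary point precede f30, giving a significand of the form 1.f.]
Figure 2-15. 40-Bit Extended-Precision Floating-Point Format (Extended Word)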
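The extended word can be decoded the same way as the normal word, with the field positions moved to match Figure 2-15. The following is a minimal Python sketch for normal numbers only; `decode_extended` is a hypothetical helper, and the special exponent values 0 and 255 are rejected rather than modeled.

```python
def decode_extended(bits):
    """Interpret a 40-bit pattern: sign in bit 39, exponent in bits 38-31,
    fraction in bits 30-0 (normal numbers only in this sketch)."""
    sign = (bits >> 39) & 0x1
    e = (bits >> 31) & 0xFF
    f = bits & 0x7FFFFFFF
    if e in (0, 255):
        raise ValueError("zero, NAN and Infinity are not modeled here")
    # Hidden bit of 1 plus a 31-bit fraction gives the 32-bit significand.
    return (-1) ** sign * (1 + f / 2**31) * 2.0 ** (e - 127)


print(decode_extended(127 << 31))                            # 1.0
print(decode_extended((1 << 39) | (128 << 31) | (1 << 30)))  # -3.0
```

Because the sign and exponent widths are unchanged, a 32-bit normal word shifted left by eight bits lands on the same fields, so `decode_extended(0x3F800000 << 8)` also evaluates to 1.0 in this model.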
Fixed-Point Formats
The DSP supports fixed-point fractional and integer formats for 16-, 32-, and 64-bit data. In these formats, numbers can be signed (two’s-complement) or unsigned. The possible combinations are shown in Figure 2-20 through Figure 2-27. In the fractional format, there is an implied binary point to the left of the most significant magnitude bit. In integer format, the binary point is understood to be to the right of the LSB. Note that the sign bit is negatively weighted in a two’s-complement format.
The DSP supports a fixed-point, signed, integer format for 8-bit data. Data in the 8- and 16-bit formats is always packed in 32-bit registers as follows—a single register holds four 8-bit or two 16-bit words, a dual register holds eight 8-bit or four 16-bit words, and a quad register holds sixteen 8-bit or eight 16-bit words.
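The four signed/unsigned and fractional/integer combinations differ only in how the same bit pattern is weighted. The Python sketch below illustrates that weighting; `fixed_value` and its parameters are hypothetical names, and the function models interpretation only, not any DSP instruction.

```python
def fixed_value(bits, width, signed=True, fractional=True):
    """Return the numeric value of a fixed-point bit pattern of the given width."""
    raw = bits & ((1 << width) - 1)
    if signed and raw >> (width - 1):
        raw -= 1 << width  # the sign bit carries negative weight
    if not fractional:
        return raw  # binary point to the right of the LSB
    # Fractional: binary point to the left of the most significant magnitude bit.
    scale = width - 1 if signed else width
    return raw / (1 << scale)


print(fixed_value(0x4000, 16))                # 0.5
print(fixed_value(0x8000, 16))                # -1.0
print(fixed_value(0x8000, 16, signed=False))  # 0.5
print(fixed_value(0xFF, 8, fractional=False)) # -1
```

The pattern 0x8000 shows the effect of the sign bit: it is full-scale negative as a signed fraction but one half as an unsigned fraction.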
ALU outputs always have the same width and data format as the inputs. The multiplier, however, produces a 64-bit product from two 32-bit inputs. If both operands are unsigned integers, the result is a 64-bit unsigned integer. If both operands are unsigned fractions, the result is a 64-bit unsigned fraction. These formats are shown in Figure 2-30 and
Figure 2-31.
If one operand is signed and the other unsigned, the result is signed. If both inputs are signed, the result is signed and automatically shifted left one bit. The LSB becomes zero and bit 62 moves into the sign bit position. Normally bit 63 and bit 62 are identical when both operands are signed. (The only exception is full-scale negative multiplied by itself.) Thus, the left shift normally removes a redundant sign bit, increasing the precision of the most significant product. Also, if the data format is fractional, a single bit left shift renormalizes the MSB to a fractional format. The signed formats with and without left shifting are shown in Figure 2-28 and Figure 2-29.
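The signed-by-signed case can be modeled on a host machine as in the C sketch below. It is an illustrative model of the shift described above, not a TigerSHARC intrinsic, and the function names are hypothetical. The result is returned as a raw 64-bit pattern so that the shift stays well defined in C.

```c
#include <stdint.h>

/* Returns 1 when bits 63 and 62 of the raw signed product are identical,
   that is, when the upper sign bit is redundant. */
static int top_bits_equal(int32_t a, int32_t b)
{
    uint64_t p = (uint64_t)((int64_t)a * (int64_t)b);
    unsigned top = (unsigned)(p >> 62);      /* bits 63:62 */
    return top == 0u || top == 3u;
}

/* Signed x signed 32-bit multiply followed by the one-bit left shift:
   bit 62 moves into the sign position and the LSB becomes zero. */
static uint64_t mul_signed_shifted(int32_t a, int32_t b)
{
    int64_t product = (int64_t)a * (int64_t)b;
    return (uint64_t)product << 1;           /* shift in the unsigned domain */
}
```

With fractional operands, 0.5 times 0.5 (0x40000000 for each input) yields 0x2000000000000000, which is 0.25 with the binary point just right of the sign bit. For full-scale negative multiplied by itself (INT32_MIN times INT32_MIN), bits 63 and 62 of the raw product differ, and this model simply wraps to the pattern 0x8000000000000000; how the hardware treats that case is not stated in this section.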
The multiplier has an 80-bit accumulator to allow the accumulation of 64-bit products. For more information on the multiplier and accumulator, see “Multiplier” on page 4-1.
[Figure: BRs, signed integer. Bits 7 to 0 carry weights -2^7 (sign bit), 2^6, 2^5, ..., 2^2, 2^1, 2^0; the binary point is to the right of bit 0.]
Figure 2-16. 8-Bit Fixed-Point Format, Signed Integer (Byte Word)
[Figure: BRs, signed fractional. Bits 7 to 0 carry weights -2^0 (sign bit), 2^-1, 2^-2, ..., 2^-5, 2^-6, 2^-7; the binary point is to the right of the sign bit.]
Figure 2-17. 8-Bit Fixed-Point Format, Signed Fractional (Byte Word)
[Figure: BRs, unsigned integer. Bits 7 to 0 carry weights 2^7, 2^6, 2^5, ..., 2^2, 2^1, 2^0; the binary point is to the right of bit 0.]
Figure 2-18. 8-Bit Fixed-Point Format, Unsigned Integer (Byte Word)
[Figure: BRs, unsigned fractional. Bits 7 to 0 carry weights 2^-1, 2^-2, 2^-3, ..., 2^-6, 2^-7, 2^-8; the binary point is to the left of bit 7.]
Figure 2-19. 8-Bit Fixed-Point Format, Unsigned Fractional (Byte Word)
[Figure: SRs, signed integer. Bits 15 to 0 carry weights -2^15 (sign bit), 2^14, 2^13, ..., 2^2, 2^1, 2^0; the binary point is to the right of bit 0.]
Figure 2-20. 16-Bit Fixed-Point Format, Signed Integer (Short Word)
[Figure: SRs, signed fractional. Bits 15 to 0 carry weights -2^0 (sign bit), 2^-1, 2^-2, ..., 2^-13, 2^-14, 2^-15; the binary point is to the right of the sign bit.]
Figure 2-21. 16-Bit Fixed-Point Format, Signed Fractional (Short Word)
[Figure: SRs, unsigned integer. Bits 15 to 0 carry weights 2^15, 2^14, 2^13, ..., 2^2, 2^1, 2^0; the binary point is to the right of bit 0.]
Figure 2-22. 16-Bit Fixed-Point Format, Unsigned Integer (Short Word)
[Figure: SRs, unsigned fractional. Bits 15 to 0 carry weights 2^-1, 2^-2, 2^-3, ..., 2^-14, 2^-15, 2^-16; the binary point is to the left of bit 15.]
Figure 2-23. 16-Bit Fixed-Point Format, Unsigned Fractional (Short Word)
[Figure: Rs, signed integer. Bits 31 to 0 carry weights -2^31 (sign bit), 2^30, 2^29, ..., 2^2, 2^1, 2^0; the binary point is to the right of bit 0.]
Figure 2-24. 32-Bit Fixed-Point Format, Signed Integer (Normal Word)
[Figure: Rs, signed fractional. Bits 31 to 0 carry weights -2^0 (sign bit), 2^-1, 2^-2, ..., 2^-29, 2^-30, 2^-31; the binary point is to the right of the sign bit.]
Figure 2-25. 32-Bit Fixed-Point Format, Signed Fractional (Normal Word)
[Figure: Rs, unsigned integer. Bits 31 to 0 carry weights 2^31, 2^30, 2^29, ..., 2^2, 2^1, 2^0; the binary point is to the right of bit 0.]
Figure 2-26. 32-Bit Fixed-Point Format, Unsigned Integer (Normal Word)
[Figure: Rs, unsigned fractional. Bits 31 to 0 carry weights 2^-1, 2^-2, 2^-3, ..., 2^-30, 2^-31, 2^-32; the binary point is to the left of bit 31.]
Figure 2-27. 32-Bit Fixed-Point Format, Unsigned Fractional (Normal Word)
[Figure: LRs, signed integer. Bits 63 to 0 carry weights -2^63 (sign bit), 2^62, 2^61, ..., 2^2, 2^1, 2^0; the binary point is to the right of bit 0.]
Figure 2-28. 64-Bit Fixed-Point Format, Signed Integer (Long Word)