
ADSP-TS101 TigerSHARC® Processor

Programming Reference

Analog Devices, Inc., One Technology Way, Norwood, Mass. 02062-9106

Revision 1.1, February 2005

Part Number

82-001997-01

Printed in the USA.

Disclaimer

Analog Devices, Inc. reserves the right to change this product without prior notice. Information furnished by Analog Devices is believed to be accurate and reliable. However, no responsibility is assumed by Analog Devices for its use; nor for any infringement of patents or other rights of third parties which may result from its use. No license is granted by implication or otherwise under the patent rights of Analog Devices, Inc.

Trademark and Service Mark Notice

The Analog Devices logo, Blackfin, EZ-ICE, EZ-KIT Lite, SHARC, TigerSHARC, the TigerSHARC logo, and VisualDSP++ are registered trademarks of Analog Devices, Inc.

SuperScalar is a trademark of Analog Devices, Inc.

All other brand and product names are trademarks or service marks of their respective owners.

PREFACE

Purpose of This Manual ................................................................ xvii

Intended Audience ........................................................................ xvii

Manual Contents ......................................................................... xviii

What’s New in This Manual ........................................................... xix

Technical or Customer Support ....................................................... xx

Supported Processors ....................................................................... xx

Product Information ...................................................................... xxi

MyAnalog.com ........................................................................ xxii

Processor Product Information ................................................. xxii

Related Documents ............................................................... xxiii

Online Technical Documentation ........................................... xxiv

Accessing Documentation From VisualDSP++ ..................... xxv

Accessing Documentation From Windows ........................... xxv

Accessing Documentation From the Web ............................ xxvi

Printed Manuals ..................................................................... xxvi

VisualDSP++ Documentation Set ....................................... xxvi

Hardware Tools Manuals .................................................... xxvi

ADSP-TS101 TigerSHARC Processor Programming Reference iii

CONTENTS

Processor Manuals ............................................................. xxvi

Data Sheets ...................................................................... xxvii

Conventions .............................................................................. xxviii

INTRODUCTION

DSP Architecture ......................................................................... 1-6

Compute Blocks ..................................................................... 1-8

Arithmetic Logic Unit (ALU) .............................................. 1-9

Multiply Accumulator (Multiplier) .................................... 1-11

Bit Wise Barrel Shifter (Shifter) ........................................ 1-11

Integer Arithmetic Logic Unit (IALU) ................................... 1-12

Program Sequencer ............................................................... 1-13

Quad Instruction Execution .............................................. 1-15

Relative Addresses for Relocation ...................................... 1-16

Nested Call and Interrupt ................................................. 1-16

Context Switching ............................................................ 1-16

Internal Memory and Other Internal Peripherals .................... 1-16

Internal Buses ................................................................... 1-17

Internal Transfer ............................................................... 1-18

Data Accesses ................................................................... 1-18

Quad Data Access ............................................................. 1-18

Booting ................................................................................ 1-19

Scalability and Multiprocessing ............................................. 1-19

Emulation and Test Support .................................................. 1-20

Instruction Line Syntax and Structure .......................................... 1-20

Instruction Notation Conventions ......................................... 1-22

Unconditional Execution Support .......................................... 1-23

Conditional Execution Support .............................................. 1-24

Instruction Parallelism Rules ....................................................... 1-24

General Restriction ................................................................ 1-36

Compute Block Instruction Restrictions ................................. 1-37

IALU Instruction Restrictions ................................................ 1-39

Sequencer Instruction Restrictions ......................................... 1-45

COMPUTE BLOCK REGISTERS

Compute Block Selection ......................................................... 2-7

Operand Size and Format Selection ........................................ 2-10

Registers File Syntax Summary ............................................... 2-13

Numeric Formats ........................................................................ 2-16

IEEE Single-Precision Floating-Point Data Format ................. 2-16

Extended Precision Floating-Point Format .............................. 2-19

Fixed-Point Formats .............................................................. 2-19

ALU

ALU Operations ........................................................................... 3-5

ALU Instruction Options ........................................................ 3-7

Signed/Unsigned Option .................................................... 3-8

Saturation Option .............................................................. 3-8

Extension (ABS) Option ..................................................... 3-9

Truncation Option ............................................................. 3-9

Return Zero (MAX/MIN) Option .................................... 3-10

Fractional/Integer Option ................................................. 3-11

ALU Execution Status ........................................................... 3-11

AN — ALU Negative ....................................................... 3-13

AV — ALU Overflow ....................................................... 3-13

AI — ALU Invalid ............................................................ 3-14

AC — ALU Carry ............................................................ 3-14

ALU Execution Conditions ................................................... 3-14

ALU Static Flags ................................................................... 3-15

ALU Examples ........................................................................... 3-16

Example Parallel Addition of Byte Data ................................. 3-18

Example Sideways Addition of Byte Data ............................... 3-19

Example Parallel Result (PR) Register Usage .......................... 3-19

CLU Examples ........................................................................... 3-21

CLU Data Types and Sizes .................................................... 3-22

TMAX Function ................................................................... 3-23

Trellis Function ..................................................................... 3-24

Despread Function ................................................................ 3-26

CLU Execution Status ........................................................... 3-27

ALU Instruction Summary .......................................................... 3-28

MULTIPLIER

Multiplier Operations ................................................................... 4-4

Multiplier Instruction Options ................................................ 4-8

Signed/Unsigned Option ................................................... 4-10

Fractional/Integer Option ................................................. 4-10

Saturation Option ............................................................. 4-11

Truncation Option ............................................................ 4-12

Clear/Round Option ......................................................... 4-14

Complex Conjugate Option .............................................. 4-16

Multiplier Result Overflow (MR4) Register ............................ 4-17

Multiplier Execution Status ................................................... 4-18

Multiplier Execution Conditions ............................................ 4-20

Multiplier Static Flags ............................................................ 4-21

Multiplier Examples .................................................................... 4-21

Multiplier Instruction Summary .................................................. 4-23

SHIFTER

Shifter Operations ......................................................................... 5-3

Logical Shift Operation ........................................................... 5-5

Arithmetic Shift Operation ...................................................... 5-6

Bit Manipulation Operations ................................................... 5-7

Bit Field Manipulation Operations .......................................... 5-8

Bit Field Conversion Operations ........................................... 5-11

Bit Stream Manipulation Operations ..................................... 5-11

Shifter Instruction Options ................................................... 5-14

Sign Extended Option ...................................................... 5-15

Zero Filled Option ........................................................... 5-15

Shifter Execution Status ........................................................ 5-15

Shifter Execution Conditions ................................................ 5-16

Shifter Static Flags ................................................................ 5-17

Shifter Examples ......................................................................... 5-17

Shifter Instruction Summary ....................................................... 5-19

IALU

IALU Operations .......................................................................... 6-5

IALU Arithmetic, Logical, and Function Operations ................ 6-5

IALU Instruction Options .................................................. 6-6

Integer Data ................................................................... 6-7

Signed/Unsigned Option ................................................ 6-8

Circular Buffer Option ................................................... 6-8

Bit Reverse Option ......................................................... 6-9

Computed Jump Option ................................................. 6-9

IALU Execution Status ..................................................... 6-10

JN/KN–IALU Negative ................................................ 6-11

JV/KV–IALU Overflow ................................................ 6-11

JC/KC–IALU Carry ...................................................... 6-11

IALU Execution Conditions .............................................. 6-12

IALU Static Flags .............................................................. 6-13

IALU Data Addressing and Transfer Operations ..................... 6-13

Direct and Indirect Addressing .......................................... 6-14

Normal, Merged, and Broadcast Memory Accesses ............. 6-16

Data Alignment Buffer (DAB) Accesses ............................. 6-23

Circular Buffer Addressing ................................................ 6-27

Bit Reverse Addressing ...................................................... 6-31

Universal Register Transfer Operations .............................. 6-35

Immediate Extension Operations ....................................... 6-36

IALU Examples ........................................................................... 6-37

IALU Instruction Summary ......................................................... 6-39

PROGRAM SEQUENCER

Sequencer Operations ................................................................... 7-7

Conditional Execution ........................................................... 7-12

Branching Execution ............................................................. 7-16

Looping Execution ................................................................ 7-19

Interrupting Execution .......................................................... 7-20

Instruction Pipeline Operations ................................................... 7-26

Instruction Alignment Buffer (IAB) ....................................... 7-31

Branch Target Buffer (BTB) ................................................... 7-34

Conditional Branch Effects on Pipeline .................................. 7-44

Dependency and Resource Effects on Pipeline ........................ 7-55

Stall From Compute Block Dependency ............................ 7-56

Stall from Bus Conflict ..................................................... 7-59

Stall From Compute Block Load Dependency ................... 7-62

Stall From IALU Load Dependency .................................. 7-63

Stall From Load (From External Memory) Dependency ..... 7-64

Stall From Conditional IALU Load Dependency ............... 7-64

Interrupt Effects on Pipeline ................................................. 7-66

Interrupt During Conditional Instruction ......................... 7-68

Interrupt During Interrupt Disable Instruction ................. 7-70

Exception Effects on Pipeline ................................................ 7-72

Sequencer Examples .................................................................... 7-72

Sequencer Instruction Summary .................................................. 7-76

INSTRUCTION SET

ALU Instructions .......................................................................... 8-2

Add/Subtract .......................................................................... 8-3

Add/Subtract With Carry/Borrow ............................................ 8-6

Average ................................................................................... 8-8

Absolute Value/Absolute Value of Sum or Difference .............. 8-10

Negate .................................................................................. 8-13

Maximum/Minimum ............................................................ 8-14

Viterbi Maximum/Minimum ................................................. 8-17

Increment/Decrement ........................................................... 8-20

Compare ............................................................................... 8-22

Clip ...................................................................................... 8-24

Sum ...................................................................................... 8-26

Ones Counting ...................................................................... 8-28

Parallel Result Register ........................................................... 8-29

Bit FIFO Increment .............................................................. 8-30

Parallel Absolute Value of Difference ...................................... 8-32

Sideways Sum ........................................................................ 8-34

Add/Subtract (Dual Operation) ............................................. 8-36

Pass ....................................................................................... 8-37

Logical AND/AND NOT/OR/XOR/NOT ............................ 8-38

Expand ................................................................................. 8-40

Compact ............................................................................... 8-45

Merge ................................................................................... 8-49

Add/Subtract (Floating-Point) ................................................ 8-51

Average (Floating-Point) ........................................................ 8-53

Maximum/Minimum (Floating-Point) ................................... 8-55

Absolute Value (Floating-Point) ............................................. 8-57

Negate (Floating-Point) ......................................................... 8-60

Compare (Floating-Point) ...................................................... 8-62

Floating- to Fixed-Point Conversion ...................................... 8-64

Fixed- to Floating-Point Conversion ...................................... 8-66

Floating-Point Normal to Extended Word Conversion ............ 8-68

Floating-Point Extended to Normal Word Conversion ............ 8-70

Clip (Floating-Point) ............................................................. 8-72

Copysign (Floating-Point) ..................................................... 8-74

Scale (Floating-Point) ............................................................ 8-76

Pass (Floating-Point) ............................................................. 8-78

Reciprocal (Floating-Point) ................................................... 8-80

Reciprocal Square Root (Floating-Point) ................................ 8-82

Mantissa (Floating-Point) ...................................................... 8-85

Logarithm (Floating-Point) ................................................... 8-87

Add/Subtract (Dual Operation, Floating-Point) ..................... 8-89

CLU Instructions ....................................................................... 8-91

Trellis Maximum (CLU) ........................................................ 8-92

Maximum (CLU) .................................................................. 8-99

Trellis Registers (CLU) ........................................................ 8-104

Despread (CLU) ................................................................. 8-106

Add/Compare/Select (CLU) ................................................ 8-113

Permute (Byte Word, CLU) ................................................. 8-117

Permute (Short Word, CLU) ............................................... 8-119

Multiplier Instructions .............................................................. 8-121

Multiply (Normal Word) ..................................................... 8-122

Multiply-Accumulate (Normal Word) .................................. 8-125

Multiply-Accumulate/Move (Dual Operation, Normal Word) ................................................................. 8-130

Multiply (Quad-Short Word) .............................................. 8-138

Multiply-Accumulate (Quad-Short Word) ........................... 8-141

Multiply-Accumulate (Dual Operation, Quad-Short Word) ........................................................... 8-146

Complex Multiply-Accumulate (Short Word) ....................... 8-152

Complex Multiply-Accumulate/Move (Dual Operation, Short Word) ..................................................................... 8-156

Multiply (Floating-Point, Normal/Extended Word) .............. 8-163

Multiplier Result Register .................................................... 8-165

Compact Multiplier Result .................................................. 8-171

Shifter Instructions ................................................................... 8-175

Arithmetic/Logical Shift ...................................................... 8-176

Rotate ................................................................................. 8-179

Field Extract ........................................................................ 8-181

Field Deposit ....................................................................... 8-183

Field/Bit Mask .................................................................... 8-185

Get Bits ............................................................................... 8-187

Put Bits ............................................................................... 8-189

Bit Test ............................................................................... 8-191

Bit Clear/Set/Toggle ............................................................ 8-192

Extract Leading Zeros .......................................................... 8-194

Extract Exponent ................................................................. 8-195

XSTAT/YSTAT Register ...................................................... 8-196

Block Floating-Point ............................................................ 8-197

BFOTMP Register .............................................................. 8-199

IALU (Integer) Instructions ....................................................... 8-200

Add/Subtract (Integer) ......................................................... 8-202

Add/Subtract With Carry/Borrow (Integer) .......................... 8-204

Average (Integer) ................................................................. 8-206

Compare (Integer) .............................................................. 8-208

Maximum/Minimum (Integer) ............................................ 8-210

Absolute Value (Integer) ...................................................... 8-212

Logical AND/AND NOT/OR/XOR/NOT (Integer) ............ 8-213

Arithmetic Shift/Logical Shift (Integer) ............................... 8-215

Left Rotate/Right Rotate (Integer) ....................................... 8-217

IALU (Load/Store/Transfer) Instructions ................................... 8-218

Universal Register Load (Data Addressing) ........................... 8-220

Universal Register Store (Data Addressing) .......................... 8-221

Data Register Load and DAB Operation (Data Addressing) ............................................................ 8-222

Data Register Store (Data Addressing) ................................. 8-224

Universal Register Transfer .................................................. 8-226

Sequencer Instructions .............................................................. 8-228

Jump/Call ........................................................................... 8-230

Computed Jump/Call .......................................................... 8-232

Return (from Interrupt) ...................................................... 8-234

Reduce (Interrupt to Subroutine) ........................................ 8-236

If – Do (Conditional Execution) ......................................... 8-237

If – Else (Conditional Sequencing and Execution) ................ 8-238

Static Flag Registers ............................................................ 8-239

Idle ..................................................................................... 8-240

BTB Invalid ........................................................................ 8-241

Trap .................................................................................... 8-242

Emulator Trap ..................................................................... 8-243

No Operation ...................................................................... 8-244

QUICK REFERENCE

ALU Quick Reference .................................................................. A-2

Multiplier Quick Reference .......................................................... A-6

Shifter Quick Reference ............................................................... A-8

IALU Quick Reference ............................................................... A-10

Sequencer Quick Reference ........................................................ A-13

REGISTER/BIT DEFINITIONS

INSTRUCTION DECODE

Instruction Structure .................................................................... C-1

Compute Block Instruction Format .............................................. C-3

ALU Instructions .................................................................... C-4

ALU Fixed-Point, Arithmetic and Logical Instructions (CU=00) ...................................................... C-5

ALU Fixed-Point, Data Conversion Instructions (CU=01) ...................................................... C-7

ALU Floating-Point, Arithmetic and Logical Instructions (CU=01) .................................................... C-10

CLU Instructions ............................................................. C-12

Multiplier Instructions ......................................................... C-14

Shifter Instructions ............................................................... C-18

Shifter Instructions Using Single Normal-Word Operands and Single Register ......................................... C-18

Shifter Instructions Using Single Long-Word or Dual Normal-Word Operands and Dual Register ........ C-19

Shifter Instructions Using Short or Byte Operands and Single or Dual Registers ........................................... C-20

Shifter Instructions Using Single Operand ......................... C-22

IALU (Integer) Instruction Format .............................................. C-24

IALU Move Instruction Format .................................................. C-25

IALU Load Data Instruction Format ........................................... C-27

IALU Load/Store Instruction Format .......................................... C-28

IALU Immediate Extension Format ............................................. C-32

Sequencer Instruction Format ..................................................... C-33

Sequencer Flow Control Instructions ..................................... C-33

Sequencer Direct Jump/Call Instruction Format .................... C-34

Sequencer Indirect Jump Instruction Format .......................... C-36

Condition Codes .................................................................. C-39

Compute Block Conditions .............................................. C-39

IALU Conditions ............................................................. C-40

Sequencer and External Conditions ................................... C-40

Sequencer Immediate Extension Format ...................................... C-41

Miscellaneous Instruction Format ............................................... C-42

INDEX

PREFACE

Thank you for purchasing and developing systems using TigerSHARC® processors from Analog Devices.

Purpose of This Manual

The ADSP-TS101 TigerSHARC Processor Programming Reference contains information about the DSP architecture and DSP assembly language for TigerSHARC processors. These are 32-bit, fixed- and floating-point digital signal processors from Analog Devices for use in computing, communications, and consumer applications.

The manual provides information on how assembly instructions execute on the TigerSHARC processor’s architecture along with reference information about DSP operations.

Intended Audience

The primary audience for this manual is a programmer who is familiar with Analog Devices processors. This manual assumes that the audience has a working knowledge of the appropriate processor architecture and instruction set. Programmers who are unfamiliar with Analog Devices processors can use this manual, but should supplement it with other texts (such as the appropriate hardware reference manuals and data sheets) that describe the target architecture.

Manual Contents

The manual consists of:

• Chapter 1, “Introduction” Provides a general description of the DSP architecture, instruction slot/line syntax, and instruction parallelism rules.

• Chapter 2, “Compute Block Registers” Provides a description of the compute block register file, register naming syntax, and numeric formats.

• Chapter 3, “ALU” Provides a description of the arithmetic logic unit (ALU) and communications logic unit (CLU) operation, includes ALU/CLU instruction examples, and provides the ALU instruction summary.

• Chapter 4, “Multiplier” Provides a description of the multiply-accumulator (multiplier) operation, includes multiplier instruction examples, and provides the multiplier instruction summary.

• Chapter 5, “Shifter” Provides a description of the bit wise, barrel shifter (shifter) operation, includes shifter instruction examples, and provides the shifter instruction summary.

• Chapter 6, “IALU” Provides a description of the integer arithmetic logic unit (IALU) and data alignment buffer (DAB) operation, includes IALU instruction examples, and provides the IALU instruction summary.

• Chapter 7, “Program Sequencer” Provides a description of the program sequencer operation, the instruction alignment buffer (IAB), the branch target buffer (BTB), and the instruction pipeline. This chapter also includes a program sequencer instruction summary.

• Chapter 8, “Instruction Set” Describes the ADSP-TS101 processor instruction set in detail, starting with an overview of the instruction line and instruction types.

• Appendix A, “Quick Reference” Contains a concise description of the ADSP-TS101 processor assembly language. It is intended to be used as an assembly programming reference.

• Appendix B, “Register/Bit Definitions” Provides register and bit name definitions to be used in ADSP-TS101 processor programs.

• Appendix C, “Instruction Decode” Identifies operation codes (opcodes) for instructions. Use this appendix to learn how to construct opcodes.

This programming reference is a companion document to the ADSP-TS101 TigerSHARC Processor Hardware Reference.

What’s New in This Manual

Revision 1.1 of the ADSP-TS101 TigerSHARC Processor Programming Reference corrects and closes all open Tool Anomaly Reports (TARs) against this manual, adds figure titles that were missing, and updates Web site and contact numbers. These changes affect the preface, various chapters, appendices, and the index.

Technical or Customer Support

You can reach Analog Devices, Inc. Customer Support in any of the following ways:

• Visit the Embedded Processing and DSP products Web site at http://www.analog.com/processors/technicalSupport

• E-mail tools questions to dsptools.support@analog.com

• E-mail processor questions to embedded.support@analog.com or dsp.support@analog.com

• Phone questions to 1-800-ANALOGD

• Contact your Analog Devices, Inc. local sales office or authorized distributor

• Send questions by mail to:

Analog Devices, Inc. One Technology Way P.O. Box 9106 Norwood, MA 02062-9106 USA

Supported Processors

The following is the list of Analog Devices, Inc. processors supported in VisualDSP++®.


TigerSHARC (ADSP-TSxxx) Processors

The name “TigerSHARC” refers to a family of floating-point and fixed-point [8-bit, 16-bit, and 32-bit] processors. VisualDSP++ currently supports the following TigerSHARC processors:

ADSP-TS101, ADSP-TS201, ADSP-TS202, and ADSP-TS203

SHARC® (ADSP-21xxx) Processors

The name “SHARC” refers to a family of high-performance, 32-bit, floating-point processors that can be used in speech, sound, graphics, and imaging applications. VisualDSP++ currently supports the following SHARC processors:

ADSP-21020, ADSP-21060, ADSP-21061, ADSP-21062, ADSP-21065L, ADSP-21160, ADSP-21161, ADSP-21261, ADSP-21262, ADSP-21266, ADSP-21267, ADSP-21363, ADSP-21364, and ADSP-21365

Blackfin® (ADSP-BFxxx) Processors

The name “Blackfin” refers to a family of 16-bit, embedded processors. VisualDSP++ currently supports the following Blackfin processors:

ADSP-BF531, ADSP-BF532 (formerly ADSP-21532), ADSP-BF533, ADSP-BF535 (formerly ADSP-21535), ADSP-BF561, AD6532, and AD90747

Product Information

You can obtain product information from the Analog Devices Web site, from the product CD-ROM, or from the printed publications (manuals).

Analog Devices is online at www.analog.com. Our Web site provides information about a broad range of products—analog integrated circuits, amplifiers, converters, and digital signal processors.


MyAnalog.com

MyAnalog.com is a free feature of the Analog Devices Web site that allows customization of a Web page to display only the latest information on products you are interested in. You can also choose to receive weekly e-mail notifications containing updates to the Web pages that meet your interests. MyAnalog.com provides access to books, application notes, data sheets, code examples, and more.

Registration

Visit www.myanalog.com to sign up. Click Register to use MyAnalog.com. Registration takes about five minutes and serves as a means to select the information you want to receive.

If you are already a registered user, just log on. Your user name is your e-mail address.

Processor Product Information

For information on embedded processors and DSPs, visit the Analog Devices Web site at www.analog.com/processors, which provides access to technical publications, data sheets, application notes, product overviews, and product announcements.


You may also obtain additional information about Analog Devices and its products in any of the following ways.

• E-mail questions or requests for information to

embedded.support@analog.com dsp.support@analog.com

• Fax questions or requests for information to

1-781-461-3010 (North America) +49-89-76903-157 (Europe)

• Access the FTP Web site at

ftp ftp.analog.com (or ftp 137.71.25.69) ftp://ftp.analog.com

Related Documents

The following publications that describe the ADSP-TS101 TigerSHARC processor (and related processors) can be ordered from any Analog Devices sales office:

• ADSP-TS101S TigerSHARC Embedded Processor Data Sheet

• ADSP-TS101 TigerSHARC Processor Hardware Reference

• ADSP-TS101 TigerSHARC Processor Programming Reference

For information on product related development software and Analog Devices processors, see these publications:

• VisualDSP++ User's Guide for TigerSHARC Processors

• VisualDSP++ C/C++ Compiler and Library Manual for TigerSHARC Processors

• VisualDSP++ Assembler and Preprocessor Manual for TigerSHARC Processors


• VisualDSP++ Linker and Utilities Manual for TigerSHARC Processors

• VisualDSP++ Kernel (VDK) User's Guide

Visit the Technical Library Web site to access all processor and tools manuals and data sheets:

http://www.analog.com/processors/technical_library

Online Technical Documentation

Online documentation comprises the VisualDSP++ Help system, software tools manuals, hardware tools manuals, processor manuals, the Dinkum Abridged C++ library, and Flexible License Manager (FlexLM) network license manager software documentation. You can easily search across the entire VisualDSP++ documentation set for any topic of interest. For easy printing, supplementary .PDF files of most manuals are also provided.

Each documentation file type is described as follows.

File    Description

.CHM    Help system files and manuals in Help format

.HTM or .HTML    Dinkum Abridged C++ library and FlexLM network license manager software documentation. Viewing and printing the .HTML files requires a browser, such as Internet Explorer 4.0 (or higher).

.PDF    VisualDSP++ and processor manuals in Portable Documentation Format (PDF). Viewing and printing the .PDF files requires a PDF reader, such as Adobe Acrobat Reader (4.0 or higher).

If documentation is not installed on your system as part of the software installation, you can add it from the VisualDSP++ CD-ROM at any time by running the Tools installation. Access the online documentation from the VisualDSP++ environment, Windows® Explorer, or the Analog Devices Web site.


Accessing Documentation From VisualDSP++

From the VisualDSP++ environment:

• Access VisualDSP++ online Help from the Help menu’s Contents, Search, and Index commands.

• Open online Help from context-sensitive user interface items (toolbar buttons, menu commands, and windows).

Accessing Documentation From Windows

In addition to any shortcuts you may have constructed, there are many ways to open VisualDSP++ online Help or the supplementary documentation from Windows.

Help system files (.CHM) are located in the Help folder, and .PDF files are located in the Docs folder of your VisualDSP++ installation CD-ROM. The Docs folder also contains the Dinkum Abridged C++ library and the FlexLM network license manager software documentation.

Using Windows Explorer

• Double-click the vdsp-help.chm file, which is the master Help system, to access all the other .CHM files.

• Double-click any file that is part of the VisualDSP++ documentation set.

Using the Windows Start Button

• Access VisualDSP++ online Help by clicking the Start button and choosing Programs, Analog Devices, VisualDSP++, and VisualDSP++ Documentation.

• Access the .PDF files by clicking the Start button and choosing Programs, Analog Devices, VisualDSP++, Documentation for Printing, and the name of the book.


Accessing Documentation From the Web

Download manuals at the following Web site:

http://www.analog.com/processors/technical_library

Select a processor family and book title. Download archive (.ZIP) files, one for each manual. Use any archive management software, such as WinZip, to decompress downloaded files.

Printed Manuals

For general questions regarding literature ordering, call the Literature Center at 1-800-ANALOGD (1-800-262-5643) and follow the prompts.

VisualDSP++ Documentation Set

To purchase VisualDSP++ manuals, call 1-603-883-2430. The manuals may be purchased only as a kit.

If you do not have an account with Analog Devices, you are referred to Analog Devices distributors. For information on our distributors, log onto http://www.analog.com/salesdir.

Hardware Tools Manuals

To purchase EZ-KIT Lite® and In-Circuit Emulator (ICE) manuals, call 1-603-883-2430. The manuals may be ordered by title or by product number located on the back cover of each manual.

Processor Manuals

Hardware reference and instruction set reference manuals may be ordered through the Literature Center at 1-800-ANALOGD (1-800-262-5643), or downloaded from the Analog Devices Web site. Manuals may be ordered by title or by product number located on the back cover of each manual.


Data Sheets

All data sheets (preliminary and production) may be downloaded from the Analog Devices Web site. Only production (final) data sheets (Rev. 0, A, B, C, and so on) can be obtained from the Literature Center at 1-800-ANALOGD (1-800-262-5643); they also can be downloaded from the Web site.

To have a data sheet faxed to you, call the Analog Devices Faxback System at 1-800-446-6212. Follow the prompts and a list of data sheet code numbers will be faxed to you. If the data sheet you want is not listed, check for it on the Web site.


Conventions

Text conventions used in this manual are identified and described as follows.

Example    Description

Close command (File menu)    Titles in reference sections indicate the location of an item within the VisualDSP++ environment’s menu system (for example, the Close command appears on the File menu).

{this | that}    Alternative items in syntax descriptions appear within curly brackets and separated by vertical bars; read the example as this or that. One or the other is required.

[this | that]    Optional items in syntax descriptions appear within brackets and separated by vertical bars; read the example as an optional this or that.

[this,…]    Optional item lists in syntax descriptions appear within brackets delimited by commas and terminated with an ellipsis; read the example as an optional comma-separated list of this.

.SECTION    Commands, directives, keywords, and feature names are in text with letter gothic font.

filename    Non-keyword placeholders appear in text with italic style format.

Note: For correct operation, ...    A Note: provides supplementary information on a related topic. In the online version of this book, the word Note appears instead of this symbol.

Caution: Incorrect device operation may result if ... Caution: Device damage may result if ...    A Caution: identifies conditions or inappropriate usage of the product that could lead to undesirable results or product damage. In the online version of this book, the word Caution appears instead of this symbol.

Warning: Injury to device users may result if ...    A Warning: identifies conditions or inappropriate usage of the product that could lead to conditions that are potentially hazardous for device users. In the online version of this book, the word Warning appears instead of this symbol.


Additional conventions, which apply only to specific chapters, may appear throughout this document.


1 INTRODUCTION

The ADSP-TS101 TigerSHARC Processor Programming Reference describes the Digital Signal Processor (DSP) architecture and instruction set. These descriptions provide the information required for programming TigerSHARC processor systems. This chapter introduces programming concepts for the DSP with the following information:

• “DSP Architecture” on page 1-6

• “Instruction Line Syntax and Structure” on page 1-20

• “Instruction Parallelism Rules” on page 1-24

The TigerSHARC processor is a 128-bit, high-performance, next-generation version of the ADSP-2106x SHARC DSP. The TigerSHARC processor sets a new standard of performance for digital signal processors, combining multiple computation units for floating-point and fixed-point processing as well as very wide word widths. The TigerSHARC processor maintains a ‘system-on-a-chip’ scalable computing design philosophy, including 6M bits of on-chip SRAM, integrated I/O peripherals, a host processor interface, DMA controllers, link ports, and shared bus connectivity for glueless MDSP (Multi Digital Signal Processing).

In addition to providing unprecedented performance in DSP applications in raw MFLOPS and MIPS, the TigerSHARC processor boosts performance measures such as MFLOPS/Watt and MFLOPS/square inch in multiprocessing applications.


[Block diagram, labels recovered from the figure: program sequencer (PC, BTB, IRQ, ADDR, FETCH, IAB); data address generation (integer J-IALU and K-IALU, each 32x32); DAB; two computational blocks, each with a 32x32 register file, multiplier, ALU, and shifter; 128-bit data paths.]

Figure 1-1. ADSP-TS101 TigerSHARC Processor Core Diagram

As shown in Figure 1-1 and Figure 1-2, the processor has the following architectural features:

• Dual computation blocks—X and Y—each consisting of a multiplier, ALU, shifter, and a 32-word register file

• Dual integer ALUs—J and K—each containing a 32-bit IALU and 32-word register file

[Block diagram, labels recovered from the figure: internal memory (three 64K x 32 memory blocks with M0, M1, and M2 address and data buses); I/O processor (DMA controller with control/status/TCBs, DMA address, DMA data, I/O address); link port controller and link ports (control/status/buffers, link data); external port (SDRAM controller, multiprocessor interface, host interface, input FIFO, output buffer, output FIFO, cluster bus arbiter, with ADDR, DATA, and CNTRL signals); JTAG port.]

Figure 1-2. ADSP-TS101 TigerSHARC Processor Peripherals Diagram

• Program sequencer—Controls the program flow and contains an instruction alignment buffer (IAB) and a branch target buffer (BTB)

• Three 128-bit buses providing high bandwidth connectivity between all blocks

• External port interface including the host interface, SDRAM controller, static pipelined interface, four DMA channels, four link ports (each with two DMA channels), and multiprocessing support


• 6M bits of internal memory organized as three blocks (M0, M1, and M2), each containing 16K rows of 128 bits (2M bits per block)

• Debug features

• JTAG Test Access Port

The TigerSHARC processor external port provides an interface to external memory, to memory-mapped I/O, to host processor, and to additional TigerSHARC processors. The external port performs external bus arbitration and supplies control signals to shared, global memory and I/O devices.

Figure 1-3 illustrates a typical single-processor system. A multiprocessor system is illustrated in Figure 1-4 on page 1-6 and is discussed later in “Scalability and Multiprocessing” on page 1-19.

The TigerSHARC processor includes several features that simplify system development. The features lie in three key areas:

• Support of IEEE floating-point formats

• IEEE 1149.1 JTAG serial scan path and on-chip emulation features

• Architectural features supporting high-level languages and operating systems

The features of the TigerSHARC processor architecture that directly support high-level language compilers and operating systems include:

• Simple, orthogonal instructions that allow the compiler to use the multi-instruction slots efficiently

• General-purpose data and IALU register files

• 32- and 40-bit floating-point and 8-, 16-, 32-, and 64-bit fixed-point native data types

[System diagram, labels recovered from the figure: an ADSP-TS101S connected to a clock reference and to optional SDRAM memory, boot EPROM, memory, host processor interface, DMA device, and link devices (4 max), over the ADDR31–0, DATA63–0, and control signals.]

Figure 1-3. Single Processor Configuration

• Large address space

• Immediate address modify fields

• Easily supported relocatable code and data

• Fast save and restore of processor registers onto internal memory stacks

[Cluster diagram, labels recovered from the figure: four TigerSHARC processors with link connections, SDRAM memory (MSSD), memory (MS0), a host interface (MSH), a bridge, and other devices.]

Figure 1-4. Multiprocessing Cluster Configuration

DSP Architecture

As shown in Figure 1-1 on page 1-2 and Figure 1-2 on page 1-3, the DSP architecture consists of two divisions: the DSP core (where instructions execute) and the I/O peripherals (where data is stored and off-chip I/O is processed). The following discussion provides a high-level description of the DSP core and peripherals architecture. More detail on the core appears in other sections of this reference. For more information on I/O peripherals, see the ADSP-TS101 TigerSHARC Processor Hardware Reference.


High performance is facilitated by the ability to execute up to four 32-bit wide instructions per cycle. The TigerSHARC processor uses a variation of a Static Superscalar™ architecture to allow the programmer to specify which instructions are executed in parallel in each cycle. Instruction lines do not have to be aligned in memory, so program memory is not wasted.

The 6M bit internal memory is divided into three 128-bit wide memory blocks. Each of the three internal address/data bus pairs connect to one of the three memory blocks. The three memory blocks can be used for triple accesses every cycle where each memory block can access up to four, 32-bit words in a cycle.
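The memory figures quoted in this chapter can be cross-checked with a few lines of arithmetic. The following Python sketch is only an illustration; the constant names are made up for this example, and the 64K-word block size is taken from the memory description later in this chapter.

```python
# Illustrative check of the on-chip memory figures (names are hypothetical).
BLOCKS = 3                    # M0, M1, and M2
WORDS_PER_BLOCK = 64 * 1024   # each block holds 64K words
BITS_PER_WORD = 32
ROW_BITS = 128                # each block is 128 bits wide

bits_per_block = WORDS_PER_BLOCK * BITS_PER_WORD
rows_per_block = bits_per_block // ROW_BITS
total_bits = BLOCKS * bits_per_block
words_per_access = ROW_BITS // BITS_PER_WORD         # one quad word per block access
peak_words_per_cycle = BLOCKS * words_per_access     # all three blocks accessed in one cycle

print(bits_per_block // 2**20, "Mbit per block,", rows_per_block // 1024, "K rows")
print(total_bits // 2**20, "Mbit total,", peak_words_per_cycle, "words per cycle at most")
```

The peak figure counts one quad-word access on each of the three blocks, which corresponds to two quad data operands plus one quad instruction fetch.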

The external port cluster bus is 64 bits wide. The high I/O bandwidth complements the high processing speeds of the core. To facilitate the high clock rate, the TigerSHARC processor uses a pipelined external bus with programmable pipeline depth for interprocessor communications and for Synchronous SRAM and DRAM (SSRAM and SDRAM).

The four link ports support point-to-point high bandwidth data transfers. Link ports have hardware supported two-way communication.

The processor operates with a two cycle arithmetic pipeline. The branch pipeline is two to six cycles. A branch target buffer (BTB) is implemented to reduce branch delay. The two identical computation units support floating-point as well as fixed-point arithmetic.

During compute intensive operations, one or both integer ALUs compute or generate addresses for fetching up to two quad operands from two memory blocks, while the program sequencer simultaneously fetches the next quad instruction from the third memory block. In parallel, the computation units can operate on previously fetched operands while the sequencer prepares for a branch.

While the core processor is doing the above, the DMA channels can be replenishing the internal memories in the background with quad data from either the external port or the link ports.


The processing core of the TigerSHARC processor reaches exceptionally high DSP performance through using these features:

• Computation pipeline

• Dual computation units

• Execution of up to four instructions per cycle

• Access of up to eight words per cycle from memory

The two computation units (compute blocks) perform up to 6 floating-point or 24 fixed-point operations per cycle.

Each multiplier and ALU unit can execute four 16-bit fixed-point operations per cycle, using Single-Instruction, Multiple-Data (SIMD) operation. This operation boosts performance of critical imaging and signal processing applications that use fixed-point data.
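As a rough mental model of a quad 16-bit SIMD operation, the Python sketch below packs four short words into one 64-bit value and adds two such values lane by lane. It is not the processor's instruction semantics: the helper names, the lane ordering (lane 0 in the low bits), and the wraparound behavior on overflow are all assumptions made for illustration.

```python
MASK16 = 0xFFFF

def pack4(shorts):
    """Pack four 16-bit values into one 64-bit integer (lane 0 in the low bits)."""
    word = 0
    for lane, value in enumerate(shorts):
        word |= (value & MASK16) << (16 * lane)
    return word

def unpack4(word):
    """Split a 64-bit integer back into its four 16-bit lanes."""
    return [(word >> (16 * lane)) & MASK16 for lane in range(4)]

def add4_wrap(a, b):
    """Lane-wise add of two packed words; each lane wraps modulo 2**16 (assumed)."""
    return pack4([(x + y) & MASK16 for x, y in zip(unpack4(a), unpack4(b))])
```

For example, `unpack4(add4_wrap(pack4([1, 2, 3, 0xFFFF]), pack4([10, 20, 30, 1])))` gives four independent sums, and the overflow in the last lane does not carry into any other lane.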

Compute Blocks

The TigerSHARC processor core contains two computation units called compute blocks. Each compute block contains a register file and three independent computation units: an ALU, a multiplier, and a shifter. For meeting a wide variety of processing needs, the computation units process data in several fixed- and floating-point formats listed here and shown in Figure 1-5:

• Fixed-point formats. These include 64-bit long word, 32-bit normal word, 16-bit short word, and 8-bit byte word. For short word fixed-point arithmetic, quad parallel operations on quad-aligned data allow fast processing of array data. Byte operations are also supported for octal-aligned data.

• Floating-point formats. These include 32-bit normal word and 40-bit extended word. Floating-point operations are single or extended precision. The normal word floating-point format is the standard IEEE format, and the 40-bit extended-precision format occupies a double word (64 bits) with eight additional LSBs of mantissa for greater accuracy.

Each compute block has a general-purpose, multi-port, 32-word data register file for transferring data between the computation units and the data buses and storing intermediate results. All of these registers can be accessed as single-, dual-, or quad-aligned registers. For more information on the register file, see “Compute Block Registers” on page 2-1.

Arithmetic Logic Unit (ALU)

The ALU performs arithmetic operations on fixed-point and floating-point data and logical operations on fixed-point data. The source and destination of most ALU operations is the compute block register file.

On the ADSP-TS101 processor, the ALU includes a special sub-block, which is referred to as the communications logic unit (CLU). The CLU instructions are designed to support different algorithms used for communications applications. The algorithms that are supported by the CLU instructions are:

• Viterbi Decoding

• Turbo-code Decoding

• Despreading for code-division multiple access (CDMA) systems

[Figure: word formats on the 128-bit data bus, which carries four 32-bit data words. A long word (64-bit) and an extended word (40-bit) each use a dual register; a normal word (32-bit) uses a single register; short words (16-bit) pack two per single register; byte words (8-bit) pack four per single register.]

Figure 1-5. Word Format Definitions

The TigerSHARC processor internal data buses are 128 bits (one quad word) wide. In a quad word, the DSP can move 16 byte words, 8 short words, 4 normal words, or 2 long words over the bus at the same time.

For more information on the ALU (and CLU features), see “ALU” on page 3-1.

Multiply Accumulator (Multiplier)

The multiplier performs fixed-point or floating-point multiplication and fixed-point multiply/accumulate operations. The multiplier supports several data types in fixed- and floating-point. The floating-point formats are float and float-extended, as in the ALU. The source and destination of most operations is the compute block register file.

The TigerSHARC processor’s multiplier supports complex multiply-accumulate operations. Complex numbers are represented by a pair of 16-bit short words within a 32-bit word. The least significant bits (LSBs) of the input operand represent the real part, and the most significant bits (MSBs) of the input operand represent the imaginary part.
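The packing described above can be sketched in Python as follows. This is an illustrative model only: the helper names are hypothetical, the two halves are treated as two's-complement integers (the hardware may also use fractional formats), and the product is returned at full precision rather than in an accumulator format.

```python
def to_s16(x):
    """Interpret the low 16 bits of x as a signed two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def pack_complex(re, im):
    """Real part in the 16 LSBs, imaginary part in the 16 MSBs of a 32-bit word."""
    return ((im & 0xFFFF) << 16) | (re & 0xFFFF)

def unpack_complex(word):
    """Return (real, imaginary) as signed integers."""
    return to_s16(word), to_s16(word >> 16)

def cmul(a, b):
    """Complex product of two packed operands, returned as an unpacked (re, im) pair."""
    ar, ai = unpack_complex(a)
    br, bi = unpack_complex(b)
    return ar * br - ai * bi, ar * bi + ai * br
```

A multiply-accumulate would add the returned pair to a running (re, im) sum; the accumulator width and rounding are outside the scope of this sketch.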

For more information on the multiplier, see “Multiplier” on page 4-1.

Bit Wise Barrel Shifter (Shifter)

The shifter performs logical and arithmetic shifts, bit manipulation, field deposit, and field extraction. The shifter operates on one 64-bit, one or two 32-bit, two or four 16-bit, and four or eight 8-bit fixed-point operands. Shifter operations include:

• Shifts and rotates from off-scale left to off-scale right

• Bit manipulation operations, including bit set, clear, toggle and test

• Bit field manipulation operations, including field extract and deposit, using register BFOTMP (which is internal to the shifter)

• Bit FIFO operations to support bit streams with fields of varying length

• Support for ADSP-2100 family compatible fixed-point/floating-point conversion operations (such as exponent extract, number of leading 1s or 0s)


For more information on the shifter, see “Shifter” on page 5-1.
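The idea behind field extract and field deposit can be illustrated with plain bit masks. The Python sketch below uses hypothetical helper names and a position/length parameter style chosen for readability; it does not reproduce the shifter's actual operand encoding, sign-extension options, or use of BFOTMP.

```python
def extract_field(value, pos, length):
    """Return `length` bits of `value` starting at bit `pos`, zero-extended."""
    return (value >> pos) & ((1 << length) - 1)

def deposit_field(dest, field, pos, length, width=32):
    """Overwrite `length` bits of `dest` at bit `pos` with the low bits of `field`."""
    mask = ((1 << length) - 1) << pos
    return ((dest & ~mask) | ((field << pos) & mask)) & ((1 << width) - 1)
```

For instance, `extract_field(0xABCD1234, 8, 8)` isolates the second-lowest byte, and `deposit_field(0, 0xAB, 4, 8)` places an 8-bit field four bits above the LSB.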

Integer Arithmetic Logic Unit (IALU)

The IALUs can execute standard standalone ALU operations on IALU register files. The IALUs also provide memory addresses when data is transferred between memory and registers. The DSP has dual IALUs (the J-IALU and the K-IALU) that enable simultaneous addresses for multiple operand reads or writes. The IALUs allow computational operations to execute with maximum efficiency because the computation units can be devoted exclusively to processing data.

Each IALU has a multiport, 32-word register file. Operations in the IALU are not pipelined. The IALUs support pre-modify with no update and post-modify with update address generation. Circular data buffers are implemented in hardware. The IALUs support the following types of instructions:

• Regular IALU instructions

• Move Data instructions

• Load Data instructions

• Load/Store instructions with register update

• Load/Store instructions with immediate update

For indirect addressing (instructions with update), one of the registers in the register file can be modified by another register in the file or by an immediate 8- or 32-bit value, either before (pre-modify) or after (post-modify) the access. For circular buffer addressing, a length value can be associated with the first four registers to perform automatic modulo addressing for circular data buffers; the circular buffers can be located at arbitrary boundaries in memory. Circular buffers allow efficient implementation of delay lines and other data structures, which are commonly used in digital filters and Fourier transformations. The TigerSHARC processor circular buffers automatically handle address pointer wraparounds, reducing overhead and simplifying implementation.
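The wraparound behavior of a post-modify access to a circular buffer can be modeled in a few lines of Python. This is a sketch under stated assumptions: the function name is hypothetical, and the base and length are passed as explicit arguments instead of being held in dedicated IALU registers.

```python
def post_modify_circular(index, modifier, base, length):
    """Model one post-modify access with modulo wraparound.

    Returns (address_used, next_index). The access uses the current index;
    the index is then advanced by `modifier` and wrapped so that it stays
    inside the range [base, base + length).
    """
    address = index
    next_index = base + (index - base + modifier) % length
    return address, next_index

# Walk a 5-word buffer at an arbitrary base address with a stride of 2.
addresses = []
index = 0x100
for _ in range(6):
    address, index = post_modify_circular(index, 2, base=0x100, length=5)
    addresses.append(address)
```

Because the wrap is computed relative to the base, the buffer does not need to start on a power-of-two boundary, and a negative modifier walks the buffer backward.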

The IALUs also support bit reverse addressing, which is useful for the FFT algorithm. Bit reverse addressing is implemented using a reverse carry addition that is similar to regular additions, but the carry is taken from the upper bits and is driven into lower bits.
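A reverse-carry addition can be modeled by reversing the bit order of both operands, adding normally, and reversing the result. The Python sketch below uses hypothetical function names and an explicit address width; it illustrates the arithmetic only, not the IALU instruction syntax.

```python
def reverse_bits(x, width):
    """Reverse the order of the low `width` bits of x."""
    r = 0
    for _ in range(width):
        r = (r << 1) | (x & 1)
        x >>= 1
    return r

def reverse_carry_add(a, b, width):
    """Add a and b with the carry propagating from the upper bits to the lower bits."""
    total = (reverse_bits(a, width) + reverse_bits(b, width)) & ((1 << width) - 1)
    return reverse_bits(total, width)

# For an 8-point FFT, repeatedly adding N/2 = 4 with reverse carry
# visits the indices in bit-reversed order.
order = []
index = 0
for _ in range(8):
    order.append(index)
    index = reverse_carry_add(index, 4, width=3)
```

The modifier N/2 has only its top bit set, so each reverse-carry addition acts like an increment of the bit-reversed index.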

The IALU provides flexibility in moving data as single, dual, or quad words. Every instruction can execute with a throughput of one per cycle. IALU instructions execute with a single cycle of latency while computation units have two cycles of latency. Normally, there are no dependency delays between IALU instructions, but if there are, three or four cycles of latency can occur.

For more information on the IALUs, see “IALU” on page 6-1.

Program Sequencer

The program sequencer supplies instruction addresses to memory and, together with the IALUs, allows computational operations to execute with maximum efficiency. The sequencer supports efficient branching using the branch target buffer (BTB), which reduces branch delays for conditional and unconditional instructions. The sequencer and IALU’s control flow instructions divide into two types:

• Control flow instructions. These instructions are used to direct program execution by means of jumps and to execute individual instructions conditionally.

• Immediate extension instructions. These instructions are used to extend the numeric fields used in immediate operands for the sequencer and the IALU.


Control flow instructions divide into two types:

• Direct jumps and calls based on an immediate address operand specified in the instruction encoding. For example: ‘if <cond> jump 100;’ always jumps to address 100, if the <cond> evaluates as true.

• Indirect jumps based on an address supplied by a register. The instructions used for specifying conditional execution of a line are a subcategory of indirect jumps. For example: ‘if <cond> cjmp;’ is a jump to the address pointed to by the CJMP register.

The TigerSHARC processor achieves its fast execution rate by means of an eight-cycle pipeline.

Two stages of the sequencer’s pipeline actually execute in the computation units. The computation units perform single-cycle operations with a two-cycle computation pipeline, meaning that results are available for use two cycles after the operation is begun. Hardware causes a stall if a result is not available in a given cycle (register dependency check). Up to two computation instructions per compute block can be issued in each cycle, instructing the ALU, multiplier or shifter to perform independent, simultaneous operations.

The TigerSHARC processor has four general-purpose external interrupts, IRQ3-0. The processor also has internally generated interrupts for the two timers, DMA channels, link ports, arithmetic exceptions, multiprocessor vector interrupts, and user-defined software interrupts. Interrupts can be nested through instruction commands. Interrupts have a short latency and do not abort currently executing instructions. Interrupts vector directly to a user-supplied address in the interrupt table register file, removing the overhead of a second branch.

The control flow instruction must use the first instruction slot in the instruction line.


The branch penalty in a deeply pipelined processor such as the TigerSHARC processor can be compensated for by the use of a branch target buffer (BTB) and branch prediction. The branch target address is stored in the BTB. When the address of a jump instruction, which is predicted by the user to be taken in most cases, is recognized (the tag address), the corresponding jump address is read from the BTB and is used as the jump address on the next cycle. Thus the latency of a jump is reduced from three to six wasted cycles to zero wasted cycles. If this address is not stored in the BTB, the instruction must be fetched from memory.
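The effect of the BTB on a taken, predicted jump can be pictured with a toy lookup table keyed by the address of the jump instruction. The Python sketch below is a conceptual model only; the class and function names are hypothetical, the miss penalty is a parameter in the range quoted above, and the real buffer's capacity, tagging, and replacement policy are not modeled.

```python
class BranchTargetBuffer:
    """Toy BTB: maps the address of a jump instruction (the tag) to its target."""

    def __init__(self):
        self._targets = {}

    def lookup(self, jump_address):
        """Return the stored target for this jump, or None on a miss."""
        return self._targets.get(jump_address)

    def record(self, jump_address, target):
        """Store the target so that the next occurrence of this jump hits."""
        self._targets[jump_address] = target


def wasted_cycles(btb, jump_address, target, miss_penalty):
    """Return the cycles lost on a taken jump: zero on a BTB hit, else the penalty."""
    if btb.lookup(jump_address) == target:
        return 0
    btb.record(jump_address, target)
    return miss_penalty
```

In a loop, the first pass through the closing jump pays the penalty and every later pass hits in the buffer, which is why the saving grows with the iteration count.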

Other instructions also use the BTB to speed up these types of branches. These instructions are interrupt return, call return, and computed jump instructions.

Immediate extensions are associated with IALU or sequencer (control flow) instructions. These instructions are not specified by the programmer, but are implied by the size of the immediate data used in the instructions. The programmer must place the instruction that requires an immediate extension in the first instruction slot and leave an empty instruction slot in the line (use only three slots), so the assembler can place the immediate extension in the second instruction slot of the instruction line.

Note that only one immediate extension may be in a single instruction line.

For more information on the sequencer, BTB, and immediate extensions, see “Program Sequencer” on page 7-1.

Quad Instruction Execution

The TigerSHARC processor can execute up to four instructions per cycle from a single memory block, due to the 128-bit wide access per cycle. The ability to execute several instructions in a single cycle derives from a Static Superscalar architectural concept. This is not strictly a superscalar architecture because the instructions executed in each cycle are specified in the instruction by the programmer or by the compiler, and not by the chip hardware. There is also no instruction reordering. Register dependencies are, however, examined by the hardware and stalls are generated where appropriate. Code is fully compacted in memory and there are no alignment restrictions for instruction lines.

Relative Addresses for Relocation

Most instructions in the TigerSHARC processor support PC relative branches to allow code to be relocated easily. Also, most data references are register relative, which means they allow programs to access data blocks relative to a base register.

Nested Call and Interrupt

Nested call and interrupt return addresses (along with other registers as needed) are saved by specific instructions onto the on-chip memory stack, allowing more generality when used with high-level languages. Non-nested calls and interrupts do not need to save the return address in internal memory, making these more efficient for short, non-nested routines.

Context Switching

The TigerSHARC processor provides the ability to save and restore up to eight registers per cycle onto a stack in two internal memory blocks when using load/store instructions. This fast save/restore capability permits efficient interrupts and fast context switching. It also allows the TigerSHARC processor to dispense with on-chip PC stack or alternate registers for register files or status registers.
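As a rough worked example of the save/restore rate above, the following Python sketch computes a lower bound on the cycles needed to move a given number of registers. It assumes the peak rate of eight registers per cycle is sustained for the whole transfer; the function name and the register counts used are chosen for illustration only.

```python
import math

REGISTERS_PER_CYCLE = 8  # peak rate stated in the text


def min_save_cycles(register_count):
    """Lower bound on cycles needed to save (or restore) register_count registers."""
    return math.ceil(register_count / REGISTERS_PER_CYCLE)


# Example: two 32-entry register files (64 registers in total).
cycles_for_both_register_files = min_save_cycles(64)
```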

Internal Memory and Other Internal Peripherals

The on-chip memory consists of three blocks of 2M bits each. Each block is 128 bits (four words) wide, thus providing high bandwidth sufficient to support both computation units, the instruction stream and external I/O, even in very intensive operations. The TigerSHARC processor provides access to program and two data operands without memory or bus constraints. The memory blocks can store instructions and data interchangeably.

Each memory block is organized as 64K words of 32 bits each. The accesses are pipelined to meet one clock cycle access time needed by the core, DMA, or by the external bus. Each access can be up to four words. Memories (and their associated buses) are a resource that must be shared between the compute blocks, the IALUs, the sequencer, the external port, and the link ports. In general, if during a particular cycle more than one unit in the processor attempts to access the same memory, one of the competing units is granted access, while the other is held off for further arbitration until the following cycle—see “Bus Arbitration Protocol” in the ADSP-TS101 TigerSHARC Processor Hardware Reference. This type of conflict only has a small impact on performance due to the very high bandwidth afforded by the internal buses.
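The block sizes quoted above are consistent with one another, as the short Python check below shows. It is only an arithmetic illustration; the constant names are invented here, and "K" and "M" are taken as the binary multipliers 2^10 and 2^20.

```python
WORD_BITS = 32                # width of one memory word
WORDS_PER_BLOCK = 64 * 1024   # 64K words per block
BLOCKS = 3                    # three internal memory blocks
BLOCK_ACCESS_BITS = 128       # width of one access to a block

bits_per_block = WORD_BITS * WORDS_PER_BLOCK          # 2M bits
total_bits = bits_per_block * BLOCKS                  # 6M bits on chip
words_per_access = BLOCK_ACCESS_BITS // WORD_BITS     # four words per access

print(bits_per_block // 2**20, "Mbits per block")
print(total_bits // 2**20, "Mbits in total")
print(words_per_access, "words per access")
```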

An important benefit of large on-chip memory is that by managing the movement of data on and off chip with DMA, a system designer can realize high levels of determinism in execution time. Predictable and deterministic execution time is a central requirement in DSP and real-time systems.

Internal Buses

The processor core has three buses, each one connected to one of the internal memories. These buses are 128 bits wide to allow up to four instructions, or four aligned data words, to be transferred in each cycle on each bus. On-chip system elements also use these buses to access memory. Only one access to each memory block is allowed in each cycle, so DMA or external port transfers must compete with core accesses on the same block. Because of the large bandwidth available from each memory block, not all the memory bandwidth can be used by the core units, which leaves some memory bandwidth available for use by the DSP’s DMA processes or by the bus interface to serve other DSPs’ bus master transfers to the DSP’s memory.

Internal Transfer

Most registers of the TigerSHARC processor are classified as universal registers (Uregs). Instructions are provided for transferring data between any two Uregs, between a Ureg and memory, or for the immediate load of a Ureg. This includes control registers and status registers, as well as the data registers in the register files. These transfers occur with the same timing as internal memory load/store.

Data Accesses

Each move instruction specifies the number of words accessed from each memory block. Two memory blocks can be accessed on each cycle because of the two IALUs. For a discussion of data and register widths and the syntax that specifies these accesses, see “Register File Registers” on page 2-5.

Quad Data Access

Instructions specify whether one, two or four words are to be loaded or stored. Quad words (a memory quad word comprises four 32-bit words, or 128 bits of data) can be aligned on a quad-word boundary, and long words can be aligned on a long-word boundary. This, however, is not necessary when loading data to computation units because a data alignment buffer (DAB) automatically aligns quad words that are not aligned in memory.

Up to four data words from each memory block can be supplied to each computation unit, meaning that new data is not required on every cycle and leaving alternate cycles for I/O to the memories. This is beneficial in applications with high I/O requirements since it allows the I/O to occur without degrading core processor performance.

Booting

The internal memory of the TigerSHARC processor can be loaded from an 8-bit EPROM using a boot mechanism at system powerup. The DSP can also be booted using another master or through one of the link ports. Selection of the boot source is controlled by external pins. For information on booting the DSP, see the ADSP-TS101 TigerSHARC Processor Hardware Reference.

Scalability and Multiprocessing

The TigerSHARC processor, like the related Analog Devices product the SHARC DSP, is designed for multiprocessing applications. The primary multiprocessing architecture supported is a cluster of up to eight TigerSHARC processors that share a common bus, a global memory, and an interface to either a host processor or to other clusters. In large multiprocessing systems, this cluster can be considered an element and connected in configurations such as toroid, mesh, tree, crossbar, or others. The user can provide a personal interconnect method or use the on-chip communication ports.

The TigerSHARC processor improves on most of the multiprocessing capabilities of the SHARC DSP and enhances the data transfer bandwidth. These capabilities include:

• On-chip bus arbitration for glueless multiprocessing

• Globally accessible internal memory and registers

• Semaphore support

• Powerful, in-circuit multiprocessing emulation

Emulation and Test Support

The TigerSHARC processor supports the IEEE P1149.1 Joint Test Action Group (JTAG) standard for system test. This standard defines a method for serially scanning the I/O status of each component in a system. The JTAG serial port is also used by the TigerSHARC processor EZ-ICE® to gain access to the processor’s on-chip emulation features.

Instruction Line Syntax and Structure

The TigerSHARC processor is a static superscalar DSP that executes from one to four 32-bit instruction slots in an instruction line. With few exceptions, an instruction line executes with a throughput of one cycle in an eight-deep pipeline. Figure 1-6 shows the instruction slot and line structure.

There are some important things to note about the instruction slot and instruction line structure and how this structure relates to instruction execution.

• Each instruction line consists of up to four 32-bit instruction slots.

• Instruction slots are delimited with one semicolon “;”.

• Instruction lines are terminated with two semicolons “;;”.

• The up to four instructions on an instruction line are executed in parallel.

• Every instruction slot consists of a 32-bit opcode.

• Some instructions (such as immediate extensions) require two 32-bit opcodes (instruction slots) to execute.

• Some instructions (program sequencer, conditional, and immediate extension) require specific instruction slots.

Slot_1_instruction ; Slot_2_instruction ; Slot_3_instruction ; Slot_4_instruction ;;

An instruction line consists of up to four instruction slots. Each instruction slot is delimited with one semicolon, and the instruction line is terminated with two semicolons. The first two instruction slots are special:

1. (if used) Conditional (if-do, if-else) or sequencer (jump or other) instructions must use slot 1.

2. (if used) Immediate extension instructions must use slot 2.

Figure 1-6. Instruction Line and Slot Structure
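The delimiter rules listed above (one semicolon between slots, two semicolons at the end of a line, at most four slots) can be sketched as a small text splitter. This Python fragment is a minimal illustration, not the assembler's parser: the function name is hypothetical, comments in the source text are not handled, and implied immediate extensions (which occupy a slot without appearing in the text) are not counted.

```python
MAX_SLOTS = 4  # an instruction line holds up to four instruction slots


def split_instruction_lines(source):
    """Split assembly text into instruction lines, each a list of slot strings."""
    lines = []
    for chunk in source.split(";;"):          # ";;" terminates an instruction line
        slots = [slot.strip() for slot in chunk.split(";") if slot.strip()]
        if not slots:
            continue                          # nothing between two terminators
        if len(slots) > MAX_SLOTS:
            raise ValueError(f"more than {MAX_SLOTS} slots in line: {chunk.strip()!r}")
        lines.append(slots)
    return lines


example = "XR3:0=Q[J5+=J9]; YR1:0=R3:2+R1:0;; J5=J9-J10;;"
parsed = split_instruction_lines(example)
```

Here `parsed` holds two instruction lines, the first with two slots and the second with one.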

An instruction is a 32-bit word that activates one or more of the TigerSHARC processor’s execution units to carry out an operation. The DSP executes or stalls the instructions in the same instruction line together. Although the DSP fetches quad words from memory, instruction lines do not have to be aligned to quad-word boundaries. Regardless of size (one to four instructions), instruction lines follow one after the other in memory with a new instruction line beginning one word from where the previous instruction line ended. The end of an instruction line is identified by the most significant bit (MSB) in the instruction word.
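Because lines are packed back to back and delimited by the MSB, a stream of instruction words can be regrouped into lines without any alignment information. The Python sketch below illustrates the idea under one loudly stated assumption: it treats a set MSB as marking the last word of a line. The text only says that the MSB identifies the end of a line, so the polarity, the function name, and the sample word values are all hypothetical.

```python
END_OF_LINE_BIT = 1 << 31  # MSB of a 32-bit instruction word


def group_into_lines(words):
    """Group a flat list of 32-bit instruction words into instruction lines."""
    lines, current = [], []
    for word in words:
        current.append(word)
        if word & END_OF_LINE_BIT:   # assumption: MSB set means "last word of this line"
            lines.append(current)
            current = []
    if current:
        raise ValueError("stream ends inside an instruction line")
    return lines


# Made-up word values: a two-word line, a one-word line, and a four-word line.
stream = [0x00000001, 0x80000002, 0x80000003, 0x00000004, 0x00000005, 0x00000006, 0x80000007]
line_sizes = [len(line) for line in group_into_lines(stream)]
```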

Instruction Notation Conventions

The TigerSHARC processor assembly language is based on an algebraic syntax for ease of coding and readability. The syntax for TigerSHARC processor instructions selects the operation that the DSP executes and the mode in which the DSP executes the operation. Operations include computations, data movements, and program flow controls. Modes include Single-Instruction, Single-Data (SISD) versus Single-Instruction, Multiple-Data (SIMD) selection, data format selection, word size selection, enabling saturation, and enabling truncation. All controls on instruction execution are included in the DSP’s instruction syntax—there are no mode bits to set in control registers for this DSP.

This book presents instructions in summary format. This format presents all the selectable items and optional items available for an instruction. The conventions for these are:

this|that|other    Lists of items delimited with a vertical bar “|” indicate that syntax permits selection of one of the items. One item from the list must be selected. The vertical bar is not part of instruction syntax.

{option}    An item or a list of items enclosed within curly braces “{}” indicates an optional item. The item may be included or omitted. The curly braces are not part of instruction syntax.

() [] , ; ;;    Parentheses, square brackets, commas, semicolons, double semicolons, and other symbols are required items in the instruction syntax and must appear where shown in summary syntax, with one exception: empty parentheses (no options selected) may not appear in an instruction.

Rm Rmd Rmq    Register names are replaceable items in the summary syntax and appear in italics. Register names indicate that the syntax requires a single (Rm), double (Rmd), or quad (Rmq) register. For more information on register name syntax, compute block selection, and data format selection, see “Register File Registers” on page 2-5.

<imm#>    Immediate data (literal values) in the summary syntax appears as <imm#>, with # indicating the bit width of the value.

For example, the following instruction in summary format:

{X|Y|XY}{S|B}Rs = MIN|MAX (Rm, Rn) {({U}{Z})} ;

could be coded as any of the following instructions:

XR3 = MIN (R2, R1) ;

YBR2 = MAX (R1, R0) (UZ) ;

XYSR2 = MAX (R3, R4) (U) ;
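One way to read the summary notation is as a pattern that concrete instructions must match. The Python sketch below translates the MIN|MAX summary line above into a regular expression. It is an illustration of the notation only, not the assembler's grammar: the function name is hypothetical, register numbers are assumed to run from 0 to 31, and spacing rules are guessed from the examples.

```python
import re

# {X|Y|XY}{S|B}Rs = MIN|MAX (Rm, Rn) {({U}{Z})} ;
MIN_MAX = re.compile(
    r"""
    ^(?P<block>XY|X|Y)?            # {X|Y|XY}: optional compute block selection
    (?P<width>S|B)?                # {S|B}: optional operand size selection
    R(?P<rs>\d+)                   # Rs: result register
    \s*=\s*
    (?P<op>MIN|MAX)                # MIN|MAX: exactly one must be chosen
    \s*\(\s*R(?P<rm>\d+)\s*,\s*R(?P<rn>\d+)\s*\)
    (?:\s*\((?P<opts>UZ|U|Z)\))?   # {({U}{Z})}: empty parentheses are not allowed
    \s*;$
    """,
    re.VERBOSE,
)


def parse_min_max(text):
    """Return the matched fields as a dict, or None if the text does not fit."""
    match = MIN_MAX.match(text.strip())
    if match is None:
        return None
    fields = match.groupdict()
    if any(int(fields[name]) > 31 for name in ("rs", "rm", "rn")):
        return None                # assumption: data registers are numbered 0 to 31
    return fields


a = parse_min_max("XR3 = MIN (R2, R1) ;")
b = parse_min_max("YBR2 = MAX (R1, R0) (UZ);")
c = parse_min_max("XYSR2 = MAX (R3, R4) (U);")
d = parse_min_max("XR3 = MIN (R2, R1) () ;")   # rejected: empty parentheses
```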

Unconditional Execution Support

The DSP supports unconditional execution of up to four instructions in parallel. This support lets programmers use simultaneous computations with data transfers and branching or looping. These operations can be combined with few restrictions. The following example code shows three instruction lines containing 2, 4, and 1 instruction slots each, respectively:

XR3:0=Q[J5+=J9]; YR1:0=R3:2+R1:0;;

XR3:0=Q[J5+=J9]; YR3:0=Q[K5+=K9]; XYR7:6=R3:2+R1:0; XYR8=R4*R5;;

J5=J9-J10;;

It is important to note that the above instructions execute unconditionally. Their execution does not depend on computation-based conditions. For a description of condition dependent (conditional) execution, see “Conditional Execution Support” on page 1-24.

Conditional Execution Support

All instructions can be executed conditionally (a mechanism also known as predicated execution). The condition field exists in one instruction slot in an instruction line, and all the remaining instructions in that line either execute or not, depending on the outcome of the condition.

In a conditional computational instruction, the execution of the entire instruction line can depend on the specified condition at the beginning of the instruction line. Conditional instructions take one of the following forms:

IF Condition; DO, Instruction; DO, Instruction; DO, Instruction ;;

IF Condition, Sequencer_Instruction; ELSE, Instruction; ELSE, Instruction; ELSE, Instruction ;;

This syntax permits up to three instructions to be controlled by a condition. For more information, see “Conditional Execution” on page 7-12.

Instruction Parallelism Rules

The TigerSHARC processor executes from one to four 32-bit instructions per line. The compiler or programmer determines which instructions may execute in parallel in the same line prior to runtime (leading to the name Static Superscalar). The DSP architecture places several constraints on the application of different instructions and various instruction combinations.

Note that all the restrictions refer to combinations of instructions within the same line. There is no restriction on combinations between lines. There are, however, cases in which certain combinations between lines may cause stall cycles (see “Conditional Branch Effects on Pipeline” on page 7-44), mostly because of data conflicts (an operand of an instruction in line n+1 is the result of an instruction in line n, and that result is not ready when it is fetched).

Table 1-1 on page 1-29 and Table 1-2 on page 1-34 identify instruction parallelism rules for the TigerSHARC processor. The following sections provide more details on each type of constraint and accompany the details with examples:

• “General Restriction” on page 1-36

• “Compute Block Instruction Restrictions” on page 1-37

• “IALU Instruction Restrictions” on page 1-39

• “Sequencer Instruction Restrictions” on page 1-45

The instruction parallelism rules in Table 1-1 and Table 1-2 present the resource usage constraints for instructions that occupy instruction slots in the same instruction line. The horizontal axis lists resources—portions of the DSP architecture that are active during an instruction—and lists the number of resources that are available. The vertical axis lists instruction types—descriptive names for classes of instructions. For resources, a ‘1’ indicates that a particular instruction uses one unit of the resource, and a ‘2’ indicates that the instruction uses two units of the resource. Typical instructions of most classes are listed with the descriptive name for the instruction type.

It is important to note that Table 1-1 and Table 1-2 identify static restrictions for the TigerSHARC processor. Static restrictions are distinguished from dynamic restrictions, in that static restrictions can be resolved by the assembler. For example, the assembler flags the instruction XR3:0 = Q[J0 += 3];; because the modifier is not a multiple of 4; this is a static violation.

Dynamic restrictions cannot be resolved by the assembler because these restrictions represent runtime conditions, such as stray pointers. When the processor encounters a dynamic (runtime) violation, an exception is issued when the violation runs through the core. Whatever the case, the processor does not arrive at a deadlock situation, although unpredictable results may be written into registers or memory.

As a dynamic restriction example, examine the instruction XR3:0 = Q[J0 += 4];;. Although this instruction looks correct to the assembler, it may violate hardware restrictions if J0 is not quad aligned. Because the assembler cannot predict what the code will do to J0 up to the point of this instruction, this violation is dynamic, since it occurs at runtime.
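The difference between the two kinds of violation comes down to when the checked value is known. The Python sketch below separates the two checks for a quad-word access; the function names are hypothetical, and it assumes that addresses count 32-bit words, so that quad alignment means a multiple of 4 (consistent with the modifier rule quoted above).

```python
QUAD_WORDS = 4  # a quad word is four 32-bit words


def assembler_accepts_quad_modifier(modifier):
    """Static check: the immediate modifier is visible in the source text."""
    return modifier % QUAD_WORDS == 0


def quad_access_is_aligned(pointer_value):
    """Dynamic check: the pointer value is only known while the program runs."""
    return pointer_value % QUAD_WORDS == 0


static_bad = assembler_accepts_quad_modifier(3)    # XR3:0 = Q[J0 += 3];; is flagged
static_ok = assembler_accepts_quad_modifier(4)     # XR3:0 = Q[J0 += 4];; assembles
runtime_ok = quad_access_is_aligned(0x1000)        # J0 happens to be quad aligned
runtime_bad = quad_access_is_aligned(0x1002)       # same instruction, stray pointer
```

The same instruction text passes the static check in both of the last two cases; only the run-time value of J0 decides whether the access is legal.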

Further, Table 1-1 and Table 1-2 cover restrictions that arise from the interaction of instructions that share a line, but mostly omit restrictions of single instructions. An example of the former occurs when two instructions attempt to use the same unit in the same line. An example of an individual instruction restriction is an attempt to use a register that is not valid for the instruction. For example, the instruction XR0 = CB[J5+=1];; is illegal because circular buffer accesses can only use IALU registers J0 through J3.

For most instruction types, you can locate the instruction in Table 1-1 or Table 1-2 and read across to find out the resources it uses. Resource usage for data movement instructions is more complicated to analyze. Resource usage for these instructions is calculated by adding together base resources, source resources, and destination resources. Base resources are determined by the type of move instruction. Move instructions are Ureg transfer (register to register), immediate load (immediate values to register), memory load (memory to register), and memory store (register to memory). Source resources are determined by the source register and are only applicable when the source itself is a register (Ureg transfer and stores). Destination resources may be of two types:

• Address pointer in post-modify (for example, XR0 = [J0 += 2];;)

• Destination register—only applicable when the destination is a register (Ureg transfer, memory loads and immediate loads)

If a particular combination of base, source, and destination uses more resources than are available, that combination is illegal. Consider, for example, the following instruction:

XR3:0 = Q[K31+0x40000];;

This is a memory load instruction, or specifically, a K-IALU load using a 32-bit offset. Reading across the table, the base resources used by the instruction are two slots in the line: the K-IALU instruction and the second instruction slot (for the immediate extension). The destination is XR3:0, which are X-compute block registers. The ‘X-Register File, Dreg = XR31–0’ line under ‘Ureg transfer and Store (Source Register) Resources’ in the table indicates that the instruction also uses an X-compute block port and an X-compute block input port.

The following Ureg transfer instruction provides another example:

XYR0=CJMP;;

This example uses the following resources:

• One instruction slot

• Base resources—an IALU instruction (no matter whether J-IALU or K-IALU) and the Ureg transfer resource (base resources) for the IALU instruction

• Source resources—the sequencer I/O port

• Destination resources—an X-compute block port, an X-compute block input port, a Y-compute block port, and a Y-compute block input port

By comparison, the instruction R3:0 = J7:4;; uses an instruction slot, an IALU slot (no matter whether J or K), the Ureg transfer slot, and the J-IALU input port and output port.
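The resource accounting described in this section amounts to summing per-instruction resource counts and comparing the totals with what a line provides. The Python sketch below shows that bookkeeping for a deliberately small subset of resources: only those whose limits are stated in the prose (four slots, one immediate extension in the second slot, two IALU instructions, one J-IALU, one K-IALU). Port resources are left out, and the dictionary keys and function name are invented for this example.

```python
from collections import Counter

# Subset of per-line resource limits taken from the prose; ports are omitted.
AVAILABLE = Counter(
    {"inst_slots": 4, "second_slot": 1, "ialu_inst": 2, "j_ialu": 1, "k_ialu": 1}
)

# XR3:0 = Q[K31+0x40000];;  K-IALU load with a 32-bit offset (needs an immediate extension)
K_LOAD_32BIT_OFFSET = Counter(
    {"inst_slots": 2, "second_slot": 1, "ialu_inst": 1, "k_ialu": 1}
)
# A J-IALU load with a short offset, for example XR0 = [J0 += 2];;
J_LOAD_SHORT_OFFSET = Counter({"inst_slots": 1, "ialu_inst": 1, "j_ialu": 1})


def over_subscribed(instructions):
    """Return the sorted names of resources that the line uses beyond the limit."""
    used = Counter()
    for resources in instructions:
        used.update(resources)               # add this instruction's usage
    return sorted(name for name, count in used.items() if count > AVAILABLE[name])


legal = over_subscribed([K_LOAD_32BIT_OFFSET, J_LOAD_SHORT_OFFSET])
illegal = over_subscribed([K_LOAD_32BIT_OFFSET, K_LOAD_32BIT_OFFSET])
```

In this sketch the first combination fits within the limits, while the second exceeds both the K-IALU and the second-slot resources.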

Table 1-1. Parallelism Rules for Register File, DAB, J/K-IALU, and Port Access Instructions

Resources:

Inst. slots used

First inst. slot¹

Second inst. slot²

IALU inst.

⇒ Resources Available: ⇒

⇓ Instruction Types: ⇓

IALU Arithmetic

J-IALU

Js = Jm Op Jn|Imm8

J-IALU, 32-bit immediate

Js = Jm Op Imm32

K-IALU

Ks = Km Op Kn|Imm8

K-IALU, 32-bit immediate

Ks = Km Op Imm32

Data Move (resource total = instr. + Uregs)

Ureg Transfer

Ureg = Ureg

Immediate Load (resource total = instr. + Ureg)

Immediate 16-bit Load

Ureg = Imm16

Immediate 32-bit Load

Ureg = Imm32

4112111112 2112 21111 3 3

111

2111

111

211 1

111

2 111

Imm. load or Ureg xfer

J-IALU

K-IALU

J-IALU-port I/O

K-IALU-port I/O

X-ports I/O³

X-ports input

X-ports output

X-DAB

Y-ports I/O³

Y-ports input

Y-ports output

Y-DAB

Seq.-port I/O

Ext. Port I/O

IOP-port I/O

Link Port I/O

Table 1-1. Parallelism Rules for Register File, DAB, J/K-IALU, and Port Access Instructions (Cont’d)

Resources:

Inst. slots used

First inst. slot¹

Second inst. slot²

IALU inst.

⇒ Resources Available: ⇒

4112111112 2112 21111 3 3

⇓ Instruction Types: ⇓

Memory Load (resource total = instr. + Ureg)

J-IALU Load

Ureg = [Jm +|+= Jn|imm8]

J-IALU Load, 32-bit offset

Ureg = [Jm +|+= imm32]

K-IALU Load

Ureg = [Km +|+= Kn|imm8]

K-IALU Load, 32-bit offset

Ureg = [Km +|+= imm32]

1111

2111

111

211 1

Memory Store (resource total = instr. + Ureg)

J-IALU Store

[Jm +|+= Jn|imm8] = Ureg

J-IALU Store, 32-bit offset

[Jm +|+= imm32] = Ureg

K-IALU Store

[Km +|+= Kn|imm8] = Ureg

K-IALU Store, 32-bit offset

[Km +|+= imm32] = Ureg

111

2111

111

211 1

Imm. load or Ureg xfer

J-IALU

K-IALU

J-IALU-port I/O

K-IALU-port I/O

X-ports I/O³

X-ports input

X-ports output

X-DAB

Y-ports I/O³

Y-ports input

Y-ports output

Y-DAB

Seq.-port I/O

Ext. Port I/O

IOP-port I/O

Link Port I/O

Table 1-1. Parallelism Rules for Register File, DAB, J/K-IALU, and Port Access Instructions (Cont’d)

Resources:

Inst. slots used

First inst. slot¹

Second inst. slot²

IALU inst.

⇒ Resources Available: ⇒

4112111112 2112 21111 3 3

⇓ Instruction Types: ⇓

Ureg transfer and Store (Source Register) Resources

J-IALU

Ureg = J30–0|JB3–0|JL3–0

K-IALU

Ureg = K30–0|KB3–0|KL3–0

X-Register File

Dreg = XR31–0

Y-Register File

Dreg = YR31–0

XY-Register Files (SIMD)

Ureg = XYR31–0

Sequencer

Ureg = CJMP|RETI|RETS|…

External Port Control/Status

Ureg = SYSCON|BUSLK|…

I/O Processor (DMA)

Ureg = DCS0|DCD0|…

Link Port Control/Status/Buf.

Ureg = LCTL0|LCTL1|…

Imm. load or Ureg xfer

J-IALU

K-IALU

J-IALU-port I/O

K-IALU-port I/O

X-ports I/O³

X-ports input

X-ports output

X-DAB

Y-ports I/O³

Y-ports input

Y-ports output

Y-DAB

Seq.-port I/O

1111

Ext. Port I/O

IOP-port I/O

Link Port I/O

Table 1-1. Parallelism Rules for Register File, DAB, J/K-IALU, and Port Access Instructions (Cont’d)

Resources:

Inst. slots used

First inst. slot¹

Second inst. slot²

IALU inst.

⇒ Resources Available: ⇒

4112111112 2112 21111 3 3

⇓ Instruction Types: ⇓

Ureg Transfer and Load (Destination Register) Resources

J-IALU

Ureg = J30–0|JB3–0|JL3–0

K-IALU

Ureg = K30–0|KB3–0|KL3–0

X-Register File

Dreg = XR31–0

Y-Register File

Dreg = YR31–0

XY-Register Files (SIMD)

Ureg = XYR31–0

Sequencer

Ureg = CJMP|RETI|RETS|…

External Port Control/Status

Ureg = SYSCON|BUSLK|…

I/O Processor (DMA)

Ureg = DCS0|DCD0|…

Link Port Control/Status/Buf.

Ureg = LCTL0|LCTL1|…

Imm. load or Ureg xfer

J-IALU

K-IALU

J-IALU-port I/O

K-IALU-port I/O

X-ports I/O³

X-ports input

X-ports output

X-DAB

Y-ports I/O³

Y-ports input

Y-ports output

Y-DAB

Seq.-port I/O

11 11

Ext. Port I/O

IOP-port I/O

Link Port I/O

Table 1-1. Parallelism Rules for Register File, DAB, J/K-IALU, and Port Access Instructions (Cont’d)

Resources:

Inst. slots used

First inst. slot¹

Second inst. slot²

IALU inst.

⇒ Resources Available: ⇒

⇓ Instruction Types: ⇓

Memory Load Ureg (Destination Register) Resources

X-Register File DAB/SDAB

XDreg = DAB q[addr] XDreg = XR31–0

Y-Register File DAB/SDAB

YDreg = DAB q[addr] YDreg = YR31–0

XY-Register Files DAB/SDAB

XYDreg = DAB q[addr] XYDreg = XYR31–0

4112111112 2112 21111 3 3

Imm. load or Ureg xfer

J-IALU

K-IALU

J-IALU-port I/O

K-IALU-port I/O

X-ports I/O³

X-ports input

X-ports output

X-DAB

Y-ports I/O³

Y-ports input

Y-ports output

Y-DAB

Seq.-port I/O

11 1

11 111 1

Ext. Port I/O

IOP-port I/O

Link Port I/O

¹ If a conditional instruction is present on the instruction line, it must use the first instruction slot.

² If an immediate extension is present on the instruction line, it must use the second instruction slot.

³ These resources are listed for informational purposes only. These constraints cannot be exceeded within the core.

⁴ Complete list is all registers in register groups 0x1A, 0x38, and 0x39: CJMP, RETI, RETIB, RETS, DBGE, ILATSTL, ILATSTH, LC0, LC1, ILATL, ILATH, IMASKL, IMASKH, PMASKL, PMASKH, TIMER0L, TIMER0H, TIMER1L, TIMER1H, TMRIN0L, TMRIN0H, TMRIN1L, TMRIN1H, SQCTL, SQCTLST, SQCTLCL, SQSTAT, SFREG, ILATCLL, and ILATCLH.

⁵ Complete list is all registers in register groups 0x24 and 0x3A: SYSCON, BUSLK, SDRCON, SYSTAT, SYSTATCL, BMAX, BMAXC, AUTODMA0, and AUTODMA1.

⁶ Complete list is all registers in register groups 0x20 and 0x23: DCS0, DCD0, DCS1, DCD1, DCS2, DCD2, DCS3, DCD3, DCNT, DCNTST, DCNTCL, CSTAT, and DSTATC.

⁷ Complete list is all registers in register groups 0x25 and 0x27: LBUFTX0, LBUFRX0, LBUFTX1, LBUFRX1, LBUFTX2, LBUFRX2, LBUFTX3, LBUFRX3, LCTL0, LCTL1, LCTL2, LCTL3, LSTAT0, LSTAT1, LSTAT2, and LSTAT3.

Table 1-2. Parallelism Rules for Compute Block and Sequencer Instructions

Resources:

Inst. slots used

First inst. slot¹

Second inst. slot²

X-Comp Block Inst.

⇒ Resources Available: ⇒

⇓ Instruction Types: ⇓

Sequencer Instructions

Conditional Jump/Call, 16-bit offset

IF cond, JUMP|CALL Imm16

Conditional Jump/Call, 32-bit offset

IF cond, JUMP|CALL Imm32

Other Conditionals, Indirect Jumps, Static Flag Ops 1 1

X Compute Block Operations

X-ALU instruction, except quad output

XDreg = Dreg + Dreg

X-Multiplier instruction, except quad output

XDreg = Dreg * Dreg

X-Shifter instruction, except MASK, FDEP, STAT 111

X-ALU instruction with quad output (add_sub, EXPAND, MERGE)

X-Multiplier instruction with quad output 1 1 1 1

X-Shifter instructions MASK, FDEP, XSTAT 12

41121112111

211

111

1111

X-ALU

X-Multiplier

X-Shifter

Y-Comp Block Inst.

Y-ALU

Y-Multiplier

Y-Shifter

Table 1-2. Parallelism Rules for Compute Block and Sequencer Instructions

Resources:

Inst. slots used

First inst. slot¹

Second inst. slot²

X-Comp Block Inst.

⇒ Resources Available: ⇒

⇓ Instruction Types: ⇓

Y Compute Block Operations

Y-ALU instruction, except quad output

YDreg = Dreg + Dreg

Y-Multiplier instruction, except quad output

YDreg = Dreg * Dreg

Y-Shifter instruction, except MASK, FDEP, STAT 111

Y-ALU instruction with quad output (add_sub, EXPAND, MERGE)

Y-Multiplier instruction with quad output 1 1 1 1

Y-Shifter instructions MASK, FDEP, YSTAT 12

X and Y Compute Block Operations (SIMD)

XY-ALU instruction, except quad output

XYDreg = Dreg + Dreg

XY-Multiplier instruction, except quad output

XYDreg = Dreg * Dreg

XY-Shifter instruction, except MASK, FDEP, STAT 11111

XY-ALU instruction with quad output (add_sub, EXPAND, MERGE)

XY-Multiplier instruction with quad output 1 1 1 1 1 1 1

XY-Shifter instructions MASK, FDEP, X/YSTAT 12 2

41121112111

111

1111

11111

1 1111

1111111

X-ALU

X-Multiplier

X-Shifter

Y-Comp Block Inst.

Y-ALU

Y-Multiplier

Y-Shifter

General Restriction

There is a general restriction that applies to all types of instructions: Two instructions may not write to the same register. This restriction is checked statically by the assembler. For example:

XR0 = R1 + R2 ; XR0 = R5 * R6 ;; /* Invalid; these instructions cannot be on the same instruction line */

XR1 = R2 + R3 , XR1 = R2 - R3 ;; /* Invalid; add-subtract to the same register */

Consequently, a load instruction may not be targeted to a register that is updated in the same line by another instruction. For example:

XR0 = [J17 + 1] ; R0 = R3 * R8 ;; /* Invalid */

A load/store instruction that uses post-modify and update addressing cannot load the same register that is used as the index Jm/Km (pointer to memory). For example:

J0 = [J0 += 1] ;; /* Invalid; J0 cannot be used as both destination (Js) and index (Jm) in a post-modify (+=) load or store */

No instruction can write to the CJMP register in the same line as a CALL instruction (which also updates the CJMP register). For example:

if ALE, CALL label ; J6 = J0 + J1 (CJMP) ;; /* Invalid */

There are two types of loop counter updates, and combining them in the same instruction line is illegal. For example:

IF LC0E; DO … ; LC0 = [J0 + J1] ;; /* Invalid */
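The basic form of this restriction (two slots writing the same register) lends itself to a simple static check. The Python sketch below is a rough illustration of such a check, not the assembler's algorithm. It assumes slots of the plain form `dest = ...`, treats an unprefixed or XY-prefixed Rn as a write to both compute blocks, and ignores side effects such as post-modify pointer updates, the (CJMP) option, dual add-subtract slots, and loop counter conditions; all function names are hypothetical.

```python
import re

# Compute block data register, optionally a range such as XR3:0.
COMPUTE_REG = re.compile(r"^(XY|X|Y)?R(\d+)(?::(\d+))?$")


def destinations(instruction):
    """Return the set of registers written by a register-destination slot."""
    text = instruction.strip()
    if text.startswith("["):                  # memory store: no register written
        return set()
    dest = text.split("=", 1)[0].replace(" ", "").upper()
    match = COMPUTE_REG.match(dest)
    if match is None:
        return {dest}                         # e.g. J0 or LC0, kept as a plain name
    block, high, low = match.groups()
    blocks = "XY" if block in (None, "XY") else block
    high = int(high)
    low = high if low is None else int(low)
    return {(b, n) for b in blocks for n in range(low, high + 1)}


def has_write_conflict(slots):
    """True if two slots of one instruction line write the same register."""
    seen = set()
    for slot in slots:
        written = destinations(slot)
        if seen & written:
            return True
        seen |= written
    return False


same_register = has_write_conflict(["XR0 = R1 + R2", "XR0 = R5 * R6"])
load_and_compute = has_write_conflict(["XR0 = [J17 + 1]", "R0 = R3 * R8"])
different_blocks = has_write_conflict(["XR0 = R1 + R2", "YR0 = R1 + R2"])
```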

Compute Block Instruction Restrictions

There are two compute blocks, and instructions can be issued to either or both.

• Instructions in the format XRs = Rm op Rn are issued to the X-compute block

• Instructions in the format YRs = Rm op Rn are issued to the Y-compute block

• Instructions in the format Rs = Rm op Rn or XYRs = Rm op Rn are issued to both the X- and Y-compute blocks

The following conditions apply when issuing instructions to the compute blocks. Note that the assembler statically checks all of these restrictions.

• Up to two instructions can be issued to each compute block (making that a maximum of four compute block instructions in one line). Note, however, that for this rule, the instructions of type Rs = Rm op Rn count as one instruction for each compute block.

For example:

R0 = R1 + R2 ; R3 = R4 * R5 ;; /* Valid; a total of four instructions */

XR0 = R1 + R2 ; XR3 = R4 * R5 ; XR6 = LSHIFT R1 BY R7 ;; /* Invalid; three instructions to compute block X */

• Only one instruction can be issued to each unit (ALU, multiplier, or shifter) in a cycle. Each of the two instructions must be issued to a different unit (ALU, multiplier or shifter). For example:

XR0 = R1 + R2 ; XR6 = R1 + R2 ;; /* Invalid */

XR0 = R1 + R2 ; YR0 = R1 + R2 ;; /* Valid */

• When one of the shifter instructions listed below is executed, it must be the only instruction in that line for the particular compute block. The instructions are: access to XSTAT/YSTAT registers. For example:

XR0 += MASK R1 BY R2 ; XR6 = R1 + R2 ;; /* Invalid; three operand shifter instruction in same line with an ALU operation; both issued to compute block X */

• Only one unit (ALU or multiplier) can use two result buses. A unit uses two result buses either when the result is quad word or when there are two results (dual ADD and SUB instructions—R0 = R1+R2,

R5 = R1-R2;). Another instruction is allowed in the same line, as

long as it is not a shifter instruction. For example:

R0 = R1 + R2 , R5 = R1 - R2 ; XR6 = R1 * R2 ;; /* Valid */

R0 = R1 + R2 , R5 = R1 - R2 ; XR6 = LSHIFT R1 BY R2 ;; /* Invalid; shifter instruction and two result ALU instruction */

FDEP, MASK, GETBITS, PUTBITS and

R0 = R1 + R2 , R5 = R1 - R2 ; XR3:0 = MR3:0 ;; /* Invalid; two instructions using two buses */

• There can be no other compute block instruction with Shifter load/store of X/YSTAT.

• In the multiplier, the option CR (clear and set round bit) and the option I (integer – not fractional) may not be used in the same multiply-accumulate instruction.

• The CR option of the multiplier may be used only in these instructions:

MR3:2|MR1:0 +|-= Rm * Rn      32-bit fractional multiply-accumulate
MR3:0 +|-= Rmd * Rnd          Quad 16-bit fractional multiply-accumulate
MR3:2|MR1:0 += Rm ** Rn       Complex multiply-accumulate


• Communications Logic Unit (CLU) register load instructions have the same restrictions as shifter instructions, with one exception—a CLU register load instruction can be executed in the same instruction line with another compute instruction that has a quad result.

• All CLU instructions, except for load of CLU registers, refer to the same rules as compute ALU instructions.
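
The first two compute block rules above (at most two instructions per compute block, and at most one instruction per unit) can be modeled in a few lines. The following Python sketch is only an illustration: the `(blocks, unit)` tuples and the function name `check_compute_line` are hypothetical, the remaining rules in the list are not modeled, and the assembler remains the authority on what is legal.

```python
from collections import Counter

def check_compute_line(instructions):
    """Check one instruction line against the first two rules above.

    instructions: list of (blocks, unit) pairs, where blocks is a set drawn
    from {"X", "Y"} and unit is "ALU", "MUL", or "SHIFT" (hypothetical model).
    """
    per_block = Counter()
    per_unit = Counter()
    for blocks, unit in instructions:
        for block in blocks:
            per_block[block] += 1
            per_unit[(block, unit)] += 1

    errors = []
    for block in sorted(per_block):
        if per_block[block] > 2:
            errors.append(f"more than two instructions for compute block {block}")
    for block, unit in sorted(per_unit):
        if per_unit[(block, unit)] > 1:
            errors.append(f"more than one {unit} instruction for compute block {block}")
    return errors


XY = {"X", "Y"}
# R0 = R1 + R2 ; R3 = R4 * R5 ;;  (valid in the text above)
print(check_compute_line([(XY, "ALU"), (XY, "MUL")]))
# XR0 = R1 + R2 ; XR3 = R4 * R5 ; XR6 = LSHIFT R1 BY R7 ;;  (invalid)
print(check_compute_line([({"X"}, "ALU"), ({"X"}, "MUL"), ({"X"}, "SHIFT")]))
# XR0 = R1 + R2 ; XR6 = R1 + R2 ;;  (invalid)
print(check_compute_line([({"X"}, "ALU"), ({"X"}, "ALU")]))
```

An instruction without an X or Y prefix is entered with both blocks, which matches the rule that it counts as one instruction for each compute block.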

IALU Instruction Restrictions

There are four types of IALU instructions:

• Memory load/store—for example: R0 = [J0 + 1] ;

• IALU operations—for example: J0 = J1 + J2 ;

• Load data—for example: R1 = 0xABCD ;

• Ureg transfer—for example: XR0 = YR0 ;

These restrictions apply when issuing instructions to the IALU. Except for the load data restriction, the assembler flags all of these restrictions.

• Up to one J-IALU and up to one K-IALU instruction can be issued in the same instruction line. For example:

R0 = [J0 += 1] ; R1 = [K0 += 1] ;; /* It’s recommended that J0 and K0 point to different memory blocks to avoid stall */

[J0 += 1] = XR0 ; [K0 += 1] = YR0 ;;

J0 = [J5 + 1] ; XR0 = [K6 + 1] ;;

R1 = 0xABCD ; R0 = [J0 += 1] ;; /* One load data instruction (in K-IALU) and one J-IALU operation */

XR0 = YR0 ; XR1 = [J0 += 1] ; YR1 = [K0 += 1] ;; /* Invalid; three IALU instructions */

XR0 = [J0 + 1] ; YR0 = [J1 + 1] ;; /* Invalid; both use the same IALU (J-IALU) */

XR0 = [J0 + 1] ; J5 = J1 + 1 ;; /* Invalid; both use the same IALU (J-IALU) */

• Two accesses to the same memory address in the same line, when one of them is a store instruction, are liable to give unpredictable results.

• Loading from external memory is only allowed to the compute block and IALU register files.

• Reading from a multiprocessing broadcast zone is illegal.

• Move register to register instruction: if one of the registers is a merged compute block register, the other may not be a compute block register. For example:

XYR1:0 = XR11:8 ; /* Invalid */

XR11:8 = XYR1:0 ; /* Invalid */

XYR1:0 = J11:8 ; /* Valid */

J11:8 = XYR1:0 ; /* Valid */

• A line of instructions may contain at most one of either “load immediate data to register” or “Ureg to Ureg transfer” instructions. For example:

XR0 = YR0 ;; /* Valid */

XR5 = YR5 ; YR8 = [J3 + J6] ;; /* Valid */


R0 = 0xFFFFFFFF ;; /* Valid; one load immediate data and one immediate extension */

XR0 = YR0 ; J5 = 0xFFFF ;; /* Invalid; one Ureg to Ureg transfer and one load immediate data instruction */

XR0 = YR0 ; J0 = XR1 ;; /* Invalid; two Ureg to Ureg transfers */

R0 = 0xFFFF ; J1 = 0xFF ;; /* Invalid; two load immediate data instructions */

• Access via DAB must be through a quad word load. It cannot be via “merged” Ureg groups. For example:

R3:0 = DAB Q[J0 += 4] ;; /* Valid; broadcast */

R1:0 = DAB Q[J0 += 4] ;; /* Invalid; merged */

• DAB and circular buffer access to memory is allowed only with post-modify with update. For example:

XR1:0 = CB L[J2 + 2] ;; /* Invalid */

• Register groups 0x20 to 0x3F can be accessed via Ureg transfer only.

• In a register-to-register move, an XY register may not be used as source or destination of the transaction, unless it is both source and destination. For example:

R1:0 = R11:10 ;; /* Valid */

J1:0 = R11:10 ;; /* Invalid */

R3:0 = J3:0 ;; /* Invalid */


• There can be up to two load instructions to the same compute block register file or up to one load to and one store from the same compute block register file. (A compute block register file has one input port and one input/output port.) If two store instructions are issued, neither is executed. For example:

[J0 + 1] = XR0 ; [K0 + 1] = XR1 ;; /* Invalid; attempts to use two output ports */

R0 = [J0 + 1] ; R1 = [K1 + 1] ;; /* Valid; uses two input ports in compute block X and Y */

R0 = [J0 + 1] ; [K1 + 1] = XR1 ;; /* Valid */

• A Ureg transfer within the same compute register file cannot be used with any other store to that register file. For example:

XR3:0 = R7:4 ; [J17 + 2] = YR4 ;; /* Valid; different register files */

XR3:0 = R7:4 ; XR0 = [J17 + 2] ;; /* Valid; one Ureg trans. and one load to compute block X */

XR3:0 = R7:4 ; [J17 + 2] = XR4 ;; /* Invalid; one Ureg transfer and one store from compute block X */

R3:0 = R31:28 ;; /* Valid—SIMD Ureg transfer */

R3:0 = R31:28 ; [J17 + 2] = YR8;; /* Invalid—SIMD Ureg transfer (in both RFs) and store from compute block Y */


• Only one DAB load per Compute Block is allowed. For example:

XR3:0 = DAB Q[J0 += 4] ; XR7:4 = DAB Q[K0 += 4] ;; /* Invalid */

XR3:0 = DAB Q[J0 += 4] ; YR7:4 = DAB Q[K0 += 4] ;; /* Valid */

• Only one memory load/store to and from the same single port register files is allowed. The single port register files are:

• J-IALU registers: groups 0xC and 0xE

• K-IALU registers: groups 0xD and 0xF

• Bus Control registers: groups 0x24 and 0x3A

• Sequencer, Interrupt and BTB registers: groups 0x1A, 0x30–0x39, and 0x3B

• Debug logic registers: groups 0x1B, 0x3D–0x3F

For example:

J0 = [J5 + 1] ; K0 = [K6 + 1] ;; /* Valid */

J0 = [J5 + 1] ; [K6 + 1] = K0 ;; /* Valid */

J0 = [J5 + 1] ; [K6 + 1] = J1 ;; /* Invalid; one load to J-IALU register file and one store from J-IALU register file */

• Access to memory must be aligned to its size. For example, quad word access must be quad-word aligned. The long access must be aligned to an even address. This excludes load to compute block via DAB. In addition, the immediate address modifier must be a multiple of four in quad accesses and of two in long accesses. For example:

XR3:0 = Q[J0 += 3] ;; /* Invalid */

XR3:0 = Q[J0 += 4] ;; /* Valid */

• A Ureg store instruction and an instruction that updates the same Ureg may not be issued in the same instruction line, because the store instruction may be stalled and by the time it progresses, the contents may have been modified by the update instruction. For example:

XR0 = R1 + R3 ; Q[J7 += 4] = XR3:0 ;; /* Invalid */

IF ALE, CALL label ; [J0 += 1] = CJMP ;; /* Invalid; CJMP is updated by the call instruction */

• For the following J-IALU circular buffer or bit-reversed addressing operations, Jm (the index) may only be J0, J1, J2, or J3:

Js = Jm +|- Jn (CB)
Ureg = CB [L] [Q] (Jm +|+= Jn|Imm)
CB [L] [Q] (Jm +|+= Jn|Imm) = Ureg
Ureg = DAB [L] [Q] (Jm +|+= Jn|Imm)
Ureg = BR [L] [Q] (Jm +|+= Jn|Imm)
BR [L] [Q] (Jm +|+= Jn|Imm) = Ureg

The same restrictions apply to K-IALU instructions that use circular buffer or bit-reversed addressing operations.


• On load or store instructions the memory address may not be a register. For example, the address may not be a memory-mapped register address in the range of 0x180000 to 0x1FFFFF. For example:

Q[J2 + 0] = XR3:0 ;; /* Invalid if J2 is in the range of 0x180000 to 0x1FFFFF */

• If one IALU is used to access the other IALU register, there may not be an immediate load instruction in the same line. For example:

Q[J2 + 0] = K3:0 ; XR0 = 100 ;; /* Invalid */

Q[K2 + 0] = K3:0 ; XR0 = 100 ;; /* Valid */
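
The alignment rule in the list above lends itself to a small check. The Python sketch below is a model only, not part of any Analog Devices tool: the function name `check_access` and its parameters are hypothetical, addresses are assumed to be 32-bit word addresses, and the immediate modifier check is assumed to apply to DAB accesses as well.

```python
# Access sizes in 32-bit words (assumption: addresses count 32-bit words).
ACCESS_WORDS = {"normal": 1, "long": 2, "quad": 4}

def check_access(kind, address, modifier=0, via_dab=False):
    """Return a list of alignment problems for one memory access."""
    size = ACCESS_WORDS[kind]
    errors = []
    # Loads to a compute block through the DAB are exempt from address alignment.
    if not via_dab and address % size != 0:
        errors.append(f"{kind} access at 0x{address:X} is not aligned to {size} words")
    # The immediate modifier must be a multiple of the access size.
    if modifier % size != 0:
        errors.append(f"modifier {modifier} is not a multiple of {size}")
    return errors


print(check_access("quad", 0x1000, modifier=3))   # like XR3:0 = Q[J0 += 3]
print(check_access("quad", 0x1000, modifier=4))   # like XR3:0 = Q[J0 += 4]
print(check_access("quad", 0x1001, modifier=4, via_dab=True))
```

Unlike most of the restrictions above, the address part of this rule depends on a run-time register value, so a check of this kind is useful mainly in a simulator or a test harness.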

Sequencer Instruction Restrictions

There can be one sequencer instruction and one immediate extension per line, where the sequencer instruction can be jump, indirect jump, and other instructions. The assembler statically checks all of these restrictions:

• The sequencer instruction must be the first instruction in the four-slot instruction line.

• The immediate extension must be the second instruction in the four-slot instruction line.

• The immediate extension is counted as one of the four instructions in the line.


• There cannot be two instructions that end in the same quad-word boundary, and where both have branch instructions with a predicted bit set. For example:

IF MLE, JUMP + 100 ;; /* begin address 100 */
IF NALE JUMP -50 ; XR0 = R5 + R6 ; J0 = J2 + J3 ; YR4 = [K3 + 40] ;;
/* Valid; first instruction line ends on 1001; second instruction line ends on 1005 */

IF MLE, JUMP + 100 ;; /* begin address 100 */
IF NALE JUMP - 50 ;;
/* Invalid; both lines within the same quad word */

• For instruction SCFx += op Cond, there can be no operation between compute block static flags (XSF0/1, YSF0/1, and XYSF0/1) and non-compute block conditions.


2 COMPUTE BLOCK REGISTERS

The TigerSHARC processor core contains two compute blocks. Each compute block contains a register file and three independent computation units—an ALU, a multiplier, and a shifter. Because the execution of all computational instructions in the TigerSHARC DSP depends on the input and output data formats and depends on whether the instruction is executed on one computational block or both, it is important to understand how to use the TigerSHARC DSP’s compute block registers. This chapter describes the registers in the compute blocks, shows how the register name syntax controls data format and execution location, and defines the available data formats.

The DSP has two compute blocks—compute block X and compute block Y. Each block contains a register file and three independent computation units. The units are the ALU, multiplier, and shifter.

A general-purpose, multiport, 32-word data register file in each compute block serves for transferring data between the computation units and the data buses and stores intermediate results. Figure 2-1 shows how each of the register files provides the interface between the internal buses and the computational units within the compute blocks.

As shown in Figure 2-1, data input to the register file passes through the data alignment buffer (DAB). The DAB is a two quad-word FIFO that provides aligned data for registers when dual- or quad-register loads receive misaligned data from memory. For more information on using the DAB, see “IALU” on page 6-1.


[Figure: compute block X and compute block Y. Each block shows a DAB, a register file (32x32), a multiplier, an ALU, and a shifter. Bus widths are labeled 128 and 64, and the outputs are labeled TO DATA BUSES.]

Figure 2-1. Data Register Files in Compute Block X and Y

Within the compute block, there are two types of registers—memory-mapped registers and non-memory-mapped registers. The memory-mapped registers in each of the compute blocks are the general-purpose data register file registers XR31–0 and YR31–0. Because these registers are memory mapped, they are accessible to external bus devices.

For operations within a single DSP, the distinction between memory-mapped and non-memory-mapped compute block registers is important because the memory-mapped registers are Universal registers (Ureg). The Ureg group of registers is available for many types of operations working with portions of the DSP’s core other than the portion of the core where the Ureg resides. The compute block Ureg registers can be used for additional operations unavailable to other Ureg registers. To distinguish the compute block register file registers from other Ureg registers, the XR31–0 and YR31–0 registers are also referred to as Data registers (Dreg).

For operations in a multiprocessing DSP system, it is very useful that 90% of the registers in the TigerSHARC processor are memory-mapped registers. The memory-mapped registers have absolute addresses associated with them, meaning that they can be accessed by other processors through multiprocessor space or accessed by any other bus masters in the system.

A DSP can access its own registers by using the multiprocessor memory space, but the DSP would have to tie up the external bus to access its own registers this way.

The compute blocks have a few registers that are non-memory mapped. These registers do not have absolute addresses associated with them. The non-memory-mapped registers are special registers that are dedicated for special instructions in each compute block. The unmapped registers in the compute blocks include:

• Compute block status (XSTAT and YSTAT) registers

• Parallel Result (XPR1–0 and YPR1–0) registers—ALU

• Multiplier Result (XMR3–0 and YMR3–0) registers—Multiplier

• Multiplier Result Overflow (XMR4 and YMR4) registers—Multiplier

• Bit FIFO Overflow Temporary (XBFOTMP and YBFOTMP) registers—Shifter

[Figure: bits 31–16 of the XSTAT/YSTAT register. The figure shows a value of 0 for every bit. Fields named in the figure:]

MIS—Multiplier floating-pt. invalid op., sticky
MOS—Multiplier fixed-pt. overflow, sticky
MVS—Multiplier floating-pt. overflow, sticky
MUS—Multiplier floating-pt. underflow, sticky
AIS—ALU floating-pt. invalid op., sticky
AOS—ALU fixed-pt. overflow, sticky
AVS—ALU floating-pt. overflow, sticky
AUS—ALU floating-pt. underflow, sticky
Reserved
IVEN—Invalid enable
OEN—Overflow enable
UEN—Underflow enable
Reserved

Figure 2-2. XSTAT/YSTAT (Upper) Register Bit Descriptions

The non-memory-mapped registers serve special purposes in each compute block. The X/YSTAT registers (shown in Figure 2-2 and Figure 2-3) hold the status flags for each compute block. These flags are set or reset to indicate the status of an instruction’s execution in a compute block’s ALU, multiplier, and shifter. The X/YPR1–0 registers hold parallel results from the ALU’s SUM, ABS, VMAX, and VMIN instructions. The X/YMR3–0 registers optionally hold results from fixed-point multiply operations, and the X/YMR4 register holds overflow from those operations. The X/YBFOTMP registers temporarily store or return overflow from GETBITS and PUTBITS instructions.

[Figure: bits 15–0 of the XSTAT/YSTAT register. The figure shows a value of 0 for every bit. Fields named in the figure:]

AZ—ALU zero
AN—ALU negative
AV—ALU overflow
AC—ALU carry
MZ—Multiplier zero
MN—Multiplier negative
MV—Multiplier overflow
MU—Multiplier underflow
SZ—Shifter zero
SN—Shifter negative
BF—Block floating-point flags
AI—ALU floating-point invalid operation
MI—Multiplier floating-point invalid operation
TROV—Trellis overflow
TRSOV—Trellis overflow, sticky

Figure 2-3. XSTAT/YSTAT (Lower) Register Bit Descriptions

The compute block X and Y register files contain thirty-two 32-bit registers, which serve as a compute block’s interface between the DSP internal bus and the computational units. The register file registers—XR31–0 and YR31–0—are both universal registers (Ureg) and data registers (Dreg).

All inputs for computations come from the register file and all results are sent to the register file, except for fixed-point multiplies which can optionally be sent to the MR3–0 registers.

It is important to note that a register may be used once in an instruction slot, but the assembly syntax permits using registers multiple times within an instruction line (which contains up to four instruction slots). The register file registers are hardware interlocked, meaning that there is dependency checking during each computation to make sure the correct values are being used. When a computation accesses a register, the DSP performs a register check to make sure there are no other dependencies on that register. For more information on instruction lines and dependencies, see “Instruction Line Syntax and Structure” on page 1-20 and “Instruction Parallelism Rules” on page 1-24.

There are many ways to name registers in the TigerSHARC DSP’s assembly syntax. The register name syntax provides selection of many features of computational instructions. Using the register name syntax in an instruction, you can specify:

• Compute block selection

• Register width selection

• Operand size selection

• Data format selection


Figure 2-4 shows the parts of the register name syntax and the features that the syntax selects.

[Figure: the register name pattern ___R_, with its fields labeled. In order from left to right:]

• Compute block selection (none, X, Y, or XY)
• Operand size selection (none, L, S, or B)
• Fixed- or floating-point data format selection (none or F)
• R
• Register width selection (# or #:#)

{for result registers only}

Figure 2-4. Register File Register Name Syntax

The DSP’s assembly syntax also supports selection of integer or fractional and real or complex data types. These selections are provided as options to instructions and are not part of register file register name syntax.

Compute Block Selection

As shown in Figure 2-4, the assembly syntax for naming registers lets you select the compute block of the register with which you are working.

The X and Y register-name prefixes denote in which compute block the register resides: X = compute block X only, Y = compute block Y only, and XY (or no prefix) = both. The following ALU instructions provide some register name syntax examples.

XR0 = R1 + R2 ;; /* This instruction executes in block X */

This instruction uses registers XR0, XR1, and XR2.

YR1 = R5 + R6 ;; /* This instruction executes in block Y */

This instruction uses registers YR1, YR5, and YR6.


XYR0 = R0 + R2 ;; /* This instruction executes in block X & Y */

This instruction uses registers XR0, XR2, YR0, and YR2.

R0 = R22 + R3 ;; /* This instruction executes in block X & Y */

This instruction uses registers XR0, XR22, XR3, YR0, YR22, and YR3.

Because the compute block prefix lets you select between executing the instruction in one or both compute blocks, this prefix provides the selection between Single-Instruction, Single-Data (SISD) execution and Single-Instruction, Multiple-Data (SIMD) execution. Using SIMD execution is a powerful way to optimize execution if the same algorithm is being used to process multiple channels of data.

It is important to note that SISD and SIMD are not modes that are turned on or off with some latency in the change. SISD and SIMD execution are always available as execution options simply through register name selection.
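
The effect of the compute block prefix on the registers an instruction touches can be written out as a short Python sketch. It is only a model of the naming rule described above; the function name `registers_used` is hypothetical and the sketch handles single registers only.

```python
def registers_used(prefix, result, operands):
    """List the registers used by {prefix}R<result> = R<operand> + R<operand>.

    prefix is "X", "Y", "XY", or "" (no prefix, which selects both blocks).
    result and operands are register numbers.
    """
    blocks = ["X", "Y"] if prefix in ("", "XY") else [prefix]
    names = [f"{block}R{n}" for block in blocks for n in (result, *operands)]
    return list(dict.fromkeys(names))  # drop repeats, keep first-seen order


print(registers_used("X", 0, (1, 2)))    # XR0 = R1 + R2 ;;
print(registers_used("XY", 0, (0, 2)))   # XYR0 = R0 + R2 ;;
print(registers_used("", 0, (22, 3)))    # R0 = R22 + R3 ;;
```

The operand registers carry no prefix of their own in the assembly syntax; the model gives them the block of the result register, as the examples above describe.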

To represent optional items, instruction syntax definitions use curly braces { } around the item. To represent choices between items, instruction syntax definitions place a vertical bar | between items. The following syntax definition example and comparable instruction indicate the difference for compute block selection:

{X|Y|XY}Rs = Rm + Rn ;; /* the curly braces enclose options */ /* the vertical bars separate choices */

XYR0 = R1 + R0 ;; /* code, no curly braces — no vertical bars */

As shown in Figure 2-4 on page 2-7, the assembly syntax for naming registers lets you select the width of the register with which you are working.

Each individual register file register (XR31–0 and YR31–0) is 32 bits wide.

To support data sizes larger than a 32-bit word, the DSP’s assembly syntax lets you combine registers to hold larger words. The register name syntax for register width works as follows:

• Rs, Rm, or Rn indicates a Single register containing a 32-bit word (or smaller).

For example, these are register names such as R1, XR2, and so on.

• Rsd, Rmd, or Rnd indicates a Double register containing a 64-bit word (or smaller).

For example, these are register names such as R1:0, XR3:2, and so on. The lower register must be evenly divisible by two.

• Rsq, Rmq, or Rnq indicates a Quad register containing a 128-bit word (or smaller).

For example, these are register names such as R3:0, XR7:4, and so on. The lowest register must be evenly divisible by 4.

The combination of italic and code font in the register name syntax above indicates a user-substitutable value. Instruction syntax definitions use this convention to represent multiple register names. The following syntax definition example and comparable instruction indicates the difference for register width selection.

{X|Y|XY}Rsd = Rmd + Rnd ;; /* replaceable register names, italics are variables */

XR1:0 = R3:2 + R1:0 ;; /* code, no substitution */
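
The register width rules above (a double register starts at an even register number, a quad register at a multiple of four) can be checked mechanically. The following Python sketch is an illustration under stated assumptions: the function name `register_width` is hypothetical, and the requirement that the registers in a group be consecutive is inferred from the example names (R1:0, R3:2, R3:0, XR7:4) rather than stated in the text.

```python
def register_width(name):
    """Return the width in bits of a name such as "R1", "XR3:2", or "R7:4".

    Raises ValueError when the name breaks the width rules described above.
    """
    body = name[name.index("R") + 1:]      # text after the first R, e.g. "3:2"
    high_text, _, low_text = body.partition(":")
    high = int(high_text)
    low = high if low_text == "" else int(low_text)
    if not 0 <= low <= high <= 31:
        raise ValueError(f"{name}: register numbers must be 0 to 31, high:low")
    count = high - low + 1
    if count == 1:
        return 32
    if count == 2 and low % 2 == 0:
        return 64
    if count == 4 and low % 4 == 0:
        return 128
    raise ValueError(f"{name}: not a legal single, double, or quad register")


for name in ("R1", "XR3:2", "XR7:4"):
    print(name, register_width(name))
```

A name such as R2:1 is rejected because its lower register is odd, and R5:2 is rejected because 2 is not a multiple of four.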


Operand Size and Format Selection

As shown in Figure 2-4 on page 2-7, the assembly syntax for naming registers lets you select the operand size and fixed- or floating-point format of the data placed within the register with which you are working.

Single, double, and quad register file registers (Rs, Rsd, Rsq) hold operands (inputs and outputs) for instructions. Depending on the operand size and fixed- or floating-point format, there may be more than one operand in a register.

To select the operand size within a register file register, a register name prefix selects a size that is equal to or less than the size of the register. These operand size prefixes for fixed-point data work as follows.

• B — Indicates Byte (8-bit) word data. The data in a single 32-bit register is treated as four 8-bit words. Example register names with byte word operands are BR1, BR1:0, and BR3:0.

• S — Indicates Short (16-bit) word data. The data in a single 32-bit register is treated as two 16-bit words. Example register names with short word operands are SR1, SR1:0, and SR3:0.

• None — Indicates Normal (32-bit) word data. Example register names with normal word operands are R0, R1:0, and R3:0.

• L — Indicates Long (64-bit) word data. An example register name with a long word operand is LR1:0.


The B, S, and L options apply for ALU and Shifter operations. Operand size selection differs slightly for the multiplier. For more information, see “Multiplier Operations” on page 4-4.


To distinguish between fixed- and floating-point data, the register name prefix F indicates that the register contains floating-point data. The DSP supports the following floating-point data formats.

• None — Indicates fixed-point data

• FRs, FRm, or FRn (floating-point data in a single register) — Indicates normal (IEEE format, 32-bit) word data. An example register name with a normal word, floating-point operand is FR3.

• FRsd, FRmd, or FRnd (floating-point data in a double register) — Indicates extended (40-bit) word data. An example register name with an extended word, floating-point operand is FR1:0.


It is important to note that the operand size influences the execution of the instruction. For example, SRsd = Rmd + Rnd;; is an addition of four short data operands, stored in two register pairs. An example of this type of instruction follows and has the results shown in Figure 2-5.

SR1:0 = R31:30 + R25:24;;

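[Figure: registers R31:30, R25:24, and R1:0, each split into a high register and a low register with fields [31:16] and [15:0]. The result fields are:]

High register (R1):
R1[31:16] = R31[31:16] + R25[31:16]
R1[15:0]  = R31[15:0]  + R25[15:0]

Low register (R0):
R0[31:16] = R30[31:16] + R24[31:16]
R0[15:0]  = R30[15:0]  + R24[15:0]

Figure 2-5. Addition of Four Short Word Operands in Double Registers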

As shown in Figure 2-5, this instruction executes the operation on all 64 bits in this example. The operation is executed on every group of 16 bits separately.
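
The lane-by-lane behavior shown in Figure 2-5 can be modeled in Python as follows. This is a sketch, not a description of the ALU hardware: the function name `add_short_lanes` is hypothetical, and wraparound on overflow within each 16-bit lane is an assumption made for the illustration (see “ALU” on page 3-1 for the actual overflow and saturation behavior).

```python
def add_short_lanes(a, b, width=64):
    """Add a and b as independent 16-bit lanes of a width-bit register group.

    No carry crosses a lane boundary. Each lane wraps modulo 2**16
    (assumption of this sketch, not taken from the text above).
    """
    result = 0
    for shift in range(0, width, 16):
        lane = ((a >> shift) & 0xFFFF) + ((b >> shift) & 0xFFFF)
        result |= (lane & 0xFFFF) << shift
    return result


# 64-bit values standing in for R31:30 and R25:24 (high register in bits 63-32).
r31_30 = 0x0001_FFFF_0003_0004
r25_24 = 0x0001_0001_0001_0001
print(f"{add_short_lanes(r31_30, r25_24):016X}")
```

In the example the second lane overflows (0xFFFF + 0x0001), and the carry is discarded rather than propagated into the neighboring lane, which is the point of executing the operation on every group of 16 bits separately.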


Register File Syntax Summary

Data register file registers are used in computational instructions and memory load/store instructions. The syntax for those instructions is described in:

• “ALU” on page 3-1

• “Multiplier” on page 4-1

• “Shifter” on page 5-1

The following ALU instruction syntax description shows the conventions that all syntax descriptions use for data register file names:

{X|Y|XY}{F}Rsd = Rmd + Rnd ;;

Where:

• {X|Y|XY} — The X, Y, or XY (none is same as XY) prefix on the register name selects the compute block or blocks to execute the instruction. The curly braces around these items indicate they are optional, and the vertical bars indicate that only one may be chosen.

• {F} — The F prefix on the register name selects floating-point format for the operation. Omitting the prefix selects fixed-point format.

• Rsd — The result is a double register as indicated by the d. The register name takes the form R#:#, where the lower number is evenly divisible by two (as in R1:0).

• Rmd, Rnd — The inputs are double registers. The m and n indicate that these must be different registers.
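
The naming conventions summarized above can be collected into a small parser. The Python sketch below is illustrative only: the regular expression, the function name `parse_register`, and the returned field names are all hypothetical, and the prefix order (compute block, operand size, F, then R and the register numbers) is taken from the example names used in this chapter, such as XBR3, XSR3, XFR3, and LR1:0.

```python
import re

# Prefix order assumed from this chapter's example names: block, size, F, R, numbers.
NAME = re.compile(r"^(XY|X|Y)?([BSL])?(F)?R(\d+)(?::(\d+))?$")
OPERAND_BITS = {"B": 8, "S": 16, None: 32, "L": 64}

def parse_register(name):
    """Split a register file register name into the selections it makes."""
    match = NAME.match(name)
    if match is None:
        raise ValueError(f"{name}: not a register file register name")
    block, size, floating, high, low = match.groups()
    high = int(high)
    low = high if low is None else int(low)
    width = 32 * (high - low + 1)
    if floating:
        # Normal word in a single register, extended word in a double register.
        operand_bits, operands = (32 if width == 32 else 40), 1
    else:
        operand_bits = OPERAND_BITS[size]
        operands = width // operand_bits
    if operand_bits > width:
        raise ValueError(f"{name}: operand is larger than the register")
    return {
        "blocks": block or "XY",      # no prefix is the same as XY
        "floating": bool(floating),
        "operand_bits": operand_bits,
        "width_bits": width,
        "operands": operands,
    }


for name in ("XBR3", "XSR3:2", "XFR3:2", "LR1:0"):
    print(name, parse_register(name))
```

The sketch does not check register numbering (for example, that the lower number of a double register is even), and it does not model the multiplier, whose operand size selection differs slightly.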


Here are some examples of register naming. In Figure 2-6, the register name XBR3 indicates the operation uses four fixed-point 8-bit words in the X compute block R3 data register. In Figure 2-7, the register name XSR3 indicates the operation uses two fixed-point 16-bit words in the X compute block R3 data register. In Figure 2-8, the register name XR3 indicates the operation uses one fixed-point 32-bit word in the X compute block R3 data register. In Figure 2-8, the register name XFR3 indicates floating-point data.

[Figure: XBR3 (Byte). Four 8-bit words in bits 31–24, 23–16, 15–8, and 7–0.]

Figure 2-6. Register R3 in Compute Block X, Treated as Byte Data

[Figure: XSR3 (Short). Two 16-bit words in bits 31–16 and 15–0.]

Figure 2-7. Register R3 in Compute Block X, Treated as Short Data

[Figure: XR3 or XFR3 (Normal). One 32-bit word in bits 31–0.]

Figure 2-8. Register R3 in Compute Block X, Treated as Normal Data


Here are additional examples of register naming. Figure 2-9, Figure 2-10, and Figure 2-11 show examples of operand size in double registers, which are similar to the examples in Figure 2-6, Figure 2-7, and Figure 2-8.

[Figure: XBR3:2 (Byte). Eight 8-bit words in bits 63–56, 55–48, 47–40, 39–32, 31–24, 23–16, 15–8, and 7–0.]

Figure 2-9. Register R3:2 in Compute Block X, Treated as Byte Data

[Figure: XSR3:2 (Short). Four 16-bit words in bits 63–48, 47–32, 31–16, and 15–0.]

Figure 2-10. Register R3:2 in Compute Block X, Treated as Short Data

[Figure: XR3:2 (Normal). Two 32-bit words in bits 63–32 and 31–0.]

Figure 2-11. Register R3:2 in Compute Block X, Treated as Normal Data

The examples in Figure 2-12 and Figure 2-13 refer to two registers, but hold a single data word.

[Figure: XFR3:2 (Extended). Bits 63–40 are not used; bits 39–0 hold one 40-bit word.]

Figure 2-12. Register R3:2 in Compute Block X, Treated as Extended (Floating-Point) Data

[Figure: XLR3:2 (Long). One 64-bit word in bits 63–0.]

Figure 2-13. Register R3:2 in Compute Block X, Treated as Long Data

Numeric Formats

The DSP supports the 32-bit single-precision floating-point data format defined in the IEEE Standard 754/854. In addition, the DSP supports a 40-bit extended-precision version of the same format with eight additional bits in the mantissa. The DSP also supports 8-, 16-, 32-, and 64-bit fixed-point formats—fractional and integer—which can be signed (two’s-complement) or unsigned.

IEEE Single-Precision Floating-Point Data Format

IEEE Standard 754/854 specifies a 32-bit single-precision floating-point format, shown in Figure 2-14. A number in this format consists of a sign bit s, a 24-bit significand, and an 8-bit unsigned-magnitude exponent e.

For normalized numbers, the significand consists of a 23-bit fraction f and a hidden bit of 1 that is implicitly presumed to precede f22 in the significand. The binary point is presumed to lie between this hidden bit and f22. The least significant bit (LSB) of the fraction is f0; the LSB of the exponent is e0.

The hidden bit effectively increases the precision of the floating-point significand to 24 bits from the 23 bits actually stored in the data format. This bit also ensures that the significand of any number in the IEEE normalized number format is always greater than or equal to 1 and less than 2.


The unsigned exponent e can range between 1 ≤ e ≤ 254 for normal numbers in the single-precision format. This exponent is biased by +127 (254/2). To calculate the true unbiased exponent, 127 must be subtracted from e.

[Figure: FRs bit fields.]

Bit 31       s (sign bit)
Bits 30–23   e7 ... e0 (exponent)
Bits 22–0    f22 ... f0 (fraction)

The significand is 1.f; the hidden bit and the binary point precede f22.

Figure 2-14. IEEE 32-Bit Single-Precision Floating-Point Format (Normal Word)

The IEEE standard also provides for several special data types in the single-precision floating-point format:

• An exponent value of 255 (all ones) with a nonzero fraction is a Not-A-Number (NAN). NANs are usually used as flags for data flow control, for the values of uninitialized variables, and for the results of invalid operations such as 0 ∗ ∞.

• Infinity is represented as an exponent of 255 and a zero fraction. Note that because the fraction is signed, both positive and negative Infinity can be represented.

• Zero is represented by a zero exponent and a zero fraction. As with Infinity, both positive zero and negative zero can be represented.

The IEEE single-precision floating-point data types supported by the DSP and their interpretations are summarized in Table 2-1.


Table 2-1. IEEE Single-Precision Floating-Point Data Types

Type       Exponent       Fraction   Value
NAN        255            Nonzero    Undefined
Infinity   255            0          (–1)^s Infinity
Normal     1 ≤ e ≤ 254    Any        (–1)^s (1.f22-0) 2^(e–127)
Zero       0              0          (–1)^s Zero

The TigerSHARC processor is compatible with the IEEE single-precision floating-point data format in all respects, except for:

• The TigerSHARC processor does not provide inexact flags.

• NAN inputs generate an invalid exception and return a quiet NAN.

• Denormal operands are flushed to zero when input to a computation unit and do not generate an underflow exception. Any denormal or underflow result from an arithmetic operation is flushed to zero and an underflow exception is generated.

• Round-to-nearest and round-towards-zero are supported. Rounding to ±infinity is not supported.
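
The data types in Table 2-1 can be recognized from the three fields of a 32-bit pattern. The Python sketch below is an illustration only: the function names are hypothetical, and the "Denormal" label is added for the case of a zero exponent with a nonzero fraction, which the list above says is flushed to zero when input to a computation unit.

```python
def classify_single(bits):
    """Classify a 32-bit single-precision pattern using the rows of Table 2-1."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 255:
        return "NAN" if fraction else "Infinity"
    if exponent == 0:
        return "Zero" if fraction == 0 else "Denormal"
    return "Normal"

def normal_value(bits):
    """Value of a Normal pattern: (-1)^s * 1.f * 2^(e - 127)."""
    if classify_single(bits) != "Normal":
        raise ValueError("not a normal number")
    sign = (bits >> 31) & 1
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    significand = 1 + fraction / 2**23      # hidden bit plus 23-bit fraction
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)


for pattern in (0x3F800000, 0xC0200000, 0x7F800000, 0x7FC00000, 0x80000000):
    print(f"0x{pattern:08X}", classify_single(pattern))
print(normal_value(0xC0200000))
```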


Extended Precision Floating-Point Format

The extended precision floating-point format is 40 bits wide, with the same 8-bit exponent as in the standard format but with a 32-bit significand. This format is shown in Figure 2-15. In all other respects, the extended floating-point format is the same as the IEEE standard format.

[Figure: FRsd bit fields.]

Bit 39       s (sign bit)
Bits 38–31   e7 ... e0 (exponent)
Bits 30–0    f30 ... f0 (fraction)

The significand is 1.f; the hidden bit and the binary point precede f30.

Figure 2-15. 40-Bit Extended-Precision Floating-Point Format (Extended Word)
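
Because the extended format keeps the sign and exponent fields of the 32-bit format and lengthens only the fraction, a normal number can be decoded with the same formula and a 31-bit fraction. The Python sketch below assumes the field positions shown in Figure 2-15 and the same exponent bias of 127; the function names are hypothetical, and only normal numbers are handled.

```python
def decode_extended(bits):
    """Value of a normal 40-bit extended-precision pattern."""
    sign = (bits >> 39) & 1
    exponent = (bits >> 31) & 0xFF
    fraction = bits & 0x7FFFFFFF            # 31 stored fraction bits
    if not 1 <= exponent <= 254:
        raise ValueError("not a normal number")
    significand = 1 + fraction / 2**31      # hidden bit gives a 32-bit significand
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)

def single_to_extended(bits32):
    """Widen a 32-bit pattern by appending eight zero fraction bits."""
    return bits32 << 8


print(decode_extended(single_to_extended(0x3F800000)))
print(decode_extended(single_to_extended(0xC0200000)))
print(decode_extended((127 << 31) | 1) - 1.0)   # smallest step above 1.0
```

The last line shows the extra resolution: the smallest step above 1.0 is 2^–31 in the extended format, compared with 2^–23 in the 32-bit format.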

Fixed-Point Formats

The DSP supports fixed-point fractional and integer formats for 16-, 32-, and 64-bit data. In these formats, numbers can be signed (two’s-complement) or unsigned. The possible combinations are shown in Figure 2-20 through Figure 2-27. In the fractional format, there is an implied binary point to the left of the most significant magnitude bit. In integer format, the binary point is understood to be to the right of the LSB. Note that the sign bit is negatively weighted in a two’s-complement format.

The DSP supports a fixed-point, signed, integer format for 8-bit data. Data in the 8- and 16-bit formats is always packed in 32-bit registers as follows: a single register holds four 8-bit or two 16-bit words, a dual register holds eight 8-bit or four 16-bit words, and a quad register holds sixteen 8-bit or eight 16-bit words.
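The packing of byte and short words into a single 32-bit register can be modeled as below. This is a minimal sketch with hypothetical helper names. Placing the first word in the least significant lane is an assumption made for the illustration; the text above does not specify lane order.

```python
def pack_words(words, width):
    """Pack four 8-bit or two 16-bit words into one 32-bit register value.

    Assumption: words[0] occupies the least significant lane.
    """
    if width not in (8, 16):
        raise ValueError("width must be 8 or 16")
    if len(words) != 32 // width:
        raise ValueError("wrong number of words for a single register")
    mask = (1 << width) - 1
    register = 0
    for lane, word in enumerate(words):
        register |= (word & mask) << (lane * width)
    return register


def unpack_words(register, width, signed=True):
    """Split a 32-bit register value back into its 8-bit or 16-bit words."""
    if width not in (8, 16):
        raise ValueError("width must be 8 or 16")
    mask = (1 << width) - 1
    words = []
    for lane in range(32 // width):
        word = (register >> (lane * width)) & mask
        if signed and word >> (width - 1):
            word -= 1 << width      # restore the negative value
        words.append(word)
    return words
```

Under the stated lane assumption, `pack_words([1, -1, 2, -2], 8)` gives `0xFE02FF01`, and `unpack_words` recovers the original list. A dual or quad register would hold two or four such 32-bit values.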

ALU outputs always have the same width and data format as the inputs. The multiplier, however, produces a 64-bit product from two 32-bit inputs. If both operands are unsigned integers, the result is a 64-bit unsigned integer. If both operands are unsigned fractions, the result is a 64-bit unsigned fraction. These formats are shown in Figure 2-30 and Figure 2-31.

If one operand is signed and the other unsigned, the result is signed. If both inputs are signed, the result is signed and automatically shifted left one bit. The LSB becomes zero and bit 62 moves into the sign bit position. Normally bit 63 and bit 62 are identical when both operands are signed. (The only exception is full-scale negative multiplied by itself.) Thus, the left shift normally removes a redundant sign bit, increasing the precision of the most significant product. Also, if the data format is fractional, a single bit left shift renormalizes the MSB to a fractional format. The signed formats with and without left shifting are shown in Figure 2-28 and Figure 2-29.
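The redundant sign bit and the effect of the one-bit left shift can be checked with a small arithmetic model in Python. The helper names are hypothetical, and the sketch models only the bit patterns described above. It does not model how the processor handles the full-scale negative case after the shift.

```python
from fractions import Fraction

MASK64 = (1 << 64) - 1


def to_signed(value: int, width: int) -> int:
    """Interpret the low `width` bits of value as a two's-complement number."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def signed_product_bits(a: int, b: int) -> int:
    """64-bit pattern of the product of two signed 32-bit patterns, before the shift."""
    return (to_signed(a, 32) * to_signed(b, 32)) & MASK64


def has_redundant_sign(product: int) -> bool:
    """True when bit 63 and bit 62 of the unshifted product are identical."""
    return ((product >> 63) & 1) == ((product >> 62) & 1)


def shift_left_one(product: int) -> int:
    """Shift left one bit within 64 bits: the LSB becomes zero, bit 62 moves to bit 63."""
    return (product << 1) & MASK64


def as_signed_fraction(pattern: int) -> Fraction:
    """Read a 64-bit pattern as a signed fraction (binary point after the sign bit)."""
    return Fraction(to_signed(pattern, 64), 1 << 63)
```

With both inputs equal to `0x40000000` (0.5 as a signed 32-bit fraction), the unshifted product has a redundant sign bit, and the shifted product reads as exactly 1/4. With both inputs equal to `0x80000000`, `has_redundant_sign` returns `False`, which is the exception noted above.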

The multiplier has an 80-bit accumulator to allow the accumulation of 64-bit products. For more information on the multiplier and accumulator, see “Multiplier” on page 4-1.
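The headroom that 80 bits give over a 64-bit product can be estimated with simple arithmetic. The sketch below is an unsigned, worst-case calculation with hypothetical helper names. It describes only the arithmetic margin of 16 extra bits and makes no claim about how the accumulator registers are organized or how overflow is reported.

```python
PRODUCT_BITS = 64
ACCUMULATOR_BITS = 80


def worst_case_total(count: int) -> int:
    """Sum of `count` maximum-magnitude unsigned 64-bit products."""
    return count * ((1 << PRODUCT_BITS) - 1)


def fits_in_accumulator(total: int) -> bool:
    """True when an unsigned total is representable in 80 bits."""
    return 0 <= total < (1 << ACCUMULATOR_BITS)


# 80 - 64 = 16 extra bits, so 2**16 worst-case products still fit.
GUARD_BITS = ACCUMULATOR_BITS - PRODUCT_BITS
```

Here `fits_in_accumulator(worst_case_total(1 << GUARD_BITS))` is `True`, while one more worst-case product would exceed 80 bits.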

[Figure: BRs bit layout, bits 7 to 0. Bit weights: –2^7 (sign bit), 2^6, 2^5, ..., 2^2, 2^1, 2^0. Binary point to the right of bit 0.]

Figure 2-16. 8-Bit Fixed-Point Format, Signed Integer (Byte Word)

[Figure: BRs bit layout, bits 7 to 0. Bit weights: –2^0 (sign bit), 2^–1, 2^–2, ..., 2^–5, 2^–6, 2^–7. Binary point between bit 7 and bit 6.]

Figure 2-17. 8-Bit Fixed-Point Format, Signed Fractional (Byte Word)

[Figure: BRs bit layout, bits 7 to 0. Bit weights: 2^7, 2^6, 2^5, ..., 2^2, 2^1, 2^0. Binary point to the right of bit 0.]

Figure 2-18. 8-Bit Fixed-Point Format, Unsigned Integer (Byte Word)

[Figure: BRs bit layout, bits 7 to 0. Bit weights: 2^–1, 2^–2, 2^–3, ..., 2^–6, 2^–7, 2^–8. Binary point to the left of bit 7.]

Figure 2-19. 8-Bit Fixed-Point Format, Unsigned Fractional (Byte Word)

[Figure: SRs bit layout, bits 15 to 0. Bit weights: –2^15 (sign bit), 2^14, 2^13, ..., 2^2, 2^1, 2^0. Binary point to the right of bit 0.]

Figure 2-20. 16-Bit Fixed-Point Format, Signed Integer (Short Word)

[Figure: SRs bit layout, bits 15 to 0. Bit weights: –2^0 (sign bit), 2^–1, 2^–2, ..., 2^–13, 2^–14, 2^–15. Binary point between bit 15 and bit 14.]

Figure 2-21. 16-Bit Fixed-Point Format, Signed Fractional (Short Word)

[Figure: SRs bit layout, bits 15 to 0. Bit weights: 2^15, 2^14, 2^13, ..., 2^2, 2^1, 2^0. Binary point to the right of bit 0.]

Figure 2-22. 16-Bit Fixed-Point Format, Unsigned Integer (Short Word)

[Figure: SRs bit layout, bits 15 to 0. Bit weights: 2^–1, 2^–2, 2^–3, ..., 2^–14, 2^–15, 2^–16. Binary point to the left of bit 15.]

Figure 2-23. 16-Bit Fixed-Point Format, Unsigned Fractional (Short Word)

[Figure: bit layout, bits 31 to 0. Bit weights: –2^31 (sign bit), 2^30, 2^29, ..., 2^2, 2^1, 2^0. Binary point to the right of bit 0.]

Figure 2-24. 32-Bit Fixed-Point Format, Signed Integer (Normal Word)

[Figure: bit layout, bits 31 to 0. Bit weights: –2^0 (sign bit), 2^–1, 2^–2, ..., 2^–29, 2^–30, 2^–31. Binary point between bit 31 and bit 30.]

Figure 2-25. 32-Bit Fixed-Point Format, Signed Fractional (Normal Word)

[Figure: bit layout, bits 31 to 0. Bit weights: 2^31, 2^30, 2^29, ..., 2^2, 2^1, 2^0. Binary point to the right of bit 0.]

Figure 2-26. 32-Bit Fixed-Point Format, Unsigned Integer (Normal Word)

[Figure: bit layout, bits 31 to 0. Bit weights: 2^–1, 2^–2, 2^–3, ..., 2^–30, 2^–31, 2^–32. Binary point to the left of bit 31.]

Figure 2-27. 32-Bit Fixed-Point Format, Unsigned Fractional (Normal Word)

[Figure: LRs bit layout, bits 63 to 0. Bit weights: –2^63 (sign bit), 2^62, 2^61, ..., 2^2, 2^1, 2^0. Binary point to the right of bit 0.]

Figure 2-28. 64-Bit Fixed-Point Format, Signed Integer (Long Word)
