This document contains information on a new product.
Specifications and information herein are subject to
change without notice.
(c) Freescale Semiconductor, Inc. 2005. All rights reserved.
LICENSOR is defined as Freescale Semiconductor, Inc. LICENSOR reserves the right to make
changes without further notice to any products included and covered hereby. LICENSOR makes
no warranty, representation or guarantee regarding the suitability of its products for any particular
purpose, nor does LICENSOR assume any liability arising out of the application or use of any
product or circuit, and specifically disclaims any and all liability, including without limitation
incidental, consequential, reliance, exemplary, or any other similar such damages, by way of
illustration but not limitation, such as, loss of profits and loss of business opportunity. "Typical"
parameters which may be provided in LICENSOR data sheets and/or specifications can and do
vary in different applications and actual performance may vary over time. All operating
parameters, including "Typicals" must be validated for each customer application by customer’s
technical experts. LICENSOR does not convey any license under its patent rights nor the rights of
others. LICENSOR products are not designed, intended, or authorized for use as components in
systems intended for surgical implant into the body, or other applications intended to support life,
or for any other application in which the failure of the LICENSOR product could create a situation
where personal injury or death may occur. Should Buyer purchase or use LICENSOR products for
any such unintended or unauthorized application, Buyer shall indemnify and hold LICENSOR and
its officers, employees, subsidiaries, affiliates, and distributors harmless against all claims, cost,
damages, and expenses, and reasonable attorney fees arising out of, directly or indirectly, any claim
of personal injury or death associated with such unintended or unauthorized use, even if such claim
alleges that LICENSOR was negligent regarding the design or manufacture of the part.
Freescale and the Freescale logo are registered trademarks of Freescale Semiconductor, Inc. Freescale, Inc. is an
Equal Opportunity/Affirmative Action Employer.
All other tradenames, trademarks, and registered trademarks are the property of their respective
owners.
This manual provides reference information for the StarCore SC140 digital signal processor (DSP) core.
Specifically, this book describes the instruction set architecture and programming model for the SC140
core as well as corresponding register details, debug capabilities, and programming rules.
An appendix provides a detailed instruction reference for the SC140 instruction set, describing the
operation, mnemonics, instruction fields, and encoding for each instruction. Instruction examples are also
provided.
System-on-chip devices designed around the SC140 core usually include additional
functional blocks such as on-chip memory, an external memory interface, peripheral accelerators, and
coprocessor devices. The specification of these functional blocks is customer-specific as well as
application-specific. Therefore, this information is not covered in this manual.
Audience
This manual is intended for systems software developers, hardware designers, and application developers.
Organization
This book is organized into five chapters and two appendices as follows:
•Chapter 1, “Introduction”, describes key features of the SC140 architecture. This chapter also
illustrates a typical system using the SC140 core.
•Chapter 2, “Core Architecture”, describes the main functional blocks and data paths of the SC140
core.
•Chapter 3, “Control Registers”, details the core’s control registers.
•Chapter 4, “Emulation and Debug (EOnCE)”, describes the hardware debug capabilities of the core.
•Chapter 5, “Program Control”, details program control features such as the pipeline, instruction
programming rules, and programming guidelines for correct code construction.
•Appendix A, “SC140 DSP Core Instruction Set,” references the SC140 instruction set.
•Appendix B, “StarCore Registry,” shows how to access the core version
SC140 DSP Core Reference Manualxxiii
Abbreviations
The abbreviations used in this manual are listed below:
Table 1. Abbreviations
Abbreviation  Description
AAU           Address arithmetic unit
ADM           Application development module
AGU           Address generation unit
ALU           Arithmetic logic unit
Bn            AGU base address register n
BFU           Bit-field unit
BMU           Bit mask unit
DALU          Data arithmetic and logic unit
DSP           Digital signal processor
ECR           EOnCE control register
EDU           Event detection unit, with respect to the EOnCE
EE            EOnCE event pins
EMCR          EOnCE monitor and control register
EMR           Exception and mode register
EOnCE         Enhanced on-chip emulator
ERCV          EOnCE receive register
ES            Event selector, with respect to the EOnCE
ESP           Exception mode stack pointer
ESR           EOnCE status register
ETRSMT        EOnCE transmit register
EXT           Extension portion of a data register
FC            Fetch counter
FIFO          First-in first-out
FFT           Fast Fourier transform
HP            High portion of a data register
IPL           Interrupt priority level
ISAP          Instruction Set Accelerator Plug-in
ISR           Interrupt service routine
JTAG          Joint test action group
LA            Last address
LCn           Loop counter register n
Ln            Limit tag bit n
LP            Low portion of a data register
LSB           Least significant bits
LSP           Least significant portion
Mn            AGU modulo register n
MAC           Multiply-accumulate
MCTL          Modifier control register
MIPS          Million instructions per second
MMACS         Million multiply and accumulate operations per second
MSB           Most significant bits
MSP           Most significant portion
Nn            AGU offset register n
NMI           Non-maskable interrupt
NSP           Normal mode stack pointer
OS            Operating system
PAB           Program address bus
PAG           Program address generator
PC            Program counter register
PCU           Program control unit
PDB           Program data bus
PDU           Program dispatch unit
PIC           Programmable interrupt controller
PLL           Phase-locked loop
PSEQ          Program sequencer unit
Rn            AGU address register n
RAS           Return address register
RTOS          Real-time operating system
SAn           Start address register n
SF            Signed fractional
SI            Signed integer
SM            Saturation mode
SoC           System-on-chip
SP            Stack pointer
SR            Status register
T             True bit
UI            Unsigned integer
VBA           Interrupt vector base address register
VLES          Variable length execution set instruction grouping
XABA          Data memory address bus A
XABB          Data memory address bus B
XDBA          Data memory data bus A
XDBB          Data memory data bus B
Chapter 1
Introduction
The StarCore SC140 digital signal processing (DSP) core, a new member of the SC100 architecture,
addresses key market needs of next-generation DSP applications. It is especially suited for wireline and
wireless communications, including infrastructure and subscriber communications. It is a flexible,
programmable DSP core that enables computationally intensive communication applications by providing
exceptional performance, low power consumption, efficient compilability, and high code density. The
SC140 core deploys a variable-length execution set (VLES) execution model that maximizes parallelism by
allowing multiple address generation and data arithmetic logic units to execute multiple instructions in a
single clock cycle.
This chapter describes key features of the SC140 core architecture.
1.1 Target Markets
The design of the SC140 architecture aims to provide a DSP software platform that fulfills the constantly
increasing computational requirements of DSP applications due to:
•New communication standards and services
•Wideband channels and data rates
•New user interfaces and media
Software-configurable wireless terminals are already required to accommodate multiple air
interfaces and frequency bands for cellular phones, PCs, paging devices, cordless phones, wireless LAN
systems, and modems. In addition, multiple voice, messaging, internet, and video services must also be
supported. These terminals must be flexible and upgradable so that they can be personalized for each user
(such as permitting the dynamic download of applets). Finally, these terminals must be able to process
baseband data using software to implement a range of functions previously carried out by hardware.
Target markets for the SC140 architecture include:
•Third generation wireless handset systems with wideband data services
•Wireless and wireline base stations as well as the corresponding infrastructure
•Speech coding, synthesis, and voice recognition
•Wireless internet and multimedia
•Network and data communication
1.2 Architectural Differentiation
The SC140 architecture differentiates itself in the market with the following capabilities:
•High-level Abstraction of the Application Software
— DSP applications and kernels can currently be developed in the C programming language. An
optimizing compiler generates parallel instructions while maintaining a high code density.
— An orthogonal instruction set and programming model along with single data space and byte
addressability enable the compiler to generate efficient code.
— Hardware-supported integer and fractional data types enable application developers to choose
their own style of code development, or to use coding techniques derived from an
application-specific standard.
•Scalable Performance
— The number of execution units is independent of the instruction set, and can be tailored to the
application’s performance requirement. The SC140 contains four arithmetic logic units (ALUs)
and two address arithmetic units (AAUs).
— A high frequency of operation is achieved at low voltage, providing four million multiply and
accumulate (MAC) operations per second (4 MMACS) for each megahertz of clock frequency.
— Support exists for application-specific accelerators, providing a performance boost and
reduction in power consumption.
•High Code Density for Minimized Cost
— 16-bit wide instruction encoding.
— A rich and orthogonal instruction set, major portions of which focus on control code that can
often occupy most of the application code.
— Variable length execution set (VLES) for DSP kernel operations.
•Improved Support for Multi-tasking Applications
— Dual stack pointer support in hardware.
•Optimized Power Management Control
— Very low power consumption.
— Low voltage operation.
— Power saving modes.
•Efficient Memory and I/O Interface
— Very large on-chip zero-wait state static random access memory (SRAM) capability.
— Support for slower on-chip memory via wait-states.
— 32-bit address space for both program and data (byte-addressable).
— Unified data and program memory space.
— Decoupled external memory timing with independent clock.
•Core Organization and Design
— Supports flexible system-on-a-chip (SoC) configurations.
— Portable across fabrication lines and foundries.
1.3 Core Architecture Features
The SC140 core consists of the following:
•Data arithmetic logic unit (DALU) that contains four instances of an arithmetic logic unit (ALU) and
a data register file
•Address generation unit (AGU) that contains two address arithmetic units (AAU) and an address
register file
•Program sequencer and control unit (PSEQ)
Key features of the SC140 core include the following:
•Up to four million multiply-accumulate (MAC) operations per second (4 MMACS) for each
megahertz of clock frequency
•Up to 10 RISC MIPS (million instruction words per second) for each megahertz of clock frequency
(a MAC operation is counted as two RISC instructions)
•Four ALUs comprising MAC and bit-field units
•A true (16 ∗ 16) + 40 to 40-bit MAC unit in each ALU
•A true 40-bit parallel barrel shifter in each ALU
•Sixteen 40-bit data registers for fractional and integer data operand storage
•Sixteen 32-bit address registers, eight of which can be used as 32-bit base address registers
•Four address offset registers and four modulo address registers
•Hardware support for fractional and integer data types
•Up to six instructions executed in a single clock cycle
•Very rich 16-bit wide orthogonal instruction set
•Support for application-specific instruction set enhancements with an interface to an ISAP
(Instruction Set Accelerator Plug-in)
•VLES execution model
•Two AAUs with integer arithmetic capabilities
•A bit mask unit (BMU) for bit and bit-field logic operations
•Unique DSP addressing modes
•32-bit unified data and program address space
•Zero-overhead hardware loops with up to four levels of nesting
•Byte-addressable data memory
•Position independent code utilizing change-of-flow instructions that are relative to the
program counter (PC)
•Enhanced on-chip emulation (EOnCE) module with real-time debug capabilities
•Low power wait standby mode
•Very low power complementary metal-oxide semiconductor (CMOS) design
•Fully static logic
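The per-megahertz peak figures above follow from the execution-set limits. The Python sketch below reproduces them, assuming every cycle issues a full execution set of four MAC instructions (each counted as two RISC instructions, as stated above) plus two AGU instructions. The 300 MHz clock in the usage line is a hypothetical value, not a device specification.

```python
# Execution-set limits stated in this manual.
DALU_SLOTS = 4        # DALU instructions per execution set
AGU_SLOTS = 2         # AGU instructions per execution set
RISC_OPS_PER_MAC = 2  # a MAC is counted as two RISC instructions


def peak_mmacs(clock_mhz: float) -> float:
    """Peak millions of MACs per second, assuming four MACs issue every cycle."""
    return DALU_SLOTS * clock_mhz


def peak_risc_mips(clock_mhz: float) -> float:
    """Peak RISC MIPS, assuming four MACs plus two AGU instructions every cycle."""
    return (DALU_SLOTS * RISC_OPS_PER_MAC + AGU_SLOTS) * clock_mhz


if __name__ == "__main__":
    f = 300.0  # hypothetical clock frequency in MHz, for illustration only
    print(f"{f:.0f} MHz: {peak_mmacs(f):.0f} MMACS, {peak_risc_mips(f):.0f} RISC MIPS")
    # prints "300 MHz: 1200 MMACS, 3000 RISC MIPS"
```

The 10 RISC MIPS figure is therefore 4 × 2 + 2 instructions per cycle; real code reaches it only when every slot of every execution set is filled.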
1.3.1 Typical System-On-Chip Configuration
Because the SC140 is a high-performance, general-purpose, fixed-point DSP core, it can support many
system-on-chip (SoC) configurations. A library of modules containing memories, peripherals, accelerators,
and other processor cores makes it possible for a variety of highly integrated and cost-effective SoC
devices to be built around the SC140. Figure 1-1 shows a block diagram of a typical SoC made up of
the SC140 core and associated SoC components (described below). In a typical system the SC140 core is
enveloped in a platform that includes the core and supporting zero wait-state memories. This platform is
integrated as a unit in the SoC. Although not indicated in this configuration, an SoC can contain more than
one SC140 core platform.
An on-platform instruction set accelerator plug-in can be used as part of the SC140 core platform to
provide additional instructions for unique application solutions such as video processing, which require
specific arithmetic instructions in addition to the main instruction set.
•SC140 DSP core platform — Includes the DSP core and the immediate supporting blocks that
typically run at the full core frequency. The DSP platform typically includes:
— SC140 DSP core
— Instruction Set Accelerator Plug-in (ISAP) - for expanding the instruction set with
application-specific instructions.
— L1 caches - data and instruction caches, operating with zero wait states in case of cache hit
— Unified M1 memory - supporting both program and data, and hence connected to both the
program and data buses of the core. The M1 memory operates with no wait states. It could be
either RAM or ROM, or a mix of both. Depending on its size, the RAM may be connected as
a slave to an external DMA controller.
— Programmable interrupt controller (PIC)
— Interfaces - translate the core data and program fetch requests to the bus protocol supported by
the system, usually at a reduced frequency.
•DSP Expansion Area — This area includes the functional units that interface between the core and
the DSP application, most importantly the functions that send and receive data from external
input/output sources, under the control of the software running on the DSP core. In addition, this area
includes accelerators that execute portions of the application, in order to boost performance and
decrease power consumption. This area is application-specific and may or may not include various
functional units such as:
— Synchronous serial interface
— Serial communication interface
— Viterbi accelerator
— Filter coprocessors
•System Expansion Area — This area includes the SoC functional units that are not tightly coupled
with the DSP core. Typically it may include other processors with their support platform as well.
This area is application-specific, and may include various functional units such as:
— External memory interface
— Direct memory access (DMA) controller
— L2 Cache controller for either data or program
— Chip-level interrupt control unit
— On-chip Level 2 (M2) memory expansion modules
— Other processor cores with their supporting platforms
[Figure 1-1: an SoC containing the SC140 platform (SC140 core, EOnCE with JTAG and trace buffer, ISAP, PIC, instruction cache, data cache, and unified M1 program and data memory built from ROM and RAM, linked by the P, XA, and XB buses), a DSP expansion area (standard I/O peripherals, application-specific accelerators, general-purpose programmable accelerators), and a system expansion area (external memory interface, level-2 caches, on-chip RAM and ROM, host interface, other microcontrollers, DMA, PLL, and a bus switch and interfaces).]
Figure 1-1. Block Diagram of a Typical SoC Configuration with the SC140 Core
1.3.2 Variable Length Execution Set (VLES) Software Model
The VLES software model is the instruction grouping used by the SC140 to address the requirements of
DSP kernels. Using an orthogonal, compiler-friendly instruction set, this model maintains high code
density for applications.
All SC140 instruction words are 16 bits wide. Most instructions are encoded with one word. Each SC140
instruction encodes an atomic (lowest-level) operation. For example, MAC and store (move) instructions
are encoded in 16 bits. Since atomic operations need fewer bits to encode, the 16-bit instruction set
becomes fully orthogonal and very rich in the functionality it supports.
To execute signal processing kernels, a set of SC140 instructions can be grouped for execution in
parallel. The PSEQ performs this automatically with up to four DALU instructions and two AGU
instructions executed at the same time.
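The grouping limit can be expressed as a resource check. This is a simplified sketch, not the PSEQ's actual dispatch logic: the `Insn` descriptor and unit names are hypothetical, only the four-DALU and two-AGU limits are modeled, and the rule against two instructions writing the same register, stated for DALU registers in Section 2.1.1.1, is assumed here to apply to every register.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Insn(NamedTuple):
    """Hypothetical instruction descriptor: executing unit and destination register."""
    unit: str                   # "DALU" or "AGU" in this simplified model
    dest: Optional[str] = None  # destination register name, e.g. "d0" or "r1"


# Per-execution-set limits stated in this manual.
MAX_PER_UNIT = {"DALU": 4, "AGU": 2}


def is_legal_execution_set(insns: list[Insn]) -> bool:
    """Return True if the group fits in one execution set under this model."""
    units = Counter(i.unit for i in insns)
    if any(u not in MAX_PER_UNIT or n > MAX_PER_UNIT[u] for u, n in units.items()):
        return False
    # Assumption: no two instructions in a set may write the same register.
    dests = [i.dest for i in insns if i.dest is not None]
    return len(dests) == len(set(dests))
```

A full set of six instructions (four DALU, two AGU) passes; a fifth DALU instruction, or two writes to the same register, fails.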
Chapter 2
Core Architecture
This chapter provides an overview of the SC140 core architecture. It describes the main functional blocks
and data paths of the core.
2.1 Architecture Overview
The SC140 core provides the following main functional units:
•Data arithmetic and logic unit (DALU)
•Address generation unit (AGU)
•Program sequencer unit (PSEQ)
To provide data exchange between the core and the other on-chip blocks, the following buses are
implemented:
•Two data memory buses (address and data pairs: XABA and XDBA, XABB and XDBB) that are
used for all data transfers between the core and memory.
•Program data and address buses (PDB and PAB) for carrying program words from the memory to
the core.
•Special buses to support tightly coupled external user-definable instruction set accelerators.
A block diagram of the SC140 core is shown in Figure 2-1.
[Figure 2-1: the StarCore SC140 core (program sequencer; AGU with address register file, two AAUs, and BMU; DALU with register file and four ALUs; EOnCE with JTAG controller) connected to unified data/program memory over the 32-bit PAB and 128-bit PDB, the 32-bit XABA and XABB address buses, and the 64-bit XDBA and XDBB data buses, with a 25-bit instruction bus to the ISAP.]
Figure 2-1. Block Diagram of the SC140 Core
2.1.1 Data Arithmetic Logic Unit (DALU)
The DALU performs arithmetic and logical operations on data operands in the SC140 core. The
components of the DALU are as follows:
•A register file of sixteen 40-bit registers
•Four parallel ALUs, each ALU containing a multiply-accumulate (MAC) unit and
a bit-field unit (BFU)
•Eight data bus shifter/limiters
All the MAC units and BFUs can access all the DALU registers. Each register is partitioned into three
portions: two 16-bit registers (low and high portion of the register) and one 8-bit register (extension
portion). Accesses to or from these registers can be in widths of 8 bits, 16 bits, 32 bits, or 40 bits,
depending on the instruction.
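The EXT:HP:LP partitioning can be modeled by treating a register image as an unsigned 40-bit integer. This Python sketch uses hypothetical helper names and assumes LP occupies bits 15:0, HP bits 31:16, and EXT bits 39:32, which follows from the 16/16/8-bit split and the EXT:HP:LP ordering given in Section 2.1.1.2.

```python
MASK40 = (1 << 40) - 1


def split_dreg(value: int) -> tuple[int, int, int]:
    """Split a 40-bit data register image into (EXT, HP, LP).

    Assumed layout: EXT = bits 39:32, HP = bits 31:16, LP = bits 15:0.
    """
    v = value & MASK40
    return (v >> 32) & 0xFF, (v >> 16) & 0xFFFF, v & 0xFFFF


def join_dreg(ext: int, hp: int, lp: int) -> int:
    """Concatenate EXT:HP:LP into an unsigned 40-bit register image."""
    return ((ext & 0xFF) << 32) | ((hp & 0xFFFF) << 16) | (lp & 0xFFFF)


def as_signed40(value: int) -> int:
    """Interpret a 40-bit register image as a two's complement integer."""
    v = value & MASK40
    return v - (1 << 40) if v & (1 << 39) else v
```

For example, `join_dreg(0x12, 0x3456, 0x789A)` yields `0x123456789A`, and splitting it returns the three portions unchanged.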
The two data buses between the DALU register file and the memory are each 64 bits wide. This enables a
very high data transfer speed between memory and registers by allowing two data moves in parallel, each
up to 64 bits in width. The move instructions vary in access width from 8 bits to 64 bits, and can transfer
multiple words within the 64-bit constraint. With every MOVE instruction that accesses memory, one of
four signals to the memory interface is asserted, defining the access width.
•MOVE.B loads or stores bytes (8-bit).
•MOVE.W or MOVE.F loads or stores integer or fractional words (16-bit).
•MOVE.2W, MOVE.2F, or MOVE.L loads or stores two integer words, two fractional words, or one long
word, respectively (32-bit).
•MOVE.4W or MOVE.4F loads or stores four integers or four fractions, respectively (64-bit).
•MOVE.2L loads or stores two long words (64-bit).
2.1.1.1 Data Register File
The DALU registers can be read or written over the data buses (XDBA and XDBB). A DALU register can
be the source for up to four simultaneous instructions, but simultaneous writes of a destination register are
illegal. The source operands for DALU arithmetic instructions usually originate from DALU registers. The
destination of every arithmetic operation is a DALU register, and each such destination can be used as a
source operand for the operation immediately following, without any time penalty.
2.1.1.2 Multiply-Accumulate (MAC) Unit
The MAC unit is the main arithmetic processing unit of the SC140 core and performs the
arithmetic operations. The MAC unit has a 40-bit input and outputs one 40-bit result in the form of
[Extension:High Portion:Low Portion] (EXT:HP:LP).
The multiplier executes 16-bit by 16-bit fractional or integer multiplication between two’s complement
signed, unsigned, or mixed operands (16-bit multiplier and multiplicand). The 32-bit product is
right-justified, sign-extended, and may be added to the 40-bit contents of one of the 16 data registers.
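The MAC data path can be modeled with Python integers. This sketch handles signed operands only and assumes arithmetic saturation is off, so results wrap at 40 bits. The one-bit left shift in `mac_fractional` reflects the common Q15 × Q15 → Q31 convention for fractional DSP arithmetic and is an assumption here, since this section only states that the product is right-justified and sign-extended. All function names are hypothetical.

```python
MASK40 = (1 << 40) - 1


def wrap40(x: int) -> int:
    """Reduce x to a signed 40-bit two's complement value (no saturation)."""
    x &= MASK40
    return x - (1 << 40) if x & (1 << 39) else x


def to_s16(x: int) -> int:
    """Interpret the low 16 bits of x as a signed 16-bit operand."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def mac_integer(acc: int, a: int, b: int) -> int:
    """(16 x 16) + 40 -> 40-bit integer MAC on signed operands."""
    product = to_s16(a) * to_s16(b)  # always fits in 32 bits
    return wrap40(acc + product)


def mac_fractional(acc: int, a: int, b: int) -> int:
    """Fractional MAC. Assumption: the Q15 x Q15 product is shifted left
    one bit so that it is aligned as a Q31 value before accumulation."""
    product = (to_s16(a) * to_s16(b)) << 1
    return wrap40(acc + product)
```

With this convention, `mac_fractional(0, 0x4000, 0x4000)` (0.5 × 0.5) returns `0x20000000`, which is 0.25 in Q31.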
2.1.1.3 Bit-Field Unit (BFU)
The BFU contains a 40-bit parallel bidirectional shifter with a 40-bit input and a 40-bit output, a mask
generation unit, and a logic unit. The BFU is used in the following operations:
•Multi-bit left/right shift (arithmetic or logical)
•One-bit rotate (right or left)
•Bit-field insert and extract
•Count leading bits (ones or zeros)
•Logical operations
•Sign or zero extension operations
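The BFU operations listed above can be illustrated on a 40-bit value. The helpers below are hypothetical and show only the underlying bit manipulation: `extract_field` performs an unsigned extract, `insert_field` a field insert, and `count_leading_bits` counts the run of bits equal to the most significant bit, which is one reading of "count leading bits". The operand encoding and result format of the SC140 instructions are not modeled.

```python
MASK40 = (1 << 40) - 1


def extract_field(value: int, offset: int, width: int) -> int:
    """Unsigned extract of bits [offset + width - 1 : offset] of a 40-bit value."""
    return ((value & MASK40) >> offset) & ((1 << width) - 1)


def insert_field(value: int, field: int, offset: int, width: int) -> int:
    """Replace bits [offset + width - 1 : offset] of a 40-bit value with field."""
    mask = ((1 << width) - 1) << offset
    cleared = (value & MASK40) & ~mask
    return (cleared | ((field << offset) & mask)) & MASK40


def count_leading_bits(value: int, width: int = 40) -> int:
    """Count consecutive bits equal to the MSB, starting at the MSB.

    Illustrative only: the SC140 instruction may report its result differently.
    """
    value &= (1 << width) - 1
    msb = (value >> (width - 1)) & 1
    n = 0
    for bit in range(width - 1, -1, -1):
        if ((value >> bit) & 1) != msb:
            break
        n += 1
    return n
```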
2.1.1.4 Shifter/Limiters
Eight shifter/limiters provide scaling and limiting on 32-bit transfers from the data register file to memory.
Scaling up or down by one bit is programmable as is limiting to the maximum values provided in 32 bits.
For more detailed information, see Section 2.2.1.4, “Data Shifter/Limiter,” Section 2.2.1.5, “Scaling,” and
Section 2.2.1.6, “Limiting.”
2.1.2 Address Generation Unit (AGU)
The AGU contains address registers and performs address calculations using integer arithmetic necessary
to address data operands in memory. It implements four types of arithmetic: linear, modulo, multiple
wrap-around modulo, and reverse-carry. The AGU operates in parallel with other core resources to
minimize address generation overhead.
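Reverse-carry arithmetic, typically used for FFT bit-reversed addressing, propagates the carry from the most significant bit toward the least significant bit. The sketch below models it by reversing both operands, adding normally, and reversing the sum. It treats the address register as an offset from a suitably aligned buffer, which is an assumption; the function names are hypothetical.

```python
def bit_reverse(x: int, bits: int = 32) -> int:
    """Reverse the order of the low `bits` bits of x."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (x & 1)
        x >>= 1
    return result


def reverse_carry_add(r: int, n: int, bits: int = 32) -> int:
    """Add n to r with the carry moving from the MSB toward the LSB.

    Equivalent to reversing both operands, adding, and reversing the sum.
    """
    mask = (1 << bits) - 1
    return bit_reverse((bit_reverse(r, bits) + bit_reverse(n, bits)) & mask, bits)


if __name__ == "__main__":
    # 8-point FFT: an offset of N/2 = 4 visits the buffer in bit-reversed order.
    r, order = 0, []
    for _ in range(8):
        order.append(r)
        r = reverse_carry_add(r, 4)
    print(order)  # [0, 4, 2, 6, 1, 5, 3, 7]
```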
The AGU in the SC140 core has two address arithmetic units (AAUs), allowing two address generation
operations in every clock cycle. Each AAU has access to:
•Sixteen 32-bit address registers (R0–R15), of which R8–R15 can also be used as base address
registers for modulo addressing.
•Four 32-bit offset registers (N0–N3).
•Four 32-bit modulo registers (M0–M3).
The two AAUs are identical. Each contains:
•A 32-bit full adder, used for offset calculations.
•A second 32-bit full adder, used for modulo calculations.
Each AAU can update one address register in the address register file in one instruction cycle.
The AGU also contains a 32-bit modulo control register (MCTL). This control register is used to specify
the addressing mode of the R registers: linear, reverse-carry, modulo, or multiple wrap-around modulo.
When modulo addressing mode is selected, the MCTL register is used to specify which of the four modulo
registers is assigned to a specific R register.
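In modulo addressing, a pointer that steps past either end of a circular buffer wraps back into it. A minimal sketch, assuming the buffer starts at the address held in a base register (R8–R15) and that its length is taken directly from the assigned modulo register; whether the hardware stores the length itself or an encoded form of it is not stated here, and multiple wrap-around modulo mode is not modeled. The function name is hypothetical.

```python
def modulo_update(r: int, offset: int, base: int, modulus: int) -> int:
    """Advance address r by offset within the circular buffer [base, base + modulus).

    Assumption: `modulus` is the buffer length in bytes.
    """
    if not base <= r < base + modulus:
        raise ValueError("address outside the circular buffer")
    # Python's % returns a non-negative result for a positive modulus,
    # so negative offsets wrap correctly as well.
    return base + (r - base + offset) % modulus
```

For a 10-byte buffer at 0x1000, stepping 0x1008 by +4 gives 0x1002, and stepping 0x1001 by −3 gives 0x1008.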
Explicit instructions in the SC140 instruction set are used to execute arithmetic operations on the address
pointers. This capability can also be used for general data arithmetic. In addition, the AGU generates
change-of-flow program addresses and updates the stack pointers as needed.
2.1.2.1 Stack Pointer Registers
Two special registers with special addressing modes are used for software stacks. These are the Normal
mode stack pointer (NSP) and the Exception mode stack pointer (ESP). Both the ESP and the NSP are
32-bit read/write address registers with pre-decrement and post-increment updates. Both can be offset by
immediate values to allow random access to a software stack.
The ESP is used by stack instructions when the SC140 is in the Exception mode of operation, which is
entered when exceptions occur. The NSP is used in Normal mode when there are no exceptions. The
existence of two stack pointers enables separate allocation of stack space by the operating system and each
application task, which optimizes memory use in multi-tasking systems.
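The dual stack pointers can be pictured with a toy model in which the current mode selects the pointer used by push and pop. The class is hypothetical; it assumes pushes post-increment and pops pre-decrement (a stack growing toward higher addresses), and it uses one list element per stack slot rather than modeling byte addresses.

```python
class DualStack:
    """Toy model of the NSP/ESP pair: Normal mode uses the NSP, Exception mode the ESP.

    One list element stands for one stack slot (an assumption; the real slot
    size in bytes is not modeled).
    """

    def __init__(self, memory_size: int, nsp: int, esp: int):
        self.mem = [0] * memory_size
        self.sp = {"normal": nsp, "exception": esp}
        self.mode = "normal"

    def push(self, value: int) -> None:
        sp = self.sp[self.mode]
        self.mem[sp] = value          # write, then post-increment (assumed)
        self.sp[self.mode] = sp + 1

    def pop(self) -> int:
        self.sp[self.mode] -= 1       # pre-decrement (assumed), then read
        return self.mem[self.sp[self.mode]]
```

Because each mode has its own pointer, an exception handler's pushes never disturb the interrupted task's stack.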
2.1.2.2 Bit Mask Unit (BMU)
The BMU provides an easy way of setting, clearing, inverting, or testing a selected, but not necessarily
adjacent, group of bits in a register or memory location.
The BMU supports a set of bit mask instructions that operate on:
•All AGU pointers (R0–R15)
•All DALU registers (D0–D15)
•All control registers (EMR, VBA, SR, MCTL)
•Memory locations
Only a single bit mask instruction is allowed in any single execution set since only one execution unit
exists for these instructions.
A subgroup of the bit mask instructions (BMTSET) provides hardware support for semaphores, providing
one instruction for read-modify-write.
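The BMU operations reduce to masked logic on a word. The helpers below use hypothetical names, not SC140 mnemonics. `SemaphoreWord` shows why a single read-modify-write instruction matters for semaphores: the test and the set happen as one indivisible step, modeled here with a lock. Its return convention (True when all masked bits were previously clear) is an assumption, not the documented BMTSET flag behavior.

```python
import threading


def set_bits(value: int, mask: int) -> int:
    """Set every bit selected by mask (the bits need not be adjacent)."""
    return value | mask


def clear_bits(value: int, mask: int) -> int:
    """Clear every bit selected by mask."""
    return value & ~mask


def invert_bits(value: int, mask: int) -> int:
    """Invert every bit selected by mask."""
    return value ^ mask


def bits_all_set(value: int, mask: int) -> bool:
    """Test whether all bits selected by mask are set."""
    return (value & mask) == mask


class SemaphoreWord:
    """Memory word updated by an indivisible test-and-set.

    The lock stands in for the hardware's single-instruction atomicity.
    """

    def __init__(self, value: int = 0):
        self.value = value
        self._lock = threading.Lock()

    def test_and_set(self, mask: int) -> bool:
        """Set the masked bits; return True if they were all clear before
        (i.e. the caller acquired the semaphore). Convention assumed."""
        with self._lock:
            was_clear = (self.value & mask) == 0
            self.value |= mask
            return was_clear
```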
2.1.3 Program Sequencer Unit (PSEQ)
The PSEQ performs instruction fetch, instruction dispatch, hardware loop control, and exception
processing. The PSEQ controls the different processing states of the SC140 core. The PSEQ consists of
three hardware blocks:
•Program dispatch unit (PDU)—Responsible for detecting the execution set within one or two fetch
sets, and dispatching the execution set’s various instructions to their appropriate execution units
where they are decoded.
•Program control unit (PCU)—Responsible for controlling the sequence of the program flow.
•Program address generator (PAG)—Responsible for generating the program counter (PC) for
instruction fetch operations, including hardware looping.
The PSEQ implements its functions using the following registers:
•PC—Program counter register
•SR—Status register
•SA0-3—Four start address registers (SA0–SA3)
•LC0-3—Four loop counter registers (LC0–LC3)
•EMR—Exception and mode register
•VBA—Interrupt vector base address register
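A zero-overhead hardware loop replaces an explicit branch with a comparison inside the program address generator. The sketch below is a hypothetical single-level model: `HwLoop` holds a start address (as in SAn), a last address, and a loop counter (as in LCn), and the point at which the counter is decremented is an assumption. Nesting and the pipeline are not modeled.

```python
from dataclasses import dataclass


@dataclass
class HwLoop:
    """Hypothetical model of one hardware loop level."""
    sa: int  # start address of the loop body (as in SAn)
    la: int  # last address of the loop body
    lc: int  # remaining iterations (as in LCn)


def next_pc(pc: int, insn_size: int, loop: HwLoop) -> int:
    """Return the next fetch address.

    At the last address, if iterations remain, the counter is decremented
    and control returns to the start address with no branch instruction.
    """
    if pc == loop.la and loop.lc > 1:
        loop.lc -= 1
        return loop.sa
    return pc + insn_size
```

With a three-instruction body and a count of 3, the body executes nine instructions in total before falling through, and no branch instruction is ever fetched.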
2.1.4 Enhanced On-Chip Emulator (EOnCE)
The EOnCE module provides a non-intrusive means of interacting with the SC140 core and its peripherals
so that a user can examine registers, memory, or on-chip peripherals as well as define various breakpoints
and read the trace-FIFO. The EOnCE module greatly aids the development of hardware and software on
the SC140 core processor, and it interfaces with the debugging system through the on-chip JTAG TAP
controller pins. Refer to Chapter 4, “Emulation and Debug (EOnCE),” for details.
2.1.5 Instruction Set Accelerator Plug-in (ISAP) Interface
A user-defined instruction set accelerator plug-in (ISAP) module provides a means of enhancing the
SC140 basic instruction set with additional instructions. These additional instructions are executed in an
external module connected to the core. The new instructions are added to the SC140 Assembler and
Compiler via intrinsic libraries making application-specific or general-purpose functions available to the
user. A 25-bit instruction bus from the SC140 core to the ISAP enables the definition and support of a very
rich instruction set. The ISAP is also connected to the two 64-bit data buses, providing a large data
bandwidth to the main memory system.
2.1.6 Memory Interface
The SC140 core uses a unified memory space. Each address can contain either program information or
data. The exact memory configuration is customizable for each chip containing an SC140 core. Memory
space typically consists of on-chip RAM and ROM that can be expanded off-chip. The memory system
must support two parallel data accesses. However, it may issue stalls due to its specific implementation.
Refer to Section 2.4, “Memory Interface,” for further details.
Both internal and external memory configurations are specific to each member of the SC140 family.
2.2 DALU
This section describes the architecture and operation of the DALU, the block where most of the arithmetic
and logical operations are performed on data operands. In addition, this section details the arithmetic and
rounding operations performed by the DALU as well as its programming model.
2.2.1 DALU Architecture
The DALU performs most of the arithmetic and logical operations on data operands in the SC140 core.
The data registers can be read from or written to memory over the XDBA and the XDBB as 8-bit, 16-bit,
or 32-bit operands. The 64-bit wide data buses, XDBA and XDBB, support the transfer of several operands
in a single access. The source operands for the DALU, which may be 16, 32, or 40 bits, originate either
from data registers or from immediate data. The results of all DALU operations are stored in the data
registers.
All DALU operations are performed in one clock cycle. Up to four parallel arithmetic operations can be
performed in each cycle. The destination of every arithmetic operation can be used as a source operand for
the operation immediately following without any time penalty.
The components of the DALU are as follows:
•A register file of sixteen 40-bit registers
•Four parallel ALUs, each containing a MAC unit and a BFU with a 40-bit barrel shifter
•Eight data bus shifter/limiters that allow scaling and limiting of up to four 32-bit operands
transferred over each of the XDBA and XDBB buses in a single cycle
Figure 2-2 shows the architecture of the DALU.
[Figure 2-2: the 64-bit memory data buses XDBA (Memory Data Bus 1) and XDBB (Memory Data Bus 2) connect through eight shifter/limiters to data registers D0–D15, which exchange 40-bit operands with the four ALUs.]
Figure 2-2. DALU Architecture
The DALU programming model is shown in Table 2-1. Register D0 refers to the entire 40-bit register,
whereas D0.e, D0.h, and D0.l refer to the extension, high portion, and low portion of the D0 register,
respectively. In addition, one limit tag bit is associated with each data register. L0–L15 are concatenated to
D0–D15, respectively.
In this section, the D0–D15 data registers are referred to as Dn. They can be used as:
•Source operands
•Destination operands
•Accumulators
The registers can serve as input buffer registers between XDBA or XDBB and the ALUs. The registers are
used as DALU source operands, allowing new operands to be loaded for the next instruction while the
register contents are used by the current arithmetic instruction.
Each data register Dn has a limit tag bit (Ln) which is used to signify whether the extension portion of the
register is in use. The limit tag bit Ln is coupled to the extension portion Dn.e, which forms a 9-bit operand
for the purpose of storing these bits to memory. See Section 2.2.1.6, “Limiting,” for further details.
The data registers can be accessed over XDBA and XDBB with three data widths:
•A long-word access, writing or reading 32-bit operands
•A word access, writing or reading 16-bit operands
•A byte access, writing or reading 8-bit operands
For move instructions of fractional data, the transfer of a Dn register to memory over XDBA and XDBB is
protected against overflow by substituting a limiting constant for the data that is being transferred. The
content of Dn is not affected should limiting occur. Only the value transferred over XDBA or XDBB is
limited. This process is commonly referred to as transfer saturation and should not be confused with the
arithmetic saturation mode as described in Section 2.2.2.7, “Arithmetic Saturation Mode.”
Limiting is performed after the contents of the register have been shifted according to the scaling mode.
Shifting and limiting are performed only for MOVES instructions when a fractional operand is specified as
the source for a data move over XDBA or XDBB. When an integer operand is specified as the source for a
data move, shifting and limiting are not performed.
Automatic sign extension (or zero extension) of data values into the 40-bit registers is provided when an operand is transferred from memory to a data register. If a fractional word operand is to be written to a data register, the high portion (HP) of the register is written with the word operand and the low portion (LP) is zero-filled. The EXT portion is sign-extended from the HP, and the limit tag bit (Ln) is cleared.
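The fractional word load described above can be modeled as follows. This minimal sketch represents the register as an unsigned 40-bit integer image. The name load_fractional_word is hypothetical, and only the data placement is modeled, not bus timing or instruction encoding.

```python
MASK40 = (1 << 40) - 1

def load_fractional_word(word):
    """Model a fractional 16-bit load into a 40-bit data register.

    Returns (dn, ln): the word goes to the HP (bits 31:16), the LP is
    zero-filled, the EXT is sign-extended from the HP, and Ln is cleared.
    """
    word &= 0xFFFF
    signed = word - 0x10000 if word & 0x8000 else word  # sign of the HP
    dn = (signed << 16) & MASK40  # masking a negative value fills EXT with ones
    ln = 0
    return dn, ln
```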
When an integer word operand is to be written to a data register, the LP portion of the register is written
with the word operand. The HP and EXT portions are either zero-extended or sign-extended from the LP.
Long-word operands are written into the HP:LP portions of the register. The EXT portion is zero-extended
or sign-extended, and the limit tag bit (Ln) is cleared.
When a byte operand is to be written to a data register, the low byte of the LP (Dn.l[7:0]) is written with the byte operand. The upper byte of the LP (Dn.l[15:8]), the high portion, and the EXT are either zero-extended or sign-extended from the low byte of the LP. The limit tag bit (Ln) is cleared.
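The integer word, long-word, and byte cases can be sketched in the same way, again using an unsigned 40-bit register image. This model does not cover which instruction variant selects sign extension and which selects zero extension. Here the choice is passed in as the signed flag of load_integer; both names are hypothetical.

```python
MASK40 = (1 << 40) - 1

def load_integer(operand, width, signed):
    """Model an integer load of `width` bits (8, 16 or 32) into a 40-bit register.

    The operand lands in the least significant bits; all bits above it are
    sign-extended (signed=True) or zero-extended (signed=False). Ln is cleared.
    """
    if width not in (8, 16, 32):
        raise ValueError("width must be 8, 16 or 32")
    operand &= (1 << width) - 1
    if signed and operand & (1 << (width - 1)):
        operand -= 1 << width  # reinterpret as a negative two's complement value
    return operand & MASK40, 0  # (Dn image, Ln)
```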
A special case of the MOVE.L instruction is used for reading from or writing to the EXT portion of a data register. Six variations of this instruction save (restore) the extension bits and Ln bit of data registers to (from) memory. One variation writes the Ln bits and extension bits of an even/odd register pair to memory. Another variation reads bits 8:0 from memory into the extension bits and Ln bit of an even register, and another reads bits 24:16 into the extension bits and Ln bit of an odd register. Memory writes are done from an even/odd register pair; memory reads are done to a single register. An extension saved to memory from an even-numbered register must be restored to an even register, and likewise for odd registers.
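The packing of two extensions into one long word can be illustrated with the layout given in Table 2-3. Bits 24:16 hold Ln+1 and Dn+1.e of the odd register, bits 8:0 hold Ln and Dn.e of the even register, and the remaining bits are zero. The sketch below is a hypothetical model of that layout only. It does not model which of the six MOVE.L variants is used.

```python
def pack_extensions(ln_even, e_even, ln_odd, e_odd):
    """Pack {Ln, Dn.e} of an even register and {Ln+1, Dn+1.e} of the odd
    register into one 32-bit long word (7 zero bits above each 9-bit field)."""
    low = ((ln_even & 1) << 8) | (e_even & 0xFF)  # bits 8:0, even register
    high = ((ln_odd & 1) << 8) | (e_odd & 0xFF)   # bits 24:16, odd register
    return (high << 16) | low

def unpack_even(word):
    """Restore (Ln, Dn.e) of an even register from bits 8:0."""
    return (word >> 8) & 1, word & 0xFF

def unpack_odd(word):
    """Restore (Ln+1, Dn+1.e) of an odd register from bits 24:16."""
    return (word >> 24) & 1, (word >> 16) & 0xFF
```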
All move instructions are described in detail in Appendix A, “SC140 DSP Core Instruction Set.”
Table 2-2 summarizes the various types of data bus write accesses to the data registers.
Note: When an unsigned long operand is written to a data register, Dn.e is zero-extended.
Table 2-3 summarizes the various types of data bus read accesses from the data registers.
Table 2-3. Read from Data Registers
Operand Type        | Memory Data Bus.h                     | Memory Data Bus.l                 | Limiting/Scaling
Fractional Word     | -                                     | Dn.h                              | Yes/No (see Note)
Fractional Long     | Dn.h                                  | Dn.l                              | Yes/No (see Note)
Integer Word        | -                                     | Dn.l                              | No
Integer Long        | Dn.h                                  | Dn.l                              | No
Integer Byte        | -                                     | Low byte: Dn.l[7:0]               | No
2 Extensions - Long | EXT word: {7 zero bits, Ln+1, Dn+1.e} | EXT word: {7 zero bits, Ln, Dn.e} | No
Note: A fractional word or fractional long word can be written to memory with or without limiting and shifting. See MOVE.F and MOVES.F in Appendix A, “SC140 DSP Core Instruction Set.”
The register file architecture and the 64-bit wide data buses XDBA and XDBB support wide data transfers
between the memory and the data registers. Up to four 16-bit words or two 32-bit long words can be
transferred between the register file and the memory in a single move operation on each data bus, XDBA
or XDBB.
Table 2-4 summarizes the various data widths for data moves from/to the data register file.
Table 2-4. Data Registers Access Width
Operand Type  | Data Width (Bits)
Byte          | 8
Word          | 16
Long          | 32
Two word      | 32
Four byte     | 32
Two long word | 64
Four word     | 64
2.2.1.2 Multiply-Accumulate (MAC) Unit
The MAC unit is the arithmetic part of the ALU containing both a multiplier and an adder. It also performs
other operations such as rounding, saturation, comparisons, and shifting. Inputs to the MAC unit are from
data registers or from immediate data programmed into the instruction. As many as three operands may be
inputs. The destination for MAC instructions is always a data register in the 40-bit form EXT:HP:LP. The
multiplier executes 16 by 16 parallel multiplication of two’s complement data, signed or unsigned,
fractional or integer. The multiplier output can be accumulated with 40-bit data in a destination register. A
detailed description of each multiplication operation is given in Section 2.2.2.3, “Multiplication.” The
adder executes addition and subtraction of two 40-bit operands. All MAC instructions are executed in one
clock cycle.
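The following sketch shows why the 8 extension bits matter for accumulation. It models a fractional MAC as described above: a signed 16 × 16 multiply whose product is aligned as a 1.31 fraction in HP:LP and then added to a 40-bit accumulator. It is a simplified model: the accumulator wraps at 40 bits, and saturation, rounding, Ln, and status bits are not modeled. The names frac_mac and to_signed are hypothetical.

```python
MASK40 = (1 << 40) - 1

def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def frac_mac(acc, a, b):
    """acc + a*b for 16-bit signed fractions, with a 40-bit wrapping accumulator.

    The signed product is shifted left by one bit so that it is aligned as a
    1.31 fraction in HP:LP; no saturation or rounding is applied.
    """
    product = (to_signed(a, 16) * to_signed(b, 16)) << 1
    return (acc + product) & MASK40

acc = 0
for _ in range(4):
    acc = frac_mac(acc, 0x7FFF, 0x7FFF)  # adds (1 - 2^-15)^2 each time
print(hex(acc), to_signed(acc, 40) / 2**31)
```

Four products of nearly 1.0 sum to nearly 4.0. The result fits only because bits 39:32 are available, and the extension ends up holding $01.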
Table 2-5 lists the arithmetic instructions that are executed in the MAC unit. A more detailed description of
each instruction is given in Appendix A, “SC140 DSP Core Instruction Set.”
Table 2-5. DALU Arithmetic Instructions (MAC)
Instruction | Description
ABS         | Absolute value
ADC         | Add long with carry
ADD         | Add
ADD2        | Add two words
ADDNC.W     | Add without changing the carry bit in the SR
ADR         | Add and round
ASL         | Arithmetic shift left by one bit
ASR         | Arithmetic shift right by one bit
CLR         | Clear
CMPEQ       | Compare for equal
CMPGT       | Compare for greater than
DECEQ       | Decrement a data register and set T (the true bit) if zero
DECGE       | Decrement a data register and set T if greater than or equal to zero
DIV         | Divide iteration
DMACSS      | Multiply signed by signed and accumulate with data register right-shifted by word size
DMACSU      | Multiply signed by unsigned and accumulate with data register right-shifted by word size
IADDNC.W    | 40-bit non-saturating add integers with immediate, no carry update
IMAC        | Multiply-accumulate integers
IMACLHUU    | Multiply-accumulate unsigned integers: first source from low portion, second from high portion
IMACUS      | Multiply-accumulate unsigned integer and signed integer
IMPY.W      | Multiply integer
IMPYHLUU    | Multiply unsigned integer and unsigned integer: first source from high portion, second from low portion
IMPYSU      | Multiply signed integer and unsigned integer
IMPYUU      | Multiply unsigned integer and unsigned integer
INC         | Increment a data register
INC.F       | Increment a data register (as fractional data)
MAC         | Multiply-accumulate signed fractions
MACR        | Multiply-accumulate signed fractions and round
MACSU       | Multiply-accumulate signed fraction and unsigned fraction
MACUS       | Multiply-accumulate unsigned fraction and signed fraction
MACUU       | Multiply-accumulate unsigned fraction and unsigned fraction
MAX         | Transfer maximum signed value
MAX2        | Transfer two 16-bit maximum signed values
MAX2VIT     | Transfer two 16-bit maximum signed values, update Viterbi flags
MAXM        | Transfer maximum magnitude value
MIN         | Transfer minimum signed value
MPY         | Multiply signed fractions
MPYR        | Multiply signed fractions and round
MPYSU       | Multiply signed fraction and unsigned fraction
MPYUS       | Multiply unsigned fraction and signed fraction
MPYUU       | Multiply unsigned fraction and unsigned fraction
SAT.F       | Saturate fractional value in data register to fit in high portion
SAT.L       | Saturate value in data register to fit in 32 bits
SBC         | Subtract long with carry
SBR         | Subtract and round
SUB         | Subtract
SUB2        | Subtract two words
SUBL        | Shift left and subtract
SUBNC.W     | Subtract with no carry bit generation
TFR         | Transfer data register to a data register
TFRF        | Transfer data register to a data register if T bit is false
TFRT        | Transfer data register to a data register if T bit is true
TSTEQ       | Test for equal to zero
TSTEQ.L     | 32-bit compare for equal to zero
TSTGE       | Test for greater than or equal to zero
TSTGT       | Test for greater than zero
2.2.1.3 Bit-Field Unit (BFU)
The BFU is the logic part of the ALU. It contains a 40-bit parallel bidirectional shifter (with a 40-bit input and a 40-bit output), a mask generation unit, and a logic unit. The BFU is used in the following operations:
•Multi-bit left/right shift (arithmetic or logical)
•One-bit rotate (right or left)
•Bit-field insert and extract
•Count leading bits (ones or zeros)
•Logical operations
•Sign or zero extension operations
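The difference between arithmetic and logical multi-bit right shifts on a 40-bit field can be sketched as below. This is a generic model assumed for illustration. It does not capture the exact operand widths, flag updates, or shift-amount encoding of ASRR or LSRR, which are given in Appendix A. The function names are hypothetical.

```python
MASK40 = (1 << 40) - 1

def asr40(value, n):
    """Arithmetic right shift of a 40-bit field: bit 39 (the sign) is replicated."""
    value &= MASK40
    signed = value - (1 << 40) if value >> 39 else value
    return (signed >> n) & MASK40  # Python's >> on negative ints is arithmetic

def lsr40(value, n):
    """Logical right shift of a 40-bit field: zeros enter from bit 39."""
    return (value & MASK40) >> n
```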
Table 2-6 lists the instructions which are executed in the BFU. A more detailed description of each
instruction is given in Appendix A, “SC140 DSP Core Instruction Set.”
Table 2-6. DALU Logical Instructions (BFU)
Instruction | Description
AND         | Logical AND
ASLL        | Multi-bit arithmetic shift left
ASLW        | Word arithmetic shift left (16-bit shift)
ASRR        | Multi-bit arithmetic shift right
ASRW        | Word arithmetic shift right (16-bit shift)
CLB         | Count leading bits (ones or zeros)
EOR         | Bit-wise exclusive OR
EXTRACT     | Extract signed bit-field
EXTRACTU    | Extract unsigned bit-field
INSERT      | Insert bit-field
LSLL        | Multi-bit logical shift left
LSR         | Logical shift right by one bit
LSRR        | Multi-bit logical shift right
LSRW        | Word logical shift right (16-bit shift)
NOT         | One’s complement (inversion)
OR          | Bit-wise inclusive OR
ROL         | Rotate one bit left through the carry bit
ROR         | Rotate one bit right through the carry bit
SXT.B       | Sign extend byte
SXT.L       | Sign extend long
SXT.W       | Sign extend word
ZXT.B       | Zero extend byte
ZXT.L       | Zero extend long
ZXT.W       | Zero extend word
2.2.1.4 Data Shifter/Limiter
The data shifters/limiters provide special post-processing on data written from a Dn register to the XDBA
or XDBB buses. There are eight independent shifters/limiters, four for the XDBA bus and four for the
XDBB bus, allowing transfers to memory of up to four words per MOVES instruction with scaling and
limiting. Each consists of a shifter for scaling followed by a limiter. Note that arithmetic saturation from
DALU operations is a different function. Saturation occurs in the DALU before data is written to a
destination register.
2.2.1.5 Scaling
The data shifters in the shifter/limiter unit can perform the following data shift operations:
•Scale up—Shift data one bit to the left
•Scale down—Shift data one bit to the right
•No scaling—Pass the data unshifted
The eight shifters permit direct dynamic scaling of fixed-point data without additional program steps. For
example, this permits straightforward block floating-point implementation of Fast Fourier Transforms
(FFTs).
Scaling occurs if programmed in the scaling mode bits S0 and S1 (bits 4 and 5 in the SR). Scaling of
operands only occurs with the MOVES.F, MOVES.2F, MOVES.4F, and MOVES.L instructions, moving
data from a DALU register (or registers) to memory. The data in the register is not changed, only the data
that is transferred. The scaling mode also affects the Ln bit calculation and the rounding function for a set
of DALU instructions. Scaling is disabled when the arithmetic saturation mode is set. See Section 3.1.1,
“Status Register (SR),” and below for further details. An example of scaling is provided in Table 2-7.
Table 2-7. Scaling Example
Instruction          | Memory/Register | New Value  | Comments
move.w #$0030,r0     | r0              | $0000 0030 | R0 initialized for first memory write
moveu.w #$0200,d0.h  | d0              | $0200 0000 | D0 written
bmset #$10,sr.l      | sr              | $0000 0010 | Scale down set in SR
moves.f d0,(r0)+     | $0030           | $0100      | Memory written with scaled down value
move.l #$00e40020,sr | sr              | $00e4 0020 | Scale up set in SR
moves.f d0,(r0)      | $0032           | $0400      | Memory written with scaled up value
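The scaling step can be checked against Table 2-7 with a small model. The sketch assumes a fractional single-word MOVES of Dn.h. It shifts a copy of the 40-bit register image by one bit in the selected direction and returns bits 31:16 of the shifted copy. Limiting is ignored, and scaled_high_word is a hypothetical name.

```python
MASK40 = (1 << 40) - 1

def scaled_high_word(dn, mode):
    """Word placed on the data bus by a fractional MOVES of Dn.h after scaling.

    mode: "none", "down" (shift right one bit) or "up" (shift left one bit).
    Only a copy is shifted; the register image `dn` itself is never changed.
    """
    dn &= MASK40
    signed = dn - (1 << 40) if dn >> 39 else dn
    if mode == "down":
        signed >>= 1
    elif mode == "up":
        signed <<= 1
    elif mode != "none":
        raise ValueError(mode)
    return (signed >> 16) & 0xFFFF  # HP of the shifted copy
```

With D0 = $0200 0000 as in Table 2-7, this gives $0100 for scale down and $0400 for scale up.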
2.2.1.6 Limiting
The limiting capability is enabled only for the MOVES.F, MOVES.2F, MOVES.4F, and MOVES.L
instructions, and not for any other fractional moves such as MOVE.F. These instructions move data from
DALU register(s) to memory. The limiting operation takes place in two steps: first, calculating the Ln bit
when a previous ALU instruction wrote to a register, and second, transferring the data from that register
with a MOVES instruction. The transferred data is limited if the Ln bit is set.
2.2.1.6.1 Calculating the Ln Bit
The Ln bit can be affected by ALU instructions which are capable of using the extension portion of a data
register. The only use of the Ln bit is to set up or prepare for a subsequent MOVES instruction. The Ln bit
is calculated based on the effective extension bits shown in Table 2-8. These are the bits to the left of the
implied decimal point after scaling. If the bits are not all zeros or all ones, the extension is effectively in
use and the Ln bit will be set. The Ln bit is cleared as data is written to a DALU register if the defining bits
below are all zeros or all ones.
Table 2-8. Ln Bit Calculation
S1 | S0 | Scaling Mode | Bits Defining the Ln Bit Calculation
0  | 0  | No Scaling   | Bits 39, 38, ..., 32, 31
0  | 1  | Scale Down   | Bits 39, 38, ..., 33, 32
1  | 0  | Scale Up     | Bits 39, 38, ..., 31, 30
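The Ln bit rule of Table 2-8 can be expressed as a check that the effective extension field is all zeros or all ones. This minimal sketch uses an unsigned 40-bit register image and ignores the arithmetic saturation mode and the per-instruction rules. The name ln_bit is hypothetical.

```python
MASK40 = (1 << 40) - 1

# Lowest bit of the checked field for each scaling mode (Table 2-8);
# the field always extends up to bit 39.
LN_LOW_BIT = {"none": 31, "down": 32, "up": 30}

def ln_bit(dn, mode="none"):
    """Return 1 if the effective extension bits are neither all 0s nor all 1s."""
    low = LN_LOW_BIT[mode]
    field = (dn & MASK40) >> low
    all_ones = (1 << (40 - low)) - 1
    return 0 if field in (0, all_ones) else 1
```

With no scaling, the D3 result of the limiting example ($00 FFFE 0000) sets Ln. With scale down, the same value leaves Ln clear, because bit 31 then lies to the right of the effective binary point.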
The Ln bit is calculated (and set or cleared) for the following saturable instructions: ABS, ADC, ADR,
ADD, ADDNC, ASL, ASR, DIV, INC, MAC, MACR, MPY, MPYR, NEG, RND, SBC, SBR, SUB,
SUBL, SUBNC, and TFRx. The Ln bit is cleared if arithmetic saturation mode is set, except for these
instructions: ADC, DIV, SBC, TFR, TFRT, and TFRF. For the latter six, the Ln bit calculation is done,
even if arithmetic saturation mode is set. However, no scaling is considered in the Ln bit calculation if the
arithmetic saturation mode is set, even if a scaling mode bit is set.
The Ln bit is always cleared as a result of the execution of one of the following instructions: CLR, DECEQ, DECGE, MAX, MAXM, MIN, ADD2, SUB2, MAX2, MAX2VIT, DMACSU, DMACSS, MACSU, MACUU, MACUS, MPYSU, MPYUU, MPYUS, IADDNC, SAT, all integer multiplication operations, all BFU operations (as listed in Table 2-6 on page 2-13), and all MOVE instructions except the specialized MOVE instruction that restores (pops from the stack) the extension and Ln bits from memory. If the result of one of these instructions must be limited by a subsequent move operation, a TFR Dn instruction should be executed after the original instruction to validate the Ln bit before the value is written to memory using a MOVES.x instruction.
2.2.1.6.2 Limiting with the MOVES Instructions
The second stage of limiting occurs with the execution of a MOVES instruction. A limited value is
substituted for the transferred data if the Ln bit of that register was set. The data in the register is not
changed, only the data transferred.
Having four limiters for each bus allows eight operands to be limited independently in the same instruction
cycle. The four data limiters per bus can also be combined to form two 32-bit data limiters per bus for
long-word operands.
If limiting occurs, the data limiter substitutes a limited data value having maximum magnitude (saturated)
and the same sign as the 40-bit source register content:
•$7FFF for 16-bit positive numbers
•$7FFF FFFF for 32-bit positive numbers
•$8000 for 16-bit negative numbers
•$8000 0000 for 32-bit negative numbers
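The substitution rule above can be written as a small function. This sketch uses a 40-bit register image and takes the sign from bit 39, as the text states. Scaling is not applied, and limited_value is a hypothetical name. The function returns None when Ln is clear, meaning that the normal (scaled) data is transferred.

```python
MASK40 = (1 << 40) - 1

def limited_value(dn, ln, width):
    """Value substituted on the bus when limiting occurs (Ln set).

    width is 16 or 32; the sign is taken from bit 39 of the 40-bit register.
    """
    if not ln:
        return None
    negative = (dn & MASK40) >> 39
    if width == 16:
        return 0x8000 if negative else 0x7FFF
    if width == 32:
        return 0x80000000 if negative else 0x7FFFFFFF
    raise ValueError("width must be 16 or 32")
```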
This substitution process is sometimes called transfer saturation. The value in the register is not shifted or
limited, and can be reused by subsequent instructions. If the arithmetic saturation mode is set in the SR,
scaling is not considered in the calculation of the Ln bit. An example of limiting is provided in Table 2-9.
Table 2-9. Limiting Example
Instruction         | Memory/Register | New Value       | Comments
move.w #$0020,r0    | r0              | $0000 0020      | R0 holds the address for the first move to memory
moveu.w #$7fff,d0.h | d0              | $7fff 0000      | d0.h set with the most positive 2’s complement number
moveu.w #$7fff,d1.h | d1              | $7fff 0000      | d1.h set with the most positive 2’s complement number
add d0,d1,d3        | d3              | $1:00:fffe 0000 | L3 bit set from overflow
move.f d3,(r0)+     | $0020           | $fffe           | No limiting from the move instruction
moves.f d3,(r0)     | $0022           | $7fff           | Limiting occurs with the moves instruction
Note that in the unusual case where arithmetic saturation mode is set between a DALU instruction and a
subsequent moves instruction, scaling with the moves instruction is inhibited. However, limiting will occur
if the Ln bit is already set.
2.2.1.7 Scaling and Arithmetic Saturation Mode Interactions
The following table shows the scaling and limiting operations for the four possible cases of scaling/no
scaling with arithmetic saturation mode on/off. Note that the mode of both scaling and arithmetic
saturation selected is not a normal mode of operation for the core. The “Special Six” instructions referred
to in Table 2-10 and Table 2-11 are ADC, DIV, SBC, TFR, TFRT, and TFRF.
Table 2-10. Scaling and Limiting Interactions
Scaling Selected | Arithmetic Saturation Mode | Ln Bit Calculation: Saturable DALU Instructions | Ln Bit Calculation: Special Six Instructions | Ln Bit Calculation: Other DALU Instructions | Limiting with MOVES Instructions (see note below) | Scaling with MOVES Instructions
None    | Off | Calculated, no scaling   | Calculated, no scaling   | Cleared | Yes | No
Up/down | Off | Calculated, with scaling | Calculated, with scaling | Cleared | Yes | Yes
None    | On  | Cleared                  | Calculated, no scaling   | Cleared | Yes | No
Up/down | On  | Cleared                  | Calculated, no scaling   | Cleared | Yes | No
Note: Limiting will occur if the Ln bit is set.
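Table 2-10 can also be read as a small decision function. The sketch below encodes only the rows of the table and requires instructions to be pre-classified as saturable, Special Six, or other. The name ln_policy is hypothetical. In every row, limiting with MOVES still occurs whenever Ln is set, so that column is not returned.

```python
def ln_policy(instr_class, scaling, sat_mode):
    """Summarize Table 2-10 for one case.

    instr_class: "saturable", "special_six" or "other".
    scaling: True if scale up or scale down is selected in the SR.
    sat_mode: True if arithmetic saturation mode is set.
    Returns (ln_rule, moves_scales).
    """
    moves_scales = scaling and not sat_mode  # saturation mode disables scaling
    if instr_class == "other":
        ln_rule = "cleared"
    elif instr_class == "saturable" and sat_mode:
        ln_rule = "cleared"
    elif instr_class in ("saturable", "special_six"):
        ln_rule = "calculated, with scaling" if moves_scales else "calculated, no scaling"
    else:
        raise ValueError(instr_class)
    return ln_rule, moves_scales
```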
The following table (Table 2-11) shows the arithmetic saturation and rounding operations for the four
possible cases of scaling, no scaling, and arithmetic saturation mode on/off.
Table 2-11. Saturation and Rounding Interactions
Scaling Selected | Arithmetic Saturation Mode | Arithmetic Saturation: Saturable DALU Instructions | Arithmetic Saturation: Special Six Instructions | Rounding
None    | Off | None                                        | None | Rounding with no scaling
Up/down | Off | None                                        | None | Rounding with scaling
None    | On  | Saturation can occur                        | None | Rounding with no scaling
Up/down | On  | Saturation can occur, no scaling considered | None | Rounding with no scaling considered
2.2.2 DALU Arithmetic and Rounding
The following paragraphs describe the DALU data representation, rounding modes, and arithmetic
methods.
2.2.2.1 Data Representation
The SC140 core uses either a fractional or integer two’s complement data representation for all DALU
operations. The main difference between fractional and integer representations is the location of the
decimal (or binary) point. For fractional arithmetic, the decimal (or binary) point is always located
immediately to the right of the most significant bit of the high portion. For integer values, it is always
located immediately to the right of the least significant bit (LSB) of the value. Figure 2-3 shows the location of the binary point, the bit weighting, and the operand alignment for the different fractional and integer representations supported on the SC140 architecture.
Figure 2-3. DALU Data Representations (signed fractional and signed integer two’s complement representations of 16-bit word operands in D0.h–D15.h, D0.l–D15.l, and 16-bit memory, and of the 40-bit registers D0–D15)
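The binary-point placement can be made concrete by converting a 40-bit register image to a number. This minimal sketch takes an unsigned 40-bit image. For fractional data, the binary point sits to the right of bit 31, so the two’s complement value is divided by \(2^{31}\). For integer data, the value is used as is. The name register_value is hypothetical.

```python
def register_value(dn, fractional):
    """Numeric value of a 40-bit two's complement register image.

    Fractional: binary point to the right of bit 31 (divide by 2**31).
    Integer: binary point to the right of bit 0.
    """
    dn &= (1 << 40) - 1
    signed = dn - (1 << 40) if dn >> 39 else dn
    return signed / 2**31 if fractional else signed
```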
2.2.2.2 Data Formats
Three types of two’s complement data formats are supported by the SC140 core:
•Signed fractional (SF)
•Signed integer (SI)
•Unsigned integer (UI)
The ranges for each of these formats, described below, apply to all data stored in memory as well as data
stored in the data registers. The extension associated with each register allows word growth so that the
most positive fractional number that can be represented in a register is almost 256.0 with the most negative
fractional number being exactly -256.0. When the register extension is in use, the data contained in the
register cannot be stored exactly in memory or in other registers in a single move. In these cases, the
storage error can be minimized by limiting the data to the most positive or most negative number
consistent with the size of the destination, the sign of the register and the MSB of the extension.
2.2.2.2.1 Signed Fractional
In this format, without extension bits 39–32, the N-bit operand is represented using the 1.[N-1] bit format (1 sign bit, N-1 fractional bits). Signed fractional numbers lie in the following range:
\( -1.0 \le \mathrm{SF} \le +1.0 - 2^{-(N-1)} \)
For word and long-word signed fractions, the most negative number that can be represented is exactly -1.0, whose internal representation is $8000 and $8000 0000, respectively. The most positive word is $7FFF or \(1.0 - 2^{-15}\), and the most positive long word is $7FFF FFFF or \(1.0 - 2^{-31}\). If the extension bits are in use, the most positive number is \(256 - 2^{-31}\), represented by $7F FFFF FFFF, and the most negative number is -256, represented by $80 0000 0000.
2.2.2.2.2 Signed Integer
This format is used when processing data as integers. Using this format, the N-bit operand is represented
using the N.0 bit format (N integer bits). Signed integer numbers lie in the following range:
\( -2^{N-1} \le \mathrm{SI} \le 2^{N-1} - 1 \)
For word and long-word signed integers, the most negative word that can be represented is -32768 ($8000) and the most negative long word is -2147483648 ($8000 0000). The most positive word is 32767 ($7FFF) and the most positive long word is 2147483647 ($7FFF FFFF).
If the extension bits are in use, N becomes 40, and the most positive number is \(2^{39} - 1\), represented by $7F FFFF FFFF. The most negative number is \(-2^{39}\), represented by $80 0000 0000.
2.2.2.2.3 Unsigned Integer
Unsigned integer numbers are nonnegative only. The unsigned numbers have nearly twice the magnitude of a signed number of the same length. Unsigned integer numbers lie in the following range:
\( 0 \le \mathrm{UI} \le 2^{N} - 1 \)
The binary word is interpreted as having a binary point immediately to the right of the LSB. The most positive 16-bit unsigned integer is 65535 ($FFFF). The most positive 32-bit unsigned integer is \(2^{32} - 1\) ($FFFF FFFF). The smallest unsigned number is zero ($0000). If the extension bits are in use, the range is from zero to \(2^{40} - 1\).
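Table 2-12. Two’s Complement Word Representations
Signed Fractional             | Signed Integer             | Unsigned Integer
$7FFF = \(1.0 - 2^{-15}\)     | $7FFF = \(2^{15} - 1\)     | $FFFF = \(2^{16} - 1\)
...                           | ...                        | $FFFE = \(2^{16} - 2\)
...                           | ...                        | ...
$0001 = \(2^{-15}\)           | $0001 = +1                 | ...
$0000 = 0                     | $0000 = 0                  | ...
$FFFF = \(-2^{-15}\)          | $FFFF = -1                 | ...
...                           | ...                        | ...
...                           | ...                        | $0001 = 1
$8000 = -1.0                  | $8000 = \(-2^{15}\)        | $0000 = 0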
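The three columns of Table 2-12 are three readings of the same 16 bits. The following sketch returns the signed fractional, signed integer, and unsigned integer value of a word. The name interpret_word is hypothetical.

```python
def interpret_word(word):
    """Return (signed fractional, signed integer, unsigned integer) for a 16-bit word."""
    word &= 0xFFFF
    si = word - 0x10000 if word & 0x8000 else word  # two's complement reading
    return si / 2**15, si, word                     # 1.15 fraction, integer, unsigned
```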
2.2.2.3 Multiplication
Most of the operations are performed identically in fractional and integer arithmetic. However, the
multiplication operation is not the same for integer and fractional arithmetic. As illustrated in Figure 2-4,
fractional and integer multiplication differ by a 1-bit shift. Any binary multiplication of two N-bit signed
numbers gives a signed result that is 2N-1 bits in length. This 2N-1 bit result must then be correctly placed
into a field of 2N-bits to correctly fit into the on-chip registers. For correct fractional multiplication, an
extra 0-bit is placed at the LSB to give a 2N-bit result. For correct integer multiplication, an extra sign bit
is placed at the MSB to give a 2N-bit result.
Figure 2-4. Fractional and Integer Multiplication (signed multiplication N × N → 2N – 1 bits; integer: the 2N – 1 bit product is sign-extended into 2N bits; fractional: the product is zero-filled at the LSB to 2N bits)
The MPY, MAC, MPYR, and MACR instructions perform fractional multiplication and fractional
multiply-accumulation. The IMPY and the IMAC instructions perform integer multiplication.
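The 1-bit difference between the two alignments in Figure 2-4 can be sketched for N = 16 as follows. This simplified model returns only the 32-bit HP:LP field. Extension bits, saturation, and rounding are not modeled, and imul16, fmul16, and to_signed are hypothetical names.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def imul16(a, b):
    """Integer multiply: the 31-bit product is sign-extended to 32 bits."""
    return (to_signed(a, 16) * to_signed(b, 16)) & 0xFFFFFFFF

def fmul16(a, b):
    """Fractional multiply: the product is shifted left one bit (zero fill at the LSB)."""
    return ((to_signed(a, 16) * to_signed(b, 16)) << 1) & 0xFFFFFFFF
```

For example, fmul16(0x4000, 0x4000) computes 0.5 × 0.5 and yields $2000 0000, which is 0.25 in 1.31 format. imul16(0x0003, 0xFFFE) computes 3 × (-2) and yields $FFFF FFFA.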
2.2.2.4 Division
Fractional division of both positive and signed values is supported using the DIV instruction. The dividend
(numerator) is a 32-bit fraction and the divisor (denominator) is a 16-bit fraction. For a detailed description
of the DIV instruction, see Appendix A, “SC140 DSP Core Instruction Set.”
2.2.2.5 Unsigned Arithmetic
Unsigned arithmetic can be performed on the SC140 core architecture. Most of the unsigned arithmetic
instructions are performed the same as the signed instructions. However, some operations require special
hardware and are implemented as separate instructions.
2.2.2.5.1 Unsigned Multiplication
Unsigned multiplication (MPYUU, MACUU) and mixed unsigned-signed multiplication (MPYSU,
MACSU) are used to support double precision, as described in Section 2.2.2.8, “Multi-Precision
Arithmetic Support.” These instructions can be used for unsigned arithmetic multiplication.
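A common way to use signed × unsigned and unsigned × unsigned products for double precision is to split each 32-bit operand into a signed high half and an unsigned low half, then sum the four partial products. The sketch below shows that decomposition for integer operands. It illustrates the technique only and does not reproduce any instruction sequence from this manual. The mapping of each term to MPYSU, MPYUS, or MPYUU is an assumption, and split32 and mul32_from_halves are hypothetical names.

```python
def split32(x):
    """Split a signed 32-bit value into (signed high half, unsigned low half)."""
    x &= 0xFFFFFFFF
    hi = x >> 16
    hi = hi - 0x10000 if hi & 0x8000 else hi  # high half keeps the sign
    lo = x & 0xFFFF                           # low half is unsigned
    return hi, lo

def mul32_from_halves(a, b):
    """Signed 32 x 32 -> 64-bit product assembled from 16 x 16 partial products."""
    ah, al = split32(a)
    bh, bl = split32(b)
    hh = ah * bh  # signed x signed
    hl = ah * bl  # signed x unsigned
    lh = al * bh  # unsigned x signed
    ll = al * bl  # unsigned x unsigned
    return (hh << 32) + ((hl + lh) << 16) + ll
```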
2.2.2.5.2 Unsigned Comparison
When performing an unsigned comparison, the condition code computation is different from signed
comparisons. The most significant bit of the unsigned operand has a positive weight, while in signed
representation it has a negative weight. Special instructions are implemented to support unsigned
comparison such as CMPHI (compare greater).
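The different weighting of the most significant bit can be seen by comparing the same two 16-bit patterns both ways. This is a minimal sketch with hypothetical helper names. It does not model the SC140 condition codes themselves.

```python
def signed16(x):
    """Two's complement reading: the MSB has weight -2**15."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def greater_signed(a, b):
    return signed16(a) > signed16(b)

def greater_unsigned(a, b):
    # Unsigned reading: the MSB has weight +2**15.
    return (a & 0xFFFF) > (b & 0xFFFF)
```

For $8000 versus $7FFF, the signed comparison reports "not greater" (-32768 < 32767), while the unsigned comparison reports "greater" (32768 > 32767).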
2.2.2.6 Rounding Modes
The SC140 DALU performs rounding of the full register to single precision if requested in the instruction. The high portion of the register is rounded according to the contents of the low portion of the register, and then the low portion is cleared. The boundary between the low portion and the high portion is determined by the scaling mode bits (S0 and S1) in the SR. Two types of rounding are implemented: convergent rounding and two’s complement rounding. The type of rounding is selected by the rounding mode (RM) bit in the SR.
Table 2-13 shows the boundary between the high portion and the low portion depending on scaling. The
scaling adjustment is disabled if arithmetic saturation mode is selected.
Table 2-13. Rounding Position in Relation to Scaling Mode
2.2.2.6.1 Convergent Rounding
Convergent rounding (also called round-to-nearest-even) is the default rounding mode. It is selected when the rounding mode (RM) bit in the SR is cleared. The traditional rounding method rounds up any value greater than one-half and rounds down any value less than one-half. However, this leaves open which way a value of exactly one-half should be rounded. If it is always rounded the same way, the results are eventually biased in that direction. Convergent rounding removes this bias by rounding an exact one-half down if the high portion is even (LSB = 0) and up if the high portion is odd (LSB = 1).
For no scaling, the high portion (HP) of the register is bits 39:16 and the low portion (LP) is bits 15:0. The HP is incremented by one if the LP is greater than 1/2, or if the LP equals 1/2 and bit 16 is 1 (odd). The HP is left unchanged if the LP is less than 1/2, or if the LP equals 1/2 and bit 16 is 0 (even). After rounding, the LP is cleared. If scaling down is selected, the HP is bits 39:17 and the LP is bits 16:0. If scaling up is selected, the HP is bits 39:15 and the LP is bits 14:0. In the scaled cases, the odd/even test uses the LSB of the HP (bit 17 or bit 15, respectively).
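The convergent rounding rule above can be modeled on a 40-bit register image as follows. The sketch uses the HP/LP boundaries given in the text for each scaling mode and wraps at 40 bits. It ignores the arithmetic saturation mode, which disables the scaling adjustment. The name round_convergent is hypothetical.

```python
MASK40 = (1 << 40) - 1

# Number of LP bits for each scaling mode, as given in the text.
LP_BITS = {"none": 16, "down": 17, "up": 15}

def round_convergent(dn, mode="none"):
    """Round-to-nearest-even of a 40-bit register image; the LP is cleared."""
    k = LP_BITS[mode]
    dn &= MASK40
    lp = dn & ((1 << k) - 1)
    half = 1 << (k - 1)
    hp = dn >> k
    if lp > half or (lp == half and hp & 1):  # above 1/2, or exactly 1/2 with odd HP
        hp += 1
    return (hp << k) & MASK40
```

Choosing D0.h = $0004 or $0005 to match the low bits shown in Figure 2-5, the four cases correspond to D0 = $00 0004 6000, $00 0004 E000, $00 0004 8000, and $00 0005 8000. These round to $00 0004 0000, $00 0005 0000, $00 0004 0000, and $00 0006 0000, respectively.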
Figure 2-5 shows the four cases for rounding a number in the Dn.h register. If scaling is set in the SR, the
rounding position is updated to reflect the alignment of the result when it is put on the data bus. However,
the contents of the register are not scaled.
Case I: If D0.l < $8000 (1/2), then round down (add nothing)
  Before rounding: D0.e = XX...XX | D0.h = XXX...XXX0100 | D0.l = 011XXX...XXX
  After rounding:  D0.e = XX...XX | D0.h = XXX...XXX0100 | D0.l* = 000...000
Case II: If D0.l > $8000 (1/2), then round up (add 1 to D0.h)
  Before rounding: D0.e = XX...XX | D0.h = XXX...XXX0100 | D0.l = 1110XX...XXX
  After rounding:  D0.e = XX...XX | D0.h = XXX...XXX0101 | D0.l* = 000...000
Case III: If D0.l = $8000 (1/2) and the LSB of D0.h = 0, then round down (add nothing)
  Before rounding: D0.e = XX...XX | D0.h = XXX...XXX0100 | D0.l = 1000...000
  After rounding:  D0.e = XX...XX | D0.h = XXX...XXX0100 | D0.l* = 000...000
Case IV: If D0.l = $8000 (1/2) and the LSB of D0.h = 1, then round up (add 1 to D0.h)
  Before rounding: D0.e = XX...XX | D0.h = XXX...XXX0101 | D0.l = 1000...000
  After rounding:  D0.e = XX...XX | D0.h = XXX...XXX0110 | D0.l* = 000...000
Bit positions: D0.e = bits 39:32, D0.h = bits 31:16, D0.l = bits 15:0.
*D0.l is always cleared; this is performed during RND, MPYR, and MACR.
Figure 2-5. Convergent Rounding (No Scaling)
2.2.2.6.2 Two’s Complement Rounding
When two’s complement rounding is selected by setting the rounding mode (RM) bit in the SR, all values
greater than or equal to one-half are rounded up, and all values less than one-half are rounded down.
Therefore, a small positive bias is introduced.
For no scaling, the high portion (HP) of the register is bits 39:16 and the low portion (LP) is bits 15:0. The HP is incremented by one if the LP is greater than or equal to 1/2, and is left unchanged if the LP is less than 1/2. After rounding, the LP is cleared. If scaling down is selected, the HP is bits 39:17 and the LP is bits 16:0. If scaling up is selected, the HP is bits 39:15 and the LP is bits 14:0.
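Two’s complement rounding differs from convergent rounding only in the tie case. The sketch below models it on an unsigned 40-bit register image, using the HP/LP boundaries given in the text for each scaling mode and ignoring the arithmetic saturation mode. The name round_twos_complement is hypothetical.

```python
MASK40 = (1 << 40) - 1

# Number of LP bits for each scaling mode, as given in the text.
LP_BITS = {"none": 16, "down": 17, "up": 15}

def round_twos_complement(dn, mode="none"):
    """Round half up: increment the HP if the LP >= 1/2, then clear the LP."""
    k = LP_BITS[mode]
    dn &= MASK40
    hp = dn >> k
    if dn & (1 << (k - 1)):  # LP >= 1/2 exactly when the top LP bit is set
        hp += 1
    return (hp << k) & MASK40
```

For example, $FF FFFF 8000 (an HP of -1 with an LP of exactly 1/2) rounds up to $00 0000 0000. This illustrates the small positive bias mentioned above.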
Figure 2-6 shows the four cases for rounding a number in the Dn.h register. If scaling is set in the SR, the
rounding position is updated to reflect the alignment of the result when it is transferred to the data bus.
However, the contents of the register are not scaled.
Case I: If D0.l < $8000 (1/2), then round down (add nothing)
  Before rounding: D0.e = XX...XX | D0.h = XXX...XXX0100 | D0.l = 011XXX...XXX
  After rounding:  D0.e = XX...XX | D0.h = XXX...XXX0100 | D0.l* = 000...000
Case II: If D0.l > $8000 (1/2), then round up (add 1 to D0.h)
  Before rounding: D0.e = XX...XX | D0.h = XXX...XXX0100 | D0.l = 1110XX...XXX
  After rounding:  D0.e = XX...XX | D0.h = XXX...XXX0101 | D0.l* = 000...000
Case III: If D0.l = $8000 (1/2) and the LSB of D0.h = 0, then round up (add 1 to D0.h)
  Before rounding: D0.e = XX...XX | D0.h = XXX...XXX0100 | D0.l = 1000...000
  After rounding:  D0.e = XX...XX | D0.h = XXX...XXX0101 | D0.l* = 000...000
Case IV: If D0.l = $8000 (1/2) and the LSB of D0.h = 1, then round up (add 1 to D0.h)
  Before rounding: D0.e = XX...XX | D0.h = XXX...XXX0101 | D0.l = 1000...000
  After rounding:  D0.e = XX...XX | D0.h = XXX...XXX0110 | D0.l* = 000...000
Bit positions: D0.e = bits 39:32, D0.h = bits 31:16, D0.l = bits 15:0.
*D0.l is always cleared; this is performed during RND, MPYR, and MACR.
Figure 2-6. Two’s Complement Rounding (No Scaling)
2.2.2.7 Arithmetic Saturation Mode
By setting the arithmetic saturation mode (SM) bit in the SR, the arithmetic unit’s result is limited to 32
bits (high portion and low portion). The dynamic range of the DALU is therefore reduced to 32 bits. The
purpose of this bit is to provide a saturation mode for algorithms that do not recognize or cannot take
advantage of the extension bits.
Arithmetic saturation operates by checking whether bits 39–31 of the result of an affected DALU instruction are all ones or all zeros. If they are not, the result receives the negative saturation constant $FF 8000 0000 if bit 39 is one, or the positive saturation constant $00 7FFF FFFF if bit 39 is zero. If saturation occurs, the DOVF bit in the EMR register is set.
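The saturation check can be sketched as follows, given an unsigned 40-bit image of the result before saturation. The model follows the bit-39 rule as written. It therefore does not reproduce the undefined choice of constant when a 40-bit overflow also occurs (see the footnote to Table 2-14). The name saturate32 is hypothetical, and the DOVF flag is returned rather than written to the EMR.

```python
MASK40 = (1 << 40) - 1
POS_SAT = 0x007FFFFFFF  # $00 7FFF FFFF
NEG_SAT = 0xFF80000000  # $FF 8000 0000

def saturate32(result):
    """Apply arithmetic saturation to a 40-bit result.

    Returns (value, dovf): if bits 39..31 are not all equal, the result is
    replaced by the positive or negative constant selected by bit 39.
    """
    result &= MASK40
    top = result >> 31  # bits 39..31 (9 bits)
    if top in (0, 0x1FF):
        return result, False
    return (NEG_SAT if result >> 39 else POS_SAT), True
```

Applied to the addition in the arithmetic saturation example ($7FFF 0000 + $7FFF 0000 = $00 FFFE 0000), this function returns the positive constant $00 7FFF FFFF with DOVF set.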
The calculation for saturation is not affected by the scaling mode. In the same way, the rounding of the
saturation constant during execution of MPYR, MACR and RND instructions is independent of the scaling
mode: $00 7FFF FFFF is rounded to $00 7FFF 0000 and $FF 8000 0000 is unchanged.
The instructions that are affected by arithmetic saturation mode are: MAC, MPY, MACR, MPYR, SUB,
ADD, NEG, ABS, RND, INC, ADR, SBR, SUBL, ASR, SUBNC, ADDNC, and ASL.
When the arithmetic saturation mode is set, for most of the instructions, the scaling mode bits are ignored
for the calculation of the Ln bit, and the Ln bit cannot be set. For instructions ADC, DIV, SBC, TFR,
TFRT, and TFRF, however, the arithmetic saturation mode is ignored, and the Ln bit will be calculated.
These six are dependent on arithmetic saturation mode to the extent that scaling is not considered in the Ln
bit calculation if arithmetic saturation mode is on. See Section 2.2.1.7, “Scaling and Arithmetic Saturation
Mode Interactions,” on page 2-16 for more information.
The arithmetic saturation mode is always disabled during the execution of the following instructions:
TFR, TFRT, TFRF, MAX, MAXM, MIN, ADD2, SUB2, DIV, SBC, ADC, MAX2, MAX2VIT,
DMACSU, DMACSS, MACSU, MACUS, MACUU, MPYSU, MPYUU, MPYUS, IADDNC, CMPHI,
DECEQ, DECGE, all integer multiplication operations, and all BFU operations as described in Table 2-6
on page 2-13. If the result of these instructions should be saturated, a SAT.L Dn instruction must be
executed following the original instruction.
If the arithmetic saturation mode is set and data saturation occurs, the sticky data overflow bit (DOVF) in
the EMR is set to signify that the arithmetic result before saturation cannot be represented in 32 bits. Note
that if arithmetic saturation mode is not set, the DOVF bit is set when overflow from 40 bits occurs.
Table 2-14 provides an example of the arithmetic saturation mode.
Table 2-14. Arithmetic Saturation Example
Instruction            Memory/Register  New Value        Comments
bmset #$0004,sr.l      sr               $00e4 0004       Arithmetic saturation mode set
moveu.w #$7fff,d0.h    d0               $7fff 0000       d0.h set with the most positive 2’s complement number
moveu.w #$7fff,d1.h    d1               $7fff 0000       d1.h set with the most positive 2’s complement number
add d0,d1,d3           d3               $0:00:7fff ffff  Max positive constant loaded in D3. L3 bit not set from overflow
                       emr              $0000 0004       DALU overflow bit set
Note: In case of a 40-bit overflow that takes place in conjunction with arithmetic saturation, the saturation constant chosen is undefined; it can be either the negative or the positive constant.
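The saturation check described above can be sketched in C as a behavioural model. It assumes the 40-bit DALU value is held in the low 40 bits of a uint64_t; the names saturate40, SAT_POS, and SAT_NEG are hypothetical, and the dovf flag only stands in for the sticky DOVF bit in EMR. The undefined constant choice for a 40-bit overflow is not modelled.

```c
#include <stdint.h>

/* Hypothetical behavioural sketch: a 40-bit DALU value is held in the
   low 40 bits of a uint64_t. */
#define DALU_MASK40 UINT64_C(0xFFFFFFFFFF)
#define SAT_POS     UINT64_C(0x007FFFFFFF)   /* $00 7FFF FFFF */
#define SAT_NEG     UINT64_C(0xFF80000000)   /* $FF 8000 0000 */

/* Applies arithmetic saturation to a 40-bit result. *dovf is set to 1
   when saturation occurs (standing in for the sticky DOVF bit). */
static uint64_t saturate40(uint64_t result, int *dovf)
{
    uint64_t top;

    result &= DALU_MASK40;
    top = (result >> 31) & 0x1FF;   /* bits 39..31 of the result */
    if (top == 0 || top == 0x1FF)
        return result;              /* representable in 32 bits */
    *dovf = 1;
    return ((result >> 39) & 1) ? SAT_NEG : SAT_POS;
}
```

For the add d0,d1,d3 row of Table 2-14, the unsaturated 40-bit sum is $00 FFFE 0000: bit 31 is set and bit 39 is clear, so the model returns $00 7FFF FFFF and sets the flag, matching the table.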
2.2.2.8 Multi-Precision Arithmetic Support
The SC140 DALU supports multi-precision arithmetic for fractional and integer operations.
2.2.2.8.1 Fractional Multi-Precision Arithmetic
A set of DALU instructions is provided for fractional multi-precision multiplications. When these
instructions are used, the multiplier accepts some combinations of two’s complement signed and unsigned
formats. Table 2-15 lists these instructions.
Table 2-15. Fractional Signed and Unsigned Two’s Complement Multiplication
Instruction    Description
MPYSU/MACSU    Fractional multiplication and multiply-accumulate with signed × unsigned operands
MPYUS/MACUS    Fractional multiplication and multiply-accumulate with unsigned × signed operands
MPYUU/MACUU    Fractional multiplication and multiply-accumulate with unsigned × unsigned operands
DMACSS         Fractional multiplication with signed × signed operands and 16-bit arithmetic right shift of the accumulator before accumulation
DMACSU         Fractional multiplication with signed × unsigned operands and 16-bit arithmetic right shift of the accumulator before accumulation
Figure 2-7 shows how the DMAC instruction is implemented.
Figure 2-7. DMAC Implementation (two 16-bit operands are multiplied; the 40-bit accumulator passes through a register shifter (>> 16) and is added to the product)
Figure 2-8 illustrates the use of these instructions in the case of a double-precision multiplication of
32-bit x 32-bit operands. The “Unsigned x Unsigned” operation is used to multiply or multiply-accumulate
the unsigned low portion of one double-precision number with the unsigned low portion of the other
double-precision number. The “Signed x Unsigned” and “Unsigned x Signed” operations are used to
multiply or multiply-accumulate the signed high portion of one double-precision number with the unsigned
low portion of the other double-precision number. The “Signed x Signed” operation is used to multiply or
multiply-accumulate the two signed high portions of two signed double-precision numbers. The TFRx
instructions in parentheses are optional instructions that are used only in case all 64 bits of the result are
needed. Otherwise, the result is truncated to a 32-bit fraction.
Figure 2-9 illustrates the use of the fractional multiplication and multiply-accumulate instructions in the
case of a mixed double-precision multiplication of 16-bit by 32-bit signed operands. The “Signed x
Unsigned” operation is used to multiply the signed high portion of one single-precision number with the
unsigned low portion of the other double-precision number. The “Signed x Signed” DMAC operation is
used to multiply-accumulate the two signed high portions of the two signed operands. The TFRx
instruction in parentheses is an optional instruction that is used only in case all 48 bits of the result are
needed. Otherwise, the result is truncated to a 32-bit fraction.
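The double-precision sequences above can be modelled in C to show why the 16-bit right shift built into DMACSS and DMACSU yields a correctly truncated result. The instruction order in the comments (mpyuu, dmacsu, macus, dmacss for 32 × 32; mpysu, dmacss for 16 × 32) is an assumption, since Figures 2-8 and 2-9 are not reproduced here, and each fractional product is modelled as the integer product multiplied by two. All function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: floor (arithmetic) right shift that avoids the
   implementation-defined behaviour of >> on negative operands. */
static int64_t asr64(int64_t x, unsigned n)
{
    return x >= 0 ? x >> n : ~(~x >> n);
}

/* Splits a 32-bit value into a signed high and an unsigned low portion. */
static void split32(int32_t v, int64_t *hi, int64_t *lo)
{
    *lo = (int64_t)((uint32_t)v & 0xFFFFu);
    *hi = ((int64_t)v - *lo) / 65536;      /* exact division */
}

/* 32 x 32 fractional multiply; returns the accumulator, i.e. the
   product truncated to a 32-bit fraction. */
static int64_t frac_mul_32x32(int32_t a, int32_t b)
{
    int64_t ah, al, bh, bl, acc;

    split32(a, &ah, &al);
    split32(b, &bh, &bl);
    acc = 2 * al * bl;                     /* mpyuu:  unsigned x unsigned */
    acc = asr64(acc, 16) + 2 * ah * bl;    /* dmacsu: acc >> 16, signed x unsigned */
    acc += 2 * al * bh;                    /* macus:  unsigned x signed */
    acc = asr64(acc, 16) + 2 * ah * bh;    /* dmacss: acc >> 16, signed x signed */
    return acc;
}

/* 16 x 32 mixed-precision multiply; x is a 16-bit signed fraction. */
static int64_t frac_mul_16x32(int16_t x, int32_t b)
{
    int64_t bh, bl, acc;

    split32(b, &bh, &bl);
    acc = 2 * (int64_t)x * bl;                     /* mpysu:  signed x unsigned */
    acc = asr64(acc, 16) + 2 * (int64_t)x * bh;    /* dmacss: acc >> 16, signed x signed */
    return acc;
}
```

Because an arithmetic right shift rounds toward minus infinity, shifting the partial sum right by 16 bits twice gives the same value as shifting the full product once. As a result, frac_mul_32x32(a, b) equals the exact product a × b shifted right by 31 bits, which is the fraction truncated to 32 bits.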
A set of DALU operations is provided for integer multi-precision multiplications. When these instructions
are used, the multiplier accepts some combinations of two’s complement signed and unsigned formats.
Both signed and unsigned multi-precision multiplication are supported. Table 2-16 lists these instructions.
Table 2-16. Integer Signed and Unsigned Two’s Complement Multiplication
Instruction      Description
IMPYSU/IMACSU    Integer multiplication and multiply-accumulate with signed x unsigned operands
IMPYUU           Integer multiplication with unsigned x unsigned operands
IMPYHLUU         Integer multiply unsigned x unsigned: first source from high portion, second from low portion
IMACLHUU         Integer multiply-accumulate unsigned x unsigned: first source from low portion, second from high portion
Figure 2-10 illustrates the use of these instructions in the case of a signed integer double-precision multiplication of 32-bit by 32-bit signed operands. In this example, only a 32-bit result is generated; the most significant 32 bits are shifted out. The “Unsigned x Unsigned” operation is used to multiply or multiply-accumulate the unsigned low portion of one double-precision number with the unsigned low portion of the other double-precision number. The “Signed x Unsigned” and “Unsigned x Signed” operations are used to multiply or multiply-accumulate the signed high portion of one double-precision number with the unsigned low portion of the other double-precision number.
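Figure 2-10. Signed Integer Double-Precision Multiplication
(D0 = D0.h:D0.l and D1 = D1.h:D1.l are 32-bit signed operands; only a 32-bit result is kept)
impyuu D0,D1,D2   Unsigned × Unsigned: D1.l × D0.l
impysu D0,D1,D3   Signed × Unsigned: D0.h × D1.l
imacus D0,D1,D3   Unsigned × Signed: D1.h × D0.l, added to D3
aslw D3           D3 shifted left by 16 bits (0 enters D3.l)
add D2,D3         partial products summed into the 32-bit result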
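The sequence in Figure 2-10 can be checked with a short C model. It assumes the register roles shown in the figure (D2 receives the low × low product, and D3 accumulates the two cross products before ASLW shifts it left by 16 bits); the function name is hypothetical. Because only the low 32 bits are kept, the high × high product never has to be formed.

```c
#include <stdint.h>

/* Hypothetical model of Figure 2-10: low 32 bits of a signed
   32 x 32 integer product. */
static uint32_t imul32_low_signed(int32_t a, int32_t b)
{
    int64_t al = (int64_t)((uint32_t)a & 0xFFFFu);   /* unsigned low */
    int64_t bl = (int64_t)((uint32_t)b & 0xFFFFu);
    int64_t ah = ((int64_t)a - al) / 65536;          /* signed high */
    int64_t bh = ((int64_t)b - bl) / 65536;
    int64_t d2 = al * bl;        /* impyuu d0,d1,d2: D1.l x D0.l */
    int64_t d3 = ah * bl;        /* impysu d0,d1,d3: D0.h x D1.l */

    d3 += al * bh;               /* imacus d0,d1,d3: D1.h x D0.l */
    d3 *= 65536;                 /* aslw: cross products move up 16 bits */
    return (uint32_t)(d2 + d3);  /* add: keep only the low 32 bits */
}
```

The result equals the low 32 bits of the full 64-bit product, (uint32_t)((int64_t)a * b), for any pair of signed 32-bit operands.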
Figure 2-11 illustrates the use of these instructions in the case of an unsigned integer double-precision
multiplication of 32-bit by 32-bit unsigned operands. In this example, only a 32-bit result is generated. The
most significant 32 bits are shifted out. All multiplications are of the “Unsigned x Unsigned” type using
different combinations of high and low portions.
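A corresponding C model for the unsigned case follows. Figure 2-11 is not reproduced here, so the assignment of IMPYUU, IMPYHLUU, and IMACLHUU (from Table 2-16) to the three partial products is an assumption, as is the function name.

```c
#include <stdint.h>

/* Hypothetical model of the unsigned case: low 32 bits of an unsigned
   32 x 32 integer product, built from unsigned x unsigned products. */
static uint32_t imul32_low_unsigned(uint32_t a, uint32_t b)
{
    uint32_t al = a & 0xFFFFu, ah = a >> 16;
    uint32_t bl = b & 0xFFFFu, bh = b >> 16;
    uint32_t d2 = al * bl;   /* impyuu:   low x low */
    uint32_t d3 = ah * bl;   /* impyhluu: high x low */

    d3 += al * bh;           /* imaclhuu: low x high, accumulated */
    d3 <<= 16;               /* aslw: bits above 31 are shifted out */
    return d2 + d3;          /* add: sum wraps modulo 2^32 */
}
```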
A set of DALU and AGU operations is provided for Viterbi decoding kernels. A special MAX2VIT
operation is defined. This instruction functions as a regular MAX2 instruction and is used to transfer two
16-bit maximum signed values. In addition, the MAX2VIT instruction updates two Viterbi flags (VFs)
which reside in the status register as described in Section 3.1.1, “Status Register (SR),” on page 3-1.
Complementary AGU move operations are provided (VSL instructions). For a full description of the
Viterbi instructions, see Appendix A, “Viterbi Shift Left Move (AGU) VSL,” on page A-422.
Address Generation Unit
2.3 Address Generation Unit
The AGU is one of the execution units in the SC140 core. The AGU performs effective address
calculations using the integer arithmetic necessary to address data operands in memory. It also contains the
registers used to generate the addresses. The AGU implements four types of arithmetic: linear, modulo,
multiple wrap-around modulo, and reverse-carry. It operates in parallel with other chip resources to
minimize address generation overhead. The AGU also generates change-of-flow program addresses and
updates the stack pointer (SP) whenever needed.
2.3.1 AGU Architecture
The major components of the AGU are listed below:
•Eight low bank address registers (R0–R7)
•Eight high bank address registers (R8–R15), or alternatively, eight base address registers (B0–B7)
•Two stack pointers (NSP, ESP), only one of which is active at a time (SP)
•Four offset registers (N0–N3)
•Four modifier registers (M0–M3)
•A modifier control register (MCTL)
•Two address arithmetic units (AAU)
•One bit mask unit (BMU)
In this section, the registers are referred to as:
•Rn for any of the R0–R15 address registers
•Bn for any of the B0–B7 base address registers
•Ni for any of the N0–N3 offset registers
•Mj for any of the M0–M3 modifier registers
All the Rn, Bn, SP, Ni, and Mj registers are referred to as AGU registers. All of the AGU registers are
32 bits wide.
Figure 2-12 shows a block diagram of the AGU.
Figure 2-12. AGU Block Diagram (components: address registers R0–R7 and R8/B0–R15/B7, stack pointers NSP and ESP, offset registers N0–N3, modifier registers M0–M3, MCTL, the address arithmetic unit (AAU), and the bit mask unit (BMU); buses: 32-bit address buses PAB, XABA, and XABB, the program counter (PC) address, and 64-bit Memory Data Bus 1 (XDBA) and Memory Data Bus 2 (XDBB))
All sixteen address registers (R0–R15) as well as the NSP or ESP are used for generating addresses in the
register indirect addressing modes. All four offset registers (N0–N3) can be used by all sixteen address
registers. The four modifier registers (M0–M3) can only be used by the low bank of eight address registers
(R0–R7).
The base address (Bn) registers are uniquely associated with the low bank of Rn registers such that B0 is
used with R0, B1 with R1, and so on.
The BMU is used to perform bit mask operations such as setting, clearing, changing, or testing bits in a
destination according to an immediate mask operand. Data is loaded into the BMU over the data memory
buses XDBA or XDBB. The result is written back over XDBA or XDBB to the destinations in the next
cycle. All bit mask instructions are typically executed in two cycles and work on 16-bit data. This data can
be a memory location or a portion (high or low) of a register. For more information, see Section 2.3.6, “Bit
Mask Instructions.”
During every instruction cycle, the two AAUs can generate one 32-bit program memory address on the
PAB (in case of change of flow) or two 32-bit data memory addresses (one on each of the XABA and
XABB). Each AAU can generate an address to access a byte, a 16-bit word, a 32-bit long word, or a 64-bit
two-long-word operand in memory to feed into the DALU in a single cycle.
Each AAU can update one address register during one instruction cycle. The modifier control register
(MCTL) specifies the type of arithmetic to be used in the address register update calculation. The address
arithmetic instructions provide arithmetic operations for address calculations or for general purpose
calculations.
The two AAUs are identical. Each contains a 32-bit full adder, called an offset adder, which can perform
the following:
•Add or subtract two AGU registers
•Add an immediate value
•Increment or decrement an AGU register
•Add the PC
•Add with reverse-carry
The offset adder can also perform compare or test operations as well as arithmetic and logical shifts. The
offset values added in this adder can be pre-shifted left by 1, 2, or 3 bits according to the access width. In
reverse-carry mode, the carry propagates in the opposite direction.
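Reverse-carry addition can be sketched by bit-reversing both operands, adding them normally, and bit-reversing the sum, which is equivalent to propagating the carry from the most significant bit toward the least significant bit. This is a generic 32-bit illustration: which address bits take part on the SC140, and how the access-width pre-shift interacts with it, are not described here, so the function names and the plain 32-bit treatment are assumptions. Reverse-carry updates are commonly used to generate bit-reversed (FFT) buffer indices.

```c
#include <stdint.h>

/* Hypothetical helper: reverses the bit order of a 32-bit value. */
static uint32_t rev32(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

/* Hypothetical reverse-carry add: the carry moves toward the LSB. */
static uint32_t rc_add(uint32_t a, uint32_t b)
{
    return rev32(rev32(a) + rev32(b));
}
```

Starting from 0 and repeatedly adding 4 (half of an 8-entry buffer) yields 0, 4, 2, 6, 1, 5, 3, 7, the bit-reversed order of the indices 0 to 7.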
A second full adder, called a modulo adder, adds the summed result of the first full adder to a modulo
value, M or minus M, where M is stored in the selected modifier register. In modulo mode, a modulo
comparator tests whether the result is inside the buffer by comparing the results to the B register, choosing
the correct result from the offset adder or the modulo adder.
For more information, see Section 2.3.5, “Arithmetic Instructions on Address Registers.”
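The interplay between the offset adder, the modulo adder, and the comparator can be sketched as follows. This is a simplified model, assuming Rn already lies in the buffer from B to B+M-1 (the bounds given in Section 2.3.2.4), that the offset magnitude does not exceed M, and that M is a byte count; the function name is hypothetical, and multiple wrap-around modulo mode is not modelled.

```c
#include <stdint.h>

/* Hypothetical modulo address update for a buffer of m bytes starting
   at base b. Assumes b <= rn <= b+m-1 and |offset| <= m. */
static uint32_t modulo_update(uint32_t rn, int32_t offset, uint32_t b, uint32_t m)
{
    int64_t sum = (int64_t)rn + offset;       /* offset adder */

    if (sum > (int64_t)b + m - 1)             /* past the upper boundary */
        sum -= m;                             /* modulo adder: add -M */
    else if (sum < (int64_t)b)                /* below the lower boundary */
        sum += m;                             /* modulo adder: add +M */
    return (uint32_t)sum;                     /* comparator's selection */
}
```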
2.3.2 AGU Programming Model
The programming model of the AGU is shown in Figure 2-13.
The address registers can be programmed for linear addressing, modulo addressing (regular or multiple
wrap-around), and reverse-carry addressing. Automatic updating of address registers is available when
using address register indirect addressing.
Figure 2-13. AGU Programming Model (all registers are 32 bits, bits 31–0: address registers R0–R7 and SP (NSP, ESP); address registers / base address registers R8/B0 through R15/B7; offset registers N0–N3, modifier registers M0–M3, and MCTL)
2.3.2.1 Address Registers (R0–R15)
The sixteen 32-bit address registers R0–R15 can contain addresses or general-purpose data. These are
32-bit read/write registers. The 32-bit address in a selected address register is used in calculating the
effective address of an operand. The contents of an address register can point directly to data, or can be
used as an index.
The sixteen address registers R0–R15 are composed of two separate banks, a low bank (R0–R7) and a high
bank (R8–R15). The high bank can be used alternatively as a base address register bank (B0–B7). Each
address register Rn of the high bank can serve as an address register on condition that the corresponding
Bn-8 register is not used. Both Rn and Bn-8 are mapped to the same physical register. For example, R8 is
available only if R0 is not being used in modulo addressing, since this requires the base address register B0.
Use of both Rn and Bn-8 notations as source and destination of move-like instructions is permitted,
regardless of the use of the physical register as a modulo base or as a pointer. For example:
MOVE.L #ADDRESS, B0
...
MOVE.W (R8), D0
See Section 2.3.2.6, “Modifier Control Register (MCTL),” for further information. The high bank of
registers can only be used as pointers in the linear mode of addressing since the other modes of addressing
are only encoded for the low bank in the MCTL register.
In addition, an address register can be post-updated according to the addressing mode selected. If an
address register is updated, one of the modifier registers (Mj) can be used to specify the type of update
arithmetic. Offset registers (Ni) are used for post-incrementing and indexing by offset.
The address register modification can be performed by either of the two AAUs. Most addressing modes
modify the selected address register in a read-modify-write fashion. The address register is read, its
contents are modified by the associated modulo arithmetic unit, and the register is written with the
appropriate output of the AAU. The form of address register modification performed by the address
arithmetic unit is controlled by the contents of the offset and modifier registers described in the following
sections.
2.3.2.2 Stack Pointer Registers (NSP, ESP)
The SC140 core has two stack pointer registers: the normal stack pointer (NSP) and the exception stack
pointer (ESP). These 32-bit registers are used implicitly in all PUSH and POP instructions. Only one stack
pointer is active at one time according to the mode:
•In Normal working mode, the NSP is used.
•In Exception working mode, the ESP is used.
The EXP bit in the status register (SR) determines the active working mode. The active stack pointer (SP)
is used explicitly for memory references when used with the address register indirect modes. The stack
pointers point to the next unoccupied location in the stacks. They are post-incremented on all the implicit
PUSH operations and pre-decremented on all the implicit POP operations.
Note: Both stack pointer registers must be initialized explicitly by the programmer after reset.
2.3.2.2.1 Shadow Stack Pointer Registers
Both stack pointers have shadow registers which contain a decremented value of the stack pointers. When
the shadow register is not valid, the POP instruction is executed in two cycles. The first cycle is used to
decrement the stack pointer. When the shadow register is valid, the POP instruction is executed in only one
cycle.
When an SP is written by the AAU register transfer (TFRA), its shadow register automatically becomes
invalid. When a PUSH/POP instruction is executed, the shadow register of the active SP becomes valid. As
a result, during consecutive POPs, even in the worst case, only the first POP requires an additional cycle.
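The PUSH/POP behaviour and the effect of the shadow register can be sketched as a small state machine. The SP step per PUSH or POP (ENTRY_BYTES) is not stated in this section and is an assumption here, as are all names; the model tracks a single active stack pointer and only counts POP cycles.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model. ENTRY_BYTES, the SP step per PUSH/POP, is an
   assumption: this section does not state it. */
#define ENTRY_BYTES 8u

typedef struct {
    uint32_t sp;        /* points to the next unoccupied location */
    uint32_t shadow;    /* sp - ENTRY_BYTES, meaningful when shadow_ok */
    bool shadow_ok;
} stack_ptr;

/* Writing SP with TFRA invalidates the shadow register. */
static void sp_write(stack_ptr *s, uint32_t value)
{
    s->sp = value;
    s->shadow_ok = false;
}

/* PUSH: store at SP, then post-increment. Returns the store address. */
static uint32_t sp_push(stack_ptr *s)
{
    uint32_t addr = s->sp;
    s->sp += ENTRY_BYTES;
    s->shadow = s->sp - ENTRY_BYTES;
    s->shadow_ok = true;
    return addr;
}

/* POP: pre-decrement, then load from SP. Writes the load address to
   *addr and returns the cycle count: 1 with a valid shadow, else 2. */
static unsigned sp_pop(stack_ptr *s, uint32_t *addr)
{
    unsigned cycles;

    if (s->shadow_ok) {
        s->sp = s->shadow;       /* decremented value already available */
        cycles = 1;
    } else {
        s->sp -= ENTRY_BYTES;    /* extra cycle to decrement */
        cycles = 2;
    }
    *addr = s->sp;
    s->shadow = s->sp - ENTRY_BYTES;
    s->shadow_ok = true;
    return cycles;
}
```

After an SP write, the first POP takes two cycles and every following POP takes one, which is the worst case described above.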
2.3.2.2.2 Initializing ESP
ESP should be initialized using the AAU register transfer (TFRA) instruction. This guarantees a valid ESP
value even if execution of this instruction is interrupted by an exception. The TFRA instruction is
considered an address arithmetic operation. The ESP is updated at the address generation pipeline stage,
avoiding pipeline conflicts.
2.3.2.3 Offset Registers (N0–N3)
The four 32-bit read/write offset registers N0–N3 can contain offset values used to increment or decrement
address registers in address register update calculations. These registers can also be used for 32-bit general
purpose storage. For example, the contents of an offset register can specify the offset into a table or the
base of the table for indexed addressing, or can be used to step through a table at a specified rate (for
example, five locations per step for waveform generation). Each address register can be used with each
offset register. For example, R0 can be used with N0, N1, N2, or N3 for offset address calculations. The
signed value in an offset register is pre-shifted to the left by 0, 1, 2, or 3 bits to align to the access width.
2.3.2.4 Base Address Registers (B0–B7)
The eight 32-bit read/write base address registers B0–B7 are used in modulo calculations. Each B register
is associated with an R register (B0 with R0, and so on). When activating the modulo addressing mode, the
B register contains the lower boundary value of the modulo buffer. The upper boundary of the modulo
buffer is calculated by B+M-1, where M is the modifier register associated with the R register by MCTL.
When not used for modulo addressing, these registers can be used as high bank address registers
(R8–R15). Both Rn and Bn-8 share the same physical register. For example, if R0 is not programmed for
modulo addressing, the base address register B0 can serve as an additional address register R8.
2.3.2.5 Modifier Registers (M0–M3)
The four 32-bit read/write modifier registers M0–M3 can contain the value of the modulus modifier. These
registers can also be used for general-purpose storage. When activating the modulo arithmetic, the contents
of Mj specify the modulus. Each low address register can be used with each modifier register as
programmed in the MCTL register.
2.3.2.6 Modifier Control Register (MCTL)
The MCTL register is a 32-bit read/write register. This control register is used to program the address
mode (AM) for each of the eight low address registers (R0–R7). The addressing mode of the high address
register file (R8–R15) cannot be programmed and functions in linear addressing mode only. The format of
MCTL is shown in Figure 2-14.
Bits 31–28    Bits 27–24    Bits 23–20    Bits 19–16
R7 AM[3:0]    R6 AM[3:0]    R5 AM[3:0]    R4 AM[3:0]
Bits 15–12    Bits 11–8     Bits 7–4      Bits 3–0
R3 AM[3:0]    R2 AM[3:0]    R1 AM[3:0]    R0 AM[3:0]
Figure 2-14. Modifier Control Register (MCTL) Format
The AM bits (AM3, AM2, AM1, AM0) associated with each address register (R0-R7) reflect the address
modifier mode of this address register as shown in Table 2-17. Each of the Rn registers can use M0, M1,
M2, or M3 as their associated modulo register either in modulo addressing mode, or in multiple
wrap-around modulo addressing mode. When activating the modulo addressing mode, the corresponding
B register is used to define the lower boundary value (B0 with R0, and so on). The linear or the
reverse-carry addressing modes can also be used, freeing the B register to be used as an additional linear
address register.
The high bank of the address register file (R8–R15) can only be used in linear addressing mode. Each Rn
(n = 8:15) is available only if the corresponding Bn-8 register is not used for modulo addressing.
MCTL is initialized to zero at reset, setting a default linear mode for all Rn registers. All other AM field
combinations are reserved and should not be used.
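Given the field layout in Figure 2-14, reading or changing the AM field of one address register is a 4-bit mask-and-shift. The helper names are hypothetical; only the value 0 (linear addressing, the reset default) comes from the text, and the other encodings should be taken from Table 2-17.

```c
#include <stdint.h>

/* Hypothetical helpers: the AM field of Rn (n = 0..7) occupies
   MCTL bits 4n+3 .. 4n. */
#define AM_LINEAR 0x0u   /* reset value of every AM field */

static unsigned mctl_get_am(uint32_t mctl, unsigned n)
{
    return (mctl >> (4u * n)) & 0xFu;
}

static uint32_t mctl_set_am(uint32_t mctl, unsigned n, unsigned am)
{
    uint32_t mask = UINT32_C(0xF) << (4u * n);
    return (mctl & ~mask) | (((uint32_t)am & 0xFu) << (4u * n));
}
```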
2.3.3 Addressing Modes
The SC140 core provides four types of addressing modes:
•Register direct
•Address register indirect
•PC relative
•Special
The addressing modes are related to where the operands are to be found and how the address calculations
are to be made. These modes are described in the following sections:
2.3.3.1 Register Direct Modes
The register direct addressing modes specify that the operand is in one or more of the DALU registers,
AGU registers, or control registers, and are classified as register references.
•Data or Control Register Direct — The operand is in one, two, or four DALU registers as specified
in a portion of the data bus movement field in the instruction. An example is: mac d4,d5,d6,
which uses data registers d4, d5, and d6 as sources for the multiply-accumulate operation. This
addressing mode is also used to specify a control register operand for special instructions.
•Address Register Direct — The operand is in one of the twenty-seven AGU registers (R0–R7,
R8–R15/B0–B7, N0–N3, M0–M3, MCTL, N/ESP) specified by a field in the instruction. An
example is addl1a r0,r1, which performs a 1-bit arithmetic left shift on the data in R0, and adds the
result to the data in R1.
2.3.3.2 Address Register Indirect Modes
The address register indirect modes specify that the address register is used to point to a memory location.
The term indirect is used because the register contents are not the operand itself, but rather the operand
address. These addressing modes specify that an operand is in a memory location and specify the effective
address of that operand. These references are classified as memory references. The term “index” refers to
an offset stored in a register. The term “displacement” refers to an offset from an immediate in the
instruction.
•No Update, (Rn) — The operand address is in the address register. The contents of the address
register are unchanged by executing the instruction. For R0-R7, the contents of the modifier control
register (MCTL) are ignored. An example is: bmclr.w #$004f,(r4). A word is read from the
memory location whose address is in r4, operated on, and written back to the same location. The address
in r4 is unchanged.
•Post-increment, (Rn)+ — The operand address is in the address register. After the operand address
is used, it is incremented by the access width (1, 2, 4, or 8 bytes) and stored in the same address
register. The access width is the number of bytes used by the active instruction on the memory data
bus. Incrementing the operand address by the access width places the next available byte address in
the register. The type of arithmetic used for updating R0-R7 is determined by programming the
MCTL register. An example is: move.f (r3)+,d2. The data in the location identified by the
value in r3 is moved to data register d2. Then the value in r3 is incremented by two.
•Post-decrement, (Rn)- — The operand address is in the address register. After the operand address
is used, it is decremented by the access width (1, 2, 4, or 8 bytes) and stored in the same address
register. The type of arithmetic used for updating R0-R7 is determined by programming the
MCTL register. An example is: move.l (r3)-,d2. In this case, the value in r3 is decremented
by four after the move has taken place.
•Post-increment by Offset Ni, (Rn) + Ni — The operand address is in the address register. After the
operand address is used, it is incremented or decremented by an amount determined by the signed
contents of the Ni register pre-shifted to the left by 0, 1, 2, or 3 bits according to the access width.
The result is stored in the same address register. The type of arithmetic used for updating R0-R7 is
determined by programming the MCTL register. The contents of the Ni register are unchanged. An
example is: move.w d3,(r2)+n3. The access width is two, so the increment is twice the value
in the n3 register.
•Indexed By Offset N0, (Rn + N0) — The operand address is the sum of the contents of the address
register and the signed contents of the N0 register, pre-shifted to the left by 0, 1, 2, or 3 bits according
to the access width. The type of arithmetic used for updating R0-R7 is determined by programming
the MCTL register. The contents of the Rn and N0 registers are unchanged. For example:
move.b d6,(r3+n0). The access width is one, so the contents of the n0 register are used
directly to modify the address before the move is done.
Note that only the N0 offset register can be used in this addressing mode.
•Indexed by Address Register Rm, (Rn + Rm) — The operand address is the sum of the contents
of the address register Rn and the contents of the address register Rm, pre-shifted to the left by 0, 1,
2, or 3 bits according to the access width. The type of arithmetic used for updating R0-R7 is
determined by programming the MCTL register. The contents of the Rn and Rm registers are
unchanged. An example is: move.l (r0+r2),d6. Here, the access width is four, so the value in
r2 is shifted left two bits before adding to the address in r0.
Note that only address registers (R0–R7) can be used as Rm.
•Short Displacement, (Rn + x) — The operand address is the sum of the contents of the address
register Rn and a short displacement x that occupies three bits in the instruction word. The
displacement (unsigned) is first shifted to the left by 0, 1, 2, or 3 bits according to the access width.
It is then zero-extended to 32 bits and added to Rn to obtain the operand address. Thus, the
displacement can range from [0] to [+7] bytes, words, long words, or two long words according to
the access width. The contents of the Rn register are unchanged. The type of arithmetic used for
updating R0-R7 is determined by programming the MCTL register. An example is: move.l d4,(r3+$1c). The access width is four, and the displacement encoded in the instruction is seven
(4 x 7 = 28 = $1c).
•Word Displacement, (Rn + xxxx) — The operand address is the sum of the contents of the address
register Rn and an immediate displacement. The displacement is a signed 15-bit word that requires
a second instruction word. It is sign-extended to 32 bits and then added to Rn to obtain the operand
address. Thus, the displacement can range from [-16,384] to [+16,383] bytes, [-8192] to [+8191]
words, [-4096] to [+4095] long words, or [-2048] to [+2047] two long words according to the access
width. The contents of the Rn register are unchanged. The type of arithmetic used for updating
R0-R7 is determined by programming the MCTL register.
•SP Short Displacement, (SP – xx) — The instruction word contains a 5-bit or 6-bit short unsigned
immediate index field. This field is first shifted to the left by 1 or 2 bits according to the access width,
then zero-extended to form a 32-bit offset and subtracted from the active stack pointer (NSP in
Normal mode, ESP in Exception mode) to obtain the operand address. Thus, the displacement can
range from [0] to [31/63] words or long words according to the access width. The contents of the
active SP register are unchanged. The type of arithmetic used is always linear. An example is:
move.w #$ffff,(sp-$3e). The encoded displacement is 31, the maximum value of five bits,
and the actual displacement is 62 ($3e), since the access width is two.
•SP Word Displacement, (SP + xxxx) — The operand address is the sum of the contents of the active
stack pointer (SP) and an immediate displacement. The displacement is a signed 15-bit word that
requires a second instruction word. It is sign-extended to 32 bits and added to the active stack pointer
(NSP in Normal mode, ESP in Exception mode) to obtain the operand address. Thus, the
displacement can range from [-16,384] to [+16,383] bytes, [-8192] to [+8191] words, [-4096] to
[+4095] long words, or [-2048] to [+2047] two long words according to the access width. The
contents of the active SP register are unchanged. The type of arithmetic used is always linear. An
example is: move.l (sp+$2000),d2.e. Here, the positive value $2000 is added to the active
stack pointer before the memory access.
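The effective-address rules above, restricted to linear arithmetic, can be sketched in C. Scaling an index by the access width in bytes (1, 2, 4, or 8) is equivalent to pre-shifting it left by 0, 1, 2, or 3 bits. The struct and function names are hypothetical, and modulo and reverse-carry updates selected through MCTL are not modelled.

```c
#include <stdint.h>

/* Hypothetical linear-mode model of the address register indirect
   modes; width is the access width in bytes (1, 2, 4, or 8). */
typedef struct {
    uint32_t ea;   /* effective address used for the access */
    uint32_t rn;   /* address register value after the instruction */
} agu_result;

static agu_result post_increment(uint32_t rn, uint32_t width)
{
    agu_result r = { rn, rn + width };            /* (Rn)+ */
    return r;
}

static agu_result post_decrement(uint32_t rn, uint32_t width)
{
    agu_result r = { rn, rn - width };            /* (Rn)- */
    return r;
}

/* (Rn)+Ni: signed Ni scaled by the access width; Ni is unchanged. */
static agu_result post_inc_offset(uint32_t rn, int32_t ni, uint32_t width)
{
    agu_result r = { rn, rn + (uint32_t)ni * width };
    return r;
}

/* (Rn+N0) or (Rn+Rm): the index is scaled; Rn is not updated. */
static agu_result indexed(uint32_t rn, int32_t index, uint32_t width)
{
    agu_result r = { rn + (uint32_t)index * width, rn };
    return r;
}

/* (Rn+x): unsigned 3-bit field scaled by the access width. */
static agu_result short_disp(uint32_t rn, uint32_t x, uint32_t width)
{
    agu_result r = { rn + (x & 7u) * width, rn };
    return r;
}

/* (Rn+xxxx): signed 15-bit byte displacement, sign-extended. */
static agu_result word_disp(uint32_t rn, uint32_t field15)
{
    uint32_t v = field15 & 0x7FFFu;
    int32_t d = (int32_t)(v ^ 0x4000u) - 0x4000;   /* sign-extend 15 bits */
    agu_result r = { rn + (uint32_t)d, rn };
    return r;
}

/* (SP-xx): unsigned 5- or 6-bit field, scaled, subtracted from SP. */
static uint32_t sp_short_disp(uint32_t sp, uint32_t field, uint32_t width)
{
    return sp - (field & 63u) * width;
}
```

The examples in the list map directly onto this model: move.f (r3)+ adds 2, move.l (r3)- subtracts 4, move.l d4,(r3+$1c) encodes 7 with width 4, and move.w #$ffff,(sp-$3e) encodes 31 with width 2.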
2.3.3.3 PC Relative Mode
The PC relative address mode is used to calculate the program destination of change-of-flow instructions
such as branches (BRA). In the PC relative addressing mode, the instruction encoding contains a signed
displacement operand. The operand address is obtained by left-shifting (multiplying by two) the
displacement and adding the result to the value of the program counter (PC). The operand is left-shifted
because the addresses of the program instructions are word-aligned, and memory addressing is in units of
bytes. The arithmetic used is always linear. For example, consider bra _label2. Assume that PC=$0010 and
that _label2 is at location $0020. The encoded displacement will be ($0020 - $0010)/2 = $0008.
The number of bits occupied by the displacement in the instruction differs with the different kinds of PC
relative instructions. In all cases, the displacement is first sign-extended to 32 bits, then multiplied by two,
and added to the PC to obtain the operand address.
In the one-word conditional branch instructions, the displacement occupies 8 bits of the instruction word,
so the resulting offset can range from [-256] to [+254] bytes ([-128] to [+127] words). In the one-word
unconditional branch instructions, the displacement occupies 10 bits of the instruction word and the offset
can range from [-1024] to [+1022] bytes. In the two-word branch instructions, the displacement occupies
20 bits and the offset can range from [-1,048,576] to [+1,048,574] bytes. In the DOSETUP instruction, the
displacement occupies 16 bits of the instruction, and the offset for the start address (SA) can range from
[-65,536] to [+65,534] bytes.
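Encoding and decoding a PC-relative displacement can be sketched as follows. The helper names and the range check are assumptions of this sketch; disp_bits is the field width given above (8, 10, 20, or 16).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoder for a branch at pc to target. Returns false when
   the byte offset is odd or does not fit in a disp_bits signed field. */
static bool pc_rel_encode(uint32_t pc, uint32_t target, unsigned disp_bits,
                          int32_t *disp)
{
    int64_t delta = (int64_t)target - (int64_t)pc;   /* offset in bytes */
    int64_t lo = -((int64_t)1 << (disp_bits - 1));
    int64_t hi = ((int64_t)1 << (disp_bits - 1)) - 1;
    int64_t d;

    if (delta % 2 != 0)
        return false;             /* instructions are word-aligned */
    d = delta / 2;                /* displacement in 16-bit words */
    if (d < lo || d > hi)
        return false;             /* does not fit in the field */
    *disp = (int32_t)d;
    return true;
}

/* Hypothetical decoder: sign-extended displacement times two, plus PC. */
static uint32_t pc_rel_target(uint32_t pc, int32_t disp)
{
    return pc + (uint32_t)disp * 2u;
}
```

With PC = $0010 and a target of $0020, pc_rel_encode produces a displacement of 8, matching the bra _label2 example; with an 8-bit field, byte offsets from -256 to +254 are accepted.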
2.3.3.4 Special Addressing Modes
The special addressing modes do not use an address register when specifying an effective address. They
either use an immediate value that is included in the instruction for the data value, such as the data value
address, or they use a register that is implicitly referenced by the instruction for the data value.
•Immediate Short Data — A 5-bit, 6-bit, or 7-bit operand is part of the instruction operation word.
The 5-bit zero-extended operand is used for DALU and AGU arithmetic instructions. The 6-bit
zero-extended operand is used for DALU instructions to move short immediate data to an LCn
register. The 7-bit sign-extended operand is used for immediate moves to a register. This reference
is classified as a program reference. An example is:
loop counter 2.
•Immediate Word Data — This addressing mode requires a one-word instruction extension. The
immediate data is a 16-bit operand. This reference is classified as a program reference. An example
is:
doen2 #$40. The value 64 is loaded to loop counter 2. The value exceeds the 6-bit limit for
immediate short data, so an extra word is needed for the encoding.
•Immediate Long Data — This addressing mode requires a two-word instruction extension. The
immediate data is a 32-bit operand. This reference is classified as a program reference. An example
is: move.l
#$f00d0d01,n0. The 32-bit unsigned value is moved to the general register n0.
•Absolute Word Address — This addressing mode requires a one-word instruction extension. The
operand address occupies 16 bits in the instruction operation words, and is zero-extended to form a
32-bit address. This reference is classified as a memory reference. An example is:
($8a20),d0
.
doen2 #$3f. The value $3f, 63, is loaded to
move.w
•Absolute Long Address — This addressing mode requires a two-word instruction extension. A
32-bit address is contained in the instruction words. This reference is classified as a memory
reference. An example is:
move.w ($34008a20),d0.
•Absolute Jump Address — The operand occupies 32 bits in the instruction operation words. It
requires a two-word instruction extension. This reference is classified as a program reference. An
example is:
jmp lbl4, where the instruction is encoded with the program memory address of lbl4.
•Implicit Reference — Some instructions make implicit reference to the PC, normal or exception
stack, loop registers (SA0, SA1, SA2, SA3, LC0, LC1, LC2, LC3), or status register (SR). These
registers are implied by the instruction, and their use is defined by the individual instruction
descriptions. An example is:
tfra osp,r2, which transfers the 32-bit value held in the other
(non-active) stack pointer to address register R2.
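The size rules for immediate data can be checked mechanically. The following Python sketch is an illustrative model only, not part of any SC140 tool, and its helper names are hypothetical. It assumes, following the doen2 examples above, that a loop count uses the 6-bit zero-extended short form when it fits and otherwise the 16-bit word form with one extension word.

```python
def fits_unsigned(value: int, bits: int) -> bool:
    """True if value fits a zero-extended field of `bits` bits."""
    return 0 <= value < (1 << bits)


def fits_signed(value: int, bits: int) -> bool:
    """True if value fits a sign-extended (two's complement) field of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def doen_extension_words(count: int) -> int:
    """Extension words needed to encode DOENn #count (assumed rules, see lead-in)."""
    if fits_unsigned(count, 6):
        return 0  # immediate short data: fits in the operation word
    if fits_unsigned(count, 16):
        return 1  # immediate word data: one-word extension
    raise ValueError("count does not fit 16-bit immediate word data")


print(doen_extension_words(0x3F))  # doen2 #$3f -> 0
print(doen_extension_words(0x40))  # doen2 #$40 -> 1
print(fits_signed(-64, 7), fits_signed(64, 7))  # 7-bit sign-extended range is -64..63
```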
SC140 DSP Core Reference Manual2-41
2.3.3.5 Memory Access Width
The SC140 core supports variable width access to data memory. With every memory access, the core sends
one of four signals to the memory interface to designate whether the access width is 8 bits, 16 bits, 32 bits,
or 64 bits wide. The access width is determined by the type of MOVE instruction being used. For example,
MOVE.B is used for byte access. MOVE.W is used for word access. For long-word access, MOVE.L,
MOVE.2F, and MOVE.2W are used. And, for two long-word access, MOVE.2L, MOVE.4F, and
MOVE.4W are used.
The memory addresses are always in units of bytes. For example, the addresses used by two-word MOVE
operations (32-bit accesses) to or from memory are multiples of four, so that the data is aligned with the
byte addressing.
Address calculations and register update calculations are performed according to the memory access width
as shown in Table 2-18.
Table 2-18. Access Width Support for Address and Register Update Calculations
                                                                            Memory Access Width
Addressing Mode                   Calculation                               Byte   Word   Long   Two Long
Post-increment (Rn)+ and          Rn register post-increment or             1      2      4      8
Post-decrement (Rn)-              post-decrement by
Post-increment by Offset (Rn)+Ni  Rn register post-increment by             Ni*1   Ni*2   Ni*4   Ni*8
Indexed by Offset N0 (Rn+N0)      Actual address offset                     N0     2*N0   4*N0   8*N0
Indexed by Address Register Rm    Actual address offset                     Rm     2*Rm   4*Rm   8*Rm
(Rn+Rm)
Short Displacement (Rn+x)         Actual address displacement               x      x      x      x
Word Displacement (Rn+xxxx)       Actual address displacement               xxxx   xxxx   xxxx   xxxx
SP update in Push/Pop             SP post-increment or pre-decrement by     8      8      8      8
SP Short Displacement (SP-xx)     Actual address displacement               NA     xx     xx     NA
SP Word Displacement (SP+xxxx)    Actual address displacement               xxxx   xxxx   xxxx   xxxx
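Table 2-18 amounts to a scale factor of 1, 2, 4, or 8 applied to post-increment amounts and index registers, while displacements are used unscaled. The Python sketch below models this for linear addressing only (modifier modes are ignored); the function names are illustrative and not part of any SC140 toolchain.

```python
# Access width in bytes for each memory access width column of Table 2-18.
WIDTH_BYTES = {"byte": 1, "word": 2, "long": 4, "two_long": 8}
MASK32 = 0xFFFFFFFF  # addresses are 32 bits wide


def post_increment(rn: int, width: str) -> int:
    """(Rn)+ : Rn advances by the access width in bytes."""
    return (rn + WIDTH_BYTES[width]) & MASK32


def post_increment_by_offset(rn: int, ni: int, width: str) -> int:
    """(Rn)+Ni : Rn advances by Ni scaled by the access width."""
    return (rn + ni * WIDTH_BYTES[width]) & MASK32


def indexed(rn: int, index: int, width: str) -> int:
    """(Rn+N0) or (Rn+Rm) : the index register is scaled by the access width."""
    return (rn + index * WIDTH_BYTES[width]) & MASK32


def displaced(rn: int, displacement: int) -> int:
    """(Rn+x) or (Rn+xxxx) : the displacement is a byte offset and is never scaled."""
    return (rn + displacement) & MASK32


print(hex(post_increment_by_offset(0x100, 3, "word")))  # 0x106
print(hex(indexed(0x100, 5, "long")))                    # 0x114
print(hex(displaced(0x100, 6)))                          # 0x106
```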
2.3.3.6 Memory Access Misalignment
Each access to the memory generated by the core should be aligned according to the access type. If the
alignment rule is violated, erroneous data may be fetched from the memory. In addition, an exception may
be generated to identify that an unaligned access occurred. For more information, see Section 5.8,
“Exception Processing,” on page 5-46.
Table 2-19 summarizes the memory address alignment rule for each type of memory access.
Table 2-19. Memory Address Alignment
Access Type              Aligned Address
Byte access              Any address
Word access              Multiple of 2
Long-word access         Multiple of 4
Two long-word access     Multiple of 8
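Because every alignment in Table 2-19 is a power of 2, the rule reduces to checking that the low address bits are zero. This Python sketch is an illustrative check only, with hypothetical names.

```python
# Required alignment in bytes for each access type (Table 2-19).
ALIGNMENT = {"byte": 1, "word": 2, "long": 4, "two_long": 8}


def is_aligned(address: int, access: str) -> bool:
    """True if address meets the alignment rule. The size is a power of 2,
    so the low log2(size) address bits must all be zero."""
    size = ALIGNMENT[access]
    return (address & (size - 1)) == 0


for addr in (0x1000, 0x1002, 0x1004):
    print(hex(addr), is_aligned(addr, "word"), is_aligned(addr, "long"), is_aligned(addr, "two_long"))
```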
2.3.3.7 Addressing Modes Summary
Table 2-20 provides a summary of the addressing modes described in the previous sections. The Operand
Reference columns are abbreviated as follows:
•S = Software Stack Reference in data memory (uses NSP or ESP according to mode)
•C = Program Control Unit Register Reference
•D = DALU Register Reference
•A = AGU Register Reference
•P = Program Memory Reference
•X = Data Memory Reference
Table 2-20. Addressing Modes Summary
                                          R0-R7 Uses  Operand Reference
Addressing Mode                           MCTL        (S C D A P X)      Assembler Syntax
Register Direct
  Data or Control Register                —           √√                 Dn; Dn Dm; Dn Dm Di Dj; MCTL;
                                                                         SR, EMR, VBA; LC0, LC1, LC2, LC3;
                                                                         SA0, SA1, SA2, SA3
  Address Register (Rn)                   —           √                  Rn
  Address Modifier Register (Mj)          —           √                  Mj
  Base Address Register (Bn)              —           √                  Bn
  Address Offset Register (Ni)            —           √                  Ni
  Stack Pointer                           —           √                  SP
Address Register Indirect
  No Update, (Rn)                         No          √                  (Rn)
  Post-increment, (Rn)+                   Yes         √                  (Rn)+
  Post-decrement, (Rn)–                   Yes         √                  (Rn)–
  Post-increment by Offset Ni, (Rn)+Ni    Yes         √                  (Rn) + Ni
  Indexed by Offset N0, (Rn+N0)           Yes         √                  (Rn + N0)
  Indexed by Address Register Rm,         Yes         √                  (Rn + Rm)
  (Rn+Rm)
  Short Displacement, (Rn+x)              Yes         √                  (Rn + x)
  Word Displacement, (Rn+xxxx)            Yes         √                  (Rn + xxxx)
  SP Short Displacement, (SP-xx)          —           √√                 (SP - xx)
  SP Word Displacement, (SP+xxxx)         —           √√                 (SP + xxxx)
PC Relative
  PC Relative with Displacement           —           √                  #xx (8 bits), #xxx (10 bits),
                                                                         #xxxx (16 bits), #xxxxx (20 bits)
Special
  Immediate Short Data                    —           √                  #xx (5, 6, or 7 bits)
  Immediate Word Data                     —           √                  #xxxx (16 bits)
  Immediate Long Data                     —           √                  #xxxxxxxx (32 bits)
  Absolute Word Address                   —           √                  xxxx (16 bits)
  Absolute Long Address                   —           √                  xxxxxxxx (32 bits)
  Absolute Jump Address                   —           √                  xxxxxxxx (32 bits)
  Implicit Reference                      —           √√√
Note: A “—” in the “R0-R7 Uses MCTL” column means that the column is not applicable to that
addressing mode.
2.3.4 Address Modifier Modes
The AAU supports linear, reverse-carry, modulo, and multiple wrap-around modulo arithmetic types for
address register indirect modes operating on R0-R7. These arithmetic types allow the easy creation of data
structures in memory for First-In/First-Out (FIFO) queues, delay lines, circular buffers, stacks, and
reverse-carry Fast Fourier Transform (FFT) buffers.
Data is manipulated by updating address registers (Rn) used as pointers rather than moving large blocks of
data. The contents of the modifier control register MCTL define the type of arithmetic to be performed for
address calculations. For modulo arithmetic, the address modifier register Mj specifies the modulus. Each
address register in the lower bank (R0–R7) can be used with any of the modifier registers (M0–M3), as
programmed in the MCTL register.
2.3.4.1 Linear Addressing Mode
Linear addressing is useful for general-purpose addressing such as stacks. In linear addressing mode, the
address is calculated using standard binary arithmetic. The entire memory space is addressable. Linear
addressing mode is selected by setting the AM3–0 bits to 0000 in the MCTL register. This is the default
state.
2.3.4.2 Reverse-carry Addressing Mode
Reverse-carry addressing is useful for 2^k-point FFT addressing. This mode is selected for R0-R7 by setting
the AM3-0 bits to 0001 in the MCTL register. Address modification is performed in hardware by
propagating the carry from each pair of added bits in the reverse direction (from the MSB end toward the
LSB end). For the +Ni addressing mode, reverse-carry is equivalent to:
•Bit-reversing the contents of Rn (redefining the MSB as the LSB, the next MSB as bit 1, and so on)
•Shifting the offset value in Ni left by 0, 1, 2, or 3 according to the access width
•Bit-reversing the shifted Ni
•Adding normally
•Bit-reversing the result
This address modification is useful for addressing the twiddle factors in 2^k-point FFT addressing as well
as for unscrambling 2^k-point FFT data. The range of values for Ni is 0 to 2^32 - 1, which allows
reverse-carry addressing for FFTs up to 4,294,967,296 (2^32) points.
Note: To achieve correct reverse-carry accessing for access widths of 2, 4, or 8, the last 1, 2, or 3 least
significant bits (respectively) of the address calculation result are forced to zero.
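The five-step equivalence for reverse-carry addressing can be modeled directly. The Python sketch below is illustrative only (the function names are hypothetical): it treats Rn and Ni as 32-bit values, performs the bit-reversed addition modulo 2^32, and applies the low-bit forcing from the note. Stepping by Ni = 4 with byte accesses visits an 8-point sequence in bit-reversed order.

```python
MASK32 = 0xFFFFFFFF


def bit_reverse32(value: int) -> int:
    """Reverse the bit order of a 32-bit value (bit 0 <-> bit 31, bit 1 <-> bit 30, ...)."""
    return int(f"{value & MASK32:032b}"[::-1], 2)


def reverse_carry_add(rn: int, ni: int, width_log2: int) -> int:
    """Model of the (Rn)+Ni update in reverse-carry mode.

    width_log2 is 0, 1, 2, or 3 for byte, word, long, or two-long accesses.
    """
    shifted = (ni << width_log2) & MASK32                          # scale Ni by access width
    total = (bit_reverse32(rn) + bit_reverse32(shifted)) & MASK32  # normal 32-bit add
    result = bit_reverse32(total)                                  # reverse the sum back
    return result & ~((1 << width_log2) - 1) & MASK32              # force low bits to zero


addr, order = 0, []
for _ in range(8):
    order.append(addr)
    addr = reverse_carry_add(addr, 4, 0)
print(order)  # [0, 4, 2, 6, 1, 5, 3, 7]
```

With word accesses (width_log2 = 1) and the same Ni, the sequence becomes 0, 8, 4, 12, 2, 10, 6, 14; the forced LSB is what returns the pointer from 14 to 0 rather than to 1.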
2.3.4.3 Modulo Addressing Mode
Modulo address modification is useful for creating circular buffers for FIFO queues, delay lines, and
sample buffers up to 2^31 bytes long.
Modulo addressing is selected by writing the AM3-0 bits of the MCTL register (as shown in
Table 2-10) and writing the desired modulus to the corresponding Mj register. Address modification
is performed in modulo M, where M ranges from 1 to +2^31 - 1. Modulo M arithmetic causes the address
register values to remain within an address range of size M, thus defining a buffer with a lower and an
upper address boundary.
Each base address register (Bn register) is associated with an Rn register (B0 with R0, and so on). Each
register Rn has one Mj register assigned to it by encoding in the MCTL. The lower boundary value of the
buffer resides in the Bn register, and the upper boundary is calculated as Bn+Mj-1. Mj must be smaller
than 2^31 - 1 (Mj < 2^31 - 1).
The modulo addressing definition, using a base register (Bn) and a modulo register (Mj), enables the
programmer to locate the modulo buffer at any address. The buffer start address is only required to be
aligned to the access width.
The address pointer Rn is not required to start at the lower address boundary, nor to end on the upper
address boundary. Rn can initially point anywhere (aligned to its access width) within the defined modulo
address range, Bn ≤ Rn < Bn+Mj. Assuming the (Rn)+ indirect addressing mode, if the address register
pointer increments past the upper boundary of the buffer (base address + Mj-1), it wraps around through
the base address (lower boundary). Alternatively, assuming the (Rn)- indirect addressing mode, if the
address decrements past the lower boundary (base address), it wraps around through the base address +
Mj-1 (upper boundary).
The following constraints apply:
1. For proper modulo addressing, if an offset Ni is used in the address calculation, the 32-bit
absolute effective value |Ni| must be less than or equal to Mj, where “effective” means the
programmed Ni is multiplied by the access width. For example, move.w (r0)+n0,d0 translates
to the restriction 2*n0 ≤ Mj, and move.l (r0)+,d0 translates to 4 ≤ Mj. If effective Ni > Mj, the
result of the address calculation is undefined. Multiple wrap-around modulo addressing
supports the situation of effective Ni greater than Mj.
2. Mj must be aligned to the access width used. For example, if the buffer is used with a
MOVE.2L instruction, Mj must be aligned to 8 (be a multiple of 8). If the modulus is less
than the access width, the data accessed as well as the address calculations are undefined.
3. When Bn is used as a base address register, using R(n+8) as a pointer is illegal, since Bn and
R(n+8) are the same physical register (for example, B0 and R8).
Modulo addressing is illustrated in Figure 2-15. Addresses are kept within the 12 byte addresses $20
through $2b. For the instruction move.w (r0+$000e),d0, with a base address of $20 (32), a modulus of
$c (12), and r0 = $24 (36), the access is made from address $26 (38). The sum 36 + 14 = 50 lies beyond
the upper boundary (32 + 12 - 1 = 43), so it wraps once: 50 - 12 = 38.
Figure 2-15. Modulo Addressing Example (base B = $0020, modulus M = 12, upper boundary B + M - 1 = $002b; r0 = 36, accessed address = 38)
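The following Python sketch models normal modulo addressing under the single-wrap behavior described above. It is illustrative only, with hypothetical names, and it reports a case that would need more than one wrap (undefined in this mode) as an error instead of guessing a result. The first call reproduces the Figure 2-15 example.

```python
def modulo_address(rn: int, offset: int, base: int, modulus: int) -> int:
    """Normal modulo addressing: buffer Bn..Bn+M-1, at most one wrap-around."""
    if not base <= rn < base + modulus:
        raise ValueError("Rn must satisfy Bn <= Rn < Bn+M")
    addr = rn + offset
    if addr > base + modulus - 1:   # passed the upper boundary: wrap toward the base
        addr -= modulus
    elif addr < base:               # passed the lower boundary: wrap toward the top
        addr += modulus
    if not base <= addr < base + modulus:
        raise ValueError("more than one wrap needed; undefined in normal modulo mode")
    return addr


print(hex(modulo_address(0x24, 0x0E, 0x20, 0x0C)))  # move.w (r0+$000e),d0 -> 0x26
print(hex(modulo_address(0x2A, 2, 0x20, 0x0C)))     # (r0)+ past $2b -> 0x20
print(hex(modulo_address(0x20, -2, 0x20, 0x0C)))    # (r0)- below $20 -> 0x2a
```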
Table 2-21 describes the modulo register values and the corresponding address calculation.
Table 2-21. Modulo Register Values for Modulo Addressing Mode
Multiple wrap-around addressing is useful for decimation, interpolation, and waveform generation. The
multiple wrap-around capability can be used for argument reduction. In multiple wrap-around modulo
addressing mode, the modulus M is a power of 2 in the range of 2^1 to 2^31. The value M-1 is stored in the
modifier register (Mj). The B registers B0 to B7 are not used for multiple wrap-around modulo addressing;
therefore, their corresponding R8–R15 registers can be used for linear addressing.
The lower and upper boundaries are derived from the contents of Mj. The lower boundary (base address)
value has zeros in the k LSBs, where M = 2^k, and therefore must be a multiple of M. The Rn register
involved in the memory access is used to set the MSBs of the base address. The base address is set so that
the initial value in the Rn register is within the lower and upper boundaries. The upper boundary is the
lower boundary plus the modulo size minus one (base address + M-1).
The size of the modulo buffer must be aligned to (be a multiple of) the access width. If the modulus is less
than the access width, the data accessed as well as the address calculations are undefined.
If an offset Ni is used in the address calculations, it is not required to be less than or equal to M for proper
modulo addressing. The multiple wrap-around modulo addressing mode supports unlimited boundary
wraps.
When using the (Rn)+ and (Rn)- addressing modes with a modulus 2^k ≥ 8, there is no functional difference
between the multiple wrap-around and normal modulo modes, since the address can only be wrapped
around once.
As an example, consider the instruction move.w (r0 + $0042),d0. If MCTL is set to $000c and m0 is set
to $000f, then the modulus is M = 16. If r0 is initially $24 (36), the lower boundary is $20 (32) and the
upper boundary is $2f (47). The memory access is made from address $26 (38): 36 + 66 = 102, the
distance from the lower boundary is 102 - 32 = 70, and 70 modulo 16 = 6, so the address is 32 + 6 = 38.
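Because M is a power of 2 in this mode, the base address is Rn with its k LSBs cleared, and the wrapped address keeps only the k LSBs of the sum. The Python sketch below is illustrative, with hypothetical names; it takes the Mj value (M-1) as input and reproduces the move.w (r0 + $0042),d0 example.

```python
def multi_wrap_address(rn: int, offset: int, mj: int) -> int:
    """Multiple wrap-around modulo addressing; mj holds M-1 with M = 2**k."""
    modulus = mj + 1
    if modulus & (modulus - 1):
        raise ValueError("M = Mj + 1 must be a power of 2")
    base = rn & ~mj                      # clear the k LSBs: lower boundary
    return base | ((rn + offset) & mj)   # keep only the k LSBs of the sum


print(hex(multi_wrap_address(0x24, 0x42, 0x0F)))  # 36 + 66 wraps to 38 -> 0x26
```

Masking with Mj handles any number of wraps in either direction, which is why no constraint on the size of the offset is needed in this mode.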
Table 2-22 describes the modulo register Mj values and the corresponding multiple wrap-around address
calculation.
Table 2-22. Modulo Register Values for Wrap-Around Modulo Addressing Mode
2.3.5 Arithmetic Instructions on Address Registers
The SC140 core provides arithmetic instructions on the address registers (R0–R15), offset registers
(N0–N3), the stack pointer (SP), and the program counter (PC).
Address modification modes can affect the arithmetic results stored in R0-R7 using instructions ADDA,
SUBA, ADDL1A, or ADDL2A. In addition, an address calculation that increments or decrements address
register R0-R7 is affected by the modifier mode. When updating R0-R7 in modulo addressing mode, the
modulo registers hold the modulus.
Table 2-23 lists the arithmetic instructions that are executed in the AGU unit. A more detailed description
of the operations is provided in Appendix A, “SC140 DSP Core Instruction Set.”
Table 2-23. AGU Arithmetic Instructions
Instruction   Description
ADDA          AGU Add (affected by the modifier mode)
ADDL2A        AGU Add with 2-bit left shift of source operand (affected by the modifier mode)
ADDL1A        AGU Add with 1-bit left shift of source operand (affected by the modifier mode)
ASL2A         AGU Arithmetic shift left by 2 bits (32-bit)
ASLA          AGU Arithmetic shift left (32-bit)
ASRA          AGU Arithmetic shift right (32-bit)
CMPEQA        AGU Compare for equal
CMPGTA        AGU Compare for greater than
CMPHIA        AGU Compare for higher (unsigned)
DECA          AGU Decrement register (affected by the modifier mode)
DECEQA        AGU Decrement and set T if result is zero
DECGEA        AGU Decrement and set T if result is equal to or greater than zero
INCA          AGU Increment register (affected by the modifier mode)
LSRA          AGU Logical shift right (32-bit)
SUBA          AGU Subtract (affected by the modifier mode)
SXTA.B        AGU Sign-extend byte
SXTA.W        AGU Sign-extend word
TFRA          AGU Register transfer
TSTEQA        AGU Test for equal to zero
TSTEQA.W      AGU Test for equal to zero on lower 16 bits
TSTGEA        AGU Test for greater than or equal to zero
TSTGTA        AGU Test for greater than zero
ZXTA.B        AGU Zero-extend byte
ZXTA.W        AGU Zero-extend word
2.3.6 Bit Mask Instructions
The SC140 core provides bit mask instructions on all address registers (R0–R15), all DALU registers
(D0–D15), all control registers (EMR, VBA, SR, MCTL), and all memory locations.
Bit mask instructions provide an easy way of setting, clearing, inverting, or testing a selected but not
necessarily adjacent group of bits in a register or memory location.
All bit mask instructions work on 16-bit data. This data can be the contents of a memory location or a
portion (high or low) of a register.
Only a single bit mask instruction is allowed in one execution set since only one execution unit exists for
these instructions. A subgroup of the bit mask instructions (BMTSET) supports hardware semaphores. For
more information, see Section 2.3.6.1, “Bit Mask Test and Set (Semaphore Support) Instruction.”
Table 2-24 lists the bit mask instructions that are executed in the BMU.
Table 2-24. AGU Bit Mask Instructions (BMU)
Instruction   Description
AND.W         Logical AND on a 16-bit operand
BMCHG         Bit mask change
              Inverts every bit in the destination (register or memory) that has the value 1 in the mask.
BMCLR         Bit mask clear
              Clears every bit in the destination (register or memory) that has the value 1 in the mask.
BMSET         Bit mask set
              Sets every bit position in the destination (register or memory) that has the value 1 in the mask.
BMTSET        Bit mask test (if set) and set
              Sets the T bit if every bit that has the value 1 in the mask is 1 in the destination (register or
              memory). Sets (writes) every bit in the destination (register or memory) that has the value 1 in
              the mask, and sets the T bit if the set (write) failed. See Section 2.3.6.1, “Bit Mask Test and Set
              (Semaphore Support) Instruction.”
BMTSTC        Bit mask test if clear
              Sets the T bit if every bit position that has the value 1 in the mask is 0 in the operand.
BMTSTS        Bit mask test if set
              Sets the T bit if every bit position that has the value 1 in the mask is 1 in the operand.
EOR.W         Logical exclusive OR on a 16-bit operand
NOT.W         Binary inversion of a 16-bit operand
OR.W          Logical OR on a 16-bit operand
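The mask semantics in Table 2-24 map directly onto bitwise operations on a 16-bit value. The Python sketch below is an illustrative model, not an SC140 API; each function returns the new destination value or, for the test instructions, the resulting T bit.

```python
MASK16 = 0xFFFF  # all bit mask instructions operate on 16-bit data


def bmset(dest: int, mask: int) -> int:
    """BMSET: set every destination bit that is 1 in the mask."""
    return (dest | mask) & MASK16


def bmclr(dest: int, mask: int) -> int:
    """BMCLR: clear every destination bit that is 1 in the mask."""
    return dest & ~mask & MASK16


def bmchg(dest: int, mask: int) -> int:
    """BMCHG: invert every destination bit that is 1 in the mask."""
    return (dest ^ mask) & MASK16


def bmtsts(dest: int, mask: int) -> bool:
    """BMTSTS: T is set if every masked bit is 1."""
    return (dest & mask) == mask


def bmtstc(dest: int, mask: int) -> bool:
    """BMTSTC: T is set if every masked bit is 0."""
    return (dest & mask) == 0


value = 0x00F0
print(hex(bmset(value, 0x0101)), hex(bmclr(value, 0x0030)), hex(bmchg(value, 0x0180)))  # 0x1f1 0xc0 0x170
print(bmtsts(value, 0x0030), bmtstc(value, 0x0300))  # True True
```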
2.3.6.1 Bit Mask Test and Set (Semaphore Support) Instruction
The bit mask test and set instruction (BMTSET) provides support for hardware semaphores. A semaphore
is a signal which can be set to indicate whether a program resource can be accessed or not. The destination
of this instruction can be a register or a memory location in either internal or external memory. If the
semaphore indicates that the resource is available, the T bit has the value 0. If the semaphore indicates that
the resource is not available (T = 1), a jump can be made to skip the resource code.
This instruction performs the following tasks:
1. Reads the destination (register or memory), tests the data, and sets the T bit if every bit that has
the value 1 in the mask is 1 in the destination.
2. Writes back to the destination a word with ones for the masked bits and the original
destination bits for the unmasked bits.
3. Sets the T bit if the set (write) failed.
Normally, BMTSET consists of three indivisible operations: read, update the T bit, and
write. A set (write) failed condition occurs if the destination could not be written indivisibly
after the read operation of that BMTSET instruction. The memory subsystem
signals a write failure to the core if a memory access initiated by another bus master
intervenes between the read and the write accesses of the BMTSET operation. As a
result of this non-exclusive write indication, the T bit is set, signaling that the resource may
not be available, thereby avoiding a hazard condition.
2.3.6.1.1 Example of Normal Usage of the Semaphoring Mechanism
The following sequence accesses a resource controlled by a semaphore.
label : BMTSET.W #mask,(R0)
JT label
Normally, the mask enables only one bit. In this case, the memory destination pointed to by (R0) is read,
and the enabled bit is tested. The enabled bit is then set, and the memory destination is written back.
The T bit is set if the enabled bit was originally 1 (meaning that the semaphore was already occupied) or
if the write-back failed. A T bit value of 1 (true) tells the conditional jump that the attempt to obtain the
resource has failed and that the jump should be taken. The T bit is cleared if the enabled bit was originally
0, meaning that the semaphore was not allocated. In that case the resource was available, and the
instruction succeeded in setting the semaphore exclusively through a successful write-back.
When the destination is a register, the write is always successful.
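The acquire loop above can be modeled in Python to show how the T bit combines the test result with a failed write-back. This is a sketch under stated assumptions: memory is a dictionary, write_failures is a hypothetical list standing in for the memory subsystem's write-failure indication on successive attempts, and a failed write is assumed to leave the destination unchanged.

```python
from typing import Dict, Iterable, Optional, Tuple


def bmtset(dest: int, mask: int, write_failed: bool = False) -> Tuple[int, bool]:
    """BMTSET model: returns (value to write back, T bit)."""
    t_bit = (dest & mask) == mask        # step 1: T = 1 if all masked bits were already 1
    new_value = (dest | mask) & 0xFFFF   # step 2: ones for masked bits, others unchanged
    if write_failed:                     # step 3: a failed (non-exclusive) write also sets T
        t_bit = True
    return new_value, t_bit


def acquire(memory: Dict[int, int], addr: int, mask: int,
            write_failures: Iterable[bool] = (), max_attempts: int = 10) -> Optional[int]:
    """Model of: label: BMTSET.W #mask,(R0) / JT label. Returns the winning attempt, or None."""
    failures = iter(write_failures)
    for attempt in range(1, max_attempts + 1):
        failed = next(failures, False)
        new_value, t_bit = bmtset(memory[addr], mask, failed)
        if not failed:
            memory[addr] = new_value     # assumption: a failed write leaves memory unchanged
        if not t_bit:                    # T = 0: JT falls through, semaphore obtained
            return attempt
    return None                          # still busy: the real loop would keep spinning


mem = {0x100: 0x0000}
print(acquire(mem, 0x100, 0x0001), hex(mem[0x100]))  # 1 0x1
```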
2.3.6.2 Semaphore Hardware Implementation
During the address phase of the read and write accesses associated with the BMTSET instruction, an
output of the core is asserted. This assertion indicates that the read and the following write are part of a
read-modify-write sequence.
During the data phase of the write access, a core input provides the core with the result of the access
(de-asserted = write failed).
2.3.7 Move Instructions
The SC140 instruction set supports various types of move instructions which differ in the following
properties:
•Access width — Byte (8 bits), word (16 bits), long-word (32 bits), and two long words (64 bits)
•Data type — Signed integer, unsigned integer, fractional (with or without limiting)
•Multi-register moves — Some move operations split data between two or four registers
•Addressing mode — For example, absolute, relative to an address pointer (with various offset and
post-update options), and relative to the stack pointer
The move instructions perform data movement over the XDBA and XDBB buses (for data moves). Move
instructions do not affect the status register, except for the sticky scaling bit, which can be updated
when a DALU register is read.
Table 2-25 lists the move instructions. The suffix just before the period in the MOVE nomenclature
indicates the following:
•None = Signed
•U = Unsigned
•S = Scaling and limiting (saturation) enabled
The suffix just after the period in the MOVE nomenclature indicates the following:
•B = Byte
•W = Integer word (16 bits)
•L = Long word (32 bits)
•F = Fractional word (16 bits)
A 2 or a 4 may precede the last suffix (for example, MOVE.2F or MOVE.4W) to indicate a move of two or
four data items.
Table 2-25. AGU Move Instructions
Instruction   Description
MOVE.2F       Move two fractional words from memory to a register pair
MOVE.2L       Move two longs to/from a register pair
MOVE.2W       Move two integer words to/from memory and a register pair
MOVE.4F       Move four fractional words from memory to a register quad
MOVE.4W       Move four integer words to/from memory and a register quad
MOVE.B        Move byte to/from memory
MOVE.F        Move fractional word to/from memory
MOVE.L        Move long
MOVE.W        Move integer word to/from memory, or immediate to register or memory
MOVEc         Conditional move between address registers
MOVES.2F      Move two fractional words to memory with scaling and limiting enabled
MOVES.4F      Move four fractional words to memory with scaling and limiting enabled
MOVES.F       Move fractional word to memory with scaling and limiting enabled
MOVES.L       Move long to memory with scaling and limiting enabled
MOVEU.B       Move unsigned byte from memory
MOVEU.L       Move unsigned long from immediate
MOVEU.W       Move unsigned integer word from memory or from immediate
VSL.2F        Viterbi shift left (specialized move to support the Viterbi kernel)
VSL.2W        Viterbi shift left (specialized move to support the Viterbi kernel)
VSL.4F        Viterbi shift left (specialized move to support the Viterbi kernel)
VSL.4W        Viterbi shift left (specialized move to support the Viterbi kernel)
Integer moves from memory (byte, word, long, two long) are right-aligned in the destination register, and
by default are sign-extended to the left. Unsigned moves are marked with “U” (for example, MOVEU.B),
and zero extended in the destination register. A schematic representation of integer moves from memory
into a 40-bit register is shown in Figure 2-16. Moves from registers to memory use the appropriate portion
from the source register. Moves to registers of less than 40 bits behave the same as in Figure 2-16 up to
their bit length.
Figure 2-16. Integer Move Instructions (MOVE.B, MOVEU.B, MOVE.W, MOVEU.W, MOVE.L, MOVEU.L, MOVE.2W, MOVE.2L, and MOVE.4W into 40-bit registers: data is right-aligned and sign-extended, or zero-extended for the unsigned MOVEU forms)
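The right alignment and extension shown in Figure 2-16 can be expressed as a small Python model. It is an illustrative sketch with hypothetical names, covering a single-register load into a 40-bit data register; for multi-register moves, each register is assumed to receive its element in the same way.

```python
REG_MASK40 = (1 << 40) - 1  # 40-bit data register


def load_integer(raw: int, bits: int, signed: bool = True) -> int:
    """Right-align `bits` bits of memory data in a 40-bit register.

    MOVE.B/.W/.L sign-extend (signed=True); MOVEU.B/.W/.L zero-extend (signed=False).
    """
    raw &= (1 << bits) - 1
    if signed and raw & (1 << (bits - 1)):
        raw -= 1 << bits             # negative two's complement value
    return raw & REG_MASK40          # bit image of the 40-bit register


print(hex(load_integer(0x80, 8)))                # MOVE.B  -> 0xffffffff80
print(hex(load_integer(0x80, 8, signed=False)))  # MOVEU.B -> 0x80
print(hex(load_integer(0x80000000, 32)))         # MOVE.L  -> 0xff80000000
```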
Fractional moves are supported only to DALU registers. Moves from memory are put in the high portion
of the data register, sign-extended to the extension, and zero-filled in the low portion. MOVE.L and
MOVE.2L may also be considered fractional moves since alignment in the destination register is the same
for integer long moves and fractional long moves. A schematic representation of fractional moves from
memory to 40-bit data registers is shown in Figure 2-17.
Figure 2-17. Fractional Move Instructions (MOVE.F, MOVE.2F, and MOVE.4F into 40-bit registers: each 16-bit word occupies bits 31-16, bits 39-32 hold the sign extension, and bits 15-0 are zero-filled)
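For comparison with the integer case, the fractional alignment described above can be sketched in Python (illustrative, hypothetical names). The example words are read as 16-bit two's complement fractions, so $4000 is 0.5 and $C000 is -0.5.

```python
def load_fractional(word: int) -> int:
    """MOVE.F model: 40-bit register image for a 16-bit fractional word from memory."""
    word &= 0xFFFF
    value = word << 16               # word goes to bits 31-16; bits 15-0 are zero-filled
    if word & 0x8000:
        value |= 0xFF << 32          # sign-extend into the extension bits 39-32
    return value


print(hex(load_fractional(0x4000)))  # 0.5  -> 0x40000000
print(hex(load_fractional(0xC000)))  # -0.5 -> 0xffc0000000
```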
The four instructions MOVES.F, MOVES.2F, MOVES.4F, and MOVES.L move data from data registers
to the memory with scaling and limiting. The first three operate on 16-bit data. The MOVES.L instruction
performs 32-bit scaling and limiting before the move.
For all moves on the SC140, the syntax requires that the source of the data be specified first followed by
the destination (SRC, DST). The source and destination are separated by a comma with no spaces either
before or after the comma.
Multi-register move instructions originate or update several registers. Registers that are accessed as part of
the same move instruction are specified with a colon separator. For example, a MOVE.4F from a memory
location pointed to by R0 to the registers D0, D1, D2, and D3 is written as:
MOVE.4F (R0),D0:D1:D2:D3
In this case, let the address in R0 be noted as A0. The fractional word in location A0 then goes to D0, the
word in A0 + 2 goes to D1, the word in A0 + 4 goes to D2, and the word in A0 + 6 goes to D3. The
addresses increment by two since the addressing unit is always a byte. Moves to or from more than one
register are treated according to the same principle.
A special MOVE.L instruction supports moving data to and from data register extensions (Dn.e). In order
to support full saving and restoring of the machine state, extension moves also include the limit bit Ln of
the register, and are therefore nine bits wide. In one case of the MOVE.L instruction, two extensions
belonging to two consecutive data registers are moved concurrently from the registers to the memory as
part of a 32-bit access.
Memory Interface
The extension bits of the even data register occupy bits 0 to 8 (bit 8 is the limit bit). The extension bits of
the odd register occupy bits 16 to 24 (bit 24 is the limit bit) as described in Figure 2-18.
Figure 2-18. Bit Allocation in MOVE.L D0.e:D1.e (D0 extension with limit bit L0 in bits 8-0, D1 extension with limit bit L1 in bits 24-16 of the memory long word)
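The bit allocation in Figure 2-18 can be expressed as packing and unpacking functions. This Python sketch is illustrative only (hypothetical names): each extension is 8 bits plus a 1-bit limit flag, placed at bit 0 for the even register and at bit 16 for the odd register.

```python
from typing import Tuple


def pack_extensions(ext_even: int, limit_even: int, ext_odd: int, limit_odd: int) -> int:
    """Memory long word written by MOVE.L Dn.e:Dn+1.e (n even)."""
    return ((ext_even & 0xFF)
            | ((limit_even & 1) << 8)    # limit bit of the even register
            | ((ext_odd & 0xFF) << 16)
            | ((limit_odd & 1) << 24))   # limit bit of the odd register


def unpack_extension(word: int, odd: bool) -> Tuple[int, int]:
    """(extension bits, limit bit) that a single-register MOVE.L to Dn.e takes from memory."""
    shift = 16 if odd else 0
    return (word >> shift) & 0xFF, (word >> (shift + 8)) & 1


word = pack_extensions(0xFF, 1, 0x01, 0)
print(hex(word))                      # 0x101ff
print(unpack_extension(word, False))  # (255, 1)
print(unpack_extension(word, True))   # (1, 0)
```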
Moves from memory to an extension are only to single registers. However, they are also 32-bit wide and
implicitly assume the bit allocation described above according to the register number (odd or even). For
example, move.l $4F42,d3.E is the instruction for moving bits 24:16 from the memory location addressed
by $4F42 to the limit bit and extension bits of the odd register d3. See Appendix A, “Move Long Word
(AGU) MOVE.L,” for more information about moves to and from extension registers.
2.4 Memory Interface
The SC140 core interfaces to memory via the following:
•32-bit program memory address bus (PAB) and 128-bit program memory data bus (PDB)
•32-bit data memory address bus A (XABA) and 64-bit data memory data bus A (XDBA)
•32-bit data memory address bus B (XABB) and 64-bit data memory data bus B (XDBB)
•Control signals such as read and write access strobes as well as access width control
The SC140 does not specify a memory subsystem architecture, only the minimum requirements for correct
execution of SC140 code. Listed below are requirements for all memory designs that interface with the
SC140 core.
•The SC140 core supports only unified memory designs. Memory is regarded as a single space. There
is no distinction between program memory locations and data memory locations. Each memory
location possesses a unique address that is accessible from either the program or data buses.
From the core’s perspective, there is only one memory address “a,” which can hold either data or
program information.
•Data must be byte-addressable and accessible by the two data memory buses.
•The memory must support all access widths used by the SC140 core: byte (8 bits), word (16 bits),
long word (32 bits), and double-long word or four-word (64 bits). One of four control signals
indicates to the memory which access width is needed for each access.
•Multi-byte memory accesses must support both endian modes.
•Memory must resolve access ordering on a cycle-by-cycle basis. All accesses in a given cycle must
be completed before proceeding to accesses in the next cycle. Note that an access conflict may occur
when there are multiple requests to access the same memory module in the same cycle. An access
conflict is resolved by a stall cycle (per conflict), which serializes the multiple requests.
•Multiple access rules in a given cycle are as follows:
— Multiple read or write accesses to different memory locations execute without any
predetermined sequence.
— In cases where multiple accesses to the same memory location occur, the access sequence is
program fetch, data read, and data write.
— If two write operations access the same byte in memory in the same cycle, the operation is illegal
and the result is undefined. This situation can arise when different but overlapping words or
long words write the same byte. The memory subsystem should be able to detect these cases and issue an imprecise
interrupt to the core. The use of this interrupt is optional. Refer to Section 5.3.3.2, “Implicit
Push/Pop Memory Timing,” on page 5-24 for more details.
•Accesses to non-existent memory locations are illegal and the result is undefined. The memory
subsystem can issue an imprecise interrupt to the core. The use of this interrupt is optional.
2.4.1 SC140 Endian Support
The term “little endian” describes a computer architecture in which, for a multi-byte operand, bytes at
lower addresses have lower numeric significance. Each word is stored little end first. In little endian mode, the MOVE.W D0,(R0) instruction (for example) stores bits 7–0 of D0 into
first. In little endian mode, the MOVE.W D0,(R0) instruction (for example) stores bits 7–0 of D0 into
address (R0), and bits 15–8 into address (R0 + 1).
In “big endian” architectures, the most significant byte has the lowest address, and each word is stored big
end first. In big endian mode, the MOVE.W D0,(R0) instruction stores bits 15–8 of D0 into address (R0),
and bits 7–0 into address (R0 + 1).
The SC140 supports both big and little endian architectures through the big endian memory (BEM) mode
bit in the EMR. This bit samples a core input signal when exiting the reset state, and cannot be changed
during normal operation.
Figure 2-19 shows an example of how data is transferred from a register to memory in the two endian modes.
Figure 2-19. Endian Example (little endian: register bits 7-0 go to address A0 and bits 15-8 to A0+1; big endian: bits 15-8 go to A0 and bits 7-0 to A0+1)
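The byte placement for MOVE.W D0,(R0) in the two modes can be reproduced with Python's int.to_bytes, which takes a byte order of "little" or "big". This is an illustrative sketch with a hypothetical helper name, not a model of the SC140 bus.

```python
from typing import List


def move_w_bytes(d0: int, big_endian: bool) -> List[int]:
    """Bytes stored by MOVE.W D0,(R0): returns [byte at R0, byte at R0+1]."""
    order = "big" if big_endian else "little"
    return list((d0 & 0xFFFF).to_bytes(2, order))


print([hex(b) for b in move_w_bytes(0x1234, big_endian=False)])  # ['0x34', '0x12']
print([hex(b) for b in move_w_bytes(0x1234, big_endian=True)])   # ['0x12', '0x34']
```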
2.4.1.1 SC140 Bus Structure
The entire memory space of the SC140 core is unified. The memory supports two parallel 64-bit data
accesses and one 128-bit program fetch. All can occur in parallel.
The two data buses that connect between the core and the memory are each 64 bits wide. Instructions such
as load to registers and store to memory utilize the bus according to the application requirement. Different
versions of the instructions are used for different bandwidths such that:
•MOVE.B loads or stores bytes (8 bits).
•MOVE.W and MOVE.F load or store integer or fractional words (16 bits).
•MOVE.2W, MOVE.2F, and MOVE.L load or store double-integers, double-fractions, and long
words respectively (32 bits).
•MOVE.4W, MOVE.4F, and MOVE.2L load or store quad-integers, quad-fractions, and double-long
words respectively (64 bits).
Figure 2-20 shows the buses between the SC140 core and the memory.
(Figure: the SC140 core connects to the unified memory space through the 128-bit PDB bus and the two 64-bit data buses, XDBA and XDBB.)
Figure 2-20. Basic Connection between SC140 Core and Memory
2.4.1.2 Memory Organization
Different types of data are stored differently in memory for each of the two endian modes. However, the
data retains the same meaning. For example, 64 bits of data can be represented by any of the following:
• Eight 8-bit bytes
• Four 16-bit numbers
• Two 32-bit numbers
Figure 2-21 shows how data is organized in memory in the two endian modes. Each data unit is a byte, written as two hexadecimal digits.
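Little Endian (byte offsets 7 to 0 within each row, left to right)
  Address 0:          07 08 05 06 03 04 01 02
  Address 8:          cc dd ee ff 11 22 33 44
  Address 16 ($10):         0f 0e 0d 0c 0b 0a
Big Endian (byte offsets 0 to 7 within each row, left to right)
  Address 0:          01 02 03 04 05 06 07 08
  Address 8:          11 22 33 44 cc dd ee ff
  Address 16 ($10):   0a 0b 0c 0d 0e 0f
Figure 2-21. Memory Organization of Big and Little Endian Mode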
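The layout in Figure 2-21 can be reproduced with a small C model that writes each data item with its natural width. This is a sketch under stated assumptions: the helpers `store_n` and `fill_figure_2_21` are hypothetical, memory is a host byte array, and `big_endian` stands in for the BEM bit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store the low `width` bytes of `value` at `addr`.
   Big endian puts the most significant byte first; little endian puts
   the least significant byte first. */
static void store_n(uint8_t *mem, uint32_t addr, uint64_t value,
                    size_t width, bool big_endian)
{
    for (size_t i = 0; i < width; i++) {
        uint8_t b = (uint8_t)(value >> (8 * i));   /* i-th least significant byte */
        size_t offset = big_endian ? (width - 1 - i) : i;
        mem[addr + offset] = b;
    }
}

/* Fill 24 bytes of memory with the data shown in Figure 2-21. */
static void fill_figure_2_21(uint8_t mem[24], bool big_endian)
{
    /* Row 0: four 16-bit numbers */
    const uint16_t words[4] = { 0x0102, 0x0304, 0x0506, 0x0708 };
    for (size_t k = 0; k < 4; k++)
        store_n(mem, (uint32_t)(2 * k), words[k], 2, big_endian);

    /* Row 8: two 32-bit numbers */
    store_n(mem, 8,  0x11223344u, 4, big_endian);
    store_n(mem, 12, 0xccddeeffu, 4, big_endian);

    /* Row 16 ($10): six single bytes, whose placement does not depend on the mode */
    for (size_t k = 0; k < 6; k++)
        store_n(mem, (uint32_t)(16 + k), 0x0a + k, 1, big_endian);
}
```

Reading the little endian array from `le[7]` down to `le[0]` gives 07 08 05 06 03 04 01 02, the little endian row at address 0 in the figure. The single bytes at address 16 land at the same addresses in both modes, because a one-byte store has no byte order.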
Table 2-26 describes the data representation for each 64-bit row in Figure 2-21.
Figure 2-22. Data Transfer in Big and Little Endian Modes
For single-register moves, assuming an equivalent memory map in big and little endian modes, the byte
organization on the buses is identical in both modes. However, the memory subsystem must route the data
bus bytes to different memory addresses for each supported endian mode.
2.4.1.4 Multi-Register Moves
For accesses involving more than one register, such as MOVE.2W or MOVE.4F instructions, the SC140 ensures that data originating from a specific register reaches the same memory address in both little and big endian modes (and, for loads, that data from a given address reaches the same register). The memory system does not distinguish between MOVE.L and MOVE.2W transfers that have the same data width; it treats both as long-word transfers. If the data bus were the same for both endian modes in a two-register transfer, the data from the two registers would end up at different addresses. To correct for this, the byte order on the buses for multi-register transfers is adjusted in little endian mode. Similarly, the memory does not distinguish between transfers of four words and transfers of two long words; it treats both as a string of eight bytes. The bus structure for little endian mode corrects for both cases to ensure that register data is stored at the same address in both modes.
As an example of the problem that arises if no correction is made, consider the following case:
The instruction move.2w d0:d1,(a8) transfers two integer words from data registers d0 and d1 to memory at address a8. For d0 = $0102 and d1 = $0304, the data bus would be $01020304, and the memory would be accessed for a width of 32 bits. In big endian mode, the memory would look like:
Address   Data
a8        01
a9        02
a10       03
a11       04
In little endian mode, the memory would be accessed for a width of 32 bits (like a long word) and would write the data little end first, so the memory would look like:
Address   Data
a8        04
a9        03
a10       02
a11       01
Note that the data word from d0, $0102, is at a different address in the two modes. If the core instead modified the data bus to $03040102, the memory in little endian mode would look like:
Address   Data
a8        02
a9        01
a10       04
a11       03
This is the desired result. This effect is achieved in little endian mode through logic in the core, which
modifies the data on the data bus to the memory for both reads and writes.
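The correction can be sketched in C by separating the two sides of the transfer: the core packs register words onto the bus, and the memory stores the bus value as one transfer in the current endian mode. The function names and the `correct` flag (used only to show the uncorrected case) are hypothetical; this models the behavior described above, not the actual core logic.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Memory side: store a `width`-byte bus value as one transfer
   (a long word for 4 bytes, a string of eight bytes for 8),
   big end first or little end first according to the mode. */
static void mem_store(uint8_t *mem, uint32_t addr, uint64_t bus,
                      size_t width, bool big_endian)
{
    for (size_t i = 0; i < width; i++) {
        uint8_t b = (uint8_t)(bus >> (8 * i));   /* i-th least significant byte */
        mem[addr + (big_endian ? width - 1 - i : i)] = b;
    }
}

/* Core side: pack n 16-bit register words (n = 2 for MOVE.2W,
   n = 4 for MOVE.4W) onto the data bus. `correct` is a hypothetical
   switch that turns the little endian adjustment off for comparison. */
static uint64_t pack_words(const uint16_t *regs, size_t n,
                           bool big_endian, bool correct)
{
    uint64_t bus = 0;
    for (size_t k = 0; k < n; k++) {
        /* Without the correction, register 0 occupies the most significant
           word of the bus. With the little endian correction, the word order
           is reversed, so register 0 occupies the least significant word. */
        size_t slot = (big_endian || !correct) ? n - 1 - k : k;
        bus |= (uint64_t)regs[k] << (16 * slot);
    }
    return bus;
}
```

With d0 = $0102 and d1 = $0304, the corrected little endian bus value is $03040102, and memory receives 02, 01, 04, 03 at a8 through a11, matching the desired result above. The same packing with n = 4 gives the MOVE.4W little endian byte order B, A, D, C, F, E, H, G listed in Table 2-27.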
Figure 2-23 shows examples of multi-register data transfers in big and little endian modes.
Figure 2-23. Multi-Register Transfer in Big and Little Endian Modes
Note: The only exceptions to the behavior described above are the VSL instructions. These instructions cause source data words from the core to be written to different memory locations in big and little endian modes. For more information about the VSL instructions, refer to Table 2-27 on page 2-64 and Appendix A, “Viterbi Shift Left Move (AGU) VSL,” on page A-422.
2.4.1.5 Instruction Word Transfers
Instruction words are transferred to the core from memory over the program data bus (PDB) to special
instruction registers in the program dispatch unit (PDU).
The instruction registers can be accessed only with aligned access of 128-bit width (8 instruction words).
Figure 2-24 shows the program memory organization in big and little endian modes. Note that program data consists of a series of 16-bit instructions. In this example, the assembler determines the instructions to be:
word address $00   instruction $a0b0
word address $02   instruction $c0d0
word address $04   instruction $e0f0
word address $06   instruction $a1b1
word address $08   instruction $c1d1
word address $0a   instruction $e1f1
word address $0c   instruction $a2b2
word address $0e   instruction $c2d2
word address $10   instruction $e2f2
.....
These are to be placed in memory as shown in the following figure.
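Big Endian (byte offsets 0 to 7 within each row, left to right)
  Address 0:          a0b0 c0d0 e0f0 a1b1
  Address 8:          c1d1 e1f1 a2b2 c2d2
  Address 16 ($10):   e2f2 a3b3 c3d3 e3f3
Little Endian (byte offsets 7 to 0 within each row, left to right)
  Address 0:          a1b1 e0f0 c0d0 a0b0
  Address 8:          c2d2 a2b2 e1f1 c1d1
  Address 16 ($10):   e3f3 c3d3 a3b3 e2f2
Figure 2-24. Program Memory Organization in Big and Little Endian Modes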
The assembler outputs a byte stream to the loader and therefore corrects for the byte address reversal inside
each 16-bit instruction to achieve the memory results above.
Big Endian Assembler Output       Little Endian Assembler Output
byte address $00 data $a0         byte address $00 data $b0
byte address $01 data $b0         byte address $01 data $a0
byte address $02 data $c0         byte address $02 data $d0
byte address $03 data $d0         byte address $03 data $c0
byte address $04 data $e0         byte address $04 data $f0
.....                             .....
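The assembler's correction amounts to choosing, per mode, which half of each 16-bit instruction is emitted at the even byte address. A minimal C sketch, assuming the instructions are given as an array of 16-bit words; `emit_bytes` is a hypothetical name, not part of any SC140 tool.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: turn 16-bit instruction words into the byte stream
   passed to the loader. Returns the number of bytes written to out. */
static size_t emit_bytes(const uint16_t *insn, size_t count,
                         uint8_t *out, bool big_endian)
{
    size_t n = 0;
    for (size_t k = 0; k < count; k++) {
        uint8_t hi = (uint8_t)(insn[k] >> 8);
        uint8_t lo = (uint8_t)(insn[k] & 0xFF);
        out[n++] = big_endian ? hi : lo;   /* byte address 2k     */
        out[n++] = big_endian ? lo : hi;   /* byte address 2k + 1 */
    }
    return n;
}
```

For the instructions $a0b0, $c0d0, $e0f0 this produces $a0 $b0 $c0 $d0 $e0 $f0 in big endian mode and $b0 $a0 $d0 $c0 $f0 $e0 in little endian mode, as in the assembler output listed above.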
Figure 2-25 shows accesses to the same memory area by both program fetches and data accesses in big and little endian modes.
(Figure: the memory area of Figure 2-24 accessed by a program fetch over the 128-bit P-bus and by data moves over the 64-bit XA-bus and XB-bus of the SC140 core. The memory system changes big endian to little endian order for the program bus.)
Program fetch (always 128-bit aligned) from address A0; program bus contents (for both endian cases):
  c2d2_a2b2_e1f1_c1d1_a1b1_e0f0_c0d0_a0b0
Instruction | Big Endian Data Bus Contents | Little Endian Data Bus Contents
MOVE.4W from address $00 | a0b0_c0d0_e0f0_a1b1 | a1b1_e0f0_c0d0_a0b0
MOVE.L from address $08 | xxxx_xxxx_c1d1_e1f1 | xxxx_xxxx_e1f1_c1d1
MOVE.W from address $08 | xxxx_xxxx_xxxx_c1d1 | xxxx_xxxx_xxxx_c1d1
MOVE.B from address $08 | xxxx_xxxx_xxxx_xxc1 | xxxx_xxxx_xxxx_xxd1
Figure 2-25. Instruction Moves in Big and Little Endian Modes
The program bus contents always appear as eight 16-bit instructions packed in little endian order; in big endian mode, the memory system performs a word (instruction) reversal, on the program bus only.
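The following C sketch models Figure 2-25 from the memory side: a data load forms a bus value from bytes in the current byte order, while a program fetch is modeled as eight 16-bit loads placed into instruction slots in address order, so both modes yield the same slots. The names `mem_load`, `program_fetch`, and `slots` are hypothetical, and the word-by-word loop illustrates the result, not how the memory system is built.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Data access: read `width` bytes at addr and form the bus value. */
static uint64_t mem_load(const uint8_t *mem, uint32_t addr, size_t width,
                         bool big_endian)
{
    uint64_t v = 0;
    for (size_t i = 0; i < width; i++) {
        /* Visit bytes from the most significant one down. */
        size_t offset = big_endian ? i : (width - 1 - i);
        v = (v << 8) | mem[addr + offset];
    }
    return v;
}

/* Program fetch: 16 aligned bytes become eight 16-bit instructions.
   Slot 0 (least significant on the program bus) holds the
   lowest-addressed instruction in both modes. */
static void program_fetch(const uint8_t *mem, uint32_t addr,
                          uint16_t slots[8], bool big_endian)
{
    for (size_t k = 0; k < 8; k++)
        slots[k] = (uint16_t)mem_load(mem, addr + 2 * (uint32_t)k, 2, big_endian);
}
```

With the memory bytes of Figure 2-24, `mem_load` returns $c1d1e1f1 (big endian) and $e1f1c1d1 (little endian) for a 32-bit load from $08, and $c1 versus $d1 for a byte load from $08, matching the data bus contents in Figure 2-25.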
2.4.1.6 Memory Access Behavior in Big/Little Endian Modes
Table 2-27 shows the representation of the move instructions in big and little endian modes. In the examples shown in this table, R0 is assumed to point to address A0. Each letter A–H represents one byte. Note that the memory contents may not exactly equal the register contents; for example, in VSL instructions the memory word (16 bits) is the register word shifted left by one bit. See Appendix A for more detailed information.
Table 2-27. Move Instructions in Big and Little Endian Modes
(Memory contents are listed in address order starting at A0.)
Instruction | Example and Register Operands | Big Endian | Little Endian
MOVE.B, MOVEU.B | MOVE.B D0,(R0): D0[7:0] = A | A0 = A | A0 = A
MOVE.W, MOVEU.W | MOVE.W D0,(R0): D0[15:0] = AB | A0–A1 = A, B | A0–A1 = B, A
MOVE.2W | MOVE.2W D0:D1,(R0): D0[15:0] = AB, D1[15:0] = CD | A0–A3 = A, B, C, D | A0–A3 = B, A, D, C
MOVE.4W | MOVE.4W D0:D1:D2:D3,(R0): D0[15:0] = AB, D1[15:0] = CD, D2[15:0] = EF, D3[15:0] = GH | A0–A7 = A, B, C, D, E, F, G, H | A0–A7 = B, A, D, C, F, E, H, G
MOVE.L, MOVEU.L, MOVES.L | MOVE.L D0,(R0): D0[31:0] = ABCD | A0–A3 = A, B, C, D | A0–A3 = D, C, B, A
MOVE.L (Extension) | MOVE.L D0.E:D1.E,(A0): D0: L0, A; D1: L1, B | A0–A3 = L1, B1, L0, A1 | A0–A3 = A1, L0, B1, L1
MOVE.2L | MOVE.2L D0:D1,(R0): D0[31:0] = ABCD, D1[31:0] = EFGH | A0–A7 = A, B, C, D, E, F, G, H | A0–A7 = D, C, B, A, H, G, F, E
MOVE.F, MOVES.F | MOVE.F D0,(R0): D0[31:16] = AB | A0–A1 = A, B | A0–A1 = B, A
MOVE.2F, MOVES.2F | MOVE.2F D0:D1,(R0): D0[31:16] = AB, D1[31:16] = CD | A0–A3 = A, B, C, D | A0–A3 = B, A, D, C
MOVE.4F, MOVES.4F | MOVE.4F D0:D1:D2:D3,(R0): D0[31:16] = AB, D1[31:16] = CD, D2[31:16] = EF, D3[31:16] = GH | A0–A7 = A, B, C, D, E, F, G, H | A0–A7 = B, A, D, C, F, E, H, G
VSL.4W | VSL.4W D2:D6:D1:D3,(R0)+N0: D2[15:0] = AB, D6[15:0] = CD, EF (Note 1), GH (Note 2) | A0–A7 = C, D, A, B, G, H, E, F | A0–A7 = B, A, D, C, F, E, H, G
VSL.4F | VSL.4F D2:D6:D1:D3,(R0)+N0: D2[31:16] = AB, D6[31:16] = CD, EF (Note 3), GH (Note 4) | A0–A7 = C, D, A, B, G, H, E, F | A0–A7 = B, A, D, C, F, E, H, G
VSL.2W | VSL.2W D1:D3,(R0)+N0: AB (Note 1), CD (Note 2) | A0–A3 = C, D, A, B | A0–A3 = B, A, D, C
VSL.2F | VSL.2F D1:D3,(R0)+N0: AB (Note 3), CD (Note 4) | A0–A3 = C, D, A, B | A0–A3 = B, A, D, C
Notes:
1. Data selected according to the VF0 bit in SR: selects D3.L<<1 if VF0 = 1, D1.L<<1 if VF0 = 0.
2. Data selected according to the VF2 bit in SR: selects D3.L<<1 if VF2 = 1, D1.L<<1 if VF2 = 0.
3. Data selected according to the VF1 bit in SR: selects D3.H<<1 if VF1 = 1, D1.H<<1 if VF1 = 0.
4. Data selected according to the VF3 bit in SR: selects D3.H<<1 if VF3 = 1, D1.H<<1 if VF3 = 0.
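The VSL placement in the example rows of Table 2-27 can be captured in a few lines of C. This sketch assumes the source words have already been selected and shifted (Notes 1–4) and only models where they land; `vsl_store` is a hypothetical name. In little endian mode the placement matches MOVE.2W and MOVE.4W; in big endian mode the two words within each 32-bit half are exchanged.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of VSL.2W/VSL.2F (n = 2) and VSL.4W/VSL.4F (n = 4):
   place the prepared source words w[0..n-1] starting at mem[0]. */
static void vsl_store(uint8_t *mem, const uint16_t *w, size_t n, bool big_endian)
{
    for (size_t k = 0; k < n; k++) {
        /* Big endian: word slots 0<->1 and 2<->3 are swapped. */
        size_t slot = big_endian ? (k ^ 1u) : k;
        uint8_t hi = (uint8_t)(w[k] >> 8);
        uint8_t lo = (uint8_t)(w[k] & 0xFF);
        mem[2 * slot]     = big_endian ? hi : lo;
        mem[2 * slot + 1] = big_endian ? lo : hi;
    }
}
```

With words AB, CD, EF, GH, big endian memory receives C, D, A, B, G, H, E, F and little endian memory receives B, A, D, C, F, E, H, G, as in the VSL.4W row.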
Table 2-28 shows the representation of the stack support instructions in big and little endian modes. In the
examples shown in this table, it is assumed that the stack access is to address A0. The stack instructions
treat the register data like a 32-bit long word move.
Table 2-28. Stack Support Instructions in Big and Little Endian Modes
Instruction | Example and Register Operands | Big Endian | Little Endian
Single POP, POPN, PUSH, PUSHN | PUSH D0: D0[31:0] = ABCD | A0–A3 = A, B, C, D | A0–A3 = D, C, B, A
Double POP, POPN, PUSH, PUSHN | PUSH D0 PUSH D1: D0[31:0] = ABCD, D1[31:0] = EFGH | A0–A7 = A, B, C, D, E, F, G, H | A0–A7 = D, C, B, A, H, G, F, E
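Because the stack instructions move each register as a 32-bit long word, a push can be sketched in C as one long-word store per register, with the second register of a double push placed at the next four addresses. The function `stack_store` is hypothetical, and stack pointer updates are not modeled. The same layout applies to the PC and SR in Tables 2-30 and 2-31.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of PUSH: store `count` 32-bit registers (1 for a
   single push, 2 for a double push) as consecutive long words at a0. */
static void stack_store(uint8_t *mem, uint32_t a0, const uint32_t *regs,
                        size_t count, bool big_endian)
{
    for (size_t r = 0; r < count; r++) {
        for (size_t i = 0; i < 4; i++) {
            /* i = 0 is the most significant byte of the register */
            uint8_t b = (uint8_t)(regs[r] >> (8 * (3 - i)));
            size_t offset = big_endian ? i : 3 - i;
            mem[a0 + 4 * r + offset] = b;
        }
    }
}
```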
Table 2-29 shows the representation of the bit mask instructions in big and little endian modes.
Table 2-29. Bit Mask Instructions in Big and Little Endian Modes
Instruction | Example and Operands | Big Endian | Little Endian
BMCHG.W, BMCLR.W, BMSET.W, BMTSTS.W, BMTSTC.W, BMTSET.W, NOT.W, AND.W, OR.W, EOR.W | BMSET.W #$1234,(A0): Data[15:0] = AB, Mask = $1234 | A0–A1 = A, B | A0–A1 = B, A
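A bit mask instruction operates on the 16-bit memory word as a value, so the same mask affects different byte addresses in the two modes. A minimal C sketch of the memory side of BMSET.W, assuming a byte-array memory; `read16`, `write16`, and `bmset_w` are hypothetical names, and status bit updates are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Read the 16-bit word at addr in the selected byte order. */
static uint16_t read16(const uint8_t *mem, uint32_t addr, bool big_endian)
{
    return big_endian ? (uint16_t)((mem[addr] << 8) | mem[addr + 1])
                      : (uint16_t)((mem[addr + 1] << 8) | mem[addr]);
}

/* Write the 16-bit word v at addr in the selected byte order. */
static void write16(uint8_t *mem, uint32_t addr, uint16_t v, bool big_endian)
{
    mem[addr + (big_endian ? 0 : 1)] = (uint8_t)(v >> 8);
    mem[addr + (big_endian ? 1 : 0)] = (uint8_t)(v & 0xFF);
}

/* Hypothetical model of BMSET.W #mask,(addr): read-modify-write that
   sets the mask bits in the memory word. */
static void bmset_w(uint8_t *mem, uint32_t addr, uint16_t mask, bool big_endian)
{
    uint16_t word = read16(mem, addr, big_endian);
    write16(mem, addr, (uint16_t)(word | mask), big_endian);
}
```

With mask $1234 and a cleared word at A0, big endian memory ends up with $12 at A0 and $34 at A0+1, while little endian memory holds $34 at A0 and $12 at A0+1.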
Table 2-30 shows the representation of the change-of-flow instructions in big and little endian modes. In this table, it is assumed that the stack access is to address A0. The table shows how the contents of the PC and SR are transferred to and from memory as 32-bit long words.
Table 2-30. Non-Loop Change-of-Flow Instructions in Big and Little Endian Modes
Instruction | Register Operands | Big Endian | Little Endian
BSR, BSRD, JSR, JSRD, RTE, RTED | PC[31:0] = ABCD, SR[31:0] = EFGH | A0–A7 = A, B, C, D, E, F, G, H | A0–A7 = D, C, B, A, H, G, F, E
RTS, RTSD, RTSTK, RTSTKD | PC[31:0] = ABCD | A0–A3 = A, B, C, D | A0–A3 = D, C, B, A
Table 2-31 shows the representation of the control instructions in big and little endian modes. In this table,
it is assumed that the stack access is to address A0.
Table 2-31. Control Instructions in Big and Little Endian Modes
Instruction | Register Operands | Big Endian | Little Endian
Interrupt Service, TRAP, ILLEGAL | PC[31:0] = ABCD, SR[31:0] = EFGH | A0–A7 = A, B, C, D, E, F, G, H | A0–A7 = D, C, B, A, H, G, F, E