Motorola, Inc.
Semiconductor Products Sector
DSP Division
6501 William Cannon Drive, West
Austin, Texas 78735-8598
Order this document by DSP56KFAMUM/AD
Motorola reserves the right to make changes without further notice to any products herein to improve reliability, function or design. Motorola does not assume any liability arising out of the application or use of any product or circuit described herein; neither does it convey any license under its
patent rights nor the rights of others. Motorola products are not authorized for use as components
in life support devices or systems intended for surgical implant into the body or intended to support
or sustain life. Buyer agrees to notify Motorola of any such intended end use whereupon Motorola
shall determine availability and suitability of its product or products for the use intended. Motorola
and M are registered trademarks of Motorola, Inc. Motorola, Inc. is an Equal Employment Opportunity /Affirmative Action Employer.
OnCE is a trade mark of Motorola, Inc.
© Motorola Inc., 1994
“1 ≤ N ≤
”.
MOTOROLA SEMICONDUCTOR TECHNICAL DATA
Order this document by DSP56KFAMUM/AD
DSP56K Family
Addendum to 24-bit Digital Signal Processor Family Manual
This document, containing changes, additional features, further explanations, and clarifications, is
a supplement to the original document:
DSP56KFAMUM/AD Family Manual, DSP56K Family 24-bit Digital Signal Processors
Change the following:
TM
Page 11-4, Section 11.2.1 - Delete “4. NeXT
Page A-83, third line - Replace
Page A-104, Under the “Operation:” heading - Replace “
Page A-104, Second sentence after “Description:” heading - Replace “
of D.
” with “
One is added to the LSB of D; i.e. bit 0 of A0 or B0.
“1;leN;le24”
under Mach”.
with
24”
D -1 ⇒ D
” with “
”
D+1 ⇒ D
One is added from the LSB
”.
Page A-130, First symbolic description under the “Operation:” heading - Replace “
“
Page B-11, An inch below the middle of the page - Replace the “
Page B-16, 7
”.
”.
”.
”.
”.
”.
th
instruction from bottom - Replace “
2+mvp oscillator clock cycles
2+mvp oscillator clock cycles
4+mvp oscillator clock cycles
4 oscillator clock cycles
1 program words
cir
” instruction with “
lsl A,n0
” with “
lsl B A,n0
” with “ Timing:
” with “Memory:
If S[n]=0
” with “ Timing:
” with “ Timing:
” with “ Timing:
clr
” with
2+mvp
1+ mv
”.
MOTOROLA INC., 1995
Motorola reserves the right to make changes without further notice to any products herein. Motorola makes no warranty, representation or guarantee regarding the suitability of its products for any particular purpose, nor does Motorola assume any liability
arising out of the application or use of any product or circuit, and specifically disclaims any and all liability, including without limitation consequential or incidental damages. “Typical” parameters can and do vary in different applications. All operating parameters, including “Typical”, must be validated for each customer application by customer's technical experts. Motorola does not
convey any license under its patent rights nor the rights of others. Motorola products are not designed, intended, or authorized
for use as components in systems intended for surgical implant into the body, or other applications intended to support or sustain
life, or for any other application in which the failure of the Motorola product could create a situation where personal injury or death
may occur. Should Buyer purchase or use Motorola products for any such unintended or unauthorized application, Buyer shall
indemnify and hold Motorola and its officers, employees, subsidiaries, affiliates, and distributors harmless against all claims,
costs, damages, and expenses, and reasonable attorney fees arising out of, directly or indirectly, any claim of personal injury or
death associated with such unintended or unauthorized use, even if such claim alleges that Motorola was negligent regarding the
design or manufacture of the part.
Motorola and M are registered trademarks of Motorola, Inc.
Literature Distribution Centers:
USA: Motorola Literature Distribution; P.O. Box 20912; Phoenix, Arizona 85036.
EUROPE: Motorola Ltd.; European Literature Center; 88 Tanners Drive, Blakelands, Milton
The DSP56K Family is Motorola’s series of 24-bit general purpose Digital Signal Processors (DSPs*). The family architecture features a central processing module that is
common to the various family members, such as the DSP56002 and the DSP56004.
Note: The DSP56000 and the DSP56001 are not based on the central processing module
architecture and should not be used with this manual. They will continue to be described
in the DSP56000/DSP56001 User’s Manual (DSP56000UM/AD Rev. 2).
This manual describes the DSP56K Family’s central processor and instruction set. It is
intended to be used with a family member’s User’s Manual, such as the DSP56002 User’s
Manual.
The User’s Manual presents the device’s specifics, including pin descriptions, operating
modes, and peripherals. Packaging and timing information can be found in the device’s
Technical Data Sheet.
This chapter introduces general DSP theory and discusses the features and benefits of
the Motorola DSP56K family of 24-bit processors. It also presents a brief description of
each of the sections of the manual.
1.2 ORIGIN OF DIGITAL SIGNAL PROCESSING
DSP is the arithmetic processing of real-time signals sampled at regular intervals and digitized. Examples of DSP processing include the following:
• Filtering of signals
• Convolution, which is the mixing of two signals
• Correlation, which is a comparison of two signals
• Rectification, amplification, and/or transformation of a signal
All of these functions have traditionally been performed using analog circuits. Only recently has semiconductor technology provided the processing power necessary to digitally
perform these and other functions using DSPs.
Figure 1-1 shows a description of analog signal processing. The circuit in the illustration
filters a signal from a sensor using an operational amplifier, and controls an actuator with
the result. Since the ideal filter is impossible to design, the engineer must design the filter
for acceptable response, considering variations in temperature, component aging, power
supply variation, and component accuracy. The resulting circuit typically has low noise immunity, requires adjustments, and is difficult to modify.
*This manual uses the acronym DSP for Digital Signal Processing or Digital Signal Processor, depending on the context.
[Figure: an input x(t) from a sensor drives an analog filter built from an operational amplifier with input resistor R_i and feedback components R_f and C_f; the output y(t) goes to an actuator. A gain-versus-frequency plot compares the ideal filter, with cutoff f_c, against the frequency characteristics of the analog filter:]
$$\frac{y(t)}{x(t)} = -\frac{R_f}{R_i} \cdot \frac{1}{1 + j\omega R_f C_f}$$
Figure 1-1 Analog Signal Processing
The equivalent circuit using a DSP is shown in Figure 1-2. This application requires an
analog-to-digital (A/D) converter and digital-to-analog (D/A) converter in addition to the
DSP. Even with these additional parts, the component count can be lower using a DSP
due to the high integration available with current components.
Processing in this circuit begins by band-limiting the input with an anti-alias filter, eliminating out-of-band signals that can be aliased back into the pass band due to the sampling
process. The signal is then sampled, digitized with an A/D converter, and sent to the DSP.
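The need for the anti-alias filter can be illustrated with a short calculation. The sketch below is only an illustration and is not taken from the manual: the function name and the example frequencies are assumptions. It folds an input tone into the band from 0 to half the sampling rate, which is where an out-of-band tone would appear after sampling if it were not filtered out first.

```python
def aliased_frequency(f_hz, fs_hz):
    """Apparent frequency of a tone at f_hz after sampling at fs_hz.

    Illustrative helper (hypothetical name): any tone above fs/2 folds
    back into the range 0..fs/2.
    """
    f_mod = f_hz % fs_hz                 # position within one sampling period
    return min(f_mod, fs_hz - f_mod)     # fold into the first Nyquist zone


# A 7 kHz tone sampled at 8 kHz is indistinguishable from a 1 kHz tone.
print(aliased_frequency(7000, 8000))
```

A tone already below half the sampling rate is returned unchanged, which is why band-limiting before the sampler is enough to prevent the problem.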
The filter implemented by the DSP is strictly a matter of software. The DSP can directly
implement any filter that can also be implemented using analog techniques. Also, adaptive filters can be easily implemented using DSP, whereas these filters are extremely
difficult to implement using analog techniques.
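As a rough illustration of a filter being "strictly a matter of software", the following sketch computes a finite impulse response (FIR) filter of the form y(n) = sum over k of c(k) × x(n−k). It is written in Python for readability rather than in DSP56K assembly, and the function and variable names are assumptions made for this example.

```python
def fir_filter(coeffs, samples):
    """Return y(n) = sum_k c(k) * x(n - k) for every input sample.

    Illustrative sketch: samples before the start of the input are
    treated as zero.
    """
    history = [0.0] * len(coeffs)        # x(n), x(n-1), ... newest first
    output = []
    for x in samples:
        history = [x] + history[:-1]     # shift the delay line by one sample
        acc = 0.0
        for c, xk in zip(coeffs, history):
            acc += c * xk                # one multiply-accumulate per tap
        output.append(acc)
    return output


# Feeding a unit impulse returns the coefficients themselves.
print(fir_filter([1.0, 2.0, 3.0], [1.0, 0.0, 0.0, 0.0]))
```

Changing the filter response only requires changing the coefficient list, which is the flexibility the text contrasts with analog designs.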
The DSP output is processed by a D/A converter and is low-pass filtered to remove the
effects of digitizing. In summary, the advantages of using the DSP include the following:
• Fewer components
• Stable, deterministic performance
• Wide range of applications
• High noise immunity and power-supply rejection
• Self-test can be built in
• No filter adjustments
• Filters with much closer tolerances
• Adaptive filters easily implemented
[Figure: the analog input x(t) passes through a low-pass antialiasing filter and a sampler and analog-to-digital converter to give x(n); the DSP operation, a finite impulse response (FIR) filter, produces y(n), which passes through a digital-to-analog converter and a reconstruction low-pass filter to give the analog output y(t). Gain-versus-frequency plots compare the ideal, analog, and digital filters around the cutoff f_c. The FIR filter shown is:]
$$y(n) = \sum_{k=0}^{N} c(k) \times x(n-k)$$
Figure 1-2 Digital Signal Processing
The DSP56K family is not designed for a particular application but is designed to execute
commonly used DSP benchmarks in a minimum time for a single-multiplier architecture.
For example, a cascaded, 2nd-order, four-coefficient infinite impulse response (IIR) biquad section has four multiplies for each section. For that algorithm, the theoretical
minimum number of operations for a single-multiplier architecture is four per section. Table 1-1 shows a list of benchmarks with the number of instruction cycles a DSP56K chip
uses compared to the number of multiplies the algorithm requires.
Table 1-1 Benchmark Summary in Instruction Cycles

Benchmark                            Number of Cycles   Number of Algorithm Multiplies
Real Multiply                        3                  1
N Real Multiplies                    2N                 N
Real Update                          4                  1
N Real Updates                       2N                 N
N Term Real Convolution (FIR)        N                  N
N Term Real * Complex Convolution    2N                 N
Complex Multiply                     6                  4
N Complex Multiplies                 4N                 N
Complex Update                       7                  4
N Complex Updates                    4N                 4N
N Term Complex Convolution (FIR)     4N                 4N
Nth - Order Power Series             2N                 2N
2nd - Order Real Biquad Filter       7                  4
N Cascaded 2nd - Order Biquads       4N                 4N
N Radix Two FFT Butterflies          6N                 4N
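To make the "four multiplies per section" figure concrete, here is a minimal sketch of a cascaded 2nd-order IIR biquad. It assumes one common four-coefficient form in which the leading feedforward coefficient is fixed at 1; the manual does not spell out the exact structure, so the form, the function names, and the coefficient names (a1, a2, b1, b2) are assumptions for illustration only.

```python
def biquad_section(x, state, a1, a2, b1, b2):
    """One 2nd-order section using exactly four multiplies.

    Assumed form (not confirmed by the manual):
        w(n) = x(n) - a1*w(n-1) - a2*w(n-2)
        y(n) = w(n) + b1*w(n-1) + b2*w(n-2)
    """
    w1, w2 = state
    w0 = x - a1 * w1 - a2 * w2      # multiplies 1 and 2 (feedback)
    y = w0 + b1 * w1 + b2 * w2      # multiplies 3 and 4 (feedforward)
    return y, (w0, w1)


def cascade(samples, sections):
    """Run samples through a list of (a1, a2, b1, b2) sections in series."""
    states = [(0.0, 0.0)] * len(sections)
    out = []
    for x in samples:
        for i, coeffs in enumerate(sections):
            x, states[i] = biquad_section(x, states[i], *coeffs)
        out.append(x)
    return out


# Impulse response of a single section with only feedback coefficient a1 set.
print(cascade([1.0, 0.0, 0.0], [(-0.5, 0.0, 0.0, 0.0)]))
```

With N sections the inner loop performs 4N multiplies per sample, which matches the 4N entry in the multiplies column for N cascaded biquads.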
These benchmarks and others are used independently or in combination to implement
functions whose characteristics are controlled by the coefficients of the benchmarks being
executed. Useful functions using these and other benchmarks include the following:
Scalar, Vector, and Matrix Arithmetic
Transcendental Function Computation (e.g., Sin(X), Exp(X))
Other Nonlinear Functions
Pseudo-Random-Number Generation
Modulation
  Amplitude
  Frequency
  Phase
Spectral Analysis
  Fast Fourier Transform (FFT)
  Discrete Fourier Transform (DFT)
  Sine/Cosine Transforms
  Moving Average (MA) Modeling
  Autoregressive (AR) Modeling
  ARMA Modeling
Useful applications are based on combining these and other functions. DSP applications
affect almost every area in electronics because any application for analog electronic circuitry can be duplicated using DSP. The advantages in doing so are becoming more
compelling as DSPs become faster and more cost effective. Some typical applications for
DSPs are presented in the following list:
Telecommunication
Tone Generation
Dual-Tone Multifrequency (DTMF)
Subscriber Line Interface
Full-Duplex Speakerphone
Teleconferencing
Voice Mail
Adaptive Differential Pulse Code Modulation (ADPCM) Transcoder
Medium-Rate Vocoders
Noise Cancelation
Repeaters
Integrated Services Digital Network
Navigation
Oceanography
Automatic Vehicle Location
Search and Tracking
Seismic Processing
Oil Exploration
Geological Exploration
As shown in Figure 1-3, the keys to DSP are as follows:
•The Multiply/Accumulate (MAC) operation
•Fetching operands for the MAC
•Program control to provide versatile operation
•Input/Output to move data in and out of the DSP
MAC is the basic operation used in DSP. The DSP56K family of processors has a dual
Harvard architecture optimized for MAC operations. Figure 1-3 shows how the DSP56K
architecture matches the shape of the MAC operation. The two operands, C() and X(), are
directed to a multiply operation, and the result is summed. This process is built into the
chip by using two separate memories (X and Y) to feed a single-cycle MAC. The entire
process must occur under program control to direct the correct operands to the multiplier
and save the accumulator as needed. Since the two memories and the MAC are independent, the DSP can perform two moves, a multiply and an accumulate, in a single
operation. As a result, many of the benchmarks shown in Table 1-1 can be executed at or
near the theoretical maximum speed for a single-multiplier architecture.
1.3 SUMMARY OF DSP56K FAMILY FEATURES
The high throughput of the DSP56K family of processors makes them well suited for communication, high-speed control, numeric processing, computer, and audio
applications. The main features that contribute to this high throughput include:
• Speed — Speeds high enough to easily address applications traditionally served by
low-end floating point DSPs.
[Figure: the FIR filter sum of c(k) × x(n−k) for k = 0 to N, between the A/D and D/A converters, mapped onto the X memory, Y memory, MAC, and program blocks of the DSP.]
Figure 1-3 DSP Hardware Origins
• Precision — The data paths are 24 bits wide, providing 144 dB of dynamic range;
intermediate results held in the 56-bit accumulators can range over 336 dB.
• Parallelism — Each on-chip execution unit (AGU, program control unit, data ALU),
memory, and peripheral operates independently and in parallel with the other units
through a sophisticated bus system. The data ALU, AGU, and program control unit
operate in parallel so that an instruction prefetch, a 24-bit x 24-bit multiplication, a 56-bit addition, two data moves, and two address-pointer updates using one of three
types of arithmetic (linear, modulo, or reverse-carry) can be executed in a single
instruction cycle. This parallelism allows a four-coefficient IIR filter section to be
executed in only four cycles, the theoretical minimum for single-multiplier architecture.
At the same time, the two serial controllers can send and receive full-duplex data, and
the host port can send/receive simplex data.
• Flexibility — While many other DSPs need external communications circuitry to
interface with peripheral circuits (such as A/D converters, D/A converters, or host
processors), the DSP56K family provides on-chip serial and parallel interfaces which
can support various configurations of memory and peripheral modules.
• Sophisticated Debugging — Motorola’s on-chip emulation technology (OnCE) allows
simple, inexpensive, and speed independent access to the internal registers for
debugging. OnCE tells application programmers exactly what the status is within the
registers, memory locations, buses, and even the last five instructions that were
executed.
• Phase-locked Loop (PLL) Based Clocking — PLL allows the chip to use almost any
available external system clock for full-speed operation while also supplying an output
clock synchronized to a synthesized internal core clock. It improves the synchronous
timing of the processors’ external memory port, eliminating the timing skew common
on other processors.
• Invisible Pipeline — The three-stage instruction pipeline is essentially invisible to the
programmer, allowing straightforward program development in either assembly
language or a high-level language such as a full Kernighan and Ritchie C.
• Instruction Set — The instruction mnemonics are MCU-like, making the transition
from programming microprocessors to programming the chip as easy as possible. The
orthogonal syntax controls the parallel execution units. The hardware DO loop
instruction and the repeat (REP) instruction make writing straight-line code obsolete.
• DSP56001 Compatibility — All members of the DSP56K family are downward
compatible with the DSP56001, and also have added flexibility, speed, and
functionality.
• Low Power — As a CMOS part, the DSP56000/DSP56001 is inherently very low
power and the STOP and WAIT instructions further reduce power requirements.
1.4 MANUAL ORGANIZATION
This manual describes the central processing module of the DSP56K family in detail and
provides practical information to help the user:
•Understand the operation of the DSP56K family
•Design parallel communication links
•Design serial communication links
•Code DSP algorithms
•Code communication routines
•Code data manipulation algorithms
•Locate additional support
The following list describes the contents of each section and each appendix:
Section 2 – DSP56K Central Architecture Overview
The DSP56K central architecture consists of the data arithmetic logic unit (ALU), address generation unit (AGU), program control unit, On-Chip Emulation (OnCE)
circuitry, the phase locked loop (PLL) based clock oscillator, and an external memory
port (Port A). This section describes each subsystem and the buses interconnecting
the major components in the DSP56K central processing module.
Section 3 – Data Arithmetic Logic Unit
This section describes in detail the data ALU and its programming model.
Section 4 – Address Generation Unit
This section specifically describes the AGU, its programming model, address indirect
modes, and address modifiers.
Section 5 – Program Control Unit
This section describes in detail the program control unit and its programming model.
Section 6 – Instruction Set Introduction
This section presents a brief description of the syntax, instruction formats, operand/memory references, data organization, addressing modes, and instruction set. A
detailed description of each instruction is given in APPENDIX A - INSTRUCTION SET
DETAILS.
Section 7 – Processing States
This section describes the five processing states (normal, exception, reset, wait, and
stop).
Section 8 – Port A
This section describes the external memory port, its control register, and control
signals.
Section 9 – PLL Clock Oscillator
This section describes the PLL and its functions.
Section 10 – On-Chip Emulator (OnCE)
This section describes the OnCE circuitry and its functions.
Section 11 – Additional Support
This section presents a brief description of current support products and services and
information on where to obtain them.
Appendix A – Instruction Set Details
A detailed description of each DSP56K family instruction, its use, and its effect on the
processor is presented.
Appendix B – Benchmarks
DSP56K family benchmark results are listed in this appendix.
SECTION 2
DSP56K CENTRAL ARCHITECTURE
OVERVIEW
SECTION CONTENTS
SECTION 2.1 DSP56K CENTRAL ARCHITECTURE OVERVIEW
SECTION 2.2 DATA BUSES
SECTION 2.9 PHASE-LOCKED LOOP (PLL) BASED CLOCKING
2.1 DSP56K CENTRAL ARCHITECTURE OVERVIEW
The DSP56K family of processors is built on a standard central processing module. In the
expansion area around the central processing module, the chip can support various configurations of memory and peripheral modules which may change from family member to
family member. This section introduces the architecture and the major components of the
central processing module.
The central components are:
•Data Buses
•Address Buses
•Data Arithmetic Logic Unit (data ALU)
•Address Generation Unit (AGU)
•Program Control Unit (PCU)
•Memory Expansion (Port A)
•On-Chip Emulator (OnCE™) circuitry
•Phase-locked Loop (PLL) based clock circuitry
Figure 2-1 shows a block diagram of a typical DSP56K family processor, including the
central processing module and a nonspecific expansion area for memory and peripherals.
The following paragraphs give brief descriptions of each of the central components. Each
of the components is explained in detail in subsequent chapters.
2.2 DATA BUSES
The DSP56K central processing module is organized around the registers of three independent execution units: the PCU, the AGU, and the data ALU. Data movement between
the execution units occurs over four bidirectional 24-bit buses: the X data bus (XDB), the
Y data bus (YDB), the program data bus (PDB), and the global data bus (GDB). (Certain
instructions treat the X and Y data buses as one 48-bit data bus by concatenating them.)
Data transfers between the data ALU and the X data memory or Y data memory occur
over XDB and YDB, respectively. XDB and YDB are kept local on the chip to maximize
speed and minimize power dissipation. All other data transfers, such as I/O transfers with
peripherals, occur over the GDB. Instruction word prefetches occur in parallel over the
PDB.
The bus structure supports general register-to-register, register-to-memory, and memory-to-register data movement. It can transfer up to two 24-bit words and one 56-bit word in
the same instruction cycle. Transfers between buses occur in the internal bus switch.
[Figure: block diagram of the 24-bit 56K module and its expansion area. The module contains the address generation unit, the data ALU (24 x 24 + 56 → 56-bit MAC, two 56-bit accumulators), the program control unit (program address generator, program decode controller, program interrupt controller), the PLL clock generator, OnCE, the internal data bus switch, and the external address bus switch, external data bus switch, and bus control that form Port A. The expansion area holds the peripheral modules and the program, X memory, and Y memory RAM/ROM expansion. Address buses (XAB, YAB, PAB) are 16 bits; data buses (XDB, YDB, PDB, GDB) are 24 bits. Control pins: MODA/IRQA, MODB/IRQB, MODC/NMI, RESET.]
Figure 2-1 DSP56K Block Diagram
2.3 ADDRESS BUSES
Addresses are specified for internal X data memory and Y data memory on two unidirectional 16-bit buses: the X address bus (XAB) and the Y address bus (YAB). Program memory
addresses are specified on the bidirectional program address bus (PAB). External memory spaces are addressed over a single 16-bit unidirectional address bus driven by a
three-input multiplexer that can select the XAB, the YAB, or the PAB. Only one external
memory access can be made in an instruction cycle. There is no speed penalty if only one
external memory space is accessed in an instruction cycle. However, if two or three external memory spaces are accessed in a single instruction, there will be a one or two
instruction cycle execution delay, respectively.
A bus arbitrator controls external access.
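The delay rule for external accesses can be written down as a one-line calculation. The helper below is only a restatement of the rule given in the text; its name and the range check are assumptions added for illustration.

```python
def external_access_delay(external_spaces_accessed):
    """Extra instruction cycles when one instruction touches several
    external memory spaces (hypothetical helper restating the rule above).
    """
    if not 0 <= external_spaces_accessed <= 3:
        raise ValueError("an instruction can access at most three spaces (X, Y, P)")
    # One external access per instruction cycle is free; each additional
    # external space costs one more instruction cycle.
    return max(0, external_spaces_accessed - 1)


print([external_access_delay(n) for n in range(4)])
```

Placing at most one of the program, X, and Y spaces in external memory for time-critical code therefore avoids the delay entirely.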
2.3.1 Internal Bus Switch
Transfers between buses occur in the internal bus switch. The internal bus switch, which
is similar to a switch matrix, can connect any two internal buses without adding any pipeline delays. This flexibility simplifies programming.
2.3.2 Bit Manipulation Unit
The bit manipulation unit is physically located in the internal bus switch block because the
internal data bus switch can access each memory space. The bit manipulation unit performs bit manipulation operations on memory locations, address registers, control
registers, and data registers over the XDB, YDB, and GDB.
2.4 DATA ALU
The data ALU performs all of the arithmetic and logical operations on data operands. It
consists of four 24-bit input registers, two 48-bit accumulator registers, two 8-bit accumulator extension registers, an accumulator shifter, two data bus shifter/limiter circuits, and
a parallel, single-cycle, nonpipelined Multiply-Accumulator (MAC) unit.
2.5 ADDRESS GENERATION UNIT
The AGU performs all of the address storage and address calculations necessary to indirectly address data operands in memory. It operates in parallel with other chip resources
to minimize address generation overhead. The AGU has two identical address arithmetic
units that can generate two 16-bit addresses every instruction cycle. Each of the arithmetic units can perform three types of arithmetic: linear, modulo, and reverse-carry.
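The three address arithmetic types can be sketched in a few lines. This is a simplified model, not the AGU's actual register interface: the function names, the explicit base and modulus parameters, and the bits parameter are all assumptions, and any alignment or register-encoding rules the hardware may impose are ignored.

```python
ADDR_BITS = 16
ADDR_MASK = (1 << ADDR_BITS) - 1


def linear_update(addr, step):
    """Linear arithmetic: ordinary 16-bit address addition."""
    return (addr + step) & ADDR_MASK


def modulo_update(addr, step, base, modulus):
    """Modulo arithmetic: wrap inside the circular buffer
    [base, base + modulus - 1] (simplified model)."""
    return base + (addr - base + step) % modulus


def _bit_reverse(value, bits):
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def reverse_carry_update(addr, step, bits=ADDR_BITS):
    """Reverse-carry arithmetic: add with the carry propagating from the
    most significant bit toward the least significant bit."""
    total = _bit_reverse(addr, bits) + _bit_reverse(step, bits)
    return _bit_reverse(total & ((1 << bits) - 1), bits)


# Walking an 8-entry table with step 4 visits it in bit-reversed order.
addr, order = 0, []
for _ in range(8):
    order.append(addr)
    addr = reverse_carry_update(addr, 4, bits=3)
print(order)
```

Modulo updates suit delay lines and circular buffers, while reverse-carry updates produce the bit-reversed ordering used to reorder FFT data.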
2.6 PROGRAM CONTROL UNIT
The program control unit performs instruction prefetch, instruction decoding, hardware
DO loop control, and interrupt (or exception) processing. It consists of three components:
the program address generator, the program decode controller, and the program interrupt
controller. It contains a 15-level by 32-bit system stack memory and the following six directly addressable registers: the program counter (PC), loop address (LA), loop counter
(LC), status register (SR), operating mode register (OMR), and stack pointer (SP). The
16-bit PC can address 65,536 locations in program memory space.
There are four mode and interrupt control pins that provide input to the program interrupt
controller. The Mode Select A/External Interrupt Request A (MODA/IRQA) and Mode Select B/External Interrupt Request B (MODB/IRQB) pins select the chip operating mode
and receive interrupt requests from external sources.
The Mode Select C/Non-Maskable Interrupt (MODC/NMI) pin provides further operating
mode options and non-maskable interrupt input.
The RESET pin resets the chip. When it is asserted, it initializes the chip and places it in
the reset state. When it is deasserted, the chip assumes the operating mode indicated by
the MODA, MODB, and MODC pins.
2.7 MEMORY EXPANSION PORT (PORT A)
Port A synchronously interfaces with a wide variety of memory and peripheral devices
over a common 24-bit data bus. These devices include high-speed static RAMs, slower
memory devices, and other DSPs and MPUs in master/slave configurations. This variety
is possible because the expansion bus timing is programmable and can be tailored to
match the speed requirements of the different memory spaces. Not all DSP56K family
members feature a memory expansion port. See the individual device’s User’s Manual to
determine if a particular chip includes this feature.
2.8 ON-CHIP EMULATOR (OnCE)
DSP56K on-chip emulation (OnCE) circuitry allows the user to interact with the DSP56K
and its peripherals non-intrusively to examine registers, memory, or on-chip peripherals.
It provides simple, inexpensive, and speed independent access to the internal registers
for sophisticated debugging and economical system development.
Dedicated OnCE pins allow the user to insert the DSP into its target system and retain
debug control without sacrificing other user accessible on-chip resources. The design
eliminates the costly cabling and the access to processor pins required by traditional emulator systems.
2.9 PHASE-LOCKED LOOP (PLL) BASED CLOCKING
The PLL allows the DSP to use almost any available external system clock for full-speed
operation, while also supplying an output clock synchronized to a synthesized internal
clock. The PLL performs frequency multiplication, skew elimination, and low-power
division.
SECTION 3
DATA ARITHMETIC LOGIC UNIT
SECTION CONTENTS
SECTION 3.1 DATA ARITHMETIC LOGIC UNIT
SECTION 3.2 OVERVIEW AND DATA ALU ARCHITECTURE
3.2.1 Data ALU Input Registers (X1, X0, Y1, Y0)
3.2.2 MAC and Logic Unit
3.2.3 Data ALU A and B Accumulators
SECTION 3.5 DATA ALU PROGRAMMING MODEL
SECTION 3.6 DATA ALU SUMMARY
3.1 DATA ARITHMETIC LOGIC UNIT
This section describes the operation of the Data ALU registers and hardware. It discusses data representation, rounding, and saturation arithmetic used within the Data
ALU, and concludes with a discussion of the programming model.
3.2 OVERVIEW AND DATA ALU ARCHITECTURE
As described in Section 2, the DSP56K family central processing module is composed
of three execution units that operate in parallel. They are the Data ALU, address generation unit (AGU), and the program control unit (PCU) (see Figure 3-1). These three units
are register oriented rather than bus oriented and interface over the system buses with
memory and memory-mapped I/O devices.
The Data ALU (see Figure 3-2) is the first of these execution units to be presented. It balances speed with the capability to process signals that have a wide dynamic range and
performs all arithmetic and logical operations on data operands.
The Data ALU registers may be read or written over the XDB and the YDB as 24- or 48-bit operands. The source operands for the Data ALU, which may be 24, 48, or 56 bits,
always originate from Data ALU registers. The results of all Data ALU operations are
stored in an accumulator.
The 24-bit data words provide 144 dB of dynamic range. This range is sufficient for most
real-world applications since the majority of data converters are 16 bits or less – and certainly not greater than 24 bits. The 56-bit accumulator inside the Data ALU provides 336
dB of internal dynamic range so that no loss of precision will occur due to intermediate
processing. Special circuitry handles data overflows and roundoff errors.
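The dynamic range figures quoted above follow from the usual rule of thumb of about 6 dB per bit. The sketch below, with function names chosen only for this example, compares that rule with the exact value 20·log10(2^bits).

```python
import math


def nominal_dynamic_range_db(bits):
    """Rule-of-thumb dynamic range: 6 dB per bit."""
    return 6 * bits


def exact_dynamic_range_db(bits):
    """Dynamic range computed as 20 * log10(2 ** bits)."""
    return 20.0 * math.log10(2.0 ** bits)


for bits in (24, 56):
    print(bits, nominal_dynamic_range_db(bits), round(exact_dynamic_range_db(bits), 2))
```

The rule of thumb gives the 144 dB and 336 dB figures used in the text; the exact values are slightly larger because each bit contributes a little more than 6 dB.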
The Data ALU can perform any of the following operations in a single instruction cycle:
multiplication, multiply-accumulate with positive or negative accumulation, convergent
rounding, multiply-accumulate with positive or negative accumulation and convergent
rounding, addition, subtraction, a divide iteration, a normalization iteration, shifting, and
logical operations.
The components of the Data ALU are:
• Four 24-bit input registers
• A parallel, single-cycle, nonpipelined multiply-accumulator/logic unit (MAC)
• Two 48-bit accumulator registers
• Two 8-bit accumulator extension registers
• An accumulator shifter
• Two data bus shifter/limiter circuits
The following paragraphs describe each of these components and provide a description
of data representation, rounding, and saturation arithmetic.
Figure 3-1 DSP56K Block Diagram
3.2.1 Data ALU Input Registers (X1, X0, Y1, Y0)
X1, X0, Y1, and Y0 are four 24-bit, general-purpose data registers. They can be treated
as four independent, 24-bit registers or as two 48-bit registers called X and Y, developed
by concatenating X1:X0 and Y1:Y0, respectively. X1 is the most significant word in X and
Y1 is the most significant word in Y. The registers serve as input buffer registers between
the XDB or YDB and the MAC unit. They act as Data ALU source operands and allow
new operands to be loaded for the next instruction while the current instruction uses the
register contents. The registers may also be read back out to the appropriate data bus to
implement memory-delay operations and save/restore operations for interrupt service
routines.
[Figure: the X data bus and Y data bus (24 bits each) load the input registers X0, X1, Y0, and Y1, which feed the multiplier; the accumulator, rounding, and logic unit, together with a 56-bit shifter, updates the 56-bit accumulators A and B, whose contents return to the data buses through the shifter/limiter.]
Figure 3-2 Data ALU
3.2.2 MAC and Logic Unit
The MAC and logic unit shown in Figure 3-3 conduct the main arithmetic processing and
perform all calculations on data operands in the DSP.
For arithmetic instructions, the unit accepts up to three input operands and outputs one
56-bit result in the following form: extension:most significant product:least significant
product (EXT:MSP:LSP). The operation of the MAC unit occurs independently and in parallel with XDB and YDB activity, and its registers facilitate buffering for Data ALU inputs
and outputs. Latches on the MAC unit input permit writing an input register which is the
source for a Data ALU operation in the same instruction.
The arithmetic unit contains a multiplier and two accumulators. The input to the multiplier
can only come from the X or Y registers (X1, X0, Y1, Y0). The multiplier executes 24-bit
x 24-bit, parallel, twos-complement fractional multiplies. The 48-bit product is right justified and added to the 56-bit contents of either the A or B accumulator. The 56-bit sum is
stored back in the same accumulator (see Figure 3-3). An 8-bit adder, which acts as an
extension accumulator for the MAC array, accommodates overflow of up to 256 and allows the two 56-bit accumulators to be added to and subtracted from each other. The
extension adder output is the EXT portion of the MAC unit output. This multiply/accumulate operation is not pipelined, but is a single-cycle operation. If the instruction specifies a
multiply without accumulation (MPY), the MAC clears the accumulator and then adds the
contents to the product.
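The multiply/accumulate data path described above can be sketched in a few lines of Python. This is only an illustrative model, not Motorola's implementation: the function names are hypothetical, accumulators are held as 56-bit integers (EXT:MSP:LSP), and the one-bit left shift of the product is the fractional alignment discussed in Section 3.3. Condition codes, rounding, and scaling are not modelled.

```python
MASK56 = (1 << 56) - 1


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a twos-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mac(acc56, x24, y24):
    """Return acc + x*y as a 56-bit EXT:MSP:LSP pattern (hypothetical helper)."""
    product = to_signed(x24, 24) * to_signed(y24, 24)  # signed integer product
    product <<= 1  # fractional alignment: drop the duplicate sign bit
    return (to_signed(acc56, 56) + product) & MASK56


def mpy(x24, y24):
    """Multiply without accumulation: same as MAC into a cleared accumulator."""
    return mac(0, x24, y24)
```

In this model, $400000 times $400000 (0.5 times 0.5) leaves $200000 (0.25) in the MSP, and $800000 times $800000 (-1 times -1) gives +1.0, which only fits because the extension bits are available.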
In summary, the results of all arithmetic instructions are valid (sign-extended and zero-filled) 56-bit operands in the form of EXT:MSP:LSP (A2:A1:A0 or B2:B1:B0). When a 56-bit result is to be stored as a 24-bit operand, the LSP can be simply truncated, or it can be
rounded (using convergent rounding) into the MSP.
Convergent rounding (round-to-nearest) is performed when the instruction (for example,
the signed multiply-accumulate and round (MACR) instruction) specifies adding the multiplier’s product to the contents of the accumulator. The scaling mode bits in the status
register specify which bit in the accumulator shall be rounded.
The logic unit performs the logical operations AND, OR, EOR, and NOT on Data ALU registers. It is 24 bits wide and operates on data in the MSP portion of the accumulator. The
LSP and EXT portions of the accumulator are not affected.
Figure 3-3 MAC Unit
3.2.3 Data ALU A and B Accumulators
The Data ALU features two general-purpose, 56-bit accumulators, A and B. Each consists of three concatenated registers (A2:A1:A0 and B2:B1:B0, respectively). The 8-bit
sign extension (EXT) is stored in A2 or B2 and is used when more than 48-bit accuracy is
needed; the 24-bit most significant product (MSP) is stored in A1 or B1; the 24-bit least
significant product (LSP) is stored in A0 or B0 as shown in Figure 3-4.

    DATA ALU ACCUMULATOR REGISTERS
    Accumulator A (bits 55-0):  * | A2 (EXT, bits 7-0) | A1 (MSP, bits 23-0) | A0 (LSP, bits 23-0)
    Accumulator B (bits 55-0):  * | B2 (EXT, bits 7-0) | B1 (MSP, bits 23-0) | B0 (LSP, bits 23-0)
    *Read as sign extension bits, written as don’t care.
Figure 3-4 DATA ALU Accumulator Registers
Overflow occurs when a source operand requires more bits for accurate representation
than are available in the destination. The 8-bit extension registers offer protection
against overflow. In the DSP56K chip family, the extreme values that a word operand
can assume are -1 and +0.9999998. If the sum of two numbers is less than -1 or
greater than +0.9999998, the result (which cannot be represented in a 24-bit word operand) has underflowed or overflowed. The 8-bit extension registers can accurately represent the result of 255 overflows or 255 underflows. Whenever the accumulator extension
registers are in use, the V bit in the status register is set.
Automatic sign extension occurs when the 56-bit accumulator is written with a smaller
operand of 48 or 24 bits. A 24-bit operand is written to the MSP (A1 or B1) portion of the
accumulator, the LSP (A0 or B0) portion is zero filled, and the EXT (A2 or B2) portion is
sign extended from MSP. A 48-bit operand is written into the MSP:LSP portion (A1:A0 or
B1:B0) of the accumulator, and the EXT portion is sign extended from MSP. No sign
extension occurs if an individual 24-bit register is written (A1, A0, B1, or B0). When either
A or B is read, it may be optionally scaled one bit left or one bit right for block floating-point arithmetic. Sign extension can also occur when writing A or B from the XDB and/or
YDB or with the results of certain Data ALU operations (such as the transfer conditionally
(Tcc) or transfer Data ALU register (TFR) instructions).
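The automatic sign extension rules above can be illustrated with a small Python sketch. The helper names are hypothetical and the accumulator is modelled as a 56-bit integer; only whole-accumulator writes are shown, since writes to an individual register (A1, A0, B1, or B0) do not sign extend.

```python
def write_word(word24):
    """24-bit operand written to A or B: MSP = operand, LSP = 0, EXT = sign."""
    word24 &= 0xFFFFFF
    ext = 0xFF if word24 & 0x800000 else 0x00
    return (ext << 48) | (word24 << 24)


def write_long(long48):
    """48-bit operand written to A or B: MSP:LSP = operand, EXT = sign."""
    long48 &= (1 << 48) - 1
    ext = 0xFF if long48 & (1 << 47) else 0x00
    return (ext << 48) | long48


def split(acc56):
    """Return the (EXT, MSP, LSP) fields of a 56-bit accumulator value."""
    return (acc56 >> 48) & 0xFF, (acc56 >> 24) & 0xFFFFFF, acc56 & 0xFFFFFF
```

For example, writing the word $800000 yields EXT:MSP:LSP = $FF:$800000:$000000, while writing $3F0000 yields $00:$3F0000:$000000.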
Overflow protection occurs when the contents of A or B are transferred over the XDB and
YDB by substituting a limiting constant for the data. Limiting does not affect the content
of A or B; only the value transferred over the XDB or YDB is limited. This overflow protection occurs after the contents of the accumulator have been shifted according to the
scaling mode. Shifting and limiting occur only when the entire 56-bit A or B accumulator
is specified as the source for a parallel data move over the XDB or YDB. When individual
registers A0, A1, A2, B0, B1, or B2 are specified as the source for a parallel data move,
shifting and limiting are not performed.
3.2.4 Accumulator Shifter
The accumulator shifter (see Figure 3-3) is an asynchronous parallel shifter with a 56-bit
input and a 56-bit output that is implemented immediately before the MAC accumulator
input. The source accumulator shifting operations are as follows:
• No Shift (Unmodified)
• 1-Bit Left Shift (Arithmetic or Logical) ASL, LSL, ROL
• 1-Bit Right Shift (Arithmetic or Logical) ASR, LSR, ROR
• Force to zero
3.2.5 Data Shifter/Limiter
The data shifter/limiter circuits (see Figure 3-3) provide special post-processing on data
read from the Data ALU A and B accumulators out to the XDB or YDB. There are two independent shifter/limiter circuits (one for XDB and one for the YDB); each consists of a
shifter followed by a limiting circuit.
3.2.5.1 Limiting (Saturation Arithmetic)
The A and B accumulators serve as buffer registers between the MAC unit and the XDB
and/or YDB. They act both as Data ALU source and destination operands. Test logic exists
in each accumulator register to support the operation of the data shifter/limiter circuits.
This test logic detects overflows out of the data shifter so that the limiter can substitute
one of several constants to minimize errors due to the overflow. This process is called saturation arithmetic.
The Data ALU A and B accumulators have eight extension bits. Limiting occurs when the
extension bits are in use and either A or B is the source being read over XDB or YDB. If
the contents of the selected source accumulator can be represented without overflow in
the destination operand size (i.e., accumulator extension register not in use), the data limiter is disabled, and the operand is not modified. If the contents of the selected source
accumulator cannot be represented without overflow in the destination operand size, the
data limiter will substitute a limited data value with maximum magnitude (saturated) and
with the same sign as the source accumulator contents: $7FFFFF for 24-bit or $7FFFFF
FFFFFF for 48-bit positive numbers, $800000 for 24-bit or $800000 000000 for 48-bit negative numbers. The value in the accumulator
register is not shifted and can be reused within the Data ALU. When limiting does occur,
a flag is set and latched in the status register. Two limiters allow two word operands to be
limited independently in the same instruction cycle. The two data limiters can also be combined to form one 48-bit data limiter for long-word operands.
For example, if the source operand were 01.100 (+1.5 decimal) and the destination register were only four bits, the destination register would contain 1.100 (-0.5 decimal) after
the transfer, assuming signed fractional arithmetic. This is clearly in error as overflow has
occurred. To minimize the error due to overflow, it is preferable to write the maximum
(“limited”) value the destination can assume. In the example, the limited value would be
0.111 (+0.875 decimal), which is clearly closer to +1.5 than -0.5 and therefore introduces less error.
Figure 3-5 shows the effects of saturation arithmetic on a move from register A1 to register X0. The instruction “MOVE A1,X0” causes a move without limiting, and the instruction
“MOVE A,X0” causes a move of the same 24 bits with limiting. The error without limiting
is 2.0; whereas, it is 0.0000001 with limiting. Table 3-1 shows a more complete set of
limiting situations.
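The limiting behavior can be sketched as follows. This is a minimal model under stated assumptions: the function names are hypothetical, the accumulator is a 56-bit integer, and the no-scaling mode is assumed (the data shifter passes the value unshifted). The latched limit flag in the status register is not modelled.

```python
MASK56 = (1 << 56) - 1


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a twos-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def raw_a1(acc56):
    """Bits 47-24 with no limiting, as when A1 or B1 alone is the source."""
    return (acc56 >> 24) & 0xFFFFFF


def limit_word(acc56):
    """24-bit value driven on XDB or YDB when the whole accumulator is read."""
    value = to_signed(acc56, 56)
    if value > (1 << 47) - 1:   # extension in use, positive
        return 0x7FFFFF
    if value < -(1 << 47):      # extension in use, negative
        return 0x800000
    return raw_a1(acc56)


def limit_long(acc56):
    """48-bit value driven on XDB:YDB for a long-word read."""
    value = to_signed(acc56, 56)
    if value > (1 << 47) - 1:
        return 0x7FFFFFFFFFFF
    if value < -(1 << 47):
        return 0x800000000000
    return acc56 & ((1 << 48) - 1)
```

With the accumulator holding +1.0 ($00:$800000:$000000), `raw_a1` returns $800000 (-1.0, an error of 2.0), whereas `limit_word` returns $7FFFFF, which matches the contrast between “MOVE A1,X0” and “MOVE A,X0” described above.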
3.2.5.2 Scaling
The data shifters can shift data one bit to the left or one bit to the right, or pass the data
unshifted. Each data shifter has a 24-bit output with overflow indication and is controlled
by the scaling mode bits in the status register. These shifters permit dynamic scaling of
fixed-point data without modifying the program code. For example, this permits block
floating-point algorithms such as fast Fourier transforms to be implemented in a regular
fashion.
Table 3-1 Limited Data Values

Destination        Source                          Accumulator   Limited Value (Hexadecimal)   Type of
Memory Reference   Operand                         Sign          XDB       YDB                 Access
X                  X:A, X:B                        +             7FFFFF    --                  One 24 bit
                                                   -             800000    --
Y                  Y:A, Y:B                        +             --        7FFFFF              One 24 bit
                                                   -             --        800000
X and Y            X:A Y:A, X:A Y:B,               +             7FFFFF    7FFFFF              Two 24 bit
                   X:B Y:A, X:B Y:B,               -             800000    800000
                   L:AB, L:BA
L (X:Y)            L:A, L:B                        +             7FFFFF    FFFFFF              One 48 bit
                                                   -             800000    000000

3.3 DATA REPRESENTATION AND ROUNDING
The DSP56K uses a fractional data representation for all Data ALU operations. Figure 3-7
shows the bit weighting of words, long words, and accumulator operands for this representation. The decimal points are all aligned and are left justified.
Data must be converted to a fractional number by scaling before being used by the DSP
or the user will have to be very careful in how the DSP manipulates the data. Moving $3F
to a 24-bit Data ALU register does not result in the contents being $00003F as might be
expected. Assuming numbers are fractional, the DSP left justifies rather than right justifies. As a result, storing $3F in a 24-bit register results in the contents being $3F0000.
The simplest example of scaling is to convert all integer numbers to fractional numbers
by shifting the decimal 24 places to the left (see Figure 3-6). Thus, the data has not
changed; only the position of the decimal has moved.
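The relationship between a 24-bit word and the fraction it represents can be sketched in Python. The function names are hypothetical; the sketch assumes the twos-complement fractional format described in this section, in which the word value is the signed integer divided by 2^23.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a twos-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def word_to_fraction(word24):
    """Fractional value represented by a 24-bit word."""
    return to_signed(word24, 24) / 2**23


def fraction_to_word(x):
    """Nearest 24-bit word for x, clamped to the range -1 to 1 - 2^-23."""
    scaled = round(x * 2**23)
    scaled = max(-(2**23), min(2**23 - 1, scaled))
    return scaled & 0xFFFFFF


def left_justify_byte(value8):
    """Place an 8-bit value in the top byte of a 24-bit word ($3F -> $3F0000)."""
    return (value8 & 0xFF) << 16
```

Here `left_justify_byte(0x3F)` gives $3F0000, which `word_to_fraction` reads as 63/128, while the right-justified $00003F would be read as 63/2^23.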
    S 3F .   converts to   S . 3F
    S = SIGN BIT
    3F = HEXADECIMAL DATA TO BE CONVERTED
Figure 3-6 Integer-to-Fractional Data Conversion

For words and long words, the most negative number that can be represented is -1
whose internal representation is $800000 and $800000000000, respectively. The most
positive word is $7FFFFF or 1 - 2^-23 and the most positive long word is $7FFFFFFFFFFF
or 1 - 2^-47. These limitations apply to all data stored in memory and to data stored in the
Data ALU input buffer registers. The extension registers associated with the accumulators allow word growth so that the most positive number that can be used is approximately 256 and the most negative number is approximately -256. When the accumulator
extension registers are in use, the data contained in the accumulators cannot be stored
exactly in memory or other registers. In these cases, the data must be limited to the most
positive or most negative number consistent with the size of the destination and the sign
of the accumulator (the most significant bit (MSB) of the extension register).
To maintain alignment of the binary point when a word operand is written to accumulator
A or B, the operand is written to the most significant accumulator register (A1 or B1), and
its MSB is automatically sign extended through the accumulator extension register. The
least significant accumulator register is automatically cleared. When a long-word operand is written to an accumulator, the least significant word of the operand is written to the
least significant accumulator register A0 or B0 and the most significant word is written to
A1 or B1 (see Figure 3-7).

    WORD OPERAND (X1, X0, Y1, Y0, A1, A0, B1, B0): bit weights -2^0 to 2^-23
    LONG-WORD OPERAND (X1:X0 = X, Y1:Y0 = Y, A1:A0 = A10, B1:B0 = B10): bit weights -2^0 to 2^-47
    ACCUMULATOR A OR B (A2, B2 : A1, B1 : A0, B0): bit weights -2^8 to 2^-47
    Word operand written to an accumulator: A2, B2 = SIGN EXTENSION; A1, B1 = OPERAND; A0, B0 = ZERO
Figure 3-7 Bit Weighting and Alignment of Operands
    TWOS COMPLEMENT INTEGER (N BITS):     -2^(N-1) TO [+2^(N-1) - 1]
    TWOS COMPLEMENT FRACTIONAL (N BITS):  -1 TO [+1 - 2^-(N-1)]
    FRACTIONAL = INTEGER EXCEPT FOR x AND ÷
Figure 3-8 Integer/Fractional Number Comparison

A comparison between integer and fractional number representation is shown in Figure
3-8. The number representation for integers is between ± 2^(N-1); whereas, the fractional
representation is limited to numbers between ± 1. To convert from an integer to a fractional number, the integer must be multiplied by a scaling factor so the result will always
be between ± 1. The representation of integer and fractional numbers is the same if the
numbers are added or subtracted but is different if the numbers are multiplied or divided.
An example of two numbers multiplied together is given in Figure 3-9. The key difference
is that the extra bit in the integer multiplication is used as a duplicate sign bit and as the
least significant bit (LSB) in the fractional multiplication. The advantages of fractional
data representation are as follows:
• The MSP (left half) has the same format as the input data.
• The LSP (right half) can be rounded into the MSP without shifting or updating the
exponent.
• A significant bit is not lost through sign extension.
• Conversion to floating-point representation is easier because the industry-standard
floating-point formats use fractional mantissas.
• Coefficients for most digital filters are derived as fractions by the high-level language
programs used in digital-filter design packages, which implies that the results can be
used without the extensive data conversions that other formats require.
Should integer arithmetic be required in an application, shifting a one or zero, depending
on the sign, into the MSB converts a fraction to an integer.
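The difference between integer and fractional multiplication can be made concrete with a short Python sketch using small word sizes. The function names and the 4-bit operands are illustrative assumptions; the only difference between the two results is where the extra bit of the 2N-bit product ends up.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a twos-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def integer_multiply(a, b, n):
    """N-bit by N-bit signed multiply; the extra bit is a duplicate sign bit."""
    return (to_signed(a, n) * to_signed(b, n)) & ((1 << (2 * n)) - 1)


def fractional_multiply(a, b, n):
    """Same product shifted left once; the extra bit becomes a zero LSB."""
    return ((to_signed(a, n) * to_signed(b, n)) << 1) & ((1 << (2 * n)) - 1)
```

For the 4-bit operands 0100 and 0100, the integer product is 00010000 (4 x 4 = 16) and the fractional product is 00100000 (0.5 x 0.5 = 0.25). The upper four bits of the fractional result, 0010, are already in the same S.3 format as the inputs.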
The Data ALU MAC performs rounding of the accumulator register to single precision if
requested in the instruction (the A1 or B1 register is rounded according to the contents of
the A0 or B0 register). The rounding method is called round-to-nearest (even) number, or
convergent rounding. The usual rounding method rounds up any value above one-half
and rounds down any value below one-half. The question arises as to which way one-half should be rounded. If it is always rounded one way, the results will eventually be
biased in that direction. Convergent rounding solves the problem by rounding down if the
number is even (LSB=0) and rounding up if the number is odd (LSB=1). Figure 3-10
shows the four cases for rounding a number in the A1 (or B1) register. If scaling is set in
the status register, the resulting number will be rounded as it is put on the data bus. However, the contents of the register are not scaled.
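The four rounding cases can be expressed as a short Python sketch. The function name is hypothetical, the accumulator is modelled as a 56-bit integer, and the no-scaling mode is assumed, so the rounding position is the boundary between A1 and A0.

```python
MASK56 = (1 << 56) - 1


def convergent_round(acc56):
    """Round A2:A1 according to A0 (ties go to the even value) and clear A0."""
    a0 = acc56 & 0xFFFFFF
    upper = (acc56 & MASK56) >> 24          # A2:A1 as one 32-bit value
    if a0 > 0x800000 or (a0 == 0x800000 and upper & 1):
        upper += 1                          # round up (add 1 to A1)
    return (upper << 24) & MASK56           # A0 is always cleared
```

A value exactly halfway between two words is rounded up only when the LSB of A1 is 1, so repeated rounding does not drift in one direction.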
CASE I: IF A0 < $800000 (1/2), THEN ROUND DOWN (ADD NOTHING)
                       A2        A1               A0
    BEFORE ROUNDING    XX..XX    XXX...XXX0100    011XXX....XXX
    AFTER ROUNDING     XX..XX    XXX...XXX0100    000........000 *

CASE II: IF A0 > $800000 (1/2), THEN ROUND UP (ADD 1 TO A1)
                       A2        A1               A0
    BEFORE ROUNDING    XX..XX    XXX...XXX0100    1110XX....XXX
    AFTER ROUNDING     XX..XX    XXX...XXX0101    000........000 *

CASE III: IF A0 = $800000 (1/2), AND THE LSB OF A1 = 0, THEN ROUND DOWN (ADD NOTHING)
                       A2        A1               A0
    BEFORE ROUNDING    XX..XX    XXX...XXX0100    10000......000
    AFTER ROUNDING     XX..XX    XXX...XXX0100    000........000 *

CASE IV: IF A0 = $800000 (1/2), AND THE LSB OF A1 = 1, THEN ROUND UP (ADD 1 TO A1)
                       A2        A1               A0
    BEFORE ROUNDING    XX..XX    XXX...XXX0101    10000......000
    AFTER ROUNDING     XX..XX    XXX...XXX0110    000........000 *

    A2 = bits 55-48, A1 = bits 47-24, A0 = bits 23-0
    *A0 is always clear; performed during RND, MPYR, MACR
Figure 3-10 Convergent Rounding
3.4 DOUBLE PRECISION MULTIPLY MODE
The Data ALU double precision multiply operation multiplies two 48-bit operands with a
96-bit result. The processor enters the dedicated Double Precision Multiply Mode when
the user sets bit 14 (DM) of the Status Register (bit 6 of the MR register). The mode is
disabled by clearing the DM bit. For information on the DM bit, see Section 5.4.2.13 Double Precision Multiply Mode (Bit 14).
CAUTION:
While in the Double Precision Multiply Mode, only the double precision multiply algorithms
shown in Figure 3-11, Figure 3-12, and Figure 3-13 may be executed by the Data ALU;
any other Data ALU operation will give indeterminate results.
Figure 3-11 shows the full double precision multiply algorithm. To allow for pipeline
delay, the ANDI instruction should not be immediately followed by a Data ALU instruction. For example, the ORI instruction sets the DM mode bit, but, due to the instruction
execution pipeline, the Data ALU enters the Double Precision Multiply mode only after
one instruction cycle. The ANDI instruction clears the DM mode bit, but, due to the
instruction execution pipeline, the Data ALU leaves the mode after one instruction cycle.

                                        ; MSP*MSP➞a
    move    a,l:(r0)+
    andi    #$bf,mr                     ;exit mode
    non-Data ALU operation              ;pipeline delay
Figure 3-11 Full Double Precision Multiply Algorithm
The double precision multiply algorithm uses the Y0 register at all stages. If the use of
the Data ALU is required in an interrupt service routine, Y0 should be saved together
with other Data ALU registers to be used, and should be restored before leaving the
interrupt routine.
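The arithmetic behind a 48-bit by 48-bit multiply built from 24-bit pieces can be sketched in Python. This is not a model of the Double Precision Multiply Mode hardware or of the instruction sequence in Figure 3-11; it only shows, under the assumption of fractional (left-shifted) products and with hypothetical names, how four 24-bit partial products combine into the 96-bit result.

```python
MASK96 = (1 << 96) - 1


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a twos-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def dp_multiply(a48, b48):
    """96-bit fractional product of two 48-bit operands via partial products."""
    a = to_signed(a48, 48)
    b = to_signed(b48, 48)
    a_msp, a_lsp = a >> 24, a & 0xFFFFFF    # MSP is signed, LSP is unsigned
    b_msp, b_lsp = b >> 24, b & 0xFFFFFF
    total = a_lsp * b_lsp                           # LSP*LSP
    total += (a_msp * b_lsp + a_lsp * b_msp) << 24  # the two cross products
    total += (a_msp * b_msp) << 48                  # MSP*MSP
    return (total << 1) & MASK96                    # fractional alignment
```

Seen this way, a single precision times double precision multiply is the special case in which one operand has no LSP, so the two partial products involving that LSP drop out.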
If just single precision times double precision multiply is desired, two of the multiply operations may be deleted and replaced by suitable initialization and clearing of the accumulator and Y0. Figure 3-12 shows the single precision times double precision algorithm.
    [Memory map (Y: and X:): operands SP, MSP1, and LSP1; results DP1, DP2, and DP3; pointers R0, R1, and R5]
    DP3_DP2_DP1 = MSP1_LSP1 x SP

    clr     a           #0,y0                       ;clear a and y0
    ori     #$40,mr                                 ;enter DP mode
    move    x:(r1)+,x0  y:(r5)+,y1                  ;load LSP1 and SP
    mac     x0,y1,a     x:(r1)+,x1                  ;LSP1*SP➞a,
Figure 3-13 shows a single precision times double precision multiply-accumulate algorithm. First, the least significant parts of the double precision values are multiplied by the
single precision values and accumulated in the “Double Precision Multiply” mode. Then
the DM bit is cleared and the least significant part of the result is saved to memory. The
most significant parts of the double precision values are then multiplied by the single precision values and accumulated using regular MAC instructions. Note that the maximum
number of single times double MAC operations in this algorithm is limited to 255 since
overflow may occur (the A2 register is just eight bits long). If a longer sequence is
required, it should be split into sub-sequences each with no more than 255 MAC operations.
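The two-pass structure described above can be sketched in Python. This is an integer-arithmetic illustration only, with hypothetical names: it omits the fractional left shift and does not model accumulator width, but it shows why the low-part sum is shifted down by one word (a2:a1 moved to a1:a0) before the high parts are accumulated.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a twos-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def sp_times_dp_mac(dp_values, sp_values):
    """Return (DP3_DP2, DP1) for the sum of dp_i * sp_i (integer model)."""
    if len(dp_values) != len(sp_values) or len(dp_values) > 255:
        raise ValueError("need equal-length sequences of at most 255 terms")
    # Pass 1: accumulate LSPi (unsigned) times SPi (signed).
    low_sum = 0
    for dp, sp in zip(dp_values, sp_values):
        low_sum += (dp & 0xFFFFFF) * to_signed(sp, 24)
    dp1 = low_sum & 0xFFFFFF        # least significant word of the result
    high_sum = low_sum >> 24        # carry the rest into the second pass
    # Pass 2: accumulate MSPi (signed) times SPi (signed).
    for dp, sp in zip(dp_values, sp_values):
        high_sum += (to_signed(dp, 48) >> 24) * to_signed(sp, 24)
    return high_sum, dp1
```

The full result is `high_sum * 2**24 + dp1`, which equals the direct sum of the signed 48-bit by 24-bit products.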
    [Memory map (Y: and X:): operands SPi, MSPi, and LSPi; results DP1, DP2, and DP3; pointers R0, R1, and R5]
    DP3_DP2_DP1 = ∑ MSPi_LSPi x SPi

    move    #N-1,m5
    clr     a           #0,y0                       ;clear a and y0
    ori     #$40,mr                                 ;enter DP mode
    move    x:(r1)+,x0  y:(r5)+,y1                  ;load LSPi and SPi
    rep     #N                                      ;0<N<256
    mac     x0,y1,a     x:(r1)+,x0  y:(r5)+,y1      ;LSPi*SPi➞a
    andi    #$bf,mr                                 ;exit DP mode
    move    a0,x:(r0)+                              ;save DP1
    move    a1,y0
    move    a2,a
    move    y0,a0                                   ;a2:a1➞a1:a0
    rep     #N
    mac     x0,y1,a     x:(r1)+,x0  y:(r5)+,y1      ;load MSPi and SPi
    move    a,l:(r0)+                               ;save DP3_DP2
Figure 3-13 Single × Double Multiply-Accumulate Algorithm
3.5 DATA ALU PROGRAMMING MODEL
The Data ALU features 24-bit input/output data registers that can be concatenated to accommodate 48-bit data and two 56-bit accumulators, which are segmented into three 24-bit pieces that can be transferred over the buses. Figure 3-14 illustrates how the registers
in the programming model are grouped.
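    DATA ALU INPUT REGISTERS
    X (bits 47-0):  X1 (bits 23-0) | X0 (bits 23-0)
    Y (bits 47-0):  Y1 (bits 23-0) | Y0 (bits 23-0)

    DATA ALU ACCUMULATOR REGISTERS
    A (bits 55-0):  * (bits 23-8) A2 (bits 7-0) | A1 (bits 23-0) | A0 (bits 23-0)
    B (bits 55-0):  * (bits 23-8) B2 (bits 7-0) | B1 (bits 23-0) | B0 (bits 23-0)
    *Read as sign extension bits, written as don’t care.
Figure 3-14 DSP56K Programming Model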
3.6 DATA ALU SUMMARY
The Data ALU performs arithmetic operations involving multiply and accumulate operations. It executes all instructions in one machine cycle and is not pipelined. The two 24-bit
numbers being multiplied can come from the X registers (X0 or X1) or Y registers (Y0 or
Y1). After multiplication, they are added (or subtracted) with one of the 56-bit accumulators and can be convergently rounded to 24 bits. The convergent-rounding forcing
function detects the $800000 condition in the LSP and makes the correction as necessary. The final result is then stored in one of the accumulators as a valid 56-bit number.
The condition code bits are set based on the rounded output of the logic unit.
SECTION 4
ADDRESS GENERATION UNIT

4.1 ADDRESS GENERATION UNIT AND ADDRESSING MODES
This section contains three major subsections. The first subsection describes the hardware architecture of the address generation unit (AGU), the second subsection
describes the programming model, and the third subsection describes the addressing
modes, explaining how the Rn, Nn, and Mn registers work together to form a memory
address.
4.2 AGU ARCHITECTURE
The AGU is shown in the DSP56K block diagram in Figure 4-1. It uses integer arithmetic
to perform the effective address calculations necessary to address data operands in
memory, and contains the registers used to generate the addresses. It implements linear, modulo, and reverse-carry arithmetic, and operates in parallel with other chip
resources to minimize address-generation overhead.
The AGU is divided into two identical halves, each of which has an address arithmetic
logic unit (ALU) and four sets of three registers (see Figure 4-2). They are the address
registers (R0 - R3 and R4 - R7), offset registers (N0 - N3 and N4 - N7), and the modifier
registers (M0 - M3 and M4 - M7). The eight Rn, Nn, and Mn registers are treated as register triplets — e.g., only N2 and M2 can be used to update R2. The eight triplets are
R0:N0:M0, R1:N1:M1, R2:N2:M2, R3:N3:M3, R4:N4:M4, R5:N5:M5, R6:N6:M6, and
R7:N7:M7.
The two arithmetic units can generate two 16-bit addresses every instruction cycle — one
for any two of the XAB, YAB, or PAB. The AGU can directly address 65,536 locations on
the XAB, 65,536 locations on the YAB, and 65,536 locations on the PAB. The two independent address ALUs work with the two data memories to feed the data ALU two
operands in a single cycle. Each operand may be addressed by an Rn, Nn, and Mn triplet.
4.2.1 Address Register Files (Rn)
Each of the two address register files (see Figure 4-2) consists of four 16-bit registers. The
two files contain address registers R0 - R3 and R4 - R7, which usually contain addresses
used as pointers to memory. Each register may be read or written by the global data bus
(GDB). When read by the GDB, 16-bit registers are written into the two least significant
bytes of the GDB, and the most significant byte is set to zero. When written from the GDB,
only the two least significant bytes are written, and the most significant byte is truncated.
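The width conversion between a 16-bit AGU register and the 24-bit GDB amounts to zero extension on a read and truncation on a write. A minimal Python sketch, with hypothetical function names:

```python
def agu_register_to_gdb(reg16):
    """Register read: 16 bits land in the two low bytes, top byte is zero."""
    return reg16 & 0xFFFF


def gdb_to_agu_register(gdb24):
    """Register write: only the low 16 bits of the 24-bit bus value are kept."""
    return (gdb24 & 0xFFFFFF) & 0xFFFF
```

For instance, writing the bus value $123456 into R0 would leave $3456 in the register.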
Each address register can be used as input to its associated address ALU for a register
update calculation. Each register can also be written by the output of its respective address ALU. One Rn register from the low address ALU and one Rn register from the high
address ALU can be accessed in a single instruction.
Figure 4-1 DSP56K Block Diagram
4.2.2 Offset Register Files (Nn)
Each of two offset register files shown in Figure 4-2 consists of four 16-bit registers. The
two files contain offset registers N0 - N3 and N4 - N7, which contain either data or offset
values used to update address pointers. Each offset register can be read or written by the
GDB. When read by the GDB, the contents of a register are placed in the two least significant bytes, and the most significant byte on the GDB is zero extended. When a register
is written, only the least significant 16 bits of the GDB are used; the upper portion is
truncated.

    LOW ADDRESS ALU:     R0 - R3, N0 - N3, M0 - M3, ADDRESS ALU
    HIGH ADDRESS ALU:    R4 - R7, N4 - N7, M4 - M7, ADDRESS ALU
    TRIPLE MULTIPLEXER:  outputs to XAB, YAB, and PAB
    GLOBAL DATA BUS
    Legend: 16 bits, 24 bits
Figure 4-2 AGU Block Diagram
4.2.3 Modifier Register Files (Mn)
Each of the two modifier register files shown in Figure 4-2 consists of four 16-bit registers.
The two files contain modifier registers M0 - M3 and M4 - M7, which specify the type of
arithmetic used during address register update calculations or contain data. Each modifier
register can be read or written by the GDB. When read by the GDB, the contents of a register are placed in the two least significant bytes, and the most significant byte on the GDB
is zero extended. When a register is written, only the least significant 16 bits of the GDB
are used; the upper portion is truncated. Each modifier register is preset to $FFFF during
a processor reset.
4.2.4 Address ALU
The two address ALUs are identical (see Figure 4-2) in that each contains a 16-bit full
adder (called an offset adder), which can add 1) plus one, 2) minus one, 3) the contents
of the respective offset register N, or 4) the twos complement of N to the contents of the
selected address register. A second full adder (called a modulo adder) adds the summed
result of the first full adder to a modulo value, M or minus M, where M-1 is stored in the
respective modifier register. A third full adder (called a reverse-carry adder) can add 1)
plus one, 2) minus one, 3) the offset N (stored in the respective offset register), or 4) minus
N to the selected address register with the carry propagating in the reverse direction —
i.e., from the most significant bit (MSB) to the least significant bit (LSB). The offset adder
and the reverse-carry adder are in parallel and share common inputs. The only difference
between them is that the carry propagates in opposite directions. Test logic determines
which of the three summed results of the full adders is output.
Each address ALU can update one address register, Rn, from its respective address register file during one instruction cycle and can perform linear, reverse-carry, and modulo
arithmetic. The contents of the selected modifier register specify the type of arithmetic to
be used in an address register update calculation. The modifier value is decoded in the
address ALU.
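The reverse-carry adder can be modelled in Python by bit-reversing both inputs, adding normally, and bit-reversing the sum, which is equivalent to propagating the carry from the MSB toward the LSB. The function names are hypothetical and the 16-bit width matches the address registers.

```python
def bit_reverse(value, bits=16):
    """Return value with its low `bits` bits in reversed order."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def reverse_carry_add(r, n, bits=16):
    """Add n to r with the carry propagating from the MSB to the LSB."""
    mask = (1 << bits) - 1
    total = (bit_reverse(r, bits) + bit_reverse(n, bits)) & mask
    return bit_reverse(total, bits)
```

Starting from 0 and repeatedly adding an offset of 4 visits 0, 4, 2, 6, 1, 5, 3, 7, which is the bit-reversed order of an 8-entry table.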
The output of the offset adder gives the result of linear arithmetic (e.g., Rn ± 1; Rn ± N)
and is selected as the modulo arithmetic unit output for linear arithmetic addressing modifiers. The reverse-carry adder performs the required operation for reverse-carry
arithmetic and its result is selected as the address ALU output for reverse-carry addressing modifiers. Reverse-carry arithmetic is useful for 2^k-point fast Fourier transform (FFT)
addressing. For modulo arithmetic, the modulo arithmetic unit will perform the function
(Rn ± N) modulo M, where N can be one, minus one, or the contents of the offset register
Nn. If the modulo operation requires wraparound for modulo arithmetic, the summed output of the modulo adder gives the correct updated address register value; if wraparound
is not necessary, the output of the offset adder gives the correct result.
4.2.5 Address Output Multiplexers
The address output multiplexers (see Figure 4-2) select the source for the XAB, YAB, and
PAB. These multiplexers allow the XAB, YAB, or PAB outputs to originate from R0 - R3
or R4 - R7.
4.3 PROGRAMMING MODEL
The programmer’s view of the AGU is eight sets of three registers (see Figure 4-3). These
registers can act as temporary data registers and indirect memory pointers. Automatic updating is available when using address register indirect addressing. The Mn registers can
be programmed for linear addressing, modulo addressing, and bit-reverse addressing.
4.3.1 Address Register Files (R0 - R3 and R4 - R7)
The eight 16-bit address registers, R0 - R7, can contain addresses or general-purpose
data. The 16-bit address in a selected address register is used in the calculation of the
effective address of an operand. When supporting parallel X and Y data memory moves,
the address registers must be thought of as two separate files, R0 - R3 and R4 - R7. The
contents of an Rn may point directly to data or may be offset. In addition, Rn can be pre-updated or post-updated according to the addressing mode selected. If an Rn is updated,
modifier registers, Mn, are always used to specify the type of update arithmetic. Offset
registers, Nn, are used for the update-by-offset addressing modes. The address register
modification is performed by one of the two modulo arithmetic units. Most addressing
modes modify the selected address register in a read-modify-write fashion; the address
register is read, its contents are modified by the associated modulo arithmetic unit, and
the register is written with the appropriate output of the modulo arithmetic unit. The form
of address register modification performed by the modulo arithmetic unit is controlled by
the contents of the offset and modifier registers discussed in the following paragraphs. Address registers are not affected by a processor reset.
4.3.2 Offset Register Files (N0 - N3 and N4 - N7)
The eight 16-bit offset registers, N0 - N7, can contain offset values used to increment/decrement address registers in address register update calculations or can be used for 16-bit
general-purpose storage. For example, the contents of an offset register can be used to
step through a table at some rate (e.g., five locations per step for waveform generation),
or the contents can specify the offset into a table or the base of the table for indexed addressing. Each address register, Rn, has its own offset register, Nn, associated with it.
Offset registers are not affected by a processor reset.

Table 4-1 Address Register Indirect Summary

                              Uses Mn    Operand Reference             Assembler
Address Register Indirect     Modifier   S  C  D  A  P  X  Y  L  XY    Syntax
No Update                     No                     X  X  X  X  X     (Rn)
Postincrement by 1            Yes                    X  X  X  X  X     (Rn)+
Postdecrement by 1            Yes                    X  X  X  X  X     (Rn)-
Postincrement by Offset Nn    Yes                    X  X  X  X  X     (Rn)+Nn

NOTE:
S = System Stack Reference
C = Program Control Unit Register Reference
D = Data ALU Register Reference
A = Address ALU Register Reference
P = Program Memory Reference
X = X Memory Reference
Y = Y Memory Reference
L = L Memory Reference
XY = XY Memory Reference
4.3.3 Modifier Register Files (M0 - M3 and M4 - M7)
The eight 16-bit modifier registers, M0 - M7, define the type of address arithmetic to be
performed for addressing mode calculations, or they can be used for general-purpose
storage. The address ALU supports linear, modulo, and reverse-carry arithmetic types for
all address register indirect addressing modes. For modulo arithmetic, the contents of Mn
also specify the modulus. Each address register, Rn, has its own modifier register, Mn,
associated with it. Each modifier register is set to $FFFF on processor reset, which specifies linear arithmetic as the default type for address register update calculations.
4.4 ADDRESSING
The DSP56K provides three different addressing modes: register direct, address register
indirect, and special. Since the register direct and special addressing modes do not necessarily use the AGU registers, they are described in SECTION 6 - INSTRUCTION SET
INTRODUCTION. The address register indirect addressing modes use the registers in
the AGU and are described in the following paragraphs.
4.4.1 Address Register Indirect Modes
When an address register is used to point to a memory location, the addressing mode is
called “address register indirect” (see Table 4-1). The term indirect is used because the
register contents are not the operand itself, but rather the address of the operand. These
addressing modes specify that an operand is in memory and specify the effective
address of that operand.
A portion of the data bus movement field in the instruction specifies the memory space to
be referenced. The contents of specific AGU registers that determine the effective
address are modified by arithmetic operations performed in the AGU. The type of
address arithmetic used is specified by the address modifier register, Mn. The offset register, Nn, is only used when the update specifies an offset.
Not all possible combinations are available, such as +(Rn). The 24-bit instruction word
size is not large enough to allow a completely orthogonal instruction set for all instructions used by the DSP.
An example and description of each mode is given in the following paragraphs. SECTION 6 - INSTRUCTION SET INTRODUCTION and APPENDIX A - INSTRUCTION SET
DETAILS give a complete description of the instruction syntax used in these examples.
In particular, XY: memory references refer to instructions in which an operand in X memory and an operand in Y memory are referenced in the same instruction.
4.4.1.1 No Update
The address of the operand is in the address register, Rn (see Table 4-1). The contents
of the Rn register are unchanged by executing the instruction. Figure 4-4 shows a MOVE
instruction using address register indirect addressing with no update. This mode can be
used for making XY: memory references. This mode does not use Nn or Mn registers.
4.4.1.2 Postincrement By 1
The address of the operand is in the address register, Rn (see Table 4-1 and Figure 4-5).
After the operand address is used, it is incremented by 1 and stored in the same address
register. This mode can be used for making XY: memory references and for modifying
the contents of Rn without an associated data move.
4.4.1.3 Postdecrement By 1
The address of the operand is in the address register, Rn (see Table 4-1 and Figure 4-6).
After the operand address is used, it is decremented by 1 and stored in the same
address register. This mode can be used for making XY: memory references and for
modifying the contents of Rn without an associated data move.
4.4.1.4 Postincrement By Offset Nn
The address of the operand is in the address register, Rn (see Table 4-1 and Figure 4-7).
After the operand address is used, it is incremented by the contents of the Nn register and
stored in the same address register. The contents of the Nn register are unchanged. This
mode can be used for making XY: memory references and for modifying the contents of
Rn without an associated data move.
4.4.1.5 Postdecrement By Offset Nn
The address of the operand is in the address register, Rn (see Table 4-1 and Figure 4-8).
After the operand address is used, it is decremented by the contents of the Nn register
and stored in the same address register. The contents of the Nn register are unchanged.
This mode cannot be used for making XY: memory references, but it can be used to modify
the contents of Rn without an associated data move.
4.4.1.6 Indexed By Offset Nn
The address of the operand is the sum of the contents of the address register, Rn, and
the contents of the address offset register, Nn (see Table 4-1 and Figure 4-9). The contents of the Rn and Nn registers are unchanged. This addressing mode, which requires
an extra instruction cycle, cannot be used for making XY: memory references.
Figure 4-7 Address Register Indirect - Postincrement by Offset Nn
4.4.1.7 Predecrement By 1
The address of the operand is the contents of the address register, Rn, decremented by
1 before the operand address is used (see Table 4-1 and Figure 4-10). The contents of
Rn are decremented and stored in the same address register. This addressing mode requires an extra instruction cycle. This mode cannot be used for making XY: memory
references, nor can it be used for modifying the contents of Rn without an associated data
move.
Figure 4-8 Address Register Indirect - Postdecrement by Offset Nn
4.4.2 Address Modifier Arithmetic Types
The address ALU supports linear, modulo, and reverse-carry arithmetic for all address
register indirect modes. These arithmetic types easily allow the creation of data structures
in memory for FIFOs (queues), delay lines, circular buffers, stacks, and bit-reversed FFT
buffers.
Figure 4-9 Address Register Indirect — Indexed by Offset Nn
The contents of the address modifier register, Mn, define the type of arithmetic to be performed for addressing mode calculations. For modulo arithmetic, the contents of Mn also
specify the modulus, or the size of the memory buffer whose addresses will be referenced. See Table 4-2 for a summary of the address modifiers implemented on the
DSP56K. The MMMM column indicates the hex value which should be stored in the Mn
register.
4.4.2.1 Linear Modifier (Mn=$FFFF)
When the value in the modifier register is $FFFF, address modification is performed using
normal 16-bit linear arithmetic (see Table 4-2). A 16-bit offset, Nn, and +1 or -1 can be
used in the address calculations. The range of values can be considered as signed (Nn
from -32,768 to +32,767) or unsigned (Nn from 0 to +65,535) since there is no arithmetic
difference between these two data representations. Addresses are normally considered
unsigned, and data is normally considered signed.
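The equivalence of the signed and unsigned views can be checked with a short Python sketch. The helper name linear_update is hypothetical; it models only the 16-bit wraparound addition described above and is not a description of the AGU hardware.

```python
MASK16 = 0xFFFF  # AGU registers are 16 bits wide


def linear_update(rn, nn):
    """Model a linear (Mn=$FFFF) update: 16-bit addition with wraparound."""
    return (rn + nn) & MASK16


# Nn = -1 (signed view) and Nn = 65,535 (unsigned view) are the same
# 16-bit pattern, so both move the pointer back by one location.
signed_view = linear_update(0x0100, -1 & MASK16)
unsigned_view = linear_update(0x0100, 65535)
```

Both calls produce the same address, which is why the manual treats the two interpretations of Nn as interchangeable.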
4.4.2.2 Modulo Modifier
When the value in the modifier register falls into one of two ranges (Mn=$0001 to $7FFF
or Mn=$8001 to $BFFF with the reserved gaps noted in the table), address modification
is performed using modulo arithmetic (see Table 4-2).
Modulo arithmetic normally causes the address register value to remain within an address
range of size M, whose lower boundary is determined by Rn. The upper boundary is determined by the modulus, or M. The modulus value, in turn, is determined by Mn, the value
in the modifier register (see Figure 4-11).
There are certain cases where modulo arithmetic addressing conditions may cause the
address register to jump linearly to the same relative address in a different buffer. Other
cases firmly restrict the address register to the same buffer, causing the address register
to wrap around within the buffer. The range in which the value contained in the modifier
register falls determines how the processor will handle modulo addressing.
4.4.2.2.1 Mn=$0001 to $7FFF
In this range, the modulus (M) equals the value in the modifier register (Mn) plus 1. The
memory buffer's lower boundary (base address) value, determined by Rn, must have zeros in the k LSBs, where 2^k ≥ M, and therefore must be a multiple of 2^k. The upper
boundary is the lower boundary plus the modulo size minus one (base address plus M-1).
Since M ≤ 2^k, once M is chosen, a sequential series of memory blocks (each of length
2^k) is created where these circular buffers can be located. If M<2^k, there will be a space
between sequential circular buffers of (2^k)-M.
For example, to create a circular buffer of 21 stages, M is 21, and the lower address
boundary must have its five LSBs equal to zero (2^k ≥ 21, thus k ≥ 5). The Mn register is
loaded with the value 20. The lower boundary may be chosen as 0, 32, 64, 96, 128, 160,
etc. The upper boundary of the buffer is then the lower boundary plus 20 (M-1). There will be an
unused space of 11 memory locations between the upper address and next usable lower
address. The address pointer is not required to start at the lower address boundary or to
end on the upper address boundary; it can initially point anywhere within the defined modulo address range. Neither the lower nor the upper boundary of the modulo region is
stored; only the size of the modulo region is stored in Mn. The boundaries are determined
by the contents of Rn. Assuming the (Rn)+ indirect addressing mode, if the address register pointer increments past the upper boundary of the buffer (base address plus M–1),
it will wrap around through the base address (lower boundary). Alternatively, assuming
the (Rn)- indirect addressing mode, if the address decrements past the lower boundary
(base address), it will wrap around through the base address plus M–1 (upper boundary).
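The boundary rules above can be sketched in Python as follows. The function names are hypothetical, the pointer is assumed to start inside the buffer, and k is assumed to be the smallest integer satisfying 2^k ≥ M; this is an illustrative model, not the AGU implementation.

```python
def modulo_bounds(rn, mn):
    """Return (lower, upper) buffer boundaries for Mn in $0001..$7FFF."""
    m = mn + 1                    # modulus M = Mn + 1
    k = (m - 1).bit_length()      # assumed: smallest k with 2**k >= M
    lower = rn & ~((1 << k) - 1) & 0xFFFF   # clear the k LSBs of Rn
    return lower, lower + m - 1


def post_increment(rn, mn):
    """Model (Rn)+ : wrap from the upper boundary to the lower boundary."""
    lower, upper = modulo_bounds(rn, mn)
    return lower if rn == upper else rn + 1


def post_decrement(rn, mn):
    """Model (Rn)- : wrap from the lower boundary to the upper boundary."""
    lower, upper = modulo_bounds(rn, mn)
    return upper if rn == lower else rn - 1


# 21-stage buffer (Mn = 20) with the pointer at 75
bounds = modulo_bounds(75, 20)
```

For the 21-stage example, the boundaries come out as 64 and 84, leaving the 11 unused locations 85 through 95 before the next possible base address.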
If an offset (Nn) is used in the address calculations, the 16-bit absolute value, |Nn|, must
be less than or equal to M for proper modulo addressing in this range. If Nn>M, the result
is data dependent and unpredictable, except for the special case where Nn=P x 2^k, a multiple of the block size where P is a positive integer. For this special case, when using the
(Rn)+Nn addressing mode, the pointer, Rn, will jump linearly to the same relative address
in a new buffer, which is P blocks forward in memory (see Figure 4-12).
Similarly, for (Rn)-Nn, the pointer will jump P blocks backward in memory. This technique
is useful in sequentially processing multiple tables or N-dimensional arrays. The range of
values for Nn is -32,768 to +32,767. The modulo arithmetic unit will automatically wrap
around the address pointer by the required amount. This type of address modification is
useful for creating circular buffers for FIFOs (queues), delay lines, and sample buffers up
to 32,768 words long as well as for decimation, interpolation, and waveform generation.
The special case of (Rn)±Nn mod M with Nn=P x 2^k is useful for performing the same
algorithm on multiple blocks of data in memory (e.g., parallel infinite impulse response
(IIR) filtering).
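The block-jump special case can be illustrated with a small Python sketch. The name block_jump is hypothetical, and k is assumed to be the smallest integer with 2^k ≥ M; the sketch only models the arithmetic described above.

```python
def block_jump(rn, mn, p):
    """Model (Rn)+Nn with Nn = P * 2**k: the pointer moves P blocks and
    keeps the same relative address inside the new block."""
    m = mn + 1
    k = (m - 1).bit_length()      # assumed: smallest k with 2**k >= M
    nn = p * (1 << k)             # offset that is a multiple of the block size
    return (rn + nn) & 0xFFFF


# 21-stage buffers (Mn = 20, k = 5) are spaced 32 locations apart.
forward = block_jump(75, 20, 1)    # from the block based at 64 to the one at 96
backward = block_jump(75, 20, -1)  # from the block based at 64 to the one at 32
```

In both directions the pointer keeps relative address 11 within its block, which is what allows one routine to step across several equally sized tables.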
Figure 4-12 Linear Addressing with a Modulo Modifier ((Rn) ± Nn MOD M, where Nn = 2^k, i.e., P = 1)
An example of address register indirect modulo addressing is shown in Figure 4-13. Starting at location 64, a circular buffer of 21 stages is created. The addresses generated are
offset by 15 locations. The lower boundary = L x (2^k) where 2^k ≥ 21; therefore, k=5 and
the lower address boundary must be a multiple of 32. The lower boundary may be chosen
as 0, 32, 64, 96, 128, 160, etc. For this example, L is arbitrarily chosen to be 2, making
the lower boundary 64. The upper boundary of the buffer is then 84 (the lower boundary
plus 20 (M-1)). The Mn register is loaded with the value 20 (M-1). The offset register is
arbitrarily chosen to be 15 (Nn ≤ M). The address pointer is not required to start at the lower
address boundary and can begin anywhere within the defined modulo address range,
i.e., within the lower boundary + (2^k) address region. The address pointer, Rn, is arbitrarily
chosen to be 75 in this example. When R2 is post-incremented by the offset by the MOVE
instruction, instead of pointing to 90 (as it would in the linear mode) it wraps around to 69.
If the address register pointer increments past the upper boundary of the buffer (base address plus M-1), it will wrap around to the base address. If the address decrements past
the lower boundary (base address), it will wrap around to the base address plus M-1.
If Rn is outside the valid modulo buffer range and an operation occurs that causes Rn to
be updated, the contents of Rn will be updated according to modulo arithmetic rules. For
example, a MOVE B0,X:(R0)+N0 instruction (where R0=6, M0=5, and N0=0) would apparently leave R0 unchanged since N0=0. However, since R0 is above the upper
boundary, the AGU calculates R0+N0-M0-1 for the new contents of R0 and sets R0=0.
EXAMPLE: MOVE X0,X:(R2)+N2
LET:
M2 = 00.....0010100   MODULUS=21
N2 = 00.....0001111   OFFSET=15
R2 = 00.....1001011   POINTER=75
LOWER BOUNDARY = 0..010 00000 (64), k=5; UPPER BOUNDARY = (84)
LINEAR UPDATE: 75 + 15 = (90); MODULO UPDATE: R2 = (69)
Figure 4-13 Modulo Modifier Example
The MOVE instruction in Figure 4-13 takes the contents of the X0 register and moves it
to a location in the X memory pointed to by (R2), and then (R2) is updated modulo 21. The
new value of R2 is not 90 (75+15), which would be the case if linear arithmetic had been
used, but rather is 69 since modulo arithmetic was used.
4.4.2.2.2 Mn=$8001 to $BFFF
In this range, the modulus (M) equals (Mn+1)-$8000, where Mn is the value in the modifier register (see Table 4-2). This range firmly restricts the address register to the same
buffer, causing the address register to wrap around within the buffer. This multiple wraparound addressing feature reduces argument overhead and is useful for decimation,
interpolation, and waveform generation.
The address modification is performed modulo M, where M may be any power of 2 in the
range from 2^1 to 2^14. Modulo M arithmetic causes the address register value to remain
within an address range of size M defined by a lower and upper address boundary. The
value M-1 is stored in the 14 least significant bits of the modifier register Mn, while the two most
significant bits are set to '10'. The lower boundary (base address) value must have zeroes
in the k LSBs, where 2^k = M, and therefore must be a multiple of 2^k. The upper boundary
is the lower boundary plus the modulo size minus one (base address plus M-1).
For example, to create a circular buffer of 32 stages, M is chosen as 32 and the lower address boundary must have its 5 least significant bits equal to zero (2^k = 32, thus k = 5).
The Mn register is loaded with the value $801F. The lower boundary may be chosen as
0, 32, 64, 96, 128, 160, etc. The upper boundary of the buffer is then the lower boundary
plus 31.
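The encoding of Mn in this range can be sketched in Python as below. The two function names are hypothetical; the arithmetic follows the M = (Mn+1)-$8000 relationship and the '10' prefix stated above.

```python
def encode_wraparound_modifier(m):
    """Return the Mn value for a multiple wrap-around buffer of size M."""
    if m < 2 or m > (1 << 14) or m & (m - 1):
        raise ValueError("M must be a power of two from 2**1 to 2**14")
    return 0x8000 | (m - 1)       # '10' in the two MSBs, M-1 in the 14 LSBs


def decode_wraparound_modifier(mn):
    """Return the modulus M for an Mn value in $8001..$BFFF."""
    return (mn + 1) - 0x8000


mn_for_32 = encode_wraparound_modifier(32)
```

For the 32-stage example this yields $801F, and the largest permitted buffer (2^14 words) corresponds to $BFFF.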
The address pointer is not required to start at the lower address boundary and may begin
anywhere within the defined modulo address range (between the lower and upper boundaries). If the address register pointer increments past the upper boundary of the buffer
(base address plus M-1) it will wrap around to the base address. If the address decrements past the lower boundary (base address) it will wrap around to the base address
plus M-1. If an offset Nn is used in the address calculations, it is not required to be less
than or equal to M for proper modulo addressing since multiple wrap around is supported
for (Rn)+Nn, (Rn)-Nn and (Rn+Nn) address updates (multiple wrap-around cannot occur
with (Rn)+, (Rn)- and -(Rn) addressing modes).
The multiple wrap-around address modifier is useful for decimation, interpolation and
waveform generation since the multiple wrap-around capability may be used for argument
reduction.
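Multiple wrap-around can be modeled as a true modulo reduction of the offset within the buffer. The following Python sketch uses a hypothetical function name and assumes the pointer starts inside the buffer; it illustrates the arithmetic only.

```python
def wraparound_post_update(rn, nn, mn):
    """Model (Rn)+Nn for Mn in $8001..$BFFF, where Nn may exceed M."""
    m = (mn + 1) - 0x8000               # modulus M, a power of two
    lower = rn & ~(m - 1) & 0xFFFF      # base address: k LSBs cleared
    return lower + ((rn - lower + nn) % m)


# 32-stage buffer based at 64 (Mn = $801F), pointer at 70, offset 100:
# the offset wraps around the buffer three times before landing.
new_rn = wraparound_post_update(70, 100, 0x801F)
```

Because the offset is reduced modulo M automatically, a phase accumulator for waveform generation does not need a separate argument-reduction step in software.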
4.4.2.3 Reverse-Carry Modifier (Mn=$0000)
Reverse carry is selected by setting the modifier register to zero (see Table 4-2). The address modification is performed in hardware by propagating the carry in the reverse
direction, i.e., from the MSB to the LSB. Reverse carry is equivalent to bit reversing the
contents of Rn (i.e., redefining the MSB as the LSB, the next MSB as bit 1, etc.) and the
offset value, Nn, adding normally, and then bit reversing the result. If the +Nn addressing
mode is used with this address modifier and Nn contains the value 2^(k-1) (a power of two),
this addressing modifier is equivalent to bit reversing the k LSBs of Rn, incrementing Rn
by 1, and bit reversing the k LSBs of Rn again. This address modification is useful for addressing the twiddle factors in 2^k-point FFT addressing and to unscramble 2^k-point FFT
data. The range of values for Nn is 0 to +32K (i.e., Nn=2^15), which allows bit-reverse addressing
for FFTs up to 65,536 points.
To make bit-reverse addressing work correctly for a 2^k-point FFT, the following procedures must be used:
1. Set Mn=0; this selects reverse-carry arithmetic.
2. Set Nn=2^(k-1).
3. Set Rn between the lower boundary and upper boundary in the buffer memory. The lower boundary is L x (2^k), where L is an arbitrary whole number. This
boundary gives a 16-bit binary number “xx . . . xx00 . . . 00”, where xx . . . xx=L
and 00 . . . 00 equals k zeros. The upper boundary is L x (2^k)+((2^k)-1). This
boundary gives a 16-bit binary number “xx . . . xx11 . . . 11”, where xx . . . xx=L
and 11 . . . 11 equals k ones.
4. Use the (Rn)+Nn addressing mode.
As an example, consider a 1024-point FFT with real data stored in the X memory and
imaginary data stored in the Y memory. Since 1,024=2^10, k=10. The modifier register (Mn)
is zero to select bit-reverse addressing. Offset register (Nn) contains the value 512 (2^(k-1)),
and the pointer register (Rn) contains 3,072 (L x (2^k)=3 x (2^10)), which is the lower
boundary of the memory buffer that holds the results of the FFT. The upper boundary is
4,095 (lower boundary + (2^k)-1=3,072+1,023).
Postincrementing by +N generates the address sequence (0, 512, 256, 768, 128, 640,...),
which is added to the lower boundary. This sequence (0, 512, etc.) is the scrambled FFT
data order for sequential frequency points from 0 to 2π. Table 4-3 shows the successive
contents of Rn when using (Rn)+Nn updates.
Table 4-3 Bit-Reverse Addressing Sequence Example

Rn Contents    Offset From Lower Boundary
3072           0
3584           512
3328           256
3840           768
3200           128
3712           640
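The sequence in Table 4-3 can be regenerated with a short Python model of reverse-carry addition (bit reverse both 16-bit operands, add normally, bit reverse the result). The function names are hypothetical and the code is a sketch of the arithmetic, not of the hardware.

```python
def reverse16(value):
    """Reverse the bit order of a 16-bit value."""
    return int(format(value & 0xFFFF, "016b")[::-1], 2)


def reverse_carry_add(rn, nn):
    """Add Nn to Rn with the carry propagating from MSB to LSB."""
    return reverse16((reverse16(rn) + reverse16(nn)) & 0xFFFF)


def bit_reverse_sequence(start, nn, count):
    """Successive Rn contents for repeated (Rn)+Nn updates with Mn=0."""
    values, rn = [], start
    for _ in range(count):
        values.append(rn)
        rn = reverse_carry_add(rn, nn)
    return values


# 1024-point FFT: Rn starts at the lower boundary 3072, Nn = 512
sequence = bit_reverse_sequence(3072, 512, 6)
offsets = [rn - 3072 for rn in sequence]
```

The resulting pointer values and offsets match the two columns of Table 4-3.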
The reverse-carry modifier only works when the base address of the FFT data buffer is a
multiple of 2^k, such as 1,024, 2,048, 3,072, etc. The use of addressing modes other than
postincrement by +Nn is possible but may not provide a useful result.
The term bit reverse with respect to reverse-carry arithmetic is descriptive. The lower
boundary that must be used for the bit-reverse address scheme to work is L x (2^k). In the
previous example shown in Table 4-3, L=3 and k=10. The first address used is the lower
boundary (3072); the calculation of the next address is shown in Figure 4-14. The k LSBs
of the current contents of Rn (3,072) are swapped:
•Bits 0 and 9 are swapped.
•Bits 1 and 8 are swapped.
•Bits 2 and 7 are swapped.
•Bits 3 and 6 are swapped.
•Bits 4 and 5 are swapped.
The result is incremented (3,073), and then the k LSBs are swapped again:
•Bits 0 and 9 are swapped.
•Bits 1 and 8 are swapped.
•Bits 2 and 7 are swapped.
•Bits 3 and 6 are swapped.
•Bits 4 and 5 are swapped.
The result is Rn equals 3,584.

EACH UPDATE, (Rn)+Nn, IS EQUIVALENT TO:
                               L      k BITS
1. BIT REVERSING:        Rn =  000011 0000000000 = 3072
                               000011 0000000000
2. INCREMENT Rn BY 1:    Rn =  000011 0000000000
                                              +1
                               000011 0000000001
3. BIT REVERSING AGAIN:  Rn =  000011 0000000001
                               000011 1000000000 = 3584
Figure 4-14 Bit-Reverse Address Calculation Example
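The three-step description (reverse the k LSBs, increment, reverse again) can be sketched directly in Python. The function names are hypothetical, and the sketch assumes the increment does not carry out of the k-bit field.

```python
def reverse_low_bits(value, k):
    """Reverse the k LSBs of value and leave the upper bits unchanged."""
    mask = (1 << k) - 1
    low = value & mask
    reversed_low = int(format(low, "0{}b".format(k))[::-1], 2)
    return (value & ~mask) | reversed_low


def three_step_update(rn, k):
    """Bit reverse the k LSBs, add 1, then bit reverse the k LSBs again."""
    swapped = reverse_low_bits(rn, k)      # step 1
    incremented = swapped + 1              # step 2
    return reverse_low_bits(incremented, k)  # step 3


next_rn = three_step_update(3072, 10)   # L=3, k=10 as in Table 4-3
```

Starting from 3,072 the intermediate value is 3,073, and the final result is 3,584, as in the walk-through above.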
4.4.2.4 Address-Modifier-Type Encoding Summary
There are three address modifier types:
• Linear Addressing
• Reverse-Carry Addressing
• Modulo Addressing
Bit-reverse addressing is useful for 2^k-point FFT addressing. Modulo addressing is useful
for creating circular buffers for FIFOs (queues), delay lines, and sample buffers up to
32,768 words long. The linear addressing is useful for general-purpose addressing. Modifier
values from 32,768 to 65,534 that are not assigned to the multiple wrap-around modulo type
(see Section 4.4.2.2.2) are reserved and should not be used.
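As a summary, the Mn encodings described in Sections 4.4.2.1 through 4.4.2.3 can be expressed as a small Python classifier. The function name is hypothetical, and treating every value in $8001 to $BFFF whose modulus is not a power of two as reserved is an assumption based on the statement that M may be any power of 2 in that range; Table 4-2 remains the authoritative list.

```python
def classify_modifier(mn):
    """Return (arithmetic type, modulus or None) for a 16-bit Mn value."""
    if mn == 0xFFFF:
        return ("linear", None)
    if mn == 0x0000:
        return ("reverse-carry", None)
    if 0x0001 <= mn <= 0x7FFF:
        return ("modulo", mn + 1)
    if 0x8001 <= mn <= 0xBFFF:
        m = (mn + 1) - 0x8000
        if m & (m - 1) == 0:              # assumed: only powers of two are valid
            return ("multiple wrap-around modulo", m)
    return ("reserved", None)


reset_type = classify_modifier(0xFFFF)    # Mn value after processor reset
```

After reset every Mn holds $FFFF, so the classifier reports linear arithmetic as the default.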
Figure 4-15 gives examples of the three addressing modifiers using 8-bit registers for simplification (all AGU registers are 16 bit). The addressing mode used in the example,
postincrement by offset Nn, adds the contents of the offset register to the contents of the
address register after the address register is accessed. The results of the three examples
are as follows:
•The linear address modifier addresses every fifth location since the offset register
contains $5.
•Using the bit-reverse address modifier causes the postincrement by offset Nn
addressing mode to use the address register, bit reverse the four LSBs, increment by
1, and bit reverse the four LSBs again.
•The modulo address modifier has a lower boundary at a predetermined location, and
the modulo number plus the lower boundary establishes the upper boundary. This
boundary creates a circular buffer so that, if the address register is pointing within the
boundaries, addressing past a boundary causes a circular wraparound to the other
boundary.
LINEAR ADDRESS MODIFIER
M0 = 255 = 1111 1111 FOR LINEAR ADDRESSING WITH R0
ORIGINAL REGISTERS: N0 = 5, R0 = 75 = 0100 1011
POSTINCREMENT BY OFFSET N0: R0 = 80 = 0101 0000
POSTINCREMENT BY OFFSET N0: R0 = 85 = 0101 0101
POSTINCREMENT BY OFFSET N0: R0 = 90 = 0101 1010

MODULO ADDRESS MODIFIER
M0 = 19 = 0001 0011 FOR MODULO 20 ADDRESSING WITH R0
LOWER BOUNDARY = 64, UPPER BOUNDARY = 83
ORIGINAL REGISTERS: N0 = 5, R0 = 75 = 0100 1011
POSTINCREMENT BY OFFSET N0: R0 = 80 = 0101 0000
POSTINCREMENT BY OFFSET N0: R0 = 65 = 0100 0001
POSTINCREMENT BY OFFSET N0: R0 = 70 = 0100 0110

REVERSE-CARRY ADDRESS MODIFIER
M0 = 0 = 0000 0000 FOR REVERSE-CARRY ADDRESSING WITH R0
ORIGINAL REGISTERS: N0 = 8, R0 = 64 = 0100 0000
POSTINCREMENT BY OFFSET N0: R0 = 72 = 0100 1000
POSTINCREMENT BY OFFSET N0: R0 = 68 = 0100 0100
POSTINCREMENT BY OFFSET N0: R0 = 76 = 0100 1100
Figure 4-15 Address Modifier Summary
SECTION 5
PROGRAM CONTROL UNIT
5.1 PROGRAM CONTROL UNIT
This section describes the hardware of the program control unit (PCU) and concludes
with a description of the programming model. The instruction pipeline description is also
included since understanding the pipeline is particularly important in understanding the
DSP56K family of processors.
5.2 OVERVIEW
The program control unit is one of the three execution units in the central processing
module (see Figure 5-2). It performs program address generation (instruction prefetch),
instruction decoding, hardware DO loop control, and exception (interrupt) processing.
The programmer sees the program control unit as six registers and a hardware system
stack (SS) as shown in Figure 5-1. In addition to the standard program flow-control
resources, such as a program counter (PC), complete status register (SR), and SS, the
program control unit features registers (loop address (LA) and loop counter (LC)) dedicated to supporting the hardware DO loop instruction.
The SS is a 15-level by 32-bit separate internal memory which stores the PC and SR for
subroutine calls, long interrupts, and program looping. The SS also stores the LC and LA
registers. Each location in the SS is addressable as two 16-bit registers, system stack high
(SSH) and system stack low (SSL). The stack pointer (SP) points to the SS locations.
Figure 5-1 Program Address Generator (block diagram showing the OMR, PC, LA, LC, SP, and SR registers, the 32 x 15 stack, the PAB, the PDB, and the global data bus, with clock, interrupt, and control signals)
Figure 5-2 DSP56K Block Diagram (block diagram showing the address generation unit; the program control unit with its program interrupt controller, program decode controller, and program address generator; the data ALU with a 24x24+56→56-bit MAC and two 56-bit accumulators; the program, X, and Y memories with their expansion areas; the internal and external bus switches; the PLL and clock generator; OnCE; the peripheral modules; and Port A)
All of the PCU registers are read/write to facilitate system debugging. Although none of
the registers are 24 bits, they are read or written over 24-bit buses. When they are read,
the least significant bits (LSBs) are significant, and the most significant bits (MSBs) are
zeroed as appropriate. When they are written, only the appropriate LSBs are significant,
and the MSBs are written as don’t care.
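The read and write behaviour over the 24-bit buses can be illustrated with a minimal Python sketch. The function names and the width parameter are hypothetical; the sketch only models the zero-extension on reads and the discarding of unused MSBs on writes described above.

```python
def read_pcu_register(stored, width):
    """Value seen on a 24-bit bus: the register in the LSBs, MSBs zeroed."""
    return stored & ((1 << width) - 1)


def write_pcu_register(bus_value, width):
    """Value latched by the register: only the `width` LSBs are significant."""
    return bus_value & ((1 << width) - 1)


bus_word = read_pcu_register(0xFFFF, 16)     # a 16-bit register read
latched = write_pcu_register(0xABCDEF, 16)   # upper 8 bits are don't care
```

A 16-bit register that holds $FFFF therefore reads back as $00FFFF, and writing $ABCDEF to it stores $CDEF.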
The program control unit implements a three-stage (prefetch, decode, execute) pipeline
and controls the five processing states of the DSP: normal, exception, reset, wait, and
stop.
5.3 PROGRAM CONTROL UNIT (PCU) ARCHITECTURE
The PCU consists of three hardware blocks: the program decode controller (PDC), the
program address generator (PAG), and the program interrupt controller (PIC).
5.3.1 Program Decode Controller
The PDC contains the program logic array decoders, the register address bus generator,
the loop state machine, the repeat state machine, the condition code generator, the interrupt state machine, the instruction latch, and the backup instruction latch. The PDC
decodes the 24-bit instruction loaded into the instruction latch and generates all signals
necessary for pipeline control. The backup instruction latch stores a duplicate of the
prefetched instruction to optimize execution of the repeat (REP) and jump (JMP)
instructions.
5.3.2 Program Address Generator (PAG)
The PAG contains the PC, the SP, the SS, the operating mode register (OMR), the SR,
the LC register, and the LA register (see Figure 5-1).
The PAG provides hardware dedicated to support loops, which are frequent constructs in
DSP algorithms. A DO instruction loads the LC register with the number of times the loop
should be executed, loads the LA register with the address of the last instruction word in
the loop (fetched during one loop pass), and asserts the loop flag in the SR. The DO instruction also supports nested loops by stacking the contents of the LA, LC, and SR prior
to the execution of the instruction. Under control of the PAG, the address of the first instruction in the loop is also stacked so the loop can be repeated with no overhead. While
the loop flag in the SR is asserted, the loop state machine (in the PDC) will compare the
PC contents to the contents of the LA to determine if the last instruction word in the loop
was fetched. If the last word was fetched, the LC contents are tested for one. If LC is not
equal to one, then it is decremented, and the SS is read to update the PC with the address
of the first instruction in the loop, effectively executing an automatic branch. If the LC is
equal to one, then the LC, LA, and the loop flag in the SR are restored with the stack contents, while instruction fetches continue at the incremented PC value (LA + 1). More
information about the LA and LC appears in Section 5.3.4 Instruction Pipeline Format.
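The loop-end test described above can be followed in a small Python model. The function name run_do_loop is hypothetical, and the sketch assumes single-word instructions, a loop count of at least one, and no nesting; it tracks only the PC, LA, LC, and the stacked loop-start address.

```python
def run_do_loop(count, first, last):
    """Return (addresses fetched, PC after the loop) for one DO loop."""
    lc, la = count, last          # DO loads LC and LA
    stack = [first]               # address of the first loop instruction
    pc = first
    fetched = []
    while True:
        fetched.append(pc)
        if pc == la:              # last instruction word of the loop fetched
            if lc != 1:
                lc -= 1           # more passes remain
                pc = stack[-1]    # automatic branch back to the loop start
                continue
            stack.pop()           # LC == 1: unstack and fall out of the loop
            return fetched, la + 1
        pc += 1


trace, next_pc = run_do_loop(3, 10, 11)
```

With a two-instruction body at addresses 10 and 11 and a count of 3, the body is fetched three times and execution continues at LA + 1.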
The repeat (REP) instruction loads the LC with the number of times the next instruction is
to be repeated. The instruction to be repeated is only fetched once, so throughput is increased by reducing external bus contention. However, because the repeated instruction is
fetched only once, a REP sequence is not interruptible. A single-instruction DO loop can be used
in place of a REP instruction if interrupts must be allowed.
5.3.3 Program Interrupt Controller
The PIC receives all interrupt requests, arbitrates among them, and generates the interrupt vector address.
Interrupts have a flexible priority structure with levels that can range from zero to three.
Levels 0 (lowest level), 1, and 2 are maskable. Level 3 is the highest interrupt priority level
(IPL) and is not maskable. Two interrupt mask bits in the SR reflect the current IPL and
indicate the level needed for an interrupt source to interrupt the processor. Interrupts
cause the DSP to enter the exception processing state which is discussed fully in SECTION 7 – PROCESSING STATES.
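The masking rule can be sketched in Python as a single comparison. The function name is hypothetical, and the sketch assumes the two mask bits are read as a number from 0 to 3 that gives the lowest level still allowed to interrupt; arbitration among simultaneous requests is not modeled.

```python
def interrupt_accepted(request_level, mask_bits):
    """Return True if a request at `request_level` (0..3) passes the mask.

    mask_bits is the 2-bit interrupt mask from the SR, taken as 0..3.
    Level 3 always passes because no mask value exceeds 3.
    """
    return request_level >= mask_bits


# After reset both mask bits are set, so only level 3 gets through.
after_reset = [interrupt_accepted(level, 3) for level in range(4)]
```

Clearing the mask bits to zero lets all four levels through, which is the fully unmasked case.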
The four external interrupt sources include three external interrupt request inputs (IRQA,
IRQB, and NMI) and the RESET pin. IRQA and IRQB can be either level sensitive or negative edge triggered. The nonmaskable interrupt (NMI) is edge sensitive and is a level 3
interrupt. MODA/IRQA, MODB/IRQB, and MODC/NMI pins are sampled when RESET is
deasserted. The sampled values are stored in the operating mode register (OMR) bits
MA, MB, and MC, respectively (see Section 5.4.3 for information on the OMR). Only the
fourth external interrupt, RESET, and Illegal Instruction have higher priority than NMI.
The PIC also arbitrates between the different I/O peripherals. The currently selected peripheral supplies the correct vector address to the PIC.
5.3.4 Instruction Pipeline Format
The program control unit uses a three-level pipelined architecture in which concurrent instruction fetch, decode, and execution occur. This pipelined operation remains essentially
hidden from the user and makes programming straightforward. The pipeline is illustrated
in Figure 5-3, which shows the operations of each of the execution units and all initial conditions necessary to follow the execution of the instruction sequence shown in the figure.
The pipeline is described in more detail in Section 7.2.1 Instruction Pipeline.
The first instruction, I1, should be interpreted as follows: multiply the contents of X0 by the
contents of Y0, add the product to the contents already in accumulator A, round the result
to the “nearest even,” store the result back in accumulator A, move the contents in X data
memory (pointed to by R0) into X0 and postincrement R0, and move the contents in Y
data memory (pointed to by R4) into Y1 and postincrement R4. The second instruction,
I2, should be interpreted as follows: clear accumulator A, move the contents in X0 into the
location in X data memory pointed to by R0 and postincrement R0. Before the clear operation,
move the contents in accumulator A into the location in Y data memory pointed to
by R4 and postdecrement R4. The third instruction, I3, is the same as I1, except the
rounding operation is not performed.
5.4 PROGRAMMING MODEL
The program control unit features LA and LC registers which support the DO loop instruction and the standard program flow-control resources, such as a PC, complete SR, and
SS. With the exception of the PC, all registers are read/write to facilitate system debugging. Figure 5-4 shows the program control unit programming model with the six registers
and SS. The following paragraphs give a detailed description of each register.
5.4.1 Program Counter
This 16-bit register contains the address of the next location to be fetched from program
memory space. The PC can point to instructions, data operands, or addresses of operands. References to this register are always inherent and are implied by most instructions.
This special-purpose address register is stacked when program looping is initialized,
when a JSR is performed, or when interrupts occur (except for no-overhead fast
interrupts).

Bit     Field   Register  Meaning
15      LF      MR        Loop Flag
14      DM      MR        Double Precision Multiply Mode
13      T       MR        Trace Mode
12      *       MR        Reserved
11, 10  S1, S0  MR        Scaling Mode
9, 8    I1, I0  MR        Interrupt Mask
7       S       CCR       Scaling
6       L       CCR       Limit
5       E       CCR       Extension
4       U       CCR       Unnormalized
3       N       CCR       Negative
2       Z       CCR       Zero
1       V       CCR       Overflow
0       C       CCR       Carry
All bits are cleared after hardware reset except bits 8 and 9, which are set to ones.
Bits 12 and 16 to 23 are reserved, read as zero, and should be written with zero for future compatibility.
Figure 5-5 Status Register Format
5.4.2 Status Register
The 16-bit SR consists of a mode register (MR) in the high-order eight bits and a condition
code register (CCR) in the low-order eight bits, as shown in Figure 5-5. The SR is stacked
when program looping is initialized, when a JSR is performed, or when interrupts occur
(except for no-overhead fast interrupts).
The MR is a special purpose control register which defines the current system state of the
processor. The MR bits are affected by processor reset, exception processing, the DO,
end current DO loop (ENDDO), return from interrupt (RTI), and SWI instructions and by
instructions that directly reference the MR register, such as OR immediate to control register (ORI) and AND immediate to control register (ANDI). During processor reset, the
interrupt mask bits of the MR will be set. The scaling mode bits, loop flag, and trace bit will
be cleared.
The CCR is a special purpose control register that defines the current user state of the
processor. The CCR bits are affected by data arithmetic logic unit (ALU) operations, parallel move operations, and by instructions that directly reference the CCR (ORI and
ANDI). The CCR bits are not affected by parallel move operations unless data limiting occurs when reading the A or B accumulators. During processor reset, all CCR bits are
cleared.
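The split of the SR into MR and CCR can be illustrated with a short Python sketch. It is only a model for reading the layout: the bit positions are taken from Figure 5-5, and the names `SR_FIELDS`, `decode_sr`, and `SR_RESET` are hypothetical helpers, not part of any Motorola tool.

```python
# Bit positions as listed in Figure 5-5 (bit 12 is reserved and omitted).
SR_FIELDS = {
    "C": 0, "V": 1, "Z": 2, "N": 3, "U": 4, "E": 5, "L": 6, "S": 7,
    "I0": 8, "I1": 9, "S0": 10, "S1": 11, "T": 13, "DM": 14, "LF": 15,
}


def decode_sr(sr):
    """Split a 16-bit SR value into MR, CCR, and the named flag bits."""
    sr &= 0xFFFF
    fields = {name: (sr >> bit) & 1 for name, bit in SR_FIELDS.items()}
    fields["MR"] = sr >> 8      # mode register: high-order eight bits
    fields["CCR"] = sr & 0xFF   # condition code register: low-order eight bits
    return fields


# After hardware reset only the interrupt mask bits (8 and 9) are set.
SR_RESET = (1 << 9) | (1 << 8)
```

Decoding `SR_RESET` with this sketch gives an MR of 0x03 and a CCR of zero, which matches the reset behavior described for the two registers.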
5.4.2.1 Carry (Bit 0)
The carry (C) bit is set if a carry is generated out of the MSB of the result in an addition.
This bit is also set if a borrow is generated in a subtraction. The carry or borrow is generated from bit 55 of the result. The carry bit is also affected by bit manipulation, rotate, and
shift instructions. Otherwise, this bit is cleared.
5.4.2.2 Overflow (Bit 1)
The overflow (V) bit is set if an arithmetic overflow occurs in the 56-bit result. This bit indicates that the result cannot be represented in the accumulator register; thus, the register
has overflowed. Otherwise, this bit is cleared.
5.4.2.3 Zero (Bit 2)
The zero (Z) bit is set if the result equals zero; otherwise, this bit is cleared.
5.4.2.4 Negative (Bit 3)
The negative (N) bit is set if the MSB (bit 55) of the result is set; otherwise, this bit is
cleared.
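As an illustration of the C, V, Z, and N definitions, the following Python sketch models a 56-bit addition. It is a simplified model under stated assumptions: the function name `add56` is hypothetical, and the overflow test uses the usual two's-complement rule (operands of equal sign producing a result of the opposite sign), which is an assumption about how "cannot be represented in the accumulator" is detected.

```python
MASK56 = (1 << 56) - 1


def add56(a, b):
    """Add two 56-bit two's-complement values; return (result, flags)."""
    a &= MASK56
    b &= MASK56
    full = a + b
    result = full & MASK56

    def sign(x):
        return (x >> 55) & 1  # bit 55 is the MSB of the accumulator

    flags = {
        "C": (full >> 56) & 1,  # carry generated out of bit 55
        # Assumed rule: same-sign operands, different-sign result.
        "V": int(sign(a) == sign(b) and sign(result) != sign(a)),
        "Z": int(result == 0),
        "N": sign(result),
    }
    return result, flags
```

For example, adding 1 to the all-ones value wraps to zero and sets C and Z, while adding two copies of `1 << 54` sets V and N without generating a carry.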
5.4.2.5 Unnormalized (Bit 4)
The unnormalized (U) bit is set if the two MSBs of the most significant product (MSP)
portion of the result are identical; otherwise, this bit is cleared. The MSP portion of the A
or B accumulators is defined by the scaling mode, and the U bit is computed as
follows:
S1  S0  Scaling Mode  U Bit Computation
0   0   No Scaling    U = NOT (Bit 47 ⊕ Bit 46)
0   1   Scale Down    U = NOT (Bit 48 ⊕ Bit 47)
1   0   Scale Up      U = NOT (Bit 46 ⊕ Bit 45)
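The U bit rule can be sketched in Python as follows. The sketch follows the prose definition (U is set when the two compared bits are identical); `U_UPPER_BIT` and `u_bit` are hypothetical names, and the accumulator is passed as a plain 56-bit integer.

```python
# (S1, S0) -> upper of the two adjacent bits compared for the U bit.
U_UPPER_BIT = {(0, 0): 47, (0, 1): 48, (1, 0): 46}


def u_bit(acc, s1, s0):
    """Return 1 if the two MSBs of the MSP portion are identical, else 0."""
    hi = U_UPPER_BIT[(s1, s0)]
    upper = (acc >> hi) & 1
    lower = (acc >> (hi - 1)) & 1
    return int(upper == lower)
```

With no scaling, a value whose bit 46 is set and bit 47 is clear is normalized, so `u_bit(1 << 46, 0, 0)` returns 0; the same value examined in scale-down mode compares bits 48 and 47 and returns 1.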
5.4.2.6 Extension (Bit 5)
The extension (E) bit is cleared if the bits of the integer portion of the 56-bit result are
all ones or all zeros; otherwise, this bit is set. The integer portion is defined by the scaling
mode, and the E bit is computed as follows:
S1  S0  Scaling Mode  Integer Portion
0   0   No Scaling    Bits 55, 54, ..., 48, 47
0   1   Scale Down    Bits 55, 54, ..., 49, 48
1   0   Scale Up      Bits 55, 54, ..., 47, 46
If the E bit is cleared, then the low-order fraction portion contains all the significant bits;
the high-order integer portion is just sign extension. In this case, the accumulator extension register can be ignored. If the E bit is set, it indicates that the accumulator extension
register is in use.
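The E bit test can be modeled with a few lines of Python. This is a sketch only: `E_LOW_BIT` and `e_bit` are hypothetical names, and the accumulator is again treated as a 56-bit integer.

```python
# (S1, S0) -> lowest bit of the integer portion (which always ends at bit 55).
E_LOW_BIT = {(0, 0): 47, (0, 1): 48, (1, 0): 46}


def e_bit(acc, s1, s0):
    """Return 0 if the integer portion is all zeros or all ones, else 1."""
    low = E_LOW_BIT[(s1, s0)]
    width = 56 - low
    all_ones = (1 << width) - 1
    integer = (acc >> low) & all_ones
    return int(integer not in (0, all_ones))
```

A result of `0x00_800000_000000` (bit 47 set, extension bits clear) mixes ones and zeros in the integer portion with no scaling, so the sketch reports E = 1, meaning the extension register holds significant bits.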
5.4.2.7 Limit (Bit 6)
The limit (L) bit is set if the overflow bit is set. The L bit is also set if the data shifter/limiter
circuits perform a limiting operation; otherwise, it is not affected. The L bit is cleared only
by a processor reset or by an instruction that specifically clears it, which allows the L bit
to be used as a latching overflow bit (i.e., a “sticky” bit). L is affected by data movement
operations that read the A or B accumulator registers.
5.4.2.8 Scaling Bit (Bit 7)
The scaling bit (S) is used to detect data growth, which is required in Block Floating Point
FFT operation. Typically, the bit is tested after each pass of a radix 2 FFT and, if it is set,
the scaling mode should be activated in the next pass. The Block Floating Point FFT algorithm is described in the Motorola application note APR4/D, “Implementation of Fast
Fourier Transforms on Motorola’s DSP56000/DSP56001 and DSP96002 Digital Signal
Processors.” This bit is computed according to the following logical equations when the
result of accumulator A or B is moved to XDB or YDB. It is a “sticky” bit, cleared only by
an instruction that specifically clears it.
If S1=0 and S0=0 (no scaling)
then S = (A46 XOR A45) OR (B46 XOR B45)
If S1=0 and S0=1 (scale down)
then S = (A47 XOR A46) OR (B47 XOR B46)
If S1=1 and S0=0 (scale up)
then S = (A45 XOR A44) OR (B45 XOR B44)
If S1=1 and S0=1 (reserved)
then the S flag is undefined.
where Ai and Bi mean bit i of accumulator A or B.
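The logical equations above translate directly into the following Python sketch. The function name `s_bit_update` is hypothetical; the previous S value is ORed in to model the "sticky" behavior, and the reserved mode is rejected because the S flag is undefined there.

```python
# (S1, S0) -> upper of the two adjacent accumulator bits that are XORed.
S_UPPER_BIT = {(0, 0): 46, (0, 1): 47, (1, 0): 45}


def s_bit_update(s, a, b, s1, s0):
    """Return the new S bit given the old S bit and accumulators A and B."""
    if (s1, s0) not in S_UPPER_BIT:
        raise ValueError("S1=1, S0=1 is reserved; the S flag is undefined")
    hi = S_UPPER_BIT[(s1, s0)]

    def differs(acc):
        # XOR of bit `hi` and bit `hi - 1` of the accumulator.
        return ((acc >> hi) ^ (acc >> (hi - 1))) & 1

    # Sticky: once set, S stays set until explicitly cleared.
    return int(s or differs(a) or differs(b))
```

In a block floating-point FFT, a caller would clear S before a pass, let the moves accumulate it, and test the result after the pass.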
5.4.2.9 Interrupt Masks (Bits 8 and 9)
The interrupt mask bits, I1 and I0, reflect the current IPL of the processor and indicate
the IPL needed for an interrupt source to interrupt the processor. The current IPL of the
processor can be changed under software control. The interrupt mask bits are set during
hardware reset but not during software reset.
5.4.2.10 Scaling Mode (Bits 10 and 11)
The scaling mode bits, S1 and S0, specify the scaling to be performed in the data ALU
shifter/limiter, and also specify the rounding position in the data ALU multiply-accumulator
(MAC). The scaling modes are shown in the following table:
S1  S0  Rounding Bit  Scaling Mode
0   0   23            No Scaling
0   1   24            Scale Down (1-Bit Arithmetic Right Shift)
1   0   22            Scale Up (1-Bit Arithmetic Left Shift)
1   1   -             Reserved for Future Expansion
The scaling mode affects data read from the A or B accumulator registers out to the XDB
and YDB. Different scaling modes can occur with the same program code to allow dynamic scaling. Dynamic scaling facilitates block floating-point arithmetic. The scaling mode
also affects the MAC rounding position to maintain proper rounding when different portions of the accumulator registers are read out to the XDB and YDB. The scaling mode
bits, which are cleared at the start of a long interrupt service routine, are also cleared during a processor reset.
5.4.2.11 Reserved Status (Bit 12)
This bit is reserved for future expansion and will read as zero during DSP read operations.
5.4.2.12 Trace Mode (Bit 13)
The trace mode (T) bit specifies the tracing function of the DSP56000/56001 only. (With
other members of the DSP56K family, use the OnCE trace mode described in Section
10.5.) For the DSP56000/56001, if the T bit is set at the beginning of any instruction execution, a trace exception will be generated after the instruction execution is completed. If
the T bit is cleared, tracing is disabled and instruction execution proceeds normally. If a
long interrupt is executed during a trace exception, the SR with the trace bit set will be
stacked, and the trace bit in the SR is cleared (see SECTION 7 – PROCESSING
STATES for a complete description of a long interrupt operation). The T bit is also
cleared during processor reset.
5.4.2.13 Double Precision Multiply Mode (Bit 14)
The processor is in double precision multiply mode when this bit is set. (See Section 3.4
for detailed information on the double precision multiply mode.) When the DM bit is set,
the operations performed by the MPY and MAC instructions change so that a double
precision 48-bit by 48-bit multiplication can be performed in six instructions.
The DSP56K software simulator accurately shows how the MPY, MAC, and other
Data ALU instructions operate while the processor is in the double precision multiply
mode.

Bit     Field   Meaning
23–8    *       Reserved
7       *       Reserved
6       SD      Stop Delay
5       *       Reserved
4       MC      Operating Mode C
3       YD      Internal Y Memory Disable
2       DE      Data ROM Enable
1, 0    MB, MA  Operating Mode A, B
Figure 5-6 OMR Format
5.4.2.14 Loop Flag (Bit 15)
The loop flag (LF) bit is set when a program loop is in progress. It detects the end of a
program loop. The LF is the only SR bit that is restored when a program loop is terminated. Stacking and restoring the LF when initiating and exiting a program loop, respectively, allow the nesting of program loops. At the start of a long interrupt service routine,
the SR (including the LF) is pushed on the SS and the SR LF is cleared. When returning
from the long interrupt with an RTI instruction, the SS is pulled and the LF is restored.
During a processor reset, the LF is cleared.
5.4.3 Operating Mode Register
The OMR is a 24-bit register (only six bits are defined) that sets the current operating
mode of the processor. Each chip in the DSP56K family of processors has its own set of
operating modes which determine the memory maps for program and data memories, and
the startup procedure that occurs when the chip leaves the reset state. The OMR bits are
only affected by processor reset and by the ANDI, ORI, and MOVEC instructions, which
directly reference the OMR.
The OMR format with all of its defined bits is shown in Figure 5-6. For product-specific
OMR bit definitions, see the individual chip’s user manual for details on its respective operating modes.
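The six defined OMR bits can be picked out of a register value as sketched below in Python. The bit positions are read from Figure 5-6 and should be treated as an assumption for any particular chip, since each device defines its own operating modes; `OMR_FIELDS` and `decode_omr` are hypothetical names.

```python
# Defined OMR bits as read from Figure 5-6; bits 5 and 7 to 23 are reserved.
OMR_FIELDS = {"MA": 0, "MB": 1, "DE": 2, "YD": 3, "MC": 4, "SD": 6}


def decode_omr(omr):
    """Return the six defined OMR bits of a 24-bit register value."""
    return {name: (omr >> bit) & 1 for name, bit in OMR_FIELDS.items()}
```

For instance, `decode_omr(0b1010110)` reports SD, MC, DE, and MB set, with YD and MA clear.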
5.4.4 System Stack
The SS is a separate 15 x 32-bit internal memory divided into two banks, the SSH and the
SSL, each 16 bits wide. The SSH stores the PC contents, and the SSL stores the SR contents for subroutine calls, long interrupts, and program looping. The SS will also store the
LA and LC registers. The SS is in stack memory space; its address is always inherent and
implied by the current instruction.
The contents of the PC and SR are pushed on the top location of the SS when a subroutine call or long interrupt occurs. When a return from subroutine (RTS) occurs, the
contents of the top location in the SS are pulled and put in the PC; the SR is not affected.
When an RTI occurs, the contents of the top location in the SS are pulled to both the PC
and SR.
The SS is also used to implement no-overhead nested hardware DO loops. When the DO
instruction is executed, the LA:LC are pushed on the SS, then the PC:SR are pushed on
the SS. Since each SS location can be addressed as separate 16-bit registers (SSH and
SSL), software stacks can be created for unlimited nesting.
The SS can accommodate up to 15 long interrupts, seven DO loops, 15 JSRs, or combinations thereof. When the SS limit is exceeded, a nonmaskable stack error interrupt
occurs, and the PC is pushed to SS location zero, which is not implemented in hardware.
The PC will be lost, and there will be no return from the stack error interrupt routine to the program
that was executing when the error occurred.
Bit   Field   Meaning
5     UF      Underflow Flag
4     SE      Stack Error Flag
3–0   P3–P0   Stack Pointer
Figure 5-7 Stack Pointer Register Format
5.4.5 Stack Pointer Register
The 6-bit SP register indicates the location of the top of the SS and the status of the SS
(underflow, empty, full, and overflow). The SP register is referenced implicitly by some instructions (DO, REP, JSR, RTI, etc.) or directly by the MOVEC instruction. The SP
register format is shown in Figure 5-7. The SP register works as a 6-bit counter that addresses (selects) a 15-location stack with its four LSBs. The possible SP values are
shown in Figure 5-8 and described in the following paragraphs.
5.4.5.1 Stack Pointer (Bits 0–3)
The SP points to the last location used on the SS. Immediately after hardware reset,
these bits are cleared (SP=0), indicating that the SS is empty.
Data is pushed onto the SS by incrementing the SP, then writing data to the location to
which the SP points. An item is pulled off the stack by copying it from that location and
then by decrementing the SP.
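The push and pull ordering described above (increment, then write; copy, then decrement) can be modeled with a small Python class. This is a toy model, not a description of the hardware: the class and method names are hypothetical, and Python exceptions stand in for the stack error exception, whose flag behavior is not modeled here.

```python
class SystemStack:
    """Toy model of the 15-location system stack (SSH:SSL pairs)."""

    DEPTH = 15

    def __init__(self):
        self.sp = 0                          # SP = 0 means the stack is empty
        self.ssh = [0] * (self.DEPTH + 1)    # index 0 is unused
        self.ssl = [0] * (self.DEPTH + 1)

    def push(self, high, low):
        if self.sp == self.DEPTH:
            raise OverflowError("stack error: push on a full stack")
        self.sp += 1                         # increment the SP first...
        self.ssh[self.sp] = high & 0xFFFF    # ...then write the new top
        self.ssl[self.sp] = low & 0xFFFF

    def pull(self):
        if self.sp == 0:
            raise IndexError("stack error: pull on an empty stack")
        item = (self.ssh[self.sp], self.ssl[self.sp])  # copy first...
        self.sp -= 1                                   # ...then decrement
        return item

    def jsr(self, pc, sr):
        self.push(pc, sr)                    # PC goes to SSH, SR to SSL

    def rts(self):
        pc, _sr = self.pull()                # SR is not restored by RTS
        return pc

    def rti(self):
        return self.pull()                   # both PC and SR are restored
```

Two nested calls followed by `rts()` and `rti()` return the saved values in last-in, first-out order and leave SP at zero.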
5.4.5.2 Stack Error Flag (Bit 4)
The stack error flag indicates that a stack error has occurred, and the transition of the
stack error flag from zero to one causes a priority level-3 stack error exception.
When the stack is completely full, the SP reads 001111, and any operation that pushes
data onto the stack will cause a stack error exception to occur. The SP will then read 010000
(or 010001 if an implied double push occurs).
Any implied pull operation with SP equal to zero will cause a stack error exception, and
the SP will read 111111 (or 111110 if an implied double pull occurs).
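The 6-bit values quoted above are what a plain 6-bit counter produces when it is incremented past 001111 or decremented below 000000. The Python sketch below shows that arithmetic only; it does not model the sticky latching of the flags, and `sp_after` and `sp_fields` are hypothetical names.

```python
def sp_after(sp, delta):
    """Return the 6-bit SP value after `delta` pushes (+) or pulls (-)."""
    return (sp + delta) & 0x3F


def sp_fields(sp):
    """Split a 6-bit SP value into pointer bits and the two flags."""
    return {
        "P": sp & 0xF,          # bits 0-3: stack pointer
        "SE": (sp >> 4) & 1,    # bit 4: stack error flag
        "UF": (sp >> 5) & 1,    # bit 5: underflow flag
    }
```

A push on a full stack gives `sp_after(0b001111, 1) == 0b010000` (stack error set, underflow clear), and a pull on an empty stack gives `sp_after(0, -1) == 0b111111` (both flags set).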
The stack error flag is a “sticky bit” which, once set, remains set until cleared by the user.
There is a sequence of instructions that can cause a stack overflow and, without the sticky
bit, would not be detected because the stack pointer is decremented before the stack error
interrupt is taken. The sticky bit keeps the stack error bit set until the user clears it by writing a zero to SP bit 4. It also latches the overflow/underflow bit so that it cannot be
changed by stack pointer increments or decrements as long as the stack error is set. The
overflow/underflow bit remains latched until the first move to SP is executed.
Note: When SP is zero (stack empty), instructions that read the stack without SP postdecrement
and instructions that write to the stack without SP preincrement do not cause
a stack error exception. These are: 1) DO SSL,xxxx; 2) REP SSL; 3) MOVEC or move peripheral
data (MOVEP) when SSL is specified as a source or destination.
5.4.5.3 Underflow Flag (Bit 5)
The underflow flag is set when a stack underflow occurs. The underflow flag is a “sticky
bit” when the stack error flag is set. That is, when the stack error flag is set, the underflow
flag will not change state. The combination of “underflow=1” and “stack error=0” is an
illegal combination and will not occur unless it is forced by the user. If this condition is
forced by the user, the hardware will correct itself based on the result of the next stack
operation.
SP register bits 6 through 23 are reserved for future expansion and will read as zero during read operations.
5.4.6 Loop Address Register
The LA is a read/write register which is stacked into the SSH by a DO instruction and is
unstacked by end-of-loop processing or by an ENDDO instruction. The contents of the LA
register indicate the location of the last instruction word in a program loop. When that last
instruction is fetched, the processor checks the contents of the LC register (see the following section). If the contents are not one, the processor decrements the LC and takes
the next instruction from the top of the SS. If the LC is one, the PC is incremented, the
loop flag is restored (pulled from the SS), the SS is purged, the LA and LC registers are
pulled from the SS and restored, and instruction execution continues normally.
5.4.7 Loop Counter Register
The LC register is a special 16-bit counter which specifies the number of times a hardware
program loop shall be repeated. This register is stacked into the SSL by a DO instruction
and unstacked by end-of-loop processing or by execution of an ENDDO instruction. When
the end of a hardware program loop is reached, the contents of the LC register are tested
for one. If the LC is one, the program loop is terminated, and the LC register is loaded with
the previous LC contents stored on the SS. If LC is not one, it is decremented and the
program loop is repeated. The LC can be read under program control, which allows the
number of times a loop will be executed to be monitored/changed dynamically. The LC is
also used in the REP instruction.
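The end-of-loop test on the LC can be sketched in Python as follows. The sketch models only the counter logic described above (test for one, otherwise decrement and repeat); it does not model the stacking of LA, LC, PC, and SR, and `run_hardware_loop` is a hypothetical name.

```python
def run_hardware_loop(count, body):
    """Run `body` under the LC test described above; return the pass count."""
    lc = count & 0xFFFF              # the LC is a 16-bit counter
    passes = 0
    while True:
        body()                       # execute the loop body once
        passes += 1
        if lc == 1:                  # end of loop reached with LC = 1
            break                    # the loop terminates
        lc = (lc - 1) & 0xFFFF       # otherwise decrement and repeat
    return passes
```

With an initial count of 3 the body runs three times; with a count of 1 it runs once.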
5.4.8 Programming Model Summary
The complete programming model for the DSP56K central processing module is shown
in Figure 5-9. Programming models for the peripherals are shown in the appropriate user
manuals.
[Figure: register diagram with three panels]
DATA ARITHMETIC LOGIC UNIT: input registers X (X1, X0) and Y (Y1, Y0), bits 47 to 0; accumulator registers A (A2, A1, A0) and B (B2, B1, B0), bits 55 to 0.
ADDRESS GENERATION UNIT: pointer registers R0 to R7, offset registers N0 to N7, and modifier registers M0 to M7 (bits 15 to 0), arranged in an upper file and a lower file.
PROGRAM CONTROL UNIT: loop address register (LA), loop counter (LC), program counter (PC), status register (SR, made up of MR and CCR), operating mode register (OMR), stack pointer (SP), and system stack (SSH and SSL, locations 1 to 15).
* Read as zero, should be written with zero for future compatibility
# Read as sign extension bits, written as don't care
Figure 5-9 DSP56K Central Processing Module Programming Model
6.1 INSTRUCTION SET INTRODUCTION
The programming model shown in Figure 6-1 suggests that the DSP56K central processing module architecture can be viewed as three functional units which operate in
parallel: data arithmetic logic unit (data ALU), address generation unit (AGU), and program control unit (PCU). The instruction set keeps each of these units busy throughout
each instruction cycle, achieving maximal speed and maintaining minimal program size.
This section introduces the DSP56K instruction set and instruction format. The complete
range of instruction capabilities combined with the flexible addressing modes used in this
processor provide a very powerful assembly language for implementing digital signal processing (DSP) algorithms. The instruction set has been designed to allow efficient coding
for DSP high-level language compilers such as the C compiler. Execution time is minimized by the hardware looping capabilities, use of an instruction pipeline, and parallel
moves.
6.2 SYNTAX
The instruction syntax is organized into four columns: opcode, operands, and two parallel-move fields. The assembly-language source code for a typical one-word instruction is
shown in the following illustration. Because of the multiple bus structure and the parallelism of the DSP, up to three data transfers can be specified in the instruction word – one
on the X data bus (XDB), one on the Y data bus (YDB), and one within the data ALU.
These transfers are explicitly specified. A fourth data transfer is implied and occurs in the
program control unit (instruction word prefetch, program looping control, etc.). Each data
transfer involves a source and a destination.
Opcode  Operands  XDB         YDB
MAC     X0,Y0,A   X:(R0)+,X0  Y:(R4)+,Y0
The opcode column indicates the data ALU, AGU, or program control unit operation to be
performed and must always be included in the source code. The operands column specifies the operands to be used by the opcode. The XDB and YDB columns specify optional
data transfers over the XDB and/or YDB and the associated addressing modes. The
address space qualifiers (X:, Y:, and L:) indicate which address space is being referenced.
Parallel moves are allowed in 30 of the 62 instructions. Additional information is presented
in APPENDIX A - INSTRUCTION SET DETAILS.
6.3 INSTRUCTION FORMATS
The DSP56K instructions consist of one or two 24-bit words – an operation word and an
optional effective address extension word. The general format of the operation word is