Analog Devices, Inc. reserves the right to change this product without
prior notice. Information furnished by Analog Devices is believed to be
accurate and reliable. However, no responsibility is assumed by Analog
Devices for its use; nor for any infringement of patents or other rights of
third parties which may result from its use. No license is granted by implication or otherwise under the patent rights of Analog Devices, Inc.
Trademark and Service Mark Notice
The Analog Devices logo, Blackfin, the Blackfin logo, CrossCore, EZ-KIT
Lite, SHARC, TigerSHARC, and VisualDSP++ are registered trademarks
of Analog Devices, Inc.
All other brand and product names are trademarks or service marks of
their respective owners.
PREFACE
Thank you for purchasing and developing systems using Blackfin®
processors from Analog Devices, Inc.
Purpose of This Manual
The ADSP-BF533 Blackfin Processor Hardware Reference contains information about the DSP architecture for the Blackfin processors. The
architectural descriptions cover functional blocks, buses, and ports,
including all features and processes that they support.
For programming information, see the Blackfin Processor Programming
Reference. For timing, electrical, and package specifications, see the
ADSP-BF531/ADSP-BF532/ADSP-BF533 Embedded Processor Data Sheet.
Intended Audience
The primary audience for this manual is a programmer who is familiar
with Analog Devices Blackfin processors. This manual assumes that the
audience has a working knowledge of the appropriate processor architecture and instruction set. Programmers who are unfamiliar with Analog
Devices processors can use this manual, but should supplement it with
other texts, such as the appropriate programming reference manuals and
data sheets, that describe their target architecture.
Manual Contents
The manual consists of:
•Chapter 1, Introduction
Provides a high-level overview of the processor. Architectural
descriptions include functional blocks, buses, and ports, including
features and processes they support.
•Chapter 2, Computational Units
Describes the arithmetic/logic units (ALUs), multiplier/accumulator units (MACs), shifter, and the set of video ALUs. The chapter
also discusses data formats, data types, and register files.
•Chapter 3, Operating Modes and States
Describes the three operating modes of the processor: Emulation
mode, Supervisor mode, and User mode. The chapter also
describes Idle state and Reset state.
•Chapter 4, Program Sequencer
Describes the operation of the program sequencer, which controls
program flow by providing the address of the next instruction to be
executed. The chapter also discusses loops, subroutines, jumps,
interrupts, and exceptions.
•Chapter 5, Data Address Generators
Describes the Data Address Generators (DAGs), addressing modes,
how to modify DAG and Pointer registers, memory address alignment, and DAG instructions.
•Chapter 6, Memory
Describes the L1 memories, detailing their memory architecture, memory model, memory transaction model, and
memory-mapped registers (MMRs). The chapter also discusses the instruction,
data, and scratchpad memory, which are part of the Blackfin processor core.
•Chapter 7, Chip Bus Hierarchy
Describes on-chip buses, including how data moves through the
system. The chapter also discusses the system memory map, major
system components, and the system interconnects.
•Chapter 8, Dynamic Power Management
Describes system reset and power-up configuration, system clocking and control, and power management.
•Chapter 9, Direct Memory Access
Describes the peripheral DMA and Memory DMA controllers. The
peripheral DMA section discusses direct, block data movements
between a peripheral with DMA access and internal or external
memory spaces.
The Memory DMA section discusses memory-to-memory transfer
capabilities among the processor memory spaces and the L1, external synchronous, and asynchronous memories.
•Chapter 10, SPI Compatible Port Controllers
Describes the Serial Peripheral Interface (SPI) port that provides an
I/O interface to a variety of SPI compatible peripheral devices.
•Chapter 11, Parallel Peripheral Interface
Describes the Parallel Peripheral Interface (PPI) of the processor.
The PPI is a half-duplex, bidirectional port accommodating up to
16 bits of data and used for digital video and data converter
applications.
•Chapter 12, Serial Port Controllers
Describes the two independent, synchronous Serial Port Controllers (SPORT0 and SPORT1) that provide an I/O interface to a
variety of serial peripheral devices.
•Chapter 13, UART Port Controller
Describes the Universal Asynchronous Receiver/Transmitter
(UART) port, which converts data between serial and parallel
formats and includes modem control and interrupt handling hardware. The UART supports the half-duplex IrDA® SIR protocol as
a mode-enabled feature.
•Chapter 14, Programmable Flags
Describes the programmable flags, or general-purpose I/O pins in
the processor, including how to configure the pins as inputs and
outputs, and how to generate interrupts.
•Chapter 15, Timers
Describes the three general-purpose timers that can be configured
in any of three modes; the core timer that can generate periodic
interrupts for a variety of timing functions; and the watchdog timer
that can implement software watchdog functions, such as generating events to the Blackfin processor core.
•Chapter 16, Real-Time Clock
Describes a set of digital watch features of the processor, including
time of day, alarm, and stopwatch countdown.
•Chapter 17, External Bus Interface Unit
Describes the External Bus Interface Unit of the processor. The
chapter also discusses the asynchronous memory interface, the
SDRAM controller (SDC), related registers, and SDC configuration and commands.
•Chapter 18, System Design
Describes how to use the processor as part of an overall system. It
includes information about interfacing the processor to external
memory chips, bus timing and latency numbers, semaphores, and a
discussion of the treatment of unused pins.
•Appendix A, Blackfin Processor Core MMR Assignments
Lists the core memory-mapped registers, their addresses, and
cross-references to text.
•Appendix B, System MMR Assignments
Lists the system memory-mapped registers, their addresses, and
cross-references to text.
•Appendix C, Test Features
Describes test features for the processor; discusses the JTAG standard, boundary-scan architecture, instruction and boundary
registers, and public instructions.
•Appendix D, Numeric Formats
Describes various aspects of the 16-bit data format. The appendix
also describes how to implement a block floating-point format in
software.
•Appendix G, Glossary
Contains definitions of terms used in this book, including
acronyms.
What’s New in This Manual
This is Revision 3.4 of the ADSP-BF533 Blackfin Processor Hardware Reference. Changes to this book from Revision 3.3 include corrections of
typographic errors and reported document errata.
Technical or Customer Support
You can reach Analog Devices, Inc. Customer Support in the following
ways:
•Visit the Embedded Processing and DSP products Web site at
•Contact your Analog Devices, Inc. local sales office or authorized
distributor
•Send questions by mail to:
Analog Devices, Inc.
One Technology Way
P.O. Box 9106
Norwood, MA 02062-9106
USA
Supported Processors
The name Blackfin refers to a family of 16-bit, embedded processors.
VisualDSP++® currently supports the following Blackfin families:
ADSP-BF51x, ADSP-BF52x, ADSP-BF53x, ADSP-BF54x, and
ADSP-BF56x
Product Information
Product information can be obtained from the Analog Devices Web site,
VisualDSP++ online Help system, and a technical library CD.
Analog Devices Web Site
The Analog Devices Web site, www.analog.com, provides information
about a broad range of products—analog integrated circuits, amplifiers,
converters, and digital signal processors.
To access a complete technical library for each processor family, go to
http://www.analog.com/processors/technical_library. The manuals
selection opens a list of current manuals related to the product as well as a
link to the previous revisions of the manuals. When locating your manual
title, note a possible errata check mark next to the title that leads to the
current correction report against the manual.
Also note, MyAnalog.com is a free feature of the Analog Devices Web site
that allows customization of a Web page to display only the latest information about products you are interested in. You can choose to receive
weekly e-mail notifications containing updates to the Web pages that meet
your interests, including documentation errata against all manuals.
MyAnalog.com provides access to books, application notes, data sheets,
code examples, and more.
Visit MyAnalog.com to sign up. If you are a registered user, just log on.
Your user name is your e-mail address.
VisualDSP++ Online Documentation
Online documentation comprises the VisualDSP++ Help system, software
tools manuals, hardware tools manuals, processor manuals, Dinkum
Abridged C++ library, and FLEXnet License Tools software documentation. You can search easily across the entire VisualDSP++ documentation
set for any topic of interest.
For easy printing, supplementary Portable Documentation Format (.pdf)
files for all manuals are provided on the VisualDSP++ installation CD.
Each documentation file type is described as follows.
File            Description
.chm            Help system files and manuals in Microsoft help format
.htm or .html   Dinkum Abridged C++ library and FLEXnet License Tools software
                documentation. Viewing and printing the .html files requires a
                browser, such as Internet Explorer 6.0 (or higher).
.pdf            VisualDSP++ and processor manuals in PDF format. Viewing and
                printing the .pdf files requires a PDF reader, such as Adobe
                Acrobat Reader (4.0 or higher).
Technical Library CD
The technical library CD contains seminar materials, product highlights, a
selection guide, and documentation files of processor manuals, VisualDSP++ software manuals, and hardware tools manuals for the following
processor families: Blackfin, SHARC®, TigerSHARC®, ADSP-218x, and
ADSP-219x.
To order the technical library CD, go to
http://www.analog.com/processors/technical_library, navigate to the manuals page for your
processor, click the request CD check mark, and fill out the order form.
Data sheets, which can be downloaded from the Analog Devices Web site,
change rapidly, and therefore are not included on the technical library
CD. Technical manuals change periodically. Check the Web site for the
latest manual revisions and associated documentation errata.
Text conventions used in this manual are identified and described as follows. Note that additional conventions, which apply only to specific
chapters, may appear throughout this document.
Example               Description
Close command         Titles in reference sections indicate the location of an
(File menu)           item within the VisualDSP++ environment’s menu system (for
                      example, the Close command appears on the File menu).
{this | that}         Alternative items in syntax descriptions appear within curly
                      brackets and separated by vertical bars; read the example as
                      this or that. One or the other is required.
[this | that]         Optional items in syntax descriptions appear within brackets
                      and separated by vertical bars; read the example as an
                      optional this or that.
[this,…]              Optional item lists in syntax descriptions appear within
                      brackets delimited by commas and terminated with an
                      ellipsis; read the example as an optional comma-separated
                      list of this.
.SECTION              Commands, directives, keywords, and feature names are in
                      text with letter gothic font.
filename              Non-keyword placeholders appear in text with italic style
                      format.
SWRST Software Reset  Register names appear in UPPERCASE and a special typeface.
register              The descriptive names of registers are in mixed case and
                      regular typeface.
TMR0E, RESET          Pin names appear in UPPERCASE and a special typeface.
                      Active low signals appear with an OVERBAR.
DRx, I[3:0],          Register, bit, and pin names in the text may refer to
SMS[3:0]              groups of registers or pins:
                      A lowercase x in a register name (DRx) indicates a set of
                      registers (for example, DR2, DR1, and DR0).
                      A colon between numbers within brackets indicates a range
                      of registers or pins (for example, I[3:0] indicates I3, I2,
                      I1, and I0; SMS[3:0] indicates SMS3, SMS2, SMS1, and SMS0).
Note:                 A Note: provides supplementary information on a related
                      topic. In the online version of this book, the word Note
                      appears instead of this symbol.
Caution: Incorrect    A Caution: identifies conditions or inappropriate usage of
device operation      the product that could lead to undesirable results or
may result if ...     product damage. In the online version of this book, the
Caution: Device       word Caution appears instead of this symbol.
damage may result
if ...
Warning: Injury to    A Warning: identifies conditions or inappropriate usage of
device users may      the product that could lead to conditions that are
result if ...         potentially hazardous for device users. In the online
                      version of this book, the word Warning appears instead of
                      this symbol.
Register Diagram Conventions
Register diagrams use the following conventions:
•If the register is read-only (RO), write-1-to-set (W1S), or
write-1-to-clear (W1C), this information appears under the name.
Read/write is the default and is not noted. Additional descriptive
text may follow.
•If any bits in the register do not follow the overall read/write convention, this is noted in the bit description after the bit name.
•If a bit has a short name, the short name appears first in the bit
description, followed by the long name in parentheses.
•The MMR assignment appears in hexadecimal to the left of the
register or—when multiple addresses are involved—in a table
below the register.
•The reset value appears in binary in the individual bits and in hexadecimal to the right of the register.
•Bits marked x have an unknown reset value. Consequently, the
reset value of registers that contain such bits is undefined or dependent on pin values at reset.
•Shaded bits are reserved.
Figure P-1 shows examples of these conventions.
To ensure upward compatibility with future implementations,
write back the value that is read for reserved bits in a register.
The ADSP-BF533, ADSP-BF532, and ADSP-BF531 processors are
enhanced members of the Blackfin processor family that offer significantly
higher performance and lower power than previous Blackfin processors
while retaining their ease-of-use and code compatibility benefits. The
three new processors are completely pin compatible, differing only in their
performance and on-chip memory, mitigating many risks associated with
new product development.
The Blackfin processor core architecture combines a dual MAC signal
processing engine, an orthogonal RISC-like microprocessor instruction
set, flexible Single Instruction, Multiple Data (SIMD) capabilities, and
multimedia features into a single instruction set architecture.
Blackfin products feature dynamic power management. The ability to vary
both the voltage and frequency of operation optimizes the power consumption profile to the specific task.
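As a rough illustration of why varying both voltage and frequency matters, the sketch below uses the common first-order CMOS approximation that dynamic power scales with V² × f. That approximation, the function name, and the sample operating points are assumptions made for illustration; they are not figures from this manual.

```python
def relative_dynamic_power(v_nominal, f_nominal, v_reduced, f_reduced):
    """Return reduced/nominal dynamic power, assuming power ~ V^2 * f."""
    if v_nominal <= 0 or f_nominal <= 0:
        raise ValueError("nominal voltage and frequency must be positive")
    return (v_reduced / v_nominal) ** 2 * (f_reduced / f_nominal)


# Hypothetical operating points (not taken from the data sheet):
# halving the clock alone, then halving the clock and lowering the voltage.
freq_only = relative_dynamic_power(1.2, 600e6, 1.2, 300e6)
freq_and_volt = relative_dynamic_power(1.2, 600e6, 0.9, 300e6)
```

Under this model, lowering the voltage together with the frequency saves noticeably more power than lowering the frequency alone, because the voltage term is squared.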
All of the peripherals, except for general-purpose I/O, Real-Time Clock,
and Timers, are supported by a flexible DMA structure. There are also
two separate memory DMA channels dedicated to data transfers between
the processor’s memory spaces, which include external SDRAM and asynchronous memory. Multiple on-chip buses provide enough bandwidth to
keep the processor core running even when there is also activity on all of
the on-chip and external peripherals.
Core Architecture
The processor core contains two 16-bit multipliers, two 40-bit accumulators, two 40-bit arithmetic logic units (ALUs), four 8-bit video ALUs, and
a 40-bit shifter, shown in Figure 1-2. The computational units process 8-,
16-, or 32-bit data from the register file.
The compute register file contains eight 32-bit registers. When performing compute operations on 16-bit operand data, the register file operates
as 16 independent 16-bit registers. All operands for compute operations
come from the multiported register file and instruction constant fields.
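The dual view of a compute register can be sketched as follows: one 32-bit value is treated as a 16-bit high half and a 16-bit low half. The helper names are hypothetical and the code is only a model of the data layout, not of any instruction.

```python
def split_halves(reg32):
    """Return (high, low) 16-bit halves of a 32-bit register value."""
    reg32 &= 0xFFFFFFFF
    return (reg32 >> 16) & 0xFFFF, reg32 & 0xFFFF


def join_halves(high, low):
    """Rebuild a 32-bit register value from its two 16-bit halves."""
    return ((high & 0xFFFF) << 16) | (low & 0xFFFF)
```

Eight such registers therefore expose sixteen independently addressable 16-bit operands.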
Each MAC can perform a 16- by 16-bit multiply per cycle, with accumulation to a 40-bit result. Signed and unsigned formats, rounding, and
saturation are supported.
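A minimal software model of one multiply-accumulate step is sketched below. It assumes signed integer operands and saturation at the 40-bit accumulator limits; fractional formats, unsigned modes, and rounding are deliberately left out, and the function names are hypothetical.

```python
ACC_BITS = 40
ACC_MAX = (1 << (ACC_BITS - 1)) - 1   # largest signed 40-bit value
ACC_MIN = -(1 << (ACC_BITS - 1))      # smallest signed 40-bit value


def to_signed16(value):
    """Interpret the low 16 bits of value as a two's-complement integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def mac(acc, a, b):
    """Add the signed 16x16 product a*b to acc, saturating at 40 bits."""
    total = acc + to_signed16(a) * to_signed16(b)
    return max(ACC_MIN, min(ACC_MAX, total))


def dot_product(xs, ys):
    """Accumulate a sequence of products, one MAC step per element pair."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
    return acc
```

The eight bits of headroom above a 32-bit product let a long run of products accumulate before saturation comes into play.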
The ALUs perform a traditional set of arithmetic and logical operations
on 16-bit or 32-bit data. Many special instructions are included to accelerate various signal processing tasks. These include bit operations such as
field extract and population count, modulo 2^32 multiply, divide primitives, saturation and rounding, and sign/exponent detection. The set of
video instructions includes byte alignment and packing operations, 16-bit
and 8-bit adds with clipping, 8-bit average operations, and 8-bit subtract/absolute value/accumulate (SAA) operations. Also provided are the
compare/select and vector search instructions. For some instructions, two
16-bit ALU operations can be performed simultaneously on register pairs
(a 16-bit high half and 16-bit low half of a compute register). By also
using the second ALU, quad 16-bit operations are possible.
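To make the 8-bit subtract/absolute value/accumulate (SAA) operation concrete, the sketch below computes it in software over paired 8-bit values, as in a sum-of-absolute-differences comparison of two pixel rows. The function name and the way the inputs are grouped are assumptions; how the hardware batches bytes per instruction is not modeled.

```python
def saa(acc, pixels_a, pixels_b):
    """Add the sum of |a - b| over paired unsigned 8-bit values to acc."""
    if len(pixels_a) != len(pixels_b):
        raise ValueError("both inputs must have the same length")
    for a, b in zip(pixels_a, pixels_b):
        if not (0 <= a <= 255 and 0 <= b <= 255):
            raise ValueError("inputs must be unsigned 8-bit values")
        acc += abs(a - b)
    return acc
```

A small result means the two rows are similar, which is why this primitive is useful when searching for the best-matching block in video processing.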
The 40-bit shifter can deposit data and perform shifting, rotating, normalization, and extraction operations.
A program sequencer controls the instruction execution flow, including
instruction alignment and decoding. For program flow control, the
sequencer supports PC-relative and indirect conditional jumps (with static
branch prediction) and subroutine calls. Hardware is provided to support
zero-overhead looping. The architecture is fully interlocked, meaning
there are no visible pipeline effects when executing instructions with data
dependencies.
The address arithmetic unit provides two addresses for simultaneous dual
fetches from memory. It contains a multiported register file consisting of
four sets of 32-bit Index, Modify, Length, and Base registers (for circular
buffering) and eight additional 32-bit pointer registers (for C-style
indexed stack manipulation).
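The role of the Index, Modify, Length, and Base registers in circular buffering can be sketched in software as shown below. This is a simplified model that assumes the modify value is no larger in magnitude than the buffer length; the function name is hypothetical.

```python
def circular_update(index, modify, base, length):
    """Return index + modify, wrapped into the range [base, base + length)."""
    if length <= 0:
        raise ValueError("length must be positive")
    if abs(modify) > length:
        raise ValueError("this sketch assumes |modify| <= length")
    index += modify
    if index >= base + length:
        index -= length      # ran past the end: wrap to the start
    elif index < base:
        index += length      # ran before the start: wrap to the end
    return index
```

For example, with a buffer of length 8 at base 0x1000, stepping by 4 from 0x1006 wraps around to 0x1002, so software never has to test the buffer bounds itself.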
Blackfin processors support a modified Harvard architecture in combination with a hierarchical memory structure. Level 1 (L1) memories typically
operate at the full processor speed with little or no latency. At the L1 level,
the instruction memory holds instructions only. The two data memories
hold data, and a dedicated scratchpad data memory stores stack and local
variable information.
In addition, multiple L1 memory blocks are provided, which may be configured as a mix of SRAM and cache. The Memory Management Unit
(MMU) provides memory protection for individual tasks that may be
operating on the core and may protect system registers from unintended
access.
The architecture provides three modes of operation: User, Supervisor, and
Emulation. User mode has restricted access to a subset of system resources,
thus providing a protected software environment. Supervisor and Emulation modes have unrestricted access to the system and core resources.
The ADSP-BF53x Blackfin processor instruction set is optimized so that
16-bit opcodes represent the most frequently used instructions. Complex
DSP instructions are encoded into 32-bit opcodes as multifunction
instructions. Blackfin products support a limited multi-issue capability,
where a 32-bit instruction can be issued in parallel with two 16-bit
instructions. This allows the programmer to use many of the core
resources in a single instruction cycle.
The ADSP-BF53x Blackfin processor assembly language uses an algebraic
syntax. The architecture is optimized for use with the C compiler.
Memory Architecture
The Blackfin processor architecture structures memory as a single, unified
4G byte address space using 32-bit addresses. All resources, including
internal memory, external memory, and I/O control registers, occupy separate sections of this common address space. The memory portions of this
address space are arranged in a hierarchical structure to provide a good
cost/performance balance of some very fast, low latency on-chip memory
as cache or SRAM, and larger, lower cost and lower performance off-chip
memory systems. Table 1-1 shows the memory comparison for the
ADSP-BF531, ADSP-BF532, and ADSP-BF533 processors.
Table 1-1. Memory Comparison
Type of Memory          ADSP-BF531  ADSP-BF532  ADSP-BF533
Instruction SRAM/Cache  16K byte    16K byte    16K byte
Instruction SRAM        16K byte    32K byte    64K byte
Data SRAM/Cache         16K byte    32K byte    32K byte
Data SRAM               -           -           32K byte
Scratchpad              4K byte     4K byte     4K byte
Total                   84K byte    116K byte   148K byte
The L1 memory system is the highest-performance memory available to the core. The off-chip memory system, accessed through the
External Bus Interface Unit (EBIU), provides expansion with SDRAM,
flash memory, and SRAM, optionally accessing up to 132M bytes of physical memory.
The memory DMA controller provides high bandwidth data movement
capability. It can perform block transfers of code or data between the
internal memory and the external memory spaces.
Internal Memory
The processor has three blocks of on-chip memory that provide high
bandwidth access to the core:
•L1 instruction memory, consisting of SRAM and a 4-way set-associative cache. This memory is accessed at full processor speed.
•L1 data memory, consisting of SRAM and/or a 2-way set-associative cache. This memory block is accessed at full processor speed.
•L1 scratchpad RAM, which runs at the same speed as the L1 memories but is only accessible as data SRAM and cannot be configured
as cache memory.
External Memory
External (off-chip) memory is accessed via the External Bus Interface Unit
(EBIU). This 16-bit interface provides a glueless connection to a bank of
synchronous DRAM (SDRAM) and as many as four banks of asynchronous memory devices including flash memory, EPROM, ROM, SRAM,
and memory-mapped I/O devices.
The PC133-compliant SDRAM controller can be programmed to interface to up to 128M bytes of SDRAM.
The asynchronous memory controller can be programmed to control up
to four banks of devices. Each bank occupies a 1M byte segment regardless
of the size of the devices used, so that these banks are only contiguous if
each is fully populated with 1M byte of memory.
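The external memory figures can be checked with a short sketch: 128M bytes of SDRAM plus four 1M byte asynchronous banks gives the 132M byte total, and the asynchronous banks form one contiguous region only when every bank is fully populated. The base address passed in the usage example is an assumed placeholder, and the function names are hypothetical.

```python
MBYTE = 1 << 20
BANK_WINDOW = 1 * MBYTE        # each asynchronous bank occupies a 1M byte segment
NUM_ASYNC_BANKS = 4
MAX_SDRAM = 128 * MBYTE

# Largest external memory reachable through the EBIU.
MAX_EXTERNAL = MAX_SDRAM + NUM_ASYNC_BANKS * BANK_WINDOW


def populated_ranges(async_base, device_sizes):
    """Return one (start, end) pair per bank, end exclusive.

    async_base is the address of the first bank (supplied by the caller);
    device_sizes lists the bytes of memory actually fitted in each bank.
    """
    if len(device_sizes) > NUM_ASYNC_BANKS:
        raise ValueError("at most four asynchronous banks are available")
    ranges = []
    for bank, size in enumerate(device_sizes):
        if not 0 < size <= BANK_WINDOW:
            raise ValueError("each device must fit in its 1M byte segment")
        start = async_base + bank * BANK_WINDOW
        ranges.append((start, start + size))
    return ranges


def is_contiguous(ranges):
    """True when each populated range ends exactly where the next begins."""
    return all(prev_end == next_start
               for (_, prev_end), (next_start, _) in zip(ranges, ranges[1:]))
```

With an assumed base of 0x20000000, `populated_ranges(0x20000000, [512 * 1024, MBYTE])` leaves a 512K byte hole between the first and second banks, so the two devices are not contiguous.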
Blackfin processors do not define a separate I/O space. All resources are
mapped through the flat 32-bit address space. Control registers for
on-chip I/O devices are mapped into memory-mapped registers (MMRs)
at addresses near the top of the 4G byte address space. These are separated
into two smaller blocks: one contains the control MMRs for all core functions and the other contains the registers needed for setup and control of
the on-chip peripherals outside of the core. The MMRs are accessible only
in Supervisor mode. They appear as reserved space to on-chip peripherals.
Event Handling
The event controller on the processor handles all asynchronous and synchronous events to the processor. The processor event handling supports
both nesting and prioritization. Nesting allows multiple event service routines to be active simultaneously. Prioritization ensures that servicing a
higher priority event takes precedence over servicing a lower priority
event. The controller provides support for five different types of events:
•Emulation – Causes the processor to enter Emulation mode, allowing command and control of the processor via the JTAG interface.
•Reset – Resets the processor.
•Nonmaskable Interrupt (NMI) – The software watchdog timer or
the NMI input signal to the processor generates this event. The
NMI event is frequently used as a power-down indicator to initiate
an orderly shutdown of the system.
•Exceptions – Synchronous to program flow. That is, the exception
is taken before the instruction is allowed to complete. Conditions
such as data alignment violations and undefined instructions cause
exceptions.
•Interrupts – Asynchronous to program flow. These are caused by
input pins, timers, and other peripherals.
Each event has an associated register to hold the return address and an
associated return-from-event instruction. When an event is triggered, the
state of the processor is saved on the supervisor stack.
The processor event controller consists of two stages: the Core Event Controller (CEC) and the System Interrupt Controller (SIC). The CEC works
with the SIC to prioritize and control all system events. Conceptually,
interrupts from the peripherals arrive at the SIC and are routed directly
into the general-purpose interrupts of the CEC.
Core Event Controller (CEC)
The Core Event Controller supports nine general-purpose interrupts
(IVG15–7), in addition to the dedicated interrupt and exception events.
Of these general-purpose interrupts, the two lowest priority interrupts
(IVG15–14) are recommended to be reserved for software interrupt handlers, leaving seven prioritized interrupt inputs to support peripherals.
System Interrupt Controller (SIC)
The System Interrupt Controller provides the mapping and routing of
events from the many peripheral interrupt sources to the prioritized general-purpose interrupt inputs of the CEC. Although the processor
provides a default mapping, the user can alter the mappings and priorities
of interrupt events by writing the appropriate values into the Interrupt
Assignment Registers (IAR).
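Conceptually, the mapping performed by the SIC can be modeled as a table from peripheral interrupt source to general-purpose interrupt level, with IVG7 the highest and IVG15 the lowest priority. In the sketch below a dictionary stands in for the contents of the Interrupt Assignment Registers; the source names and the example assignment are hypothetical, and the register bit layout is not modeled.

```python
HIGHEST_GENERAL_IVG = 7
LOWEST_PRIORITY_IVG = 15

# Hypothetical assignment of peripheral sources to IVG levels.
EXAMPLE_ASSIGNMENT = {"ppi": 8, "sport0_rx": 9, "uart_rx": 10, "timer0": 11}


def highest_priority_pending(assignment, pending):
    """Return (ivg, sources) for the most urgent pending level, or None."""
    by_ivg = {}
    for source in pending:
        ivg = assignment[source]
        if not HIGHEST_GENERAL_IVG <= ivg <= LOWEST_PRIORITY_IVG:
            raise ValueError("general-purpose interrupts are IVG7 to IVG15")
        by_ivg.setdefault(ivg, []).append(source)
    if not by_ivg:
        return None
    ivg = min(by_ivg)  # a lower IVG number means a higher priority
    return ivg, sorted(by_ivg[ivg])
```

Several sources may share one level, in which case the handler for that level has to work out which source raised the request.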
The processor has multiple, independent DMA controllers that support
automated data transfers with minimal overhead for the core. DMA transfers can occur between the internal memories and any of its DMA-capable
peripherals. Additionally, DMA transfers can be accomplished between
any of the DMA-capable peripherals and external devices connected to the
external memory interfaces, including the SDRAM controller and the
asynchronous memory controller. DMA-capable peripherals include the
SPORTs, SPI port, UART, and PPI. Each individual DMA-capable
peripheral has at least one dedicated DMA channel.
The DMA controller supports both one-dimensional (1D) and
two-dimensional (2D) DMA transfers. DMA transfer initialization can be
implemented from registers or from sets of parameters called descriptor
blocks.
The 2D DMA capability supports arbitrary row and column sizes up to
64K elements by 64K elements, and arbitrary row and column step sizes
up to +/- 32K elements. Furthermore, the column step size can be less
than the row step size, allowing implementation of interleaved datastreams. This feature is especially useful in video applications where data
can be de-interleaved on the fly.
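The de-interleaving idea can be sketched with a simplified two-level addressing model: an inner loop walks one row with a fixed step, and an outer loop moves the starting point for the next row. The function and parameter names are hypothetical and do not correspond to DMA register names; the model only shows how an outer step smaller than the inner step separates interleaved streams.

```python
def offsets_2d(inner_count, inner_step, outer_count, outer_step):
    """List element offsets in transfer order for a two-level access pattern."""
    return [outer * outer_step + inner * inner_step
            for outer in range(outer_count)
            for inner in range(inner_count)]


def deinterleave(samples, channels):
    """Reorder interleaved samples so each channel's data is contiguous."""
    if channels <= 0 or len(samples) % channels:
        raise ValueError("sample count must be a multiple of the channel count")
    per_channel = len(samples) // channels
    # Inner step skips over the other channels; outer step selects the channel.
    order = offsets_2d(per_channel, channels, channels, 1)
    return [samples[offset] for offset in order]
```

For two interleaved channels of four samples each, the access order is 0, 2, 4, 6 followed by 1, 3, 5, 7, which gathers one channel and then the other in a single pass.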
Examples of DMA types supported include:
•A single, linear buffer that stops upon completion
•A circular, auto-refreshing buffer that interrupts on each full or
fractionally full buffer
•1D or 2D DMA using a linked list of descriptors
•2D DMA using an array of descriptors specifying only the base
DMA address within a common page
In addition to the dedicated peripheral DMA channels, there is a separate
memory DMA channel provided for transfers between the various memories of the system. This enables transfers of blocks of data between any of
the memories—including external SDRAM, ROM, SRAM, and flash
memory—with minimal processor intervention. Memory DMA transfers
can be controlled by a very flexible descriptor-based methodology or by a
standard register-based autobuffer mechanism.
External Bus Interface Unit
The External Bus Interface Unit (EBIU) on the processor interfaces with a
wide variety of industry-standard memory devices. The controller consists
of an SDRAM controller and an asynchronous memory controller.
PC133 SDRAM Controller
The SDRAM controller provides an interface to a single bank of industry-standard SDRAM devices or DIMMs. Fully compliant with the
PC133 SDRAM standard, the bank can be configured to contain between
16M and 128M bytes of memory.
A set of programmable timing parameters is available to configure the
SDRAM bank to support slower memory devices. The memory bank is
16 bits wide for minimum device count and lower system cost.
Asynchronous Controller
The asynchronous memory controller provides a configurable interface for
up to four separate banks of memory or I/O devices. Each bank can be
independently programmed with different timing parameters. This allows
connection to a wide variety of memory devices, including SRAM, ROM,
and flash EPROM, as well as I/O devices that interface with standard
memory control lines. Each bank occupies a 1M byte window in the
processor address space, but if not fully populated, these are not made
contiguous by the memory controller. The banks are 16 bits wide, for
interfacing to a range of memories and I/O devices.
Parallel Peripheral Interface
The processor provides a Parallel Peripheral Interface (PPI) that can connect directly to parallel A/D and D/A converters, ITU-R 601/656 video
encoders and decoders, and other general-purpose peripherals. The PPI
consists of a dedicated input clock pin, up to 3 frame synchronization
pins, and up to 16 data pins. The input clock supports parallel data rates
up to half the system clock rate.
In ITU-R 656 modes, the PPI receives and parses a data stream of 8-bit or
10-bit data elements. On-chip decode of embedded preamble control and
synchronization information is supported.
Three distinct ITU-R 656 modes are supported:
•Active Video Only – The PPI does not read in any data between
the End of Active Video (EAV) and Start of Active Video (SAV)
preamble symbols, or any data present during the vertical blanking
intervals. In this mode, the control byte sequences are not stored to
memory; they are filtered by the PPI.
•Vertical Blanking Only – The PPI only transfers Vertical Blanking
Interval (VBI) data, as well as horizontal blanking information and
control byte sequences on VBI lines.
•Entire Field – The entire incoming bitstream is read in through the
PPI. This includes active video, control preamble sequences, and
ancillary data that may be embedded in horizontal and vertical
blanking intervals.
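The preamble symbols that distinguish these modes follow the ITU-R BT.656 layout: the byte sequence FF 00 00 is followed by a control byte whose bits 6, 5, and 4 carry the field (F), vertical blanking (V), and horizontal (H) flags, with H = 1 marking EAV and H = 0 marking SAV. The sketch below locates and decodes these codes in a byte stream. The function names are hypothetical, the protection bits are not checked, and this is a software model rather than a description of the PPI decode logic.

```python
PREAMBLE = bytes([0xFF, 0x00, 0x00])


def decode_xy(xy):
    """Extract the F, V, and H flags from a BT.656 control byte."""
    return {"field": (xy >> 6) & 1,
            "vblank": (xy >> 5) & 1,
            "eav": (xy >> 4) & 1}


def find_control_codes(stream):
    """Return (offset, flags) for each FF 00 00 XY sequence in stream."""
    codes = []
    pos = stream.find(PREAMBLE)
    while pos != -1 and pos + 3 < len(stream):
        codes.append((pos, decode_xy(stream[pos + 3])))
        pos = stream.find(PREAMBLE, pos + 4)
    return codes
```

In these terms, Active Video Only corresponds to keeping the data that follows an SAV code with the vblank flag clear, while Vertical Blanking Only keeps lines whose codes have the vblank flag set.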
Though not explicitly supported, ITU-R 656 output functionality can be
achieved by setting up the entire frame structure (including active video,
blanking, and control information) in memory and streaming the data out
the PPI in a frame sync-less mode. The processor’s 2D DMA features
facilitate this transfer by allowing the static frame buffer (blanking and
control codes) to be placed in memory once, and simply updating the
active video information on a per-frame basis.
The general-purpose modes of the PPI are intended to suit a wide variety
of data capture and transmission applications. The modes are divided into
four main categories, each allowing up to 16 bits of data transfer per
PPI_CLK cycle:
•Data Receive with Internally Generated Frame Syncs
•Data Receive with Externally Generated Frame Syncs
•Data Transmit with Internally Generated Frame Syncs
•Data Transmit with Externally Generated Frame Syncs
These modes support ADC/DAC connections, as well as video communication with hardware signalling. Many of the modes support more than
one level of frame synchronization. If desired, a programmable delay can
be inserted between assertion of a frame sync and reception/transmission
of data.
The processor incorporates two dual-channel synchronous serial ports
(SPORT0 and SPORT1) for serial and multiprocessor communications.
The SPORTs support these features:
•Bidirectional, I2S capable operation. Each SPORT has two sets of
independent transmit and receive pins, enabling eight channels of
I2S stereo audio.
•Buffered (eight-deep) transmit and receive ports. Each port has a
data register for transferring data words to and from other processor components and shift registers for shifting data in and out of
the data registers.
•Clocking. Each transmit and receive port can either use an external
serial clock or can generate its own in a wide range of frequencies.
•Word length. Each SPORT supports serial data words from 3 to 32
bits in length, transferred in most significant bit first or least significant bit first format.
•Framing. Each transmit and receive port can run with or without
frame sync signals for each data word. Frame sync signals can be
generated internally or externally, active high or low, and with
either of two pulse widths and early or late frame sync.
•Companding. Each SPORT can perform A-law or µ-law companding
according to ITU recommendation G.711. Companding can be selected on
the transmit and/or receive channel of the SPORT without additional latencies.
•DMA operations with single cycle overhead
Each SPORT can automatically receive and transmit multiple buffers of memory data. The processor can link or chain sequences of
DMA transfers between a SPORT and memory.
•Interrupts
Each transmit and receive port generates an interrupt upon completing the transfer of a data word or after transferring an entire
data buffer or buffers through DMA.
•Multichannel capability
Each SPORT supports 128 channels out of a 1024-channel window and is compatible with the H.100, H.110, MVIP-90, and
HMVIP standards.
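To give a feel for what µ-law companding does to a sample, the following sketch uses a widely used software formulation of G.711 µ-law encoding. It is not taken from this manual: the function name ulaw_encode and the 16-bit linear input width are assumptions, and the SPORT hardware may align its input differently.

```c
#include <stdint.h>

/* Hypothetical helper: compress one 16-bit linear PCM sample into an
 * 8-bit mu-law code word (common software formulation of G.711). */
uint8_t ulaw_encode(int16_t pcm)
{
    const int bias = 0x84;   /* 132, added before the segment search */
    const int clip = 32635;  /* largest magnitude that survives the bias */
    int sample = pcm;        /* widen so that -32768 can be negated */
    int sign = 0;
    int exponent = 7;
    int mantissa;
    int mask;

    if (sample < 0) {
        sample = -sample;
        sign = 0x80;
    }
    if (sample > clip)
        sample = clip;
    sample += bias;

    /* Segment number = position of the highest set bit in bits 14..7. */
    for (mask = 0x4000; exponent > 0 && (sample & mask) == 0; mask >>= 1)
        exponent--;

    mantissa = (sample >> (exponent + 3)) & 0x0F;

    /* Code words are stored bit-inverted. */
    return (uint8_t)~(sign | (exponent << 4) | mantissa);
}
```

Small magnitudes keep fine resolution (segment 0), while large magnitudes share coarser steps, which is how 16-bit dynamic range is carried in 8 bits.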
Serial Peripheral Interface (SPI) Port
The processor has an SPI-compatible port that enables the processor to
communicate with multiple SPI-compatible devices.
The SPI interface uses three pins for transferring data: two data pins and a
clock pin. An SPI chip select input pin lets other SPI devices select the
processor, and seven SPI chip select output pins let the processor select
other SPI devices. The SPI select pins are reconfigured Programmable Flag
pins. Using these pins, the SPI port provides a full-duplex, synchronous
serial interface, which supports both master and slave modes and multimaster environments.
The SPI port’s baud rate and clock phase/polarities are programmable,
and it has an integrated DMA controller, configurable to support either
transmit or receive datastreams. The SPI’s DMA controller can only service unidirectional accesses at any given time.
During transfers, the SPI port simultaneously transmits and receives by
serially shifting data in and out of its two serial data lines. The serial clock
line synchronizes the shifting and sampling of data on the two serial data
lines.
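The simultaneous transmit and receive behavior can be pictured as two shift registers connected in a ring. The sketch below is a host-side simulation only, assuming 8-bit words sent most significant bit first; the function name and the mosi/miso variable names are illustrative and do not refer to processor registers.

```c
#include <stdint.h>

/* Simulate one 8-bit full-duplex SPI transfer, MSB first. On each clock,
 * one bit leaves each device and enters the other. */
void spi_exchange8(uint8_t *master, uint8_t *slave)
{
    int clk;

    for (clk = 0; clk < 8; clk++) {
        uint8_t mosi = (uint8_t)((*master >> 7) & 1u); /* master out */
        uint8_t miso = (uint8_t)((*slave >> 7) & 1u);  /* slave out */

        *master = (uint8_t)((*master << 1) | miso);
        *slave = (uint8_t)((*slave << 1) | mosi);
    }
}
```

After eight clocks the two devices have exchanged their words, which is why every SPI write is also a read.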
Timers
There are four programmable timer units in the processor. Three general-purpose timers each have an external pin that can be configured either as a
Pulse Width Modulator (PWM) or timer output, as an input to clock the
timer, or as a mechanism for measuring pulse widths of external events.
These timer units can be synchronized to an external clock input connected to the PF1 pin, an external clock input to the PPI_CLK pin, or to the
internal SCLK.
The timer units can be used in conjunction with the UART to measure
the width of the pulses in the datastream to provide an autobaud detect
function for a serial channel.
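The arithmetic behind autobaud detection can be sketched as follows. This is an illustration under stated assumptions, not the procedure defined by this manual: it assumes the timer reports a pulse width in SCLK cycles, that the pulse spans a known number of bit times, and that the UART divisor relates to the bit rate as SCLK / (16 x divisor). The function names are hypothetical.

```c
#include <stdint.h>

/* Estimate the bit rate from a pulse that spans `bits` bit times and was
 * measured as `sclk_cycles` cycles of an SCLK running at `sclk_hz`. */
uint32_t autobaud_bit_rate(uint32_t sclk_hz, uint32_t sclk_cycles,
                           uint32_t bits)
{
    uint64_t num;

    if (sclk_cycles == 0 || bits == 0)
        return 0;
    num = (uint64_t)sclk_hz * bits;
    return (uint32_t)((num + sclk_cycles / 2) / sclk_cycles); /* rounded */
}

/* Divisor for an assumed 16x-oversampling UART: SCLK cycles per bit / 16. */
uint32_t autobaud_divisor(uint32_t sclk_cycles, uint32_t bits)
{
    uint32_t denom = 16u * bits;

    if (denom == 0)
        return 0;
    return (sclk_cycles + denom / 2) / denom; /* rounded */
}
```

Measuring across several bit times rather than one reduces the effect of timer quantization on the computed divisor.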
The timers can generate interrupts to the processor core to provide periodic events for synchronization, either to the processor clock or to a count
of external signals.
In addition to the three general-purpose programmable timers, a fourth
timer is also provided. This extra timer is clocked by the internal processor
clock and is typically used as a system tick clock for generation of operating system periodic interrupts.
UART Port
The processor provides a full-duplex Universal Asynchronous
Receiver/Transmitter (UART) port, which is fully compatible with
PC-standard UARTs. The UART port provides a simplified UART interface to other peripherals or hosts, providing full- or half-duplex,
DMA-supported, asynchronous transfers of serial data. The UART port
includes support for 5 to 8 data bits; 1 or 2 stop bits; and none, even, or
odd parity. The UART port supports two modes of operation:
•Programmed I/O. The processor sends or receives data by writing
or reading I/O-mapped UART registers. The data is double buffered on both transmit and receive.
•Direct Memory Access (DMA). The DMA controller transfers
both transmit and receive data. This reduces the number and frequency of interrupts required to transfer data to and from memory.
The UART has two dedicated DMA channels, one for transmit
and one for receive. These DMA channels have lower priority than
most DMA channels because of their relatively low service rates.
The UART port’s baud rate, serial data format, error code generation and
status, and interrupts can be programmed to support:
•Wide range of bit rates
•Data formats from 7 to 12 bits per frame
•Generation of maskable interrupts to the processor by both transmit and receive operations
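The 7-to-12-bit frame range is consistent with the 5 to 8 data bits, optional parity, and 1 or 2 stop bits listed earlier, if one start bit per frame is assumed. A minimal sketch of that arithmetic (the function name is illustrative):

```c
/* Total bits on the wire for one UART frame, assuming one start bit. */
unsigned uart_frame_bits(unsigned data_bits, int parity_enabled,
                         unsigned stop_bits)
{
    return 1u + data_bits + (parity_enabled ? 1u : 0u) + stop_bits;
}
```

The shortest frame is 5 data bits with no parity and 1 stop bit; the longest is 8 data bits with parity and 2 stop bits.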
In conjunction with the general-purpose timer functions, autobaud detection is supported.
The capabilities of the UART are further extended with support for the
Infrared Data Association (IrDA®) Serial Infrared Physical Layer Link
Specification (SIR) protocol.
Real-Time Clock
The processor’s Real-Time Clock (RTC) provides a robust set of digital
watch features, including current time, stopwatch, and alarm. The RTC is
clocked by a 32.768 kHz crystal external to the processor. The RTC
peripheral has dedicated power supply pins, so that it can remain powered
up and clocked even when the rest of the processor is in a low power state.
The RTC provides several programmable interrupt options, including
interrupt per second, minute, hour, or day clock ticks, interrupt on programmable stopwatch countdown, or interrupt at a programmed alarm
time.
The 32.768 kHz input clock frequency is divided down to a 1 Hz signal
by a prescaler. The counter function of the timer consists of four counters:
a 60 second counter, a 60 minute counter, a 24 hour counter, and a
32768 day counter.
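The way the four counters cascade can be sketched in software. The structure and function names below are hypothetical, and the wrap of the day counter back to zero after 32767 is an assumption made for the sketch.

```c
#include <stdint.h>

typedef struct {
    uint32_t day;    /* 0..32767 */
    uint32_t hour;   /* 0..23 */
    uint32_t minute; /* 0..59 */
    uint32_t second; /* 0..59 */
} rtc_counters;

/* Advance the counters by one tick of the 1 Hz signal, rippling the
 * carry from seconds up to days. */
void rtc_tick(rtc_counters *t)
{
    if (++t->second < 60u)
        return;
    t->second = 0;
    if (++t->minute < 60u)
        return;
    t->minute = 0;
    if (++t->hour < 24u)
        return;
    t->hour = 0;
    t->day = (t->day + 1u) % 32768u; /* assumed wrap behavior */
}
```

One call corresponds to 32768 cycles of the 32.768 kHz crystal after the prescaler.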
When enabled, the alarm function generates an interrupt when the output
of the timer matches the programmed value in the alarm control register.
There are two alarms. The first alarm is for a time of day. The second
alarm is for a day and time of that day.
The stopwatch function counts down from a programmed value, with one
minute resolution. When the stopwatch is enabled and the counter underflows, an interrupt is generated.
Like the other peripherals, the RTC can wake up the processor from Sleep
mode or Deep Sleep mode upon generation of any RTC wakeup event. An
RTC wakeup event can also wake up the on-chip internal voltage regulator from a powered down state.
Watchdog Timer
The processor includes a 32-bit timer that can be used to implement a
software watchdog function. A software watchdog can improve system
availability by forcing the processor to a known state through generation
of a hardware reset, nonmaskable interrupt (NMI), or general-purpose
interrupt, if the timer expires before being reset by software. The programmer initializes the count value of the timer, enables the appropriate
interrupt, then enables the timer. Thereafter, the software must reload the
counter before it counts to zero from the programmed value. This protects
the system from remaining in an unknown state where software that
would normally reset the timer has stopped running due to an external
noise condition or software error.
If configured to generate a hardware reset, the watchdog timer resets both
the CPU and the peripherals. After a reset, software can determine if the
watchdog was the source of the hardware reset by interrogating a status bit
in the watchdog control register.
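The reload discipline described above can be modeled on a host as follows. This is a behavioral sketch only; the type and function names are hypothetical and do not correspond to the watchdog registers.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t reload;  /* programmed count value */
    uint32_t count;   /* current down-counter value */
    bool enabled;
    bool expired;     /* latched when the count reaches zero */
} watchdog_model;

void wdog_start(watchdog_model *w, uint32_t reload)
{
    w->reload = reload;
    w->count = reload;
    w->enabled = true;
    w->expired = false;
}

/* Software "kick": reload the counter before it reaches zero. */
void wdog_kick(watchdog_model *w)
{
    w->count = w->reload;
}

/* Let `cycles` clock cycles elapse. */
void wdog_advance(watchdog_model *w, uint32_t cycles)
{
    if (!w->enabled || w->expired)
        return;
    if (cycles >= w->count) {
        w->count = 0;
        w->expired = true; /* hardware would raise reset, NMI, or interrupt */
    } else {
        w->count -= cycles;
    }
}
```

As long as wdog_kick is called more often than the programmed period, the expired condition never occurs; if the software stalls, it does.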
The processor has 16 bidirectional programmable flag (PF) or general-purpose I/O pins, PF[15:0]. Each pin can be individually configured using
the flag control, status, and interrupt registers.
•Flag Direction Control register – Specifies the direction of each
individual PFx pin as input or output.
•Flag Control and Status registers – The processor employs a
“write-1-to-modify” mechanism that allows any combination of
individual flags to be modified in a single instruction, without
affecting the level of any other flags. Four control registers are provided. One register is written in order to set flag values, one register
is written in order to clear flag values, one register is written in
order to toggle flag values, and one register is written in order to
specify any number of flag values. Reading the Flag Status register
allows software to interrogate the sense of the flags.
•Flag Interrupt Mask registers – The two Flag Interrupt Mask registers allow each individual PFx pin to function as an interrupt to the
processor. Similar to the two Flag Control registers that are used to
set and clear individual flag values, one Flag Interrupt Mask register sets bits to enable interrupt function, and the other Flag
Interrupt Mask register clears bits to disable interrupt function.
PFx pins defined as inputs can be configured to generate hardware
interrupts, while output PFx pins can be triggered by software
interrupts.
•Flag Interrupt Sensitivity registers – The two Flag Interrupt Sensitivity registers specify whether individual PFx pins are level- or
edge-sensitive and, if edge-sensitive, whether just the rising edge or both the rising and falling edges of the signal are
significant. One register selects the type of sensitivity, and one register selects which edges are significant for edge sensitivity.
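The "write-1-to-modify" mechanism can be illustrated with a small host-side model. The type and function names are hypothetical stand-ins for the four control registers; the point is that a set, clear, or toggle write changes only the flags whose mask bit is 1, so no read-modify-write sequence is needed.

```c
#include <stdint.h>

typedef struct {
    uint16_t data; /* current level of PF[15:0] */
} flag_port;

/* Write to the "set" register: 1 bits drive flags high. */
void flag_write_set(flag_port *p, uint16_t mask)
{
    p->data = (uint16_t)(p->data | mask);
}

/* Write to the "clear" register: 1 bits drive flags low. */
void flag_write_clear(flag_port *p, uint16_t mask)
{
    p->data = (uint16_t)(p->data & (uint16_t)~mask);
}

/* Write to the "toggle" register: 1 bits invert flags. */
void flag_write_toggle(flag_port *p, uint16_t mask)
{
    p->data = (uint16_t)(p->data ^ mask);
}

/* Write to the "data" register: every flag takes the written value. */
void flag_write_data(flag_port *p, uint16_t value)
{
    p->data = value;
}
```

Because each operation is a single write, an interrupt handler and the main program can modify different flags without corrupting each other's updates.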
The processor can be clocked by an external crystal, a sine wave input, or a
buffered, shaped clock derived from an external clock oscillator.
This external clock connects to the processor’s CLKIN pin. The CLKIN input
cannot be halted, changed, or operated below the specified frequency during normal operation. This clock signal should be a TTL-compatible
signal.
The core clock (CCLK) and system peripheral clock (SCLK) are derived from
the input clock (CLKIN) signal. An on-chip Phase Locked Loop (PLL) is
capable of multiplying the CLKIN signal by a user-programmable (1x to
63x) multiplication factor (bounded by specified minimum and maximum
VCO frequencies). The default multiplier is 10x, but it can be modified by a
software instruction sequence. On-the-fly frequency changes can be made
by simply writing to the PLL_DIV register.
All on-chip peripherals are clocked by the system clock (SCLK). The system
clock frequency is programmable by means of the SSEL[3:0] bits of the
PLL_DIV register.
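The frequency relationships can be sketched numerically. The code below assumes that the VCO frequency is CLKIN times the multiplier, that SCLK is the VCO frequency divided by the 4-bit SSEL value (with 0 treated as invalid), and that CCLK is the VCO frequency divided by a core divider. These divider relationships and all names are assumptions for illustration, and the VCO minimum and maximum limits are not checked.

```c
#include <stdint.h>

typedef struct {
    uint32_t vco_hz;
    uint32_t cclk_hz;
    uint32_t sclk_hz;
} clock_tree;

/* Returns 0 on success, -1 if a parameter is out of the assumed range. */
int clock_tree_compute(uint32_t clkin_hz, uint32_t multiplier,
                       uint32_t core_div, uint32_t ssel, clock_tree *out)
{
    uint64_t vco;

    if (multiplier < 1u || multiplier > 63u) /* 1x to 63x */
        return -1;
    if (core_div == 0u)
        return -1;
    if (ssel < 1u || ssel > 15u)             /* SSEL[3:0], 0 assumed invalid */
        return -1;

    vco = (uint64_t)clkin_hz * multiplier;
    if (vco > UINT32_MAX)
        return -1;

    out->vco_hz = (uint32_t)vco;
    out->cclk_hz = (uint32_t)(vco / core_div);
    out->sclk_hz = (uint32_t)(vco / ssel);
    return 0;
}
```

For example, a 27 MHz CLKIN with the default 10x multiplier gives a 270 MHz VCO; with a core divider of 1 and SSEL of 5, the sketch yields CCLK = 270 MHz and SCLK = 54 MHz.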
Dynamic Power Management
The processor provides four operating modes, each with a different performance/power profile. In addition, Dynamic Power Management provides
the control functions to dynamically alter the processor core supply voltage to further reduce power dissipation. Control of clocking to each of the
peripherals also reduces power consumption.
In the Full On mode, the PLL is enabled, not bypassed, providing the
maximum operational frequency. This is the normal execution state in
which maximum performance can be achieved. The processor core and all
enabled peripherals run at full speed.
Active Mode (Moderate Power Savings)
In the Active mode, the PLL is enabled, but bypassed. Because the PLL is
bypassed, the processor’s core clock (CCLK) and system clock (SCLK) run at
the input clock (CLKIN) frequency. In this mode, the CLKIN to VCO multiplier ratio can be changed, although the changes are not realized until the
Full On mode is entered. DMA access is available to appropriately configured L1 memories.
In the Active mode, it is possible to disable the PLL through the PLL
Control register (PLL_CTL). If disabled, the PLL must be re-enabled before
transitioning to the Full On or Sleep modes.
Sleep Mode (High Power Savings)
The Sleep mode reduces power dissipation by disabling the clock to the
processor core (CCLK). The PLL and system clock (SCLK), however, continue to operate in this mode. Typically an external event or RTC activity
will wake up the processor. When in the Sleep mode, assertion of any
interrupt causes the processor to sense the value of the bypass bit
in the PLL Control register (PLL_CTL). If bypass is disabled, the processor
transitions to the Full On mode. If bypass is enabled, the processor transitions to the Active mode.
When in the Sleep mode, system DMA access to L1 memory is not
supported.
The Deep Sleep mode maximizes power savings by disabling the processor
core and synchronous system clocks (CCLK and SCLK). Asynchronous systems, such as the RTC, may still be running, but cannot access internal
resources or external memory. This powered-down mode can only be
exited by assertion of the reset interrupt or by an asynchronous interrupt
generated by the RTC. When in Deep Sleep mode, an RTC asynchronous
interrupt causes the processor to transition to the Active mode. Assertion
of RESET while in Deep Sleep mode causes the processor to transition to
the Full On mode.
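The wakeup transitions described for the Sleep and Deep Sleep modes can be summarized as a small state function. This is a sketch of the text above only; the enumerator and function names are hypothetical, and combinations the text does not describe simply leave the mode unchanged.

```c
#include <stdbool.h>

typedef enum {
    MODE_FULL_ON,
    MODE_ACTIVE,
    MODE_SLEEP,
    MODE_DEEP_SLEEP
} power_mode;

typedef enum {
    EVT_INTERRUPT,     /* any ordinary interrupt */
    EVT_RTC_INTERRUPT, /* asynchronous interrupt from the RTC */
    EVT_RESET          /* assertion of RESET */
} wake_event;

/* Return the mode entered when `evt` occurs in mode `current`. */
power_mode wake_transition(power_mode current, wake_event evt,
                           bool pll_bypassed)
{
    switch (current) {
    case MODE_SLEEP:
        /* Any interrupt wakes the core; the bypass bit picks the mode. */
        if (evt == EVT_INTERRUPT || evt == EVT_RTC_INTERRUPT)
            return pll_bypassed ? MODE_ACTIVE : MODE_FULL_ON;
        break;
    case MODE_DEEP_SLEEP:
        if (evt == EVT_RTC_INTERRUPT)
            return MODE_ACTIVE;
        if (evt == EVT_RESET)
            return MODE_FULL_ON;
        break;
    default:
        break;
    }
    return current; /* not described in the text: no change modeled */
}
```

Note that an ordinary interrupt cannot end Deep Sleep in this model, because SCLK is stopped and only the RTC or RESET can signal the processor.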
Hibernate State
For lowest possible power dissipation, this state allows the internal supply
(VDDINT) to be powered down, while keeping the I/O supply (VDDEXT)
running. Although not strictly an operating mode like the four modes
detailed above, it is illustrative to view it as such.
Voltage Regulation
The processor provides an on-chip voltage regulator that can generate
internal voltage levels (0.8 V to 1.2 V) from an external 2.25 V to 3.6 V
supply. Figure 1-3 shows the typical external components required to
complete the power management system. The regulator controls the internal logic voltage levels and is programmable with the Voltage Regulator
Control register (VR_CTL) in increments of 50 mV. To reduce standby
power consumption, the internal voltage regulator can be programmed to
remove power to the processor core while keeping I/O power supplied.
While in this state, VDDEXT can still be applied, eliminating the need for
external buffers. The regulator can also be disabled and bypassed at the
user’s discretion.
The processor has three mechanisms for automatically loading internal L1
instruction memory after a reset. A fourth mode is provided to execute from
external memory, bypassing the boot sequence:
•Execute from 16-bit external memory – Execution starts from
address 0x2000 0000 with 16-bit packing. The boot ROM is
bypassed in this mode. All configuration settings are set for the
slowest device possible (3-cycle hold time; 15-cycle R/W access
times; 4-cycle setup).
•Boot from 8-bit or 16-bit external flash memory – The flash boot
routine located in boot ROM memory space is set up using Asynchronous Memory Bank 0. All configuration settings are set for the
slowest device possible (3-cycle hold time; 15-cycle R/W access
times; 4-cycle setup).
•Boot from SPI serial EEPROM (8-, 16-, or 24-bit addressable) –
The SPI uses the PF2 output pin to select a single SPI EEPROM
device, submits successive read commands at addresses 0x00,
0x0000, and 0x000000 until a valid 8-, 16-, or 24-bit addressable
EEPROM is detected, and begins clocking data into the beginning
of L1 instruction memory.
•Boot from SPI host (slave mode) – A user-defined programmable
flag pin is an output on the Blackfin processor and an input on the
SPI host device. This flag allows the processor to hold off the host
device from sending data during certain sections of the boot process. When this flag is de-asserted, the host can continue to send
bytes to the processor.
For each of the boot modes, a 10-byte header is first read from an external
memory device. The header specifies the number of bytes to be transferred
and the memory destination address. Multiple memory blocks may be
loaded by any boot sequence. Once all blocks are loaded, program execution commences from the start of L1 instruction SRAM.
In addition, bit 4 of the Reset Configuration register can be set by application code to bypass the normal boot sequence during a software reset. For
this case, the processor jumps directly to the beginning of L1 instruction
memory.
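Parsing the 10-byte block header might look like the sketch below. The text above states only that the header carries a byte count and a destination address; the field order (4-byte address, then 4-byte count, then 2 remaining bytes), the little endian encoding, and all names here are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define BOOT_HEADER_SIZE 10u

typedef struct {
    uint32_t dest_addr;  /* memory destination address */
    uint32_t byte_count; /* number of bytes to transfer */
    uint16_t flags;      /* assumed use of the remaining two bytes */
} boot_block_header;

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer is too short or a pointer is NULL. */
int boot_parse_header(const uint8_t *buf, size_t len, boot_block_header *out)
{
    if (buf == NULL || out == NULL || len < BOOT_HEADER_SIZE)
        return -1;

    out->dest_addr = read_le32(buf);
    out->byte_count = read_le32(buf + 4);
    out->flags = (uint16_t)(buf[8] | (buf[9] << 8));
    return 0;
}
```

A loader following this layout would read a header, copy byte_count bytes to dest_addr, and repeat until the last block has been processed.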
Instruction Set Description
The ADSP-BF53x processor family assembly language instruction set
employs an algebraic syntax designed for ease of coding and readability.
The instructions have been specifically tuned to provide a flexible, densely
encoded instruction set that compiles to a very small final memory size.
The instruction set also provides fully featured multifunction instructions
that allow the programmer to use many of the processor core resources in
a single instruction. Coupled with many features more often seen on
microcontrollers, this instruction set is very efficient when compiling C
and C++ source code. In addition, the architecture supports both user
(algorithm/application code) and supervisor (O/S kernel, device drivers,
debuggers, ISRs) modes of operation, allowing multiple levels of access to
core resources.
The assembly language, which takes advantage of the processor’s unique
architecture, offers these advantages:
•Seamlessly integrated DSP/CPU features optimized for both 8-bit
and 16-bit operations
•A multi-issue load/store modified Harvard architecture, which supports two 16-bit MAC or four 8-bit ALU + two load/store + two
pointer updates per cycle
•All registers, I/O, and memory mapped into a unified 4G byte
memory space, providing a simplified programming model
•Microcontroller features, such as arbitrary bit and bit field manipulation, insertion, and extraction; integer operations on 8-, 16-, and
32-bit data types; and separate user and supervisor stack pointers.
Code density enhancements include intermixing of 16- and 32-bit
instructions with no mode switching or code segregation. Frequently used
instructions are encoded in 16 bits.
Development Tools
The processor is supported with a complete set of CrossCore® software
and hardware development tools, including Analog Devices emulators and
the VisualDSP++ development environment. The same emulator hardware that supports other Analog Devices products also fully emulates the
ADSP-BF53x processor family.
The VisualDSP++ project management environment lets programmers
develop and debug an application. This environment includes an
easy-to-use assembler that is based on an algebraic syntax, an archiver
(librarian/library builder), a linker, a loader, a cycle-accurate instruction-level simulator, a C/C++ compiler, and a C/C++ runtime library that
includes DSP and mathematical functions. A key point for these tools is
C/C++ code efficiency. The compiler has been developed for efficient
translation of C/C++ code to Blackfin processor assembly. The Blackfin
processor has architectural features that improve the efficiency of compiled C/C++ code.
Debugging both C/C++ and assembly programs with the VisualDSP++
debugger, programmers can:
•View mixed C/C++ and assembly code (interleaved source and
object information)
•Insert breakpoints
•Set conditional breakpoints on registers, memory, and stacks
•Trace instruction execution
•Perform linear or statistical profiling of program execution
•Fill, dump, and graphically plot the contents of memory
The VisualDSP++ Integrated Development Environment (IDE) lets programmers define and manage software development. Its dialog boxes and
property pages let programmers configure and manage all development
tools, including Color Syntax Highlighting in the VisualDSP++ editor.
These capabilities permit programmers to:
•Control how the development tools process inputs and generate
outputs.
•Maintain a one-to-one correspondence with the tool’s command-line switches.
The VisualDSP++ Kernel (VDK) incorporates scheduling and resource
management tailored specifically to address the memory and timing constraints of DSP programming. These capabilities enable engineers to
develop code more effectively, eliminating the need to start from the very
beginning, when developing new application code. The VDK features
include threads, critical and unscheduled regions, semaphores, events, and
device flags. The VDK also supports priority-based, pre-emptive, cooperative and time-sliced scheduling approaches. In addition, the VDK was
designed to be scalable. If the application does not use a specific feature,
the support code for that feature is excluded from the target system.
Because the VDK is a library, a developer can decide whether to use it or
not. The VDK is integrated into the VisualDSP++ development environment but can also be used with standard command-line tools. The VDK
development environment assists in managing system resources, automating the generation of various VDK-based objects, and visualizing the
system state during application debug.
Analog Devices emulators use the IEEE 1149.1 JTAG test access port of
the processor to monitor and control the target board processor during
emulation. The emulator provides full speed emulation, allowing inspection and modification of memory, registers, and processor stacks.
Nonintrusive in-circuit emulation is assured by the use of the processor’s
JTAG interface—the emulator does not affect target system loading or
timing.
In addition to the software and hardware development tools available
from Analog Devices, third parties provide a wide range of tools supporting the Blackfin processor family. Hardware tools include the
ADSP-BF533 EZ-KIT Lite standalone evaluation/development cards.
Third party software tools include DSP libraries, real-time operating systems, and block diagram design tools.
The processor’s computational units perform numeric processing for DSP
and general control algorithms. The six computational units are two arithmetic/logic units (ALUs), two multiplier/accumulator (multiplier) units, a
shifter, and a set of video ALUs. These units get data from registers in the
Data Register File. Computational instructions for these units provide
fixed-point operations, and each computational instruction can execute
every cycle.
The computational units handle different types of operations. The ALUs
perform arithmetic and logic operations. The multipliers perform
multiplication and execute multiply/add and multiply/subtract operations. The shifter executes logical shifts and arithmetic shifts and performs
bit packing and extraction. The video ALUs perform Single Instruction,
Multiple Data (SIMD) logical operations on specific 8-bit data operands.
Data moving in and out of the computational units goes through the Data
Register File, which consists of eight registers, each 32 bits wide. In operations requiring 16-bit operands, the registers are paired, providing sixteen
possible 16-bit registers.
The processor’s assembly language provides access to the Data Register
File. The syntax lets programs move data to and from these registers and
specify a computation’s data format at the same time.
Figure 2-1 provides a graphical guide to the other topics in this chapter.
An examination of each computational unit provides details about its
operation and is followed by a summary of computational instructions.
Studying the details of the computational units, register files, and data
buses leads to a better understanding of proper data flow for computations. Next, details about the processor’s advanced parallelism reveal how
to take advantage of multifunction instructions.
Figure 2-1 shows the relationship between the Data Register File and the
computational units—multipliers, ALUs, and shifter.
Figure 2-1. Processor Core Architecture
Single function multiplier, ALU, and shifter instructions have unrestricted
access to the data registers in the Data Register File. Multifunction operations may have restrictions that are described in the section for that
particular operation.
Two additional registers, A0 and A1, provide 40-bit accumulator results.
These registers are dedicated to the ALUs and are used primarily for multiply-and-accumulate functions.
The traditional modes of arithmetic operations, such as fractional and
integer, are specified directly in the instruction. Rounding modes are set
from the ASTAT register, which also records status and conditions for the
results of the computational operations.
Using Data Formats
ADSP-BF53x processors are primarily 16-bit, fixed-point machines. Most
operations assume a two’s-complement number representation, while others assume unsigned numbers or simple binary strings. Other instructions
support 32-bit integer arithmetic, with further special features supporting
8-bit arithmetic and block floating point. For detailed information about
each number format, see Appendix D, “Numeric Formats.”
In the ADSP-BF53x processor family arithmetic, signed numbers are
always in two’s-complement format. These processors do not use
signed-magnitude, one’s-complement, binary-coded decimal (BCD), or
excess-n formats.
Binary String
The binary string format is the least complex binary notation; in it, 16 bits
are treated as a bit pattern. Examples of computations using this format
are the logical operations NOT, AND, OR, XOR. These ALU operations
treat their operands as binary strings with no provision for sign bit or
binary point placement.
Unsigned binary numbers may be thought of as positive and having nearly
twice the magnitude of a signed number of the same length. The processor
treats the least significant words of multiple precision numbers as
unsigned numbers.
Signed Numbers: Two’s-Complement
In ADSP-BF53x processor arithmetic, the word signed refers to
two’s-complement numbers. Most ADSP-BF53x processor family operations presume or support two’s-complement arithmetic.
Fractional Representation: 1.15
ADSP-BF53x processor arithmetic is optimized for numerical values in a
fractional binary format denoted by 1.15 (“one dot fifteen”). In the 1.15
format, 1 sign bit (the Most Significant Bit (MSB)) and 15 fractional bits
represent values from –1 to 0.999969.
Figure 2-2 shows the bit weighting for 1.15 numbers as well as some
examples of 1.15 numbers and their decimal equivalents.
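Converting between 1.15 values and ordinary decimal numbers is a matter of scaling by 2^15. The helpers below are a host-side sketch with hypothetical names; the round-to-nearest and saturation policy in the encoding direction is a choice made for this example, not a statement about the processor's rounding modes.

```c
#include <stdint.h>

/* Interpret a 16-bit word as a 1.15 fraction: value = word / 2^15. */
double q15_to_double(int16_t q)
{
    return (double)q / 32768.0;
}

/* Encode a real number as 1.15, rounding to nearest and saturating
 * at the ends of the representable range. */
int16_t double_to_q15(double x)
{
    double scaled = x * 32768.0;

    scaled += (scaled >= 0.0) ? 0.5 : -0.5;
    if (scaled >= 32767.0)
        return INT16_MAX; /* +1.0 is not representable */
    if (scaled <= -32768.0)
        return INT16_MIN;
    return (int16_t)scaled;
}
```

The largest code, 0x7FFF, decodes to 32767/32768, which is the 0.999969 upper limit quoted above, while 0x8000 decodes to exactly –1.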
[Figure: processor register files. Data registers R0–R7 (halves Rn.H and Rn.L) with accumulators A0 and A1 (A0.X, A0.W, A1.X, A1.W); pointer registers P0–P5, SP, and FP; DAG registers I0–I3, L0–L3, B0–B3, and M0–M3.]
Register Files
The processor’s computational units have three definitive register
groups: a Data Register File, a Pointer Register File, and a set of Data
Address Generator (DAG) registers.
•The Data Register File receives operands from the data buses for
the computational units and stores computational results.
•The Pointer Register File has pointers for addressing operations.
•The DAG registers are dedicated registers that manage zero-overhead circular buffers for DSP operations.
For more information, see Chapter 5, “Data Address Generators.”
The processor register files appear in Figure 2-3.
In the processor, a word is 32 bits long; H denotes the high order
16 bits of a 32-bit register; L denotes the low order 16 bits of a
32-bit register. For example, A0.W contains the lower 32 bits of the
40-bit A0 register; A0.L contains the lower 16 bits of A0.W, and A0.H
contains the upper 16 bits of A0.W.
Data Register File
The Data Register File consists of eight registers, each 32 bits wide. Each
register may be viewed as a pair of independent 16-bit registers. Each is
denoted as the low half or high half. Thus the 32-bit register R0 may be
regarded as two independent register halves, R0.L and R0.H.
Three separate buses (two read, one write) connect the Register File to the
L1 data memory, each bus being 32 bits wide. Transfers between the Data
Register File and the data memory can move up to four 16-bit words of
valid data in each cycle.
Accumulator Registers
In addition to the Data Register File, the processor has two dedicated,
40-bit accumulator registers. Each can be referred to as its 16-bit low half
(An.L) or high half (An.H) plus its 8-bit extension (An.X). Each can also be
referred to as a 32-bit register (An.W) consisting of the lower 32 bits, or as
a complete 40-bit result register (An).
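The relationship between the accumulator views can be shown by modeling the 40-bit register in the low bits of a 64-bit integer. The type and function names are hypothetical; only the bit positions follow the description above.

```c
#include <stdint.h>

/* Model of a 40-bit accumulator held in the low 40 bits of `bits`. */
typedef struct {
    uint64_t bits;
} acc40;

/* An.X: the 8-bit extension, bits 39..32. */
uint8_t acc_x(acc40 a) { return (uint8_t)((a.bits >> 32) & 0xFFu); }

/* An.W: the lower 32 bits, bits 31..0. */
uint32_t acc_w(acc40 a) { return (uint32_t)(a.bits & 0xFFFFFFFFu); }

/* An.H: the upper 16 bits of An.W, bits 31..16. */
uint16_t acc_h(acc40 a) { return (uint16_t)((a.bits >> 16) & 0xFFFFu); }

/* An.L: the lower 16 bits of An.W, bits 15..0. */
uint16_t acc_l(acc40 a) { return (uint16_t)(a.bits & 0xFFFFu); }
```

For the 40-bit value 0x7F89ABCDEF, the views are X = 0x7F, W = 0x89ABCDEF, H = 0x89AB, and L = 0xCDEF.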
The general-purpose Address Pointer registers, also called P-registers, are
organized as:
•6-entry, P-register files P[5:0]
•Frame Pointers (FP) used to point to the current procedure’s activation record
•Stack Pointer registers (SP) used to point to the last used location
on the runtime stack. See mode dependent registers in Chapter 3,
“Operating Modes and States.”
P-registers are 32 bits wide. Although P-registers are primarily used for
address calculations, they may also be used for general integer arithmetic
with a limited set of arithmetic operations; for instance, to maintain
counters. However, unlike the Data registers, P-register arithmetic does
not affect the Arithmetic Status (ASTAT) register status flags.
DAG Register Set
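DSP instructions primarily use the Data Address Generator (DAG) register set for addressing. The DAG register set consists of these registers:
•I[3:0] contain index addresses
•M[3:0] contain modify values
•B[3:0] contain base addresses
•L[3:0] contain length values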
The I (Index) registers and B (Base) registers always contain addresses of
8-bit bytes in memory. The Index registers contain an effective address.
The M (Modify) registers contain an offset value that is added to one of
the Index registers or subtracted from it.
The B and L (Length) registers define circular buffers. The B register contains the starting address of a buffer, and the L register contains the length
in bytes. Each L and B register pair is associated with the corresponding I
register. For example, L0 and B0 are always associated with I0. However,
any M register may be associated with any I register. For example, I0 may
be modified by M3. For more information, see Chapter 5, “Data Address
Generators.”
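How the I, M, B, and L registers cooperate in a circular buffer can be sketched on a host. The model below is a simplified illustration with hypothetical names; it assumes the magnitude of the modify value does not exceed the buffer length and that a length of zero selects plain linear addressing. Chapter 5 remains the reference for the actual behavior.

```c
#include <stdint.h>

typedef struct {
    uint32_t i; /* Index: current byte address */
    uint32_t b; /* Base: starting byte address of the buffer */
    uint32_t l; /* Length in bytes; 0 assumed to mean linear addressing */
} dag_model;

/* Post-modify I by the signed offset m, wrapping into [B, B + L). */
void dag_post_modify(dag_model *d, int32_t m)
{
    int64_t next = (int64_t)d->i + m;

    if (d->l != 0u) {
        int64_t lo = (int64_t)d->b;
        int64_t hi = lo + (int64_t)d->l;

        if (next >= hi)
            next -= (int64_t)d->l; /* ran off the end: wrap to the start */
        else if (next < lo)
            next += (int64_t)d->l; /* ran off the start: wrap to the end */
    }
    d->i = (uint32_t)next;
}
```

With B = 0x1000, L = 16, and a modify value of 4, the index visits 0x1000, 0x1004, 0x1008, 0x100C, and then returns to 0x1000 without any compare-and-branch code in the loop.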
Register File Instruction Summary
Table 2-1 lists the register file instructions. For more information about
assembly language syntax, see the Blackfin Processor Programming Reference.
The processor supports 32-bit words, 16-bit half words, and bytes. The
32- and 16-bit words can be integer or fractional, but bytes are always
integers. Integer data types can be signed or unsigned, but fractional data
types are always signed.
Table 2-3 illustrates the formats for data that resides in memory, in the
register file, and in the accumulators. In the table, the letter d represents
one bit, and the letter s represents one signed bit.
Some instructions manipulate data in the registers by sign-extending or
zero-extending the data to 32 bits:
•Instructions zero-extend unsigned data
•Instructions sign-extend signed 16-bit half words and 8-bit bytes
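The difference between the two extension rules is easiest to see on a value whose top bit is set. The helpers below are a portable C sketch with illustrative names; they show the effect of extension and are not a description of any particular instruction.

```c
#include <stdint.h>

/* Zero-extension: upper bits of the 32-bit result are filled with 0. */
uint32_t zext16(uint16_t h) { return (uint32_t)h; }
uint32_t zext8(uint8_t b) { return (uint32_t)b; }

/* Sign-extension: upper bits are filled with copies of the sign bit. */
uint32_t sext16(uint16_t h)
{
    return (h & 0x8000u) ? (0xFFFF0000u | h) : (uint32_t)h;
}

uint32_t sext8(uint8_t b)
{
    return (b & 0x80u) ? (0xFFFFFF00u | b) : (uint32_t)b;
}
```

The half word 0x8000 becomes 0x00008000 (32768) when zero-extended and 0xFFFF8000 (–32768 in two's complement) when sign-extended.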
Other instructions manipulate data as 32-bit numbers. In addition, two
16-bit half words or four 8-bit bytes can be manipulated as 32-bit values.
For details, refer to the instructions in the Blackfin Processor Programming Reference.
In Table 2-2, note the meaning of these symbols:
•s = sign bit(s)
•d = data bit(s)
•“.” = decimal point by convention; however, a decimal point does
not literally appear in the number.
•Italics denotes data from a source other than adjacent bits.
Both internal and external memory are accessed in little endian byte order.
For more information, see “Memory Transaction Model” on page 6-65.
ALU Data Types
Operations on each ALU treat operands and results as either 16- or 32-bit
binary strings, except the signed division primitive (DIVS). ALU result status bits treat the results as signed, indicating status with the overflow flags
(AV0, AV1) and the negative flag (AN). Each ALU has its own sticky overflow flag, AV0S and AV1S. Once set, these bits remain set until cleared by
writing directly to the ASTAT register. An additional V flag is set or cleared
depending on the transfer of the result from both accumulators to the register file. Furthermore, the sticky VS bit is set with the V bit and remains
set until cleared.
The logic of the overflow bits (V, VS, AV0, AV0S, AV1, AV1S) is based on
two’s-complement arithmetic. A bit or set of bits is set if the Most Significant Bit (MSB) changes in a manner not predicted by the signs of the
operands and the nature of the operation. For example, adding two positive numbers must generate a positive result; a change in the sign bit
signifies an overflow and sets AVn, the corresponding overflow flags. Adding a negative and a positive number may result in either a negative or
positive result, but cannot cause an overflow.
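The overflow rule can be expressed directly in terms of sign bits. The sketch below models a 16-bit addition with an overflow flag and a sticky overflow flag; the function name and the use of bool pointers for the flags are illustrative choices, not the processor's ASTAT interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Add two 16-bit two's-complement words. *av reports overflow for this
 * operation; *avs is sticky and is only ever set here, never cleared. */
uint16_t add16_flags(uint16_t a, uint16_t b, bool *av, bool *avs)
{
    uint16_t sum = (uint16_t)(a + b);

    /* Overflow: operands share a sign and the result's sign differs. */
    bool overflow = ((~(a ^ b)) & (a ^ sum) & 0x8000u) != 0;

    *av = overflow;
    if (overflow)
        *avs = true;
    return sum;
}
```

Adding 0x7000 to 0x7000 yields 0xE000, a negative pattern from two positive operands, so both flags are set; a later non-overflowing add clears the ordinary flag but leaves the sticky flag set.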
The logic of the carry bits (AC0, AC1) is based on unsigned magnitude
arithmetic. The bit is set if a carry is generated from bit 16 (the MSB).
The carry bits (AC0, AC1) are most useful for the lower word portions of a
multiword operation.
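A minimal sketch of why the carry matters for multiword operations: a 32-bit addition is built from two 16-bit additions, with the carry out of the lower word fed into the upper word. The helper names are hypothetical, and the code models unsigned magnitude arithmetic only, not an actual instruction sequence.

```python
def add16_carry(a, b, carry_in=0):
    """Add two 16-bit values plus a carry; return (sum, carry_out)."""
    total = (a & 0xFFFF) + (b & 0xFFFF) + carry_in
    return total & 0xFFFF, total >> 16


def add32_by_halves(x, y):
    """Add two 32-bit values using 16-bit additions linked by the carry."""
    low, carry = add16_carry(x, y)
    # The carry out of the lower word is the carry in of the upper word.
    high, _ = add16_carry(x >> 16, y >> 16, carry)
    return (high << 16) | low
```

Adding 0x0001FFFF and 0x00000001 this way gives 0x00020000, because the lower word wraps to 0x0000 and produces a carry.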
ALU results generate status information. For more information about
using ALU status, see “ALU Instruction Summary” on page 2-29.
Multiplier Data Types
Each multiplier produces results that are binary strings. The inputs are
interpreted according to the information given in the instruction itself
(whether it is signed multiplied by signed, unsigned multiplied by
unsigned, a mixture, or a rounding operation). The 32-bit result from the
multipliers is assumed to be signed; it is sign-extended across the full
40-bit width of the A0 or A1 registers.
The processor supports two modes of format adjustment: the fractional
mode for fractional operands (1.15 format with 1 sign bit and 15 fractional bits) and the integer mode for integer operands (16.0 format).
When the processor multiplies two 1.15 operands, the result is a 2.30
(2 sign bits and 30 fractional bits) number. In the fractional mode, the
multiplier automatically shifts the multiplier product left one bit before
transferring the result to the multiplier result register (A0, A1). This shift of
the redundant sign bit causes the multiplier result to be in 1.31 format,
which can be rounded to 1.15 format. The resulting format appears in
Figure 2-4.
In the integer mode, the left shift does not occur. For example, if the operands are in the 16.0 format, the 32-bit multiplier result would be in 32.0
format. A left shift is not needed and would change the numerical
representation. This result format appears in Figure 2-5.
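The difference between the two format adjustment modes can be sketched in Python. The sketch assumes signed-by-signed multiplication and does not model saturation or rounding; multiply and q31_to_float are hypothetical helper names.

```python
def to_signed16(x):
    """Interpret a 16-bit pattern as a two's-complement integer."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def multiply(a, b, fractional):
    """Multiply two 16-bit patterns in fractional or integer mode."""
    product = to_signed16(a) * to_signed16(b)  # 2.30 when inputs are 1.15
    if fractional:
        product <<= 1  # remove the redundant sign bit: 2.30 becomes 1.31
    return product


def q31_to_float(value):
    """Read a 1.31 value as a real number."""
    return value / 2**31
```

With both operands equal to 0x4000 (0.5 in 1.15 format), fractional mode returns 0x20000000, which is 0.25 in 1.31 format. Integer mode returns the unshifted product 0x10000000.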
Multiplier results generate status information when they update accumulators or when they are transferred to a destination register in the register
file. For more information, see “Multiplier Instruction Summary” on page
Shifter Data Types
Many operations in the shifter are explicitly geared to signed (two’s-complement) or unsigned values—logical shifts assume unsigned magnitude
or binary string values, and arithmetic shifts assume two’s-complement
values.
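The distinction between the two shift types can be illustrated with right shifts on a 16-bit value. This is a generic sketch with hypothetical function names, not a model of a specific shifter instruction.

```python
def logical_shift_right16(x, n):
    """Shift a 16-bit binary string right, filling with zeros."""
    return (x & 0xFFFF) >> n


def arithmetic_shift_right16(x, n):
    """Shift a 16-bit two's-complement value right, replicating the sign bit."""
    x &= 0xFFFF
    signed = x - 0x10000 if x & 0x8000 else x
    # Python's >> on a negative integer already propagates the sign.
    return (signed >> n) & 0xFFFF
```

Shifting 0x8000 right by four places gives 0x0800 logically and 0xF800 arithmetically.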
The exponent logic assumes two’s-complement numbers. The exponent
logic supports block floating point, which is also based on two’s-complement fractions.
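One way to picture exponent detection for block floating point is to count the redundant sign bits of each value and take the minimum over the block. The sketch below is an assumption about the general technique, not a description of the processor's exponent logic; both function names are hypothetical.

```python
def redundant_sign_bits(x, width=16):
    """Count the bits below the sign bit that merely repeat the sign bit."""
    x &= (1 << width) - 1
    sign = (x >> (width - 1)) & 1
    count = 0
    for bit in range(width - 2, -1, -1):
        if (x >> bit) & 1 != sign:
            break
        count += 1
    return count


def block_headroom(values, width=16):
    """Largest left shift that can be applied to every value without overflow."""
    return min(redundant_sign_bits(v, width) for v in values)
```

For the block 0x0100, 0xFF00, 0x0010 the headroom is 6, set by 0x0100, so every element can be shifted left by six places without changing sign.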
Shifter results generate status information. For more information about
using shifter status, see “Shifter Instruction Summary” on page 2-55.
Arithmetic Formats Summary
Table 2-3, Table 2-4, Table 2-5, and Table 2-6 summarize some of the
arithmetic characteristics of computational operations.
Table 2-3. ALU Arithmetic Formats
Operation       Operand Formats                   Result Formats
Addition        Signed or unsigned                Interpret flags
Subtraction     Signed or unsigned                Interpret flags
Logical         Binary string                     Same as operands
Division        Explicitly signed or unsigned     Same as operands
Operation                     Operand Formats                        Result Formats
Multiplication/Subtraction    16.0 explicitly signed or unsigned     32.0 not shifted
Table 2-6. Shifter Arithmetic Formats
Operation           Operand Formats            Result Formats
Logical Shift       Unsigned binary string     Same as operands
Arithmetic Shift    Signed                     Same as operands
Exponent Detect     Signed                     Same as operands
Using Multiplier Integer and Fractional Formats
For multiply-and-accumulate functions, the processor provides two
choices—fractional arithmetic for fractional numbers (1.15) and integer
arithmetic for integers (16.0).
For fractional arithmetic, the 32-bit product output is format adjusted—
sign-extended and shifted one bit to the left—before being added to accumulator
A0 or A1. For example, bit 31 of the product lines up with bit 32
of A0 (which is bit 0 of A0.X), and bit 0 of the product lines up with bit 1
of A0 (which is bit 1 of A0.W). The Least Significant Bit (LSB) is zero
filled. The fractional multiplier result format appears in Figure 2-4.
For integer arithmetic, the 32-bit product register is not shifted before
being added to A0 or A1. Figure 2-5 shows the integer mode result
placement.
With either fractional or integer operations, the multiplier output product
is fed into a 40-bit adder/subtracter which adds or subtracts the new product with the current contents of the A0 or A1 register to produce the final
40-bit result.
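The accumulate step can be modeled as shown below. This sketch assumes signed 16-bit operands and simple wraparound at 40 bits; it does not model saturation or any other instruction option, and mac is a hypothetical name.

```python
MASK40 = (1 << 40) - 1


def to_signed16(x):
    """Interpret a 16-bit pattern as a two's-complement integer."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def mac(acc, a, b, fractional=True, subtract=False):
    """Add or subtract a 16 x 16 product to or from a 40-bit accumulator pattern."""
    product = to_signed16(a) * to_signed16(b)
    if fractional:
        product <<= 1  # bit 0 of the product lands on bit 1 of the accumulator
    if subtract:
        product = -product
    # Masking expresses the signed sum as a 40-bit two's-complement pattern.
    return (acc + product) & MASK40
```

With a product of 1, fractional mode adds 2 to the accumulator, which reflects bit 0 of the product lining up with bit 1 of A0. A negative product such as 0.5 times -0.5 appears sign-extended across all 40 bits (0xFFE0000000).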
On many multiplier operations, the processor supports multiplier results
rounding (RND option). Rounding is a means of reducing the precision of a
number by removing a lower order range of bits from that number’s representation and possibly modifying the remaining portion of the number to
more accurately represent its former value. For example, the original number will have N bits of precision, whereas the new number will have only
M bits of precision (where N>M). The process of rounding, then, removes
N – M bits of precision from the number.
The RND_MOD bit in the ASTAT register determines whether the RND option
provides biased or unbiased rounding. For unbiased rounding, set RND_MOD
bit = 0. For biased rounding, set RND_MOD bit = 1.
For most algorithms, unbiased rounding is preferred.
Unbiased Rounding
The convergent rounding method returns the number closest to the original. In cases where the original number lies exactly halfway between two
numbers, this method returns the nearest even number, the one containing an LSB of 0. For example, when rounding the 3-bit,
two’s-complement fraction 0.25 (binary 0.01) to the nearest 2-bit,
two’s-complement fraction, the result would be 0.0, because that is the
even-numbered choice of 0.5 and 0.0. Since it rounds up and down based
on the surrounding values, this method is called unbiased rounding.
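Convergent rounding can be sketched on integers by treating the value as a count of LSBs and dropping the low-order bits. The function name round_half_even is hypothetical, and the sketch is a generic illustration of the method rather than the hardware implementation.

```python
def round_half_even(value, drop):
    """Drop the low `drop` bits of an integer, rounding ties to the even result."""
    half = 1 << (drop - 1)
    # divmod floors toward negative infinity, so the remainder is never negative.
    quotient, remainder = divmod(value, 1 << drop)
    if remainder > half or (remainder == half and quotient & 1):
        quotient += 1
    return quotient
```

The 3-bit fraction 0.25 (binary 0.01) is the integer 1 in units of 0.25. Dropping one bit gives round_half_even(1, 1) = 0, which is 0.0 in units of 0.5, matching the example in the text.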
Unbiased rounding uses the ALU’s capability of rounding the 40-bit result
at the boundary between bit 15 and bit 16. Rounding can be specified as
part of the instruction code. When rounding is selected, the output register contains the rounded 16-bit result; the accumulator is never rounded.
The accumulator uses an unbiased rounding scheme. The conventional
method of biased rounding adds a 1 into bit position 15 of the adder
chain. This method causes a net positive bias because the midway value
(when A0.L/A1.L = 0x8000) is always rounded upward.
The accumulator eliminates this bias by forcing bit 16 in the result output
to 0 when it detects this midway point. Forcing bit 16 to 0 has the effect
of rounding midway values with an odd A0.H/A1.H upward and those with an even A0.H/A1.H downward,
yielding a large sample bias of 0, assuming uniformly distributed values.
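The compensation described above can be sketched as follows. The function models only the 16-bit rounded output taken from bits 31:16; round_unbiased is a hypothetical name, and the accumulator is represented as a non-negative 40-bit pattern.

```python
def round_unbiased(acc):
    """Return the rounded 16-bit output for a 40-bit accumulator pattern."""
    rounded = acc + 0x8000  # add 1 into bit position 15
    if (acc & 0xFFFF) == 0x8000:
        # Midway point detected: force bit 16 of the result to 0.
        rounded &= ~(1 << 16)
    return (rounded >> 16) & 0xFFFF
```

The midway value 0x00 0001 8000 (odd upper half word) rounds up to 0x0002, while 0x00 0002 8000 (even upper half word) rounds down to 0x0002. Values that are not at the midway point, such as 0x00 0000 8001, are unaffected by the compensation.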
The following examples use x to represent any bit pattern (not all zeros).
The example in Figure 2-6 shows a typical rounding operation for A0; the
example also applies for A1.
The compensation to avoid net bias becomes visible when all lower 15 bits
are 0 and bit 15 is 1 (the midpoint value) as shown in Figure 2-7.
In Figure 2-7, A0 bit 16 is forced to 0. This algorithm is employed on
every rounding operation, but is evident only when the bit patterns shown
in the lower 16 bits of the next example are present.
Biased Rounding
The round-to-nearest method also returns the number closest to the original. However, by convention, an original number lying exactly halfway
between two numbers always rounds up to the larger of the two. For
example, when rounding the 3-bit, two’s-complement fraction 0.25
(binary 0.01) to the nearest 2-bit, two’s-complement fraction, this method
returns 0.5 (binary 0.1). The original fraction lies exactly midway between
0.5 and 0.0 (binary 0.0), so this method rounds up. Because it always
rounds up, this method is called biased rounding.
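The round-to-nearest method with ties rounded up can be sketched on integers by adding half of the dropped range before shifting. The function name round_half_up is hypothetical; this is a generic illustration rather than the hardware implementation.

```python
def round_half_up(value, drop):
    """Drop the low `drop` bits of an integer, rounding ties upward."""
    half = 1 << (drop - 1)
    # Adding half an output LSB and shifting right rounds to nearest,
    # with exact ties moving toward the larger value.
    return (value + half) >> drop
```

The 3-bit fraction 0.25 (binary 0.01) is the integer 1 in units of 0.25. Dropping one bit gives round_half_up(1, 1) = 1, which is 0.5 in units of 0.5, matching the example in the text.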
When the RND_MOD bit is set (=1), the processor uses biased rounding
instead of unbiased rounding. When operating in biased rounding mode,
all rounding operations with A0.L/A1.L set to 0x8000 round up, rather
than only rounding odd values up. For an example of biased rounding, see
Table 2-7.
Table 2-7. Biased Rounding in Multiplier Operation
A0/A1 Before RND     Biased RND Result     Unbiased RND Result
0x00 0000 8000       0x00 0001 0000        0x00 0000 0000
0x00 0001 8000       0x00 0002 0000        0x00 0002 0000
0x00 0000 8001       0x00 0001 0001        0x00 0001 0001
0x00 0001 8001       0x00 0002 0001        0x00 0002 0001
0x00 0000 7FFF       0x00 0000 FFFF        0x00 0000 FFFF
0x00 0001 7FFF       0x00 0001 FFFF        0x00 0001 FFFF
Biased rounding affects the result only when the A0.L/A1.L register contains 0x8000; all other rounding operations work normally. This mode
allows more efficient implementation of bit-specified algorithms that use
biased rounding (for example, the Global System for Mobile Communications (GSM) speech compression routines).
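The practical difference between the two modes can be estimated by summing the rounding error over a run of midway values. The sketch below is a software model with hypothetical names; the rnd_mod parameter stands in for the RND_MOD bit, and the block of 256 consecutive upper half words is an arbitrary choice.

```python
def round_to_half_word(acc, rnd_mod):
    """Rounded 16-bit output; rnd_mod=1 is biased, rnd_mod=0 is unbiased."""
    rounded = acc + 0x8000
    if rnd_mod == 0 and (acc & 0xFFFF) == 0x8000:
        rounded &= ~(1 << 16)  # unbiased mode forces bit 16 to 0 at the midway point
    return (rounded >> 16) & 0xFFFF


def midpoint_bias(rnd_mod, count=256):
    """Sum of rounding errors, in output LSBs, over `count` midway values."""
    total = 0.0
    for high in range(count):
        acc = (high << 16) | 0x8000
        total += round_to_half_word(acc, rnd_mod) - acc / 65536
    return total
```

Under these assumptions, biased rounding accumulates an error of half an LSB per midway value (128.0 over the 256 values), while the unbiased errors cancel to 0.0.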
Truncation
Another common way to reduce the significant bits representing a number
is to simply mask off the N – M lower bits. This process is known as truncation and results in a relatively large bias. Instructions that do not
support rounding revert to truncation. The