Analog Devices, Inc. reserves the right to change this product without
prior notice. Information furnished by Analog Devices is believed to be
accurate and reliable. However, no responsibility is assumed by Analog
Devices for its use; nor for any infringement of patents or other rights of
third parties which may result from its use. No license is granted by implication or otherwise under the patent rights of Analog Devices, Inc.
Trademark and Service Mark Notice
The Analog Devices logo, Blackfin, the Blackfin logo, CrossCore,
SHARC, TigerSHARC, and VisualDSP++ are registered trademarks of
Analog Devices, Inc.
All other brand and product names are trademarks or service marks of
their respective owners.
PREFACE
Thank you for purchasing and developing systems using Blackfin®
processors from Analog Devices, Inc.
Purpose of This Manual
The ADSP-BF538/ADSP-BF538F Blackfin Processor Hardware Reference
contains information about the architecture for the ADSP-BF538 processors. The architectural descriptions cover functional blocks, buses, and
ports, including all features and processes that they support.
For programming information, see the Blackfin Processor Programming
Reference. For timing, electrical, and package specifications, see the
ADSP-BF538/ADSP-BF538F Embedded Processor Data Sheet.
Intended Audience
The primary audience for this manual is a programmer who is familiar
with Analog Devices Blackfin processors. This manual assumes that the
audience has a working knowledge of the appropriate processor architecture and instruction set. Programmers who are unfamiliar with Analog
Devices processors can use this manual, but should supplement it with
other texts, such as the appropriate programming reference manuals and
data sheets, that describe their target architecture.
Chapter 1, “Introduction”
Provides a high-level overview of the processor. Architectural descriptions
include functional blocks, buses, and ports, including features and processes they support.
Chapter 2, “Computational Units”
Describes the arithmetic/logic units (ALUs), multiplier/accumulator units
(MACs), shifter, and the set of video ALUs. The chapter also discusses
data formats, data types, and register files.
Chapter 3, “Operating Modes and States”
Describes the three operating modes of the processor: Emulation mode,
Supervisor mode, and User mode. The chapter also describes Idle state
and Reset state.
Chapter 4, “Program Sequencer”
Describes the operation of the program sequencer, which controls program flow by providing the address of the next instruction to be executed.
The chapter also discusses loops, subroutines, jumps, interrupts, and
exceptions.
Chapter 5, “Data Address Generators”
Describes the Data Address Generators (DAGs), addressing modes, how
to modify DAG and Pointer registers, memory address alignment, and
DAG instructions.
Chapter 6, “Memory”
Describes the L1 memories, including their memory architecture, memory
model, memory transaction model, and memory-mapped registers (MMRs). The chapter also discusses the instruction, data, and scratchpad memories,
which are part of the Blackfin processor core.
Chapter 7, “Chip Bus Hierarchy”
Describes on-chip buses, including how data moves through the system.
The chapter also discusses the system memory map, major system components, and the system interconnects.
Chapter 8, “Dynamic Power Management”
Describes system reset and power-up configuration, system clocking and
control, and power management.
Chapter 9, “Direct Memory Access”
Describes the peripheral DMA and memory DMA controllers. The
peripheral DMA section discusses direct, block data movements between a
peripheral with DMA access and internal or external memory spaces.
The memory DMA section discusses memory-to-memory transfer capabilities among the processor memory spaces and the L1, external
synchronous, and asynchronous memories.
Chapter 10, “SPI Compatible Port Controllers”
Describes the serial peripheral interface (SPI) ports that provide an I/O
interface to a variety of SPI compatible peripheral devices.
Chapter 11, “Parallel Peripheral Interface”
Describes the parallel peripheral interface (PPI) of the processor. The PPI
is a half-duplex, bidirectional port accommodating up to 16 bits of data
and used for digital video and data converter applications.
Chapter 12, “Serial Port Controllers”
Describes the independent, synchronous serial port controllers that provide an I/O interface to a variety of serial peripheral devices.
Chapter 13, “UART Port Controllers”
Describes the Universal Asynchronous Receiver/Transmitter (UART)
ports, which convert data between serial and parallel formats and include
modem control and interrupt handling hardware. The UARTs support
the half-duplex IrDA® SIR protocol as a mode-enabled feature.
Chapter 14, “General-Purpose Input/Output Port F”
Describes the GPIO Port F, including how to configure the pins as inputs
and outputs and how to generate interrupts.
Chapter 15, “General-Purpose Input/Output Ports C, D, E”
Describes the general-purpose I/O pins, including how to configure the
pins as inputs and outputs.
Chapter 16, “Timers”
Describes the general-purpose timers that can be configured in any of
three modes; the core timer that can generate periodic interrupts for a
variety of timing functions; and the watchdog timer that can implement
software watchdog functions, such as generating events to the Blackfin
processor core.
Chapter 17, “Real-Time Clock”
Describes a set of digital watch features of the processor, including time of
day, alarm, and stopwatch countdown.
Chapter 18, “External Bus Interface Unit”
Describes the External Bus Interface Unit of the processor. The chapter
also discusses the asynchronous memory interface, the SDRAM controller
(SDC), related registers, and SDC configuration and commands.
Chapter 19, “Controller Area Network (CAN) Module”
Describes the CAN module, a low-bit-rate serial interface intended for use
in applications where bit rates are typically up to 1 Mbit/s.
Chapter 20, “Two Wire Interface Controllers”
Describes the Two Wire Interface (TWI) controllers, which allow a device
to interface to an Inter IC bus as specified by the Philips I²C Bus
Specification version 2.1 dated January 2000.
Chapter 21, “System Design”
Describes how to use the processor as part of an overall system. It includes
information about interfacing the processor to external memory chips, bus
timing and latency numbers, semaphores, and a discussion of the treatment of unused pins.
Chapter 22, “Blackfin Processor Debug”
Describes the Blackfin processor debug functionality, which can be used
for software debugging and complements some services often found in an
operating system.
Appendix A, “Blackfin Processor Core MMR Assignments”
Lists the core memory-mapped registers, their addresses, and cross-references to text.
Appendix B, “System MMR Assignments”
Lists the system memory-mapped registers, their addresses, and cross-references to text.
Appendix C, “Test Features”
Describes test features for the processor; discusses the JTAG standard,
boundary-scan architecture, instruction and boundary registers, and public instructions.
Appendix D, “Numeric Formats”
Describes various aspects of the 16-bit data format. The appendix also
describes how to implement a block floating-point format in software.
What’s New in This Manual
This is Revision 1.0 of the ADSP-BF538/ADSP-BF538F Blackfin Processor
Hardware Reference. Modifications and corrections based on errata reports
against Preliminary Revision 0.1 of this manual have been made.
Product information can be obtained from the Analog Devices Web site,
VisualDSP++ online Help system, and a technical library CD.
Analog Devices Web Site
The Analog Devices Web site, www.analog.com, provides information
about a broad range of products—analog integrated circuits, amplifiers,
converters, and digital signal processors.
To access a complete technical library for each processor family, go to
http://www.analog.com/processors/technical_library. The manuals
selection opens a list of current manuals related to the product as well as a
link to the previous revisions of the manuals. When locating your manual
title, note a possible errata check mark next to the title that leads to the
current correction report against the manual.
Also note, MyAnalog.com is a free feature of the Analog Devices Web site
that allows customization of a Web page to display only the latest information about products you are interested in. You can choose to receive
weekly e-mail notifications containing updates to the Web pages that meet
your interests, including documentation errata against all manuals.
MyAnalog.com provides access to books, application notes, data sheets,
code examples, and more.
Visit MyAnalog.com to sign up. If you are a registered user, just log on.
Online documentation comprises the VisualDSP++ Help system, software
tools manuals, hardware tools manuals, processor manuals, Dinkum
Abridged C++ library, and FLEXnet License Tools software documentation. You can search easily across the entire VisualDSP++ documentation
set for any topic of interest.
For easy printing, supplementary Portable Documentation Format (.pdf)
files for all manuals are provided on the VisualDSP++ installation CD.
Each documentation file type is described as follows.
File            Description
.chm            Help system files and manuals in Microsoft help format
.htm or .html   Dinkum Abridged C++ library and FLEXnet License Tools software
                documentation. Viewing and printing the .html files requires a
                browser, such as Internet Explorer 6.0 (or higher).
.pdf            VisualDSP++ and processor manuals in PDF format. Viewing and
                printing the .pdf files requires a PDF reader, such as Adobe
                Acrobat Reader (4.0 or higher).
Technical Library CD
The technical library CD contains seminar materials, product highlights, a
selection guide, and documentation files of processor manuals, VisualDSP++ software manuals, and hardware tools manuals for the following
processor families: Blackfin, SHARC®, TigerSHARC®, ADSP-218x, and
ADSP-219x.
To order the technical library CD, go to
http://www.analog.com/processors/technical_library, navigate to the manuals page for your
processor, click the request CD check mark, and fill out the order form.
Data sheets, which can be downloaded from the Analog Devices Web site,
change rapidly, and therefore are not included on the technical library
CD. Technical manuals change periodically. Check the Web site for the
latest manual revisions and associated documentation errata.
Conventions
Text conventions used in this manual are identified and described as follows. Note that additional conventions, which apply only to specific
chapters, may appear throughout this document.
Example          Description
Close command    Titles in reference sections indicate the location of an item
(File menu)      within the VisualDSP++ environment’s menu system (for example,
                 the Close command appears on the File menu).
{this | that}    Alternative items in syntax descriptions appear within curly
                 brackets and separated by vertical bars; read the example as
                 this or that. One or the other is required.
[this | that]    Optional items in syntax descriptions appear within brackets
                 and separated by vertical bars; read the example as an
                 optional this or that.
[this,…]         Optional item lists in syntax descriptions appear within
                 brackets delimited by commas and terminated with an ellipsis;
                 read the example as an optional comma-separated list of this.
.SECTION         Commands, directives, keywords, and feature names are in text
                 with letter gothic font.
filename         Non-keyword placeholders appear in text with italic style format.
SWRST Software   Register names appear in UPPERCASE and a special typeface. The
Reset register   descriptive names of registers are in mixed case and regular
                 typeface.
TMR0, RESET      Pin names appear in UPPERCASE and a special typeface.
Register, bit, and pin names in the text may refer to groups of registers
or pins:
A lowercase x in a register name (DRx) indicates a set of registers (for
example, DR2, DR1, and DR0).
A colon between numbers within brackets indicates a range of registers
or pins (for example, I[3:0] indicates I3, I2, I1, and I0; AMS[3:0]
indicates AMS3, AMS2, AMS1, and AMS0).
Note: For correct operation, ...
A Note: provides supplementary information on a related topic. In the
online version of this book, the word Note appears instead of this
symbol.
Caution: Incorrect device operation may result if ...
Caution: Device damage may result if ...
A Caution: identifies conditions or inappropriate usage of the product
that could lead to undesirable results or product damage. In the online
version of this book, the word Caution appears instead of this symbol.
Warning: Injury to device users may result if ...
A Warning: identifies conditions or inappropriate usage of the product
that could lead to conditions that are potentially hazardous for device
users. In the online version of this book, the word Warning appears
instead of this symbol.
The ADSP-BF538 Blackfin® processor is derived from the ADSP-BF533
processor, offering similar performance and ease of use capabilities, but
with enhanced peripheral features, targeted for the automotive and industrial markets. Common peripherals share the same features and functions.
Any time a processor is referenced by name (for example, ADSP-BF538),
the information provided applies to the processor derivatives with on-chip
flash memory as well (for example, ADSP-BF538F4 and
ADSP-BF538F8).
The Blackfin processor core architecture combines a dual-MAC signal
processing engine, an orthogonal RISC-like microprocessor instruction
set, flexible single instruction multiple data (SIMD) capabilities, and multimedia features into a single instruction set architecture.
Blackfin products feature dynamic power management, the ability to vary
both the voltage and frequency of operation, which optimizes the power
consumption profile to the specific task.
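Why varying both knobs pays off can be seen from the usual first-order model of CMOS switching power, P ≈ C·V²·f. This is a general CMOS relation rather than a figure from this manual, and the sketch below (with a hypothetical function name) ignores leakage power. It compares two operating points of the same chip, so the switched capacitance C cancels out.

```c
#include <assert.h>

/*
 * Power at (v_new, f_new) relative to (v_ref, f_ref), using the first-order
 * CMOS switching model P ~ C * V^2 * f. The switched capacitance C is the
 * same for both operating points of one chip, so it cancels.
 * Leakage (static) power is not modeled.
 */
static double relative_dynamic_power(double v_new, double f_new,
                                     double v_ref, double f_ref)
{
    assert(v_ref > 0.0 && f_ref > 0.0);
    return (v_new * v_new * f_new) / (v_ref * v_ref * f_ref);
}
```

With illustrative values (not data-sheet operating points), relative_dynamic_power(0.8, 200e6, 1.2, 600e6) is about 0.148: lowering the frequency to one third alone gives 0.333, and the accompanying voltage reduction contributes a further factor of (0.8/1.2)² ≈ 0.444.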
Purpose of this Manual
This Blackfin processor hardware reference provides architectural information about enhanced Blackfin processors that include the ADSP-BF538
processors. The architectural descriptions cover functional blocks, buses,
and ports, including all features and processes that they support.
For programming information, see the Blackfin Processor Programming
Reference. For timing, electrical, and package specifications, see the
ADSP-BF538/ADSP-BF538F Embedded Processor Data Sheet.
Table 1-1 can be used to identify chapters from the ADSP-BF533 Blackfin
Processor Hardware Reference that are applicable to ADSP-BF538 Blackfin
products.
For programmers familiar with the ADSP-BF533/BF532/BF531 processors, the ADSP-BF538 is very similar, as they are built from the same
processor core. The ADSP-BF538 uses many of the same peripherals that
are found on the ADSP-BF533/BF532/BF531 (see Table 1-1).
Table 1-1 is intended as a guide to identify which chapters of this manual
are unchanged, changed, or new when compared with the chapters of the
ADSP-BF533 Blackfin Processor Hardware Reference, so that an
experienced programmer does not need to read every chapter of this manual to understand the operation of the ADSP-BF538.
•“No changes” means that the reader can refer directly to the
ADSP-BF533 Blackfin Processor Hardware Reference for this
chapter.
•“Changed” means that the ADSP-BF533 Blackfin Processor Hardware Reference chapter has been copied into this book, but some
changes have been made or features added.
•“New” means that this is an entirely new chapter and this is the
only source of reference for the material.
These peripherals are connected to the core via several high bandwidth
buses, as shown in Figure 1-1.
Figure 1-1. Processor Block Diagram
All of the peripherals, except for general-purpose I/O, real-time clock,
CAN, timers, and TWI, are supported by a flexible DMA structure. There
are also four separate memory DMA channels dedicated to data transfers
between the processor memory spaces, which include external SDRAM
and asynchronous memory. Multiple on-chip buses provide enough bandwidth to keep the processor core running even when there is also activity
on all of the on-chip and external peripherals.
The processor core contains two 16-bit multipliers, two 40-bit accumulators, two 40-bit arithmetic logic units (ALUs), four 8-bit video ALUs, and
a 40-bit shifter, as shown in Figure 1-2. The computational units process
8-, 16-, or 32-bit data from the register file.
The compute register file contains eight 32-bit registers. When performing compute operations on 16-bit operand data, the register file operates
as 16 independent 16-bit registers. All operands for compute operations
come from the multiported register file and instruction constant fields.
Each MAC can perform a 16- by 16-bit multiply per cycle, with accumulation to a 40-bit result. Signed and unsigned formats, rounding, and
saturation are supported.
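To make the accumulator width concrete, here is a minimal C model of the signed integer MAC path. A 16- by 16-bit product needs at most 32 bits, so the eight extra bits of the 40-bit accumulator let about 2^9 full-scale products accumulate before saturation. Unsigned formats, rounding, and the fractional (1.15) mode are not modeled, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Range of a signed 40-bit accumulator. */
#define ACC40_MAX ((int64_t)0x7FFFFFFFFF)   /*  2^39 - 1 */
#define ACC40_MIN (-ACC40_MAX - 1)          /* -2^39     */

/* Clamp a wide intermediate result to the signed 40-bit range. */
static int64_t sat40(int64_t x)
{
    if (x > ACC40_MAX) return ACC40_MAX;
    if (x < ACC40_MIN) return ACC40_MIN;
    return x;
}

/* One MAC step: acc += a * b, saturated to 40 bits. The 64-bit host
 * arithmetic cannot overflow before the clamp is applied. */
static int64_t mac40(int64_t acc, int16_t a, int16_t b)
{
    return sat40(acc + (int64_t)a * (int64_t)b);
}

/* Dot product of two 16-bit vectors using the 40-bit accumulator model. */
static int64_t dot40(const int16_t *x, const int16_t *y, int n)
{
    assert(n >= 0);
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc = mac40(acc, x[i], y[i]);
    return acc;
}
```

The largest product, (-32768) × (-32768) = 2^30, can be added 511 times to an empty accumulator; the 512th addition reaches 2^39 and saturates.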
The ALUs perform a traditional set of arithmetic and logical operations
on 16-bit or 32-bit data. Many special instructions are included to accelerate various signal processing tasks. These include bit operations such as
field extract and population count, modulo 2^32 multiply, divide primitives, saturation and rounding, and sign/exponent detection. The set of
video instructions includes byte alignment and packing operations, 16-bit
and 8-bit adds with clipping, 8-bit average operations, and 8-bit subtract/absolute value/accumulate (SAA) operations. Also provided are the
compare/select and vector search instructions. For some instructions, two
16-bit ALU operations can be performed simultaneously on register pairs
(a 16-bit high half and 16-bit low half of a compute register). By also
using the second ALU, quad 16-bit operations are possible.
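Two of the ideas above, the byte-wise subtract/absolute value/accumulate operation and dual 16-bit arithmetic on the halves of one register, can be modeled in portable C as follows. This is a behavioral sketch with hypothetical function names, not the instruction semantics: the hardware works on packed data in a single instruction, whereas this model loops one element at a time and sums everything into one 32-bit total.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum of absolute differences over n 8-bit samples (e.g. two pixel rows). */
static uint32_t sad_u8(const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (a[i] > b[i]) ? (uint32_t)(a[i] - b[i]) : (uint32_t)(b[i] - a[i]);
    return sum;
}

/* Interpret the low 16 bits as a signed value without relying on
 * implementation-defined conversions. */
static int32_t s16(uint32_t half)
{
    half &= 0xFFFFu;
    return (half >= 0x8000u) ? (int32_t)half - 0x10000 : (int32_t)half;
}

/* Clamp to the signed 16-bit range and return the 16-bit pattern. */
static uint32_t sat16_bits(int32_t x)
{
    if (x > INT16_MAX) x = INT16_MAX;
    if (x < INT16_MIN) x = INT16_MIN;
    return (uint32_t)x & 0xFFFFu;
}

/* Add the high halves and the low halves of two 32-bit words independently,
 * saturating each 16-bit result (a dual 16-bit "vector" add). */
static uint32_t add16x2_sat(uint32_t x, uint32_t y)
{
    uint32_t hi = sat16_bits(s16(x >> 16) + s16(y >> 16));
    uint32_t lo = sat16_bits(s16(x) + s16(y));
    return (hi << 16) | lo;
}
```

A typical use of the sum of absolute differences is block matching for video motion estimation, where the candidate block with the smallest sum is selected.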
The 40-bit shifter can deposit data and perform shifting, rotating, normalization, and extraction operations.
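Normalization, like the sign/exponent detection mentioned earlier, rests on counting redundant sign bits: the number of left shifts a two's-complement value tolerates before its sign bit would change. A portable C sketch follows; the function names are hypothetical, and it works on 32-bit values only, not on the 40-bit accumulator format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the bits directly below the sign bit that equal the sign bit. */
static int redundant_sign_bits32(int32_t x)
{
    uint32_t u = (uint32_t)x;
    uint32_t sign = u >> 31;
    int n = 0;
    for (int bit = 30; bit >= 0 && ((u >> bit) & 1u) == sign; bit--)
        n++;
    return n;   /* 0..31; both 0 and -1 give 31 */
}

/* Convert a 32-bit pattern back to int32_t without implementation-defined casts. */
static int32_t to_s32(uint32_t u)
{
    return (u <= (uint32_t)INT32_MAX) ? (int32_t)u : -(int32_t)(~u) - 1;
}

/* Shift x left by its redundant-sign-bit count. The count (the exponent
 * adjustment) is stored in *shift when shift is not NULL. */
static int32_t normalize32(int32_t x, int *shift)
{
    int s = redundant_sign_bits32(x);
    if (shift != NULL)
        *shift = s;
    return to_s32((uint32_t)x << s);
}
```

Storing the shift count alongside the normalized mantissa is the basic step of the block floating-point technique described in Appendix D.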
A program sequencer controls the instruction execution flow, including
instruction alignment and decoding. For program flow control, the
sequencer supports PC relative and indirect conditional jumps (with static
branch prediction), and subroutine calls. Hardware is provided to support
zero-overhead looping. The architecture is fully interlocked, meaning that
there are no visible pipeline effects when executing instructions with data
dependencies.
The address arithmetic unit provides two addresses for simultaneous dual
fetches from memory. It contains a multiported register file consisting of
four sets of 32-bit index, modify, length, and base registers (for circular
buffering), and eight additional 32-bit pointer registers (for C-style
indexed stack manipulation).
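The circular-buffer use of the index (I), modify (M), length (L), and base (B) registers can be modeled as a post-modify that wraps inside [B, B + L). The sketch assumes |M| < L, so one add or subtract of L suffices, and treats L = 0 as plain linear addressing. This follows the usual description of Blackfin circular addressing, but the Data Address Generators chapter is the authority; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Byte-address model of one DAG register set used for circular buffering. */
typedef struct {
    uint32_t i;  /* index: address used by the next access       */
    int32_t  m;  /* modify: signed step applied after each access */
    uint32_t b;  /* base: first byte of the buffer               */
    uint32_t l;  /* length: buffer size in bytes (0 = no wrap)    */
} dag_regs;

/* Return the current address, then advance I by M with circular wrap. */
static uint32_t dag_post_modify(dag_regs *d)
{
    uint32_t addr = d->i;
    int64_t next = (int64_t)d->i + d->m;
    if (d->l != 0) {
        int64_t step = (d->m < 0) ? -(int64_t)d->m : (int64_t)d->m;
        assert(step < (int64_t)d->l);                  /* |M| < L */
        if (next >= (int64_t)d->b + (int64_t)d->l)
            next -= d->l;
        else if (next < (int64_t)d->b)
            next += d->l;
    }
    d->i = (uint32_t)next;   /* 32-bit wraparound when L == 0 */
    return addr;
}

/* Address used by access number k (0-based), starting from index 'start'. */
static uint32_t dag_address_at(uint32_t b, uint32_t l, int32_t m,
                               uint32_t start, unsigned k)
{
    dag_regs d = { start, m, b, l };
    for (unsigned n = 0; n < k; n++)
        (void)dag_post_modify(&d);
    return d.i;
}
```

For a 16-byte buffer at 0x1000 stepped by 4, the accesses visit 0x1000, 0x1004, 0x1008, 0x100C, and then 0x1000 again, with no test-and-branch code in the loop body.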
Blackfin products support a modified Harvard architecture in combination with a hierarchical memory structure. Level 1 (L1) memories typically
operate at the full processor speed with little or no latency. At the L1 level,
the instruction memory holds instructions only. The two data memories
hold data, and a dedicated scratchpad data memory stores stack and local
variable information.
In addition, multiple L1 memory blocks are provided, which may be configured as a mix of SRAM and cache. The memory management unit
(MMU) provides memory protection for individual tasks that may be
operating on the core and may protect system registers from unintended
access.
The architecture provides three modes of operation: user, supervisor, and
emulation. User mode has restricted access to a subset of system resources,
thus providing a protected software environment. Supervisor and emulation modes have unrestricted access to the system and core resources.
The Blackfin instruction set is optimized so that 16-bit opcodes represent
the most frequently used instructions. Complex DSP instructions are
encoded into 32-bit opcodes as multifunction instructions. Blackfin products support a limited multi-issue capability, where a 32-bit instruction
can be issued in parallel with two 16-bit instructions. This allows the programmer to use many of the core resources in a single instruction cycle.
The Blackfin assembly language uses an algebraic syntax. The architecture
is optimized for use with the C compiler.
The Blackfin architecture structures memory as a single, unified 4G byte
address space using 32-bit addresses. All resources, including internal
memory, external memory, and I/O control registers, occupy separate sections of this common address space. The memory portions of this address
space are arranged in a hierarchical structure to provide a good cost/performance balance of some very fast, low latency on-chip memory as cache
or SRAM, and larger, lower-cost and lower performance off-chip memory
systems. Table 1-2 shows the memory allocation for the ADSP-BF538.
Table 1-2. Memory Comparison
Type of Memory            Memory Size
Instruction SRAM/Cache    16 KB
Instruction SRAM          64 KB
Instruction ROM           -
Data SRAM/Cache           32 KB
Data SRAM                 32 KB
Scratchpad                4 KB
Total                     148 KB
The L1 memory system is the highest-performance memory available to the core. The off-chip memory system, accessed through the
External Bus Interface Unit (EBIU), provides expansion with SDRAM,
flash memory, and SRAM, optionally accessing up to 132M bytes of physical memory (128M bytes of SDRAM plus four 1M byte asynchronous banks).
The memory DMA controller provides high bandwidth data movement
capability. It can perform block transfers of code or data between the
internal memory and the external memory spaces.
The processor has three blocks of on-chip memory that provide high
bandwidth access to the core:
•L1 instruction memory, consisting of SRAM and a 4-way set-associative
cache. On ROM-enabled parts, this also includes a user-definable ROM
region. This memory is accessed at full processor speed.
•L1 data memory, consisting of SRAM and/or a 2-way set-associative
cache. This memory block is accessed at full processor speed.
•L1 scratchpad RAM, which runs at the same speed as the L1 memories but
is only accessible as data SRAM and cannot be configured as cache
memory.
External Memory
External (off-chip) memory is accessed via the EBIU. This 16-bit interface
provides a glueless connection to a bank of synchronous DRAM
(SDRAM) and as many as four banks of asynchronous memory devices
including flash memory, EPROM, ROM, SRAM, and memory-mapped
I/O devices.
The PC133-compliant SDRAM controller can be programmed to interface to up to 128M bytes of SDRAM.
The asynchronous memory controller can be programmed to control up
to four banks of devices. Each bank occupies a 1M byte segment regardless
of the size of the devices used, so that these banks are only contiguous if
each is fully populated with 1M byte of memory.
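The effect of fixed 1M byte bank windows can be illustrated with a small address calculation. The sketch assumes that asynchronous bank 0 starts at 0x2000 0000, the value used on ADSP-BF533-family memory maps (this section does not state it), and uses hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: bank 0 base as on ADSP-BF533-family memory maps. */
#define ASYNC_BASE       0x20000000u
#define ASYNC_BANK_SIZE  0x00100000u   /* 1M byte window per bank */
#define ASYNC_NUM_BANKS  4u

/* First address of the 1M byte window of asynchronous bank n. */
static uint32_t async_bank_base(unsigned n)
{
    assert(n < ASYNC_NUM_BANKS);
    return ASYNC_BASE + n * ASYNC_BANK_SIZE;
}

/* True if addr falls in the unique storage of a dev_bytes-sized device
 * fitted to bank n. The rest of the window is not backed by more memory,
 * which is why partially populated banks are not contiguous. */
static bool async_addr_populated(uint32_t addr, unsigned n, uint32_t dev_bytes)
{
    uint32_t base = async_bank_base(n);
    assert(dev_bytes <= ASYNC_BANK_SIZE);
    return addr >= base && addr - base < dev_bytes;
}
```

With a 512K byte device in bank 0, the addresses from 0x2008 0000 up to async_bank_base(1) provide no additional storage, so software cannot treat banks 0 and 1 as one linear region.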
Blackfin products do not define a separate I/O space. All resources are
mapped through the flat 32-bit address space. On-chip I/O devices have
their control registers mapped into memory-mapped registers (MMRs) at
addresses near the top of the 4G byte address space. These are separated
into two smaller blocks: one contains the control MMRs for all core functions and the other contains the registers needed for setup and control of
the on-chip peripherals outside of the core. The MMRs are accessible only
in supervisor mode. They appear as reserved space to on-chip peripherals.
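In C, MMRs are normally accessed through volatile pointers so that every read and write actually reaches the register. The sketch below also classifies an address as a core MMR, a system MMR, or neither. The block boundaries 0xFFC0 0000 (system MMRs) and 0xFFE0 0000 (core MMRs) are assumptions taken from ADSP-BF53x memory maps, since this section gives no addresses; Appendices A and B list the actual registers. The helper names are hypothetical, and mmr_write16() is only meaningful when running on the processor in supervisor mode.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED block boundaries (ADSP-BF53x memory maps); not stated here. */
#define SYSTEM_MMR_START  0xFFC00000u   /* peripheral (system) MMRs   */
#define CORE_MMR_START    0xFFE00000u   /* core MMRs up to 0xFFFFFFFF */

typedef enum { REGION_OTHER, REGION_SYSTEM_MMR, REGION_CORE_MMR } region_t;

/* Classify a 32-bit address by the MMR block it falls in. */
static region_t classify_address(uint32_t addr)
{
    if (addr >= CORE_MMR_START)
        return REGION_CORE_MMR;
    if (addr >= SYSTEM_MMR_START)
        return REGION_SYSTEM_MMR;
    return REGION_OTHER;
}

/* Write a 16-bit MMR. 'volatile' prevents the compiler from removing or
 * merging the store. Only valid on the target, in supervisor mode, at the
 * address of a real 16-bit register. */
static inline void mmr_write16(uint32_t addr, uint16_t value)
{
    assert(classify_address(addr) != REGION_OTHER);
    *(volatile uint16_t *)(uintptr_t)addr = value;
}
```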
Event Handling
The event controller on the processor handles all asynchronous and synchronous events to the processor. The processor event handling supports
both nesting and prioritization. Nesting allows multiple event service routines to be active simultaneously. Prioritization ensures that servicing a
higher priority event takes precedence over servicing a lower priority
event. The controller provides support for five different types of events:
•Emulation
Causes the processor to enter emulation mode, allowing command
and control of the processor via the JTAG interface.
•Reset
Resets the processor.
•Nonmaskable Interrupt (NMI)
The software watchdog timer or the NMI input signal to the processor generates this event. The NMI event is frequently used as a
power-down indicator to initiate an orderly shutdown of the
system.
•Exceptions
Synchronous to program flow. That is, the exception is taken
before the instruction is allowed to complete. Conditions such as
data alignment violations and undefined instructions cause
exceptions.
•Interrupts
Asynchronous to program flow. These are caused by input pins,
timers, and other peripherals.
Each event has an associated register to hold the return address and an
associated return-from-event instruction. When an event is triggered, the
state of the processor is saved on the supervisor stack.
The processor event controller consists of two stages: the core event controller (CEC) and the system interrupt controllers (SIC). The CEC works
with the SIC to prioritize and control all system events. Conceptually,
interrupts from the peripherals arrive at the SIC and are routed directly
into the general-purpose interrupts of the CEC.
Core Event Controller (CEC)
The CEC supports nine general-purpose interrupts (IVG15–7), in addition
to the dedicated interrupt and exception events. Of these general-purpose
interrupts, the two lowest-priority interrupts (IVG15–14) are recommended
to be reserved for software interrupt handlers, leaving seven
prioritized interrupt inputs to support peripherals.
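Prioritization among CEC event levels can be pictured with a simplified model in which bit n of a "latched" mask and an "enabled" mask corresponds to event level n (IVGn), and the lowest-numbered pending, enabled level wins. The function is hypothetical: it treats every level as maskable, whereas the dedicated emulation, reset, NMI, and exception events are not masked this way, and it models nesting only by requiring a new event to have higher priority than the level currently being serviced.

```c
#include <assert.h>
#include <stdint.h>

#define NO_ACTIVE_LEVEL 16   /* pass when no event handler is running */

/*
 * Return the event level (0..15) that would be serviced next, or -1.
 * latched:      bit n set if level n is pending.
 * enabled:      bit n set if level n is unmasked.
 * active_level: level of the handler currently running, or NO_ACTIVE_LEVEL.
 * Lower level numbers have higher priority, so a pending event preempts
 * only if its level is numerically below active_level.
 */
static int next_event_level(uint16_t latched, uint16_t enabled, int active_level)
{
    assert(active_level >= 0 && active_level <= NO_ACTIVE_LEVEL);
    uint16_t ready = (uint16_t)(latched & enabled);
    for (int level = 0; level < active_level && level <= 15; level++)
        if (ready & (1u << level))
            return level;
    return -1;
}
```

For example, with IVG7 and IVG10 both pending and enabled, IVG7 is chosen; if IVG9 is already being serviced, a pending IVG10 waits until that handler returns.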
System Interrupt Controllers (SICx)
The system interrupt controllers provide the mapping and routing of
events from the many peripheral interrupt sources to the prioritized general-purpose interrupt inputs of the CEC. Although the processor
provides a default mapping, the user can alter the mappings and priorities
of interrupt events by writing the appropriate values into the Interrupt
Assignment Registers (SIC_IARx).
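Retargeting a peripheral interrupt amounts to a read-modify-write of a small field in one of the SIC_IARx registers. The sketch below computes which register and bits to touch. It assumes the layout used on the ADSP-BF533: eight 4-bit fields per 32-bit register, indexed by peripheral interrupt ID, where a field value k routes the interrupt to IVG7 + k. That layout and the ID numbering should be confirmed against this processor's SIC register descriptions; the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Location and value of one peripheral's field in the SIC_IARx registers. */
typedef struct {
    unsigned reg;    /* x in SIC_IARx                  */
    unsigned shift;  /* bit position of the 4-bit field */
    uint32_t field;  /* field value: IVG number minus 7 */
} iar_setting;

/* ASSUMED layout: eight 4-bit fields per 32-bit register, ordered by
 * peripheral interrupt ID, with field value k selecting IVG(7 + k). */
static iar_setting sic_iar_setting(unsigned periph_id, unsigned ivg)
{
    assert(ivg >= 7 && ivg <= 15);
    iar_setting s = { periph_id / 8u, (periph_id % 8u) * 4u, ivg - 7u };
    return s;
}

/* Read-modify-write of one 4-bit field within a SIC_IARx value. */
static uint32_t sic_iar_update(uint32_t old_value, iar_setting s)
{
    uint32_t mask = 0xFu << s.shift;
    return (old_value & ~mask) | (s.field << s.shift);
}
```

Only the selected field changes, so the other seven peripherals sharing that register keep their existing assignments.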
DMA Support
The processor has two independent DMA controllers that support automated data transfers with minimal overhead for the core. DMA transfers
can occur between the internal memories and any of its DMA-capable
peripherals. Additionally, DMA transfers can be accomplished between
any of the DMA-capable peripherals and external devices connected to the
external memory interfaces, including the SDRAM controller and the
asynchronous memory controller. DMA-capable peripherals include the
SPORTs, SPI ports, UARTs, and PPI. Each individual DMA-capable
peripheral has at least one dedicated DMA channel.
The DMA controllers support both 1-dimensional (1D) and 2-dimensional (2D) DMA transfers. DMA transfer initialization can be
implemented from registers or from sets of parameters called descriptor
blocks.
The 2D DMA capability supports arbitrary row and column sizes up to
64K elements by 64K elements, and arbitrary row and column step sizes
up to +/- 32K elements. Furthermore, the column step size can be less
than the row step size, allowing implementation of interleaved data
streams. This feature is especially useful in video applications where data
can be de-interleaved on the fly.
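The de-interleaving case can be traced with a small address model. It assumes the Blackfin DMA convention that the step within a row (X_MODIFY in the DMA chapter) is applied between elements, and the row step (Y_MODIFY) replaces it after the last element of each row. The parameter names are assumed to match that chapter, while the functions are hypothetical and work in element units rather than bytes. With stereo samples stored as L R L R L R L R, X_COUNT = 4, X_MODIFY = 2, Y_COUNT = 2, and Y_MODIFY = -5 visit offsets 0, 2, 4, 6, 1, 3, 5, 7, so all left samples come out before all right samples.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Offset (in elements) of the k-th element moved by a 2D work unit that
 * starts at 'start'. Within a row the address advances by x_modify; after
 * the last element of a row it advances by y_modify instead.
 */
static int32_t dma2d_offset(int32_t start, uint16_t x_count, int32_t x_modify,
                            uint16_t y_count, int32_t y_modify, uint32_t k)
{
    assert(x_count > 0 && k < (uint32_t)x_count * y_count);
    int32_t row = (int32_t)(k / x_count);
    int32_t col = (int32_t)(k % x_count);
    int32_t row_advance = (x_count - 1) * x_modify + y_modify;
    return start + row * row_advance + col * x_modify;
}

/* Copy src into dst in the order a 2D DMA would read it (element units).
 * The caller must ensure every visited offset lies inside src. */
static void dma2d_gather(const int16_t *src, int16_t *dst,
                         uint16_t x_count, int32_t x_modify,
                         uint16_t y_count, int32_t y_modify)
{
    int32_t addr = 0;
    size_t n = 0;
    for (uint32_t y = 0; y < y_count; y++)
        for (uint32_t x = 0; x < x_count; x++) {
            dst[n++] = src[addr];
            addr += (x + 1 < x_count) ? x_modify : y_modify;
        }
}
```

A real channel setup would express the modify values in bytes and match the element width configured for the transfer.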
Examples of DMA types supported include:
•A single, linear buffer that stops upon completion
•A circular, auto-refreshing buffer that interrupts on each full or
fractionally full buffer
•1-D or 2-D DMA using a linked list of descriptors
•2-D DMA using an array of descriptors specifying only the base
DMA address within a common page
In addition to the dedicated peripheral DMA channels, there are four
separate memory DMA channels provided for transfers between the various memories of the system. This enables transfers of blocks of data
between any of the memories—including external SDRAM, ROM,
SRAM, and flash memory—with minimal processor intervention. Memory DMA transfers can be controlled by a very flexible descriptor-based
methodology or by a standard register-based autobuffer mechanism.
External Bus Interface Unit
The external bus interface unit on the processor interfaces with a wide
variety of industry-standard memory devices. The controller consists of an
SDRAM controller and an asynchronous memory controller.
PC133 SDRAM Controller
The SDRAM controller provides an interface to a single bank of industry-standard SDRAM devices or DIMMs. Fully compliant with the
PC133 SDRAM standard, the bank can be configured to contain between
16M and 128M bytes of memory.
A set of programmable timing parameters is available to configure the
SDRAM bank to support slower memory devices. The memory bank is
16 bits wide for minimum device count and lower system cost.
The asynchronous memory controller provides a configurable interface for
up to four separate banks of memory or I/O devices. Each bank can be
independently programmed with different timing parameters. This allows
connection to a wide variety of memory devices, including SRAM, ROM,
and flash EPROM, as well as I/O devices that interface with standard
memory control lines. Each bank occupies a 1M byte window in the processor address space, but if not fully populated, these are not made
contiguous by the memory controller. The banks are 16 bits wide, for
interfacing to a range of memories and I/O devices.
Parallel Peripheral Interface
The processor provides a parallel peripheral interface (PPI) that can connect directly to parallel A/D and D/A converters, video encoders and
decoders, and other general-purpose peripherals. The PPI consists of a
dedicated input clock pin, up to 3 frame synchronization pins, and up to
16 data pins. The input clock supports parallel data rates up to SCLK/2,
while the synchronization signals can be configured as either inputs or
outputs.
The PPI supports a variety of general-purpose and ITU-R 656 modes of
operation. In general-purpose mode, the PPI provides half-duplex,
bidirectional data transfer with up to 16 bits of data. Up to 3 frame synchronization signals are also provided for controlling DMA transfers. In
ITU-R 656 mode, the PPI provides half-duplex, bidirectional data transfer with up to 10 bits of data. Additionally, on-chip decode of embedded
start-of-line (SOL) and start-of-field (SOF) preamble packets is
supported.
The GP modes of the PPI are intended to suit a wide variety of data capture and transmission applications. Three distinct sub-modes are
supported:
Input mode - Frame syncs and data are inputs into the PPI.
Frame capture mode - Frame syncs are outputs from the PPI, but
data are inputs.
Output mode - Frame syncs and data are outputs from the PPI.
Input Mode
This mode is intended for ADC applications, as well as video communication with hardware signaling. In its simplest form, PPI_FS1 is an external
frame sync input that controls when to read data. The PPI_DELAY MMR
allows for a delay (in PPI_CLK cycles) between reception of this frame sync
and the initiation of data reads. The number of input data samples is
user-programmable and defined by the contents of the PPI_COUNT register.
Data widths of 8, 10, 11, 12, 13, 14, 15, and 16 bits are supported, as
programmed by the PPI_CONTROL register.
Frame Capture Mode
This mode allows the video source(s) to act as a slave (for example, for
frame capture). The processor controls when to read from the video
source(s).
PPI_FS1 is an HSYNC output and PPI_FS2 is a VSYNC output.
Output Mode
This mode is used for transmitting video or other data with up to three
output frame syncs. Typically, a single frame sync is appropriate for data
converter applications, whereas two or three frame syncs could be used for
sending video with hardware signaling.
ITU-R 656 Mode Descriptions
The ITU-R 656 modes of the PPI are intended to suit a wide variety of
video capture, processing, and transmission applications. Three distinct
sub-modes are supported:
Active Video Only Mode
Vertical Blanking Only Mode
Entire Field Mode
Active Video Only Mode
This mode is used when only the active video portion of a field is of interest and not any of the blanking intervals. The PPI does not read in any
data between the end of active video (EAV) and start of active video (SAV)
preamble symbols, or any data present during the vertical blanking intervals. In this mode, the control byte sequences are not stored to memory;
they are filtered by the PPI. After synchronizing to the start of Field 1, the
PPI ignores incoming samples until it sees an SAV code. The user specifies
the number of active video lines per frame (in the PPI_COUNT register).
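The filtering that active video only mode performs can be illustrated on an 8-bit ITU-R 656 byte stream. This is a simplified software model and not the PPI's actual logic. It relies on the ITU-R BT.656 timing reference format, FF 00 00 XY, where XY bit 6 is F (field), bit 5 is V (vertical blanking), and bit 4 is H (0 for SAV, 1 for EAV). Field-1 synchronization and line counting are deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified model of "active video only" filtering on an 8-bit 656 stream.
 * A timing reference is FF 00 00 XY; in XY, bit 5 = V (1 = vertical blanking)
 * and bit 4 = H (0 = SAV, 1 = EAV). Bytes are kept only between an SAV on an
 * active line (V = 0) and the next control sequence; the preamble and XY
 * bytes themselves are never stored. Returns the bytes written to out[]. */
size_t itu656_active_only(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t kept = 0;
    int in_active = 0;

    for (size_t i = 0; i < n; i++) {
        if (i + 3 < n && in[i] == 0xFF && in[i + 1] == 0x00 && in[i + 2] == 0x00) {
            uint8_t xy = in[i + 3];
            int v = (xy >> 5) & 1;
            int h = (xy >> 4) & 1;
            in_active = (h == 0 && v == 0);   /* SAV on an active line starts capture */
            i += 3;                           /* skip the rest of the control sequence */
            continue;
        }
        if (in_active)
            out[kept++] = in[i];
    }
    return kept;
}
```

In a stream with one active line (SAV XY = 0x80) followed by a blanking line (SAV XY = 0xAB), only the active-line samples survive.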
Vertical Blanking Interval Mode
In this mode, the PPI only transfers vertical blanking interval (VBI) data,
as well as horizontal blanking information and control byte sequences on
VBI lines.
Entire Field Mode
In this mode, the entire incoming bit stream is read in through the PPI.
This includes active video, control preamble sequences, and ancillary data
that may be embedded in horizontal and vertical blanking intervals. Data
transfer starts immediately after synchronization to Field 1.
Serial Ports (SPORTs)
The processor incorporates four identical dual-channel synchronous serial
ports (SPORT0, SPORT1, SPORT2 and SPORT3) for serial and multiprocessor communications. The SPORTs support the following features:
•Bidirectional, I2S capable operation
Each SPORT has two sets of independent transmit and receive
pins, enabling 16 channels of I2S stereo audio.
•Buffered (8 deep) transmit and receive ports
Each port has a data register for transferring data words to and
from other processor components and shift registers for shifting
data in and out of the data registers.
•Clocking
Each transmit and receive port can either use an external serial
clock or can generate its own in a wide range of frequencies.
•Word length
Each SPORT supports serial data words from 3 to 32 bits in
length, transferred in most-significant-bit-first or
least-significant-bit-first format.
•Framing
Each transmit and receive port can run with or without frame sync
signals for each data word. Frame sync signals can be generated
internally or externally, active high or low, and with either of two
pulse widths and early or late frame sync.
•Companding in hardware
Each SPORT can perform A-law or µ-law companding according
to ITU recommendation G.711. Companding can be selected on
the transmit and/or receive channel of the SPORT without additional latencies.
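For reference, the µ-law transform that the SPORT applies in hardware can be written in software. The sketch below follows the widely used G.711 µ-law reference algorithm on a 16-bit linear sample, with a bias of 0x84 and clipping at 32635. It illustrates the transform only. The input alignment that the SPORT companding hardware expects is covered in the SPORT chapter and is not assumed here.

```c
#include <stdint.h>

/* G.711 mu-law encoder for a 16-bit linear PCM sample (common reference
 * algorithm): take the magnitude, clip, add a bias of 0x84, find the
 * segment, keep a 4-bit mantissa, then invert all bits of the code. */
uint8_t linear_to_ulaw(int16_t pcm)
{
    const int BIAS = 0x84;
    const int CLIP = 32635;
    int sign = (pcm < 0) ? 0x80 : 0x00;
    int mag = (pcm < 0) ? -(int)pcm : pcm;

    if (mag > CLIP)
        mag = CLIP;
    mag += BIAS;

    /* Segment (exponent) = index of the highest set bit minus 7. */
    int exponent = 7;
    for (int mask = 0x4000; (mag & mask) == 0 && exponent > 0; mask >>= 1)
        exponent--;

    int mantissa = (mag >> (exponent + 3)) & 0x0F;
    return (uint8_t)~(sign | (exponent << 4) | mantissa);
}
```

Under this encoding, silence (0) maps to 0xFF, the largest positive sample maps to 0x80, and the largest negative sample maps to 0x00.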
•DMA operations with single cycle overhead
Each SPORT can automatically receive and transmit multiple buffers of memory data. The processor can link or chain sequences of
DMA transfers between a SPORT and memory.
•Interrupts
Each transmit and receive port generates an interrupt upon completing the transfer of a data word or after transferring an entire
data buffer or buffers through DMA.
•Multichannel capability
Each SPORT supports 128 channels out of a 1024-channel window and is compatible with the H.100, H.110, MVIP-90, and
HMVIP standards.
Serial Peripheral Interface (SPI) Ports
The processor has three SPI-compatible ports that enable the processor to
communicate with multiple SPI-compatible devices.
The SPI interface uses three pins for transferring data: two data pins and a
clock pin. An SPI chip-select input pin lets other SPI devices select the
processor as a slave. SPI chip-select output pins let the processor select
other SPI devices. SPI0 has one chip-select input pin and seven chip-select
output pins. All are reconfigurable GPIO port F pins. The remaining two
SPI instantiations have one chip-select input pin and one chip-select output pin. All of these chip-selects are reconfigurable GPIO port D pins.
Using these pins, the SPI port provides a full-duplex, synchronous, serial
interface, which supports both master and slave modes and multimaster
environments.
The SPI port baud rate and clock phase/polarities are programmable, and
each SPI port has an integrated DMA controller, configurable to support
either transmit or receive data streams. The SPI DMA controllers can only
service unidirectional accesses at any given time.
During transfers, the SPI ports simultaneously transmit and receive by
serially shifting data in and out of their two serial data lines. The serial
clock line synchronizes the shifting and sampling of data on the two serial
data lines.
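The full-duplex exchange described above can be modeled in a few lines. This is a behavioral sketch of one 8-bit, MSB-first transfer and is not SPI driver code. On each serial clock, both ends shift one bit out of their MSB and one bit into their LSB, so after eight clocks the two shift registers have swapped contents. Clock phase and polarity only choose which edge samples, so they are not modeled.

```c
#include <stdint.h>

/* Behavioral model of one 8-bit SPI transfer (MSB first). The master drives
 * its MSB on MOSI and the slave drives its MSB on MISO; each side shifts the
 * bit it receives into its LSB. After eight clocks the registers are swapped. */
void spi_exchange(uint8_t *master, uint8_t *slave)
{
    for (int clk = 0; clk < 8; clk++) {
        uint8_t mosi = (uint8_t)((*master >> 7) & 1u);
        uint8_t miso = (uint8_t)((*slave >> 7) & 1u);
        *master = (uint8_t)((*master << 1) | miso);
        *slave  = (uint8_t)((*slave << 1) | mosi);
    }
}
```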
Timers
There are four general-purpose programmable timer units in the processor. Three timers have an external pin that can be configured either as a
pulse width modulator (PWM) or timer output, as an input to clock the
timer, or as a mechanism for measuring pulse widths of external events.
These timer units can be synchronized to an external clock input connected to the
PF1 pin, an external clock input to the PPI_CLK pin, or to the
internal SCLK.
The timer units can be used in conjunction with UART0 to measure the
width of the pulses in the data stream to provide an autobaud detect function for a serial channel.
The timers can generate interrupts to the processor core to provide periodic events for synchronization, either to the processor clock or to a count
of external signals.
In addition to the three general-purpose programmable timers, a fourth
timer is also provided. This extra timer is clocked by the internal processor
clock and is typically used as a system tick clock for generation of operating system periodic interrupts.
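When a timer generates PWM from SCLK, the period and pulse width are cycle counts. The helper below is a hypothetical sketch that assumes SCLK is the timer clock (rather than PF1 or PPI_CLK). It also assumes the period and width are programmed as SCLK cycle counts, which is the role of the TIMERx_PERIOD and TIMERx_WIDTH registers described in the timer chapter. The type and function names are illustrative only.

```c
#include <stdint.h>

/* Hypothetical helper: PWM timing in SCLK cycles, assuming the timer counts SCLK. */
typedef struct {
    uint32_t period;  /* SCLK cycles per PWM period */
    uint32_t width;   /* SCLK cycles the output stays active */
} pwm_setting;

pwm_setting pwm_from_freq(uint32_t sclk_hz, uint32_t pwm_hz, uint32_t duty_percent)
{
    pwm_setting s;
    s.period = sclk_hz / pwm_hz;                                   /* whole cycles per period */
    s.width  = (uint32_t)(((uint64_t)s.period * duty_percent) / 100u);
    return s;
}
```

For example, a 100 MHz SCLK with a 20 kHz, 25% PWM gives a period of 5000 cycles and a width of 1250 cycles.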
UART Ports
The processor has three half-duplex universal asynchronous receiver/transmitter (UART) ports, which are fully compatible with PC-standard
UARTs. The UART ports provide a simplified UART interface to other
peripherals or hosts, providing half-duplex, DMA-supported, asynchronous transfers of serial data. The UART ports include support for 5 to 8
data bits; 1 or 2 stop bits; and none, even, or odd parity. The UART ports
support two modes of operation:
•programmed I/O (PIO)
The processor sends or receives data by writing or reading
I/O-mapped UART registers. The data is double buffered on both
transmit and receive.
•DMA (direct memory access)
The DMA controller transfers both transmit and receive data. This
reduces the number and frequency of interrupts required to transfer data to and from memory. Each UART has two dedicated
DMA channels, one for transmit and one for receive. These DMA
channels have lower priority than most DMA channels because of
their relatively low service rates.
The UART port baud rate, serial data format, error code generation and
status, and interrupts can be programmed to support the following:
•Wide range of bit rates
•Data formats from 7 to 12 bits per frame
•Generation of maskable interrupts to the processor by both transmit and receive operations
In conjunction with the general-purpose timer functions, autobaud detection is supported by UART0.
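The 7-to-12-bit frame range follows directly from the supported formats: 1 start bit, 5 to 8 data bits, an optional parity bit, and 1 or 2 stop bits. The sketch below also computes a baud-rate divisor. It assumes the PC-style 16x-oversampling divisor latch, where baud = SCLK / (16 × divisor). Check that relationship against the UART chapter before relying on it. The function names are hypothetical.

```c
#include <stdint.h>

/* Frame length: 1 start bit + data bits + optional parity + stop bits. */
unsigned uart_frame_bits(unsigned data_bits, int parity_enabled, unsigned stop_bits)
{
    return 1u + data_bits + (parity_enabled ? 1u : 0u) + stop_bits;
}

/* Divisor for a 16x-oversampling UART (assumed: baud = sclk / (16 * divisor)),
 * rounded to the nearest integer to keep the baud-rate error small. */
uint32_t uart_divisor(uint32_t sclk_hz, uint32_t baud)
{
    return (sclk_hz + 8u * baud) / (16u * baud);
}
```

With a 100 MHz SCLK and 115200 baud, the exact ratio is about 54.25, so the divisor rounds to 54.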
The capabilities of the UARTs are further extended with support for the
Infrared Data Association (IrDA®) Serial Infrared Physical Layer Link
Specification (SIR) protocol.
Controller Area Network Port
The controller area network port (CAN) provides a two wire interface for
communication with other CAN compliant devices. Features of the CAN
port include error detection, multimastering, prioritization of messages
through arbitration, and a 32 16 entry mailbox RAM. Transfer rates typically approach 1M bps.
TWI Controller Interface
The processor has two TWI (two-wire interface) ports that support synchronous serial transfers over a two-wire system with I2C-compliant
devices. Features include simultaneous master and slave operation, multimaster arbitration, 400K bps data rates, master clock synchronization, and
7-bit addressing.
Real-Time Clock
The Blackfin real-time clock (RTC) provides a robust set of digital watch
features, including current time, stopwatch, and alarm. The RTC is
clocked by a 32.768 kHz crystal external to the processor. The RTC
peripheral has dedicated power supply pins, so that it can remain powered
up and clocked even when the rest of the processor is in a low power state.
The RTC provides several programmable interrupt options, including
interrupt per second, minute, hour, or day clock ticks, interrupt on programmable stopwatch countdown, or interrupt at a programmed alarm
time.
The 32.768 kHz input clock frequency is divided down to a 1 Hz signal
by a prescaler. The counter function of the timer consists of four counters:
a 60-second counter, a 60-minute counter, a 24-hour counter, and a
32,768-day counter.
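The counter chain can be modeled as four cascaded counters that advance on each 1 Hz prescaler tick (32,768 Hz / 32,768 = 1 Hz). The sketch below is a software model of that rollover behavior. It does not describe the RTC register layout, and the type and function names are hypothetical.

```c
#include <stdint.h>

/* Model of the RTC counter chain fed by the 1 Hz prescaler output:
 * seconds (0-59) -> minutes (0-59) -> hours (0-23) -> days (0-32767). */
typedef struct {
    uint16_t days;
    uint8_t hours, minutes, seconds;
} rtc_time;

/* Advance the chain by one 1 Hz tick, carrying each rollover into the next counter. */
void rtc_tick(rtc_time *t)
{
    if (++t->seconds < 60) return;
    t->seconds = 0;
    if (++t->minutes < 60) return;
    t->minutes = 0;
    if (++t->hours < 24) return;
    t->hours = 0;
    t->days = (uint16_t)((t->days + 1u) % 32768u);  /* day counter wraps after 32768 days */
}
```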
When enabled, the alarm function generates an interrupt when the output
of the timer matches the programmed value in the alarm control register.
There are two alarms: The first alarm is for a time of day. The second
alarm is for a day and time of that day.
The stopwatch function counts down from a programmed value, with one-second
resolution. When the stopwatch is enabled and the counter underflows, an interrupt is generated.
Like the other peripherals, the RTC can wake up the processor from a low
power state upon generation of any RTC wake-up event.
Watchdog Timer
The processor includes a 32-bit timer that can be used to implement a
software watchdog function. A software watchdog can improve system
availability by forcing the processor to a known state through generation
of a hardware reset, non-maskable interrupt (NMI), or general-purpose
interrupt, if the timer expires before being reset by software. The programmer initializes the count value of the timer, enables the appropriate
interrupt, then enables the timer. Thereafter, the software must reload the
counter before it counts to zero from the programmed value. This protects
the system from remaining in an unknown state where software that
would normally reset the timer has stopped running due to an external
noise condition or software error.
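Choosing the initial count amounts to multiplying the desired timeout by the counter clock. The helper below is a hypothetical sketch. It assumes the 32-bit counter decrements once per SCLK cycle and that the result is written to the watchdog count register (WDOG_CNT). It saturates at the 32-bit limit, which at a 100 MHz SCLK corresponds to roughly 43 seconds.

```c
#include <stdint.h>

/* Hypothetical helper: initial watchdog count for a timeout in milliseconds,
 * assuming one decrement per SCLK cycle. Saturates at the 32-bit maximum. */
uint32_t wdog_count_for_timeout_ms(uint32_t sclk_hz, uint32_t timeout_ms)
{
    uint64_t count = (uint64_t)sclk_hz * timeout_ms / 1000u;
    return (count > UINT32_MAX) ? UINT32_MAX : (uint32_t)count;
}
```

Software would then reload the counter well before this many SCLK cycles have elapsed.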
If configured to generate a hardware reset, the watchdog timer resets both
the CPU and the peripherals. After a reset, software can determine if the
watchdog was the source of the hardware reset by interrogating a status bit
in the watchdog control register.
The timer is clocked by the system clock (SCLK), at a maximum frequency
of fSCLK.
General-Purpose I/O
There are up to 54 general-purpose I/O (GPIO) pins on the processor
which span four ports—C, D, E, and F.
The functionality of GPIO ports C, D, and E is multiplexed with peripheral pins.
By default, the peripheral function is selected. Through software, the
GPIO functionality can be selected for the pin instead.
GPIO functionality may be selected on a pin-by-pin basis so that,
for example, if all the pins of a SPORT are not required, the remainder
may be used as GPIO.
The two GPIO interrupt sensitivity registers specify whether individual PFx pins
are level or edge sensitive and, if edge sensitive, whether just the
rising edge or both the rising and falling edges of the signal are significant.
One register selects the type of sensitivity, and one register selects which
edges are significant for edge sensitivity.
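The interplay of the two sensitivity registers can be expressed as bit masks. The model below is hypothetical. `edge_sel` stands for the register that selects edge versus level sensitivity, and `both_sel` stands for the register that selects both edges versus the rising edge only. Level-sensitive pins are assumed to request an interrupt while high, because polarity control is outside the scope of this sketch.

```c
#include <stdint.h>

/* Hypothetical model of the two GPIO interrupt sensitivity registers.
 * edge_sel: bit n = 1 -> pin n edge sensitive, 0 -> level sensitive
 * both_sel: bit n = 1 -> both edges significant, 0 -> rising edge only
 * Returns the pins requesting an interrupt, given previous and current states. */
uint16_t gpio_irq_request(uint16_t edge_sel, uint16_t both_sel,
                          uint16_t prev, uint16_t curr)
{
    uint16_t rising  = (uint16_t)(~prev & curr);
    uint16_t falling = (uint16_t)(prev & ~curr);
    uint16_t edges   = (uint16_t)(rising | (both_sel & falling));
    uint16_t level   = (uint16_t)(~edge_sel & curr);   /* assumed active-high level */
    return (uint16_t)((edge_sel & edges) | level);
}
```

Consider pins 0 and 1 configured as edge sensitive, with both edges enabled only on pin 1. If both pins fall while level-sensitive pin 2 is high, the model flags pins 1 and 2.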
Clock Signals
The processor can be clocked by an external crystal, a sine wave input, or a
buffered, shaped clock derived from an external clock oscillator.
This external clock connects to the Blackfin CLKIN pin. CLKIN input cannot be halted, changed, or operated below the specified frequency during
normal operation. This clock signal should be a TTL-compatible signal.
The core clock (CCLK) and system peripheral clock (SCLK) are derived from
the input clock (CLKIN) signal. An on-chip PLL is capable of multiplying
the CLKIN signal by a user programmable multiplication factor. The
default multiplier is 10x, but it can be modified by a software instruction
sequence. On-the-fly frequency changes can be made by simply writing to
the PLL_DIV register.
All on-chip peripherals are clocked by the system clock (SCLK). The system
clock frequency is programmable by means of the
SSEL[3:0] bits of the PLL_DIV register.
The CAN clock is derived from the system clock (SCLK) through a further
divisor. Careful selection of the input clock and SCLK is important to
obtain the correct CAN clock frequency.
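The relationship between CLKIN, the multiplier, and the derived clocks can be sketched numerically. The helper below assumes the usual Blackfin arrangement, which this section does not spell out: VCO = CLKIN × MSEL, CCLK = VCO / 2^CSEL, and SCLK = VCO / SSEL. The input divider and the data sheet frequency limits are not modeled. Treat the formulas as an assumption to confirm in the dynamic power management chapter.

```c
#include <stdint.h>

/* Sketch of PLL clock arithmetic under the assumed scheme:
 *   VCO  = CLKIN * MSEL   (input divider not modeled)
 *   CCLK = VCO / 2^CSEL   (CSEL = 0..3)
 *   SCLK = VCO / SSEL     (SSEL[3:0] = 1..15)
 * Frequency limits from the data sheet are not checked. */
typedef struct {
    uint32_t vco_hz, cclk_hz, sclk_hz;
} pll_clocks;

pll_clocks pll_compute(uint32_t clkin_hz, uint32_t msel, uint32_t csel, uint32_t ssel)
{
    pll_clocks c;
    c.vco_hz  = clkin_hz * msel;
    c.cclk_hz = c.vco_hz >> csel;
    c.sclk_hz = (ssel != 0u) ? c.vco_hz / ssel : 0u;  /* SSEL = 0 is not a valid divisor */
    return c;
}
```

Consider a 25 MHz CLKIN with the default 10x multiplier, CSEL = 0, and an illustrative SSEL of 5. The sketch yields a 250 MHz core clock and a 50 MHz system clock. Any further CAN divisor then applies to that SCLK.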
Power Management
The processor provides four operating modes, each with a different performance/power profile. In addition, dynamic power management provides
the control functions to dynamically alter the processor core supply voltage to further reduce power dissipation. Control of clocking to each of the
peripherals also reduces power consumption.
Full On Operating Mode (Maximum Performance)
In the full on mode, the phase-locked loop (PLL) is enabled, not bypassed,
providing the maximum operational frequency. This is the normal execution state in which maximum performance can be achieved. The processor
core and all enabled peripherals run at full speed.
Active Operating Mode (Moderate Power Savings)
In the active mode, the PLL is enabled, but bypassed. Because the PLL is
bypassed, the Blackfin core clock (CCLK) and system clock (SCLK) run at
the input clock (CLKIN) frequency. In this mode, the CLKIN to VCO multiplier ratio can be changed, although the changes are not realized until the
full on mode is entered. DMA access is available to appropriately configured L1 memories.
In the active mode, it is possible to disable the PLL through the PLL control register
(PLL_CTL). If disabled, the PLL must be re-enabled before
transitioning to the full on or sleep modes.
Sleep Operating Mode (High Power Savings)
The sleep mode reduces power dissipation by disabling the
clock to the processor core (CCLK). The PLL and system clock (SCLK), however,
continue to operate in this mode. Typically an external event or
RTC activity wakes up the processor. When in the sleep mode, assertion
of any interrupt causes the processor to sense the value of the bypass bit
(BYPASS) in the PLL control register (PLL_CTL). If bypass is disabled, the
processor transitions to the full on mode. If bypass is enabled, the processor transitions to the active mode.
When in the sleep mode, system DMA access to L1 memory is not
supported.
Deep Sleep Operating Mode (Maximum Power Savings)
The deep sleep mode maximizes power savings by disabling the clocks to
the processor core and to all synchronous systems. Asynchronous systems,
such as the RTC, may still be running, but cannot access internal
resources or external memory. This powered-down mode can only be
exited by assertion of the reset interrupt or by an asynchronous interrupt
generated by the RTC. When in deep sleep mode, assertion of the reset
interrupt or the RTC asynchronous interrupt causes the processor to transition to the active mode.
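The wake-up rules for sleep and deep sleep can be summarized as a small transition function. This is a behavioral sketch of the text above and not driver code. The enum and function names are hypothetical, and `bypass` stands for the state of the BYPASS bit in PLL_CTL.

```c
/* Behavioral sketch of the wake-up rules for sleep and deep sleep. */
typedef enum { MODE_FULL_ON, MODE_ACTIVE, MODE_SLEEP, MODE_DEEP_SLEEP } power_mode;
typedef enum { EV_OTHER_IRQ, EV_RTC, EV_RESET } wake_event;

/* Returns the mode entered when 'ev' is asserted in mode 'cur'. */
power_mode next_mode(power_mode cur, wake_event ev, int bypass)
{
    switch (cur) {
    case MODE_SLEEP:
        /* Any interrupt wakes the core; BYPASS selects the destination. */
        return bypass ? MODE_ACTIVE : MODE_FULL_ON;
    case MODE_DEEP_SLEEP:
        /* Only reset or the RTC asynchronous interrupt exit deep sleep. */
        return (ev == EV_RESET || ev == EV_RTC) ? MODE_ACTIVE : MODE_DEEP_SLEEP;
    default:
        return cur;   /* full on and active are not wake-up cases */
    }
}
```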
Hibernate State
For lowest possible power dissipation, this state allows the internal supply
(VDDINT) to be powered down, while keeping the I/O supply (VDDEXT)
running. Although not strictly an operating mode like the four modes
detailed above, it is illustrative to view it as such.
The processor can be programmed to wake up from hibernate by reset, the
RTC, a general-purpose event, or the CAN.
Voltage Regulation
The processor provides an on-chip voltage regulator that can generate
internal voltage levels from an external 2.25 V to 3.6 V supply. The regulator controls the internal logic voltage levels and is programmable with
the voltage regulator control register (VR_CTL) in increments of 50 mV.
The regulator can also be disabled and bypassed at user discretion.
Boot Modes
The processor has three mechanisms for automatically loading internal L1
instruction memory after a reset. A fourth mode is provided to execute
from external memory, bypassing the boot sequence:
•Execute from 16-bit external memory—Execution starts from
address 0x2000 0000 with 16-bit packing. The boot ROM is
bypassed in this mode. All configuration settings are set for the
slowest device possible (3-cycle hold time; 15-cycle R/W access
times; 4-cycle setup).
•Boot from 8-bit or 16-bit flash memory—The 8-bit flash boot routine located in boot ROM memory space is set up using
asynchronous memory bank 0. All configuration settings are set for
the slowest device possible (3-cycle hold time; 15-cycle R/W access
times; 4-cycle setup). For ADSP-BF538F processors, the processor can boot
from the on-chip flash memory when the flash is mapped to
asynchronous bank 0.
•Boot from an SPI host in SPI slave Mode—The SPI0 is configured
as an SPI slave device and a host is used to boot the processor.
•Boot from an 8-/16-/24-bit addressable SPI in SPI master mode—
Support for Atmel AT45DB041B, AT45DB081B, AT45DB161B
Data Flash® devices. SPI0 uses the PF2 output pin to select a
single SPI EEPROM device.
For each of the boot modes, a 10-byte header is first read from an external
memory device. The header specifies the number of bytes to be transferred
and the memory destination address. Multiple memory blocks may be
loaded by any boot sequence. Once all blocks are loaded, program execution commences from the start of L1 instruction SRAM (0xFFA0 0000).
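As an illustration of reading the 10-byte header, the sketch below assumes a layout that this section does not state. It takes the header to be a 4-byte destination address, followed by a 4-byte byte count and a 2-byte flag word, with each field stored little endian. Verify that layout against the booting chapter. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical view of a 10-byte block header under an ASSUMED layout:
 * 4-byte destination address, 4-byte byte count, 2-byte flag, little endian. */
typedef struct {
    uint32_t address;
    uint32_t count;
    uint16_t flag;
} boot_header;

static uint32_t rd32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

boot_header parse_boot_header(const uint8_t raw[10])
{
    boot_header h;
    h.address = rd32le(raw);          /* bytes 0-3 */
    h.count   = rd32le(raw + 4);      /* bytes 4-7 */
    h.flag    = (uint16_t)(raw[8] | (raw[9] << 8));  /* bytes 8-9 */
    return h;
}
```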
In addition, bit 4 of the reset configuration register can be set by application code to bypass the normal boot sequence during a software reset. For
this case, the processor jumps directly to the beginning of L1 instruction
memory.
To augment the boot modes, a secondary software loader is provided that
adds additional booting mechanisms. This secondary loader provides the
capability to boot from 16-bit flash memory, fast flash, variable baud rate
memory, and other sources.
Instruction Set Description
The Blackfin processor family assembly language instruction set employs
an algebraic syntax designed for ease of coding and readability. The
instructions have been specifically tuned to provide a flexible, densely
encoded instruction set that compiles to a very small final memory size.
The instruction set also provides fully featured multifunction instructions
that allow the programmer to use many of the processor core resources in
a single instruction. Coupled with many features more often seen on
microcontrollers, this instruction set is very efficient when compiling C
and C++ source code. In addition, the architecture supports both user
(algorithm/application code) and supervisor (O/S kernel, device drivers,
debuggers, ISRs) modes of operation, allowing multiple levels of access to
core resources.
The assembly language, which takes advantage of the unique Blackfin
architecture, offers the following advantages:
•Seamlessly integrated DSP/CPU features are optimized for both
8-bit and 16-bit operations.
•A multi-issue load/store modified-Harvard architecture, which
supports two 16-bit MAC or four 8-bit ALU + two load/store +
two pointer updates per cycle.
•All registers, I/O, and memory are mapped into a unified 4G byte
memory space, providing a simplified programming model.
•Microcontroller features, such as arbitrary bit and bit-field manipulation, insertion, and extraction; integer operations on 8-, 16-,
and 32-bit data-types; and separate user and supervisor stack
pointers.
•Code density enhancements include intermixing of 16- and 32-bit
instructions with no mode switching or code segregation. Frequently used instructions are encoded in 16 bits.
Development Tools
The processor is supported with a complete set of CrossCore® software
and hardware development tools, including Analog Devices emulators and
the VisualDSP++ development environment. The same emulator hardware that supports other Analog Devices products also fully emulates the
ADSP-BF53x family.
The VisualDSP++ project management environment lets programmers
develop and debug an application. This environment includes an
easy-to-use assembler that is based on an algebraic syntax, an archiver
(librarian/library builder), a linker, a loader, a cycle-accurate instruction-level simulator, a C/C++ compiler, and a C/C++ runtime library that
includes DSP and mathematical functions. A key point for these tools is
C/C++ code efficiency. The compiler has been developed for efficient
translation of C/C++ code to Blackfin processor assembly. The Blackfin
processor has architectural features that improve the efficiency of compiled C/C++ code.
Debugging both C/C++ and assembly programs with the VisualDSP++
debugger, programmers can:
•View mixed C/C++ and assembly code (interleaved source and
object information)
•Insert breakpoints
•Set conditional breakpoints on registers, memory, and stacks
•Trace instruction execution
•Perform linear or statistical profiling of program execution
•Fill, dump, and graphically plot the contents of memory
•Perform source level debugging
•Create custom debugger windows
The VisualDSP++ IDE lets programmers define and manage software
development. Its dialog boxes and property pages let programmers configure and manage all development tools, including color syntax highlighting
in the VisualDSP++ editor. These capabilities permit programmers to:
•Control how the development tools process inputs and generate
outputs.
•Maintain a one-to-one correspondence with the command line
switches.
The VisualDSP++ Kernel (VDK) incorporates scheduling and resource
management tailored specifically to address the memory and timing constraints of DSP programming. These capabilities enable engineers to
develop code more effectively, eliminating the need to start from the very
beginning when developing new application code. The VDK features
include threads, critical and unscheduled regions, semaphores, events, and
device flags. The VDK also supports priority-based, pre-emptive, cooperative and time-sliced scheduling approaches. In addition, the VDK was
designed to be scalable. If the application does not use a specific feature,
the support code for that feature is excluded from the target system.
Because the VDK is a library, a developer can decide whether to use it or
not. The VDK is integrated into the VisualDSP++ development environment but can also be used with standard command-line tools. The VDK
development environment assists in managing system resources, automating the generation of various VDK-based objects, and visualizing the
system state during application debug.
Analog Devices emulators use the IEEE 1149.1 JTAG test access port of
the processor to monitor and control the target board processor during
emulation. The emulator provides full-speed emulation, allowing inspection and modification of memory, registers, and processor stacks. Non-intrusive
in-circuit emulation is assured by the use of the Blackfin JTAG
interface—the emulator does not affect target system loading or timing.
In addition to the software and hardware development tools available
from Analog Devices, third parties provide a wide range of tools supporting the Blackfin processor family. Third-party software tools include DSP
libraries, real-time operating systems, and block diagram design tools.
Computational Units
The processor’s computational units perform numeric processing for DSP
and general control algorithms. The six computational units are two arithmetic/logic units (ALUs), two multiplier/accumulator (multiplier) units, a
shifter, and a set of video ALUs. These units get data from registers in the
data register file. Computational instructions for these units provide
fixed-point operations, and each computational instruction can execute
every cycle.
The computational units handle different types of operations. The ALUs
perform arithmetic and logic operations. The multipliers perform
multiplication and execute multiply/add and multiply/subtract operations. The shifter executes logical shifts and arithmetic shifts and performs
bit packing and extraction. The video ALUs perform single-instruction,
multiple-data (SIMD) logical operations on specific 8-bit data operands.
Data moving in and out of the computational units goes through the data
register file, which consists of eight registers, each 32 bits wide. In operations requiring 16-bit operands, the registers are paired, providing sixteen
possible 16-bit registers.
The processor’s assembly language provides access to the data register file.
The syntax lets programs move data to and from these registers and specify
a computation’s data format at the same time.
Figure 2-1 provides a graphical guide to the other topics in this chapter.
An examination of each computational unit provides details about its
operation and is followed by a summary of computational instructions.
Studying the details of the computational units, register files, and data
Single function multiplier, ALU, and shifter instructions have unrestricted
access to the data registers in the data register file. Multifunction operations may have restrictions that are described in the section for that
particular operation.
Two additional registers, A0 and A1, provide 40-bit accumulator results.
These registers are dedicated to the ALUs and are used primarily for multiply-and-accumulate functions.
The traditional modes of arithmetic operations, such as fractional and
integer, are specified directly in the instruction. Rounding modes are set
from the ASTAT register, which also records status and conditions for the
results of the computational operations.
Using Data Formats
Blackfin processors are primarily 16-bit, fixed-point machines. Most operations assume a two’s-complement number representation, while others
assume unsigned numbers or simple binary strings. Other instructions
support 32-bit integer arithmetic, with further special features supporting
8-bit arithmetic and block floating point. For detailed information about
each number format, see Appendix D, “Numeric Formats.”
In the Blackfin processor family arithmetic, signed numbers are always in
two’s-complement format. These processors do not use signed-magnitude,
one’s-complement, binary-coded decimal (BCD), or excess-n formats.
Binary String
The binary string format is the least complex binary notation; in it, 16 bits
are treated as a bit pattern. Examples of computations using this format
are the logical operations NOT, AND, OR, XOR. These ALU operations
treat their operands as binary strings with no provision for sign bit or
binary point placement.
Unsigned Numbers
Unsigned binary numbers may be thought of as positive and having nearly
twice the magnitude of a signed number of the same length. The processor
treats the least significant words of multiple precision numbers as
unsigned numbers.
Signed Numbers: Two’s-Complement
In Blackfin processor arithmetic, the word signed refers to two’s-complement
numbers. Most Blackfin processor family operations presume or
support two’s-complement arithmetic.
Fractional Representation: 1.15
Blackfin processor arithmetic is optimized for numerical values in a fractional binary format denoted by 1.15 (“one dot fifteen”). In the 1.15
format, 1 sign bit (the most significant bit (MSB)) and 15 fractional bits
represent values from –1 to 0.999969.
Figure 2-2 shows the bit weighting for 1.15 numbers as well as some
examples of 1.15 numbers and their decimal equivalents.
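Converting between real values and 1.15 is a scale by 2^15 with rounding and saturation. The upper limit 0.999969 is 32767/32768. The sketch below is one straightforward way to perform the conversion in C. It rounds half away from zero, which is a choice made for the example and not a statement about the processor's rounding modes.

```c
#include <stdint.h>

/* Convert a real value to 1.15: scale by 2^15, round to nearest (half away
 * from zero), and saturate to the range [-1, 32767/32768]. */
int16_t to_q15(double x)
{
    double scaled = x * 32768.0;
    scaled += (scaled >= 0.0) ? 0.5 : -0.5;
    if (scaled >= 32767.0) return INT16_MAX;
    if (scaled <= -32768.0) return INT16_MIN;
    return (int16_t)scaled;   /* truncation after the +/-0.5 bias rounds */
}

/* Interpret a 1.15 value: the stored integer divided by 2^15. */
double from_q15(int16_t q)
{
    return (double)q / 32768.0;
}
```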
[Figure 2-3: register files. Data registers: R0 to R7 (each split into .H and .L halves) and accumulators A0, A1 (with A0.X/A0.W and A1.X/A1.W); address registers P0 to P5, FP, and SP; data address generator (DAG) registers I0 to I3, M0 to M3, B0 to B3, and L0 to L3.]
Register Files
The processor’s computational units have three definitive register
groups: a data register file, a pointer register file, and a set of data address
generator (DAG) registers.
•The data register file receives operands from the data buses for the
computational units and stores computational results.
•The pointer register file has pointers for addressing operations.
•The DAG registers are dedicated registers that manage zero-overhead circular buffers for DSP operations.
For more information, see Chapter 5, “Data Address Generators”.
The processor register files appear in Figure 2-3.
In the processor, a word is 32 bits long; H denotes the high order
16 bits of a 32-bit register; L denotes the low order 16 bits of a
32-bit register. For example, A0.W contains the lower 32 bits of the
40-bit A0 register; A0.L contains the lower 16 bits of A0.W, and A0.H
contains the upper 16 bits of A0.W.
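The accumulator naming can be restated as bit-field extraction. In the sketch below, a 40-bit accumulator value is held in the low 40 bits of a `uint64_t`, and each accessor returns one of the named views. This is purely a software illustration, and the function names are hypothetical.

```c
#include <stdint.h>

/* Views of a 40-bit accumulator held in the low 40 bits of a uint64_t. */
uint8_t  acc_x(uint64_t a) { return (uint8_t)(a >> 32); }           /* An.X: bits 39..32 */
uint32_t acc_w(uint64_t a) { return (uint32_t)a; }                  /* An.W: bits 31..0 */
uint16_t acc_h(uint64_t a) { return (uint16_t)(acc_w(a) >> 16); }   /* An.H: upper 16 bits of An.W */
uint16_t acc_l(uint64_t a) { return (uint16_t)acc_w(a); }           /* An.L: lower 16 bits of An.W */
```

For A0 = 0x12ABCD5678, the views are A0.X = 0x12, A0.W = 0xABCD5678, A0.H = 0xABCD, and A0.L = 0x5678.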
Data Register File
The data register file consists of eight registers, each 32 bits wide. Each
register may be viewed as a pair of independent 16-bit registers. Each is
denoted as the low half or high half. Thus the 32-bit register R0 may be
regarded as two independent register halves, R0.L and R0.H.
Three separate buses (two read, one write) connect the register file to the
L1 data memory, each bus being 32 bits wide. Transfers between the data
register file and the data memory can move up to four 16-bit words of
valid data in each cycle.
Accumulator Registers
In addition to the data register file, the processor has two dedicated,
40-bit accumulator registers. Each can be referred to as its 16-bit low half
(An.L) or high half (An.H) plus its 8-bit extension (An.X). Each can also be
referred to as a 32-bit register (An.W) or
a complete 40-bit result register (An).
Address Register File
The general-purpose address pointer registers, also called P-registers, are
organized as:
•6-entry, P-register files P[5:0]
•Frame pointers (FP) used to point to the current procedure’s activation record
•Stack pointer registers (SP) used to point to the last used location
on the runtime stack. See mode dependent registers in Chapter 3,
“Operating Modes and States”.
P-registers are 32 bits wide. Although P-registers are primarily used for
address calculations, they may also be used for general integer arithmetic
with a limited set of arithmetic operations; for instance, to maintain
counters. However, unlike the data registers, P-register arithmetic does
not affect the status flags in the arithmetic status register (ASTAT).
DAG Register Set
DSP instructions primarily use the data address generator (DAG) register
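set for addressing. The DAG register set consists of these registers:
•I[3:0] contain index addresses
•M[3:0] contain modify values
•B[3:0] contain base addresses
•L[3:0] contain length values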
The I (index) registers and B (base) registers always contain addresses of
8-bit bytes in memory. The index registers contain an effective address.
The M (modify) registers contain an offset value that is added to one of
the index registers or subtracted from it.
The B and L (length) registers define circular buffers. The B register contains the starting address of a buffer, and the L register contains the length
in bytes. Each L and B register pair is associated with the corresponding I
register. For example,
L0 and B0 are always associated with I0. However,
any M register may be associated with any I register. For example, I0 may
be modified by M3. For more information, see Chapter 5, “Data Address
Generators”.
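The circular-buffer behavior of an I/M/B/L group can be modeled as a post-modify update with wraparound. The sketch below works in byte addresses and assumes |M| < L, so at most one wrap is needed. It also treats L = 0 as a linear (non-circular) buffer. That convention is an assumption here and should be confirmed in the Data Address Generators chapter.

```c
#include <stdint.h>

/* Post-modify of an index register with circular wraparound (byte addresses).
 * Assumes |m| < l when l != 0; l == 0 is treated as a linear buffer. */
uint32_t dag_circular_update(uint32_t i, int32_t m, uint32_t b, uint32_t l)
{
    int64_t next = (int64_t)i + m;
    if (l != 0u) {
        if (next >= (int64_t)b + l) next -= l;   /* past the end: wrap toward the start */
        else if (next < (int64_t)b)  next += l;  /* before the start: wrap toward the end */
    }
    return (uint32_t)next;
}
```

With a 16-byte buffer at 0x1000, stepping I0 = 0x100C by +8 wraps to 0x1004, and stepping 0x1002 by -4 wraps to 0x100E.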
Register File Instruction Summary
Table 2-1 lists the register file instructions. For more information about
assembly language syntax, see the Blackfin Processor Programming Reference.
The processor supports 32-bit words, 16-bit half words, and bytes. The
32- and 16-bit words can be integer or fractional, but bytes are always
integers. Integer data types can be signed or unsigned, but fractional data
types are always signed.
Table 2-2 illustrates the formats for data that resides in memory, in the
register file, and in the accumulators. In the table, the letter d represents
one data bit, and the letter s represents one sign bit.
Some instructions manipulate data in the registers by sign-extending or
zero-extending the data to 32 bits:
•Instructions zero-extend unsigned data
•Instructions sign-extend signed 16-bit half words and 8-bit bytes
Other instructions manipulate data as 32-bit numbers. In addition, two
16-bit half words or four 8-bit bytes can be manipulated as 32-bit values.
For details, refer to the instructions in the Blackfin Processor Programming Reference.
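The two extension rules can be illustrated in portable C. Zero extension simply fills the upper bits with zeros. Sign extension copies the sign bit upward, which the XOR-and-subtract idiom below achieves without relying on implementation-defined narrowing casts. The function names are hypothetical.

```c
#include <stdint.h>

/* Zero extension: unsigned data gets zeros in the upper bits. */
uint32_t zext16(uint16_t h) { return (uint32_t)h; }
uint32_t zext8(uint8_t b)   { return (uint32_t)b; }

/* Sign extension: the sign bit is replicated into all upper bits.
 * (x ^ signbit) - signbit maps the unsigned pattern to its signed value. */
int32_t sext16(uint16_t h) { return ((int32_t)h ^ 0x8000) - 0x8000; }
int32_t sext8(uint8_t b)   { return ((int32_t)b ^ 0x80) - 0x80; }
```

The half word 0x8000 zero-extends to 0x00008000 but sign-extends to 0xFFFF8000 (-32768).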
In Table 2-2, note the meaning of these symbols:
•s = sign bit(s)
•d = data bit(s)
•“.” = decimal point by convention; however, a decimal point does
not literally appear in the number.
•Italics denotes data from a source other than adjacent bits.
Endian Byte Order
Both internal and external memory are accessed in little endian byte order.
For more information, see “Memory Transaction Model” on page 6-61.
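In little endian order, the least significant byte of a word occupies the lowest address. The sketch below stores a 32-bit word byte by byte in that order. Because it does not depend on the host's own byte order, it shows the memory image the processor would produce.

```c
#include <stdint.h>

/* Store a 32-bit word in little endian order: byte k of the word goes to address k. */
void store32_le(uint8_t mem[4], uint32_t word)
{
    for (int k = 0; k < 4; k++)
        mem[k] = (uint8_t)(word >> (8 * k));
}
```

Storing 0x12345678 therefore places 0x78, 0x56, 0x34, 0x12 at increasing addresses.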
ALU Data Types
Operations on each ALU treat operands and results as either 16- or 32-bit
binary strings, except the signed division primitive (DIVS). ALU result status bits treat the results as signed, indicating status with the overflow flags
(AV0, AV1) and the negative flag (AN). Each ALU has its own sticky overflow flag, AV0S and AV1S. Once set, these bits remain set until cleared by
writing directly to the ASTAT register. An additional V flag is set or cleared
depending on the transfer of the result from both accumulators to the register file. Furthermore, the sticky VS bit is set with the V bit and remains
set until cleared.
The logic of the overflow bits (V, VS, AV0, AV0S, AV1, AV1S) is based on
two’s-complement arithmetic. The applicable bit or bits are set if the most significant bit (MSB) of the result changes in a manner not predicted by the signs of the
operands and the nature of the operation. For example, adding two positive numbers must generate a positive result; a change in the sign bit
signifies an overflow and sets the corresponding overflow flag (AVn). Adding a negative and a positive number may produce either a negative or a
positive result, but cannot cause an overflow.
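The overflow rule and the sticky-bit behavior can be sketched in C for a 32-bit addition. The struct and function names are hypothetical; the struct only mimics a flag pair such as AV0/AV0S, and clearing the sticky bit by writing ASTAT is modeled as a plain assignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one overflow flag and its sticky partner
   (for example, AV0 and AV0S). */
typedef struct {
    bool av;   /* overflow status of the most recent operation */
    bool avs;  /* sticky copy: once set, stays set until cleared */
} overflow_flags;

/* Reinterpret a 32-bit pattern as two's complement without relying on
   implementation-defined unsigned-to-signed conversion. */
static int32_t to_signed32(uint32_t u)
{
    return u <= (uint32_t)INT32_MAX ? (int32_t)u : -(int32_t)(~u) - 1;
}

/* 32-bit add that wraps like a binary string and updates the flags.
   Overflow: both operands have the same sign, but the result's sign
   differs from it. Operands of opposite sign can never overflow. */
int32_t add32(int32_t a, int32_t b, overflow_flags *f)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t ur = ua + ub;                      /* wraps modulo 2^32 */
    bool overflow = ((~(ua ^ ub) & (ua ^ ur)) >> 31) != 0;

    f->av = overflow;                           /* reflects this operation only */
    f->avs = f->avs || overflow;                /* sticky: never cleared here */
    return to_signed32(ur);
}
```

After 0x7FFFFFFF + 1 overflows, a following mixed-sign addition clears av but leaves avs set until it is explicitly cleared.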
The logic of the carry bits (AC0, AC1) is based on unsigned magnitude
arithmetic. The bit is set if a carry is generated out of the most significant
bit (MSB) of the result; for a 16-bit operation, that is a carry out of bit 15.
The carry bits (AC0, AC1) are most useful for the lower word portions of a
multiword operation.
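The following C sketch shows why a carry bit matters for the lower word of a multiword operation: a 32-bit sum is built from two 16-bit additions, with the carry out of the lower word fed into the upper word. The function names are hypothetical, and the carry output is only analogous to a flag such as AC0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Unsigned 16-bit add that also reports the carry out of bit 15. */
uint16_t add16_carry(uint16_t a, uint16_t b, bool carry_in, bool *carry_out)
{
    uint32_t sum = (uint32_t)a + b + (carry_in ? 1u : 0u);
    *carry_out = (sum >> 16) != 0;      /* carry generated from the MSB */
    return (uint16_t)sum;
}

/* 32-bit addition built from two 16-bit additions: the carry from the
   lower word is added into the upper word. */
uint32_t add32_from16(uint32_t x, uint32_t y)
{
    bool c;
    uint16_t lo = add16_carry((uint16_t)x, (uint16_t)y, false, &c);
    uint16_t hi = add16_carry((uint16_t)(x >> 16), (uint16_t)(y >> 16), c, &c);
    return ((uint32_t)hi << 16) | lo;
}
```

For example, 0x0000FFFF + 1 produces a carry from the lower word, so the upper word becomes 1 and the result is 0x00010000.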
ALU results generate status information. For more information about
using ALU status, see “ALU Instruction Summary” on page 2-29.
Multiplier Data Types
Each multiplier produces results that are binary strings. The inputs are
interpreted according to the information given in the instruction itself
(whether it is signed multiplied by signed, unsigned multiplied by
unsigned, a mixture, or a rounding operation). The 32-bit result from the
multipliers is assumed to be signed; it is sign-extended across the full
40-bit width of the A0 or A1 registers.
The processor supports two modes of format adjustment: the fractional
mode for fractional operands (1.15 format with 1 sign bit and 15 fractional bits) and the integer mode for integer operands (16.0 format).
When the processor multiplies two 1.15 operands, the result is a 2.30
number (2 sign bits and 30 fractional bits). In the fractional mode, the
multiplier automatically shifts the product left by one bit before
transferring the result to the multiplier result register (A0, A1). This shift
removes the redundant sign bit and places the result in 1.31 format,
which can be rounded to 1.15 format. The resulting format appears in
Figure 2-4.
In the integer mode, the left shift does not occur. For example, if the operands are in the 16.0 format, the 32-bit multiplier result would be in 32.0
format. A left shift is not needed and would change the numerical
representation. This result format appears in Figure 2-5.
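The difference between the two format adjustments can be sketched in C. Both 1.15 and 16.0 operands are held in int16_t; only the interpretation and the shift differ. The function names are hypothetical, and int64_t is an assumed stand-in for the 40-bit accumulator, so the fractional-mode shift cannot wrap in this model.

```c
#include <assert.h>
#include <stdint.h>

/* Fractional mode: 1.15 x 1.15 = 2.30; shifting left one bit drops the
   redundant sign bit and yields 1.31. Multiplying by 2 is used instead
   of << so negative products are well defined in C. */
int64_t mult_fractional(int16_t a, int16_t b)
{
    int64_t product_2_30 = (int64_t)a * b;
    return product_2_30 * 2;            /* 1-bit left shift to 1.31 */
}

/* Integer mode: 16.0 x 16.0 = 32.0; no shift is applied. */
int64_t mult_integer(int16_t a, int16_t b)
{
    return (int64_t)a * b;
}
```

With both operands equal to 0x4000 (0.5 in 1.15), the fractional result is 0x20000000 (0.25 in 1.31), while the integer result is 0x10000000 (16384 × 16384). For -1.0 × -1.0 (0x8000 × 0x8000), this model holds +1.0 as 0x80000000 without wrapping; how the hardware treats that case on transfer to a 32-bit register is not modeled here.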
Multiplier results generate status information when they update accumulators or when they are transferred to a destination register in the register
file. For more information, see “Multiplier Instruction Summary” on page
2-40.
Shifter Data Types
Many operations in the shifter are explicitly geared to signed (two’s-complement) or unsigned values—logical shifts assume unsigned magnitude
or binary string values, and arithmetic shifts assume two’s-complement
values.
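The two shift interpretations can be contrasted with a C sketch. The function names are hypothetical. Because C leaves right-shifting a negative signed value implementation-defined, the arithmetic shift fills the sign bits explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Logical right shift: the operand is an unsigned binary string, so
   vacated high bits are filled with 0s. */
uint32_t lshift_right(uint32_t x, unsigned n)
{
    assert(n < 32);
    return x >> n;
}

/* Arithmetic right shift: the operand is two's complement, so vacated
   high bits are filled with copies of the sign bit. */
int32_t ashift_right(int32_t x, unsigned n)
{
    assert(n < 32);
    if (x >= 0)
        return x >> n;
    /* For negative x, ~x is non-negative; shifting it and inverting
       back fills the vacated bits with 1s. */
    return ~(~x >> n);
}
```

Shifting the pattern 0xFFFFFFF8 right by one bit gives 0x7FFFFFFC as a logical shift, but -4 (0xFFFFFFFC) as an arithmetic shift of -8.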
The exponent logic assumes two’s-complement numbers. The exponent
logic supports block floating point, which is also based on two’s-complement fractions.
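Exponent detection for two’s-complement values amounts to counting redundant sign bits, which is how far a value can be shifted left without changing its sign. The C sketch below counts them and then derives one shared exponent for a block of values, as in block floating point. The function names are hypothetical, and the sketch is not a description of the processor's exponent-detect instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count redundant sign bits: how many bits below bit 31 match the
   sign bit. */
int redundant_sign_bits(int32_t x)
{
    uint32_t u = (uint32_t)x;
    uint32_t sign = u >> 31;
    int n = 0;
    for (int bit = 30; bit >= 0 && ((u >> bit) & 1u) == sign; --bit)
        n++;
    return n;
}

/* Block floating point: one exponent for the whole block, set by the
   element with the fewest redundant sign bits (the largest magnitude),
   so every element can be normalized by the same left shift. */
int block_normalize(int32_t *block, size_t len)
{
    int shift = 31;
    for (size_t i = 0; i < len; i++) {
        int r = redundant_sign_bits(block[i]);
        if (r < shift)
            shift = r;
    }
    for (size_t i = 0; i < len; i++)
        /* Multiply instead of << so negative values are well defined;
           the product fits in 32 bits because shift <= redundant bits. */
        block[i] = (int32_t)((int64_t)block[i] * ((int64_t)1 << shift));
    return shift;  /* block exponent: values were scaled by 2^shift */
}
```

For the block {1, -2, 256}, the value 256 has the fewest redundant sign bits (22), so all three values are scaled by 2^22 and 256 becomes 0x40000000.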
Multiplication/Subtraction   16.0, explicitly signed or unsigned   32.0, not shifted
Table 2-6. Shifter Arithmetic Formats
Operation          Operand Formats          Result Formats
Logical Shift      Unsigned binary string   Same as operands
Arithmetic Shift   Signed                   Same as operands
Exponent Detect    Signed                   Same as operands
Using Multiplier Integer and Fractional Formats
For multiply-and-accumulate functions, the processor provides two
choices—fractional arithmetic for fractional numbers (1.15) and integer
arithmetic for integers (16.0).
For fractional arithmetic, the 32-bit product output is format adjusted
(sign-extended and shifted one bit to the left) before being added to
accumulator A0 or A1. For example, bit 31 of the product lines up with
bit 32 of A0 (which is bit 0 of A0.X), and bit 0 of the product lines up
with bit 1 of A0 (which is bit 1 of A0.W). The least significant bit
(LSB) is zero-filled. The fractional multiplier result format appears in
Figure 2-4.
For integer arithmetic, the 32-bit product is not shifted before
being added to A0 or A1. Figure 2-5 shows the integer mode result
format.
With either fractional or integer operations, the multiplier output product
is fed into a 40-bit adder/subtracter, which adds the new product to, or
subtracts it from, the current contents of the A0 or A1 register to produce
the final 40-bit result.
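One fractional multiply-accumulate step can be sketched in C. This is a hypothetical model: int64_t holds the accumulator value, and a separate range check shows whether a real 40-bit register would have overflowed. Saturation and rounding are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Range of a 40-bit signed accumulator such as A0 or A1. */
#define ACC40_MAX ((int64_t)0x7FFFFFFFFF)
#define ACC40_MIN (-ACC40_MAX - 1)

/* The 1.15 x 1.15 product is format-adjusted to 1.31, sign-extended
   (implicit in int64_t), and added to or subtracted from the
   accumulator. */
int64_t mac_fractional(int64_t acc, int16_t a, int16_t b, bool subtract)
{
    int64_t product = (int64_t)a * b * 2;   /* 2.30 -> 1.31 */
    return subtract ? acc - product : acc + product;
}

/* True if acc fits in a 40-bit two's-complement register. */
bool acc40_in_range(int64_t acc)
{
    return acc >= ACC40_MIN && acc <= ACC40_MAX;
}
```

In this model, four near-full-scale products (0x7FFF × 0x7FFF) already exceed the 32-bit range but still fit in 40 bits; the 8 extra bits leave room for on the order of 2^8 full-scale 1.31 products.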
Figure 2-4. Fractional Multiplier Results Format
Rounding Multiplier Results
On many multiplier operations, the processor supports multiplier results
rounding (RND option). Rounding reduces the precision of a
number by removing a lower-order range of bits from that number’s representation and possibly modifying the remaining portion of the number to
more accurately represent its former value. For example, if the original number has N bits of precision and the new number has only
M bits of precision (where N > M), rounding removes
N – M bits of precision from the number.
The RND_MOD bit in the ASTAT register determines whether the RND option
provides biased or unbiased rounding. For unbiased rounding, clear RND_MOD
(RND_MOD = 0); for biased rounding, set RND_MOD (RND_MOD = 1).
For most algorithms, unbiased rounding is preferred.
Unbiased Rounding
The convergent rounding method returns the number closest to the original. In cases where the original number lies exactly halfway between two
numbers, this method returns the nearest even number, the one containing an LSB of 0. For example, when rounding the 3-bit,
two’s-complement fraction 0.25 (binary 0.01) to the nearest 2-bit,
two’s-complement fraction, the result would be 0.0, because that is the
even-numbered choice of 0.5 and 0.0. Since it rounds up and down based
on the surrounding values, this method is called unbiased rounding.
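The difference between the two rounding modes shows up only on exact ties. The C sketch below rounds a 1.31 value to 1.15 by removing its 16 low-order bits. It assumes that biased rounding means adding half an output LSB (0x8000) and truncating, so ties always round up, and that unbiased rounding is the round-half-to-even method described above. The function name is hypothetical, and saturation of the result to 16 bits is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round a 1.31 value to 1.15 by removing its 16 low-order bits.
   int64_t avoids overflow when adding the rounding constant. */
int32_t round_1_31_to_1_15(int32_t x, bool unbiased)
{
    uint32_t removed = (uint32_t)x & 0xFFFFu;   /* the N - M = 16 dropped bits */
    int64_t sum = (int64_t)x + 0x8000;          /* add half an output LSB */
    int64_t q = sum / 0x10000;                  /* C division truncates toward 0 */
    if (sum % 0x10000 < 0)
        q -= 1;                                 /* adjust to floor */
    if (unbiased && removed == 0x8000u)
        q &= ~(int64_t)1;                       /* exact tie: pick the even value */
    return (int32_t)q;
}
```

For 0x00008000, which lies exactly halfway between 0 and 1 output LSB, biased rounding returns 1 and unbiased rounding returns 0, the even choice, just as 0.25 rounds to 0.0 in the example above. Values that are not exact ties round the same way in both modes.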