Engineer-to-Engineer Note EE-276
Technical notes on using Analog Devices DSPs, processors and development tools
Contact our technical support at processor.support@analog.com and dsptools.support@analog.com
Or visit our on-line resources http://www.analog.com/ee-notes and http://www.analog.com/processors
Video Framework Considerations for Image Processing on Blackfin® Processors
Contributed by Kunal Singh and Ramesh Babu Rev 1 – September 8, 2005
Introduction
In any system, memory partitioning and data flow management are crucial elements for a successful multimedia framework design. Blackfin® processors provide a hierarchical memory, non-intrusive DMA, and the Parallel Peripheral Interface (PPI). When used well in your application, these features can provide very high system efficiency.
This EE-Note discusses the following topics that
should be considered for obtaining maximum
performance on ADSP-BF533 and ADSP-BF561
Blackfin family processors in video processing
applications:
- Memory considerations
- Internal memory space
- SDRAM memory space
- Managing external data accesses
- DMA modes for PPI capture and display
- Working with ITU-R-656 input modes
- Outputting ITU-R-656 video frames
- DMA prioritization and traffic control register
Memory Considerations
The Blackfin processor architecture supports a hierarchical memory that allows the programmer to use the faster, smaller memories for the code that runs most often and the larger memory for the data buffers associated with video applications. The Blackfin processor's memory has a unified address range, which includes the internal L1 memory (and, in the case of the ADSP-BF561 processor, the L2 memory), the SDRAM memory, and the asynchronous memory spaces.
Internal Memory Space
The L1 memory operates at the core clock frequency and hence has the lowest latency of all the memory spaces. Blackfin processors have separate Data and Instruction L1 memories.
Figure 1. Un-Optimized L1 Memory Allocation (Buffer 0, Buffer 1, Buffer 2, and Coefficients placed so that Core Fetch and DMA Access contend for the same 4 Kbyte sub-banks)
The L1 data SRAM is constructed from single-ported subsections, each subsection consisting of 4 Kbytes of memory. This organization results in multi-ported behavior when there are simultaneous accesses to different sub-banks, or when one even and one odd 32-bit word are accessed within the same 4 Kbyte sub-bank.
Figure 2. Optimized L1 Memory Allocation (Buffer 0, Buffer 1, Buffer 2, and Coefficients spread across sub-banks so that Core Fetch and DMA Access target different sub-banks)
Figure 1 shows the un-optimized allocation of
memory for different buffers. Each block in the
figure represents a 4 Kbyte sub-bank in internal
data memory. Here the internal data buses are not
used effectively, since the processor cannot fetch
the two data words simultaneously.
Figure 2 shows the optimized memory allocation across the internal 4 Kbyte data memory sub-banks. This allocation allows simultaneous dual-DAG and DMA accesses, and hence maximum throughput over the data buses.
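One way such a placement can be expressed in C is sketched below. It assumes the VisualDSP++ section() qualifier and the L1_data_a / L1_data_b input section names used by the default ADSP-BF533 LDF; the buffer names and sizes are illustrative placeholders, not part of this note's framework.

/* Sketch only: keep DMA-filled buffers and core-accessed coefficients in
 * different L1 data banks. Each buffer is one 4 Kbyte sub-bank in size,
 * so consecutive placement in data bank A gives each buffer its own
 * sub-bank, while the coefficients sit in data bank B, where core fetches
 * do not contend with the DMA traffic.                                   */
section("L1_data_a") static unsigned char buffer0[4096];  /* filled by DMA */
section("L1_data_a") static unsigned char buffer1[4096];  /* filled by DMA */
section("L1_data_a") static unsigned char buffer2[4096];  /* filled by DMA */
section("L1_data_b") static short coefficients[1024];     /* read by core  */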
In video encoding and decoding applications, this optimized memory allocation reduces the latency caused by simultaneous accesses to L1 data memory from the core and the DMA controller.
SDRAM Memory Space
The SDRAM Controller (SDC) enables the processor to transfer data to and from synchronous DRAM (SDRAM). The SDC supports a connection to the four internal banks within the SDRAM device. In end applications, mapping the data buffers appropriately to different internal banks minimizes the latency of core and DMA accesses to SDRAM. The SDC can keep track of one open row per internal bank (with up to four internal SDRAM banks) at a time; hence, it can switch between the four internal banks without any stalls.
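One way to exploit the per-bank open rows is to give the code and each large buffer its own internal bank. The sketch below shows the address arithmetic for a hypothetical 64 MB SDRAM, assuming the SDC maps the two bank-address bits above the row and column address bits so that each internal bank appears as one contiguous quarter of the SDRAM address range; the sizes, buffer names, and the choice of fixing addresses in C rather than in the LDF are assumptions for illustration only.

#include <stdint.h>

/* Hypothetical 64 MB SDRAM at the start of the external memory space.
 * With four internal banks, each bank covers one quarter of the range,
 * and the SDC can hold one row open in each of them.                    */
#define SDRAM_BASE       0x00000000u
#define SDRAM_SIZE       (64u * 1024u * 1024u)
#define SDRAM_BANK_SIZE  (SDRAM_SIZE / 4u)
#define SDRAM_BANK(n)    (SDRAM_BASE + (n) * SDRAM_BANK_SIZE)

/* Keep code in internal bank 0 and give each large video buffer its own
 * bank, so alternating instruction fetches and data DMA accesses do not
 * keep re-activating rows within the same internal bank.                */
static uint8_t *const video_frame0 = (uint8_t *)SDRAM_BANK(1);
static uint8_t *const video_frame1 = (uint8_t *)SDRAM_BANK(2);
static uint8_t *const ref_frame    = (uint8_t *)SDRAM_BANK(3);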
Figure 3. Un-Optimized SDRAM Memory Allocation (Code, Video Frame 0, Video Frame 1, and Ref Frame share one internal bank, with instruction DMA and data DMA contending on the External Bus; the remaining internal banks are Unused)
In image processing applications, the video frame is brought into memory by a PPI DMA channel. Because of the image size (e.g., VGA, D-1 NTSC, D-1 PAL, 4CIF, 16CIF), each frame must be captured into SDRAM memory. The algorithm can then read the pixels block by block from SDRAM and process each block as it is brought in. The PPI captures the next frame into another buffer while the core is processing the previous buffer. Since both the core and the DMA controllers access SDRAM simultaneously, the code, the video frames, and the other buffers must be mapped appropriately to minimize the latency involved in accessing SDRAM memory.
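A minimal sketch of that ping-pong capture scheme is shown below. The helper routines ppi_dma_start_capture(), ppi_dma_wait_frame(), and process_frame() are hypothetical placeholders for the actual PPI/DMA register setup and the application's algorithm; only the buffer-swapping structure is taken from the description above.

#include <stdint.h>

extern void ppi_dma_start_capture(uint8_t *frame); /* arm the PPI DMA to fill 'frame'       */
extern void ppi_dma_wait_frame(void);              /* block until the current frame is done */
extern void process_frame(const uint8_t *frame);   /* run the image-processing algorithm    */

/* Two frame-sized regions in SDRAM, ideally in different internal banks. */
static uint8_t *frame_buf[2];

void capture_loop(void)
{
    unsigned int capture = 0;                       /* index of the buffer being filled */

    ppi_dma_start_capture(frame_buf[capture]);

    for (;;) {
        ppi_dma_wait_frame();                       /* frame_buf[capture] is now full */
        unsigned int ready = capture;
        capture ^= 1u;                              /* swap the ping-pong buffers     */
        ppi_dma_start_capture(frame_buf[capture]);  /* capture the next frame...      */
        process_frame(frame_buf[ready]);            /* ...while the core processes
                                                       the previous frame             */
    }
}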
Figure 3 shows the un-optimized memory allocation across the SDRAM internal banks. In Figure 3, both the code and the video frame buffers are mapped to SDRAM internal Bank 0. This allocation causes additional latency because SDRAM row-activation cycles occur on almost every access. This is due to alternating core accesses (fetching instructions) and DMA accesses to different pages within the same