
Engineer-to-Engineer Note EE-276
Technical notes on using Analog Devices DSPs, processors and development tools
Contact our technical support at processor.support@analog.com and dsptools.support@analog.com, or visit our on-line resources at http://www.analog.com/ee-notes and http://www.analog.com/processors
Video Framework Considerations for Image Processing on Blackfin® Processors
Contributed by Kunal Singh and Ramesh Babu Rev 1 – September 8, 2005

Introduction

In any system, memory partitioning and data flow management are crucial elements of a successful multimedia framework design. Blackfin® processors have a hierarchical memory, non-intrusive DMA, and a Parallel Peripheral Interface (PPI). When used well in your application, these features can provide very high system efficiency.
This EE-Note discusses the following topics that should be considered for obtaining maximum performance on ADSP-BF533 and ADSP-BF561 Blackfin family processors in video processing applications:
 Memory considerations
    Internal memory space
    SDRAM memory space
    Managing external data accesses
 DMA modes for PPI capture and display
    Working with ITU-R-656 input modes
    Outputting ITU-R-656 video frames
    DMA prioritization and traffic control register

Memory Considerations

The Blackfin processor architecture supports a hierarchical memory that allows the programmer to use faster, smaller memories for the code that runs most often, and larger memory for the data buffers associated with video applications. The Blackfin processor's memory has a unified address range, which includes the internal L1 memory (and, on the ADSP-BF561 processor, also L2 memory), the SDRAM memory space, and the asynchronous memory spaces.

Internal Memory Space

The L1 memory operates at the core clock frequency and hence has the lowest latency of the memory spaces. Blackfin processors have separate L1 data and L1 instruction memories.

Figure 1. Un-Optimized L1 Memory Allocation (Buffer 0, Buffer 1, Buffer 2, and Coefficients mapped across L1 data sub-banks; core fetch and DMA access paths shown)

The L1 data SRAM is constructed from single-ported subsections, each consisting of 4 Kbytes of memory. This organization provides multi-ported behavior when simultaneous accesses go to different sub-banks, or when one even and one odd 32-bit word are accessed within the same 4 Kbyte sub-bank.

Figure 2. Optimized L1 Memory Allocation (Buffer 0, Buffer 1, Buffer 2, and Coefficients spread across separate L1 data sub-banks so that the two core fetches and the DMA access do not collide)

Figure 1 shows the un-optimized allocation of memory for the different buffers. Each block in the figure represents a 4 Kbyte sub-bank in internal data memory. Here the internal data buses are not used effectively, since the processor cannot fetch the two data words simultaneously.
Figure 2 shows the optimized memory allocation across the internal 4 Kbyte data memory sub-banks. This allocation allows simultaneous dual-DAG and DMA accesses and, hence, maximum throughput over the data buses.
In video encoding and decoding applications, this optimized memory allocation reduces the latency caused by simultaneous core and DMA controller accesses to L1 data memory.
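
As a concrete illustration, the sketch below places the buffers of Figure 2 into separate L1 data banks using the VisualDSP++ section() qualifier. The section names L1_data_a and L1_data_b are assumed to come from the default ADSP-BF533 LDF; placing buffers at 4 Kbyte sub-bank granularity requires matching output sections in your own .ldf file, and the buffer names and sizes here are illustrative only.

/* Buffer placement sketch for the layout in Figure 2, assuming the
   VisualDSP++ section() qualifier and the L1_data_a / L1_data_b input
   sections of the default ADSP-BF533 LDF. Sub-bank (4 Kbyte) control
   needs matching output sections in your own .ldf file. */

#define BLOCK_SIZE 1024

/* Data fetched by the two core DAGs in the same cycle: separate banks. */
section("L1_data_a") static short Buffer0[BLOCK_SIZE];
section("L1_data_b") static short Buffer1[BLOCK_SIZE];

/* Coefficients fetched together with Buffer0: keep them out of its bank. */
section("L1_data_b") static short Coefficients[256];

/* Buffer filled by the DMA controller, kept away from core-accessed data. */
section("L1_data_a") static short Buffer2[BLOCK_SIZE];

At 4 Kbyte granularity, you would add one output section per sub-bank to the LDF and use the same qualifier to pin each buffer to a specific sub-bank.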

SDRAM Memory Space

The SDRAM controller (SDC) enables the processor to transfer data to and from synchronous DRAM (SDRAM). The SDC supports a connection to the four internal banks within the SDRAM device. In end applications, mapping the data buffers appropriately across these internal banks minimizes the latency of core and DMA accesses. The SDC keeps track of one open row per internal bank (up to four internal SDRAM banks) at a time, so it can switch among the four internal banks without stalls.
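
A minimal sketch of such a mapping is shown below. It assumes a 32 Mbyte SDRAM starting at address 0x0000 0000 whose bank-address pins are decoded from the upper address bits, so that each internal bank occupies one contiguous quarter of the device; the device size, frame size, and buffer names are assumptions to adapt to your system.

/* Carving an assumed 32 Mbyte SDRAM into its four internal banks and
   giving each large, concurrently accessed buffer its own bank so the
   SDC can keep one row open per bank. Adjust sizes for your device. */

#define SDRAM_BASE    0x00000000u
#define SDRAM_SIZE    0x02000000u        /* 32 Mbytes total (assumed)        */
#define BANK_SIZE     (SDRAM_SIZE / 4u)  /* 8 Mbytes per internal bank       */
#define BANK_ADDR(n)  (SDRAM_BASE + (n) * BANK_SIZE)

#define FRAME_BYTES   (1716u * 525u)     /* full ITU-R-656 NTSC frame,
                                            ~0.9 Mbyte, fits in one bank     */

/* Code and static data stay in bank 0 (placed there by the LDF); the video
   and reference frame buffers each get an internal bank of their own. */
#define VIDEO_FRAME0  ((unsigned char *)BANK_ADDR(1u))
#define VIDEO_FRAME1  ((unsigned char *)BANK_ADDR(2u))
#define REF_FRAME     ((unsigned char *)BANK_ADDR(3u))

Equivalently, the same partitioning can be expressed as separate output sections in the LDF, so the buffers are placed by the linker rather than at fixed addresses.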

Figure 3. Un-Optimized SDRAM Memory Allocation (code, Video Frame 0, Video Frame 1, and the reference frame concentrated in the lower internal SDRAM banks while other banks sit unused; instruction fetches and DMA transfers contend on the external bus)

In image processing applications, the video frame is brought into memory through a PPI DMA channel. Because of the image size (e.g., VGA, D-1 NTSC, D-1 PAL, 4CIF, 16CIF), each frame must be captured in SDRAM. The algorithm can then read the pixels block by block from SDRAM and process each block as it is brought in. The PPI captures the next frame into another buffer while the core processes the previous buffer. Since the core and the DMA controller access SDRAM simultaneously, the code, video frames, and other buffers must be mapped appropriately to minimize the latency of SDRAM accesses.
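
The sketch below shows the capture/process overlap just described as a simple ping-pong loop. ppi_dma_start_capture(), ppi_dma_frame_done(), and process_frame() are hypothetical stand-ins for your PPI/DMA driver and video algorithm; only the buffer-swapping logic is the point.

/* Ping-pong (double-buffer) frame capture sketch. The helper functions are
   hypothetical placeholders for a real PPI/DMA driver and video algorithm. */

#include <stdint.h>

#define NUM_FRAME_BUFFERS 2

extern uint8_t *frame_buffer[NUM_FRAME_BUFFERS];    /* frame buffers in SDRAM   */

extern void ppi_dma_start_capture(uint8_t *dst);    /* arm PPI DMA on a buffer  */
extern int  ppi_dma_frame_done(void);               /* flag set by DMA-done ISR */
extern void process_frame(const uint8_t *frame);    /* the video algorithm      */

void capture_loop(void)
{
    int capture_idx = 0;
    int process_idx;

    /* Prime the PPI: start filling buffer 0. */
    ppi_dma_start_capture(frame_buffer[capture_idx]);

    for (;;) {
        /* Wait until the PPI DMA has delivered a complete frame. */
        while (!ppi_dma_frame_done())
            ;

        process_idx = capture_idx;                           /* completed frame  */
        capture_idx = (capture_idx + 1) % NUM_FRAME_BUFFERS; /* next capture buf */

        /* Re-arm the PPI on the other buffer, then process the full one so
           that capture and processing overlap. */
        ppi_dma_start_capture(frame_buffer[capture_idx]);
        process_frame(frame_buffer[process_idx]);
    }
}

In practice, the Blackfin DMA controller's autobuffer or descriptor modes can re-arm the transfer automatically, so only the buffer index needs to be toggled in the DMA-done interrupt.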

Figure 3 shows the un-optimized memory allocation in the SDRAM internal banks. In Figure 3, both the code and the video frame buffer are mapped to SDRAM internal Bank 0. This allocation causes more latency because SDRAM row-activation cycles occur on almost every access, due to the alternating core accesses (instruction fetches) and DMA accesses to different pages within the same internal bank.