Analog Devices EE213v02 Application Notes

Engineer-to-Engineer Note EE-213
Technical notes on using Analog Devices DSPs, processors and development tools
Contact our technical support at dsp.support@analog.com and at dsptools.support@analog.com. Or visit our on-line resources http://www.analog.com/ee-notes and http://www.analog.com/processors
Host Communication via the Asynchronous Memory Interface for Blackfin® Processors
Contributed by Prashant Khullar and Jeff Sondermeyer Rev 2 – March 29, 2004

Introduction

This Engineer-to-Engineer Note discusses the functionality and performance of an asynchronous memory interface developed for ADSP-BF531 / BF532 / BF533 Blackfin® processors. The interface is designed to provide a host port-like interface in applications that require a Blackfin processor to be used in conjunction with a host microcontroller. It can also be used to connect two Blackfin processors with minimal external circuitry. The maximum throughput of this implementation is 14.8 MB/s without concurrent bus activity and 8.3 MB/s while concurrent DMA activity is taking place (e.g., simultaneous peripheral and memory DMA).
Note: Higher bandwidth interfacing can be achieved with additional external logic or by using other Blackfin peripherals such as the SPORT, PPI, or external Bus Grant/Request.
Reads and writes to the data bus on both ends are regulated by the processor’s external bus controller. The control signals are routed via the combinational logic elements to the LE and /OE pins on the latches (see Figure 1). Four general-purpose I/O pins are required for the implementation on both the Blackfin and Host processors. These are used to synchronize the interrupt-driven data transfers. Each processor must send and receive two types of interrupts: a Read and a Read_Ack. The former is issued by the sending processor to indicate that a word of data has been latched into the data bus and may be read by the receiving processor. The latter is issued by the receiving processor to indicate that a word of data has been successfully read in and that the sending processor may send another word.
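The Read / Read_Ack handshake described above can be sketched as a small behavioral model. This is a Python illustration only, not Blackfin firmware: the names `LatchedLink` and `transfer` are hypothetical, the latch and the two flag lines are modeled as plain variables, and the two ISRs are run in sequence rather than as real interrupts.

```python
from dataclasses import dataclass, field


@dataclass
class LatchedLink:
    """One direction of the interface: a 16-bit latch plus a log of flag events."""
    latch: int = 0
    log: list = field(default_factory=list)


def transfer(words):
    """Move words across the link one at a time using Read / Read_Ack."""
    link = LatchedLink()
    received = []
    for word in words:
        # Sending processor: latch one 16-bit word, then signal Read.
        link.latch = word & 0xFFFF
        link.log.append("Read")
        # Receiving processor (Read ISR): read the latch, then signal Read_Ack.
        received.append(link.latch)
        link.log.append("Read_Ack")
        # The sender only proceeds to the next word after Read_Ack.
    return received, link.log


if __name__ == "__main__":
    data, events = transfer([0x1234, 0xABCD])
    print([hex(w) for w in data], events)
```

The point of the model is the ordering: each word is latched before Read is raised, and no new word is latched until the matching Read_Ack has been seen.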

Hardware Components

The basic architecture of this interface is shown in Figure 1. Two inexpensive 16-bit latches and some combinational logic are necessary for implementation. The host processor only requires an asynchronous memory port to send and receive data. Such an interface can be found on most microcontrollers.

Figure 1. Host Interface Schematic

Copyright 2004, Analog Devices, Inc. All rights reserved. Analog Devices assumes no responsibility for customer product design or the use or application of customers’ products or for any infringements of patents or rights of others which may result from Analog Devices assistance. All trademarks and logos are property of their respective holders. Information furnished by Analog Devices Applications and Development Tools Engineers is believed to be accurate and reliable, however no responsibility is assumed by Analog Devices regarding technical accuracy and topicality of the content provided in Analog Devices’ Engineer-to-Engineer Notes.

Software Requirements

Data transfer via the asynchronous interface must be controlled by an interrupt-driven software routine running on both processors. Reads and writes must be processed in interrupt service routines (ISRs). Interrupt requests of both types should be issued within the ISRs as well. The appropriate flag pins can simply be toggled to indicate interrupt requests.

Host-DSP API for Pointer versus Data

Using this interface, how do we distinguish between address and data information? The following paragraphs describe one possible method.
Note: In this case, assume all interrupts are rising-edge-sensitive. When the host is sending address data to the DSP, the host Read flag pin can be left high until the host gets back a Read_Ack from the DSP. In this way, when the DSP still sees a high on the Read interrupt pin, it treats the incoming data as an address (a pointer to where the following data is to be placed). At that point, the host can send data, but this time it toggles the Read flag pin (high then low) so that the DSP sees only a low on the Read interrupt pin. In the DSP Read ISR, the code senses the logic level of the Read interrupt pin and either sets up a new address pointer or places data at the next pointer location. This is a simple way of differentiating between address pointers (where to put the following data) and the data itself.
The only other consideration is that, since the Blackfin external memory interface is only 16 bits wide, two 16-bit words are needed to form a 32-bit address. In this way, the pointer/address can place code anywhere within the Blackfin memory map. To accomplish this, set up a bit (ADDR_BIT) in the DSP Read ISR and initialize it to zero. The first time a new address is encountered, the code sets ADDR_BIT and stores the first 16 bits as the most significant word (MSW). Then, when the second 16 bits of the address arrive and ADDR_BIT = 1, the code stores them as the least significant word (LSW). Following the receipt of the LSW, the bit is cleared in preparation for the next address that arrives. The reverse of this method is used for communicating from the DSP back to the host.
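The pointer-versus-data method can be illustrated with a short behavioral model of the DSP-side Read ISR. This is a Python sketch under stated assumptions, not the code from Appendix B: the class name `DspReadIsr`, the dictionary standing in for the memory map, and the pointer step of 2 bytes per 16-bit word are all hypothetical. The sketch clears ADDR_BIT once the second address word has been stored.

```python
class DspReadIsr:
    """Behavioral model of the DSP-side Read ISR (not Blackfin firmware)."""

    def __init__(self):
        self.addr_bit = 0   # 0: next address word is the MSW, 1: it is the LSW
        self.pointer = 0    # 32-bit destination address
        self.memory = {}    # stand-in for the Blackfin memory map

    def on_read(self, word, read_pin_high):
        """Handle one Read interrupt carrying a 16-bit word."""
        word &= 0xFFFF
        if read_pin_high:
            # Read pin still high: the word is half of an address.
            if self.addr_bit == 0:
                self.pointer = word << 16                        # store MSW
                self.addr_bit = 1
            else:
                self.pointer = (self.pointer & 0xFFFF0000) | word  # store LSW
                self.addr_bit = 0                                # ready for next address
        else:
            # Read pin low: the word is data for the current pointer.
            self.memory[self.pointer] = word
            self.pointer += 2   # assumption: byte addressing, 16-bit words


if __name__ == "__main__":
    isr = DspReadIsr()
    isr.on_read(0xFF80, True)    # address MSW (arbitrary example address)
    isr.on_read(0x0000, True)    # address LSW
    isr.on_read(0xBEEF, False)   # data word 0
    isr.on_read(0xCAFE, False)   # data word 1
    print({hex(a): hex(v) for a, v in isr.memory.items()})
```

Sampling the pin level inside the ISR is what costs the extra cycles counted as GPR in the MIPS Calculation section, so a real implementation would weigh this convenience against that overhead.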

MIPS Calculation

The worst-case ISR latency for entering and exiting an interrupt is 28 core clock (CCLK) cycles for Blackfin processors with a 10-stage execution pipeline. This latency includes pipeline refills and the return from interrupt (RTI). If a specific minimum transfer rate is required, this should be a high-priority interrupt that is “non-interruptible” to maintain a deterministic number of cycles. Assuming we use the API method discussed in the previous section, the worst-case latency from the beginning to the end of an interrupt is 61 CCLK cycles for a single write operation and 75 CCLK cycles for a single read operation. Choosing an arbitrary transfer rate of 2.5 Mwords/s, we consume 75 cycles * 2.5M = 187.5 MIPS of DSP processing power moving a single 16-bit word of data from the host to the DSP in each interrupt.
It is evident from the above example that for any transfer rate on the order of a megaword per second, the number of MIPS consumed by the method depicted in Figure 1 is prohibitive. Therefore, under these conditions, adding a multi-word FIFO between the Host and the DSP is recommended. The deeper the FIFO, the fewer MIPS are consumed, because more data is transferred in each interrupt and the fixed overhead is spread over more words. For a kiloword-per-second transfer rate, a one-word FIFO is adequate. The MIPS consumed can be estimated as follows.
If we let
NPI = Number of reads/writes Per Interrupt (Note: this is equal to the FIFO depth)
NCS = Number of Context Saves/restores
RTR = Required Transfer Rate (Note: this must be less than 8.3 MB/s for a FIFO depth of 1)
FIL = Fixed Interrupt Latency = 28 CCLK cycles
RLC = Read Latency Constant = 15 CCLK cycles (Note: Wait = 1 system clock (SCLK), Hold = 2 SCLKs, and the minimum latency for an asynchronous memory read is 2 SCLKs; CCLK/SCLK = 5/1)
WLC = Write Latency Constant = 0 CCLKs (Note: write operations do not stall the core)
GPR = General-Purpose Flag Read in Interrupt = 30 CCLKs or 0 CCLKs, depending on whether you use this method (Note: CCLK/SCLK = 5/1)
DSP MIPS to read host = [(FIL+RLC+NPI+GPR+(2*NCS))*RTR]/NPI
DSP MIPS to write host = [(FIL+WLC+NPI+GPR+(2*NCS))*RTR]/NPI
Example: NPI=8, NCS=2, RTR=2.5 Mwords/s
DSP MIPS to read host = [(28+15+8+30+4)*2.5M]/8 = 26.56 MIPS with a GPIO read in the ISR.

Performance Evaluation

The performance of the asynchronous host interface was evaluated on a hardware prototype interconnecting two ADSP-BF533 processors. The software routines used to test the interface are included in Appendix B of this document.
The interface was tested in two real-time scenarios with another ADSP-BF533 processor acting as the host device. In the first, both processors simply idle until an interrupt request is received, which is then processed in the appropriate ISR. In the second, one of the processors performs auto-buffered DMAs from external SDRAM to the parallel peripheral interface (PPI) and from L1 memory to SDRAM while waiting for interrupts. DMA traffic control is used to optimize bus sharing during these transfers. This second test case emulates the Blackfin processor’s behavior in a scenario where a video application is passing encoded data from the host to the DSP and vice versa. The complete source code for these test cases is included in Appendix B.
Additionally, if processor latencies are known beforehand, a further performance enhancement can be made. Specifically, the writing processor can issue a ‘Read’ interrupt before it has actually completed its data write sequence. Since the reading processor has a known latency (in these tests, an ADSP-BF533 with a 6-cycle latency) involved in reading a flag pin and triggering an ISR, the correct data is present on the bus by the time it is ready to read. Figures 2 and 3 illustrate the effect of this modification.
Without interspersed DMA activity or the performance enhancement, a complete 16-bit word transfer across the host interface takes 28 system clock (SCLK) cycles, including the cycles associated with signaling using general-purpose I/O flags. With the performance enhancement, this can be reduced to 18 SCLK cycles. With interspersed bidirectional DMA, this increases to 32 SCLK cycles. At the maximum SCLK frequency of 133 MHz, this translates to throughputs of 9.5 MB/s, 14.8 MB/s, and 8.3 MB/s, respectively. In all cases, read and write sequences are configured to be 2 SCLK cycles in length. The time taken for a read/write cycle to commence upon receiving an interrupt is 6 SCLK cycles without interspersed DMA and 12 SCLK cycles with interspersed DMA. Issuing a flag-pin interrupt upon completion of a read or write sequence involves a 4-cycle latency in all cases. Figures 2 and 3 show snapshots of the data bus and control logic taken with a logic analyzer during transfer sequences in all test cases.
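The arithmetic in this note can be reproduced with a few lines of Python: the conversion from SCLK cycles per word to throughput used in this section, and the DSP MIPS equations from the MIPS Calculation section. The function names `throughput_mb_per_s` and `dsp_mips` are hypothetical; the constants (133 MHz SCLK, 16-bit words, FIL = 28, RLC = 15, WLC = 0, GPR = 30) are the values quoted in the text. This is a quick numerical check, not a timing model of the hardware.

```python
def throughput_mb_per_s(sclk_hz, sclk_cycles_per_word, bytes_per_word=2):
    """Throughput in MB/s when each word takes a fixed number of SCLK cycles."""
    return sclk_hz / sclk_cycles_per_word * bytes_per_word / 1e6


def dsp_mips(rtr_mwords, npi, ncs, latency_constant, fil=28, gpr=30):
    """DSP MIPS = [(FIL + latency + NPI + GPR + 2*NCS) * RTR] / NPI.

    latency_constant is RLC (15) for reads or WLC (0) for writes;
    rtr_mwords is the required transfer rate in Mwords/s.
    """
    cycles_per_interrupt = fil + latency_constant + npi + gpr + 2 * ncs
    return cycles_per_interrupt * rtr_mwords / npi


if __name__ == "__main__":
    SCLK_HZ = 133e6
    cases = [
        ("no DMA, no enhancement", 28),
        ("with performance enhancement", 18),
        ("with interspersed DMA", 32),
    ]
    for label, cycles in cases:
        print(f"{label}: {throughput_mb_per_s(SCLK_HZ, cycles):.1f} MB/s")
    read_mips = dsp_mips(2.5, npi=8, ncs=2, latency_constant=15)
    print(f"read host, 8-word FIFO: {read_mips:.2f} MIPS")
```

Setting `npi=1` in `dsp_mips` shows how quickly the per-interrupt overhead dominates without a FIFO, which is the motivation for the multi-word FIFO recommendation.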

Conclusion

At a transfer rate of over 8 MB/s with full parallel DMA activity, this scheme can handle complex video algorithms. It is a very low-cost parallel approach that is fully asynchronous and bidirectional.

Appendix A. Timing Plots

Figure 2. Asynchronous Transfer without Performance Enhancement (Reading Processor’s View)
Figure 3. Asynchronous Transfer with Performance Enhancement (Reading Processor’s View)
Figure 4. Asynchronous Data Transfer with Background DMA Activity (Reading Processor’s View)