Interfacing the ADSP-21535 to High-Speed Converters (like those on the
AD9860/2) over the External Memory
Bus
Last modified: 2/12/02
Contributed By: Jeff Sondermeyer, Jeritt Kent, Martin
Kessler, and Rick Gentile
Introduction
In the 1970’s and 80’s, high-speed mixed-signal designs
were often constrained by digital circuitry limitations, not
analog. High-speed parallel converters (>10MSPS), for
example, have been available from industry leaders like
Analog Devices Inc. (NYSE: ADI), since the 1970’s.
More and more applications are demanding intensive real-time algorithms. In addition, higher sample rates (14 bits
at greater than 50MSPS) are available from both analog-to-digital converters (ADCs) and digital-to-analog
converters (DACs). These factors mandate faster
programmable general-purpose (GP) digital signal
processors (DSPs) to handle the challenges presented by
these high-speed designs. Until recently, most designers
were forced to interface high-speed parallel converters to
Application Specific ICs (ASICs) or fast Field
Programmable Gate Arrays (FPGAs). Devices like these
are capable of resolving the many required simultaneous
parallel digital operations but are often inflexible and can
be prohibitively expensive. Now, with the recent launch
of the Blackfin™ DSP (ADSP-21535), ADI has a
programmable GP 16-bit fixed-point vector DSP that can
process the sustained input/output (I/O) and core
throughputs required to process data from many of these
converters. In particular, the ADSP-21535’s core can be
clocked at 300MHz. Depending on the core clock
frequency, a maximum I/O or system clock (SCLK) of
133MHz can be achieved. This SCLK should not be
confused with the serial clock for the Serial Peripheral
Interface (SPI).
Advantages of using a GP DSP
One of the biggest advantages of GP programmable DSPs
is that these solutions are typically much lower in cost
than their closest digital processing counterparts, FPGAs
and ASICs. Additionally, GP DSP design cycles are
much shorter which allows for a faster time to market.
Some companies must hire or consult professionals with
specialized skills to design FPGAs/ASICs. Companies
may even be forced to send their intellectual property (IP)
out-of-house involving certain risks in confidentiality
(hardware, firmware, and software). On the other hand,
GP DSP code can be converted to Read-Only-Memory-based (ROM) or be masked into a DSP, like the ADSP-2153x, which further protects IP. Finally, GP DSPs are
fully programmable, unlike an ASIC implementation,
where every change requires a costly redesign (time and
money). These factors, alone, are driving many engineers
to reconsider GP DSP as the solution of choice, especially
as GP DSPs approach “Pentium class” core rates.
Generally speaking, the DSP needs to be clocked
minimally an order of magnitude (10X) faster than the
converter’s sample rate to guarantee sufficient data
processing bandwidth. Obviously, the amount of
processing bandwidth needed is dependent upon the
DSP’s interface capabilities which is, in turn, influenced
by several other factors including: block processing
versus sample processing, the existence of a Direct
Memory Access (DMA) controller, multi-ported memory,
and whether external FIFOs are used. Fortunately, the
first instantiation of ADI’s Blackfin DSP family, the
ADSP-21535, has a full DMA controller that operates
independent of the core with multi-ported Level 1 (L1)
and Level 2 (L2) memories. The combination of core
speed, an independent DMA controller, and a large multi-ported on-board memory (308K bytes) allows the ADSP-21535 to perform efficient block processing at high data
rates. For example, if the Revision 2.2-compliant,
33MHz, 32-bit Peripheral Component Interconnect (PCI)
interface is used (not shown in this application), transfer
bandwidths can be achieved that approach 132MB/sec.
The ADSP-21532 has a dedicated Parallel Peripheral
Interface (PPI) to connect directly to high-speed
converters. Note that the ADSP-21535 does not have this
interface. However, the External Bus Interface Unit
(EBIU) of the ADSP-21535 provides interfaces to
asynchronous (ASYNC) external memories. If the PCI
bus must be used for other system communications, the
EBIU is the only available parallel interface to connect to
a high-speed converter. Combining the DSP-mastered,
asynchronous control of this port with the synchronous,
continuous data stream of converters provides a challenge
for system designers.
Copyright 2002, Analog Devices, Inc. All rights reserved. Analog Devices assumes no responsibility for customer product design or the use or application of customers’ products or
for any infringements of patents or rights of others which may result from Analog Devices assistance. All trademarks and logos are property of their respective holders. Information
furnished by Analog Devices Applications and Development Tools Engineers is believed to be accurate and reliable, however no responsibility is assumed by Analog Devices
regarding the technical accuracy of the content provided in all Analog Devices’ Engineer-to-Engineer Notes.
This application note will cover one particular hardware
implementation utilizing a low pin count, low cost
Programmable Array Logic (PAL), Complex
Programmable Logic Device (CPLD), or FPGA. This
logic will perform the control functions between the
AD9860/2 Mixed Signal Front-End (MxFE) and the
ASYNC external memory bus of the ADSP-21535.
Although the DSP code will not be discussed in this note,
the assembly code is included in the Appendix as a
reference. The application depicted in Figure 1 below is
for an Orthogonal Frequency Division Multiplexed
(OFDM) wireless portable terminal. Please note that the
ADC and DAC were time-shared (Time Division
Multiplexed or TDM) over the ASYNC interface of the
DSP. (The information given here applies equally to other parallel high-speed ADCs and DACs.)
This engineering note assumes that the reader has prior
knowledge of the ADSP-21535 and the AD9860/2. If you
are unfamiliar with the ADSP-21535, please refer to the
“ADSP-2153x/21535 Blackfin DSP Hardware
Reference”. The datasheet for the AD9860/2 can be
found at www.analog.com.
Design Goals
One of the early design goals for this project was to
minimize the amount of external control logic necessary to
interface the DSP and the converter(s). Driven by cost,
engineering wanted to eliminate any FIFOs or memory
within the external logic device. An additional constraint
was to avoid routing the data buses through the logic
thereby reducing the number of pins, package size, and
cost of the logic device. The initial design shown in
Figure 1 combines all functions (including data latching)
into a single logic device. However, production models of
this design will utilize inexpensive tri-state-able latches
driven by a logic device. These latches or buffers will
multiplex (pack) the samples from the DSP memory
interface to the 12/14-bit DAC as well as buffer or de-multiplex (unpack) the 10/12-bit ADC samples to the DSP
memory interface.
Design Challenges
One of the key factors in any mixed-signal/DSP design is
a solid understanding of the trade-offs between the
devices. The following discussion will illustrate the
various tradeoffs that must be considered when interfacing
ADCs/DACs to the ADSP-21535.
The OFDM modulation scheme for this design drove the
converter sample rate for this application to be
15.36MSPS. The AD9860/2 has a dual 10/12-bit,
64MSPS ADC as well as a dual 12/14-bit, 128MSPS
DAC. Unlike ADI’s SHARC® processors that have a
DMA-Request and DMA-Grant (i.e. DMA can be
mastered from external device), the ADSP-21535 only has
one set of internal memory DMA channels (memDMA),
which must be mastered from the DSP. In addition, when
the ADSP-21535 ASYNC interface is connected to
devices that do not contain FIFOs or memory, all latencies
must be understood. Every time the memDMA
relinquishes the bus after a burst of eight (8) transfers, it
requires ten (10) SCLK cycles to begin the next transfer.
Future Blackfin derivatives will have programmable
priority levels for the DMA controller as well as a
dedicated high-speed parallel interface with DMA-Request and DMA-Grant signaling. With a dedicated PPI
on future Blackfin products, the ASYNC memory
interface will not be required to connect to parallel
converters.
This approach assumes that the memory interface is
dedicated to the converters. Multiplexing external
SRAM/SDRAM memory with the converter(s) would be
difficult and is not recommended, especially considering
that there is only one memDMA, and it would need to be
shared. The existence of a large on-board L2 memory
(256K bytes) minimizes the need for any external
memory. However, multiplexing the parallel converter(s)
with a Flash or EPROM for boot purposes is permissible.
This design uses a TDM time-slice approach for sharing
the external bus between the ADCs and the DACs.
Simultaneous access is not possible with the ADSP-21535
because, as mentioned previously, there is only one
memory interface that either does a read or a write, and
there is only one set of memDMA channels (source and
destination).
The ADSP-21535 will support a maximum SCLK of
133MHz (peak DMA bandwidth). At this rate, and with
no external FIFO, the memDMA could sustain a transfer
(32 bit word) rate of 133M/10 (nine cycles are required
for bus acquisition plus one to make the next transfer) or
13.3M words/second. Note, however, that the SCLK of
the ADSP-21535 is derived from the core clock (CCLK).
Here, there are four available divide ratios: 2, 2.5, 3, and
4. As a result, one possible combination of CCLK and
divisor that will allow a 133MHz SCLK is CCLK =
266MHz and CCLK/SCLK = 2. If the core must run at
300MHz, the highest SCLK that can be obtained is
EE-162 Page 2
Technical Notes on using Analog Devices’ DSP components and development tools
120MHz (divisor=2.5) to stay under the maximum
133MHz. Now, since the ASYNC memory interface is 32
bits wide, one can pack up to two 16-bit samples (in this
case I and Q) into each word. This effectively halves the
word rate that the DSP must process (with a 15.36MSPS
converter sample rate, the DSP will “see” 7.68MSPS).
The highest external converter sample rate, though, that
the memDMA will support under these conditions is 2 *
120/10 = 24MSPS. Furthermore, the SCLK must be an
integer multiple of the converter sample rate to ensure
proper phase alignment between converter timing and
DSP timing and eliminate the need for any external
FIFOs. So, if the 120MHz SCLK must be an integer
multiple of the packed word rate, and each word costs at least
ten SCLK cycles, the highest usable word rate is 120M/10 =
12M words/second. Therefore, the highest converter sample rate
that the ADSP-21535 will support is 2 * 12M = 24MSPS.
Again, the DSP will only process half of this rate,
12MSPS, and this is, in fact, equal to the maximum rate
that the memDMA can sustain, 12M words/second. Note
that higher sample rates can be processed by the ADSP-21535 by including small external FIFOs between the
converter(s) and the EBIU.
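The arithmetic in the preceding paragraph can be sketched as follows. This is only an illustration of the figures quoted above (the four divide ratios, the 133MHz SCLK limit, the ten-cycle cost per word, and two samples per word); the function and constant names are hypothetical and do not come from any ADI tool.

```python
from fractions import Fraction

SCLK_MAX_MHZ = Fraction(133)     # maximum SCLK quoted in the text
DIVIDE_RATIOS = [Fraction(2), Fraction(5, 2), Fraction(3), Fraction(4)]
CYCLES_PER_WORD = 10             # nine bus-acquisition cycles plus one transfer
SAMPLES_PER_WORD = 2             # two 16-bit samples packed per 32-bit word


def best_sclk_mhz(cclk_mhz):
    """Return the highest SCLK (MHz) at or below 133MHz for a given CCLK."""
    candidates = [Fraction(cclk_mhz) / ratio for ratio in DIVIDE_RATIOS]
    legal = [sclk for sclk in candidates if sclk <= SCLK_MAX_MHZ]
    return max(legal)


def max_sample_rate_msps(sclk_mhz):
    """Sustained converter sample rate (MSPS) with no external FIFO."""
    words_per_microsecond = Fraction(sclk_mhz) / CYCLES_PER_WORD
    return words_per_microsecond * SAMPLES_PER_WORD


for cclk in (300, 266):
    sclk = best_sclk_mhz(cclk)
    rate = max_sample_rate_msps(sclk)
    print(f"CCLK {cclk}MHz -> SCLK {float(sclk)}MHz -> {float(rate)}MSPS")
```

A 300MHz core lands on the 2.5 ratio (120MHz SCLK, 24MSPS), while a 266MHz core can use the ratio of 2 (133MHz SCLK, 26.6MSPS). Exact fractions are used so that the 2.5 ratio does not introduce rounding error.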
It is again noted that this application dictated a
15.36MSPS converter sample rate driven by OFDM
requirements. To obtain a SCLK that is an integer
multiple of this converter sample rate, then, one must
choose a Phase Locked Loop (PLL) Multiplier that is an
integer multiple, in turn, of one of the four available
divisor ratios (2, 2.5, 3, or 4). The maximum CCLK
allowed is 276.48MHz (using a PLL multiplier of 18).
This, in turn, limits the SCLK to the integer multiple
276.48/3 = 92.16MHz (a divide ratio of 2 would give an
SCLK over the 133MHz maximum). Under these
constraints, the maximum sustained rate that the
memDMA can support is 92.16/10 = 9.21M words/second.
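The clock selection described above can be reproduced with a small search. The sketch below assumes that the PLL input clock equals the 15.36MHz converter clock (implied by 15.36 * 18 = 276.48, but not stated outright) and that multipliers from 1 to 31 are available; both are assumptions, and all names are hypothetical.

```python
from fractions import Fraction

SAMPLE_KHZ = 15360        # 15.36MSPS converter clock, in kHz
CCLK_MAX_KHZ = 300000     # maximum core clock quoted in the text
SCLK_MAX_KHZ = 133000     # maximum system clock quoted in the text
DIVIDE_RATIOS = [Fraction(2), Fraction(5, 2), Fraction(3), Fraction(4)]


def clock_options(clkin_khz=SAMPLE_KHZ, max_multiplier=31):
    """List (multiplier, ratio, cclk_khz, sclk_khz) tuples in which SCLK is
    a whole multiple of the sample rate. clkin_khz and max_multiplier are
    assumed values, not taken from the data sheet."""
    options = []
    for mult in range(1, max_multiplier + 1):
        cclk = Fraction(clkin_khz * mult)
        if cclk > CCLK_MAX_KHZ:
            break
        for ratio in DIVIDE_RATIOS:
            sclk = cclk / ratio
            whole_multiple = (sclk / SAMPLE_KHZ).denominator == 1
            if sclk <= SCLK_MAX_KHZ and whole_multiple:
                options.append((mult, ratio, cclk, sclk))
    return options


# Rank by CCLK first, as the text does, then by SCLK.
best = max(clock_options(), key=lambda option: (option[2], option[3]))
print("multiplier", best[0], "ratio", float(best[1]),
      "CCLK kHz", int(best[2]), "SCLK kHz", int(best[3]))
```

Under these assumptions the search returns a multiplier of 18 with a divide ratio of 3, matching the 276.48MHz CCLK and 92.16MHz SCLK above. The ranking is a design choice: ordering by SCLK instead of CCLK may surface a different combination, so the full list is worth inspecting before committing to a clock plan.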
DMA Considerations
Careful consideration must be given to the combined,
required, “sustained” DMA performance. Since the
memDMA is a shared resource over the DMA bus (DAB),
other DMA activity is arbitrated on this bus. This
application required a 10Mbit/second serial channel on a
SPORT that also must arbitrate for the DAB. This will
consume an additional 625K words/second at 16 bits/word
of DMA bandwidth. The ADSP-21535 supports a total of
133M words/second peak DMA bandwidth, and the
SPORT has higher arbitration priority over the memDMA
(see Table 1). Given this, the SPORT DMA should
effectively utilize the ten-cycle (10) delay previously
discussed and allow most, if not all, of the 9.21M
words/second to be used by the memDMA. There are
9.21M – 7.68M (15.36/2) = 1.53M words/second of
additional bandwidth which should provide enough
margin for a sustained 7.68MSPS.
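The bandwidth budget above reduces to a few integer divisions. The sketch below restates the numbers from this section; the final "pessimistic" line is an added assumption (that every SPORT word displaces one memDMA word), which is stricter than the text's expectation that SPORT traffic fits inside the ten-cycle gaps.

```python
SCLK_HZ = 92_160_000            # SCLK chosen for the 15.36MSPS design
CYCLES_PER_WORD = 10            # memDMA cost per 32-bit word
SAMPLE_RATE_SPS = 15_360_000    # converter sample rate
SAMPLES_PER_WORD = 2            # two 16-bit samples per 32-bit word
SPORT_BITS_PER_SEC = 10_000_000
SPORT_BITS_PER_WORD = 16

memdma_words = SCLK_HZ // CYCLES_PER_WORD                 # 9_216_000
required_words = SAMPLE_RATE_SPS // SAMPLES_PER_WORD      # 7_680_000
sport_words = SPORT_BITS_PER_SEC // SPORT_BITS_PER_WORD   # 625_000

margin_words = memdma_words - required_words              # 1_536_000
# Assumption: worst case in which SPORT words steal memDMA slots one for one.
pessimistic_margin = margin_words - sport_words           # 911_000

print("memDMA capacity :", memdma_words, "words/second")
print("converter needs :", required_words, "words/second")
print("margin          :", margin_words, "words/second")
print("margin less SPORT (pessimistic):", pessimistic_margin, "words/second")
```

Even under the pessimistic assumption the margin stays positive, which is consistent with the conclusion that 7.68M words/second can be sustained.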
Analysis of the DMA engine within the ADSP-21535
reveals a few other considerations. While the DMA
engine supports two types of DMA transfers: descriptor-based and autobuffer-based, the ADSP-21535 memDMA
controller does not support autobuffer-based DMA.
Therefore, descriptor-based transfers must be used. The
descriptor fetch from L1/L2 memory involves two (2)
five-word block moves, one for the source descriptor and
another for the destination descriptor. Additionally, the
memDMA has a 16-entry 32-bit FIFO that is filled from
the source and emptied to the destination. If both
descriptors are loaded simultaneously, this requires 39
SCLK cycles (worst case) from L2. The destination
descriptor load has priority over the source load to avoid
overrunning the FIFO. Thus, in this example, the amount
of time required to load both descriptors simultaneously is
1/92.16M * 39 = 423 nanoseconds. The DMA engine
descriptor load performance is best when the descriptors
are loaded from L2 memory. If the descriptors are located
in L1 memory, there are additional delays. The source
plus destination descriptors’ load time from L1 is 65
SCLK cycles worst case. To effectively process data at
these sample rates, ping-pong buffers are normally used
(this design utilizes two (2) 1024-word buffers). This
technique allows data to be filled into one buffer while the
core processes the other buffer. See the Appendix for the
Blackfin™ assembly code that utilizes the memDMA to
pull data in from an ADC and “ping-pongs” between two
internal buffers. As a reference, the complete
VisualDSP++ 2.0 project is available from ADI.
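As a quick check of the descriptor timing, the cycle counts quoted above can be converted to time at the 92.16MHz SCLK and compared with how long one 1024-word ping-pong buffer takes to fill at 7.68M words/second. This is a sketch with hypothetical names, not part of the Appendix code.

```python
SCLK_HZ = 92_160_000        # SCLK chosen for the 15.36MSPS design
WORD_RATE_HZ = 7_680_000    # packed 32-bit words per second
BUFFER_WORDS = 1024         # one ping-pong buffer


def cycles_to_ns(cycles, sclk_hz=SCLK_HZ):
    """Convert a count of SCLK cycles to nanoseconds."""
    return cycles * 1e9 / sclk_hz


both_from_l2_ns = cycles_to_ns(39)   # source plus destination, from L2
both_from_l1_ns = cycles_to_ns(65)   # source plus destination, from L1
buffer_fill_us = BUFFER_WORDS / WORD_RATE_HZ * 1e6

print(f"descriptors from L2: {both_from_l2_ns:.0f} ns")
print(f"descriptors from L1: {both_from_l1_ns:.0f} ns")
print(f"one buffer fills in: {buffer_fill_us:.1f} us")
```

The descriptor loads take roughly 423 ns from L2 and 705 ns from L1, while each buffer takes about 133 microseconds to fill, which is the window the core has to process the other buffer.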
There are two phases of operation that must be analyzed.
First, samples must be received from the ADC into the
DSP (receiver TDM phase) and secondly, samples must
be transmitted from the DSP to the DAC (transmitter
TDM phase).
Receiver TDM Phase
During the receive phase (i.e. data from the ADC), the
data movement is in this direction: ADC → EBIU
(source) → memDMA → FIFO → L1/L2 (destination).
At the 15.36MSPS converter sample rate, a new 32-bit
word arrives at the DSP every 1/7.68M = 130.2
nanoseconds. As seen from the descriptor load time
latency, 423 nanoseconds, something must be done to
avoid overrunning the DSP and losing samples.
Fortunately, the converters are attached to an external bus,
and the address bus is not being used. Thus, when
moving samples into the DSP, one can set up the source
descriptor with the maximum transfer count, 65536 words,
and destination descriptor with intended ping-pong buffer
transfer size, 1024 words. In this way, upon interrupt
from the core every 1024 words, only the destination
descriptor is reloaded, and the load time is reduced to 20
SCLKs * 1/92.16M = 217 nanoseconds. As previously
mentioned, this design utilizes a TDM scheme in which
the ADC and DAC occupy time slices. The multiplex rate
is a variable 5-8 milliseconds. Since the ADC and DAC
data is interleaved, at a worst case, the interface changes
from receiver to transmitter and vice versa every 8
milliseconds. Therefore, 65536 words * 130.2
nanoseconds or 8.5 milliseconds is adequate time, and the
source descriptor only needs to be set up once at the
beginning of each receiver TDM phase. Finally, the 16-entry memDMA FIFO “hides” the destination descriptor
load time because the source is still filling the FIFO while
the destination descriptor is being loaded from memory.
In a worst-case scenario, the memDMA FIFO will only
accumulate a few samples of data before the descriptor is
reloaded. Then, these samples are burst into memory.
So, the need for an external FIFO on the receiver side is
eliminated and no samples are lost.
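The two receiver-side claims (the FIFO absorbs the destination descriptor reload, and one source descriptor outlasts the longest time slice) can be checked numerically. The sketch below uses only figures from this section; the names are hypothetical, and exact fractions avoid rounding at the boundaries.

```python
import math
from fractions import Fraction

SCLK_HZ = 92_160_000
WORD_RATE_HZ = 7_680_000         # 32-bit words arriving from the ADC
FIFO_DEPTH = 16                  # memDMA FIFO entries
DEST_RELOAD_CYCLES = 20          # destination descriptor reload
SOURCE_COUNT = 65536             # maximum source transfer count
MAX_SLICE_S = Fraction(8, 1000)  # worst-case TDM slice of 8 milliseconds

# Words that arrive while the destination descriptor is being reloaded.
reload_time_s = Fraction(DEST_RELOAD_CYCLES, SCLK_HZ)
words_during_reload = math.ceil(reload_time_s * WORD_RATE_HZ)

# Time covered by a single source descriptor.
source_span_s = Fraction(SOURCE_COUNT, WORD_RATE_HZ)

print("words queued during reload:", words_during_reload, "of", FIFO_DEPTH)
print("source descriptor covers  :", float(source_span_s) * 1e3, "ms")
```

At most two words queue up during the roughly 217 ns reload, far below the 16-entry FIFO depth, and one source descriptor spans about 8.5 milliseconds, longer than the 8 millisecond slice.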
Transmitter TDM Phase
During the transmit phase (i.e. DAC), data movement is in
the opposite direction: L2/L1 (source) → memDMA →
FIFO → EBIU (destination) → DAC. Unlike the
previous mode, the source descriptor must be updated
every 1024 words. This will require 20 SCLK cycles or
217 nanoseconds. However, since the memDMA (9.21M
words/second) is running slightly faster than the sample
rate (7.68M words/second), this should maintain 16
samples in the memDMA FIFO, which will feed the DAC
while the descriptor loads. The destination descriptor
transfer count can be fixed at 65536 words. Again, no
external FIFO is required and no samples are lost.
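On the transmit side the question is how far the DAC drains the FIFO while the source descriptor reloads, and how quickly the faster memDMA refills it. The sketch below is a rough steady-state estimate that assumes the FIFO is full when the reload begins; that starting condition and all names are assumptions.

```python
from fractions import Fraction

SCLK_HZ = 92_160_000
MEMDMA_RATE_HZ = 9_216_000    # sustained memDMA words/second
DAC_WORD_RATE_HZ = 7_680_000  # words/second consumed by the DAC
FIFO_DEPTH = 16               # memDMA FIFO entries
RELOAD_CYCLES = 20            # source descriptor reload
BUFFER_WORDS = 1024           # words between descriptor reloads

# Words the DAC consumes while no new data enters the FIFO.
drained_words = Fraction(RELOAD_CYCLES, SCLK_HZ) * DAC_WORD_RATE_HZ

# The memDMA then gains on the DAC at the difference of the two rates.
surplus_rate_hz = MEMDMA_RATE_HZ - DAC_WORD_RATE_HZ
refill_time_s = drained_words / surplus_rate_hz
buffer_period_s = Fraction(BUFFER_WORDS, DAC_WORD_RATE_HZ)

print("words drained during reload:", float(drained_words))
print("refill time (us)           :", float(refill_time_s) * 1e6)
print("time between reloads (us)  :", float(buffer_period_s) * 1e6)
```

Under these assumptions fewer than two words drain per reload, and the FIFO recovers in about a microsecond, well inside the roughly 133 microseconds between reloads.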
Logic Overview and Timing
In avoiding the need for FIFOs in the external logic, it is
still important to synchronize the converter clocks to the
DSP SCLK. Depending on the sample rate, this limits the
available ADSP-21535 clocking options. Minimally,
SCLK must be evenly divisible by the converter sample
rate, and CCLK may need to be evenly divisible by the
converter sample rate as well (there is only one non-integer divisor, 2.5, and this may not be useable in some
cases). External latches or buffers must be used to align
the data from the converters with the timing of the DSP
(See Figures 2 and 3 for sample skew and delay). The
four-wire DSP SPI port is directly connected to the
AD9860/2 SPI port. To ensure proper power sequencing
and initialization, the DSP should reset the converter(s).
In an effort to further reduce the pin count of the external
logic, another option available on the AD9860/2 (not
shown here) allows two (2) 10/12-bit ADC values to be
time-multiplexed onto a single 10/12-bit RXDATA bus.
This would eliminate one of the two (2) 10/12-bit buses,
but it requires the external logic to de-multiplex the data
before it is transmitted to the DSP.
All data movement is controlled or mastered by the
memDMA within the DSP. When the ADC data is read
(see Figure 2), the external logic must drive the data and
the ARDY signal. The external logic must sample the
/AOE pin to check when data can be driven to the
ADSP-21535. The /AOE signal indicates to the external logic
that the DMA controller is ready to take data. The
receiver three-state machine is shown at the bottom of
Figure 2.
When data is being sent out (see Figure 3) to the DAC, the
external logic has to sample the /AWE signal and then
drive ARDY. /AWE indicates to the external logic when
the DMA controller is ready with new data. The
transmitter four-state machine is shown at the bottom of
Figure 3.
Conclusions
Even though the ADSP-21535 was not specifically
designed to interface to high-speed parallel converters,
this engineering note provides a low-cost “FIFO-free”
solution for interfacing to both ADCs and DACs with
sample rates up to 24MSPS if the core is constrained to
run at 300MHz. If the ADSP-21535 core can be clocked
specifically at 266MHz, then the highest converter sample
rate is limited only by the maximum SCLK that the
ADSP-21535 can support (133MHz) and the inter-burst
10-cycle memDMA latency: 2 * 133M/10 = 26.6MSPS.
In summary, the following table is included as a set of
rules specific to the ADSP-21535:
The Ten Commandments of Interfacing to the ADSP-21535
1) The ADSP-21535’s maximum allowed core
frequency is 300MHz.
2) ADSP-21535's maximum allowed system clock
(SCLK) is 133MHz.
3) ADSP-21535's memDMA has a worst-case ten-cycle (10) reacquisition latency every time the DMA
bus is relinquished.
4) ADSP-21535's SCLK shalt be derived from CCLK,
and there are four available divide ratios between
CCLK and SCLK (2, 2.5, 3, and 4).
5) If the ADSP-21535's core shalt run at the maximum
(300MHz), then the maximum SCLK is 120MHz. See
Commandments 1, 2, and 4.
6) The maximum ADSP-21535 memDMA rate is
133M/10 = 13.3Mwords/sec. See Commandments 2
and 3.
7) In order to obtain the fastest ADSP-21535 SCLK
(133MHz) and memDMA transfer rate
(13.3Mwords/sec), the core shalt not run at maximum,
but at 266MHz with a CCLK /SCLK divide ratio of 2.
"Only" 34 MIPs are not utilized!
8) When not using external FIFOs, the ADSP-21535's
core shalt operate minimally at an order of magnitude
(10X) greater than the greatest external interfacing
converter sample rate to provide sufficient processing
bandwidth.
9) To eliminate the need for external FIFOs, the ADSP-21535's SCLK shalt be an integer multiple of the
external interfacing converter sample rates to ensure
proper phase alignment between converter timing
and DSP timing.
10) To halve the sample rate that the DSP must
process, the external logic shalt pack up to two 16-bit
samples into each 32-bit word.
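As an illustration of Commandment 10, the packing of an I/Q pair into one 32-bit word can be modeled in software. In the design itself the packing is done by the external latches and logic; the sketch below only models the bit layout, and placing I in the upper half-word is an assumption, as are the function names.

```python
def pack_iq(i_sample, q_sample):
    """Pack two signed 16-bit samples into one unsigned 32-bit word.
    Assumed layout: I in bits 31..16, Q in bits 15..0."""
    return ((i_sample & 0xFFFF) << 16) | (q_sample & 0xFFFF)


def to_signed16(value):
    """Interpret a 16-bit field as a two's-complement signed value."""
    return value - 0x10000 if value & 0x8000 else value


def unpack_iq(word):
    """Recover the signed (I, Q) pair from a packed 32-bit word."""
    i_sample = to_signed16((word >> 16) & 0xFFFF)
    q_sample = to_signed16(word & 0xFFFF)
    return i_sample, q_sample


word = pack_iq(-2048, 2047)   # full-scale 12-bit values, sign-extended
print(hex(word), unpack_iq(word))
```

Each memDMA transfer then carries one I sample and one Q sample, which is why a 15.36MSPS converter presents only 7.68M words/second to the DSP.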