Technical Notes on using Analog Devices' DSP components and development tools
Contact our technical support by phone: (800) ANALOG-D or e-mail: dsp.support@analog.com
Or vi sit ou r on-l ine re sourc es ht tp:// www.analog.com/dsp and http://www.analog.com/dsp/EZAnswer
Interfacing the ADSP-BF535 Blackfin® Processor to High-Speed
Converters (like those on the AD9860/2) over the External Memory Bus
Contributed by Jeff Sondermeyer, Jeritt Kent, Martin Kessler, and Rick Gentile June 5, 2003
Introduction
In the 1970’s and 80’s high speed mixed signal designs
were often constrained by digital circuitry limitations, not
analog. High-speed parallel converters (>10MSPS), for
example, have been available from industry leaders like
Analog Devices Inc. (NYSE: ADI), since the 1970’s. More
and more applications are demanding intensive real-time
algorithms. In addition, higher sample rates (14 bits at
greater than 50MSPS) are available from both analog-todigital converters (ADCs) and digital-to-analog converters
(DACs). These factors mandate faster programmable
general-purpose (GP) digital signal processors (DSPs) to
handle the challenges presented by these high-speed
designs. Until recently, most designers were forced to
interface high-speed parallel converters to Application
Specific ICs (ASICs) or fast Field Programmable Gate
Arrays (FPGAs). Devices like these are capable of
resolving the many required simultaneous parallel digital
operations but are often inflexible and can be prohibitively
expensive.
Now, with the new Blackfin® processor architecture, ADI
has a family of programmable GP processors that
combines 16-bit fixed-point DSP and 32-bit RISC
processing with a unified instruction set model. The first
instantiation of this family, the ADSP-BF535 Blackfin
processor can process the sustained input/output (I/O) and
core throughputs required to process data from many of
these converters. In particular, the ADSP-BF535 Blackfin
processor’s core can be clocked at 300MHz. Depending on
the core clock frequency, a maximum I/O or system clock
(SCLK) of 133MHz can be achieved. This SCLK should
not to be confused with the serial clock for the Serial
Peripheral Interface (SPI).
Advantages of using a GP DSP
One of the biggest advantages of GP programmable
processors is that these solutions are typically much lower
Copyright 2003, Analog Devices, Inc. All rights reserved. Analog Devices assumes no responsibility for customer product design or the use or application of
customers’ products or for any infringements of patents or rights of others which may result from Analog Devices assistance. All trademarks and logos are property
of their respective holders. Information furnished by Analog Devices Applications and Development Tools Engineers is believed to be accurate and reliable, however
no responsibility is assumed by Analog Devices regarding technical accuracy and topicality of the content provided in Analog Devices’ Engineer-to-Engineer Notes.
in cost than their closest digital processing counterparts,
FPGAs and ASICs. Additionally, GP DSP design cycles
are much shorter which allows for a faster time to market.
Some companies must hire or consult professionals with
specialized skills to design FPGAs/ASICs. Companies may
even be forced to send their intellectual property (IP) outof-house involving certain risks in confidentiality
(hardware, firmware, and software). On the other hand, GP
DSP code can be converted to Read-Only-Memory-based
(ROM) or be masked into a DSP, like the ADSP-BF53x,
which further protects IP. Finally, GP processors are fully
programmable, unlike an ASIC implementation, where
every change requires a costly redesign (time and money).
These factors, alone, are driving many engineers to
reconsider GP DSP as the solution of choice, especially as
GP DSPs approach “Pentium class” core rates.
Generally speaking, the Blackfin processor needs to be
clocked minimally an order of magnitude (10X) faster than
the converter’s sample rate to guarantee sufficient data
processing bandwidth. Obviously, the amount of
processing bandwidth needed is dependent upon the
processors interface capabilities which is, in turn,
influenced by several other factors including: block
processing versus sample processing, the existence of a
Direct Memory Access (DMA) controller, multi-ported
memory, and whether external FIFOs are used.
Fortunately, the first instantiation of ADI’s Blackfin
processor family, the ADSP-BF535, has a full DMA
controller that operates independent of the core with multiported Level 1 (L1) and Level 2 (L2) memories. The
combination of core speed, an independent DMA
controller, and a large multi-ported on-board memory
(308K bytes) allows the ADSP-BF535 to perform efficient
block processing at high data rates. For example, if the
Revision 2.2-compliant, 33MHz, 32-bit Peripheral
Component Interconnect (PCI) interface is used (not
shown in this application), transfer bandwidths can be
achieved that approach 132MB/sec.
a
Unlike the ADSP-BF531/BF532/BF533 Blackfin
processors, which have a dedicated Parallel Peripheral
Interface (PPI) to connect directly to high-speed
converters, the ADSP-BF535 does not. However, the
External Bus Interface Unit (EBIU) of the ADSP-BF535
provides interfaces to asynchronous (ASYNC) external
memories. If the PCI bus must be used for other system
communications, the EBIU is the only available parallel
interface to connect the ADSP-BF535 to a high-speed
converter. Combining the DSP-mastered, asynchronous
control of this port with the synchronous, continuous data
stream of converters provides a challenge for system
designers.
This application note will cover one particular hardware
implementation utilizing a low pin count, low cost
Programmable Array Logic (PAL), Complex
Programmable Logic Device (CPLD), or FPGA. This logic
will perform the control functions between the AD9860/2
Mixed Signal Front-End (MxFE) and the ASYNC
external memory bus of the ADSP-BF535. Although the
DSP code will not be discussed in this note, the assembly
code is included in the Appendix as a reference. The
application depicted in Figure 1 below is for an Orthogonal
Frequency Division Multiplexed (OFDM) wireless
portable terminal. Please note that the ADC and DAC were
time-shared (Time Division Multiplexed or TDM) over the
ASYNC interface of the DSP. (The information given here
applies equally to other parallel high-speed ADCs and
DACs.)
This engineering note assumes that the reader has prior
knowledge of the ADSP-BF535 and the AD9860/2. If you
are unfamiliar with the ADSP-BF535, please refer to the
“ADSP-BF535 Blackfin® Processor Hardware
Reference”. The datasheet for the AD9860/2 can be found
at www.analog.com
Design Goals
One of the early design goals for this project was to
minimize the amount of external control logic necessary to
interface the DSP and the converter(s). Driven by cost,
engineering wanted to eliminate any FIFOs or memory
within the external logic device. An additional constraint
was to avoid routing the data buses through the logic
thereby reducing the number of pins, package size, and
cost of the logic device. The initial design shown in Figure
1 combines all functions (including data latching) into a
single logic device. However, production models of this
design will utilize inexpensive tri-state-able latches driven
by a logic device. These latches or buffers will multiplex
(pack) the samples from the DSP memory interface to the
12/14-bit DAC as well as buffer or de-multiplex (unpack)
the 10/12-bit ADC samples to the DSP memory interface.
Interfacing the ADSP-BF535 Blackfin® Processor to High-Speed Converters (like those on the AD9860/2) over the
External Memory Bus (EE-162) Page 2 of 24
Design Challenges
One of the key factors in any mixed-signal/DSP design is a
solid understanding of the trade-offs between the devices.
The following discussion will illustrate the various
tradeoffs that must be considered when interfacing
ADCs/DACs to the ADSP-BF535.
The OFDM modulation scheme for this design drove the
converter sample rate for this application to be
15.36MSPS. The AD9860/2 has a dual 10/12-bit, 64MSPS
ADC as well as a dual 12/14-bit, 128MSPS DAC. Unlike
ADI’s SHARC® DSPs that have a DMA-Request and
DMA-Grant (i.e. DMA can be mastered from external
device), the ADSP-BF535 Blackfin processor only has one
set of internal memory DMA channels (MemDMA), which
must be mastered from the DSP. In addition, when the
ADSP-BF535 ASYNC interface is connected to devices
that do contain FIFOs or memory, all latencies must be
understood. Every time the MemDMA relinquishes the bus
after a burst of eight (8) transfers, it requires ten (10)
SCLK cycles to begin the next transfer.
Future Blackfin processor derivatives will have
programmable priority levels for the DMA controller as
well as a dedicated high-speed parallel interface with
DMA-Request and DMA-Grant signaling. With a
dedicated PPI on future Blackfin processor products, the
ASYNC memory interface will not be required to connect
to parallel converters.
This approach assumes that the memory interface is
dedicated to the converters. Multiplexing external
SRAM/SDRAM memory with the converter(s) would be
difficult and is not recommended, especially considering
that there is only one MemDMA, and it would need to be
shared. The existence of a large on-board L2 memory
(256K bytes) minimizes the need for any external memory.
However, multiplexing the parallel converter(s) with a
Flash or EPROM for boot purposes is permissible.
This design uses a TDM time-slice approach for sharing
the external bus between the ADCs and the DACs.
Simultaneous access is not possible with the ADSP-BF535
because, as mentioned previously, there is only one
memory interface that either does a read or a write, and
there is only one set of MemDMA channels (source and
destination).
The ADSP-BF535 will support a maximum SCLK of
133MHz (peak DMA bandwidth). At this rate, and with no
external FIFO, the MemDMA could sustain a transfer (32
bit word) rate of 133M/10 (nine cycles are required for bus
acquisition plus one to make the next transfer) or 13.3M
words/second. Note, however, that the SCLK of the
ADSP-BF535 is derived from the core clock (CCLK).
Here, there are four available divide ratios: 2, 2.5, 3, and 4.
a
As a result, one possible combination of CCLK and divisor
that will allow a 133MHz SCLK is CCLK = 266MHz and
CCLK/SCLK = 2. If the core must run at 300MHz, the
highest SCLK that can be obtained is 120MHz
(divisor=2.5) to stay under the maximum 133MHz. Now,
since the ASYNC memory interface is 32 bits wide, one
can pack up to two 16-bit samples (in this case I and Q)
into each word. This effectively halves the word rate that
the DSP must process (with a 15.36MSPS converter
sample rate, the DSP will “see” 7.68MSPS). The highest
external converter sample rate, though, that the MemDMA
will support under these conditions is 2 * 120/10 =
24MSPS. Furthermore, the SCLK must be an integer
multiple of the converter sample rate to ensure proper
phase alignment between converter timing and DSP timing
and eliminate the need for any external FIFOs. So, if the
120MHz SCLK must be evenly divisible by the sample
rate, the highest even divisor of 120 is 12. Therefore, the
highest converter sample rate that the ADSP-BF535 will
support is 2 * 12M = 24MSPS. Again, the DSP will only
process half of this rate, 12MSPS, and this is, in fact, equal
to the maximum rate that the MemDMA can sustain, 12M
words/second. Note that higher sample rates can be
processed by the ADSP-BF535 by including small external
FIFOs between the converter(s) and the EBIU.
It is again noted that this application dictated a 15.36MSPS
converter sample rate driven by OFDM requirements. To
obtain a SCLK that is an integer multiple of this converter
sample rate, then, one must choose a Phase Locked Loop
(PLL) Multiplier that is an integer multiple, in turn, of one
of the four available divisor ratios (2, 2.5, 3, or 4). The
maximum CCLK allowed is 276.48MHz (using a PLL
multiplier of 18). This, in turn, limits the SCLK to the
integer multiple 276.48/3 = 92.16MHz (a divide ratio of 2
would give an SCLK over the 133MHz maximum). Under
these constraints, the maximum sustained rate that the
MemDMA can support is 92.16/10 = 9.21 words/second.
DMA Considerations
Careful consideration must be given to the combined,
required, “sustained” DMA performance. Since the
MemDMA is a shared resource over the DMA bus (DAB),
other DMA activity is arbitrated on this bus. This
application required a 10Mbit/second serial channel on a
SPORT that also must arbitrate for the DAB. This will
consume an additional 625K words/second at 16 bits/word
of DMA bandwidth. The ADSP-BF535 Blackfin processor
supports a total of 133M words/second peak DMA
bandwidth, and the SPORT has higher arbitration priority
over the MemDMA (see Table 1). Given this, the SPORT
DMA should effectively utilize the ten-cycle (10) delay
previously discussed and allow most, if not all, of the
9.21M words/seconds to be used by the MemDMA. There
Interfacing the ADSP-BF535 Blackfin® Processor to High-Speed Converters (like those on the AD9860/2) over the
External Memory Bus (EE-162) Page 3 of 24
are 9.21M – 7.68M (15.36/2) = 1.53M words/second of
additional bandwidth which should provide enough margin
for a sustained 7.68MSPS.
Analysis of the DMA engine within the ADSP-BF535
reveals a few other considerations. While the DMA engine
supports two types of DMA transfers: descriptor-based and
autobuffer-based, the ADSP-BF535 MemDMA controller
does not support autobuffer-based DMA. Therefore,
descriptor-based transfers must be used. The descriptor
fetch from L1/L2 memory involves two (2) five-word
block moves, one for the source descriptor and another for
the destination descriptor. Additionally, the MemDMA has
a 16-entry 32-bit FIFO that is filled from the source and
emptied from the destination. If both descriptors are loaded
simultaneously, this requires 39 SCLK cycles (worst case)
from L2. The destination descriptor load has priority over
the source load to avoid overrunning the FIFO. Thus, in
this example, the amount of time required to load both
descriptors simultaneously is 1/92.16M * 39 = 423
nanoseconds. The DMA engine descriptor load
performance is best when the descriptors are loaded from
L2 memory. If the descriptors are located in L1 memory,
there are additional delays. The source plus destination
descriptors’ load time from L1 is 65 SCLK cycles worst
case. To effectively process data at these sample rates,
ping-pong buffers are normally used (this design utilizes
two (2) 1024-word buffers). This technique allows data to
be filled into one buffer while the core processes the other
buffer. See the Appendix for the Blackfin assembly code
that utilizes the MemDMA to pull data in from an ADC
and “ping-pongs” between two internal buffers. As a
reference, the complete VisualDSP++ 3.1 project is
available from ADI.
There are two phases of operation that must be analyzed.
First, samples must be received from the ADC into the
DSP (receiver TDM phase) and secondly, samples must be
a
transmitted from the DSP to the DAC (transmitter TDM
phase).
Receiver TDM Phase
During the receive phase (i.e. data from the ADC), the data
movement is in this direction: ADC → EBIU (source) →
MemDMA → FIFO → L1/L2 (destination). At the
15.36MHz converter sample rate, a new 32-bit sample
arrives at the DSP every 1/7.68M = 130.2 nanoseconds. As
seen from the descriptor load time latency, 423
nanoseconds, something must be done to avoid
overrunning the DSP and losing samples. Fortunately, the
converters are attached to an external bus, and the address
bus is not being used. Thus, when moving samples into
the DSP, one can setup the source descriptor with the
maximum transfer count, 65536 words, and destination
descriptor with intended ping-pong buffer transfer size,
1024 words. In this way, upon interrupt from the core
every 1024 words, only the destination descriptor is
reloaded, and the load time is reduced to 20 SCLKs *
1/92.16M = 217 nanoseconds. As previously mentioned,
this design utilizes a TDM scheme in which the ADC and
DAC occupy time slices. The multiplex rate is a variable 58 milliseconds. Since the ADC and DAC data is
interleaved, at a worst case, the interface changes from
receiver to transmitter and vice versa every 8 milliseconds.
Therefore, 65536 words * 130.2 nanoseconds or 8.5
milliseconds is adequate time, and the source descriptor
only needs to be setup once at the beginning of each
receiver TDM phase. Finally, the 16-entry MemDMA
FIFO “hides” the destination descriptor load time because
the source is still filling the FIFO while the destination
descriptor is being loaded from memory. In a worst-case
scenario, the MemDMA FIFO will only accumulate a few
samples of data before the descriptor is reloaded. Then,
these samples are burst into memory. So, the need for an
external FIFO on the receiver side is eliminated and no
samples are lost.
Logic Overview and Timing
In avoiding the need for FIFOs in the external logic, it is
still important to synchronize the converter clocks to the
DSP SysCLKSCLK. Depending on the sample rate, this
limits the available ADSP-BF535 clocking options.
Minimally, SCLK must be evenly divisible by the
converter sample rate, and CCLK may need to be evenly
divisible by the converter sample rate as well (there is only
one non-integer divisor, 2.5, and this may not be useable in
some cases). External latches or buffers must be used to
align the data from the converters with the timing of the
DSP (See Figures 2 and 3 for sample skew and delay).
The four-wire DSP SPI port is directly connected to the
AD9860/2 SPI port. To ensure proper power sequencing
and initialization, the DSP should reset the converter(s). In
an effort to further reduce the pin count of the external
logic, another option available on the AD9860/2 (not
shown here) allows two (2) 10/12-bit ADC values to be
time-multiplexed onto a single 10/12-bit RXDATA bus.
This would eliminate one of the two (2) 10/12-bit buses,
but it requires the external logic to de-multiplex the data
before it is transmitted to the DSP.
All data movement is controlled or mastered by the
MemDMA within the Blackfin processor. When the ADC
data is read (see Figure 2), the external logic must drive the
data and the ARDY signal. The external logic must sample
the /AOE pin to check when data can be driven to the
ADSP-BF535. The /AOE signal indicates to the external
logic that the DMA controller is ready to take data. The
receiver three-state machine is shown at the bottom of
Figure 2.
When data is being sent out (see Figure 3) to the DAC, the
external logic has to sample the /AWE signal and then
drive ARDY. /AWE indicates to the external logic when
the DMA controller is ready with new data. The
transmitter four-state machine is shown at the bottom of
Figure 3.
Transmitter TDM Phase
During the transmit phase (i.e. DAC), data movement is in
the opposite direction: L2/L1 (source) → MemDMA →
FIFO → EBIU (destination) → DAC. Unlike the previous
mode, the source descriptor must be updated every 1024
words. This will require 20 SCLK cycles or 217
nanoseconds. However, since the MemDMA (9.21M
words/second) is running slightly faster than the sample
rate (7.68M words/second), this should maintain 16
samples in the MemDMA FIFO, which will feed the DAC
while the descriptor loads. The destination descriptor
transfer count can be fixed at 65536 words. Again, no
external FIFO is required and no samples are lost.
Interfacing the ADSP-BF535 Blackfin® Processor to High-Speed Converters (like those on the AD9860/2) over the
External Memory Bus (EE-162) Page 4 of 24
Conclusions
Even though the ADSP-BF535 Blackfin processor was not
specifically designed to interface to high-speed parallel
converters, this engineering note provides a low-cost
“FIFO-free” solution for interfacing to both ADCs and
DACs with sample rates up to 24MSPS if the core is
constrained to run at 300MHz. If the ADSP-BF535 core
can be clocked specifically at 264MHz, then the highest
converter sample rate is limited only by the maximum
SCLK that the ADSP-BF535 can support (133MHz) and
the inter-burst 10-cycle MemDMA latency: 2 * 133M/10 =
26.6MHz.
a
In summary, the following table is included as a set of
rules specific to the ADSP-BF535 Blackfin processor:
The Ten Commandments of
Interfacing to the ADSP-BF535
1) The ADSP-BF535’s maximum allowed core frequency
is 300MHz.
2) ADSP-BF535's maximum allowed system clock
(SCLK) is 133MHz.
3) ADSP-BF535's MemDMA has a worst-case ten-cycle
(10) reacquisition latency every time the DMA bus is
relinquished.
4) ADSP-BF535's SCLK shalt be derived from CCLK,
and there are four available divide ratios between
CCLK and SCLK (2, 2.5, 3, and 4).
5) If the ADSP-BF535's core shalt run at the maximum
(300MHz), then the maximum SCLK is 120MHz. See
Commandments 1, 2, and 4.
6) The maximum ADSP-BF535 MemDMA rate is
133M/10 = 13.3Mwords/sec. See Commandments 2
and 3.
7) In order to obtain the fastest ADSP-BF535 SCLK
(133MHz) and MemDMA transfer rate
(13.3Mwords/sec), the core shalt not run at maximum,
but at 266MHz with a CCLK/SCLK divide ratio of 2.
"Only" 34 MIPs are not utilized!
8) When not using external FIFOs, the ADSP-BF535's
core shalt operate minimally at an order of magnitude
(10X) greater than the greatest external interfacing
converter sample rate to provide sufficient processing
bandwidth.
9) To eliminate the need for external FIFOs, the ADSPBF535's SCLK shalt be an integer multiple of the
external interfacing converter sample rates to ensure
proper phase alignment between converter timing and
DSP timing.
10) To halve the sample rate that the DSP must process, the
external logic shalt pack up to two 16-bit samples into
each 32-bit word.
Interfacing the ADSP-BF535 Blackfin® Processor to High-Speed Converters (like those on the AD9860/2) over the
External Memory Bus (EE-162) Page 5 of 24
a
21535
PF[X1]
AMS[X]
AWE
ARDY
7.68 MHz
DATA[31:20, 15:2]
7.68 MSPS
AOE
ARE
ABE[3]
CLKOUT
92.16 MHz
XTALI
15.36 MHz
GPIOPF[X2]
CS
+/-0
RESET
TXBLANK
FPGA / PALAD9860/2ADSP-
RXSYNC
0 MHz
TXSYNC
MUX
MUX
Buffer
3.84 MHz
TXDATA[13:0]
7.68 MSPS
RXDATA[23:0]
7.68 MSPS
15.36 MSPS
4x
3.84 MSPS
4x
15.36 MSPS
1/2
7.68 MSPS
1/2
CLKOUT2
7.68 MHz
CLKOUT1
30.72 MHz
XTAL OSC
3.3V
CLKSEL
VCXOp
30.72 MHz
30.72 MHz
SPISEL[X]SEN
SCKx
MISOx
MOSIx
SCLK
SDO
SDIO
SSEL1
SSEL0
TXSYNC
TXDATA
RXDATA
RXSYNC
CLKOUT1
CLKOUT2
TX-ATX-BTX-ATX-B
RX-AB
RX-ABRX-ABRX-AB
CCLK=276.48 MHz
SysCLK=92.16 MHz
Martin Kessler 01/18/02, Analog Devices
3.84 MHz
30.72 MHz
7.68 MHz
Figure 1: External Logic
Interfacing the ADSP-BF535 Blackfin® Processor to High-Speed Converters (like those on the AD9860/2) over the
External Memory Bus (EE-162) Page 6 of 24
a
CCLK
SCLK
AMS[x]
AWE
ARDY
DATA
AOE
ARE
ABE[3]
CLKOUT2
Set-up
Hold
4
Read
Access
1
25
123456789
Hold4Set-upRead
10 11 12
Data0 AB [31:20, 15:4]
Use ASYNC
Clear
Access
1
2
123456789
12345
5
Data1 AB [31:20, 15:4]
Hold
4
276.48 MHz
1011 1 2
92.16 MHz
7.68 MHz
7.68 MSPS
7.68 MHz
RXDATA
CLKOUT1
Consecutive Read Access from FPGA, RX-path AD9860/2
3) ARDY low (use asynchronous clear); Sample RX Data AB
4) ARDY low; Temp=1; CLKOUT2==low ?
RXData1 AB [23:0]
12
/ARE==low ?
truefalse
RXData2 AB [23:0]
3412
7.68 MSPS
30.76 MHz
false
false
Figure 2: Receive Timing and State Machine
Interfacing the ADSP-BF535 Blackfin® Processor to High-Speed Converters (like those on the AD9860/2) over the
External Memory Bus (EE-162) Page 7 of 24
a
CCLK
SysCLK
AMS[x]
AWE
ARDY
DATA
AOE
ARE
ABE[3]
CLKOUT2
Set-up
Hold
1
6
123456789
Write Access
12
Data1 A [15:2]Data1 B [15:2]
Hold
7
10 11 12
Set-up
4
1
123456789
Write Access
7
Hold
4
276.48 MHz
1011 1 2
92.16 MHz
7.68 MHz
7.68 MSPS
7.68 MHz
TXDATA
CLKOUT1
TXSYNC
Consecutive Write Access to FPGA, TX-path AD9860/2
If /AMS==low and /AOE==high:
1) /AWE==low ?
true
Martin Kessler 01/18/02, Analog Devices
2) /AWE==low ? yes: Sample DATA and ABE[3]
3) Output new TX DATA and TXSYNC one cycle after CLKOUT2 high detected
TXData0 B [13:0]
TXData1 A [13:0]
false
7.68 MSPS
30.72 MHz
3.84 MHz
Figure 3: Transmit Timing and State Machine
Interfacing the ADSP-BF535 Blackfin® Processor to High-Speed Converters (like those on the AD9860/2) over the
External Memory Bus (EE-162) Page 8 of 24
Loading...
+ 16 hidden pages
You need points to download manuals.
1 point = 1 manual.
You can buy points or you can get point for every manual you upload.