Analog Devices EE162 (2) Application Notes

Engineer To Engineer Note EE-162
Technical Notes on using Analog Devices’ DSP components and development tools
Phone: (800) ANALOG-D, FAX: (781) 461-3010, EMAIL: dsp.support@analog.com, FTP: ftp.analog.com, WEB: www.analog.com/dsp
Interfacing the ADSP-21535 to High-Speed Converters (like those on the AD9860/2) over the External Memory Bus
Last modified: 2/12/02 Contributed By: Jeff Sondermeyer, Jeritt Kent, Martin Kessler, and Rick Gentile
Introduction
In the 1970s and 1980s, high-speed mixed-signal designs were often constrained by the limitations of digital circuitry, not analog. High-speed parallel converters (>10MSPS), for example, have been available from industry leaders like Analog Devices Inc. (NYSE: ADI) since the 1970s. More and more applications are demanding intensive real-time algorithms. In addition, higher sample rates (14 bits at greater than 50MSPS) are available from both analog-to-digital converters (ADCs) and digital-to-analog converters (DACs). These factors mandate faster programmable general-purpose (GP) digital signal processors (DSPs) to handle the challenges presented by these high-speed designs. Until recently, most designers were forced to interface high-speed parallel converters to Application Specific ICs (ASICs) or fast Field Programmable Gate Arrays (FPGAs). Devices like these are capable of resolving the many required simultaneous parallel digital operations but are often inflexible and can be prohibitively expensive. Now, with the recent launch of the Blackfin™ DSP (ADSP-21535), ADI has a programmable GP 16-bit fixed-point vector DSP that can sustain the input/output (I/O) and core throughput required to process data from many of these converters. In particular, the ADSP-21535’s core can be clocked at 300MHz. Depending on the core clock frequency, a maximum I/O or system clock (SCLK) of 133MHz can be achieved. This SCLK should not be confused with the serial clock for the Serial Peripheral Interface (SPI).
Advantages of using a GP DSP
One of the biggest advantages of GP programmable DSPs is that these solutions are typically much lower in cost than their closest digital processing counterparts, FPGAs and ASICs. Additionally, GP DSP design cycles are much shorter, which allows for a faster time to market. Some companies must hire or consult professionals with specialized skills to design FPGAs/ASICs. Companies may even be forced to send their intellectual property (IP) out-of-house, which involves certain risks to confidentiality (hardware, firmware, and software). On the other hand, GP DSP code can be converted to Read-Only-Memory-based (ROM) code or be masked into a DSP, like the ADSP-2153x, which further protects IP. Finally, GP DSPs are fully programmable, unlike an ASIC implementation, where every change requires a costly redesign (time and money). These factors alone are driving many engineers to reconsider the GP DSP as the solution of choice, especially as GP DSPs approach “Pentium class” core rates.
Generally speaking, the DSP needs to be clocked at least an order of magnitude (10X) faster than the converter’s sample rate to guarantee sufficient data processing bandwidth. The amount of processing bandwidth needed depends on the DSP’s interface capabilities, which are, in turn, influenced by several other factors, including block processing versus sample processing, the existence of a Direct Memory Access (DMA) controller, multi-ported memory, and whether external FIFOs are used. Fortunately, the first instantiation of ADI’s Blackfin DSP family, the ADSP-21535, has a full DMA controller that operates independently of the core, with multi-ported Level 1 (L1) and Level 2 (L2) memories. The combination of core speed, an independent DMA controller, and a large multi-ported on-board memory (308K bytes) allows the ADSP-21535 to perform efficient block processing at high data rates. For example, if the Revision 2.2-compliant, 33MHz, 32-bit Peripheral Component Interconnect (PCI) interface is used (not shown in this application), transfer bandwidths approaching 132MB/sec can be achieved.
The ADSP-21532 has a dedicated Parallel Peripheral Interface (PPI) to connect directly to high-speed converters. Note that the ADSP-21535 does not have this interface. However, the External Bus Interface Unit (EBIU) of the ADSP-21535 provides interfaces to asynchronous (ASYNC) external memories. If the PCI bus must be used for other system communications, the EBIU is the only available parallel interface to connect to a high-speed converter. Combining the DSP-mastered, asynchronous control of this port with the synchronous, continuous data stream of converters provides a challenge for system designers.
Copyright 2002, Analog Devices, Inc. All rights reserved. Analog Devices assumes no responsibility for customer product design or the use or application of customers’ products or for any infringements of patents or rights of others which may result from Analog Devices assistance. All trademarks and logos are property of their respective holders. Information furnished by Analog Devices Applications and Development Tools Engineers is believed to be accurate and reliable, however no responsibility is assumed by Analog Devices regarding the technical accuracy of the content provided in all Analog Devices’ Engineer-to-Engineer Notes.
This application note will cover one particular hardware implementation utilizing a low pin count, low cost Programmable Array Logic (PAL), Complex Programmable Logic Device (CPLD), or FPGA. This logic will perform the control functions between the AD9860/2 Mixed Signal Front-End (MxFE) and the ASYNC external memory bus of the ADSP-21535. Although the DSP code will not be discussed in this note, the assembly code is included in the Appendix as a reference. The application depicted in Figure 1 below is for an Orthogonal Frequency Division Multiplexed (OFDM) wireless portable terminal. Please note that the ADC and DAC were time-shared (Time Division Multiplexed or TDM) over the ASYNC interface of the DSP. (The information given here applies equally to other parallel high-speed ADCs and DACs.)
This engineering note assumes that the reader has prior knowledge of the ADSP-21535 and the AD9860/2. If you are unfamiliar with the ADSP-21535, please refer to the “ADSP-2153x/21535 Blackfin DSP Hardware Reference”. The datasheet for the AD9860/2 can be found at www.analog.com.
Design Goals
One of the early design goals for this project was to minimize the amount of external control logic necessary to interface the DSP and the converter(s). Driven by cost, engineering wanted to eliminate any FIFOs or memory within the external logic device. An additional constraint was to avoid routing the data buses through the logic, thereby reducing the number of pins, package size, and cost of the logic device. The initial design shown in Figure 1 combines all functions (including data latching) into a single logic device. However, production models of this design will utilize inexpensive tri-state-able latches driven by a logic device. These latches or buffers will multiplex (pack) the samples from the DSP memory interface to the 12/14-bit DAC as well as buffer or de-multiplex (unpack) the 10/12-bit ADC samples to the DSP memory interface.
Design Challenges
One of the key factors in any mixed-signal/DSP design is a solid understanding of the trade-offs between the devices. The following discussion will illustrate the various tradeoffs that must be considered when interfacing ADCs/DACs to the ADSP-21535.
The OFDM modulation scheme for this design drove the converter sample rate for this application to be 15.36MSPS. The AD9860/2 has a dual 10/12-bit, 64MSPS ADC as well as a dual 12/14-bit, 128MSPS DAC. Unlike ADI’s SHARC® processors, which have DMA-Request and DMA-Grant signals (i.e. DMA can be mastered from an external device), the ADSP-21535 has only one set of internal memory DMA channels (memDMA), which must be mastered from the DSP. In addition, when the ADSP-21535 ASYNC interface is connected to devices that do not contain FIFOs or memory, all latencies must be understood. Every time the memDMA relinquishes the bus after a burst of eight (8) transfers, it requires ten (10) SCLK cycles to begin the next transfer.
Future Blackfin derivatives will have programmable priority levels for the DMA controller as well as a dedicated high-speed parallel interface with DMA-Request and DMA-Grant signaling. With a dedicated PPI on future Blackfin products, the ASYNC memory interface will not be required to connect to parallel converters.
This approach assumes that the memory interface is dedicated to the converters. Multiplexing external SRAM/SDRAM memory with the converter(s) would be difficult and is not recommended, especially considering that there is only one memDMA, and it would need to be shared. The existence of a large on-board L2 memory (256K bytes) minimizes the need for any external memory. However, multiplexing the parallel converter(s) with a Flash or EPROM for boot purposes is permissible.
This design uses a TDM time-slice approach for sharing the external bus between the ADCs and the DACs. Simultaneous access is not possible with the ADSP-21535 because, as mentioned previously, there is only one memory interface that either does a read or a write, and there is only one set of memDMA channels (source and destination).
The ADSP-21535 will support a maximum SCLK of 133MHz (peak DMA bandwidth). At this rate, and with no external FIFO, the memDMA could sustain a transfer rate (in 32-bit words) of 133M/10 = 13.3M words/second (nine cycles are required for bus acquisition plus one to make the next transfer). Note, however, that the SCLK of the ADSP-21535 is derived from the core clock (CCLK). Here, there are four available divide ratios: 2, 2.5, 3, and 4. As a result, one possible combination of CCLK and divisor that will allow a 133MHz SCLK is CCLK = 266MHz and CCLK/SCLK = 2. If the core must run at 300MHz, the highest SCLK that can be obtained is 120MHz (divisor = 2.5) to stay under the maximum 133MHz. Now, since the ASYNC memory interface is 32 bits wide, one can pack up to two 16-bit samples (in this case I and Q) into each word. This effectively halves the word rate that the DSP must process (with a 15.36MSPS converter sample rate, the DSP will “see” 7.68M words/second). The highest external converter sample rate that the memDMA will support under these conditions is 2 * 120M/10 = 24MSPS. Furthermore, the SCLK must be an integer multiple of the converter sample rate to ensure proper phase alignment between converter timing and DSP timing and to eliminate the need for any external FIFOs. So, if the 120MHz SCLK must be evenly divisible by the word rate, the highest word rate that still leaves ten SCLK cycles per transfer is 120M/10 = 12M words/second. Therefore, the highest converter sample rate that the ADSP-21535 will support is 2 * 12M = 24MSPS. Again, the DSP will only process half of this rate, 12M words/second, and this is, in fact, equal to the maximum rate that the memDMA can sustain. Note that higher sample rates can be processed by the ADSP-21535 by including small external FIFOs between the converter(s) and the EBIU.
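The clock arithmetic above can be checked with a short Python sketch. It uses only the figures quoted in the text (133MHz SCLK limit, the four divide ratios, ten SCLK cycles per word, two samples per word); the function names are illustrative and are not part of any ADI tool.

```python
from fractions import Fraction

# Figures quoted in the text for the ADSP-21535.
SCLK_MAX_MHZ = 133
DIVIDE_RATIOS = [Fraction(2), Fraction(5, 2), Fraction(3), Fraction(4)]
CYCLES_PER_WORD = 10   # nine cycles of bus acquisition plus one transfer
SAMPLES_PER_WORD = 2   # two 16-bit samples packed into each 32-bit word


def best_sclk_mhz(cclk_mhz):
    """Highest SCLK (MHz) reachable from cclk_mhz without exceeding the limit."""
    candidates = [Fraction(cclk_mhz) / ratio for ratio in DIVIDE_RATIOS]
    return max(s for s in candidates if s <= SCLK_MAX_MHZ)


def max_sample_rate_msps(sclk_mhz):
    """Converter rate (MSPS) when one packed word moves every ten SCLK cycles."""
    return SAMPLES_PER_WORD * Fraction(sclk_mhz) / CYCLES_PER_WORD


sclk_300 = best_sclk_mhz(300)   # core at its maximum
sclk_266 = best_sclk_mhz(266)   # core slowed to reach the fastest SCLK
print(float(sclk_300), float(max_sample_rate_msps(sclk_300)))
print(float(sclk_266), float(max_sample_rate_msps(sclk_266)))
```

Exact fractions are used so that the 2.5 divide ratio does not introduce rounding error.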
It is again noted that this application dictated a 15.36MSPS converter sample rate driven by OFDM requirements. To obtain an SCLK that is an integer multiple of this converter sample rate, one must choose a Phase Locked Loop (PLL) multiplier that is, in turn, an integer multiple of one of the four available divisor ratios (2, 2.5, 3, or 4). The maximum CCLK allowed is 276.48MHz (using a PLL multiplier of 18). This, in turn, limits the SCLK to the integer multiple 276.48/3 = 92.16MHz (a divide ratio of 2 would give an SCLK over the 133MHz maximum). Under these constraints, the maximum sustained rate that the memDMA can support is 92.16M/10 = 9.21M words/second.
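As a rough illustration of this selection, the sketch below searches for the highest CCLK that meets the stated constraints. It assumes the PLL input equals the 15.36MHz sample clock (implied by 18 * 15.36 = 276.48) and a hypothetical multiplier range of 1 to 31; neither assumption is confirmed by this note.

```python
from fractions import Fraction

SAMPLE_RATE_MHZ = Fraction("15.36")   # assumed to be the PLL input clock
CCLK_MAX_MHZ = 300
SCLK_MAX_MHZ = 133
DIVIDE_RATIOS = [Fraction(2), Fraction(5, 2), Fraction(3), Fraction(4)]
CYCLES_PER_WORD = 10


def pick_clocks(max_multiplier=31):
    """Return (multiplier, ratio, CCLK, SCLK) for the highest legal CCLK.

    A combination is legal when CCLK and SCLK stay within their limits and
    multiplier/ratio is an integer, so SCLK is an integer multiple of the
    sample rate. max_multiplier is a hypothetical search bound.
    """
    for mult in range(max_multiplier, 0, -1):
        cclk = SAMPLE_RATE_MHZ * mult
        if cclk > CCLK_MAX_MHZ:
            continue
        for ratio in DIVIDE_RATIOS:          # smallest ratio first
            sclk = cclk / ratio
            is_integer_multiple = (Fraction(mult) / ratio).denominator == 1
            if sclk <= SCLK_MAX_MHZ and is_integer_multiple:
                return mult, ratio, cclk, sclk
    return None


mult, ratio, cclk, sclk = pick_clocks()
memdma_mwords = sclk / CYCLES_PER_WORD
print(mult, float(ratio), float(cclk), float(sclk), float(memdma_mwords))
```

The search prefers the highest core clock, as the text does, and only then picks the fastest SCLK available at that core clock.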
DMA Considerations
Careful consideration must be given to the combined, required, “sustained” DMA performance. Since the memDMA is a shared resource on the DMA bus (DAB), other DMA activity is arbitrated on this bus. This application required a 10Mbit/second serial channel on a SPORT that also must arbitrate for the DAB. At 16 bits/word, this will consume an additional 625K words/second of DMA bandwidth. The ADSP-21535 supports a total of 133M words/second peak DMA bandwidth, and the SPORT has higher arbitration priority than the memDMA (see Table 1). Given this, the SPORT DMA should effectively utilize the ten-cycle (10) delay previously discussed and allow most, if not all, of the 9.21M words/second to be used by the memDMA. There are 9.21M - 7.68M (15.36M/2) = 1.53M words/second of additional bandwidth, which should provide enough margin for a sustained 7.68M words/second.
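The bandwidth budget can be tallied as follows. This is a back-of-the-envelope sketch using only the numbers in the text; the "pessimistic" figure assumes, contrary to the expectation stated above, that every SPORT word displaces a memDMA word.

```python
SCLK_HZ = 92_160_000
CYCLES_PER_WORD = 10          # memDMA inter-burst latency from the text
CONVERTER_SPS = 15_360_000
SAMPLES_PER_WORD = 2          # I and Q packed into one 32-bit word
SPORT_BITS_PER_S = 10_000_000
SPORT_BITS_PER_WORD = 16

memdma_words = SCLK_HZ // CYCLES_PER_WORD               # sustained memDMA rate
converter_words = CONVERTER_SPS // SAMPLES_PER_WORD     # rate the DSP must absorb
sport_words = SPORT_BITS_PER_S // SPORT_BITS_PER_WORD   # SPORT DMA load

margin = memdma_words - converter_words
# Assumption for illustration only: the SPORT steals one memDMA slot per word.
pessimistic_margin = margin - sport_words

print(memdma_words, converter_words, sport_words, margin, pessimistic_margin)
```

Even under the pessimistic assumption the margin stays positive, which is consistent with the conclusion drawn in the text.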
DAB Master                   Arbitration Priority
SPORT0 RCV DMA Controller    0 - highest
SPORT1 RCV DMA Controller    1
SPORT0 XMT DMA Controller    2
SPORT1 XMT DMA Controller    3
USB DMA Controller           4
SPI0 DMA Controller          5
SPI1 DMA Controller          6
UART0 RCV Controller         7
UART1 RCV Controller         8
UART0 XMT Controller         9
UART1 XMT Controller         10
Memory DMA Controller        11 - lowest
Table 1: Arbitration Priority
Analysis of the DMA engine within the ADSP-21535 reveals a few other considerations. While the DMA engine supports two types of DMA transfers, descriptor-based and autobuffer-based, the ADSP-21535 memDMA controller does not support autobuffer-based DMA. Therefore, descriptor-based transfers must be used. The descriptor fetch from L1/L2 memory involves two (2) five-word block moves, one for the source descriptor and another for the destination descriptor. Additionally, the memDMA has a 16-entry 32-bit FIFO that is filled from the source and emptied to the destination. If both descriptors are loaded simultaneously, this requires 39 SCLK cycles (worst case) from L2. The destination descriptor load has priority over the source load to avoid overrunning the FIFO. Thus, in this example, the amount of time required to load both descriptors simultaneously is 1/92.16M * 39 = 423 nanoseconds. The DMA engine descriptor load performance is best when the descriptors are loaded from L2 memory. If the descriptors are located in L1 memory, there are additional delays. The source plus destination descriptors’ load time from L1 is 65 SCLK cycles worst case.
To effectively process data at these sample rates, ping-pong buffers are normally used (this design utilizes two (2) 1024-word buffers). This technique allows data to be filled into one buffer while the core processes the other buffer. See the Appendix for the Blackfin™ assembly code that utilizes the memDMA to pull data in from an ADC and “ping-pongs” between two internal buffers. As a reference, the complete VisualDSP++ 2.0 project is available from ADI.
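The ping-pong idea can be modeled in a few lines of Python. This is only a behavioral sketch, not the assembly code from the Appendix; the class and method names are hypothetical. It also derives the per-buffer time budget from the 1024-word buffer size, the 7.68M words/second rate, and the 276.48MHz CCLK given in the text.

```python
BUFFER_WORDS = 1024
WORD_RATE_HZ = 7_680_000
CCLK_HZ = 276_480_000


class PingPong:
    """Two buffers: the DMA fills one while the core processes the other."""

    def __init__(self, size=BUFFER_WORDS):
        self.buffers = [[0] * size, [0] * size]
        self.fill_index = 0   # buffer currently being filled by the DMA
        self.pos = 0

    def dma_write(self, word):
        """Store one word; return the index of a just-completed buffer, else None."""
        self.buffers[self.fill_index][self.pos] = word
        self.pos += 1
        if self.pos == len(self.buffers[self.fill_index]):
            done = self.fill_index
            self.fill_index ^= 1   # swap: the next word goes to the other buffer
            self.pos = 0
            return done
        return None


# Time the core has to process one buffer before the next one is ready.
fill_time_us = BUFFER_WORDS * 1e6 / WORD_RATE_HZ
core_cycles_per_buffer = BUFFER_WORDS * CCLK_HZ // WORD_RATE_HZ

# Small demonstration with 4-word buffers.
pp = PingPong(size=4)
completed = [i for i in (pp.dma_write(w) for w in range(8)) if i is not None]
print(completed, round(fill_time_us, 1), core_cycles_per_buffer)
```

In the real system the "completed" event would correspond to the DMA interrupt that tells the core which buffer is ready for processing.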
There are two phases of operation that must be analyzed. First, samples must be received from the ADC into the DSP (receiver TDM phase) and secondly, samples must be transmitted from the DSP to the DAC (transmitter TDM phase).
Receiver TDM Phase
During the receive phase (i.e. data from the ADC), the data movement is in this direction: ADC -> EBIU (source) -> memDMA FIFO -> L1/L2 (destination). At the 15.36MSPS converter sample rate, a new 32-bit word (two packed samples) arrives at the DSP every 1/7.68M = 130.2 nanoseconds. Given the 423-nanosecond descriptor load latency, something must be done to avoid overrunning the DSP and losing samples. Fortunately, the converters are attached to an external bus, and the address bus is not being used. Thus, when moving samples into the DSP, one can set up the source descriptor with the maximum transfer count, 65536 words, and the destination descriptor with the intended ping-pong buffer transfer size, 1024 words. In this way, upon each interrupt every 1024 words, only the destination descriptor is reloaded, and the load time is reduced to 20 SCLKs * 1/92.16M = 217 nanoseconds. As previously mentioned, this design utilizes a TDM scheme in which the ADC and DAC occupy time slices. The multiplex rate is a variable 5-8 milliseconds. Since the ADC and DAC data is interleaved, in the worst case the interface changes from receiver to transmitter and vice versa every 8 milliseconds. Therefore, 65536 words * 130.2 nanoseconds, or 8.5 milliseconds, is adequate time, and the source descriptor only needs to be set up once at the beginning of each receiver TDM phase. Finally, the 16-entry memDMA FIFO “hides” the destination descriptor load time because the source is still filling the FIFO while the destination descriptor is being loaded from memory. In a worst-case scenario, the memDMA FIFO will only accumulate a few samples of data before the descriptor is reloaded. Then, these samples are burst into memory. So, the need for an external FIFO on the receiver side is eliminated and no samples are lost.
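The receive-phase timing argument can be sanity-checked numerically. The sketch below uses the cycle counts and rates from the text; treating the FIFO as empty when the descriptor reload begins is an assumption made for illustration.

```python
import math
from fractions import Fraction

SCLK_HZ = 92_160_000
WORD_RATE_HZ = 7_680_000       # packed 32-bit words per second
FIFO_DEPTH = 16                # memDMA FIFO entries
DEST_RELOAD_CYCLES = 20        # destination descriptor reload only
SOURCE_COUNT_WORDS = 65536     # maximum source transfer count
WORST_TDM_SLICE_MS = 8


def words_arriving(sclk_cycles):
    """Upper bound on words arriving from the ADC during sclk_cycles."""
    return math.ceil(Fraction(sclk_cycles * WORD_RATE_HZ, SCLK_HZ))


reload_ns = DEST_RELOAD_CYCLES * 1e9 / SCLK_HZ
fifo_words = words_arriving(DEST_RELOAD_CYCLES)
source_span_ms = SOURCE_COUNT_WORDS * 1e3 / WORD_RATE_HZ

print(round(reload_ns), fifo_words, round(source_span_ms, 2))
```

Under these assumptions at most two words accumulate during a reload, well within the 16-entry FIFO, and one source descriptor spans longer than the worst-case 8 millisecond time slice.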
Transmitter TDM Phase
During the transmit phase (i.e. DAC), data movement is in the opposite direction: L2/L1 (source) -> memDMA FIFO -> EBIU (destination) -> DAC. Unlike the previous mode, the source descriptor must be updated every 1024 words. This will require 20 SCLK cycles or 217 nanoseconds. However, since the memDMA (9.21M words/second) is running slightly faster than the sample rate (7.68M words/second), this should maintain 16 samples in the memDMA FIFO, which will feed the DAC while the descriptor loads. The destination descriptor transfer count can be fixed at 65536 words. Again, no external FIFO is required and no samples are lost.
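A similar estimate applies to the transmit side. The sketch below assumes the FIFO is full when the source descriptor reload starts and that no other DAB master takes bandwidth during recovery; both are simplifications, not statements from the hardware reference.

```python
from fractions import Fraction

SCLK_HZ = 92_160_000
WORD_RATE_HZ = 7_680_000                 # words/second consumed by the DAC path
MEMDMA_RATE_HZ = SCLK_HZ // 10           # sustained memDMA rate from the text
FIFO_DEPTH = 16
SOURCE_RELOAD_CYCLES = 20

# Words the DAC drains from the FIFO while the source descriptor reloads.
drained = Fraction(SOURCE_RELOAD_CYCLES * WORD_RATE_HZ, SCLK_HZ)

# Surplus rate at which the memDMA can refill the FIFO afterwards.
surplus_hz = MEMDMA_RATE_HZ - WORD_RATE_HZ
recover_us = float(drained / surplus_hz * 1_000_000)

print(float(drained), surplus_hz, round(recover_us, 3))
```

With fewer than two words drained per reload and a surplus of about 1.5M words/second, the FIFO refills in roughly a microsecond, far sooner than the next reload 1024 words later.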
Logic Overview and Timing
In avoiding the need for FIFOs in the external logic, it is still important to synchronize the converter clocks to the DSP SCLK. Depending on the sample rate, this limits the available ADSP-21535 clocking options. Minimally, SCLK must be evenly divisible by the converter sample rate, and CCLK may need to be evenly divisible by the converter sample rate as well (there is only one non-integer divisor, 2.5, and this may not be usable in some cases). External latches or buffers must be used to align the data from the converters with the timing of the DSP (see Figures 2 and 3 for sample skew and delay). The four-wire DSP SPI port is directly connected to the AD9860/2 SPI port. To ensure proper power sequencing and initialization, the DSP should reset the converter(s). In an effort to further reduce the pin count of the external logic, another option available on the AD9860/2 (not shown here) allows two (2) 10/12-bit ADC values to be time-multiplexed onto a single 10/12-bit RXDATA bus. This would eliminate one of the two (2) 10/12-bit buses, but it requires the external logic to de-multiplex the data before it is transmitted to the DSP.
All data movement is controlled or mastered by the memDMA within the DSP. When the ADC data is read (see Figure 2), the external logic must drive the data and the ARDY signal. The external logic must sample the /AOE pin to check when data can be driven to the ADSP-21535. The /AOE signal indicates to the external logic that the DMA controller is ready to take data. The receiver three-state machine is shown at the bottom of Figure 2.
When data is being sent out (see Figure 3) to the DAC, the external logic has to sample the /AWE signal and then drive ARDY. /AWE indicates to the external logic when the DMA controller is ready with new data. The transmitter four-state machine is shown at the bottom of Figure 3.
Conclusions
Even though the ADSP-21535 was not specifically designed to interface to high-speed parallel converters, this engineering note provides a low-cost “FIFO-free” solution for interfacing to both ADCs and DACs with sample rates up to 24MSPS if the core is constrained to run at 300MHz. If the ADSP-21535 core can be clocked specifically at 266MHz, then the highest converter sample rate is limited only by the maximum SCLK that the ADSP-21535 can support (133MHz) and the inter-burst 10-cycle memDMA latency: 2 * 133M/10 = 26.6MSPS.
In summary, the following table is included as a set of rules specific to the ADSP-21535:
The Ten Commandments of Interfacing to the ADSP-21535
1) The ADSP-21535’s maximum allowed core frequency is 300MHz.
2) The ADSP-21535’s maximum allowed system clock (SCLK) is 133MHz.
3) The ADSP-21535’s memDMA has a worst-case ten-cycle (10) reacquisition latency every time the DMA bus is relinquished.
4) The ADSP-21535’s SCLK shalt be derived from CCLK, and there are four available divide ratios between CCLK and SCLK (2, 2.5, 3, and 4).
5) If the ADSP-21535’s core shalt run at the maximum (300MHz), then the maximum SCLK is 120MHz. See Commandments 1, 2, and 4.
6) The maximum ADSP-21535 memDMA rate is 133M/10 = 13.3M words/sec. See Commandments 2 and 3.
7) In order to obtain the fastest ADSP-21535 SCLK (133MHz) and memDMA transfer rate (13.3M words/sec), the core shalt not run at maximum, but at 266MHz with a CCLK/SCLK divide ratio of 2. "Only" 34 MIPS are not utilized!
8) When not using external FIFOs, the ADSP-21535’s core shalt operate minimally at an order of magnitude (10X) greater than the greatest external interfacing converter sample rate to provide sufficient processing bandwidth.
9) To eliminate the need for external FIFOs, the ADSP-21535’s SCLK shalt be an integer multiple of the external interfacing converter sample rates to ensure proper phase alignment between converter timing and DSP timing.
10) To halve the sample rate that the DSP must process, the external logic shalt pack up to two 16-bit samples into each 32-bit word.
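The packing in Commandment 10 can be illustrated in software terms. This note does not state which half of the 32-bit word carries I and which carries Q, nor how the 10/12-bit samples are aligned within 16 bits, so the layout below (I in the low half, Q in the high half, unsigned 16-bit fields) is purely an assumption for illustration.

```python
MASK16 = 0xFFFF


def pack_iq(i_sample, q_sample):
    """Pack two 16-bit samples into one 32-bit word (assumed: I low, Q high)."""
    return ((q_sample & MASK16) << 16) | (i_sample & MASK16)


def unpack_iq(word):
    """Recover (I, Q) from a 32-bit word packed by pack_iq."""
    return word & MASK16, (word >> 16) & MASK16


word = pack_iq(0x0ABC, 0x0123)
print(hex(word), unpack_iq(word))
```

In the actual design this packing is done by the external latches, so the memDMA moves one word per pair of samples.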