MITEL PDSP16510AMAAC1R, PDSP16510AMAGCPR Datasheet

PDSP16510A MA

Stand Alone FFT Processor

Supersedes January 1997 version, DS3762 - 2.0. DS3762 - 3.0, October 1998.

The PDSP16510 performs Forward or Inverse Fast Fourier Transforms on complex or real data sets containing up to 1024 points. Data and coefficients are each represented by 16 bits, with block floating point arithmetic for increased dynamic range.

An internal RAM is provided which can hold up to 1024 complex data points. This removes the memory transfer bottleneck, inherent in building block solutions. Its organisation allows the PDSP16510 to simultaneously input new data, transform data stored in the RAM, and to output previous results. No external buffering is needed for transforms containing up to 256 points, and the PDSP16510 can be directly connected to an A/D converter to perform continuous transforms. The user can choose to overlap data blocks by either 0%, 50%, or 75%. Inputs and outputs are asynchronous to the 40MHz system clock used for internal operations.

A 1024 point complex transform can be completed in some 98µs, which is equivalent to throughput rates of 450 million operations per second. Multiple devices can be connected in parallel in order to increase the sampling rate up to the 40MHz system clock. Six devices are needed to give the maximum performance with 1024 point transforms.

Either a Hamming or a Blackman-Harris window operator can be internally applied to the incoming real or complex data. The latter gives 67dB side lobe attenuation. The operator values are calculated internally and do not require an external ROM nor do they incur any time penalty.

The device outputs the real and imaginary components of the frequency bins. These can be directly connected to the PDSP16330 in order to produce magnitude and phase values from the complex data.

Rev A B C D

Date MAR 1993 JAN 1997 OCT 1998

FEATURES

Completely self contained FFT Processor

Internal RAM supports up to 1024 complex points

16 bit data and coefficients plus block floating point for increased dynamic range

450 MIP operation gives 98 microsecond transformation times for 1024 points

Up to 40MHz sampling rates with multiple A grade devices

Internal window operator gives 67dB side lobe attenuation and needs no external ROM

132 pin surface mount package

Fig. 1. Block Diagram

NOTE

Polyimide is used as an inter-layer dielectric and as glassivation.

Polymeric material is also used for die attach, which according to the requirement in paragraph 1.2.1.b. (2) precludes categorising this device as fully compliant. In every other respect this device has been manufactured and screened in full accordance with the requirements of Mil-Std 883 (latest revision).

CHANGE NOTIFICATION

The change notification requirements of MIL-PRF-38535 will be implemented on this device type. When ordering further parts, known customers will be notified of any significant changes made since their last buy.

ORDERING INFORMATION

PDSP16510A MA GCPR (Power Ceramic QFP Package - HIREL LEVEL A Screening)

PDSP16510A MA AC1R (Power Ceramic PGA Package - HIREL LEVEL A Screening)

Fig. 2. Typical 256 Point Real Only System Performing Continuous Transforms

Pin Out for 84 PGA Package (AC84) - bottom view

Pin Out for 132 Leaded Chip Carrier (GC132)

SIGNAL TYPE DESCRIPTION

D15:0 I Data input during real only mode; the real component in complex data mode.

AUX15:0 I When DEF is active, AUX15:0 are used to define the operating mode as defined in Table 3. When DEF is inactive, AUX15:0 provide either the 16 bit imaginary component of complex input data or a second set of real only inputs.

R15:0 O These pins output the real component of the transformed data when DAV and DEN are active. Otherwise they are high impedance.

I15:0 O These pins output the imaginary component of the transformed data when DAV and DEN are active. Otherwise they are high impedance.

DEF I The high going edge of DEF is used to internally latch the contents of AUX15:0, which then define the operating mode. In the simplest system DEF is a power on reset. When DEF is low the internal control logic is reset.

SCLK I System clock used for internal computations.

S3:0 O These pins indicate the number of shifts towards the binary point which have occurred as the result of the conditional scaling logic. When the data path right shift is restricted to 2 places per pass, state 15 is used to indicate an overflow and only a total of 14 shifts is possible.

LFLG O This flag indicates that data is being loaded into the device. It goes active in response to an INEN input, and may be programmed to go inactive after a complete data block, one quarter of a data block, or one half of a data block has been loaded.

INEN I The use of this input is mode dependent. It is used either as an active low load enabling signal for the DIS strobe, or to initiate a new block load operation.

DIS I The rising edge of this asynchronous input is used to load data into the device.

DOS I The rising edge of this asynchronous input is used to dump data from the device. In most applications it may be tied to the DIS input, even if the output rate must be higher than the input rate because of overlapped data blocks. The DIS input is then internally divided down.

DAV O An active low signal that indicates that a transform is complete. Transformed data will then be output in normal sequential order using DOS. It may optionally be programmed to be delayed by 24 DOS strobes to match the delay through a PDSP16330.

DEN I This input is used to enable the data dump operation when DAV has gone active. If it is tied low the device will automatically dump data when DAV goes active. Otherwise the device will wait for the enabling signal to go low before the dump operation commences.

DISAB I Only available in the 132 pin GC package. When high the block floating logic is disabled.

VDD P +5V pins

GND P Ground pins

NOTE. References to DEF, INEN, DAV, and DEN within the text omit the bar designator that signifies an active low signal. Active low operation is considered to be implied by the signal name, and the omission is not meant to imply a change in the signal function.

FUNCTIONAL OPERATION

The PDSP16510 performs decimation in time, radix 4, forward or inverse Fast Fourier Transforms. Data is loaded into an internal workspace RAM in normal sequential order, processed, and then dumped in the correct order. With real only input data the processing time can be approximately halved for a given transform size. Two real inputs then replace a single complex input, and are processed in parallel.

Either a Blackman-Harris or a Hamming window can be generated internally, and applied to the incoming real or complex data with no time penalty. No external ROM is needed to support these windows. The Blackman-Harris window gives improved dynamic range over the Hamming window when two closely spaced frequencies are to be detected, and one is of smaller magnitude than the other. It does, however, reduce the actual frequency resolution, and the Hamming window may then be preferable.

Data in and out of the device is represented by 16 bit real and imaginary components, with 16 bit sine and cosine values contained in an internal ROM. Conditional scaling, coupled with word growth through the butterfly data path, gives increased dynamic range. Transforms can be computed with sample sizes of either 256 or 1024 data points. The 256 point option can alternatively be used to simultaneously execute either four 64 point transforms, or sixteen 16 point transforms. The 16 point mode can only be used with a rectangular window, and no overlapping of data blocks is possible.

The device can be configured either to perform continuous transforms in a real time application, or as a slave processor to a more general purpose signal processing system. In the continuous mode, with transform sizes of 256 points or less, it contains three internal control units which simultaneously allow new data to be loaded, present data to be transformed, and previous results to be dumped. Additional external input/output buffering is not needed. The internal input buffer also allows data blocks to be overlapped by either 50% or 75%, in addition to the mode with no overlap. When 1024 point transforms are to be calculated, without loss of incoming data during the transform time, it is necessary to use an input buffer. This requirement is satisfied by a single PDSP16540 support device.

In any of the real or complex modes it is possible to obtain higher performance by connecting devices in parallel. It is then possible to increase the sampling rate to that of the system clock used for internal operations.

The mode of operation of the device is controlled by 16 bits in a control register. These are loaded through the AUX15:0 port when the control signal DEF is active (low). This port is also used to provide the imaginary component of complex input data, and, if complex transforms are to be performed, an external tristate buffer will be needed to isolate the control information. This buffer should only be enabled when DEF is active. DEF is also used to initialise the internal circuitry, and can be a simple power on reset if control parameters need not be subsequently changed.

DATA PRECISION

During each pass of a radix-4 fast Fourier transform it is possible for either component of a particular result to grow by a factor of up to four in the first pass, and 5.242 in subsequent passes. This is between two and three bits in each pass and the data path must allow for this word growth to avoid any possibility of overflow. At the end of the data path the word is again reduced to 16 bits by discarding least significant bits. Any unnecessary word growth allowed to prevent overflow thus results in loss of arithmetic precision, and has a detrimental effect on the dynamic range achievable.
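The growth factors quoted above can be checked numerically. The short Python sketch below assumes (this derivation is not stated in the datasheet) that the worst case in later passes arises when one butterfly input is unrotated and each of the other three is multiplied by a twiddle factor that raises one component by a factor of sqrt(2). The function name is illustrative only.

```python
import math

def worst_case_growth(first_pass):
    """Largest factor by which one component can grow in a radix-4 butterfly.

    Assumption (not from the datasheet): in the first pass the four inputs
    simply add, while in later passes three of the four inputs are first
    rotated by a twiddle factor, which can raise one component by sqrt(2).
    """
    if first_pass:
        return 4.0
    return 1.0 + 3.0 * math.sqrt(2.0)

for first in (True, False):
    growth = worst_case_growth(first)
    # log2 of the growth factor is the word growth expressed in bits
    print(f"growth {growth:.3f} -> {math.log2(growth):.2f} bits")
```

Under these assumptions the later-pass factor evaluates to about 5.243, in line with the 5.242 quoted above, and corresponds to roughly 2.4 bits of word growth.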

In practice these large word growths only occur when bipolar complex square waves are transformed, and even then will not occur on every pass. The PDSP16510 compromises by allowing a 2 bit word growth during the butterfly calculation in the first pass. This is equivalent to ignoring the most significant bit of the 19 bit final result, which is assumed to be an extra sign bit, and then selecting the next 16 bits for storage. In subsequent passes a Control Register Bit allows the user to continue to select these 16 bits, or instead to use the 16 most significant bits. The latter option is equivalent to a 3 bit word growth. The 2 or 3 bit word growth option applies to ALL subsequent passes and is not a per pass option.

Fig. 3. One of Four Data Paths

If the 2 bit option is selected there is a possibility of overflow occurring in one of the passes. Predicting overflow is mathematically difficult, and overflow only occurs with specific complex square waves. Scaling down the inputs cannot be guaranteed to prevent overflow because of the block floating point shifting scheme, which is discussed later. Overflow can NEVER occur if the 3 bit option is chosen, but at the expense of worse dynamic range.

When overflow does occur a flag is raised which can be read by the user (see later discussion on scale tag bits), and the results ignored. In addition all frequency bins are forced to zero to prevent any erroneous system response.

Even with only 2 bit word growth poor dynamic range will be obtained if the data is simply reduced to 16 bits, and becomes worse when the incoming data does not fully occupy all the bits in the word. These problems are overcome in the PDSP16510, however, by a block floating point scheme which compensates for any unnecessary word growth.

During each pass the number of sign bits in the largest result is recorded. Before the next pass, data is shifted left [multiplied by 2], once for every extra sign bit in this recorded sample. At least one component in the block then fully occupies the 16 bit word, and maximum data accuracy is preserved.

Up to four shifts are possible before every pass after the first, with a total of fifteen for the complete transform. At the end of the transform the number of left shifts that have occurred is indicated on S3:0. Lack of pins prevents a separate output being available to indicate that overflow has occurred in the 2 bit word growth option. For this reason the maximum number of compensating left shifts in this mode is restricted to 14. State 15 is then used to indicate that overflow has occurred.
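The block floating point scheme described above can be modelled in software. The following Python sketch is an illustration only, not a description of the device's internal logic: the function names are hypothetical, and the model simply counts redundant sign bits in 16 bit two's complement values, applies at most four left shifts, and decodes S3:0 as described in the text.

```python
def redundant_sign_bits(value, width=16):
    """Number of extra sign bits in a signed value held in `width` bits."""
    if value < 0:
        value = ~value  # maps -1 to 0 so both signs are counted alike
    return (width - 1) - value.bit_length()

def block_float_shift(block, max_shift=4, width=16):
    """Shift every value left by the extra sign bits of the largest one.

    The largest result is the one with the fewest extra sign bits, so the
    minimum over the block is used, limited to `max_shift` per pass.
    """
    shift = min(min(redundant_sign_bits(v, width) for v in block), max_shift)
    return [v << shift for v in block], shift

def decode_scale_tag(s, two_bit_growth):
    """Return (left_shifts, overflow) from the S3:0 value.

    In the 2 bit word growth option state 15 flags overflow, so no shift
    count is available in that case.
    """
    if two_bit_growth and s == 15:
        return None, True
    return s, False

shifted, applied = block_float_shift([8000, -3, 120])
print(shifted, applied)
print(decode_scale_tag(15, two_bit_growth=True))
```

In this example the largest value, 8000, has two extra sign bits, so the whole block is shifted left twice and the largest value then occupies the full 16 bit word.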

The first step in the butterfly calculation multiplies 16 bit data values with 16 bit sine/cosine values, to give 18 bit results. This increased word length preserves accuracy through the following adder network, and has been shown through simulations to be an optimum size for transform sizes up to 1024 points. This is particularly true when the input data is restricted to below 16 bits, as is necessary with practical A/D converters with very high sampling rates. The bottom bit of this 18 bit word is forced to logical one and as such is a compromise between truncation and true rounding. It gives a lower noise floor in the outputs compared to simple truncation.

To prevent any possibility of overflow during the butterfly calculation the word length is allowed to grow by one bit through each of the three adders. The least significant bit is always discarded in the first two adders. Sixteen bits are then chosen from the final adder in the manner discussed earlier, and the number of sign bits in the largest result is recorded for use in the following pass.
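The selection of sixteen bits from the 19 bit final adder result can be sketched as follows. This is a software model under stated assumptions, not the device's actual logic: the final result is treated as a signed 19 bit integer, the 2 bit growth option is modelled as dropping the top bit and keeping the next 16, and the 3 bit growth option as keeping the top 16 bits. The function name and the wrap-around behaviour on overflow are illustrative.

```python
def select_16_bits(result19, three_bit_growth):
    """Return (stored_value, overflow) for a signed 19 bit adder result."""
    if not -2**18 <= result19 < 2**18:
        raise ValueError("result does not fit in 19 bits")
    if three_bit_growth:
        # Keep the 16 most significant bits: overflow is impossible.
        return result19 >> 3, False
    # 2 bit growth: discard two LSBs, then treat the top bit as a spare
    # sign bit. If it was not a copy of the sign, the value has overflowed.
    shifted = result19 >> 2
    overflow = not -2**15 <= shifted < 2**15
    wrapped = ((shifted + 2**15) % 2**16) - 2**15  # keep the low 16 bits
    return wrapped, overflow

print(select_16_bits(100000, three_bit_growth=False))
print(select_16_bits(200000, three_bit_growth=False))
print(select_16_bits(200000, three_bit_growth=True))
```

The second call shows the trade-off described in the text: the larger result overflows with the 2 bit option, whereas the 3 bit option always fits at the cost of one bit of precision.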

Fig. 3 shows one of the four internal data paths which can compute a radix-4 butterfly in twelve system clock cycles. This equates to completing the butterfly in 3 cycles for the complete device.
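A rough transform time follows from the butterfly rate given above. The Python sketch below is a back-of-envelope estimate only: it assumes log4(N) passes of N/4 butterflies each at 3 system clock cycles per butterfly, and ignores any pipeline or control overhead. The function name is hypothetical, and Table 4 remains the authoritative source for transform times.

```python
def estimate_transform(n_points, clock_mhz=40.0, cycles_per_butterfly=3):
    """Return (clock_cycles, microseconds) for a complex radix-4 transform."""
    passes, size = 0, 1
    while size < n_points:
        size *= 4
        passes += 1
    if size != n_points:
        raise ValueError("n_points must be a power of 4")
    butterflies_per_pass = n_points // 4
    cycles = passes * butterflies_per_pass * cycles_per_butterfly
    return cycles, cycles / clock_mhz  # cycles / MHz gives microseconds

cycles, micros = estimate_transform(1024)
print(cycles, micros)
```

For 1024 points this gives 3840 cycles, or 96µs at 40MHz, slightly below the 98µs quoted for the device; the difference is presumably overhead that this simple model leaves out.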

DATA TRANSFERS

The data transfer mechanism to and from the internal RAM has been designed for use in a wide variety of applications. The provision of an asynchronous input strobe (DIS) allows data to be loaded without the need for additional external buffering. An asynchronous output strobe (DOS) also allows transformed data to be dumped with the sampling clock, this being particularly useful when the device is performing the inverse transform back to the time domain. Inputs and outputs are both supported by flag and enabling signals which allow transfers to be properly co-ordinated with the internal transform operation.

In many applications the DIS and DOS inputs can be tied together and fed by the sampling clock. If the output rate must be higher than the input rate, as with multiple devices supporting overlapped data samples, both strobes can still be connected together. The clock supplied should then be twice or four times the sampling clock, and an internal divider can be used to provide the correctly reduced input rate. The provision of a separate DOS pin does, however, allow the output rate to be asynchronous to the input rate, and therefore faster than strictly needed. Further output processing at higher rates is then possible if this is advantageous to system requirements.

The internal workspace is double buffered when 256 point transforms are to be performed. A separate output buffer is also provided. These resources, together with separate input and output buses, allow new data to be loaded and old results to be dumped, whilst the present transform is being computed. Additional external input buffering is not needed to prevent loss of incoming data whilst a transform is being performed.

When block overlapping is required, internally stored data will be re-used, and a proportionally smaller number of new samples need be loaded. Note that the internal window operator still functions correctly since it is actually applied during the first pass, and not whilst data is being loaded. The internal RAM organisation is shown in Fig. 4. It should be noted that the amount of overlap between I/O transfers and transforms is completely under the control of the system, since an input enable signal (INEN) and an output enable (DEN) can be used to initiate transfers.

Fig. 4. RAM Organization with 256 Data Points

In the 1024 point mode there is insufficient workspace for input and output buffering in addition to working memory. The device is then configured in a mode with separate load, transform and dump operations. The internal arrangement is shown in Fig. 5. The support of an external input buffer is needed if incoming samples are not to be lost whilst a transform is in progress. This is loaded at the sample clock rate and transferred to the FFT processor as quickly as possible. In this mode the PDSP16510 always expects to receive 1024 words, regardless of the amount of block overlapping. Data stored internally cannot be re-used when block overlapping is required, and data from the external buffer must be re-read as necessary.

Fig. 5. RAM Organization with 1024 Point Transforms

Fig. 6. 1024 Point Transforms with I/P Buffer

Fig. 6 illustrates a typical 1024 point system with an input buffer which supports complex input data. The input buffer can be provided by a PDSP16540 Bucket Buffer without the need for any external control logic. It supplies RAM for 1024 x 32 complex words, and allows transfers to the FFT Processor at the full system clock rate. The PDSP16540 also supports the standard 50% and 75% data block overlapping, but in addition allows the user to define the amount of overlap to within 32 words.

If no incoming data is to remain unprocessed, the user must ensure that the time taken to acquire sufficient data to instigate a new transform is greater than or equal to the transformation time itself. The latter can be calculated from Table 4, once the system clock rate has been defined. When 1024 point transforms are performed, both the time to read data from the input buffer, and also the time to dump data, must be included in the calculation to determine the minimum time in which data can be loaded into the external buffer.
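The timing condition above can be turned into a quick estimate of the highest usable sampling rate. The Python sketch below is illustrative only: the function names are hypothetical, the 98µs transform time is the figure quoted earlier for 1024 points, and the load and dump times assume 1024 words transferred at the full 40MHz clock rate. Table 4 should be used for real designs.

```python
import math

def max_sample_rate_mhz(n_points, overlap, transform_us,
                        load_us=0.0, dump_us=0.0):
    """Highest sampling rate (MHz) at which no incoming data is lost.

    `overlap` is the fraction of each block shared with the previous one
    (0.0, 0.5 or 0.75), so only n_points * (1 - overlap) new samples are
    acquired per transform. `load_us` and `dump_us` are non-zero only
    when transfers are not concurrent with the transform.
    """
    new_samples = n_points * (1.0 - overlap)
    block_period_us = transform_us + load_us + dump_us
    return new_samples / block_period_us  # samples per microsecond = MHz

def devices_needed(target_mhz, per_device_mhz):
    """Number of parallel devices needed to reach a target sampling rate."""
    return math.ceil(target_mhz / per_device_mhz)

transfer_us = 1024 / 40.0  # assumed: 1024 words at a 40MHz transfer rate
rate = max_sample_rate_mhz(1024, 0.0, 98.0, transfer_us, transfer_us)
print(round(rate, 2), devices_needed(40.0, rate))
```

With these assumed figures a single device sustains a little under 7MHz for 1024 point transforms without overlap, and six devices are required to reach 40MHz, which agrees with the device count given in the introduction.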

The peak transfer rate is limited by the characteristics of the I/O circuits, but can be greater than the sampling rate which is determined by the transform time. When load and dump operations are not concurrent with transform operations (as in the 1024 point modes), then the maximum I/O rate is equal to the system clock rate, Ø. When other transform sizes are specified, the sampling rate, S, is reduced by a factor F. This is defined below, where Ø is in MHz and L is the system clock low time in nanoseconds:

S = FØ, where F = 4 / (6 + 0.001ØL)
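The relationship above is easy to evaluate for a given clock. The Python sketch below simply codes the formula as written; the function names and the example clock low time are illustrative assumptions, not values taken from the datasheet.

```python
def sampling_rate_factor(clock_mhz, clock_low_ns):
    """F = 4 / (6 + 0.001 * clock_mhz * clock_low_ns)."""
    return 4.0 / (6.0 + 0.001 * clock_mhz * clock_low_ns)

def max_sampling_rate_mhz(clock_mhz, clock_low_ns):
    """S = F * clock_mhz, the peak I/O rate when transfers are concurrent."""
    return sampling_rate_factor(clock_mhz, clock_low_ns) * clock_mhz

# Example: 40MHz clock with an assumed 12.5ns low time (50% duty cycle).
f = sampling_rate_factor(40.0, 12.5)
print(f"F = {f:.3f}, S = {max_sampling_rate_mhz(40.0, 12.5):.1f} MHz")
```

The term 0.001ØL is the clock low time expressed as a fraction of the clock period, so F approaches 4/6 as the low time becomes short.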

F is typically 0.66 and applies to all transforms except for those of 1024 points, even if INEN is driven such that concurrent operations do not actually occur. If this causes a system limitation in a single device application, then the device can be configured for pseudo, Mode 2, multiple device operation. Separate load, transform, and then dump operations will then always occur, but DEN must be low when a transform is

Characteristic                                         16510A
† Data In set up Time                                  10 ns
† Data In Hold Time                                    0 ns
† INEN active going set up                             8 ns
† INEN active Hold Time                                0 ns
† INEN in-active Hold Time to ensure no load           2 ns
† INEN in-active going set up for no load operation    8 ns
† Delay to LFLG going active (30 pF load)              10 ns
† Delay to LFLG going in-active (30 pF load)           15 ns
† Min time to INEN low in edge mode

Table 1. Advanced Timing Information with Continuous Inputs.
