Supersedes version in December 1993 Digital Video & DSP IC Handbook, HB3923-1
DS3475 - 4.4 May 1996
The PDSP16510 performs Forward or Inverse Fast
Fourier Transforms on complex or real data sets containing up
to 1024 points. Data and coefficients are each represented by
16 bits, with block floating point arithmetic for increased
dynamic range.
An internal RAM is provided which can hold up to 1024
complex data points. This removes the memory transfer
bottleneck, inherent in building block solutions. Its organisation allows the PDSP16510 to simultaneously input new data,
transform data stored in the RAM, and to output previous
results. No external buffering is needed for transforms containing up to 256 points, and the PDSP16510 can be directly
connected to an A/D converter to perform continuous transforms. The user can choose to overlap data blocks by either
0%, 50%, or 75%. Inputs and outputs are synchronous to the
40MHz system clock used for internal operations.
A 1024 point complex transform can be completed in
some 98µs, which is equivalent to throughput rates of 450
million operations per second. Multiple devices can be connected in parallel in order to increase the sampling rate up to
the 40MHz system clock. Six devices are needed to give the
maximum performance with 1024 point transforms.
Either a Hamming or a Blackman-Harris window operator
can be internally applied to the incoming real or complex data.
The latter gives 67dB side lobe attenuation. The operator
values are calculated internally and do not require an external
ROM nor do they incur any time penalty.
The device outputs the real and imaginary components of
the frequency bins. These can be directly connected to the
PDSP16330 in order to produce magnitude and phase values
from the complex data.
SIGNAL    TYPE  DESCRIPTION
D15:0     I     Data input during real only mode. The real component in complex data mode.
AUX15:0   I     When DEF is active AUX15:0 are used to define the operating mode as defined in Table 3.
                When DEF is in-active AUX15:0 either provide the 16 bit imaginary component of complex
                input data, or a second set of real only inputs.
R15:0     O     These pins output the real component of the transformed data when DAV and DEN are active.
                Otherwise they are high impedance.
I15:0     O     These pins output the imaginary component of the transformed data when DAV and DEN are
                active. Otherwise they are high impedance.
DEF       I     The high going edge of DEF is used to internally latch the contents of AUX15:0, which then
                define the operating mode. In the simplest system DEF is a power on reset. When DEF is low
                the internal control logic is reset.
SCLK      I     System clock used for internal computations.
S3:0      O     These pins indicate the number of shifts towards the binary point which have occurred as the
                result of the conditional scaling logic. When the data path right shift is restricted to 2 places
                per pass, state 15 is used to indicate an overflow and only a total of 14 shifts is possible.
LFLG      O     This flag indicates that data is being loaded into the device. It goes active in response to an
                INEN input, and may be programmed to go in-active after the complete, one quarter, or one
                half a data block has been loaded.
INEN      I     The use of this input is mode dependent. It is either used as an active low, load enabling,
                signal for the DIS strobe, or it is used to initiate a new block load operation.
DIS       I     The rising edge of this input is used to load data into the device.
DOS       I     The rising edge of this input is used to dump data from the device. In most applications it may
                be tied to the DIS input, even if the output rate must be higher than the input rate because of
                overlapped data blocks. The DIS input is then internally divided down.
DAV       O     An active low signal that indicates that a transform is complete. Transformed data will then
                be output in normal sequential order using DOS. It may be optionally programmed to be
                delayed by 24 DOS strobes to match the delay through a PDSP16330.
DEN       I     This input is used to enable the data dump operation when DAV has gone active. If it is tied
                low the device will automatically dump data when DAV goes active. Otherwise the device will
                wait for the enabling signal to go low before the dump operation commences.
DISAB     I     Only available in the 132 pin GC package. When high the block floating logic is disabled.
VDD       P     +5V pins
GND       P     Ground pins
NOTE. References to DEF, INEN, DAV, and DEN within the text omit the bar designator which signifies an active low
signal. The active low sense is implied by the signal name, and the omission does not imply a change in the signal function.
FUNCTIONAL OPERATION
The PDSP16510 performs decimation in time, radix 4,
forward or inverse Fast Fourier Transforms. Data is loaded
into an internal workspace RAM in normal sequential order,
processed, and then dumped in the correct order. With real
only input data the processing time can approximately be
halved for a given transform size. Two real inputs then replace a
single complex input, and are processed in parallel.
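The replacement of one complex input by two real inputs can be illustrated with a standard textbook identity, sketched below in Python. This is only assumed to resemble what happens inside the device, which the text does not describe. The names `dft` and `two_real_dfts` are hypothetical, and a plain O(N^2) DFT stands in for the radix 4 FFT.

```python
import cmath

def dft(x):
    """Direct DFT, used here only as a stand-in for the FFT."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * m / n) for k in range(n))
            for m in range(n)]

def two_real_dfts(a, b):
    """Transform two real sequences with a single complex DFT.

    The sequences are packed as a + j*b. Because the spectrum of a real
    sequence is conjugate symmetric, the two spectra can be separated
    afterwards from Z[m] and conj(Z[N-m]).
    """
    n = len(a)
    z = dft([complex(p, q) for p, q in zip(a, b)])
    spec_a, spec_b = [], []
    for m in range(n):
        zc = z[-m % n].conjugate()          # conj(Z[(N - m) mod N])
        spec_a.append((z[m] + zc) / 2)
        spec_b.append((z[m] - zc) / 2j)
    return spec_a, spec_b
```

Calling `two_real_dfts(a, b)` gives the same spectra as `dft(a)` and `dft(b)` computed separately, at roughly half the transform cost.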
Either a Blackman-Harris or a Hamming window can be
generated internally, and applied to the incoming real or complex
data with no time penalty. No external ROM is needed to support
these windows. The Blackman-Harris window gives improved
dynamic range over the Hamming window when two closely
spaced frequencies are to be detected, and one is of smaller
magnitude than the other. It does, however, reduce the actual
frequency resolution, and the Hamming window may then be
preferable.
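The two window operators can be sketched in Python as follows. The coefficients used inside the device are not given in the text. The Hamming values are the usual 0.54 and 0.46, and the Blackman-Harris values are assumed to be Harris's three term set with approximately 67dB side lobe attenuation. All function names are hypothetical.

```python
import math

def hamming(n_points):
    """Symmetric Hamming window."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]

# Assumption: three term Blackman-Harris coefficients (about 67dB side lobes).
BH3 = (0.42323, 0.49755, 0.07922)

def blackman_harris_3term(n_points, coeffs=BH3):
    """Symmetric three term Blackman-Harris window."""
    a0, a1, a2 = coeffs
    return [a0
            - a1 * math.cos(2 * math.pi * n / (n_points - 1))
            + a2 * math.cos(4 * math.pi * n / (n_points - 1))
            for n in range(n_points)]

def apply_window(samples, window):
    """Multiply each incoming sample by its window value."""
    return [s * w for s, w in zip(samples, window)]
```

The Blackman-Harris window tapers much closer to zero at the block edges than the Hamming window. This lowers the side lobes but widens the main lobe, which is the resolution penalty mentioned above.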
Data in and out of the device is represented by 16 bit real
and imaginary components, with 16 bit sine and cosine values
contained in an internal ROM. Conditional scaling, coupled
with word growth through the butterfly data path, gives increased dynamic range. Transforms can be computed with
sample sizes of either 256 or 1024 data points. The 256 point
option can alternatively be used to simultaneously execute
either four 64 point transforms, or sixteen 16 point transforms.
The 16 point mode can only be used with a rectangular
window, and no overlapping of data blocks is possible.
The device can be configured either to perform continuous transforms in a real time application, or as a slave processor
to a more general purpose signal processing system. In the
continuous mode, with transform sizes of 256 points or less,
it contains three internal control units which simultaneously
allow new data to be loaded, present data to be transformed,
and previous results to be dumped. Additional external input/output
buffering is not needed. The internal input buffer also
allows data blocks to be overlapped by either 50% or 75%,
in addition to the mode with no overlap.
When 1024 point transforms are to be calculated, without
loss of incoming data during the transform time, it is necessary
to use an input buffer. This requirement is satisfied by a single
PDSP16540 support device.
In any of the real or complex modes it is possible to obtain
higher performance by connecting devices in parallel. It is then
possible to increase the sampling rate to that of the system
clock used for internal operations.
The mode of operation of the device is controlled by 16
bits in a control register. These are loaded through the
AUX15:0 port when a control signal DEF is active low. This
port is also used to provide the imaginary component of
complex input data, and, if complex transforms are to be
performed, an external tristate buffer will be needed to isolate
the control information. This should only be enabled when
DEF is active. DEF is also used to initialise the internal
circuitry, and can be a simple power on reset if control
parameters need not be subsequently changed.
DATA PRECISION
During each pass of a radix-4 fast Fourier transform it is
possible for either component of a particular result to grow by
a factor of up to four in the first pass, and 5.242 in subsequent
passes. This is between two and three bits in each pass and
the data path must allow for this word growth to avoid any
possibility of overflow. At the end of the data path the word is
again reduced to 16 bits by discarding least significant bits.
Any unnecessary allowance for word growth to prevent overflow thus
results in loss of arithmetic precision, and has a detrimental
effect on the dynamic range achievable.
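The growth factors quoted above can be checked with a few lines of Python. The text gives only the values 4 and 5.242. The expression 1 + 3√2 is one common way of arriving at the second figure and is an assumption here: one butterfly input is not rotated, and each of the other three can have a component enlarged by √2 after multiplication by a twiddle factor.

```python
import math

# First pass: twiddle factors are trivial, so four components simply add.
first_pass_growth = 4.0
# Later passes (assumed derivation): one unrotated input plus three rotated ones.
later_pass_growth = 1.0 + 3.0 * math.sqrt(2.0)

for name, growth in (("first pass", first_pass_growth),
                     ("later passes", later_pass_growth)):
    print(f"{name}: growth x{growth:.3f} = {math.log2(growth):.2f} bits")
```

The second figure is about 2.39 bits, which is why a data path that allows only 2 bits of growth can overflow in a later pass while a 3 bit allowance cannot.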
In practice these large word growths only occur when
bipolar complex square waves are transformed, and even
then will not occur on every pass. The PDSP16510 compromises by allowing a 2 bit word growth during the butterfly
calculation in the first pass. This is equivalent to ignoring the
most significant bit of the 19 bit final result, which is assumed
to be an extra sign bit, and then selecting the next 16 bits for
storage. In subsequent passes a Control Register Bit allows
the user to continue to select these 16 bits, or instead to use
the 16 most significant bits. The latter option is equivalent to
a 3 bit word growth. The 2 or 3 bit word growth option applies
to ALL subsequent passes and is not a per pass option.
Fig. 3 One of Four Data Paths (final SELECT labelled CR BIT3)
If the 2 bit option is selected there is a possibility of
overflow occurring in one of the passes. The prediction of
overflow is mathematically difficult, and only occurs with
specific complex square waves. Scaling down the inputs
cannot be guaranteed to prevent overflow because of the
block floating point shifting scheme, which is discussed later.
Overflow can NEVER occur if the 3 bit option is chosen, but at
the expense of worse dynamic range.
When overflow does occur a flag is raised which can be
read by the user ( see later discussion on scale tag bits ), and
the results ignored. In addition all frequency bins are forced
to zero to prevent any erroneous system response.
Even with only 2 bit word growth poor dynamic range will
be obtained if the data is simply reduced to 16 bits, and
becomes worse when the incoming data does not fully occupy
all the bits in the word. These problems are overcome in the
PDSP16510, however, by a block floating point scheme which
compensates for any unnecessary word growth.
During each pass the number of sign bits in the largest
result is recorded. Before the next pass, data is shifted left
[multiplied by 2], once for every extra sign bit in this recorded
sample. At least one component in the block then fully occupies the 16 bit word, and maximum data accuracy is preserved.
Up to four shifts are possible before every pass after the
first, with a total of fifteen for the complete transform. At the end
of the transform the number of left shifts that have occurred is
indicated on S3:0. Lack of pins prevents a separate output
being available to indicate that overflow has occurred in the 2
bit word growth option. For this reason the maximum number
of compensating left shifts in this mode is restricted to 14.
State 15 is then used to indicate that overflow has occurred.
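The block floating point step described above can be sketched in Python. This is a behavioural illustration only. The function names are hypothetical, the block is treated as a flat list of 16 bit signed integers, and the handling of the scale tag assumes nothing beyond the statements in the text.

```python
WORD_BITS = 16           # stored word length
MAX_SHIFT_PER_PASS = 4   # limit quoted in the text

def sign_bits(value, width=WORD_BITS):
    """Number of leading bits equal to the sign bit of a two's complement word."""
    magnitude_bits = value.bit_length() if value >= 0 else (~value).bit_length()
    return width - magnitude_bits

def normalise_block(block, max_shift=MAX_SHIFT_PER_PASS):
    """Shift every word left, once for each extra sign bit in the largest word.

    Returns (shifted_block, shifts_applied).
    """
    extra = min(sign_bits(v) for v in block) - 1   # largest word has fewest sign bits
    shifts = min(extra, max_shift)
    return [v << shifts for v in block], shifts

def decode_scale_tag(tag, two_bit_growth):
    """Interpret S3:0 as (left_shifts, overflow).

    With the 2 bit word growth option, state 15 flags an overflow.
    """
    if two_bit_growth and tag == 15:
        return None, True
    return tag, False
```

For example, `normalise_block([100, -50, 3000, 20])` applies three left shifts, after which the largest word (24000) has a single sign bit.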
The first step in the butterfly calculation multiplies 16 bit
data values with 16 bit sine/cosine values, to give 18 bit
results. This increased word length preserves accuracy
through the following adder network, and has been shown
through simulations to be an optimum size for transform sizes
up to 1024 points. This is particularly true when the input data
is restricted to below 16 bits, as is necessary with practical A/
D converters with very high sampling rates. The bottom bit of
this 18 bit word is forced to logical one and as such is a
compromise between truncation and true rounding. It gives a
lower noise floor in the outputs compared to simple truncation.
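The effect of forcing the bottom bit to one can be illustrated numerically. The Python sketch below compares the mean error of simple truncation with that of truncation followed by forcing the least significant bit. The number of discarded bits is illustrative and does not represent the bit positions used in the device. Removal of the truncation bias is offered as one plausible contribution to the lower noise floor, not as a statement of the design analysis.

```python
def truncate(value, drop_bits):
    """Discard the least significant bits (rounds towards minus infinity)."""
    return value >> drop_bits

def truncate_force_one(value, drop_bits):
    """Discard the least significant bits, then force the new LSB to one."""
    return (value >> drop_bits) | 1

def mean_error(quantiser, drop_bits, values):
    """Mean quantisation error in units of the retained LSB."""
    scale = 1 << drop_bits
    errors = [quantiser(v, drop_bits) - v / scale for v in values]
    return sum(errors) / len(errors)

DROP = 4                                  # illustrative number of discarded bits
values = range(-(1 << 10), 1 << 10)       # every input over a symmetric range
bias_trunc = mean_error(truncate, DROP, values)
bias_forced = mean_error(truncate_force_one, DROP, values)
```

Truncation leaves a mean error of nearly half an LSB, whereas the forced bit reduces the mean error almost to zero without needing the adder that true rounding would require.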
To prevent any possibility of overflow during the butterfly
calculation the word length is allowed to grow by one bit
through each of the three adders. The least significant bit is
always discarded in the first two adders. Sixteen bits are then
chosen from the final adder in the manner discussed earlier,
and the number of sign bits in the largest result is recorded for
use in the following pass.
Fig. 3 shows one of the four internal data paths which can
compute a radix-4 butterfly in twelve system clock cycles. This
equates to completing the butterfly in 3 cycles for the complete
device.
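The figure of 3 cycles per butterfly allows a rough estimate of transform time, sketched below in Python. The estimate counts butterfly cycles only and ignores synchronisation and any other overhead, so the values in Table 4 take precedence. The function names are hypothetical.

```python
def radix4_cycles(points, cycles_per_butterfly=3):
    """Butterfly cycles for a radix 4 transform; points must be a power of 4."""
    passes = 0
    n = points
    while n > 1:
        n //= 4
        passes += 1
    butterflies_per_pass = points // 4
    return passes * butterflies_per_pass * cycles_per_butterfly

def transform_time_us(points, clock_mhz=40.0):
    """Estimated transform time in microseconds at the given system clock."""
    return radix4_cycles(points) / clock_mhz
```

For 1024 points this gives 5 passes of 256 butterflies, or 3840 cycles, which is 96µs at 40MHz. This is of the same order as the figure of some 98µs quoted in the overview.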
DATA TRANSFERS
The data transfer mechanism to and from the internal
RAM has been designed for use in a wide variety of applications. The provision of an independent input strobe (DIS)
allows data to be loaded without the need for additional
external buffering. An independent output strobe (DOS) is
also provided. DIS and DOS can thus be tied together, this
being particularly useful when the device is performing the
inverse transform back to the time domain. Transfer of data
occurs internally from DIS to SCLK, so although they can be
of different frequencies, they must be synchronous to each
other. In the same way transfer of data also occurs from SCLK
to DOS, so while DOS can also be independent of SCLK it
must also be synchronous to it. Inputs and outputs are both
supported by flag and enabling signals which allow transfers
to be properly co-ordinated with the internal transform operation.
In many applications the DIS and DOS inputs can be tied
together and fed by the sampling clock. If the output rate must
be higher than the input rate, as with multiple devices supporting overlapped data samples, both strobes can still be connected together. The clock supplied should then be twice or
four times the sampling clock, and an internal divider can be
used to provide the correctly reduced input rate. The provision
of a separate DOS pin does, however, allow the output rate to
be different to the input rate, and therefore faster than strictly
needed. Further output processing at higher rates is then
possible if this is advantageous to system requirements.
The internal workspace is double buffered when 256
point transforms are to be performed. A separate output buffer
is also provided. These resources, together with separate
input and output buses, allow new data to be loaded and old
results to be dumped, whilst the present transform is being
computed. Additional, external, input buffering is not needed
to prevent loss of incoming data whilst a transform is being
performed.
When block overlapping is required, internally stored
data will be re-used, and a proportionally smaller number of
new samples need be loaded. Note that the internal window
operator still functions correctly since it is actually applied
during the first pass, and not whilst data is being loaded. The
internal RAM organisation is shown in Fig. 4. It should be
noted that the amount of overlap between I/O transfers and
transforms is completely under the control of the system, since
an input enable signal (INEN) and an output enable (DEN) can
be used to initiate transfers.
Fig. 4. RAM Organization with 256 Data Points
In the 1024 point mode there is insufficient workspace for
input and output buffering in addition to working memory. The
device is then configured in a mode with separate load,
transform and dump operations. The internal arrangement is
shown in Fig. 5. The support of an external input buffer is
needed if incoming samples are not to be lost whilst a
transform is in progress. This is loaded at the sample clock
rate and transferred to the FFT processor as quickly as
possible. In this mode the PDSP16510 always expects to
receive 1024 words, regardless of the amount of block overlapping. Data stored internally cannot be re-used when block
overlapping is required, and data from the external buffer must
be re-read as necessary.
Fig. 5. RAM Organization with 1024 Point Transforms
Fig. 6 illustrates a typical 1024 point system with an input
buffer which supports complex input data. The input buffer
can be provided by a PDSP16540 Bucket Buffer without the
need for any external control logic. It supplies RAM for 1024
x 32 complex words, and allows transfers to the FFT Processor at the full system clock rate. The PDSP16540 also supports the standard 50% and 75% data block overlapping, but
in addition allows the user to define the amount of overlap to
within 32 words.
Fig. 6. 1024 Point Transforms with I/P Buffer
If no incoming data is to remain un-processed, the user
must ensure that the time taken to acquire sufficient data to
instigate a new transform is greater than or equal to the
transformation time itself. The latter can be calculated from
Table 4, once the system clock rate has been defined. When
1024 point transforms are performed, both the time to read
data from the input buffer, and also the time to dump data,
must be included in the calculation to determine the minimum
time in which data can be loaded into the external buffer.
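The calculation described above can be sketched in Python for the 1024 point case. The sketch assumes that load, transform and dump do not overlap, that one word is transferred per system clock cycle, and that the transform time is the 98µs figure quoted in the overview (the Table 4 value should be used in practice). The function names are hypothetical.

```python
import math

def min_block_period_us(points, transform_us, clock_mhz):
    """Load + transform + dump time when the three steps do not overlap."""
    io_us = points / clock_mhz            # assumed: one word per system clock
    return io_us + transform_us + io_us

def max_sample_rate_mhz(points, transform_us, clock_mhz):
    """Highest sustained sampling rate for a single device."""
    return points / min_block_period_us(points, transform_us, clock_mhz)

def devices_for_full_rate(points, transform_us, clock_mhz):
    """Devices needed in parallel to sample at the system clock rate."""
    single = max_sample_rate_mhz(points, transform_us, clock_mhz)
    return math.ceil(clock_mhz / single)
```

With 1024 points, 98µs and a 40MHz clock the minimum block period is about 149µs, giving a single device sampling rate of just under 7MHz. `devices_for_full_rate(1024, 98.0, 40.0)` then returns 6, in agreement with the six devices quoted for maximum performance.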
The peak transfer rate is limited by the characteristics of
the I/O circuits, but can be greater than the sampling rate
which is determined by the transform time. When load and
dump operations are not concurrent with transform operations
( as in the 1024 point modes ), then the maximum I/O rate is
equal to the system clock rate, Ø. When other transform sizes
are specified, the sampling rate, S, is reduced by a factor F.
This is defined below where Ø is in MHz and L is the system
clock low time in nanoseconds :
S = FØ, where F = 4 / (6+0.001ØL)
F is typically 0.66 and applies to all transforms except for those
of 1024 points, even if INEN is driven such that concurrent
operations do not actually occur (Note also that S must be
synchronous to SCLK). If this causes a system limitation in a
single device application, then the device can be configured
for pseudo, Mode 2, multiple device operation. Separate load,
transform, and then dump operations will then always occur,
but DEN must be low when a transform is complete or DAV will
never go active. See the section on multiple device operation.
[Input timing diagram: DATA IN, INEN and LFLG, including the INEN edge activated system and LFLG with 50% overlap; parameters TSD, THD, TSA, THA, TSI, THI, TFH, TFL, TED]
16510A,A0,B0,C0
Characteristic                                       Symbol  Min  Max  Units
Data In set up Time                                  TSD     10        ns
Data In Hold Time                                    THD     0         ns
INEN active going set up                             TSA     8         ns
INEN active Hold Time                                THA     0         ns
INEN in-active Hold Time to ensure no load           THI     2         ns
INEN in-active going set up for no load operation    TSI     8         ns
Delay to LFLG going active ( 30 pf load )            TFH          10   ns
Delay to LFLG going in-active ( 30 pf load )         TFL          10   ns
Min time to INEN low in edge mode                    TED     15        ns
Table 1. Advanced Timing Information with Continuous Inputs.
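The relationship S = FØ given earlier in this section can be evaluated with the short Python sketch below. The function names are hypothetical, and the clock low time in the usage example is an assumed value rather than a device specification. Since Ø is in MHz and L in nanoseconds, the term 0.001ØL is simply the fraction of the clock period for which the clock is low.

```python
def rate_factor(clock_mhz, low_time_ns):
    """F = 4 / (6 + 0.001 * clock_mhz * low_time_ns)."""
    return 4.0 / (6.0 + 0.001 * clock_mhz * low_time_ns)

def max_sampling_rate_mhz(clock_mhz, low_time_ns):
    """S = F * clock rate, for transforms other than 1024 points."""
    return rate_factor(clock_mhz, low_time_ns) * clock_mhz
```

With a 40MHz clock and an assumed low time of 12.5ns, F is 4/6.5, or roughly 0.62, and S is a little under 25MHz. F approaches 4/6 as the low time becomes short, which is close to the typical value quoted.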
LOADING DATA
Data loading is controlled by three signals; DIS an input
strobe, INEN a load enable, and LFLG an output flag. Detailed
timing information is given in Table 1. Once sufficient data has
been acquired, a transform will automatically commence. This
is normally after a complete block has been loaded, except
when a single device is performing overlapped transforms of
256 points or less. With 75% overlapping, transforms will
commence after 25% of a new block has been loaded, and
with 50% overlapping transforms commence after 50% of the
data has been loaded. The remainder of the block is provided
by data already stored in the internal RAM.
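The number of new samples needed before a transform commences, and the corresponding ratio of output rate to input rate, follow directly from the overlap percentage. A minimal Python sketch, with hypothetical function names:

```python
def new_samples_per_block(block_len, overlap_percent):
    """New samples to load before a transform starts."""
    if overlap_percent not in (0, 50, 75):
        raise ValueError("overlap must be 0, 50 or 75 percent")
    return block_len * (100 - overlap_percent) // 100

def output_rate_multiplier(overlap_percent):
    """Ratio of output word rate to input sample rate."""
    if overlap_percent not in (0, 50, 75):
        raise ValueError("overlap must be 0, 50 or 75 percent")
    return 100 // (100 - overlap_percent)
```

For a 256 point block this gives 256, 128 and 64 new samples for overlaps of 0%, 50% and 75%. The multipliers of 2 and 4 correspond to the case in which DIS and DOS are tied to a clock of twice or four times the sampling clock.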
The data strobe is used to load data into the internal
workspace RAM, and data must meet the specified set up and
hold times with respect to its rising edge. DIS can be a
continuous input since the device only loads data when an
input enabling signal is active.
An internal synchronisation interval is necessary between the last sample being loaded with the DIS strobe and
transforms being started with the system clock. This can be up
to twelve system clock periods when data transfers and
transforms are overlapped. The transform times given later in
Table 4 are maximum values, and include these twelve
periods.
The way in which the INEN signal controls data loading
is dependent on whether a single or multiple device system is to be
implemented, and on the status of Control Register Bit 12.
When Bit 12 is set in a SINGLE device system the INEN
signal is simply used as an enable for the DIS strobes. When
INEN is low, and provided the relevant set up and hold times
have been satisfied, data will be loaded with the rising edge of
the DIS strobe. If no gaps occur within the incoming data,
INEN can be tied permanently low, provided that the sampling
rate has been chosen such that transforms are completed
before a new block of data is loaded. For transforms of less
than 1024 points, data will then be continually processed
without any loss of information. In the 1024 point modes the
device will cease loading data when 1024 samples have been
loaded, and even if INEN remains low no more data will be
accepted until the previous results have been dumped.
In a multiple device system an edge is ALWAYS needed
to commence a load operation, and Bit 12 has a different
purpose. The edge is provided by INEN going low. Loading
will cease when a complete block (or group of blocks with
multiple concurrent transforms) of data has been loaded, even
if INEN remains low. INEN must go high at some point after the
minimum hold time has been satisfied, and then return low
AFTER ALL DATA HAS BEEN LOADED, before a new load
operation can commence. Low going edges which occur
before all data has been loaded will be ignored.
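The edge activated behaviour described above can be modelled with a small Python state machine. This is a behavioural sketch only, not a description of the internal logic. The class name is hypothetical, and it is assumed that the sample present on the DIS edge at which INEN is first seen low is the first sample loaded.

```python
class EdgeModeLoader:
    """Behavioural sketch of the INEN edge activated load control."""

    def __init__(self, block_len):
        self.block_len = block_len
        self.loading = False
        self.prev_inen = 1          # INEN is active low
        self.blocks = []            # completed blocks
        self._current = []

    def dis_rising_edge(self, inen, sample):
        """Call once per DIS rising edge with the INEN level (0 = low)."""
        low_going = self.prev_inen == 1 and inen == 0
        self.prev_inen = inen
        # A low going edge starts a load only when no load is in progress.
        if low_going and not self.loading:
            self.loading = True
            self._current = []
        if self.loading:
            # INEN going high does not inhibit loading in this mode.
            self._current.append(sample)
            if len(self._current) == self.block_len:
                self.blocks.append(self._current)
                self.loading = False
```

In this model a low going edge that arrives during a load is ignored, and loading stops after a complete block even if INEN remains low. A further block is only loaded after INEN has returned high and gone low again.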
The INEN edge mode is actually provided for the correct
operation of multiple device systems, but if Bit 12 in the Control
Register is reset in the SINGLE device mode, the edge
activated operation will still be possible. With all but 256 point
complex transforms, the single device edge mode of operation is identical to that of a multiple device system. With 256
point transforms, and their concurrent derivatives, the location
of the low going edge in the data stream is dependent on the
amount of block overlapping. The low going edge transition
must be provided after 64 samples have been loaded with
75% overlapping, and after 128 samples have been loaded
with 50% overlapping. With no overlapping the edge must be
provided after 256 samples have been loaded.
In a single device system with Bit 12 set, INEN can be
taken high to inhibit the load operation when gaps occur in the
data stream. In the INEN edge activated mode gaps in the
data stream can only be accommodated if the DIS clock is
externally inhibited. Taking INEN high will not inhibit the
loading of data in this mode.
With gaps in the data stream the peak sampling rates can
be higher than continuous sampling rates. When data loading
is not coincident with transform operations the peak rate can
equal that of the system clock, otherwise it is reduced by the
factor, F, defined earlier under Data Transfers.
When Control Register Bit 12 is set in any multiple device
mode, the DEF high going edge will also initiate a load
operation after it has been internally synchronised to the rising
DIS edge. If the first device in a multiple device system is
programmed in this manner, the transform sequence will
automatically start when DEF goes in-active. The other devices need the INEN edge as usual, and must have Bit 12
reset. A fuller explanation of the use of Bit 12 in a multiple
device mode is given in the section on I/O In Multiple Device
Systems. Note that the use of Bit 12 in a single device system
(Control Register Bits 10:9 = 00) is completely different to its
use in a multiple device mode.
The LFLG output goes active in response to the DIS rising
edge used to load the first data sample, and indicates that a
load operation is occurring. In an edge activated system the
LFLG output will go high as the result of the first high going DIS
edge after INEN has gone low. In the simple INEN enabling
mode, internal logic counts the number of valid inputs and
detects when the programmed block length has been
reached. LFLG then goes low and will go high again in
response to the next valid DIS strobe. LFLG will go low when
DEF is active and will go high in response to the first INEN
enabled DIS edge after DEF has gone in-active.
The active going LFLG edge does not normally have any
system significance, but in the block overlapping modes the
in-active going edge will occur when 50% or 75% of the data
has been loaded. By driving the INEN input on one device with
the LFLG output from a previous device, this edge can be used
to partition data between several devices in a multiple device
system. It can also be used to provide an address marker for
a user defined input buffer, when executing 1024 point transforms with a single device. It is not needed, however, when the
input buffer is provided by the PDSP16540.
DUMPING DATA
Data output is controlled by an output strobe [DOS], a
dump enable signal [DEN], and a Data Available signal [DAV].
The DAV signal is used to indicate that the internal output
buffer contains transformed data, and the DEN input is used
to control the outputting of that data. The output buffer within
the device is clocked by the DOS input, and must be primed
with a number of DOS strobes (see "user notes - stopping
DOS") once a transform is complete in order to transfer data
to the output pins. DAV will not go active until this priming has
occurred.
The state of the DEN input at the end of a transform is
used to control the transition of the active going edge of the
DAV output with respect to the DOS strobes. The latter are
then used to transfer data from the device to the next system
component. If the DEN input is tied low in a single device
system, the active going DAV transition will be internally
synchronised to the rising edge of a DOS clock. If DEN is not
tied low it must be guaranteed to be low at the end of the
internal transform operation for this synchronization to occur.
Since there is no external indication of this event, the user
must take care to only allow DEN to go high whilst DAV is
active, if this DAV synchronous mode is needed.
SYNCHRONIZED DAV OPERATION
In the DAV synchronised mode the first rising edge of the
DOS clock, after DAV has gone active, must be used to
transfer the first transformed sample from the output pins to
the next system component. It should be noted that the output
buffer will have been primed before the active DAV transition,
since DOS must be a continuous clock, and there is then no
delay before the first output becomes valid. The DAV output
can be used as a clock enable for this next device, and
transfers will continue in normal sequential order until the
required data has been dumped. DAV will then go inactive in
response to the last DOS edge which was used to transfer
data to the next device.
This mode of automatically dumping data when it is ready
finds applications in real time data flow systems, and detailed
timing is given in Table 2. It should be noted that the DOS input
MUST be continually present before DAV goes active. If this
is not the case the DAV output will not go active at the correct
time, and the internal output circuitry will not be primed. Once
DAV is active, however, it is possible for DOS to be irregular,
and DEN can be used to inhibit the action of the output strobe
as discussed previously. For the correct operation of the
device the user must ensure that DOS becomes continuous
and DEN remains low once DAV goes in-active.
When continuously transforming data such that new
outputs are internally available before the previous block has
been completely dumped, then DAV would normally stay
active and give no indication that one block dump had been
finished and another block started. Additional internal circuitry
is, however, provided to ensure that DAV goes inactive for one
DOS high time, thus supplying an inter block marker.
ASYNCHRONOUS DAV MODE
If DEN is not active in a single device when the transform
is complete, then the device will wait for DEN to go active
before any data is dumped. This mode is suitable for applications in which output processing is under the control of a
remote host, such as a general purpose digital signal processor. The DAV output will then go active as soon as the output
buffer is full, and will not be synchronised to the DOS edge. In
such systems the DOS strobe may not necessarily be present
at this time. Table 3 gives the relevant timing information.
In this host controlled dump mode the PDSP16510 waits
for the host to activate the DEN input after DAV has gone
active. DEN then functions as an enable for the host produced
data strobes on the DOS pin. DEN may either stay active for
the complete transfer, or may be used to enable each DOS
[Output timing diagram: DOS, DATA O/P, S3:0 (Scale Tag Value) and DAV; parameters TLZ, THZ, TDD, TDH, TVD, TVI]
16510A,A0,B0,C0
Characteristic                               Symbol
Output Enable Time                           TLZ
Output Disable Time                          THZ
Data Delay Time ( 30 pf load )               TDD
Data Hold Time                               TDH
DAV active Delay Time ( 30 pf load )         TVD
DAV in-active Delay Time ( 30 pf load )      TVI
Values (Min / Max) in printed order: 2ns, 110ns, 110ns, 15ns, 15ns, 15ns
Table 2. Output Timing with DEN tied low. ( Advanced Data )