User Manual FC100 Page 3 of 12 Last Edited: 25/11/2008 15:00:00
1 Introduction
The Fast Fourier Transform (FFT) is an efficient algorithm for computing the Discrete
Fourier Transform (DFT). This Intellectual Property (IP) core is designed to offer very fast
transform times while maintaining floating-point accuracy at all computational stages.
Sundance’s core is the fastest and the most efficient available in the FPGA world. It also
saves memory resources compared to other floating-point cores available on the market.
Features:
This FFT IP core targets the following devices:
Xilinx FPGA devices
o Virtex-II™, Virtex-II/Pro™, Spartan-3™, Virtex-4™ and Virtex-5™
Radix-2 Fast Fourier Transform (FFT) with pipelined butterfly rank structure
IEEE-754 Floating Point data
o Uses Xilinx Coregen math operators
o Customizable precision, speed, and size
o Any width fixed-point builds also available
Run-time selectable length N = 32 to 2^m, m = 5 to 26
o 32, 64, 128, 256, 512, 1024, …, 64M points
Run-time selectable Forward/Inverse transform mode
Continuous processing at speeds up to Fmax (see Table 1).
o Data rate of 250 Msps in a Virtex-5 FPGA device.
Natural-order inputs and outputs
Includes C/C++ bit-accurate model and data generator
o Model also usable from MATLAB
Includes Verilog or VHDL testbench and run scripts for simulation purposes and
specific performance characterization.
Applications:
The Pipelined Floating Point FFT IP Core is useful in high performance embedded computing
(HPEC) applications which require continuous digital signal processing (DSP) at high sample
rates. Floating point FFT hardware acceleration or co-processing is often a goal of scientific
algorithms used in High Performance Computing (HPC). End applications and markets
include radar, sonar, spectral analysis, telecommunications and image processing.
2 Related Documents
2.1 Referenced Documents
N/A
3 Acronyms, Abbreviations and Definitions
3.1 Acronyms and Abbreviations
A list of all acronyms
4 Functional Description
4.1 Mathematical equations
The Discrete Fourier Transform (DFT) of length N (N = 2^m) calculates the sampled Fourier
transform of a discrete-time sequence with N points evenly distributed.
The forward DFT with N points of a sequence x(n) can be written as follows:
$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi nk / N}, \qquad k = 0, \ldots, N-1$$
The inverse DFT is given by the following equation:
$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, e^{j 2\pi nk / N}, \qquad n = 0, \ldots, N-1$$
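The two definitions can be checked numerically with a direct O(N^2) evaluation. The following is only a reference sketch of the equations, not a description of how the core computes its result; the function name `dft` and its `inverse` flag are illustrative and do not come from the delivered C/C++ model.

```python
import cmath


def dft(x, inverse=False):
    """Direct O(N^2) evaluation of the forward or inverse DFT definition."""
    n_points = len(x)
    # Forward transform uses exp(-j*2*pi*n*k/N); inverse uses exp(+j*2*pi*n*k/N).
    sign = 1 if inverse else -1
    # Only the inverse transform carries the 1/N scale factor.
    scale = 1.0 / n_points if inverse else 1.0
    out = []
    for k in range(n_points):
        acc = 0j
        for n in range(n_points):
            acc += x[n] * cmath.exp(sign * 2j * cmath.pi * n * k / n_points)
        out.append(acc * scale)
    return out


# Round trip on a 32-point ramp: the inverse of the forward transform
# should return the original sequence to within rounding error.
samples = [complex(i, -i) for i in range(32)]
recovered = dft(dft(samples), inverse=True)
max_error = max(abs(a - b) for a, b in zip(samples, recovered))
```

A forward transform followed by an inverse transform should reproduce the input up to floating-point rounding, which is the property `max_error` measures.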
4.2 Algorithm
The pipelined floating-point FFT IP core uses a modular radix-2 Fast Fourier Transform (FFT)
architecture to compute discrete Fourier transforms (DFT) on data frames or continuous data
streams, at sample rates up to the maximum clock frequency.
This efficient structure employs a single butterfly and a single delay feedback path per rank
for low localized memory usage. True IEEE-754 floating-point data is maintained throughout,
supporting a large dynamic range of data without requiring complicated fixed-point analysis.
The standard pipelined IP Core is easily scalable to any Xilinx device and customisable to suit
many FFT applications.
This FFT core is designed for FFT computations of 32 points or more, up to 64M
points. External memory, such as QDR/QDR2 SRAM, ZBT RAM or DDR/DDR2/DDR3
SDRAM, is best suited to transforms larger than 16384 points. For shorter transforms,
memory banks can likely be implemented inside the FPGA, depending on which device is
used.
4.3 Pipelined FFT core
4.3.1 Data format
This core is compliant with the IEEE-754 standard.
4.3.2 FFT block diagram
4.3.3 Description
Frame: the frame blocks use control signalling to delimit discrete data frames per the
selected transform length.
Bit-reverse: the bit-reverse block converts natural-order inputs to bit-reversed order as
required by the FFT engine.
Pipelined ranks: the pipelined rank blocks daisy-chain the FFT processing from input to
output. Each rank is optimized to contain the proper radix-2 butterfly math elements, twiddle
factor ROMs, and local datapath memories for efficient continuous processing.
Variable length select: the variable length select block multiplexes the rank outputs for
variable transform length support.
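The bit-reverse and pipelined-rank blocks described above can be illustrated in software. The sketch below is a textbook iterative radix-2 decimation-in-time FFT: it reorders natural-order input into bit-reversed order, then applies log2(N) butterfly ranks. It is an assumption-laden illustration of the dataflow only; it does not model the core's delay-feedback memories, its single-precision arithmetic, or its timing, and the names `bit_reverse` and `fft_radix2_dit` are hypothetical.

```python
import cmath


def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of `index`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def fft_radix2_dit(x):
    """Forward FFT: bit-reverse reorder, then log2(N) radix-2 butterfly ranks."""
    n_points = len(x)
    bits = n_points.bit_length() - 1
    if n_points < 2 or (1 << bits) != n_points:
        raise ValueError("length must be a power of two")
    # Bit-reverse stage: natural-order input becomes bit-reversed order.
    data = [complex(x[bit_reverse(i, bits)]) for i in range(n_points)]
    half = 1  # half the butterfly span of the current rank
    for _rank in range(bits):
        for start in range(0, n_points, 2 * half):
            for j in range(half):
                # Twiddle factor for this butterfly (a ROM lookup in hardware).
                w = cmath.exp(-1j * cmath.pi * j / half)
                a = data[start + j]
                b = data[start + j + half] * w
                data[start + j] = a + b
                data[start + j + half] = a - b
        half *= 2
    return data  # natural-order output


# Example: a unit impulse at n = 1 transforms to exp(-j*2*pi*k/N).
spectrum = fft_radix2_dit([0, 1] + [0] * 30)
```

For a 32-point transform the loop runs five ranks, which corresponds to the five daisy-chained rank blocks a length-32 build would exercise.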
4.3.4 Core modifications
The standard IP Core is available in netlist or parameterized source code and supports the
following:
Netlist builds for any Xilinx FPGA device
o FFT length and speed depend on chip resources and speed grade
Per-transform length selectable in powers-of-2
o from 32 to 2^m points, where m = 5 to 26
Per-transform mode selectable between Forward and Inverse FFT
Static length and mode configuration
o Pipeline must be clear before changing these configuration settings.
IEEE-754 single precision floating point math operators using Xilinx Coregen
o full DSP usage/maximum latency floating_point_v4_0 cores
Decimation-in-time (DIT) algorithm with internal bit-reversal
o providing natural-order data inputs and outputs
Potential customized deliveries from Dillon Engineering include:
Fixed single length of 2^m for a slight logic savings over run-time selectable length.
Fixed Forward or Inverse mode for a slight logic savings over run-time selectable mode.
Pipelined configuration settings
o allows dynamic mode and/or length switching on back-to-back transforms.
Bit-reversal stage removed for a slight logic savings and elimination of a
BlockRAM FIFO and associated latency.
o Note: data must then be input in bit-reversed order to provide natural-order outputs
Decimation-in-frequency (DIF) build option, which inputs data in natural-order
and outputs data in bit-reversed order.
Any Xilinx Floating Point operator adjustments to precision and latencies, with
logic parameter settings to match. Xilinx Floating Point operators are built
separately with Coregen, providing RTL source and .ngc netlists. Thus all tradeoffs
between speed, number of pipeline stages, DSP48/Mult macro usage, double- or
custom-precision float, etc., can be supported.
Any width fixed-point math operators in lieu of floating point. Options for various
scaling, rounding and saturation modes, all matched bit-accurate with the C/C++ model.
4.3.5 Parameters and ports definition
The core's I/O signals have not been fixed to specific device pins, which provides flexibility for
interfacing with user logic. All I/O signals are described below:
Signal      Direction  Description
----------  ---------  -----------
CLK         Input      Clock input. Single source used for all I/O and internal clocking.
RST_N       Input      Active-low asynchronous reset. Resets all control logic.
DIR         Input      Transform mode select. 0 = Forward FFT, 1 = Inverse FFT.
SEL[3:0]    Input      Transform length select. Valid range is from 4'd5 (indicating a transform length of 32) up to the maximum length supported by the build (e.g. 4'd10 for a transform length of 1024). The number of SEL bits depends on the maximum length.
SYNC_IN     Input      Input sync strobe. Indicates to the core to begin processing i_data on the following clock cycle.
A[63:0]     Input      Input data. Complex data of the form R + iQ, where R is contained in bits 63:32 and Q is contained in bits 31:0, each a single-precision floating point number.
SYNC_OUT    Output     Output sync strobe. Indicates the core is sending processed o_data beginning on the following clock cycle.
X[63:0]     Output     Output data. Complex data of the form R + iQ, where R is contained in bits 63:32 and Q is contained in bits 31:0, each a single-precision floating point number.
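The A and X word layout and the SEL encoding can be sketched in a testbench-style helper. This is a minimal illustration, assuming that SEL carries m = log2(N) (as the 4'd5 and 4'd10 examples suggest) and that each 32-bit half is a standard IEEE-754 single-precision bit pattern. The function names are hypothetical and are not part of the delivered C/C++ model or data generator.

```python
import struct


def float_to_bits(value):
    """Return the 32-bit IEEE-754 single-precision pattern of `value`."""
    return struct.unpack(">I", struct.pack(">f", value))[0]


def bits_to_float(bits):
    """Interpret a 32-bit pattern as an IEEE-754 single-precision number."""
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFFFFFF))[0]


def pack_sample(real, imag):
    """Build a 64-bit word with R in bits 63:32 and Q in bits 31:0."""
    return (float_to_bits(real) << 32) | float_to_bits(imag)


def unpack_sample(word):
    """Split a 64-bit word back into its (R, Q) single-precision parts."""
    return bits_to_float(word >> 32), bits_to_float(word)


def sel_for_length(n_points):
    """Return the SEL value m for a transform length N = 2^m, 5 <= m <= 26."""
    m = n_points.bit_length() - 1
    if (1 << m) != n_points or not 5 <= m <= 26:
        raise ValueError("length must be a power of two from 32 to 64M")
    return m


word = pack_sample(1.0, -2.0)       # R = 1.0, Q = -2.0
real, imag = unpack_sample(word)
sel_1k = sel_for_length(1024)
```

Note that a particular build may support a smaller maximum length than 2^26, in which case the valid SEL range and the SEL port width shrink accordingly.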
5 Critical signal descriptions
All interface and internal operations of the core are synchronous to CLK. Simple SYNC strobes
are used on the input and output interfaces to signal that data is valid on the following clock
cycle. An active SYNC coinciding with the last data point thus indicates back-to-back
transforms. A SYNC_IN strobe that is active while the core is already inputting data is ignored.
Tying SYNC_IN active will signal the core to perform continuous transforms, and
SYNC_OUT will strobe as normal to frame the output data.
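The SYNC_IN rules above can be expressed as a small cycle-by-cycle behavioural model. This is one interpretation of the text, offered as a sketch for testbench planning rather than as the core's actual control logic: a strobe is accepted when the core is idle or on the cycle of the last data point, and is ignored otherwise. The function name `input_frame_starts` is hypothetical.

```python
def input_frame_starts(sync_in, n_points):
    """Return the cycle numbers on which the first sample of each frame is input.

    sync_in  -- sequence of 0/1 SYNC_IN values, one per clock cycle
    n_points -- transform length N (samples per frame)
    """
    starts = []
    remaining = 0    # samples of the current frame still to input, this cycle included
    pending = False  # an accepted strobe means data starts on the next cycle
    for cycle, sync in enumerate(sync_in):
        if pending:
            starts.append(cycle)
            remaining = n_points
            pending = False
        # A strobe counts only when idle or on the last data point of a frame;
        # a strobe while the core is otherwise inputting data is ignored.
        if sync and remaining <= 1:
            pending = True
        if remaining > 0:
            remaining -= 1
    return starts


# N = 4: the strobe at cycle 2 arrives mid-frame and is ignored, while the
# strobe at cycle 4 coincides with the last sample and starts the next frame.
framed = input_frame_starts([1, 0, 1, 0, 1, 0, 0, 0, 0, 0], 4)

# SYNC_IN tied active: frames run back to back.
continuous = input_frame_starts([1] * 10, 4)
```

Under this interpretation, tying SYNC_IN active produces a new frame every N cycles with no idle cycles between frames, matching the continuous-transform behaviour described above.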
[Figure: output timing, 1K-length back-to-back transforms]
The DIR and SEL configuration inputs are by default selectable per-transform, but must be
stable starting with SYNC_IN active and must not be changed until the transformed data has
been completely output from the core (i.e. 2^m clocks after the corresponding SYNC_OUT).
6 Core assumptions
Following SYNC_IN, the initial transform has a start-up latency that depends on the bit-reversal
stage, the floating-point core latencies and the length of the transform. The core provides
continuous processing at steady state, though the SYNC_IN to SYNC_OUT latency may vary
slightly due to internal pipeline alignment.
The standard core with a transform length of 1024 has a start-up latency of around 2300 clock
cycles, or 9.2 µs at a 250 MHz clock rate.
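The quoted latency can be cross-checked with simple arithmetic. The helper below is a hypothetical convenience function; the frame-time figure additionally assumes one complex sample per clock, which is consistent with the stated sample rate of up to the clock frequency.

```python
def cycles_to_us(cycles, clock_mhz):
    """Convert a clock-cycle count to microseconds at a clock rate given in MHz."""
    return cycles / clock_mhz


# Start-up latency quoted for the standard core at N = 1024.
startup_us = cycles_to_us(2300, 250)

# Time to stream one 1024-point frame, assuming one sample per clock.
frame_us = cycles_to_us(1024, 250)

print(f"start-up latency: {startup_us:.1f} us, frame time: {frame_us:.3f} us")
```

On these numbers the start-up latency is a little more than two frame times, after which the core streams continuously.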
Latencies for other lengths 2^m approximately follow the formula:
m
7 Resource usage and performance
FPGA device                     FFT length  Fmax (MHz)
------------------------------  ----------  ----------  ------  ------  ---  ---
Spartan®-3A (XC3SD3400A-5)      256         150         23,867  30,143  16   96
SMT348-SX55 (XC4VSX55-12)       1,024       200         21,585  26,079  26   352
SMT351T-SX50 (XC5VSX50T-3)      1,024       250         20,562  20,333  19   176
SMT700-SX95T (XC5VSX95T-2)      16,384      200         27,799  29,163  109  256
SMT702-LX110T (XC5VLX110T-3)    1,024       250         27,403  33,581  19   32
SMT702-LX110T (XC5VLX110T-3)    8,192       250         36,592  44,954  61   44
Notes: 1) Actual slice count dependent on percentage of unrelated logic – see Mapping Report File for details
2) Assuming all core I/Os and clocks are routed off-chip.
The core is verified to be bit-accurate with the C/C++ data model for all supported lengths,
modes, throughputs and data formats, using a rigorous simulation suite of directed and random
data. The model itself is evaluated in terms of signal-to-quantization-noise ratio (SQNR) against
a double-precision floating-point software FFT implementation.
9 Ordering Information
This product is available directly from Sundance Multiprocessor Technology and its preferred
suppliers. Please contact us for pricing and additional information about this
product using the contact information on the front page of this datasheet.
Sundance Multiprocessor Technology also offers other FFT IP cores:
UltraLong FFTs (up to 64M points, fixed or floating point),
Parallel Butterfly FFTs (continuous FFTs at multiple points per clock cycle),
Full Parallel FFTs (extremely fast rates, up to 25Gsps),
2D FFTs (two-dimensional transform for image processing),
Mixed Radix FFTs (for non-power of 2 FFT lengths).