Stratix®II devices contain a two-dimensional row- and column-based
architecture to implement custom logic. A series of column and row
interconnects of varying length and speed provides signal interconnects
between logic array blocks (LABs), memory block structures (M512 RAM,
M4K RAM, and M-RAM blocks), and digital signal processing (DSP)
blocks.
Each LAB consists of eight adaptive logic modules (ALMs). An ALM is
the Stratix II device family’s basic building block of logic providing
efficient implementation of user logic functions. LABs are grouped into
rows and columns across the device.
M512 RAM blocks are simple dual-port memory blocks with 512 bits plus
parity (576 bits). These blocks provide dedicated simple dual-port or
single-port memory up to 18-bits wide at up to 500 MHz. M512 blocks are
grouped into columns across the device in between certain LABs.
M4K RAM blocks are true dual-port memory blocks with 4K bits plus
parity (4,608 bits). These blocks provide dedicated true dual-port, simple
dual-port, or single-port memory up to 36-bits wide at up to 550 MHz.
These blocks are grouped into columns across the device in between
certain LABs.
M-RAM blocks are true dual-port memory blocks with 512K bits plus
parity (589,824 bits). These blocks provide dedicated true dual-port,
simple dual-port, or single-port memory up to 144-bits wide at up to
420 MHz. Several M-RAM blocks are located individually in the device's
logic array.
DSP blocks can implement up to either eight full-precision 9 × 9-bit
multipliers, four full-precision 18 × 18-bit multipliers, or one
full-precision 36 × 36-bit multiplier with add or subtract features. The
DSP blocks support Q1.15 format rounding and saturation in the
multiplier and accumulator stages. These blocks also contain shift
registers for digital signal processing applications, including finite
impulse response (FIR) and infinite impulse response (IIR) filters. DSP
blocks are grouped into columns across the device and operate at up to
450 MHz.
Altera Corporation 2–1
May 2007
Functional Description
Each Stratix II device I/O pin is fed by an I/O element (IOE) located at
the end of LAB rows and columns around the periphery of the device.
I/O pins support numerous single-ended and differential I/O standards.
Each IOE contains a bidirectional I/O buffer and six registers for
registering input, output, and output-enable signals. When used with
dedicated clocks, these registers provide exceptional performance and
interface support with external memory devices such as DDR and DDR2
SDRAM, RLDRAM II, and QDR II SRAM devices. High-speed serial
interface channels with dynamic phase alignment (DPA) support data
transfer at up to 1 Gbps using LVDS or HyperTransport
standards.
Figure 2–1 shows an overview of the Stratix II device.
DSP Blocks for
Multiplication and Full
Implementation of FIR Filters
M4K RAM Blocks
for True Dual-Port
Memory & Other Embedded
Memory Functions
IOEs Support DDR, PCI, PCI-X,
SSTL-3, SSTL-2, HSTL-1, HSTL-2,
LVDS, HyperTransport & other
I/O Standards
TM
technology I/O
IOEs
IOEs
IOEs
IOEs
IOEs
IOEs
IOEs
IOEs
IOEs
IOEs
IOEs
IOEs
IOEs
IOEs
IOEs
IOEs
IOEs
IOEs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
IOEs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
DSP
Block
IOEsIOEs
LABsLABs
LABs
LABsLABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABsLABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABsLABs
M-RAM Block
LABsLABs
LABsLABs
2–2Altera Corporation
Stratix II Device Handbook, Volume 1May 2007
The number of M512 RAM, M4K RAM, and DSP blocks varies by device
along with row and column numbers and M-RAM blocks. Table 2–1 lists
the resources available in Stratix II devices.
Table 2–1. Stratix II Device Resources
Stratix II Architecture
Device
EP2S154 / 1043 / 7802 / 123026
EP2S306 / 2024 / 14412 / 164936
EP2S607 / 3295 / 25523 / 366251
EP2S908 / 4886 / 40843 / 487168
EP2S1309 / 6997 / 60963 / 638187
EP2S18011 / 9308 / 76894 / 9610096
Logic Array
Blocks
M512 RAM
Columns/Blocks
Each LAB consists of eight ALMs, carry chains, shared arithmetic chains,
LAB control signals, local interconnect, and register chain connection
lines. The local interconnect transfers signals between ALMs in the same
M4K RAM
Columns/Blocks
M-RAM
Blocks
DSP Block
Columns/Blocks
LAB
Columns
LAB Rows
LAB. Register chain connections transfer the output of an ALM register to
®
the adjacent ALM register in an LAB. The Quartus
II Compiler places
associated logic in an LAB or adjacent LABs, allowing the use of local,
shared arithmetic chain, and register chain connections for performance
and area efficiency. Figure 2–2 shows the Stratix II LAB structure.
Altera Corporation 2–3
May 2007Stratix II Device Handbook, Volume 1
Logic Array Blocks
Figure 2–2. Stratix II LAB Structure
Direct link
interconnect from
adjacent block
Row Interconnects of
Variable Speed & Length
ALMs
Direct link
interconnect from
adjacent block
Direct link
interconnect to
adjacent block
Direct link
interconnect to
adjacent block
Local Interconnect
LAB
Local Interconnect is Driven
from Either Side by Columns & LABs,
& from Above by Rows
Column Interconnects of
Variable Speed & Length
LAB Interconnects
The LAB local interconnect can drive ALMs in the same LAB. It is driven
by column and row interconnects and ALM outputs in the same LAB.
Neighboring LABs, M512 RAM blocks, M4K RAM blocks, M-RAM
blocks, or DSP blocks from the left and right can also drive an LAB's local
interconnect through the direct link connection. The direct link
connection feature minimizes the use of row and column interconnects,
providing higher performance and flexibility. Each ALM can drive
24 ALMs through fast local and direct link interconnects. Figure 2–3
shows the direct link connection.
2–4Altera Corporation
Stratix II Device Handbook, Volume 1May 2007
Figure 2–3. Direct Link Connection
Direct link interconnect from
left LAB, TriMatrix memory
block, DSP block, or IOE output
Stratix II Architecture
Direct link interconnect from
right LAB, TriMatrix memory
block, DSP block, or IOE output
ALMs
Direct link
interconnect
to left
Direct link
interconnect
to right
Local
Interconnect
LAB Control Signals
Each LAB contains dedicated logic for driving control signals to its ALMs.
The control signals include three clocks, three clock enables, two
asynchronous clears, synchronous clear, asynchronous preset/load, and
synchronous load control signals. This gives a maximum of 11 control
signals at a time. Although synchronous load and clear signals are
generally used when implementing counters, they can also be used with
other functions.
Each LAB can use three clocks and three clock enable signals. However,
there can only be up to two unique clocks per LAB, as shown in the LAB
control signal generation circuit in Figure 2–4. Each LAB's clock and clock
enable signals are linked. For example, any ALM in a particular LAB
using the labclk1 signal also uses labclkena1. If the LAB uses both
the rising and falling edges of a clock, it also uses two LAB-wide clock
signals. De-asserting the clock enable signal turns off the corresponding
LAB-wide clock.
Each LAB can use two asynchronous clear signals and an asynchronous
load/preset signal. By default, the Quartus II software uses a NOT gate
push-back technique to achieve preset. If you disable the NOT gate
push-up option or assign a given register to power up high using the
Quartus II software, the preset is achieved using the asynchronous load
Altera Corporation 2–5
May 2007Stratix II Device Handbook, Volume 1
Adaptive Logic Modules
signal with asynchronous load data input tied high. When the
asynchronous load/preset signal is used, the labclkena0 signal is no
longer available.
The LAB row clocks [5..0] and LAB local interconnect generate the
LAB-wide control signals. The MultiTrack
skew allows clock and control signal distribution in addition to data.
Figure 2–4 shows the LAB control signal generation circuit.
Figure 2–4. LAB-Wide Control Signals
Dedicated Row LAB Clocks
Local Interconnect
Local Interconnect
Local Interconnect
Local Interconnect
Local Interconnect
Local Interconnect
6
6
6
There are two unique
clock signals per LAB.
TM
interconnect's inherent low
labclr1
Adaptive Logic
Modules
labclk0
labclkena0
or asyncload
or labpreset
labclk1
labclkena1labclkena2labclr0synclr
labclk2
syncload
The basic building block of logic in the Stratix II architecture, the adaptive
logic module (ALM), provides advanced features with efficient logic
utilization. Each ALM contains a variety of look-up table (LUT)-based
resources that can be divided between two adaptive LUTs (ALUTs). With
up to eight inputs to the two ALUTs, one ALM can implement various
combinations of two functions. This adaptability allows the ALM to be
2–6Altera Corporation
Stratix II Device Handbook, Volume 1May 2007
completely backward-compatible with four-input LUT architectures. One
r
r
r
r
ALM can also implement any function of up to six inputs and certain
seven-input functions.
In addition to the adaptive LUT-based resources, each ALM contains two
programmable registers, two dedicated full adders, a carry chain, a
shared arithmetic chain, and a register chain. Through these dedicated
resources, the ALM can efficiently implement various arithmetic
functions and shift registers. Each ALM drives all types of interconnects:
local, row, column, carry chain, shared arithmetic chain, register chain,
and direct link interconnects. Figure 2–5 shows a high-level block
diagram of the Stratix II ALM while Figure 2–6 shows a detailed view of
all the connections in the ALM.
Figure 2–5. High-Level Block Diagram of the Stratix II ALM
carry_in
shared_arith_in
dataf0
datae0
dataa
datab
datac
datad
datae1
dataf1
Combinational
Logic
shared_arith_out
adder0
adder1
carry_out
reg_chain_in
reg_chain_out
DQ
reg0
DQ
reg1
Stratix II Architecture
To general o
local routing
To general o
local routing
To general o
local routing
To general o
local routing
Altera Corporation 2–7
May 2007Stratix II Device Handbook, Volume 1
Adaptive Logic Modules
Figure 2–6. Stratix II ALM Details
Row, column &
Row, column &
direct link routing
Local
Interconnect
direct link routing
Row, column &
Row, column &
direct link routing
Local
Interconnect
direct link routing
sclrasyncload
syncloadena[2..0]
reg_chain_in
carry_in
shared_arith_in
4-Input
LUT
Q
D
ADATA
PRN/ALD
ENA
3-Input
CLRN
LUT
3-Input
LUT
4-Input
LUT
Q
D
ADATA
PRN/ALD
ENA
3-Input
CLRN
LUT
3-Input
LUT
reg_chain_out
CC
V
carry_outclk[2..0]
shared_arith_outaclr[1..0]
dataf0
Local
Interconnect
datae0
Local
Interconnect
datac
Local
Interconnect
dataa
Local
Interconnect
datab
Local
Interconnect
datad
Interconnect
Local
datae1
Local
Interconnect
dataf1
Local
Interconnect
2–8Altera Corporation
Stratix II Device Handbook, Volume 1May 2007
Stratix II Architecture
One ALM contains two programmable registers. Each register has data,
clock, clock enable, synchronous and asynchronous clear, asynchronous
load data, and synchronous and asynchronous load/preset inputs.
Global signals, general-purpose I/O pins, or any internal logic can drive
the register's clock and clear control signals. Either general-purpose I/O
pins or internal logic can drive the clock enable, preset, asynchronous
load, and asynchronous load data. The asynchronous load data input
comes from the datae or dataf input of the ALM, which are the same
inputs that can be used for register packing. For combinational functions,
the register is bypassed and the output of the LUT drives directly to the
outputs of the ALM.
Each ALM has two sets of outputs that drive the local, row, and column
routing resources. The LUT, adder, or register output can drive these
output drivers independently (see Figure 2–6). For each set of output
drivers, two ALM outputs can drive column, row, or direct link routing
connections, and one of these ALM outputs can also drive local
interconnect resources. This allows the LUT or adder to drive one output
while the register drives another output. This feature, called register
packing, improves device utilization because the device can use the
register and the combinational logic for unrelated functions. Another
special packing mode allows the register output to feed back into the LUT
of the same ALM so that the register is packed with its own fan-out LUT.
This provides another mechanism for improved fitting. The ALM can also
drive out registered and unregistered versions of the LUT or adder
output.
fSee the Performance & Logic Efficiency Analysis of Stratix II Devices White
Paper for more information on the efficiencies of the Stratix II ALM and
comparisons with previous architectures.
ALM Operating Modes
The Stratix II ALM can operate in one of the following modes:
■Normal mode
■Extended LUT mode
■Arithmetic mode
■Shared arithmetic mode
Each mode uses ALM resources differently. In each mode, eleven
available inputs to the ALM--the eight data inputs from the LAB local
interconnect; carry-in from the previous ALM or LAB; the shared
arithmetic chain connection from the previous ALM or LAB; and the
register chain connection--are directed to different destinations to
implement the desired logic function. LAB-wide signals provide clock,
asynchronous clear, asynchronous preset/load, synchronous clear,
Altera Corporation 2–9
May 2007Stratix II Device Handbook, Volume 1
Adaptive Logic Modules
synchronous load, and clock enable control for the register. These LABwide signals are available in all ALM modes. See the “LAB Control
Signals” section for more information on the LAB-wide control signals.
The Quartus II software and supported third-party synthesis tools, in
conjunction with parameterized functions such as library of
parameterized modules (LPM) functions, automatically choose the
appropriate mode for common functions such as counters, adders,
subtractors, and arithmetic functions. If required, you can also create
special-purpose functions that specify which ALM operating mode to use
for optimal performance.
Normal Mode
The normal mode is suitable for general logic applications and
combinational functions. In this mode, up to eight data inputs from the
LAB local interconnect are inputs to the combinational logic. The normal
mode allows two functions to be implemented in one Stratix II ALM, or
an ALM to implement a single function of up to six inputs. The ALM can
support certain combinations of completely independent functions and
various combinations of functions which have common inputs.
Figure 2–7 shows the supported LUT combinations in normal mode.
2–10Altera Corporation
Stratix II Device Handbook, Volume 1May 2007
Figure 2–7. ALM in Normal Mode Note (1)
Stratix II Architecture
dataf0
datae0
datac
dataa
datab
datad
datae1
dataf1
dataf0
datae0
datac
dataa
datab
datad
datae1
dataf1
dataf0
datae0
datac
dataa
datab
4-Input
LUT
4-Input
LUT
5-Input
LUT
3-Input
LUT
5-Input
LUT
combout0
combout1
combout0
combout1
combout0
dataf0
datae0
datac
dataa
datab
datad
datae1
dataf1
dataf0
datae0
dataa
datab
datac
datad
dataf0
datae0
dataa
datab
datac
datad
5-Input
LUT
5-Input
LUT
6-Input
LUT
6-Input
LUT
combout0
combout1
combout0
combout0
datad
datae1
dataf1
4-Input
LUT
combout1
datae1
dataf1
6-Input
LUT
combout1
Note to Figure 2–7:
(1) Combinations of functions with fewer inputs than those shown are also supported. For example, combinations of
functions with the following number of inputs are supported: 4 and 3, 3 and 3, 3 and 2, 5 and 2, etc.
The normal mode provides complete backward compatibility with fourinput LUT architectures. Two independent functions of four inputs or less
can be implemented in one Stratix II ALM. In addition, a five-input
function and an independent three-input function can be implemented
without sharing inputs.
Altera Corporation 2–11
May 2007Stratix II Device Handbook, Volume 1
Adaptive Logic Modules
For the packing of two five-input functions into one ALM, the functions
must have at least two common inputs. The common inputs are dataa
and datab. The combination of a four-input function with a five-input
function requires one common input (either dataa or datab).
In the case of implementing two six-input functions in one ALM, four
inputs must be shared and the combinational function must be the same.
For example, a 4 × 2 crossbar switch (two 4-to-1 multiplexers with
common inputs and unique select lines) can be implemented in one ALM,
as shown in Figure 2–8. The shared inputs are dataa, datab, datac, and
datad, while the unique select lines are datae0 and dataf0 for
function0, and datae1 and dataf1 for function1. This crossbar
switch consumes four LUTs in a four-input LUT-based architecture.
Figure 2–8. 4 × 2 Crossbar Switch Example
4 × 2 Crossbar SwitchImplementation in 1 ALM
sel0[1..0]
inputa
inputb
inputc
inputd
out0
out1
dataf0
datae0
dataa
datab
datac
datad
Six-Input
LUT
(Function0)
combout0
sel1[1..0]
datae1
dataf1
Six-Input
LUT
(Function1)
combout1
In a sparsely used device, functions that could be placed into one ALM
may be implemented in separate ALMs. The Quartus II Compiler spreads
a design out to achieve the best possible performance. As a device begins
to fill up, the Quartus II software automatically utilizes the full potential
of the Stratix II ALM. The Quartus II Compiler automatically searches for
functions of common inputs or completely independent functions to be
placed into one ALM and to make efficient use of the device resources. In
addition, you can manually control resource usage by setting location
assignments.
Any six-input function can be implemented utilizing inputs dataa,
datab, datac, datad, and either datae0 and dataf0 or datae1 and
dataf1. If datae0 and dataf0 are utilized, the output is driven to
register0, and/or register0 is bypassed and the data drives out to
the interconnect using the top set of output drivers (see Figure 2–9). If
2–12Altera Corporation
Stratix II Device Handbook, Volume 1May 2007
Stratix II Architecture
datae1 and dataf1 are utilized, the output drives to register1
and/or bypasses register1 and drives to the interconnect using the
bottom set of output drivers. The Quartus II Compiler automatically
selects the inputs to the LUT. Asynchronous load data for the register
comes from the datae or dataf input of the ALM. ALMs in normal
mode support register packing.
Figure 2–9. 6-Input Function in Normal Mode Notes (1), (2)
dataf0
datae0
dataa
datab
datac
datad
datae1
dataf1
(2)
These inputs are available for register packing.
6-Input
LUT
DQ
reg0
DQ
reg1
To general or
local routing
To general or
local routing
To general or
local routing
Notes to Figure 2–9:
(1) If datae1 and dataf1 are used as inputs to the six-input function, then datae0
and dataf0 are available for register packing.
(2) The dataf1 input is available for register packing only if the six-input function is
un-registered.
Extended LUT Mode
The extended LUT mode is used to implement a specific set of
seven-input functions. The set must be a 2-to-1 multiplexer fed by two
arbitrary five-input functions sharing four inputs. Figure 2–10 shows the
template of supported seven-input functions utilizing extended LUT
mode. In this mode, if the seven-input function is unregistered, the
unused eighth input is available for register packing.
Functions that fit into the template shown in Figure 2–10 occur naturally
in designs. These functions often appear in designs as “if-else” statements
in Verilog HDL or VHDL code.
Altera Corporation 2–13
May 2007Stratix II Device Handbook, Volume 1
Adaptive Logic Modules
r
r
Figure 2–10. Template for Supported Seven-Input Functions in Extended LUT Mode
datae0
datac
dataa
datab
datad
dataf0
datae1
dataf1
(1)
5-Input
LUT
5-Input
LUT
This input is available
for register packing.
combout0
DQ
reg0
To general o
local routing
To general o
local routing
Note to Figure 2–10:
(1) If the seven-input function is unregistered, the unused eighth input is available for register packing. The second
register, reg1, is not available.
Arithmetic Mode
The arithmetic mode is ideal for implementing adders, counters,
accumulators, wide parity functions, and comparators. An ALM in
arithmetic mode uses two sets of two four-input LUTs along with two
dedicated full adders. The dedicated adders allow the LUTs to be
available to perform pre-adder logic; therefore, each adder can add the
output of two four-input functions. The four LUTs share the dataa and
datab inputs. As shown in Figure 2–11, the carry-in signal feeds to
adder0, and the carry-out from adder0 feeds to carry-in of adder1. The
carry-out from adder1 drives to adder0 of the next ALM in the LAB.
ALMs in arithmetic mode can drive out registered and/or unregistered
versions of the adder outputs.
2–14Altera Corporation
Stratix II Device Handbook, Volume 1May 2007
Figure 2–11. ALM in Arithmetic Mode
r
carry_in
Stratix II Architecture
datae0
dataf0
datac
datab
dataa
datad
datae1
dataf1
4-Input
LUT
4-Input
LUT
4-Input
LUT
4-Input
LUT
carry_out
adder0
adder1
DQ
reg0
DQ
reg1
To general or
local routing
To general or
local routing
To general or
local routing
To general o
local routing
While operating in arithmetic mode, the ALM can support simultaneous
use of the adder's carry output along with combinational logic outputs. In
this operation, the adder output is ignored. This usage of the adder with
the combinational logic output provides resource savings of up to 50% for
functions that can use this ability. An example of such functionality is a
conditional operation, such as the one shown in Figure 2–12. The
equation for this example is:
R = (X < Y) ? Y : X
To implement this function, the adder is used to subtract ‘Y’ from ‘X.’ If
‘X’ is less than ‘Y,’ the carry_out signal is ‘1.’ The carry_out signal is
fed to an adder where it drives out to the LAB local interconnect. It then
feeds to the LAB-wide syncload signal. When asserted, syncload
selects the syncdata input. In this case, the data ‘Y’ drives the
syncdata inputs to the registers. If ‘X’ is greater than or equal to ‘Y,’ the
syncload signal is de-asserted and ‘X’ drives the data port of the
registers.
Altera Corporation 2–15
May 2007Stratix II Device Handbook, Volume 1
Adaptive Logic Modules
Figure 2–12. Conditional Operation Example
Adder output
is not used.
ALM 1
X[0]
Y[0]
syncdata
X[1]
Y[1]
Carry Chain
X[2]
Y[2]
ALM 2
Comb &
Adder
Logic
Comb &
Adder
Logic
Comb &
Adder
Logic
Comb &
Adder
Logic
X[0]
X[1]
X[2]
syncload
syncload
syncload
DQ
reg0
DQ
reg1
DQ
reg0
carry_out
R[0]
R[1]
R[2]
To general or
local routing
To general or
local routing
To general or
local routing
To local routing &
then to LAB-wide
syncload
The arithmetic mode also offers clock enable, counter enable,
synchronous up/down control, add/subtract control, synchronous clear,
synchronous load. The LAB local interconnect data inputs generate the
clock enable, counter enable, synchronous up/down and add/subtract
control signals. These control signals are good candidates for the inputs
that are shared between the four LUTs in the ALM. The synchronous clear
and synchronous load options are LAB-wide signals that affect all
registers in the LAB. The Quartus II software automatically places any
registers that are not used by the counter into other LABs.
Carry Chain
The carry chain provides a fast carry function between the dedicated
adders in arithmetic or shared arithmetic mode. Carry chains can begin in
either the first ALM or the fifth ALM in an LAB. The final carry-out signal
is routed t o an ALM , wh ere it i s fe d to loc al, row, or col umn int erc onn ect s.
2–16Altera Corporation
Stratix II Device Handbook, Volume 1May 2007
Stratix II Architecture
The Quartus II Compiler automatically creates carry chain logic during
design processing, or you can create it manually during design entry.
Parameterized functions such as LPM functions automatically take
advantage of carry chains for the appropriate functions.
The Quartus II Compiler creates carry chains longer than 16 (8 ALMs in
arithmetic or shared arithmetic mode) by linking LABs together
automatically. For enhanced fitting, a long carry chain runs vertically
allowing fast horizontal connections to TriMatrix memory and DSP
blocks. A carry chain can continue as far as a full column.
To avoid routing congestion in one small area of the device when a high
fan-in arithmetic function is implemented, the LAB can support carry
chains that only utilize either the top half or the bottom half of the LAB
before connecting to the next LAB. This leaves the other half of the ALMs
in the LAB available for implementing narrower fan-in functions in
normal mode. Carry chains that use the top four ALMs in the first LAB
carry into the top half of the ALMs in the next LAB within the column.
Carry chains that use the bottom four ALMs in the first LAB carry into the
bottom half of the ALMs in the next LAB within the column. Every other
column of LABs is top-half bypassable, while the other LAB columns are
bottom-half bypassable.
See the “MultiTrack Interconnect” on page 2–22 section for more
information on carry chain interconnect.
Shared Arithmetic Mode
In shared arithmetic mode, the ALM can implement a three-input add. In
this mode, the ALM is configured with four 4-input LUTs. Each LUT
either computes the sum of three inputs or the carry of three inputs. The
output of the carry computation is fed to the next adder (either to adder1
in the same ALM or to adder0 of the next ALM in the LAB) via a
dedicated connection called the shared arithmetic chain. This shared
arithmetic chain can significantly improve the performance of an adder
tree by reducing the number of summation stages required to implement
an adder tree. Figure 2–13 shows the ALM in shared arithmetic mode.
Altera Corporation 2–17
May 2007Stratix II Device Handbook, Volume 1
Adaptive Logic Modules
Figure 2–13. ALM in Shared Arithmetic Mode
shared_arith_in
carry_in
4-Input
LUT
DQ
datae0
datac
datab
dataa
datad
datae1
4-Input
LUT
4-Input
LUT
4-Input
LUT
carry_out
shared_arith_out
reg0
DQ
reg1
Note to Figure 2–13:
(1) Inputs dataf0 and dataf1 are available for register packing in shared arithmetic mode.
Adder trees can be found in many different applications. For example, the
summation of the partial products in a logic-based multiplier can be
implemented in a tree structure. Another example is a correlator function
that can use a large adder tree to sum filtered data samples in a given time
frame to recover or to de-spread data which was transmitted utilizing
spread spectrum technology.
To general or
local routing
To general or
local routing
To general or
local routing
To general or
local routing
An example of a three-bit add operation utilizing the shared arithmetic
mode is shown in Figure 2–14. The partial sum (S[2..0]) and the
partial carry (C[2..0]) is obtained using the LUTs, while the result
(R[2..0]) is computed using the dedicated adders.
2–18Altera Corporation
Stratix II Device Handbook, Volume 1May 2007
Figure 2–14. Example of a 3-bit Add Utilizing Shared Arithmetic Mode
3-Bit Add ExampleALM Implementation
ALM 1
1st stage add is
implemented in LUTs.
2nd stage add is
implemented in adders.
X2 X1 X0
Y2 Y1 Y0
Z2 Z1 Z0
+
S2 S1 S0
+
C2 C1 C0
R3 R2 R1 R0
X0
Y0
Z0
3-Input
LUT
3-Input
LUT
Stratix II Architecture
shared_arith_in = '0'
carry_in = '0'
S0
R0
C0
Binary Add
+
+
1 1 0
1 1 0 1
1 1 0
1 0 1
0 1 0
0 0 1
Decimal
Equivalents
6
5
2
+
1
+
2 x 6
13
X1
Y1
Z1
X2
Y2
Z2
3-Input
3-Input
ALM 2
3-Input
3-Input
3-Input
3-Input
S1
LUT
R1
C1
LUT
S2
LUT
R2
C2
LUT
'0'
LUT
R3
LUT
Shared Arithmetic Chain
In addition to the dedicated carry chain routing, the shared arithmetic
chain available in shared arithmetic mode allows the ALM to implement
a three-input add. This significantly reduces the resources necessary to
implement large adder trees or correlator functions.
The shared arithmetic chains can begin in either the first or fifth ALM in
an LAB. The Quartus II Compiler creates shared arithmetic chains longer
than 16 (8 ALMs in arithmetic or shared arithmetic mode) by linking
LABs together automatically. For enhanced fitting, a long shared
Altera Corporation 2–19
May 2007Stratix II Device Handbook, Volume 1
Adaptive Logic Modules
arithmetic chain runs vertically allowing fast horizontal connections to
TriMatrix memory and DSP blocks. A shared arithmetic chain can
continue as far as a full column.
Similar to the carry chains, the shared arithmetic chains are also top- or
bottom-half bypassable. This capability allows the shared arithmetic
chain to cascade through half of the ALMs in a LAB while leaving the
other half available for narrower fan-in functionality. Every other LAB
column is top-half bypassable, while the other LAB columns are bottomhalf bypassable.
See the “MultiTrack Interconnect” on page 2–22 section for more
information on shared arithmetic chain interconnect.
Register Chain
In addition to the general routing outputs, the ALMs in an LAB have
register chain outputs. The register chain routing allows registers in the
same LAB to be cascaded together. The register chain interconnect allows
an LAB to use LUTs for a single combinational function and the registers
to be used for an unrelated shift register implementation. These resources
speed up connections between ALMs while saving local interconnect
resources (see Figure 2–15). The Quartus II Compiler automatically takes
advantage of these resources to improve utilization and performance.
2–20Altera Corporation
Stratix II Device Handbook, Volume 1May 2007
Figure 2–15. Register Chain within an LAB Note (1)
adder0
Combinational
Logic
reg_chain_in
From Previous ALM
Within The LAB
DQ
reg0
Stratix II Architecture
To general or
local routing
To general or
local routing
To general or
local routing
To general or
local routing
To general or
local routing
To general or
local routing
To general or
local routing
To general or
local routing
Combinational
Logic
adder1
adder0
adder1
DQ
reg1
DQ
reg0
DQ
reg1
reg_chain_out
To Next ALM
within the LAB
Note to Figure 2–15:
(1) The combinational or adder logic can be utilized to implement an unrelated, un-registered function.
See the “MultiTrack Interconnect” on page 2–22 section for more
information on register chain interconnect.
Altera Corporation 2–21
May 2007Stratix II Device Handbook, Volume 1
MultiTrack Interconnect
Clear & Preset Logic Control
LAB-wide signals control the logic for the register's clear and load/preset
signals. The ALM directly supports an asynchronous clear and preset
function. The register preset is achieved through the asynchronous load
of a logic high. The direct asynchronous preset does not require a NOTgate push-back technique. Stratix II devices support simultaneous
asynchronous load/preset, and clear signals. An asynchronous clear
signal takes precedence if both signals are asserted simultaneously. Each
LAB supports up to two clears and one load/preset signal.
In addition to the clear and load/preset ports, Stratix II devices provide a
device-wide reset pin (DEV_CLRn) that resets all registers in the device.
An option set before compilation in the Quartus II software controls this
pin. This device-wide reset overrides all other control signals.
MultiTrack
Interconnect
In the Stratix II architecture, connections between ALMs, TriMatrix
memory, DSP blocks, and device I/O pins are provided by the MultiTrack
interconnect structure with DirectDrive
interconnect consists of continuous, performance-optimized routing lines
of different lengths and speeds used for inter- and intra-design block
connectivity. The Quartus II Compiler automatically places critical design
paths on faster interconnects to improve design performance.
DirectDrive technology is a deterministic routing technology that ensures
identical routing resource usage for any function regardless of placement
in the device. The MultiTrack interconnect and DirectDrive technology
simplify the integration stage of block-based designing by eliminating the
re-optimization cycles that typically follow design changes and
additions.
The MultiTrack interconnect consists of row and column interconnects
that span fixed distances. A routing structure with fixed length resources
for all devices allows predictable and repeatable performance when
migrating through different device densities. Dedicated row
interconnects route signals to and from LABs, DSP blocks, and TriMatrix
memory in the same row. These row resources include:
■Direct link interconnects between LABs and adjacent blocks
■R4 interconnects traversing four blocks to the right or left
■R24 row interconnects for high-speed access across the length of the
device
TM
technology. The MultiTrack
2–22Altera Corporation
Stratix II Device Handbook, Volume 1May 2007
Stratix II Architecture
The direct link interconnect allows an LAB, DSP block, or TriMatrix
memory block to drive into the local interconnect of its left and right
neighbors and then back into itself. This provides fast communication
between adjacent LABs and/or blocks without using row interconnect
resources.
The R4 interconnects span four LABs, three LABs and one M512 RAM
block, two LABs and one M4K RAM block, or two LABs and one DSP
block to the right or left of a source LAB. These resources are used for fast
row connections in a four-LAB region. Every LAB has its own set of R4
interconnects to drive either left or right. Figure 2–16 shows R4
interconnect connections from an LAB. R4 interconnects can drive and be
driven by DSP blocks and RAM blocks and row IOEs. For LAB
interfacing, a primary LAB or LAB neighbor can drive a given R4
interconnect. For R4 interconnects that drive to the right, the primary
LAB and right neighbor can drive on to the interconnect. For R4
interconnects that drive to the left, the primary LAB and its left neighbor
can drive on to the interconnect. R4 interconnects can drive other R4
interconnects to extend the range of LABs they can drive. R4
interconnects can also drive C4 and C16 interconnects for connections
from one row to another. Additionally, R4 interconnects can drive R24
interconnects.
Adjacent LAB can
Drive onto Another
LAB's R4 Interconnect
R4 Interconnect
Driving Left
LAB
Neighbor
Primary
LAB (2)
Notes to Figure 2–16:
(1) C4 and C16 interconnects can drive R4 interconnects.
(2) This pattern is repeated for every LAB in the LAB row.
(3) The LABs in Figure 2–16 show the 16 possible logical outputs per LAB.
Altera Corporation 2–23
May 2007Stratix II Device Handbook, Volume 1
C4 and C16
Column Interconnects (1)
LAB
Neighbor
R4 Interconnect
Driving Right
MultiTrack Interconnect
R24 row interconnects span 24 LABs and provide the fastest resource for
long row connections between LABs, TriMatrix memory, DSP blocks, and
Row IOEs. The R24 row interconnects can cross M-RAM blocks. R24 row
interconnects drive to other row or column interconnects at every fourth
LAB and do not drive directly to LAB local interconnects. R24 row
interconnects drive LAB local interconnects via R4 and C4 interconnects.
R24 interconnects can drive R24, R4, C16, and C4 interconnects.
The column interconnect operates similarly to the row interconnect and
vertically routes signals to and from LABs, TriMatrix memory, DSP
blocks, and IOEs. Each column of LABs is served by a dedicated column
interconnect. These column resources include:
■Shared arithmetic chain interconnects in an LAB
■Carry chain interconnects in an LAB and from LAB to LAB
■Register chain interconnects in an LAB
■C4 interconnects traversing a distance of four block s in up and down
direction
■C16 column interconnects for high-speed vertical routing through
the device
Stratix II devices include an enhanced interconnect structure in LABs for
routing shared arithmetic chains and carry chains for efficient arithmetic
functions. The register chain connection allows the register output of one
ALM to connect directly to the register input of the next ALM in the LAB
for fast shift registers. These ALM to ALM connections bypass the local
interconnect. The Quartus II Compiler automatically takes advantage of
these resources to improve utilization and performance. Figure 2–17
shows the shared arithmetic chain, carry chain and register chain
interconnects.
2–24Altera Corporation
Stratix II Device Handbook, Volume 1May 2007
Register Chain
Routing to Adjacent
ALM's Register Inp
The C4 interconnects span four LABs, M512, or M4K blocks up or down
from a source LAB. Every LAB has its own set of C4 interconnects to drive
either up or down. Figure 2–18 shows the C4 interconnect connections
from an LAB in a column. The C4 interconnects can drive and be driven
by all types of architecture blocks, including DSP blocks, TriMatrix
memory blocks, and column and row IOEs. For LAB interconnection, a
primary LAB or its LAB neighbor can drive a given C4 interconnect. C4
interconnects can drive each other to extend their range as well as drive
row interconnects for column-to-column connections.
Altera Corporation 2–25
May 2007Stratix II Device Handbook, Volume 1
MultiTrack Interconnect
4
Figure 2–18. C4 Interconnect Connections Note (1)
C4 Interconnect
Drives Local and R
Interconnects
up to Four Rows
C4 Interconnect
Driving Up
LAB
Row
Interconnect
Adjacent LAB can
drive onto neighboring
LAB's C4 interconnect
Local
Interconnect
C4 Interconnect
Driving Down
Note to Figure 2–18:
(1) Each C4 interconnect can drive either up or down four rows.
2–26Altera Corporation
Stratix II Device Handbook, Volume 1May 2007
C16 column interconnects span a length of 16 LABs and provide the
fastest resource for long column connections between LABs, TriMatrix
memory blocks, DSP blocks, and IOEs. C16 interconnects can cross
M-RAM blocks and also drive to row and column interconnects at every
fourth LAB. C16 interconnects drive LAB local interconnects via C4 and
R4 interconnects and do not drive LAB local interconnects directly.
All embedded blocks communicate with the logic array similar to LABto-LAB interfaces. Each block (that is, TriMatrix memory and DSP blocks)
connects to row and column interconnects and has local interconnect
regions driven by row and column interconnects. These blocks also have
direct link interconnects for fast connections to and from a neighboring
LAB. All blocks are fed by the row LAB clocks, labclk[5..0].
Table 2–2 shows the Stratix II device’s routing scheme.
Table 2–2. Stratix II Device Routing Scheme (Part 1 of 2)
Source
Carry Chain
Register Chain
Local Interconnect
Direct Link Interconnect
v
vvvvv
vvvv
vvv
vvvv
vvvv
vvvv
vvvv
vvv
Shared arithmetic chain
Carry chain
Register chain
Local interconnect
Direct link interconnect
R4 interconnect
R24 interconnect
C4 interconnect
C16 interconnect
ALM
M512 RAM block
M4K RAM block
M-RAM block
DSP blocks
Shared Arithmetic Chain
vvvvvvv
Destination
R24 Interconnect
C4 Interconnect
R4 Interconnect
Stratix II Architecture
ALM
C16 Interconnect
M4K RAM Block
M512 RAM Block
DSP Blocks
M-RAM Block
v
v
v
vvvvvvv
Column IOE
Row IOE
Altera Corporation 2–27
May 2007Stratix II Device Handbook, Volume 1
Tri M a t r ix Memor y
Table 2–2. Stratix II Device Routing Scheme (Part 2 of 2)
Source
Carry Chain
Register Chain
Local Interconnect
Shared Arithmetic Chain
Column IOE
Row IOE
Direct Link Interconnect
vvv
vvvv
R4 Interconnect
Destination
C4 Interconnect
R24 Interconnect
ALM
C16 Interconnect
M512 RAM Block
DSP Blocks
M-RAM Block
M4K RAM Block
Row IOE
Column IOE
TriMatrix
Memory
TriMatrix memory consists of three types of RAM blocks: M512, M4K,
and M-RAM. Although these memory blocks are different, they can all
implement various types of memory with or without parity, including
true dual-port, simple dual-port, and single-port RAM, ROM, and FIFO
buffers. Table 2–3 shows the size and features of the different RAM
blocks.
Table 2–3. TriMatrix Memory Features (Part 1 of 2)
Memory Feature
Maximum performance 500 MHz550 MHz420 MHz
True dual-port memory
Simple dual-port memory
Single-port memory
Shift register
ROM
FIFO buffer
Pack mode
Byte enable
Address clock enable
Parity bits
Mixed clock mode
Memory initialization (.mif)
M512 RAM Block
(32 × 18 Bits)
vvv
vvv
vv
vv
vvv
vvv
vvv
vvv
vv
M4K RAM Block
(128 × 36 Bits)
M-RAM Block
(4K × 144 Bits)
vv
(1)
vv
vv
2–28Altera Corporation
Stratix II Device Handbook, Volume 1May 2007
Table 2–3. TriMatrix Memory Features (Part 2 of 2)
TriMatrix memory provides three different memory sizes for efficient
application support. The Quartus II software automatically partitions the
user-defined memory into the embedded memory blocks using the most
efficient size combinations. You can also manually assign the memory to
a specific block size or a mixture of block sizes.
When applied to input registers, the asynchronous clear signal for the
TriMatrix embedded memory immediately clears the input registers.
However, the output of the memory block does not show the effects until
the next clock edge. When applied to output registers, the asynchronous
clear signal clears the output registers and the effects are seen
immediately.
Altera Corporation 2–29
May 2007Stratix II Device Handbook, Volume 1
Tri M a t r ix Memor y
M512 RAM Block
The M512 RAM block is a simple dual-port memory block and is useful
for implementing small FIFO buffers, DSP, and clock domain transfer
applications. Each block contains 576 RAM bits (including parity bits).
M512 RAM blocks can be configured in the following modes:
■Simple dual-port RAM
■Single-port RAM
■FIFO
■ROM
■Shift register
1Violating the setup or hold time on the memory block address
registers could corrupt memory contents. This applies to both
read and write operations.
When configured as RAM or ROM, you can use an initialization file to
pre-load the memory contents.
M512 RAM blocks can have different clocks on its inputs and outputs.
The wren, datain, and write address registers are all clocked together
from one of the two clocks feeding the block. The read address, rden, and
output registers can be clocked by either of the two clocks driving the
block. This allows the RAM block to operate in read/write or
input/output clock modes. Only the output register can be bypassed. The
six labclk signals or local interconnect can drive the inclock,
outclock, wren, rden, and outclr signals. Because of the advanced
interconnect between the LAB and M512 RAM blocks, ALMs can also
control the wren and rden signals and the RAM clock, clock enable, and
asynchronous clear signals. Figure 2–19 shows the M512 RAM block
control signal generation logic.
The RAM blocks in Stratix II devices have local interconnects to allow
ALMs and interconnects to drive into RAM blocks. The M512 RAM block
local interconnect is driven by the R4, C4, and direct link interconnects
from adjacent LABs. The M512 RAM blocks can communicate with LABs
on either the left or right side through these row interconnects or with
LAB columns on the left or right side with the column interconnects. The
M512 RAM block has up to 16 direct link input connections from the left
adjacent LABs and another 16 from the right adjacent LAB. M512 RAM
outputs can also connect to left and right LABs through direct link
interconnect. The M512 RAM block has equal opportunity for access and
performance to and from LABs on either its left or right side. Figure 2–20
shows the M512 RAM block to logic array interface.
2–30Altera Corporation
Stratix II Device Handbook, Volume 1May 2007
Loading...
+ 76 hidden pages
You need points to download manuals.
1 point = 1 manual.
You can buy points or you can get point for every manual you upload.