ALTERA Stratix II DATA SHEET

SII51002-4.3
2. Stratix II Architecture

Functional Description

Stratix®II devices contain a two-dimensional row- and column-based architecture to implement custom logic. A series of column and row interconnects of varying length and speed provides signal interconnects between logic array blocks (LABs), memory block structures (M512 RAM, M4K RAM, and M-RAM blocks), and digital signal processing (DSP) blocks.
Each LAB consists of eight adaptive logic modules (ALMs). An ALM is the Stratix II device family’s basic building block of logic providing efficient implementation of user logic functions. LABs are grouped into rows and columns across the device.
M512 RAM blocks are simple dual-port memory blocks with 512 bits plus parity (576 bits). These blocks provide dedicated simple dual-port or single-port memory up to 18-bits wide at up to 500 MHz. M512 blocks are grouped into columns across the device in between certain LABs.
M4K RAM blocks are true dual-port memory blocks with 4K bits plus parity (4,608 bits). These blocks provide dedicated true dual-port, simple dual-port, or single-port memory up to 36-bits wide at up to 550 MHz. These blocks are grouped into columns across the device in between certain LABs.
M-RAM blocks are true dual-port memory blocks with 512K bits plus parity (589,824 bits). These blocks provide dedicated true dual-port, simple dual-port, or single-port memory up to 144-bits wide at up to 420 MHz. Several M-RAM blocks are located individually in the device's logic array.
DSP blocks can implement up to either eight full-precision 9 × 9-bit multipliers, four full-precision 18 × 18-bit multipliers, or one full-precision 36 × 36-bit multiplier with add or subtract features. The DSP blocks support Q1.15 format rounding and saturation in the multiplier and accumulator stages. These blocks also contain shift registers for digital signal processing applications, including finite impulse response (FIR) and infinite impulse response (IIR) filters. DSP blocks are grouped into columns across the device and operate at up to 450 MHz.
Altera Corporation 2–1 May 2007
Functional Description
Each Stratix II device I/O pin is fed by an I/O element (IOE) located at the end of LAB rows and columns around the periphery of the device. I/O pins support numerous single-ended and differential I/O standards. Each IOE contains a bidirectional I/O buffer and six registers for registering input, output, and output-enable signals. When used with dedicated clocks, these registers provide exceptional performance and interface support with external memory devices such as DDR and DDR2 SDRAM, RLDRAM II, and QDR II SRAM devices. High-speed serial interface channels with dynamic phase alignment (DPA) support data transfer at up to 1 Gbps using LVDS or HyperTransport standards.
Figure 2–1 shows an overview of the Stratix II device.
Figure 2–1. Stratix II Block Diagram
M512 RAM Blocks for Dual-Port Memory, Shift Registers, & FIFO Buffers
DSP Blocks for Multiplication and Full Implementation of FIR Filters
M4K RAM Blocks for True Dual-Port Memory & Other Embedded Memory Functions
IOEs Support DDR, PCI, PCI-X, SSTL-3, SSTL-2, HSTL-1, HSTL-2, LVDS, HyperTransport & other I/O Standards
TM
technology I/O
IOEs
IOEs
IOEs
IOEs
IOEs
IOEs
IOEs
IOEs
IOEs
IOEs
IOEs
IOEs
IOEs
IOEs
IOEs
IOEs
IOEs
IOEs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
IOEs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
DSP Block
IOEs IOEs
LABs LABs
LABs
LABs LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABs
LABsLABs
M-RAM Block
LABsLABs
LABsLABs
2–2 Altera Corporation Stratix II Device Handbook, Volume 1 May 2007
The number of M512 RAM, M4K RAM, and DSP blocks varies by device along with row and column numbers and M-RAM blocks. Table 2–1 lists the resources available in Stratix II devices.
Table 2–1. Stratix II Device Resources
Stratix II Architecture
Device
EP2S15 4 / 104 3 / 78 0 2 / 12 30 26
EP2S30 6 / 202 4 / 144 1 2 / 16 49 36
EP2S60 7 / 329 5 / 255 2 3 / 36 62 51
EP2S90 8 / 488 6 / 408 4 3 / 48 71 68
EP2S130 9 / 699 7 / 609 6 3 / 63 81 87
EP2S180 11 / 930 8 / 768 9 4 / 96 100 96

Logic Array Blocks

M512 RAM
Columns/Blocks
Each LAB consists of eight ALMs, carry chains, shared arithmetic chains, LAB control signals, local interconnect, and register chain connection lines. The local interconnect transfers signals between ALMs in the same
M4K RAM
Columns/Blocks
M-RAM
Blocks
DSP Block
Columns/Blocks
LAB
Columns
LAB Rows
LAB. Register chain connections transfer the output of an ALM register to
®
the adjacent ALM register in an LAB. The Quartus
II Compiler places associated logic in an LAB or adjacent LABs, allowing the use of local, shared arithmetic chain, and register chain connections for performance and area efficiency. Figure 2–2 shows the Stratix II LAB structure.
Altera Corporation 2–3 May 2007 Stratix II Device Handbook, Volume 1
Logic Array Blocks
Figure 2–2. Stratix II LAB Structure
Direct link interconnect from adjacent block
Row Interconnects of Variable Speed & Length
ALMs
Direct link interconnect from adjacent block
Direct link interconnect to adjacent block
Direct link interconnect to adjacent block
Local Interconnect
LAB
Local Interconnect is Driven
from Either Side by Columns & LABs,
& from Above by Rows
Column Interconnects of Variable Speed & Length

LAB Interconnects

The LAB local interconnect can drive ALMs in the same LAB. It is driven by column and row interconnects and ALM outputs in the same LAB. Neighboring LABs, M512 RAM blocks, M4K RAM blocks, M-RAM blocks, or DSP blocks from the left and right can also drive an LAB's local interconnect through the direct link connection. The direct link connection feature minimizes the use of row and column interconnects, providing higher performance and flexibility. Each ALM can drive 24 ALMs through fast local and direct link interconnects. Figure 2–3 shows the direct link connection.
2–4 Altera Corporation Stratix II Device Handbook, Volume 1 May 2007
Figure 2–3. Direct Link Connection
Direct link interconnect from
left LAB, TriMatrix memory
block, DSP block, or IOE output
Stratix II Architecture
Direct link interconnect from right LAB, TriMatrix memory block, DSP block, or IOE output
ALMs
Direct link
interconnect
to left
Direct link interconnect to right
Local
Interconnect

LAB Control Signals

Each LAB contains dedicated logic for driving control signals to its ALMs. The control signals include three clocks, three clock enables, two asynchronous clears, synchronous clear, asynchronous preset/load, and synchronous load control signals. This gives a maximum of 11 control signals at a time. Although synchronous load and clear signals are generally used when implementing counters, they can also be used with other functions.
Each LAB can use three clocks and three clock enable signals. However, there can only be up to two unique clocks per LAB, as shown in the LAB control signal generation circuit in Figure 2–4. Each LAB's clock and clock enable signals are linked. For example, any ALM in a particular LAB using the labclk1 signal also uses labclkena1. If the LAB uses both the rising and falling edges of a clock, it also uses two LAB-wide clock signals. De-asserting the clock enable signal turns off the corresponding LAB-wide clock.
Each LAB can use two asynchronous clear signals and an asynchronous load/preset signal. By default, the Quartus II software uses a NOT gate push-back technique to achieve preset. If you disable the NOT gate push-up option or assign a given register to power up high using the Quartus II software, the preset is achieved using the asynchronous load
Altera Corporation 2–5 May 2007 Stratix II Device Handbook, Volume 1

Adaptive Logic Modules

signal with asynchronous load data input tied high. When the asynchronous load/preset signal is used, the labclkena0 signal is no longer available.
The LAB row clocks [5..0] and LAB local interconnect generate the LAB-wide control signals. The MultiTrack skew allows clock and control signal distribution in addition to data.
Figure 2–4 shows the LAB control signal generation circuit.
Figure 2–4. LAB-Wide Control Signals
Dedicated Row LAB Clocks
Local Interconnect
Local Interconnect
Local Interconnect
Local Interconnect
Local Interconnect
Local Interconnect
6
6
6
There are two unique
clock signals per LAB.
TM
interconnect's inherent low
labclr1
Adaptive Logic Modules
labclk0
labclkena0
or asyncload
or labpreset
labclk1
labclkena1 labclkena2 labclr0 synclr
labclk2
syncload
The basic building block of logic in the Stratix II architecture, the adaptive logic module (ALM), provides advanced features with efficient logic utilization. Each ALM contains a variety of look-up table (LUT)-based resources that can be divided between two adaptive LUTs (ALUTs). With up to eight inputs to the two ALUTs, one ALM can implement various combinations of two functions. This adaptability allows the ALM to be
2–6 Altera Corporation Stratix II Device Handbook, Volume 1 May 2007
completely backward-compatible with four-input LUT architectures. One
r
r
r
r
ALM can also implement any function of up to six inputs and certain seven-input functions.
In addition to the adaptive LUT-based resources, each ALM contains two programmable registers, two dedicated full adders, a carry chain, a shared arithmetic chain, and a register chain. Through these dedicated resources, the ALM can efficiently implement various arithmetic functions and shift registers. Each ALM drives all types of interconnects: local, row, column, carry chain, shared arithmetic chain, register chain, and direct link interconnects. Figure 2–5 shows a high-level block diagram of the Stratix II ALM while Figure 2–6 shows a detailed view of all the connections in the ALM.
Figure 2–5. High-Level Block Diagram of the Stratix II ALM
carry_in
shared_arith_in
dataf0
datae0
dataa
datab
datac
datad
datae1
dataf1
Combinational
Logic
shared_arith_out
adder0
adder1
carry_out
reg_chain_in
reg_chain_out
DQ
reg0
DQ
reg1
Stratix II Architecture
To general o
local routing
To general o
local routing
To general o
local routing
To general o
local routing
Altera Corporation 2–7 May 2007 Stratix II Device Handbook, Volume 1
Adaptive Logic Modules
Figure 2–6. Stratix II ALM Details
Row, column &
Row, column &
direct link routing
Local
Interconnect
direct link routing
Row, column &
Row, column &
direct link routing
Local
Interconnect
direct link routing
sclr asyncload
syncload ena[2..0]
reg_chain_in
carry_in
shared_arith_in
4-Input
LUT
Q
D
ADATA
PRN/ALD
ENA
3-Input
CLRN
LUT
3-Input
LUT
4-Input
LUT
Q
D
ADATA
PRN/ALD
ENA
3-Input
CLRN
LUT
3-Input
LUT
reg_chain_out
CC
V
carry_out clk[2..0]
shared_arith_out aclr[1..0]
dataf0
Local
Interconnect
datae0
Local
Interconnect
datac
Local
Interconnect
dataa
Local
Interconnect
datab
Local
Interconnect
datad
Interconnect
Local
datae1
Local
Interconnect
dataf1
Local
Interconnect
2–8 Altera Corporation Stratix II Device Handbook, Volume 1 May 2007
Stratix II Architecture
One ALM contains two programmable registers. Each register has data, clock, clock enable, synchronous and asynchronous clear, asynchronous load data, and synchronous and asynchronous load/preset inputs. Global signals, general-purpose I/O pins, or any internal logic can drive the register's clock and clear control signals. Either general-purpose I/O pins or internal logic can drive the clock enable, preset, asynchronous load, and asynchronous load data. The asynchronous load data input comes from the datae or dataf input of the ALM, which are the same inputs that can be used for register packing. For combinational functions, the register is bypassed and the output of the LUT drives directly to the outputs of the ALM.
Each ALM has two sets of outputs that drive the local, row, and column routing resources. The LUT, adder, or register output can drive these output drivers independently (see Figure 2–6). For each set of output drivers, two ALM outputs can drive column, row, or direct link routing connections, and one of these ALM outputs can also drive local interconnect resources. This allows the LUT or adder to drive one output while the register drives another output. This feature, called register packing, improves device utilization because the device can use the register and the combinational logic for unrelated functions. Another special packing mode allows the register output to feed back into the LUT of the same ALM so that the register is packed with its own fan-out LUT. This provides another mechanism for improved fitting. The ALM can also drive out registered and unregistered versions of the LUT or adder output.
f See the Performance & Logic Efficiency Analysis of Stratix II Devices White
Paper for more information on the efficiencies of the Stratix II ALM and
comparisons with previous architectures.

ALM Operating Modes

The Stratix II ALM can operate in one of the following modes:
Normal mode
Extended LUT mode
Arithmetic mode
Shared arithmetic mode
Each mode uses ALM resources differently. In each mode, eleven available inputs to the ALM--the eight data inputs from the LAB local interconnect; carry-in from the previous ALM or LAB; the shared arithmetic chain connection from the previous ALM or LAB; and the register chain connection--are directed to different destinations to implement the desired logic function. LAB-wide signals provide clock, asynchronous clear, asynchronous preset/load, synchronous clear,
Altera Corporation 2–9 May 2007 Stratix II Device Handbook, Volume 1
Adaptive Logic Modules
synchronous load, and clock enable control for the register. These LAB­wide signals are available in all ALM modes. See the “LAB Control
Signals” section for more information on the LAB-wide control signals.
The Quartus II software and supported third-party synthesis tools, in conjunction with parameterized functions such as library of parameterized modules (LPM) functions, automatically choose the appropriate mode for common functions such as counters, adders, subtractors, and arithmetic functions. If required, you can also create special-purpose functions that specify which ALM operating mode to use for optimal performance.

Normal Mode

The normal mode is suitable for general logic applications and combinational functions. In this mode, up to eight data inputs from the LAB local interconnect are inputs to the combinational logic. The normal mode allows two functions to be implemented in one Stratix II ALM, or an ALM to implement a single function of up to six inputs. The ALM can support certain combinations of completely independent functions and various combinations of functions which have common inputs.
Figure 2–7 shows the supported LUT combinations in normal mode.
2–10 Altera Corporation Stratix II Device Handbook, Volume 1 May 2007
Figure 2–7. ALM in Normal Mode Note (1)
Stratix II Architecture
dataf0
datae0
datac dataa
datab datad
datae1
dataf1
dataf0
datae0
datac dataa datab
datad
datae1
dataf1
dataf0
datae0
datac dataa datab
4-Input
LUT
4-Input
LUT
5-Input
LUT
3-Input
LUT
5-Input
LUT
combout0
combout1
combout0
combout1
combout0
dataf0
datae0
datac dataa datab
datad
datae1
dataf1
dataf0
datae0
dataa datab datac datad
dataf0
datae0
dataa datab datac datad
5-Input
LUT
5-Input
LUT
6-Input
LUT
6-Input
LUT
combout0
combout1
combout0
combout0
datad
datae1
dataf1
4-Input
LUT
combout1
datae1
dataf1
6-Input
LUT
combout1
Note to Figure 2–7:
(1) Combinations of functions with fewer inputs than those shown are also supported. For example, combinations of
functions with the following number of inputs are supported: 4 and 3, 3 and 3, 3 and 2, 5 and 2, etc.
The normal mode provides complete backward compatibility with four­input LUT architectures. Two independent functions of four inputs or less can be implemented in one Stratix II ALM. In addition, a five-input function and an independent three-input function can be implemented without sharing inputs.
Altera Corporation 2–11 May 2007 Stratix II Device Handbook, Volume 1
Adaptive Logic Modules
For the packing of two five-input functions into one ALM, the functions must have at least two common inputs. The common inputs are dataa and datab. The combination of a four-input function with a five-input function requires one common input (either dataa or datab).
In the case of implementing two six-input functions in one ALM, four inputs must be shared and the combinational function must be the same. For example, a 4 × 2 crossbar switch (two 4-to-1 multiplexers with common inputs and unique select lines) can be implemented in one ALM, as shown in Figure 2–8. The shared inputs are dataa, datab, datac, and
datad, while the unique select lines are datae0 and dataf0 for function0, and datae1 and dataf1 for function1. This crossbar
switch consumes four LUTs in a four-input LUT-based architecture.
Figure 2–8. 4 × 2 Crossbar Switch Example
4 × 2 Crossbar Switch Implementation in 1 ALM
sel0[1..0]
inputa inputb
inputc inputd
out0
out1
dataf0
datae0
dataa datab datac datad
Six-Input
LUT
(Function0)
combout0
sel1[1..0]
datae1
dataf1
Six-Input
LUT
(Function1)
combout1
In a sparsely used device, functions that could be placed into one ALM may be implemented in separate ALMs. The Quartus II Compiler spreads a design out to achieve the best possible performance. As a device begins to fill up, the Quartus II software automatically utilizes the full potential of the Stratix II ALM. The Quartus II Compiler automatically searches for functions of common inputs or completely independent functions to be placed into one ALM and to make efficient use of the device resources. In addition, you can manually control resource usage by setting location assignments.
Any six-input function can be implemented utilizing inputs dataa,
datab, datac, datad, and either datae0 and dataf0 or datae1 and dataf1. If datae0 and dataf0 are utilized, the output is driven to register0, and/or register0 is bypassed and the data drives out to
the interconnect using the top set of output drivers (see Figure 2–9). If
2–12 Altera Corporation Stratix II Device Handbook, Volume 1 May 2007
Stratix II Architecture
datae1 and dataf1 are utilized, the output drives to register1 and/or bypasses register1 and drives to the interconnect using the bottom set of output drivers. The Quartus II Compiler automatically selects the inputs to the LUT. Asynchronous load data for the register comes from the datae or dataf input of the ALM. ALMs in normal mode support register packing.
Figure 2–9. 6-Input Function in Normal Mode Notes (1), (2)
dataf0
datae0
dataa datab datac datad
datae1
dataf1
(2)
These inputs are available for register packing.
6-Input
LUT
DQ
reg0
DQ
reg1
To general or local routing
To general or local routing
To general or local routing
Notes to Figure 2–9:
(1) If datae1 and dataf1 are used as inputs to the six-input function, then datae0
and dataf0 are available for register packing.
(2) The dataf1 input is available for register packing only if the six-input function is
un-registered.

Extended LUT Mode

The extended LUT mode is used to implement a specific set of seven-input functions. The set must be a 2-to-1 multiplexer fed by two arbitrary five-input functions sharing four inputs. Figure 2–10 shows the template of supported seven-input functions utilizing extended LUT mode. In this mode, if the seven-input function is unregistered, the unused eighth input is available for register packing.
Functions that fit into the template shown in Figure 2–10 occur naturally in designs. These functions often appear in designs as “if-else” statements in Verilog HDL or VHDL code.
Altera Corporation 2–13 May 2007 Stratix II Device Handbook, Volume 1
Adaptive Logic Modules
r
r
Figure 2–10. Template for Supported Seven-Input Functions in Extended LUT Mode
datae0
datac dataa datab datad
dataf0
datae1
dataf1
(1)
5-Input
LUT
5-Input
LUT
This input is available for register packing.
combout0
DQ
reg0
To general o
local routing
To general o
local routing
Note to Figure 2–10:
(1) If the seven-input function is unregistered, the unused eighth input is available for register packing. The second
register, reg1, is not available.

Arithmetic Mode

The arithmetic mode is ideal for implementing adders, counters, accumulators, wide parity functions, and comparators. An ALM in arithmetic mode uses two sets of two four-input LUTs along with two dedicated full adders. The dedicated adders allow the LUTs to be available to perform pre-adder logic; therefore, each adder can add the output of two four-input functions. The four LUTs share the dataa and
datab inputs. As shown in Figure 2–11, the carry-in signal feeds to adder0, and the carry-out from adder0 feeds to carry-in of adder1. The
carry-out from adder1 drives to adder0 of the next ALM in the LAB. ALMs in arithmetic mode can drive out registered and/or unregistered versions of the adder outputs.
2–14 Altera Corporation Stratix II Device Handbook, Volume 1 May 2007
Figure 2–11. ALM in Arithmetic Mode
r
carry_in
Stratix II Architecture
datae0
dataf0
datac datab dataa
datad
datae1
dataf1
4-Input
LUT
4-Input
LUT
4-Input
LUT
4-Input
LUT
carry_out
adder0
adder1
DQ
reg0
DQ
reg1
To general or
local routing
To general or
local routing
To general or
local routing
To general o
local routing
While operating in arithmetic mode, the ALM can support simultaneous use of the adder's carry output along with combinational logic outputs. In this operation, the adder output is ignored. This usage of the adder with the combinational logic output provides resource savings of up to 50% for functions that can use this ability. An example of such functionality is a conditional operation, such as the one shown in Figure 2–12. The equation for this example is:
R = (X < Y) ? Y : X
To implement this function, the adder is used to subtract ‘Y’ from ‘X.’ If ‘X’ is less than ‘Y,’ the carry_out signal is ‘1.’ The carry_out signal is fed to an adder where it drives out to the LAB local interconnect. It then feeds to the LAB-wide syncload signal. When asserted, syncload selects the syncdata input. In this case, the data ‘Y’ drives the
syncdata inputs to the registers. If ‘X’ is greater than or equal to ‘Y,’ the syncload signal is de-asserted and ‘X’ drives the data port of the
registers.
Altera Corporation 2–15 May 2007 Stratix II Device Handbook, Volume 1
Adaptive Logic Modules
Figure 2–12. Conditional Operation Example
Adder output is not used.
ALM 1
X[0]
Y[0]
syncdata
X[1]
Y[1]
Carry Chain
X[2]
Y[2]
ALM 2
Comb &
Adder
Logic
Comb &
Adder
Logic
Comb &
Adder
Logic
Comb &
Adder
Logic
X[0]
X[1]
X[2]
syncload
syncload
syncload
DQ
reg0
DQ
reg1
DQ
reg0
carry_out
R[0]
R[1]
R[2]
To general or local routing
To general or local routing
To general or local routing
To local routing & then to LAB-wide syncload
The arithmetic mode also offers clock enable, counter enable, synchronous up/down control, add/subtract control, synchronous clear, synchronous load. The LAB local interconnect data inputs generate the clock enable, counter enable, synchronous up/down and add/subtract control signals. These control signals are good candidates for the inputs that are shared between the four LUTs in the ALM. The synchronous clear and synchronous load options are LAB-wide signals that affect all registers in the LAB. The Quartus II software automatically places any registers that are not used by the counter into other LABs.
Carry Chain
The carry chain provides a fast carry function between the dedicated adders in arithmetic or shared arithmetic mode. Carry chains can begin in either the first ALM or the fifth ALM in an LAB. The final carry-out signal is routed t o an ALM , wh ere it i s fe d to loc al, row, or col umn int erc onn ect s.
2–16 Altera Corporation Stratix II Device Handbook, Volume 1 May 2007
Stratix II Architecture
The Quartus II Compiler automatically creates carry chain logic during design processing, or you can create it manually during design entry. Parameterized functions such as LPM functions automatically take advantage of carry chains for the appropriate functions.
The Quartus II Compiler creates carry chains longer than 16 (8 ALMs in arithmetic or shared arithmetic mode) by linking LABs together automatically. For enhanced fitting, a long carry chain runs vertically allowing fast horizontal connections to TriMatrix memory and DSP blocks. A carry chain can continue as far as a full column.
To avoid routing congestion in one small area of the device when a high fan-in arithmetic function is implemented, the LAB can support carry chains that only utilize either the top half or the bottom half of the LAB before connecting to the next LAB. This leaves the other half of the ALMs in the LAB available for implementing narrower fan-in functions in normal mode. Carry chains that use the top four ALMs in the first LAB carry into the top half of the ALMs in the next LAB within the column. Carry chains that use the bottom four ALMs in the first LAB carry into the bottom half of the ALMs in the next LAB within the column. Every other column of LABs is top-half bypassable, while the other LAB columns are bottom-half bypassable.
See the “MultiTrack Interconnect” on page 2–22 section for more information on carry chain interconnect.

Shared Arithmetic Mode

In shared arithmetic mode, the ALM can implement a three-input add. In this mode, the ALM is configured with four 4-input LUTs. Each LUT either computes the sum of three inputs or the carry of three inputs. The output of the carry computation is fed to the next adder (either to adder1 in the same ALM or to adder0 of the next ALM in the LAB) via a dedicated connection called the shared arithmetic chain. This shared arithmetic chain can significantly improve the performance of an adder tree by reducing the number of summation stages required to implement an adder tree. Figure 2–13 shows the ALM in shared arithmetic mode.
Altera Corporation 2–17 May 2007 Stratix II Device Handbook, Volume 1
Adaptive Logic Modules
Figure 2–13. ALM in Shared Arithmetic Mode
shared_arith_in
carry_in
4-Input
LUT
DQ
datae0
datac datab dataa
datad
datae1
4-Input
LUT
4-Input
LUT
4-Input
LUT
carry_out
shared_arith_out
reg0
DQ
reg1
Note to Figure 2–13:
(1) Inputs dataf0 and dataf1 are available for register packing in shared arithmetic mode.
Adder trees can be found in many different applications. For example, the summation of the partial products in a logic-based multiplier can be implemented in a tree structure. Another example is a correlator function that can use a large adder tree to sum filtered data samples in a given time frame to recover or to de-spread data which was transmitted utilizing spread spectrum technology.
To general or
local routing
To general or
local routing
To general or
local routing
To general or
local routing
An example of a three-bit add operation utilizing the shared arithmetic mode is shown in Figure 2–14. The partial sum (S[2..0]) and the partial carry (C[2..0]) is obtained using the LUTs, while the result (R[2..0]) is computed using the dedicated adders.
2–18 Altera Corporation Stratix II Device Handbook, Volume 1 May 2007
Figure 2–14. Example of a 3-bit Add Utilizing Shared Arithmetic Mode
3-Bit Add Example ALM Implementation
ALM 1
1st stage add is
implemented in LUTs.
2nd stage add is
implemented in adders.
X2 X1 X0
Y2 Y1 Y0 Z2 Z1 Z0
+
S2 S1 S0
+
C2 C1 C0
R3 R2 R1 R0
X0 Y0
Z0
3-Input
LUT
3-Input
LUT
Stratix II Architecture
shared_arith_in = '0'
carry_in = '0'
S0
R0
C0
Binary Add
+
+
1 1 0
1 1 0 1
1 1 0
1 0 1 0 1 0
0 0 1
Decimal
Equivalents
6
5 2
+
1
+
2 x 6
13
X1 Y1 Z1
X2 Y2 Z2
3-Input
3-Input
ALM 2
3-Input
3-Input
3-Input
3-Input
S1
LUT
R1
C1
LUT
S2
LUT
R2
C2
LUT
'0'
LUT
R3
LUT
Shared Arithmetic Chain
In addition to the dedicated carry chain routing, the shared arithmetic chain available in shared arithmetic mode allows the ALM to implement a three-input add. This significantly reduces the resources necessary to implement large adder trees or correlator functions.
The shared arithmetic chains can begin in either the first or fifth ALM in an LAB. The Quartus II Compiler creates shared arithmetic chains longer than 16 (8 ALMs in arithmetic or shared arithmetic mode) by linking LABs together automatically. For enhanced fitting, a long shared
Altera Corporation 2–19 May 2007 Stratix II Device Handbook, Volume 1
Adaptive Logic Modules
arithmetic chain runs vertically allowing fast horizontal connections to TriMatrix memory and DSP blocks. A shared arithmetic chain can continue as far as a full column.
Similar to the carry chains, the shared arithmetic chains are also top- or bottom-half bypassable. This capability allows the shared arithmetic chain to cascade through half of the ALMs in a LAB while leaving the other half available for narrower fan-in functionality. Every other LAB column is top-half bypassable, while the other LAB columns are bottom­half bypassable.
See the “MultiTrack Interconnect” on page 2–22 section for more information on shared arithmetic chain interconnect.

Register Chain

In addition to the general routing outputs, the ALMs in an LAB have register chain outputs. The register chain routing allows registers in the same LAB to be cascaded together. The register chain interconnect allows an LAB to use LUTs for a single combinational function and the registers to be used for an unrelated shift register implementation. These resources speed up connections between ALMs while saving local interconnect resources (see Figure 2–15). The Quartus II Compiler automatically takes advantage of these resources to improve utilization and performance.
2–20 Altera Corporation Stratix II Device Handbook, Volume 1 May 2007
Figure 2–15. Register Chain within an LAB Note (1)
adder0
Combinational
Logic
reg_chain_in
From Previous ALM Within The LAB
DQ
reg0
Stratix II Architecture
To general or
local routing
To general or
local routing
To general or
local routing
To general or
local routing
To general or
local routing
To general or
local routing
To general or
local routing
To general or
local routing
Combinational
Logic
adder1
adder0
adder1
DQ
reg1
DQ
reg0
DQ
reg1
reg_chain_out
To Next ALM within the LAB
Note to Figure 2–15:
(1) The combinational or adder logic can be utilized to implement an unrelated, un-registered function.
See the “MultiTrack Interconnect” on page 2–22 section for more information on register chain interconnect.
Altera Corporation 2–21 May 2007 Stratix II Device Handbook, Volume 1

MultiTrack Interconnect

Clear & Preset Logic Control

LAB-wide signals control the logic for the register's clear and load/preset signals. The ALM directly supports an asynchronous clear and preset function. The register preset is achieved through the asynchronous load of a logic high. The direct asynchronous preset does not require a NOT­gate push-back technique. Stratix II devices support simultaneous asynchronous load/preset, and clear signals. An asynchronous clear signal takes precedence if both signals are asserted simultaneously. Each LAB supports up to two clears and one load/preset signal.
In addition to the clear and load/preset ports, Stratix II devices provide a device-wide reset pin (DEV_CLRn) that resets all registers in the device. An option set before compilation in the Quartus II software controls this pin. This device-wide reset overrides all other control signals.
MultiTrack Interconnect
In the Stratix II architecture, connections between ALMs, TriMatrix memory, DSP blocks, and device I/O pins are provided by the MultiTrack interconnect structure with DirectDrive interconnect consists of continuous, performance-optimized routing lines of different lengths and speeds used for inter- and intra-design block connectivity. The Quartus II Compiler automatically places critical design paths on faster interconnects to improve design performance.
DirectDrive technology is a deterministic routing technology that ensures identical routing resource usage for any function regardless of placement in the device. The MultiTrack interconnect and DirectDrive technology simplify the integration stage of block-based designing by eliminating the re-optimization cycles that typically follow design changes and additions.
The MultiTrack interconnect consists of row and column interconnects that span fixed distances. A routing structure with fixed length resources for all devices allows predictable and repeatable performance when migrating through different device densities. Dedicated row interconnects route signals to and from LABs, DSP blocks, and TriMatrix memory in the same row. These row resources include:
Direct link interconnects between LABs and adjacent blocks
R4 interconnects traversing four blocks to the right or left
R24 row interconnects for high-speed access across the length of the
device
TM
technology. The MultiTrack
2–22 Altera Corporation Stratix II Device Handbook, Volume 1 May 2007
Stratix II Architecture
The direct link interconnect allows an LAB, DSP block, or TriMatrix memory block to drive into the local interconnect of its left and right neighbors and then back into itself. This provides fast communication between adjacent LABs and/or blocks without using row interconnect resources.
The R4 interconnects span four LABs, three LABs and one M512 RAM block, two LABs and one M4K RAM block, or two LABs and one DSP block to the right or left of a source LAB. These resources are used for fast row connections in a four-LAB region. Every LAB has its own set of R4 interconnects to drive either left or right. Figure 2–16 shows R4 interconnect connections from an LAB. R4 interconnects can drive and be driven by DSP blocks and RAM blocks and row IOEs. For LAB interfacing, a primary LAB or LAB neighbor can drive a given R4 interconnect. For R4 interconnects that drive to the right, the primary LAB and right neighbor can drive on to the interconnect. For R4 interconnects that drive to the left, the primary LAB and its left neighbor can drive on to the interconnect. R4 interconnects can drive other R4 interconnects to extend the range of LABs they can drive. R4 interconnects can also drive C4 and C16 interconnects for connections from one row to another. Additionally, R4 interconnects can drive R24 interconnects.
Figure 2–16. R4 Interconnect Connections Notes (1), (2), (3)
Adjacent LAB can Drive onto Another LAB's R4 Interconnect
R4 Interconnect
Driving Left
LAB
Neighbor
Primary LAB (2)
Notes to Figure 2–16:
(1) C4 and C16 interconnects can drive R4 interconnects. (2) This pattern is repeated for every LAB in the LAB row. (3) The LABs in Figure 2–16 show the 16 possible logical outputs per LAB.
Altera Corporation 2–23 May 2007 Stratix II Device Handbook, Volume 1
C4 and C16 Column Interconnects (1)
LAB
Neighbor
R4 Interconnect Driving Right
MultiTrack Interconnect
R24 row interconnects span 24 LABs and provide the fastest resource for long row connections between LABs, TriMatrix memory, DSP blocks, and Row IOEs. The R24 row interconnects can cross M-RAM blocks. R24 row interconnects drive to other row or column interconnects at every fourth LAB and do not drive directly to LAB local interconnects. R24 row interconnects drive LAB local interconnects via R4 and C4 interconnects. R24 interconnects can drive R24, R4, C16, and C4 interconnects.
The column interconnect operates similarly to the row interconnect and vertically routes signals to and from LABs, TriMatrix memory, DSP blocks, and IOEs. Each column of LABs is served by a dedicated column interconnect. These column resources include:
Shared arithmetic chain interconnects in an LAB
Carry chain interconnects in an LAB and from LAB to LAB
Register chain interconnects in an LAB
C4 interconnects traversing a distance of four block s in up and down
direction
C16 column interconnects for high-speed vertical routing through
the device
Stratix II devices include an enhanced interconnect structure in LABs for routing shared arithmetic chains and carry chains for efficient arithmetic functions. The register chain connection allows the register output of one ALM to connect directly to the register input of the next ALM in the LAB for fast shift registers. These ALM to ALM connections bypass the local interconnect. The Quartus II Compiler automatically takes advantage of these resources to improve utilization and performance. Figure 2–17 shows the shared arithmetic chain, carry chain and register chain interconnects.
2–24 Altera Corporation Stratix II Device Handbook, Volume 1 May 2007
Stratix II Architecture
u
Figure 2–17. Shared Arithmetic Chain, Carry Chain & Register Chain Interconnects
Local Interconnect Routing Among ALMs in the LAB
Carry Chain & Shared
Arithmetic Chain
outing to Adjacent ALM
Local
Interconnect
ALM 1
ALM 2
ALM 3
ALM 4
ALM 5
ALM 6
ALM 7
ALM 8
Register Chain Routing to Adjacent ALM's Register Inp
The C4 interconnects span four LABs, M512, or M4K blocks up or down from a source LAB. Every LAB has its own set of C4 interconnects to drive either up or down. Figure 2–18 shows the C4 interconnect connections from an LAB in a column. The C4 interconnects can drive and be driven by all types of architecture blocks, including DSP blocks, TriMatrix memory blocks, and column and row IOEs. For LAB interconnection, a primary LAB or its LAB neighbor can drive a given C4 interconnect. C4 interconnects can drive each other to extend their range as well as drive row interconnects for column-to-column connections.
Altera Corporation 2–25 May 2007 Stratix II Device Handbook, Volume 1
MultiTrack Interconnect
4
Figure 2–18. C4 Interconnect Connections Note (1)
C4 Interconnect Drives Local and R Interconnects up to Four Rows
C4 Interconnect Driving Up
LAB
Row Interconnect
Adjacent LAB can drive onto neighboring LAB's C4 interconnect
Local
Interconnect
C4 Interconnect Driving Down
Note to Figure 2–18:
(1) Each C4 interconnect can drive either up or down four rows.
2–26 Altera Corporation Stratix II Device Handbook, Volume 1 May 2007
C16 column interconnects span a length of 16 LABs and provide the fastest resource for long column connections between LABs, TriMatrix memory blocks, DSP blocks, and IOEs. C16 interconnects can cross M-RAM blocks and also drive to row and column interconnects at every fourth LAB. C16 interconnects drive LAB local interconnects via C4 and R4 interconnects and do not drive LAB local interconnects directly.
All embedded blocks communicate with the logic array similar to LAB­to-LAB interfaces. Each block (that is, TriMatrix memory and DSP blocks) connects to row and column interconnects and has local interconnect regions driven by row and column interconnects. These blocks also have direct link interconnects for fast connections to and from a neighboring LAB. All blocks are fed by the row LAB clocks, labclk[5..0].
Table 2–2 shows the Stratix II device’s routing scheme.
Table 2–2. Stratix II Device Routing Scheme (Part 1 of 2)
Source
Carry Chain
Register Chain
Local Interconnect
Direct Link Interconnect
v v vvvv
vvvv
vvv
vvvv
vvv v vvv v
vvvv vv v
Shared arithmetic chain
Carry chain
Register chain
Local interconnect
Direct link interconnect
R4 interconnect
R24 interconnect
C4 interconnect
C16 interconnect
ALM
M512 RAM block
M4K RAM block
M-RAM block
DSP blocks
Shared Arithmetic Chain
vvvvvv v
Destination
R24 Interconnect
C4 Interconnect
R4 Interconnect
Stratix II Architecture
ALM
C16 Interconnect
M4K RAM Block
M512 RAM Block
DSP Blocks
M-RAM Block
v v v vvvvvvv
Column IOE
Row IOE
Altera Corporation 2–27 May 2007 Stratix II Device Handbook, Volume 1

Tri M a t r ix Memor y

Table 2–2. Stratix II Device Routing Scheme (Part 2 of 2)
Source
Carry Chain
Register Chain
Local Interconnect
Shared Arithmetic Chain
Column IOE
Row IOE
Direct Link Interconnect
vvv vvvv
R4 Interconnect
Destination
C4 Interconnect
R24 Interconnect
ALM
C16 Interconnect
M512 RAM Block
DSP Blocks
M-RAM Block
M4K RAM Block
Row IOE
Column IOE
TriMatrix Memory
TriMatrix memory consists of three types of RAM blocks: M512, M4K, and M-RAM. Although these memory blocks are different, they can all implement various types of memory with or without parity, including true dual-port, simple dual-port, and single-port RAM, ROM, and FIFO buffers. Table 2–3 shows the size and features of the different RAM blocks.
Table 2–3. TriMatrix Memory Features (Part 1 of 2)
Memory Feature
Maximum performance 500 MHz 550 MHz 420 MHz
True dual-port memory
Simple dual-port memory
Single-port memory
Shift register
ROM
FIFO buffer
Pack mode
Byte enable
Address clock enable
Parity bits
Mixed clock mode
Memory initialization (.mif)
M512 RAM Block
(32 × 18 Bits)
vvv vvv vv vv vvv
vvv
vvv vvv vv
M4K RAM Block
(128 × 36 Bits)
M-RAM Block
(4K × 144 Bits)
vv
(1)
vv
vv
2–28 Altera Corporation Stratix II Device Handbook, Volume 1 May 2007
Table 2–3. TriMatrix Memory Features (Part 2 of 2)
Stratix II Architecture
Memory Feature
Simple dual-port memory mixed width support
True dual-port memory mixed width support
Power-up conditions Outputs cleared Outputs cleared Outputs unknown
Register clears Output registers Output registers Output registers
Mixed-port read-during-write Unknown output/old data Unknown output/old data Unknown output
Configurations 512 × 1
Notes to Ta b l e 2– 3 :
(1) The M-RAM block does not support memory initializations. However, the M-RAM block can emulate a ROM
function using a dual-port RAM bock. The Stratix II device must write to the dual-port memory once and then disable the write-enable ports afterwards.
M512 RAM Block
(32 × 18 Bits)
vvv
256 × 2 128 × 4 64 × 8 64 × 9 32 × 16 32 × 18
M4K RAM Block
(128 × 36 Bits)
4K × 1 2K × 2 1K × 4 512 × 8 512 × 9 256 × 16 256 × 18 128 × 32 128 × 36
M-RAM Block
(4K × 144 Bits)
vv
64K × 8 64K × 9 32K × 16 32K × 18 16K × 32 16K × 36 8K × 64 8K × 72 4K × 128 4K × 144

Memory Block Size

TriMatrix memory provides three different memory sizes for efficient application support. The Quartus II software automatically partitions the user-defined memory into the embedded memory blocks using the most efficient size combinations. You can also manually assign the memory to a specific block size or a mixture of block sizes.
When applied to input registers, the asynchronous clear signal for the TriMatrix embedded memory immediately clears the input registers. However, the output of the memory block does not show the effects until the next clock edge. When applied to output registers, the asynchronous clear signal clears the output registers and the effects are seen immediately.
Altera Corporation 2–29 May 2007 Stratix II Device Handbook, Volume 1
Tri M a t r ix Memor y

M512 RAM Block

The M512 RAM block is a simple dual-port memory block and is useful for implementing small FIFO buffers, DSP, and clock domain transfer applications. Each block contains 576 RAM bits (including parity bits). M512 RAM blocks can be configured in the following modes:
Simple dual-port RAM
Single-port RAM
FIFO
ROM
Shift register
1 Violating the setup or hold time on the memory block address
registers could corrupt memory contents. This applies to both read and write operations.
When configured as RAM or ROM, you can use an initialization file to pre-load the memory contents.
M512 RAM blocks can have different clocks on its inputs and outputs. The wren, datain, and write address registers are all clocked together from one of the two clocks feeding the block. The read address, rden, and output registers can be clocked by either of the two clocks driving the block. This allows the RAM block to operate in read/write or input/output clock modes. Only the output register can be bypassed. The six labclk signals or local interconnect can drive the inclock, outclock, wren, rden, and outclr signals. Because of the advanced interconnect between the LAB and M512 RAM blocks, ALMs can also control the wren and rden signals and the RAM clock, clock enable, and asynchronous clear signals. Figure 2–19 shows the M512 RAM block control signal generation logic.
The RAM blocks in Stratix II devices have local interconnects to allow ALMs and interconnects to drive into RAM blocks. The M512 RAM block local interconnect is driven by the R4, C4, and direct link interconnects from adjacent LABs. The M512 RAM blocks can communicate with LABs on either the left or right side through these row interconnects or with LAB columns on the left or right side with the column interconnects. The M512 RAM block has up to 16 direct link input connections from the left adjacent LABs and another 16 from the right adjacent LAB. M512 RAM outputs can also connect to left and right LABs through direct link interconnect. The M512 RAM block has equal opportunity for access and performance to and from LABs on either its left or right side. Figure 2–20 shows the M512 RAM block to logic array interface.
2–30 Altera Corporation Stratix II Device Handbook, Volume 1 May 2007
Loading...
+ 76 hidden pages