Features ................................................................................................................................................... 1–2
LAB Control Signals ......................................................................................................................... 2–5
Logic Elements ....................................................................................................................................... 2–6
Fast PLLs ........................................................................................................................................ 2–100
Slew-Rate Control ........................................................................................................................ 2–120
Bus Hold ........................................................................................................................................ 2–121
Power Consumption ........................................................................................................................... 4–17
Timing Model ....................................................................................................................................... 4–19
Preliminary & Final Timing .......................................................................................................... 4–19
Ordering Information ........................................................................................................................... 5–1
Index
Altera Corporation v
Page 6
ContentsStratix Device Handbook, Volume 1
vi Altera Corporation
Page 7
Chapter Revision Dates
The chapters in this book, Stratix Device Handbook, Volume 1, were revised on the following dates.
Where chapters or groups of chapters are available separately, part numbers are listed.
Altera literature servicesliterature@altera.comliterature@altera.com
Non-technical customer
service
FTP siteftp.altera.comftp.altera.com
You can find more information in the following ways:
■The Adobe Acrobat Find feature, which searches the text of a PDF
document. Click the binoculars toolbar icon to open the Find dialog
box.
■Acrobat bookmarks, which serve as an additional table of contents in
PDF documents.
■Thumbnail icons, which provide miniature previews of each page,
provide a link to the pages.
■Numerous links, shown in green text, which allow you to jump to
related information.
For the most up-to-date information about Altera products, go to the
Altera world-wide web site at www.altera.com. For technical support on
this product, go to www.altera.com/mysupport. For additional
information about Altera products, consult the sources shown below.
(800) 800-EPLD (3753)
(7:00 a.m. to 5:00 p.m. Pacific Time)
(800) 767-3753+ 1 408-544-7000
+1 408-544-8767
7:00 a.m. to 5:00 p.m. (GMT -8:00)
Pacific Time
tdi, input. Active-low signals are denoted by suffix n, e.g., resetn.
Anything that must be typed exactly as it appears is shown in Courier type. For
example:
actual file, such as a Report File, references to parts of files (e.g., the AHDL
keyword
Courier.
Numbered steps are used in a list of items when the sequence of the items is
important, such as the steps listed in a procedure.
c:\qdesigns\tutorial\chiptrip.gdf. Also, sections of an
SUBDESIGN), as well as logic function names (e.g., TRI) are shown in
x Altera Corporation
Page 11
Section I. Stratix Device
Family Data Sheet
This section provides the data sheet specifications for Stratix® devices.
They contain feature definitions of the internal architecture,
configuration and JTAG boundary-scan testing information, DC
operating conditions, AC timing parameters, a reference to power
consumption, and ordering information for Stratix devices.
This section contains the following chapters:
■Chapter 1, Introduction
■Chapter 2, Stratix Architecture
■Chapter 3, Configuration & Testing
■Chapter 4, DC & Switching Characteristics
■Chapter 5, Reference & Ordering Information
Revision History
The table below shows the revision history for Chapters 1 through 5.
ChapterDate/VersionChanges Made
1July 2005, v3.2● Minor content changes.
September 2004, v3.1
April 2004, v3.0
January 2004, v2.2
October 2003, v2.1
July 2003, v2.0
Altera Corporation Section I–1
● Updated Table 1–6 on page 1–5.
● Main section page numbers changed on first page.
● Changed PCI-X to PCI-X 1.0 in “Features” on page 1–2.
● Global change from SignalTap to SignalTap II.
● The DSP blocks in “Features” on page 1–2 provide dedicated
implementation of multipliers that are now “faster than 300 MHz.”
● Updated -5 speed grade device information in Table 1-6.
● Add -8 speed grade device information.
● Format changes throughout chapter.
Page 12
Stratix Device Family Data SheetStratix Device Handbook, Volume 1
ChapterDate/VersionChanges Made
2July 2005 v3.2● Added “Clear Signals” section.
● Updated “Power Sequencing & Hot Socketing” section.
● Format changes.
September 2004, v3.1
April 2004, v3.0
November 2003, v2.2
October 2003, v2.1
● Updated fast regional clock networks description on page 2–73.
● Deleted the word preliminary from the “specification for the maximum
time to relock is 100 µs” on page 2–90.
● Added information about differential SSTL and HSTL outputs in
“External Clock Outputs” on page 2–92.
● Updated notes in Figure 2–55 on page 2–93.
● Added information about m counter to “Clock Multiplication &
Division” on page 2–101.
● Updated Note 1 in Table 2–58 on page 2–101.
● Updated description of “Clock Multiplication & Division” on
page 2–88.
● Updated Table 2–22 on page 2–102.
● Added references to AN 349 and AN 329 to “External RAM
Interfacing” on page 2–115.
● Table 2–25 on page 2–116: updated the table, updated Notes 3 and
4. Notes 4, 5, and 6, are now Notes 5, 6, and 7, respectively.
● Updated Table 2–26 on page 2–117.
● Added information about PCI Compliance to page 2–120.
● Table 2–32 on page 2–126: updated the table and deleted Note 1.
● Updated reference to device pin-outs now being available on the web
on page 2–130.
● Added Notes 4 and 5 to Table 2–36 on page 2–130.
● Updated Note 3 in Table 2–37 on page 2–131.
● Updated Note 5 in Table 2–41 on page 2–135.
● Added note 3 to rows 11 and 12 in Table 2–18.
● Deleted “Stratix and Stratix GX Device PLL Availability” table.
● Added I/O standards row in Table 2–28 that support max and min
strength.
● Row clk[1,3,8,10] was removed from Ta bl e 2 – 30 .
● Added checkmarks in Enhanced column for LVPECL, 3.3-V PCML,
LVDS, and HyperTransport technology rows in Table 2–32.
● Removed the Left and Right I/O Banks row in Table 2–34.
● Changed RCLK values in Figures 2–50 and 2–51.
● External RAM Interfacing section replaced.
● Added 672-pin BGA package information in Table 2–37.
● Removed support for series and parallel on-chip termination.
● Updated timing information in Tables 4–55 through 4–96.
● Updated delay information in Tables 4–103 through 4–108.
● Updated programmable delay information in Tables 4–100 and
4–103.
July 2003, v2.0
5September 2004, v2.1
April 2003, v1.0
● Updated clock rates in Tables 4–114 through 4–123.
● Updated speed grade information in the introduction on page 4-1.
● Corrected figures 4-1 & 4-2 and Table 4-9 to reflect how VID and VOD
are specified.
● Added note 6 to Table 4-32.
● Updated Stratix Performance Table 4-35.
● Updated EP1S60 and EP1S80 timing parameters in Tables 4-82 to 4-
93. The Stratix timing models are final for all devices.
● Updated Stratix IOE programmable delay chains in Tables 4-100 to 4-
101.
● Added single-ended I/O standard output pin delay adders for loading
in Table 4-102.
● Added spec for FPLL[10..7]CLK pins in Tables 4-104 and 4-107.
● Updated high-speed I/O specification for J=2 in Tables 4-114 and 4-
115.
● Updated EPLL specification and fast PLL specification in Tables 4-
116 to 4-120.
● Updated reference to device pin-outs on page 5–1 to indicate that
device pin-outs are no longer included in this manual and are now
available on the Altera web site.
● No new changes in Stratix Device Handbook v2.0.
Altera Corporation Section I–7
Page 18
Stratix Device Family Data SheetStratix Device Handbook, Volume 1
Section I–8Altera Corporation
Page 19
S51001-3.2
1. Introduction
Introduction
The Stratix® fa mi ly of FPGAs is based o n a 1.5-V, 0.13-µm, all-layer copper
SRAM process, with densities of up to 79,040 logic elements (LEs) and up
to 7.5 Mbits of RAM. Stratix devices offer up to 22 digital signal
processing (DSP) blocks with up to 176 (9-bit × 9-bit) embedded
multipliers, optimized for DSP applications that enable efficient
implementation of high-performance filters and multipliers. Stratix
devices support various I/O standards and also offer a complete clock
management solution with its hierarchical clock structure with up to
420-MHz performance and up to 12 phase-locked loops (PLLs).
The following shows the main sections in the Stratix Device Family Data
Sheet:
(1) This parameter lists the total number of 9 × 9-bit multipliers for each device. For the total number of 18 × 18-bit
multipliers per device, divide the total number of 9 × 9-bit multipliers by 2. For the total number of 36 × 36-bit
multipliers per device, divide the total number of 9 × 9-bit multipliers by 8.
Altera Corporation 1–3
July 2005Stratix Device Handbook, Volume 1
Page 22
Features
Stratix devices are available in space-saving FineLine BGA® and ball-grid
array (BGA) packages (see Tables 1–3 through 1–5). All Stratix devices
support vertical migration within the same package (for example, you
can migrate between the EP1S10, EP1S20, and EP1S25 devices in the 672pin BGA package). Vertical migration means that you can migrate to
devices whose dedicated pins, configuration pins, and power pins are the
same for a given package across device densities. For I/O pin migration
across densities, you must cross-reference the available I/O pins using
the device pin-outs for all planned densities of a given package type to
identify which I/O pins are migrational. The Quartus
automatically cross reference and place all pins except differential pins
for migration when given a device migration list. You must use the pinouts for each device to verify the differential placement migration. A
future version of the Quartus II software will support differential pin
migration.
Stratix devices are available in up to four speed grades, -5, -6, -7, and -8,
with -5 being the fastest. Table 1–6 shows Stratix device speed-grade
offerings.
Stratix® devices contain a two-dimensional row- and column-based
architecture to implement custom logic. A series of column and row
interconnects of varying length and speed provide signal interconnects
between logic array blocks (LABs), memory block structures, and DSP
blocks.
The logic array consists of LABs, with 10 logic elements (LEs) in each
LAB. An LE is a small unit of logic providing efficient implementation of
user logic functions. LABs are grouped into rows and columns across the
device.
M512 RAM blocks are simple dual-port memory blocks with 512 bits plus
parity (576 bits). These blocks provide dedicated simple dual-port or
single-port memory up to 18-bits wide at up to 318 MHz. M512 blocks are
grouped into columns across the device in between certain LABs.
M4K RAM blocks are true dual-port memory blocks with 4K bits plus
parity (4,608 bits). These blocks provide dedicated true dual-port, simple
dual-port, or single-port memory up to 36-bits wide at up to 291 MHz.
These blocks are grouped into columns across the device in between
certain LABs.
M-RAM blocks are true dual-port memory blocks with 512K bits plus
parity (589,824 bits). These blocks provide dedicated true dual-port,
simple dual-port, or single-port memory up to 144-bits wide at up to
269 MHz. Several M-RAM blocks are located individually or in pairs
within the device’s logic array.
Digital signal processing (DSP) blocks can implement up to either eight
full-precision 9 × 9-bit multipliers, four full-precision 18 × 18-bit
multipliers, or one full-precision 36 × 36-bit multiplier with add or
subtract features. These blocks also contain 18-bit input shift registers for
digital signal processing applications, including FIR and infinite impulse
response (IIR) filters. DSP blocks are grouped into two columns in each
device.
Each Stratix device I/O pin is fed by an I/O element (IOE) located at the
end of LAB rows and columns around the periphery of the device. I/O
pins support numerous single-ended and differential I/O standards.
Each IOE contains a bidirectional I/O buffer and six registers for
registering input, output, and output-enable signals. When used with
Altera Corporation 2–1
July 2005
Page 26
Functional Description
dedicated clocks, these registers provide exceptional performance and
interface support with external memory devices such as DDR SDRAM,
FCRAM, ZBT, and QDR SRAM devices.
High-speed serial interface channels support transfers at up to 840 Mbps
using LVDS, LVPECL, 3.3-V PCML, or HyperTransport technology I/O
standards.
Figure 2–1 shows an overview of the Stratix device.
The number of M512 RAM, M4K RAM, and DSP blocks varies by device
along with row and column numbers and M-RAM blocks. Table 2–1 lists
the resources available in Stratix devices.
Table 2–1. Stratix Device Resources
Stratix Architecture
Device
EP1S104 / 942 / 6012 / 64030
EP1S206 / 1942 / 8222 / 105241
EP1S256 / 2243 / 13822 / 106246
EP1S307 / 2953 / 17142 / 126757
EP1S408 / 3843 / 18342 / 147761
EP1S6010 / 5744 / 29262 / 189073
EP1S8011 / 7674 / 36492 / 2210191
M512 RAM
Columns/Blocks
Logic Array
Blocks
M4K RAM
Columns/Blocks
M-RAM
Blocks
DSP Block
Columns/Blocks
LAB
Columns
LAB Rows
Each LAB consists of 10 LEs, LE carry chains, LAB control signals, local
interconnect, LUT chain, and register chain connection lines. The local
interconnect transfers signals between LEs in the same LAB. LUT chain
connections transfer the output of one LE’s LUT to the adjacent LE for fast
sequential LUT connections within the same LAB. Register chain
connections transfer the output of one LE’s register to the adjacent LE’s
®
register within an LAB. The Quartus
II Compiler places associated logic
within an LAB or adjacent LABs, allowing the use of local, LUT chain,
and register chain connections for performance and area efficiency.
Figure 2–2 shows the Stratix LAB.
Altera Corporation 2–3
July 2005Stratix Device Handbook, Volume 1
Page 28
Logic Array Blocks
Figure 2–2. Stratix LAB Structure
Direct link
interconnect from
adjacent block
Row Interconnects of
Variable Speed & Length
Direct link
interconnect from
adjacent block
Direct link
interconnect to
adjacent block
Direct link
interconnect to
adjacent block
Local Interconnect
LAB
Three-Sided Architecture—Local
Interconnect is Driven from Either Side by
Columns & LABs, & from Above by Rows
Column Interconnects of
Variable Speed & Length
LAB Interconnects
The LAB local interconnect can drive LEs within the same LAB. The LAB
local interconnect is driven by column and row interconnects and LE
outputs within the same LAB. Neighboring LABs, M512 RAM blocks,
M4K RAM blocks, or DSP blocks from the left and right can also drive an
LAB’s local interconnect through the direct link connection. The direct
link connection feature minimizes the use of row and column
interconnects, providing higher performance and flexibility. Each LE can
drive 30 other LEs through fast local and direct link interconnects.
Direct link interconnect from
right LAB, TriMatrix memory
block, DSP block, or IOE output
Direct link
interconnect
to right
Local
LAB
LAB Control Signals
Each LAB contains dedicated logic for driving control signals to its LEs.
The control signals include two clocks, two clock enables, two
asynchronous clears, synchronous clear, asynchronous preset/load,
synchronous load, and add/subtract control signals. This gives a
maximum of 10 control signals at a time. Although synchronous load and
clear signals are generally used when implementing counters, they can
also be used with other functions.
Each LAB can use two clocks and two clock enable signals. Each LAB’s
clock and clock enable signals are linked. For example, any LE in a
particular LAB using the labclk1 signal will also use labclkena1. If
the LAB uses both the rising and falling edges of a clock, it also uses both
LAB-wide clock signals. De-asserting the clock enable signal will turn off
the LAB-wide clock.
Each LAB can use two asynchronous clear signals and an asynchronous
load/preset signal. The asynchronous load acts as a preset when the
asynchronous load data input is tied high.
Altera Corporation 2–5
July 2005Stratix Device Handbook, Volume 1
Page 30
Logic Elements
With the LAB-wide addnsub control signal, a single LE can implement a
one-bit adder and subtractor. This saves LE resources and improves
performance for logic functions such as DSP correlators and signed
multipliers that alternate between addition and subtraction depending
on data.
The LAB row clocks [7..0] and LAB local interconnect generate the LABwide control signals. The MultiTrack
allows clock and control signal distribution in addition to data. Figure 2–4
shows the LAB control signal generation circuit.
Figure 2–4. LAB-Wide Control Signals
Dedicated
Row LAB
Clocks
Local
Interconnect
Local
Interconnect
Local
Interconnect
Local
Interconnect
Local
Interconnect
Local
Interconnect
Logic Elements
8
The smallest unit of logic in the Stratix architecture, the LE, is compact
and provides advanced features with efficient logic utilization. Each LE
contains a four-input LUT, which is a function generator that can
implement any function of four variables. In addition, each LE contains a
programmable register and carry chain with carry select capability. A
single LE also supports dynamic single bit addition or subtraction mode
selectable by an LAB-wide control signal. Each LE drives all types of
interconnects: local, row, column, LUT chain, register chain, and direct
link interconnects. See Figure 2–5.
Each LE’s programmable register can be configured for D, T, JK, or SR
operation. Each register has data, true asynchronous load data, clock,
clock enable, clear, and asynchronous load/preset inputs. Global signals,
general-purpose I/O pins, or any internal logic can drive the register’s
clock and clear control signals. Either general-purpose I/O pins or
internal logic can drive the clock enable, preset, asynchronous load, and
asynchronous data. The asynchronous load data input comes from the
data3 input of the LE. For combinatorial functions, the register is
bypassed and the output of the LUT drives directly to the outputs of the
LE.
Each LE has three outputs that drive the local, row, and column routing
resources. The LUT or register output can drive these three outputs
independently. Two LE outputs drive column or row and direct link
routing connections and one drives local interconnect resources. This
allows the LUT to drive one output while the register drives another
output. This feature, called register packing, improves device utilization
because the device can use the register and the LUT for unrelated
Altera Corporation 2–7
July 2005Stratix Device Handbook, Volume 1
Page 32
Logic Elements
functions. Another special packing mode allows the register output to
feed back into the LUT of the same LE so that the register is packed with
its own fan-out LUT. This provides another mechanism for improved
fitting. The LE can also drive out registered and unregistered versions of
the LUT output.
LUT Chain & Register Chain
In addition to the three general routing outputs, the LEs within an LAB
have LUT chain and register chain outputs. LUT chain connections allow
LUTs within the same LAB to cascade together for wide input functions.
Register chain outputs allow registers within the same LAB to cascade
together. The register chain output allows an LAB to use LUTs for a single
combinatorial function and the registers to be used for an unrelated shift
register implementation. These resources speed up connections between
LABs while saving local interconnect resources. See “MultiTrack
Interconnect” on page 2–14 for more information on LUT chain and
register chain connections.
addnsub Signal
The LE’s dynamic adder/subtractor feature saves logic resources by
using one set of LEs to implement both an adder and a subtractor. This
feature is controlled by the LAB-wide control signal addnsub. The
addnsub signal sets the LAB to perform either A + B or A – B. The LUT
computes addition, and subtraction is computed by adding the two’s
complement of the intended subtractor. The LAB-wide signal converts to
two’s complement by inverting the B bits within the LAB and setting
carry-in = 1 to add one to the least significant bit (LSB). The LSB of an
adder/subtractor must be placed in the first LE of the LAB, where the
LAB-wide addnsub signal automatically sets the carry-in to 1. The
Quartus II Compiler automatically places and uses the adder/subtractor
feature when using adder/subtractor parameterized functions.
LE Operating Modes
The Stratix LE can operate in one of the following modes:
■Normal mode
■Dynamic arithmetic mode
Each mode uses LE resources differently. In each mode, eight available
inputs to the LE—the four data inputs from the LAB local interconnect;
carry-in0 and carry-in1 from the previous LE; the LAB carry-in
from the previous carry-chain LAB; and the register chain connection—
are directed to different destinations to implement the desired logic
function. LAB-wide signals provide clock, asynchronous clear,
asynchronous preset load, synchronous clear, synchronous load, and
clock enable control for the register. These LAB-wide signals are available
in all LE modes. The addnsub control signal is allowed in arithmetic
mode.
The Quartus II software, in conjunction with parameterized functions
such as library of parameterized modules (LPM) functions, automatically
chooses the appropriate mode for common functions such as counters,
adders, subtractors, and arithmetic functions. If required, you can also
create special-purpose functions that specify which LE operating mode to
use for optimal performance.
Normal Mode
The normal mode is suitable for general logic applications and
combinatorial functions. In normal mode, four data inputs from the LAB
local interconnect are inputs to a four-input LUT (see Figure 2–6). The
Quartus II Compiler automatically selects the carry-in or the data3
signal as one of the inputs to the LUT. Each LE can use LUT chain
connections to drive its combinatorial output directly to the next LE in the
LAB. Asynchronous load data for the register comes from the data3
input of the LE. LEs in normal mode support packed registers.
Figure 2–6. LE in Normal Mode
sclear
(LAB Wide)
addnsub (LAB Wide)
(1)
data1
data2
data3
cin (from cout
of previous LE)
data4
4-Input
LUT
Register Feedback
Register chain
connection
sload
(LAB Wide)
clock (LAB Wide)
ena (LAB Wide)
aclr (LAB Wide)
Note to Figure 2–6:
(1) This signal is only allowed in normal mode if the LE is at the end of an adder/subtractor chain.
Altera Corporation 2–9
July 2005Stratix Device Handbook, Volume 1
aload
(LAB Wide)
ALD/PRE
ADATA
D
ENA
CLRN
Q
Row, column, and
direct link routing
Row, column, and
direct link routing
Local routing
LUT chain
connection
Register
chain output
Page 34
Logic Elements
Dynamic Arithmetic Mode
The dynamic arithmetic mode is ideal for implementing adders, counters,
accumulators, wide parity functions, and comparators. An LE in dynamic
arithmetic mode uses four 2-input LUTs configurable as a dynamic
adder/subtractor. The first two 2-input LUTs compute two summations
based on a possible carry-in of 1 or 0; the other two LUTs generate carry
outputs for the two chains of the carry select circuitry. As shown in
Figure 2–7, the LAB carry-in signal selects either the carry-in0 or
carry-in1 chain. The selected chain’s logic level in turn determines
which parallel sum is generated as a combinatorial or registered output.
For example, when implementing an adder, the sum output is the
selection of two possible calculated sums: data1 + data2 + carry-in0
or data1 + data2 + carry-in1. The other two LUTs use the data1 and data2 signals to generate two possible carry-out signals—one for a carry
of 1 and the other for a carry of 0. The carry-in0 signal acts as the carry
select for the carry-out0 output and carry-in1 acts as the carry select
for the carry-out1 output. LEs in arithmetic mode can drive out
registered and unregistered versions of the LUT output.
The dynamic arithmetic mode also offers clock enable, counter enable,
synchronous up/down control, synchronous clear, synchronous load,
and dynamic adder/subtractor options. The LAB local interconnect data
inputs generate the counter enable and synchronous up/down control
signals. The synchronous clear and synchronous load options are LABwide signals that affect all registers in the LAB. The Quartus II software
automatically places any registers that are not used by the counter into
other LABs. The addnsub LAB-wide signal controls whether the LE acts
as an adder or subtractor.
(1) The addnsub signal is tied to the carry input for the first LE of a carry chain only.
Carry-Select Chain
The carry-select chain provides a very fast carry-select function between
LEs in arithmetic mode. The carry-select chain uses the redundant carry
calculation to increase the speed of carry functions. The LE is configured
to calculate outputs for a possible carry-in of 1 and carry-in of 0 in
parallel. The carry-in0 and carry-in1 signals from a lower-order bit
feed forward into the higher-order bit via the parallel carry chain and feed
into both the LUT and the next portion of the carry chain. Carry-select
chains can begin in any LE within an LAB.
Q
Row, column, and
direct link routing
Row, column, and
direct link routing
Local routing
LUT chain
connection
Register
chain output
The speed advantage of the carry-select chain is in the parallel precomputation of carry chains. Since the LAB carry-in selects the
precomputed carry chain, not every LE is in the critical path. Only the
propagation delay between LAB carry-in generation (LE 5 and LE 10) are
now part of the critical path. This feature allows the Stratix architecture to
implement high-speed counters, adders, multipliers, parity functions,
and comparators of arbitrary width.
Altera Corporation 2–11
July 2005Stratix Device Handbook, Volume 1
Page 36
Logic Elements
Figure 2–8 shows the carry-select circuitry in an LAB for a 10-bit full
adder. One portion of the LUT generates the sum of two bits using the
input signals and the appropriate carry-in bit; the sum is routed to the
output of the LE. The register can be bypassed for simple adders or used
for accumulator functions. Another portion of the LUT generates carryout bits. An LAB-wide carry in bit selects which chain is used for the
addition of given inputs. The carry-in signal for each chain, carry-in0
or carry-in1, selects the carry-out to carry forward to the carry-in
signal of the next-higher-order bit. The final carry-out signal is routed to
an LE, where it is fed to local, row, or column interconnects.
The Quartus II Compiler automatically creates carry chain logic during
design processing, or you can create it manually during design entry.
Parameterized functions such as LPM functions automatically take
advantage of carry chains for the appropriate functions.
The Quartus II Compiler creates carry chains longer than 10 LEs by
linking LABs together automatically. For enhanced fitting, a long carry
chain runs vertically allowing fast horizontal connections to TriMatrix
™
memory and DSP blocks. A carry chain can continue as far as a full
column.
LAB-wide signals control the logic for the register’s clear and preset
signals. The LE directly supports an asynchronous clear and preset
function. The register preset is achieved through the asynchronous load
of a logic high. The direct asynchronous preset does not require a NOTgate push-back technique. Stratix devices support simultaneous preset/
Altera Corporation 2–13
July 2005Stratix Device Handbook, Volume 1
Page 38
MultiTrack Interconnect
asynchronous load, and clear signals. An asynchronous clear signal takes
precedence if both signals are asserted simultaneously. Each LAB
supports up to two clears and one preset signal.
In addition to the clear and preset ports, Stratix devices provide a chipwide reset pin (DEV_CLRn) that resets all registers in the device. An
option set before compilation in the Quartus II software controls this pin.
This chip-wide reset overrides all other control signals.
MultiTrack
Interconnect
In the Stratix architecture, connections between LEs, TriMatrix memory,
DSP blocks, and device I/O pins are provided by the MultiTrack
interconnect structure with DirectDriveTM technology. The MultiTrack
interconnect consists of continuous, performance-optimized routing lines
of different lengths and speeds used for inter- and intra-design block
connectivity. The Quartus II Compiler automatically places critical design
paths on faster interconnects to improve design performance.
DirectDrive technology is a deterministic routing technology that ensures
identical routing resource usage for any function regardless of placement
within the device. The MultiTrack interconnect and DirectDrive
technology simplify the integration stage of block-based designing by
eliminating the re-optimization cycles that typically follow design
changes and additions.
The MultiTrack interconnect consists of row and column interconnects
that span fixed distances. A routing structure with fixed length resources
for all devices allows predictable and repeatable performance when
migrating through different device densities. Dedicated row
interconnects route signals to and from LABs, DSP blocks, and TriMatrix
memory within the same row. These row resources include:
■Direct link interconnects between LABs and adjacent blocks.
■R4 interconnects traversing four blocks to the right or left.
■R8 interconnects traversing eight blocks to the right or left.
■R24 row interconnects for high-speed access across the length of the
device.
The direct link interconnect allows an LAB, DSP block, or TriMatrix
memory block to drive into the local interconnect of its left and right
neighbors and then back into itself. Only one side of a M-RAM block
interfaces with direct link and row interconnects. This provides fast
communication between adjacent LABs and/or blocks without using row
interconnect resources.
The R4 interconnects span four LABs, three LABs and one M512 RAM
block, two LABs and one M4K RAM block, or two LABs and one DSP
block to the right or left of a source LAB. These resources are used for fast
row connections in a four-LAB region. Every LAB has its own set of R4
interconnects to drive either left or right. Figure 2–9 shows R4
interconnect connections from an LAB. R4 interconnects can drive and be
driven by DSP blocks and RAM blocks and horizontal IOEs. For LAB
interfacing, a primary LAB or LAB neighbor can drive a given R4
interconnect. For R4 interconnects that drive to the right, the primary
LAB and right neighbor can drive on to the interconnect. For R4
interconnects that drive to the left, the primary LAB and its left neighbor
can drive on to the interconnect. R4 interconnects can drive other R4
interconnects to extend the range of LABs they can drive. R4
interconnects can also drive C4 and C16 interconnects for connections
from one row to another. Additionally, R4 interconnects can drive R24
interconnects.
Figure 2–9. R4 Interconnect Connections
Adjacent LAB can
Drive onto Another
LAB's R4 Interconnect
R4 Interconnect
Driving Left
C4, C8, and C16
Column Interconnects (1)
R4 Interconnect
Driving Right
Stratix Architecture
LAB
Neighbor
Primary
LAB (2)
LAB
Neighbor
Notes to Figure 2–9:
(1) C4 interconnects can drive R4 interconnects.
(2) This pattern is repeated for every LAB in the LAB row.
The R8 interconnects span eight LABs, M512 or M4K RAM blocks, or DSP
blocks to the right or left from a source LAB. These resources are used for
fast row connections in an eight-LAB region. Every LAB has its own set
of R8 interconnects to drive either left or right. R8 interconnect
connections between LABs in a row are similar to the R4 connections
shown in Figure 2–9, with the exception that they connect to eight LABs
to the right or left, not four. Like R4 interconnects, R8 interconnects can
drive and be driven by all types of architecture blocks. R8 interconnects
Altera Corporation 2–15
July 2005Stratix Device Handbook, Volume 1
Page 40
MultiTrack Interconnect
can drive other R8 interconnects to extend their range as well as C8
interconnects for row-to-row connections. One R8 interconnect is faster
than two R4 interconnects connected together.
R24 row interconnects span 24 LABs and provide the fastest resource for
long row connections between LABs, TriMatrix memory, DSP blocks, and
IOEs. The R24 row interconnects can cross M-RAM blocks. R24 row
interconnects drive to other row or column interconnects at every fourth
LAB and do not drive directly to LAB local interconnects. R24 row
interconnects drive LAB local interconnects via R4 and C4 interconnects.
R24 interconnects can drive R24, R4, C16, and C4 interconnects.
The column interconnect operates similarly to the row interconnect and
vertically routes signals to and from LABs, TriMatrix memory, DSP
blocks, and IOEs. Each column of LABs is served by a dedicated column
interconnect, which vertically routes signals to and from LABs, TriMatrix
memory and DSP blocks, and horizontal IOEs. These column resources
include:
■LUT chain interconnects within an LAB
■Register chain interconnects within an LAB
■C4 interconnects traversing a distance of four blocks in up and down
direction
■C8 interconnects traversing a distance of eight blocks in up and
down direction
■C16 column interconnects for high-speed vertical routing through
the device
Stratix devices include an enhanced interconnect structure within LABs
for routing LE output to LE input connections faster using LUT chain
connections and register chain connections. The LUT chain connection
allows the combinatorial output of an LE to directly drive the fast input
of the LE right below it, bypassing the local interconnect. These resources
can be used as a high-speed connection for wide fan-in functions from
LE 1 to LE 10 in the same LAB. The register chain connection allows the
register output of one LE to connect directly to the register input of the
next LE in the LAB for fast shift registers. The Quartus II Compiler
automatically takes advantage of these resources to improve utilization
and performance. Figure 2–10 shows the LUT chain and register chain
interconnects.
Register Chain
Routing to Adjacen
LE's Register Input
The C4 interconnects span four LABs, M512, or M4K blocks up or down
from a source LAB. Every LAB has its own set of C4 interconnects to drive
either up or down. Figure 2–11 shows the C4 interconnect connections
from an LAB in a column. The C4 interconnects can drive and be driven
by all types of architecture blocks, including DSP blocks, TriMatrix
memory blocks, and vertical IOEs. For LAB interconnection, a primary
LAB or its LAB neighbor can drive a given C4 interconnect.
C4 interconnects can drive each other to extend their range as well as
drive row interconnects for column-to-column connections.
Altera Corporation 2–17
July 2005Stratix Device Handbook, Volume 1
Page 42
MultiTrack Interconnect
4
Figure 2–11. C4 Interconnect Connections Note (1)
C4 Interconnect
Drives Local and R
Interconnects
up to Four Rows
C4 Interconnect
Driving Up
LAB
Row
Interconnect
Adjacent LAB can
drive onto neighboring
LAB's C4 interconnect
Local
Interconnect
C4 Interconnect
Driving Down
Note to Figure 2–11:
(1) Each C4 interconnect can drive either up or down four rows.
C8 interconnects span eight LABs, M512, or M4K blocks up or down from
a source LAB. Every LAB has its own set of C8 interconnects to drive
either up or down. C8 interconnect connections between the LABs in a
column are similar to the C4 connections shown in Figure 2–11 with the
exception that they connect to eight LABs above and below. The C8
interconnects can drive and be driven by all types of architecture blocks
similar to C4 interconnects. C8 interconnects can drive each other to
extend their range as well as R8 interconnects for column-to-column
connections. C8 interconnects are faster than two C4 interconnects.
C16 column interconnects span a length of 16 LABs and provide the
fastest resource for long column connections between LABs, TriMatrix
memory blocks, DSP blocks, and IOEs. C16 interconnects can cross MRAM blocks and also drive to row and column interconnects at every
fourth LAB. C16 interconnects drive LAB local interconnects via C4 and
R4 interconnects and do not drive LAB local interconnects directly.
All embedded blocks communicate with the logic array similar to LABto-LAB interfaces. Each block (i.e., TriMatrix memory and DSP blocks)
connects to row and column interconnects and has local interconnect
regions driven by row and column interconnects. These blocks also have
direct link interconnects for fast connections to and from a neighboring
LAB. All blocks are fed by the row LAB clocks, labclk[7..0].
Altera Corporation 2–19
July 2005Stratix Device Handbook, Volume 1
Page 44
MultiTrack Interconnect
Table 2–2 shows the Stratix device’s routing scheme.
TriMatrix memory consists of three types of RAM blocks: M512, M4K,
and M-RAM blocks. Although these memory blocks are different, they
can all implement various types of memory with or without parity,
including true dual-port, simple dual-port, and single-port RAM, ROM,
and FIFO buffers. Table 2–3 shows the size and features of the different
RAM blocks.
Table 2–3. TriMatrix Memory Features (Part 1 of 2)
(1) See Table 4–36 for maximum performance information.
(2) The M-RAM block does not support memory initializations. However, the
M-RAM block can emulate a ROM function using a dual-port RAM bock. The
Stratix device must write to the dual-port memory once and then disable the
write-enable ports afterwards.
1Violating the setup or hold time on the address registers could
corrupt the memory contents. This applies to both read and
write operations.
Memory Modes
TriMatrix memory blocks include input registers that synchronize writes
and output registers to pipeline designs and improve system
performance. M4K and M-RAM memory blocks offer a true dual-port
mode to support any combination of two-port operations: two reads, two
writes, or one read and one write at two different clock frequencies.
In addition to true dual-port memory, the memory blocks support simple
dual-port and single-port RAM. Simple dual-port memory supports a
simultaneous read and write and can either read old data before the write
occurs or just read the don’t care bits. Single-port memory supports nonsimultaneous reads and writes, but the q[] port will output the data once
it has been written to the memory (if the outputs are not registered) or
after the next rising edge of the clock (if the outputs are registered). For
more information, see Chapter 2, TriMatrix Embedded Memory Blocks in
Stratix & Stratix GX Devices of the Stratix Device Handbook, Volume 2.
Figure 2–13 shows these different RAM memory port configurations for
(1) Two single-port memory blocks can be implemented in a single M4K block as long
as each of the two independent block sizes is equal to or less than half of the M4K
block size.
The memory blocks also enable mixed-width data ports for reading and
writing to the RAM ports in dual-port RAM configuration. For example,
the memory block can be written in ×1 mode at port A and read out in ×16
mode from port B.
Altera Corporation 2–23
July 2005Stratix Device Handbook, Volume 1
Page 48
Tri M a t r ix Memo r y
TriMatrix memory architecture can implement pipelined RAM by
registering both the input and output signals to the RAM block. All
TriMatrix memory block inputs are registered providing synchronous
write cycles. In synchronous operation, the memory block generates its
own self-timed strobe write enable (WREN) signal derived from the global
or regional clock. In contrast, a circuit using asynchronous RAM must
generate the RAM WREN signal while ensuring its data and address
signals meet setup and hold time specifications relative to the WREN
signal. The output registers can be bypassed. Flow-through reading is
possible in the simple dual-port mode of M512 and M4K RAM blocks by
clocking the read enable and read address registers on the negative clock
edge and bypassing the output registers.
Two single-port memory blocks can be implemented in a single M4K
block as long as each of the two independent block sizes is equal to or less
than half of the M4K block size.
The Quartus II software automatically implements larger memory by
combining multiple TriMatrix memory blocks. For example, two
256 × 16-bit RAM blocks can be combined to form a 256 × 32-bit RAM
block. Memory performance does not degrade for memory blocks using
the maximum number of words available in one memory block. Logical
memory blocks using less than the maximum number of words use
physical blocks in parallel, eliminating any external control logic that
would increase delays. To create a larger high-speed memory block, the
Quartus II software automatically combines memory blocks with LE
control logic.
Clear Signals
When applied to input registers, the asynchronous clear signal for the
TriMatrix embedded memory immediately clears the input registers.
However, the output of the memory block does not show the effects until
the next clock edge. When applied to output registers, the asynchronous
clear signal clears the output registers and the effects are seen
immediately.
Parity Bit Support
The memory blocks support a parity bit for each byte. The parity bit,
along with internal LE logic, can implement parity checking for error
detection to ensure data integrity. You can also use parity-size data words
to store user-specified control bits. In the M4K and M-RAM blocks, byte
enables are also available for data input masking during write operations.
You can configure embedded memory blocks to implement shift registers
for DSP applications such as pseudo-random number generators, multichannel filtering, auto-correlation, and cross-correlation functions. These
and other DSP applications require local data storage, traditionally
implemented with standard flip-flops, which can quickly consume many
logic cells and routing resources for large shift registers. A more efficient
alternative is to use embedded memory as a shift register block, which
saves logic cell and routing resources and provides a more efficient
implementation with the dedicated circuitry.
The size of a w × m × n shift register is determined by the input data
width (w), the length of the taps (m), and the number of taps (n). The size
of a w × m × n shift register must be less than or equal to the maximum
number of memory bits in the respective block: 576 bits for the M512
RAM block and 4,608 bits for the M4K RAM block. The total number of
shift register outputs (number of taps n × width w) must be less than the
maximum data width of the RAM block (18 for M512 blocks, 36 for M4K
blocks). To create larger shift registers, the memory blocks are cascaded
together.
Data is written into each address location at the falling edge of the clock
and read from the address at the rising edge of the clock. The shift register
mode logic automatically controls the positive and negative edge
clocking to shift the data in one clock cycle. Figure 2–14 shows the
TriMatrix memory block in the shift register mode.
Altera Corporation 2–25
July 2005Stratix Device Handbook, Volume 1
Page 50
Tri M a t r ix Memo r y
r
Figure 2–14. Shift Register Memory Configuration
w × m × n Shift Register
m-Bit Shift Register
ww
m-Bit Shift Register
w
m-Bit Shift Register
w
w
n Numbe
of Taps
w
m-Bit Shift Register
w
w
Memory Block Size
TriMatrix memory provides three different memory sizes for efficient
application support. The large number of M512 blocks are ideal for
designs with many shallow first-in first-out (FIFO) buffers. M4K blocks
provide additional resources for channelized functions that do not
require large amounts of storage. The M-RAM blocks provide a large
single block of RAM ideal for data packet storage. The different-sized
blocks allow Stratix devices to efficiently support variable-sized memory
in designs.
The Quartus II software automatically partitions the user-defined
memory into the embedded memory blocks using the most efficient size
combinations. You can also manually assign the memory to a specific
block size or a mixture of block sizes.
The M512 RAM block is a simple dual-port memory block and is useful
for implementing small FIFO buffers, DSP, and clock domain transfer
applications. Each block contains 576 RAM bits (including parity bits).
M512 RAM blocks can be configured in the following modes:
■Simple dual-port RAM
■Single-port RAM
■FIFO
■ROM
■Shift register
When configured as RAM or ROM, you can use an initialization file to
pre-load the memory contents.
The memory address depths and output widths can be configured as
512 × 1, 256 × 2, 128 × 4, 64 × 8 (64 × 9 bits with parity), and 32 × 16
(32 × 18 bits with parity). Mixed-width configurations are also possible,
allowing different read and write widths. Table 2–4 summarizes the
possible M512 RAM block configurations.
When the M512 RAM block is configured as a shift register block, a shift
register of size up to 576 bits is possible.
The M512 RAM block can also be configured to support serializer and
deserializer applications. By using the mixed-width support in
combination with DDR I/O standards, the block can function as a
SERDES to support low-speed serial I/O standards using global or
regional clocks. See “I/O Structure” on page 2–104 for details on
dedicated SERDES in Stratix devices.
Altera Corporation 2–27
July 2005Stratix Device Handbook, Volume 1
Page 52
Tri M a t r ix Memo r y
M512 RAM blocks can have different clocks on its inputs and outputs.
The wren, datain, and write address registers are all clocked together
from one of the two clocks feeding the block. The read address, rden, and
output registers can be clocked by either of the two clocks driving the
block. This allows the RAM block to operate in read/write or
input/output clock modes. Only the output register can be bypassed. The
eight labclk signals or local interconnect can drive the inclock,
outclock, wren, rden, inclr, and outclr signals. Because of the
advanced interconnect between the LAB and M512 RAM blocks, LEs can
also control the wren and rden signals and the RAM clock, clock enable,
and asynchronous clear signals. Figure 2–15 shows the M512 RAM block
control signal generation logic.
The RAM blocks within Stratix devices have local interconnects to allow
LEs and interconnects to drive into RAM blocks. The M512 RAM block
local interconnect is driven by the R4, R8, C4, C8, and direct link
interconnects from adjacent LABs. The M512 RAM blocks can
communicate with LABs on either the left or right side through these row
interconnects or with LAB columns on the left or right side with the
column interconnects. Up to 10 direct link input connections to the M512
RAM block are possible from the left adjacent LABs and another
10 possible from the right adjacent LAB. M512 RAM outputs can also
connect to left and right LABs through 10 direct link interconnects. The
M512 RAM block has equal opportunity for access and performance to
and from LABs on either its left or right side. Figure 2–16 shows the M512
RAM block to logic array interface.
Altera Corporation 2–29
July 2005Stratix Device Handbook, Volume 1
Page 54
Tri M a t r ix Memo r y
Figure 2–16. M512 RAM Block LAB Row Interface
C4 and C8
Interconnects
R4 and R8
Interconnects
Direct link
interconnect
to adjacent LAB
Direct link
interconnect
from adjacent LAB
10
2
8
Small RAM Block Local
Interconnect Region
M512 RAM
datain
Clocks
dataout
Block
Control
Signals
address
LAB Row Clocks
Direct link
interconnect
to adjacent LAB
Direct link
interconnect
from adjacent LAB
M4K RAM Blocks
The M4K RAM block includes support for true dual-port RAM. The M4K
RAM block is used to implement buffers for a wide variety of applications
such as storing processor code, implementing lookup schemes, and
implementing larger memory applications. Each block contains
4,608 RAM bits (including parity bits). M4K RAM blocks can be
configured in the following modes:
■True dual-port RAM
■Simple dual-port RAM
■Single-port RAM
■FIFO
■ROM
■Shift register
When configured as RAM or ROM, you can use an initialization file to
pre-load the memory contents.
The memory address depths and output widths can be configured as
4,096 × 1, 2,048 × 2, 1,024 × 4, 512 × 8 (or 512 × 9 bits), 256 × 16 (or
256 × 18 bits), and 128 × 32 (or 128 × 36 bits). The 128 × 32- or 36-bit
configuration is not available in the true dual-port mode. Mixed-width
configurations are also possible, allowing different read and write
widths. Tables 2–5 and 2–6 summarize the possible M4K RAM block
configurations.
When the M4K RAM block is configured as a shift register block, you can
create a shift register up to 4,608 bits (w × m × n).
Altera Corporation 2–31
July 2005Stratix Device Handbook, Volume 1
Page 56
Tri M a t r ix Memo r y
M4K RAM blocks support byte writes when the write port has a data
width of 16, 18, 32, or 36 bits. The byte enables allow the input data to be
masked so the device can write to specific bytes. The unwritten bytes
retain the previous written value. Table 2–7 summarizes the byte
selection.
Table 2–7. Byte Enable for M4K Blocks Notes (1), (2)
byteena[3..0]datain ×18datain ×36
[0] = 1[8..0][8..0]
[1] = 1[17..9][17..9]
[2] = 1–[26..18]
[3] = 1–[35..27]
Notes to Ta b l e 2 – 7:
(1) Any combination of byte enables is possible.
(2) Byte enables can be used in the same manner with 8-bit words, i.e., in × 16 and
× 32 modes.
The M4K RAM blocks allow for different clocks on their inputs and
outputs. Either of the two clocks feeding the block can clock M4K RAM
block registers (renwe, address, byte enable, datain, and output
registers). Only the output register can be bypassed. The eight labclk
signals or local interconnects can drive the control signals for the A and B
ports of the M4K RAM block. LEs can also control the clock_a,
clock_b, renwe_a, renwe_b, clr_a, clr_b, clocken_a, and
clocken_b signals, as shown in Figure 2–17.
The R4, R8, C4, C8, and direct link interconnects from adjacent LABs
drive the M4K RAM block local interconnect. The M4K RAM blocks can
communicate with LABs on either the left or right side through these row
resources or with LAB columns on either the right or left with the column
resources. Up to 10 direct link input connections to the M4K RAM Block
are possible from the left adjacent LABs and another 10 possible from the
right adjacent LAB. M4K RAM block outputs can also connect to left and
right LABs through 10 direct link interconnects each. Figure 2–18 shows
the M4K RAM block to logic array interface.
Altera Corporation 2–33
July 2005Stratix Device Handbook, Volume 1
Page 58
Tri M a t r ix Memo r y
M-RAM Block
The largest TriMatrix memory block, the M-RAM block, is useful for
applications where a large volume of data must be stored on-chip. Each
block contains 589,824 RAM bits (including parity bits). The M-RAM
block can be configured in the following modes:
■True dual-port RAM
■Simple dual-port RAM
■Single-port RAM
■FIFO RAM
You cannot use an initialization file to initialize the contents of a M-RAM
block. All M-RAM block contents power up to an undefined value. Only
synchronous operation is supported in the M-RAM block, so all inputs
are registered. Output registers can be bypassed. The memory address
and output width can be configured as 64K × 8 (or 64K × 9 bits), 32K × 16
(or 32K × 18 bits), 16K × 32 (or 16K × 36 bits), 8K × 64 (or 8K × 72 bits), and
4K × 128 (or 4K × 144 bits). The 4K × 128 configuration is unavailable in
true dual-port mode because there are a total of 144 data output drivers
in the block. Mixed-width configurations are also possible, allowing
different read and write widths. Tables 2–8 and 2–9 summarize the
possible M-RAM block configurations:
The read and write operation of the memory is controlled by the WREN
signal, which sets the ports into either read or write modes. There is no
separate read enable (RE) signal.
Writing into RAM is controlled by both the WREN and byte enable
(byteena) signals for each port. The default value for the byteena
signal is high, in which case writing is controlled only by the WREN signal.
The byte enables are available for the ×18, ×36, and ×72 modes. In the
×144 simple dual-port mode, the two sets of byteena signals
(byteena_a and byteena_b) are combined to form the necessary
16 byte enables. Tables 2–10 and 2–11 summarize the byte selection.
Table 2–10. Byte Enable for M-RAM Blocks Notes (1), (2)
byteena[3..0]datain ×18datain ×36datain ×72
[0] = 1[8..0][8..0][8..0]
[1] = 1[17..9][17..9][17..9]
[2] = 1–[26..18][26..18]
[3] = 1–[35..27][35..27]
[4] = 1––[44..36]
[5] = 1––[53..45]
[6] = 1––[62..54]
[7] = 1––[71..63]
Altera Corporation 2–35
July 2005Stratix Device Handbook, Volume 1
(1) Any combination of byte enables is possible.
(2) Byte enables can be used in the same manner with 8-bit words, i.e., in × 16, ×32,
× 64, and ×128 modes.
Similar to all RAM blocks, M-RAM blocks can have different clocks on
their inputs and outputs. All input registers—renwe, datain, address,
and byte enable registers—are clocked together from either of the two
clocks feeding the block. The output register can be bypassed. The eight
labclk signals or local interconnect can drive the control signals for the
A and B ports of the M-RAM block. LEs can also control the clock_a,
clock_b, renwe_a, renwe_b, clr_a, clr_b, clocken_a, and
clocken_b signals as shown in Figure 2–19.
One of the M-RAM block’s horizontal sides drive the address and control
signal (clock, renwe, byteena, etc.) inputs. Typically, the horizontal side
closest to the device perimeter contains the interfaces. The one exception
is when two M-RAM blocks are paired next to each other. In this case, the
side of the M-RAM block opposite the common side of the two blocks
contains the input interface. The top and bottom sides of any M-RAM
block contain data input and output interfaces to the logic array. The top
side has 72 data inputs and 72 data outputs for port B, and the bottom side
has another 72 data inputs and 72 data outputs for port A. Figure 2–20
shows an example floorplan for the EP1S60 device and the location of the
M-RAM interfaces.
Altera Corporation 2–37
July 2005Stratix Device Handbook, Volume 1
Page 62
Tri M a t r ix Memo r y
Figure 2–20. EP1S60 Device with M-RAM Interface Locations Note (1)
M-RAM pairs interface to
top, bottom, and side opposite
of block-to-block border.
Independent M-RAM blocks
interface to top, bottom, and side facing
device perimeter for easy access
to horizontal I/O pins.
DSP
Blocks
M-RAM
Block
M-RAM
Block
M512
Blocks
M-RAM
Block
M-RAM
Block
M4K
Blocks
LABs
M-RAM
Block
M-RAM
Block
DSP
Blocks
Note to Figure 2–20:
(1) Device shown is an EP1S60 device. The number and position of M-RAM blocks varies in other devices.
The M-RAM block local interconnect is driven by the R4, R8, C4, C8, and
direct link interconnects from adjacent LABs. For independent M-RAM
blocks, up to 10 direct link address and control signal input connections
to the M-RAM block are possible from the left adjacent LABs for M-RAM
blocks facing to the left, and another 10 possible from the right adjacent
LABs for M-RAM blocks facing to the right. For column interfacing, every
M-RAM column unit connects to the right and left column lines, allowing
each M-RAM column unit to communicate directly with three columns of
LABs. Figures 2–21 through 2–23 show the interface between the M-RAM
block and the logic array.
Altera Corporation 2–39
July 2005Stratix Device Handbook, Volume 1
Page 64
Tri M a t r ix Memo r y
Figure 2–21. Left-Facing M-RAM to Interconnect Interface Notes (1), (2)
M512 RAM Block Columns
Row Unit Interface
Allows LAB Rows to
Drive Address and
Control Signals to
M-RAM Block
LABs in Row
M-RAM Boundary
R11
R10
R9
R8
R7
R6
R5
R4
R3
R2
R1
B1B2B3B4B5B6
Port B
M-RAM Block
Port A
A1A2A3A4A5A6
LABs in Column
M-RAM Boundary
Column Interface Block
Drives to and from
C4 and C8 Interconnects
Column Interface Block
Allows LAB Columns to
Drive datain and dataout to
and from M-RAM Block
LAB Interface
Blocks
Notes to Figure 2–21:
(1) Only R24 and C16 interconnects cross the M-RAM block boundaries.
(2) The right-facing M-RAM block has interface blocks on the right side, but none on the left. B1 to B6 and A1 to A6
orientation is clipped across the vertical axis for right-facing M-RAM blocks.
Table 2–12 shows the input and output data signal connections for the
column units (B1 to B6 and A1 to A6). It also shows the address and
control signal input connections to the row units (R1 to R11).
Table 2–12. M-RAM Row & Column Interface Unit Signals
Unit Interface Block Input SIgnalsOutput Signals
R1addressa[7..0]
R2addressa[15..8]
R3byte_enable_a[7..0]
renwe_a
R4-
R5-
R6clock_a
clocken_a
clock_b
clocken_b
R7-
R8-
R9byte_enable_b[7..0]
renwe_b
R10addressb[15..8]
R11addressb[7..0]
B1datain_b[71..60]dataout_b[71..60]
B2datain_b[59..48]dataout_b[59..48]
B3datain_b[47..36]dataout_b[47..36]
B4datain_b[35..24]dataout_b[35..24]
B5datain_b[23..12]dataout_b[23..12]
B6datain_b[11..0]dataout_b[11..0]
A1datain_a[71..60]dataout_a[71..60]
A2datain_a[59..48]dataout_a[59..48]
A3datain_a[47..36]dataout_a[47..36]
A4datain_a[35..24]dataout_a[35..24]
A5datain_a[23..12]dataout_a[23..12]
A6datain_a[11..0]dataout_a[11..0]
Altera Corporation 2–43
July 2005Stratix Device Handbook, Volume 1
Page 68
Tri M a t r ix Memo r y
Independent Clock Mode
The memory blocks implement independent clock mode for true dualport memory. In this mode, a separate clock is available for each port
(ports A and B). Clock A controls all registers on the port A side, while
clock B controls all registers on the port B side. Each port, A and B, also
supports independent clock enables and asynchronous clear signals for
port A and B registers. Figure 2–24 shows a TriMatrix memory block in
independent clock mode.
(1) All registers shown have asynchronous clear ports.
(2) Violating the setup or hold time on the address registers could corrupt the memory
contents. This applies to both read and write operations.
Altera Corporation 2–45
July 2005Stratix Device Handbook, Volume 1
Page 70
Tri M a t r ix Memo r y
Input/Output Clock Mode
Input/output clock mode can be implemented for both the true and
simple dual-port memory modes. On each of the two ports, A or B, one
clock controls all registers for inputs into the memory block: data input,
wren, and address. The other clock controls the block’s data output
registers. Each memory block port, A or B, also supports independent
clock enables and asynchronous clear signals for input and output
registers. Figures 2–25 and 2–26 show the memory block in input/output
clock mode.
(1) All registers shown except the rden register have asynchronous clear ports.
(2) Violating the setup or hold time on the address registers could corrupt the memory contents. This applies to both
The memory blocks implement read/write clock mode for simple dualport memory. You can use up to two clocks in this mode. The write clock
controls the block’s data inputs, wraddress, and wren. The read clock
controls the data output, rdaddress, and rden. The memory blocks
support independent clock enables for each clock and asynchronous clear
signals for the read- and write-side registers. Figure 2–27 shows a
memory block in read/write clock mode.
Altera Corporation 2–49
July 2005Stratix Device Handbook, Volume 1
(1) All registers shown except the rden register have asynchronous clear ports.
(2) Violating the setup or hold time on the address registers could corrupt the memory contents. This applies to both
The memory blocks also support single-port mode, used when
simultaneous reads and writes are not required. See Figure 2–28. A single
block in a memory block can support up to two single-port mode RAM
blocks in the M4K RAM blocks if each RAM block is less than or equal to
2K bits in size.
Figure 2–28. Single-Port Mode Note (1)
8 LAB Row
Clocks
Stratix Architecture
RAM/ROM
1,024 × 4
Data In
2,048× 2
4,096 × 1
Data Out
Address
Write Enable
256 × 16
512 ×8
D
ENA
To MultiTrac
Q
Interconnect
data[ ]
address[ ]
wren
outclken
inclken
inclock
outclock
8
D
Q
ENA
D
Q
ENA
D
ENA
Q
Write
Pulse
Generator
Note to Figure 2–28:
(1) Violating the setup or hold time on the address registers could corrupt the memory contents. This applies to both
read and write operations.
Altera Corporation 2–51
July 2005Stratix Device Handbook, Volume 1
Page 76
Digital Signal Processing Block
Digital Signal
Processing
Block
The most commonly used DSP functions are finite impulse response (FIR)
filters, complex FIR filters, infinite impulse response (IIR) filters, fast
Fourier transform (FFT) functions, direct cosine transform (DCT)
functions, and correlators. All of these blocks have the same fundamental
building block: the multiplier. Additionally, some applications need
specialized operations such as multiply-add and multiply-accumulate
operations. Stratix devices provide DSP blocks to meet the arithmetic
requirements of these functions.
Each Stratix device has two columns of DSP blocks to efficiently
implement DSP functions faster than LE-based implementations. Larger
Stratix devices have more DSP blocks per column (see Table 2–13). Each
DSP block can be configured to support up to:
■Eight 9 × 9-bit multipliers
■Four 18 × 18-bit multipliers
■One 36 × 36-bit multiplier
As indicated, the Stratix DSP block can support one 36 × 36-bit multiplier
in a single DSP block. This is true for any matched sign multiplications
(either unsigned by unsigned or signed by signed), but the capabilities for
dynamic and mixed sign multiplications are handled differently. The
following list provides the largest functions that can fit into a single DSP
block.
■36 × 36-bit unsigned by unsigned multiplication
■36 × 36-bit signed by signed multiplication
■35 × 36-bit unsigned by signed multiplication
■36 × 35-bit signed by unsigned multiplication
■36 × 35-bit signed by dynamic sign multiplication
■35 × 36-bit dynamic sign by signed multiplication
■35 × 36-bit unsigned by dynamic sign multiplication
■36 × 35-bit dynamic sign by unsigned multiplication
■35 × 35-bit dynamic sign multiplication when the sign controls for
each operand are different
■36 × 36-bit dynamic sign multiplication when the same sign control
is used for both operands
1This list only shows functions that can fit into a single DSP block.
Multiple DSP blocks can support larger multiplication
functions.
Figure 2–29 shows one of the columns with surrounding LAB rows.
Altera Corporation 2–53
July 2005Stratix Device Handbook, Volume 1
Page 78
Digital Signal Processing Block
Table 2–13 shows the number of DSP blocks in each Stratix device.
Table 2–13. DSP Blocks in Stratix Devices Notes (1), (2)
DeviceDSP Blocks
EP1S10648246
EP1S2010804010
EP1S2510804010
EP1S3012964812
EP1S40141125614
EP1S60181447218
EP1S80221768822
Notes to Ta b l e 2 – 13 :
(1) Each device has either the number of 9 × 9-, 18 × 18-, or 36 × 36-bit multipliers
shown. The total number of multipliers for each device is not the sum of all the
multipliers.
(2) The number of supported multiply functions shown is based on signed/signed
or unsigned/unsigned implementations.
Tot a l 9 × 9
Multipliers
Total 18 × 18
Multipliers
Total 36 × 36
Multipliers
DSP block multipliers can optionally feed an adder/subtractor or
accumulator within the block depending on the configuration. This
makes routing to LEs easier, saves LE routing resources, and increases
performance, because all connections and blocks are within the DSP
block. Additionally, the DSP block input registers can efficiently
implement shift registers for FIR filter applications.
Figure 2–30 shows the top-level diagram of the DSP block configured for
18 × 18-bit multiplier mode. Figure 2–31 shows the 9 × 9-bit multiplier
configuration of the DSP block.
The DSP block multiplier block consists of the input registers, a
multiplier, and pipeline register for pipelining multiply-accumulate and
multiply-add/subtract functions as shown in Figure 2–32.
Figure 2–32. Multiplier Sub-Block within Stratix DSP Block
sign_a (1)
sign_b (1)
aclr[3..0]
clock[3..0]
ena[3..0]
Stratix Architecture
shiftin B
Data A
Data B
shiftout B shiftout A
shiftin A
DQ
ENA
CLRN
DQ
ENA
CLRN
DQ
ENA
CLRN
Result
to Adder
blocks
Optional
Multiply-Accumulate
and Multiply-Add
Pipeline
Note to Figure 2–32:
(1) These signals can be unregistered or registered once to match data path pipelines if required.
Altera Corporation 2–57
July 2005Stratix Device Handbook, Volume 1
Page 82
Digital Signal Processing Block
Input Registers
A bank of optional input registers is located at the input of each multiplier
and multiplicand inputs to the multiplier. When these registers are
configured for parallel data inputs, they are driven by regular routing
resources. You can use a clock signal, asynchronous clear signal, and a
clock enable signal to independently control each set of A and B inputs for
each multiplier in the DSP block. You select these control signals from a
set of four different clock[3..0], aclr[3..0], and ena[3..0]
signals that drive the entire DSP block.
You can also configure the input registers for a shift register application.
In this case, the input registers feed the multiplier and drive two
dedicated shift output lines: shiftoutA and shiftoutB. The shift
outputs of one multiplier block directly feed the adjacent multiplier block
in the same DSP block (or the next DSP block) as shown in Figure 2–33, to
form a shift register chain. This chain can terminate in any block, that is,
you can create any length of shift register chain up to 224 registers. You
can use the input shift registers for FIR filter applications. One set of shift
inputs can provide data for a filter, and the other are coefficients that are
optionally loaded in serial or parallel. When implementing 9 × 9- and
18 × 18-bit multipliers, you do not need to implement external shift
registers in LAB LEs. You implement all the filter circuitry within the DSP
block and its routing resources, saving LE and general routing resources
for general logic. External registers are needed for shift register inputs
when using 36 × 36-bit multipliers.
Figure 2–33. Multiplier Sub-Blocks Using Input Shift Register Connections
Note (1)
Data A
Data B
Data B
Data B
Data A
Data A
DQ
ENA
CLRN
DQ
ENA
CLRN
DQ
ENA
CLRN
DQ
ENA
CLRN
DQ
ENA
CLRN
DQ
ENA
CLRN
DQ
ENA
CLRN
DQ
ENA
A[n] × B[n]
A[n Ð 1] × B[n Ð 1]
A[n Ð 2] × B[n Ð 2]
DQ
ENA
CLRN
CLRN
Note to Figure 2–33:
(1) Either Data A or Data B input can be set to a parallel input for constant coefficient
multiplication.
Altera Corporation 2–59
July 2005Stratix Device Handbook, Volume 1
Page 84
Digital Signal Processing Block
Table 2–14 shows the summary of input register modes for the DSP block.
Parallel input
Shift register input
Multiplier
The multiplier supports 9 × 9-, 18 × 18-, or 36 × 36-bit multiplication. Each
DSP block supports eight possible 9 × 9-bit or smaller multipliers. There
are four multiplier blocks available for multipliers larger than 9 × 9 bits
but smaller than 18 × 18 bits. There is one multiplier block available for
multipliers larger than 18 × 18 bits but smaller than or equal to 36 × 36
bits. The ability to have several small multipliers is useful in applications
such as video processing. Large multipliers greater than 18 × 18 bits are
useful for applications such as the mantissa multiplication of a singleprecision floating-point number.
The multiplier operands can be signed or unsigned numbers, where the
result is signed if either input is signed as shown in Table 2–15. The
sign_a and sign_b signals provide dynamic control of each operand’s
representation: a logic 1 indicates the operand is a signed number, a logic
0 indicates the operand is an unsigned number. These sign signals affect
all multipliers and adders within a single DSP block and you can register
them to match the data path pipeline. The multipliers are full precision
(that is, 18 bits for the 18-bit multiply, 36-bits for the 36-bit multiply, and
so on) regardless of whether sign_a or sign_b set the operands as
signed or unsigned numbers.
The output of 9 × 9- or 18 × 18-bit multipliers can optionally feed a register
to pipeline multiply-accumulate and multiply-add/subtract functions.
For 36 × 36-bit multipliers, this register will pipeline the multiplier
function.
Adder/Output Blocks
The result of the multiplier sub-blocks are sent to the adder/output block
which consist of an adder/subtractor/accumulator unit, summation unit,
output select multiplexer, and output registers. The results are used to
configure the adder/output block as a pure output, accumulator, a simple
two-multiplier adder, four-multiplier adder, or final stage of the 36-bit
multiplier. You can configure the adder/output block to use output
registers in any mode, and must use output registers for the accumulator.
The system cannot use adder/output blocks independently of the
multiplier. Figure 2–34 shows the adder and output stages.
Altera Corporation 2–61
July 2005Stratix Device Handbook, Volume 1
Page 86
Digital Signal Processing Block
n
Figure 2–34. Adder/Output Blocks Note (1)
accum_sload0 (2)
Result A
Accumulator Feedback
addnsub1 (2)
Result B
signa (2)
signb (2)
Result C
addnsub3 (2)
Result D
Adder/
Subtractor/
Accumulator1
Adder/
Subtractor/
Accumulator2
overflow0
Output Selectio
Multiplexer
Summation
Output
Register Block
overflow1
accum_sload1 (2)
Accumulator Feedback
Notes to Figure 2–34:
(1) Adder/output block shown in Figure 2–34 i s in 1 8 × 18-bit mode. In 9 × 9-bit mode, there are four adder/subtractor
blocks and two summation blocks.
(2) These signals are either not registered, registered once, or registered twice to match the data path pipeline.
The adder/subtractor/accumulator is the first level of the adder/output
block and can be used as an accumulator or as an adder/subtractor.
Adder/Subtractor
Each adder/subtractor/accumulator block can perform addition or
subtraction using the addnsub independent control signal for each firstlevel adder in 18 × 18-bit mode. There are two addnsub[1..0] signals
available in a DSP block for any configuration. For 9 × 9-bit mode, one
addnsub[1..0] signal controls the top two one-level adders and
another addnsub[1..0] signal controls the bottom two one-level
adders. A high addnsub signal indicates addition, and a low signal
indicates subtraction. The addnsub control signal can be unregistered or
registered once or twice when feeding the adder blocks to match data
path pipelines.
The signa and signb signals serve the same function as the multiplier
block signa and signb signals. The only difference is that these signals
can be registered up to two times. These signals are tied to the same
signa and signb signals from the multiplier and must be connected to
the same clocks and control signals.
Accumulator
When configured for accumulation, the adder/output block output feeds
back to the accumulator as shown in Figure 2–34. The
accum_sload[1..0] signal synchronously loads the multiplier result
to the accumulator output. This signal can be unregistered or registered
once or twice. Additionally, the overflow signal indicates the
accumulator has overflowed or underflowed in accumulation mode. This
signal is always registered and must be externally latched in LEs if the
design requires a latched overflow signal.
Summation
The output of the adder/subtractor/accumulator block feeds to an
optional summation block. This block sums the outputs of the DSP block
multipliers. In 9 × 9-bit mode, there are two summation blocks providing
the sums of two sets of four 9 × 9-bit multipliers. In 18 × 18-bit mode, there
is one summation providing the sum of one set of four 18 × 18-bit
multipliers.
Altera Corporation 2–63
July 2005Stratix Device Handbook, Volume 1
Page 88
Digital Signal Processing Block
Output Selection Multiplexer
The outputs from the various elements of the adder/output block are
routed through an output selection multiplexer. Based on the DSP block
operational mode and user settings, the multiplexer selects whether the
output from the multiplier, the adder/subtractor/accumulator, or
summation block feeds to the output.
Output Registers
Optional output registers for the DSP block outputs are controlled by four
sets of control signals: clock[3..0], aclr[3..0], and ena[3..0].
Output registers can be used in any mode.
Modes of Operation
The adder, subtractor, and accumulate functions of a DSP block have four
modes of operation:
■Simple multiplier
■Multiply-accumulator
■Two-multipliers adder
■Four-multipliers adder
1Each DSP block can only support one mode. Mixed modes in the
same DSP block is not supported.
Simple Multiplier Mode
In simple multiplier mode, the DSP block drives the multiplier sub-block
result directly to the output with or without an output register. Up to four
18 × 18-bit multipliers or eight 9 × 9-bit multipliers can drive their results
directly out of one DSP block. See Figure 2–35.
(1) These signals are not registered or registered once to match the data path pipeline.
DSP blocks can also implement one 36 × 36-bit multiplier in multiplier
mode. DSP blocks use four 18 × 18-bit multipliers combined with
dedicated adder and internal shift circuitry to achieve 36-bit
multiplication. The input shift register feature is not available for the
36 × 36-bit multiplier. In 36 × 36-bit mode, the device can use the register
that is normally a multiplier-result-output register as a pipeline stage for
the 36 × 36-bit multiplier. Figure 2–36 shows the 36 × 36-bit multiply
mode.
Data Out
CLRN
Altera Corporation 2–65
July 2005Stratix Device Handbook, Volume 1
Page 90
Digital Signal Processing Block
t
Figure 2–36. 36 × 36 Multiply Mode
signa (1)
signb (1)
aclr
clock
ena
A[17..0]
B[17..0]
A[35..18]
B[35..18]
A[35..18]
B[17..0]
DQ
ENA
CLRN
DQ
ENA
CLRN
DQ
ENA
CLRN
DQ
ENA
CLRN
DQ
ENA
CLRN
DQ
ENA
CLRN
DQ
ENA
CLRN
DQ
ENA
CLRN
DQ
ENA
CLRN
signa (2)
signb (2)
36 × 36
Multiplier
Adder
DQ
ENA
CLRN
Data Ou
A[17..0]
B[35..18]
DQ
ENA
CLRN
DQ
ENA
CLRN
DQ
ENA
CLRN
Notes to Figure 2–36:
(1) These signals are not registered or registered once to match the pipeline.
(2) These signals are not registered, registered once, or registered twice for latency to match the pipeline.
In multiply-accumulator mode (see Figure 2–37), the DSP block drives
multiplied results to the adder/subtractor/accumulator block configured
as an accumulator. You can implement one or two multiply-accumulators
up to 18 × 18 bits in one DSP block. The first and third multiplier subblocks are unused in this mode, because only one multiplier can feed one
of two accumulators. The multiply-accumulator output can be up to 52
bits—a maximum of a 36-bit result with 16 bits of accumulation. The
accum_sload and overflow signals are only available in this mode.
The addnsub signal can set the accumulator for decimation and the
overflow signal indicates underflow condition.
Figure 2–37. Multiply-Accumulate Mode
signa (1)
signb (1)
aclr
clock
ena
Shiftin A
Shiftin B
Stratix Architecture
Data A
Data B
Shiftout B Shiftout A
DQ
ENA
CLRN
DQ
ENA
CLRN
DQ
ENA
CLRN
addnsub (2)
signa (2)
signb (2)
accum_sload (2)
Accumulator
DQ
ENA
CLRN
Data Out
overflow
Notes to Figure 2–37:
(1) These signals are not registered or registered once to match the data path pipeline.
(2) These signals are not registered, registered once, or registered twice for latency to match the data path pipeline.
Two-Multipliers Adder Mode
The two-multipliers adder mode uses the adder/subtractor/accumulator
block to add or subtract the outputs of the multiplier block, which is
useful for applications such as FFT functions and complex FIR filters. A
Altera Corporation 2–67
July 2005Stratix Device Handbook, Volume 1
Page 92
Digital Signal Processing Block
single DSP block can implement two sums or differences from two
18 × 18-bit multipliers each or four sums or differences from two 9 × 9-bit
multipliers each.
You can use the two-multipliers adder mode for complex multiplications,
which are written as:
The two-multipliers adder mode allows a single DSP block to calculate
the real part [(a × c) – (b × d)] using one subtractor and the imaginary part
[(a × d) + (b × c)] using one adder, for data widths up to 18 bits. Two
complex multiplications are possible for data widths up to 9 bits using
four adder/subtractor/accumulator blocks. Figure 2–38 shows an 18-bit
two-multipliers adder.
In the four-multipliers adder mode, the DSP block adds the results of two
first -stage adder/subtractor blocks. One sum of four 18 × 18-bit
multipliers or two different sums of two sets of four 9 × 9-bit multipliers
can be implemented in a single DSP block. The product width for each
multiplier must be the same size. The four-multipliers adder mode is
useful for FIR filter applications. Figure 2–39 shows the four multipliers
adder mode.
(1) These signals are not registered or registered once to match the data path pipeline.
(2) These signals are not registered, registered once, or registered twice for latency to match the data path pipeline.
Altera Corporation 2–69
July 2005Stratix Device Handbook, Volume 1
Page 94
Digital Signal Processing Block
For FIR filters, the DSP block combines the four-multipliers adder mode
with the shift register inputs. One set of shift inputs contains the filter
data, while the other holds the coefficients loaded in serial or parallel. The
input shift register eliminates the need for shift registers external to the
DSP block (i.e., implemented in LEs). This architecture simplifies filter
design since the DSP block implements all of the filter circuitry.
One DSP block can implement an entire 18-bit FIR filter with up to four
taps. For FIR filters larger than four taps, DSP blocks can be cascaded with
additional adder stages implemented in LEs.
Table 2–16 shows the different number of multipliers possible in each
DSP block mode according to size. These modes allow the DSP blocks to
implement numerous applications for DSP including FFTs, complex FIR,
FIR, and 2D FIR filters, equalizers, IIR, correlators, matrix multiplication
and many other functions.
Table 2–16. Multiplier Size & Configurations per DSP block
DSP Block Mode9 × 918 × 1836 × 36 (1)
MultiplierEight multipliers with
eight product outputs
Multiply-accumulatorTwo multiply and
accumulate (52 bits)
Two-multipliers adderFour sums of two
multiplier products each
Four-multipliers adderTwo sums of four
multiplier products each
Note to Table 2–16:
(1) The number of supported multiply functions shown is based on signed/signed or unsigned/unsigned
implementations.
Four multipliers with four
product outputs
Two multiply and
accumulate (52 bits)
Two sums of two
multiplier products each
One sum of four multiplier
products each
One multiplier with one
product output
–
–
–
DSP Block Interface
Stratix device DSP block outputs can cascade down within the same DSP
block column. Dedicated connections between DSP blocks provide fast
connections between the shift register inputs to cascade the shift register
chains. You can cascade DSP blocks for 9 × 9- or 18 × 18-bit FIR filters
larger than four taps, with additional adder stages implemented in LEs.
If the DSP block is configured as 36 × 36 bits, the adder, subtractor, or
accumulator stages are implemented in LEs. Each DSP block can route the
shift register chain out of the block to cascade two full columns of DSP
blocks.
The DSP block is divided into eight block units that interface with eight
LAB rows on the left and right. Each block unit can be considered half of
an 18 × 18-bit multiplier sub-block with 18 inputs and 18 outputs. A local
interconnect region is associated with each DSP block. Like an LAB, this
interconnect region can be fed with 10 direct link interconnects from the
LAB to the left or right of the DSP block in the same row. All row and
column routing resources can access the DSP block’s local interconnect
region. The outputs also work similarly to LAB outputs as well. Nine
outputs from the DSP block can drive to the left LAB through direct link
interconnects and nine can drive to the right LAB though direct link
interconnects. All 18 outputs can drive to all types of row and column
routing. Outputs can drive right- or left-column routing. Figures 2–40
and 2–41 show the DSP block interfaces to LAB rows.
Figure 2–40. DSP Block Interconnect Interface
DSP Block
MultiTrack
Interconnect
OA[17..0]
A1[17..0]
OB[17..0]
B1[17..0]
OC[17..0]
A2[17..0]
OD[17..0]
B2[17..0]
OE[17..0]
A3[17..0]
OF[17..0]
B3[17..0]
OG[17..0]
A4[17..0]
OH[17..0]
MultiTrack
Interconnec
B4[17..0]
Altera Corporation 2–71
July 2005Stratix Device Handbook, Volume 1
Page 96
Digital Signal Processing Block
Figure 2–41. DSP Block Interface to Interconnect
C4 andC8
Interconnects
LABLAB
Direct Link Interconnect
from Adjacent LAB
10
R4 and R8 Interconnects
9
10
DSP Block
Row Structure
Nine Direct Link Outputs
to Adjacent LABs
18
9
Direct Link Interconnect
from Adjacent LAB
Row Interface
DSPBlock to
LABRow InterfaceBlock InterconnectRegion
A bus of 18 control signals feeds the entire DSP block. These signals
include clock[0..3] clocks, aclr[0..3] asynchronous clears,
ena[1..4] clock enables, signa, signb signed/unsigned control
signals, addnsub1 and addnsub3 addition and subtraction control
signals, and accum_sload[0..1] accumulator synchronous loads. The
clock signals are routed from LAB row clocks and are generated from
specific LAB rows at the DSP block interface. The LAB row source for
control signals, data inputs, and outputs is shown in Table 2–17.
Table 2–17. DSP Block Signal Sources & Destinations
PLLs & Clock
Networks
LAB Row at
Interface
1
2
3
4
5
6
7
8
Control Signals
Generated
signaA1[17..0]OA[17..0]
aclr0
accum_sload0
addnsub1
clock0
ena0
aclr1
clock1
ena1
aclr2
clock2
ena2
sign_b
clock3
ena3
clear3
accum_sload1
addnsub3B4[17..0]OH[17..0]
Data InputsData Outputs
B1[17..0]OB[17..0]
A2[17..0]OC[17..0]
B2[17..0]OD[17..0]
A3[17..0]OE[17..0]
B3[17..0]OF[17..0]
A4[17..0]OG[17..0]
Stratix devices provide a hierarchical clock structure and multiple PLLs
with advanced features. The large number of clocking resources in
combination with the clock synthesis precision provided by enhanced
and fast PLLs provides a complete clock management solution.
Global & Hierarchical Clocking
Stratix devices provide 16 dedicated global clock networks, 16 regional
clock networks (four per device quadrant), and 8 dedicated fast regional
clock networks (for EP1S10, EP1S20, and EP1S25 devices), and
16 dedicated fast regional clock networks (for EP1S30 EP1S40, and
EP1S60, and EP1S80 devices). These clocks are organized into a
hierarchical clock structure that allows for up to 22 clocks per device
region with low skew and delay. This hierarchical clocking scheme
provides up to 48 unique clock domains within Stratix devices.
Altera Corporation 2–73
July 2005Stratix Device Handbook, Volume 1
Page 98
PLLs & Clock Networks
There are 16 dedicated clock pins (CLK[15..0]) to drive either the global
or regional clock networks. Four clock pins drive each side of the device,
as shown in Figure 2–42. Enhanced and fast PLL outputs can also drive
the global and regional clock networks.
Global Clock Network
These clocks drive throughout the entire device, feeding all device
quadrants. The global clock networks can be used as clock sources for all
resources within the device—IOEs, LEs, DSP blocks, and all memory
blocks. These resources can also be used for control signals, such as clock
enables and synchronous or asynchronous clears fed from the external
pin. The global clock networks can also be driven by internal logic for
internally generated global clocks and asynchronous clears, clock
enables, or other control signals with large fanout. Figure 2–42 shows the
16 dedicated CLK pins driving global clock networks.
(1) The corner fast PLLs can also be driven through the global or regional clock
networks. The global or regional clock input to the fast PLL can be driven by an
output from another PLL, a pin-driven global or regional clock, or internallygenerated global signals.
Global Clock [15..0]
CLK[11..8]
Regional Clock Network
There are four regional clock networks within each quadrant of the Stratix
device that are driven by the same dedicated CLK[15..0] input pins or
from PLL outputs. From a top view of the silicon, RCLK[0..3] are in the
top left quadrant, RCLK[8..11] are in the top-right quadrant,
RCLK[4..7] are in the bottom-left quadrant, and RCLK[12..15] are in
the bottom-right quadrant. The regional clock networks only pertain to
the quadrant they drive into. The regional clock networks provide the
lowest clock delay and skew for logic contained within a single quadrant.
RCLK cannot be driven by internal logic. The CLK clock pins
symmetrically drive the RCLK networks within a particular quadrant, as
shown in Figure 2–43. See Figures 2–50 and 2–51 for RCLK connections
from PLLs and CLK pins.
Altera Corporation 2–75
July 2005Stratix Device Handbook, Volume 1
Page 100
PLLs & Clock Networks
Figure 2–43. Regional Clocks
RCLK[2..3]RCLK[11..10]
CLK[15..12]
RCLK[1..0]
CLK[3..0]
RCLK[4..5]
CLK[7..4]
Regional ClocksOnly DriveaDevice
QuadrantfromSpecifiedCLK Pins or
PLLs within that Quadrant
RCLK[6..7]RCLK[12..13]
RCLK[9..8]
CLK[11..8]
RCLK[14..15]
Fast Regional Clock Network
In EP1S25, EP1S20, and EP1S10 devices, there are two fast regional clock
networks, FCLK[1..0], within each quadrant, fed by input pins that can
connect to fast regional clock networks (see Figure 2–44). In EP1S30 and
larger devices, there are two fast regional clock networks within each
half-quadrant (see Figure 2–45). Dual-purpose FCLK pins drive the fast
clock networks. All devices have eight FCLK pins to drive fast regional
clock networks. Any I/O pin can drive a clock or control signal onto any
fast regional clock network with the addition of a delay. This signal is
driven via the I/O interconnect. The fast regional clock networks can also
be driven from internal logic elements.