The chapters in this book, Arria GX Device Handbook, Volume 1, were revised on the
following dates. Where chapters or groups of chapters are available separately, part
numbers are listed.
This section provides designers with the data sheet specifications for Arria® GX
devices. The chapters contain feature definitions of the transceivers, internal architecture,
configuration and JTAG boundary-scan testing information, DC operating
conditions, AC timing parameters, a reference to power consumption, and ordering
information for Arria GX devices.
This section includes the following chapters:
■ Chapter 1, Arria GX Device Family Overview
■ Chapter 2, Arria GX Architecture
■ Chapter 3, Configuration and Testing
■ Chapter 4, DC and Switching Characteristics
■ Chapter 5, Reference and Ordering Information
Revision History
Refer to each chapter for its own specific revision history. For information about when
each chapter was updated, refer to the Chapter Revision Dates section, which appears
in the full handbook.
The Arria® GX family of devices combines 3.125 Gbps serial transceivers with reliable
packaging technology and a proven logic array. Arria GX devices include 4 to 12
high-speed transceiver channels, each incorporating clock data recovery (CDR)
technology and embedded SERDES circuitry designed to support PCI Express,
Gigabit Ethernet, SDI, SerialLite II, XAUI, and Serial RapidIO protocols, along with
the ability to develop proprietary, serial-based IP using its Basic mode. The
transceivers build upon the success of the Stratix® II GX family. The Arria GX FPGA
technology offers a 1.2-V logic array with the right level of performance and
dependability needed to support these mainstream protocols.
The key features of Arria GX devices include:
■ Transceiver block features
  ■ High-speed serial transceiver channels with CDR support up to 3.125 Gbps
  ■ Devices available with 4, 8, or 12 high-speed full-duplex serial transceiver
    channels
  ■ Support for the following CDR-based bus standards: PCI Express, Gigabit
    Ethernet, SDI, SerialLite II, XAUI, and Serial RapidIO, along with the ability to
    develop proprietary, serial-based IP using its Basic mode
  ■ Individual transmitter and receiver channel power-down capability for
    reduced power consumption during non-operation
  ■ 1.2- and 1.5-V pseudo current mode logic (PCML) support on transmitter
    output buffers
  ■ Receiver indicator for loss of signal (available only in PCI Express [PIPE]
    mode)
  ■ Hot socketing feature for hot plug-in or hot swap and power sequencing
    support without the use of external devices
  ■ Dedicated circuitry that is compliant with PIPE, XAUI, Gigabit Ethernet, Serial
    Digital Interface (SDI), and Serial RapidIO
  ■ 8B/10B encoder/decoder performs 8-bit to 10-bit encoding and 10-bit to 8-bit
    decoding
  ■ Phase compensation FIFO buffer performs clock domain translation between
Arria GX devices are available in space-saving FBGA packages (refer to Table 1–2). All
Arria GX devices support vertical migration within the same package. With vertical
migration support, designers can migrate to devices whose dedicated pins,
configuration pins, and power pins are the same for a given package across device
densities. For I/O pin migration across densities, the designer must cross-reference
the available I/O pins with the device pin-outs for all planned densities of a given
package type to identify which I/O pins are migratable.
Table 1–2. Arria GX Package Options (Pin Counts and Transceiver Channels) (Part 1 of 2)
Columns: Source-Synchronous Channels; Maximum User I/O Pin Count
Arria® GX devices incorporate up to 12 high-speed serial transceiver channels that
build on the success of the Stratix® II GX device family. Arria GX transceivers are
structured into full-duplex (transmitter and receiver) four-channel groups called
transceiver blocks located on the right side of the device. You can configure the
transceiver blocks to support the following serial connectivity protocols
(functional modes):
■ PCI Express (PIPE)
■ Gigabit Ethernet (GIGE)
■ XAUI
■ Basic (600 Mbps to 3.125 Gbps)
■ SDI (HD, 3G)
■ Serial RapidIO (1.25 Gbps, 2.5 Gbps, 3.125 Gbps)
Transceivers within each block are independent and have their own set of dividers.
Therefore, each transceiver can operate at different frequencies. Each block can select
from two reference clocks to provide two clock domains that each transceiver can
select from.
Table 2–1 lists the number of transceiver channels for each member of the Arria GX
family.
You can configure the transceiver channels to the desired functional modes using the
ALT2GXB MegaCore instance in the Quartus® II MegaWizard™ Plug-in Manager for
the Arria GX device family. Depending on the selected functional mode, the
Quartus II software automatically configures the transceiver channels to employ a
subset of the sub-blocks listed above.
This section describes the data path through the Arria GX transmitter. The sub-blocks
are described in order from the PLD-transmitter parallel interface to the serial
transmitter buffer.
Clock Multiplier Unit
Each transceiver block has a clock multiplier unit (CMU) that takes in a reference
clock and synthesizes two clocks: a high-speed serial clock to serialize the data and a
low-speed parallel clock to clock the transmitter digital logic (PCS).
The CMU is further divided into three sub-blocks:
■ One transmitter PLL
■ One central clock divider block
■ Four local clock divider blocks (one per channel)
Reverse serial pre-CDR loopback mode uses the analog portion of the transceiver. An
external source (pattern generator or transceiver) generates the source data. The
high-speed serial source data arrives at the high-speed differential receiver input
buffer, loops back before the clock recovery unit (CRU), and is transmitted through the
high-speed differential transmitter output buffer. This mode is intended only for test or
verification, to check the signal being received after the gain and equalization
improvements of the input buffer. The signal at the output is not exactly what is
received because the signal goes through the output buffer and the VOD is changed to
the VOD setting level. Pre-emphasis settings have no effect.
Figure 2–20 shows the Arria GX block in reverse serial pre-CDR loopback mode.
Figure 2–20. Arria GX Block in Reverse Serial Pre-CDR Loopback Mode
[Figure: block diagram. Transmitter digital logic: BIST incremental generator, TX phase compensation FIFO, byte serializer, 8B/10B encoder. Receiver digital logic: BIST incremental verify, RX phase compensation FIFO, byte deserializer, 8B/10B decoder. Both sides connect to the FPGA logic array.]
PCI Express (PIPE) Reverse Parallel Loopback
Figure 2–21 shows the data path for PCI Express (PIPE) reverse parallel loopback. The
reverse parallel loopback configuration is compliant with the PCI Express (PIPE)
specification and is available only in PCI Express (PIPE) mode.
Figure 2–21. PCI Express (PIPE) Reverse Parallel Loopback
You can dynamically put the PCI Express (PIPE) mode transceiver in reverse parallel
loopback by controlling the tx_detectrxloopback port instantiated in the
MegaWizard Plug-In Manager. A high on the tx_detectrxloopback port in the P0
power state puts the transceiver in reverse parallel loopback. A high on the
tx_detectrxloopback port in any other power state does not put the transceiver
in reverse parallel loopback.
As seen in Figure 2–21, the serial data received on the rx_datain port in reverse
parallel loopback goes through the CRU, deserializer, word aligner, and the rate
matcher blocks. The parallel data at the output of the receiver rate matcher block is
looped back to the input of the transmitter serializer block. The serializer converts the
parallel data to serial data and feeds it to the transmitter output buffer that drives the
data out on the tx_dataout port. The data at the output of the rate matcher also
goes through the 8B/10B decoder, byte deserializer, and receiver phase compensation
FIFO before being fed to the PLD on the rx_dataout port.
Reset and Powerdown
Arria GX transceivers offer a power saving advantage with their ability to shut off
functions that are not needed.
The following three reset signals are available per transceiver channel and can be used
to individually reset the digital and analog portions within each channel:
■ tx_digitalreset
■ rx_analogreset
■ rx_digitalreset
The following two powerdown signals are available per transceiver block and can be
used to shut down an entire transceiver block that is not being used:
Arria GX devices use the calibration block to calibrate on-chip termination (OCT) for
the PLLs and their associated output buffers, and the terminating resistors on the
transceivers. The calibration block counters the effects of process, voltage, and
temperature (PVT) variation. The calibration block references a derived voltage across
an external reference resistor to calibrate the OCT resistors on Arria GX devices. You
can power down the calibration block. However, powering down the calibration block
during operation can cause transmit and receive data errors.
Transceiver Clocking
This section describes the clock distribution in an Arria GX transceiver channel and
the PLD clock resource utilization by the transceiver blocks.
Transceiver Channel Clock Distribution
Each transceiver block has one transmitter PLL and four receiver PLLs.
The transmitter PLL multiplies the input reference clock to generate a high-speed
serial clock at a frequency that is half the data rate of the configured functional mode.
This high-speed serial clock (or its divide-by-two version if the functional mode uses
byte serializer) is fed to the CMU clock divider block. Depending on the configured
functional mode, the CMU clock divider block divides the high-speed serial clock to
generate the low-speed parallel clock that clocks the transceiver PCS logic in the
associated channel. The low-speed parallel clock is also forwarded to the PLD logic
array on the tx_clkout or coreclkout ports.
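As a rough numerical illustration of the paragraph above, the following Python sketch computes the high-speed serial clock as half the data rate. The helper name is hypothetical, and the sample data rates are simply the rates named for the Basic and Serial RapidIO modes earlier in this chapter. The low-speed parallel clock is not computed here because its divide ratio depends on the functional mode.

```python
def serial_clock_mhz(data_rate_mbps: float) -> float:
    """High-speed serial clock in MHz: half the data rate (hypothetical helper)."""
    return data_rate_mbps / 2.0


# Sample data rates in Mbps taken from the functional modes listed earlier.
for rate in (600.0, 1250.0, 2500.0, 3125.0):
    print(f"{rate:7.1f} Mbps -> {serial_clock_mhz(rate):7.2f} MHz serial clock")
```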
The receiver PLL in each channel is also fed by an input reference clock. The receiver
PLL along with the clock recovery unit generates a high-speed serial recovered clock
and a low-speed parallel recovered clock. The low-speed parallel recovered clock
feeds the receiver PCS logic until the rate matcher. The CMU low-speed parallel clock
clocks the rest of the logic from the rate matcher until the receiver phase
compensation FIFO. In modes that do not use a rate matcher, the receiver PCS logic is
clocked by the recovered clock until the receiver phase compensation FIFO.
The input reference clock to the transmitter and receiver PLLs can be derived from:
■ One of two available dedicated reference clock input pins (REFCLK0 or REFCLK1)
of the associated transceiver block
■ PLD clock network (must be driven directly from an input clock pin and cannot be
driven by user logic or enhanced PLL)
■ Inter-transceiver block lines driven by reference clock input pins of other
transceiver blocks
Figure 2–22 shows the input reference clock sources for the transmitter and receiver
For more information about transceiver clocking in all supported functional modes,
refer to the Arria GX Transceiver Architecture chapter.
PLD Clock Utilization by Transceiver Blocks
Arria GX devices have up to 16 global clock (GCLK) lines and 16 regional clock
(RCLK) lines that are used to route the transceiver clocks. The following transceiver
clocks use the available global and regional clock resources:
■ pll_inclk (if driven from an FPGA input pin)
■ rx_cruclk (if driven from an FPGA input pin)
■ tx_clkout/coreclkout (CMU low-speed parallel clock forwarded to the PLD)
■ Recovered clock from each channel (rx_clkout) in non-rate matcher mode
■ Calibration clock (cal_blk_clk)
■ Fixed clock (fixedclk used for receiver detect circuitry in PCI Express [PIPE]
mode only)
Figure 2–23 and Figure 2–24 show the available GCLK and RCLK resources in Arria
GX devices.
Figure 2–23. Global Clock Resources in Arria GX Devices
Figure 2–24. Regional Clock Resources in Arria GX Devices
[Figures: clock pins CLK[3..0], CLK[7..4], and CLK[15..12]; regional clock networks RCLK[3..0] through RCLK[31..28]; and the Arria GX transceiver blocks.]
For the RCLK or GCLK network to route into the transceiver, a local route input
output (LRIO) channel is required. Each LRIO clock region has up to eight clock paths
and each transceiver block has a maximum of eight clock paths for connecting with
LRIO clocks. These resources are limited and determine the number of clocks that can
be used between the PLD and transceiver blocks. Table 2–7 and Table 2–8 list the
number of LRIO resources available for Arria GX devices with different numbers of
transceiver blocks.
Table 2–7. Available Clocking Connections for Transceivers in EP1AGX35D, EP1AGX50D, and EP1AGX60D
Each logic array block (LAB) consists of eight adaptive logic modules (ALMs), carry
chains, shared arithmetic chains, LAB control signals, local interconnects, and register
chain connection lines. The local interconnect transfers signals between ALMs in the
same LAB. Register chain connections transfer the output of an ALM register to the
adjacent ALM register in a LAB. The Quartus II Compiler places associated logic in a
LAB or adjacent LABs, allowing the use of local, shared arithmetic chain, and register
chain connections for performance and area efficiency. Table 2–9 lists Arria GX device
resources. Figure 2–25 shows the Arria GX LAB structure.
The LAB local interconnect can drive all eight ALMs in the same LAB. It is driven by
column and row interconnects and ALM outputs in the same LAB. Neighboring
LABs, M512 RAM blocks, M4K RAM blocks, M-RAM blocks, or digital signal
processing (DSP) blocks from the left and right can also drive the local interconnect of
a LAB through the direct link connection. The direct link connection feature
minimizes the use of row and column interconnects, providing higher performance
and flexibility. Each ALM can drive 24 ALMs through fast local and direct link
interconnects.
Each LAB contains dedicated logic for driving control signals to its ALMs. The control
signals include three clocks, three clock enables, two asynchronous clears,
synchronous clear, asynchronous preset or load, and synchronous load control
signals, providing a maximum of 11 control signals at a time. Although synchronous
load and clear signals are generally used when implementing counters, they can also
be used with other functions.
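The total of 11 control signals follows directly from the counts in the paragraph above. A minimal Python tally, using only those counts (the dictionary name is illustrative):

```python
# Counts taken directly from the paragraph above.
LAB_CONTROL_SIGNALS = {
    "clock": 3,
    "clock enable": 3,
    "asynchronous clear": 2,
    "synchronous clear": 1,
    "asynchronous preset or load": 1,
    "synchronous load": 1,
}

total = sum(LAB_CONTROL_SIGNALS.values())
print(total)
```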
Figure 2–25. Arria GX LAB Structure
[Figure: a LAB containing ALMs and the local interconnect, with direct link interconnects to the left and to the right, and a direct link interconnect from the right LAB, TriMatrix memory block, DSP block, or IOE output.]
Each LAB can use three clocks and three clock enable signals. However, there can only
be up to two unique clocks per LAB, as shown in the LAB control signal generation
circuit in Figure 2–27. Each LAB’s clock and clock enable signals are linked. For
example, any ALM in a particular LAB using the labclk1 signal also uses
labclkena1. If the LAB uses both the rising and falling edges of a clock, it also uses
two LAB-wide clock signals. De-asserting the clock enable signal turns off the
corresponding LAB-wide clock. Each LAB can use two asynchronous clear signals
and an asynchronous load/preset signal. The asynchronous load acts as a preset
when the asynchronous load data input is tied high. When the asynchronous
load/preset signal is used, the labclkena0 signal is no longer available.
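The clocking limits above can be checked with a small model. This is a sketch under simplifying assumptions: it treats every distinct (clock, edge) pair as one LAB-wide clock signal, as the text describes for rising and falling edges, and it ignores clock enables and the labclkena0 restriction. The function name and return shape are hypothetical.

```python
def lab_clock_usage(clock_edges):
    """Summarize LAB clock usage.

    clock_edges: iterable of (clock_name, edge) pairs, edge being
    "rising" or "falling".
    Returns (unique_clocks, lab_wide_clock_signals, fits).
    """
    pairs = set(clock_edges)
    unique_clocks = {name for name, _edge in pairs}
    lab_wide = len(pairs)  # one LAB-wide clock signal per (clock, edge) pair
    # At most two unique clocks and at most three LAB-wide clock signals.
    fits = len(unique_clocks) <= 2 and lab_wide <= 3
    return len(unique_clocks), lab_wide, fits


print(lab_clock_usage([("clk_a", "rising"), ("clk_a", "falling"), ("clk_b", "rising")]))
```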
The LAB row clocks [5..0] and LAB local interconnect generate the LAB-wide
control signals. The MultiTrack interconnects have inherently low skew. This low
skew allows the MultiTrack interconnects to distribute clock and control signals in
addition to data.
Figure 2–27 shows the LAB control signal generation circuit.
Figure 2–27. LAB-Wide Control Signals
[Figure: six dedicated row LAB clocks and the local interconnect generate the LAB-wide control signals. There are two unique clock signals per LAB.]
Adaptive Logic Modules
The basic building block of logic in the Arria GX architecture is the ALM. The ALM
provides advanced features with efficient logic utilization. Each ALM contains a
variety of look-up table (LUT)-based resources that can be divided between two
adaptive LUTs (ALUTs). With up to eight inputs to the two ALUTs, one ALM can
implement various combinations of two functions. This adaptability allows the ALM
to be completely backward-compatible with four-input LUT architectures. One ALM
can also implement any function of up to six inputs and certain seven-input functions.
In addition to the adaptive LUT-based resources, each ALM contains two
programmable registers, two dedicated full adders, a carry chain, a shared arithmetic
chain, and a register chain. Through these dedicated resources, the ALM can
efficiently implement various arithmetic functions and shift registers. Each ALM
drives all types of interconnects: local, row, column, carry chain, shared arithmetic
chain, register chain, and direct link interconnects. Figure 2–28 shows a high-level
block diagram of the Arria GX ALM while Figure 2–29 shows a detailed view of all
the connections in the ALM.
One ALM contains two programmable registers. Each register has data, clock, clock
enable, synchronous and asynchronous clear, asynchronous load data, and
synchronous and asynchronous load/preset inputs.
Global signals, general-purpose I/O pins, or any internal logic can drive the register's
clock and clear control signals. Either general-purpose I/O pins or internal logic can
drive the clock enable, preset, asynchronous load, and asynchronous load data. The
asynchronous load data input comes from the datae or dataf input of the ALM,
which are the same inputs that can be used for register packing. For combinational
functions, the register is bypassed and the output of the LUT drives directly to the
outputs of the ALM.
Each ALM has two sets of outputs that drive the local, row, and column routing
resources. The LUT, adder, or register output can drive these output drivers
independently (refer to Figure 2–29). For each set of output drivers, two ALM outputs
can drive column, row, or direct link routing connections. One of these ALM outputs
can also drive local interconnect resources. This allows the LUT or adder to drive one
output while the register drives another output. This feature, called register packing,
improves device utilization because the device can use the register and combinational
logic for unrelated functions. Another special packing mode allows the register
output to feed back into the LUT of the same ALM so that the register is packed with
its own fan-out LUT. This feature provides another mechanism for improved fitting.
The ALM can also drive out registered and unregistered versions of the LUT or adder
output.
ALM Operating Modes
The Arria GX ALM can operate in one of the following modes:
■ Normal mode
■ Extended LUT mode
■ Arithmetic mode
■ Shared arithmetic mode
Each mode uses ALM resources differently. In each mode, the 11 available inputs to the
ALM (refer to Figure 2–28) are directed to different destinations to implement the
desired logic function. These inputs are the eight data inputs from the LAB local
interconnect, the carry-in from the previous ALM or LAB, the shared arithmetic chain
connection from the previous ALM or LAB, and the register chain connection.
LAB-wide signals provide clock,
asynchronous clear, asynchronous preset/load, synchronous clear, synchronous load,
and clock enable control for the register. These LAB-wide signals are available in all
ALM modes. For more information about LAB-wide control signals, refer to “LAB
Control Signals” on page 2–30.
The Quartus II software and supported third-party synthesis tools, in conjunction
with parameterized functions such as library of parameterized modules (LPM)
functions, automatically choose the appropriate mode for common functions such as
counters, adders, subtractors, and arithmetic functions. If required, you can also
create special-purpose functions that specify which ALM operating mode to use for
optimal performance.
Normal Mode
Normal mode is suitable for general logic applications and combinational functions.
In this mode, up to eight data inputs from the LAB local interconnect are inputs to the
combinational logic. Normal mode allows two functions to be implemented in one
Arria GX ALM, or an ALM to implement a single function of up to six inputs. The
ALM can support certain combinations of completely independent functions and
various combinations of functions which have common inputs. Figure 2–30 shows the
supported LUT combinations in normal mode.
Figure 2–30. ALM in Normal Mode (Note 1)
[Figure: supported LUT combinations in one ALM, fed by dataa, datab, datac, datad, datae0, dataf0, datae1, and dataf1 and driving combout0 and combout1: two 4-input LUTs; a 5-input LUT and a 3-input LUT; a 5-input LUT and a 4-input LUT; two 5-input LUTs; a single 6-input LUT; and two 6-input LUTs.]
Note to Figure 2–30:
(1) Combinations of functions with fewer inputs than those shown are also supported. For example, combinations of functions with the following
number of inputs are supported: 4 and 3, 3 and 3, 3 and 2, 5 and 2, and so on.
Normal mode provides complete backward compatibility with four-input LUT
architectures. Two independent functions of four inputs or less can be implemented in
one Arria GX ALM. In addition, a five-input function and an independent three-input
function can be implemented without sharing inputs.
To pack two five-input functions into one ALM, the functions must have at least two
common inputs. The common inputs are dataa and datab. The combination of a
four-input function with a five-input function requires one common input
(either dataa or datab).
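The sharing requirements above are all consistent with a budget of eight distinct data inputs per ALM. The following Python sketch assumes that this budget is the only constraint, which the text does not state in general, so treat it as an illustration checked only against the combinations named above. The constant and function names are hypothetical.

```python
ALM_DATA_INPUTS = 8  # data inputs from the LAB local interconnect


def min_shared_inputs(n1: int, n2: int) -> int:
    """Fewest inputs two functions must share to fit within eight data inputs.

    Assumption: the eight-input budget is the only limit being modeled.
    """
    return max(0, n1 + n2 - ALM_DATA_INPUTS)


for pair in ((4, 4), (5, 3), (4, 5), (5, 5)):
    print(pair, "->", min_shared_inputs(*pair), "shared input(s)")
```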
To implement two six-input functions in one ALM, four inputs must be shared and
the combinational function must be the same. For example, a 4 × 2 crossbar switch
(two 4-to-1 multiplexers with common inputs and unique select lines) can be
implemented in one ALM, as shown in Figure 2–31. The shared inputs are dataa,
datab, datac, and datad, while the unique select lines are datae0 and dataf0 for
function0, and datae1 and dataf1 for function1. This crossbar switch
consumes four LUTs in a four-input LUT-based architecture.
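A behavioral sketch of the crossbar described above, in Python. It models two 4-to-1 multiplexers that share four data inputs and have separate 2-bit selects, which together use 4 + 2 + 2 = 8 inputs. The function names and the encoding of the select value as an index are assumptions made for illustration.

```python
def mux4(inputs, sel):
    """4-to-1 multiplexer; sel is a 2-bit select value in the range 0..3."""
    return inputs[sel]


def crossbar_4x2(inputs, sel0, sel1):
    """Two 4-to-1 multiplexers sharing the same four data inputs."""
    return mux4(inputs, sel0), mux4(inputs, sel1)


# inputs are (inputa, inputb, inputc, inputd)
out0, out1 = crossbar_4x2((0, 1, 1, 0), sel0=1, sel1=3)
print(out0, out1)
```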
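Figure 2–31. 4 × 2 Crossbar Switch Example
[Figure, two panels. 4 × 2 Crossbar Switch: inputa, inputb, inputc, and inputd feed two multiplexers selected by sel0[1..0] and sel1[1..0], producing out0 and out1. Implementation in 1 ALM: dataa, datab, datac, and datad feed two six-input LUTs; Function0 also takes datae0 and dataf0 and drives combout0, and Function1 also takes datae1 and dataf1 and drives combout1.]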
In a sparsely used device, functions that can be placed into one ALM can be
implemented in separate ALMs. The Quartus II Compiler spreads a design out to
achieve the best possible performance. As a device begins to fill up, the Quartus II
software automatically uses the full potential of the Arria GX ALM. The Quartus II
Compiler automatically searches for functions of common inputs or completely
independent functions to be placed into one ALM and to make efficient use of the
device resources. In addition, you can manually control resource usage by setting
location assignments. Any six-input function can be implemented utilizing inputs
dataa, datab, datac, datad, and either datae0 and dataf0 or datae1 and
dataf1. If datae0 and dataf0 are used, the output is driven to register0,
and/or register0 is bypassed and the data drives out to the interconnect using the
top set of output drivers (refer to Figure 2–32). If datae1 and dataf1 are used, the
output drives to register1 and/or bypasses register1 and drives to the
interconnect using the bottom set of output drivers. The Quartus II Compiler
automatically selects the inputs to the LUT. Asynchronous load data for the register
comes from the datae or dataf input of the ALM. ALMs in normal mode support
register packing.
Figure 2–32. Six-Input Function in Normal Mode (Notes 1, 2)
[Figure: a 6-input LUT fed by dataa, datab, datac, datad, datae0, and dataf0 drives registers reg0 and reg1 and the general or local routing. The datae1 and dataf1 (2) inputs are available for register packing.]
Notes to Figure 2–32:
(1) If datae1 and dataf1 are used as inputs to the six-input function, datae0 and dataf0 are available for register
packing.
(2) The dataf1 input is available for register packing only if the six-input function is un-registered.
Extended LUT Mode
Extended LUT mode is used to implement a specific set of seven-input functions. The
set must be a 2-to-1 multiplexer fed by two arbitrary five-input functions sharing four
inputs. Figure 2–33 shows the template of supported seven-input functions utilizing
extended LUT mode. In this mode, if the seven-input function is unregistered, the
unused eighth input is available for register packing. Functions that fit into the
template shown in Figure 2–33 occur naturally in designs. These functions often
appear in designs as “if-else” statements in Verilog HDL or VHDL code.
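The template can be modeled behaviorally as follows. This Python sketch builds a seven-input function from two five-input functions that share four inputs, with one more input acting as the 2-to-1 multiplexer select (4 + 1 + 1 + 1 = 7 inputs). The function and argument names are hypothetical, and the sketch does not say which physical ALM input serves as the select.

```python
def extended_lut(f0, f1):
    """Build a seven-input function from two five-input functions.

    f0 and f1 each take (a, b, c, d, e): a..d are the four shared inputs
    and e is that function's own fifth input. sel chooses which result
    drives the output, like an if-else statement.
    """
    def f7(a, b, c, d, e0, e1, sel):
        return f1(a, b, c, d, e1) if sel else f0(a, b, c, d, e0)
    return f7


# Two arbitrary five-input functions chosen for illustration.
f0 = lambda a, b, c, d, e: a & b & c & d & e
f1 = lambda a, b, c, d, e: a ^ b ^ c ^ d ^ e
f7 = extended_lut(f0, f1)
print(f7(1, 1, 1, 1, 1, 0, 0), f7(1, 1, 1, 1, 1, 0, 1))
```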
Figure 2–33. Template for Supported Seven-Input Functions in Extended LUT Mode
[Figure: inputs datae0, datac, dataa, datab, datad, dataf0, datae1, and dataf1 (1).]
Note to Figure 2–33:
(1) If the seven-input function is unregistered, the unused eighth input is available for register packing. The second register, reg1, is not available.
Arithmetic Mode
Arithmetic mode is ideal for implementing adders, counters, accumulators, wide
parity functions, and comparators. An ALM in arithmetic mode uses two sets of two
four-input LUTs along with two dedicated full adders. The dedicated adders allow
the LUTs to be available to perform pre-adder logic; therefore, each adder can add the
output of two four-input functions. The four LUTs share the dataa and datab
inputs. As shown in Figure 2–34, the carry-in signal feeds to adder0, and the
carry-out from adder0 feeds to carry-in of adder1. The carry-out from adder1
drives to adder0 of the next ALM in the LAB. ALMs in arithmetic mode can drive
out registered and/or unregistered versions of the adder outputs.
Figure 2–34. ALM in Arithmetic Mode
While operating in arithmetic mode, the ALM can support simultaneous use of the
adder’s carry output along with combinational logic outputs. In this operation, adder
output is ignored. This usage of the adder with the combinational logic output
provides resource savings of up to 50% for functions that can use this ability. An
example of such functionality is a conditional operation, such as the one shown in
Figure 2–35. The equation for this example is:
Equation 2–1.
R = (X < Y) ? Y : X
To implement this function, the adder is used to subtract ‘Y’ from ‘X.’ If ‘X’ is less than
‘Y,’ the carry_out signal is ‘1.’ The carry_out signal is fed to an adder where it
drives out to the LAB local interconnect. It then feeds to the LAB-wide syncload
signal. When asserted, syncload selects the syncdata input. In this case, the data
‘Y’ drives the syncdata inputs to the registers. If ‘X’ is greater than or equal to ‘Y,’ the
syncload signal is deasserted and ‘X’ drives the data port of the registers.
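A behavioral Python sketch of this conditional operation follows. It models carry_out as the borrow of X minus Y, so that it is '1' when X is less than Y as the text states, and then uses it as the syncload select. The function name and the width parameter are assumptions for illustration, and the model does not represent the actual adder wiring.

```python
def conditional_select(x: int, y: int, width: int = 8) -> int:
    """Model R = (X < Y) ? Y : X with a subtract and a synchronous load."""
    mask = (1 << width) - 1
    x &= mask
    y &= mask
    carry_out = ((x - y) >> width) & 1  # borrow of X - Y: 1 when X < Y
    syncload = carry_out                # carry_out drives the LAB-wide syncload
    syncdata = y                        # Y drives the syncdata inputs
    return syncdata if syncload else x  # otherwise X drives the data port


print(conditional_select(3, 7), conditional_select(7, 3))
```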
Arithmetic mode also offers clock enable, counter enable, synchronous up/down
control, add/subtract control, synchronous clear, and synchronous load. The LAB
local interconnect data inputs generate the clock enable, counter enable, synchronous
up/down and add/subtract control signals. These control signals can be used for the
inputs that are shared between the four LUTs in the ALM. The synchronous clear and
synchronous load options are LAB-wide signals that affect all registers in the LAB.
The Quartus II software automatically places any registers that are not used by the
counter into other LABs.
Carry Chain
The carry chain provides a fast carry function between the dedicated adders in arithmetic
or shared arithmetic mode. Carry chains can begin in either the first ALM or the fifth
ALM in a LAB. The final carry-out signal is routed to an ALM, where it is fed to local,
row, or column interconnects.
The Quartus II Compiler automatically creates carry chain logic during compilation,
or you can create it manually during design entry. Parameterized functions such as
LPM functions automatically take advantage of carry chains for the appropriate
functions. The Quartus II Compiler creates carry chains longer than 16 (8 ALMs in
arithmetic or shared arithmetic mode) by linking LABs together automatically. For
enhanced fitting, a long carry chain runs vertically allowing fast horizontal
connections to TriMatrix memory and DSP blocks. A carry chain can continue as far as
a full column. To avoid routing congestion in one small area of the device when a high
fan-in arithmetic function is implemented, the LAB can support carry chains that only
use either the top half or bottom half of the LAB before connecting to the next LAB.
The other half of the ALMs in the LAB is available for implementing narrower fan-in
functions in normal mode. Carry chains that use the top four ALMs in the first LAB
carry into the top half of the ALMs in the next LAB within the column. Carry chains
that use the bottom four ALMs in the first LAB carry into the bottom half of the
ALMs in the next LAB within the column. Every other column of LABs is
top-half bypassable, while the other LAB columns are bottom-half bypassable. For
more information about carry chain interconnect, refer to “MultiTrack Interconnect”
on page 2–44.
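To make the chain-length figures concrete, the following Python sketch estimates how many LABs a carry chain spans. It relies on two counts given in this chapter (eight ALMs per LAB and two dedicated full adders per ALM) and assumes one adder per bit and a chain that starts at the first ALM of the LAB, or of the half-LAB, that it uses. The names are hypothetical.

```python
import math

ALMS_PER_LAB = 8    # each LAB consists of eight ALMs
ADDERS_PER_ALM = 2  # each ALM contains two dedicated full adders


def labs_for_carry_chain(bits: int, half_lab: bool = False) -> int:
    """Number of LABs spanned by a carry chain of `bits` adder stages.

    half_lab=True models chains that use only the top or bottom four ALMs
    of each LAB.
    """
    alms_used = ALMS_PER_LAB // 2 if half_lab else ALMS_PER_LAB
    return math.ceil(bits / (alms_used * ADDERS_PER_ALM))


print(labs_for_carry_chain(16), labs_for_carry_chain(17), labs_for_carry_chain(32, half_lab=True))
```

A 16-bit chain fits in one LAB, which matches the statement that chains longer than 16 require linking LABs together.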
Shared Arithmetic Mode
In shared arithmetic mode, the ALM can implement a three-input add. In this mode,
the ALM is configured with four 4-input LUTs. Each LUT either computes the sum of
three inputs or the carry of three inputs. The output of the carry computation is fed to
the next adder (either to adder1 in the same ALM or to adder0 of the next ALM in
the LAB) using a dedicated connection called the shared arithmetic chain. This shared
arithmetic chain can significantly improve the performance of an adder tree by
reducing the number of summation stages required to implement an adder tree.
Figure 2–36 shows the ALM in shared arithmetic mode.
Figure 2–36. ALM in Shared Arithmetic Mode
[Figure: four 4-input LUTs fed by dataa, datab, datac, datad, datae0, and datae1, with shared_arith_in and carry_in entering the ALM and carry_out and shared_arith_out leaving it.]
Note to Figure 2–36:
(1) Inputs dataf0 and dataf1 are available for register packing in shared arithmetic mode.
Adder trees are used in many different applications. For example, the summation of
partial products in a logic-based multiplier can be implemented in a tree structure.
Another example is a correlator function that can use a large adder tree to sum filtered
data samples in a given time frame to recover or to de-spread data which was
transmitted utilizing spread spectrum technology. An example of a three-bit add
operation utilizing the shared arithmetic mode is shown in Figure 2–37. The partial
sum (S[2..0]) and the partial carry (C[2..0]) are obtained using LUTs, while the
result (R[2..0]) is computed using dedicated adders.
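The two-stage add can be reproduced in a few lines of Python. This sketch computes the partial sum and partial carry bit by bit, as the LUTs would, and then adds them with the carry shifted up one position, as the dedicated adders would. The operand values are the ones that appear in the binary add example of Figure 2–37; the function name is hypothetical.

```python
def three_input_add(x: int, y: int, z: int):
    """Three-input add: LUT stage (partial sum and carry), then adder stage."""
    s = x ^ y ^ z                    # partial sum, computed per bit
    c = (x & y) | (x & z) | (y & z)  # partial carry (majority), per bit
    r = s + (c << 1)                 # adders: each carry has twice the weight
    return s, c, r


s, c, r = three_input_add(0b110, 0b101, 0b010)
print(f"S={s:03b} C={c:03b} R={r:04b}")  # prints S=001 C=110 R=1101
```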
Figure 2–37. Example of a 3-Bit Add Utilizing Shared Arithmetic Mode
[Figure, two panels. 3-Bit Add Example: X[2..0] + Y[2..0] + Z[2..0]. The first-stage add, implemented in LUTs, produces S[2..0] and C[2..0]; the second-stage add, implemented in adders, produces R[3..0]. Binary add: 110 + 101 + 010 gives S = 001 and C = 110, and the result R = 1101. ALM Implementation: in ALM 1, two 3-input LUTs take X0, Y0, and Z0 and produce S0 and C0; with shared_arith_in = '0' and carry_in = '0', the adder produces R0.]
Shared Arithmetic Chain
In addition to dedicated carry chain routing, the shared arithmetic chain available in
shared arithmetic mode allows the ALM to implement a three-input add, which
significantly reduces the resources necessary to implement large adder trees or
correlator functions. Shared arithmetic chains can begin in either the first or fifth ALM
in a LAB. The Quartus II Compiler automatically links LABs to create shared
arithmetic chains longer than 16 (eight ALMs in arithmetic or shared arithmetic
mode). For enhanced fitting, a long shared arithmetic chain runs vertically allowing
fast horizontal connections to TriMatrix memory and DSP blocks. A shared arithmetic
chain can continue as far as a full column. Similar to carry chains, shared arithmetic
chains are also top- or bottom-half bypassable. This capability allows the shared
arithmetic chain to cascade through half of the ALMs in a LAB while leaving the other
half available for narrower fan-in functionality. Every other LAB column is top-half
bypassable, while the other LAB columns are bottom-half bypassable. For more
information about shared arithmetic chain interconnect, refer to “MultiTrack
Interconnect” on page 2–44.
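As a rough illustration of chain length, the sketch below estimates how many LABs a shared arithmetic chain occupies. It assumes two chain bits per ALM (16 bits across eight ALMs, as stated above), that a chain starts at the first or fifth ALM, and that each linked LAB is entered at its first ALM. The function name and this simplified fitting model are assumptions; the Quartus II Compiler makes the actual placement decisions.

```python
import math

ALMS_PER_LAB = 8   # eight ALMs per LAB, from the text above
BITS_PER_ALM = 2   # 16 chain bits spread over eight ALMs


def labs_for_chain(bits, start_alm=1):
    """Estimate the number of LABs a chain of `bits` occupies (sketch only)."""
    if bits < 1:
        raise ValueError("chain must be at least one bit long")
    if start_alm not in (1, 5):
        raise ValueError("chains begin in the first or fifth ALM of a LAB")
    alms_needed = math.ceil(bits / BITS_PER_ALM)
    alms_in_first_lab = ALMS_PER_LAB - (start_alm - 1)
    if alms_needed <= alms_in_first_lab:
        return 1
    remaining = alms_needed - alms_in_first_lab
    return 1 + math.ceil(remaining / ALMS_PER_LAB)


print(labs_for_chain(16))               # fits in one LAB
print(labs_for_chain(17))               # spills into a second LAB
print(labs_for_chain(16, start_alm=5))  # starts mid-LAB, so two LABs
```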
Register Chain
In addition to the general routing outputs, the ALMs in a LAB have register chain
outputs. Register chain routing allows registers in the same LAB to be cascaded
together. The register chain interconnect allows a LAB to use LUTs for a single
combinational function and the registers to be used for an unrelated shift register
implementation. These resources speed up connections between ALMs while saving
local interconnect resources (refer to Figure 2–38). The Quartus II Compiler
automatically takes advantage of these resources to improve utilization and
performance. For more information about register chain interconnect, refer to
Figure 2–38. Register Chain within a LAB (Note 1)
Note to Figure 2–38:
(1) The combinational or adder logic can be used to implement an unrelated, unregistered function.
Clear and Preset Logic Control
LAB-wide signals control the logic for the register’s clear and load/preset signals. The
ALM directly supports an asynchronous clear and preset function. The register preset
is achieved through the asynchronous load of a logic high. The direct asynchronous
preset does not require a NOT gate push-back technique. Arria GX devices support
simultaneous asynchronous load/preset and clear signals. An asynchronous clear
signal takes precedence if both signals are asserted simultaneously. Each LAB
supports up to two clears and one load/preset signal.
In addition to the clear and load/preset ports, Arria GX devices provide a
device-wide reset pin (DEV_CLRn) that resets all registers in the device. An option set
before compilation in the Quartus II software controls this pin. This device-wide reset
overrides all other control signals.
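The precedence described above can be summarized in a small behavioral model. This Python sketch is illustrative only: the function and argument names are hypothetical, and it assumes DEV_CLRn is active low, as the trailing "n" in the pin name suggests.

```python
def register_value(d, aclr=False, aload_preset=False, dev_clrn=True):
    """Return the register contents after the control signals are applied.

    Sketch of the stated precedence, not a timing-accurate model:
      1. the device-wide reset (DEV_CLRn, assumed active low) overrides all,
      2. asynchronous clear takes precedence over load/preset,
      3. preset is an asynchronous load of a logic high,
      4. otherwise the register captures its data input.
    """
    if not dev_clrn:
        return 0
    if aclr:
        return 0
    if aload_preset:
        return 1
    return d


# Clear wins when clear and preset are asserted simultaneously.
print(register_value(d=1, aclr=True, aload_preset=True))        # 0
# Preset loads a logic high regardless of the data input.
print(register_value(d=0, aload_preset=True))                   # 1
# The device-wide reset overrides the preset.
print(register_value(d=1, aload_preset=True, dev_clrn=False))   # 0
```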
MultiTrack Interconnect
In Arria GX architecture, the MultiTrack interconnect structure with DirectDrive
technology provides connections between ALMs, TriMatrix memory, DSP blocks, and
device I/O pins. The MultiTrack interconnect consists of continuous,
performance-optimized routing lines of different lengths and speeds used for inter-
and intra-design block connectivity. The Quartus II Compiler automatically places
critical design paths on faster interconnects to improve design performance.
DirectDrive technology is a deterministic routing technology that ensures identical
routing resource usage for any function regardless of placement in the device. The
MultiTrack interconnect and DirectDrive technology simplify the integration stage of
block-based designing by eliminating the re-optimization cycles that typically follow
design changes and additions.
The MultiTrack interconnect consists of row and column interconnects that span fixed
distances. A routing structure with fixed length resources for all devices allows
predictable and repeatable performance when migrating through different device
densities. Dedicated row interconnects route signals to and from LABs, DSP blocks,
and TriMatrix memory in the same row.
These row resources include:
■ Direct link interconnects between LABs and adjacent blocks
■ R4 interconnects traversing four blocks to the right or left
■ R24 row interconnects for high-speed access across the length of the device
The direct link interconnect allows a LAB, DSP block, or TriMatrix memory block to
drive into the local interconnect of its left and right neighbors and then back into
itself, providing fast communication between adjacent LABs and/or blocks without
using row interconnect resources.
The R4 interconnects span four LABs, three LABs and one M512 RAM block, two
LABs and one M4K RAM block, or two LABs and one DSP block to the right or left of
a source LAB. These resources are used for fast row connections in a four-LAB region.
Every LAB has its own set of R4 interconnects to drive either left or right. Figure 2–39
shows R4 interconnect connections from a LAB.
R4 interconnects can drive and be driven by DSP blocks and RAM blocks and row
IOEs. For LAB interfacing, a primary LAB or LAB neighbor can drive a given R4
interconnect. For R4 interconnects that drive to the right, the primary LAB and right
neighbor can drive onto the interconnect. For R4 interconnects that drive to the left,
the primary LAB and its left neighbor can drive onto the interconnect. R4
interconnects can drive other R4 interconnects to extend the range of LABs they can
drive. R4 interconnects can also drive C4 and C16 interconnects for connections from
one row to another. Additionally, R4 interconnects can drive R24 interconnects.
Figure 2–39. R4 Interconnect Connections (Note 1), (2), (3)
[Figure: a primary LAB (2) and its LAB neighbors, with one R4 interconnect driving left and one R4 interconnect driving right. An adjacent LAB can drive onto another LAB's R4 interconnect, and C4 and C16 column interconnects (1) connect to the R4 interconnects.]
Notes to Figure 2–39:
(1) C4 and C16 interconnects can drive R4 interconnects.
(2) This pattern is repeated for every LAB in the LAB row.
(3) The LABs in Figure 2–39 show the 16 possible logical outputs per LAB.
R24 row interconnects span 24 LABs and provide the fastest resource for long row
connections between LABs, TriMatrix memory, DSP blocks, and row IOEs. The R24
row interconnects can cross M-RAM blocks. R24 row interconnects drive to other row
or column interconnects at every fourth LAB and do not drive directly to LAB local
interconnects. R24 row interconnects drive LAB local interconnects via R4 and C4
interconnects. R24 interconnects can drive R24, R4, C16, and C4 interconnects. The
column interconnect operates similarly to the row interconnect and vertically routes
signals to and from LABs, TriMatrix memory, DSP blocks, and IOEs. Each column of
LABs is served by a dedicated column interconnect.
These column resources include:
■ Shared arithmetic chain interconnects in a LAB
■ Carry chain interconnects in a LAB and from LAB to LAB
■ Register chain interconnects in a LAB
■ C4 interconnects traversing a distance of four blocks in up and down direction
■ C16 column interconnects for high-speed vertical routing through the device
Arria GX devices include an enhanced interconnect structure in LABs for routing
shared arithmetic chains and carry chains for efficient arithmetic functions. The
register chain connection allows the register output of one ALM to connect directly to
the register input of the next ALM in the LAB for fast shift registers. These
ALM-to-ALM connections bypass the local interconnect. The Quartus II Compiler
automatically takes advantage of these resources to improve utilization and
performance. Figure 2–40 shows shared arithmetic chain, carry chain, and register
chain interconnects.
Figure 2–40. Shared Arithmetic Chain, Carry Chain and Register Chain Interconnects
[Figure: ALM 1 through ALM 8 in a LAB, showing local interconnect routing among ALMs in the LAB, carry chain and shared arithmetic chain routing to the adjacent ALM, and register chain routing to the adjacent ALM's register input.]
C4 interconnects span four LABs, M512, or M4K blocks up or down from a source
LAB. Every LAB has its own set of C4 interconnects to drive either up or down.
Figure 2–41 shows the C4 interconnect connections from a LAB in a column. C4
interconnects can drive and be driven by all types of architecture blocks, including
DSP blocks, TriMatrix memory blocks, and column and row IOEs. For LAB
interconnection, a primary LAB or its LAB neighbor can drive a given C4
interconnect. C4 interconnects can drive each other to extend their range as well as
drive row interconnects for column-to-column connections.
Figure 2–41. C4 Interconnect Connections (Note 1)
[Figure: a LAB column with one C4 interconnect driving up and one C4 interconnect driving down. Each C4 interconnect drives local and R4 (row) interconnects up to four rows away, and an adjacent LAB can drive onto a neighboring LAB's C4 interconnect.]
Note to Figure 2–41:
(1) Each C4 interconnect can drive either up or down four rows.
C16 column interconnects span a length of 16 LABs and provide the fastest resource
for long column connections between LABs, TriMatrix memory blocks, DSP blocks,
and IOEs. C16 interconnects can cross M-RAM blocks and also drive to row and
column interconnects at every fourth LAB. C16 interconnects drive LAB local
interconnects via C4 and R4 interconnects and do not drive LAB local interconnects
directly. All embedded blocks communicate with the logic array similar to
LAB-to-LAB interfaces. Each block (that is, TriMatrix memory and DSP blocks)
connects to row and column interconnects and has local interconnect regions driven
by row and column interconnects. These blocks also have direct link interconnects for
fast connections to and from a neighboring LAB. All blocks are fed by the row LAB
clocks, labclk[5..0].
TriMatrix memory consists of three types of RAM blocks: M512, M4K, and M-RAM.
Although these memory blocks are different, they can all implement various types of
memory with or without parity, including true dual-port, simple dual-port, and
single-port RAM, ROM, and FIFO buffers. Table 2–11 lists the size and features of the
different RAM blocks.
Table 2–11. TriMatrix Memory Features (Part 1 of 2)
Memory Feature             M512 RAM Block   M4K RAM Block   M-RAM Block
Maximum performance        345 MHz          380 MHz         290 MHz
True dual-port memory      No               Yes             Yes
Simple dual-port memory    Yes              Yes             Yes
Single-port memory         Yes              Yes             Yes
Shift register             Yes              Yes             No
TriMatrix memory provides three different memory sizes for efficient application
support. The Quartus II software automatically partitions the user-defined memory
into the embedded memory blocks using the most efficient size combinations. You can
also manually assign the memory to a specific block size or a mixture of block sizes.
M512 RAM Block
The M512 RAM block is a simple dual-port memory block and is useful for
implementing small FIFO buffers, DSP, and clock domain transfer applications. Each
block contains 576 RAM bits (including parity bits). M512 RAM blocks can be
configured in the following modes:
When configured as RAM or ROM, you can use an initialization file to pre-load the
memory contents.
M512 RAM blocks can have different clocks on their inputs and outputs. The wren,
datain, and write address registers are all clocked together from one of the two
clocks feeding the block. The read address, rden, and output registers can be clocked
by either of the two clocks driving the block, allowing the RAM block to operate in
read and write or input and output clock modes. Only the output register can be
bypassed. The six labclk signals or local interconnect can drive the inclock, outclock, wren, rden, and outclr signals. Because of the advanced interconnect
between the LAB and M512 RAM blocks, ALMs can also control the wren and rden
signals and the RAM clock, clock enable, and asynchronous clear signals. Figure 2–42
shows the M512 RAM block control signal generation logic.
Figure 2–42. M512 RAM Block Control Signals
The RAM blocks in Arria GX devices have local interconnects to allow ALMs and
interconnects to drive into RAM blocks. The M512 RAM block local interconnect is
driven by the R4, C4, and direct link interconnects from adjacent LABs. The M512
RAM blocks can communicate with LABs on either the left or right side through these
row interconnects or with LAB columns on the left or right side with the column
interconnects. The M512 RAM block has up to 16 direct link input connections from
the left adjacent LABs and another 16 from the right adjacent LAB. M512 RAM
outputs can also connect to left and right LABs through direct link interconnect. The
M512 RAM block has equal opportunity for access and performance to and from
LABs on either its left or right side. Figure 2–43 shows the M512 RAM block to logic
array interface.
The M4K RAM block includes support for true dual-port RAM. The M4K RAM block
is used to implement buffers for a wide variety of applications such as storing
processor code, implementing lookup schemes, and implementing larger memory
applications. Each block contains 4,608 RAM bits (including parity bits). M4K RAM
blocks can be configured in the following modes:
■ True dual-port RAM
■ Simple dual-port RAM
■ Single-port RAM
■ FIFO
■ ROM
■ Shift register
When configured as RAM or ROM, you can use an initialization file to pre-load the
memory contents.
M4K RAM blocks allow for different clocks on their inputs and outputs. Either of the
two clocks feeding the block can clock M4K RAM block registers (renwe, address, byte enable, datain, and output registers). Only the output register can be
bypassed. The six labclk signals or local interconnects can drive the control signals
for the A and B ports of the M4K RAM block. ALMs can also control the clock_a, clock_b, renwe_a, renwe_b, clr_a, clr_b, clocken_a, and clocken_b
signals, as shown in Figure 2–44.
The R4, C4, and direct link interconnects from adjacent LABs drive the M4K RAM
block local interconnect. The M4K RAM blocks can communicate with LABs on either
the left or right side through these row resources or with LAB columns on either the
right or left with the column resources. Up to 16 direct link input connections to the
M4K RAM block are possible from the left adjacent LABs and another 16 are possible
from the right adjacent LAB. M4K RAM block outputs can also connect to left and
right LABs through direct link interconnect. Figure 2–45 shows the M4K RAM block
to logic array interface.
The largest TriMatrix memory block, the M-RAM block, is useful for applications
where a large volume of data must be stored on-chip. Each block contains 589,824
RAM bits (including parity bits). The M-RAM block can be configured in the
following modes:
■ True dual-port RAM
■ Simple dual-port RAM
■ Single-port RAM
■ FIFO
You cannot use an initialization file to initialize the contents of an M-RAM block. All
M-RAM block contents power up to an undefined value. Only synchronous operation
is supported in the M-RAM block, so all inputs are registered. Output registers can be
bypassed.
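The block capacities quoted above (576, 4,608, and 589,824 RAM bits, each including parity bits) give a quick lower bound on how many blocks of one type a memory needs. The Python sketch below is a capacity-only estimate under that assumption. It ignores the supported width and depth configurations, and the Quartus II software may choose a different mix of block sizes. The function name and the 64-Kbit example are hypothetical.

```python
import math

# RAM bits per block, including parity bits, as stated in this section.
BLOCK_BITS = {"M512": 576, "M4K": 4608, "M-RAM": 589824}


def min_blocks(total_bits, block_type):
    """Lower bound on the number of blocks of one type needed (sketch only)."""
    if total_bits < 1:
        raise ValueError("total_bits must be positive")
    return math.ceil(total_bits / BLOCK_BITS[block_type])


# Example: a hypothetical 64-Kbit (65,536-bit) buffer.
for block_type in BLOCK_BITS:
    print(block_type, min_blocks(65536, block_type))
```

For the 65,536-bit example this gives at least 114 M512 blocks, 15 M4K blocks, or one M-RAM block.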
Similar to all RAM blocks, M-RAM blocks can have different clocks on their inputs
and outputs. Either of the two clocks feeding the block can clock M-RAM block
registers (renwe, address, byte enable, datain, and output registers). You can
bypass the output register. The six labclk signals or local interconnect can drive the
control signals for the A and B ports of the M-RAM block. ALMs can also control the
clock_a, clock_b, renwe_a, renwe_b, clr_a, clr_b, clocken_a, and
clocken_b signals, as shown in Figure 2–46.
The R4, R24, C4, and direct link interconnects from adjacent LABs on either the right
or left side drive the M-RAM block local interconnect. Up to 16 direct link input
connections to the M-RAM block are possible from the left adjacent LABs and another
16 are possible from the right adjacent LAB. M-RAM block outputs can also connect to
left and right LABs through direct link interconnect. Figure 2–47 shows an example
floorplan for the EP1AGX90 device and the location of the M-RAM interfaces.
Figure 2–48 and Figure 2–49 show the interface between the M-RAM block and the
Table 2–12. M-RAM Row Interface Unit Signals (Part 2 of 2)
Unit Interface Block   Input Signals                                           Output Signals
L4                     datain_a[56..42], byteena_a[5..4]                       dataout_a[59..48]
L5                     datain_a[71..57], byteena_a[7..6]                       dataout_a[71..60]
R0                     datain_b[14..0], byteena_b[1..0]                        dataout_b[11..0]
R1                     datain_b[29..15], byteena_b[3..2]                       dataout_b[23..12]
R2                     datain_b[35..30], addressb[4..0], addr_ena_b,           dataout_b[35..24]
                       clock_b, clocken_b, renwe_b, aclr_b
R3                     addressb[15..5], datain_b[41..36]                       dataout_b[47..36]
R4                     datain_b[56..42], byteena_b[5..4]                       dataout_b[59..48]
R5                     datain_b[71..57], byteena_b[7..6]                       dataout_b[71..60]
For more information about TriMatrix memory, refer to the TriMatrix Embedded
Memory Blocks in Arria GX Devices chapter.
Digital Signal Processing Block
The most commonly used DSP functions are finite impulse response (FIR) filters,
complex FIR filters, infinite impulse response (IIR) filters, fast Fourier transform (FFT)
functions, discrete cosine transform (DCT) functions, and correlators. All of these use
the multiplier as the fundamental building block. Additionally, some applications
need specialized operations such as multiply-add and multiply-accumulate
operations. Arria GX devices provide DSP blocks to meet the arithmetic requirements
of these functions.
Each Arria GX device has two to four columns of DSP blocks to efficiently implement
DSP functions faster than ALM-based implementations. Each DSP block can be
configured to support up to:
■ Eight 9 × 9-bit multipliers
■ Four 18 × 18-bit multipliers
■ One 36 × 36-bit multiplier
As indicated, a single Arria GX DSP block can support one 36 × 36-bit multiplier.
This is true for any combination of signed, unsigned, or mixed-sign multiplications.
Figure 2–50 shows one of the columns with surrounding LAB rows.
Figure 2–50. DSP Blocks Arranged in Columns
[Figure: a DSP block column in which each DSP block spans four LAB rows.]
Table 2–13 lists the number of DSP blocks in each Arria GX device. DSP block
multipliers can optionally feed an adder/subtractor or accumulator in the block
depending on the configuration, which makes routing to ALMs easier, saves ALM
routing resources, and increases performance because all connections and blocks are
in the DSP block.
Table 2–13. DSP Blocks in Arria GX Devices (Note 1)
Device     DSP Blocks   Total 9 × 9 Multipliers   Total 18 × 18 Multipliers   Total 36 × 36 Multipliers
EP1AGX20   10           80                        40                          10
EP1AGX35   14           112                       56                          14
EP1AGX50   26           208                       104                         26
EP1AGX60   32           256                       128                         32
EP1AGX90   44           352                       176                         44
Note to Table 2–13:
(1) This list only shows functions that can fit into a single DSP block. Multiple DSP blocks can support larger
multiplication functions.
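The totals in Table 2–13 are consistent with each DSP block providing eight 9 × 9, four 18 × 18, or one 36 × 36 multiplier. The Python sketch below checks that relationship. The per-block counts are inferred here from the ratios in the table, and the dictionary and variable names are illustrative.

```python
# Device: (DSP blocks, total 9x9, total 18x18, total 36x36), from Table 2-13.
DSP_TABLE = {
    "EP1AGX20": (10, 80, 40, 10),
    "EP1AGX35": (14, 112, 56, 14),
    "EP1AGX50": (26, 208, 104, 26),
    "EP1AGX60": (32, 256, 128, 32),
    "EP1AGX90": (44, 352, 176, 44),
}

# Multipliers per DSP block for 9x9, 18x18, and 36x36 (inferred from the table).
PER_BLOCK = (8, 4, 1)

for device, (blocks, *totals) in DSP_TABLE.items():
    expected = [blocks * count for count in PER_BLOCK]
    status = "ok" if totals == expected else "mismatch"
    print(f"{device}: {blocks} blocks -> {expected} ({status})")
```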
Additionally, DSP block input registers can efficiently implement shift registers for
FIR filter applications. DSP blocks support Q1.15 format rounding and saturation.
Figure 2–51 shows a top-level diagram of the DSP block configured for 18 × 18-bit
The adder, subtractor, and accumulate functions of a DSP block have four modes of
operation:
■ Simple multiplier
■ Multiply-accumulator
■ Two-multipliers adder
■ Four-multipliers adder
Table 2–14 shows the different number of multipliers possible in each DSP block
mode according to size. These modes allow the DSP blocks to implement numerous
applications for DSP including FFTs, complex FIR, FIR, 2D FIR filters, equalizers, IIR,
correlators, matrix multiplication, and many other functions. DSP blocks also support
mixed modes and mixed multiplier sizes in the same block. For example, half of one
DSP block can implement one 18 × 18-bit multiplier in multiply-accumulator mode,
while the other half of the DSP block implements four 9 × 9-bit multipliers in simple
multiplier mode.
Table 2–14. Multiplier Size and Configurations per DSP Block
Two two-multiplier adder (one
18 × 18 complex multiply)
One multiplier with one
product output
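The surviving Table 2–14 entry "two two-multiplier adder (one 18 × 18 complex multiply)" reflects a standard identity: a complex product needs four real multiplications, one subtraction, and one addition. The Python sketch below shows that decomposition. It is a behavioral illustration only; the function names are hypothetical, and the range check assumes signed 18-bit operands.

```python
def check_signed18(*values):
    """Raise if any operand falls outside the signed 18-bit range."""
    for value in values:
        if not -(1 << 17) <= value < (1 << 17):
            raise ValueError(f"{value} does not fit in 18 signed bits")


def two_multiplier_adder(a, b, c, d, subtract=False):
    """One two-multipliers adder: a*b + c*d, or a*b - c*d when subtract is set."""
    check_signed18(a, b, c, d)
    return a * b - c * d if subtract else a * b + c * d


def complex_multiply(ar, ai, br, bi):
    """(ar + j*ai) * (br + j*bi) using two two-multipliers adders."""
    real = two_multiplier_adder(ar, br, ai, bi, subtract=True)  # ar*br - ai*bi
    imag = two_multiplier_adder(ar, bi, ai, br)                 # ar*bi + ai*br
    return real, imag


print(complex_multiply(3, 4, 1, -2))  # (11, -2), since (3+4j)(1-2j) = 11-2j
```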
DSP Block Interface
The Arria GX device DSP block input registers can generate a shift register that can
cascade down in the same DSP block column. Dedicated connections between DSP
blocks provide fast connections between shift register inputs to cascade shift register
chains. You can cascade registers within multiple DSP blocks for 9 × 9- or 18 × 18-bit
FIR filters larger than four taps, with additional adder stages implemented in ALMs.
If the DSP block is configured as 36 × 36 bits, the adder, subtractor, or accumulator
stages are implemented in ALMs. Each DSP block can route the shift register chain
out of the block to cascade multiple columns of DSP blocks.
The DSP block is divided into four block units that interface with four LAB rows on
the left and right. Each block unit can be considered one complete 18 × 18-bit
multiplier with 36 inputs and 36 outputs. A local interconnect region is associated
with each DSP block. Like a LAB, this interconnect region can be fed with 16 direct
link interconnects from the LAB to the left or right of the DSP block in the same row.
R4 and C4 routing resources can access the DSP block’s local interconnect region.
The outputs also work similarly to LAB outputs. Eighteen outputs from the DSP block
can drive to the left LAB through direct link interconnects and 18 can drive to the
right LAB through direct link interconnects. All 36 outputs can drive to R4 and C4
routing interconnects. Outputs can drive right- or left-column routing.
Figure 2–53. DSP Block Interface to Interconnect
[Figure: DSP block to LAB row interface. The block interconnect region is fed by R4 and C4 interconnects and by direct link interconnects from the adjacent LABs. It provides 36 inputs per row and 36 outputs per row, with direct link outputs to the adjacent LABs.]
A bus of 44 control signals feeds the entire DSP block. These signals include clocks,
asynchronous clears, clock enables, signed and unsigned control signals, addition and
subtraction control signals, rounding and saturation control signals, and accumulator
synchronous loads. The clock signals are routed from LAB row clocks and are
generated from specific LAB rows at the DSP block interface. The LAB row source for
control signals, data inputs, and outputs is shown in Table 2–15.
For more information about DSP blocks, refer to the DSP Blocks in Arria GX Devices chapter.
Arria GX devices provide a hierarchical clock structure and multiple PLLs with
advanced features. The large number of clocking resources in combination with the
clock synthesis precision provided by enhanced and fast PLLs provides a complete
clock management solution.
Global and Hierarchical Clocking
Arria GX devices provide 16 dedicated global clock networks and 32 regional clock
networks (eight per device quadrant). These clocks are organized into a hierarchical
clock structure that allows for up to 24 clocks per device region with low skew and
delay. This hierarchical clocking scheme provides up to 48 unique clock domains in
Arria GX devices.
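The clock counts above follow from simple arithmetic, which the short Python sketch below makes explicit. The constant names are illustrative; the values come from the preceding paragraph.

```python
GLOBAL_CLOCKS = 16          # dedicated global clock networks
REGIONAL_PER_QUADRANT = 8   # regional clock networks in each quadrant
QUADRANTS = 4

regional_total = REGIONAL_PER_QUADRANT * QUADRANTS        # 32 regional networks
clocks_per_region = GLOBAL_CLOCKS + REGIONAL_PER_QUADRANT  # 24 clocks per region
unique_domains = GLOBAL_CLOCKS + regional_total            # 48 unique clock domains

print(regional_total, clocks_per_region, unique_domains)  # 32 24 48
```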
There are 12 dedicated clock pins (CLK[15..12] and CLK[7..0]) to drive either the
global or regional clock networks. Four clock pins drive each side of the device except
the right side, as shown in Figure 2–54 and Figure 2–55. Internal logic and enhanced
and fast PLL outputs can also drive the global and regional clock networks. Each
global and regional clock has a clock control block, which controls the selection of the
clock source and dynamically enables or disables the clock to reduce power
consumption. Table 2–16 lists the global and regional clock features.
These clocks drive throughout the entire device, feeding all device quadrants. GCLK
networks can be used as clock sources for all resources in the device: IOEs, ALMs, DSP
blocks, and all memory blocks. These resources can also be used for control signals,
such as clock enables and synchronous or asynchronous clears fed from the external
pin. The global clock networks can also be driven by internal logic for internally
generated global clocks and asynchronous clears, clock enables, or other control
signals with large fanout. Figure 2–54 shows the 12 dedicated CLK pins driving global
clock networks.
There are eight RCLK networks (RCLK[7..0]) in each quadrant of the Arria GX
device that are driven by the dedicated CLK[15..12] and CLK[7..0] input pins, by
PLL outputs, or by internal logic. The regional clock networks provide the lowest
clock delay and skew for logic contained in a single quadrant. The CLK pins
symmetrically drive the RCLK networks in a particular quadrant, as shown in
A single source (CLK pin or PLL output) can generate a dual-RCLK by driving two
RCLK network lines in adjacent quadrants (one from each quadrant), which allows
logic that spans multiple quadrants to use the same low skew clock. The routing of
this clock signal on an entire side has approximately the same speed but slightly
higher clock skew when compared with a clock signal that drives a single quadrant.
Internal logic-array routing can also drive a dual-regional clock. Clock pins and
enhanced PLL outputs on the top and bottom can drive horizontal dual-regional
clocks. Clock pins and fast PLL outputs on the left and right can drive vertical
dual-regional clocks, as shown in Figure 2–56. Corner PLLs cannot drive
dual-regional clocks.
[Figure (dual-regional clocks): clock pins or PLL clock outputs can drive the dual-regional network. Labeled pins: CLK[15..12], CLK[3..0], and CLK[7..4].]
Combined Resources
Within each quadrant, there are 24 distinct dedicated clocking resources consisting of
16 global clock lines and eight regional clock lines. Multiplexers are used with these
clocks to form buses to drive LAB row clocks, column IOE clocks, or row IOE clocks.
Another multiplexer is used at the LAB level to select three of the six row clocks to
feed the ALM registers in the LAB (refer to Figure 2–57).
Figure 2–57. Hierarchical Clock Networks Per Quadrant
You can use the Quartus II software to control whether a clock input pin drives either
a GCLK, RCLK, or dual-RCLK network. The Quartus II software automatically selects
the clocking resources if not specified.
PLLs and Clock Networks
Clock Control Block
Each GCLK, RCLK, and PLL external clock output has its own clock control block.
The control block has two functions:
■ Clock source selection (dynamic selection for global clocks)
■ Clock power-down (dynamic clock enable or disable)
Figure 2–58 through Figure 2–60 show the clock control block for the global clock,
regional clock, and PLL external clock output, respectively.
Figure 2–58. Global Clock Control Blocks
[Figure: the clock sources are the CLKp pins, the CLKn pin, the PLL counter outputs, and internal logic. A multiplexer that supports user-controllable dynamic switching is controlled by CLKSELECT[1..0] (1), and a second selection is made by the static clock select (2). An enable/disable stage controlled by internal logic drives GCLK.]
Notes to Figure 2–58:
(1) These clock select signals can be dynamically controlled through internal logic when the device is operating in user mode.
(2) These clock select signals can only be set through a configuration file (SRAM Object File [.sof] or Programmer Object File [.pof]) and cannot be
dynamically controlled during user mode operation.
Figure 2–59. Regional Clock Control Blocks
[Figure: the clock sources shown are the CLKp pin, the CLKn pin (2), and the PLL counter outputs.]
Notes to Figure 2–59:
(1) These clock select signals can only be set through a configuration file (.sof or .pof) and cannot be dynamically controlled during user mode
operation.
(2) Only the CLKn pins on the top and bottom of the device feed to regional clock select.
Figure 2–60. External PLL Output Clock Control Blocks
Notes to Figure 2–60:
(1) These clock select signals can only be set through a configuration file (.sof or .pof) and cannot be dynamically controlled during user mode
operation.
(2) The clock control block feeds to a multiplexer within the PLL_OUT pin’s IOE. The PLL_OUT pin is a dual-purpose pin. Therefore, this multiplexer
selects either an internal signal or the output of the clock control block.
For the global clock control block, clock source selection can be controlled either
statically or dynamically. You have the option of statically selecting the clock source
by using the Quartus II software to set specific configuration bits in the configuration
file (.sof or .pof) or controlling the selection dynamically by using internal logic to
drive the multiplexer select inputs. When selecting statically, the clock source can be
set to any of the inputs to the select multiplexer. When selecting the clock source
dynamically, you can either select between two PLL outputs (such as the C0 or C1
outputs from one PLL), between two PLLs (such as the C0/C1 clock output of one
PLL or the C0/C1 clock output of the other PLL), between two clock pins (such as
CLK0 or CLK1), or between a combination of clock pins or PLL outputs.
For the regional and PLL_OUT clock control block, clock source selection can only be
controlled statically using configuration bits. Any of the inputs to the clock select
multiplexer can be set as the clock source.
Arria GX clock networks can be disabled (powered down) by both static and dynamic
approaches. When a clock net is powered down, all logic fed by the clock net is in an
off-state thereby reducing the overall power consumption of the device. GCLK and
RCLK networks can be powered down statically through a setting in the
configuration file (.sof or .pof). Clock networks that are not used are automatically
powered down through configuration bit settings in the configuration file generated
by the Quartus II software. The dynamic clock enable or disable feature allows the
internal logic to control power up/down synchronously on GCLK and RCLK nets and PLL_OUT pins. This function is independent of the PLL and is applied directly on the
clock network or PLL_OUT pin, as shown in Figure 2–58 through Figure 2–60.
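The selection and power-down rules above can be captured in a small behavioral model. The Python sketch below is an illustration under stated assumptions, not a description of the hardware or of any Quartus II interface: the class and method names are hypothetical, and the model does not enforce the specific source pairs that dynamic selection allows. It shows only that dynamic source selection applies to global clock control blocks, while dynamic enable and disable applies to GCLK, RCLK, and PLL_OUT alike.

```python
class ClockControlBlock:
    """Behavioral sketch of a clock control block (hypothetical model)."""

    KINDS = ("GCLK", "RCLK", "PLL_OUT")

    def __init__(self, kind, sources, configured_source):
        if kind not in self.KINDS:
            raise ValueError(f"unknown clock control block kind: {kind}")
        if configured_source not in sources:
            raise ValueError("configured source is not a multiplexer input")
        self.kind = kind
        self.sources = tuple(sources)
        self.source = configured_source  # static choice from the configuration file
        self.enabled = True

    def select(self, source):
        """Dynamic source selection, available for global clocks only."""
        if self.kind != "GCLK":
            raise RuntimeError(f"{self.kind} source is set statically only")
        if source not in self.sources:
            raise ValueError("source is not a multiplexer input")
        self.source = source

    def set_enable(self, enabled):
        """Dynamic clock enable or disable, driven by internal logic."""
        self.enabled = bool(enabled)

    def output(self):
        """Name of the source driving the network, or None when powered down."""
        return self.source if self.enabled else None


gclk = ClockControlBlock("GCLK", ["CLK0", "CLK1"], "CLK0")
gclk.select("CLK1")
print(gclk.output())   # CLK1
gclk.set_enable(False)
print(gclk.output())   # None
```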
Arria GX devices provide robust clock management and synthesis using up to four
enhanced PLLs and four fast PLLs. These PLLs increase performance and provide
advanced clock interfacing and clock frequency synthesis. With features such as clock
switchover, spread spectrum clocking, reconfigurable bandwidth, phase control, and
reconfigurable phase shifting, the Arria GX device’s enhanced PLLs provide you with
complete control of your clocks and system timing. The fast PLLs provide general
purpose clocking with multiplication and phase shifting as well as high-speed
outputs for high-speed differential I/O support. Enhanced and fast PLLs work
together with the Arria GX high-speed I/O and advanced clock architecture to
provide significant improvements in system performance and bandwidth.
The Quartus II software enables the PLLs and their features without requiring any
external devices. Table 2–17 lists the PLLs available for each Arria GX device and their
type.
(1) The global or regional clocks in a fast PLL's transceiver block can drive the fast PLL input. A pin or other PLL must drive the global or regional
source. The source cannot be driven by internally generated logic before driving the fast PLL.
(2) EP1AGX20C, EP1AGX35C/D, EP1AGX50C and EP1AGX60C/D devices only have two fast PLLs (PLLs 1 and 2), but the connectivity from these
two PLLs to the global and regional clock networks remains the same as shown in this table.
(3) PLLs 3, 4, 9, and 10 are not available in Arria GX devices.
(4) 4 or 8 PLLs are available depending on C or D device and the package option.
(5) 4 or 8 PLLs are available depending on C, D, or E device option.
Table 2–18 lists the enhanced PLL and fast PLL features in Arria GX devices.
Number of feedback clock inputsOne single-ended or differential (7), (8)—
Notes to Ta bl e 2– 18 :
(1) For enhanced PLLs, m, n range from 1 to 256 and post-scale counters range from 1 to 512 with 50% duty cycle.
(2) For fast PLLs, m, and post-scale counters range from 1 to 32. The n counter ranges from 1 to 4.
(3) The smallest phase shift is determined by the voltage controlled oscillator (VCO) period divided by 8.
(4) For degree increments, Arria GX devices can shift all output frequencies in increments of at least 45. Smaller degree i ncrem ents are possible
depending on the frequency and divide parameters.
(5) Arria GX fast PLLs only support manual clock switchover.
(6) Fast PLLs can drive to any I/O pin as an external clock. For high-speed differential I/O pins, the device uses a data channel to generate
txclkout.
(7) If the feedback input is used, you lose one (or two, if FBIN is differential) external clock output pin.
(8) Every Arria GX device has at least two enhanced PLLs with one single-ended or differential external feedback input per PLL.
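The counter ranges and phase-shift resolution in the notes to Table 2–18 can be sketched as a small check. This is a minimal illustration, not a Quartus II API: the function names and the dictionary layout are hypothetical, and the degree calculation assumes that the output period equals the post-scale counter value times the VCO period.

```python
# Counter ranges taken from notes (1) and (2) to Table 2-18.
COUNTER_RANGES = {
    "enhanced": {"m": (1, 256), "n": (1, 256), "post_scale": (1, 512)},
    "fast": {"m": (1, 32), "n": (1, 4), "post_scale": (1, 32)},
}


def counters_in_range(pll_type, m, n, post_scale):
    """Return True if all three counter values fit the documented ranges."""
    values = {"m": m, "n": n, "post_scale": post_scale}
    ranges = COUNTER_RANGES[pll_type]
    return all(lo <= values[name] <= hi for name, (lo, hi) in ranges.items())


def min_phase_step_ps(vco_mhz):
    """Smallest phase shift per note (3): one eighth of the VCO period, in ps."""
    vco_period_ps = 1_000_000 / vco_mhz
    return vco_period_ps / 8


def phase_step_degrees(post_scale):
    """Phase step seen at an output clock, in degrees.

    Assumption: the output period is post_scale VCO periods, so a step of
    one eighth of a VCO period is 45 / post_scale degrees of the output.
    """
    return 45.0 / post_scale
```

With a post-scale counter of 1 the step is 45°, which matches note (4); larger post-scale values give the smaller increments that the note mentions.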
Figure 2–61 shows a top-level diagram of the Arria GX device and PLL floorplan.
Figure 2–61. PLL Locations
[Figure: device floorplan showing PLLs 1, 2, 5, 6, 7, 8, 11, and 12 with clock inputs CLK[3..0], CLK[7..4], CLK[15..12], FPLL7CLK, and FPLL8CLK.]
Figure 2–62 and Figure 2–63 show global and regional clocking from the fast PLL
outputs and side clock pins. The connections to the global and regional clocks from
the fast PLL outputs, internal drivers, and CLK pins on the left side of the device are
shown in Table 2–19.
Figure 2–62. Global and Regional Clock Connections from Center Clock Pins and Fast PLL Outputs (Note 1)
[Figure: block diagram with fast PLL 1 and fast PLL 2 (outputs C0 to C3 each), clock pins CLK0 to CLK3, the logic array signal input to the clock network, regional clocks RCLK0 to RCLK7, and global clocks GCLK0 to GCLK3.]
Note to Figure 2–62:
(1) The global or regional clocks in a fast PLL's quadrant can drive the fast PLL input. A dedicated clock input pin or other PLL must drive the global
or regional source. The source cannot be driven by internally generated logic before driving the fast PLL.
Figure 2–63. Global and Regional Clock Connections from Corner Clock Pins and Fast PLL Outputs (Note 1)
Note to Figure 2–63:
(1) The GCLK or RCLK in a fast PLL's quadrant can drive the fast PLL input. A dedicated clock input pin or other PLL must drive the global or regional
source. The source cannot be driven by internally generated logic before driving the fast PLL.
Table 2–19. Global and Regional Clock Connections from Left Side Clock Pins and Fast PLL Outputs (Part 1 of 2)
Figure 2–64 shows the global and regional clocking from enhanced PLL outputs and
top and bottom CLK pins.
Figure 2–64. Global and Regional Clock Connections from Top and Bottom Clock Pins and Enhanced PLL Outputs (Note 1)
[Figure: block diagram with enhanced PLLs 5, 6, 11, and 12 (outputs c0 to c5 each), clock pins CLK4 to CLK7 and CLK12 to CLK15, feedback inputs PLL5_FB, PLL6_FB, PLL11_FB, and PLL12_FB, external clock outputs PLL5_OUT[2..0]p/n, PLL6_OUT[2..0]p/n, PLL11_OUT[2..0]p/n, and PLL12_OUT[2..0]p/n, regional clocks RCLK8 to RCLK15 and RCLK24 to RCLK31, and global clocks G4 to G7 and G12 to G15.]
Note to Figure 2–64:
(1) If the design uses the feedback input, you might lose one (or two if FBIN is differential) external clock output pin.
The connections to the global and regional clocks from the top clock pins and
enhanced PLL outputs are shown in Table 2–20. The connections to the clocks from
the bottom clock pins are shown in Table 2–21.
PLLs and Clock Networks
Enhanced PLLs
Arria GX devices contain up to four enhanced PLLs with advanced clock
management features. These features include support for external clock feedback
mode, spread-spectrum clocking, and counter cascading. Figure 2–65 shows a
diagram of the enhanced PLL.
Figure 2–65. Arria GX Enhanced PLL (Note 1)
Notes to Figure 2–65:
(1) Each clock source can come from any of the four clock pins that are physically located on the same side of the device as the PLL.
(2) If the feedback input is used, you will lose one (or two, if FBIN is differential) external clock output pin.
(3) Each enhanced PLL has three differential external clock outputs or six single-ended external clock outputs.
(4) The global or regional clock input can be driven by an output from another PLL, a pin-driven dedicated global or regional clock, or through a clock
control block provided the clock control block is fed by an output from another PLL or a pin-driven dedicated global or regional clock. An internally
generated global signal cannot drive the PLL.
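The input-source rule in note (4) can be expressed as a short check. This is a sketch under stated assumptions: the enumeration and function names are hypothetical and exist only to restate the note; they do not correspond to any Quartus II interface.

```python
from enum import Enum


class ClockSource(Enum):
    """Source categories named in note (4); the identifiers are illustrative."""
    PLL_OUTPUT = "output from another PLL"
    DEDICATED_CLOCK_PIN = "pin-driven dedicated global or regional clock"
    CLOCK_CONTROL_BLOCK = "clock control block"
    INTERNAL_LOGIC = "internally generated global signal"


# Sources that may drive the PLL's global or regional clock input directly.
DIRECT_SOURCES = {ClockSource.PLL_OUTPUT, ClockSource.DEDICATED_CLOCK_PIN}


def can_drive_pll_input(source, control_block_fed_by=None):
    """Return True if the source is allowed to drive the PLL input.

    A clock control block qualifies only when it is itself fed by one of
    the direct sources; internally generated signals never qualify.
    """
    if source in DIRECT_SOURCES:
        return True
    if source is ClockSource.CLOCK_CONTROL_BLOCK:
        return control_block_fed_by in DIRECT_SOURCES
    return False
```

For example, a clock control block fed by a dedicated clock pin passes the check, while the same block fed by internal logic does not.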
Fast PLLs
Arria GX devices contain up to four fast PLLs with high-speed serial interfacing
ability. Fast PLLs offer high-speed outputs to manage the high-speed differential I/O
interfaces. Figure 2–66 shows a diagram of the fast PLL.
Figure 2–66. Arria GX Device Fast PLL
[Figure: fast PLL block diagram with global or regional clock inputs (1), clock switchover circuitry (4), counters ÷n, ÷k, ÷c2, and ÷c3, VCO phase selection selectable at each PLL output port, post-scale counters, outputs diffioclk0, diffioclk1, load_en0, and load_en1, global clocks, regional clocks, and a connection to the DPA block.]
Notes to Figure 2–66:
(1) The global or regional clock input can be driven by an output from another PLL, a pin-driven dedicated global or regional clock, or through a clock
control block provided the clock control block is fed by an output from another PLL or a pin-driven dedicated global or regional clock. An internally
generated global signal cannot drive the PLL.
(2) In high-speed differential I/O support mode, this high-speed PLL clock feeds the serializer/deserializer (SERDES) circuitry. Arria GX devices only
support one rate of data transfer per fast PLL in high-speed differential I/O support mode.
(3) This signal is a differential I/O SERDES control signal.
(4) Arria GX fast PLLs only support manual clock switchover.
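Note (2) limits each fast PLL to one data rate in high-speed differential I/O support mode. A minimal sketch of a design-side check follows; the channel list format and the function name are hypothetical and are used only to illustrate the constraint.

```python
from collections import defaultdict


def plls_with_mixed_rates(channels):
    """Return the fast PLL IDs that are asked to serve more than one data rate.

    channels: iterable of (fast_pll_id, data_rate_mbps) tuples, a made-up
    format for this illustration.
    """
    rates_by_pll = defaultdict(set)
    for pll_id, rate_mbps in channels:
        rates_by_pll[pll_id].add(rate_mbps)
    return sorted(pll for pll, rates in rates_by_pll.items() if len(rates) > 1)
```

An empty result means that every fast PLL in the list serves a single rate, which is what the note requires.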
For more information about enhanced and fast PLLs, refer to the PLLs in Arria GX
Devices chapter. For more information about high-speed differential I/O support,
refer to “High-Speed Differential I/O with DPA Support” on page 2–99.
I/O Structure
Arria GX IOEs provide many features, including:
■ Dedicated differential and single-ended I/O buffers
The IOE in Arria GX devices contains a bidirectional I/O buffer, six registers, and a
latch for a complete embedded bidirectional single data rate or DDR transfer.
Figure 2–67 shows the Arria GX IOE structure. The IOE contains two input registers
(plus a latch), two output registers, and two output enable registers. The design can
use both input registers and the latch to capture DDR input and both output registers
to drive DDR outputs. Additionally, the design can use the output enable (OE)
register for fast clock-to-output enable timing. The negative edge-clocked OE register
is used for DDR SDRAM interfacing. The Quartus II software automatically
duplicates a single OE register that controls multiple output or bidirectional pins.
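The DDR input path described above can be pictured with a small behavioral model. This is only a sketch: the text states that both input registers and the latch capture DDR input, but the edge assignment used here (register A on the rising edge, register B on the falling edge, and the latch holding the register B sample so that both halves are presented together) is an assumption, and the function name is hypothetical.

```python
def capture_ddr(samples):
    """Pair up DDR pin samples the way the IOE input path is assumed to.

    samples: pin values in time order, alternating rising-edge sample,
    falling-edge sample. Returns one (a, b) pair per clock cycle.
    """
    if len(samples) % 2:
        raise ValueError("expected one rising and one falling sample per cycle")
    reg_a = samples[0::2]  # input register A: rising-edge samples (assumed)
    reg_b = samples[1::2]  # input register B: falling-edge samples (assumed)
    latch = reg_b          # latch holds the B sample until A's next capture (assumed)
    return list(zip(reg_a, latch))
```

Each returned pair corresponds to the two data bits that the logic array would receive for one clock cycle in this model.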
Figure 2–67. Arria GX IOE Structure
[Figure: IOE block diagram with the logic array connections OE, Output A, Output B, Input A, Input B, and CLK; two OE registers, two output registers, two input registers, and one input latch with an ENA input.]
The IOEs are located in I/O blocks around the periphery of the Arria GX device.
There are up to four IOEs per row I/O block and four IOEs per column I/O block.
Row I/O blocks drive row, column, or direct link interconnects. Column I/O blocks
drive column interconnects.
Figure 2–69 shows how a column I/O block connects to the logic array.
Figure 2–69. Column I/O Block Connection to the Interconnect
[Figure: vertical I/O block containing up to four IOEs, 32 data and control signals from the logic array (1), I/O block local interconnect, IO_dataina[3..0], IO_datainb[3..0], io_clk[7..0], LABs with LAB local interconnect, R4 & R24 interconnects, and C4 & C16 interconnects.]
Note to Figure 2–69:
(1) The 32 data and control signals consist of eight data out lines: four lines each for DDR applications io_dataouta[3..0] and
io_dataoutb[3..0], four output enables io_oe[3..0], four input clock enables io_ce_in[3..0], four output clock enables
io_ce_out[3..0], four clocks io_clk[3..0], four asynchronous clear and preset signals io_aclr/apreset[3..0], and four
synchronous clear and preset signals io_sclr/spreset[3..0].
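The signal count in note (1) can be tallied as follows. This is a minimal sketch; the dictionary and helper names are hypothetical, while the bus names and widths come from the note itself.

```python
# Bus names and widths as listed in note (1) to Figure 2-69.
IO_BLOCK_SIGNALS = {
    "io_dataouta": 4,
    "io_dataoutb": 4,
    "io_oe": 4,
    "io_ce_in": 4,
    "io_ce_out": 4,
    "io_clk": 4,
    "io_aclr/apreset": 4,
    "io_sclr/spreset": 4,
}


def total_signals(signals):
    """Sum the widths of all buses feeding one I/O block."""
    return sum(signals.values())


def expand_bus(name, width):
    """List the individual bits of a bus, for example io_oe[0] to io_oe[3]."""
    return [f"{name}[{bit}]" for bit in range(width)]
```

Eight buses of four bits each give the 32 data and control signals stated in the note.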
There are 32 control and data signals that feed each row or column I/O block. These
control and data signals are driven from the logic array. The row or column IOE
clocks, io_clk[7..0], provide a dedicated routing resource for low-skew,
high-speed clocks. I/O clocks are generated from global or regional clocks (refer to
Figure 2–70 shows the signal paths through the I/O block.
Figure 2–70. Signal Path Through the I/O Block
[Figure: signals to the logic array (io_dataina, io_datainb), signals from the logic array (io_oe, io_ce_in, io_ce_out, io_aclr, io_sclr, io_clk, io_dataouta, io_dataoutb), row or column io_clk[7..0], and the control signal selection block that provides oe, ce_in, ce_out, aclr/apreset, sclr/spreset, clk_in, and clk_out to the IOE and to other IOEs.]
Each IOE contains its own control signal selection for the following control signals:
oe, ce_in, ce_out, aclr/apreset, sclr/spreset, clk_in, and clk_out.
Figure 2–71 shows the control signal selection.
Figure 2–71. Control Signal Selection per IOE (Note 1)
[Figure: control signal selection multiplexers fed by the dedicated I/O clock [7..0] and the local interconnect.]
Note to Figure 2–71:
(1) Control signals ce_in, ce_out, aclr/apreset, sclr/spreset, and oe can be global signals even though their control selection
multiplexers are not directly fed by the ioe_clk[7..0] signals. The ioe_clk signals can drive the I/O local interconnect, which then drives
In normal bidirectional operation, you can use the input register for input data
requiring fast setup times. The input register can have its own clock input and clock
enable separate from the OE and output registers. The output register can be used for
data requiring fast clock-to-output performance. You can use the OE register for fast
clock-to-output enable timing. The OE and output register share the same clock
source and the same clock enable source from the local interconnect in the associated
LAB, dedicated I/O clocks, and the column and row interconnects. Figure 2–72 shows
the IOE in bidirectional configuration.
Figure 2–72. Arria GX IOE in Bidirectional I/O Configuration (Note 1)
Notes to Figure 2–72:
(1) All input signals to the IOE can be inverted at the IOE.
(2) The optional PCI clamp is only available on column I/O pins.
The Arria GX device IOE includes programmable delays that can be activated to
ensure input IOE register-to-logic array register transfers, input pin-to-logic array
register transfers, or output IOE register-to-pin transfers.