ALTERA Cyclone II User Manual

CII51002-3.1
2. Cyclone II Architecture

Functional Description

Cyclone® II devices contain a two-dimensional row- and column-based architecture to implement custom logic. Column and row interconnects of varying speeds provide signal interconnects between logic array blocks (LABs), embedded memory blocks, and embedded multipliers.
The logic array consists of LABs, with 16 logic elements (LEs) in each LAB. An LE is a small unit of logic providing efficient implementation of user logic functions. LABs are grouped into rows and columns across the device. Cyclone II devices range in density from 4,608 to 68,416 LEs.
Cyclone II devices provide a global clock network and up to four phase-locked loops (PLLs). The global clock network consists of up to 16 global clock lines that drive throughout the entire device. The global clock network can provide clocks for all resources within the device, such as input/output elements (IOEs), LEs, embedded multipliers, and embedded memory blocks. The global clock lines can also be used for other high fan-out signals. Cyclone II PLLs provide general-purpose clocking with clock synthesis and phase shifting as well as external outputs for high-speed differential I/O support.
M4K memory blocks are true dual-port memory blocks with 4K bits of memory plus parity (4,608 bits). These blocks provide dedicated true dual-port, simple dual-port, or single-port memory up to 36 bits wide at up to 260 MHz. These blocks are arranged in columns across the device in between certain LABs. Cyclone II devices offer between 119 and 1,152 Kbits of embedded memory.
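The capacity figures above can be checked with a little arithmetic. The sketch below assumes one parity bit per data byte and assumes that the parity bits are only reachable at port widths that are multiples of 9; neither assumption is stated in this section, and the name `m4k_depth` is illustrative only.

```python
DATA_BITS = 4 * 1024                   # 4K data bits per M4K block
PARITY_BITS = DATA_BITS // 8           # assumed: one parity bit per data byte
TOTAL_BITS = DATA_BITS + PARITY_BITS   # 4,608 bits, matching the text


def m4k_depth(width: int) -> int:
    """Return how many words one block holds at the given port width.

    Assumption (not from this section): widths that are multiples of 9
    use the parity bits, all other widths use only the data bits.
    """
    if not 1 <= width <= 36:
        raise ValueError("M4K ports are at most 36 bits wide")
    usable = TOTAL_BITS if width % 9 == 0 else DATA_BITS
    if usable % width != 0:
        raise ValueError(f"width {width} does not divide {usable} bits evenly")
    return usable // width
```

Under these assumptions a 36-bit-wide port gives 128 words per block, and a 1-bit-wide port gives 4,096 words.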
Each embedded multiplier block can implement up to either two 9 × 9-bit multipliers, or one 18 × 18-bit multiplier with up to 250-MHz performance. Embedded multipliers are arranged in columns across the device.
Each Cyclone II device I/O pin is fed by an IOE located at the ends of LAB rows and columns around the periphery of the device. I/O pins support various single-ended and differential I/O standards, such as the 66- and 33-MHz, 64- and 32-bit PCI standard, PCI-X, and the LVDS I/O standard at a maximum data rate of 805 megabits per second (Mbps) for inputs and 640 Mbps for outputs. Each IOE contains a bidirectional I/O buffer and three registers for registering input, output, and output-enable signals. Dual-purpose DQS, DQ, and DM pins along with delay chains (used to phase-align double data rate (DDR) signals) provide interface support for external memory devices such as DDR, DDR2, and single data rate (SDR) SDRAM, and QDRII SRAM devices at up to 167 MHz.
Figure 2–1 shows a diagram of the Cyclone II EP2C20 device.
Figure 2–1. Cyclone II EP2C20 Device Block Diagram
(Block diagram showing the PLLs, IOEs, embedded multipliers, M4K blocks, and logic array.)
The number of M4K memory blocks, embedded multiplier blocks, PLLs, rows, and columns varies per device.

Logic Elements

The smallest unit of logic in the Cyclone II architecture, the LE, is compact and provides advanced features with efficient logic utilization. Each LE features:
A four-input look-up table (LUT), which is a function generator that can implement any function of four variables
A programmable register
A carry chain connection
A register chain connection
The ability to drive all types of interconnects: local, row, column, register chain, and direct link interconnects
Support for register packing
Support for register feedback
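To illustrate why a four-input LUT can implement any function of four variables, the sketch below stores a function as a 16-entry truth table and evaluates it by indexing. The bit ordering of the inputs and the function names are assumptions made for the example; they do not describe the device's configuration format.

```python
def make_lut4(func):
    """Build the 16-entry truth table of a four-input Boolean function."""
    table = []
    for index in range(16):
        # Assumed ordering: bit 0 of the index is data1, bit 3 is data4.
        bits = [(index >> k) & 1 for k in range(4)]
        table.append(int(bool(func(*bits))))
    return table


def lut4_eval(table, data1, data2, data3, data4):
    """Look up the output for one combination of the four data inputs."""
    index = data1 | (data2 << 1) | (data3 << 2) | (data4 << 3)
    return table[index]


# Example: a four-input odd-parity function.
parity = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
```

Because the table has one entry per input combination, any of the 2^16 possible four-input functions fits in the same structure.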
2–2 Altera Corporation Cyclone II Device Handbook, Volume 1 February 2007
Figure 2–2 shows a Cyclone II LE.
Figure 2–2. Cyclone II LE
(Block diagram: the data1 through data4 inputs and the LAB carry-in feed the look-up table (LUT) and carry chain; synchronous load and clear logic, asynchronous clear logic (labclr1, labclr2, and the chip-wide reset DEV_CLRn), and clock and clock enable select (labclk1, labclk2, labclkena1, labclkena2) control the programmable register; outputs go to row, column, and direct link routing, local routing, the register chain output, and the LAB carry-out, with register bypass, packed register select, and register feedback paths.)
Each LE’s programmable register can be configured for D, T, JK, or SR operation. Each register has data, clock, clock enable, and clear inputs. Signals that use the global clock network, general-purpose I/O pins, or any internal logic can drive the register’s clock and clear control signals. Either general-purpose I/O pins or internal logic can drive the clock enable. For combinational functions, the LUT output bypasses the register and drives directly to the LE outputs.
Each LE has three outputs that drive the local, row, and column routing resources. The LUT or register output can drive these three outputs independently. Two LE outputs drive column or row and direct link routing connections and one drives local interconnect resources, allowing the LUT to drive one output while the register drives another output. This feature, register packing, improves device utilization because the device can use the register and the LUT for unrelated functions. When using register packing, the LAB-wide synchronous load control signal is not available. See “LAB Control Signals” on page 2–8 for more information.
Another special packing mode allows the register output to feed back into the LUT of the same LE so that the register is packed with its own fan-out LUT, providing another mechanism for improved fitting. The LE can also drive out registered and unregistered versions of the LUT output.
In addition to the three general routing outputs, the LEs within an LAB have register chain outputs. Register chain outputs allow registers within the same LAB to cascade together. The register chain output allows an LAB to use LUTs for a single combinational function and the registers to be used for an unrelated shift register implementation. These resources speed up connections between LABs while saving local interconnect resources. See “MultiTrack Interconnect” on page 2–10 for more information on register chain connections.

LE Operating Modes

The Cyclone II LE operates in one of the following modes:
Normal mode
Arithmetic mode
Each mode uses LE resources differently. In each mode, six available inputs to the LE—the four data inputs from the LAB local interconnect, the LAB carry-in from the previous carry-chain LAB, and the register chain connection—are directed to different destinations to implement the desired logic function. LAB-wide signals provide clock, asynchronous clear, synchronous clear, synchronous load, and clock enable control for the register. These LAB-wide signals are available in all LE modes.
The Quartus® II software, in conjunction with parameterized functions such as library of parameterized modules (LPM) functions, automatically chooses the appropriate mode for common functions such as counters, adders, subtractors, and arithmetic functions. If required, you can also create special-purpose functions that specify which LE operating mode to use for optimal performance.
Normal Mode
The normal mode is suitable for general logic applications and combinational functions. In normal mode, four data inputs from the LAB local interconnect are inputs to a four-input LUT (see Figure 2–3). The Quartus II Compiler automatically selects the carry-in or the data3 signal as one of the inputs to the LUT. LEs in normal mode support packed registers and register feedback.
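Figure 2–3. LE in Normal Mode
(Block diagram: data1, data2, data3 or cin (from cout of previous LE), and data4 feed a four-input LUT; the LUT or the packed register input feeds the register, which has LAB-wide sload, sclear, clock, ena, and aclr controls; outputs go to row, column, and direct link routing, local routing, and the register chain output, with a register feedback path.)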
Arithmetic Mode
The arithmetic mode is ideal for implementing adders, counters, accumulators, and comparators. An LE in arithmetic mode implements a 2-bit full adder and basic carry chain (see Figure 2–4). LEs in arithmetic mode can drive out registered and unregistered versions of the LUT output. Register feedback and register packing are supported when LEs are used in arithmetic mode.
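The adder and carry chain described above can be sketched behaviorally. The sketch assumes that the two three-input LUTs of Figure 2–4 produce the sum and the carry-out of data1, data2, and cin, so that each LE handles one bit position; the function names are illustrative and are not Quartus II or LPM identifiers.

```python
def le_arithmetic(data1: int, data2: int, cin: int):
    """One LE in arithmetic mode: returns (sum_bit, cout)."""
    sum_bit = data1 ^ data2 ^ cin                            # first three-input LUT
    cout = (data1 & data2) | (data1 & cin) | (data2 & cin)   # second three-input LUT
    return sum_bit, cout


def carry_chain_add(a: int, b: int, width: int):
    """Add two unsigned integers by rippling the carry through `width` LEs."""
    carry = 0
    result = 0
    for bit in range(width):
        s, carry = le_arithmetic((a >> bit) & 1, (b >> bit) & 1, carry)
        result |= s << bit
    return result, carry
```

For example, `carry_chain_add(15, 1, 4)` models a 4-bit adder that overflows: the sum bits are all zero and the final carry-out is 1.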
Figure 2–4. LE in Arithmetic Mode
(Block diagram: data1, data2, and cin (from cout of previous LE) feed two three-input LUTs, with a cout output; the register has LAB-wide sload, sclear, clock, ena, and aclr controls and a register chain connection; outputs go to row, column, and direct link routing, local routing, and the register chain output, with a register feedback path.)
The Quartus II Compiler automatically creates carry chain logic during design processing, or you can create it manually during design entry. Parameterized functions such as LPM functions automatically take advantage of carry chains for the appropriate functions.
The Quartus II Compiler creates carry chains longer than 16 LEs by automatically linking LABs in the same column. For enhanced fitting, a long carry chain runs vertically, which allows fast horizontal connections to M4K memory blocks or embedded multipliers through direct link interconnects. For example, if a design has a long carry chain in an LAB column next to a column of M4K memory blocks, any LE output can feed an adjacent M4K memory block through the direct link interconnect. If the carry chains ran horizontally instead, any LAB not next to the column of M4K memory blocks would have to use other row or column interconnects to drive an M4K memory block. A carry chain continues as far as a full column.
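As a rough worked example of how a long carry chain maps onto a LAB column, the sketch below counts the LABs a chain occupies. It assumes one LE per chain bit and a chain that starts at the first LE of a LAB; the function name is hypothetical.

```python
import math

LES_PER_LAB = 16  # from the text: each LAB contains 16 LEs


def labs_for_carry_chain(chain_length_les: int) -> int:
    """Number of vertically adjacent LABs a carry chain occupies.

    Assumes the chain starts at the first LE of a LAB; a chain that
    starts part-way through a LAB can spill into one more.
    """
    if chain_length_les < 1:
        raise ValueError("a carry chain needs at least one LE")
    return math.ceil(chain_length_les / LES_PER_LAB)
```

Under these assumptions a 32-bit adder fills two LABs in one column, and a 64-bit adder fills four.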

Logic Array Blocks

Each LAB consists of the following:
16 LEs
LAB control signals
LE carry chains
Register chains
Local interconnect
The local interconnect transfers signals between LEs in the same LAB. Register chain connections transfer the output of one LE’s register to the adjacent LE’s register within an LAB. The Quartus II Compiler places associated logic within an LAB or adjacent LABs, allowing the use of local and register chain connections for performance and area efficiency.
Figure 2–5 shows the Cyclone II LAB.
Figure 2–5. Cyclone II LAB Structure
(Block diagram: the LAB and its local interconnect, bordered by row and column interconnects, with direct link interconnects to and from the adjacent blocks.)

LAB Interconnects

The LAB local interconnect can drive LEs within the same LAB. The LAB local interconnect is driven by column and row interconnects and LE outputs within the same LAB. Neighboring LABs, PLLs, M4K RAM blocks, and embedded multipliers from the left and right can also drive an LAB’s local interconnect through the direct link connection. The direct link connection feature minimizes the use of row and column interconnects, providing higher performance and flexibility. Each LE can drive 48 LEs through fast local and direct link interconnects. Figure 2–6 shows the direct link connection.
Figure 2–6. Direct Link Connection
(Block diagram: an LAB and its local interconnect, with direct link interconnects from the left and right LAB, M4K memory block, embedded multiplier, PLL, or IOE output, and direct link interconnects to the left and to the right.)

LAB Control Signals

Each LAB contains dedicated logic for driving control signals to its LEs. The control signals include:
Two clocks
Two clock enables
Two asynchronous clears
One synchronous clear
One synchronous load
This gives a maximum of seven control signals at a time. When using the LAB-wide synchronous load, the clkena of labclk1 is not available. Additionally, register packing and synchronous load cannot be used simultaneously.
Each LAB can have up to four non-global control signals. Additional LAB control signals can be used as long as they are global signals.
Synchronous clear and load signals are useful for implementing counters and other functions. The synchronous clear and synchronous load signals are LAB-wide signals that affect all registers in the LAB.
Each LAB can use two clocks and two clock enable signals. Each LAB’s clock and clock enable signals are linked. For example, any LE in a particular LAB using the labclk1 signal also uses labclkena1. If the LAB uses both the rising and falling edges of a clock, it also uses both LAB-wide clock signals. De-asserting the clock enable signal turns off the LAB-wide clock.
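The restrictions on LAB-wide control signals listed above can be collected into a small checker. This is only a sketch of the rules as stated in the text; the class, the field names, and the way a configuration is described are hypothetical and do not correspond to any Quartus II data structure. Signal names follow the labels used in the figures.

```python
from dataclasses import dataclass, field

# LAB-wide control signal names, following the labels in the figures.
LAB_SIGNALS = {
    "labclk1", "labclk2", "labclkena1", "labclkena2",
    "labclr1", "labclr2", "synclr", "syncload",
}


@dataclass
class LabControlUse:
    """Hypothetical description of the control signals one LAB uses."""
    used: set = field(default_factory=set)        # subset of LAB_SIGNALS
    non_global: set = field(default_factory=set)  # used signals not on global lines
    register_packing: bool = False


def check_lab(cfg: LabControlUse) -> list:
    """Return a list of rule violations (an empty list means the use is legal)."""
    problems = []
    unknown = cfg.used - LAB_SIGNALS
    if unknown:
        problems.append(f"unknown signals: {sorted(unknown)}")
    if not cfg.non_global <= cfg.used:
        problems.append("non-global signals must be a subset of the used signals")
    if "syncload" in cfg.used and "labclkena1" in cfg.used:
        problems.append("labclkena1 is not available with synchronous load")
    if "syncload" in cfg.used and cfg.register_packing:
        problems.append("register packing and synchronous load are exclusive")
    if len(cfg.non_global) > 4:
        problems.append("more than four non-global control signals")
    return problems
```

The limit of seven simultaneous signals needs no separate check here: of the eight signals, syncload and labclkena1 can never be used together.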
The LAB row clocks [5..0] and LAB local interconnect generate the LAB-wide control signals. The MultiTrack interconnect’s inherent low skew allows clock and control signal distribution in addition to data. Figure 2–7 shows the LAB control signal generation circuit.
Figure 2–7. LAB-Wide Control Signals
(Circuit diagram: six dedicated LAB row clocks and local interconnect inputs generate labclk1, labclk2, labclkena1, labclkena2, syncload, synclr, labclr1, and labclr2.)
LAB-wide signals control the logic for the register’s clear signal. The LE directly supports an asynchronous clear function. Each LAB supports up to two asynchronous clear signals (labclr1 and labclr2).
A LAB-wide asynchronous load signal to control the logic for the register’s preset signal is not available. The register preset is achieved by using a NOT gate push-back technique. Cyclone II devices can only support either a preset or asynchronous clear signal.
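The NOT gate push-back technique mentioned above can be illustrated with a behavioral model. The sketch assumes the usual form of the technique, in which the register stores the complement of the logical value so that a physical clear reads back as a logical 1; the class names are hypothetical.

```python
class ClearOnlyRegister:
    """A D register whose only asynchronous control is clear (to 0)."""

    def __init__(self):
        self.q = 0

    def clock(self, d: int) -> None:
        self.q = d & 1

    def clear(self) -> None:
        self.q = 0


class PresetByPushBack:
    """Emulates an asynchronous preset using a clear-only register.

    The stored value is the complement of the logical value: the D
    input is inverted on the way in and Q is inverted on the way out.
    """

    def __init__(self):
        self._reg = ClearOnlyRegister()

    def clock(self, d: int) -> None:
        self._reg.clock(d ^ 1)   # NOT gate pushed back to the D input

    def preset(self) -> None:
        self._reg.clear()        # physical clear ...

    @property
    def q(self) -> int:
        return self._reg.q ^ 1   # ... reads back as logical 1
```

Because the single asynchronous control is consumed by the emulated preset, the same register cannot also offer an asynchronous clear, which matches the either-or restriction stated above.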
In addition to the clear port, Cyclone II devices provide a chip-wide reset pin (DEV_CLRn) that resets all registers in the device. An option set before compilation in the Quartus II software controls this pin. This chip-wide reset overrides all other control signals.
MultiTrack Interconnect
In the Cyclone II architecture, connections between LEs, M4K memory blocks, embedded multipliers, and device I/O pins are provided by the MultiTrack interconnect structure with DirectDrive™ technology. The MultiTrack interconnect consists of continuous, performance-optimized routing lines of different speeds used for inter- and intra-design block connectivity. The Quartus II Compiler automatically places critical paths on faster interconnects to improve design performance.
DirectDrive technology is a deterministic routing technology that ensures identical routing resource usage for any function regardless of placement within the device. The MultiTrack interconnect and DirectDrive technology simplify the integration stage of block-based designing by eliminating the re-optimization cycles that typically follow design changes and additions.
The MultiTrack interconnect consists of row (direct link, R4, and R24) and column (register chain, C4, and C16) interconnects that span fixed distances. A routing structure with fixed-length resources for all devices allows predictable and repeatable performance when migrating through different device densities.

Row Interconnects

Dedicated row interconnects route signals to and from LABs, PLLs, M4K memory blocks, and embedded multipliers within the same row. These row resources include:
Direct link interconnects between LABs and adjacent blocks
R4 interconnects traversing four blocks to the right or left
R24 interconnects for high-speed access across the length of the device
The direct link interconnect allows an LAB, M4K memory block, or embedded multiplier block to drive into the local interconnect of its left and right neighbors. Only one side of a PLL block interfaces with direct link and row interconnects. The direct link interconnect provides fast communication between adjacent LABs and/or blocks without using row interconnect resources.
The R4 interconnects span four LABs, three LABs and one M4K memory block, or three LABs and one embedded multiplier to the right or left of a source LAB. These resources are used for fast row connections in a four-LAB region. Every LAB has its own set of R4 interconnects to drive either left or right. Figure 2–8 shows R4 interconnect connections from an LAB. R4 interconnects can drive and be driven by LABs, M4K memory blocks, embedded multipliers, PLLs, and row IOEs. For LAB interfacing, a primary LAB or LAB neighbor (see Figure 2–8) can drive a given R4 interconnect. For R4 interconnects that drive to the right, the primary LAB and right neighbor can drive on to the interconnect. For R4 interconnects that drive to the left, the primary LAB and its left neighbor can drive on to the interconnect. R4 interconnects can drive other R4 interconnects to extend the range of LABs they can drive. Additionally, R4 interconnects can drive R24 interconnects, C4, and C16 interconnects for connections from one row to another.
Figure 2–8. R4 Interconnect Connections
(Block diagram: a primary LAB (2) between two LAB neighbors, with an R4 interconnect driving left, an R4 interconnect driving right, and C4 column interconnects (1); an adjacent LAB can drive onto another LAB's R4 interconnect.)
Notes to Figure 2–8:
(1) C4 interconnects can drive R4 interconnects.
(2) This pattern is repeated for every LAB in the LAB row.
R24 row interconnects span 24 LABs and provide the fastest resource for long row connections between non-adjacent LABs, M4K memory blocks, dedicated multipliers, and row IOEs. R24 row interconnects drive to other row or column interconnects at every fourth LAB. R24 row interconnects drive LAB local interconnects via R4 and C4 interconnects and do not drive directly to LAB local interconnects. R24 interconnects can drive R24, R4, C16, and C4 interconnects.

Column Interconnects

The column interconnect operates similarly to the row interconnect. Each column of LABs is served by a dedicated column interconnect, which vertically routes signals to and from LABs, M4K memory blocks, embedded multipliers, and row and column IOEs. These column resources include:
Register chain interconnects within an LAB
C4 interconnects traversing a distance of four blocks in an up and down direction
C16 interconnects for high-speed vertical routing through the device
Cyclone II devices include an enhanced interconnect structure within LABs for routing LE output to LE input connections faster using register chain connections. The register chain connection allows the register output of one LE to connect directly to the register input of the next LE in the LAB for fast shift registers. The Quartus II Compiler automatically takes advantage of these resources to improve utilization and performance. Figure 2–9 shows the register chain interconnects.
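The register chain connection described above amounts to a shift register built from the registers of consecutive LEs. The following behavioral sketch models that; the class name and its interface are illustrative only.

```python
class RegisterChain:
    """Shift register built from the registers of consecutive LEs in one LAB."""

    def __init__(self, length: int = 16):
        if not 1 <= length <= 16:
            raise ValueError("a LAB holds 16 LEs")
        self.regs = [0] * length

    def clock(self, serial_in: int) -> int:
        """Shift by one position on a clock edge; return the bit shifted out."""
        shifted_out = self.regs[-1]
        # Each register loads the output of the register before it.
        self.regs = [serial_in & 1] + self.regs[:-1]
        return shifted_out
```

In this model only the registers take part in the shift, which mirrors the point made earlier that the LUTs of the same LEs stay available for an unrelated combinational function.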
Figure 2–9. Register Chain Interconnects
(Block diagram: LE 1 through LE 16 in one LAB, showing local interconnect routing among the LEs, carry chain routing to the adjacent LE, and register chain routing to the adjacent LE's register input.)
The C4 interconnects span four LABs, M4K blocks, or embedded multipliers up or down from a source LAB. Every LAB has its own set of C4 interconnects to drive either up or down. Figure 2–10 shows the C4 interconnect connections from an LAB in a column. The C4 interconnects can drive and be driven by all types of architecture blocks, including PLLs, M4K memory blocks, embedded multiplier blocks, and column and row IOEs. For LAB interconnection, a primary LAB or its LAB neighbor (see Figure 2–10) can drive a given C4 interconnect. C4 interconnects can drive each other to extend their range as well as drive row interconnects for column-to-column connections.
Figure 2–10. C4 Interconnect Connections Note (1)
(Block diagram: a primary LAB and its LAB neighbor with local and row interconnects, a C4 interconnect driving up, and a C4 interconnect driving down; each C4 interconnect drives local and R4 interconnects up to four rows, and an adjacent LAB can drive onto a neighboring LAB's C4 interconnect.)
Note to Figure 2–10:
(1) Each C4 interconnect can drive either up or down four rows.
C16 column interconnects span a length of 16 LABs and provide the fastest resource for long column connections between LABs, M4K memory blocks, embedded multipliers, and IOEs. C16 column interconnects drive to other row and column interconnects at every fourth LAB. C16 column interconnects drive LAB local interconnects via C4 and R4 interconnects and do not drive LAB local interconnects directly. C16 interconnects can drive R24, R4, C16, and C4 interconnects.
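The statements above about which routing resource can drive which can be gathered into a small graph. The sketch below encodes only relationships named in the text and in the figure labels, and uses a breadth-first search to count resource-to-resource hops; it is an illustration of the connectivity rules and not a model of the Quartus II router.

```python
from collections import deque

# Which routing resource can drive which, as stated in the text above.
# R4 and C4 driving the local interconnect is inferred from the statements
# that R24 and C16 reach LAB local interconnects via R4 and C4.
# C4 driving R4 is taken from the Figure 2-10 label.
DRIVES = {
    "R4":  {"R4", "R24", "C4", "C16", "local"},
    "R24": {"R24", "R4", "C16", "C4"},
    "C4":  {"C4", "R4", "local"},
    "C16": {"R24", "R4", "C16", "C4"},
    "direct link": {"local"},
    "local": set(),
}


def fewest_hops(source: str, target: str = "local"):
    """Breadth-first search for the fewest resource-to-resource hops."""
    queue = deque([(source, 0)])
    seen = {source}
    while queue:
        node, hops = queue.popleft()
        if node == target:
            return hops
        for nxt in sorted(DRIVES[node]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None
```

With this table, a signal on an R24 or C16 line needs two hops to reach an LAB local interconnect, because neither long line drives the local interconnect directly.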

Device Routing

All embedded blocks communicate with the logic array in a manner similar to LAB-to-LAB interfaces. Each block (for example, M4K memory, embedded multiplier, or PLL) connects to row and column interconnects and has local interconnect regions driven by row and column interconnects. These blocks also have direct link interconnects for fast connections to and from a neighboring LAB.
Table 2–1 shows the Cyclone II device’s routing scheme.
Table 2–1. Cyclone II Device Routing Scheme
(Source-by-destination connectivity matrix. The sources and the destinations are the same set of resources: register chain, local interconnect, direct link interconnect, R4 interconnect, R24 interconnect, C4 interconnect, C16 interconnect, LE, M4K RAM block, embedded multiplier, PLL, column IOE, and row IOE.)

Global Clock Network & Phase-Locked Loops
Cyclone II devices provide global clock networks and up to four PLLs for a complete clock management solution. Cyclone II clock network features include:
Up to 16 global clock networks
Up to four PLLs
Global clock network dynamic clock source selection
Global clock network dynamic enable and disable
Each global clock network has a clock control block to select from a number of input clock sources (PLL clock outputs, CLK[] pins, DPCLK[] pins, and internal logic) to drive onto the global clock network. Table 2–2 lists how many PLLs, CLK[] pins, DPCLK[] pins, and global clock networks are available in each Cyclone II device. CLK[] pins are dedicated clock pins and DPCLK[] pins are dual-purpose clock pins.
Table 2–2. Cyclone II Device Clock Resources
Device    Number of PLLs    Number of CLK Pins    Number of DPCLK Pins    Number of Global Clock Networks
EP2C5     2                 8                     8                       8
EP2C8     2                 8                     8                       8
EP2C15    4                 16                    20                      16
EP2C20    4                 16                    20                      16
EP2C35    4                 16                    20                      16
EP2C50    4                 16                    20                      16
EP2C70    4                 16                    20                      16
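The clock control block described before Table 2–2 can be pictured as a source selector with an enable. The following behavioral sketch is only an illustration: the class, the source names such as `PLL1_c0`, and the choice to drive the line low when disabled are all assumptions and are not taken from this chapter.

```python
class ClockControlBlock:
    """Behavioral sketch of one clock control block (names are hypothetical)."""

    SOURCE_KINDS = ("pll", "clk_pin", "dpclk_pin", "internal")

    def __init__(self, sources: dict):
        # `sources` maps a source name to its kind, e.g. {"CLK0": "clk_pin"}.
        for name, kind in sources.items():
            if kind not in self.SOURCE_KINDS:
                raise ValueError(f"{name}: unknown source kind {kind!r}")
        self.sources = dict(sources)
        self.selected = None
        self.enabled = True

    def select(self, name: str) -> None:
        """Dynamic clock source selection."""
        if name not in self.sources:
            raise KeyError(name)
        self.selected = name

    def set_enable(self, on: bool) -> None:
        """Dynamic enable or disable of the global clock network."""
        self.enabled = bool(on)

    def output(self, levels: dict) -> int:
        """Level driven onto the global clock line for the given source levels."""
        # Assumption: a disabled or unselected network is held low.
        if not self.enabled or self.selected is None:
            return 0
        return levels[self.selected] & 1
```

A caller would create one block per global clock network, call `select` to switch sources at run time, and call `set_enable(False)` to gate the network off.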
Figures 2–11 and 2–12 show the location of the Cyclone II PLLs, CLK[] inputs, DPCLK[] pins, and clock control blocks.
Figure 2–11. EP2C5 & EP2C8 PLL, CLK[], DPCLK[] & Clock Control Block Locations
(Device floorplan: PLL 1 and PLL 2, CLK[3..0] and CLK[7..4], the DPCLK pins (DPCLK0, DPCLK1, DPCLK2, DPCLK4, DPCLK6, DPCLK7, DPCLK8, and DPCLK10), and the clock control blocks (1) that drive GCLK[7..0].)
Note to Figure 2–11:
(1) There are four clock control blocks on each side.
Figure 2–12. EP2C15 & Larger PLL, CLK[], DPCLK[] & Clock Control Block Locations
(Device floorplan: PLL 1 through PLL 4, CLK[3..0], CLK[7..4], CLK[11..8], and CLK[15..12], the DPCLK pins (DPCLK0, DPCLK1, DPCLK[3..2], DPCLK[5..4], DPCLK6, DPCLK7, DPCLK[9..8], and DPCLK[11..10]), the corner pins CDPCLK0 through CDPCLK7 (2), and the clock control blocks (1) that drive GCLK[15..0].)
Notes to Figure 2–12:
(1) There are four clock control blocks on each side.
(2) Only one of the corner CDPCLK pins in each corner can feed the clock control block at a time. The other CDPCLK pins can be used as general-purpose I/O pins.