NOTICE of DISCLAIMER: The information given in this document is believed to be accurate
and reliable. However, Achronix Semiconductor Corporation does not give any
representations or warranties as to the completeness or accuracy of such information and
shall have no liability for the use of the information contained herein. Achronix
Semiconductor Corporation reserves the right to make changes to this document and the
information contained herein at any time and without notice. All Achronix trademarks,
registered trademarks, and disclaimers are listed at http://www.achronix.com and use of this
document and the Information contained therein is subject to such terms.
2 UG027, May 21, 2014
3
Table of Contents
Copyright Info .................................................................................................... 2
Table of Contents .............................................................................................. 3
Reset Sources and the Reset Input Block ........................................................................... 24
Reset Distribution ................................................................................................................. 26
Revision History .............................................................................................. 28
UG027, May 21, 2014
Introduction
Resource
Number of resources
Clock input pins
24 single ended (12 differential)
Global core clocks
48
Direct core clocks
32
Clocks per IP column in fabric
16
Boundary clocks
16
Global boundary clock hierarchy levels
4
Local boundary clock hierarchy levels
2
Clock generators (CGs)
4 (1 per device corner)
Phase Locked Loops (PLLs)
16 (4 in each CG)
Clock and Reset Networks Overview
Speedster22iHD FPGAs have two hierarchical clock networks: a core clock network and a
boundary clock network.
The core clock network is the hierarchical network that feeds resources in the FPGA fabric.
There are two types of core clock networks: a global and a direct. Both of them have common
input sources: Clock input pins and PLL outputs which make up the Clock Generator (CG)
and recovered SerDes input clocks. These input sources get channeled in from both the north
and south sides of the device, and are then fed into the FPGA core through a central trunk.
The boundary clock network is a fully programmable clock network in the IO ring, unique to
Speedster22iHD FPGAs, that provides for significant advantages when clocking IO ring
resources at high frequencies. The boundary clock network is comprised of a low skew global
boundary clock network and a lower jitter local boundary clock network. The inputs to the
boundary clock network are CGs.
Reset signals generated internally or coming from GPIOs are funneled through the Reset
Input Blocks in the device corners into the FPGA core and the IO ring. The IO ring contains a
dedicated reset network but for distribution to core logic, the clock network resources
described above are used.
Clock Resource Counts
The following table lists the clock resources available on Speedster22iHD FPGAs.
Table 1: Clock Resource Counts on Speedster22iHD FPGAs
4 UG027, May 21, 2014
5
Reset Resource Counts
Resource
Number of resources
Reset input pins
32 - 40 (pkg dependent)
Reset input blocks
4
Reset signal groups in IO ring
2
Reset bus width in IO ring
16
The following table lists the reset resources available on Speedster22iHD FPGAs.
Table 2: Reset Resource Counts on Speedster22iHD FPGAs
UG027, May 21, 2014
Clock Sources
As mentioned earlier, the clock sources are Clock Generators (CGs) and recovered SerDes
input clocks.
There are four CGs on a Speedster 22iHD FPGA, one in each corner of the device. Each CG
contains six clock I/O buffers (CBs) and four Phase Locked Loops (PLLs). The clock buffers
can be used differential I/Os or single‐ended I/Os. If these I/Os are not used as clock buffers,
they can be used as generic inputs or outputs.
The PLLs are low jitter, wide range, independent multi-phase outputs with glitch-free phase
rotators that can be used for PLL outputs of up to 2 GHz for core circuit applications. The
PLLs support both integer mode and fractional mode operation.
Each pair of SerDes lanes is provided one reference clock. Each SerDes lane has its own pair
of PLLs listed below to generate and forward clocks to the fabric:
Thus each SerDes lane provides two word-clocks (Tx and Rx) to the fabric.
a. A transmit PLL, which synthesizes the Tx clock directly from the reference clock,
and then a slower Tx word-clock for data-input from the fabric,
b. A receive PLL, which synthesizes a Rx bit-clock (and corresponding word-clock)
from the incoming data-stream.
For source synchronous transfers, there are additional clock networks known as byte-lane
clock networks that may be used especially when a small amount of logic in the fabric needs
clocking. In these cases, clocks can be routed directly into the fabric along with the data.
Reset Sources
Each corner of a Speedster22i FPGA has an individual Reset Input Block. This block receives
external reset inputs as well as inputs generated internally within the device. External reset
inputs can be driven by dedicated clock pads as well as a number of GPIOs located in the
East-North (EN), East- South (ES), West-North (WN) or West-South (WS) sides of the device.
Internal reset inputs are driven through data and clock paths in the logic fabric.
The inputs to the Reset Input Block generated either externally or internally are required to
be active‐low and glitch free. The input resets can be either asynchronous or synchronous. An
asynchronous reset is synchronized for de‐assertion to each and every clock domain where it
is utilized. A synchronous reset does not need to be synchronized to the same clock domain
but is synchronized when used in any other clock domain not synchronous with the current
clock domain.
6 UG027, May 21, 2014
7
Core Clock Network
This chapter focuses solely on the core clock network, illustrating the global and direct core
clock network hierarchies and providing a more in-depth look at the different components
that make up these networks.
Global and Direct Core Clock Network
Global Core Clock Network
The global core clock network is a balanced and low-skew H-tree that enables clock
distribution to all parts of the Speedster 22iHD FPGA fabric. Clock signals coming in from
the top and bottom CGs and SerDes blocks are routed through a clock hub and aggregated at
the center of the device. These are then provided to all clock regions (see section on Clock
Region for details) on both the west and east sides.
As shown in Table 1, a total of 48 global clock signals are generated in the clock hub. These
are then distributed to all clock regions. Every clock region supports up to a maximum of 16
clocks, and it is able to select, amongst other sources, any of the 48 available global clocks.
Figure 1 below provides a high level illustration of the routing and connection paths for the
H-tree global core clock network.
UG027, May 21, 2014
GPIOGPIO
SerDes
SerDes
GCG
GCG
GCG
GCG
Figure 1: Global Core Clock Network
Direct Core Clock Network
The direct core clock network is a distribution system that provides for much lower clock
insertion delay, which is particularly useful for more complex designs that utilize multiple
clocks and require clocks to be internally generated and re-distributed to certain parts of the
FPGA fabric. Each branch of the direct clock network is restricted to the clock region it
reaches. Furthermore, direct clocks in each of the clock regions have different insertion
delays, which may result in significant inter-region clock skew.
The clock sources for the direct core clock network are fundamentally the same as those for
the global core clock network. The main difference between the direct and global clocks is
that direct clocks get distributed to the clock regions directly out of the top and bottom clock
muxes, without going through the clock hub (H-tree) in the center of the device.
There are a total of 32 direct clocks coming in from the top clock mux, and 32 coming in from
the bottom clock mux. A pre-designated set of 12 clocks from this group of 32 is distributed
to every clock region. Once inside the clock region, these 12 clocks are then muxed with other
incoming clocks to provide for the set of 16 that ultimately goes into the LUTs, memories and
DSPs in that clock region.
Figure 2 below illustrates the direct core clock network hierarchy.
8 UG027, May 21, 2014
9
GPIOGPIO
SerDes
SerDes
GCG
GCG
GCG
GCG
Figure 2: Direct Core Clock Network
UG027, May 21, 2014
Core Clock Network Components
SerDes
PLL
GCGs
PLL
PLLs
Clock
Buffers
PLL
GCGs
PLL
PLLs
Clock
Buffers
PLL
GCGs
PLL
PLLs
SerDes
Clock
Buffers
PLL
GCGs
PLL
PLLs
Clock
Buffers
I/O
Bank
Byte LanesByte Lanes
I/O
Bank
Byte LanesByte Lanes
I/O
Bank
Byte LanesByte Lanes
I/O
Bank
Byte LanesByte Lanes
Clock Region W1
Clock Region Wn
Clock Region E1
Clock Region En
Clock
Hub
Clock Mux
Top
Clock Mux
Bottom
CRMU
Wn
CRMU
E1
CRMU
En
CRMU
W1
Figure 3 provides block level highlights of the different core clock network components on
the Speedster 22iHD FPGA. These are color coded to better distinguish between them. The
sections below provide more in depth explanations of these different components.
Figure 3: Core Clock Network Components on the FPGA
Clock Generator (CG)
There are four Clock Generators (CGs) on a Speedster 22iHD FPGA, one in each corner of the
device. Each CG contains six clock I/O buffers (CBs) and four Phase Locked Loops (PLLs) and
each PLL has four output counters.
The six clock buffers can be used either as three differential I/Os or six single‐ended I/Os. If
these I/Os are not used as clock buffers, they can be used as generic inputs or outputs. These
buffers can be easily identified in the pin table or the package spreadsheet provided by ACE,
by filtering the USE column to show the clock pins. Each of these clock pins can be uniquely
identified by referencing one of the four bank locations (CB0, CB1, CB2 and CB3
corresponding to NW, SW, SE and NE respectively) as well as the pad number (0-6). The
table snapshot from the package spreadsheet below illustrates these concepts.
10 UG027, May 21, 2014
11
Table 3: Snapshot of HD1000 52.5mm package spreadsheet to show clock buffers
Port_Name
DIFF-PAIR
USE
BANK
PURPOSE
CLK/RST USE
pad0_clk_bank_ne
P
C
CB3
USER
CLK RST
pad0_clk_bank_nw
P
C
CB0
USER
CLK RST
pad0_clk_bank_se
P
C
CB2
USER
CLK RST
pad0_clk_bank_sw
P
C
CB1
USER
CLK RST
pad1_clk_bank_ne
N
C
CB3
USER
CLK RST
pad1_clk_bank_nw
N
C
CB0
USER
CLK RST
pad1_clk_bank_se
N
C
CB2
USER
CLK RST
pad1_clk_bank_sw
N
C
CB1
USER
CLK RST
pad2_clk_bank_ne
P
C
CB3
USER
CLK RST
pad2_clk_bank_nw
P
C
CB0
USER
CLK RST
pad2_clk_bank_se
P
C
CB2
USER
CLK RST
pad2_clk_bank_sw
P
C
CB1
USER
CLK RST
pad3_clk_bank_ne
N
C
CB3
USER
CLK RST
pad3_clk_bank_nw
N
C
CB0
USER
CLK RST
pad3_clk_bank_se
N
C
CB2
USER
CLK RST
pad3_clk_bank_sw
N
C
CB1
USER
CLK RST
pad4_clk_bank_ne
P
C
CB3
USER
CLK RST
pad4_clk_bank_nw
P
C
CB0
USER
CLK RST
pad4_clk_bank_se
P
C
CB2
USER
CLK RST
pad4_clk_bank_sw
P
C
CB1
USER
CLK RST
pad5_clk_bank_ne
N
C
CB3
USER
CLK RST
pad5_clk_bank_nw
N
C
CB0
USER
CLK RST
pad5_clk_bank_se
N
C
CB2
USER
CLK RST
pad5_clk_bank_sw
N
C
CB1
USER
CLK RST
Ref Clk
Divider
PFD /
CP / LF
Sigma-
delta
Fbk Clk
Divider
Phase Rotator w/
Divider
Mux
VCO
8-phase
16 bit
4 independent
outputs
Feedback clock
from clock tree
UG027, May 21, 2014
Phase Locked Loop (PLL)
The PLLs are low jitter, wide range, independent multi-phase outputs with glitch-free phase
rotators that can be used for PLL outputs of up to 1066MHz for core circuit applications. The
block diagram below shows a high-level view of the PLL architecture.
Figure 4: PLL Architecture
The PLL contains the following major blocks: reference clock divider, feedback divider,
sigma-delta modulator, phase rotator with divider, mux to select internal or external
feedback signal, phase-frequency detector, charge pump, loop filter and VCO.
The input reference clock can be divided by the reference clock divider. The divider ratio (M)
range is 1 to 63 with 50% duty-cycle. This PLL only supports reference clock range of 30MHz
to 400MHz (after reference clock divider frequency).
The feedback divider ratio (Q) range is 2 to 255 in integer mode with 50% duty cycle. In
fractional mode, the PLL supports 8 to 254 in the integer part. The fractional part (F)
resolution is 16-bit, which is generated by a sigma-delta modulator.
The phase rotator can shift the output clock phase by 1/8th of the internal VCO clock period
at a time. It contains a 50% duty cycle divider with ratio (N) of 1 to 63. The phases of each of
the 4 phase rotators can be independently adjusted, so can the divider ratios. All 4 phase
rotator’s phase can be re-set simultaneously.
The VCO is a 4-stage differential ring oscillator with 8-phase outputs. The VCO also has a
extra divide-by-2 option circuit for process variation backup purpose. It can be turned on in
case the process is too fast.
The PLL reference clock can come from any of the 6 IO buffers in the same CG, or from any
of the four outputs of the preceding PLL. E.g., the rclk of PLL 1 can be driven by PLL 0 (the
order is 0 -> 1 -> 2 -> 3 -> 0). The PLL reference clock is divided by the reference clock divider
(6-bit: 1 to 63) before being sent to the PFD. The VCO generates 8 equally separated phases,
one of which is sent to the feedback divider through a mux to allow the PLL running in short
loop operation without de-skew. All 8-phases are sent to 6 phase rotators which can
independently select one of the 8 phases. This then goes through an output divider (6-bit: 1 to
63) before being sent out of the PLL block. One of the 4 output clocks, after going through the
clock distribution tree, has an option to be sent to the feedback divider for de-skew
functionality.
The PLL has 3 modes of feedback clock selection:
1. Internal feedback mode: the VCO clock is divided by the feedback divider only. In this
mode, the PLL can have both integer and fractional divider ratios. But the PLL does not
provide deskew capability. The VCO frequency is related to the reference clock frequency
through: Fvco=(Q/M)*Fref in integer mode, and Fvco=(Q.F/M)*Fref in fractional mode.
2. External feedback mode: the VCO clock is divided by the output divider inside the phase
rotator. Only the integer divider ratio is supported. The output clock from one of the phase
rotators is sent to downstream logic. After being consumed by the downstream logic, the
clock is fed back to PLL for deskew. Please note in this mode, it is recommend to not rotate
the phase rotator in the feedback path, if the lock signal is required to be high during the
operation because the phase rotator action introduces phase errors to the PLL, and could lead
to the PLL losing lock. The other 5 phase rotators can be used for rotating the phase. The
VCO frequency is related to the reference clock frequency through: Fvco=(N/M)*Fref.
3. Mixed feedback mode: This mode should only be used when the output divider range is
not enough. The VCO is divided by the output divider inside one of the phase rotators. The
output clock from the phase rotator goes to the downstream logic. The consumed clock is
sent back to the PLL for deskew. The feedback clock is sent to the feedback divider before it is
sent to PFD. Only the integer mode of the feedback divider ratio should be used in this mode.
The VCO frequency is related to the reference clock frequency through: Fvco=(Q*N/M)*Fref.
12 UG027, May 21, 2014
13
The PLL IP comes with a built-in LDO and a Band Gap reference circuit. The LDO takes
Performance Features
Units
Bandwidth
Tracking between 1/10th and 1/8th of
reference clk freq
Feedback Divider
8 (2 to 255); in fractional mode, only
support 8 to 254
Bits
Post Divider
6
Bits
Reference Clock Divider
6
Bits
Number of Post Dividers
4
Fractional Synthesizer Support
PLL includes a 16-bit accurate fractional
synthesizer.
Spread Spectrum
No support
Feedback signal delay(max)
Half of divided reference clock period
ns
Operation Mode
Normal, Bypass, Pwrdn, Reset
Internal phase separation
12.5% output cycle
Internal phase accuracy
+/-3.5% output cycle at 2GHz
Output phase accuracy
+/-5% output cycle at 2GHz
Number of selectable Phases
8
Each PLL output clock can select and
change to one of the 8 phases
dynamically in a glitch-free manner.
500 ref clk periods (integer mode);
1000 ref clk periods (fractional mode)
Reset divide-by-1 output
frequency range
30MHz – 50 MHz
external analog voltage (PA_VDD, 1.5V to 1.8V) and generates internal analog voltage to
provide cleaner supply voltage to PLL. The Band Gap reference circuit provides the reference
voltage for the LDO. In order to calibrate the Band Gap reference and the LDO, without
requiring an analog pin, a built-in ADC reference is included in the PLL block as well.
The PLL can be controlled in two ways: one way is to use the direct control through the
interface pins; the other is to use AHB CSR interface. More extensive descriptions of these
controls can be found in UG021 - Speedster22i Macro Cell Library. The PLL performance
specifications are listed in Table 4 below.
Table 4: PLL Performance Specifications
In ACE, users are provided with the option to configure PLLs using the “Basic PLL” IP
generator or the “Advanced PLL” IP generator. The “Basic PLL” generator helps the users in
setting up and configuring the PLL parameters and modes of operation based on the users
desired behavior.
UG027, May 21, 2014
On the other hand, the “Advanced PLL” gives users much more flexibility in setting up the
To clock networkTo clock networkTo clock networkTo clock network
individual PLL parameters, settings, counter values etc. and gives the user full control over
exactly how the PLL is tuned. Obviously, in the case of the “Advanced PLL”, it is the user’s
responsibility to ensure that the settings provided match the intended behavior.
PLL Cascading: An important feature with the PLLs in every corner of the device is the
ability to cascade them by feeding the output clock of one to the reference clock input of the
subsequent one. Each PLL can be cascaded with its neighbor only, in a circular fashion: PLL0
can generate the refclk for PLL1; PLL1 can generate the refclk for PLL2; PLL2 for PLL3; and
PLL3 for PLL0. Any of the four counter outputs can be used as a reference clock for the
neighboring PLL. However, in the rare cases where that is insufficient, a PLL output can also
be configured as bypass, so that the output follows refclk. So if PLL0 must provide the refclk
for both PLL1 and PLL2, you could use one output of PLL1 as bypass (leaving 3 outputs for
regular use), and drive PLL2 with that bypassed clock.
Refer to Figure 5 below for a graphical representation of this feature.
Figure 5: PLL Reference Clock Cascading
The reference clock cascading mechanism is useful for a number of different reasons. Two of
the main ones are as follows:
It allows for a wider range of output phases/frequencies to be generated without
needing to use the fractional mode feature in PLLs.
It enables a larger set of clock outputs to be generated based off of the same reference
clock, without requiring changes at the board-level.
Clock Mux
There are two clock muxes in the FPGA, one at the top center of the device and one at the
14 UG027, May 21, 2014
bottom center. Each one of these is used to aggregate 128 clock signals coming in from all of
the clock sources described earlier (also shown below) and output a 32-bit bus that can be fed
15
into the clock hub. Figure 6 below provides a detailed view of the internals of the clock mux
Clock Mux
Top
6
16
16
6
16
16
52
128 to 32 Fully
General Crossbar
From Top SerDes Lanes
32
From NW Clock Generator through
Global Boundary Clock
From NW Clock Generator through
Local Boundary Clock
To Clock Hub
No ConnectNo Connect
From NE Clock Generator through
Global Boundary Clock
From NE Clock Generator through
Local Boundary Clock
located at the top center of the device.
Clock Hub
The clock hub in the center of the device collects the two 32-bit buses coming in from the top
and bottom clock muxes, as well as a 16-bit bus from the data interconnect to generate the 48bit global clock bus.
The 48-bit global clock bus (shown in green) travels back up and down the length of the clock
hub to feed each of the clock regions.
The 32-bit buses coming in from the top and bottom (shown in red) are actually also the
direct core clocks which are tapped off at each of the clock regions. A 12-bit subset of this 32bit bus is selected as shown in Figure 7 below.
The 16-bit bus coming in from the data interconnect (shown in blue), is one of the ways of
getting a signal from data generated in the FPGA fabric into a clock network, and this path is
the only way of generating a global core clock signal from the interconnect data.
Figure 7 below provides a detailed view of the clock hub.
Figure 6: Detailed View of the Top Clock Mux
UG027, May 21, 2014
Clock
Hub
32
From Clock Mux Top
32
From Clock Mux Bottom
16
From data
interconnect
in fabric
80 to 48
Crossbar
48
48
48
48
48
48
48
48
48
12
12
12
12
To Clock
Region W1
To Clock
Region W2
12
12
12
12
To Clock
Region Wn-1
To Clock
Region Wn
To Clock
Region E1
To Clock
Region E2
To Clock
Region En-1
To Clock
Region En
Global Core Clk
Direct Core Clk
H-Tree
Clock Region
Figure 7: Detailed View of the Clock Hub
Clock regions in Speedster 22iHD FPGAs have fixed heights, such that a DDR controller on
the left and right sides of the IO ring can feed 3 clock regions each. The width, and
consequently, number of IP columns is variable. This means that both the size and number of
clock regions will vary from one member of the Speedster 22iHD FPGA family to the other.
The total clock region count in any device is 3 x the total number of DDR controllers on the
die. There are an equal number of clock regions on the left and right halves of the device.
Each clock region is fed 48 global and 12 direct clock inputs from the clock hub. The clock
selection for these is detailed in Figure 7 above. In addition a 16-bit bus from the data
interconnect can also be fed here to drive only the clocks in a specific region.
All of these inputs are fed into a Clock Region Management Unit (CRMU) which performs an
appropriate clock selection and outputs a 16-bit bus which is then fanned out to all of the
columns of RLBs, MULTs, LRAMs and BRAMs in that clock region. An illustration of a clock
region is shown in Figure 8 below.
16 UG027, May 21, 2014
17
12
RLBs
RLBs
RLBs
CRMU
RLBs
RLBs
RLBs
BRAMs
BRAMs
BRAMs
BRAMs
BRAMs
BRAMs
RLBs
RLBs
RLBs
RLBs
RLBs
RLBs
MULTs
MULTs
MULTs
MULTs
MULTs
MULTs
RLBs
RLBs
RLBs
RLBs
RLBs
RLBs
BRAMs
BRAMs
BRAMs
BRAMs
BRAMs
BRAMs
LRAMs
LRAMs
LRAMs
LRAMs
LRAMs
LRAMs
MULTs
MULTs
MULTs
MULTs
MULTs
MULTs
48
Global Clocks
from Clock Hub
Direct Clocks
from Clock Mux
Clock
Region
16
From data
interconnect in fabric
16
Figure 8: Detailed View a Clock Region
Clock Region Management Unit (CRMU)
Every clock region contains a Clock Region Management Unit (CRMU), which does a variety
of functions, including clock muxing and clock division.
The CRMU has two clock muxing blocks, each taking in half of the incoming global clocks
and direct clocks as well as a quarter of the signals coming in through the 16-bit data
interconnect bus. These signals go through a 32-8 mux and are then optionally divided down.
These 8-bit buses are then aggregated to form the 16-bit clock output bus which is provided
to all of the columns in the clock region fabric. The clock division and gating logic is
controlled by signals coming in from the data interconnect bus.
A summary of the features available in the CRMU are as follows:
a. Divider, with 5 (static) divide-by settings: 1, 2, 4, 6, 8,
b. Dynamic Clock gate, allowing real-time clock gating (for power management),
c. Glitchless mux, allowing dynamic muxing between different clock sources for that
particular clock region.
Figure 9 below illustrates the internals of the CRMU.
UG027, May 21, 2014
CRMU
32 to 8
Crossbar
Additional
Control
Logic
32 to 8
Crossbar
Additional
Control
Logic
Clk Div &
Gating
Clk Div &
Gating
6
24
2
6
24
Global Core Clocks
from Clock Hub
Direct Core Clocks
from Clock Mux
2
From data
interconnect in fabric
16
From data
interconnect in fabric
16
8
8
8
8
16
To Fabric
Columns
Global Core Clocks
from Clock Hub
Direct Core Clocks
from Clock Mux
Clk Div &
Gating
Clock Divider
(1, 2, 4, 6 or 8)
Clock GateClock Switch
Enable
Select /
Deselect
Clk_Out
Clk_InClk_In2
Figure 9: Detailed View of a Clock Region Management Unit (CRMU)
Figure 10 below provides a more in-depth look at the 3 features available in the CRMU.
Clock division logic is controlled by static configuration memory bits. The clock gate relies on
an “Enable” input to the logic, while the glitchless clock switch module uses select/deselect
logic to select between 2 different clock inputs. The divider and gating or switching logic can
also be cascaded as shown. Note that Figure 10 is a conceptual view of the circuitry for one
instance only and these functions exist for all of the clock inputs.
Figure 10: Internals of the Clock Division, Gating and Switching Circuitry
18 UG027, May 21, 2014
19
Junctions
Data-to-Clock Junctions
There are multiple junction-points in the fabric where a data-interconnect signal can drive a
clock network signal:
a. Clock Hub (16 data inputs; already discussed above),
b. CRMU (4 data inputs per region; already discussed above),
c. RLB input (any Logic Cluster clock can be driven by a selected data signal),
d. Selected BRAM, LRAM, or Multiplier input.
Based on the fanout and other requirements of the clock signal that is generated in the data
interconnect portion of the programmable logic fabric, an appropriate junction point would
be selected by ACE.
Clock-to-Data Junctions
Switching elements can allow some LUT inputs, BRAM inputs, LRAM inputs, and BMAC
inputs to be driven by a signal outputted by the CRMU. In particular, this would allow
certain functions that may have high fanout like register resets and clock enables to be driven
by signals from the CRMU.
UG027, May 21, 2014
Byte-Lane Clocks
Byte-Lanes 0-3Byte Lanes 4-7Byte Lanes 8-12
Core Fabric
West-North (WN) IO Cluster
Clock Region West 1
Clock Region West 2
Clock Region West 3
RLBs
RLBs
MULTs
MULTs
16
16
RLBs
RLBs
MULTs
MULTs
16
16
RLBs
RLBs
MULTs
MULTs
16
16
Clk
div4
Clk
div2
IO Ring
For source synchronous transfers, where data and clock are both sent from the IO ring to the
core, using one of the clock networks as described above would result in a large skew
between the data and the clock.
For this reason, there is a dedicated entry point from the byte lanes into the clock inputs of
the clock regions, via the data interconnect route in the fabric into the CRMU. These are
known as byte-lane clocks and are predominantly used to forward data strobe (dqs) clock
signals from the PHY into the FPGA fabric. A byte lane in a Speedster22i FPGA consists of 12
I/O buffers. For every set of 13 byte-lanes making up an IO cluster, (which corresponds to the
IOs used in the hard DDR3 controllers), there is a clock signal that is used for sampling the
data at the IO PHY. This clock signal can go through 2 clock dividers, a divide-by-2 clock
module and a divide-by-4 clock module. The outputs of these modules, which correspond to
the half-rate or quarter-rate clocks in a memory interface application can be forwarded to the
3 clock regions corresponding to that IO site. Note that for memory interface applications,
usage of mesochronous synchronizers in the Rx datapath will allow for clock domain
crossing between the dqs clock domain and a core clock network, eliminating the need for
clock forwarding into the fabric. For more details, please refer to the Memory PHY User
Guide.
Figure 11 below illustrates the the entry point of these byte-lane clocks into a clock region in
the WN site of the device. These clocks are muxed with the clocks coming from the CRMU
and are then fed to all of the blocks in the fabric IP column.
Figure 11: Byte-Lane Clocks in a Clock Region
20 UG027, May 21, 2014
21
Boundary Clock Network
Core Fabric
GCGGCG
GCGGCG
GCM
QCM
RCM
BCM
Core Fabric
GCGGCG
GCGGCG
GCM
BCM
Example Global Boundary Clock NetworkExample Local Boundary Clock Network
The boundary clock network is an architectural feature available in Speedster22iHD FPGAs
to help improve IO ring performance in high speed applications. There are two variants of
the boundary clock network: The first is a global boundary clock network, which ensures that
clocks coming in from all 4 sides of the device traverse a muxing network that provides for
low skew between them. The second is a local boundary clock network, which is a shortened
and more direct low jitter path from the CGs to the IO ring resources that need to be clocked.
The main advantages provided by the boundary clock network are as follows:
1. Resources clocked in the IO ring have much higher noise immunity to noise from the
fabric. The IO ring and boundary clock network are on a separate supply from the
rest of the core fabric, so even with very high core resource usage and toggle, power
supply noise is essentially eliminated.
2. Usage of the local boundary clock network provides for a much cleaner, lower jitter,
lower insertion delay path from the CGs to the IO ring resources.
3. Availability of additional clocking resources in the boundary frees up the core clock
network resources to be used for clocking in the fabric.
The main drawbacks of the local boundary clock network are that (a) usage of the different
local boundary clock networks requires more clocks coming into the different CG corners,
and (b) it provides for clocks that can have significant skew between them. Figure 12 below
shows a high-level illustration of the distinction between global and local boundary clock
networks.
Figure 12: Distinction Between Global and Local Boundary Clock Networks
UG027, May 21, 2014
There are four levels of hierarchy and muxing in the boundary clock network. These are, in
hierarchical order:
The global boundary clock network traverses through each of these mux stages, whereas the
local boundary clock uses a shortened path that ties the output of the GCM directly into the
BCM. The full boundary clock network architecture is shown in detail in Figure 13 below,
with red paths highlighting the GCM to QCM connectivity, the blue paths highlighting the
QCM to RCM connectivity and the dark yellow paths highlighting the RCM to BCM paths.
The local boundary clock network is shown in green.
Figure 13: Full Boundary Clock Network Architecture
Note that usage of the local boundary clock is more restrictive in that the South BCMs can
only be fed by the South GCMs/CGs. The North and Center BCMs can only be fed by the
North GCMs/CGs. The output of the BCMs can additionally be used to enter the core clock
22 UG027, May 21, 2014
network, though this is generally not recommended.
23
Clock Setting and Reporting
Much of the decision making and optimization for clock selection is automatically done by
the ACE tool to prevent no-routes. However, ACE does provide users with some options to
specify the type of clock networks they wish to use for particular implementations.
The clock type can be specified for a particular CG output in the sdc file as shown below. The
three options available are: {trunk, direct_trunk, boundary}.
Set_clock_type –direct_trunk {‘Hierarchical name of CG output’}
Trunk and direct_trunk correspond to the global and direct core clock networks, respectively.
For the boundary clock network, ACE will automatically decide whether to use the global or
local boundary clock network depending on placement and the application.
Once the clock constraints are specified in the design and the design goes through a full ACE
compilation flow, the routing step outputs text and html files called
“{design_name}_clocks_routed” describing the clock relationships, clock constraints and
clock regions in the design. The clock regions section provides detailed information on how
each of the clocks in the design were routed and how they are distributed in the used FPGA
clock regions, including the boundary. A legend is also provided in the file to help decipher
the routing details.
UG027, May 21, 2014
Reset Network
Reset
Input
Block
PLL
GCGs
PLL
PLLs
Clock
Buffers
I/O
Bank
Byte LanesByte Lanes
FPGA Core
This chapter examines the reset network in a little more detail and provides information on
the different networks, reset sources as well as associated circuitry in Speedster22i FPGAs.
Reset Sources and the Reset Input Block
Each corner of a Speedster22i FPGA has an individual Reset Input Block. This block receives
external reset inputs as well as inputs generated internally within the device. External reset
inputs can be driven by dedicated clock pads as well as a number of GPIOs located in the
East-North (EN), East- South (ES), West-North (WN) or West-South (WS) sides of the device.
Internal reset inputs are driven through data and clock paths in the logic fabric. Figure 14
below illustrates a Reset Input Block in the bottom-left corner of the device and the
appropriate signals coming in to the block.
The inputs to the Reset Input Block generated either externally or internally are required to
be active‐low and glitch free. The input resets can be either asynchronous or synchronous.
For resets to IO ring resources, there is automatic reset synchronization (using boundary
clocks) and pipelining to ensure that while reset assertion is asynchronous, the deassertion is
synchronous. The purpose of the reset pipeline insertion is to ensure that reset deassertion in
the I/O ring meets setup and hold time requirements, and occurs at the same clock tick for all
targets. This is done because (a) the STA does not verify timing requirements in the I/O ring,
24 UG027, May 21, 2014
Figure 14: Reset Input Block
25
and (b) even if it did the user wouldn’t be able to do anything to fix timing problems in the
DQDQ
nclrnclr
To nclr ports of logic
receiving reset
Nrst
Clk
Port_Name
DIFF-PAIR
USE
BANK
PURPOSE
CLK/RST USE
pad_en_byteio0_dq0
P
G
B50
USER
RST
pad_en_byteio0_dq1
N
G
B50
USER
RST
pad_en_byteio0_dq2
P
G
B50
USER
RST
pad_en_byteio0_dq3
N
G
B50
USER
RST
pad_es_byteio12_dq6
P
G
B32
USER
RST
pad_es_byteio12_dq7
N
G
B32
USER
RST
pad_es_byteio12_dq8
P
G
B32
USER
RST
pad_es_byteio12_dq9
N
G
B32
USER
RST
pad_wn_byteio0_dq0
P
G
B00
USER
RST
pad_wn_byteio0_dq1
N
G
B00
USER
RST
pad_wn_byteio0_dq2
P
G
B00
USER
RST
pad_wn_byteio0_dq3
N
G
B00
USER
RST
pad_ws_byteio12_dq6
P
G
B22
USER
RST
pad_ws_byteio12_dq7
N
G
B22
USER
RST
pad_ws_byteio12_dq8
P
G
B22
USER
RST
pad_ws_byteio12_dq9
N
G
B22
USER
RST
pad0_clk_bank_ne
P
C
CB3
USER
CLK RST
pad0_clk_bank_nw
P
C
CB0
USER
CLK RST
I/O ring.
For resets going into the FPGA fabric, the guideline is to use logic in the fabric to ensure that
the resets can be synchronized to the particular clock domains. One way of doing this is
shown in Figure 15 below, whereby the logic enables going into reset asynchronously, but
coming out of it asynchronously.
Figure 15: Reset Synchronization Logic
As far as the external reset sources arc concerned, all dedicated clock buffers can be used as
inputs to the Reset Input Block. Also, for IO banks bonded out on the East-North (EN), EastSouth (ES), West-North (WN) or West-South (WS) sides of the device, a total of four IO pads
can be used as inputs to the Reset Input Block. The table below from the HD1000 52.5mm
package spreadsheet dumped out by ACE highlights all the RST pins that can be used. The 8
shown in the red rectangle are only available on the 52.5mm package and not on the 45mm
package since the IO banks on the East-South (ES) and West-North (WN) sides of the device
are not bonded out on the 45mm package.
Table 5: Snapshot of HD1000 52.5mm package spreadsheet to show resets
UG027, May 21, 2014
Port_Name
DIFF-PAIR
USE
BANK
PURPOSE
CLK/RST USE
pad0_clk_bank_se
P
C
CB2
USER
CLK RST
pad0_clk_bank_sw
P
C
CB1
USER
CLK RST
pad1_clk_bank_ne
N
C
CB3
USER
CLK RST
pad1_clk_bank_nw
N
C
CB0
USER
CLK RST
pad1_clk_bank_se
N
C
CB2
USER
CLK RST
pad1_clk_bank_sw
N
C
CB1
USER
CLK RST
pad2_clk_bank_ne
P
C
CB3
USER
CLK RST
pad2_clk_bank_nw
P
C
CB0
USER
CLK RST
pad2_clk_bank_se
P
C
CB2
USER
CLK RST
pad2_clk_bank_sw
P
C
CB1
USER
CLK RST
pad3_clk_bank_ne
N
C
CB3
USER
CLK RST
pad3_clk_bank_nw
N
C
CB0
USER
CLK RST
pad3_clk_bank_se
N
C
CB2
USER
CLK RST
pad3_clk_bank_sw
N
C
CB1
USER
CLK RST
pad4_clk_bank_ne
P
C
CB3
USER
CLK RST
pad4_clk_bank_nw
P
C
CB0
USER
CLK RST
pad4_clk_bank_se
P
C
CB2
USER
CLK RST
pad4_clk_bank_sw
P
C
CB1
USER
CLK RST
pad5_clk_bank_ne
N
C
CB3
USER
CLK RST
pad5_clk_bank_nw
N
C
CB0
USER
CLK RST
pad5_clk_bank_se
N
C
CB2
USER
CLK RST
pad5_clk_bank_sw
N
C
CB1
USER
CLK RST
Reset Distribution
Reset signals need to be distributed to both the FPGA core fabric as well as the IO ring, which
includes the GPIOs, SerDes and hard IP. There is no dedicated reset network in the FPGA
programmable fabric, so distribution to the core is generally recommended to be done using
the global and direct clock network resources described above, to take advantage of timing
and load balancing.
For the IO ring, there is a dedicated 16-bit reset bus that ensures a balanced reset assertion
and de-assertion latency across the entire device. This is made possible by pipelining the
reset distribution using the clock to which the reset is synchronized.
Each side of the device has two groups of reset signals running in opposite directions. Each
group consists of eight reset signals each, spanning the entire edge of the device in a
pipelined manner. The two groups of reset signals are tapped at each I/O bank or logic block
(eg. DDR controller, SERDES), using a configurable pipeline multiplexer with configurable
pipelined latency. The configuration is automatically set for each individual multiplexer to
balance the latency for the reset signals across the entire device. The outputs of the pipeline
multiplexer are subsequently distributed to the reset network inside the IO banks and logic
blocks. This is shown in Figure 16 below.
26 UG027, May 21, 2014
27
FPGA Core
Reset
source
Reset
source
Reset
source
Reset
source
8 bit
8 bit
8 bit
8 bit
8 bit
8 bit
8 bit
8 bit
P
Logic Block
PPP
PP
P
PPP
PP
P
P
P
P
P
P
P
P
P
P
P
P
Logic Block
Logic BlockLogic Block
Logic BlockLogic Block
Logic BlockLogic Block
P
P
P
P
P
P
P
P
PP
PP
PP
PP
Programmable
Pipeline
Figure 16: IO Ring Reset Network
UG027, May 21, 2014
Revision History
Date
Version
Revisions
04/05/2013
1.0
Initial Achronix release.
05/22/2013
1.1
Incorporated additional inputs from engineering.
06/07/2013
1.2
Updated PLL description and diagrams.
04/24/2014
1.3
Updated byte-lane clocks. Added in boundary clock network
and settings sections. Corrected clock sources.
05/01/2014
1.4
Updated PLL specs.
05/21/2014
1.5
Added information about PLL cascading.
The following table shows the revision history for this document.
28 UG027, May 21, 2014
Loading...
+ hidden pages
You need points to download manuals.
1 point = 1 manual.
You can buy points or you can get point for every manual you upload.