PCS Blocks in the Receiver (RX)
Transition Density Checker (TDC)
Polarity Bit Reversal (PBR)
Symbol Alignment
Modes of Operation
Bit Slider
Test bench Setup for Simulation
SerDes Placement and Clocking Limitations
Wide Bus
Figure 7: 20 bit Order Reversal
Figure 8: 20-bit Byte Order Swap/Reversal
Figure 10: Bit Order Inversion (16-bit Word)
Figure 11: Word Order Inversion (16-bit Word)
Figure 12: 8b/10b Encoding Process
Figure 20: Worst-case latency across PMA and PCS (in terms of clock-cycles)
Figure 21: Opening IP Configuration Perspective
Figure 22: New IP Configuration Window
Figure 23: New IP Configuration Window - Overview Page
Table 4: List of Important Interface Signals for bit slider
Table 6: PRBS Patterns in PMA
Table 7: PRBS Patterns in the PCS
Table 8: Analog latency as a function of databus width
Table 9: Latency across the PCS blocks
Table 10:
Table 11: Supported Receiver (RX) Features
Table 12: Entry fields for Overview page
Table 23: DC and AC Switching Characteristics
Table 25: Return Loss
Table 26: DC and AC Switching Characteristics
Achronix Speedster22i FPGAs provide very high core fabric and I/O performance, exceeding
the system bandwidth requirements of many high-end applications. The Speedster22i device
family supports up to 64 full-duplex SerDes lanes, each supporting data rates of up to
11.3 Gbps.
The Physical Coding Sublayer (PCS) and Physical Media Attachment (PMA) sub-blocks
together make up a single SerDes block. The SerDes PCS has explicit support for PCIe,
10GBASE-R, 1G Ethernet and XAUI. Through the PCS, it also provides partial support for
various other interconnect protocols such as Interlaken, SPI4.2, Infiniband, Fiber Channel,
SAS/SATA, SONET, OC, OBSAI and CPRI. The SerDes can be connected either to the embedded
hard IPs (PCIe, Interlaken, and 10/40/100G MAC) or to the FPGA fabric for soft implementation
of any other supported protocol.
Physical Media Attachment (PMA)
• Data rates supported
o 1.0625 – 11.3 Gbps
o 531.25 – 1062.5 Mbps using 2X over-sampling
o 265.625 – 531.25 Mbps using 4X over-sampling
• Independent lane architecture with a dedicated synthesizer for each lane; no off-chip
components required
• Low power architecture (<100mW at 10Gbps)
• Supports both AC and DC coupling
• Input driver with Continuous Time Linear Equalizer (CTLE) and Decision Feedback
Equalizer (DFE)
o Input voltage: 50 – 2000 mVp-p differential
o Auto-calibrating CTLE and DFE
o CTLE with up to 20dB gain tuned for key data rates
o Pulse-shaped 5-tap DFE
• Output driver with a 4-tap Feed Forward Equalizer (FFE) implemented as a Finite Impulse
Response (FIR) filter
o Output voltage: 400 – 1500 mVp-p differential
o Slew rate: 31 – 170 ps
• Highly digital PLL architecture for the Synthesizer and CDR
o Accuracy & low jitter of an analog PLL
UG028, July 1, 2014
o Tuning range of a digital PLL
o Programmable spread spectrum generation
o Support for 16-bit fractional multiplication factors
o Programmable spread spectrum clocking
o Support for fast lock mode for EPON/GPON
• On-chip scope in the receiver for measuring eye width, eye height and BER of the
incoming signal
• On-chip calibrated 100 ohm termination
• Transparent calibration engine to compensate for PVT variation
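The three data-rate bands listed above tile a continuous range: each over-sampled band, multiplied by its over-sampling factor, lands in 1.0625–2.125 Gbps. The Python sketch below picks an over-sampling factor for a requested line rate. The table and function names are hypothetical and are not part of any Achronix tool.

```python
# Data-rate bands from the PMA feature list, in Mbps.
# PMA_RATE_MODES_MBPS and oversampling_factor are hypothetical names.
PMA_RATE_MODES_MBPS = [
    (1062.5, 11300.0, 1),  # native range, no over-sampling
    (531.25, 1062.5, 2),   # 2X over-sampling
    (265.625, 531.25, 4),  # 4X over-sampling
]


def oversampling_factor(line_rate_mbps: float) -> int:
    """Return the over-sampling factor for a line rate (1X preferred at band edges)."""
    for low, high, factor in PMA_RATE_MODES_MBPS:
        if low <= line_rate_mbps <= high:
            return factor
    raise ValueError(f"{line_rate_mbps} Mbps is outside the supported PMA range")
```

For example, OC-12 at 622.08 Mbps falls in the 2X band, so the PMA would sample it internally at 1244.16 Mbps.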
Clocking
• Support for external reference clock from 50 MHz – 300 MHz
• Support for recovered reference clock for loop timing and re-timer type applications
that eliminates the need for a cleanup PLL
Physical Coding Sublayer (PCS)
• Bypassable and Modular PCS architecture
• Support for 8b/10b and 128b/130b encoding
• Symbol alignment
• Clock and phase compensation FIFO
• Lane to lane de-skew
• Polarity inversion
• Bit reversal
• Lane bonding
• Low/Deterministic latency modes for protocols such as CPRI and OBSAI
Debug and Test
• Up to seven different near-end and far-end loopback modes in PMA and PCS
• Built-in self test (BIST)
o PRBS 7, 15, 23, 31 and 40-bit user defined pattern generators and checkers in
the PCS
o PRBS 7, 23, 31 and 40-bit user defined pattern generators and checkers in the
PMA
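To illustrate what a PRBS generator and checker do, the following sketch models PRBS7 with the widely used polynomial x^7 + x^6 + 1. The function names are hypothetical, and this is a behavioural model rather than the PMA or PCS implementation; the seeding and bit ordering of the hardware generators are not described here.

```python
def prbs7(nbits: int, seed: int = 0x7F) -> list[int]:
    """PRBS7 bits from a 7-stage Fibonacci LFSR with taps at stages 7 and 6."""
    if not 0 < seed <= 0x7F:
        raise ValueError("seed must be a non-zero 7-bit value")
    state, out = seed, []
    for _ in range(nbits):
        bit = ((state >> 6) ^ (state >> 5)) & 1  # stage 7 XOR stage 6
        out.append(bit)
        state = ((state << 1) | bit) & 0x7F       # shift the new bit into stage 1
    return out


def prbs7_errors(bits: list[int]) -> int:
    """Self-synchronising checker: every bit after the first seven must equal
    bit[n-7] XOR bit[n-6]; return the number of mismatches."""
    return sum(1 for n in range(7, len(bits)) if bits[n] != (bits[n - 7] ^ bits[n - 6]))
```

The sequence repeats every 2^7 - 1 = 127 bits. A single flipped bit shows up as three mismatches in prbs7_errors, because the corrupted bit is reused as a tap when bits n+6 and n+7 are checked.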
Major standards supported
Table 1: SerDes Standards

Standards            Variation              Data Rate(s)
PCI Express          Gen1                   2.5 Gbps
                     Gen2                   5.0 Gbps
                     Gen3                   8.0 Gbps
Gigabit Ethernet     1000BASE-CX            1.25 Gbps
                     SGMII                  1.25 Gbps
10 Gigabit Ethernet  XAUI (802.3ae)         3.125 Gbps
                     XFI                    10.3125 Gbps
                     10GBASE-R (802.3ae)    10.3125 Gbps
                     10GBase-KR (802.3ae)   10.3125 Gbps
                     XLAUI/CAUI (802.3ae)   10.3125 Gbps
Interlaken           --                     3.125 – 10.3125 Gbps
OIF                  SPI5                   3.125 Gbps
                     SFI4.2                 3.125 Gbps
                     SFI5.1                 3.125 Gbps
                     SFI5.2                 9.1 – 10.3125 Gbps
                     SFI-S                  11.1 Gbps
                     CEI 6G                 4.976 – 6.375 Gbps
                     CEI 11G                9.95 – 11.2 Gbps
Fiber Channel        FC-1                   1.0625 Gbps
                     FC-2                   2.125 Gbps
                     FC-4                   4.25 Gbps
                     FC-8                   8.5 Gbps
                     FC-10                  10.52 Gbps
SONET                OC-12                  622.08 Mbps
                     OC-24                  1244.16 Mbps
                     OC-48                  2488.32 Mbps
                     OC-192                 9953.28 Mbps
Standards            Variation              Data Rate(s)
QPI                                         4.8 Gbps, 6.4 Gbps
SATA                 SATA-1                 1.5 Gbps
                     SATA-2                 3.0 Gbps
                     SATA-3                 6.0 Gbps
SAS                  SAS-1                  3.0 Gbps
                     SAS-2                  6.0 Gbps
                     SAS-3                  12.0 Gbps
Serial Rapid I/O     Gen1                   1.25 Gbps, 2.5 Gbps, 3.125 Gbps
                     Gen2                   5.0 Gbps, 6.125 Gbps
E-PON 802.3av                               1.25 Gbps, 2.5 Gbps, 10 Gbps
GPON                 --                     1.25 Gbps, 2.5 Gbps, 10 Gbps
InfiniBand           SDR                    2.5 Gbps
                     DDR                    5.0 Gbps
                     QDR                    10.0 Gbps
JESD204B                                    Up to 12.5 Gbps
CPRI                 --                     614.4 – 9830.4 Mbps
OBSAI                --                     768 – 6144 Mbps
USB                  3.0                    5.0 Gbps
                     3.1                    10.0 Gbps
SerDes Placement
The Speedster22i device supports up to sixty-four (64) 11.3 Gbps SerDes lanes. Each side
(Top and Bottom) has thirty-two (32) 11.3 Gbps SerDes lanes. The lanes are organized by
channel and are placed as illustrated in “Figure 1: Location of SerDes Lanes” below.
Figure 1: Location of SerDes Lanes
SerDes Architecture Overview
The SerDes has an independent lane architecture. Each lane has a Physical Media Attachment
(PMA), Synthesizer (Transmit PLL), Clock and Data Recovery (CDR) and Physical Coding
Sublayer (PCS). The Receiver PMA and Transmitter PMA block diagrams are shown in
“Figure 2: SerDes Architecture” below.
Figure 2: SerDes Architecture
The SerDes primarily consists of the following blocks:
• PMA
• PCS
• PCS interface to FPGA fabric
• Clocking
• Debug and Test
Physical Media Attachment (PMA)
The PMA architecture is shown in “Figure 3: PMA Architecture” below.
Figure 3: PMA Architecture
The PMA consists of three major blocks:
1. Common
2. Receiver/Transmitter (RX/TX)
3. Digital PMA (DPMA)
1. Common
The common block consists of the following circuits:
• Reference clock: This circuit buffers and divides the reference clock before feeding it
to the Synthesizer.
• Synthesizer: The synthesizer (transmit PLL) generates the high-speed clock for the
serializer of the Transmitter. It also has a built-in circuit for spread-spectrum clocking.
• Bias: The biasing circuit controls the offsets and biasing for all the analog circuits
in the PMA.
• Analog Test Port: This port is used by Achronix for manufacturing tests and for
debugging purposes.
2. Receiver (RX)/Transmitter (TX)
The RX/TX block consists of the following circuits:
• TX buffer: Converts the single-ended signal to differential and performs equalization
(pre-emphasis) on the outgoing serial signal.
• RX buffer: Converts the differential signal to single-ended and performs equalization on
the incoming signal using a Continuous Time Linear Equalizer (CTLE) and a Decision
Feedback Equalizer (DFE).
• Clock Data Recovery (CDR): Recovers the clock and data from the incoming signal for
deserialization.
• On-Chip Scope: Plots an eye diagram of the incoming signal after equalization, for
debug.
• Serializer/Deserializer: The serializer converts parallel data to serial data using the
high-speed clock from the synthesizer; the deserializer converts the received serial data
back to parallel data.
3. Digital PMA (DPMA)
The DPMA block consists of the following circuits:
• Calibration: Performs calibration of all the analog circuits using trim settings and
offsets
• PMA BIST: Includes PRBS 7, 23, 31 and 40-bit user defined pattern generators and
checkers
• Power management
• Configuration registers (Memory)
• JTAG and Boundary Scan
Figure 4: Synthesizer Architecture
Figure 5: Receiver Architecture
PCS Blocks in the Transmitter (TX)
This section presents the transmitter (TX) data path within a PCS. The key blocks within the
SerDes transmitter are:
• Encoder: Encodes the data for transmission over the line. Its primary goal is to ensure
DC balance and to avoid long runs of 1’s or 0’s.
• Polarity Bit Reversal (PBR): Inverts the polarity and/or reverses the bit or byte
ordering of the data to be transmitted.
The building block for the SerDes IP is the single-lane configuration. A simplified block
diagram of the TX data path is shown in “Figure 6: PCS Transmitter Block Overview”. The
functional blocks shown in the diagram represent the functionality supported by a single
SerDes lane. A summary of the supported standards is given in “Table 1: SerDes Standards”.
Figure 6: PCS Transmitter Block Overview
* SerDes configured in Generic mode supports only 8b/10b encoding.
** Either PBR#0 or PBR#1 can be used, or both may be bypassed.
Note: The PCS block supports lane bonding across multiple SerDes lanes (up to 12).
Chapter “Design Flow: Creating a SerDes Design” presents the ground-up steps that can be
followed to prepare a design that supports lane bonding.
The PCS blocks on TX path are detailed below.
PCS Self Test Logic
This block generates transmit data for PCS self test, detailed in “PCS Test Pattern Generator”
and “PCS Test Pattern Checker”.
Polarity bit reversal (PBR) #0 and #1
This block can invert the polarity of the incoming data. It can also reverse the bit order of
the incoming data so that the most significant bit is sent first, rather than the least
significant bit (default). For 16/20-bit (2-word) bit streams, the word order can also be
inverted so that the most significant byte is sent first, rather than the least significant
byte (default).
There are two PBR blocks on the transmit data path, as shown in “Figure 6: PCS Transmitter
Block Overview”. PBR0 is used before the protocol encapsulation block and PBR1 is used on
encoded data. Either PBR0 or PBR1 can be used; alternatively, both blocks can be bypassed.
Polarity and Bit Inversion – 10/20 bit Operation
When operating in 10-bit/20-bit mode, the bit order within each 10-bit word can be inverted.
This is illustrated in “Figure 7: 20 bit Order Reversal”. Effectively, the most significant bit
of the least significant byte is transmitted first (i.e. bit 9 of byte 0 is transmitted first).
Figure 7: 20 bit Order Reversal
When the word order is reversed in 20-bit mode, the most significant byte (byte 1) is
swapped with the least significant byte (byte 0), so the most significant byte is
transmitted first. This is illustrated in “Figure 8: 20-bit Byte Order Swap/Reversal”.
Figure 8: 20-bit Byte Order Swap/Reversal
The polarity of the entire 10-bit or 20-bit word can be inverted as well; polarity inversion
always applies to the whole word (10 bits or 20 bits).
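The three 10/20-bit operations can be modelled as bit manipulations on an integer bus value, assuming byte 0 occupies bits [9:0], byte 1 occupies bits [19:10], and bit 0 is transmitted first by default. The function names are hypothetical; this is a sketch for reasoning about the figures, not a model of the PCS configuration registers.

```python
def reverse_bits(value: int, width: int) -> int:
    """Mirror the lowest `width` bits of value (bit 0 <-> bit width-1)."""
    result = 0
    for i in range(width):
        if (value >> i) & 1:
            result |= 1 << (width - 1 - i)
    return result


def pbr_10_20(word: int, width: int = 20, invert: bool = False,
              bit_reverse: bool = False, word_swap: bool = False) -> int:
    """Hypothetical PBR model for a 10- or 20-bit bus (byte 0 = bits [9:0])."""
    if width not in (10, 20):
        raise ValueError("width must be 10 or 20")
    symbols = [(word >> (10 * i)) & 0x3FF for i in range(width // 10)]
    if bit_reverse:                  # original bit 9 moves to bit 0, which is sent first
        symbols = [reverse_bits(s, 10) for s in symbols]
    if word_swap and width == 20:    # byte 1 now occupies the first-sent position
        symbols.reverse()
    out = 0
    for i, s in enumerate(symbols):
        out |= s << (10 * i)
    if invert:                       # polarity inversion covers the whole word
        out ^= (1 << width) - 1
    return out
```

Because all three operations are permutations or inversions of the same bits, they commute, so the order in which the sketch applies them does not change the result.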
Polarity and Bit Inversion – 8/16 bit Operation
When the polarity is inverted in 8-bit/16-bit mode, only bits [17:10] and [7:0] are inverted;
bits [19:18] and [9:8] are not inverted. This is illustrated in “Figure 9: Polarity Inversion
(16-bit Word)”.
Figure 9: Polarity Inversion (16-bit Word)
When the bit order is inverted in 8-bit/16-bit mode, bits [7:0] of byte 0 are reversed while
bits [9:8] are left unchanged. Similarly, bits [17:10] of byte 1 are reversed while bits [19:18]
are left unchanged. This is illustrated in “Figure 10: Bit Order Inversion (16-bit Word)”. In
this mode, the most significant bit of the least significant byte is transmitted first.
Figure 10: Bit Order Inversion (16-bit Word)
When the word order is inverted in 16-bit mode, byte 1 is swapped with byte 0. This is
illustrated in “Figure 11: Word Order Inversion (16-bit Word)”.
Figure 11: Word Order Inversion (16-bit Word)
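In 8/16-bit mode only the data bytes are affected. The sketch below is a behavioural model with hypothetical names: it applies polarity inversion and bit reversal to bits [17:10] and [7:0] and leaves bits [19:18] and [9:8] untouched. Word-order swapping is omitted because the text does not state whether bits [9:8] and [19:18] move with their bytes.

```python
DATA_MASK_16 = 0x3FCFF   # bits [17:10] and [7:0]: the two data bytes
CTRL_MASK_16 = 0xC0300   # bits [19:18] and [9:8]: never inverted or reversed


def reverse_bits(value: int, width: int) -> int:
    """Mirror the lowest `width` bits of value (bit 0 <-> bit width-1)."""
    result = 0
    for i in range(width):
        if (value >> i) & 1:
            result |= 1 << (width - 1 - i)
    return result


def pbr_8_16(word: int, invert: bool = False, bit_reverse: bool = False) -> int:
    """Hypothetical PBR model for 8/16-bit mode on a 20-bit bus."""
    word &= 0xFFFFF
    out = word
    if bit_reverse:
        byte0 = reverse_bits(word & 0xFF, 8)          # bits [7:0]
        byte1 = reverse_bits((word >> 10) & 0xFF, 8)  # bits [17:10]
        out = (word & CTRL_MASK_16) | (byte1 << 10) | byte0
    if invert:
        out ^= DATA_MASK_16
    return out
```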
Interface Encapsulation
This block encapsulates the protocols supported by the SerDes in Achronix FPGAs. Refer to
Section “PCS Interface” for details on the supported protocols. Note that SerDes configured
in Generic mode supports only 8b/10b encoding.
8b/10b Encoder
The 8b/10b encoder generates 10-bit code groups from 8-bit data and a 1-bit control input,
using the code group mapping specified in IEEE 802.3 clause 36. If the fabric interface is a
16-bit data path, two 8b/10b encoders are cascaded to produce a 20-bit code group output
to the PMA for serialization.
The 8b/10b encoding scheme achieves DC balance and bounded running disparity while
providing sufficient transitions for clock recovery. (See the later sections for more
information on DC balance, running disparity and clock recovery.) The 10-bit encoded output
TX_dataout[9:0] maps to bits {jhgf iedcba} per the labeling used in IEEE 802.3-2005 clause 36.
Symbols and Comma Character
While translating 8-bit words into 10-bit symbols, the 8b/10b encoder in the SerDes PCS
forms two groups of data: the lower 5 bits of data are encoded into a 6-bit group and the
upper 3 bits of data are encoded into a 4-bit group. In addition, the 8b/10b encoding scheme
defines 12 control symbols, called K-symbols, that are used for special purposes. For
instance, three of these control symbols can be used to define the boundary between data
packets; these three control symbols are called comma symbols.
The 1-bit control input (datak signal) identifies whether the data being transmitted is a
control symbol, such as a comma symbol. An asserted datak signal on the control line
indicates that the symbol on the data line is a control (comma) symbol.
In Section “Design and Wrapper Files” of the Chapter “Design Flow: Creating a SerDes
Design”, details are provided on how to transmit 8’hBC (K28.5) as a comma symbol with 1’b1
as the control signal, for a sample design. For a 20-bit data width, that design uses
{2’h1, 8’hBC, 2’h1, 8’hBC}. In other words, while sending a comma symbol,
TX_data[8] = TX_data[18] = 1’b1 is sent on the control line.
Note: On the receiver end, when the decoder finds an asserted bit on the control line, it
treats the symbol on the data line as a control (comma) symbol. An error condition occurs if
the datak signal is asserted while the data line does not carry a valid control symbol
(e.g. K21.5).
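The 20-bit comma word from the sample design can be assembled as follows. This sketch assumes bit 8 carries the control (datak) flag for byte 0 and bit 18 the flag for byte 1, as implied by TX_data[8] = TX_data[18] = 1’b1; bits 9 and 19 are left at 0 as in the example. The helper name pack_20bit is hypothetical.

```python
def pack_20bit(byte1: int, k1: int, byte0: int, k0: int) -> int:
    """Build {2'b0k1, byte1, 2'b0k0, byte0} as a 20-bit TX_data value.

    Bit 18 is byte 1's control flag and bit 8 is byte 0's control flag.
    """
    return ((k1 & 1) << 18) | ((byte1 & 0xFF) << 10) | ((k0 & 1) << 8) | (byte0 & 0xFF)


# {2'h1, 8'hBC, 2'h1, 8'hBC}: K28.5 in both bytes with datak asserted
comma_word = pack_20bit(0xBC, 1, 0xBC, 1)
assert comma_word == 0x6F1BC
```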
Running Disparity
A non-encoded data stream may contain unequal numbers of 1’s and 0’s. The primary goal
of using running disparity in the encoding scheme is to limit the difference between the
number of 1’s and the number of 0’s being transmitted, which ensures DC balance on the
transmission line. A side benefit is that running disparity information can be used to
detect transmission errors. The maximum run length for 8b/10b encoded data is 5 bits.
The input running disparity for the 6-bit block is the running disparity at the end of the
previous word’s 4-bit block, while the input running disparity for the 4-bit block is the
running disparity at the end of the current word’s 6-bit block. This is illustrated in
“Figure 12: 8b/10b Encoding Process”.
Figure 12: 8b/10b Encoding Process
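The sub-block rules can be expressed in a few lines. The sketch below follows the running-disparity rules of IEEE 802.3 clause 36, including the special cases in which the balanced sub-blocks 000111/111000 and 0011/1100 still set the running disparity. It takes a TX_dataout[9:0] value, so bit 0 is bit a (sent first). It only checks disparity, not whether the code group is valid, and the function names are hypothetical.

```python
# Balanced sub-blocks that still force the running disparity (IEEE 802.3 clause 36).
SPECIAL_6B = {"000111": +1, "111000": -1}
SPECIAL_4B = {"0011": +1, "1100": -1}


def subblock_rd(bits: str, rd_in: int, special: dict) -> int:
    """Running disparity (+1/-1) at the end of one sub-block; bits in transmit order."""
    if bits in special:
        return special[bits]
    ones = bits.count("1")
    zeros = len(bits) - ones
    if ones == zeros:
        return rd_in                       # neutral sub-block: disparity unchanged
    if (ones > zeros) == (rd_in > 0):      # e.g. a '+2' block sent while RD is already +
        raise ValueError(f"disparity error in sub-block {bits}")
    return +1 if ones > zeros else -1


def code_group_rd(tx_dataout: int, rd: int) -> int:
    """Update running disparity for one code group; TX_dataout[9:0] = {j,h,g,f,i,e,d,c,b,a}."""
    abcdei = "".join(str((tx_dataout >> k) & 1) for k in range(6))
    fghj = "".join(str((tx_dataout >> k) & 1) for k in range(6, 10))
    rd = subblock_rd(abcdei, rd, SPECIAL_6B)   # 6b block uses RD from previous 4b block
    return subblock_rd(fghj, rd, SPECIAL_4B)   # 4b block uses RD from current 6b block
```

In this bit mapping, K28.5 is 0x17C when the running disparity is negative and 0x283 when it is positive; sending 0x17C twice in a row is flagged as a disparity error.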
PCS Blocks in the Receiver (RX)
This chapter describes the PCS components on the receiver data path. The functional block
diagram of the receiver is shown in “Figure 13: PCS Receive Block Overview”. The key
blocks in the RX PCS include:
• Transition Density Checker (TDC): Generates a trigger bit when the number of
consecutive 1’s or 0’s reaches a pre-defined value.
• Polarity Bit Reversal (PBR): Inverts data, swaps byte ordering and reverses bit ordering
to match the operations applied on the TX data path, if used.
• Symbol Alignment: Uses alignment characters and sequences to define the symbol
boundary in the incoming data stream.
• Decoders: Generate 8-bit data and a 1-bit control signal from the 10-bit encoded
(received) code group.
• Deskew First-In-First-Out (FIFO): Synchronizes the data received across the lanes
when lane bonding is used.
• Clock Compensation (Elastic FIFO): Synchronizes the data received from the PMA in the
recovered clock domain with a system clock (typically the transmit clock).
• Bit Slider: Compensates for bit-wise skew from the fabric, when used.
• PCS Interface Encapsulation: Provides the interface with the fabric. Supports Gigabit
Ethernet, XAUI, PIPE and 10G Ethernet interfaces.
• PCS Self Test Checker: Self-checking module, detailed in Chapters “PCS Test Pattern
Generator” and “PCS Test Pattern Checker”.
The main features of the supported standards on the PCS side can be found in Chapter
“Major standards supported”.
Figure 13: PCS Receive Block Overview
Transition Density Checker (TDC)
The transition density checker monitors the parallel RX data bus from the PMA and counts
the number of consecutive 0s or 1s, called the run length. If the run length reaches a
pre-configured value, the checker sets a trigger bit to indicate a transition density
violation. This pre-configured value is called the threshold; the minimum programmable
threshold is half the width of the data path. If scaling is used, the effective threshold is
as shown in “Equation 1”.
Equation 1:
The assert signal from the Transition Density Checker can be routed to the fabric.
Note: Any bit transition clears the counter and restarts the count.
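A behavioural sketch of the run-length check is shown below. It processes the data serially for clarity, whereas the hardware operates on the parallel bus, and it does not model the scaling of Equation 1. The function name and the choice to report one trigger per run are assumptions.

```python
def tdc_triggers(bits: list[int], threshold: int, data_width: int) -> list[int]:
    """Return the bit positions at which a run of identical bits reaches `threshold`."""
    if threshold < data_width // 2:
        raise ValueError("threshold must be at least half the data-path width")
    triggers, run, prev = [], 0, None
    for i, bit in enumerate(bits):
        run = run + 1 if bit == prev else 1   # any transition restarts the count
        prev = bit
        if run == threshold:                  # flag once when the run reaches the threshold
            triggers.append(i)
    return triggers
```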
Polarity Bit Reversal (PBR)
The polarity bit reversal block is used to invert data, swap byte ordering, and reverse bit
ordering. There are two such PCS blocks on the receive path, corresponding to the two
polarity bit reversal blocks on the transmit path.
When polarity bit reversal on the transmit path is performed before protocol encapsulation
(PBR #0 in “Figure 6: PCS Transmitter Block Overview”), the PBR block after protocol
encapsulation is used on the receive path (PBR #0 in “Figure 13: PCS Receive Block
Overview”). Conversely, if the PBR operation is performed on encoded data on the transmit
path (PBR #1 in “Figure 6: PCS Transmitter Block Overview”), the PBR block before the
symbol alignment/decoder block is used on the receive path (PBR #1 in “Figure 13: PCS
Receive Block Overview”). As noted earlier, both blocks can be disabled on both the
transmit and receive paths.
Symbol Alignment
Symbol alignment uses alignment and sequence characters to identify the correct symbol
boundary in the received data stream. The attributes for the alignment and sequence detect
symbols are specified as 10 bits wide. However, when the receive data path is in 8-bit (or
16-bit) mode, only the lower 8 bits of each attribute are considered.
The symbol alignment block can be configured to support a variety of standards. Some of
these standards are listed below:
• PCIe
• XAUI
• GigE
• Infiniband
• Serial Rapid IO
• SPI-5 (lock to training pattern)
• CPRI
• OBSAI
• Fiber Channel
Symbol alignment can be programmed to function in the following modes:
• Manual Mode
• Bit slip Mode
• Automatic Mode
Modes of Operation
Manual Mode:
In manual alignment mode, the symbol alignment block attempts to identify a pre-configured
pattern and lock to the incoming deserialized data stream from the output of the PMA or the
phase picking block. The alignment operation is triggered by user logic in the FPGA on the
rising edge of RX_com_det_en. The symbol alignment block then searches for the
pre-configured alignment pattern, with or without a trailing sequence pattern, while the
fabric waits for the lock status. Once lock to the incoming stream is achieved, the fabric
can monitor error status from the 8b/10b decoder, or employ any other mechanism in the
fabric, to identify loss of lock. The fabric generates another rising edge on
RX_com_det_en to trigger a new alignment cycle.
Bit Slip Mode:
In bit slip mode, user logic controls symbol alignment using the RX_bit_slip_en signal.
Each rising edge of RX_bit_slip_en causes the symbol alignment logic to shift the word
boundary by 1 bit, and symbol alignment then attempts to match the alignment pattern
within the new word boundary. If the pattern is not matched, the user logic can assert
RX_bit_slip_en again, possibly after waiting for a timeout, shifting the word boundary by
another bit position. This loop continues until lock is achieved. Once lock to the incoming
stream is achieved, logic in the fabric can monitor error status from the 8b/10b decoder, or
employ some other mechanism in the fabric, to identify loss of lock. The bit slip mode
supports all attributes used for manual alignment mode. The number of slips that produce
a distinct alignment is limited to the data path width; further slips repeat a boundary that
has already been tried.
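The fabric-side bit slip loop can be modelled as below, assuming the received bits are available as a list in arrival order and the alignment pattern is K28.5 with negative running disparity (0011111010, bit a first). The function name is hypothetical; in hardware, each increment of `slips` corresponds to one rising edge on RX_bit_slip_en.

```python
from typing import Optional

K28_5_RD_MINUS = [0, 0, 1, 1, 1, 1, 1, 0, 1, 0]   # abcdei fghj, bit 'a' first


def bit_slip_align(stream: list[int], pattern: list[int], width: int = 10) -> Optional[int]:
    """Return how many single-bit slips are needed before a word at the current
    boundary equals `pattern`, or None if no boundary matches."""
    for slips in range(width):          # `width` slips would bring back the first boundary
        for start in range(slips, len(stream) - width + 1, width):
            if stream[start:start + width] == pattern:
                return slips            # lock: stop issuing slip requests
    return None
```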
Automatic Mode:
In automatic alignment mode, the symbol alignment block automatically determines the
location of the word boundary based on the pre-configured alignment characters. It also
establishes a lock-acquired condition after receiving a pre-configured count of alignment
characters (hysteresis). A loss-of-lock condition can also be detected by this block, based
on a pre-configured count of bad code words (or alignment characters at a different word
boundary). Instead of counting every bad code word, the user can choose to count every
‘n’th bad code word when incrementing the unlock count. The user can also use
decode/disparity errors, as per clause 36 of IEEE 802.3, to increment and decrement the
unlock counter. Support for the Fiber Channel protocol involves synchronization with the
4-symbol-wide transmission word (the special code word K28.5 followed by 3 data code
words). For Fiber Channel, any malformed transmission word causes the symbol alignment
to go out of lock based on the programmed unlock count.
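The lock/unlock hysteresis of automatic mode can be sketched as a small state machine. The class below is hypothetical: lock_count, unlock_count and bad_per_increment stand in for the pre-configured values, and the clause 36 decode/disparity-based decrement of the unlock counter is not modelled.

```python
class AutoAlignModel:
    """Hypothetical model of automatic-mode lock/unlock hysteresis."""

    def __init__(self, lock_count: int, unlock_count: int, bad_per_increment: int = 1):
        self.lock_count = lock_count                # alignment characters needed for lock
        self.unlock_count = unlock_count            # unlock-counter value that drops lock
        self.bad_per_increment = bad_per_increment  # count every n-th bad code word
        self.locked = False
        self.good = self.bad = self.unlock = 0

    def on_word(self, is_alignment_char: bool, is_bad: bool) -> bool:
        """Process one received word; return the lock status afterwards."""
        if not self.locked:
            if is_alignment_char:
                self.good += 1
                if self.good >= self.lock_count:
                    self.locked = True
                    self.bad = self.unlock = 0
            return self.locked
        if is_bad:
            self.bad += 1
            if self.bad % self.bad_per_increment == 0:
                self.unlock += 1
                if self.unlock >= self.unlock_count:
                    self.locked = False             # loss of lock: search again
                    self.good = 0
        return self.locked
```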
Comma symbols are used to identify the correct symbol boundary. Section “Symbols
and Comma Character” introduces comma symbols and discusses how they are used in the
data output from the 8b/10b encoder on the TX side of a SerDes. At the receiver end, the
incoming data is scanned for comma symbols. Once a comma symbol is found, the word
boundary of the received data is reset to match it. The received data continues to be
scanned for subsequent comma symbols.
Deskew FIFO
The deskew block supports standards that require bonding multiple lanes and deskewing
the received data across those lanes. Lane bonding is required when users want to transmit
data faster than is possible over one serial link (lane). In such a case, the received data
must be aligned across the lanes; the deskew module within the SerDes takes care of this.
Figure 14: Operating principle of deskew technique
“Figure 14: Operating principle of deskew technique” shows the operating principle of the
deskew operation. In this figure, data is sent using four lanes. On the receiver side, before
deskew, the data at time t+2 on lane 1 is aligned with the data at time t+1 on lane 2, and
so on. The deskew technique aligns the data with respect to the clock cycles: data at time
t+2 on lane 2 should be aligned with data at time t+2 on the other lanes. The red lines for
the clock at the receiver end illustrate this.
For lane bonding, all lanes should use the same reference clock and insert de-skew characters
at the same time on each lane. Skew between lanes is introduced by both active (CDR) and
passive (board) elements of the link. The deskew operation can result in some loss of data
when it aligns characters to the same clock cycle.
Functional Description
The deskew block uses a deskew FIFO on each lane. Writes to the deskew FIFO are
performed in the recovered clock domain of each lane. The read side of the deskew FIFO is
clocked by the clock from the initiator lane. The lanes are categorized as the initiator and
followers. Any lane can be the initiator, and skew is always calculated between the initiator
and each of the follower lanes.
Once deskew is enabled, the skew between the initiator and follower lanes is calculated
continuously by sensing deskew characters on the read side of the FIFO. The read threshold
for the FIFO needs to be programmed appropriately, based on the skew tolerance, to avoid
FIFO underrun or overrun. Once a deskew character is sensed, each lane starts a skew
window equal to the maximum skew allowed in the system. Depending on how the lanes are
skewed, each follower lane is either lagging or leading, and adjusts its read clock cycles
accordingly. Once the initiator gets an indication from all lanes of the bonding group that
the skew calculation is over, it declares that all lanes are aligned and asserts data valid for
the downstream logic. The same data valid is used by the follower lanes to assert their
respective lane data valid. If the initiator does not find an overlap of the skew windows of all
lanes (that is, the skew exceeds the allowed maximum), it issues a reset to all FIFOs in the
bonding group and restarts the deskew operation.
To summarize, the initiator lane generates various control signals for the follower lanes, and
the follower lanes send various status signals back to the initiator. Status signals are AND-ed
(e.g. to check whether the skew calculation has completed in all lanes) or OR-ed (e.g. to
check whether any follower lane’s window has not started), whereas control signals are used
directly. These signals go from one lane to another. The status and control signals are
registered at time intervals determined by the number of lanes bonded.
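The core idea of the deskew FIFO, aligning every lane on a common deskew character and declaring failure when the skew is too large, can be sketched on symbol lists. This is a simplification: the hypothetical deskew function below compares the position of the first deskew character in each lane with the initiator, instead of modelling per-lane skew windows and FIFO read thresholds.

```python
from typing import Optional


def deskew(lanes: list[list[str]], marker: str, max_skew: int) -> Optional[list[list[str]]]:
    """Align bonded lanes on the first `marker` symbol in each lane.

    lanes[0] plays the initiator; skew is measured from it to every follower.
    Symbols before the marker are discarded, so some data may be lost.
    Returns None when a lane has no marker or the skew exceeds max_skew,
    the point where the hardware would reset the FIFOs and restart deskew.
    """
    positions = []
    for lane in lanes:
        if marker not in lane:
            return None
        positions.append(lane.index(marker))
    initiator = positions[0]
    if any(abs(p - initiator) > max_skew for p in positions[1:]):
        return None
    return [lane[p:] for lane, p in zip(lanes, positions)]
```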
Lane-to-Lane Deskew Modes of Operation
The deskew module can work in three modes:
Manual Mode:
A rising edge on i_dskew_start starts one round of the deskew operation. Lanes are declared
aligned either immediately after the deskew operation completes or after an additional check
finds a programmed number of aligned deskew characters on all bonded lanes at the same
time. The fabric needs to monitor the received data to identify any misalignment and then
restart the deskew operation. Infiniband uses the manual mode of deskew operation.
Auto Mode:
The deskew module is always active. Once the lanes are deskewed, all lanes continuously
look for deskew characters in the data read from the FIFO. The initiator should see deskew
characters on all lanes of the bonding group at the same time. The initiator looks for aligned
deskew characters on all lanes a certain number of times, based on the value programmed
in the register; once this is detected, the initiator declares the bonded lanes aligned. Any
time the initiator finds deskew characters that are not aligned on all lanes, it increments an
unlock count. If the unlock count reaches the value programmed in the register, the initiator
declares that the lanes are out of lock and restarts the deskew operation. While the unlock
count is non-zero, if the initiator finds deskew characters aligned on all lanes again, it starts
decrementing the unlock counter. This decrement happens once every ‘n’ times (‘n’ is
programmed in the register) that the lanes have deskew characters aligned, to make sure
the link has overcome the error conditions. If the unlock counter reaches zero, the link
remains aligned.
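The auto-mode unlock counter can be modelled as follows. The class and parameter names are hypothetical stand-ins for the register values. Whether the hardware resets the partial count of aligned occurrences after a misalignment is not stated; this sketch does not reset it.

```python
class DeskewUnlockCounter:
    """Hypothetical model of the deskew auto-mode unlock counter."""

    def __init__(self, unlock_limit: int, decrement_every: int):
        self.unlock_limit = unlock_limit        # register value that declares loss of lock
        self.decrement_every = decrement_every  # 'n': aligned events per decrement
        self.count = 0
        self.aligned_seen = 0
        self.locked = True                      # start from an already-deskewed state

    def observe(self, aligned: bool) -> bool:
        """Feed one deskew-character check; return whether the lanes are still aligned."""
        if not self.locked:
            return False                        # hardware would restart deskew here
        if not aligned:
            self.count += 1
            if self.count >= self.unlock_limit:
                self.locked = False
        elif self.count > 0:
            self.aligned_seen += 1
            if self.aligned_seen == self.decrement_every:
                self.count -= 1
                self.aligned_seen = 0
        return self.locked
```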
Symbol slip mode:
Table 2: Symbol Slip Parameters
symbol_slip_up   symbol_slip_dn   Comments
0                0                Increment read pointer by 1
0                1                No increment
1                0                Increment read pointer by 2
1                1                Increment read pointer by 1
In this mode, the deskew module does not actively remove skew across lanes; instead, each
lane is controlled by the fabric. The fabric continuously monitors incoming data and employs
a mechanism to determine the skew across lanes. Based on this calculation, it instructs each
lane to adjust the read pointer of its FIFO. The read pointer can be incremented by 0, 1 or 2
at a time, depending on the combination of rising edges on symbol_slip_up and
symbol_slip_dn. Depending on the skew computed, the fabric may need to provide multiple
transitions on symbol_slip_up and symbol_slip_dn to obtain the required number of pointer
adjustments.
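Table 2 implies that, relative to a normal read of one symbol, symbol_slip_up advances a lane by one extra symbol and symbol_slip_dn holds it back by one. The sketch below turns a required adjustment into a sequence of command pairs. The helper names are hypothetical, and each pair is assumed to correspond to one read.

```python
# (symbol_slip_up, symbol_slip_dn) -> read-pointer increment, from Table 2.
READ_POINTER_STEP = {(0, 0): 1, (0, 1): 0, (1, 0): 2, (1, 1): 1}


def slip_commands(adjust: int) -> list[tuple[int, int]]:
    """Commands that move a lane's read pointer by `adjust` symbols relative to
    normal reading (positive = skip ahead, negative = hold back)."""
    command = (1, 0) if adjust > 0 else (0, 1)
    return [command] * abs(adjust)


def net_adjustment(commands: list[tuple[int, int]]) -> int:
    """Read-pointer movement beyond the normal one-symbol-per-read advance."""
    return sum(READ_POINTER_STEP[c] for c in commands) - len(commands)
```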
Standards Supported by Deskew Module
The deskew module in Achronix SerDes has explicit support for XAUI and InfiniBand. For
XAUI, align (||A||) characters are sent periodically as defined in Clause 48 of IEEE 802.3. For
InfiniBand, training sequences (TS1/TS2) are used as deskew characters. Although TS1 and
TS2 are each 16 code words long, the deskew module forms the deskew ordered set from COM
and four data symbols (D10.2). The distance (gap) between COM and the data symbols should
be programmed to 'd1 for InfiniBand. With a 10-bit data path, the maximum skew handled is
6 bytes; with a 20-bit data path, it is 2 bytes. During InfiniBand training, data valid is initially
asserted so that TS1/TS2/TS3 pass to the fabric. Data valid is then deasserted once link
training completes and the fabric decides to deskew the bonded lanes. When the deskew
operation completes, data valid is asserted again.
Besides these two protocols, the user can use this module to deskew lanes for any protocol,
provided that the minimum spacing between deskew characters is maintained.
Elastic FIFO (Elastic Buffer)
An elastic FIFO is used to synchronize the received data from the PMA recovered clock to a
system clock, typically the transmit clock. The elastic FIFO also compensates for any
frequency offset between the recovered clock and the system clock by adding or deleting
preconfigured skip (or pad) characters in the received data stream. The elastic FIFO in
Achronix SerDes indicates to the downstream logic when skip (or pad) characters have been
added or deleted. For PCIe, the elastic FIFO also provides the appropriate status encoding to
indicate the add/delete operation.
The elastic FIFO can also be configured as a simple phase compensation FIFO for
synchronizing data. When it is used as a phase compensation FIFO, the user must guarantee
that there is no frequency offset between the read and write clocks.
EFIFO Standards and Skip Characters
PCIe Gen3: To support PCIe Gen3, 4 bytes of skip are added at byte positions 4-7 from the
sync header associated with the skip ordered set. Skip removal happens at byte positions 0-3
from the sync header associated with the skip ordered set. Because of this removal rule, the
sync header and receive start-block indications are delayed by 4 bytes.
PCIe Gen1/Gen2: For PCIe Gen1/Gen2, the skip ordered set is two 10-bit words – the elastic
buffer adds or deletes only the second word.
Fibre Channel: To support Fibre Channel, 4 bytes of skip are added and deleted. The PCS
operates in 16-bit data-path mode at the fabric interface and uses 20-bit encoding internally.
XAUI: To support XAUI, the skip ordered set is one 10-bit word, which is added or deleted
by the elastic buffer.
GigE: For GigE, the skip ordered set is two 10-bit words (control followed by data). The
elastic FIFO adds or removes both 10-bit words.
Other Standards: Besides these specific standards, the elastic FIFO can handle other
protocols with similar skip schemes, because the SKIP ordered set of length 2 (and its
inverted form) is programmable. The user can include an alternate (typically inverted) word
in the ordered set. Beyond two-word skip ordered sets, only four-word skip ordered sets can
be used, and these are specific to Fibre Channel. The elastic FIFO also generates the final data
valid from the PCS, which the fabric uses to register data.
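The per-standard add/delete rules above can be summarized as data. This Python sketch covers only the standards whose skip rule is stated in 10-bit words; it assumes the Fibre Channel 4-byte skip corresponds to four code words, and it omits PCIe Gen3, whose add/remove positions are defined relative to the sync header. SKIP_RULES and adjust_skip are hypothetical names, and the symbols used with them are placeholders rather than real code groups.

```python
# Hypothetical summary of the skip rules listed above, in 10-bit code words.
# os_len: words in the skip ordered set; unit: trailing words added or deleted.
SKIP_RULES = {
    "PCIe Gen1/Gen2": {"os_len": 2, "unit": 1},  # only the second word
    "XAUI": {"os_len": 1, "unit": 1},
    "GigE": {"os_len": 2, "unit": 2},            # control and data words together
    "Fibre Channel": {"os_len": 4, "unit": 4},   # assumes 4 bytes = 4 code words
}


def adjust_skip(ordered_set, standard, add):
    """Return the skip ordered set after one add (add=True) or delete (add=False)."""
    rule = SKIP_RULES[standard]
    if len(ordered_set) != rule["os_len"]:
        raise ValueError(f"{standard} skip ordered set must be {rule['os_len']} words")
    unit = rule["unit"]
    if add:
        # Repeat the trailing unit (the whole set when unit == os_len).
        return list(ordered_set) + list(ordered_set[-unit:])
    # Drop the trailing unit (possibly leaving an empty list).
    return list(ordered_set[:-unit])
```

For example, deleting from a PCIe Gen1/Gen2 set ["W0", "W1"] leaves ["W0"], whereas deleting from a GigE set removes both words.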
EFIFO Operation
“Figure 15: EFIFO SKP Addition/Removal” illustrates the process of SKP addition/removal.
Figure 15: EFIFO SKP Addition/Removal
In “Figure 15: EFIFO SKP Addition/Removal”, upon reset, the difference between the write
and read counters is equal to fifo_mid (half the size of the buffer; 16 by default).
If clk_in is operating at a lower frequency than clk_out, then the read operation is faster than
the write operation and the difference between the write and read counters will be less than
fifo_mid. In this case, to compensate for clk_in being slower, an SKP is added to the data
stream.
If clk_in is operating at a higher frequency than clk_out, then the read operation is slower
than the write operation and the difference between the write and read counters will be
greater than fifo_mid. In this case, to compensate for clk_out being slower, an SKP is
removed from the data stream.
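The add/remove rule above can be checked with a small behavioral simulation. This Python sketch is a model under stated assumptions, not the EFIFO RTL: clock frequencies are integers in arbitrary common units, an integer phase accumulator decides how many clk_in writes fall in each clk_out read cycle, a SKP opportunity occurs every skp_every incoming words (a hypothetical parameter), and the add/remove decision is made when the SKP is written.

```python
def efifo_sim(f_in, f_out, n_reads, fifo_mid=16, skp_every=8):
    """Simulate n_reads clk_out cycles; return (occupancy, skps_added, skps_removed).

    occupancy is the write-counter minus read-counter difference.
    """
    occupancy = fifo_mid  # on reset, the write - read difference equals fifo_mid
    acc = 0               # phase accumulator for clk_in relative to clk_out
    word = 0              # index of the next incoming word
    added = removed = 0
    for _ in range(n_reads):
        acc += f_in
        while acc >= f_out:           # one clk_in write per f_out units accumulated
            acc -= f_out
            is_skp = word % skp_every == 0
            word += 1
            if is_skp and occupancy > fifo_mid:
                removed += 1          # clk_out slower: drop this SKP
            elif is_skp and occupancy < fifo_mid:
                occupancy += 2        # clk_in slower: write the SKP twice
                added += 1
            else:
                occupancy += 1        # normal write
        occupancy -= 1                # one read per clk_out cycle
    return occupancy, added, removed
```

With clk_in 300 ppm slower than clk_out (for example, efifo_sim(999_700, 1_000_000, 100_000)), the model only adds SKPs and the occupancy stays within one word of fifo_mid; with clk_in 300 ppm faster, it only removes them.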
“Figure 16: EFIFO SKP Addition/Removal: PCIE, GigE (802.3) and XAUI (802.3)” illustrates
SKP additions and removals for PCIe, GigE (802.3), and XAUI (802.3ae). Note that in the
figure, data_i and data_o are not actually time-aligned; they are drawn that way only for clarity.