NOTICE of DISCLAIMER: The information given in this document is
believed to be accurate and reliable. However, Achronix Semiconductor
Corporation does not give any representations or warranties as to the
completeness or accuracy of such information and shall have no liability for
the use of the information contained herein. Achronix Semiconductor
Corporation reserves the right to make changes to this document and the
information contained herein at any time and without notice. All Achronix
trademarks, registered trademarks, and disclaimers are listed at
http://www.achronix.com and use of this document and the Information
contained therein is subject to such terms.
UG030, April 26, 2013
Page 3
3
Table of Contents
Copyright Info
Table of Contents
Table of Figures
The Achronix PCI Express (PCIe) hard core provides a flexible and high-performance
Transaction Layer Interface to the PCIe Bus for the Speedster22i device. The core implements all
three layers (Physical, Data Link, and Transaction) defined by the PCIe standard, as well as a
high-performance DMA interface to facilitate efficient data transfer between the PCIe Bus and
user logic. The core is available in numerous configurations including x16, x8, x4, x2, and x1
Lanes.
Achronix recommends using the Achronix CAD Environment (ACE) PCIe Configuration GUI to
implement the core in the desired configuration (see Appendix A for additional details).
The following protocol features are offered:
• PCI Express Base Specification Revision 3.0 version 1.0 compliant
o Backward compatible with PCI Express 2.1/2.0/1.1/1.0a
• x16, x8, x4, x2, or x1 PCI Express Lanes
• 8.0 GT/s, 5.0 GT/s, and 2.5 GT/s line rate support
• Comprehensive application support:
o Endpoint
o Root Port
o Endpoint/Root Port
o Upstream Switch Port
o Downstream Switch Port
o Bifurcation options
o Cross-link support
• PIPE-Compatible PHY interface for easy connection to PIPE PHY
• ECC RAM and Parity Data Path Protection option
• Support for Autonomous and Software-Controlled Equalization
• Flexible Equalization methods (Algorithm, Preset, User-Table)
• Transaction Layer Bypass and Partial Transaction Layer core interface options
• Available in four Core Data Widths and five lane widths for maximum flexibility in
supporting a wide spectrum of silicon devices with differing capabilities
o 32-bit (x1, x2, x4)
o 64-bit (x2, x4, x8)
o 128-bit (x4, x8, x16)
• Supports Lane Reversal, Up-configure, Down-configure, Autonomous Link Width and
Speed
• Implements Type 0 Configuration Registers in Endpoint Mode
• Implements Type 1 Configuration Registers in Root Port and Switch Modes
o Complete Switch and Root Port Configuration Register implementation
• Supports user expansion of Configuration Space
• Easy to use
o Decodes received packets to provide key routing (BAR hits, Tag, etc.)
information for the user
o Implements all aspects of the required PCIe Configuration Space; user can add
custom Capabilities Lists and Configuration Registers via the Configuration
Register Expansion Interface
o Consumes PCI Express Message TLPs and provides contents on the simple-to-use
Message Interface
o Complete and easy-to-use Legacy/MSI/MSI-X interrupt generation
o Interfaces have consistent timing and function over all modes of operation
o Provides a wealth of diagnostic information for superior system-level debug and
link monitoring
• Implements all 3 PCI Express Layers (Transaction, Data Link, Physical)
Design Overview
[Figure 1 diagram labels: PCIe core with DMA; DMA BE core (S2C, C2S, ATar, AMas);
Transaction Layer; VC_interface; Link Layer; PHY Layer; Serdes 8 lanes (PMA);
DMA Bypass interface; Config/registers; Data/control for Fabric.
PCIe-IP Clocks (MUX): S2c_aclk[1:0], C2S_aclk[1:0], m_aclk, t_aclk, Bypass_clk,
Apb_pclk, i_po_ctl_clk (FCU), Clk (serdes 0 pll word clk).]
The PCI Express (PCIe) standard can be implemented in the Achronix Speedster22i device.
Figure 1 shows a block diagram of the PCIe hard IP with the DMA core for high-speed data
transfer to/from the user fabric. Figure 2 shows the DMA's major interfaces, which are
discussed later.
Figure 1: PCIe with DMA Block Diagram
[Figure 2 diagram labels: DMA Core; AXI Master Interface; AXI Target Interface;
AXI DMA C2S Interface; AXI DMA S2C Interface.]
Figure 2: DMA Block Diagram
Major Interfaces
[Figure 3 diagram labels: PCIe-Core (RX, TX; clk, target_aclk); Target Controller
(Fabric Core); Target Control; SDRAM Controller; Internal Registers; Internal SRAM;
SDRAM; GPI/O.]
AXI Target Interface
The AXI Target Interface implements AXI3/AXI4 Master Protocol. Write and
read requests received via PCI Express (from remote PCI Express masters)
that target AXI regions of enabled Base Address or Expansion ROM regions
are forwarded to the AXI Target Interface for completion. Accesses to
registers inside the AXI DMA Back-End Core are handled by the core and do
not appear on the AXI Target Interface.
Figure 3 shows the interface connection between AXI Target Interface and
Fabric Controller:
Figure 3: AXI Target Interface
The design in Figure 3 enables another PCI Express master device (such as a
CPU) to access external SDRAM, internal SRAM, and General Purpose I/O.
In this design each of the three interfaces is assigned a memory Base Address
Register. The Target Control module steers the incoming packets from the
PCIe-Core to the appropriate destination based upon the Base Address Register
that was hit and formats the data (if needed) to the appropriate width. The
design receives Posted Request packets containing write requests and data
and performs the writes on the addressed interface. The design receives
Non-Posted Request packets containing read requests, performs the read on
the addressed interface, and then transmits one or more Completion packets
containing the requested read data.
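The steering performed by the Target Control module can be sketched as a lookup from the Base Address Register that was hit to a destination. This is a minimal illustration only: the BAR numbering, the `Destination` names, and the `steer` function are hypothetical and are not taken from the core.

```python
from enum import Enum


class Destination(Enum):
    SDRAM = "external SDRAM"
    SRAM = "internal SRAM"
    GPIO = "general purpose I/O"


# Hypothetical assignment: one memory BAR per interface.
BAR_MAP = {0: Destination.SDRAM, 1: Destination.SRAM, 2: Destination.GPIO}


def steer(bar_hit, is_write):
    """Return (destination, expected response) for one received request.

    Posted writes need no completion; Non-Posted reads are answered with
    one or more Completion packets carrying the read data.
    """
    dest = BAR_MAP[bar_hit]
    response = "none (posted write)" if is_write else "completion with data"
    return dest, response
```

For example, `steer(1, False)` selects the internal SRAM and indicates that a Completion carrying read data is owed to the requester.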
The AXI Target Interface implements write-read ordering to maintain
coherency for PCI Express transactions (see Figure 4).
• Ordering is maintained separately for internal DMA Register and AXI
destinations
• The completion of a read request to the same destination (DMA
Registers or AXI) can be used to guarantee that prior writes to the
same destination have completed
• Reads are blocked until all writes occurring before the read have fully
completed; for AXI, a write is completed when it returns a completion
response on the Write Response Channel; for internal DMA Registers,
a write is completed when it is written into the DMA Registers such
that a following read will return the new value
• Supports full duplex bandwidth utilization when being driven by a
remote PCI Express DMA Master
• Supports multiple simultaneously outstanding write and read requests
• Utilizes maximum 16 beat bursts for compatibility with AXI3
and AXI4
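The write-read ordering rule above can be modeled, in simplified form, as a per-destination counter of outstanding writes. The sketch below is an assumption-laden behavioral model, not the core's implementation: the `OrderingTracker` class and the destination keys are hypothetical, and the model only checks ordering at the moment a read arrives.

```python
class OrderingTracker:
    """Tracks outstanding writes for one destination (DMA Registers or AXI)."""

    def __init__(self):
        self.outstanding_writes = 0

    def issue_write(self):
        self.outstanding_writes += 1

    def write_response(self):
        # For AXI, this corresponds to a response on the Write Response Channel.
        if self.outstanding_writes == 0:
            raise RuntimeError("write response without an outstanding write")
        self.outstanding_writes -= 1

    def read_may_proceed(self):
        # A read is held until every earlier write to this destination completes.
        return self.outstanding_writes == 0


# Ordering is maintained separately for each destination.
trackers = {"dma_registers": OrderingTracker(), "axi": OrderingTracker()}
```

In this model, a pending AXI write holds back AXI reads but does not hold back reads of the internal DMA Registers.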
[Figure 4 waveform. Signals shown: t_areset_n, t_aclk, t_awvalid, t_awready,
t_awregion[2:0], t_awaddr[31:0], t_awlen[3:0], t_awsize[2:0], t_wvalid, t_wready,
t_wdata[127:0], t_wstrb[15:0], t_wlast, t_bvalid, t_bready, t_bresp[1:0],
t_arvalid, t_arready, t_arregion[2:0], t_araddr[31:0], t_arlen[3:0],
t_arsize[2:0], t_rvalid, t_rready, t_rdata[127:0], t_rresp[1:0], t_rlast.]
Figure 4: Timing Diagram for Target Interface
Moreover, the AXI Target Interface implements FIFOs to buffer multiple
writes and reads simultaneously to enable maximum bandwidth.
The AXI Target Interface implements a dual clock interface. The AXI clock
domain may be different than the PCI Express clock domain. Gray Code
synchronization techniques can be used to enable support for a wide variety
of AXI clock rates.
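Gray code is commonly used when a multi-bit value, such as a FIFO pointer, must cross between two clock domains, because only one bit changes per increment. The sketch below illustrates that property in general terms; applying it to FIFO pointers is an assumption about how such synchronizers are typically built, not a description of the core's internal logic.

```python
def to_gray(value):
    """Convert a binary count to its Gray code equivalent."""
    return value ^ (value >> 1)


def from_gray(gray):
    """Convert a Gray code value back to binary."""
    value = 0
    while gray:
        value ^= gray
        gray >>= 1
    return value


def bits_changed(a, b):
    """Count the bit positions that differ between two values."""
    return bin(a ^ b).count("1")
```

Because consecutive Gray code values differ in exactly one bit, a pointer sampled in the other clock domain is either the old value or the new value, never an unrelated intermediate value.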
User’s Task: It is important to consume target write and read transactions
relatively quickly as it is possible to stall PCI Express completions (used for
S2C DMA for example) if target write and read transactions are allowed to
languish in the PCI Express Core Receive Buffer.
Target Only Design
The Target-Only design is the simplest design to implement and works well
if the master device transmits packets with larger burst widths. Throughput
in PCI Express (and in its predecessors PCI and PCI-X) is directly
proportional to burst size, so small transaction burst sizes result in low
throughput.
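The effect of burst size on throughput can be illustrated with a simple efficiency estimate: each packet carries a fixed amount of overhead, so larger payloads use a larger share of the link. The 24-byte per-packet overhead used below is an assumed round figure for illustration only; the actual overhead depends on header size, link speed, and framing.

```python
def link_efficiency(payload_bytes, overhead_bytes=24):
    """Fraction of transmitted bytes that are payload.

    overhead_bytes is an assumed per-packet cost (header, sequence number,
    LCRC, framing); it is not a value taken from this document.
    """
    return payload_bytes / (payload_bytes + overhead_bytes)
```

Under this assumption, a 4-byte CPU write uses roughly 14% of the transmitted bytes for payload, while a 256-byte burst uses roughly 91%.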
User's Task: To overcome the inherent limitations of CPUs and other small burst
size masters, a design must be able to master the PCI Express bus and initiate
transactions with larger burst sizes (see the AXI Master Interface section for
additional details).
Still, the Target-Only design is ideal, due to its simplicity and hence smaller
design size, for lower bandwidth applications and applications where
another master is available to master transactions at larger burst sizes.
System software is also easier to write for target-only applications since only
basic CPU move instructions need to be used and the software complexities
of a DMA-based system (interrupts, DMA system memory allocation, etc.)
do not need to be handled.
AXI Back-End DMA Interface
The AXI DMA Interface is the mechanism through which user logic interacts
with the DMA Engine. The AXI DMA Interface orchestrates the flow of DMA
data between user logic and PCI Express.
The AXI DMA System to Card and AXI DMA Card to System interfaces
support multiple AXI protocol options (which are selected with the DMA
Back End DMA Engine inputs c2s/s2c_fifo_addr_n):
• AXI3/AXI4
• AXI4-Stream
• Addressable FIFO DMA
When a DMA Engine is configured to implement an AXI3/AXI4 interface,
system software can set the “Addressable FIFO DMA” Descriptor bit in all
Descriptors of an application DMA transfer to instruct the DMA Back End to
provide the same starting AXI address provided by software for all AXI
transactions for this packet. This allows the user hardware design to
Figure 6: Timing Diagram for System-to-Card DMA Interface
Packet DMA Descriptor Format
A 256-bit (32-byte) Descriptor is defined for Packet DMA which contains the
Control fields required to specify a packet copy operation and the Status
fields required to specify the success/failure of the packet copy operation.
The Descriptor is split into Control and Status fields:
• Control fields are fields that are written into the Descriptor by
software before the Descriptor is passed to the DMA Engine. Control
fields specify to the DMA Engine what copy operation to perform.
• Status fields are fields that are written into the Descriptor by the
DMA Engine after completing the DMA operation described in the
Control portion of the Descriptor. Status fields indicate to software
the Descriptor completion status. Software should zero all status
fields prior to making the Descriptor available to the DMA Engine.
• To promote ease of re-using Descriptors (for circular queues),
Control and Status fields are assigned their own locations in the
Descriptor.
Table 1 describes the Packet DMA Descriptor format.
Table 1: Packet DMA Descriptor Format
Card-to-System Descriptor Field Descriptors
Data flow for Card-to-System DMA is from the user design to system
memory. The DMA Engine receives packets on its DMA Interface from the
user hardware design and writes the packets into system memory at the
locations specified by the Descriptors. The Packet DMA Engine assumes that
the packet sizes are variable and unknown in advance. The Descriptor Status
fields contain the necessary information for software to be able to determine
the received packet size and which Descriptors contain the packet data.
Packet start and end are indicated by the SOP and EOP C2SDescStatusFlag
bits. A packet may span multiple Descriptors. SOP=1, EOP=0 is a packet start,
SOP=EOP=0 is a packet continuation, SOP=0, EOP=1 is a packet end, and
SOP=EOP=1 is a packet starting and ending in the same Descriptor. The
received packet size is the sum of the C2SDescByteCount fields for all
Descriptors that are part of a packet.
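The packet-size rule above can be sketched as a walk over completed Descriptors. The `C2SStatus` record and `packet_sizes` function are hypothetical names for illustration; they assume that software has already extracted the SOP, EOP, and C2SDescByteCount values from each Descriptor.

```python
from dataclasses import dataclass


@dataclass
class C2SStatus:
    sop: bool
    eop: bool
    byte_count: int  # value of C2SDescByteCount


def packet_sizes(descriptors):
    """Return the size of each complete packet in a list of completed Descriptors."""
    sizes = []
    current = None  # running byte count of the packet in progress
    for d in descriptors:
        if d.sop:
            current = 0
        if current is None:
            raise ValueError("Descriptor without a preceding SOP")
        current += d.byte_count
        if d.eop:
            sizes.append(current)
            current = None
    return sizes
```

For example, three Descriptors of 4096, 4096, and 100 bytes marked start, continuation, and end form one packet of 8292 bytes.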
Descriptor fields specific to Card-to-System DMA:
C2SDescControlFlags[7:0] – Control
• Bit 7 – SOP – Set if this Descriptor contains the start of a
packet; clear otherwise; only set for addressable Packet
DMA
• Bit 6 – EOP – Set if this Descriptor contains the end of a
packet; clear otherwise; only set for addressable Packet
DMA
• Bits[5:3] – Reserved
• Bit[2] – Addressable FIFO DMA – If set to 1, the DMA Back-
End will use the same Card Starting Address for all DMA
Interface transactions for this Descriptor; this bit must be set
the same for all Descriptors that are part of the same packet
transfer; Addressable FIFO AXI addresses must be chosen
by the user design such that they are aligned to AXI max
burst size * AXI data width address boundaries; For
example: 16 * 16 == 256 bytes (addr[7:0] == 0x00) for AXI3
max burst size == 16 and AXI_DATA_WIDTH == 128-bits ==
16 bytes
• Bit[1] – IRQOnError – Set to generate an interrupt when this
Descriptor Completes with error; clear to not generate an
interrupt when this Descriptor Completes with error
• Bit[0] – IRQOnCompletion – Set to generate an interrupt
when this Descriptor Completes without error; clear to not
generate an interrupt when this Descriptor Completes
without error
C2SDescStatusFlags[7:0] – Status
• Bit 7 – SOP – Set if this Descriptor contains the start of a
packet; clear otherwise
• Bit 6 – EOP – Set if this Descriptor contains the end of a
packet; clear otherwise
• Bit 5 – Reserved
• Bit 4 – Error – Set when the Descriptor completes due to an
error; clear otherwise
• Bit 3 – C2SDescUserStatusHighIsZero – Set if
C2SDescUserStatus[63:32] == 0; clear otherwise
• Bit 2 – C2SDescUserStatusLowIsZero – Set if
C2SDescUserStatus[31:0] == 0; clear otherwise
• Bit 1 – Short – Set when the Descriptor completed with a
byte count less than the requested byte count; clear
otherwise; this is normal for C2S Packet DMA for packets
containing EOP since only the portion of the final Descriptor
required to hold the packet is used.
• Bit 0 – Complete – Set when the Descriptor completes
without an error; clear otherwise
C2SDescByteCount[19:0] - Status
• The number of bytes that the DMA Engine wrote into the
Descriptor. If EOP=0, then C2SDescByteCount will be the
same as the Descriptor size DescByteCount. If EOP=1 and
the packet ended before filling the entire Descriptor, then
C2SDescByteCount will be less than the Descriptor size
DescByteCount. The received packet size is the sum of the
C2SDescByteCount fields for all Descriptors that are part of
a packet.
• C2SDescByteCount is 20 bits wide, so it supports Descriptors up to
2^20-1 bytes. Note that since packets can span multiple
Descriptors, packets may be significantly larger than the
Descriptor size limit.
C2SDescUserStatus[63:0] – Status
• Contains application specific status received from the user
when receiving the final data byte for the packet;
C2SDescUserStatus is only valid if EOP is asserted in
C2SDescStatusFlags. C2SDescUserStatus is not used by the
DMA Engine and is purely for application specific needs to
communicate information between the user hardware
design and system software. Example usage includes
communicating a hardware calculated packet CRC,
communicating whether the packet is an Odd/Even video
frame, etc. Use of C2SDescUserStatus is optional.
• C2SDescUserStatusHighIsZero and
C2SDescUserStatusLowIsZero are provided for ensuring
coherency of status information.
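The bit assignments listed above can be captured in a small decoder, together with the Addressable FIFO alignment rule (AXI max burst size * AXI data width). This is a sketch for illustration: the function names and the dictionary keys are hypothetical, and only the bit positions and the 16 * 16 == 256 byte example come from the field descriptions.

```python
# Bit positions of C2SDescStatusFlags[7:0]; bit 5 is reserved.
C2S_STATUS_BITS = {
    "SOP": 7,
    "EOP": 6,
    "Error": 4,
    "UserStatusHighIsZero": 3,
    "UserStatusLowIsZero": 2,
    "Short": 1,
    "Complete": 0,
}


def decode_c2s_status(flags):
    """Return a dict of named booleans for an 8-bit status flags value."""
    return {name: bool((flags >> bit) & 1) for name, bit in C2S_STATUS_BITS.items()}


def addressable_fifo_aligned(addr, max_burst=16, data_width_bytes=16):
    """Check that an Addressable FIFO address sits on a burst boundary."""
    boundary = max_burst * data_width_bytes  # 16 * 16 == 256 bytes
    return addr % boundary == 0
```

A status value of 0x43 decodes as EOP, Short, and Complete, which is the normal result for the final Descriptor of a C2S packet that did not fill the Descriptor.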
System-to-Card Descriptor Field Descriptors
Data flow for System-to-Card DMA is from system memory to the user
design. Software places packets into the Descriptors and then passes the
Descriptors to the DMA Engine for transmission. The DMA Engine reads the
packets from system memory and provides them to the user hardware
design on its DMA Interface. The software knows the packet sizes in advance
and writes this information into the Descriptors. Software sets SOP and EOP
S2CDescControlFlags during packet to Descriptor mapping to indicate
Packet start and end information. A packet may span multiple Descriptors.
SOP=1, EOP=0 is a packet start, SOP=EOP=0 is a packet continuation, SOP=0,
EOP=1 is a packet end, and SOP=EOP=1 is a packet starting and ending in
the same Descriptor. The transmitted packet size is the sum of all
S2CDescByteCount fields for all Descriptors that are part of a packet. The
Descriptor Status fields contain the necessary information for software to be
able to determine which Descriptors the DMA Engine has completed.
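The packet to Descriptor mapping described above can be sketched as follows. The `map_packet` function is a hypothetical helper, and it assumes that every Descriptor has the same size; it returns the SOP, EOP, and S2CDescByteCount values that software would write for each Descriptor.

```python
def map_packet(packet_len, desc_size):
    """Split one packet into (sop, eop, byte_count) tuples, one per Descriptor."""
    if packet_len <= 0 or not 0 < desc_size < 2**20:
        raise ValueError("invalid packet length or Descriptor size")
    entries = []
    offset = 0
    while offset < packet_len:
        count = min(desc_size, packet_len - offset)
        sop = offset == 0                    # first Descriptor of the packet
        eop = offset + count == packet_len   # last Descriptor of the packet
        entries.append((sop, eop, count))
        offset += count
    return entries
```

A 10000-byte packet mapped onto 4096-byte Descriptors yields a start, a continuation, and an end Descriptor holding 4096, 4096, and 1808 bytes; a packet that fits in one Descriptor yields a single entry with SOP=EOP=1.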
Descriptor fields specific to System-to-Card DMA:
S2CDescControlFlags[7:0] – Control
• Bit 7 – SOP – Set if this Descriptor contains the start of a
packet; clear otherwise
• Bit 6 – EOP – Set if this Descriptor contains the end of a
packet; clear otherwise
• Bits[5:3] – Reserved
• Bit[2] – Addressable FIFO DMA – If set to 1, the DMA Back-
End will use the same Card Starting Address for all DMA
Interface transactions for this Descriptor; this bit must be set
the same for all Descriptors that are part of the same packet
transfer; Addressable FIFO AXI addresses must be chosen
by the user design such that they are aligned to AXI max
burst size * AXI data width address boundaries; For
example: 16 * 16 == 256 bytes for AXI3 max burst size == 16
and AXI_DATA_WIDTH == 128-bits == 16 bytes
• Bit[1] – IRQOnError – Set to generate an interrupt when this
Descriptor Completes with error; clear to not generate an
interrupt when this Descriptor Completes with error
• Bit[0] – IRQOnCompletion – Set to generate an interrupt
when this Descriptor Completes without error; clear to not
generate an interrupt when this Descriptor Completes
without error
S2CDescStatusFlags[7:0] - Status
• Bits[7:5] – Reserved
• Bit 4 – Error – Set when the Descriptor completes due to an
error; clear otherwise
• Bits[3:2] - Reserved
• Bit 1 – Short – Set when the Descriptor completed with a
byte count less than the requested byte count; clear
otherwise; this is generally an error for S2C Packet DMA
since packets are normally not truncated by the user design.
• Bit 0 – Complete – Set when the Descriptor completes
without an error; clear otherwise
S2CDescStatusErrorFlags[3:0] – Status – Additional information as
to why S2CDescStatusFlags[4] == Error is set. If
S2CDescStatusFlags[4] == Error is set then one or more of the
following bits will be set to indicate the additional error source
information.
• Bit 3 – Reserved
• Bit 2 – Set when received one or more DMA read data
completions with ECRC Errors
• Bit 1 – Set when received one or more DMA read data
completions marked as Poisoned (EP == 1)
• Bit 0 – Set when received one or more DMA read data
completions with Unsuccessful Completion Status
S2CDescUserControl[63:0] – Control
• Contains application specific control information to pass
from software to the user hardware design; the DMA Engine
provides the value of S2CDescUserControl to the user
design on the same clock cycle in which SOP is provided.
S2CDescUserControl is not used by the DMA Engine and is
purely for application specific needs. Use of
S2CDescUserControl is optional.
S2CDescByteCount[19:0] – Control & Status
• Control – During packet to Descriptor mapping, software
writes the number of bytes that it wrote into the Descriptor
into S2CDescByteCount. If EOP=0, then S2CDescByteCount
must be the same as the Descriptor size DescByteCount. If
EOP=1 and the packet ends before filling the entire
Descriptor, then S2CDescByteCount is less than the
Descriptor size DescByteCount. The transmitted packet size
is the sum of the S2CDescByteCount fields for all
Descriptors that are part of a packet.
• Status – After completing a DMA operation, the DMA
Engine writes the number of bytes transferred for the
Descriptor into S2CDescByteCount. Except for error
conditions, S2CDescByteCount should be the same as
originally provided.
• Note: S2CDescByteCount is 20-bits so supports Descriptors
up to 2^20-1 bytes. Note that since packets can span multiple
Descriptors, packets may be significantly larger than the
Descriptor size limit.
AXI Master Interface
The AXI Master Interface is an AXI4-Lite Slave interface that enables the user
to:
• Generate PCI Express requests with up to 1 DWORD (32-bit)
payload
• Write and read DMA Back-End internal registers to start DMA
operation and obtain interrupt status
The AXI Master Interface implements a register set to enable the
above functions.
A PCI Express request is carried out by writing the PCI Express-
specific information (PCI Express Address, Format and Type, etc.) to
the register set and then writing to another register to execute the
request.
DMA Registers are made accessible via AXI reads and writes
The design in Figure 7 contains the same elements as the Target-Only
design described in section AXI Target Interface, but is enhanced with Direct
Memory Access (DMA) capability to achieve greater throughput. For the
transfer of large volumes of data, DMA has inherently better throughput
than target-only designs, both because the burst sizes are generally much
larger and because DMA read transactions can be cascaded, while most
software using CPU move instructions will block on a read until it
completes. Writes perform reasonably well in either case since writes are
always posted and software will generally not block on write transactions.
[Figure 7 diagram labels: PCIe-Core (TX, RX); Arbiters; DMA Control; Target
Control; Internal Registers; SDRAM Cntrl; Internal SRAM; GPIO; SDRAM;
Speedster22i.]
In the Target with Master DMA design each of the three interfaces (external
SDRAM, internal SRAM, and DMA registers/General Purpose I/O) is
assigned a memory Base Address Register. The Target Control module steers
the incoming packets to the appropriate destination based upon the Base
Address Register that was hit and formats the data (if needed) to the
appropriate width.
As a target, the design receives Posted Request packets containing write
requests and data and performs the writes on the addressed interface. The
design receives Non-Posted Request packets containing read requests,
performs the read on the addressed interface, and then transmits one or more
Completion packets containing the requested read data.
The DMA Control module masters transactions on the PCI Express bus (it
generates requests, rather than just responding to requests). System software
controls the DMA transactions via target writes and reads to the Internal
Registers.
As a master, the design transmits Posted (write request + data) and Non-Posted
(read request) requests and, for reads only, monitors the RX bus for the
corresponding Completion packets containing the transaction status/data.
Since the SDRAM Controller module must be shared, an SDRAM Arbiter is
required to arbitrate between servicing DMA and target SDRAM accesses.
Since there are two modules that need access to the Transmit and Receive
Interfaces, arbiters are required.
The Target with Master DMA design is well suited to applications that need
to move large volumes of data at very high throughput rates. The higher
throughput comes at a price, however: design complexity is significantly
greater than in a target-only design, and system software is more complicated
to write.
Figure 7: AXI Target with Master DMA (Control Flow)
DMA Bypass Interface
The bypass interface disables the DMA back end and communicates directly with
the PCI Express core. In place of the DMA back end, the user can build a soft
DMA engine that connects to this interface.
Transmit Interface
The Transmit Interface is the mechanism with which the user transmits PCIe
transaction-layer packets (TLPs) over the PCI Express bus. The user
formulates TLPs for transmission in the same format as defined in the PCI
Express Specification.
User’s task: Supply a complete TLP comprised of packet header, data
payload, and optional TLP Digest.
The core Data Link Layer adds the necessary framing (STP/END), sequence
number, Link CRC (LCRC), and optionally ECRC (when ECRC support is
present and enabled).
Packets are transmitted to master write and read requests, to respond with
completions to target reads and target I/O requests, to transmit messages,
etc.
The Achronix PCIe Core automatically implements any necessary replays
due to transmission errors, etc. If the remote device does not have sufficient
space in its Receive Buffer for the packet, the core pauses packet transmission
until the issue is resolved.
PCI Express packets are transmitted exactly as received by the core on the
Transmit Interface with no validation that the packets are formulated
correctly by the user.
User’s task: It is critical that all packets transmitted are formed correctly and
that vc0_tx_eop is asserted at the appropriate last vc0_tx_data word in each
packet.
PCI Express Packets are integer multiples of 32-bits in length. Thus, 64-bit,
128-bit, and 256-bit Core Data Width cores may have an unused remainder
portion in the final data word of a packet. The core uses the packet TLP
header (Length, TLP Digest, and Format and Type) to detect whether the
packet has an unused remainder and will automatically discard and not
transmit the unused portion of the final data word.
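The unused remainder described above follows directly from the packet length and the Core Data Width. The sketch below computes it for one TLP; `final_word_usage` and its parameters are hypothetical names, and the calculation assumes only that header, payload, and TLP Digest are counted in 32-bit DWORDs.

```python
def final_word_usage(header_dwords, payload_dwords, has_digest, core_data_width_bits):
    """Return (data_words, unused_bytes_in_final_word) for one TLP."""
    tlp_bytes = 4 * (header_dwords + payload_dwords + (1 if has_digest else 0))
    word_bytes = core_data_width_bits // 8
    data_words = -(-tlp_bytes // word_bytes)  # ceiling division
    unused = data_words * word_bytes - tlp_bytes
    return data_words, unused
```

For a 128-bit Core Data Width, a TLP with a 4-DWORD header and a 1-DWORD payload occupies two data words and leaves 12 bytes of the final word unused; with a 32-bit Core Data Width there is never a remainder.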
The core contains transmit DLLP-DLLP, TLP-TLP, and TLP-DLLP packing to
maximize link bandwidth by eliminating, whenever possible, idle cycles left
by user TLP transmissions that end without using the full Core Data Width
word.
The Transmit Interface includes the option to nullify TLPs (instruct Receiver
to discard the TLP) to support cut-through routing and the user being able to
cancel TLP transmissions when errors are detected after the TLP
transmission has started. Nullified TLPs that target internal core resources
(Root Port & Downstream Switch Port Configuration Registers and Power
Management Messages) are discarded without affecting the internal core
resources.
Receive Interface
The Receive Interface is the mechanism with which the user receives PCIe
packets from the PCIe bus. Packets are received and presented on the
interface in the same format defined in the PCI Express Specification; the
user receives complete Transaction Layer packets comprised of packet
header, data payload, and optional TLP Digest. The core automatically
checks packets for errors, requests replay of packets as required, and strips
the Physical Layer framing and Data Link Layer sequence number, and Link
CRC (LCRC) before presenting the packet to the user.
The core decodes received TLPs and provides useful transaction attributes
such that the packet can be directed to the appropriate destination without
the need for the user to parse the packet until its destination. If the packet is
an I/O or Memory write or read request, the base address register resource
that was hit is indicated. If the packet is a completion, the packet’s tag field is
provided. The core also provides additional useful transaction attributes.
Packets that appear on the Receive Interface have passed the Sequence
Number, Link CRC, and malformed TLP checks required by the PCI Express
Specification.
Port List
SerDes Interface
Pin Name                    Direction  Clock       Description
pcie_refclk_p[7:0]          Input      -           Reference Clock Input
pcie_refclk_n[7:0]          Input      -           Reference Clock Input
tx_p[7:0]                   Output     -           Serial Transmit
tx_n[7:0]                   Output     -           Serial Transmit
rx_p[7:0]                   Input      -           Serial Receive
rx_n[7:0]                   Input      -           Serial Receive
i_serdes_sbus_req[7:0]      Input      i_sbus_clk  SerDes side SBUS request
i_serdes_sbus_data[15:0]    Input      i_sbus_clk  SerDes side SBUS data to write
o_serdes_sbus_data[15:0]    Output     i_sbus_clk  SerDes side SBUS data to read
o_serdes_sbus_ack[7:0]      Output     i_sbus_clk  SerDes side SBUS acknowledgement
Table 2: SerDes Interface Pin Descriptions
Fabric-Side Interface
Port Name
Direction
Clock
Description
perst_n
Input
user_clk
Fundamental Reset; active-low asynchronous
assert, synchronous de-assert; resets the entire
core except for Configuration Registers which
are defined by PCI Express to be unaffected by
fundamental reset; on rst_n de-assertion the
core starts in the Detect Quiet Link Training
and Status State Machine (LTSSM) state with
the Physical Layer down (mgmt_pl_link_up_o
== 0) and Data Link Layer down
(mgmt_dl_link_up_o == 0).
clk_out
Output
core_clk
Core clock; all core ports are synchronous to
the rising edge of clk_out.
The PIPE Specification defines two possible
approaches to adapting to changes in the line
rate of PCI Express (changing between 2.5, 5,
and 8 GT/s operation). The core natively
supports PHYs that implement the PIPE
constant-data-width, variable-clock-frequency
interface and PHYs that implement the
PIPE variable-data-width, constant-clock-frequency
interface.
The frequency of clk_out must be the full-bandwidth
frequency for the PHY per-lane
data width (Core Data Width/Max Lane
Width; which is static for a given core
configuration) and the current line rate:
clk_out is connected to the PHY's clk_out, or a
binary multiple/divisor of clk_out when PHY
and Core have different data widths.
Note: Per PCI Express Specification, PHYs
must use the same clock reference as the
remote PCIe device to be compatible with
systems implementing Spread Spectrum
clocking (the majority of open systems). The
required 600 ppm maximum clock difference
between devices may not be met when Spread
Spectrum clocking is in use unless both devices
in the link are using the same Spread
Spectrum-modulated clock reference.
Table 3: Fabric-Side Port Descriptions
i_sbus_clk
Input
i_sbus_clk
Serial-Bus clock
i_sbus_req
Input
i_sbus_clk
SBUS interface request
i_sbus_sw_rst
Input
i_sbus_clk
Soft reset to the SBUS interface
i_sbus_data[1:0]
Input
i_sbus_clk
SBUS write data
o_sbus_ack
Output
i_sbus_clk
SBUS acknowledgment
o_sbus_rdata[1:0]
Output
i_sbus_clk
SBUS read data
bypass_clk
Input
bypass_clk
DMA Bypass clock
bypass_rst_n
Input
bypass_clk
DMA Bypass Reset
bypass_tx_valid
Input
bypass_clk
DMA Bypass Tx Data Valid
bypass_tx_ready
Output
bypass_clk
DMA Bypass Tx ready
bypass_tx_almost_full
Output
bypass_clk
DMA Bypass Tx Data Fifo almost full
bypass_tx_sop
Input
bypass_clk
Start of packet indicator and packet transmit
request; set == 1 coincident with the first
vc0_tx_data word in each packet. vc0_tx_sop
may not be asserted until the user is ready to
provide the entire packet with the minimum
possible timing of the core’s vc0_tx_en
assertions.
The user may wait state the transmit interface
only between packets; the user may choose to
hold off on transmitting a packet by not
asserting vc0_tx_sop.
bypass_tx_eop
Input
bypass_clk
End of packet indicator; set == 1 coincident
with the last vc0_tx_data word in each packet.
bypass_tx_data[127:0]
Input
bypass_clk
Packet data to transfer; vc0_tx_data must be
valid from the assertion of vc0_tx_sop until the
packet is fully consumed with the assertion of
vc0_tx_eop == vc0_tx_en == 1. The core may
assert and de-assert vc0_tx_en at any time, so
the user must ensure that vc0_tx_sop,
vc0_tx_eop, and vc0_tx_data are always valid.
Packet data must comprise a complete
Transaction Layer packet as defined by the PCI
Express Specification including the entire
packet header, data payload, and optional TLP
Digest (ECRC). The core adds the necessary
STP/END/EDB framing, Sequence Number,
LCRC, and for cores with ECRC support,
ECRC as part of its Data Link Layer
functionality.
PCI Express Packets are integer multiples of
32-bits in length. Thus, 64-bit and 128-bit Core
Data Width cores may have an unused
remainder portion in the final data word of a
packet. The core uses the packet TLP header
(Length, TLP Digest, and Format and Type) to
detect whether the packet has an unused
remainder and will automatically discard and
not transmit the unused portion of the final
data word.
bypass_tx_data_valid[15:0]
Input
bypass_clk
DMA Bypass Tx Data Byte Valid
bypass_tx_np_ok
Output
bypass_clk
vc0_tx_np_ok indicates when the user is
allowed to transmit non-posted requests.
1: Non-Posted Requests are permitted
0: Non-Posted Requests are not permitted
If a non-posted request is transmitted when
there are no non-posted receive buffer credits
available in the remote PCI Express device,
then the core will be unable to send the non-posted request until credits are freed. If the
remote device is unable to free non-posted
credits until receiving a TLP from the core then
this leads to a deadlock condition that cannot
be resolved.
vc0_tx_np_ok is implemented to avoid this
condition by making it not possible for
transmissions to be stalled by the inability to
transmit non-posted requests. The core
implements a small non-posted request FIFO.
When non-posted requests cannot be accepted
by the remote device, this FIFO will fill, and
when its almost-full threshold is hit,
vc0_tx_np_ok will de-assert (== 0) stopping the
user from being able to transmit additional
non-posted requests. Additional posted
requests and completions are not blocked by
vc0_tx_np_ok and continue to transmit if
credits are available in the remote Receive
Buffer. Per PCI Express transaction ordering
rules, Posted Requests and Completions must
be allowed to pass Non-Posted requests to
avoid deadlocks; Completions and Posted
Requests are not required to be able to pass
one another.
User’s task: User logic must stop the
transmission of new Non-Posted requests
when vc0_tx_np_ok == 0. A non-posted packet
transmission that has already asserted
vc0_tx_sop must continue to be transmitted in
full. vc0_tx_np_ok should be used to stop new
assertions of vc0_tx_sop for non-posted
requests. Because vc0_tx_np_ok is an almost
full flag, it is allowed for vc0_tx_np_ok to be
used as the input to the register that generates
vc0_tx_sop for non-posted transactions
(vc0_tx_np_ok does not have to be used
combinatorially to mask vc0_tx_sop).
It is recommended for all user designs to use
vc0_tx_np_ok.
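The gating rule described above can be sketched as a small behavioral model. This is only an illustration under stated assumptions: the function names, the "NP"/"P"/"CPL" labels, and the queue representation are hypothetical, and the model ignores the finer points of PCI Express ordering rules.

```python
from typing import List, Optional


def next_tx_sop(want_to_send: bool, is_non_posted: bool, np_ok: bool) -> bool:
    """Value to load into the register that drives the start-of-packet
    signal for the NEXT packet (a packet already started is never cut off)."""
    if not want_to_send:
        return False
    if is_non_posted and not np_ok:
        return False  # hold off new non-posted requests
    return True       # posted requests and completions are not blocked by np_ok


def pick_next(pending: List[str], np_ok: bool) -> Optional[int]:
    """Index of the first pending packet that may start now.
    'NP' entries are skipped while np_ok is 0, so posted requests ('P')
    and completions ('CPL') can pass them."""
    for index, kind in enumerate(pending):
        if kind == "NP" and not np_ok:
            continue
        return index
    return None
```

Because the flag is an almost-full indication, the model treats it as a registered input: the decision applies to the next packet start, not to the current clock cycle.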
bypass_rx_valid
Output
bypass_clk
DMA Bypass Rx Data Valid
bypass_rx_ready
Input
bypass_clk
DMA Bypass Rx Ready
bypass_rx_data[127:0]
Output
bypass_clk
TLP data to receive; vc0_rx_data is valid from
the assertion of vc0_rx_sop until the packet is
fully consumed with the assertion of
vc0_rx_eop == vc0_rx_en == 1.
TLP data comprises a complete Transaction
Layer packet as defined by the PCI Express
Specification including the entire packet
header, data payload, and optional TLP Digest
(ECRC). The core strips the packet’s
STP/END/EDB framing, Sequence Number,
and Link CRC as part of its Data Link Layer
functionality prior to the TLP appearing on
this interface. The core checks TLP ECRC,
when present and when checking is enabled,
but does not remove the ECRC from the TLP.
PCI Express TLPs are integer multiples of 32 bits in length. Thus, 64-bit and 128-bit Core
Data Width cores may have an unused
remainder portion in the final data word of a
packet. The user is responsible for detecting
and discarding any unused remainder at the
end of the TLP. All of the necessary
information to detect a remainder is located in
the packet TLP header (Length, TLP Digest,
and Format and Type) fields.
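A minimal sketch of the remainder calculation follows, assuming the first four header bytes are available in the order defined by the PCI Express specification (byte 0 carries Fmt and Type, byte 2 carries TD and Length[9:8], byte 3 carries Length[7:0]). How those bytes map onto the 128-bit data bus is not covered here, and the function names are hypothetical. TLP prefixes are not handled.

```python
def tlp_dwords(hdr: bytes) -> int:
    """Total TLP size in 32-bit DWORDs (header + payload + optional ECRC)."""
    fmt = (hdr[0] >> 5) & 0x7
    four_dw_header = bool(fmt & 0x1)   # Fmt[0]: 4 DWORD header
    has_data = bool(fmt & 0x2)         # Fmt[1]: packet carries a payload
    has_digest = bool(hdr[2] & 0x80)   # TD bit: TLP Digest (ECRC) present
    length = ((hdr[2] & 0x3) << 8) | hdr[3]
    if length == 0:
        length = 1024                  # a Length of 0 encodes 1024 DWORDs
    total = 4 if four_dw_header else 3
    if has_data:
        total += length
    if has_digest:
        total += 1
    return total


def unused_dwords_in_last_word(hdr: bytes, bus_dwords: int = 4) -> int:
    """DWORDs in the final data word that carry no TLP content
    (bus_dwords = 4 for a 128-bit interface)."""
    return (-tlp_dwords(hdr)) % bus_dwords
```

A 3 DWORD memory read header, for instance, leaves one unused DWORD in a single 128-bit word, whereas a 3 DWORD memory write header with a one DWORD payload fills the word exactly.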
bypass_rx_data_valid[15:0]
Output
bypass_clk
DMA Bypass Rx Data Byte Valid
bypass_rx_sop
Output
bypass_clk
Start of TLP indicator and packet receive
request; set == 1 coincident with the first
vc0_rx_data word in each TLP. Once
vc0_rx_sop is asserted, the user may assert
vc0_rx_en as desired to consume the TLP.
bypass_rx_eop
Output
bypass_clk
End of TLP indicator; set == 1 coincident with
the last vc0_rx_data word in each packet.
bypass_rx_ecrc_error
Output
bypass_clk
ECRC error indicator; set == 1 from vc0_rx_sop
to vc0_rx_eop inclusive for received TLPs
which contain a detected ECRC error. Clear ==
0 otherwise.
vc0_rx_err_ecrc only reports ECRC errors
when ECRC checking is enabled. ECRC
checking is enabled by software through the
AER Capability.
Packets with ECRC errors are presented on the
Receive Interface in the same format that they
are received including the TLP Digest (ECRC).
User’s task: The user design must decide how
to handle/recover from the error including
whether to use the TLP with the error. ECRC
errors require higher-level software to
correct/handle the error since it is unknown
where in the PCIe hierarchy the error occurred
and PCIe does not have a standard mechanism
for rebroadcasting packets end to end as it
does for a given PCIe link via the Link CRC.
bypass_rx_decode_info[12:0]
Output
bypass_clk
TLP type indicator; provides advance
information about the TLP to facilitate TLP
consumption; this port has a different meaning
in Root Port and Switch Modes.
The core decodes received TLP headers to
determine their destination; the core passes
this information to the Transaction Layer
Interface by asserting the appropriate bits in
this field. See the description of
mgmt_cfg_constants: Base Address Cfg[5:0]
sub fields in
Individual bits of vc0_rx_cmd_data[12:0] carry
the following meaning:
Bits[12:10] – Traffic Class of the packet
Bit[9] – Completion/Base Address Region indicator
  1: indicates the TLP is a Completion or Message routed by ID
  0: indicates the TLP is a read or write request (or a Message
     routed by address) targeting a Base Address Region; the
     remaining bits in this field are decoded differently for
     Completion versus Base Address Region hits
Bits[8:0] –
o If Completion TLP (Bit[9] == 1)
  Bit[8] – Reserved
  Bits[7:0] – Tag; the Requestor Tag contained in the TLP; use to
    route completions to the associated requestor logic; this
    field is reserved if the TLP is a message rather than a
    completion
o If Base Address Region TLP (Bit[9] == 0)
  Bit[8] – When (1), the packet is a “write” transaction; when (0),
    the packet is a “read” transaction
  Bit[7] – When (1), the packet requires one or more Completion
    transactions as a response; (0) otherwise
  Bit[6] – (1) if the TLP targets the Expansion ROM Base Address
    region
  Bit[5] – (1) if the TLP targets Base Address Region 5
  Bit[4] – (1) if the TLP targets Base Address Region 4
  Bit[3] – (1) if the TLP targets Base Address Region 3
  Bit[2] – (1) if the TLP targets Base Address Region 2
  Bit[1] – (1) if the TLP targets Base Address Region 1
  Bit[0] – (1) if the TLP targets Base Address Region 0
vc0_rx_cmd_data is valid for the entire packet
(from vc0_rx_sop == 1 through vc0_rx_eop ==
vc0_rx_en == 1)
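The bit assignments listed above can be sketched as a software decoder, for example in a testbench or a log parser. The function and dictionary key names are hypothetical, and the sketch assumes Endpoint mode, since the field has a different meaning in Root Port and Switch modes.

```python
def decode_rx_info(info: int) -> dict:
    """Split the 13-bit decode field into named parts (Endpoint mode assumed)."""
    info &= 0x1FFF
    result = {"traffic_class": (info >> 10) & 0x7}
    if (info >> 9) & 1:
        # Completion or Message routed by ID; Bit[8] is reserved
        result["kind"] = "completion_or_id_routed_message"
        result["tag"] = info & 0xFF
    else:
        # Read/write request (or address-routed Message) hitting a BAR
        result["kind"] = "bar_hit"
        result["is_write"] = bool((info >> 8) & 1)
        result["needs_completion"] = bool((info >> 7) & 1)
        result["expansion_rom"] = bool((info >> 6) & 1)
        result["bars"] = [bar for bar in range(6) if (info >> bar) & 1]
    return result
```

The "tag" entry is only meaningful for completions; per the description above it is reserved when the TLP is a message.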
bypass_interrupt
Input
bypass_clk
mgmt_interrupt is used to generate interrupt
events on the PCI Express link.
Interrupt support is enabled by setting
mgmt_cfg_constants[128] (Interrupt Enable) ==
1.
The core contains the following two interrupt
configuration options:
Single Interrupt Configuration
o Support for 1 Legacy Interrupt
o Support for 1 MSI Interrupt
o mgmt_interrupt is used to
signal both Legacy and MSI
interrupts
Multiple Interrupt Configuration
o Support for 1 Legacy Interrupt
o Support for up to 32 MSI
Interrupts
o Support for up to 2048 MSI-X
Interrupts
o mgmt_interrupt is used to
signal only Legacy interrupts
o mgmt_interrupt_msix_req,
mgmt_interrupt_msix_ack and
mgmt_interrupt_msix_vector,
available only in this
configuration, are used to
signal MSI and MSI-X
interrupts.
System software selects MSI-X, MSI, or Legacy
Interrupt mode as part of the boot process by
writing MSI-X_Enable==1 or MSI_Enable ==1
or leaving both MSI-X_Enable and MSI_Enable
Configuration Registers at their default
disabled value.
The current interrupt mode of operation is
available by monitoring
mgmt_cfg_status[1296] (MSI_Enable) and
mgmt_cfg_status[1183] (MSI-X Enable):
MSI-X_Enable==1 : MSI-X
Interrupt Mode
MSI_Enable ==1 : MSI Interrupt
Mode
MSI-X_Enable == 0 & MSI_Enable
== 0 : Legacy Interrupt Mode
Note: It is illegal for software to set both MSI-X_Enable and MSI_Enable at the same time.
User’s task: User interrupt logic must behave
differently depending upon the value of MSI-X_Enable and MSI_Enable and whether the
core is a Single or Multiple Interrupt
Configuration:
Single Interrupt Configuration
When Legacy Interrupt Mode is
enabled (MSI_Enable == 0),
mgmt_interrupt implements one level-sensitive interrupt (INTA, INTB,
INTC, or INTD as selected by
mgmt_cfg_constants[132:131]). All
interrupt sources should be logically
ORed together to generate
mgmt_interrupt. Each interrupt source
should continue to drive a 1 until it has
been serviced and cleared by software
at which time it should switch to
driving 0. The core monitors high and
low transitions on mgmt_interrupt and
sends an Interrupt Assert message on
each 0 to 1 transition and an Interrupt
De-Assert Message on each 1 to 0
transition. Transitions which occur too
close together to be independently
transmitted are merged.
When MSI Interrupt Mode is enabled
(MSI_Enable == 1), mgmt_interrupt is
used to implement one MSI Message.
An MSI Interrupt Message is
generated each time mgmt_interrupt
transitions from 0 to 1. To promote
sharing of mgmt_interrupt among
several interrupt sources, each source
should assert mgmt_interrupt for a
single clock cycle and all sources
should be ORed together onto
mgmt_interrupt. 0 to 1 transition
events which occur too close together
to be independently transmitted are
merged together into one MSI
message.
Multiple Interrupt Configuration
When Legacy Interrupt Mode is
enabled (MSI-X_Enable == 0 &
MSI_Enable == 0), mgmt_interrupt
implements one level-sensitive
interrupt (INTA, INTB, INTC, or INTD
as selected by
mgmt_cfg_constants[132:131]). All
interrupt sources should be logically
ORed together to generate
mgmt_interrupt. Each interrupt source
should continue to drive a 1 until it has
been serviced and cleared by software
at which time it should switch to
driving 0. The core monitors high and
low transitions on mgmt_interrupt and
sends an Interrupt Assert message on
each 0 to 1 transition and an Interrupt
De-Assert Message on each 1 to 0
transition. Transitions which occur too
close together to be independently
transmitted are merged.
When MSI-X or MSI Interrupt Mode is
enabled (MSI-X_Enable == 1 or
MSI_Enable == 1), mgmt_interrupt is
not used and MSI-X/MSI interrupts are
signaled on mgmt_interrupt_msix_req,
mgmt_interrupt_msix_ack, and
mgmt_interrupt_msix_vector instead.
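The mode-dependent behavior described above can be summarized in a short behavioral sketch. The function names and returned strings are hypothetical, the merging of closely spaced transitions is not modeled, and treating MSI-X as unavailable in the Single Interrupt Configuration is an assumption drawn from the feature list rather than a stated rule.

```python
from typing import Iterable, List


def interrupt_signaling(multiple_cfg: bool, msi_en: bool, msix_en: bool) -> str:
    """Return which port group the user logic should drive."""
    if msi_en and msix_en:
        raise ValueError("MSI_Enable and MSI-X_Enable must not both be set")
    if not msi_en and not msix_en:
        return "legacy: level on mgmt_interrupt"
    if multiple_cfg:
        return "msi/msi-x: mgmt_interrupt_msix_req/ack/vector"
    if msix_en:
        # Assumption: the Single Interrupt Configuration lists no MSI-X support
        raise ValueError("MSI-X is not listed for the Single Interrupt Configuration")
    return "msi: one-clock pulse on mgmt_interrupt"


def legacy_messages(levels: Iterable[int]) -> List[str]:
    """Messages produced by a level-sensitive legacy interrupt, one input
    sample per clock: Assert on each 0 to 1 edge, De-Assert on each 1 to 0 edge."""
    messages = []
    previous = 0
    for level in levels:
        current = 1 if level else 0
        if current and not previous:
            messages.append("Assert")
        elif previous and not current:
            messages.append("De-Assert")
        previous = current
    return messages
```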
bypass_msi_en
Output
bypass_clk
MSI interrupt enable
bypass_msix_en
Output
bypass_clk
MSI-X interrupt enable
bypass_interrupt_msix_req
Input
bypass_clk
mgmt_interrupt_msix_req,
mgmt_interrupt_msix_ack, and
mgmt_interrupt_msix_vector are used to
signal MSI-X and MSI interrupts when the
MSI-X/Multi-Vector MSI Configuration core
option is present.
To request an MSI-X or MSI interrupt message
to be transmitted, mgmt_interrupt_msix_req is
set to 1 and mgmt_interrupt_msix_vector
indicates the interrupt vector that is to be
transmitted. Once mgmt_interrupt_msix_req is
set, mgmt_interrupt_msix_req and
mgmt_interrupt_msix_vector must remain at
their same values until
mgmt_interrupt_msix_ack is asserted == 1
indicating that the requested interrupt message
was transmitted.
If MSI_En == 1, then the design is operating in
MSI interrupt mode. The core supports up to
the maximum of 32 interrupt vectors
supported by the MSI Capability. The interrupt
vector number to transmit is placed on
mgmt_interrupt_msix_vector[4:0] and
mgmt_interrupt_msix_vector[127:5] is set to all
zeros. The core transmits MSI Interrupts by
transmitting a Memory Write containing the
address and data value (with lower data bits
modified to signal the vector number) setup by
software in the MSI Capability. System
software may not allocate as many MSI
interrupt vectors as requested by the design so
user hardware and software must be designed
to share interrupts if required. The core
performs the necessary aliasing (dropping the
higher mgmt_interrupt_msix_vector[4:0] bits
as required) so the user may drive a full 5-bit
vector number even if fewer vectors are
assigned.
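The aliasing described above can be illustrated with a simple mask. This sketch assumes the number of allocated MSI vectors is a power of two (1, 2, 4, 8, 16, or 32), as the MSI Capability defines; the function name is hypothetical and the core's internal implementation is not documented here.

```python
def alias_msi_vector(requested: int, allocated: int) -> int:
    """Vector number actually signaled when software allocated fewer
    vectors than the design drives: the upper vector bits are dropped."""
    if allocated not in (1, 2, 4, 8, 16, 32):
        raise ValueError("MSI allocates 1, 2, 4, 8, 16, or 32 vectors")
    if not 0 <= requested < 32:
        raise ValueError("the vector number is a 5-bit value")
    return requested & (allocated - 1)
```

With four vectors allocated, for example, a request for vector 13 is signaled as vector 1, so user software must be prepared for several hardware sources to share one vector.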
If MSI-X_En == 1, then the design is operating
in MSI-X interrupt mode. The core supports up
to the maximum of 2048 interrupt vectors
supported by the MSI-X Capability.
User’s task: In MSI-X Interrupt mode, the user
implements the required MSI-X Table and
MSI-X PBA in memory space mapped by a
Base Address Register. Each Table entry/vector
consists of a 64-bit address, 32-bit data value,
and 32-bit vector control word. To request an
interrupt be transmitted, the MSI-X Table entry
corresponding to the desired vector number is
fetched and placed onto
mgmt_interrupt_msix_vector[127:0] and
mgmt_interrupt_msix_req is set == 1. If the
interrupt is masked by the MSI-X Capability
global Function Mask (mgmt_cfg_status[1182])
or by the per vector Mask Bit (MSI-X Table
entry bit 96) then that vector is masked and
cannot be requested by asserting
mgmt_interrupt_msix_req until the vector is
unmasked. For each clock cycle that
mgmt_interrupt_msix_req == 1 and
mgmt_interrupt_msix_ack ==1, the core
transmits an MSI-X Interrupt by transmitting a
Memory Write containing the address and
data value contained in the provided
mgmt_interrupt_msix_vector[127:0]. System
software may not allocate as many MSI-X
interrupt vectors as requested by the design so
user hardware and software must be designed
to share interrupts if required. The user
hardware design must take into account any
interrupt sharing and always provide a valid,
system-software-allocated vector.
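A sketch of assembling the 128-bit value placed on the vector port in MSI-X mode follows. The text above states only that the per-vector Mask Bit is entry bit 96; placing the address in bits [63:0] and the data in bits [95:64] is an assumption based on the standard MSI-X Table entry layout, and the function names are hypothetical.

```python
def pack_msix_entry(address: int, data: int, vector_control: int = 0) -> int:
    """Build a 128-bit MSI-X Table entry value.
    Assumed layout: [63:0] address, [95:64] data, [127:96] vector control."""
    return ((address & 0xFFFFFFFFFFFFFFFF)
            | ((data & 0xFFFFFFFF) << 64)
            | ((vector_control & 0xFFFFFFFF) << 96))


def may_request(entry: int, function_mask: bool) -> bool:
    """A vector may be requested only when neither the global Function Mask
    nor the per-vector Mask Bit (entry bit 96) is set."""
    per_vector_mask = (entry >> 96) & 1
    return not function_mask and not per_vector_mask
```

User logic would evaluate the equivalent of may_request before asserting the request signal and hold the interrupt pending while either mask is set.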
bypass_interrupt_msix_ack
Output
bypass_clk
bypass_interrupt_msix_vector
[127:0]
Input
bypass_clk
bypass_enable
Input
bypass_clk
DMA Bypass interface enable
DMA-Side Port Descriptions
Pin Name
Direction
Clock
Description
t_areset_n
Input
t_aclk
Active-low asynchronous assert, t_aclk-synchronous
de-assert reset;
Must be asserted when DMA Back End PCI Express
reset is asserted.
t_aclk
Input
AXI interface clock; may be a different clock than the
clock used on the PCI Express-side of the AXI DMA
Back-End Core; synchronization techniques are used
to enable support for a wide variety of clock rates
t_awvalid
Output
t_aclk
Write Address Channel; Optional AWBURST,
AWLOCK, AWCACHE, AWPROT are not
implemented; AWBURST is always incrementing-address burst; cache, protected, and exclusive
accesses not supported; see below for t_awregion
information
t_awready
Input
t_aclk
t_awregion
[2:0]
Output
t_aclk
t_awaddr
[31:0]
Output
t_aclk
t_awlen
[3:0]
Output
t_aclk
t_awsize
[2:0]
Output
t_aclk
t_wvalid
Output
t_aclk
Write Data Channel
t_wready
Input
t_aclk
t_wdata
[127:0]
Output
t_aclk
t_wstrb
[15:0]
Output
t_aclk
t_wlast
Output
t_aclk
t_bvalid
Input
t_aclk
Write Response Channel; space is reserved in the
master to receive response from all outstanding
write requests, so t_bready is always 1 and does not
need to be used.
t_bready
Output
t_aclk
t_bresp
[1:0]
Input
t_aclk
t_arvalid
Output
t_aclk
Read Address Channel; Optional ARBURST,
ARLOCK, ARCACHE, ARPROT are not
implemented; ARBURST is always incrementing-address burst; cache, protected, and exclusive
accesses not supported; see below for t_arregion
information.
t_arready
Input
t_aclk
t_arregion
[2:0]
Output
t_aclk
t_araddr
[31:0]
Output
t_aclk
AXI Target Interface
Table 4: Target Interface Pin Descriptions
t_arlen
[3:0]
Output
t_aclk
t_awregion and t_arregion indicate PCI
Express Base Address Region hit information:
• 0: BAR0
• 1: BAR1
• 2: BAR2
• 3: BAR3
• 4: BAR4
• 5: BAR5
• 6: Expansion ROM
• 7: Reserved
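For reference models or log decoding, the region encoding above can be captured in a lookup table. The names used below are hypothetical.

```python
REGION_NAMES = {
    0: "BAR0",
    1: "BAR1",
    2: "BAR2",
    3: "BAR3",
    4: "BAR4",
    5: "BAR5",
    6: "Expansion ROM",
}


def region_name(region: int) -> str:
    """Translate a 3-bit region value into the Base Address Region it names."""
    if region not in REGION_NAMES:
        raise ValueError("region value %d is reserved or out of range" % region)
    return REGION_NAMES[region]
```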
t_arsize
[2:0]
Output
t_aclk
t_rvalid
Input
t_aclk
Read Data Channel; space is reserved in the master
to receive data from all outstanding read requests,
so t_rready is always 1
AXI interface clock; may be a different clock than
the clock used on the PCI Express-side of the AXI
DMA Back-End Core; synchronization techniques
are used to enable support for a wide variety of
clock rates
m_awvalid
Input
m_aclk
A Write Address Channel transfer occurs when
m_awvalid == 1 and m_awready == 1
implemented
m_awready
Output
m_aclk
m_awaddr[15:0]
Input
m_aclk
Byte address of register to write
m_wdata [31:0]
Input
m_aclk
Data to write
m_wstrb [3:0]
Input
m_aclk
Byte enables for write
m_wvalid
Input
m_aclk
A Write Data Channel transfer occurs when
m_wvalid == 1 and m_wready == 1
m_wready
Output
m_aclk
m_bvalid
Output
m_aclk
A Write Response Channel transfer occurs when
m_bvalid == 1 and m_bready == 1
m_bready
Input
m_aclk
m_bresp [1:0]
Output
m_aclk
Status of write request: 0 – Successful; 1, 2, 3 – Error
m_araddr [15:0]
Input
m_aclk
Byte address of register to read
m_rvalid
Output
m_aclk
A Read Response Channel transfer occurs when
m_rvalid == 1 and m_rready == 1
m_rready
Input
m_aclk
AXI Master Interface
Table 5: Master Interface Pin Descriptions
m_rdata [31:0]
Output
m_aclk
Data read
m_rresp [1:0]
Output
m_aclk
Status of read request: 0 – Successful; 1, 2, 3 – Error
m_interrupt
[4:0]
Output
m_aclk
Pin Name
Direction
Clock
Description
s2c_areset_n
Output
s2c_aclk
Active-low asynchronous assert, s2c_aclk synchronous
de-assert reset; asserted when the DMA Engine has been
reset by software or by PCI Express reset
s2c_aclk [1:0]
Input
s2c_aclk
AXI interface clock; may be a different clock than the
clock used on the PCI Express-side of the AXI DMA
Back-End Core; synchronization techniques are used to
enable support for a wide variety of clock rates
s2c_fifo_addr_n
[1:0]
Input
s2c_aclk
Interface AXI Protocol Selection:
1 – FIFO DMA using AXI4-Stream Protocol
0 – Addressable DMA using AXI3/AXI4
Protocol
This port selects the interface protocol and affects the
operation of the remaining ports
s2c_awvalid [1:0]
Output
s2c_aclk
FIFO DMA: Write Address Channel is unused; tie
s2c_awready == 1 and ignore s2c_aw* outputs
Addressable DMA: Write Address Channel; Optional
AWBURST, AWLOCK, AWCACHE, AWPROT are not
implemented; AWBURST is always incrementing-address burst; cache, protected, and exclusive accesses
not supported; s2c_awusereop is a non-standard AXI
signal that when 1 indicates that this is the final write
request of a DMA packet transfer
s2c_awready [1:0]
Input
s2c_aclk
s2c_awaddr [71:0]
Output
s2c_aclk
s2c_awlen [7:0]
Output
s2c_aclk
s2c_awusereop
[1:0]
Output
s2c_aclk
s2c_awsize [5:0]
Output
s2c_aclk
s2c_wvalid [1:0]
Output
s2c_aclk
FIFO DMA: Write Data Channel implements AXI4-Stream Master protocol using s2c_wdata(tdata),
s2c_wstrb(tkeep), s2c_wlast(tlast), s2c_wvalid(tvalid),
and s2c_wready(tready); NULL (TKEEP == 0) bytes are
only placed at the end of a stream (packet); position
bytes not implemented; optional TSTRB, TID, and
TDEST not implemented; interleaving of streams is not
performed; a new stream will start only after the prior
stream finishes; s2c_wusercontrol is a non-standard AXI
signal, valid for the entire packet transfer (typically
multiple AXI transfers), that provides the
UserControl[63:0] value software placed in the first
Descriptor of the packet. Optional signal which may be
used to pass information on a per packet basis from user
software to user hardware; s2c_wusercontrol is only
valid for FIFO DMA
Addressable DMA: Write Data Channel implements
AXI3/AXI4 Master protocol; s2c_wusereop is a non-standard AXI signal, with same timing as s2c_wlast, that
when 1 indicates that this is the final data transfer of a
DMA packet transfer
s2c_wready [1:0]
Input
s2c_aclk
s2c_wdata [255:0]
Output
s2c_aclk
s2c_wstrb [31:0]
Output
s2c_aclk
s2c_wlast [1:0]
Output
s2c_aclk
s2c_wusereop
[1:0]
Output
s2c_aclk
System-to-Card Engine Interface
Table 6: System-to-Card Interface Port Descriptions
s2c_bvalid [1:0]
Input
s2c_aclk
FIFO DMA: Write Response Channel is unused; tie
s2c_bready == 1 and ignore s2c_b* outputs
Addressable DMA: Write Response Channel; space is
reserved in the master to receive response from all
outstanding write requests, so s2c_bready is always 1 and
need not be used
s2c_bready [1:0]
Output
s2c_aclk
s2c_bresp [3:0]
Input
s2c_aclk
Pin Name
Direction
Clock
Description
c2s_areset_n [1:0]
Output
c2s_aclk
Active-low asynchronous assert, c2s_aclk synchronous
de-assert reset; asserted when the DMA Engine has been
reset by software or by PCI Express reset
c2s_aclk [1:0]
Input
c2s_aclk
AXI interface clock; may be a different clock than the
clock used on the PCI Express-side of the AXI DMA
Back-End Core; synchronization techniques are used to
enable support for a wide variety of clock rates
c2s_fifo_addr_n
[1:0]
Input
c2s_aclk
Interface AXI Protocol Selection:
1 – FIFO DMA using AXI4-Stream Protocol
0 – Addressable DMA using AXI3/AXI4
Protocol
This port selects the interface protocol and affects the
operation of the remaining ports
c2s_arvalid [1:0]
Output
c2s_aclk
FIFO DMA: Read Address Channel is unused; tie
c2s_arready == 1 and ignore c2s_ar* outputs
Addressable DMA: Read Address Channel; Optional
ARBURST, ARLOCK, ARCACHE, ARPROT are not
implemented; ARBURST is always incrementing-address burst; cache, protected, and exclusive accesses
not supported
c2s_arready [1:0]
Input
c2s_aclk
c2s_araddr [71:0]
Output
c2s_aclk
c2s_arlen [7:0]
Output
c2s_aclk
c2s_arsize [5:0]
Output
c2s_aclk
Card-to-System Engine Interface
Table 7: Card-to-System Interface Port Descriptions
c2s_rvalid [1:0]
Input
c2s_aclk
FIFO DMA: Read Data Channel implements AXI4-Stream Slave protocol using c2s_rdata(tdata),
c2s_ruserstrb(tkeep), c2s_rlast(tlast), c2s_rvalid(tvalid),
and c2s_rready(tready). NULL (TKEEP[i] == 0) bytes
only permitted at the end of a stream; position bytes not
implemented; optional TSTRB, TID, and TDEST not
implemented; interleaving of streams is not supported; a
new stream may only start after the prior stream
finishes; c2s_ruserstatus is a non-standard AXI signal,
which must be valid when c2s_rlast (tlast) == 1 &
c2s_rvalid (tvalid) == 1, that is used to update the
UserStatus[63:0] value in the last Descriptor of the
packet; optional signal which may be used to pass
information on a per packet basis from user hardware to
user software; c2s_ruserstatus is only valid for FIFO
DMA; if unused, tie to 0
Addressable DMA: Read Data Channel implements
AXI3/AXI4 Master protocol; non-standard AXI ports
c2s_ruserstrb & c2s_ruserstatus are unused and must be
tied to 0.
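The rule that NULL bytes are only permitted at the end of a stream can be illustrated by generating the TKEEP pattern for each beat of a packet. This is a sketch under stated assumptions: a 16-byte (128-bit) beat per engine, valid bytes packed into the low-order byte lanes as in the AXI4-Stream convention, and a hypothetical function name.

```python
from typing import List


def tkeep_for_packet(packet_bytes: int, beat_bytes: int = 16) -> List[int]:
    """TKEEP value for every beat of one packet. All beats are fully valid
    except possibly the last, which keeps only its low-order bytes."""
    if packet_bytes <= 0:
        raise ValueError("a packet must contain at least one byte")
    full_beats, remainder = divmod(packet_bytes, beat_bytes)
    all_valid = (1 << beat_bytes) - 1
    beats = [all_valid] * full_beats
    if remainder:
        beats.append((1 << remainder) - 1)
    return beats
```

The last list element corresponds to the beat on which the last-transfer indication is asserted.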
c2s_rready [1:0]
Output
c2s_aclk
c2s_rdata [255:0]
Input
c2s_aclk
c2s_rresp [3:0]
Input
c2s_aclk
c2s_rlast [1:0]
Input
c2s_aclk
c2s_ruserafull
[1:0]
Output
c2s_aclk
c2s_ruserstrb
[31:0]
Input
c2s_aclk
Pin Name
Direction
Clock
Description
mgmt_pl_link_up_o
Output
Physical Layer Status; (1) Up; (0) Down
mgmt_cfg_id [15:0]
Output
Every PCI Express device is assigned a unique ID which
it must use to generate Requests and to respond with
Completion packets. The ID is assigned by system
software on every Configuration Write, but practically
does not change during regular operation. The core
holds the current ID assigned by system software and
makes it available as mgmt_cfg_id. mgmt_cfg_id must
be used in place of the Requestor ID packet header field
when generating Requests and must be used in place of
the Completer ID packet header field when generating
Completions. See PCI Express Base Specification, Rev 2.0,
Section 2.2.6.2 for additional detail.
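As an illustration of how the 16-bit ID is typically composed, the sketch below packs and unpacks the Bus, Device, and Function numbers. It assumes the conventional PCI Express layout (Bus in bits [15:8], Device in bits [7:3], Function in bits [2:0], without ARI); the function names are hypothetical.

```python
from typing import Tuple


def pack_id(bus: int, device: int, function: int) -> int:
    """Compose a 16-bit Requester/Completer ID from Bus/Device/Function."""
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8):
        raise ValueError("bus, device, or function number out of range")
    return (bus << 8) | (device << 3) | function


def unpack_id(cfg_id: int) -> Tuple[int, int, int]:
    """Split a 16-bit ID into (bus, device, function)."""
    return (cfg_id >> 8) & 0xFF, (cfg_id >> 3) & 0x1F, cfg_id & 0x7
```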
mgmt_transactions_pending
Input
Management transaction pending from user.
user_interrupt
Input
User Interrupt to the PCIe core.
mgmt_rp_leg_int_o [3:0]
Output
Legacy interrupts generated from Interrupt Messages.
pm_power_state [1:0]
Output
Value of the core’s Power Management Capability:
Power_State [1:0] Configuration register. This register is
useful for user designs to monitor power state changes
and to determine if they want to assert a PME event to
change the power state back to D0.
Management Interface
Table 8: Management Interface Port Descriptions
pm_l1_enter
Output
Set to 1 by the core for 1 clock when the core begins the
process of entering the L1 link state; 0 otherwise. The
core enters L1 whenever Power_State is programmed to
a value other than D0 (00) and core support for L1 has
been enabled (see mgmt_cfg_constants [377]).
pm_l1_exit
Output
Set to 1 by the core for 1 clock when the core exits the L1
link state back to L0; 0 otherwise. The core exits L1
under system control or in response to a user PME
request via pm_d3cold_n_pme_assert assertion.
pm_l1_enter and pm_l1_exit are informational and can
be ignored for most applications.
pm_l2_enter
Output
Set to 1 by the core for 1 clock when the core begins the
process of entering the L2 link state; 0 otherwise. The
core begins the process of entering L2 whenever a
PME_Turn_Off message is received and core support for
L2 has been enabled (see mgmt_cfg_constants[376]).
pm_l2_enter_ack
Input
The system transmits a PME_Turn_Off message to
Endpoints to instruct them to prepare for power down.
When a PME_Turn_Off message is received, a
PME_TO_Ack message must be transmitted to inform
the system that the Endpoint is ready for power down.
The core provides the option for the core or user to
control the timing of the PME_TO_Ack message
generation via
Disable_Automatic_PME_TO_Ack_Message_Generation
== mgmt_cfg_constants[382]:
0 – PME_TO_Ack message is transmitted
automatically in response to PME_Turn_Off
message; In this case, tie pm_l2_enter_ack = 0.
1 – PME_TO_Ack message is transmitted when
the user asserts pm_l2_enter_ack == 1. An
assertion of pm_l2_enter must be followed by
an assertion of pm_l2_enter_ack as soon as the
user design can prepare for power down. The
user’s device driver should already have been
informed of, and allowed the transition to L2, so
only information required to be stored that the
driver does not have access to (such as registers
that need to be maintained through D3cold)
should need to be stored at this point.
Systems are permitted to implement a time out
mechanism to power down the system if a
PME_TO_Ack message does not arrive in a timely
fashion. The PCI Specification recommends a system
timeout be implemented in the 1 ms to 10 ms range.
The pm_l2_enter_ack delay should be significantly less than
the system timeout (system dependent).
pm_l2_exit
Output
Set to 1 by the core for 1 clock when the core exits the L2
link state back to L0; 0 otherwise. The core exits L2
under system control or in response to a user PME
request via pm_d3cold_n_pme_assert assertion. This
output is only asserted if the core remained powered
and clocked while in L2.
pm_l2_store[2:0]
Output
This port contains Configuration Register information
that must be maintained through D3cold (power and
clock removed from core):
If the user indicates a need (via mgmt_cfg_control) for
Auxiliary Power or the ability to assert PME from
D3cold then the contents of pm_l2_store must be saved
when pm_l2_enter is asserted and subsequently placed
onto pm_d3cold_restore when power is restored
(exiting D3cold).
If PME_En == 1 then the user may use Beacon/WAKE#
to wake the link. If PME_En == 0, then Beacon/WAKE#
may not be asserted by the user in any .
pm_d3cold_exit
Input
Asserted to signal an exit from D3cold (main power
removed) to D0 (main power restored) so that the core
can restore state information saved by the user in
D3cold.
pm_d3cold_exit must be asserted only when main
power is restored and, prior to main power having
been removed, pm_l2_enter was asserted without a
corresponding pm_l2_exit (core was in L2 when power
was removed).
pm_d3cold_exit is set to 1 and held at 1 until
pm_d3cold_exit_ack is asserted at which time
pm_d3cold_exit must de-assert to 0 within 64 core
clocks (the 64 clocks are to allow the user design time to
perform clock synchronization between the core and
auxiliary power clock domains).
When pm_d3cold_exit == 1, pm_d3cold_restore must
contain the value saved on pm_l2_store when
pm_l2_enter was last set prior to power removal. Also
when pm_d3cold_exit == 1, pm_d3cold_pme_asserted
must be 1 if the user asserted WAKE# or generated a
Beacon to wake-up the system while in D3cold and is 0
otherwise.
pm_d3cold_exit_ack
Output
Set to 1 for 1 clock to acknowledge pm_d3cold_exit == 1;
0 otherwise. When pm_d3cold_exit_ack == 1, the value
on pm_d3cold_restore is used to restore Configuration
Registers values that must be saved through D3cold and
the value on pm_d3cold_pme_asserted is used to set the
PME_Status Configuration Register and send a
PM_PME message if the PME_En configuration register
is set.
pm_d3cold_restore
Input
See pm_d3cold_exit above.
pm_d3cold_pme_asserted
Input
See pm_d3cold_exit above.
pm_d3cold_n_pme_assert
Input
When the core is in any power state other than D3cold
(D0, D1, D2, D3hot), this port may be used to transmit a
PME message to the system to request that the system
return the core to a lower-numbered D power state (D1 -> D0 for
example). If the core is in L2 and is configured for
Endpoint training (Upstream Lanes) and
pm_d3cold_n_pme_assert is asserted, then the core will
de-assert the PHY’s TX electrical idle signal to cause the
PHY to transmit a Beacon (as per PIPE Specification)
and wait for the remote device to exit electrical idle
before retraining the link and transmitting the PME
message. Note that all exit conditions from L2 result in
the Physical & Data Link Layers going down which
resets the Data Link & Transaction Layers.
Set for one clock to cause the core to set PME_Status and
transmit a PME message (if PME_En == 1) when the core
is in any state other than D3cold; 0 otherwise. This port
may not be asserted when the core is in D3cold.
This port may only be asserted from power states that
the user is advertising the ability to assert PME from via
mgmt_cfg_control.
Configuration Register Expansion Interface
Table 9: Configuration Register Expansion Interface Port Descriptions
Pin Name
Direction
Clock
Description
core_cfg_exp_addr [11:2]
Input
core_clk
Configuration register being addressed for a
write; all accesses are DWORD aligned (address
bits 1:0 are always 00); {cfg_exp_wr_addr [11:2],
00} addresses in the range of 0x000 to 0x0BC and
0x100 to 0x1FC are handled exclusively by the
core. Addresses between 0x0C0 and 0x0FC and
0x200 and 0xFFC are forwarded to the
Configuration Register Expansion Interface for
termination by user logic. This interface is active
for all Configuration Requests, even those that
target core configuration regions, so a full address
decode must be completed.
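The address decode required of user logic can be sketched as follows, using the ranges stated above. The function name and return strings are hypothetical; the input is the 10-bit DWORD address, which is shifted left by two to form the byte address.

```python
def cfg_owner(addr_11_2: int) -> str:
    """Return 'core' or 'user' for a Configuration Register DWORD address."""
    byte_addr = (addr_11_2 & 0x3FF) << 2   # append the two always-zero bits
    if byte_addr <= 0x0BC or 0x100 <= byte_addr <= 0x1FC:
        return "core"   # handled exclusively by the core
    return "user"       # 0x0C0 to 0x0FC and 0x200 to 0xFFC
```

User logic would respond only to accesses for which this decode yields "user" and ignore the rest, since the interface is active for all Configuration Requests.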
core_cfg_exp_wr_en
Input
core_clk
When cfg_exp_wr_en is high, the Configuration
Register addressed by cfg_exp_addr must be
written with core_cfg_exp_wr_data conditioned
by cfg_exp_wr_be byte enables.
core_cfg_exp_wr_data[31:0]
Input
core_clk
Data to write to the addressed Configuration
Register; must be conditionally applied using the
cfg_exp_wr_be byte enables.
core_cfg_exp_wr_be[3:0]
Input
core_clk
Active high byte enables; 1 == write byte; 0 == do
not write byte
core_cfg_exp_rd_en
Input
core_clk
Configuration register read enable; when
core_cfg_exp_rd_en == 1, core_cfg_exp_addr is
valid and specifies the address of the
configuration register that is being read; 1 clock
following core_cfg_exp_rd_en == 1,
core_cfg_exp_rd_data must be valid and contain
the contents of the register accessed by
core_cfg_exp_addr; core_cfg_exp_rd_data must
be held until the next read request
core_cfg_exp_rd_data[31:0]
Output
core_clk
core_cfg_exp_rd_val
Output
core_clk
This signal indicates that core_cfg_exp_rd_data is
valid.
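The address decode and byte-enable rules in Table 9 can be illustrated with a small behavioral model. This is only a sketch in Python, not HDL for the device: the function names are hypothetical, and the mapping of byte-enable bit 0 to data bits [7:0] is an assumption that the table does not state.

```python
# Behavioral sketch of the Table 9 rules. All names here are illustrative.
# Address ranges (byte addresses) taken from the core_cfg_exp_addr description.
CORE_RANGES = ((0x000, 0x0BC), (0x100, 0x1FC))   # handled exclusively by the core
USER_RANGES = ((0x0C0, 0x0FC), (0x200, 0xFFC))   # terminated by user logic


def byte_address(addr_11_2):
    """Rebuild the DWORD-aligned byte address {addr[11:2], 00}."""
    return (addr_11_2 & 0x3FF) << 2


def is_user_register(addr_11_2):
    """True when the access must be terminated by user logic."""
    addr = byte_address(addr_11_2)
    return any(lo <= addr <= hi for lo, hi in USER_RANGES)


def apply_byte_enables(old_value, wr_data, wr_be):
    """Merge wr_data into old_value, one byte per set enable bit.

    Assumption: wr_be bit n gates data bits [8n+7:8n].
    """
    result = old_value
    for byte in range(4):
        if (wr_be >> byte) & 1:
            mask = 0xFF << (8 * byte)
            result = (result & ~mask) | (wr_data & mask)
    return result & 0xFFFFFFFF
```

Because the interface is active for every Configuration Request, user logic would need a check equivalent to `is_user_register` before acting on a write or driving read data.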
The Achronix Cad Environment (ACE) PCIe Configuration GUI (pci.acxip) provides a graphical
and intuitive method by which the user can generate HDL files for the desired PCIe core
functionality. Table 10 describes the values encountered in the GUI.
Class Code[23:0] – Value returned when the Class Code Configuration Register is read. Must be
set to the correct value for the type of device being implemented; see PCI Local Bus
Specification Revision 2.3 Appendix D for details on setting Class Code.
CFG_CONSTANTS_CLASS_CODE
Operating Mode
Endpoint
Endpoint, Upstream Switch Port, Downstream Switch Port
CFG_CONSTANTS_SWITCH_PORT_MODE,
CFG_CONSTANTS_ROOT_PORT_MODE
2’b11 – Upstream Switch Port
2’b10 – Downstream Switch Port
2’b00 – Endpoint
Field Name
Default
Values
Description
Verilog Parameter
Root Port ID
0x0000
Root Port ID – This 16 bit field is used to define the ID used for PCIe Requester ID and
Completer ID when the core is operating as a Downstream Port (Root Port, Downstream Switch
Port). When the core is operating as an Upstream Port (Endpoint, Upstream Switch Port), the
core captures its Requester/Completer ID from received Configuration Write transactions.
CFG_CONSTANTS_ROOT_PORT_ID
DMA Bypass
Disable
Enable, Disable
Bypass the DMA interface and use only the bypass interface.
No Verilog parameter; only the wrapper changes. If Enable, bypass_enable is tied to 1’b1 and
the bypass_* ports are shown; otherwise bypass_enable is tied to 0 and the bypass_* interface
is hidden. The m_*, t_*, s2c_*, and c2s_* ports are hidden if Enable and shown if Disable.
Memory Map
BAR0 Base Address
0xFFFF0000
A 32-bit Memory BAR uses one 32-bit Base Address Register location and is created by setting
bits [2:0] == 000 of the starting Base Address CfgX register. Bit[3] is set to indicate a BAR
is prefetchable, but for PCI Express Devices this bit should only be set for 64-bit BARs.
Bits [31:4] are set to determine the size of the BAR. The minimum BAR size is 16 bytes,
although a minimum of 4K bytes is recommended. To determine the BAR size, bits are set
consecutively from bit 31 down to the last desired address bit to implement. The remaining
bits are all set to zero. For example, a 64Kbyte, not-prefetchable, 32-bit Memory BAR is
created by setting Base Address CfgX = 0xFFFF0000.
A 32-bit I/O BAR uses one 32-bit Base Address Register location and is created by setting
bits [1:0] == 01 of the starting Base Address CfgX register. Bits [31:4] are set to determine
the size of the BAR. Bits[3:2] must be 0 and make the minimum BAR size 16 bytes. To determine
the BAR size, bits are set consecutively from bit 31 down to the last desired address bit to
implement. The remaining bits are all set to zero. For example, a 256 byte, 32-bit I/O BAR is
created by setting Base Address CfgX = 0xFFFFFF01. I/O BARs should only be used to implement
legacy I/O functions.
A 64-bit Memory BAR uses two 32-bit Base Address Register locations and is created by setting
bits [2:0] == 100 of the starting Base Address CfgX register and then using the next Base
Address CfgX register to complete the BAR as a 64-bit register[63:0]. Bit[3] is set to
indicate the BAR is prefetchable. Bits [63:4] are set to determine the size of the BAR. The
minimum BAR size is 16 bytes, although a minimum of 4K bytes is recommended. To determine the
BAR size, bits are set consecutively from bit 63 down to the last desired address bit to
implement. The remaining bits are all set to zero. For example, a 64Kbyte, prefetchable,
64-bit Memory BAR is created by setting Base Address CfgX = 0xFFFF000C and Base Address
Cfg(X+1) to 0xFFFFFFFF to create a 64-bit BAR with value 0xFFFFFFFFFFFF000C. For example, a
1Gbyte, prefetchable, 64-bit Memory BAR is created by setting Base Address CfgX = 0xC000000C
and Base Address Cfg(X+1) to 0xFFFFFFFF to create a 64-bit BAR with value
0xFFFFFFFFC000000C.
CFG_CONSTANTS_BASE_ADDRESS_CFG0
Or
CFG_CONSTANTS_BASE_ADDRESS_CFG0,
CFG_CONSTANTS_BASE_ADDRESS_CFG1.
If Off, 0x00000000, else =Base Address
BAR0 Size
64K
16-64G
BAR0 Width
32
32, 64
BAR0 Prefetchable
No
No, Yes (64 bit)
BAR0 Type
Memory
Memory/IO
BAR1 Enable
Yes
No, Yes
CFG_CONSTANTS_BASE_ADDRESS_CFG1
Or
CFG_CONSTANTS_BASE_ADDRESS_CFG2,
CFG_CONSTANTS_BASE_ADDRESS_CFG3
BAR1 Base Address
0xFFFFE000
BAR1 Size
8K
16-64G
BAR1 Width
32
32, 64
BAR1 Prefetchable
No
No, Yes (64 bit)
BAR1 Type
Memory
Memory/IO
BAR2 Enable
Off
Off, On
CFG_CONSTANTS_BASE_ADDRESS_CFG2
Or
CFG_CONSTANTS_BASE_ADDRESS_CFG3
Or
CFG_CONSTANTS_BASE_ADDRESS_CFG2,
CFG_CONSTANTS_BASE_ADDRESS_CFG3
Or
CFG_CONSTANTS_BASE_ADDRESS_CFG3,
CFG_CONSTANTS_BASE_ADDRESS_CFG4
Or
CFG_CONSTANTS_BASE_ADDRESS_CFG4,
CFG_CONSTANTS_BASE_ADDRESS_CFG5
BAR2 Base Address
0xFFFFE000
BAR2 Size
8k
128-64G
BAR2 Width
32
32, 64
BAR2 Prefetchable
No
No, Yes (64 bit)
BAR2 Type
Memory
Memory/IO
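The BAR encoding rules described above (size mask in the upper bits, type flags in the low bits) can be checked with a short calculation. The following Python sketch is illustrative only; `bar_cfg` and its arguments are hypothetical names, not part of the ACE tool flow, and the result is simply the value that would be placed in the Base Address CfgX parameter(s).

```python
def bar_cfg(size_bytes, width=32, prefetchable=False, io=False):
    """Return (CfgX, CfgX+1) for a BAR; the second value is None for 32-bit BARs.

    Hypothetical helper that follows the bit layout described in the text:
      bits [1:0] == 01  -> I/O BAR
      bits [2:0] == 000 -> 32-bit Memory BAR
      bits [2:0] == 100 -> 64-bit Memory BAR
      bit  [3]          -> prefetchable (Memory BARs only)
    """
    if width not in (32, 64):
        raise ValueError("width must be 32 or 64")
    if io and (width != 32 or prefetchable):
        raise ValueError("I/O BARs are 32-bit and not prefetchable")
    if size_bytes < 16 or size_bytes & (size_bytes - 1):
        raise ValueError("size must be a power of two of at least 16 bytes")
    if size_bytes >= (1 << width):
        raise ValueError("size does not fit in the selected BAR width")

    # Set every bit from the top bit down to the last implemented address bit.
    size_mask = ((1 << width) - 1) & ~(size_bytes - 1)

    if io:
        flags = 0b01
    else:
        flags = (0b100 if width == 64 else 0) | (0b1000 if prefetchable else 0)

    value = size_mask | flags
    low = value & 0xFFFFFFFF
    high = (value >> 32) if width == 64 else None
    return low, high
```

Running this against the worked examples in the description reproduces 0xFFFF0000, 0xFFFFFF01, 0xFFFF000C/0xFFFFFFFF, and 0xC000000C/0xFFFFFFFF. The helper does not enforce the recommendation that only 64-bit BARs be marked prefetchable.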
Expansion ROM Enable
Off
Off, On
Determines whether an Expansion ROM Base Address Register is implemented, and if so, its
size. Bits [31:11] are set to determine the size of the BAR. Bits[10:0] must be 0 to make the
minimum BAR size 2K bytes. To determine the Expansion ROM BAR size, bits are set
consecutively from bit 31 down to the last desired address bit to implement. The remaining
bits are all set to zero. For example, a 64Kbyte Expansion ROM BAR is created by setting
Expansion ROM Cfg = 0xFFFF0000. The Expansion ROM BAR is used to store device specific
initialization or boot instructions that must execute during the boot process. Use of the
Expansion ROM Base Address is rare. If implemented, a valid Expansion ROM structure must be
implemented at this BAR location or the system may fail to boot. If unused, this field must
be 0x00000000. See PCI Local Bus Specification Revision 2.3, Sections 6.2.5.2 and 6.3 for
additional detail. Expansion ROM Cfg is the same in both Endpoint and Root Port modes of
operation.
CFG_CONSTANTS_EXPANSION_ROM_CFG
If Off, 0x00000000, else =Base Address
Expansion ROM Base Address
0xFFFFF800
Expansion ROM Size
2K
2K-16M
Power Management
NFTS
0x00
0x00-0xFF
NFTS - Number of FTS sets to request when exiting L0s. This is the NFTS value transmitted in
TS1 and TS2 Ordered Sets during training. When the remote device’s transmitter exits L0s, the
PHY receiver uses the FTS sets to recover symbol lock. NFTS should be chosen in accordance
with the time required by the PCI Express PHY used with the core to achieve symbol lock when
exiting Electrical Idle from L0s, and should also take into account the PHY RX_IDLE to
RX_DATA latency. Valid values are 0x00 and 0x10 to 0xFF (0x01 to 0x0F are not permitted).
0x00 is a special case and selects the maximum value of 0xFF. Lower values may only be used
by PHYs with low RX_IDLE to RX_DATA latency. See NFTS Timeout Extend for additional detail.
CFG_CONSTANTS_NFTS
L0s Tx Entry Time
0x0000
0x0000-0xFFFF
ASPM L0s TX Entry Time – Number of nanoseconds of idle time to wait before entering L0s TX.
Idle time is defined as no TLP or DLLP transmission pending or actively being transmitted.
Per the PCIe Specification, the value programmed should be <= 7 μs (0x1B58). Too low a value
risks wasting link bandwidth due to L0s entry/exit latencies. Too high a value will reduce
L0s power savings. Only used if Enable L0s Power Mgmt is set. 0 is a special case and selects
6.9 μs (0x1AF4).
CFG_CONSTANTS_ASPM_L0S_TX_ENTRY_TIME
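The special-case handling of the L0s TX entry time can be expressed as a small check. This is a minimal Python sketch with hypothetical function names; the constants are the hexadecimal values quoted in the description (0x1B58 ns = 7000 ns, 0x1AF4 ns = 6900 ns).

```python
L0S_ENTRY_MAX_NS = 0x1B58      # 7 us upper bound quoted from the PCIe Specification
L0S_ENTRY_DEFAULT_NS = 0x1AF4  # 6.9 us, selected when the field is 0


def effective_l0s_entry_ns(field):
    """Idle time, in nanoseconds, that a 16-bit field value selects."""
    if not 0 <= field <= 0xFFFF:
        raise ValueError("field must fit in 16 bits")
    return L0S_ENTRY_DEFAULT_NS if field == 0 else field


def l0s_entry_within_spec(field):
    """True when the selected idle time does not exceed 7 us."""
    return effective_l0s_entry_ns(field) <= L0S_ENTRY_MAX_NS
```

A value such as 0x2000 fits in the field but would fail `l0s_entry_within_spec`, which is the kind of setting the description warns against.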
Endpoint L0s Acceptable Latency
64ns
64ns, 128ns, 256ns, 512ns, 1us, 2us, 4us, No limit
Endpoint L0s Acceptable Latency
– From PCI Express Base
Specification, Rev 2.1 section
7.8.3: “Acceptable total latency
that an Endpoint can withstand
due to the transition from L0s
state to the L0 state. It is
essentially an indirect measure of
the Endpoint’s internal buffering.
Power management software
uses the reported L0s Acceptable
Latency number to compare
against the L0s exit latencies
reported by all components
comprising the data path from
this Endpoint to the Root
Complex Root Port to determine
whether ASPM L0s entry can be
used with no loss of
performance.” Note that the
amount of buffering refers to user
application buffering. Users
should set this field in accordance
with how long a delay is
acceptable for their application.
000 - Maximum of 64 ns
001 - Maximum of 128 ns
010 - Maximum of 256 ns
011 - Maximum of 512 ns
100 - Maximum of 1 μs
101 - Maximum of 2 μs
110 - Maximum of 4 μs
111 - No limit
Non-Endpoints must
hard wire this field to
000.
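As a rough illustration of how an application's latency tolerance maps onto the 3-bit encoding above, the following Python sketch picks the largest advertised maximum that does not exceed the tolerance. The function name and the selection policy are assumptions made for illustration; the guide only lists the encodings.

```python
# Maximum latencies, in nanoseconds, for encodings 000 through 110.
L0S_ACCEPTABLE_NS = (64, 128, 256, 512, 1000, 2000, 4000)


def encode_l0s_acceptable_latency(tolerable_ns):
    """Return the 3-bit field for a tolerance in ns (None means no limit).

    Assumed policy: choose the largest listed maximum that is <= tolerable_ns.
    Tolerances below 64 ns fall back to 000, the smallest encodable value.
    """
    if tolerable_ns is None:
        return 0b111
    code = 0
    for index, limit_ns in enumerate(L0S_ACCEPTABLE_NS):
        if limit_ns <= tolerable_ns:
            code = index
    return code
```

For example, an application that can absorb 300 ns of stall would advertise 010 (maximum of 256 ns) under this policy.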
L0s Exit Latency
More than 4us.
Less than 64ns, 64ns to less than 128ns, 128ns to less than 256ns, 256ns to less than 512ns,
512ns to less than 1us, 1us to less than 2us, 2us-4us, more than 4us.
L0s Exit Latency - Length of time required to complete transition from L0s to L0:
000 - Less than 64 ns
001 - 64 ns to less than 128 ns
010 - 128 ns to less than 256 ns
011 - 256 ns to less than 512 ns
100 - 512 ns to less than 1 μs
101 - 1 μs to less than 2 μs
110 - 2 μs-4 μs
111 - More than 4 μs
Exit latencies may differ significantly depending on whether the PCI Express reference clocks
used by the two devices in the link are common or separate.
Also, CFG_CONSTANTS_ENABLE_L1S_POWER_MGMT and CFG_CONSTANTS_ENABLE_L1_POWER_MGMT should be
set to 1 if yes.
L1 Entry Time
0x0000
0x0000-0xFFFF
Enable ASPM L1 Power Mgmt:
Set to enable the core’s ASPM L1
power management functions.
Clear to disable. This bit should
be clear for PHYs which cannot
support power management due
to missing PCI Express features
such as Electrical Idle Detection
and Generation. If this bit is set,
then ASPM L1 functionality is
implemented and may or may
not be enabled and used by
system software. If ASPM L1
support is enabled, then
mgmt_cfg_constants: ASPM L1
Entry Time specifies the ASPM L1
idle entry time. If ASPM L1
support is advertised in
mgmt_cfg_control: Active State
Power Management (ASPM)
Support in PCIe Link
Capabilities, then Enable ASPM
L1 Power Management must be 1.
CFG_CONSTANTS_ASPM_L1S_TX_ENTRY_TIME
Endpoint L1 Acceptable Latency
Maximum of 1us
Endpoint L1 Acceptable Latency
– From PCI Express Base
Specification, Rev 2.1 section
7.8.3: “This field indicates the
acceptable latency that an
Endpoint can withstand due to
the transition from L1 state to the
L0 state. It is essentially an
indirect measure of the
Endpoint’s internal buffering.
Power management software
uses the reported L1 Acceptable
Latency number to compare
against the L1 Exit Latencies
reported (see below) by all
components comprising the data
path from this Endpoint to the
Root Complex Root Port to
determine whether ASPM L1
entry can be used with no loss of
performance.” Note that the
amount of buffering refers to the
user application buffering. Users
should set this field in accordance
with how long a delay is
acceptable for their application.
000 - Maximum of 1 μs
001 - Maximum of 2 μs
010 - Maximum of 4 μs
011 - Maximum of 8 μs
100 - Maximum of 16 μs
101 - Maximum of 32 μs
110 - Maximum of 64 μs
111 - No limit
Non-Endpoints must
hard wire this field to
000.
L1 Exit Latency
More than 64us
L1 Exit Latency – Length of time required to complete transition from L1 to L0:
000 - Less than 1 μs
001 - 1 μs to less than 2 μs
010 - 2 μs to less than 4 μs
011 - 4 μs to less than 8 μs
100 - 8 μs to less than 16 μs
101 - 16 μs to less than 32 μs
110 - 32 μs-64 μs
111 - More than 64 μs
Exit latencies may differ significantly depending on whether the PCI Express reference clocks
used by the two devices in the link are common or separate.
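The L1 Exit Latency ranges above can be mapped to the 3-bit code with a simple threshold lookup. This Python sketch is illustrative only and the function name is hypothetical; it treats exactly 64 μs as belonging to the 110 range, following the "32 μs-64 μs" wording.

```python
from bisect import bisect_right

# Lower bounds, in microseconds, of encodings 001 through 110.
_L1_EXIT_THRESHOLDS_US = (1, 2, 4, 8, 16, 32)


def encode_l1_exit_latency(exit_latency_us):
    """Return the 3-bit L1 Exit Latency code for a latency in microseconds."""
    if exit_latency_us < 0:
        raise ValueError("latency must be non-negative")
    if exit_latency_us > 64:
        return 0b111
    # Count how many range lower bounds the latency has reached.
    return bisect_right(_L1_EXIT_THRESHOLDS_US, exit_latency_us)
```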
bit of the Function number in Requester ID is used for Phantom Functions; a multi-Function
device is permitted to implement Functions 0-3. Functions 0, 1, 2, and 3 are permitted to use
Function Numbers 4, 5, 6, and 7 respectively as Phantom Functions.
10 - The two most significant bits of Function Number in Requester ID are used for Phantom
Functions; a multi-Function device is permitted to implement Functions 0-1. Function 0 is
permitted to use Function Numbers 2, 4, and 6 for Phantom Functions. Function 1 is permitted
to use Function Numbers 3, 5, and 7 as Phantom Functions.
11 - All 3 bits of Function Number in Requester ID are used for Phantom Functions. The device
must have a single Function 0 that is permitted to use all other Function Numbers as Phantom
Functions.
Phantom Function support for the Function must be enabled by the Phantom Functions Enable
field in the Device Control register before the Function is permitted to use the Function
Number field in the Requester ID for Phantom Functions. If Phantom Functions Supported does
not equal 00, the core implements the Phantom Functions Enable register as read/write,
resetting to 0; otherwise it implements Phantom Functions Enable as read only, tied to 0.
CFG_CONTROL_PCIE_DEV_CAP_PHANTOM_FUNCTIONS_SUPPORTED
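The relationship between the Phantom Functions Supported encoding and the Function Numbers each real Function may borrow follows a regular bit pattern. The Python sketch below is a hypothetical helper written to illustrate that pattern; it treats the field value as the number of most significant Function Number bits given over to Phantom Functions, with 00 meaning none.

```python
def phantom_function_numbers(supported_field, function):
    """List the Function Numbers that `function` may use as Phantom Functions.

    supported_field is the 2-bit Phantom Functions Supported value (0 to 3),
    interpreted as the count of Function Number MSBs used for Phantom Functions.
    """
    if supported_field not in (0, 1, 2, 3):
        raise ValueError("supported_field must be a 2-bit value")
    real_bits = 3 - supported_field          # low bits that identify real Functions
    if not 0 <= function < (1 << real_bits):
        raise ValueError("function number not permitted for this encoding")
    # Each non-zero pattern in the upper bits yields one Phantom Function Number.
    return [function | (k << real_bits) for k in range(1, 1 << supported_field)]
```

With encoding 10, Function 0 yields 2, 4, 6 and Function 1 yields 3, 5, 7, matching the description.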
Completion Timeout Disable Supported
Yes
Yes, No
Set to signal that the user Completion Timeout mechanism supports being disabled; clear to
indicate that the user Completion Timeout mechanism may not be disabled. Setting this bit is
required by the PCIe Specification for Endpoints which issue requests on their own behalf, so
1 is the recommended value.
CFG_CONTROL_PCIE_DEV_CAP2_CPL_TIMEOUT_DISABLE_SUPPORTED
Completion Timeout Range
50us to 50ms
50us to 10ms, 10ms to 250ms, 250ms to 4s, 4s to 64s
Each bit is set to indicate whether the user supports a particular range of completion
timeouts:
0000 – Programming not supported; completion timeout is in range 50 μs to 50 ms
xxx1 – 50 μs to 10 ms supported
xx1x – 10 ms to 250 ms supported
x1xx – 250 ms to 4 s supported
1xxx – 4 s to 64 s supported
Example: 0110 indicates support for both 10 ms to 250 ms and 250 ms to 4 s.
Devices are not required to support several timeout ranges. 0000 is the recommended value.
CFG_CONTROL_PCIE_DEV_CAP2_CPL_TIMEOUT_RANGES_SUPPORTED
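The bit-per-range encoding of Completion Timeout Ranges Supported can be decoded as shown in this short Python sketch. The function name and the returned strings are illustrative only.

```python
# Range advertised by each bit, from bit 0 (xxx1) to bit 3 (1xxx).
_CPL_TIMEOUT_RANGES = (
    "50 us to 10 ms",
    "10 ms to 250 ms",
    "250 ms to 4 s",
    "4 s to 64 s",
)


def decode_cpl_timeout_ranges(field):
    """Return the completion timeout ranges advertised by a 4-bit field."""
    if not 0 <= field <= 0xF:
        raise ValueError("field must be a 4-bit value")
    if field == 0:
        return ["50 us to 50 ms (programming not supported)"]
    return [text for bit, text in enumerate(_CPL_TIMEOUT_RANGES) if (field >> bit) & 1]
```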
AER Version 0x2 Enable
Yes
Yes, No
AER Version 0x2 Enable
1 == Implement AER to version 0x2 (PCIe 2.1 and later Specification revisions)
o Correctable Errors: Corrected Internal Error & Header Log Overflow are enabled
o Uncorrectable Error: Uncorrected Internal Error is enabled
0 == Implement AER to version 0x1 (PCIe 2.0 and earlier Specification revisions)
o Correctable Errors: Corrected Internal Error & Header Log Overflow are hidden and cannot be
signaled
o Uncorrectable Error: Uncorrected Internal Error is hidden and cannot be signaled
CFG_CONTROL_AER_VERSION_0X2_ENABLE
MSI Capability Disable
No
Yes, No
MSI Capability Disable – (1) Disable MSI Capability; (0) Enable MSI Capability; when (1), the
core’s MSI Capability is removed from the Configuration Registers Capabilities List, MSI
Interrupt functionality is disabled, and it will not be possible to send MSI interrupts.
CFG_CONTROL_MSI_CAPABILITY_DISABLE
Number of MSI vectors
32
1, 2, 4, 8, 16, 32
MSI Multiple Message Capable
[2:0] – This field directly controls
the values of the MSI Capability:
Multiple Message Capable field.
Multiple-message MSI
functionality requires the user
design to indicate the interrupt
vector number that they want
signaled when mgmt_interrupt is
asserted. MSI Multiple Message
Capable advertises the desired
number of vectors. System
software is not required to
provide the desired number of
vectors and programs the
allocated number of vectors into
the Multiple Message Enable
configuration register. The
number of allocated vectors will
be a power of two between 1 and
the requested amount. The user
hardware design and software
must be able to operate with any
subset of vectors assigned by the
system.
System software reads this field
to determine the number of
requested MSI vectors. The
number of requested vectors
must be aligned to a power of
two (if a function requires three
vectors, it requests four by
MSI-X Capability Disable – (1)
Disable MSI-X Capability; (0)
Enable MSI-X Capability; when
(1) the core’s MSI-X Capability is
removed from the Configuration
Registers Capabilities List, MSI-X
Interrupt functionality is
disabled, and it will not be
possible to send MSI-X interrupts;
this bit only affects core
configurations that support MSI-X.
CFG_CONTROL_MSI_X_CAPABILITY_DISABLE
MSI-X Table
Size
31
0 to 2^11
MSI-X Table Size[10:0] – Value to
place into MSI-X Capability:
Table Size field.
MSI-X functionality requires the
user design to implement the
MSI-X Table in Memory Space.
MSI-X Table Size[10:0] is set to
indicate the number of MSI-X
Table entries (Interrupt Vectors)
implemented. MSI-X Table Size is
read by software to determine the
size of the MSI-X Table.
MSI-X Table Size is set to the
number of MSI-X Table entries
(Interrupt Vectors) supported by
the user’s MSI-X Table minus 1.
For example if 32 Table entries
(Interrupt Vectors) are supported,
then MSI-X Table Size[10:0] ==
0x01F.
Each MSI-X Table entry (Interrupt Vector) requires 4 DWORDs to store a 64-bit address, 32-bit
data value, and 32-bit Vector Control field, so a 32 Interrupt Vector MSI-X Table requires a
512 (32 * 16) byte table.
CFG_CONTROL_MSI_X_TABLE_SIZE
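The two calculations in the MSI-X Table Size description (the minus-one encoding and the 16 bytes per entry) can be written out as follows. This is a minimal Python sketch with hypothetical function names.

```python
MSIX_ENTRY_BYTES = 4 * 4   # 4 DWORDs per entry: 64-bit address, 32-bit data, 32-bit control
MSIX_MAX_ENTRIES = 1 << 11  # Table Size is an 11-bit field holding (entries - 1)


def msix_table_size_field(entries):
    """Value for MSI-X Table Size[10:0]: number of table entries minus 1."""
    if not 1 <= entries <= MSIX_MAX_ENTRIES:
        raise ValueError("entries must be between 1 and 2048")
    return entries - 1


def msix_table_bytes(entries):
    """Memory Space, in bytes, that the user MSI-X Table occupies."""
    if not 1 <= entries <= MSIX_MAX_ENTRIES:
        raise ValueError("entries must be between 1 and 2048")
    return entries * MSIX_ENTRY_BYTES
```

For 32 vectors this gives a field value of 0x01F and a 512 byte table, as in the example above.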
MSI-X Table BAR indicator
BAR0
BAR0, BAR1, BAR2
MSI-X Table BIR[2:0] – Value to
place into MSI-X Capability :
Table BIR field.
MSI-X functionality requires the
user design to implement the
MSI-X Table in Memory Space
mapped by 1 (32-bit) or 2 (64-bit)
Memory Base Address Registers.
MSI-X Table BIR and MSI-X Table
Offset indicate to system software
where the MSI-X Table is located.
Software writes and reads to the
MSI-X Table and MSI-X PBA are
handled by the user hardware
design. To generate an MSI-X
interrupt, the user hardware
design passes the core the single
MSI-X Table entry corresponding
to the desired interrupt vector.
The core uses this information to
create an MSI-X write request
(Memory Write Request) packet.
MSI-X functionality is described
in the PCI Local Bus Specification, Rev. 3.0. The specification
recommends mapping the MSI-X
Table and MSI-X PBA into
separate, dedicated Base Address
Registers. If this is not possible
then it is recommended to map
the MSI-X Table and MSI-X PBA
into the same dedicated Base
Address Register. If this is not
possible then the MSI-X Table
and MSI-X PBA may be mapped
into a Memory Base Address
Register that is shared with other
functions. If the MSI-X Table and MSI-X PBA are mapped into a Base Address Register that is
shared with other functions, then it is required to map the MSI-X Table and MSI-X PBA into a
dedicated, aligned 4 KByte (OS page size) or larger (8 KByte recommended) address region of
the shared BAR.
MSI-X Table BIR[2:0] indicates to system software which one of a function’s Base Address
registers is used to map the function’s MSI-X Table into Memory Space. For a 64-bit BAR, the
BAR location that contains the lower 32-bits of address is indicated. For example, if the
MSI-X table is located in a 64-bit Memory Space implemented via {BAR1, BAR0}, then Table BIR
is set to 000 (BAR0). For example, if the MSI-X table is located in a 32-bit Memory Space
implemented via BAR2, then Table BIR is set to 010 (BAR2). MSI-X Table BIR[2:0] is set as
follows:
000 – Base Address Register 0 (0x10)
001 – Base Address Register 1 (0x14)
010 – Base Address Register 2 (0x18)
011 – Base Address Register 3 (0x1C)
100 – Base Address Register 4 (0x20)
101 – Base Address Register 5 (0x24)
110 – Reserved
111 – Reserved
CFG_CONTROL_MSI_X_TABLE_BIR
MSI-X Table Offset
0x0C00
29-Bit hex
MSI-X Table Offset[31:3] - Value to place into MSI-X Capability : Table Offset field.
MSI-X Table BIR indicates which Base Address Register contains the MSI-X Table. See MSI-X
Table BIR description for additional information.
{MSI-X Table Offset[31:3], 000} is the QWORD aligned address offset in the Base Address
Register where the MSI-X Table starts. For example, if the MSI-X Table is located at BAR0
offset 0x10000, then MSI-X Table BIR == 000 and {MSI-X Table Offset[31:3], 000} = 0x00010000.
CFG_CONTROL_MSI_X_TABLE_OFFSET
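The split of a byte offset into the BIR and Offset[31:3] fields can be sketched as below. The helper name is hypothetical; it only checks the QWORD alignment and BIR range stated in the descriptions above and does not model how the ACE GUI expects the value to be entered.

```python
def msix_table_location(bar_index, byte_offset):
    """Return (Table BIR[2:0], Table Offset[31:3]) for a table inside a BAR.

    bar_index   -- Base Address Register number, 0 to 5 (6 and 7 are reserved)
    byte_offset -- QWORD aligned byte offset of the table within that BAR
    """
    if not 0 <= bar_index <= 5:
        raise ValueError("BIR values 110 and 111 are reserved")
    if byte_offset % 8 != 0:
        raise ValueError("offset must be QWORD (8 byte) aligned")
    if not 0 <= byte_offset <= 0xFFFFFFF8:
        raise ValueError("offset must fit in 32 bits")
    return bar_index, byte_offset >> 3
```

For the BAR0, offset 0x10000 example, the function returns BIR 0 and an Offset[31:3] field of 0x2000, which reads back as 0x00010000 once the three zero bits are appended.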
MSI-X PBA BAR Indicator
BAR0
BAR0, BAR1, BAR2
MSI-X PBA BIR[2:0] – Value to place into MSI-X Capability : PBA BIR field.
Same as MSI-X Table BIR, but indicates the Base Address Register of the MSI-X PBA rather than
the MSI-X Table. See MSI-X Table BIR and MSI-X Table Offset description for additional
information.
CFG_CONTROL_MSI_X_PBA_BIR
MSI-X PBA Offset
0x0000
29-Bit hex
MSI-X PBA Offset[31:3] - Value to place into MSI-X Capability : PBA Offset field.
Same as MSI-X Table Offset, but indicates the Base Address Register offset for the MSI-X PBA
rather than the MSI-X Table. See MSI-X Table BIR and MSI-X Table Offset description for
additional information.
CFG_CONTROL_MSI_X_PBA_OFFSET
Gen 3 Equalization
Equalization Method
Preset
Preset, Algorithm, Table
CFG_8G_CONSTANTS_EQ_METHOD
2’b00 – Preset, 2’b01 – Algorithm, 2’b10 – Table
Equalization TS1 Ack Delay
256
1-256
Defines how long the upstream port (Phase 2) or downstream port (Phase 3) waits after
requesting new coefficients/presets before looking for incoming EQ TS1 sets from the remote
link partner. By specification, this delay should be set to the round trip delay to the
remote link partner (including logic delays in the requesting port) + 500 ns. The delay used
will be equal to (eq_ts1_ack_delay[7:0] * 16) + 500 ns. If eq_ts1_ack_delay is set to 0, then
this will be equal to a maximum setting of 256, or 256*16 + 500 ns = 4.6 microseconds. This
is the default value, but can be reduced to speed up equalization if the round trip delay is
understood in detail.
CFG_8G_CONSTANTS_EQ_TS1_ACK_DELAY
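The delay formula above, including the special meaning of 0, works out as follows. This Python sketch uses a hypothetical function name and assumes, as the 4.6 microsecond figure implies, that the multiplied term is in nanoseconds.

```python
def eq_ts1_ack_delay_ns(field):
    """Delay in nanoseconds selected by eq_ts1_ack_delay[7:0].

    A field value of 0 selects the maximum setting of 256.
    """
    if not 0 <= field <= 0xFF:
        raise ValueError("field must be an 8-bit value")
    setting = 256 if field == 0 else field
    return setting * 16 + 500
```

The default (maximum) setting evaluates to 4596 ns, which the description rounds to 4.6 microseconds.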
Preset - Max Preset Addr
9
0-9
5.3.3.1 Preset Method
Step through the PCI Express Specification-defined Tx Presets (0 through 9).
The Preset Method can optionally be configured to communicate the desired Preset for the
remote device to use either by Preset Number (letting the remote device convert the Preset to
the appropriate coefficients) or by calculating the coefficients equivalent to the Preset
Number and communicating the coefficients to the remote device.
When the core is configured to communicate the Preset via coefficients rather than Preset
Number, the core uses the coefficients from the Pre-Cursor Coefficient and Post-Cursor
Coefficient columns in the table above to perform the calculation. The coefficient target,
expressed as a real number in parentheses, is given along with the rounded (to 1/64)
coefficient value that is used.
The Preset method works well with PHYs which take the maximum of 2 ms to evaluate
equalization settings, since there are only 10 settings (the maximum number that could be
attempted) that would typically be tried. Typically Preset 0xA would not be tried since it is
primarily intended for diagnostics.
The Preset method trying all presets 0 to 9 is recommended for users to start with if they
are unsure which method they should use.
CFG_8G_CONSTANTS_EQ_PRESET_ADDR_LIMIT
Algorithm – Pre Cursor Step Size
4
1-16
5.3.3.2 Algorithm Method
Evenly step through the possible coefficient values. Complete coefficient range coverage at
the expense of longer run time.
for (pre = 0; pre <= eq_alg_pre_cursor_limit; pre = pre + eq_alg_pre_cursor_step_size)
  for (post = 0; post <= eq_alg_post_cursor_limit; post = post + eq_alg_post_cursor_step_size)
    try {post, pre}
Note: Post-Cursor values (post) from 0 to 32 (0 to 0.5) are possible
Note: Pre-Cursor values (pre) from 0 to 16 (0 to 0.25) are possible
Stepping through all 17 (0-16) Pre-Cursor values and all 33 (0-32) Post-Cursor values takes
561 iterations. Step size is increased to walk through the values more quickly (and
coarsely). Limits are lowered to exclude larger values that are less likely to produce the
desired results.
For example:
Steps of 4 for Pre-Cursor and 8 for Post Cursor with Limits == 16 and 32 respectively
requires 25 iterations.
Steps of 8 for Pre-Cursor and 16 for Post Cursor with Limits == 16 and 32 respectively
requires 9 iterations.
Be careful when assigning step sizes not to exceed the Equalization time limit.
The Algorithm Method works best with PHYs which take significantly less than the maximum of
2 ms to evaluate equalization settings, so that fine step sizes can be used.
CFG_8G_CONSTANTS_EQ_ALG_PRE_CURSOR_STEP_SIZE
Algorithm – Post Cursor Step Size
8
1-32
CFG_8G_CONSTANTS_EQ_ALG_POST_CURSOR_STEP_SIZE
Algorithm – Pre Cursor Limit
16
0-16
CFG_8G_CONSTANTS_EQ_ALG_PRE_CURSOR_LIMIT
Algorithm – Post Cursor Limit
32
0-32
CFG_8G_CONSTANTS_EQ_ALG_POST_CURSOR_LIMIT
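The iteration counts quoted for the Algorithm Method can be reproduced by enumerating the same nested loop in software. The Python sketch below is only a model for sizing the sweep; the function name is hypothetical and its default arguments are the default step sizes and limits listed in this table.

```python
def algorithm_sweep(pre_limit=16, post_limit=32, pre_step=4, post_step=8):
    """Enumerate the {post, pre} pairs the Algorithm Method would try, in order."""
    if pre_step < 1 or post_step < 1:
        raise ValueError("step sizes must be at least 1")
    return [
        (post, pre)
        for pre in range(0, pre_limit + 1, pre_step)
        for post in range(0, post_limit + 1, post_step)
    ]


if __name__ == "__main__":
    print(len(algorithm_sweep(pre_step=1, post_step=1)))   # full coverage
    print(len(algorithm_sweep()))                          # default steps of 4 and 8
    print(len(algorithm_sweep(pre_step=8, post_step=16)))  # coarse sweep
```

Multiplying the number of pairs by the PHY's per-setting evaluation time gives a rough estimate of total sweep time to compare against the Equalization time limit.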
Table – Address Limit
8
0-31
5.3.3.3 Table Method
Step through the user-provided coefficient table.
for (i = 0; i <= eq_table_addr_limit; i = i + 1) {
  pre = eq_table_pre_cursor_ceof[((i+1)*6)-1:(i*6)]
  post = eq_table_post_cursor_ceof[((i+1)*6)-1:(i*6)]
  try {post, pre}
}
Note: Pre-Cursor values (pre)
from 0 to 16 (0 to 0.25) are
possible
Note: Post-Cursor values (post)
from 0 to 32 (0 to 0.5) are possible
In this method the user specifies
up to 32 coefficient pairs to try
and may select the 32 (or fewer)
coefficient pairs that are most
likely to work for the given PHY.
The Table Method works well for
users that know the range of
settings that typically work well
for their PHY. The table values
can be concentrated on coefficient
ranges that are more likely to
work well.
Be careful when assigning
eq_table_addr_limit not to
exceed the Equalization time
limit.
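The 6-bit slicing used by the Table Method pseudocode can be modeled with integer shifts. The Python sketch below is illustrative: `pack_coefficients` and `unpack_table` are hypothetical helpers, and the packed integers stand in for the eq_table_pre_cursor_ceof and eq_table_post_cursor_ceof vectors, with entry i occupying bits [(i+1)*6-1 : i*6].

```python
def pack_coefficients(values):
    """Pack up to 32 six-bit coefficients into one integer, entry 0 in the low bits."""
    if len(values) > 32:
        raise ValueError("at most 32 table entries are supported")
    packed = 0
    for i, value in enumerate(values):
        if not 0 <= value <= 0x3F:
            raise ValueError("each coefficient must fit in 6 bits")
        packed |= value << (i * 6)
    return packed


def unpack_table(packed_pre, packed_post, addr_limit):
    """Return the {post, pre} pairs for table addresses 0 through addr_limit."""
    if not 0 <= addr_limit <= 31:
        raise ValueError("addr_limit must be between 0 and 31")
    pairs = []
    for i in range(addr_limit + 1):
        pre = (packed_pre >> (i * 6)) & 0x3F
        post = (packed_post >> (i * 6)) & 0x3F
        pairs.append((post, pre))
    return pairs
```

Note that an address limit of N tries N + 1 pairs, so the default limit of 8 corresponds to nine table entries.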