Serial Flash interface
Configurable LED operation for software or OEM customization of
LED displays
Device disable capability
Package size - 25 mm x 25 mm
Networking
10 GbE/1 GbE/100 Mb/s copper PHYs integrated on-chip
Support for jumbo frames of up to 15.5 KB
Flow control support: send/receive pause frames and receive
FIFO thresholds
Statistics for management and RMON
802.1q VLAN support
TCP segmentation offload: up to 256 KB
IPv6 support for IP/TCP and IP/UDP receive checksum offload
Fragmented UDP checksum offload for packet reassembly
Message Signaled Interrupts (MSI)
Message Signaled Interrupts (MSI-X)
Interrupt throttling control to limit maximum interrupt rate
and improve CPU usage
Flow Director (16 x 8 and 32 x 4)
128 transmit queues
Receive packet split header
Receive header replication
Dynamic interrupt moderation
DCA support
TCP timer interrupts
No snoop
Relaxed ordering
Support for 64 virtual machines per port (64 VMs x 2 queues)
Support for Data Center Bridging (DCB) (802.1Qaz,
802.1Qbb, 802.1p)
PCIe base specification 2.1 (2.5GT/s or 5GT/s)
Bus width — x1, x2, x4, x8
64-bit address support for systems using more than 4 GB of
physical memory
MAC Functions
Descriptor ring management hardware for transmit and
receive
ACPI register set and power down functionality supporting
D0 and D3 states
A mechanism for delaying/reducing transmit interrupts
Software-controlled global reset bit (resets everything
except the configuration registers)
Four Software-Definable Pins (SDP) per port
Wake up
IPv6 wake-up filters
Configurable flexible filter (through NVM)
LAN function disable capability
Programmable memory transmit buffers (160 KB/port)
Default configuration by NVM for all LEDs for pre-driver
functionality
Manageability
SR-IOV support
Eight VLAN L2 filters
16 Flex L3 port filters
Four Flexible TCO filters
Four L3 address filters (IPv4)
Advanced pass through-compatible management packet
transmit/receive support
SMBus interface to an external Manageability Controller
(MC)
NC-SI interface to an external MC
Four L3 address filters (IPv6)
Four L2 address filters
Revision Number: 2.7
March 2014
X540 — Revisions
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR
OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS
OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING
TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE,
MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death.
SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND
ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL
CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF
PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR
ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.
Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or
characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no
responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice.
Do not finalize a design with this information.
The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published
specifications. Current characterized errata are available on request.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-
• Revised table note references in section 11.7.2.2.3 (Read Status Command).
• Revised section 6.2.1 (NVM Organization).
• Revised section 8.2.4.23.1 (Core Control 0 Register; bit 1 description).
• Revised section 8.2.4.4.14 (PCIe Control Extended Register; bit 30 description).
• Revised section 8.2.4.8.9 (PCIe Control Extended Register; bit 1 description).
• Revised section 8.2.4.23.10 (MAC Control Register; bits 7:5).
• Removed PSRTYPE from note 11 in section 4.2.3.
X540 — Revision History
Introduction—X540 10GBase-T Controller
1.0 Introduction
1.1 Scope
This document describes the external architecture (including device operation, pin
descriptions, register definitions, etc.) for the Intel® Ethernet Controller X540, a single or
dual port 10GBASE-T Network Interface Controller.
This document is intended as a reference for logic design groups, architecture validation,
firmware development, software device driver developers, board designers, test
engineers, or anyone else who might need specific technical or programming information
about the X540.
1.2 Product Overview
The X540 is a derivative of the 82599, the Intel 10 GbE Network Interface Controller
(NIC) targeted for blade servers. Many features of its predecessor remain intact;
however, some have been removed or modified, and new features have been introduced.
The X540 includes two integrated 10GBASE-T copper Physical Layer Transceivers (PHYs).
A standard MDIO interface, accessible to software via MAC control registers, is used to
configure and monitor each PHY operation.
The X540 also supports a single port configuration.
1.2.1 System Configurations
The X540 is targeted for system configurations such as rack mounted or pedestal
servers, where it can be used as an add-on NIC or LAN on Motherboard (LOM). Another
system configuration is for high-end workstations.
Figure 1-1 Typical Rack / Pedestal System Configuration
Figure 1-3 X540 External Interfaces Diagram (Single Port Configuration)
1.2.3 PCIe* Interface
The X540 supports PCIe v2.1 (2.5GT/s or 5GT/s). See Section 2.1.2 for full pin
description and Section 12.4.7 for interface timing characteristics.
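As a rough illustration of what these link speeds mean for a dual-port 10 GbE device, the sketch below estimates the per-direction data rate of a PCIe link. It assumes the 8b/10b line encoding used by PCIe at 2.5 GT/s and 5 GT/s and ignores packet (TLP/DLLP) overhead, so achievable throughput is lower than the figures printed. The function name is illustrative and does not come from this document.

```python
def pcie_data_rate_gbps(gt_per_s: float, lanes: int) -> float:
    """Per-direction data rate in Gb/s after 8b/10b encoding.

    Protocol overhead (TLP headers, DLLPs, flow control) is ignored.
    """
    # 8b/10b encoding carries 8 data bits in every 10 bits on the wire.
    return gt_per_s * lanes * 8 / 10


for rate in (2.5, 5.0):
    for lanes in (1, 2, 4, 8):
        print(f"x{lanes} at {rate} GT/s: {pcie_data_rate_gbps(rate, lanes):.0f} Gb/s")
```

Under these assumptions an x8 link at 5 GT/s carries 32 Gb/s per direction, which exceeds the 20 Gb/s aggregate line rate of two 10 GbE ports, while an x4 link at 5 GT/s (16 Gb/s) does not.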
1.2.4 Network Interfaces
Two independent 10GBASE-T (10GBASE-T_0 and 10GBASE-T_1) interfaces are used to
connect the two X540 ports to external devices. Each 10GBASE-T interface can
operate at any of the following speeds:
• 10 Gb/s, 10GBASE-T mode
• 1 Gb/s, 1000BASE-T mode
• 100 Mb/s, 100BASE-TX mode
Refer to Section 2.1.3 for full pin descriptions. For the timing characteristics of those
interfaces, refer to the relevant external specifications listed in Section 12.4.8.
1.2.5 Serial Flash Interface
The X540 provides an external SPI serial interface to a Flash device, also referred to as
Non-Volatile Memory (NVM). The X540 supports serial Flash devices with up to 16 Mb (2
MB) of memory.
1.2.6 SMBus Interface
SMBus is an optional interface for pass-through and/or configuration traffic between an
external Manageability Controller (MC) and the X540.
The X540's SMBus interface supports a standard SMBus, up to a frequency of 400 kHz.
Refer to Section 2.1.5 for full pin descriptions and Section 12.4.6.3 for timing
characteristics of this interface.
1.2.7 NC-SI Interface
NC-SI is an optional interface for pass-through traffic to and from an MC. The X540
meets the NC-SI version 1.0.0 specification.
Refer to Section 2.1.6 for the pin descriptions, and Section 11.7.1 for NC-SI
programming.
1.2.8 Software-Definable Pins (SDP) Interface
The X540 has four SDP pins per port that can be used for miscellaneous hardware or
software-controllable purposes. These pins can each be individually configured to act as
either input or output pins. Via the SDP pins, the X540 can support IEEE1588 auxiliary
device connections, and other functionality. For more details on the SDPs see Section 3.5
and the ESDP register section.
1.2.9 LED Interface
The X540 implements four output drivers intended for driving external LED circuits per
port. Each of the four LED outputs can be individually configured to select the particular
event, state, or activity, which is indicated on that output. In addition, each LED can be
individually configured for output polarity as well as for blinking versus non-blinking
(steady-state) indications.
The configuration for LED outputs is specified via the LEDCTL register. In addition, the
hardware-default configuration for all LED outputs can be specified via an NVM field (see
Section 6.4.6.3), thereby supporting LED displays configured to a particular OEM
preference.
1.3 Features Summary
Table 1-1 to Table 1-7 list the X540's features in comparison to previous dual-port 10
GbE Ethernet controllers.
Table 1-1 General Features
Feature                                          X540            82599           82598         Reserved
Serial Flash Interface                           Y               Y               Y
4-wire SPI EEPROM Interface                      N               Y               Y
Configurable LED Operation for Software or OEM
Customization of LED Displays                    Y               Y               Y
Protected EEPROM/NVM (1) Space for Private
Configuration                                    Y               Y               Y
Device Disable Capability                        Y               Y               Y
Package Size                                     25 mm x 25 mm   25 mm x 25 mm   31 x 31 mm
Embedded Thermal Diode                           Y               N               Y
Watchdog Timer                                   Y               Y               N
Time Sync (IEEE 1588)                            Y (2)           Y               N
1. X540 Only.
2. Time sync not supported at 100 Mb/s link speed.
Table 1-2 Network Features
FeatureX5408259982598 Reserved
Compliant with the 10 GbE and 1 GbE Ethernet/
802.3ap (KX/KX4) Specification
Compliant with the 10 GbE 802.3ap (KR) specificationNYN
Support of 10GBASE-KR FECNYN
Compliant with the 10 GbE Ethernet/802.3ae (XAUI)
Specification
Compliant with XFI interfaceNYN
Compliant with SFI interfaceNYN
Support for EDCNNN
Compliant with the 1000BASE-BX SpecificationNYY
Auto Negotiation/Full-Duplex at 100 Mb/s Operation
NYY
NYY
Y
(100 Mb/s FDX)Y (100 Mb/s FDX)
NA
10000/1000/100 Mb/s Copper PHYs Integrated On-
YNN
Chip
Support Jumbo Frames of up to 15.5 KBY
1
1
Y
Auto-Negotiation Clause 73 for Supported ModesNYY
MDIO Interface Clause 45Y
YY
(internally)
Flow Control Support: Send/Receive Pause Frames
YYY
and Receive FIFO Thresholds
Statistics for Management and RMONYYY
802.1q VLAN SupportYYY
SerDes Interface for External PHY Connection or
NYY
System Interconnect
Y
SGMII Interface
Support of non Auto-Negotiation PartnerNYY
Double VLANYYN
1. The X540 and 82599 support full-size 15.5 KB jumbo packets while in a basic mode of operation. When DCB mode is enabled,
or security engines enabled, or virtualization is enabled, or OS2BMC is enabled, then the X540 supports 9.5 KB jumbo packets.
Packets to/from MC longer than 2KB are filtered out.
N
Y
(100 Mb/s and 1
GbE only)
N
Table 1-3 Host Interface Features
Feature                                          X540                82599                  82598              Reserved
PCIe* version (speed)                            PCIe v2.1 (5GT/s)   PCIe v2.0 (2.5GT/s     PCIe Gen 1 v2.0
                                                                     & 5GT/s)               (2.5GT/s)
Number of Lanes                                  x1, x2, x4, x8      x1, x2, x4, x8         x1, x2, x4, x8
64-bit Address Support for Systems Using More
Than 4 GB of Physical Memory                     Y                   Y                      Y
Outstanding Requests for Tx Data Buffers         16                  16                     16
Outstanding Requests for Tx Descriptors          8                   8                      8
Outstanding Requests for Rx Descriptors          8                   8                      4
Credits for P-H/P-D/NP-H/NP-D (shared for the
two ports)                                       16/16/4/4           16/16/4/4              8/16/4/4
Max Payload Size Supported                       512 Bytes           512 Bytes              256 Bytes
Max Request Size Supported                       2 KB                2 KB                   256 Bytes
Link Layer Retry Buffer Size (shared for the
two ports)                                       3.4 KB              3.4 KB                 2 KB
Vital Product Data (VPD)                         Y                   Y                      N
End to End CRC (ECRC)                            Y                   Y                      N
TLP Processing Hints (TPH)                       N                   N                      N
Latency Tolerance Reporting (LTR)                N                   N                      N
ID-Based Ordering (IDO)                          N                   N                      N
Access Control Services (ACS)                    Y                   N                      N
ASPM Optional Compliance Capability              Y                   N                      N
PCIe Functions Off Via Pins, While LAN Ports
Are On                                           Y                   N                      N
Table 1-4 LAN Functions Features
FeatureX5408259982598
Programmable Host Memory Receive Buffers YYY
Descriptor Ring Management Hardware for
Transmit and Receive
ACPI Register Set and Power Down Functionality
Supporting D0 & D3 States
Integrated MACsec, 802.1AE Security Engines:
AES-GCM 128-bit; Encryption + Authentication;
One SC x 2 SA Per Port. Replay Protection with
Zero Window
Integrated IPsec Security Engines: AES-GCM 128-bit; AH or ESP encapsulation; IPv4 and IPv6 (no
option or extended headers)
Software-Controlled Global Reset Bit (Resets
Everything Except the Configuration Registers)
Software-Definable Pins (SDP) (per port)488
Four SDP Pins can be Configured as General
Purpose Interrupts
Data Center Bridging (DCB), IEEE Compliance to
Enhanced Transmission Selection (ETS) -
802.1Qaz
Priority-based Flow Control (PFC) - 802.1Qbb
Rate Limit VM Tx Traffic per TC (per TxQ)YYN
IPv6 Support for IP/TCP and IP/UDP Receive
Checksum Offload
Fragmented UDP Checksum Offload for Packet
Reassembly
FCoE Tx / Rx CRC OffloadYYN
FCoE Transmit Segmentation256 KB256 KBN
FCoE Coalescing and Direct Data Placement512 outstanding
Message Signaled Interrupts (MSI)YYY
Message Signaled Interrupts (MSI-X)YYY
Interrupt Throttling Control to Limit Maximum
Interrupt Rate and Improve CPU Use
1
Y (up to 8)
Y (up to 8)
YYY
YYY
Read — Write
requests / port
YYY
Y (up to 8)
Y (up to 8)
512 outstanding
Read — Write
requests / port
Y (up to 8)
Y (up to 8)
N
N
Rx Packet Split Header YYY
Multiple Rx Queues (RSS)Y (multiple
Flow Director Filters: up to 32 KB -2 Flows by Hash
Filters or up to 8 KB -2 Perfect Match Filters
Number of Rx Queues (per port)12812864
Number of Tx Queues (per port)12812832
Low Latency Interrupts
DCA Support
TCP Timer Interrupts
No Snoop
Relax Ordering
Rate Control of Low Latency InterruptsYYN
1. The X540 performance features are focused on 10 GbE performance improvement whereas 1 GbE was optimized for power
saving.
modes)
YYN
Yes to allYes to allYes to all
Y (multiple
modes)
8x8
16x4
Table 1-6 Virtualization Features
FeatureX5408259982598 Reserved
Support for Virtual Machine Device Queues
(VMDq1 and Next Generation VMDq)
L2 Ethernet MAC Address Filters (unicast and
multicast)
L2 VLAN filters6464-
PCI-SIG SR IOVYYN
Multicast and Broadcast Packet ReplicationYYN
Packet MirroringYYN
Packet LoopbackYYN
Traffic ShapingYYN
646416
12812816
Table 1-7 Manageability Features
FeatureX5408259982598 Reserved
Advanced Pass Through-Compatible Management
Packet Transmit/Receive Support
SMBus Interface to an External MCYYY
NC-SI Interface to an External MCYYY
YYY
New Management Protocol Standards Support
(NC-SI)
L2 Address Filters444
VLAN L2 Filters888
Flex L3 Port Filters161616
Flexible TCO Filters444
L3 Address Filters (IPv4)444
L3 Address Filters (IPv6)444
Host-Based Application-to-BMC Network
Communication Patch (OS2BMC)
Flexible MAC AddressYNN
MC Inventory of LOM Device InformationYNN
iSCSI Boot Configuration Parameters via MCYNN
YYY
YNN
MC Monitoring YNN
NC-SI to MCYNN
NC-SI ArbitrationYNN
MCTP over SMBus
NC-SI Package ID Via SDP PinsYNN
1. The X540's MCTP protocol implementation is based on an early draft of the DSP0261 Standard and it includes a Payload Type
field that was removed in the final release of the standard.
1
YNN
1.4 Overview of New Capabilities Beyond 82599
1.4.1 OS-to-BMC Management Traffic Communication (OS2BMC)
OS2BMC is a filtering method that enables server management software to communicate
with an MC (1) via standard networking protocols such as TCP/IP instead of a chipset-specific
interface. Functionality includes:
• A single PCI function (for multi-port devices, each LAN function enables
communication to the MC)
• One or more IP address(es) for the host along with a single (and separate) IP
address for the MC
• One or more host MAC address(es) along with a single (and separate) MAC address
for the MC
• ARP/RARP/ICMP protocols supported in the MC
1.4.2 MCTP Over SMBus
Allow reporting and controlling of all the information exposed in a LOM device via NC-SI,
in NIC devices via MCTP over SMBus.
MCTP is a transport protocol that does not provide a way to control a device. In order to
allow a consistent interface for both LOM and NIC devices, it is planned to implement an
NC-SI over MCTP protocol.
1. Also referred to as Baseboard Management Controller (BMC).
An Intel NIC can connect through MCTP to a MC. The MCTP interface will be used by the
MC to control the NIC and not for pass-through traffic.
Note:The X540's MCTP protocol implementation is based on an early draft of the
DSP0261 Standard and it includes a Payload Type field that was removed in
the final release of the standard.
1.4.3 PCIe v2.1 Features
1.4.3.1 Access Control Services (ACS)
The X540 supports ACS Extended Capability structures on all functions. The X540 reports
no support for the various ACS capabilities in the ACS Extended Capability structure.
Further information can be found in Section 9.4.5.
1.4.3.2 ASPM Optionality Compliance Capability
A new capability bit, the ASPM (Active State Power Management) Optionality Compliance bit,
has been added to the X540. Software is permitted to use the bit to help determine
whether to enable ASPM or whether to run ASPM compliance tests. The new bit indicates that
the X540 can optionally support entry to L0s. Further information can be found in
Section 9.3.11.7.
1.5 Conventions
1.5.1 Terminology and Acronyms
See Section 17.0.
This section defines the organization of registers and memory transfers, as it relates to
information carried over the network:
• Any register defined in Big Endian notation can be transferred as is to/from Tx and Rx
buffers in the host memory. Big Endian notation is also referred to as being in
network order or ordering.
• Any register defined in Little Endian notation must be swapped before it is
transferred to/from Tx and Rx buffers in the host memory. Registers in Little Endian
order are referred to being in host order or ordering.
Tx and Rx buffers are defined as being in network ordering; they are transferred as is
over the network.
Note:Registers not transferred on the wire are defined in Little Endian notation.
Registers transferred on the wire are defined in Big Endian notation, unless
specified differently.
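The byte-ordering rule above can be illustrated with a short sketch. It packs a made-up 32-bit value both ways and shows that the little endian (host order) form must be byte-swapped before it matches the big endian (network order) form carried in Tx and Rx buffers. The value and the helper name are hypothetical; nothing here reflects an actual X540 register.

```python
import struct


def host_to_network_order(le_bytes: bytes) -> bytes:
    """Byte-swap a little endian field so it can be placed in a Tx buffer."""
    return le_bytes[::-1]


value = 0x0A0B0C0D  # hypothetical 32-bit field value

network_order = struct.pack(">I", value)  # big endian: transferred as is
host_order = struct.pack("<I", value)     # little endian: needs a swap first

print("network order:", network_order.hex())
print("host order:   ", host_order.hex())
print("swapped host: ", host_to_network_order(host_order).hex())
```

The swapped host-order bytes are identical to the network-order bytes, which is the condition a field has to meet before it is written into a buffer that goes out on the wire.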
1.6 References
The X540 implements features from the following specifications:
IEEE Specifications
• 10GBASE-T as per the IEEE 802.3an standard.
• 1000BASE-T and 100BASE-TX as per the IEEE standard 802.3-2005 (Ethernet).
Incorporates various IEEE Standards previously published separately. Institute of
Electrical and Electronic Engineers (IEEE).
• IEEE 1149.6 standard for Boundary Scan (MDI pins excluded)
• IEEE standard 802.3ap, draft D3.2.
• IEEE standard 1149.1, 2001 Edition (JTAG). Institute of Electrical and Electronics
Engineers (IEEE).
• IEEE standard 802.1Q for VLAN.
• IEEE 1588 International Standard, Precision clock synchronization protocol for
networked measurement and control systems, 2004-09.
• IEEE P802.1AE/D5.1, Media Access Control (MAC) Security, January 19, 2006.
PCI-SIG Specifications
• PCI Express® Base Specification Revision 2.1, March 4, 2009
• Definition for new PAUSE function, Rev. 1.2, 12/26/2006.
• GCM spec — McGrew, D. and J. Viega, “The Galois/Counter Mode of Operation
(GCM)”, Submission to NIST. http://csrc.nist.gov/CryptoToolkit/modes/
proposedmodes/gcm/gcm-spec.pdf, January 2004.
• FRAMING AND SIGNALING-2 (FC-FS-2) Rev 1.00
• Fibre Channel over Ethernet Draft Presented at the T11 on May 2007
• Per Priority Flow Control (by Cisco Systems) — Definition for new PAUSE function,
Rev 1.2, EDCS-472530
In addition, the following document provides application information:
Tx data flow provides a high-level description of all data/control transformation steps
needed for sending Ethernet packets over the wire.
Table 1-8 Tx Data Flow
Step  Description
1     The host creates a descriptor ring and configures one of the X540’s transmit queues with the address location,
      length, head, and tail pointers of the ring (one of 128 available Tx queues).
2     The host is requested by the TCP/IP stack to transmit a packet; it gets the packet data within one or more data
      buffers.
3     The host initializes the descriptor(s) that point to the data buffer(s) and have additional control parameters
      that describe the needed hardware functionality. The host places that descriptor in the correct location at the
      appropriate Tx ring.
4     The host updates the appropriate Queue Tail Pointer (TDT).
5     The X540’s DMA senses a change of a specific TDT and as a result sends a PCIe request to fetch the
      descriptor(s) from host memory.
6     The descriptor(s) content is received in a PCIe read completion and is written to the appropriate location in the
      descriptor queue.
7     The DMA fetches the next descriptor and processes its content. As a result, the DMA sends PCIe requests to
      fetch the packet data from system memory.
8     The packet data is received from PCIe completions and passes through the transmit DMA that performs
      all programmed data manipulations (various CPU offloading tasks such as checksum offload, TSO offload, etc.) on
      the packet data on the fly.
9     While the packet is passing through the DMA, it is stored into the transmit FIFO.
      After the entire packet is stored in the transmit FIFO, it is then forwarded to the transmit switch module.
10    The transmit switch arbitrates between host and management packets and eventually forwards the packet to
      the MAC.
11    The MAC appends the L2 CRC to the packet and delivers the packet to the integrated PHY.
12    The PHY performs the PCS encoding, scrambling, Low-Density Parity-Check (LDPC) encoding, and the
      other manipulations required to deliver the packet over the copper wires at the selected speed.
13    When all the PCIe completions for a given packet are complete, the DMA updates the appropriate
      descriptor(s).
14    The descriptors are written back to host memory using PCIe posted writes. The head pointer is updated in host
      memory as well.
15    An interrupt is generated to notify the host driver that the specific packet has been read to the X540 and the
      driver can then release the buffer(s).
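The head and tail pointer handshake in Table 1-8 can be sketched with a toy ring model. This is a simplified illustration, not the X540 register interface: the class and method names are hypothetical, descriptors are reduced to raw payloads, and the convention of leaving one slot empty so that head == tail means "empty" is an assumption of the model.

```python
class TxRing:
    """Toy model of a transmit descriptor ring (hypothetical, for illustration)."""

    def __init__(self, size: int):
        self.size = size
        self.slots = [None] * size
        self.head = 0  # next descriptor the device will process
        self.tail = 0  # next free slot the host will fill (the TDT of Table 1-8)

    def free_slots(self) -> int:
        # Assumption: one slot stays empty so head == tail always means "empty".
        return (self.head - self.tail - 1) % self.size

    def host_post(self, payload: bytes) -> bool:
        """Steps 3 and 4: write a descriptor, then advance the tail pointer."""
        if self.free_slots() == 0:
            return False  # ring full; the host must wait for write-backs
        self.slots[self.tail] = payload
        self.tail = (self.tail + 1) % self.size
        return True

    def device_process(self) -> list:
        """Steps 5 to 14: consume descriptors from head up to tail."""
        sent = []
        while self.head != self.tail:
            sent.append(self.slots[self.head])
            self.slots[self.head] = None
            self.head = (self.head + 1) % self.size
        return sent


ring = TxRing(4)
for payload in (b"a", b"b", b"c", b"d"):
    print(payload, "posted" if ring.host_post(payload) else "ring full")
print("device sent:", ring.device_process())
```

In this model the fourth post is refused because a four-entry ring holds at most three outstanding descriptors; once the device has advanced the head pointer, the host can post again.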
1.7.2 Receive (Rx) Data Flow
Rx data flow provides a high-level description of all data/control transformation steps
needed for receiving Ethernet packets.
Table 1-9 Rx Data Flow
Step  Description
1     The host creates a descriptor ring and configures one of the X540’s receive queues with the address location,
      length, head, and tail pointers of the ring (one of 128 available Rx queues).
2     The host initializes descriptor(s) that point to empty data buffer(s). The host places these descriptor(s) in the
      correct location at the appropriate Rx ring.
3     The host updates the appropriate Queue Tail Pointer (RDT).
4     A packet enters the PHY through the copper wires.
5     The PHY performs the required manipulations on the incoming signal such as LDPC decoding, descrambling,
      PCS decoding, etc.
6     The PHY delivers the packet to the Rx MAC.
7     The MAC forwards the packet to the Rx filter.
8     If the packet matches the pre-programmed criteria of the Rx filtering, it is forwarded to an Rx FIFO.
9     The receive DMA fetches the next descriptor from the appropriate host memory ring to be used for the next
      received packet.
10    After the entire packet is placed into an Rx FIFO, the receive DMA posts the packet data to the location
      indicated by the descriptor through the PCIe interface.
      If the packet size is greater than the buffer size, more descriptor(s) are fetched and their buffers are used for
      the received packet.
11    When the packet is placed into host memory, the receive DMA updates all the descriptor(s) that were used by
      the packet data.
12    The receive DMA writes back the descriptor content along with status bits that indicate the packet information
      including what offloads were done on that packet.
13    The X540 initiates an interrupt to the host to indicate that a new received packet is ready in host memory.
14    The host reads the packet data and sends it to the TCP/IP stack for further processing. The host releases the
      associated buffer(s) and descriptor(s) once they are no longer in use.
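Step 10 of the Rx flow says that a packet larger than one buffer is spread over several descriptors. A minimal sketch of that behavior follows, assuming fixed-size receive buffers with one buffer per descriptor. The function name and the 2048-byte buffer size are illustrative choices, not values taken from this document.

```python
def split_into_buffers(packet: bytes, buffer_size: int) -> list:
    """Split a received packet across fixed-size buffers, one per descriptor."""
    if buffer_size <= 0:
        raise ValueError("buffer_size must be positive")
    return [packet[i:i + buffer_size] for i in range(0, len(packet), buffer_size)]


# A 9000-byte jumbo frame landing in hypothetical 2048-byte receive buffers.
chunks = split_into_buffers(bytes(9000), 2048)
print("descriptors used:", len(chunks))
print("bytes per buffer:", [len(c) for c in chunks])
```

With these numbers the packet occupies five descriptors: four full buffers and a final buffer holding the remaining 808 bytes.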
Pin Interface—X540 10GBase-T Controller
2.0 Pin Interface
2.1 Pin Assignments
2.1.1 Signal Type Definition
Signal    Definition                                                           DC Specification
In        Standard 2.5V I/O buffer, functions as input-only signal. 3.3V
Out (O)   Standard 2.5V I/O buffer, functions as output-only signal. 3.3V
T/s       Tri-state is a 2.5V bi-directional, tri-state input/output pin. 3.3V
O/d       Open drain enables multiple devices to share as a wire-OR.           Section 12.4.3
A-in      Analog input signals.                                                Section 12.4.6 and Section 12.4.7
A-out     Analog output signals.                                               Section 12.4.6 and Section 12.4.7
differential output pair running at
5 Gb/s or 2.5 Gb/s. This output
carries both data and an
embedded 5 GHz or 2.5 GHz clock
that is recovered along with data
at the receiving end.
differential output pair running at
5 Gb/s or 2.5 Gb/s. This output
carries both data and an
embedded 5 GHz or 2.5 GHz clock
that is recovered along with data
at the receiving end.
differential output pair running at
5 Gb/s or 2.5 Gb/s. This output
carries both data and an
embedded 5 GHz or 2.5 GHz clock
that is recovered along with data
at the receiving end.
differential output pair running at
5 Gb/s or 2.5 Gb/s. This output
carries both data and an
embedded 5 GHz or 2.5 GHz clock
that is recovered along with data
at the receiving end.
Pin Name    Ball #   Type      Pup/Pdn    Name and Function
PET_4_p     AC15     A-Out                PCIe Serial Data Output. A serial differential output pair
PET_4_n     AD15                          running at 5 Gb/s or 2.5 Gb/s. This output carries both data
                                          and an embedded 5 GHz or 2.5 GHz clock that is recovered
                                          along with data at the receiving end.
PET_5_p     AC16     A-Out                PCIe Serial Data Output. A serial differential output pair
PET_5_n     AD16                          running at 5 Gb/s or 2.5 Gb/s. This output carries both data
                                          and an embedded 5 GHz or 2.5 GHz clock that is recovered
                                          along with data at the receiving end.
PET_6_p     AC21     A-Out                PCIe Serial Data Output. A serial differential output pair
PET_6_n     AD21                          running at 5 Gb/s or 2.5 Gb/s. This output carries both data
                                          and an embedded 5 GHz or 2.5 GHz clock that is recovered
                                          along with data at the receiving end.
PET_7_p     AC22     A-Out                PCIe Serial Data Output. A serial differential output pair
PET_7_n     AD22                          running at 5 Gb/s or 2.5 Gb/s. This output carries both data
                                          and an embedded 5 GHz or 2.5 GHz clock that is recovered
                                          along with data at the receiving end.
PER_0_p     AB2      A-In                 PCIe Serial Data Input. A serial differential input pair
PER_0_n     AB1                           running at 5 Gb/s or 2.5 Gb/s. This input carries both data
                                          and an embedded 5 GHz or 2.5 GHz clock that is recovered
                                          along with data at the receiving end.
PER_1_p     AD6      A-In                 PCIe Serial Data Input. A serial differential input pair
PER_1_n     AC6                           running at 5 Gb/s or 2.5 Gb/s. This input carries both data
                                          and an embedded 5 GHz or 2.5 GHz clock that is recovered
                                          along with data at the receiving end.
PER_2_p     AD7      A-In                 PCIe Serial Data Input. A serial differential input pair
PER_2_n     AC7                           running at 5 Gb/s or 2.5 Gb/s. This input carries both data
                                          and an embedded 5 GHz or 2.5 GHz clock that is recovered
                                          along with data at the receiving end.
PER_3_p     AD12     A-In                 PCIe Serial Data Input. A serial differential input pair
PER_3_n     AC12                          running at 5 Gb/s or 2.5 Gb/s. This input carries both data
                                          and an embedded 5 GHz or 2.5 GHz clock that is recovered
                                          along with data at the receiving end.
PER_4_p     AD13     A-In                 PCIe Serial Data Input. A serial differential input pair
PER_4_n     AC13                          running at 5 Gb/s or 2.5 Gb/s. This input carries both data
                                          and an embedded 5 GHz or 2.5 GHz clock that is recovered
                                          along with data at the receiving end.
PER_5_p     AD18     A-In                 PCIe Serial Data Input. A serial differential input pair
PER_5_n     AC18                          running at 5 Gb/s or 2.5 Gb/s. This input carries both data
                                          and an embedded 5 GHz or 2.5 GHz clock that is recovered
                                          along with data at the receiving end.
PER_6_p     AD19     A-In                 PCIe Serial Data Input. A serial differential input pair
PER_6_n     AC19                          running at 5 Gb/s or 2.5 Gb/s. This input carries both data
                                          and an embedded 5 GHz or 2.5 GHz clock that is recovered
                                          along with data at the receiving end.
PER_7_p     AB23     A-In                 PCIe Serial Data Input. A serial differential input pair
PER_7_n     AB24                          running at 5 Gb/s or 2.5 Gb/s. This input carries both data
                                          and an embedded 5 GHz or 2.5 GHz clock that is recovered
                                          along with data at the receiving end.
PE_CLK_p    Y2       A-In                 PCIe Differential Reference Clock In (a 100 MHz differential
PE_CLK_n    Y1                            clock input). This clock is used as the reference clock for
                                          the PCIe Tx/Rx circuitry and by the PCIe core PLL to generate
                                          clocks for the PCIe core logic.
PE_RBIAS0   V1       A-Inout              Connection point for the band-gap reference resistor. This
                                          should be a precision 1% 3.01 KΩ resistor tied to ground.
PE_RBIAS1   V2       A-Inout              Connection point for the band-gap reference resistor. This
                                          should be a precision 1% 3.01 KΩ resistor tied to ground.
PE_WAKE_N   W1       O/d       Pup (1)    Wake. Pulled low to indicate that a Power Management Event
                                          (PME) is pending and the PCIe link should be restored.
                                          Defined in the PCIe specifications.
PE_RST_N    W2       In                   Power and Clock Good Indication. Indicates that power and the
                                          PCIe reference clock are within specified values. Defined in
                                          the PCIe specifications. Also called PCIe Reset.
1. Pup value should be considered as 10 KΩ.
2.1.3 MDI
See AC/DC specifications in Section 12.4.7.
Pin Name       Ball #   Type      Pup/Pdn    Name and Function
MDI0_p_0       A3       A-Inout              Port 0 pair A+ of the line interface. Connects to the Pair A+
                                             input of the transformer. On reset, set to high impedance.
MDI0_n_0       B3       A-Inout              Port 0 pair A- of the line interface. Connects to the Pair A-
                                             input of the transformer. On reset, set to high impedance.
MDI0_p_1       A5       A-Inout              Port 0 pair B+ of the line interface. Connects to the Pair B+
                                             input of the transformer. On reset, set to high impedance.
MDI0_n_1       B5       A-Inout              Port 0 pair B- of the line interface. Connects to the Pair B-
                                             input of the transformer. On reset, set to high impedance.
MDI0_p_2       A7       A-Inout              Port 0 pair C+ of the line interface. Connects to the Pair C+
                                             input of the transformer. On reset, set to high impedance.
MDI0_n_2       B7       A-Inout              Port 0 pair C- of the line interface. Connects to the Pair C-
                                             input of the transformer. On reset, set to high impedance.
MDI0_p_3       A9       A-Inout              Port 0 pair D+ of the line interface. Connects to the Pair D+
                                             input of the transformer. On reset, set to high impedance.
MDI0_n_3       B9       A-Inout              Port 0 pair D- of the line interface. Connects to the Pair D-
                                             input of the transformer. On reset, set to high impedance.
MDI0_p_4       A11      A-Inout              Port 0 Analog Test+. Connects to the pair E+ input of the
                                             transformer.
MDI0_n_4       B11      A-Inout              Port 0 Analog Test-. Connects to the pair E- input of the
                                             transformer.
MDI1_p_0 (1)   A22      A-Inout              Port 1 pair A+ of the line interface. Connects to the Pair A+
                                             input of the transformer. On reset, set to high impedance.
MDI1_n_0 (1)   B22      A-Inout              Port 1 pair A- of the line interface. Connects to the Pair A-
                                             input of the transformer. On reset, set to high impedance.
MDI1_p_1 (1)   A20      A-Inout              Port 1 pair B+ of the line interface. Connects to the Pair B+
                                             input of the transformer. On reset, set to high impedance.
MDI1_n_1 (1)   B20      A-Inout              Port 1 pair B- of the line interface. Connects to the Pair B-
                                             input of the transformer. On reset, set to high impedance.
MDI1_p_2 (1)   A18      A-Inout              Port 1 pair C+ of the line interface. Connects to the Pair C+
                                             input of the transformer. On reset, set to high impedance.
MDI1_n_2 (1)   B18      A-Inout              Port 1 pair C- of the line interface. Connects to the Pair C-
                                             input of the transformer. On reset, set to high impedance.
MDI1_p_3 (1)   A16      A-Inout              Port 1 pair D+ of the line interface. Connects to the Pair D+
                                             input of the transformer. On reset, set to high impedance.
LAN Power Good. A transition from
low to high initializes the X540 into
operation.
2
Reserved.
Must be connected to pull-down
resistor.
3
Auxiliary Power Available. When
set, indicates that auxiliary power
is available and the X540 should
support D3cold power state if
enabled to do so. This pin is
latched at the rising edge of
LAN_PWR_GOOD.
4
Main Power Good. Indicates that
platform main power is up. Must be
connected externally.
1
This pin is a strapping pin latched
at the rising edge of
LAN_PWR_GOOD or PE_RST_N or
In-Band PCIe Reset. If this pin is
not connected or driven high
during initialization, LAN 1 is
enabled. If this pin is driven low
during initialization, LAN 1 port is
disabled.
LAN0_DIS_N  K23  In  Pup  Pup
SEC_EN      M1   In  Pup  Pup
1
This pin is a strapping option pin
latched at the rising edge of
LAN_PWR_GOOD or PE_RST_N or
In-Band PCIe Reset. If this pin is
not connected or driven high
during initialization, LAN 0 is
enabled. If this pin is driven low
during initialization, LAN 0 port is
disabled.
When LAN 0 port is disabled
manageability is not functional and
it must not be enabled in the NVM
Control Word 1.
1
Enable/Disable for the internal
MACsec/IPSec engines.
Reserved  Pin Name    Ball #  Type     Internal Pup/Pdn  External Pup/Pdn  Name and Function
-         THERM_D1_P  G21     A-Inout  -                 -                 Thermal Diode Reference. Can be used to measure on-die temperature.
-         THERM_D1_N  G22     A-Inout  -                 -                 Thermal Diode Reference. Can be used to measure on-die temperature.
-         PHY0_RVSL   T3      T/s      Pup               Note 5            Pin change order of MDI lanes port 0: 0b = Lane order A, B, C, D. 1b = Lane order D, C, B, A.
-         PHY1_RVSL   V23     T/s      Pup               Note 5            Pin change order of MDI lanes port 1: 0b = Lane order A, B, C, D. 1b = Lane order D, C, B, A.
1. Pup value should be considered as 10 KΩ.
2. Pdn value should be considered as 10 KΩ.
3. Connect AUX_PWR signal to Pup if AUX power is available. Connect Pdn if AUX power is not available. Pup/Pdn value should be
considered as 10 KΩ.
4. Connect MAIN_PWR_OK signal to Main Power through Pup resistor. Pup value should be considered as 10 KΩ.
5. For pin change order A, B, C, and D, connect PHY_RVSL signal to Pdn. For pin change order D, C, B, and A, connect PHY_RVSL
signal to Pup. Pup value should be considered as 10 KΩ. Pdn value should be considered as 3.3 KΩ.
2.1.11 JTAG
See AC specifications in Section 12.4.5.2.
Reserved  Pin Name  Ball #  Type     Internal Pup/Pdn  External Pup/Pdn  Name and Function
-         TCK       Y22     In-Only  Pup               Pdn (note 1)      JTAG Clock Input.
-         TDI       W22     In-Only  Pup               Pup (note 2)      JTAG Data Input.
-         TDO       V22     Out      -                 Pup (note 3)      JTAG Data Output.
-         TMS       W21     In-Only  Pup               Pup (note 2)      JTAG TMS Input.
-         TRST_N    W23     In-Only  Pup               Pdn (note 1)      JTAG Reset Input. Active low reset for the JTAG port.
1. Pdn value should be considered as 470 Ω.
2. Pup value should be considered as 10 KΩ.
3. Pup value should be considered as 3.3 KΩ.
Note: If the JTAG is disconnected, use the external pull-up or pull-down values
listed.
PCIe is an I/O architecture that enables cost-competitive solutions and provides
industry-leading price/performance and feature richness. It is an industry-driven specification.
PCIe defines a basic set of requirements that addresses the majority of the targeted
application classes. Higher-end applications' requirements (Enterprise class servers and
high-end communication platforms) are addressed by a set of advanced extensions that
complement the baseline requirements.
To guarantee headroom for future applications, PCIe provides a software-managed
mechanism for introducing new, enhanced capabilities.
Figure 3-1 shows the PCIe architecture.
Figure 3-1 PCIe Stack Structure
X540 10GBase-T Controller—Interconnects
The PCIe physical layer consists of a differential transmit pair and a differential receive
pair. Full-duplex data on these two point-to-point connections is self-clocked such that no
dedicated clock signals are required. The bandwidth of this interface increases in direct
proportion with frequency increases.
The packet is the fundamental unit of information exchange and the protocol includes a
message space to replace a variety of side-band signals found on previous interconnects.
This movement of hard-wired signals from the physical layer to messages within the
transaction layer enables easy and linear physical layer width expansion for increased
bandwidth.
The common base protocol uses split transactions along with several mechanisms to
eliminate wait states and to optimize the re-ordering of transactions to further improve
system performance.
3.1.1.1 Architecture, Transaction and Link Layer Properties
• Split transaction, packet-based protocol
• Common flat address space for load/store access (for example, PCI addressing
model)
— 32-bit memory address space to enable a compact packet header (must be used
to access addresses below 4 GB)
— 64-bit memory address space using an extended packet header
• Transaction layer mechanisms:
— PCI-X style relaxed ordering
— Optimizations for no-snoop transactions
• Credit-based flow control
• Packet sizes/formats:
— Maximum packet size: 512 bytes
— Maximum read request size: 2 KB
• Reset/initialization:
— Frequency/width/profile negotiation performed by hardware
• Data integrity support
— Using CRC-32 for Transaction layer Packets (TLP)
• Link Layer Retry (LLR) for recovery following error detection
— Using CRC-16 for Link Layer (LL) messages
• No retry following error detection
— 8b/10b encoding with running disparity
• Software configuration mechanism:
— Uses PCI configuration and bus enumeration model
— PCIe-specific configuration registers mapped via PCI extended capability
mechanism
• Baseline messaging:
— In-band messaging of formerly side-band legacy signals (interrupts, etc.)
— System-level power management supported via messages
• Power management:
— Full support for PCIm
— Wake capability from D3cold state
— Compliant with ACPI, PCIm software model
— Active state power management
• Support for PCIe Gen 1 v2.0 (2.5GT/s) or PCIe Gen2 v1.0 (5GT/s)
— Support for completion time out control
3.1.1.2 Physical Interface Properties
• Point-to-point interconnect
— Full-duplex; no arbitration
• Signaling technology:
— Low Voltage Differential (LVD)
— Embedded clock signaling using 8b/10b encoding scheme
• Serial frequency of operation: PCIe Gen 1 v2.0 (2.5GT/s) or PCIe Gen2 v1.0 (5GT/s)
• Interface width of 1, 2, 4, or 8 PCIe lanes
• DFT and DFM support for high-volume manufacturing
3.1.1.3 Advanced Extensions
PCIe defines a set of optional features to enhance platform capabilities for specific usage
modes. The X540 supports the following optional features:
• Advanced Error Reporting (AER) — Messaging support to communicate multiple
types/severity of errors
• Device Serial Number — Allows exposure of a unique serial number for each device
• Alternative RID Interpretation (ARI) — allows support of more than eight functions
per device
• Single Root I/O Virtualization (SR-IOV) — allows exposure of virtual functions
controlling a subset of the resources to Virtual Machines (VMs)
3.1.2 General Functionality
3.1.2.1 Native/Legacy
All the X540 PCI functions are native PCIe functions.
3.1.2.2 Locked Transactions
The X540 does not support locked requests as a target or a master.
3.1.3 Host Interface
PCIe device numbers identify logical devices within the physical device (the X540 is a
physical device). The X540 implements a single logical device with two separate PCI
functions: LAN 0 and LAN 1. The device number is captured from each type 0 configuration write transaction.
Each of the PCIe functions interfaces with the PCIe unit through one or more clients. A
client ID identifies the client and is included in the Tag field of the PCIe packet header.
Completions always carry the tag value included in the request to enable routing of the
completion to the appropriate client.
3.1.3.1 TAG ID Allocation
Tag IDs are allocated differently for read and write as detailed in the following sections.
3.1.3.1.1 TAG ID Allocation for Read Transactions
Table 3-1 lists the Tag ID allocation for read accesses. The Tag ID is used by hardware in
order to be able to forward the read data to the required internal client.
Table 3-1 TAG ID Allocation Table for Read Transactions
• System type: Legacy DCA versus DCA 1.0 (DCA_CTRL.DCA_MODE)
• CPU ID (DCA_RXCTRL.CPUID or DCA_TXCTRL.CPUID)
Case 1 — DCA Disabled in the System:
The following table lists the write requests tags:
Tag ID  Description
2       Write-back descriptor Tx / write-back head.
4       Write-back descriptor Rx.
6       Write data.
Case 2 — DCA Enabled in the System, but Disabled for the Request:
• Legacy DCA platforms — If DCA is disabled for the request, the tags allocation is
identical to the case where DCA is disabled in the system (refer to the previous
table).
• DCA 1.0 platforms — All write requests have the tag of 0x00.
Case 3 — DCA Enabled in the System, DCA Enabled for the Request:
• Legacy DCA Platforms: the request tag is constructed as follows:
— Bit[0] — DCA Enable = 1b
— Bits[3:1] — The CPU ID field taken from the CPUID[2:0] bits of the DCA_RXCTRL
or DCA_TXCTRL registers
— Bits[7:4] — Reserved
• DCA 1.0 Platforms: the request tag (all eight bits) is taken from the CPU ID field of
the DCA_RXCTRL or DCA_TXCTRL registers
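The three cases above can be summarized in a short sketch. This is only an illustration of the tag selection described in the text, not driver or hardware code; the function name, the request-kind strings, and the assumption that the reserved bits [7:4] are zero are all hypothetical.

```python
# Tags used when DCA is disabled in the system (from the table above).
# The dictionary keys are illustrative names, not register fields.
NO_DCA_TAGS = {"tx_desc_wb": 2, "rx_desc_wb": 4, "data": 6}


def write_request_tag(kind, system_dca, dca_1_0, request_dca, cpu_id=0):
    """Return the 8-bit tag for a write request.

    kind        -- one of the NO_DCA_TAGS keys
    system_dca  -- True if DCA is enabled in the system
    dca_1_0     -- True for DCA 1.0 platforms, False for legacy DCA
    request_dca -- True if DCA is enabled for this request
    cpu_id      -- CPU ID field from DCA_RXCTRL or DCA_TXCTRL
    """
    if not system_dca:
        return NO_DCA_TAGS[kind]                      # Case 1
    if not request_dca:
        return 0x00 if dca_1_0 else NO_DCA_TAGS[kind]  # Case 2
    if dca_1_0:
        return cpu_id & 0xFF                          # Case 3, DCA 1.0
    # Case 3, legacy DCA: bit 0 = DCA enable, bits 3:1 = CPUID[2:0].
    # Bits 7:4 are reserved; this sketch assumes they are written as 0.
    return ((cpu_id & 0x7) << 1) | 0x1
```

For example, a legacy DCA platform with CPU ID 5 would produce the tag 0b1011 under these assumptions.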
3.1.3.2 Completion Timeout Mechanism
In any split transaction protocol, there is a risk associated with the failure of a requester
to receive an expected completion. To enable requesters to attempt recovery from this
situation in a standard manner, the completion timeout mechanism is defined.
The completion timeout mechanism is activated for each request that requires one or
more completions when the request is transmitted. The X540 provides a programmable
range for the completion timeout, as well as the ability to disable the completion timeout
altogether. The completion timeout is programmed through an extension of the PCIe
capability structure.
The X540’s reaction to a completion timeout is listed in Table 3-9.
The X540 controls the following aspects of completion timeout:
• Disabling or enabling completion timeout
• Disabling or enabling resending a request on completion timeout
• A programmable range of timeout values
The programming of completion timeout behavior is listed in Table 3-2. Note that
system software can configure the completion timeout independently for each LAN
function.
Table 3-2 Completion Timeout Programming
Capability                   Programming Capability
Completion Timeout Enabling  Controlled through PCI configuration. Visible through a read-only CSR bit.
Resend Request Enable        Loaded from the NVM into a R/W CSR bit.
Completion Timeout Period    Controlled through PCI configuration.
Completion Timeout Enable — Programmed through the PCI configuration space. The
default is: Completion Timeout Enabled.
Resend Request Enable — The Completion Timeout Resend NVM bit (loaded to the
Completion_Timeout_Resend bit in the PCIe Control Register (GCR)) enables resending
the request (applies only when completion timeout is enabled). The default is to resend a
request that timed out.
3.1.3.2.1 Completion Timeout Period
Programmed through the PCI configuration. Visible through the
Completion_Timeout_Value bits in the GCR. The X540 supports all four ranges defined by
PCIe Gen 1 v2.0 (2.5GT/s) or PCIe Gen2 v1.0 (5GT/s):
• 50 μs to 10 ms
• 10 ms to 250 ms
• 250 ms to 4 s
• 4 s to 64 s
System software programs a range (one of nine possible ranges that sub-divide the four
previous ranges) into the PCI configuration register. The supported sub-ranges are:
• 50 μs to 50 ms (default).
• 50 μs to 100 μs
• 1 ms to 10 ms
• 16 ms to 55 ms
• 65 ms to 210 ms
• 260 ms to 900 ms
• 1 s to 3.5 s
• 4 s to 13 s
• 17 s to 64 s
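As an illustration, the nine sub-ranges can be held in a lookup table keyed by the 4-bit Completion Timeout Value field. The encodings below are taken from the PCIe Device Control 2 register definition, not from this section, and the helper name is hypothetical; treat this as a sketch rather than the X540 programming sequence.

```python
# Completion Timeout Value encoding -> (min, max) in microseconds.
# Encodings follow the PCIe Device Control 2 register definition (assumption:
# they are not listed in this section of the datasheet).
COMPLETION_TIMEOUT_RANGES_US = {
    0b0000: (50, 50_000),            # default
    0b0001: (50, 100),
    0b0010: (1_000, 10_000),
    0b0101: (16_000, 55_000),
    0b0110: (65_000, 210_000),
    0b1001: (260_000, 900_000),
    0b1010: (1_000_000, 3_500_000),
    0b1101: (4_000_000, 13_000_000),
    0b1110: (17_000_000, 64_000_000),
}


def first_range_covering(timeout_us):
    """Return the non-default encoding with the smallest upper bound
    that is still greater than or equal to timeout_us."""
    candidates = sorted(
        (upper, enc)
        for enc, (_lower, upper) in COMPLETION_TIMEOUT_RANGES_US.items()
        if enc != 0b0000
    )
    for upper, enc in candidates:
        if timeout_us <= upper:
            return enc
    raise ValueError("timeout exceeds the largest supported range (64 s)")
```

Under these assumptions, a desired timeout of 5 ms selects encoding 0010b (1 ms to 10 ms).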
A memory read request for which there are multiple completions is considered
completed only when all completions have been received by the requester. If some, but
not all, requested data is returned before the completion timeout timer expires, the
requester is permitted to keep or to discard the data that was returned prior to timer
expiration.
3.1.4 Transaction Layer
The upper layer of the PCIe architecture is the transaction layer. The transaction layer
connects to the X540's core using an implementation-specific protocol. Through this
core-to-transaction-layer protocol, the application-specific parts of the X540 interact with
the PCIe subsystem and transmit and receive requests to or from the remote PCIe
agent, respectively.
3.1.4.1 Transaction Types Accepted by the X540
Table 3-3 Transaction Types Accepted by the Transaction Layer
Transaction Type             FC Type      Tx Layer Reaction  Hardware Should Keep Data From Original Packet  For Client
Configuration Read Request   NPH          CPLH + CPLD        Requester ID, TAG, attribute                    Configuration space
Configuration Write Request  NPH + NPD    CPLH               Requester ID, TAG, attribute                    Configuration space
Memory Read Request          NPH          CPLH + CPLD        Requester ID, TAG, attribute                    CSR space
Memory Write Request         PH + PD      -                  -                                               CSR space
IO Read Request              NPH          CPLH + CPLD        Requester ID, TAG, attribute                    CSR space
IO Write Request             NPH + NPD    CPLH               Requester ID, TAG, attribute                    CSR space
Read Completions             CPLH + CPLD  -                  -                                               DMA
Message                      PH           -                  -                                               Message unit (PM)
Flow Control Types Legend:
CPLD — Completion Data Payload
CPLH — Completion Headers
NPD — Non-Posted Request Data Payload
NPH — Non-Posted Request Headers
PD — Posted Request Data Payload
PH — Posted Request Headers
3.1.4.2 Transaction Types Initiated by the X540
Table 3-4 Transaction Types Initiated by the Transaction Layer
Note: MAX_PAYLOAD_SIZE is loaded from the NVM (up to 512 bytes). Effective
MAX_PAYLOAD_SIZE is defined for each PCI function according to the
configuration space register for that function.
3.1.4.2.1 Data Alignment
Note: Requests must never specify an address/length combination that causes a
memory space access to cross a 4 KB boundary.
The X540 breaks requests into 4 KB-aligned requests (if needed). This does not pose any
requirement on software. However, if software allocates a buffer across a 4 KB boundary,
hardware issues multiple requests for the buffer. Software should consider aligning
buffers to a 4 KB boundary in cases where it improves performance.
The general rules for packet alignment are as follows. Note that these apply to all the
X540 requests (read/write, snoop and no snoop):
• The length of a single request does not exceed the PCIe limit of MAX_PAYLOAD_SIZE
for write and MAX_READ_REQ for read.
• The length of a single request does not exceed the X540 internal limitations.
• A single request does not span across different memory pages as noted by the 4 KB
boundary alignment previously mentioned.
If a request can be sent as a single PCIe packet and still meet the general rules for
packet alignment, then it is not broken at the cache line boundary but rather sent as a
single packet (the intent is that the chipset can break the request along cache line
boundaries, but the X540 should still benefit from better PCIe use). However, if any of
the three general rules require that the request is broken into two or more packets, then
the request is broken at the cache line boundary.
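The first and third alignment rules can be illustrated with a minimal sketch that splits a buffer access into requests that never cross a 4 KB boundary and never exceed a size limit. The function name and parameters are hypothetical, the X540 internal limitations are not modeled, and the cache-line break point described above is not modeled either (the cache line size is not given in this section).

```python
PAGE_SIZE = 4096  # 4 KB boundary that a single request must not cross


def split_request(addr, length, max_request):
    """Split (addr, length) into a list of (addr, length) requests.

    max_request stands for MAX_PAYLOAD_SIZE on writes or MAX_READ_REQ on
    reads, in bytes. Each returned request stays inside one 4 KB page.
    """
    chunks = []
    while length > 0:
        bytes_to_page_end = PAGE_SIZE - (addr % PAGE_SIZE)
        n = min(length, max_request, bytes_to_page_end)
        chunks.append((addr, n))
        addr += n
        length -= n
    return chunks
```

For example, a 768-byte access starting at 0x0F00 with a 512-byte limit becomes a 256-byte request that ends at the page boundary, followed by a 512-byte request starting at 0x1000.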
3.1.4.2.2 Multiple Tx Data Read Requests (MULR)
The X540 supports up to 16 pipelined requests for transmit data. In general, requests
can belong to the same packet or to consecutive packets. However, the following
restrictions apply:
• All requests for a packet must be issued before a request is issued for a consecutive
packet.
• Read requests can be issued from any of the supported queues, as long as the
previous restriction is met. Pipelined requests can belong to the same queue or to
separate queues. However, as previously noted, all requests for a certain packet are
issued (from the same queue) before a request is issued for a different packet
(potentially from a different queue).
• The PCIe specification does not ensure that completions for separate requests return
in-order. Read completions for concurrent requests are not required to return in the
order issued. The X540 handles completions that arrive in any order. Once all
completions arrive for a given request, it can issue the next pending read data
request.
• The X540 incorporates a reorder buffer to support re-ordering of completions for all
issued requests. Each request/completion can be up to 512 bytes long. The
maximum size of a read request is defined as min{2 KB, MAX_READ_REQ}.
• In addition to the transmit data requests, the X540 can issue eight pipelined read
requests for Tx descriptors and eight pipelined read requests for Rx descriptors. The
requests for Tx data, Tx descriptors, and Rx descriptors are independently issued.
3.1.4.3 Messages
3.1.4.3.1 Received Messages
• Message packets are special packets that carry a message code. The upstream
device transmits special messages to the X540 by using this mechanism. The
transaction layer decodes the message code and responds to the message
accordingly.
Table 3-5 Supported Message in the X540 (as a Receiver)
Message Code [7:0]  Routing r2r1r0    Message                                        X540 Later Response
0x14                100b              PM_Active_State_NAK                            Internal signal set.
0x19                011b              PME_Turn_Off                                   Internal signal set.
0x50                100b              Slot power limit support (has one Dword data)  Silently drop.
0x7E                010b, 011b, 100b  Vendor_defined type 0 no data                  Unsupported request.
0x7E                010b, 011b, 100b  Vendor_defined type 0 data                     Unsupported request.
0x7F                010b, 011b, 100b  Vendor_defined type 1 no data                  Silently drop.
0x7F                010b, 011b, 100b  Vendor_defined type 1 data                     Silently drop.
0x00                011b              Unlock                                         Silently drop.
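The received-message handling in Table 3-5 amounts to a lookup on the message code and routing field. A minimal sketch follows; the dictionary, the response strings, and the function name are illustrative only, and combinations not listed in the table return None because this section does not state how they are handled.

```python
VENDOR_DEFINED_ROUTINGS = (0b010, 0b011, 0b100)

# (message code, routing r2r1r0) -> response listed in Table 3-5
RECEIVED_MESSAGE_RESPONSE = {
    (0x14, 0b100): "internal signal set",   # PM_Active_State_NAK
    (0x19, 0b011): "internal signal set",   # PME_Turn_Off
    (0x50, 0b100): "silently drop",         # Slot power limit
    (0x00, 0b011): "silently drop",         # Unlock
}
for routing in VENDOR_DEFINED_ROUTINGS:
    # Vendor_defined type 0 and type 1, with or without data
    RECEIVED_MESSAGE_RESPONSE[(0x7E, routing)] = "unsupported request"
    RECEIVED_MESSAGE_RESPONSE[(0x7F, routing)] = "silently drop"


def response_for(code, routing):
    """Return the listed response, or None if the pair is not in the table."""
    return RECEIVED_MESSAGE_RESPONSE.get((code, routing))
```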
3.1.4.3.2 Transmitted Messages
The transaction layer is also responsible for transmitting specific messages to report
internal/external events (such as interrupts and PMEs).
Table 3-6 Supported Message in X540 (as a Transmitter)
Message Code [7:0]  Routing r2r1r0  Message
0x20                100b            Assert INT A
0x21                100b            Assert INT B
0x22                100b            Assert INT C
0x23                100b            Assert INT D
0x24                100b            De-Assert INT A
0x25                100b            De-Assert INT B
0x26                100b            De-Assert INT C
0x27                100b            De-Assert INT D
0x30                000b            ERR_COR
0x31                000b            ERR_NONFATAL
0x33                000b            ERR_FATAL
0x18                000b            PM_PME
0x1B                101b            PME_TO_Ack
3.1.4.4 Ordering Rules
The X540 meets the PCIe ordering rules by following the PCI simple device model:
1. Deadlock Avoidance – The X540 meets the PCIe ordering rules that prevent
deadlocks:
a. Posted writes overtake stalled read requests. This applies to both target and
master directions. For example, if master read requests are stalled due to lack of
credits, master posted writes are allowed to proceed. On the target side, it is
acceptable to timeout on stalled read requests in order to allow later posted writes
to proceed.
b. Target posted writes overtake stalled target configuration writes.
c. Completions overtake stalled read requests. This applies to both target and master
directions. For example, if master read requests are stalled due to lack of credits,
completions generated by the X540 are allowed to proceed.
2. Descriptor/Data Ordering – The X540 ensures that an Rx descriptor is written back on
PCIe only after the data that the descriptor relates to is written to the PCIe link.
3. MSI and MSI-X Ordering Rules – System software might change the MSI or MSI-X
tables during run-time. Software expects that interrupt messages issued after the
table has been updated are using the updated contents of the tables.
a. Since software doesn’t know when the tables are actually updated in the X540, a
common scheme is to issue a read request to the MSI or MSI-X table (a PCI
configuration read for MSI and a memory read for MSI-X). Software expects that
any message issued following the completion of the read request, is using the
updated contents of the tables.
b. Once an MSI or MSI-X message is issued using the updated contents of the
interrupt tables, any consecutive MSI or MSI-X message does not use the contents
of the tables prior to the change.
4. The X540 meets the rules relating to independence between target and master
accesses:
a. The acceptance of a target posted request does not depend upon the transmission
of any TLP.
b. The acceptance of a target non-posted request does not depend upon the
transmission of a non-posted request.
c. Accepting a completion does not depend upon the transmission of any TLP.
3.1.4.4.1 Out of Order Completion Handling
In a split transaction protocol, when using multiple read requests in a multi-processor
environment, there is a risk that completions for separate requests arrive from the host
memory out of order and interleaved. In this case, the X540 sorts the completions and
transfers them to the network in the correct order.
Note: Completions for separate read requests are not guaranteed to return in
order. Completions for the same read request are guaranteed to return in
address order.
3.1.4.5 Transaction Definition and Attributes
3.1.4.5.1 Max Payload Size
The X540's policy for determining Max Payload Size (MPS) is as follows:
1. Master requests initiated by the X540 (including completions) limit MPS to the value
defined for the function issuing the request.
2. Target write accesses to the X540 are accepted only with a size of one Dword or two
Dwords. Write accesses in the range from three Dwords to MPS are flagged as UR
(Unsupported Request). Write accesses above MPS are flagged as malformed.
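The target-write rule can be expressed as a small classifier. This is a sketch of the policy as stated in the text only; the function name and the result strings are hypothetical, and MPS is taken in bytes and converted to Dwords.

```python
def classify_target_write(length_dw, mps_bytes):
    """Classify a target write by its payload length in Dwords."""
    if length_dw < 1:
        raise ValueError("length must be at least one Dword")
    mps_dw = mps_bytes // 4  # one Dword is 4 bytes
    if length_dw <= 2:
        return "accepted"
    if length_dw <= mps_dw:
        return "unsupported request"
    return "malformed"
```

With an MPS of 256 bytes (64 Dwords), a 64-Dword write is flagged as an unsupported request and a 65-Dword write as malformed.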
3.1.4.5.2 Traffic Class (TC) and Virtual Channels (VCs)
The X540 only supports TC = 0 and VC = 0 (default).
3.1.4.5.3 Relaxed Ordering
The X540 takes advantage of the relaxed ordering rules in PCIe. By setting the relaxed
ordering bit in the packet header, the X540 enables the system to optimize performance
in the following cases:
1. Relaxed ordering for descriptor and data reads — When the X540 masters a read
transaction, its split completion has no ordering relationship with the writes from the
CPUs (same direction). It should be allowed to bypass the writes from the CPUs.
2. Relaxed ordering for receiving data writes — When the X540 masters receive data
writes, it also enables them to bypass each other in the path to system memory
because software does not process this data until their associated descriptor writes
are done.
3. The X540 cannot relax ordering for descriptor writes or an MSI write.
Relaxed ordering can be used in conjunction with the no-snoop attribute to enable the
memory controller to advance no-snoop writes ahead of earlier snooped writes.
Relaxed ordering is enabled in the X540 by clearing the CTRL_EXT.RO_DIS bit. The actual
setting of relaxed ordering is done for LAN traffic by the host through the DCA registers.
3.1.4.5.4 No Snoop
Note: The X540 enables the No Snoop feature by default after power on. The No
Snoop feature must be disabled during Rx flow software initialization if
there is no intention to use it. To disable No Snoop, the CTRL_EXT.NS_DIS
bit should be set to 1b.
The X540 sets the Snoop Not Required attribute for master data writes. System logic can
provide a separate path into system memory for non-coherent traffic. The non-coherent
path to system memory provides a higher, more uniform, bandwidth for write requests.
Note: The Snoop Not Required attribute does not alter transaction ordering.
Therefore, to achieve the maximum benefit from Snoop Not Required
transactions, it is advisable to set the relaxed ordering attribute as well
(assuming that system logic supports both attributes). In fact, some
chipsets require that relaxed ordering is set for no-snoop to take effect.
No snoop is enabled in the X540 by clearing the CTRL_EXT.NS_DIS bit. The actual setting
of no snoop is done for LAN traffic by the host through the DCA registers.
3.1.4.5.5 No Snoop and Relaxed Ordering for LAN Traffic
Software can configure no-snoop and relaxed ordering attributes for each queue and each type
of transaction by setting the respective bits in the DCA_RXCTRL and DCA_TXCTRL
registers.
Table 3-7 lists the default behavior for the No-Snoop and Relaxed Ordering bits for LAN
traffic when I/OAT 2 is enabled.
Table 3-7 LAN Traffic Attributes
Transaction               No Snoop Default  Relaxed Ordering Default  Comments
Rx Descriptor Read        N                 Y
Rx Descriptor Write-Back  N                 N                         Read-only. Must never be used for this traffic.
Rx Data Write             Y                 Y                         See note and the section that follows.
Tx Descriptor Read        N                 Y
Tx Descriptor Write-Back  N                 Y
Tx Data Read              N                 Y
Note: RX payload no-snoop is also conditioned by the NSE bit in the receive
descriptor (RDESC.NSE).
No-Snoop Option for Payload
Under certain conditions, which occur when I/OAT 2 is enabled, software knows that it is
safe to transfer a new packet into a certain buffer without snooping on the FSB. This
scenario occurs when software is posting a receive buffer to hardware that the CPU has
not accessed since the last time it was owned by hardware. This might happen if the data
was transferred to an application buffer by the data movement engine. In this case,
software should be able to set a bit in the receive descriptor indicating that the X540
should perform a no-snoop transfer when it eventually writes a packet to this buffer.
When a no-snoop transaction is activated, the TLP header has a no-snoop attribute in the
Transaction Descriptor field. This is triggered by the NSE bit in the receive descriptor.
3.1.4.6 Flow Control
3.1.4.6.1 Flow Control Rules
The X540 only implements the default Virtual Channel (VC0). A single set of credits is
maintained for VC0.
Table 3-8 Flow Control Credits Allocation
Credit Type                 Operations                            Number of Credits (Dual Port)
Posted Request Header (PH)  Target write, Message (one unit)      16 credit units to support tail write at wire speed.
Posted Request Data (PD)    Target Write (Length/16 bytes = one)  max{MAX_PAYLOAD_SIZE/16, 32}.
Four credit units (to enable concurrent target
accesses to both LAN ports).
Rules for FC updates:
• The X540 maintains two credits for NPD at any given time. It increments the credit
by one after the credit is consumed, and sends an UpdateFC packet as soon as
possible. UpdateFC packets are scheduled immediately after a resource is available.
• The X540 provides 16 credits for PH (such as for concurrent target writes) and four
credits for NPH (such as for four concurrent target reads). UpdateFC packets are
scheduled immediately after a resource is available.
• The X540 follows the PCIe recommendations for frequency of UpdateFC FCPs.
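The posted-data credit arithmetic from Table 3-8 can be sketched as follows. One PD credit covers 16 bytes, and the allocation is given as max{MAX_PAYLOAD_SIZE/16, 32}. The function names are hypothetical, and rounding a partial 16-byte unit up to a whole credit is an assumption of this sketch.

```python
PD_CREDIT_BYTES = 16  # one posted-data credit unit covers 16 bytes


def posted_data_credits(max_payload_size):
    """PD credits advertised: max{MAX_PAYLOAD_SIZE/16, 32}."""
    return max(max_payload_size // PD_CREDIT_BYTES, 32)


def pd_credits_for_write(length_bytes):
    """PD credits consumed by one target write (partial units rounded up)."""
    return -(-length_bytes // PD_CREDIT_BYTES)
```

Because MAX_PAYLOAD_SIZE is at most 512 bytes, the expression evaluates to 32 credits for every supported payload size.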
3.1.4.6.2 Upstream Flow Control Tracking
The X540 issues a master transaction only when the required flow control credits are
available. Credits are tracked for posted, non-posted, and completions (the latter to
operate against a switch).
3.1.4.6.3 Flow Control Update Frequency
In all cases, UpdateFC packets are scheduled immediately after a resource is available.
When the link is in the L0 or L0s link state, Update FCPs for each enabled type of non-
infinite flow control credit must be scheduled for transmission at least once every 30 μs
(-0% /+50%), except when the Extended Sync bit of the Control Link register is set, in
which case the limit is 120 μs (-0% /+50%).
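The tolerance notation can be read as a window around the nominal limit. The following sketch computes that window; the function name is hypothetical and the only inputs are the figures quoted above.

```python
def timer_window_us(nominal_us, minus_pct=0, plus_pct=50):
    """Return (shortest, longest) allowed interval in microseconds."""
    shortest = nominal_us * (100 - minus_pct) / 100
    longest = nominal_us * (100 + plus_pct) / 100
    return (shortest, longest)
```

Under this reading, the 30 μs limit allows an interval of 30 μs to 45 μs, and the 120 μs Extended Sync limit allows 120 μs to 180 μs.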
3.1.4.6.4 Flow Control Timeout Mechanism
The X540 implements the optional flow control update timeout mechanism.
The mechanism is active when the link is in L0 or L0s link state. It uses a timer with a
limit of 200 μs (-0% /+50%), where the timer is reset by the receipt of any Init or
Update FCP. Alternately, the timer can be reset by the receipt of any DLLP.
Upon timer expiration, the mechanism instructs the PHY to retrain the link (via the LTSSM
recovery state).
3.1.5 Link Layer
3.1.5.1 ACK/NAK Scheme
The X540 supports two alternative schemes for ACK/NAK rate:
• ACK/NAK is scheduled for transmission following any TLP.
• ACK/NAK is scheduled for transmission according to timeouts specified in the PCIe
specification.
3.1.5.2 Supported DLLPs
The following DLLPs are supported by the X540 as a receiver:
• ACK
• NAK
• PM_Request_Ack
• InitFC1-P
• InitFC1-NP
• InitFC1-Cpl
• InitFC2-P
• InitFC2-NP
• InitFC2-Cpl
• UpdateFC-P
• UpdateFC-NP
• UpdateFC-Cpl
The following DLLPs are supported by the X540 as a transmitter:
• ACK
• NAK
• PM_Enter_L1
• PM_Enter_L23
• InitFC1-P
• InitFC1-NP
• InitFC1-Cpl
• InitFC2-P
• InitFC2-NP
• InitFC2-Cpl
• UpdateFC-P
• UpdateFC-NP
Note: UpdateFC-Cpl is not sent because of the infinite FC-Cpl allocation.
3.1.5.3 Transmit End Data Bit (EDB) Nullifying — End
Bad
If retrain is necessary, there is a need to guarantee that no abrupt termination of the Tx
packet happens. For this reason, early termination of the transmitted packet is possible.
This is done by appending the EDB to the packet.
3.1.6 Physical Layer
3.1.6.1 Link Speed
The X540 supports PCIe Gen 1 v2.0 (2.5GT/s) or PCIe Gen2 v1.0 (5GT/s). The following
configuration controls link speed:
• PCIe Supported Link Speeds bit — Indicates the link speeds supported by the X540.
Loaded from the PCIe Analog Configuration Module in the NVM, and could be set as
follows.
NVM Word Offset (Starting at Odd Word)  Allow Gen 1 and Gen 2 (Default)  Force Gen 1 Setting  Description
2*N+1                                   0x094                            0x094                MORIA6 register OFFSET (lower word).
2*N+2                                   0x0000                           0x0100               Disabling gen2 is controlled by setting bit[8] in this register. When the bit is set, the X540 does not advertise gen 2 link-speed support.
• PCIe Current Link Speed bit — Indicates the negotiated link speed.
• PCIe Target Link Speed bit — used to set the target compliance mode speed when
software is using the Enter Compliance bit to force a link into compliance mode. The
default value is loaded from the highest link speed supported defined by the above
Supported Link Speeds.
The X540 does not initiate a hardware autonomous speed change.
The X540 supports entering compliance mode at the speed indicated in the Target Link
Speed field in the PCIe Link Control 2 register. Compliance mode functionality is
controlled via the PCIe Link Control 2 register.
3.1.6.2 Link Width
• The X540 supports a maximum link width of x8, x4, x2, or x1 as determined by the
"PCIe Analog Configuration" Module in the NVM, which can be set as follows. Note that
these settings are unlikely to be needed in nominal operation:
NVM Word Offset (starting at odd word)  Enable x8 setting (Default)  Limit to x4 setting  Limit to x2 setting  Limit to x1 setting  Description
2*N+1                                   0x094                        0x094                0x094                0x094                MORIA6 register OFFSET (lower word)
2*N+2                                   0x0000                       0x00F0               0x00FC               0x00FE               Lanes can be disabled by setting bits[7:0] in this offset. Having bit[X] set will cause laneX to be disabled, resulting in narrower link widths (bit per lane)
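The lane-disable values in the table follow directly from the one-bit-per-lane rule: to limit the link to a given width, every lane at or above that width is disabled. A minimal sketch, with a hypothetical function name, reproduces the four table values.

```python
def lane_disable_mask(max_width):
    """Return the bits[7:0] value that limits the link to max_width lanes.

    Bit X set disables lane X, so lanes max_width..7 are disabled.
    """
    if max_width not in (1, 2, 4, 8):
        raise ValueError("supported widths are x1, x2, x4 and x8")
    enabled = (1 << max_width) - 1   # lanes 0 .. max_width-1 stay enabled
    return 0xFF & ~enabled
```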
The maximum link width is loaded into the Max Link Width field of the PCIe Capability
register (LCAP[11:6]). Hardware default is the x8 link.
During link configuration, the platform and the X540 negotiate on a common link width.
The link width must be one of the supported PCIe link widths (x1, x2, x4, x8), such that:
• If Maximum Link Width = x8, then the X540 negotiates to either x8, x4, x2 or x1
• If Maximum Link Width = x4, then the X540 negotiates to either x4 or x1
• If Maximum Link Width = x1, then the X540 only negotiates to x1
When negotiating for x4, x2, or x1 link, the X540 may negotiate the link to reside
starting from physical lane 0 or starting from physical lane 4.
The X540 does not initiate a hardware autonomous link width change. However, it will
move to recovery if it detects a low reliability link, and will finally form a degraded link.
3.1.6.3 Polarity Inversion
If polarity inversion is detected, the receiver must invert the received data.
During the training sequence, the receiver looks at symbols 6-15 of TS1 and TS2 as the
indicators of lane polarity inversion (D+ and D- are swapped). If lane polarity inversion
occurs, the TS1 symbols 6-15 received are D21.5 as opposed to the expected D10.2.
Similarly, if lane polarity inversion occurs, symbols 6-15 of the TS2 ordered set are D26.5
as opposed to the expected D5.2. This provides a clear indication of lane polarity
inversion.
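The symbol pairs named above are bitwise complements of each other at the 8-bit level, which is what makes them usable as an inversion indicator. The following sketch shows this and a simple detection check. The function names are hypothetical, and the Dx.y-to-byte mapping used is the usual 8b/10b convention (x in bits 4:0, y in bits 7:5).

```python
def d_code(x: int, y: int) -> int:
    """8-bit value of the 8b/10b data character Dx.y (x = bits 4:0, y = bits 7:5)."""
    return (y << 5) | x


def invert(byte: int) -> int:
    """Bitwise complement of an 8-bit value."""
    return byte ^ 0xFF


# Expected identifier symbols (symbols 6-15) of each training ordered set.
EXPECTED = {"TS1": d_code(10, 2), "TS2": d_code(5, 2)}


def lane_polarity_inverted(kind: str, symbols_6_to_15: list) -> bool:
    """Return True if symbols 6-15 indicate swapped D+/D-, False if they are normal."""
    expected = EXPECTED[kind]
    if all(s == expected for s in symbols_6_to_15):
        return False
    if all(s == invert(expected) for s in symbols_6_to_15):
        return True
    raise ValueError("symbols 6-15 are neither the expected nor the inverted identifier")


print(hex(d_code(10, 2)), hex(d_code(21, 5)))  # TS1 identifier and its inverted form
```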
3.1.6.4 L0s Exit Latency
The number of Fast Training Sequence (FTS) sequences (N_FTS) sent during L0s exit is
loaded from the NVM into an 8-bit read-only register.
3.1.6.5 Lane-to-Lane De-Skew
A multi-lane link can have many sources of lane-to-lane skew. Although symbols are
transmitted simultaneously on all lanes, they cannot be expected to arrive at the receiver
without lane-to-lane skew. The lane-to-lane skew can include components that are less
than one bit time, whole bit-time units (400/200 ps at 2.5/5 GT/s), or full symbol-time units (4/2
ns). This type of skew is caused by the retiming repeaters' insert/delete operations.
Receivers use TS1, TS2, or Skip Ordered Sets (SOS) to perform link de-skew functions.
The X540 supports de-skew of up to 12 symbol times: 48 ns at 2.5 GT/s (PCIe Gen 1) and
24 ns at 5 GT/s (PCIe Gen 2).
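The de-skew figures follow from the bit time and the 10-bit symbol length of 8b/10b encoding. A minimal worked calculation, with hypothetical function names:

```python
BITS_PER_SYMBOL = 10  # one 8b/10b symbol is 10 bits on the wire


def bit_time_ps(rate_gt_s: float) -> float:
    """Bit time in picoseconds for a lane running at rate_gt_s GT/s."""
    return 1000.0 / rate_gt_s


def symbol_time_ns(rate_gt_s: float) -> float:
    """Symbol time in nanoseconds."""
    return BITS_PER_SYMBOL * bit_time_ps(rate_gt_s) / 1000.0


def deskew_window_ns(rate_gt_s: float, symbols: int = 12) -> float:
    """Maximum de-skew in nanoseconds for a window of `symbols` symbol times."""
    return symbols * symbol_time_ns(rate_gt_s)


for rate in (2.5, 5.0):
    print(rate, bit_time_ps(rate), symbol_time_ns(rate), deskew_window_ns(rate))
```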
3.1.6.6 Lane Reversal
Auto lane reversal is supported by the X540 at its hardware default setting. The following
lane reversal modes are supported:
• Lane configurations x8, x4, x2, and x1
• Lane reversal in x8, x4, x2, and in x1
• Degraded mode (downshift) from x8 to x4 to x2 to x1 and from x4 to x2 to x1.
1. See restriction in Section 3.1.6.6.
Figure 3-2 through Figure 3-5 show lane downshift examples in both regular and
reversal connections as well as lane connectivity from a system level perspective.
Figure 3-2 Lane Downshift in an x8 Configuration
Figure 3-3 Lane Downshift in a Reversal x8 Configuration
Figure 3-4 Lane Downshift in an x4 Configuration
Figure 3-5 Lane Downshift in an x4 Reversal Configuration
3.1.6.7 Reset
The PCIe PHY supplies the core reset to the X540. The reset can be caused by the
following events:
• Upstream move to hot reset — Inband Mechanism (LTSSM).
• Recovery failure (LTSSM returns to detect)
• Upstream component moves to disable.
3.1.6.8 Scrambler Disable
The scrambler/de-scrambler functionality in the X540 can be disabled by the following
mechanisms:
• Upstream according to the PCIe specification
• NVM bit — Scram_dis
3.1.7 Error Events and Error Reporting
3.1.7.1 General Description
PCIe defines two error reporting paradigms: the baseline capability and the Advanced
Error Reporting (AER) capability. The baseline error reporting capabilities are required of
all PCIe devices and define the minimum error reporting requirements. The AER
capability is defined for more robust error reporting and is implemented with a specific
PCIe capability structure. Both mechanisms are supported by the X540.
The SERR# Enable and the Parity Error bits from the Legacy Command register also take
part in the error reporting and logging mechanism.
In a multi-function device, PCIe errors that are not associated with any specific function
within the device are logged in the corresponding status and logging registers of all
functions in that device. These include the following cases of Unsupported Request (UR):
• A memory or I/O access that does not match any Base Address Register (BAR) for
any function
• Messages
• Configuration accesses to a non-existent function
Figure 3-6 shows, in detail, the flow of error reporting in the X540.
Figure 3-6 Error Reporting Mechanism
3.1.7.2 Error Events
Table 3-9 lists the error events identified by the X540 and the response in terms of
logging, reporting, and actions taken. Refer to the PCIe specification for the effect on the
PCI Status register.
Table 3-9 Response and Reporting of PCIe Error Events
Error is non-fatal (default case)
Send error message if advisory
Retry the request once and send advisory
error message on each failure
If fails, send uncorrectable error message
Error is defined as fatal
Send uncorrectable error message
Send completion with CA
Discard TLP
Receiver Behavior is Undefined
Receiver Behavior is Undefined
Drop the Packet, Free FC Credits
Completion with
Unsuccessful
Completion Status
No Action (already done
by originator of
completion)
Free FC Credits
3.1.7.3 Error Forwarding (TLP Poisoning)
If a TLP is received with an error-forwarding trailer, the packet is dropped and is not
delivered to its destination. The X540 then reacts as listed in Table 3-9.
The X540 does not initiate any additional master requests for that PCI function until it
detects an internal software reset for the associated LAN port. Software is able to access
device registers after such a fault.
System logic is expected to trigger a system-level interrupt to signal the operating
system of the problem. Operating systems can then stop the process associated with the
transaction, re-allocate memory to a different area instead of the faulty area, etc.
3.1.7.4 End-to-End CRC (ECRC)
The X540 supports ECRC as defined in the PCIe specification. The following functionality
is provided:
• Inserting ECRC in all transmitted TLPs:
— The X540 indicates support for inserting ECRC in the ECRC Generation Capable
bit of the PCIe configuration registers. This bit is loaded from the ECRC Generation NVM bit.
— Inserting ECRC is enabled by the ECRC Generation Enable bit of the PCIe
configuration registers.
• ECRC is checked on all incoming TLPs. A packet received with an ECRC error is
dropped. Note that for completions, a completion timeout occurs later (if enabled),
which results in re-issuing the request.
— The X540 indicates support for ECRC checking in the ECRC Check Capable bit of
the PCIe configuration registers. This bit is loaded from the ECRC Check NVM bit.
— Checking of ECRC is enabled by the ECRC Check Enable bit of the PCIe
configuration registers.
• ECRC errors are reported
• System software can configure ECRC independently for each LAN function
3.1.7.5 Partial Read and Write Requests
Partial memory accesses
The X540 has limited support of read and write requests with only part of the byte enable
bits set:
• Partial writes with at least one byte enabled are silently dropped.
• Zero-length writes have no internal impact (nothing written, no effect such as clear-
by-write). The transaction is treated as a successful operation (no error event).
• Partial reads with at least one byte enabled are handled as a full read. Any side effect
of the full read (such as clear by read) is also applicable to partial reads.
• Zero-length reads generate a completion, but the register is not accessed and
undefined data is returned.
Note: The X540 does not generate an error indication in response to any of the
previous events.
Partial I/O accesses
• Partial access on address
— A write access is discarded
— A read access returns 0xFFFF
• Partial access on data, where the address access was correct
— A write access is discarded
— A read access performs the read
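The partial-access rules above can be summarized as a small decision function. This is a sketch only: the function names and the returned strings are hypothetical labels, and a 4-byte access with a 4-bit byte-enable mask is assumed for the memory case.

```python
def memory_access(is_write: bool, byte_enables: int, all_enabled: int = 0xF) -> str:
    """Classify a memory request by its byte-enable bits."""
    if byte_enables == all_enabled:
        return "full access"
    if byte_enables == 0:
        # Zero-length request: no internal impact, no error event.
        if is_write:
            return "no effect, treated as successful"
        return "completion with undefined data, register not accessed"
    # Partial request with at least one byte enabled.
    if is_write:
        return "silently dropped"
    return "handled as full read, side effects apply"


def io_access(is_write: bool, partial_on_address: bool) -> str:
    """Classify a partial I/O request (partial on address, or partial on data only)."""
    if is_write:
        return "discarded"
    return "returns 0xFFFF" if partial_on_address else "read performed"


print(memory_access(True, 0b0011))
print(io_access(False, partial_on_address=True))
```

None of these outcomes raises an error indication, which is why the sketch returns a description rather than an error code.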
3.1.7.6 Error Pollution
Error pollution can occur if error conditions for a given transaction are not isolated to the
error's first occurrence. If the PHY detects and reports a receiver error, to avoid having
this error propagate and cause subsequent errors at the upper layers, the same packet is
not signaled at the data link or transaction layers. Similarly, when the data link layer
detects an error, subsequent errors that occur for the same packet are not signaled at
the transaction layer.
3.1.7.7 Completion With Unsuccessful Completion Status
A completion with unsuccessful completion status is dropped and not delivered to its
destination. The request that corresponds to the unsuccessful completion is retried by
sending a new request for the undelivered data.
3.1.7.8 Error Reporting Changes
The PCIe Rev. 1.0 specification defines two changes to advanced error reporting. A (new)
Role Based Error Reporting bit in the Device Capabilities register is set to 1b to indicate
that these changes are supported by the X540.
1. Setting the SERR# Enable bit in the PCI Command register also enables UR reporting
(in the same manner that the SERR# Enable bit enables reporting of correctable and
uncorrectable errors). In other words, the SERR# Enable bit overrides the
Unsupported Request Error Reporting Enable bit in the PCIe Device Control register.
2. Changes in the response to some uncorrectable non-fatal errors detected in non-posted
requests to the X540. These are called Advisory Non-Fatal Error cases. For
each of the errors listed, the following behavior is defined:
— The Advisory Non-Fatal Error Status bit is set in the Correctable Error Status
register to indicate the occurrence of the advisory error, and the corresponding
Advisory Non-Fatal Error Mask bit in the Correctable Error Mask register is
checked to determine whether to proceed further with logging and signaling.
— If the Advisory Non-Fatal Error Mask bit is clear, logging proceeds by setting the
corresponding bit in the Uncorrectable Error Status register, based upon the
specific uncorrectable error that's being reported as an advisory error. If the
corresponding Uncorrectable Error bit in the Uncorrectable Error Mask register is
clear, the First Error Pointer and Header Log registers are updated to log the
error, assuming they are not still occupied by a previous unserviced error.
— An ERR_COR Message is sent if the Correctable Error Reporting Enable bit is set
in the Device Control register. An ERR_NONFATAL message is not sent for this
error.
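The logging and signaling steps for an advisory non-fatal error can be sketched as follows. This is an illustrative model, not register-accurate: the class, field names, and error labels are hypothetical, and the registers are represented as plain Python values.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AerState:
    """Simplified stand-in for the relevant AER and Device Control bits."""
    advisory_status: bool = False           # Correctable Error Status: Advisory Non-Fatal
    advisory_mask: bool = False             # Correctable Error Mask: Advisory Non-Fatal
    uncor_status: set = field(default_factory=set)
    uncor_mask: set = field(default_factory=set)
    first_error: Optional[str] = None       # None means the log is free
    header_log: Optional[bytes] = None
    cor_reporting_enable: bool = False      # Device Control: Correctable Error Reporting


def handle_advisory(state: AerState, error: str, header: bytes) -> list:
    """Apply the three steps listed above; return the messages that would be sent."""
    state.advisory_status = True
    if state.advisory_mask:
        return []                           # masked: no further logging or signaling
    state.uncor_status.add(error)
    if error not in state.uncor_mask and state.first_error is None:
        state.first_error = error           # log only if not occupied by an earlier error
        state.header_log = header
    # ERR_COR is sent if enabled; ERR_NONFATAL is never sent for an advisory error.
    return ["ERR_COR"] if state.cor_reporting_enable else []


state = AerState(cor_reporting_enable=True)
print(handle_advisory(state, "completion_timeout", b"\x00" * 16))
```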
The following uncorrectable non-fatal errors are considered as advisory non-fatal errors:
• A completion with an Unsupported Request or Completer Abort (UR/CA) status that
signals an uncorrectable error for a non-posted request. If the severity of the UR/CA
error is non-fatal, the completer must handle this case as an advisory non-fatal error.
• When the requester of a non-posted request times out while waiting for the
associated completion, the requester is permitted to attempt to recover from the
error by issuing a separate subsequent request or to signal the error without
attempting recovery. The requester is permitted to attempt recovery zero, one, or
multiple (finite) times, but must signal the error (if enabled) with an uncorrectable
error message if no further recovery attempt is made. If the severity of the
completion timeout is non-fatal, and the requester elects to attempt recovery by
issuing a new request, the requester must first handle the current error case as an
advisory non-fatal error.
• Receiving a poisoned TLP. See Section 3.1.7.3.
• When a receiver receives an unexpected completion and the severity of the
unexpected completion error is non-fatal, the receiver must handle this case as an
advisory non-fatal error.
3.1.8 Performance Monitoring
The X540 incorporates PCIe performance monitoring counters to provide common
capabilities to evaluate performance. The X540 implements four 32-bit counters to
correlate between concurrent measurements of events, as well as sample delay and
interval timers. The four 32-bit counters can also operate as two 64-bit counters to count
long intervals or payloads. Software can reset, stop, or start the counters (all at the same
time).
Some counters operate with a threshold: the counter increments only when the
monitored event crosses a configurable threshold (for example, when the number of
available credits is below a threshold).
Counters operate in one of the following modes:
• Count mode: the counter increments when the respective event occurs.
• Leaky Bucket mode: the counter increments only when the rate of events exceeds
a certain value. See Section 3.1.8.1.
The list of events supported by the X540 and the counters Control bits are described in
the PCIe Registers section.
3.1.8.1 Leaky Bucket Mode
Each of the counters can be configured independently to operate in a leaky bucket mode.
When in leaky bucket mode, the following functionality is provided:
• One of four 16-bit Leaky Bucket Counters (LBC) is enabled via the LBC Enable [3:0]
bits in the PCIe Statistic Control register #1.
• The LBC is controlled by the GIO_COUNT_START, GIO_COUNT_STOP,
GIO_COUNT_RESET bits in the PCIe Statistic Control register #1.
• The LBC increments every time the respective event occurs.
• The LBC is decremented every 1 s as defined in the LBC Timer field in the PCIe
Statistic Control registers.
• When an event occurs and the value of the LBC meets or exceeds the threshold
defined in the LBC Threshold field in the PCIe Statistic Control registers, the
respective statistics counter increments, and the LBC counter is cleared to zero.
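The leaky bucket behavior can be modeled in a few lines. The sketch below is a software illustration under stated assumptions, not the hardware implementation: the class name is hypothetical, the LBC is assumed to be incremented before it is compared with the threshold, and it is assumed not to go below zero when decremented.

```python
class LeakyBucketCounter:
    """Software model of one Leaky Bucket Counter (LBC) and its statistics counter."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold   # stands in for the LBC Threshold field
        self.lbc = 0
        self.statistic = 0

    def timer_tick(self) -> None:
        """Called once per LBC Timer period: leak one unit (assumed floor at zero)."""
        if self.lbc > 0:
            self.lbc -= 1

    def event(self) -> None:
        """Called for each monitored event."""
        self.lbc += 1
        if self.lbc >= self.threshold:
            self.statistic += 1      # rate exceeded: count it
            self.lbc = 0             # and clear the bucket


burst = LeakyBucketCounter(threshold=3)
for _ in range(3):
    burst.event()                    # three events with no timer tick in between
print(burst.statistic, burst.lbc)
```

Events that arrive more slowly than the timer leaks them never fill the bucket, so the statistics counter only advances when the event rate is high enough.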
3.2 SMBus
SMBus is a management interface for pass-through and/or configuration traffic between
an external Management Controller (MC) and the X540.
3.2.1 Channel Behavior
The SMBus specification defines the maximum frequency of the SMBus as 100 kHz.
However, the SMBus interface can be operated at up to 400 kHz without violating any hold
or setup time.
SMBus connection speed bits define the SMBus mode. Also, SMBus frequency support
can be defined only from the NVM.
3.2.2 SMBus Addressing
The X540 is presented as two SMBus devices on the SMBus (two SMBus addresses). All
pass-through functionality is duplicated on each SMBus address, and each SMBus
address is connected to a different LAN port.
Note: Designers are not allowed to configure both ports to the same address.
When a LAN function is disabled, the corresponding SMBus address is not
presented to the MC.
The SMBus addresses are set using the SMBus 0 Slave Address and SMBus 1 Slave Address fields in the NVM.
Note: For the X540 single port configuration, the SMBus Single Port Mode bit
should be set in the NVM, and only the SMBus 0 Slave Address field is valid.
The SMBus addresses (those that are enabled from the NVM) can be re-assigned using
the SMBus ARP protocol.
Besides the SMBus address values, all the previously listed parameters of the SMBus
(SMBus channel selection, single port mode, and address enable) can be set only through
the NVM.
All SMBus addresses should be in Network Byte Order (NBO) with the most significant
byte first.
3.2.3 SMBus Notification Methods
The X540 supports three methods of signaling the external MC that it has information
that needs to be read by the external MC:
• SMBus alert: refer to Section 3.2.3.1.
• Asynchronous notify: refer to Section 3.2.3.2.
• Direct receive: refer to Section 3.2.3.3.
The notification method that is used by the X540 can be configured from the SMBus
using the Receive Enable command. The default method is set from the Notification Method field in NVM word LRXEN1.
The following events cause the X540 to send a notification event to the external MC:
• Receiving a LAN packet, designated for the MC.
• Receiving a Request Status command from the MC that initiates a status response.
• The X540 is configured to notify the external MC upon status changes (by setting the
EN_STA bit in the Receive Enable command) and one of the following events happens:
• TCO Command Aborted
• Link Status changed
• Power state change
• MACsec indication
There can be cases where the external MC is hung and cannot respond to the SMBus
notification. The X540 has a timeout value defined in the NVM (refer to Section 6.5.4.3)
to avoid hanging while waiting for the notification response. If the MC does not respond
before the timeout expires, the notification is de-asserted.
3.2.3.1 SMBus Alert and Alert Response Method
SMBALRT_N (SMBus Alert) is an additional SMBus signal that acts as an asynchronous
interrupt signal to an external SMBus master. The X540 asserts this signal each time it
has a message that it needs the external MC to read and if the chosen notification
method is the SMBus alert method.
Note: SMBALRT_N is an open-drain signal, which means that devices other than
the X540 can be connected to the same alert pin. The external MC requires
a mechanism to distinguish between the alert sources, as follows:
The external MC responds to the alert by issuing an Alert Response Address (ARA) cycle
to detect the alert source device. The X540 responds to the ARA cycle (if it was the
SMBus alert source) and de-asserts the alert when the ARA cycle completes. Following
the ARA cycle, the MC issues a Read command to retrieve the X540 message.
Note: Some MCs do not implement the ARA cycle transaction. These MCs respond
to an alert by issuing a Read command to the X540 (0xC0/0xD0 or 0xDE).
The X540 always responds to a Read command even if it is not the source of
the notification. The default response is a status transaction. If the X540 is
the source of the SMBus alert, it replies to the read transaction.
The ARA cycle is an SMBus receive byte transaction to SMBus Address 0x18.
Note: The ARA transaction does not support PEC.
The alert response address transaction format is as follows:
Bits:   1   7          1    1   8                      1   1
Field:  S   ARA        Rd   A   Slave Device Address   A   P
Value:      0001 100   0    0                          1
Figure 3-7 SMBus ARA Cycle Format
3.2.3.2 Asynchronous Notify Method
When configured using the asynchronous notify method, the X540 acts as an SMBus
master and notifies the external MC by issuing a modified form of the write word
transaction. The asynchronous notify transaction SMBus address and data payload are
configured using the Receive Enable command or by using the NVM defaults (see
Section 6.5.3.20).
Note: The asynchronous notify is not protected by a PEC byte.
Bits:   1   7                  1    1   7                          1           1
Field:  S   Target Address     Wr   A   Sending Device Address     (unnamed)   A
Value:      MC Slave Address   0    0   Manageability Slave        0           0
                                        SMBus Address

Bits:   8               1   8                1   1
Field:  Data Byte Low   A   Data Byte High   A   P
Value:  Interface       0   Alert Value      0
Figure 3-8 Asynchronous Notify Command Format
3.2.3.3 Direct Receive Method
If configured, the X540 has the capability to send the message it needs to transfer to the
external MC, as a master over the SMBus instead of alerting the MC and waiting for it to
read the message.
The message format is shown in Figure 3-9. The command used is the same command
that the MC would use in the Block Read command, and the opcode that the X540 puts
in the data is the same as it would have put in the Block Read command of the same
functionality. The rules for the F and L flags are also the same as in the Block Read
command.
Bits:   1   7                  1    1   1            1           6                     1
Field:  S   Target Address     Wr   A   F            L           Command               A
Value:      MC Slave Address   0    0   First Flag   Last Flag   Receive TCO Command   0
                                                                 01 0000b

Bits:   8            1   8             1   ...   1   8             1   1
Field:  Byte Count   A   Data Byte 1   A   ...   A   Data Byte N   A   P
Value:  N            0                 0   ...   0                 0

Figure 3-9 Direct Receive Transaction Format
3.2.4 Receive TCO Flow
The X540 is used as a channel for receiving packets from the network link and passing
them to an external MC. The MC can configure the X540 to pass specific packets to the
MC (see Section 11.2). Once a full packet is received from the link and identified as a
manageability packet that should be transferred to the MC, the X540 starts the receive
TCO transaction flow to the MC.
The maximum SMBus fragment length is defined in the NVM (see Section 6.5.4.2). The
X540 uses the SMBus notification method to notify the MC that it has data to deliver. The
packet is divided into fragments, where the X540 uses the maximum fragment size
allowed in each fragment. The last fragment of the packet transfer is always the status of
the packet. As a result, the packet is transferred in at least two fragments. The data of
the packet is transferred in the receive TCO LAN packet transaction.
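The way a received packet is split for delivery to the MC can be sketched as follows. The function name is hypothetical, the status content is a placeholder rather than the real status format, and the 1536-byte limit is the hardware receive limit stated in this section.

```python
MAX_RECEIVE_PACKET = 1536  # hardware limit on the received packet size


def receive_tco_fragments(packet: bytes, max_fragment: int, status: bytes) -> list:
    """Split a manageability packet into data fragments followed by a status fragment."""
    if len(packet) > MAX_RECEIVE_PACKET:
        return []  # oversized packets are silently discarded
    data = [packet[i:i + max_fragment] for i in range(0, len(packet), max_fragment)]
    # The last fragment of the transfer is always the packet status, so a
    # packet with any data is delivered in at least two fragments.
    return [("data", chunk) for chunk in data] + [("status", status)]


fragments = receive_tco_fragments(bytes(100), max_fragment=32, status=b"\x00")
print([(kind, len(payload)) for kind, payload in fragments])
```

With SMBus Alert as the notification method the MC would be notified once per fragment in this list, while with asynchronous notify it would be notified only for the first one.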
When SMBus Alert is selected as the MC notification method, the X540 notifies the MC on
each fragment of a multi-fragment packet.
When asynchronous notify is selected as the MC notification method, the X540 notifies
the MC only on the first fragment of a received packet. It is the MC's responsibility to
read the full packet including all the fragments.
Any timeout on the SMBus notification results in discarding of the entire packet. Any
NACK by the MC on one of the X540's receive bytes also causes the packet to be silently
discarded.
Since SMBus throughput is lower than the network link throughput, the X540 uses an 8
KB internal buffer per LAN port, which stores incoming packets prior to being sent over
the SMBus interface. The X540 services back-to-back management packets as long as
the buffer does not overflow.
The maximum size of the received packet is limited by the X540 hardware to 1536 bytes.
Packets larger than 1536 bytes are silently discarded. Any packet smaller than 1536
bytes is processed by the X540.
Note: When the RCV_EN bit is cleared, all receive TCO functionality is disabled,
including packets directed to the MC as well as auto ARP processing.
3.2.5 Transmit TCO Flow
The X540 is used as a channel for transmitting packets from the external MC to the
network link. The network packet is transferred from the external MC over the SMBus,
and then, when fully received by the X540, is transmitted over the network link.
In dual-address mode, each SMBus address is connected to a different LAN port. A
packet received in SMBus transactions using SMBus 0 Slave Address is transmitted to
the network through LAN port 0, and a packet received using SMBus 1 Slave Address is
transmitted through LAN port 1. In single-address mode, the transmit port is chosen
according to the failover algorithm (see Section 11.2.2.2).
The X540 supports packets up to an Ethernet packet length of 1536 bytes. SMBus
transactions can be up to 240 bytes in length, which means that packets can be
transferred over the SMBus in more than one fragment. In each command byte there are
the F and L bits. When the F bit is set, it means that this is the first fragment of the
packet and L means that it is the last fragment of the packet (when both are set, it
means that the entire packet is in one fragment). The packet is sent over the network
link only after all its fragments have been received correctly over the SMBus.
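A sketch of how the F and L bits combine with a command opcode, and how a packet is split into SMBus fragments, is shown below. The bit positions (F in bit 7, L in bit 6, a 6-bit opcode below them) are an assumption inferred from the field order in Figure 3-9 and the 0xD0 read command; the function names and the opcode value passed to `fragment_packet` are hypothetical, and 240 bytes is treated as the maximum data per fragment.

```python
F_BIT = 0x80          # assumed: First flag is the most significant bit
L_BIT = 0x40          # assumed: Last flag is the next bit
OPCODE_MASK = 0x3F    # remaining six bits carry the command opcode
MAX_FRAGMENT = 240    # maximum SMBus transaction length used here


def make_command(first: bool, last: bool, opcode: int) -> int:
    """Build a command byte from the F and L flags and a 6-bit opcode."""
    if opcode & ~OPCODE_MASK:
        raise ValueError("opcode must fit in six bits")
    return (F_BIT if first else 0) | (L_BIT if last else 0) | opcode


def parse_command(byte: int) -> tuple:
    """Split a command byte into (first, last, opcode)."""
    return bool(byte & F_BIT), bool(byte & L_BIT), byte & OPCODE_MASK


def fragment_packet(packet: bytes, opcode: int, max_len: int = MAX_FRAGMENT) -> list:
    """Return (command byte, data) pairs covering the whole packet."""
    chunks = [packet[i:i + max_len] for i in range(0, len(packet), max_len)]
    last_index = len(chunks) - 1
    return [(make_command(i == 0, i == last_index, opcode), chunk)
            for i, chunk in enumerate(chunks)]


print(parse_command(0xD0))  # F and L set, opcode 01 0000b
```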
The X540 calculates the L2 CRC on the transmitted packet, and adds its four bytes at the
end of the packet. Any other packet field (such as XSUM) must be calculated and inserted
by the external MC (the X540 does not change any field in the transmitted packet other
than adding padding and CRC bytes). If the packet sent by the MC is bigger than 1536
bytes, then the packet is silently discarded by the X540.
The minimum packet length defined by the 802.3 specification is 64 bytes. The X540
pads packets that are less than 64 bytes to meet the specification requirements (no need
for the external MC to do it). There is one exception: if the packet sent over the
SMBus is less than 32 bytes, the MC must pad it to at least 32 bytes. The padding bytes
value should be zero. Packets that are smaller than 32 bytes (including padding) are
silently discarded by the X540.
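The length rules for a packet handed over by the MC can be summarized as a small decision function. The function name and the returned labels are hypothetical, and lengths are compared exactly as stated in the text, without modeling how the 4-byte CRC is counted.

```python
MAX_PACKET = 1536      # packets bigger than this are silently discarded
MIN_FROM_MC = 32       # the MC must pad shorter packets up to this length
MIN_ETHERNET = 64      # 802.3 minimum; the X540 pads packets shorter than this


def transmit_disposition(length_from_mc: int) -> str:
    """Return what the X540 does with a packet of the given length."""
    if length_from_mc > MAX_PACKET:
        return "discard"
    if length_from_mc < MIN_FROM_MC:
        return "discard"
    if length_from_mc < MIN_ETHERNET:
        return "pad and transmit"
    return "transmit"


def pad_for_smbus(packet: bytes) -> bytes:
    """MC-side helper: zero-pad a packet to the 32-byte minimum before sending."""
    return packet + bytes(max(0, MIN_FROM_MC - len(packet)))


for length in (20, 32, 60, 64, 1536, 1537):
    print(length, transmit_disposition(length))
```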
If the network link is down when the X540 has received the last fragment of the packet,
it silently discards the packet.
Note: Any link down event while the packet is being transferred over the SMBus
does not stop the operation, since the X540 waits for the last fragment to
end to see whether the network link is up again.
The transmit SMBus transaction is described in Section 11.7.2.1.
3.2.5.1 Transmit Errors in Sequence Handling
When a packet is transferred over the SMBus from the MC to the X540, the F and L flags
should follow specific rules. The F flag defines that this is the first fragment of the packet,
and the L flag defines that the transaction contains the last fragment of the packet.
Table 3-10 lists the different options of the flags in transmit packet transactions.
Table 3-10 SMBus Transmit Sequencing
Previous   Current     Action/Notes
Last       First       Accept both.
Last       Not First   Error for current transaction. Current transaction is discarded and an abort status is asserted.
Not Last   First       Error for previous transaction. The previous transaction (until the previous first) is discarded. The current packet is processed. No abort status is asserted.
Not Last   Not First   The X540 can process the current transaction.
Note: Since every other Block Write command in the TCO protocol has both the
First (F) and Last (L) flags set, such a command flushes any pending transmit
fragments that were previously received. As such, when running the TCO
transmit flow, no other Block Write transactions are allowed in between the
fragments.
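The sequencing rules of Table 3-10 can be expressed as a small reassembly model. This is a sketch with a hypothetical class name. It assumes that the initial state (no fragments received yet) behaves like "previous was Last", and it models the abort status as a simple flag.

```python
class TxReassembler:
    """Collects transmit fragments according to the First/Last sequencing rules."""

    def __init__(self) -> None:
        self.pending = None   # None: the previous fragment was Last (nothing pending)
        self.abort = False    # stands in for the abort status bit

    def fragment(self, first: bool, last: bool, data: bytes):
        """Feed one fragment; return the complete packet when the Last fragment arrives."""
        previous_last = self.pending is None
        if previous_last and not first:
            self.abort = True             # Last -> Not First: discard current, assert abort
            return None
        if first:
            # Not Last -> First drops the unfinished packet without an abort;
            # Last -> First simply starts a new packet.
            self.pending = bytearray()
        self.pending += data
        if last:
            packet, self.pending = bytes(self.pending), None
            return packet
        return None


r = TxReassembler()
r.fragment(True, False, b"ab")
r.fragment(False, False, b"cd")
print(r.fragment(False, True, b"ef"), r.abort)
```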
3.2.5.2 TCO Command Aborted Flow
Bit 6 in the first byte of the status returned from the X540 to the external MC indicates that
there was a problem with previous SMBus transactions or with the completion of the
operation requested in a previous transaction.
The abort can be asserted for any of the following reasons:
• Any error in the SMBus protocol (NACK, SMBus time-outs).
• Any violation of the protocol required by a specific function (for example, an Rx
Enable command with a byte count other than 1 or 14, as defined in the command specification).
• The X540 does not have space to store the transmit packet from the MC (in an
internal buffer) before sending it to the link. In this case, all transactions are
completed but the packet is discarded and the MC is notified through the Abort bit.
• An error in the First/Last bit sequence during multi-fragment transactions.
• An internal reset of the X540 manageability unit.
Note: The abort in the status does not always imply that the last transaction of
the sequence was bad. There is a time delay between the time the transaction
occurred and the time the status is read from the X540.
3.2.6 Concurrent SMBus Transactions
Concurrent SMBus write transactions are not permitted. Once a transaction is started, it
must be completed before an additional transaction can be initiated.
3.2.7 SMBus ARP Functionality
The X540 supports the SMBus ARP protocol as defined in the SMBus 2.0 specification.
The X540 is a persistent slave address device: its SMBus address is valid after
power-up and is loaded from the NVM. The X540 also supports all SMBus ARP commands
defined in the SMBus specification, both general and directed.
Note: SMBus ARP can be disabled through NVM configuration (see
Section 6.5.4.3).
3.2.7.1 SMBus ARP
The X540 responds as two SMBus devices and has two sets of AR/AV flags, one
for each port. The X540 should respond twice to the SMBus-ARP master, once for
each port. Both SMBus addresses are taken from the SMBus-ARP addresses word of the
NVM. The UDID differs between the two ports in the version ID field, which
represents the Ethernet MAC address and is therefore different for each port. It is
recommended that the X540 first answer the Get UDID command as port 0 and, only after
that address is assigned, answer as port 1.
3.2.7.2 SMBus-ARP Flow
SMBus-ARP flow is based on the status of two flags, Address Valid (AV) and Address Resolved (AR):
• Address Valid — This flag is set when the X540 has a valid SMBus address.
• Address Resolved — This flag is set when the X540 SMBus address is resolved:
that is, the SMBus address was assigned by the SMBus-ARP process.
Note: These flags are internal X540 flags and are not shown to external SMBus
devices.
Since the X540 is a Persistent SMBus Address (PSA) device, the AV flag is always set,
while the AR flag is cleared after power-up until the SMBus-ARP process completes. Since
AV is always set, it means that the X540 always has a valid SMBus address. The entire
SMBus ARP Flow is described in Figure 3-10.
When the SMBus master needs to start the SMBus-ARP process, it resets (in terms of
ARP functionality) all the devices on the SMBus, by issuing either Prepare to ARP or Reset
Device commands. When the X540 accepts one of these commands, it clears its AR flag
(if set from previous SMBus-ARP process), but not its AV flag (The current SMBus
address remains valid until the end of the SMBus ARP process).
A cleared AR flag means that the X540 answers the following SMBus ARP transactions
that are issued by the master. The SMBus master then issues a Get UDID command
(General or Directed), to identify the devices on the SMBus. The X540 responds to the
Directed command all the time, and to the General command only if its AR flag is not set.
After the Get UDID, the master assigns the X540 SMBus address, by issuing Assign
Address command. The X540 checks whether the UDID matches its own UDID, and if
there is a match it switches its SMBus address to the address assigned by the command
(byte 17). After accepting the Assign Address command, the AR flag is set, and from this
point (as long as the AR flag is set), the X540 does not respond to the Get UDID General
command, while all other commands should be processed even if the AR flag is set.
After SMBus ARP is successfully carried out, the new address is stored in the NVM, and
will thus be the address used at the next power up.
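The handling of the AV and AR flags during this flow can be modeled per port as follows. This is a behavioral sketch with hypothetical class and method names; SMBus framing, the UDID format, and the NVM write are not modeled, and the addresses in the usage lines are arbitrary example values.

```python
class SmbusArpPort:
    """Model of one X540 SMBus port as seen by the SMBus-ARP master."""

    def __init__(self, udid: bytes, address: int) -> None:
        self.udid = udid
        self.address = address   # loaded from the NVM at power-up
        self.av = True           # persistent slave address device: always valid
        self.ar = False          # cleared after power-up

    def prepare_to_arp(self) -> None:
        """Prepare to ARP / Reset Device: clear AR, keep AV and the current address."""
        self.ar = False

    reset_device = prepare_to_arp

    def get_udid(self, directed: bool):
        """Directed is always answered; General only while AR is clear."""
        if directed or not self.ar:
            return self.udid
        return None

    def assign_address(self, udid: bytes, new_address: int) -> bool:
        """Accept the new address only if the UDID matches; then set AR."""
        if udid != self.udid:
            return False
        self.address = new_address
        self.ar = True
        return True


port0 = SmbusArpPort(udid=b"port0-udid", address=0x64)
port0.assign_address(port0.get_udid(directed=False), 0x30)
print(hex(port0.address), port0.ar, port0.get_udid(directed=False))
```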
Figure 3-10 SMBus-ARP Flow
3.2.8 Fairness Arbitration
When sending MCTP messages over SMBus and when fairness arbitration is enabled (see
Section 6.5.4.3), the X540 should respect the fairness arbitration as defined in section
The NC-SI interface in the X540 is a connection to an external MC.
The X540 NC-SI interface meets the NC-SI version 1.0.0 specification as a PHY-side
device.
3.3.1 Electrical Characteristics
The X540 complies with the electrical characteristics defined in the NC-SI specification.
3.3.2 NC-SI Transactions
Compatible with the NC-SI specification.
3.4 Non-Volatile Memory (NVM)
3.4.1 General Overview
The X540 uses a Flash device for storing product configuration information. The Flash is
divided into three general regions:
• Hardware Accessed — Loaded by the X540 hardware after power-up, PCI reset deassertion, D3 to D0 transition, or software reset. Different hardware sections in the
Flash are loaded at different events. For more details on power-up and reset
sequences, see Section 4.0.
• Firmware Area — Includes structures used by the firmware for management
configuration in its different modes.
• Software Accessed — This region is used by software entities such as LAN drivers,
option ROM software and tools, PCIe bus drivers, VPD software, etc.
3.4.2 Flash Device Requirements
The X540 merges the 82599 legacy EEPROM and Flash content in a single Flash device.
Flash devices require a sector erase instruction in case a cell is modified from 0b to 1b.
As a result, in order to update a single byte (or block of data) it is required to erase it
first. The X540 supports Flash devices with a sector erase size of 4 KB. Note that many
Flash vendors use the term sector differently. The X540 Datasheet uses the term
Flash sector for a logical section of 4 KB.
The X540 supports Flash devices that are either write-protected by default after power-up or not. The X540 is responsible for removing the protection by sending the write-protection removal OpCode to the Flash after power-up.
The following OpCodes are supported by the X540 as they are common to all supported
Flash devices:
1. Write Enable (0x06)
2. Read Status Register (0x05)
3. Write Status Register (0x01). The written data is 0x00 to cancel the Flash default
protection.
4. Read Data (0x03). Burst read is supported.
5. Byte/Page Program (0x02). To program 1 to 256 data bytes.
6. 4 KB Sector-Erase (0x20)
7. Chip-Erase (0xC7)
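The opcode list above can be illustrated by building the byte sequences sent to the Flash. This is a minimal sketch and not the device's implementation: it assumes the common serial Flash convention of a 3-byte, most-significant-first address after the opcode, and it assumes that a Write Enable precedes each status write, program, and erase. The function names are hypothetical.

```python
# Opcodes from the list above.
WRITE_ENABLE = 0x06
WRITE_STATUS = 0x01
PAGE_PROGRAM = 0x02
SECTOR_ERASE_4K = 0x20

SECTOR_SIZE = 4096   # 4 KB Flash sector
PAGE_SIZE = 256      # maximum bytes per program instruction


def addr24(addr):
    """Assumed address format: three bytes, most significant first."""
    if not 0 <= addr <= 0xFFFFFF:
        raise ValueError("address does not fit in 24 bits")
    return addr.to_bytes(3, "big")


def unprotect_sequence():
    """Write Enable, then Write Status Register with data 0x00."""
    return [bytes([WRITE_ENABLE]), bytes([WRITE_STATUS, 0x00])]


def sector_erase(sector_index):
    """Write Enable followed by a 4 KB Sector-Erase of the given sector."""
    return [bytes([WRITE_ENABLE]),
            bytes([SECTOR_ERASE_4K]) + addr24(sector_index * SECTOR_SIZE)]


def page_program(addr, data):
    """Write Enable followed by a Byte/Page Program of 1 to 256 bytes."""
    if not 1 <= len(data) <= PAGE_SIZE:
        raise ValueError("program length must be 1 to 256 bytes")
    return [bytes([WRITE_ENABLE]),
            bytes([PAGE_PROGRAM]) + addr24(addr) + bytes(data)]
```

For example, `sector_erase(1)` yields the Write Enable byte followed by opcode 0x20 and the address 0x001000.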
3.4.3 Shadow RAM
The X540 maintains the first two 4 KB sectors, Sector 0 and Sector 1, for the
configuration content. At least one of these two sectors must be valid at any given time;
otherwise, the X540 falls back to its hardware defaults. Following a Power On Reset (POR),
the X540 copies the valid lower 4 KB sector of the Flash device into an internal shadow
RAM. Any further software or firmware access to this section of the NVM is directed to
the internal shadow RAM. Modifications made to the shadow RAM content are then copied by
the X540 into the other 4 KB sector of the NVM, so that the valid sector alternates
between sector 0 and sector 1 of the NVM.
This mechanism provides the following advantages:
1. A seamless backward compatible read/write interface for software/firmware to the
first 4 KB of the NVM as if an external EEPROM device were connected. This interface
is referred to as EEPROM-Mode access to the Flash.
2. A way for software to protect image-update procedure from power down events by
establishing a double-image policy. See Section 6.2.1.1 for a description of the
double-image policy. It relies on having pointers to all the other NVM modules
mapped in the NVM sector which is mirrored in the internal shadow RAM.
Figure 3-11 shows the shadow RAM mapping and interface.
Figure 3-11 NVM Shadow RAM
Following a write access by software or firmware to the shadow RAM, the data must
eventually be updated in the Flash as well. The X540 updates the Flash from the shadow RAM
when software/firmware explicitly requests it by setting the FLUPD bit in the EEC
register. To minimize the number of Flash updates, software/firmware should set the FLUPD
bit only after completing its last write access. The X540 then copies the content of the
shadow RAM to the non-valid configuration sector and makes it the valid one. The Flash
update sequence handled by the device is as follows:
1. Initiate sector erase instruction(s) to the non-valid sector, either sector 0 or sector 1
(the non-valid sector is defined by the inverse value of the SEC1VAL bit in the EEC
register).
2. Copy the shadow RAM to the non-valid sector, with the signature field present in NVM
Control Word 1 copied last.
3. Toggle the state of the SEC1VAL bit in the EEC register to indicate that the non-valid
sector became the valid one and vice versa.
4. Clear the signature field in the previously valid sector to make it invalid. Since a
valid signature is 01b, it is enough to program the bits to 00b, without issuing a sector
erase command to the Flash.
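The update sequence above can be modeled in a few lines. This is a behavioral sketch only, not the hardware implementation. It assumes the signature sits in bits 7:6 of the first word of each sector (consistent with Section 3.4.7), and it reads step 4 as clearing the signature of the sector that was valid before the toggle. The class and attribute names are hypothetical.

```python
SECTOR_WORDS = 2048      # one 4 KB sector holds 2048 16-bit words
SIG_WORD = 0             # assumed index of the control word holding the signature
SIG_MASK = 0x00C0        # signature field, bits 7:6
SIG_VALID = 0x0040       # valid signature is 01b


class NvmModel:
    """Two configuration sectors plus the shadow RAM, as plain word lists."""

    def __init__(self, image):
        self.sectors = [list(image), [0xFFFF] * SECTOR_WORDS]
        self.sec1val = 0                 # models EEC.SEC1VAL
        self.shadow = list(image)        # copied from the valid sector at POR

    def flush(self):
        """Model of the hardware sequence triggered by setting EEC.FLUPD."""
        target = 1 - self.sec1val
        # Step 1: erase the non-valid sector.
        self.sectors[target] = [0xFFFF] * SECTOR_WORDS
        # Step 2: copy the shadow RAM, writing the signature word last.
        for index, word in enumerate(self.shadow):
            if index != SIG_WORD:
                self.sectors[target][index] = word
        self.sectors[target][SIG_WORD] = self.shadow[SIG_WORD]
        # Step 3: toggle SEC1VAL.
        previous = self.sec1val
        self.sec1val = target
        # Step 4: program the old signature from 01b to 00b (no erase needed).
        self.sectors[previous][SIG_WORD] &= ~SIG_MASK & 0xFFFF
```

Writing the signature last means that a power loss in the middle of step 2 leaves the target sector without a valid signature, so the previous sector remains the valid one.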
Note: Software should be aware that programming the Flash might require a long
latency due to the Flash update sequence handled by hardware. The sector
erase command by itself can last tens of s. Software must poll the
FLUDONE bit in the EEC register to check whether or not the Flash
programming completed.
Note: The X540 always effectively updates the Flash after any VPD write access
(no use of the EEC.FLUPD bit is required in this case).
Note: The contents of the shadow RAM are reset only at LAN_PWR_GOOD events. The shadow
RAM is protected against ECC errors at the shell level in such a way that the
probability of an error is close to zero.
Note: Each time the Flash content is not valid (blank configuration sectors or
wrong signature on both sector 0 and 1), EEPROM access mode is turned off.
Software should instead use either the bit banging interface to the Flash
through the FLA register or the memory mapped Flash BAR access.
3.4.4 NVM Clients and Interfaces
Note: Access to the NVM should be done exactly according to the flows described
in this section. Any read or write access to the NVM that does not exactly
follow the rules and steps listed in this section might lead to unexpected
results.
There are several clients that can access the NVM to different address ranges via
different access modes, methods, and interfaces. The various clients to the NVM are
Software Tools (BIOS, etc.), Drivers, MC (via Firmware), and VPD Software.
Table 3-11 lists the different accesses to the NVM.
Table 3-11 Clients and Access Types to the NVM
NVM Client | Access Method | NVM Access Mode | Logical Byte Address Range | NVM Access Interface (CSRs or Other)
VPD Software | Parallel (32-bits) | EEPROM | 0x000000 - 0x000FFF | VPD Address and Data registers, via shadow RAM logic. Any write access is immediately pushed by the X540 into the Flash. VPD module must be located in the first valid Flash sector.
Software | Parallel (16-bits) | EEPROM | 0x000000 - 0x000FFF | EERD, EEWR, via shadow RAM logic.
Software | Parallel (32-bits read, 8-bits write) | Flash | 0x000000 - 0x001FFF | Memory mapped via BARs. Accessing this range via Flash BAR should be avoided during normal operation as it might cause non-coherency between the Flash and the shadow RAM.
Software | Parallel (32-bits read, 8-bits write) | Flash | 0x002000 - 0xFFFFFF | Memory mapped via BARs.
Software | Bit Banging (1-bit) | Flash | 0x000000 - 0x001FFF | FLA. Accessing this range via bit-banging should be avoided during normal operation as it might cause non-coherency between the Flash and the shadow RAM.
Software | Bit Banging (1-bit) | Flash | 0x002000 - 0xFFFFFFF | FLA
Note: Firmware saves words like SMBus Slave Addresses or Signature, which are
saved into the NVM at the firmware’s initiative. Note that the VPD module
must be mapped to the first valid 4 KB sector.
3.4.4.1 Memory Mapped Host Interface
Using the legacy Flash transactions the Flash is read from, or written to, by the X540
each time the host CPU performs a read or a write operation to a memory location that is
within the Flash address mapping or upon boot via accesses in the space indicated by the
Expansion ROM Base Address register. Accesses to the Flash are based on a direct
decode of CPU accesses to a memory window defined in either:
• Memory CSR + Flash Base Address Register (PCIe Control Register at offset 0x10).
• The Expansion ROM Base Address Register (PCIe Control Register at offset 0x30).
• The X540 is responsible for mapping accesses via the Expansion ROM BAR to the physical
NVM. The offset in the NVM of the Expansion ROM module is defined by the PCIe
Expansion/Option ROM Pointer (Flash word address 0x05). This pointer is loaded by
the X540 from the Flash before enabling any access to the Expansion ROM memory
space.
— When the PXE Driver Section Pointer in the NVM is modified, a PCIe reset must be
issued so that the hardware samples the updated offset.
— If there is no valid NVM signature in either of the first two 4 KB sectors, the
expansion ROM BAR is disabled.
Note: The X540 controls accesses to the Flash when it decodes a valid access.
An out-of-range write access to the PCIe Expansion/Option ROM module (according
to the NVM size field in NVM Control Word 1) is ignored, while an out-of-range
read access returns a value of 0xDEADBEAF. The X540 supports only byte writes
to the Flash.
Note: Flash read accesses are assembled by the X540 each time the access is
greater than a byte-wide access.
Note: The X540 byte reads or writes to the Flash take about 2-30 s time. The
device continues to issue retry accesses during this time.
Note: During normal operation, the host should avoid memory mapped accesses
to the first two 4 KB sectors of the Flash because their contents might not be
coherent with the shadow RAM contents.
Caution: Flash BAR access while FLA.FL_REQ is asserted (and granted) is forbidden.
It can lead to a PCIe hang as a bit-banging access requires several PCIe
accesses.
3.4.4.2 CSR Mapped Host Interface
Software has bit banging or parallel accesses to the NVM or to the shadow RAM (refer to
Table 3-11) via registers in the CSR space. The X540 supports the following cycles on the
EEPROM-Mode provides a parallel interface to the first valid 4 KB sector of the NVM (the
base sector), which is agnostic to the Flash device type. It also minimizes sector
erase cycles to the Flash device by coalescing an update of the whole base sector into a
single programming cycle.
3.4.4.2.2 Bit Banging Host Interface
Software can access the Flash directly by using the Flash's 4-wire interface through the
Flash Access (FLA) register. It can use this for reads, writes, or other Flash operations
(accessing the Flash status register, erase, etc.).
3.4.4.3 MC Interface
The MC can access several fields in the NVM and/or shadow RAM via dedicated NC-SI
commands.
3.4.5 Flash Access Contention
Flash accesses initiated through the LAN "A" device and those initiated through the LAN
"B" device may occur during approximately the same time window. The X540 does not
synchronize between the two entities accessing the Flash, so contention caused by one
entity reading and the other modifying the same locations is possible.
To avoid such a contention between software LANs or between software and firmware
accesses, these entities are required to make use of the semaphore registers. Refer to
Section 11.7.5. Any read or write access to the NVM made by software/firmware must be
preceded by acquiring ownership over the NVM. This is also useful to avoid the timeout of
the PCIe transaction made to a memory mapped Flash address while the Flash is
currently busy with a long sector erase operation.
Two software entities, however, cannot use the semaphore mechanism: BIOS and VPD
software.
• Since VPD software accesses only the VPD module, which is located in the first valid
sector of the NVM, VPD accesses are always performed against the shadow RAM first.
In this case, hardware must take/release ownership over the NVM as if it were the
originator of the Flash access. It is then hardware’s responsibility to update the NVM
according to the Flash update sequence described in Section 3.4.3 (Shadow RAM).
• No contention can occur between BIOS and any other software entity (VPD included)
as it accesses the NVM while the operating system is down.
• Contention between BIOS and firmware can however happen if a system reboot
occurs while the MC is accessing the NVM.
— If a system reboot is caused by a user pushing the standby button, the wake-up
signal from the standby button must be routed to the MC and not
to the chipset. The MC issues a system reboot signal to the chipset only after the
NVM write access completes. Firmware is responsible for polling whether the NVM
write has completed before sending the response to the MC NC-SI command.
— If a system reboot is issued by a local user on the host, there is no technical way
to prevent NVM access contention between BIOS and the MC.
Caution: It is the user’s responsibility, when accessing the NVM remotely via the MC,
to make sure that another user is not currently initiating a local host reboot.
Note: The PHY auto-load process from the Flash device is made up of short read
bursts (32-bits) that can be inserted by hardware in between other NVM
clients’ accesses, at the lowest priority. It is the user’s responsibility to
avoid initiating PHY auto-load while updating the PHY NVM modules.
Note: The MAC auto-load from the Flash device itself occurs only after power-up
and before host or firmware can attempt to access the Flash. The host must
wait until PCIe reset is de-asserted (after ~1 sec, which is enough time for
the MAC auto-load to complete), and firmware starts its auto-load after the
EEC.AUTO_RD bit is asserted by hardware.
Note: Other MAC auto-load events are performed from the internal shadow RAM
and do not compete with memory mapped accesses to the Flash device.
During such a MAC auto-load, accesses from other clients via EEPROM-Mode
registers are delayed until the auto-load process completes.
Note: Software and firmware should avoid holding Flash ownership (via the
dedicated semaphore bit) for more than 500 ms.
3.4.6 NVM Read, Write, and Erase Sequences
Refer to Section 6.2.1.1 to establish the required double-image policy prior to updating
any Flash module.
Any software or firmware flow described in this section (except for VPD and BIOS) shall
be preceded by taking NVM ownership via semaphores as described in Section 11.7.5.
3.4.6.1 Flash Erase Flow by the Host
1. Erase access to the Flash must first be enabled by clearing the FWE field (setting it
to 00b) in the EEC register.
2. Poll the FL_BUSY flag in the FLA register until cleared.
3. Set the Flash Device Erase bit (FL_DER) in the FLA register or the Flash Sector Erase
bit (FL_SER) together with the Flash sector index to be erased (FL_SADDR).
4. Clear the erase enable by setting the FWE field to 01b in the EEC register to protect
the Flash device.
Note: An attempt to erase a sector in the Flash device while writes are disabled
(FWE=01b) is not performed by the X540.
Hardware gets the Erase command from FLA register and sends the corresponding Erase
command to the Flash. The erase process then finishes by itself. Software should wait for
the end of the erase process before any further access to the Flash. This can be checked
by polling the FLA.FL_BUSY bit.
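The erase flow above can be sketched against a stand-in for the device registers. This is an illustrative model only: `FakeRegs` and its methods are hypothetical, no real register bit layout is implied, and the FWE value 00b for "cleared" is the only encoding inferred beyond the 01b value given in the text.

```python
FWE_ERASE = 0b00       # FWE field cleared: erase enabled
FWE_DISABLED = 0b01    # erase and writes disabled


class FakeRegs:
    """Stand-in for the EEC.FWE field and the FLA erase/busy bits."""

    def __init__(self):
        self.fwe = FWE_DISABLED
        self.erased = []
        self._busy_polls = 0

    def read_fl_busy(self):
        if self._busy_polls > 0:
            self._busy_polls -= 1
            return True
        return False

    def start_sector_erase(self, sector_index):
        if self.fwe != FWE_ERASE:
            return                      # erase is not performed when disabled
        self.erased.append(sector_index)
        self._busy_polls = 3            # the erase takes a while to finish


def wait_not_busy(regs, max_polls=1000):
    for _ in range(max_polls):
        if not regs.read_fl_busy():
            return
    raise TimeoutError("FLA.FL_BUSY did not clear")


def erase_sector(regs, sector_index):
    regs.fwe = FWE_ERASE                    # step 1: enable erase access
    wait_not_busy(regs)                     # step 2: poll FL_BUSY until cleared
    regs.start_sector_erase(sector_index)   # step 3: FL_SER with FL_SADDR
    regs.fwe = FWE_DISABLED                 # step 4: protect the Flash again
    wait_not_busy(regs)                     # wait for the erase to finish
```

A bounded poll is used so that a device that never clears FL_BUSY produces an error instead of hanging the caller.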
3.4.6.2 Software Flow to the Bit Banging Interface
To directly access the Flash, software should follow these steps:
1. Write a 1b to the Flash Request bit (FLA.FL_REQ).
2. Read the Flash Grant bit (FLA.FL_GNT) until it becomes 1b. It remains 0b as long as
there are other accesses to the Flash.
3. Write or read the Flash using the direct access to the 4-wire interface as defined in
the FLA register. The exact protocol used depends on the Flash placed on the board
and can be found in the appropriate datasheet.
4. Write a 0b to the Flash Request bit (FLA.FL_REQ).
5. Following a write or erase instruction, software should clear the Request bit only after
it has checked that the cycles were completed by the NVM. This can be checked by
reading the BUSY bit in the Flash device STATUS register. Refer to Flash datasheet for
the opcode to be used for reading the STATUS register.
Note: The Bit Banging Interface is not expected to be used during nominal operation.
Software/firmware should instead use the EEPROM-Mode when accessing the
base sector and the Flash-Mode for other sectors.
Note: If software must use the Bit Banging Interface in nominal operation, it
should adhere to the following rules:
• Gain access first to the Flash using the flow described in Section 11.7.5.
• Keep FLA.FL_REQ set only for a single byte/word/dword access, or use another
method that guarantees a fast enough release of FLA.FL_REQ.
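The request/grant steps of the bit banging flow can be sketched as follows. This is a minimal model under stated assumptions: `FakeFla` is a hypothetical stand-in for the FLA register, the actual 4-wire transfer is passed in as a callable, and the Flash device's BUSY status is supplied by the caller because its opcode depends on the Flash part used.

```python
class FakeFla:
    """Stand-in for the FL_REQ and FL_GNT bits of the FLA register."""

    def __init__(self, polls_until_grant=2):
        self.fl_req = 0
        self._polls_until_grant = polls_until_grant

    def read_fl_gnt(self):
        if not self.fl_req:
            return 0
        if self._polls_until_grant > 0:
            self._polls_until_grant -= 1
            return 0
        return 1


def poll(condition, max_polls=1000):
    for _ in range(max_polls):
        if condition():
            return
    raise TimeoutError("condition not met")


def bit_bang_access(fla, operation, flash_device_busy):
    fla.fl_req = 1                                # step 1: request the Flash
    try:
        poll(lambda: fla.read_fl_gnt() == 1)      # step 2: wait for the grant
        result = operation()                      # step 3: 4-wire transfer
        poll(lambda: not flash_device_busy())     # step 5: device finished
        return result
    finally:
        fla.fl_req = 0                            # step 4: release the request
```

Releasing the request in a `finally` clause keeps the interface from staying locked when the transfer raises an error.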
3.4.6.3 Software Word Program Flow to the EEPROM-Mode Interface
Read Interface:
Software initiates a read cycle to the NVM via the EEPROM-mode by writing the address
to be read and the Start bit to the EERD register.
As a response, hardware executes the following steps:
1. The X540 reads the data from the shadow RAM.
2. Puts the data in Data field of the EERD register.
3. Sets the Done bit in the EERD register.
Note: Any word read this way is not loaded into the X540's internal registers. This
happens only at a hardware auto-load event.
Write Interface:
Software initiates a write cycle to the NVM via the EEPROM-mode as follows:
1. Poll the Done bit in the EEWR register until it is set.
2. Write the data word, its address, and the Start bit to the EEWR register.
As a response, hardware executes the following steps:
1. The X540 writes the data to the shadow RAM.
2. The X540 sets the Done bit in the EEWR register.
Note: In addition, the VPD area of the NVM can be accessed via the PCIe VPD
capability structure.
Note: EEPROM-Mode writes are performed into the internal shadow RAM.
Section 6.2.1.1 describes the procedure for copying the internal shadow
RAM content into the base sector of the Flash device.
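The EEPROM-Mode read and write handshakes can be sketched with a behavioral model. This is an illustration only: `EepromModeModel` is hypothetical, it models the Start and Done handshake without any register bit layout, and it assumes the EEWR Done bit reads as set while the interface is idle.

```python
class EepromModeModel:
    """Behavioral model of the EERD/EEWR handshake over the shadow RAM."""

    def __init__(self, words=2048):
        self.shadow = [0xFFFF] * words
        self.eerd_done = False
        self.eerd_data = 0
        self.eewr_done = True          # assumption: Done is set when idle

    def eerd_start(self, addr):
        """Software wrote the address and Start bit to EERD."""
        self.eerd_done = False
        self.eerd_data = self.shadow[addr]   # read shadow RAM into Data field
        self.eerd_done = True                # hardware sets Done

    def eewr_start(self, addr, data):
        """Software wrote data, address, and Start bit to EEWR."""
        self.eewr_done = False
        self.shadow[addr] = data & 0xFFFF    # write lands in the shadow RAM
        self.eewr_done = True                # hardware sets Done


def eeprom_read(dev, addr, max_polls=1000):
    dev.eerd_start(addr)
    for _ in range(max_polls):
        if dev.eerd_done:
            return dev.eerd_data
    raise TimeoutError("EERD Done bit was not set")


def eeprom_write(dev, addr, data, max_polls=1000):
    for _ in range(max_polls):
        if dev.eewr_done:                    # poll Done before starting
            dev.eewr_start(addr, data)
            return
    raise TimeoutError("EEWR Done bit was not set")
```

In this model a write changes only the shadow RAM, which mirrors the note above that EEPROM-Mode writes do not reach the Flash until a separate update is requested.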
3.4.6.4 Flash Program Flow via the Memory Mapped
Interface
Software initiates a write cycle via the Flash BAR as follows:
1. Enable Flash BAR writes by setting EEC.FWE to 10b.
2. Poll the FL_BUSY flag in the FLA register until cleared.
3. Write the data byte to the Flash through the Flash BAR.
4. Repeat the steps 2 and 3 if multiple bytes should be programmed.
5. Disable Flash BAR writes by setting EEC.FWE to 01b.
As a response, hardware executes the following steps for each write access:
1. Set the FL_BUSY bit in the FLA register.
2. Initiate autonomous write enable instruction.
3. Initiate the program instruction right after the enable instruction.
4. Poll the Flash status until programming completes.
5. Clear the FL_BUSY bit in the FLA register.
Note: Software must erase the sector prior to programming it.
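The need to erase before programming follows from the cell behavior described in Section 3.4.2: programming can only move bits from 1b to 0b, and only an erase returns them to 1b. A minimal sketch of that behavior, using a `bytearray` as a hypothetical stand-in for the Flash array:

```python
SECTOR_SIZE = 4096


def erase_sector(flash, sector_index):
    """A sector erase sets every bit of the 4 KB sector to 1b."""
    start = sector_index * SECTOR_SIZE
    flash[start:start + SECTOR_SIZE] = b"\xFF" * SECTOR_SIZE


def program_byte(flash, addr, value):
    """Programming can only clear bits, so the result is old AND new."""
    flash[addr] &= value & 0xFF


flash = bytearray(b"\xFF" * (2 * SECTOR_SIZE))
program_byte(flash, 0x1000, 0x0F)
program_byte(flash, 0x1000, 0xF0)     # no erase in between
stale = flash[0x1000]                 # 0x0F AND 0xF0, not the intended 0xF0

erase_sector(flash, 1)
program_byte(flash, 0x1000, 0xF0)
fresh = flash[0x1000]                 # the intended value after an erase
```

Here `stale` ends up as 0x00 while `fresh` holds 0xF0, which is why the program flow requires a prior erase of the target sector.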
3.4.7 Signature Field
The only way the X540 can tell if a Flash is present is by trying to read the Flash. The
X540 first reads the Control word at word address 0x000000 and at word address
0x000800. It then checks the signature value at bits 7 and 6 in both addresses.
If bit 7 is 0b and bit 6 is 1b in (at least) one of the two addresses, it considers the Flash
to be present and valid. It then reads the additional Flash words and programs its
internal registers based on the values read. Otherwise, it ignores the values it reads from
that location and does not read any other words.
If the signature bits are valid at both addresses the X540 assumes that the base sector
starts at address zero.
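The signature check described above can be expressed as a small function. This is a sketch, not the hardware logic: `read_word` is a hypothetical callable that returns the 16-bit word at a given word address.

```python
SECTOR0_WORD_ADDR = 0x000000
SECTOR1_WORD_ADDR = 0x000800


def signature_valid(control_word):
    """Valid when bit 7 is 0b and bit 6 is 1b."""
    return ((control_word >> 6) & 0b11) == 0b01


def find_base_sector(read_word):
    """Return 0 or 1 for the base sector, or None if no valid signature."""
    valid0 = signature_valid(read_word(SECTOR0_WORD_ADDR))
    valid1 = signature_valid(read_word(SECTOR1_WORD_ADDR))
    if valid0:
        return 0          # also covers the case where both are valid
    if valid1:
        return 1
    return None
```

An erased Flash reads 0xFFFF, whose bits 7:6 are 11b, so a blank device yields `None`.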
3.4.8 Flash Recovery
The first two sectors of the Flash contain fields that if programmed incorrectly might
affect the functionality of the X540. The impact might range from an incorrect setting of
some function (like LED programming), via disabling of entire features (such as no
manageability) and link disconnection, to the inability to access the device via the regular
PCIe interface.
The X540 implements a mechanism that enables recovery from a faulty Flash no matter
what the impact is, using an SMBus message that instructs the firmware to invalidate the
first two sectors of the Flash.
This mechanism uses an SMBus message that the firmware is able to receive in all
modes, regardless of the content of the first two sectors of the Flash. After
receiving this message, firmware erases the first two sectors of the Flash, which sets
word 0x0 to 0xFF and invalidates the signature. BIOS or the operating system then
initiates a power event to force a Flash auto-load process that fails and enables access
to the device.
The firmware is programmed to receive such a command only from PCIe reset until one
of the functions changes its status from D0u to D0a. Once one of the functions moves to
D0a it can be safely assumed that the device is accessible to the host and there is no
further need for this function. This reduces the possibility of malicious software using
this command as a back door and limits the time the firmware must be active in
non-manageability mode.
The command is sent on a fixed SMBus address of 0xC8. The command is an SMBus Block
Write with the following format:
Function | Command | Data Byte
Release Flash | 0xC7 | 0x12
Note: This solution requires a controllable SMBus connection to the X540.
Note: If more than one X540 is in a state to accept this command, all of
the X540 devices connected to the same SMBus accept the command. The
devices in D0u state erase the first two sectors of the Flash.
After receiving a Release Flash command, firmware should keep its current state. It is the
responsibility of the user updating the Flash to send a firmware reset if required after the
entire Flash update process is done.
Data byte 0x12 is the LSB of the X540's default Device ID. The 82575, for example, uses
the same command but the data byte there is 0xAA.
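The Release Flash command can be illustrated by composing the bytes of the block write. This is a sketch under assumptions: it follows the generic SMBus Block Write layout (address byte, command code, byte count, data), treats 0xC8 as the address byte exactly as given, and omits any Packet Error Code. The function names are hypothetical.

```python
SMBUS_ADDRESS = 0xC8            # fixed address given in the text
RELEASE_FLASH_COMMAND = 0xC7
RELEASE_FLASH_DATA = 0x12       # LSB of the default Device ID


def block_write_frame(address, command, data):
    """Bytes of an SMBus Block Write: address, command, byte count, data."""
    if not 1 <= len(data) <= 32:
        raise ValueError("SMBus block length must be 1 to 32 bytes")
    return bytes([address, command, len(data)]) + bytes(data)


def release_flash_frame():
    return block_write_frame(SMBUS_ADDRESS, RELEASE_FLASH_COMMAND,
                             [RELEASE_FLASH_DATA])
```

With these assumptions the frame is the four bytes 0xC8, 0xC7, 0x01, 0x12.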
An additional command is introduced to enable the write from the SMBus interface
directly into any MAC CSR register. The same rules as for the Release Flash command
that determine when the firmware accepts this command apply to this command as well.
The command is sent on a fixed SMBus address of 0xC8. The command is an SMBus Block
Write with the following format:
Config Address 2 | Config Address 1 | Config Address 0 | Config Data MSB | … | Config Data LSB
The MSB in Configuration Address 2 indicates which port is the target of the access (0 or
1).
The X540 always enables the manageability block after power up. The manageability
clock is stopped only if the manageability function is disabled in the Flash and one of the
functions had transitioned to D0a; otherwise, the manageability block gets the clock and
is able to wait for the new command.
This command allows writing to any MAC or PHY CSR register as part of the Flash
recovery process. This command can be used to write to the Flash and update different
sections in it.
3.4.9 Flash Deadlock Avoidance
The Flash is a shared resource between the following clients:
1. Hardware auto-read.
2. LAN port 0 and LAN port 1 software accesses.
3. Manageability/firmware accesses.
4. Software tools.
All clients can access the Flash using parallel access, on which hardware implements the
actual access to the Flash. Hardware schedules these accesses, avoiding starvation of
any client.
However, the software and firmware clients can access the Flash using bit banging. In
this case, there is a request/grant mechanism that locks the Flash to the exclusive use of
one client. If one client is stuck without releasing the lock, the other clients can no longer
access the Flash. To avoid this deadlock, the X540 implements a timeout mechanism,
which releases the grant from a client that holds the Flash bit-bang interface
(FLA.FL_SCK bit) for more than 2 seconds. If any client fails to release the Flash
interface, hardware clears its grant enabling the other clients to use the interface.
Note:The bit banging interface does not guarantee fairness between the clients,
therefore it should be avoided in nominal operation as much as possible.
When write accesses to the Flash are required the software or
manageability should access the Flash one word at a time releasing the
interface after each word. Software and firmware should avoid holding the
Flash bit-bang interface for more than 500 ms.
The deadlock timeout mechanism is enabled by the Deadlock Timeout Enable bit in the
Control Word 2 in the Flash.
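The grant timeout can be modeled with a small arbiter. This is a behavioral sketch only: the class is hypothetical, time is passed in explicitly in seconds, and the grant is revoked when another request arrives after the timeout has elapsed, which is one possible way to model the hardware clearing the grant.

```python
class BitBangArbiter:
    """Exclusive grant with an optional 2 second deadlock timeout."""

    TIMEOUT_S = 2.0

    def __init__(self, timeout_enabled=True):
        self.timeout_enabled = timeout_enabled   # Deadlock Timeout Enable
        self.owner = None
        self.granted_at = None

    def request(self, client, now):
        """Return True if the client holds the grant after this request."""
        if (self.timeout_enabled and self.owner is not None
                and now - self.granted_at > self.TIMEOUT_S):
            self.owner = None                    # stuck client loses the grant
        if self.owner is None:
            self.owner = client
            self.granted_at = now
        return self.owner == client

    def release(self, client):
        if self.owner == client:
            self.owner = None
```

With the timeout disabled, a client that never releases the grant blocks all other clients, which is the deadlock the mechanism is meant to prevent.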
3.4.10 VPD Support
The Flash image can contain an area for VPD. This area is managed by the OEM vendor
and does not influence the behavior of hardware. Word 0x2F of the Flash image contains
a pointer to the VPD area in the Flash. A value of 0xFFFF means VPD is not supported and
the VPD capability does not appear in the configuration space.
The maximal area size is 256 bytes but can be smaller. The VPD block is built from a list
of resources. A resource can be either large or small. The structure of these resources
is listed in the following tables.
Table 3-12 Small Resource Structure
Offset | 0 | 1 to n
Content | Tag = 0xxx,xyyyb (Type = Small(0), Item Name = xxxx, length = yy bytes) | Data
Table 3-13 Large Resource Structure
Offset | 0 | 1 to 2 | 3 to n
Content | Tag = 1xxx,xxxxb (Type = Large(1), Item Name = xxxxxxxx) | Length | Data
The X540 parses the VPD structure during the auto-load process following PCIe reset in
order to detect the read only and read/write area boundaries. The X540 assumes the
following VPD fields with the limitations listed:
Table 3-14 VPD Structure
Tag | Length (Bytes) | Data | Resource Description
0x82 | Length of identifier string | Identifier | Identifier string.
0x90 | Length of RO area | RO data | VPD-R list containing one or more VPD keywords.
0x91 | Length of RW area | RW data | VPD-W list containing one or more VPD keywords. This part is optional.
0x78 | n/a | n/a | End tag.
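The resource layout in the tables above can be walked with a short parser. This is a sketch, not the parser used by the device: it assumes the two-byte length of a large resource is stored least significant byte first, as in the PCI VPD format, and it does not enforce the alignment rules. The function name is hypothetical.

```python
END_TAG = 0x78


def parse_vpd(area):
    """Return a list of (tag, data_offset, length) up to the end tag."""
    resources = []
    pos = 0
    while pos < len(area):
        tag = area[pos]
        if tag & 0x80:                       # large resource: 2-byte length
            if pos + 3 > len(area):
                raise ValueError("truncated large resource header")
            length = area[pos + 1] | (area[pos + 2] << 8)
            data_offset = pos + 3
        else:                                # small resource: length in tag
            length = tag & 0x07
            data_offset = pos + 1
        if data_offset + length > len(area):
            raise ValueError("resource runs past the end of the VPD area")
        resources.append((tag, data_offset, length))
        if tag == END_TAG:
            return resources
        pos = data_offset + length
    raise ValueError("end tag 0x78 not found")
```

For an area holding a two-byte identifier string, a one-byte VPD-R list, and the end tag, the parser returns three entries with tags 0x82, 0x90, and 0x78.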
VPD structure limitations:
• The structure must start with a Tag = 0x82. If the X540 does not detect a value of
0x82 in the first byte of the VPD area or the structure does not follow the description
of Table 3-14, it assumes the area is not programmed and the entire 256 bytes area
is read only.
• The RO area and RW area are both optional and can appear in any order. A single
area is supported per tag type. Refer to Appendix I in the PCI 3.0 specification for
details of the different tags.
• If a VPD-W tag is found, the area defined by its size is writable via the VPD structure.
• Both read and write sections on the VPD area must be Dword aligned. For example,
each tag must start on a Dword boundary and each data field must end on a Dword
boundary. Write accesses to Dwords that are only partially in the read/write area are
ignored. VPD software is responsible for ensuring the correct alignment to allow a write
to the entire area.
• The structure must end with a Tag = 0x78. The tag must be word aligned.
• The VPD area is accessible for read and write via the EEPROM-mode access only. The
VPD area can be accessed through the PCIe configuration space VPD capability
structure listed in Table 3-14. Write accesses to a read only area or any accesses
outside of the VPD area via this structure are ignored.
• VPD area must be mapped to the first valid 4 KB sector of the Flash.
• VPD software does not check the semaphores before attempting to access the Flash
via dedicated VPD registers. Even if the Flash is owned by another entity, VPD
software read access directed to the VPD area in the Flash might complete
immediately since it is first performed against the shadow RAM. However, VPD
software write access might not complete immediately since the VPD modification is
written into the Flash device at the hardware’s initiative, once the other entity
releases Flash ownership, which may take up to several seconds.
The X540 has four software-defined pins (SDP pins) per port that can be used for
miscellaneous hardware or software-controllable purposes. Unless specified otherwise,
these pins and their function are bound to a specific LAN device. The use, direction, and
values of SDP pins are controlled and accessed by the Extended SDP Control (ESDP)
register. To avoid signal contention, following power-up, all four pins are defined as input
pins.
Some SDP pins have specific functionality:
• The default direction of the SDP pins is loaded from the SDP Control word in the NVM.
• The lower SDP pins (SDP0-SDP2) can also be configured for use as External Interrupt
Sources (GPI). To act as GPI pins, the desired pins must be configured as inputs and
enabled by the GPIE register. When enabled, an interrupt is asserted following a
rising-edge detection of the input pin (rising-edge detection occurs by comparing
values sampled at the internal clock rate, as opposed to an edge-detection circuit).
When detected, a corresponding GPI interrupt is indicated in the EICR register.
• SDP1 pins can also be used to (electrically) disable both PCIe functions altogether.
If the MC is present, the MC-to-LAN path(s) remain fully functional. This
PCIe-Function-Off mode is entered when the SDP1 pins of both ports are driven high while
PE_RST_N is de-asserted. For correct capturing, it is therefore recommended to set
SDP1 pins to their desired levels while the PE_RST_N pin is driven low and to
maintain the setting on the (last) rising edge of PE_RST_N. This ability is enabled by
setting bit 2 (SDP_FUNC_OFF_EN) in PCIe Control 3 Word (offset 0x07) of the NVM.
• The lowest SDP pins (SDP0_0 and SDP1_0) of the two ports can be combined to
encode the NC-SI package ID of the X540. This ability is enabled by setting bit 15
(NC-SI Package ID from SDP) in NC-SI Configuration 2 word (offset 0x07) of the
NVM. The 3-bit package ID is encoded as follows: Package ID = [0, SDP1_0,
SDP0_0], where SDP0_0 is used for the least significant bit.
• When the SDP pins are used as IEEE1588 auxiliary signals, they can generate an
interrupt on any transition (rising or falling edge). Refer to Section 7.9.4.
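The package ID encoding described in the list above, Package ID = [0, SDP1_0, SDP0_0], can be written as a one-line function. This is a small illustration; the function name and the use of integer pin levels (0 or 1) are assumptions.

```python
def ncsi_package_id(sdp0_0, sdp1_0):
    """3-bit package ID: bit 2 is 0, bit 1 is SDP1_0, bit 0 is SDP0_0."""
    return ((sdp1_0 & 1) << 1) | (sdp0_0 & 1)
```

Because the most significant bit is fixed at 0, only package IDs 0 through 3 can be selected this way.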
All SDP pins can be allocated to hardware functions. See Section 7.9.4 for more details
on the IEEE1588 auxiliary functionality; the I/O pin functionality is programmed by
the TimeSync Auxiliary Control (TSAUXC) register.
If mapping of these SDP pins to a specific hardware function is not required then the pins
can be used as general purpose software defined I/Os. For any of the function-specific
usages, the SDP I/O pins should be set to native mode by software setting of the
SDPxxx_NATIVE bits in the ESDP register. Native mode in those SDP I/O pins defines the
pin functionality in the inactive state (reset or power down), while behavior in the
active state is controlled by software. The hardware functionality of these SDP I/O pins
differs mainly by the active behavior controlled by software.
Table 3-15 lists the setup required to achieve each of the possible SDP configurations.
Table 3-15 SDP Settings
SDP | Usage | NVM Settings | GPI Register Settings | ESDP SDPx_NATIVE | ESDP SDPx_IODIR | ESDP SDP1_Function
0 | SDP | NC-SI Package ID from SDP = 0 | SDP0_GPIEN=0 | 0 | Input/Output | N/A
0 | GPI (EICR bit 25) | NC-SI Package ID from SDP = 0 | SDP0_GPIEN=1 | 0 | Input | N/A
0 | NC-SI package ID | NC-SI Package ID from SDP = 1 | SDP0_GPIEN=0 | 0 | Input | N/A
0 | 1588 functionality: Drive Target Time 0 /Clock Out | NC-SI Package ID from SDP = 0 | SDP0_GPIEN=0 | 1 | Output | N/A
1 | SDP | SDP_FUNC_OFF_EN = 0 | SDP1_GPIEN=0 | 0 | Input/Output |
1 | GPI (EICR bit 26) | SDP_FUNC_OFF_EN = 0 | SDP1_GPIEN=1 | 0 | Input |
1 | PCI disable | SDP_FUNC_OFF_EN = 1 | SDP1_GPIEN=0 | 0 | N/A |
1 | 1588 functionality: Drive Target Time 1 | SDP_FUNC_OFF_EN = 0 | SDP1_GPIEN=0 | 1 | Output | 0
1 | Thermal sensor hot indication | SDP_FUNC_OFF_EN = 0 | SDP1_GPIEN=0 | 1 | Output | 1
2 | SDP | N/A | SDP2_GPIEN=0 | 0 | Input/Output | N/A
2 | GPI (EICR bit 27) | N/A | SDP2_GPIEN=1 | 0 | Input | N/A
2 | 1588 functionality: Sample time in Auxiliary Time Stamp 0 register | N/A | SDP2_GPIEN=0 | 1 | Input | N/A
3 | SDP | N/A | N/A | 0 | Input/Output | N/A
3 | 1588 functionality: Sample time in Auxiliary Time Stamp 1 register | N/A | N/A | 1 | Input | N/A
3.6 Network Interface
3.6.1 Overview
The X540 provides dual-port network connectivity with copper media. Each port includes
integrated MAC-PHY functionalities and can be operated at either 10 GbE, 1 GbE, or 100
BASE-T(X) link speed. In terms of functionality there is no primary and secondary port as
each port can be enabled or disabled independently from the other, and they can be set
at different link speeds.
The integrated PHYs support the following specifications:
• 10GBASE-T as per the IEEE 802.3an standard.
• 1000BASE-T and 100BASE-TX as per the IEEE 802.3 standard.
Note: Designers are assumed to be familiar with the specifications included in
these standards, whose content is not repeated in subsequent
sections.
All MAC configuration is performed using Device Control registers mapped into system
memory or I/O space; an internal MDIO/MDC interface, accessible via software, is used
to configure the PHY operation.
3.6.2 Internal MDIO Interface
The X540 implements an internal IEEE 802.3 Management Data Input/Output Interface
(MDIO Interface or MII Management Interface) between each MAC and its attached
integrated PHY. This interface provides firmware and software the ability to monitor and
control the state of the PHY. It provides indirect access to an internal set of addressable
PHY registers. It complies with the protocol defined by Clause 45 of the IEEE 802.3
standard and is not backward compatible with Clause 22.
Note: MDIO access to PHY registers must be operational from the time the PHY
has completed its initialization, after it has read the PHY image from the
NVM.
Note: During internal PHY reset events where the MAC is not reset, PHY registers
might not be accessible and the MDIO access does not complete. Software
is notified that PHY initialization and/or reset has completed by either
polling or by a PHY reset done interrupt (see Section 3.6.3.4.3).
The internal MDIO interface is accessed through registers MSCA and MSRWD. An access
transaction to a single PHY register is performed by setting bit MSCA.MDICMD to 1b after
programming the appropriate fields in the MSCA and MSRWD registers. The
MSCA.MDICMD bit is auto-cleared after the read or write transaction completes.
To execute a write access, perform the following steps:
1. Address Cycle - Register MSCA is initialized with the appropriate PHY register address
in the MDIADD, DEVADD, and PORTADD fields, the OPCODE field set to 00b, and the MDICMD
bit set to 1b.
2. Poll MSCA.MDICMD bit until it is read as 0b.
3. Write Data Cycle - Data to be written is programmed in field MSRWD.MDIWRDATA.
4. Write Command Cycle - OPCODE field in the MSCA register is set to 01b for a write
operation and bit MSCA.MDICMD set to 1b.
5. Wait for bit MSCA.MDICMD to reset to 0b, which indicates that the transaction on the
internal MDIO interface completed.
To execute a read access, perform the following steps:
1. Address Cycle - Register MSCA is initialized with the appropriate PHY register address
in the MDIADD, DEVADD, and PORTADD fields, the OPCODE field set to 00b, and the MDICMD
bit set to 1b.
2. Poll MSCA.MDICMD bit until it is read as 0b.
3. Read Command Cycle - OPCODE field in the MSCA register is set to 11b for a read
operation and bit MSCA.MDICMD set to 1b.
4. Wait for bit MSCA.MDICMD to reset to 0b, which indicates that the transaction on the
internal MDIO interface completed.
5. Read Data Cycle - Read the data in field MSRWD.MDIRDDATA.
Note: A read-increment-address flow is performed if the OPCODE field is set to
10b in step 3. The address is incremented internally once data is read at
step 5 so that no address cycle is needed to perform a data read from the
next address.
Note: Before writing the MSCA register, make sure that the MDIO interface is
ready to perform the transaction by reading MSCA.MDICMD as 0b.
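The write and read flows above can be sketched in software as follows. This is a minimal illustration, not driver code: the bit positions of the MSCA and MSRWD fields are assumptions made for the example (this section does not define them), and `FakeRegs` is a hypothetical stand-in for real register access so that the sketch can run without hardware.

```python
# Sketch of the MSCA/MSRWD access flow. Field positions are ASSUMED for
# illustration; take the real layout from the MSCA/MSRWD register descriptions.
MDIADD_SHIFT, DEVADD_SHIFT, PORTADD_SHIFT, OPCODE_SHIFT = 0, 16, 21, 26
MDICMD = 1 << 30
OP_ADDRESS, OP_WRITE, OP_READ = 0b00, 0b01, 0b11


def _msca(opcode, port, dev, addr):
    """Build an MSCA value with MDICMD set to start a transaction."""
    return ((addr << MDIADD_SHIFT) | (dev << DEVADD_SHIFT) |
            (port << PORTADD_SHIFT) | (opcode << OPCODE_SHIFT) | MDICMD)


def _wait_idle(regs, max_polls=1000):
    """Poll MSCA.MDICMD until it reads 0b; give up after max_polls reads."""
    for _ in range(max_polls):
        if not regs.read("MSCA") & MDICMD:
            return
    raise TimeoutError("MSCA.MDICMD did not clear")


def mdio_write(regs, port, dev, addr, value):
    _wait_idle(regs)                                        # interface ready
    regs.write("MSCA", _msca(OP_ADDRESS, port, dev, addr))  # 1. address cycle
    _wait_idle(regs)                                        # 2. poll MDICMD
    regs.write("MSRWD", value & 0xFFFF)                     # 3. write data
    regs.write("MSCA", _msca(OP_WRITE, port, dev, addr))    # 4. write command
    _wait_idle(regs)                                        # 5. completion


def mdio_read(regs, port, dev, addr):
    _wait_idle(regs)                                        # interface ready
    regs.write("MSCA", _msca(OP_ADDRESS, port, dev, addr))  # 1. address cycle
    _wait_idle(regs)                                        # 2. poll MDICMD
    regs.write("MSCA", _msca(OP_READ, port, dev, addr))     # 3. read command
    _wait_idle(regs)                                        # 4. completion
    return (regs.read("MSRWD") >> 16) & 0xFFFF              # 5. read data


class FakeRegs:
    """Hypothetical stand-in for the MAC registers and the attached PHY."""

    def __init__(self):
        self.regs = {"MSCA": 0, "MSRWD": 0}
        self.phy = {}  # (port, dev, addr) -> 16-bit register value

    def read(self, name):
        return self.regs[name]

    def write(self, name, value):
        if name == "MSCA" and value & MDICMD:
            key = ((value >> PORTADD_SHIFT) & 0x1F,
                   (value >> DEVADD_SHIFT) & 0x1F,
                   (value >> MDIADD_SHIFT) & 0xFFFF)
            opcode = (value >> OPCODE_SHIFT) & 0b11
            if opcode == OP_WRITE:
                self.phy[key] = self.regs["MSRWD"] & 0xFFFF
            elif opcode == OP_READ:
                self.regs["MSRWD"] = self.phy.get(key, 0) << 16
            value &= ~MDICMD  # the transaction completes immediately here
        self.regs[name] = value
```

On real hardware the poll loop would normally be bounded by a time-out rather than a read count, and access would be serialized with firmware through the PHY semaphore.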
3.6.3 Integrated Copper PHY Functionality
3.6.3.1 PHY Performance
3.6.3.1.1 Reach
Table 3-16 BER and Ranges vs. Link Speed and Cable Types
Speed         Cable     Committed Reach               Committed BER
10GBASE-T     CAT-7     Full reach: 100 m             < 10^-16 / 10^-12
10GBASE-T     CAT-6a    Full reach: 100 m             < 10^-16 / 10^-12
10GBASE-T     CAT-6     55 m                          < 10^-16 / 10^-12
1000BASE-T    CAT-5e    Full reach: 130 m / 100 m     < 10^-15 / 10^-10
100BASE-TX    CAT-5e    Full reach: 130 m / 100 m     < 10^-14 / 10^-8
Note: Reaches specified in Table 3-16 refer to real cable lengths and not to the
IEEE standard model.
3.6.3.1.2 MDI / Magnetics Spacing
The X540 supports a variable distance of 0 to 4 inches with the magnetics.
3.6.3.1.3 Cable Discharge
The X540 is capable of passing the Intel cable discharge test.
3.6.3.2 Auto-Negotiation and Link Setup
Link configuration is determined by PHY auto-negotiation with the link partner. The
software device driver must change auto-negotiation settings in cases where a successful
link is not negotiated or the designer desires to change link properties. Note that the link
partner should always have auto-negotiation enabled.
3.6.3.2.1 Automatic MDI Cross-Over and Lane Inversion
Note: The X540 uses an automatic MDI/MDI-X configuration. Intel recommends
using straight through cables. Where crossover cables are used, all four
pairs must be crossed. Using crossover cables where only some pairs are
crossed is not supported and might result in link failure or slow links.
Twisted pair Ethernet PHYs must be correctly configured for MDI (no cross-over) or
MDI-X (cross-over) operation to interoperate. This has historically been accomplished using
special patch cables, magnetics pinouts, or Printed Circuit Board (PCB) wiring. The PHY
supports the automatic MDI/MDI-X configuration (that is, automatic cross-over detection)
originally developed for 1000BASE-T and standardized in IEEE 802.3 Clause 40, at any
link speed and also during auto-negotiation. Manual (non-automatic) MDI/MDI-X
configuration is still possible via bits 1:0 of the Auto-Negotiation Reserved Vendor
Provisioning 1 register at address 7.C410.
In addition to supporting MDI/MDI-X, the PHY supports lane inversion (MDI swap) of the
ABCD pairs to DCBA. This is useful for tab up or tab down RJ45 or integrated magnetics
modules on the board. The default setting is ABCD on PHY0 and DCBA on PHY1. One
dedicated pin per PHY (PHY0_RVSL / PHY1_RVSL) controls the MDI configuration for
MDI reversal (that is, ABCD to DCBA pair inversion). It is also configurable via provisional
PHY register 1.E400.
Figure 3-12 Cross-Over Function
3.6.3.2.2 Auto-Negotiation Process
The integrated copper PHY performs the auto-negotiation function. Auto-negotiation
provides a method for two link partners to exchange information in a systematic manner
in order to establish a link configuration providing the highest common level of
functionality supported by both partners. Once configured, the link partners exchange
configuration information to resolve link settings such as:
• Speed: 100/1000 Mb/s or 10 Gb/s
• Link flow control operation (known as PAUSE operation)
Note: When operating in Data Center Bridging (DCB) mode, generally, priority
flow control is used instead of link flow control, and it is negotiated via
a higher layer protocol (DCBx protocol) and not via auto-negotiation. Refer to
Section 3.6.5.
Note: Each PHY is capable of successfully auto-negotiating with any device that
supports 100 Mb/s or higher Ethernet, regardless of its method of Power
over Ethernet (PoE) detection.
Note: The X540 supports only full duplex mode of operation at any speed.
PHY specific information required for establishing the link is also exchanged.
If link flow control is enabled in the X540, the settings for the desired flow control
behavior must be set by software in the PHY registers and auto-negotiation must be restarted.
After auto-negotiation completes, the software device driver must read the PHY registers
to determine the resolved flow control behavior of the link and reflect it in the MAC
register settings (FCCFG.TFCE and MFLCN.RFCE).
Once PHY auto-negotiation completes, the PHY asserts a link-up indication to the MAC,
which can notify software by an interrupt if the Link Status Change (LSC) interrupt is
enabled. The resolved speed is also indicated by the PHY to the MAC. The status of both
is reported to software via the LINKS.LINK UP and LINKS.LINK_SPEED bits.
3.6.3.2.2.1 Speed Resolution and Partner Presence
At the end of the auto-negotiation process, the link speed is automatically set to the
highest common denominator between the abilities advertised by the link partners.
If there is no common denominator, the PHY asserts the Device Present bit
(Auto-Negotiation Reserved Vendor Status 1: Address 7.C810, bit E) if it detected valid link
pulses during auto-negotiation even though there is no common link speed with the link
partner. This bit is valid only if auto-negotiation is enabled.
If the PHY training sequence cannot complete properly even though auto-negotiation
completed, the PHY retries auto-negotiation a programmable number of times
(set by PHY register 7.C400: 3:0) before downshifting cyclically. Downshifting is enabled
by PHY register 7.C400: 4. Automatic downshifting events are reported by the
Automatic Downshift bit in PHY register 7.CC00.
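As a rough illustration of the "highest common denominator" rule above, the sketch below picks the fastest speed advertised by both link partners. The function name and the use of Mb/s integers are assumptions made for the example. The retry and cyclic downshift behavior is handled by the PHY and is not modeled here.

```python
def resolve_speed(local_mbps, partner_mbps):
    """Return the highest speed (in Mb/s) advertised by both sides, or None.

    None corresponds to the "no common denominator" case, in which the
    PHY can still report that a link partner is present.
    """
    common = set(local_mbps) & set(partner_mbps)
    return max(common) if common else None
```

For example, a port advertising 100, 1000, and 10000 Mb/s against a partner advertising only 100 and 1000 Mb/s resolves to 1000 Mb/s.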
3.6.3.2.2.2 Link Flow Control Resolution
Flow control is a function that is described in Clause 31 of the IEEE 802.3 standard. It
allows congested nodes to pause traffic. Flow control is essentially a MAC-to-MAC
function. PHYs indicate their MAC's ability to implement flow control during
auto-negotiation. These advertised abilities are controlled through two bits in the
auto-negotiation registers (Auto-negotiation Advertisement Register: Address 7.10), bits 5
and 6 for PAUSE and Asymmetric PAUSE, respectively.
After auto-negotiation, the link partner's flow control capabilities are indicated in the
Auto-Negotiation Link Partner Base Page Ability Register: Address 7.13, bits 5 and 6.
There are two forms of flow control that can be established via auto-negotiation:
symmetric and asymmetric. Symmetric flow control was defined originally for
point-to-point links, and asymmetric for hub-to-end-node connections. Symmetric flow control
enables either node to flow-control the other. Asymmetric flow control enables a repeater
or switch to flow-control a DTE, but not vice versa.
Generally, either symmetric PAUSE is used or PAUSE is disabled, even between an
end-node and a switch.
Table 3-17 lists the intended operation for the various settings of ASM_DIR and PAUSE.
This information is provided for reference only; it is the responsibility of the software to
implement the correct function. The PHY merely enables the two MACs to communicate
their abilities to each other.
Table 3-17 Pause And Asymmetric Pause Settings
Local and Remote              Local Pause   Remote Pause   Result
ASM_DIR Settings              Setting       Setting
Both ASM_DIR = 1b             1             1              Symmetric - Either side can flow control the other.
                              1             0              Asymmetric - Remote can flow control local only.
                              0             1              Asymmetric - Local can flow control remote.
                              0             0              No flow control.
Either or both ASM_DIR = 0b   1             1              Symmetric - Either side can flow control the other.
                              Either or both = 0           No flow control.
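Because resolving the flow control behavior is left to software, the intended operation listed in Table 3-17 can be sketched as a small function. The function name and the (tx_pause, rx_pause) return convention are assumptions made for this example. A driver would then reflect the result in FCCFG.TFCE and MFLCN.RFCE, whose encodings are not covered here.

```python
def resolve_pause(local_pause, local_asm_dir, remote_pause, remote_asm_dir):
    """Resolve link flow control for the local node per Table 3-17.

    Returns (tx_pause, rx_pause):
      tx_pause - local may send PAUSE frames (local can flow control remote)
      rx_pause - local honors received PAUSE frames (remote can flow
                 control local)
    """
    if local_pause and remote_pause:
        return (True, True)       # symmetric, whatever ASM_DIR says
    if local_asm_dir and remote_asm_dir:
        if local_pause and not remote_pause:
            return (False, True)  # asymmetric: remote flow controls local
        if remote_pause and not local_pause:
            return (True, False)  # asymmetric: local flow controls remote
    return (False, False)         # no flow control
```

The arguments are the PAUSE and ASM_DIR bits advertised locally and the bits read back from the link partner ability register after auto-negotiation completes.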
3.6.3.2.3 Fast Retrain
In 10GBASE-T mode, the X540 PHY supports the Cisco Fast Retrain mode. If enabled, the
PHY, upon losing frame synchronization, can inject a programmable ordered set onto the
line that tells the far-end PHY to implement a very short resynchronization sequence to
enable the near-end PHY to re-acquire frame synchronization. This saves roughly four
seconds of link-reconnection time on simple link breaks, because the two-second link
break time-out and the subsequent re-auto-negotiation are avoided.
This X540 feature requires that the far-end PHY support this proprietary mode as well.
Fast Retrain capability exchange is done during the auto-negotiation flow.
Fast Retrain mode is enabled via PHY registers 1E.C475 and 1.E400.
3.6.3.3 PHY Initialization
3.6.3.3.1 PHY Boot
Each PHY has an embedded microprocessor (MCP). Each MCP has its own Instruction RAM
(IRAM) and Data RAM (DRAM). The MCP code/data segment and the PHY default
configuration are fetched from the external Flash device right after power-on reset, and
also when a PHY MMD register bit is set to force a reload (Global General Provisioning 3:
Address 1E.C442, bit 0).
PHY access to the Flash device is controlled by the MAC. Assuming the MAC grants the PHY
back-to-back access to the Flash, the PHY initialization process should take
less than 200 ms, at the end of which a PHY reset done interrupt is issued and/or
reported in PHY register 1E.CC00.6.
The internal MDIO interface provides access to the PHY registers, but it does not provide
software with the ability to overwrite the PHY image located in the NVM. MDIO access is
done via dedicated MAC registers only.
The X540 maintains a CRC-16 (standard CCITT CRC: x^16 + x^12 + x^5 + 1) over the PHY
image in the NVM, and checks this on NVM loads. Inversion of the CRC after calculation is
not required. If a CRC error occurs, the PHY image is reloaded. If an error also
occurs on the second try, the PHY is stopped and a fatal interrupt is generated to the
host.
The default configuration read from the Flash overrides the default register values of the PHY.
The same MCP code/data segment is auto-loaded to both PHYs, but each PHY has its own
default configuration.
The MCP code/data segment and default configuration read from the Flash are stored in
internal shadow RAMs. At PHY reset events, which are either issued by software (Global
Standard Control 1: Address 1E.0000, bit F) or internally by the MAC, there is a reset of
the micro controller; however, there is no reload of ISRAM/DSRAM from the Flash. The
micro controller begins executing instructions out of internal memory loaded from the
previous Flash load. The same applies to the PHY registers, which retrieve their default
values loaded from the previous Flash load.
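For reference, the CCITT polynomial x^16 + x^12 + x^5 + 1 named above (0x1021) and the load-then-retry-once behavior can be sketched as follows. This section specifies the polynomial and states that no final inversion is needed. The initial value, the MSB-first bit order, and the helper names `fetch_image` and `load_phy_image` are assumptions made for this example.

```python
CCITT_POLY = 0x1021  # x^16 + x^12 + x^5 + 1


def crc16_ccitt(data: bytes, init: int = 0xFFFF) -> int:
    """Bitwise CRC-16 over data, MSB first, with no final inversion.

    The initial value is an ASSUMPTION; this section does not state it.
    """
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ CCITT_POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def load_phy_image(fetch_image, stored_crc, attempts=2):
    """Model of "load, check CRC, reload once on error".

    fetch_image is a hypothetical callable returning the image bytes.
    Returns True on a good load. False means both tries failed, which is
    where the PHY is stopped and a fatal interrupt is raised.
    """
    for _ in range(attempts):
        if crc16_ccitt(fetch_image()) == stored_crc:
            return True
    return False
```

A table-driven implementation would be the usual choice for large images; the bitwise form is shown only because it maps directly onto the polynomial.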
3.6.3.3.2 PHY Power-Up Operations
The integrated PHY is designed to perform the following operations at boot:
1. Power-up calibration of VCOs and power supplies.
2. Provision stored default values (from Flash into internal data RAM and then into PHY
registers).
3. Calibration of the Analog Front-End (AFE).
4. Cable diagnostics.
5. Auto-negotiation.
6. Perform training (as required).
7. If running in 10GBASE-T mode, and power minimization mode is enabled, shut down
unused taps.
8. Verify error-free operation.
9. Enter steady state.
3.6.3.3.3 PHY Reset
Each PHY protects its data RAM via parity bits and its code RAM via ECC. In the event
data corruption is detected, a PHY fault interrupt is issued (see Section 3.6.3.4.1).
Each PHY supports a watchdog timer to detect a stuck micro controller. Upon failure, a
PHY fault interrupt is issued as well. The watchdog timer is set to 5 seconds by default.
The PHY is also reset on the same occasions that the MAC is reset, except on software reset
events, for which the PHY does not get reset. A dedicated PHY reset command is provided
to software instead, via a PHY register (Global Standard Control 1: Address 1E.0000, bit F).
Refer to Section 4-4.
At PHY reset events, all the PHY functionalities are reset, including the micro controller,
except the PHY PLLs, which are reset only at power-up.
PHY reset completion is expected to take up to 5 ms, with no MDIO access during that
time. A PHY reset event causes link failure, and resuming the link via auto-negotiation
can take up to several seconds.
3.6.3.4 PHY Interrupts
The interrupt structure of each internal PHY is hierarchical in nature, and allows masking
of all interrupts at each level of the hierarchy. The PHY has two interrupt
hierarchies: one is fully Clause 45 compliant; the other is vendor defined and is
intended to allow determining the cause of an interrupt with only two status reads.
The values of these interrupt masks are visible via the internal MDIO interface in the
vendor specific areas of each MMD, and the global summary register is located in the
vendor specific area of the PHY registers (Global PHY Standard Interrupt Flags:
Address 1E.FC00 and Global PHY Vendor Interrupt Flags: Address 1E.FC01).
The interrupt structure of each PHY is such that all standards-based interrupts can be
read and cleared using a maximum of two PHY register reads.
There are two types of PHY interrupts according to their severity, normal or fatal:
• Fatal PHY interrupts are reported together with other fatal interrupts by the ECC bit
in the EICR register. They concern the following events:
— ECC error when reading PHY micro controller code
— CRC error on the second attempt to load the PHY image from the NVM
— PHY micro controller watchdog failure
• Normal PHY interrupts are reported by the PHY Global Interrupt bit in EICR register.
They concern all other PHY interrupt causes.
Note: The PHY micro controller never resets itself in response to a fatal interrupt or to any
other event. The host is responsible for resetting the link in such situations. The
link is down until then.
Most of the interrupt causes are useful mainly for debugging the PHY hardware. Therefore,
they are masked by default and should remain so unless a specific need arises.
By default, Link State Change and Global Fault are the only interrupts that should be
unmasked by software. To enable them, software should set the following bits:
• 1E.FF01.C and 1E.FF01.2 — PHY vendor mask
• 1E.D400.4 — Enable chip fault interrupt
• 1E.FF00.8 — Enable standard autoneg interrupt 1
Additionally, software can enable an interrupt on reset complete:
• 1E.D400.6 — Enable reset done interrupt
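The bit references above use an MMD.address.bit notation with hexadecimal digits. As a small sketch (the helper names are assumptions made for this example), they can be turned into one OR-mask per PHY register, which software would apply with a read-modify-write over the internal MDIO interface:

```python
from collections import defaultdict

# Bit references exactly as listed above (MMD.address.bit, all hexadecimal).
DEFAULT_ENABLES = ["1E.FF01.C", "1E.FF01.2", "1E.D400.4", "1E.FF00.8"]
RESET_DONE_ENABLE = "1E.D400.6"


def parse_bit_ref(ref):
    """Split 'MMD.ADDR.BIT' into integers, for example (0x1E, 0xFF01, 12)."""
    mmd, addr, bit = ref.split(".")
    return int(mmd, 16), int(addr, 16), int(bit, 16)


def enable_masks(include_reset_done=False):
    """Return {(mmd, addr): mask} of the bits to set in each register."""
    refs = list(DEFAULT_ENABLES)
    if include_reset_done:
        refs.append(RESET_DONE_ENABLE)
    masks = defaultdict(int)
    for ref in refs:
        mmd, addr, bit = parse_bit_ref(ref)
        masks[(mmd, addr)] |= 1 << bit
    return dict(masks)
```

Each mask should be ORed into the current register value rather than written directly, so that other mask bits keep their existing state.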
3.6.3.4.1 PHY Fault Interrupt
In the event of a PHY fatal error, 1E.CC00.4 is set and an error code is written to
1E.C850. Software should log this code and attempt to reset the PHY.
Among others, a fatal interrupt is generated on one of the following events:
• CRC error over the PHY image when trying to load it from Flash twice without success
• ECC error in one of the PHY’s internal memories that contain control data
• Watchdog failure of the PHY embedded micro controller
In reaction to a fatal error, the MAC drops the link until the fatal error is cleared. Software
is therefore required to reset the link (not only the PHY).
If three fatal PHY interrupts are handled with no link-up event in between, the link shall
be considered to be down and the port shall be disabled.
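The host-side policy described above (log the error code, reset the link, and give up after three fatal interrupts with no link-up in between) can be sketched as a small state tracker. The class and method names are hypothetical, and the actual link reset and port disable actions are left to the caller.

```python
class FatalPhyTracker:
    """Counts fatal PHY interrupts that occur with no link-up in between."""

    LIMIT = 3  # fatal interrupts tolerated before the port is disabled

    def __init__(self):
        self.fatal_count = 0
        self.port_disabled = False

    def on_link_up(self):
        """A link-up event ends the current run of fatal interrupts."""
        self.fatal_count = 0

    def on_fatal_interrupt(self, error_code, log=print):
        """Record one fatal interrupt.

        error_code is the value software read from 1E.C850. Returns True
        if the caller should reset the link, or False once the port has
        to be considered down and disabled.
        """
        log(f"PHY fatal error code 0x{error_code:04X}")
        self.fatal_count += 1
        if self.fatal_count >= self.LIMIT:
            self.port_disabled = True
        return not self.port_disabled
```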
3.6.3.4.2 Link State Change Interrupt
When an interrupt is caused by a change in the link state, bit 7.1.2 latches low. The
actual link state can be found in register 1.E800.0.
Register   Bit(s)   Field                     Values
7.C800     0        Connect Type (Duplex)     1b = Full.
                                              0b = Half.
7.C810     F        Energy Detect             1b = Detected.
7.C800     E        Far End Device Present    1b = Present.
7.C800     D:9      Connection State          0x00 = Inactive (such as low-power or high-impedance).
                                              0x01 = Cable diagnostics.
                                              0x02 = Auto-negotiation.
                                              0x03 = Training (10 GbE and 1 GbE only).
                                              0x04 = Connected.
                                              0x05 = Fail (waiting to retry auto-negotiation).
                                              0x06 = Test mode.
                                              0x07 = Loopback mode.
                                              0x08 = Reserved.
                                              0x09 = Reserved.
                                              0x0A = Reserved.
                                              0x0B:0x10 = Reserved.
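As an illustration, the Connection State field (bits D:9) listed above can be decoded from a 16-bit register value as follows. The function name is an assumption made for this example, and the register value is expected to have been read over the internal MDIO interface beforehand.

```python
CONNECTION_STATES = {
    0x00: "Inactive",
    0x01: "Cable diagnostics",
    0x02: "Auto-negotiation",
    0x03: "Training",
    0x04: "Connected",
    0x05: "Fail",
    0x06: "Test mode",
    0x07: "Loopback mode",
}


def connection_state(reg_value):
    """Extract bits D:9 (a 5-bit field) and map them to a state name.

    Codes not named in the list above are reported as "Reserved".
    """
    code = (reg_value >> 9) & 0x1F
    return CONNECTION_STATES.get(code, "Reserved")
```

For example, a register value of 0x0800 carries the code 0x04 in bits D:9 and decodes to "Connected".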
3.6.3.4.3 Reset Done Interrupt
If software has enabled the reset done interrupt, such an event generates an interrupt,
which is indicated by bit 1E.CC00.6 being set. Note that a boot complete event is
simultaneous with the reset event.
3.6.3.4.4 PHY Interrupt Handling Flow
Firmware is responsible for guaranteeing an operational PHY even when the host is down or
malfunctioning, in order to:
• Provide a remote access to MC from the network
• Receive WoL packets
Firmware cannot be sure that the host is functioning properly, so it always handles
PHY interrupts first. Once it has completed its handling of PHY interrupts, firmware
sets the relevant EEMNGCTL.CFG_DONE0/1 bit and notifies the host that it can start its own
handling by issuing an EICR.MNG interrupt. Since the PHY interrupt flags are cleared by
read, the following flow shall be run by host and firmware whenever a PHY interrupt
occurs:
1. Host does not attempt to take ownership over the PHY semaphore until CFG_DONE
bit is set by firmware.
— In case the PHY semaphore is currently owned by the host, it stops accessing
PHYINT_STATUS or PHY registers and releases the PHY ownership as soon as
possible. Refer to Section 11.7.5 for the maximum semaphore ownership time
allowed.
2. Firmware takes ownership of PHY semaphore
3. Firmware copies the PHY interrupt flags read from PHY registers into the
PHYINT_STATUS registers
— When writing PHYINT_STATUS registers firmware shall not clear bits that were
not cleared by the host yet
4. Firmware handles the PHY interrupt by resetting the PHY (only if it is a fatal PHY
interrupt)
5. Firmware sets CFG_DONE bit, releases ownership of the PHY semaphore, and issues
EICR.MNG interrupt to host.
6. Host takes semaphore ownership over the PHY.
7. Host reads the PHYINT_STATUS registers and clears them (by writing zeros)
8. Host handles the PHY interrupts.
9. Before performing a PHY reconfiguration that might drop the link (for example, restarting
auto-negotiation), the host must wait until the VETO bit is read as 0b.
10. Host releases PHY semaphore.
Note: CFG_DONE bits are set by firmware and cleared by software. They cannot
be cleared by firmware, and cannot be set by software.
Note: To simplify drivers, firmware runs the above flow even if there is no MC
or WoL. No wake up of the host occurs for the fatal PHY events handled by
firmware.
Note: PHYINT_STATUS registers and EEMNGCTL.CFG_DONE bits are reset by
hardware only at power-up events.
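The firmware/host hand-off above can be modeled in a few lines to show why firmware must accumulate flags rather than overwrite them. This is a toy model under stated assumptions: the class and method names are hypothetical, the semaphore is reduced to an owner field, the point at which the host clears CFG_DONE is assumed (the flow does not state it), and the VETO wait of step 9 is not modeled.

```python
class PhyInterruptMailbox:
    """Toy model of PHYINT_STATUS, CFG_DONE, and the PHY semaphore."""

    def __init__(self):
        self.phyint_status = 0
        self.cfg_done = False
        self.semaphore_owner = None

    def firmware_service(self, phy_flags, fatal=False, reset_phy=lambda: None):
        if self.semaphore_owner is not None:
            raise RuntimeError("PHY semaphore is busy")
        self.semaphore_owner = "firmware"   # step 2: take the semaphore
        self.phyint_status |= phy_flags     # step 3: OR keeps uncleared bits
        if fatal:
            reset_phy()                     # step 4: fatal interrupts only
        self.cfg_done = True                # step 5: set CFG_DONE ...
        self.semaphore_owner = None         # ... and release the semaphore
        # An EICR.MNG interrupt to the host would be issued here.

    def host_service(self):
        if not self.cfg_done:               # step 1: wait for CFG_DONE
            return None
        self.semaphore_owner = "host"       # step 6: take the semaphore
        flags = self.phyint_status          # step 7: read the flags ...
        self.phyint_status = 0              # ... and clear by writing zeros
        self.cfg_done = False               # ASSUMPTION: cleared by host here
        # Step 8 (handle the interrupts) would run here.
        self.semaphore_owner = None         # step 10: release the semaphore
        return flags
```

If firmware services two PHY interrupts before the host runs, the host sees both sets of flags on its next pass, which is the behavior the "shall not clear bits" rule in step 3 protects.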
When the host is down, interrupts from MAC blocks which are critical for MC/WoL are also
handled by the firmware:
• ECC-Error from Security Rx/Tx blocks
• ECC-Error from Rx-Filter
• ECC-Error from DMA-Tx
3.6.3.5 Cable Diagnostics
The PHY implements a powerful cable diagnostic algorithm to accurately measure all of
the TDR and TDT sequences within the group of four channels. The algorithm used
transmits a pseudo-noise sequence with an amplitude of less than 300 mV for a brief
period of time during startup. From the results of this measurement, the length of each
pair, the top four impairments along the pair, and the impedance of the cable are flagged.
These measurements are accurate to ±1 m under the assumption of the ISO 11801 cable