Intel Gigabit Ethernet Controllers, PCI-X, PCI User Manual

PCI/PCI-X Family of Gigabit Ethernet Controllers Software Developer’s Manual

82540EP/EM, 82541xx, 82544GC/EI, 82545GM/EM, 82546GB/EB, and 82547xx

317453-005

Revision 3.8

Legal Notice

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

Intel products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications.

Intel may make changes to specifications and product descriptions at any time, without notice.

This document contains information on products in the design phase of development. The information here is subject to change without notice. Do not finalize a design with this information.

Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.

This product has not been tested with every possible configuration/setting. Intel is not responsible for the product’s failure in any configuration/setting, whether tested or untested.

The Intel product(s) discussed in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

Copies of documents which have an ordering number and are referenced in this document, or other Intel literature, may be obtained from:

Intel Corporation P.O. Box 5937 Denver, CO 80217-9808

or call in North America 1-800-548-4725, Europe 44-0-1793-431-155, France 44-0-1793-421-777, Germany 44-0-1793-421-333, other countries 708-296-9333.

Intel is a trademark or registered trademark of Intel Corporation or its subsidiaries in the United States and other countries.

*Other names and brands may be claimed as the property of others.

Revision History

Date Version Comments

June 2008 3.8 Updated EEPROM Word 21h bit descriptions (section 5.6.18).

June 2008 3.7 Updated Sections 13.4.30 and 13.4.31 (added text stating to use the Interrupt Throttling Register (ITR) instead of registers RDTR and RADV for applications requiring an interrupt moderation mechanism).

Jan 2007 3.6 Added a note to sections 13.4.20 and 13.4.21 for the 82547GI/EI.

Sept 2007 3.5 Updated section 13.4.16.

May 2007 3.4 Updated section 6.4.1. Changed acronym “WCR” to “WUC”.

Dec 2006 3.3 Updated Table 13-87. Changed bit 24 settings to: 0b = Cache line granularity. 1b = Descriptor granularity.

June 2006 3.2 Updated Table 13.47. Changed the default setting of reserved bit 3 from 0b to 1b.

April 2006 3.1 Added bit definitions (bits 9:8) to PHY register PSCON (16d). Updated Figure 3.2 (added Receive Queue artwork). Changed 81541ER-C0 to 82541ER-CO in Table 5-1.

Nov 2005 3.0 Updated Device Control/Status, EEPROM Flash Control & Data, Extended Device Control, and TCTL register bit assignments. Updated PHY register 00d - 03d, 07d, 09d, 17d - 21d, and 23d bit assignments.

July 2005 2.5 Initial Public Release.

Contents

1 Introduction ..................................................................................................................1

1.1 Scope .................................................................................................................... 1

1.2 Overview ...............................................................................................................1

1.3 Ethernet Controller Features .................................................................................2

1.3.1 PCI Features ........................................................................................ 2

1.3.2 CSA Features (82547GI/EI Only) .........................................................2

1.3.3 Network Side Features......................................................................... 2

1.3.4 Host Offloading Features ..................................................................... 3

1.3.5 Additional Performance Features.........................................................4

1.3.6 Manageability Features (Not Applicable to the 82544GC/EI or 82541ER) ............................................................................................. 5

1.3.7 Additional Ethernet Controller Features ............................................... 5

1.3.8 Technology Features............................................................................ 5

1.4 Conventions .......................................................................................................... 6

1.4.1 Register and Bit References ................................................................ 6

1.4.2 Byte and Bit Designations ....................................................................6

1.5 Related Documents ...............................................................................................6

1.6 Memory Alignment Terminology............................................................................6

2 Architectural Overview ............................................................................................7

2.1 Introduction............................................................................................................7

2.2 External Architecture ............................................................................................. 8

2.3 Microarchitecture ................................................................................................. 10

2.3.1 PCI/PCI-X Core Interface ................................................................... 10

2.3.2 82547GI/EI CSA Interface .................................................................. 11

2.3.3 DMA Engine and Data FIFO .............................................................. 11

2.3.4 10/100/1000 Mb/s Receive and Transmit MAC Blocks ...................... 12

2.3.5 MII/GMII/TBI/Internal SerDes Interface Block ....................................12

2.3.6 10/100/1000 Ethernet Transceiver (PHY) .......................................... 13

2.3.7 EEPROM Interface............................................................................. 13

2.3.8 FLASH Memory Interface ................................................................... 14

2.4 DMA Addressing ................................................................................................. 14

2.5 Ethernet Addressing ............................................................................................ 15

2.6 Interrupts ............................................................................................................. 16

2.7 Hardware Acceleration Capability ....................................................................... 17

2.7.1 Checksum Offloading ......................................................................... 17

2.7.2 TCP Segmentation ............................................................................. 17

2.8 Buffer and Descriptor Structure...........................................................................17

3 Receive and Transmit Description.................................................................... 19

3.1 Introduction.......................................................................................................... 19

3.2 Packet Reception ................................................................................................19

3.2.1 Packet Address Filtering .................................................................... 19

3.2.2 Receive Data Storage ........................................................................ 20

3.2.3 Receive Descriptor Format................................................................. 20

3.2.4 Receive Descriptor Fetching .............................................................. 25

3.2.5 Receive Descriptor Write-Back .......................................................... 26

3.2.6 Receive Descriptor Queue Structure.................................................. 26

3.2.7 Receive Interrupts .............................................................................. 28

3.2.8 82544GC/EI Receive Interrupts ......................................................... 31

3.2.9 Receive Packet Checksum Offloading ............................................... 31

3.3 Packet Transmission ........................................................................................... 34

3.3.1 Transmit Data Storage ....................................................................... 35

3.3.2 Transmit Descriptors .......................................................................... 35

3.3.3 Legacy Transmit Descriptor Format ................................................... 36

3.3.4 Transmit Descriptor Special Field Format .......................................... 40

3.3.5 TCP/IP Context Transmit Descriptor Format...................................... 41

3.3.6 TCP/IP Context Descriptor Layout ..................................................... 42

3.3.7 TCP/IP Data Descriptor Format ......................................................... 46

3.4 Transmit Descriptor Ring Structure..................................................................... 51

3.4.1 Transmit Descriptor Fetching ............................................................. 53

3.4.2 Transmit Descriptor Write-back.......................................................... 53

3.4.3 Transmit Interrupts ............................................................................. 54

3.5 TCP Segmentation .............................................................................................. 55

3.5.1 Assumptions....................................................................................... 56

3.5.2 Transmission Process ........................................................................ 56

3.5.3 TCP Segmentation Performance ....................................................... 57

3.5.4 Packet Format .................................................................................... 57

3.5.5 TCP Segmentation Indication............................................................. 58

3.5.6 TCP Segmentation Use of Multiple Data Descriptors ........................ 59

3.5.7 IP and TCP/UDP Headers.................................................................. 60

3.5.8 Transmit Checksum Offloading with TCP Segmentation ................... 64

3.5.9 IP/TCP/UDP Header Updating ........................................................... 65

3.6 IP/TCP/UDP Transmit Checksum Offloading...................................................... 68

4 PCI Local Bus Interface......................................................................................... 71

4.1 PCI Configuration ................................................................................................ 71

4.1.1 PCI-X Configuration Registers ........................................................... 79

4.1.2 Reserved and Undefined Addresses.................................................. 82

4.1.3 Message Signaled Interrupts.............................................................. 83

4.2 Commands .......................................................................................................... 85

4.3 PCI/PCI-X Command Usage............................................................................... 87

4.3.1 Memory Write Operations .................................................................. 87

4.3.2 Memory Read Operations .................................................................. 89

4.4 Cache Line Information ....................................................................................... 90

4.4.1 Target Transaction Termination ......................................................... 91

4.5 Interrupt Assignment (82547GI/EI Only) ............................................................. 91

4.6 LAN Disable ........................................................................................................ 91

4.7 CardBus Application (82541PI/GI/EI Only) ......................................................... 92

5 EEPROM Interface ................................................................................................... 93

5.1 General Overview ............................................................................................... 93

5.2 Component Identification Via Programming Interface......................................... 94

5.3 EEPROM Device and Interface........................................................................... 95

5.3.1 Software Access................................................................................. 96

5.4 Signature and CRC Fields .................................................................................. 96

5.5 EEUPDATE Utility ............................................................................................... 97

5.5.1 Command Line Parameters ............................................................... 97

5.6 EEPROM Address Map....................................................................................... 98

5.6.1 Ethernet Address (Words 00h-02h)..................................................103

5.6.2 Software Compatibility Word (Word 03h) .........................................103

5.6.3 SerDes Configuration (Word 04h) .................................................... 104

5.6.4 EEPROM Image Version (Word 05h)............................................... 104

5.6.5 Compatibility Fields (Word 05h - 07h) .............................................. 104

5.6.6 PBA Number (Word 08h, 09h) .........................................................104

5.6.7 Initialization Control Word 1 (Word 0Ah) ..........................................105

5.6.8 Subsystem ID (Word 0Bh)................................................................ 106

5.6.9 Subsystem Vendor ID (Word 0Ch)................................................... 106

5.6.10 Device ID (Word 0Dh, 11h) .............................................................. 107

5.6.11 Vendor ID (Word 0Eh)......................................................................107

5.6.12 Initialization Control Word 2 (Word 0Fh) .......................................... 107

5.6.13 PHY Register Address Data (Words 10h, 11h, and 13h - 1Eh) .......109

5.6.14 OEM Reserved Words (Words 10h, 11h, 13h - 1Fh) .......................109

5.6.15 EEPROM Size (Word 12h)............................................................... 109

5.6.16 Common Power (Word 12h).............................................................109

5.6.17 Software Defined Pins Control (Word 10h, 20h) ..............................109

5.6.18 CSA Port Configuration 2 (Word 21h) .............................................. 111

5.6.19 Circuit Control (Word 21h)................................................................112

5.6.20 D0 Power (Word 22h high byte) .......................................................112

5.6.21 D3 Power (Word 22h low byte) ........................................................ 112

5.6.22 Reserved Words (23h - 2Eh)............................................................ 112

5.6.23 Reserved Words (23h - 2Fh)............................................................112

5.6.24 Management Control (Word 13h, 23h)............................................. 113

5.6.25 SMBus Slave Address (Word 14h low byte, 24h low byte) ..............114

5.6.26 Initialization Control 3 (Word 14h high byte, 24h high byte)............. 115

5.6.27 IPv4 Address (Words 15h - 16h and 25h - 26h) ............................... 116

5.6.28 IPv6 Address (Words 17h - 1Eh and 27h - 2Eh) .............................116

5.6.29 LED Configuration Defaults (Word 2Fh)........................................... 116

5.6.30 Boot Agent Main Setup Options (Word 30h) ....................................116

5.6.31 Boot Agent Configuration Customization Options (Word 31h) ......... 118

5.6.32 Boot Agent Configuration Customization Options (Word 32h) ......... 120

5.6.33 IBA Capabilities (Word 33h) .............................................................121

5.6.34 IBA Secondary Port Configuration (Words 34h-35h) ....................... 121

5.6.35 Checksum Word Calculation (Word 3Fh)......................................... 122

5.6.36 82546GB/EB Dual-Channel Fiber Wake on LAN (WOL) Mode and Functionality (Word 0Ah, 20h)..........................................................122

5.6.37 EEPROM Images .............................................................................122

5.7 Parallel FLASH Memory .................................................................................... 123

7 FLASH Memory Interface .................................................................................... 125

7.1 FLASH Interface Operation ............................................................................... 125

7.2 FLASH Control and Accesses ........................................................................... 125

7.2.1 Read Accesses ................................................................................126

7.2.2 Write Accesses.................................................................................126

6 Power Management............................................................................................... 129

6.1 Introduction to Power Management .................................................................. 129

6.2 Assumptions...................................................................................................... 129

6.3 D3cold support .................................................................................................. 130

6.3.1 Power States .................................................................................... 130

6.3.2 Timing............................................................................................... 132

6.3.3 PCI Power Management Registers .................................................. 137

6.4 Wakeup ............................................................................................................. 141

6.4.1 Advanced Power Management Wakeup .......................................... 141

6.4.2 ACPI Power Management Wakeup.................................................. 142

6.4.3 Wakeup Packets .............................................................................. 143

8 Ethernet Interface .................................................................................................. 153

8.1 Introduction ....................................................................................................... 153

8.2 Link Interfaces Overview................................................................................... 153

8.2.1 Internal SerDes Interface/TBI Mode – 1 Gb/s .................................... 154

8.2.2 GMII – 1 Gb/s ................................................................................... 155

8.2.3 MII – 10/100 Mb/s............................................................................. 156

8.3 Internal Interface ............................................................................................... 156

8.4 Duplex Operation .............................................................................................. 156

8.4.1 Full Duplex ....................................................................................... 157

8.4.2 Half Duplex....................................................................................... 157

8.5 Auto-Negotiation and Link Setup ...................................................................... 159

8.6 Auto-Negotiation and Link Setup ...................................................................... 159

8.6.1 Link Configuration in Internal SerDes/TBI Mode............................... 160

8.6.2 Internal GMII/MII Mode..................................................................... 163

8.6.3 Internal SerDes Mode Control Bit Resolution................................... 166

8.6.4 Internal PHY Mode Control Bit Resolution ....................................... 167

8.6.5 Loss of Signal/Link Status Indication................................................ 169

8.7 10/100 Mb/s Specific Performance Enhancements .......................................... 170

8.7.1 Adaptive IFS..................................................................................... 170

8.7.2 Flow Control ..................................................................................... 171

8.7.3 MAC Control Frames & Reception of Flow Control Packets ............ 171

8.7.4 Discard PAUSE Frames and Pass MAC Control Frames ................ 173

8.7.5 Transmission of PAUSE Frames...................................................... 173

8.7.6 Software Initiated PAUSE Frame Transmission............................... 174

8.7.7 External Control of Flow Control Operation...................................... 174

9 802.1q VLAN Support ........................................................................................... 175

9.1 802.1q VLAN Packet Format ............................................................................ 175

9.1.1 802.1q Tagged Frames .................................................................... 175

9.2 Transmitting and Receiving 802.1q Packets ..................................................... 176

9.2.1 Adding 802.1q Tags on Transmits ................................................... 176

9.2.2 Stripping 802.1q Tags on Receives ................................................. 176

9.3 802.1q VLAN Packet Filtering ........................................................................... 176

10 Configurable LED Outputs................................................................................. 179

10.1 Configurable LED Outputs ................................................................................ 179

10.1.1 Selecting an LED Output Source ..................................................... 179

10.1.2 Polarity Inversion.............................................................................. 180

10.1.3 Blink Control .....................................................................................180

11 PHY Functionality and Features ...................................................................... 183

11.1 Auto-Negotiation................................................................................................ 183

11.1.1 Overview ..........................................................................................183

11.1.2 Next Page Exchanges......................................................................184

11.1.3 Register Update ...............................................................................184

11.1.4 Status ............................................................................................... 185

11.2 MDI/MDI-X Crossover (copper only) ................................................................. 185

11.2.1 Polarity Correction (copper only) ......................................................186

11.2.2 10/100 Downshift (82540EP/EM Only).............................................186

11.3 Cable Length Detection (copper only)............................................................... 187

11.4 PHY Power Management (copper only)............................................................ 187

11.4.1 Link Down – Energy Detect (copper only)........................................ 187

11.4.2 D3 State, No Link Required (copper only)........................................188

11.4.3 D3 Link-Up, Speed-Management Enabled (copper only)................. 188

11.4.4 D3 Link-Up, Speed-Management Disabled (copper only) ................188

11.5 Initialization........................................................................................................ 189

11.5.1 MDIO Control Mode ......................................................................... 189

11.6 Determining Link State ...................................................................................... 190

11.6.1 False Link .........................................................................................191

11.6.2 Forced Operation ............................................................................. 191

11.6.3 Auto Negotiation............................................................................... 192

11.6.4 Parallel Detection .............................................................................192

11.7 Link Criteria .......................................................................................................192

11.7.1 1000BASE-T ....................................................................................192

11.7.2 100BASE-TX ....................................................................................192

11.7.3 10BASE-T ........................................................................................193

11.8 Link Enhancements........................................................................................... 193

11.8.1 SmartSpeed .....................................................................................193

11.8.2 Flow Control ..................................................................................... 193

11.9 Management Data Interface.............................................................................. 194

11.10 Low Power Operation........................................................................................ 194

11.10.1 Powerdown via the PHY Register .................................................... 195

11.10.2 Smart Power-Down ..........................................................................195

11.11 1000 Mbps Operation........................................................................................ 195

11.11.1 Introduction....................................................................................... 195

11.11.2 Transmit Functions ........................................................................... 197

11.11.3 Transmit FIFO .................................................................................. 197

11.11.4 Receive Functions............................................................................ 199

11.12 100 Mbps Operation.......................................................................................... 200

11.13 10 Mbps Operation............................................................................................ 200

11.13.1 Link Test...........................................................................................201

11.13.2 10Base-T Link Failure Criteria and Override .................................... 201

11.13.3 Jabber ..............................................................................................201

11.13.4 Polarity Correction............................................................................ 201

11.13.5 Dribble Bits .......................................................................................201

11.14 PHY Line Length Indication...............................................................................201

12 Dual Port Characteristics.................................................................................... 203

12.1 Introduction ....................................................................................................... 203

12.2 Features of Each MAC...................................................................................... 203

12.2.1 PCI/PCI-X interface .......................................................................... 203

12.2.2 MAC Configuration Register Space ................................................. 205

12.2.3 SDP, LED, INT# output .................................................................... 205

12.3 Shared EEPROM ..............................................................................................206

12.3.1 EEPROM Map.................................................................................. 206

12.3.2 EEPROM Arbitration ........................................................................206

12.4 Shared FLASH .................................................................................................. 207

12.4.1 FLASH Access Contention............................................................... 207

12.5 LAN Disable ...................................................................................................... 208

12.5.1 Overview .......................................................................................... 208

12.5.2 Values Sampled on Reset................................................................ 208

12.5.3 Multi-Function Advertisement........................................................... 209

12.5.4 Interrupt Use..................................................................................... 209

12.5.5 Power Reporting............................................................................... 209

12.5.6 Summary .......................................................................................... 210

13 Register Descriptions........................................................................................... 211

13.1 Introduction ....................................................................................................... 211

13.2 Register Conventions........................................................................................ 211

13.2.1 Memory and I/O Address Decoding ................................................. 212

13.2.2 I/O-Mapped Internal Register, Internal Memory, and Flash .............213

13.3 PCI-X Register Access Split.............................................................................. 219

13.4 Main Register Descriptions ............................................................................... 220

13.4.1 Device Control Register ................................................................... 220

13.4.2 Device Status Register..................................................................... 225

13.4.3 EEPROM/Flash Control & Data Register ......................................... 228

13.4.4 EEPROM Read Register.................................................................. 230

13.4.5 Flash Access .................................................................................... 232

13.4.6 Extended Device Control Register ................................................... 233

13.4.7 MDI Control Register........................................................................ 238

13.4.8 Flow Control Address Low ............................................................... 279

13.4.9 Flow Control Address High............................................................... 279

13.4.10 Flow Control Type ............................................................................ 280

13.4.11 VLAN Ether Type ............................................................................. 280

13.4.12 Flow Control Transmit Timer Value.................................................. 281

13.4.13 Transmit Configuration Word Register ............................................. 282

13.4.14 Receive Configuration Word Register .............................................. 283

13.4.15 LED Control...................................................................................... 285

13.4.16 Packet Buffer Allocation ................................................................... 288

13.4.17 Interrupt Cause Read Register......................................................... 289

13.4.18 Interrupt Throttling Register.............................................................. 291

13.4.19 Interrupt Cause Set Register............................................................ 292

13.4.20 Interrupt Mask Set/Read Register .................................................... 293

13.4.21 Interrupt Mask Clear Register .......................................................... 294

13.4.22 Receive Control Register ................................................................. 296

13.4.23 Flow Control Receive Threshold Low............................................... 300

13.4.24 Flow Control Receive Threshold High.............................................. 301

13.4.25 Receive Descriptor Base Address Low ............................................ 302

13.4.26 Receive Descriptor Base Address High ........................................... 302

13.4.27 Receive Descriptor Length ............................................................... 303

13.4.28 Receive Descriptor Head ................................................................. 303

13.4.29 Receive Descriptor Tail ....................................................................304

13.4.30 Receive Delay Timer Register..........................................................304

13.4.31 Receive Interrupt Absolute Delay Timer...........................................305

13.4.32 Receive Small Packet Detect Interrupt.............................................306

13.4.33 Transmit Control Register ................................................................ 306

13.4.34 Transmit IPG Register ...................................................................... 308

13.4.35 Adaptive IFS Throttle - AIT ............................................................... 310

13.4.36 Transmit Descriptor Base Address Low ...........................................311

13.4.37 Transmit Descriptor Base Address High .......................................... 312

13.4.38 Transmit Descriptor Length .............................................................. 312

13.4.39 Transmit Descriptor Head ................................................................ 313

13.4.40 Transmit Descriptor Tail ...................................................................314

13.4.41 Transmit Interrupt Delay Value......................................................... 314

13.4.42 TX DMA Control (82544GC/EI only) ................................................315

13.4.43 Transmit Descriptor Control ............................................................. 315

13.4.44 Transmit Absolute Interrupt Delay Value.......................................... 317

13.4.45 TCP Segmentation Pad And Minimum Threshold............................ 318

13.4.46 Receive Descriptor Control .............................................................. 320

13.4.47 Receive Checksum Control.............................................................. 321

13.5 Filter Registers ..................................................................................................323

13.5.1 Multicast Table Array........................................................................ 323

13.5.2 Receive Address Low....................................................................... 325

13.5.3 Receive Address High...................................................................... 325

13.5.4 VLAN Filter Table Array ................................................................... 326

13.6 Wakeup Registers ............................................................................................. 327

13.6.1 Wakeup Control Register .................................................................327

13.6.2 Wakeup Filter Control Register ........................................................ 328

13.6.3 Wakeup Status Register................................................................... 329

13.6.4 IP Address Valid............................................................................... 331

13.6.5 IPv4 Address Table .......................................................................... 332

13.6.6 IPv6 Address Table .......................................................................... 333

13.6.7 Wakeup Packet Length ....................................................................334

13.6.8 Wakeup Packet Memory (128 Bytes)............................................... 334

13.6.9 Flexible Filter Length Table .............................................................. 334

13.6.10 Flexible Filter Mask Table ................................................................ 335

13.6.11 Flexible Filter Value Table................................................................336

13.7 Statistics Registers............................................................................................ 336

13.7.1 CRC Error Count .............................................................................. 337

13.7.2 Alignment Error Count...................................................................... 337

13.7.3 Symbol Error Count..........................................................................338

13.7.4 RX Error Count................................................................................. 338

13.7.5 Missed Packets Count...................................................................... 339

13.7.6 Single Collision Count ......................................................................339

13.7.7 Excessive Collisions Count .............................................................. 340

13.7.8 Multiple Collision Count .................................................................... 340

13.7.9 Late Collisions Count ....................................................................... 341

13.7.10 Collision Count ................................................................................. 341

13.7.11 Defer Count ...................................................................................... 342

13.7.12 Transmit with No CRS...................................................................... 342

13.7.13 Sequence Error Count...................................................................... 343

13.7.14 Carrier Extension Error Count .......................................................... 343

13.7.15 Receive Length Error Count............................................................. 344

13.7.16 XON Received Count ....................................................................... 344

13.7.17 XON Transmitted Count ................................................................... 345

13.7.18 XOFF Received Count ..................................................................... 345

13.7.19 XOFF Transmitted Count ................................................................. 345

13.7.20 FC Received Unsupported Count .................................................... 346

13.7.21 Packets Received (64 Bytes) Count................................................. 346

13.7.22 Packets Received (65-127 Bytes) Count ......................................... 347

13.7.23 Packets Received (128-255 Bytes) Count ....................................... 347

13.7.24 Packets Received (256-511 Bytes) Count ....................................... 348

13.7.25 Packets Received (512-1023 Bytes) Count ..................................... 348

13.7.26 Packets Received (1024 to Max Bytes) Count................................. 349

13.7.27 Good Packets Received Count ........................................................ 349

13.7.28 Broadcast Packets Received Count................................................. 350

13.7.29 Multicast Packets Received Count................................................... 350

13.7.30 Good Packets Transmitted Count .................................................... 351

13.7.31 Good Octets Received Count........................................................... 351

13.7.32 Good Octets Transmitted Count....................................................... 352

13.7.33 Receive No Buffers Count................................................................ 352

13.7.34 Receive Undersize Count................................................................. 353

13.7.35 Receive Fragment Count ................................................................. 353

13.7.36 Receive Oversize Count................................................................... 354

13.7.37 Receive Jabber Count...................................................................... 354

13.7.38 Management Packets Received Count ............................................ 355

13.7.39 Management Packets Dropped Count ............................................. 356

13.7.40 Management Pkts Transmitted Count.............................................. 356

13.7.41 Total Octets Received ...................................................................... 356

13.7.42 Total Octets Transmitted .................................................................. 357

13.7.43 Total Packets Received.................................................................... 358

13.7.44 Total Packets Transmitted................................................................ 358

13.7.45 Packets Transmitted (64 Bytes) Count............................................. 359

13.7.46 Packets Transmitted (65-127 Bytes) Count ..................................... 359

13.7.47 Packets Transmitted (128-255 Bytes) Count ................................... 360

13.7.48 Packets Transmitted (256-511 Bytes) Count ................................... 360

13.7.49 Packets Transmitted (512-1023 Bytes) Count ................................. 361

13.7.50 Packets Transmitted (1024 Bytes or Greater) Count ....................... 361

13.7.51 Multicast Packets Transmitted Count............................................... 362

13.7.52 Broadcast Packets Transmitted Count............................................. 362

13.7.53 TCP Segmentation Context Transmitted Count ............................... 363

13.7.54 TCP Segmentation Context Transmit Fail Count ............................. 363

13.8 Diagnostics Registers ....................................................................................... 364

13.8.1 Receive Data FIFO Head Register................................................... 364

13.8.2 Receive Data FIFO Tail Register ..................................................... 364

13.8.3 Receive Data FIFO Head Saved Register ....................................... 365

13.8.4 Receive Data FIFO Tail Saved Register .......................................... 365

13.8.5 Receive Data FIFO Packet Count ....................................................366

13.8.6 Transmit Data FIFO Head Register..................................................366

13.8.7 Transmit Data FIFO Tail Register .................................................... 367

13.8.8 Transmit Data FIFO Head Saved Register ......................................367

13.8.9 Transmit Data FIFO Tail Saved Register ......................................... 368

13.8.10 Transmit Data FIFO Packet Count ...................................................368

13.8.11 Packet Buffer Memory...................................................................... 369

14 General Initialization and Reset Operation.................................................. 371

14.1 Introduction........................................................................................................371

14.2 Power Up State ................................................................................................. 371

14.3 General Configuration ....................................................................................... 371

14.4 Receive Initialization..........................................................................................372

14.5 Transmit Initialization......................................................................................... 373

14.5.1 Signal Interface ................................................................................376

14.5.2 GMII/MII Features not Supported ..................................................... 377

14.5.3 Avoiding GMII Test Mode(s)............................................................. 378

14.5.4 MAC Configuration ........................................................................... 378

14.5.5 Link Setup ........................................................................................379

14.6 PHY Initialization (10/100/1000 Mb/s Copper Media) ....................................... 380

14.7 Reset Operation ................................................................................................ 381

14.8 Initialization of Statistics .................................................................................... 384

15 Diagnostics and Testability ...............................................................................385

15.1 Diagnostics........................................................................................................385

15.1.1 FIFO State........................................................................................ 385

15.1.2 FIFO Data......................................................................................... 385

15.1.3 Loopback.......................................................................................... 385

15.2 Testability ..........................................................................................................386

15.2.1 EXTEST Instruction.......................................................................... 387

15.2.2 SAMPLE/PRELOAD Instruction ....................................................... 387

15.2.3 IDCODE Instruction.......................................................................... 387

15.2.4 BYPASS Instruction .........................................................................387

A Appendix (Changes From 82544EI/82544GC) ............................................389

B Appendix (82540EP/EM and 82545GM/EM Differences)......................... 391

1 Introduction

1.1 Scope

This document serves as a software developer’s manual for 82546GB/EB, 82545GM/EM, 82544GC/EI, 82541(PI/GI/EI), 82541ER, 82547GI/EI, and 82540EP/EM Gigabit Ethernet Controllers. Throughout this manual references are made to the PCI/PCI-X Family of Gigabit Ethernet Controllers or Ethernet controllers. Unless specifically noted, these references apply to all the Ethernet controllers listed above.

1.2 Overview

The PCI/PCI-X Family of Gigabit Ethernet Controllers are highly integrated, high-performance Ethernet LAN devices for 1000 Mb/s, 100 Mb/s and 10 Mb/s data rates. They are optimized for LAN on Motherboard (LOM) designs, enterprise networking, and Internet appliances that use the Peripheral Component Interconnect (PCI) and PCI-X bus.

Note: The 82541xx and 82540EP/EM do not support the PCI-X bus.

The 82547GI(EI) connects to the motherboard chipset through a Communications Streaming Architecture (CSA) port. CSA is designed for low memory latency and higher performance than a comparable PCI interface.

The remaining Ethernet controllers provide a direct 32-/64-bit, 33/66 MHz interface that complies with the PCI Local Bus Specification (revision 2.2 or 2.3), as well as with the emerging PCI-X extension to the PCI Local Bus (revision 1.0a).

The Ethernet controllers provide an interface to the host processor by using on-chip command and status registers and a shared host memory area, set up mainly during initialization. The controllers provide a highly optimized architecture to deliver high performance and PCI/CSA/PCI-X bus efficiency. By implementing hardware acceleration capabilities, the controllers enable offloading various tasks such as TCP/UDP/IP checksum calculations from the host processor. They also minimize I/O accesses and interrupts required to manage the Ethernet controllers and provide a highly configurable design that can be used effectively in various environments.

The PCI/PCI-X Family of Gigabit Ethernet Controllers handle all IEEE 802.3 receive and transmit MAC functions. They contain fully integrated physical-layer circuitry for 1000 Base-T, 100 Base-TX, and 10 Base-T applications (IEEE 802.3, 802.3u, and 802.3ab) as well as on-chip Serializer/Deserializer (SerDes) functionality that fully complies with IEEE 802.3z PCS.

Note: The 82541xx, 82547GI/EI, and 82540EP/EM do not support any SerDes functionality.

When connected to an appropriate SerDes, the 82544GC/EI can alternatively provide an Ethernet interface for 1000 Base-SX or LX applications (IEEE 802.3z).

Note: The 82546EB/82545EM is SerDes PICMG 2.16 compliant. The 82546GB/82545GM is SerDes PICMG 3.1 compliant.

82546GB/EB Ethernet controllers also provide these features in an integrated dual-port solution composed of two distinct MAC/PHY instances. As a result, they appear as multi-function PCI devices containing two identically functioning Ethernet controllers. See Section 12 for details.

1.3 Ethernet Controller Features

This section describes the features of the PCI/PCI-X Family of Gigabit Ethernet Controllers.

1.3.1 PCI Features

• 32/64-bit 33/66 MHz, PCI Rev 2.3 and PCI-X 1.0a compliant Host interface (82546GB/82545GM)

• 32/64-bit 33/66 MHz, PCI Rev 2.2 and PCI-X 1.0a compliant Host interface (82546EB, 82545EM, and 82544GC/EI)

• 32/64-bit 33/66 MHz, PCI Rev 2.3 compliant Host interface (82541xx)

• 32/64-bit 33/66 MHz, PCI Rev 2.2 compliant Host interface (82540EP/EM)

• 64-bit addressing for systems with more than 4 GB of physical memory

• Efficient PCI bus master operation

• Command usage optimization for advanced PCI commands

1.3.2 CSA Features (82547GI/EI Only)

• Uses dedicated port for client LAN controller directly on an MCH device

• High-speed interface with twice the peak bandwidth of a 32-bit 33 MHz PCI bus

• PCI power management registers recognized by the MCH

• Interface only uses 13 signals

1.3.3 Network Side Features

• Auto-Negotiation and Link Setup

— Automatic link configuration including speed, duplex and flow control under IEEE 802.3ab for copper media

— For GMII/MII mode, the driver complies with the IEEE 802.3ab standard requirements for speed, duplex, and flow control Auto-Negotiation capabilities

• Supports half and full duplex operation at 10 Mb/s and 100 Mb/s speeds while working with the internal PHY

• IEEE 802.3x compliant flow control support

— Enables control of the transmission of Pause packets through software or hardware triggering

— Provides indications of receive FIFO status

• State-of-the-art internal transceiver (PHY) with DSP architecture implementation

— Digital adaptive equalization and crosstalk

— Echo and crosstalk cancellation

— Automatic MDI/MDI-X crossover at all speeds and compensation for cable length

— Media Independent Interfaces (MII) IEEE 802.3e for supporting 10/10BASE-T transceivers

• Integrated dual-port solution comprised of two distinct MAC/PHY instances (82546GB/EB)

• Provides on-chip IEEE 802.3z PCS SerDes functionality (82546GB/EB and 82545GM/EM)

1.3.4 Host Offloading Features

• Receive and transmit IP and TCP/UDP checksum offloading capabilities

• Transmit TCP Segmentation (operating system support required)

• Packet filtering based on checksum errors

• Support for various address filtering modes:

— 16 exact matches (unicast, or multicast)

— 4096-bit hash filter for multicast frames

— Promiscuous, unicast and promiscuous multicast transfer modes

• IEEE 802.1q VLAN support

— Ability to add and strip IEEE 802.1q VLAN tags

— Packet filtering based on VLAN tagging, supporting 4096 tags

• SNMP and RMON statistic counters

• Support for IPv6 including (not applicable to the 82544GC/EI):

— IP/TCP and IP/UDP receive checksum offload

— Wake up filters

— TCP segmentation

1. Not applicable to the 82541ER.

1.3.5 Additional Performance Features

• Provides adaptive Inter Frame Spacing (IFS) capability, enabling collision reduction in half duplex networks (82544GC/EI)

• Programmable host memory receive buffers (256 B to 16 KB)

• Programmable cache line size from 16 B to 128 B for efficient usage of PCI bandwidth

• Implements a total of 64 KB (40 KB for the 82547GI/EI) of configurable receive and transmit data FIFOs. Default allocation is 48 KB for the receive data FIFO and 16 KB for the transmit data FIFO

• Descriptor ring management hardware for transmit and receive. Optimized descriptor fetching and write-back mechanisms for efficient system memory and PCI bandwidth usage

• Provides interrupt coalescing to reduce the number of interrupts generated by receive and transmit operations (82544GC/EI)

• Supports reception and transmission of packets with length up to 16 KB

• New intelligent interrupt generation features to enhance driver performance (not applicable to the 82544GC/EI):

— Packet interrupt coalescing timers (packet timers) and absolute-delay interrupt timers for both transmit and receive operation

— Short packet detection interrupt for improved response time to TCP acknowledges

— Transmit Descriptor Ring “Low” signaling

— Interrupt throttling control to limit maximum interrupt rate and improve CPU utilization

1.3.6 Manageability Features (Not Applicable to the 82544GC/EI or 82541ER)

• Manageability support for ASF 1.0 and AoL 2.0 by way of SMBus 2.0 interface and either:

— TCO mode SMBus-based management packet transmit / receive support

— Internal ASF-compliant TCO controller

1.3.7 Additional Ethernet Controller Features

• Implements ACPI

• Supports Wake on LAN (WoL)

• Provides four wire serial EEPROM interface for loading product configuration information

— Allows use of either 3.3 V dc or 5 V dc powered EEPROM

• Provides external parallel interface for up to 512 KB of FLASH memory for support of Pre-Boot Execution Environment (PXE)

• Provides seven general purpose user mode pins

• Provides Activity and Link LED indications

• Supports little-endian byte ordering for 32- and 64-bit systems

• Provides loopback capabilities under TBI (82544GC/EI) (internal SerDes for the 82546GB/EB and 82545GM/EM) and GMII/MII modes of operation

• Provides IEEE JTAG boundary scan support

• Four programmable LED outputs (Not applicable to the 82544GC/EI).

— For the 82546GB/EB, four programmable LED outputs for each port

• Detection and improved power-management with LAN cable unconnected (82546GB/EB)

1.3.8 Technology Features

• Implemented in 0.15µ CMOS process (0.13µ for the 82541xx and 82547GI/EI)

• Packaged in 364 PBGA.

— For the 82544EI, packaged in 416 PBGA.

— For the 82540EP/EM, 82541xx, and 82547GI/EI, packaged in 196 PBGA.

• Implemented in low power (3.3 V dc or 5 V dc compatible PCI signaling) CMOS process

1. Not applicable to the 82541ER.

2. Not applicable to the 82541xx, 82547GI/EI or 82540EP/EM.

1.4 Conventions

This document uses notes that call attention to important comments:

Note: Indicates details about the hardware’s operations that are not immediately obvious. Read these notes to get information about exceptions, unusual situations, and additional explanations of some PCI/PCI-X Family of Gigabit Ethernet Controller features.

1.4.1 Register and Bit References

This document refers to Ethernet controller register names using all capital letters. To refer to a specific bit in a register the convention REGISTER.BIT is used. For example, CTRL.ASDE refers to the Auto-Speed Detection Enable bit in the Device Control Register (CTRL).
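As an illustration of the REGISTER.BIT convention, the following C sketch shows one way driver code might name a register offset and a bit mask so that the source mirrors the manual's notation. The macro and function names are hypothetical, and the offset and bit position used here for CTRL.ASDE are assumptions for illustration only; take the actual values from the Device Control Register description in the Register Descriptions chapter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical naming scheme: the manual's CTRL.ASDE becomes CTRL_ASDE. */
#define REG_CTRL   0x0000u     /* assumed offset of the Device Control Register */
#define CTRL_ASDE  (1u << 5)   /* assumed position of Auto-Speed Detection Enable */

/* Returns the register value with the given bit(s) set. */
static uint32_t reg_set(uint32_t value, uint32_t mask)
{
    return value | mask;
}

/* Returns the register value with the given bit(s) cleared. */
static uint32_t reg_clear(uint32_t value, uint32_t mask)
{
    return value & ~mask;
}

/* Returns true when every bit in mask is set in value. */
static bool reg_test(uint32_t value, uint32_t mask)
{
    return (value & mask) == mask;
}
```

With these helpers, a read-modify-write of CTRL.ASDE would read the 32-bit value at REG_CTRL, pass it through reg_set(value, CTRL_ASDE), and write the result back; the register access itself is platform specific and is not shown.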

1.4.2 Byte and Bit Designations

This document uses “B” to abbreviate quantities of bytes. For example, 4 KB represents 4096 bytes. Similarly, “b” is used to represent quantities of bits. For example, 100 Mb/s represents 100 Megabits per second.

1.5 Related Documents

• IEEE Std. 802.3, 2000 Edition. Incorporates various IEEE standards previously published separately.

• PCI Local Bus Specification, Revision 2.2 and 2.3, PCI Local Bus Special Interest Group.

1.6 Memory Alignment Terminology

Some PCI/PCI-X Family of Gigabit Ethernet Controller data structures have special memory alignment requirements. This implies that the starting physical address of a data structure must be aligned as specified in this manual. The following terms are used for this purpose:

• BYTE alignment: Implies that the physical addresses can be odd or even. Examples: 0FECBD9A1h, 02345ADC6h.

• WORD alignment: Implies that physical addresses must be aligned on even boundaries. For example, the last nibble of the address can only end in 0, 2, 4, 6, 8, Ah, Ch, or Eh (0FECBD9A2h).

• DWORD (Double-Word) alignment: Implies that the physical addresses can only be aligned on 4-byte boundaries. For example, the last nibble of the address can only end in 0, 4, 8, or Ch (0FECBD9A8h).

• QWORD (Quad-Word) alignment: Implies that the physical addresses can only be aligned on 8-byte boundaries. For example, the last nibble of the address can only end in 0 or 8 (0FECBD9A8h).

• PARAGRAPH alignment: Implies that the physical addresses can only be aligned on 16-byte boundaries. For example, the last nibble must be a 0 (02345ADC0h).
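The alignment terms above can be checked mechanically. The following C sketch is illustrative only; the enum and function names are hypothetical and are not part of any Intel-provided interface. Because each alignment size is a power of two, an address is aligned exactly when its low-order bits are zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Alignment sizes in bytes, named after the terms defined above. */
enum {
    ALIGN_BYTE      = 1,
    ALIGN_WORD      = 2,
    ALIGN_DWORD     = 4,
    ALIGN_QWORD     = 8,
    ALIGN_PARAGRAPH = 16
};

/* Returns true when phys_addr is a multiple of align.
 * align must be a power of two so that the low bits can be masked. */
static bool is_aligned(uint64_t phys_addr, uint64_t align)
{
    return (phys_addr & (align - 1)) == 0;
}

/* Rounds phys_addr up to the next multiple of align (a power of two). */
static uint64_t align_up(uint64_t phys_addr, uint64_t align)
{
    return (phys_addr + align - 1) & ~(align - 1);
}
```

For instance, is_aligned(0x0FECBD9A8, ALIGN_QWORD) holds while is_aligned(0x0FECBD9A8, ALIGN_PARAGRAPH) does not, which matches the example addresses in the list. A driver might use align_up when carving a data structure with a stated alignment requirement out of a larger allocation.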

2 Architectural Overview

2.1 Introduction

This section provides an overview of the PCI/PCI-X Family of Gigabit Ethernet Controllers. The following sections give detailed information about the Ethernet controller’s functionality, register description, and initialization sequence. All major interfaces of the Ethernet controllers are described in detail.

The following principles shaped the design of the PCI/PCI-X Family of Gigabit Ethernet Controllers:

1. Provide an Ethernet interface containing a 10/100/1000 Mb/s PHY that also supports 1000 Base-X implementations.

2. Provide the highest performance solution possible, based on the following:

— Provide direct access to all memory without using mapping registers

— Minimize the PCI target accesses required to manage the Ethernet controller

— Minimize the interrupts required to manage the Ethernet controller

— Off-load the host processor from simple tasks such as TCP checksum calculations

— Maximize PCI efficiency and performance

— Use mixed signal processing to assure physical layer characteristics surpass specifications for UTP copper media

3. Provide a simple software interface for basic operations.

4. Provide a highly configurable design that can be used effectively in different environments.

The PCI/PCI-X Family of Gigabit Ethernet Controllers architecture is a derivative of the 82542 and 82543 designs. The controllers take the MAC functionality and integrated copper PHY from their predecessors and add SMBus-based manageability and integrated ASF controller functionality to the MAC. In addition, the 82546GB/EB features this architecture in an integrated dual-port solution comprised of two distinct MAC/PHY instances.

Note: SMBus-based manageability and the integrated ASF controller are not applicable to the 82544GC/EI or 82541ER.

2.2 External Architecture

Figure 2-1 shows the external interfaces to the 82546GB/EB.

[Block diagram not reproduced. Labels shown: Device Function 0 MAC/Controller (LAN A) and Device Function 1 MAC/Controller (LAN B), each with a 10/100/1000 PHY, GMII/MII, MDIO, LEDs, and Software Defined Pins; MDI Interface A and MDI Interface B (1000Base-T PHY Interfaces); External TBI Interface; Design for Test Interface; SMBus Interface; EEPROM Interface; Flash Interface; PCI (64-bit, 33/66 MHz)/PCI-X (133 MHz).]

Figure 2-1. 82546GB/EB External Interface

Figure 2-2 shows the external interfaces to the 82545GM/EM, 82544GC/EI, 82540EP/EM, and 82541xx.

[Block diagram not reproduced. Labels shown: Device Function 0 MAC/Controller; 10/100/1000 PHY with GMII/MII and MDIO; MDI Interface (1000Base-T PHY Interface); External TBI Interface (82545GM/EM only); Design for Test Interface; LEDs; Software Defined Pins; SMBus Interface; EEPROM Interface; Flash Interface; PCI (64-bit, 33/66 MHz)/PCI-X (133 MHz).]

Note: 82540EP/EM and 82541xx do not support PCI-X; 82544GC/EI and 82541ER do not support SMBus interface

Figure 2-2. 82545GM/EM, 82544GC/EI, 82540EP/EM, and 82541xx External Interface

Figure 2-3 shows the external interfaces to the 82547GI/EI.

[Block diagram not reproduced. Labels shown: CSA Port; PCI Core; Slave Access Logic; Control Status Logic; Statistics; EEPROM; FLASH; DMA Function; Descriptor Management; 40KB Packet RAM; RX Filters (Perfect, Multicast, VLAN); TX/RX MAC; CSMA/CD; Management Interface; PHY Control; Trellis Viterbi Encoder/Decoder; Side-stream Scrambler/Descrambler; 4DPAM5 Encoder; ECHO, NEXT, FEXT Cancellers; AGC, A/D; Timing Recovery; Pulse Shaper, DAC, Filter; Line Driver; Hybrid; Media Dependent Interface.]

Figure 2-3. 82547GI(EI) External Interface

2.3 Microarchitecture

Compared to its predecessors, the PCI/PCI-X Family of Gigabit Ethernet Controller’s MAC adds improved receive-packet filtering to support SMBus-based manageability, as well as the ability to transmit SMBus-based manageability packets. In addition, an ASF-compliant TCO controller is integrated into the controller’s MAC for reduced-cost basic ASF manageability.

Note: The 82544GC/EI and 82541ER do not support SMBus-based manageability.

For the 82546GB/EB, this new functionality is packaged in an integrated dual-port combination. The architecture includes two instances of both the MAC and PHY along with a single PCI/PCI-X interface. As a result, each of the logical LAN devices appears as a distinct PCI/PCI-X bus device.

The following sections describe the hardware building blocks. Figure 2-4 shows the internal microarchitecture.

2.3.1 PCI/PCI-X Core Interface

The PCI/PCI-X core provides a complete glueless interface to a 33/66 MHz, 32/64-bit PCI bus or a 33/66/133 MHz, 32/64-bit PCI-X bus. It is compliant with the PCI Bus Specification Rev 2.2 or 2.3 and the PCI-X Specification Rev. 1.0a. The Ethernet controllers provide 32 or 64 bits of addressing and data, and the complete control interface to operate on a 32-bit or 64-bit PCI or PCI-X bus. In systems with a dedicated bus for the Ethernet controller, this provides sufficient bandwidth to support sustained 1000 Mb/s full-duplex transfer rates. Systems with a shared bus (especially the 32-bit wide interface) might not be able to maintain 1000 Mb/s, but can sustain several hundred Mb/s.
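The bandwidth statements above can be sanity-checked with simple arithmetic. The C sketch below is a back-of-the-envelope estimate, not a performance model: it uses nominal clock values (33, 66, and 133 MHz), computes theoretical peak transfer rates only, and ignores arbitration, protocol overhead, and descriptor traffic. The function names are hypothetical.

```c
#include <stdbool.h>

/* Theoretical peak bus bandwidth in MB/s: bytes per clock times clock rate.
 * Uses nominal clock values, so results are approximate. */
static unsigned bus_peak_mbytes_per_s(unsigned width_bits, unsigned clock_mhz)
{
    return (width_bits / 8u) * clock_mhz;
}

/* Payload bandwidth in MB/s needed by a link running at mbits_per_s.
 * Full duplex carries traffic in both directions at once. */
static unsigned link_mbytes_per_s(unsigned mbits_per_s, bool full_duplex)
{
    unsigned one_way = mbits_per_s / 8u;
    return full_duplex ? 2u * one_way : one_way;
}

/* True when the bus peak rate is at least the rate the link requires. */
static bool bus_peak_covers_link(unsigned width_bits, unsigned clock_mhz,
                                 unsigned mbits_per_s, bool full_duplex)
{
    return bus_peak_mbytes_per_s(width_bits, clock_mhz) >=
           link_mbytes_per_s(mbits_per_s, full_duplex);
}
```

Under these assumptions a full-duplex 1000 Mb/s link needs about 250 MB/s, while a 32-bit, 33 MHz PCI bus peaks near 132 MB/s and a 64-bit, 66 MHz bus near 528 MB/s. This is consistent with the remark that narrower or shared buses might not maintain the full line rate.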

[Figure 2-4 block diagram: PCI interface and PCI/PCI-X core; host arbiter; DMA engine; packet buffer; TX MAC and RX MAC (10/100/1000 Mb); RMON statistics; packet/manageability filter; ASF manageability; SMBus; switch; EEPROM and Flash; GMII/MII and MDIO; link interface]

Figure 2-4. Internal Architecture Block Diagram


When the Ethernet controller serves as a PCI target, it follows the PCI configuration specification, which allows all accesses to it to be automatically mapped into free memory and I/O space at initialization of the PCI system.

When processing transmit and receive frames, the Ethernet controller operates as master on the PCI bus. As a master, transaction burst length on the PCI bus is determined by several factors, including the PCI latency timer expiration, the type of bus transfer being made, the size of the data transfer, and whether the data transfer is initiated by receive or transmit logic.

The PCI/PCI-X bus interfaces to the DMA engine.

2.3.2 82547GI/EI CSA Interface

CSA is derived from the Intel® Hub Architecture. The 82547EI Controller CSA port consists of 11 data and control signals, two strobes, a 66 MHz clock, and driver compensation resistor connections. The operating details of these signals and the packet data protocol that accompanies them are proprietary. The CSA port has a theoretical bandwidth of 266 MB/s — approximately twice the peak bandwidth of a 32-bit 33 MHz PCI bus.

The CSA port architecture is invisible to both system software and the operating system, allowing conventional PCI-like configuration.


2.3.3 DMA Engine and Data FIFO

The DMA engine handles the receive and transmit data and descriptor transfers between the host memory and the on-chip memory.

In the receive path, the DMA engine transfers the data stored in the receive data FIFO buffer to the receive buffer in the host memory, specified by the address in the descriptor. It also fetches and writes back updated receive descriptors to host memory.

In the transmit path, the DMA engine transfers data stored in the host memory buffers to the transmit data FIFO buffer. It also fetches and writes back updated transmit descriptors.

The Ethernet controller data FIFO block consists of a 64 KB (40 KB for the 82547GI/EI) on-chip buffer for receive and transmit operation. The receive and transmit FIFO size can be allocated based on the system requirements. The FIFO provides a temporary buffer storage area for frames as they are received or transmitted by the Ethernet controller.

The DMA engine and the large data FIFOs are optimized to maximize the PCI bus efficiency and reduce processor utilization by:

• Mitigating instantaneous receive bandwidth demands and eliminating transmit underruns by buffering the entire out-going packet prior to transmission

• Queuing transmit frames within the transmit FIFO, allowing back-to-back transmission with the minimum interframe spacing

• Allowing the Ethernet controller to withstand long PCI bus latencies without losing incoming data or corrupting outgoing data

• Allowing the transmit start threshold to be tuned by the transmit FIFO threshold. This adjustment to system performance is based on the available PCI bandwidth, wire speed, and latency considerations

• Offloading the receiving and transmitting IP and TCP/UDP checksums

• Directly retransmitting from the transmit FIFO any transmissions resulting in errors (collision detection, data underrun), thus eliminating the need to re-access this data from host memory

2.3.4 10/100/1000 Mb/s Receive and Transmit MAC Blocks

The controller’s CSMA/CD unit handles all the IEEE 802.3 receive and transmit MAC functions while interfacing between the DMA and TBI/internal SerDes/MII/GMII interface block. The CSMA/CD unit supports IEEE 802.3 for 10 Mb/s, IEEE 802.3u for 100 Mb/s and IEEE 802.3z and IEEE 802.3ab for 1000 Mb/s.

The Ethernet controller supports half-duplex 10/100 Mb/s MII or 1000 Mb/s GMII mode and all aspects of the above specifications in full-duplex operation. In half-duplex mode, the Ethernet controller supports operation as specified in the IEEE 802.3z specification. In the receive path, the Ethernet controller supports carrier extended packets and packets generated during packet bursting operation. The 82544GC/EI, in the transmit path, also supports carrier extended packets and can be configured to transmit in packet burst mode.

The Ethernet controller offers various filtering capabilities that provide better performance and lower processor utilization as follows:

• Provides up to 16 addresses for exact match unicast/multicast address filtering.

• Provides multicast address filtering based on a 4096-bit vector. Promiscuous unicast and promiscuous multicast filtering are supported as well.

• Strips IEEE 802.1q VLAN tags and filters packets based on their VLAN ID (not applicable to the 82541ER). Up to 4096 VLAN tags are supported.

In the transmit path, the Ethernet controller supports insertion of VLAN tag information on a packet-by-packet basis.

The Ethernet controller implements the flow control function as defined in IEEE 802.3x, as well as specific operation of asymmetrical flow control as defined by IEEE 802.3z. The Ethernet controller also provides external pins for controlling the flow control function through external logic.

2.3.5 MII/GMII/TBI/Internal SerDes Interface Block

The Ethernet controller provides the following serial interfaces:

• A GMII/MII interface to the internal PHY.

• An internal SerDes interface (82546GB/EB and 82545GM/EM; not applicable to the 82544GC/EI, 82540EP/EM, 82541xx, and 82547GI/EI) or a Ten Bit Interface (TBI) for the 82544GC/EI. The Ethernet controller implements the 802.3z PCS function, the AutoNegotiation function, and a 10-bit data path interface (TBI) for both receive and transmit operations. It is used for 1000BASE-SX, -LX, and -CX configurations, operating only at 1000 Mb/s full-duplex. The on-chip PCS circuitry is used only when the link interface is configured for TBI mode; it is bypassed in internal PHY modes.

Note: Refer to the Extended Device Control Register (bits 23:22) for mode selection (see Section 13.4.6).

The link can be configured by several methods. Software can force the link setting to AutoNegotiation by setting either the MAC in TBI mode (internal SerDes for the 82546GB/EB and 82545GM/EM), or the PHY in internal PHY mode.

The speed of the link in internal PHY mode can be determined by several methods:

• Auto speed detection based on the receive clock signal generated by the PHY.

• Detection of the PHY link speed indication.

• Software forcing the configuration of link speed.

2.3.6 10/100/1000 Ethernet Transceiver (PHY)

The Ethernet controller provides a full high-performance, integrated transceiver for 10/100/1000 Mb/s data communication. The physical layer (PHY) blocks are 802.3 compliant and capable of operating in half-duplex or full-duplex modes.

Highlights of the PHY blocks are as follows:

• Data stream serializers and encoders. Encoding techniques include Manchester, 4B/5B and 4D/PAM5. These blocks also perform data scrambling for 100/1000 Mb/s transmission as a technique to minimize radiated Electromagnetic Interference (EMI).

• A multi-mode transmit digital to analog converter, which produces filtered waveforms appropriate for the 10BASE-T, 100BASE-TX or 1000BASE-T Ethernet standards.

• Receiver Analog-to-Digital Converter (ADC). The ADC uses a 125 MHz sampling rate.

• Receiver decoders. These blocks perform the inverse operations of serializers, encoders and scramblers.

• Active hybrid and echo canceller blocks. The active hybrid and echo canceller blocks reduce the echo effect of transmitting and receiving simultaneously on the same analog pairs.

• NEXT canceller. This unit removes high frequency Near End Crosstalk induced among adjacent signal pairs.

• Additional wave shaping and slew rate control circuitry to reduce EMI.

Because the Ethernet controller is IEEE-compliant, the PHY blocks communicate with the MAC blocks through an internal GMII/MII bus operating at clock speeds of 2.5 MHz up to 125 MHz.

The Ethernet controller also uses an IEEE-compliant internal Management Data interface to communicate control and status information to the PHY.

2.3.7 EEPROM Interface

The PCI/PCI-X Family of Gigabit Ethernet Controllers provide a four-wire direct interface to a serial EEPROM device such as the 93C46 or compatible for storing product configuration information. Several words of the data stored in the EEPROM are automatically accessed by the Ethernet controller, after reset, to provide pre-boot configuration data to the Ethernet controller before it is accessible by the host software. The remainder of the stored information is accessed by various software modules to report product configuration, serial number and other parameters.


2.3.8 FLASH Memory Interface

The Ethernet controller provides an external parallel interface to a FLASH device. Accesses to the FLASH are controlled by the Ethernet controller and are accessible to software as normal PCI reads or writes to the FLASH memory mapping area. The Ethernet controller supports FLASH devices with up to 512 KB of memory.

Note: The 82540EP/EM provides an external interface to a serial FLASH or Boot EEPROM device. See Appendix B for more information.

2.4 DMA Addressing

In appropriate systems, all addresses mastered by the Ethernet controller are 64 bits in order to support systems that have larger than 32-bit physical addressing. Providing 64-bit addresses eliminates the need for special segment registers.

Note: The PCI 2.2 or 2.3 Specification requires that any 64-bit address whose upper 32 bits are all 0b appear as a 32-bit address cycle. The Ethernet controller complies with the PCI 2.2 or 2.3 Specification.

PCI is little-endian; however, not all processors in systems using PCI treat memory as little-endian. Network data is fundamentally a byte stream. As a result, it is important that the processor and Ethernet controller agree about the representation of memory data. The default is little-endian mode.

Descriptor accesses are not byte swapped.

The following example illustrates data-byte ordering for little endian. Bytes for a receive packet arrive in the order shown from left to right.

01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e

Example 2-1. Byte Ordering

There are no alignment restrictions on packet-buffer addresses. The byte address for the major words is shown on the left. The byte numbers and bit numbers for the PCI bus are shown across the top.

Table 2-1. Little Endian Data Ordering

Bits 63 (left) to 0 (right)

Byte Address   7   6   5   4   3   2   1   0
0              08  07  06  05  04  03  02  01
8              10  0f  0e  0d  0c  0b  0a  09
10             18  17  16  15  14  13  12  11
18             20  1f  1e  1d  1c  1b  1a  19
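The ordering in Table 2-1 can be sketched in a few lines of C. This is only an illustration: the helper name `pack_le64` is hypothetical and not part of any driver or of this manual. It places the first byte received in bits 7:0 of the 64-bit word, the second in bits 15:8, and so on.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack up to 8 bytes, given in wire-arrival order, into one 64-bit word
 * the way Table 2-1 shows them: the first byte lands in bits 7:0.
 * pack_le64 is a hypothetical helper used for illustration only. */
uint64_t pack_le64(const uint8_t *bytes, size_t count)
{
    uint64_t word = 0;

    if (count > 8)
        count = 8;                      /* one 64-bit word holds 8 bytes */
    for (size_t i = 0; i < count; i++)
        word |= (uint64_t)bytes[i] << (8 * i);
    return word;
}
```

With the byte stream 01 02 03 ... from Example 2-1, the first call yields the word shown at byte address 0 and the next eight bytes yield the word at byte address 8.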


2.5 Ethernet Addressing

Several registers store Ethernet addresses in the Ethernet controller. Two 32-bit registers make up the address: one is called “high”, and the other is called “low”. For example, the Receive Address Register is comprised of Receive Address High (RAH) and Receive Address Low (RAL). The least significant bit of the least significant byte of the address stored in the register (for example, bit 0 of RAL) is the multicast bit. The LS byte is the first byte to appear on the wire. This notation applies to all address registers, including the flow control registers.

Figure 2-5 shows the bit/byte addressing order comparison between what is on the wire and the values in the unique receive address registers.

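[Figure 2-5 diagram: on the wire, the preamble and SFD (...55 D5) are followed by the destination address bytes 00 AA 00 11 22 33 and then the source address. Bit 0 of the first destination address byte (00) is first on the wire and is the multicast bit. Internally, the destination address is stored with that first byte in the least significant position (dest_addr[0]), so it reads 33 22 11 00 AA 00 from most significant to least significant byte.]

Figure 2-5. Example of Address Byte Ordering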

The address byte order numbering shown in Figure 2-5 maps to Table 2-2. Byte #1 is first on the wire.

Table 2-2. Intel® Architecture Byte Ordering

IA Byte #          1 (LSB)  2   3   4   5   6 (MSB)
Byte Value (Hex)   00       AA  00  11  22  33

Note: The notation in this manual follows the convention shown in Table 2-2. For example, the address in Table 2-2 indicates 00_AA_00_11_22_33h, where the first byte (00h) is the first byte on the wire, with bit 0 of that byte transmitted first.
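As a sketch of this convention, the following C fragment splits a six-byte address (in wire order) into the low and high 32-bit halves described above. The type and function names are hypothetical, and any control bits that the high register may carry in its upper half are deliberately not modeled.

```c
#include <stdint.h>

/* Hypothetical holder for the two 32-bit halves of a stored address. */
struct rx_addr_regs {
    uint32_t ral;   /* "low" register: wire bytes 1..4, byte 1 in bits 7:0 */
    uint32_t rah;   /* "high" register: wire bytes 5..6 in bits 15:0 */
};

/* mac[0] is the first byte on the wire (IA byte #1 in Table 2-2). */
struct rx_addr_regs mac_to_regs(const uint8_t mac[6])
{
    struct rx_addr_regs r;

    r.ral = (uint32_t)mac[0]
          | ((uint32_t)mac[1] << 8)
          | ((uint32_t)mac[2] << 16)
          | ((uint32_t)mac[3] << 24);
    r.rah = (uint32_t)mac[4] | ((uint32_t)mac[5] << 8);
    return r;
}

/* The multicast bit is bit 0 of the least significant stored byte. */
int is_multicast(struct rx_addr_regs r)
{
    return (int)(r.ral & 1u);
}
```

For the address 00_AA_00_11_22_33h this gives a low half of 1100AA00h and a high half of 3322h, matching the internal storage order shown in Figure 2-5.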


2.6 Interrupts

The Ethernet controller provides a complete set of interrupts that allow for efficient software management. The interrupt structure is designed to accomplish the following:

• Make accesses “thread-safe” by using ‘set’ and ‘clear-on-read’ rather than ‘read-modify-write’ operations.

• Minimize the number of interrupts needed relative to work accomplished.

• Minimize the processing overhead associated with each interrupt.

Intel accomplished the first goal by an interrupt logic consisting of four interrupt registers. More detail about these registers is given in sections 13.4.17 through 13.4.21.

• Interrupt Cause ‘Set’ and ‘Read’ Registers

The Read register records the cause of the interrupt. All bits set at the time of the read are autocleared. The cause bit is set for each bit written as a 1b in the Set register. If there is a race between hardware setting a cause and software clearing an interrupt, the bit remains set. No race condition exists on writing the Set register. A ‘set’ provides for software posting of an interrupt. A ‘read’ is auto-cleared to avoid expensive write operations. Most systems have write buffering, which minimizes overhead, but typically requires a read operation to guarantee that the write operation has been flushed from the posted buffers. Without autoclear, the cost of clearing an interrupt can be as high as two reads and one write.

• Interrupt Mask ‘Set’ (Read) and ‘Clear’ Registers

Interrupts appear on PCI only if the interrupt cause bit is a 1b, and the corresponding interrupt mask bit is a 1b. Software can block assertion of the interrupt wire by clearing the bit in the mask register. The cause bit stores the interrupt event regardless of the state of the mask bit. The Clear and Set operations make this register more “thread-safe” by avoiding a ‘read-modify-write’ operation on the mask register. The mask bit is set to a 1b for each bit written as a 1b in the Set register, and cleared for each bit written as a 1b in the Clear register. Reading the Set register returns the current value.
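The register semantics described above can be modeled in software as follows. This is a behavioral sketch only, not hardware or driver code: the structure and function names are hypothetical, and the real register offsets and bit assignments are given in the register sections referenced above.

```c
#include <stdint.h>

/* Hypothetical software model of the cause and mask state. */
struct intr_model {
    uint32_t cause;   /* interrupt cause bits */
    uint32_t mask;    /* interrupt mask bits  */
};

/* Write to the cause 'Set' register: each 1b written sets a cause bit. */
void cause_set_write(struct intr_model *m, uint32_t bits)
{
    m->cause |= bits;
}

/* Read of the cause 'Read' register: returns the causes and auto-clears
 * every bit that was set at the time of the read. */
uint32_t cause_read(struct intr_model *m)
{
    uint32_t value = m->cause;

    m->cause = 0;
    return value;
}

/* Mask 'Set' and 'Clear' writes: no read-modify-write is needed. */
void mask_set_write(struct intr_model *m, uint32_t bits)
{
    m->mask |= bits;
}

void mask_clear_write(struct intr_model *m, uint32_t bits)
{
    m->mask &= ~bits;
}

/* Reading the mask 'Set' register returns the current mask value. */
uint32_t mask_read(const struct intr_model *m)
{
    return m->mask;
}

/* The interrupt is asserted only if some cause bit and its mask bit are both 1b. */
int interrupt_asserted(const struct intr_model *m)
{
    return (m->cause & m->mask) != 0;
}
```

Because enabling and disabling are separate write-only operations, two contexts can each change their own mask bits without racing on a shared read-modify-write sequence. The model also shows that a masked cause is still recorded and becomes visible as soon as its mask bit is set.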

Intel accomplished the second goal (minimizing interrupts) by three actions:

• Reducing the frequency of all interrupts (see Section 13.4.17). Not applicable to the 82544GC/EI.

• Accepting multiple receive packets before signaling an interrupt (see Section 3.2.3)

• Eliminating (or at least reducing) the need for interrupts on transmit (see Section 3.2.7)

The third goal is accomplished by having one interrupt register consolidate all interrupt information. This eliminates the need for multiple accesses.

Note that the Ethernet controller also supports Message Signaled Interrupts as defined in the PCI 2.2, 2.3, and PCI-X specifications. See Section 4.1.3.1 for details.


2.7 Hardware Acceleration Capability

The Ethernet controller provides the ability to offload IP, TCP, and UDP checksum for transmit. The functionality provided by these features can significantly reduce processor utilization by shifting the burden of the functions from the driver to the hardware.

The checksum offloading feature is briefly outlined in the following sections. More detail about all of the hardware acceleration capabilities is provided in Section 3.2.9.

2.7.1 Checksum Offloading

The Ethernet controller provides the ability to offload the IP, TCP, and UDP checksum requirements from the software device driver. For common frame types, the hardware automatically calculates, inserts, and checks the appropriate checksum values normally handled by software.

For transmits, every Ethernet packet might have two checksums calculated and inserted by the Ethernet controller. Typically, these would be the IP checksum, and either the TCP or UDP checksum. The software device driver specifies which portions of the packet are included in the checksum calculations, and where the calculated values are inserted via descriptors (refer to Section 3.3.5 for details).

For receives, the hardware recognizes the packet type and performs the checksum calculations and error checking automatically. Checksum and error information is provided to software through the receive descriptors (refer to Section 3.2.9 for details).

2.7.2 TCP Segmentation

The Ethernet controller implements a TCP segmentation capability for transmits that allows the software device driver to offload packet segmentation and encapsulation to the hardware. The software device driver can send the Ethernet controller the entire IP, TCP or UDP message sent down by the Network Operating System (NOS) for transmission. The Ethernet controller segments the packet into legal Ethernet frames and transmits them on the wire. By handling the segmentation tasks, the hardware relieves the software of some of the framing responsibilities. This reduces the overhead on the CPU for the transmission process, thus reducing overall CPU utilization. See Section 3.5 for details.
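To give a feel for the work being offloaded, the sketch below counts how many frames a message becomes when each frame carries at most a fixed number of payload bytes. The function name and the `mss` parameter (maximum payload bytes per frame) are assumptions made for illustration; the actual segmentation parameters are programmed through the descriptors covered in Section 3.5.

```c
#include <stdint.h>

/* Number of frames needed to carry payload_len bytes when every frame
 * carries at most mss payload bytes (headers are not counted here).
 * Hypothetical helper; returns 0 when mss is 0 or there is no payload. */
uint32_t segment_frame_count(uint32_t payload_len, uint32_t mss)
{
    if (mss == 0)
        return 0;
    /* Ceiling division without risking overflow of payload_len + mss. */
    return payload_len / mss + (payload_len % mss != 0 ? 1u : 0u);
}
```

For example, a 4000-byte payload with 1460 payload bytes per frame needs three frames. Without the offload, the driver or protocol stack would have to build the headers for each of them.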

2.8 Buffer and Descriptor Structure

Software allocates the transmit and receive buffers, and also forms the descriptors that contain pointers to, and the status of, those buffers. A conceptual ownership boundary exists between the driver software and the hardware of the buffers and descriptors. The software gives the hardware ownership of a queue of buffers for receives. These receive buffers store data that the software then owns once a valid packet arrives.

For transmits, the software maintains a queue of buffers. The driver software owns a buffer until it is ready to transmit. The software then commits the buffer to the hardware; the hardware then owns the buffer until the data is loaded or transmitted in the transmit FIFO.


Descriptors store the following information about the buffers:

• The physical address

• The length

• Status and command information about the referenced buffer

Descriptors contain an end-of-packet field that indicates the last buffer for a packet. Descriptors also contain packet-specific information indicating the type of packet, and specific operations to perform in the context of transmitting a packet, such as those for VLAN or checksum offload.

Section 3 provides detailed information about descriptor structure and operation in the context of packet transmission and reception.


3 Receive and Transmit Description

3.1 Introduction

This section describes the packet reception, packet transmission, transmit descriptor ring structure, TCP segmentation, and transmit checksum offloading for the PCI/PCI-X Family of Gigabit Ethernet Controllers.

Note: The 82544GC/EI does not support IPv6.

3.2 Packet Reception

In the general case, packet reception consists of recognizing the presence of a packet on the wire, performing address filtering, storing the packet in the receive data FIFO, transferring the data to a receive buffer in host memory, and updating the state of a receive descriptor.

3.2.1 Packet Address Filtering

Hardware stores incoming packets in host memory subject to the following filter modes. If there is insufficient space in the receive FIFO, hardware drops them and indicates the missed packet in the appropriate statistics registers.

The following filter modes are supported:

• Exact Unicast/Multicast — The destination address must exactly match one of 16 stored addresses. These addresses can be unicast or multicast.

• Promiscuous Unicast — Receive all unicasts.

• Multicast — The upper bits of the incoming packet’s destination address index a bit vector that indicates whether to accept the packet; if the bit in the vector is one, accept the packet, otherwise, reject it. The controller provides a 4096-bit vector. Software provides four choices of which bits are used for indexing. These are [47:36], [46:35], [45:34], or [43:32] of the internally stored representation of the destination address.

• Promiscuous Multicast — Receive all multicast packets.

• VLAN — Receive all VLAN packets that are for this station and have the appropriate bit set in the VLAN filter table (not applicable to the 82541ER). A detailed discussion and explanation of VLAN packet filtering is contained in Section 9.3.

Normally, only good packets are received. These are defined as packets with no CRC error, symbol error, sequence error, length error, alignment error, carrier extension error, or receive error detected. However, if the store-bad-packet bit is set in the Receive Control register (RCTL.SBP), then bad packets that pass the filter function are stored in host memory. Packet errors are indicated by error bits in the receive descriptor (RDESC.ERRORS). It is possible to receive all packets, regardless of whether they are bad, by setting the promiscuous enables (RCTL.UPE/MPE) and the store-bad-packet bit (RCTL.SBP).
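The multicast filter mode listed above can be sketched in C as follows. The code assumes the internally stored representation described in Section 2.5 (first wire byte in bits 7:0) and, purely as an assumption for illustration, keeps the 4096-bit vector as 128 words of 32 bits. All names are hypothetical; the register that selects among the four bit ranges is not modeled.

```c
#include <stdint.h>

enum { MC_VECTOR_BITS = 4096, MC_VECTOR_WORDS = MC_VECTOR_BITS / 32 };

/* Right-shift amounts for the four index choices:
 * [47:36], [46:35], [45:34], [43:32]. */
static const unsigned mc_shift[4] = { 36, 35, 34, 32 };

/* Internal 48-bit representation: mac[0] (first on the wire) in bits 7:0. */
uint64_t dest_addr_internal(const uint8_t mac[6])
{
    uint64_t value = 0;

    for (int i = 0; i < 6; i++)
        value |= (uint64_t)mac[i] << (8 * i);
    return value;
}

/* 12-bit index into the vector for the given choice (0..3). */
unsigned mc_index(const uint8_t mac[6], unsigned choice)
{
    return (unsigned)((dest_addr_internal(mac) >> mc_shift[choice & 3u]) & 0xFFFu);
}

/* Software side: mark an address as acceptable. */
void mc_accept_add(uint32_t vec[MC_VECTOR_WORDS], const uint8_t mac[6], unsigned choice)
{
    unsigned idx = mc_index(mac, choice);

    vec[idx / 32] |= (uint32_t)1 << (idx % 32);
}

/* Filter decision: 1 = accept, 0 = reject. */
int mc_accept(const uint32_t vec[MC_VECTOR_WORDS], const uint8_t mac[6], unsigned choice)
{
    unsigned idx = mc_index(mac, choice);

    return (int)((vec[idx / 32] >> (idx % 32)) & 1u);
}
```

Since 48-bit addresses are reduced to a 12-bit index, different multicast addresses can map to the same bit. This is why packets accepted only through this vector still need a software check.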


If manageability is enabled and if RCMCP is enabled, then ARP request packets can be directed over the SMBus or processed internally by the ASF controller rather than delivered to host memory (not applicable to the 82544GC/EI or 82541ER).

3.2.2 Receive Data Storage

Memory buffers pointed to by descriptors store packet data. Hardware supports seven receive buffer sizes:

• 256 B

• 512 B

• 1024 B

• 2048 B

• 4096 B

• 8192 B

• 16384 B

Buffer size is selected by bit settings in the Receive Control register (RCTL.BSIZE & RCTL.BSEX). See Section 13.4.22 for details.

The Ethernet controller places no alignment restrictions on packet buffer addresses. This is desirable in situations where the receive buffer was allocated by higher layers in the networking software stack, as these higher layers may have no knowledge of a specific Ethernet controller’s buffer alignment requirements.

Although alignment is completely unrestricted, it is highly recommended that software allocate receive buffers on at least cache-line boundaries whenever possible.
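A driver might choose among the seven sizes and check the alignment recommendation along the following lines. This is a minimal sketch with hypothetical helper names; it does not show how the choice is encoded in RCTL.BSIZE and RCTL.BSEX, which is covered in Section 13.4.22.

```c
#include <stddef.h>
#include <stdint.h>

/* The seven receive buffer sizes supported by hardware, in bytes. */
static const uint32_t rx_buf_sizes[7] = {
    256, 512, 1024, 2048, 4096, 8192, 16384
};

/* Smallest supported buffer size that holds 'needed' bytes, or 0 if none
 * does. A packet larger than one buffer simply spans several descriptors. */
uint32_t pick_rx_buf_size(uint32_t needed)
{
    for (size_t i = 0; i < sizeof rx_buf_sizes / sizeof rx_buf_sizes[0]; i++) {
        if (rx_buf_sizes[i] >= needed)
            return rx_buf_sizes[i];
    }
    return 0;
}

/* Recommended (not required) alignment check. cache_line must be a
 * nonzero power of two, for example 32 or 64. */
int is_cache_line_aligned(uint64_t addr, uint32_t cache_line)
{
    return (addr & (uint64_t)(cache_line - 1u)) == 0;
}
```

For a standard 1518-byte frame the helper selects 2048 B, which keeps each packet in a single buffer.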

3.2.3 Receive Descriptor Format

A receive descriptor is a data structure that contains the receive data buffer address and fields for hardware to store packet information. Table 3-1 shows the layout. All fields other than the buffer address are written by hardware upon packet reception.

Table 3-1. Receive Descriptor (RDESC) Layout

Offset  63:48     47:40   39:32   31:16                        15:0
0       Buffer Address [63:0]
8       Special   Errors  Status  Packet Checksum (See Note)   Length

82544GC/EI only

Offset  63:48     47:40   39:32   31:16     15:0
0       Buffer Address [63:0]
8       Reserved  Errors  Status  Reserved  Length

Note: The checksum indicated here is the unadjusted “16 bit ones complement” of the packet. A software assist may be required to back out appropriate information prior to sending it to upper software layers. The packet checksum is always reported in the first descriptor (even in the case of multi-descriptor packets).

Upon receipt of a packet, hardware stores the packet data into the indicated buffer and writes the length, Packet Checksum, status, errors, and special fields. Length covers the data written to a receive buffer including CRC bytes (if any). Software must read multiple descriptors to determine the complete length for packets that span multiple receive buffers.
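One way to picture the 16-byte descriptor in driver code is the C structure below, together with a helper that adds up the Length fields of a multi-descriptor packet. This is a sketch under stated assumptions: a little-endian host, natural alignment with no padding, hypothetical type and function names, and no handling of ring wrap-around. The DD and EOP bit positions are taken from the status field description in Section 3.2.3.1.

```c
#include <stddef.h>
#include <stdint.h>

/* Receive descriptor as seen in little-endian host memory (16 bytes). */
struct rx_desc {
    uint64_t buffer_addr;   /* offset 0:  Buffer Address [63:0]    */
    uint16_t length;        /* offset 8:  bits 15:0  of 2nd qword  */
    uint16_t checksum;      /* offset 10: bits 31:16, Packet Checksum */
    uint8_t  status;        /* offset 12: bits 39:32               */
    uint8_t  errors;        /* offset 13: bits 47:40               */
    uint16_t special;       /* offset 14: bits 63:48               */
};

#define RX_STATUS_DD   0x01u   /* Descriptor Done (bit 0) */
#define RX_STATUS_EOP  0x02u   /* End of Packet (bit 1)   */

/* Sum the Length fields from the first descriptor of a packet up to and
 * including the one with EOP set. Returns -1 if hardware has not yet
 * finished all descriptors of the packet within the n entries given. */
long rx_packet_length(const struct rx_desc *desc, size_t n)
{
    long total = 0;

    for (size_t i = 0; i < n; i++) {
        if (!(desc[i].status & RX_STATUS_DD))
            return -1;                  /* hardware not done yet */
        total += desc[i].length;
        if (desc[i].status & RX_STATUS_EOP)
            return total;               /* last buffer of the packet */
    }
    return -1;
}
```

A real driver would also declare the descriptor memory so that the compiler does not cache its contents across hardware updates, and would order its reads appropriately. Those details are outside the scope of this sketch.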

For standard 802.3 packets (non-VLAN) the Packet Checksum is by default computed over the entire packet from the first byte of the DA through the last byte of the CRC, including the Ethernet and IP headers. Software may modify the starting offset for the packet checksum calculation by means of the Receive Control Register. This register is described in Section 13.4.22. To verify the TCP checksum using the Packet Checksum, software must adjust the Packet Checksum value to back out the bytes that are not part of the true TCP Checksum.
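Backing bytes out of a ones' complement sum can be sketched as follows. The helpers are hypothetical and make several assumptions that should be checked against the register description: the Packet Checksum is treated as a plain 16-bit ones' complement sum of big-endian 16-bit words, and the bytes being removed start on a 16-bit boundary of the summed range and have even length. Note that ones' complement arithmetic has two representations of zero (0000h and FFFFh), which a comparison must tolerate.

```c
#include <stddef.h>
#include <stdint.h>

/* 16-bit ones' complement sum over a byte range, taking the bytes as
 * big-endian 16-bit words; an odd trailing byte is padded with zero.
 * Adequate for frame-sized inputs (the 32-bit accumulator cannot overflow). */
uint16_t oc_sum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    if (len & 1)
        sum += (uint32_t)p[len - 1] << 8;
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);   /* end-around carry */
    return (uint16_t)sum;
}

/* Ones' complement addition of two 16-bit values. */
uint16_t oc_add(uint16_t a, uint16_t b)
{
    uint32_t sum = (uint32_t)a + b;

    return (uint16_t)((sum & 0xFFFFu) + (sum >> 16));
}

/* Remove a partial sum: subtracting is adding the bitwise complement. */
uint16_t oc_sub(uint16_t total, uint16_t part)
{
    return oc_add(total, (uint16_t)~part);
}
```

Given the sum over a whole packet, `oc_sub(total, oc_sum(header, header_len))` leaves the sum over the remaining bytes. Software can then combine that with the pseudo-header and compare it against the checksum rules of the upper-layer protocol.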

3.2.3.1 Receive Descriptor Status Field

Status information indicates whether the descriptor has been used and whether the referenced buffer is the last one for the packet. Refer to Table 3-2 for the layout of the status field. Error status information is shown in Table 3-3.

For multi-descriptor packets, packet status is provided in the final descriptor of the packet (EOP set). If EOP is not set for a descriptor, only the Address, Length, and DD bits are valid.


Table 3-2. Receive Status (RDESC.STATUS) Layout

Bit     7    6     5      4    3   2     1    0
Field   PIF  IPCS  TCPCS  RSV  VP  IXSM  EOP  DD

Receive Descriptor Status Bits and Description

PIF (bit 7): Passed in-exact filter. Hardware supplies the PIF field to expedite software processing of packets. Software must examine any packet with PIF set to determine whether to accept the packet. If PIF is clear, then the packet is known to be for this station, so software need not look at the packet contents. Packets passing only the Multicast Vector have PIF set.

IPCS (bit 6): IP Checksum Calculated on Packet. When Ignore Checksum Indication is deasserted (IXSM = 0b), the IPCS bit indicates whether the hardware performed the IP checksum on the received packet. 0b = IP checksum not performed; 1b = IP checksum performed. Pass/Fail information regarding the checksum is indicated in the error bit (IPE) of the descriptor receive errors (RDESC.ERRORS). IPv6 packets do not have the IPCS bit set. Reads as 0b.

TCPCS (bit 5): TCP Checksum Calculated on Packet. When Ignore Checksum Indication is deasserted (IXSM = 0b), the TCPCS bit indicates whether the hardware performed the TCP/UDP checksum on the received packet. 0b = TCP/UDP checksum not performed; 1b = TCP/UDP checksum performed. Pass/Fail information regarding the checksum is indicated in the error bit (TCPE) of the descriptor receive errors (RDESC.ERRORS). IPv6 packets may have this bit set if the TCP/UDP packet was recognized. Reads as 0b.

RSV (bit 4): Reserved. Reads as 0b.

VP (bit 3): Packet is 802.1Q (matched VET). Indicates whether the incoming packet’s type matches VET (that is, whether the packet is a VLAN (802.1q) type). It is set if the packet type matches VET and CTRL.VME is set. For a further description of 802.1q VLANs, see Chapter 9. Reads as 0b.

IXSM (bit 2): Ignore Checksum Indication. When IXSM = 1b, the checksum indication results (IPCS, TCPCS bits) should be ignored. When IXSM = 0b, the IPCS and TCPCS bits indicate whether the hardware performed the IP or TCP/UDP checksum(s) on the received packet. Pass/Fail information regarding the checksum is indicated in the error bits IPE and TCPE of the descriptor receive errors (RDESC.ERRORS). Reads as 1b.

EOP (bit 1): End of Packet. EOP indicates whether this is the last descriptor for an incoming packet.

DD (bit 0): Descriptor Done. Indicates whether hardware is done with the descriptor. When set along with EOP, the received packet is complete in main memory.

Note: See Table 3-5 for a description of supported packet types for receive checksum offloading. Unsupported packet types either have the IXSM bit set, or they don’t have the TCPCS bit set.
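The status bits can be tested in driver code roughly as shown below. The bit positions come from Table 3-2; the macro and function names are hypothetical and chosen only for this sketch.

```c
#include <stdint.h>

/* Bit positions from Table 3-2 (names are illustrative). */
#define RXD_STAT_DD     (1u << 0)   /* Descriptor Done            */
#define RXD_STAT_EOP    (1u << 1)   /* End of Packet              */
#define RXD_STAT_IXSM   (1u << 2)   /* Ignore Checksum Indication */
#define RXD_STAT_VP     (1u << 3)   /* Packet is 802.1Q           */
#define RXD_STAT_TCPCS  (1u << 5)   /* TCP/UDP checksum performed */
#define RXD_STAT_IPCS   (1u << 6)   /* IP checksum performed      */
#define RXD_STAT_PIF    (1u << 7)   /* Passed in-exact filter     */

/* The packet is complete in memory when DD and EOP are both set. */
int rx_packet_complete(uint8_t status)
{
    const uint8_t both = RXD_STAT_DD | RXD_STAT_EOP;

    return (status & both) == both;
}

/* Hardware checked the IP checksum only if IXSM is clear and IPCS is set. */
int rx_ip_csum_checked(uint8_t status)
{
    return !(status & RXD_STAT_IXSM) && (status & RXD_STAT_IPCS);
}

/* Same rule for the TCP/UDP checksum, using TCPCS. */
int rx_l4_csum_checked(uint8_t status)
{
    return !(status & RXD_STAT_IXSM) && (status & RXD_STAT_TCPCS);
}

/* PIF set: software must still decide whether the packet is for this station. */
int rx_needs_sw_filter(uint8_t status)
{
    return (status & RXD_STAT_PIF) != 0;
}
```

When either checksum helper returns 0, the driver would fall back to verifying that checksum in software or leave it to the protocol stack.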

3.2.3.2 Receive Descriptor Errors Field

Most error information appears only when the Store Bad Packets bit (RCTL.SBP) is set and a bad packet is received. Refer to Table 3-3 for a definition of the possible errors and their bit positions.

The error bits are valid only when the EOP and DD bits are set in the descriptor status field (RDESC.STATUS).


Table 3-3. Receive Errors (RDESC.ERRORS) Layout

Bit     7    6    5     4        3    2        1       0
Field   RXE  IPE  TCPE  CXE/RSV  RSV  SEQ/RSV  SE/RSV  CE

Bit 4 is CXE on the 82544GC/EI only and is reserved on all other Ethernet controllers. SEQ (bit 2) and SE (bit 1) are not applicable to the 82540EP/EM, 82541xx, or 82547GI/EI.

Receive Descriptor Error Bits and Description

RXE (bit 7): RX Data Error. Indicates that a data error occurred during the packet reception. A data error in TBI mode (82544GC/EI)/internal SerDes (82546GB/EB and 82545GM/EM) refers to the reception of a /V/ code (see Section 8.2.1.3). In GMII or MII mode, the assertion of I_RX_ER during data reception indicates a data error. This bit is valid only when the EOP and DD bits are set; it is not set in descriptors unless the RCTL.SBP (Store Bad Packets) control bit is set.

IPE (bit 6): IP Checksum Error. When set, indicates that an IP checksum error is detected in the received packet. Valid only when the IP checksum is performed on the receive packet as indicated via the IPCS bit in the RDESC.STATUS field. If receive IP checksum offloading is disabled (RXCSUM.IPOFL), the IPE bit is set to 0b. It has no effect on the packet filtering mechanism. Reads as 0b.

TCPE (bit 5): TCP/UDP Checksum Error. When set, indicates that a TCP/UDP checksum error is detected in the received packet. Valid only when the TCP/UDP checksum is performed on the receive packet as indicated via the TCPCS bit in the RDESC.STATUS field. If receive TCP/UDP checksum offloading is disabled (RXCSUM.TUOFL), the TCPE bit is set to 0b. It has no effect on the packet filtering mechanism. Reads as 0b.

CXE/RSV (bit 4): Carrier Extension Error. When set, indicates a packet was received in which the carrier extension error was signaled across the GMII interface. A carrier extension error is signaled by the PHY by the encoding of 1Fh on the receive data inputs while I_RX_ER is asserted. Valid only while working in 1000 Mb/s half-duplex mode of operation. This bit is reserved for all Ethernet controllers except the 82544GC/EI.

RSV (bit 3): Reserved. Reads as 0b.

SEQ (bit 2): Sequence Error. When set, indicates a received packet with a bad delimiter sequence (in TBI mode/internal SerDes). In other 802.3 implementations, this would be classified as a framing error. A valid delimiter sequence consists of: idle → start-of-frame (SOF) → data → pad (optional) → end-of-frame (EOF) → fill (optional) → idle.

SE (bit 1): Symbol Error. When set, indicates a packet received with a bad symbol. Applicable only in TBI mode/internal SerDes.

CE (bit 0): CRC Error or Alignment Error. CRC errors and alignment errors are both indicated via the CE bit. Software may distinguish between these errors by monitoring the respective statistics registers.
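A receive routine might classify the error byte as sketched below. Bit positions follow Table 3-3 and the validity rule follows the text above (error bits are meaningful only when EOP and DD are both set). The names are hypothetical, and the grouping into "link-level" errors is a choice made for this sketch.

```c
#include <stdint.h>

/* Error bit positions from Table 3-3 (names are illustrative). */
#define RXD_ERR_CE    (1u << 0)   /* CRC or alignment error            */
#define RXD_ERR_SE    (1u << 1)   /* Symbol error (TBI/SerDes)         */
#define RXD_ERR_SEQ   (1u << 2)   /* Sequence error (TBI/SerDes)       */
#define RXD_ERR_CXE   (1u << 4)   /* Carrier extension error (82544)   */
#define RXD_ERR_TCPE  (1u << 5)   /* TCP/UDP checksum error            */
#define RXD_ERR_IPE   (1u << 6)   /* IP checksum error                 */
#define RXD_ERR_RXE   (1u << 7)   /* RX data error                     */

/* Status bits needed for the validity checks (see Table 3-2). */
#define RXD_ST_DD     (1u << 0)
#define RXD_ST_EOP    (1u << 1)
#define RXD_ST_IXSM   (1u << 2)
#define RXD_ST_IPCS   (1u << 6)

/* Error bits are valid only in the descriptor with both EOP and DD set. */
int rx_errors_valid(uint8_t status)
{
    const uint8_t both = RXD_ST_DD | RXD_ST_EOP;

    return (status & both) == both;
}

/* 1 if a valid descriptor reports any frame-level reception error. */
int rx_has_link_error(uint8_t status, uint8_t errors)
{
    const uint8_t link = RXD_ERR_CE | RXD_ERR_SE | RXD_ERR_SEQ |
                         RXD_ERR_CXE | RXD_ERR_RXE;

    return rx_errors_valid(status) && (errors & link) != 0;
}

/* IPE counts only if hardware actually performed the IP checksum. */
int rx_ip_csum_failed(uint8_t status, uint8_t errors)
{
    return rx_errors_valid(status) &&
           !(status & RXD_ST_IXSM) &&
           (status & RXD_ST_IPCS) &&
           (errors & RXD_ERR_IPE) != 0;
}
```

Since most of these bits appear only when RCTL.SBP is set, a driver that leaves SBP clear would normally see only the checksum error bits.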

3.2.3.3 Receive Descriptor Special Field

Hardware stores additional information in the receive descriptor for 802.1q packets. If the packet type is 802.1q, determined when a packet type field matches the VLAN EtherType register (VET) and CTRL.VME = 1b, then the special field records the VLAN information and the four-byte VLAN information is stripped from the packet data storage. The Ethernet controller stores the Tag Control Information (TCI) of the 802.1q tag in the Special field. Otherwise, the special field contains 0000h.

Table 3-4. Special Descriptor Field Layout

802.1q Packets

Bits    15:13  12   11:0
Field   PRI    CFI  VLAN

All Other Packets

Bits    15:8  7:0
Value   00    00

Receive Descriptor Special Field and Description

VLAN: VLAN Identifier. 12 bits that record the packet’s VLAN ID number.

CFI: Canonical Form Indicator. 1 bit that records the packet’s CFI VLAN field.

PRI: User Priority. 3 bits that record the packet’s user priority field.

The special field is not applicable to the 82541ER.
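Splitting the Special field into its three parts follows directly from the bit ranges in Table 3-4. The structure and function names below are hypothetical.

```c
#include <stdint.h>

/* Decoded Tag Control Information from the Special field. */
struct vlan_tci {
    uint8_t  pri;    /* bits 15:13, user priority            */
    uint8_t  cfi;    /* bit 12, canonical form indicator     */
    uint16_t vlan;   /* bits 11:0, VLAN identifier           */
};

struct vlan_tci decode_special(uint16_t special)
{
    struct vlan_tci tci;

    tci.pri  = (uint8_t)((special >> 13) & 0x7u);
    tci.cfi  = (uint8_t)((special >> 12) & 0x1u);
    tci.vlan = (uint16_t)(special & 0x0FFFu);
    return tci;
}
```

For example, a Special value of A064h decodes to priority 5, CFI 0, and VLAN ID 100. The value 0000h reported for non-802.1q packets decodes to all zeros, so the VP status bit should be used to tell the two cases apart.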


3.2.4 Receive Descriptor Fetching

The descriptor fetching strategy is designed to support large bursts across the PCI bus. This is made possible by using 64 on-chip receive descriptors and an optimized fetching algorithm. The fetching algorithm attempts to make the best use of PCI bandwidth by fetching a cache line (or more) descriptors with each burst. The following paragraphs briefly describe the descriptor fetch algorithm and the software control provided.

When the on-chip buffer is empty, a fetch happens as soon as any descriptors are made available (software writes to the tail pointer). When the on-chip buffer is nearly empty (RXDCTL.PTHRESH), a prefetch is performed whenever enough valid descriptors (RXDCTL.HTHRESH) are available in host memory and no other PCI activity of greater priority is pending (descriptor fetches and write-backs or packet data transfers).

When the number of descriptors in host memory is greater than the available on-chip descriptor storage, the chip may elect to perform a fetch which is not a multiple of cache line size. The hardware performs this non-aligned fetch if doing so results in the next descriptor fetch being aligned on a cache line boundary. This mechanism provides the highest efficiency in cases where fetches fall behind software.

Note: The Ethernet controller never fetches descriptors beyond the descriptor TAIL pointer.
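The fetch policy described above can be modeled as a small decision function. This is a simplified software model, not the hardware implementation: the function and enum names are hypothetical, the comparisons follow the labels in Figure 3-1 (cache below PTHRESH, host descriptors above HTHRESH), and PCI priority arbitration is reduced to a comment.

```c
#include <stdint.h>

enum rx_fetch_action { RX_NO_FETCH, RX_FETCH, RX_PREFETCH };

/*
 * on_chip    : descriptors currently held in the on-chip cache
 * host_valid : valid descriptors in host memory (up to the tail pointer)
 * pthresh    : RXDCTL.PTHRESH
 * hthresh    : RXDCTL.HTHRESH
 */
static enum rx_fetch_action rx_fetch_decide(uint32_t on_chip, uint32_t host_valid,
                                            uint32_t pthresh, uint32_t hthresh)
{
    if (on_chip == 0)                       /* cache empty: fetch as soon as possible */
        return host_valid > 0 ? RX_FETCH : RX_NO_FETCH;
    if (on_chip < pthresh && host_valid > hthresh)
        return RX_PREFETCH;                 /* still subject to PCI priority */
    return RX_NO_FETCH;
}
```

For example, with PTHRESH = 8 and HTHRESH = 4, a cache holding 4 descriptors triggers a prefetch once more than 4 valid descriptors are waiting in host memory.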

[Flowchart. If the on-chip descriptor cache is empty and descriptors are available in host memory, the controller performs a fetch. Otherwise, if the on-chip descriptor cache holds fewer than RXDCTL.PTHRESH descriptors and the number of valid descriptors in host memory exceeds RXDCTL.HTHRESH, the controller performs a pre-fetch (based on PCI priority).]

Figure 3-1. Receive Descriptor Fetching Algorithm

3.2.5 Receive Descriptor Write-Back

Processors have cache line sizes that are larger than the receive descriptor size (16 bytes). Consequently, writing back descriptor information for each received packet would cause expensive partial cache line updates. Two mechanisms minimize the occurrence of partial line write backs:

• Receive descriptor packing

• Null descriptor padding

The following sections explain these mechanisms.

3.2.5.1 Receive Descriptor Packing

To maximize memory efficiency, receive descriptors are “packed” together and written as a cache line whenever possible. Descriptors accumulate and are written out in one of three conditions:

• RXDCTL.WTHRESH descriptors have been used (the specified max threshold of unwritten used descriptors has been reached)

• The receive timer expires (RADV or RDTR)

• Explicit software flush (RDTR.FPD)

For the first condition, once the number of descriptors specified by RXDCTL.WTHRESH has been used, they are written back regardless of cache line alignment. It is therefore recommended that WTHRESH be a multiple of the cache line size (expressed in descriptors).

In the second condition, a timer (RDTR or RADV) expiration causes all used descriptors to be written back prior to initiating an interrupt.

In the second condition for the 82544GC/EI, a timer (RDTR) is included to force timely write-back of descriptors. The first packet after timer initialization starts the timer. Timer expiration flushes any accumulated descriptors and sets an interrupt event (receiver timer interrupt). In general, the arrival rate is fast enough that packing is the common case under load.

For the final condition, software may explicitly flush accumulated descriptors by writing the timer register with the high order bit set.
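To make the WTHRESH recommendation concrete, the sketch below converts a cache line size into a descriptor count and rounds a desired threshold up to a whole number of cache lines. The helper names are hypothetical, and the caller is assumed to check separately that the result fits the WTHRESH register field.

```c
#include <stdint.h>

#define RX_DESC_SIZE 16u  /* receive descriptor size in bytes */

/* Number of 16-byte receive descriptors that fit in one cache line. */
static uint32_t rx_descs_per_cache_line(uint32_t cache_line_bytes)
{
    return cache_line_bytes / RX_DESC_SIZE;
}

/* Round a desired write-back threshold up to a whole number of cache lines. */
static uint32_t rx_wthresh_align(uint32_t wanted, uint32_t cache_line_bytes)
{
    uint32_t per_line = rx_descs_per_cache_line(cache_line_bytes);
    if (per_line == 0)
        return wanted;  /* cache line smaller than a descriptor: nothing to align */
    return ((wanted + per_line - 1) / per_line) * per_line;
}
```

With a 64-byte cache line there are four descriptors per line, so a requested threshold of 5 rounds up to 8.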

3.2.5.2 Null Descriptor Padding

Hardware stores no data in descriptors with a null data address. Software can make use of this property to cause the first condition under receive descriptor packing to occur early. Hardware writes back null descriptors with the DD bit set in the status byte and all other bits unchanged.

3.2.6 Receive Descriptor Queue Structure

Figure 3-2 shows the structure of the receive descriptor ring. Hardware maintains a circular ring of descriptors and writes back used descriptors just prior to advancing the head pointer. Head and tail pointers wrap back to base when “size” descriptors have been processed.

Software adds receive descriptors by writing the tail pointer with the index of the entry beyond the last valid descriptor. As packets arrive, they are stored in memory and the head pointer is incremented by hardware. When the head pointer is equal to the tail pointer, the ring is empty. Hardware stops storing packets in system memory until software advances the tail pointer, making more receive buffers available.


The receive descriptor head and tail pointers reference 16-byte blocks of memory. Shaded boxes in the figure represent descriptors that have stored incoming packets but have not yet been recognized by software. Software can determine if a receive buffer is valid by reading descriptors in memory rather than by I/O reads. Any descriptor with a non-zero status byte has been processed by the hardware, and is ready to be handled by the software.
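The in-memory check described above might look like the following sketch. The structure assumes the 16-byte receive descriptor field order given earlier in this chapter (buffer address, length, packet checksum, status, errors, special) on a little-endian host; the function name is hypothetical, and a real driver would also need the memory barriers or DMA synchronization its platform requires.

```c
#include <stddef.h>
#include <stdint.h>

/* 16-byte receive descriptor; field order assumed from the descriptor layout. */
struct rx_desc {
    uint64_t buffer_addr;
    uint16_t length;
    uint16_t checksum;
    uint8_t  status;
    uint8_t  errors;
    uint16_t special;
};

/*
 * Count consecutive descriptors, starting at index `next`, whose status
 * byte is non-zero (that is, already processed by hardware).
 */
static size_t rx_count_completed(const volatile struct rx_desc *ring,
                                 size_t ring_size, size_t next)
{
    size_t done = 0;
    while (done < ring_size && ring[next].status != 0) {
        done++;
        next = (next + 1) % ring_size;  /* wrap back to base */
    }
    return done;
}
```

The scan stops at the first descriptor whose status byte is still zero, which is why software must clear the status byte before handing a descriptor back to hardware.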

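[Diagram: Circular Buffer Queues. The receive queue extends from Base to Base + Size. The Head and Tail pointers mark positions within the queue, and the region between them is labeled "Owned By Hardware".]

Figure 3-2. Receive Descriptor Ring Structure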

Note: The head pointer points to the next descriptor that is written back. At the completion of the descriptor write-back operation, this pointer is incremented by the number of descriptors written back. HARDWARE OWNS ALL DESCRIPTORS BETWEEN [HEAD AND TAIL]. Any descriptor not in this range is owned by software.

The receive descriptor ring is described by the following registers:

• Receive Descriptor Base Address registers (RDBAL and RDBAH)

These registers indicate the start of the descriptor ring buffer. This 64-bit address is aligned on a 16-byte boundary and is stored in two consecutive 32-bit registers. RDBAL contains the lower 32-bits; RDBAH contains the upper 32 bits. Hardware ignores the lower 4 bits in RDBAL.

• Receive Descriptor Length register (RDLEN)

This register determines the number of bytes allocated to the circular buffer. This value must be a multiple of 128 (the maximum cache line size). Since each descriptor is 16 bytes in length, the total number of receive descriptors is always a multiple of 8.

• Receive Descriptor Head register (RDH)

This register holds a value that is an offset from the base, and indicates the in-progress descriptor. There can be up to 64K descriptors in the circular buffer. Hardware maintains a shadow copy that includes those descriptors completed but not yet stored in memory.


• Receive Descriptor Tail register (RDT)

This register holds a value that is an offset from the base, and identifies the location beyond the last descriptor hardware can process. Note that tail should still point to an area in the descriptor ring (somewhere between RDBA and RDBA + RDLEN). This is because tail points to the location where software writes the first new descriptor.

If software statically allocates buffers, and uses memory read to check for completed descriptors, it simply has to zero the status byte in the descriptor to make it ready for reuse by hardware. This is not a hardware requirement (moving the hardware tail pointer is), but is necessary for performing an in-memory scan.
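The register constraints above reduce to a few lines of index arithmetic. The sketch below is a software-side model with hypothetical helper names; it treats head and tail as descriptor indices and rejects a zero-length ring, which is an assumption of this example rather than a stated hardware rule.

```c
#include <stdbool.h>
#include <stdint.h>

#define RX_DESC_SIZE 16u  /* bytes per receive descriptor */

/* RDLEN must be a multiple of 128 bytes; zero is rejected here by assumption. */
static bool rx_rdlen_valid(uint32_t rdlen_bytes)
{
    return rdlen_bytes != 0 && (rdlen_bytes % 128u) == 0;
}

/* Number of descriptors in the ring; always a multiple of 8 for a valid RDLEN. */
static uint32_t rx_ring_count(uint32_t rdlen_bytes)
{
    return rdlen_bytes / RX_DESC_SIZE;
}

/* Advance an index by one descriptor, wrapping back to base. */
static uint32_t rx_ring_next(uint32_t idx, uint32_t count)
{
    return (idx + 1 == count) ? 0 : idx + 1;
}

/* Descriptors owned by hardware: from head up to, but not including, tail. */
static uint32_t rx_hw_owned(uint32_t head, uint32_t tail, uint32_t count)
{
    return (tail + count - head) % count;
}
```

When head equals tail the ring is empty from the hardware's point of view, so `rx_hw_owned` returns 0 and no further packets are stored until software advances the tail.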

3.2.7 Receive Interrupts

The Ethernet controller can generate four receive-related interrupts:

• Receiver Timer Interrupt (ICR.RXT0)

• Small Receive Packet Detect (ICR.SRPD)

• Receive Descriptor Minimum Threshold (ICR.RXDMT0)

• Receiver FIFO Overrun (ICR.RXO)

3.2.7.1 Receive Timer Interrupt

The Receive Timer Interrupt is used to signal most packet reception events (the Small Receive Packet Detect interrupt is also used in some cases as described later in this section). In order to minimize the interrupts per work accomplished, the Ethernet controller provides two timers to control how often interrupts are generated.

3.2.7.1.1 Receive Interrupt Delay Timer / Packet Timer (RDTR)

The Packet Timer minimizes the number of interrupts generated when many packets are received in a short period of time. The packet timer is started once a packet is received and transferred to host memory (specifically, after the last packet data byte is written to memory) and is reinitialized (to the value defined in RDTR) and started EACH TIME a new packet is received and transferred to the host memory. When the Packet Timer expires (that is, no new packets have been received and transferred to host memory for the amount of time defined in RDTR), the Receive Timer Interrupt is generated.

Setting the Packet Timer to 0b disables both the Packet Timer and the Absolute Timer (described below) and causes the Receive Timer Interrupt to be generated whenever a new packet has been stored in memory.

Writing to RDTR with its high order bit (FPD) set forces an explicit writeback of consumed descriptors (potentially a partial cache line’s worth of descriptors), causes an immediate expiration of the Packet Timer, and generates a Receive Timer Interrupt.

The Packet Timer is reinitialized (but not started) when the Receive Timer Interrupt is generated due to an Absolute timer expiration or Small Receive Packet Detect Interrupt.

See Section 13.4.30 for more details on the Packet Timer.

[State diagram. States: Idle (initial state) and Running. Transition labels: "Packet received & transferred to host memory" (action: Restart Count), "Timer expires" (action: Generate Int), and "Other receive timer interrupt".]

Figure 3-3. Packet Delay Timer Operation (State Diagram)

3.2.7.1.2 Receive Interrupt Absolute Delay Timer (RADV)

The Absolute Timer ensures that a receive interrupt is generated at some predefined interval after the first packet is received. The absolute timer is started once a packet is received and transferred to host memory (specifically, after the last packet data byte is written to memory) but is NOT reinitialized / restarted each time a new packet is received. When the Absolute Timer expires (no receive interrupt has been generated for the amount of time defined in RADV) the Receive Timer Interrupt is generated.

Setting RADV to 0b or RDTR to 0b disables the Absolute Timer. To disable the Packet Timer only, RDTR should be set to RADV + 1b.

The Absolute Timer is reinitialized (but not started) when the Receive Timer Interrupt is generated due to a Packet Timer expiration or Small Receive Packet Detect Interrupt.
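The interaction of the two timers can be explored with a small software model. The sketch below is a simplification under stated assumptions: time is measured in abstract ticks rather than register units, the function name is hypothetical, arrival times mean "packet fully transferred to host memory", and Small Receive Packet Detect and explicit flushes (RDTR.FPD) are not modeled.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Returns the time of the first Receive Timer Interrupt for a burst.
 * arrivals[] holds non-decreasing completion times; n must be at least 1.
 */
static uint64_t rx_first_interrupt_time(const uint64_t *arrivals, size_t n,
                                        uint32_t rdtr, uint32_t radv)
{
    if (rdtr == 0)
        return arrivals[0];  /* both timers disabled: interrupt per packet */

    uint64_t pkt_deadline = arrivals[0] + rdtr;
    /* RADV == 0 disables the Absolute Timer. */
    uint64_t abs_deadline = radv ? arrivals[0] + radv : UINT64_MAX;

    for (size_t i = 1; i < n; i++) {
        uint64_t fire = pkt_deadline < abs_deadline ? pkt_deadline : abs_deadline;
        if (arrivals[i] >= fire)
            break;                           /* a timer expired before this packet */
        pkt_deadline = arrivals[i] + rdtr;   /* Packet Timer restarts; Absolute Timer does not */
    }
    return pkt_deadline < abs_deadline ? pkt_deadline : abs_deadline;
}
```

With packets completing at ticks 0, 5, 10 and 15 and RDTR = 8, a large RADV lets the Packet Timer keep restarting until tick 23, whereas RADV = 12 bounds the latency at tick 12.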


The diagrams below show how the Packet Timer and Absolute Timer can be used together:

Case A: Using only an absolute timer

[Timing diagram: PKT #1 through PKT #4 shown against the Absolute Timer Value.]

Case B: Using an absolute timer in conjunction with the Packet timer

[Timing diagram: PKT #1 through PKT #4 shown against the Absolute Timer Value. Marked events: 1) Packet timer expires; 2) Interrupt generated; 3) Absolute timer reset.]

Case C: Packet timer expiring while a packet is transferred to host memory. Illustrates that the packet timer is restarted only after a packet is transferred to host memory.

[Timing diagram: PKT #1 through PKT #6 and later packets shown against the Absolute Timer Value. Marked events: 1) Packet timer expires; 2) Interrupt generated; 3) Absolute timer reset.]

Annotations in the diagrams:

• Interrupt generated due to PKT #1.

• Interrupt generated (due to PKT #4) as absolute timer expires. Packet delay timer disabled until next packet is received and transferred to host memory.

3.2.7.2 Small Receive Packet Detect

A Small Receive Packet Detect interrupt (ICR.SRPD) is asserted when small-packet detection is enabled (RSRPD is set with a non-zero value) and a packet of size ≤ RSRPD.SIZE has been transferred into the host memory. When comparing the size, the headers and CRC are included (if CRC stripping is not enabled). CRC and VLAN headers are not included if they have been stripped. A receive timer interrupt cause (ICR.RXT0) is also noted when the Small Packet Detect interrupt occurs.

For the 82541xx and 82547GI/EI, receiving a small packet does not clear the absolute or packet delay timers, so one packet might generate two interrupts, one due to small packet reception and one due to timer expiration.


3.2.7.3 Receive Descriptor Minimum Threshold (ICR.RXDMT)

The minimum descriptor threshold helps avoid descriptor under-run by generating an interrupt when the number of free descriptors becomes equal to the minimum amount defined in RCTL.RDMTS (measured as a fraction of the receive descriptor ring size).

3.2.7.4 Receiver FIFO Overrun

FIFO overrun occurs when hardware attempts to write a byte to a full FIFO. An overrun could indicate that software has not updated the tail pointer to provide enough descriptors/buffers, or that the PCI bus is too slow draining the receive FIFO. Incoming packets that overrun the FIFO are dropped and do not affect future packet reception.

3.2.8 82544GC/EI Receive Interrupts

The presence of new packets is indicated by the following:

• Absolute timer (RDTR): A predetermined amount of time has elapsed since the first packet received after the hardware timer was written (specifically, after the last packet data byte was written to memory); this also flushes any accumulated descriptors to memory. Software can set the timer value to 0b if it wants to be notified each time a new packet has been stored in memory.

Writing the absolute timer with its high order bit set to 1b forces an explicit flush of any partial cache lines. Hardware writes all used descriptors to memory and updates the globally visible value of the head pointer.

In addition, hardware provides the following interrupts:

• Receive Descriptor Minimum Threshold (ICR.RXDMT)

The minimum descriptor threshold helps avoid descriptor underrun by generating an interrupt when the number of free descriptors becomes equal to the minimum. It is measured as a fraction of the receive descriptor ring size.

• Receiver FIFO Overrun (ICR.RXO)

3.2.9 Receive Packet Checksum Offloading

The Ethernet controller supports the offloading of three receive checksum calculations: the Packet Checksum, the IP Header Checksum, and the TCP/UDP Checksum.

Note: IPv6 packets do not have IP checksums.


The Packet checksum is the one’s complement over the receive packet, starting from the byte indicated by RXCSUM.PCSS (0b corresponds to the first byte of the packet), after stripping. For example, for an Ethernet II frame encapsulated as an 802.3ac VLAN packet and with RXCSUM.PCSS set to 14 decimal, the Packet Checksum would include the entire encapsulated frame, excluding the 14-byte Ethernet header (DA,SA,Type/Length) and the 4-byte q-tag. The Packet checksum does not include the Ethernet CRC if the RCTL.SECRC bit is set.

Software must make the required offsetting computation (to back out the bytes that should not have been included and to include the pseudo-header) prior to comparing the Packet Checksum against the TCP checksum stored in the packet.
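The arithmetic behind the Packet Checksum and the software "backing out" step can be sketched as follows. This is a host-side illustration with hypothetical function names, assuming 16-bit words are formed in network byte order; whether and how the value reported in the descriptor is byte-swapped or inverted should be checked against the descriptor definition.

```c
#include <stddef.h>
#include <stdint.h>

/* 16-bit one's complement sum over pkt[start..len), big-endian words. */
static uint16_t ones_complement_sum(const uint8_t *pkt, size_t len, size_t start)
{
    uint64_t sum = 0;
    size_t i = start;
    for (; i + 1 < len; i += 2)
        sum += ((uint32_t)pkt[i] << 8) | pkt[i + 1];
    if (i < len)
        sum += (uint32_t)pkt[i] << 8;        /* odd trailing byte, zero-padded */
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16); /* end-around carry */
    return (uint16_t)sum;
}

/* Remove a 16-bit contribution b from a one's complement sum a. */
static uint16_t ones_complement_sub(uint16_t a, uint16_t b)
{
    uint32_t s = (uint32_t)a + (uint16_t)~b;
    s = (s & 0xFFFFu) + (s >> 16);
    return (uint16_t)s;
}
```

Starting the sum at a different offset corresponds to changing RXCSUM.PCSS, while `ones_complement_sub` shows how software can remove words that were summed by hardware but do not belong to the TCP checksum.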

For supported packet/frame types, the entire checksum calculation may be offloaded to the Ethernet controller. If RXCSUM.IPOFLD is set to 1b, the controller calculates the IP checksum and indicates a pass/fail condition to software by means of the IP Checksum Error bit (RDESC.IPE) in the ERROR field of the receive descriptor. Similarly, if the RXCSUM.TUOFLD is set to 1b, the Ethernet controller calculates the TCP or UDP checksum and indicates a pass/fail condition to software by means of the TCP/UDP Checksum Error bit (RDESC.TCPE). These error bits are valid when the respective status bits indicate the checksum was calculated for the packet (RDESC.IPCS and RDESC.TCPCS).

If neither RXCSUM.IPOFLD nor RXCSUM.TUOFLD is set, the Checksum Error bits (IPE and TCPE) are 0b for all packets.
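The rule that an error bit is meaningful only when its matching status bit is set can be captured in a three-state result. In the sketch below the bit positions for IPCS, TCPCS, IPE and TCPE are assumptions that should be confirmed against the receive descriptor status and error tables; the enum and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed bit positions; verify against the status and error field tables. */
#define RXD_STAT_TCPCS (1u << 5)  /* TCP/UDP checksum was calculated */
#define RXD_STAT_IPCS  (1u << 6)  /* IP checksum was calculated */
#define RXD_ERR_TCPE   (1u << 5)  /* TCP/UDP checksum error */
#define RXD_ERR_IPE    (1u << 6)  /* IP checksum error */

enum csum_result { CSUM_NOT_CHECKED, CSUM_OK, CSUM_BAD };

/* IPE is only valid when IPCS indicates the checksum was calculated. */
static enum csum_result rx_ip_csum(uint8_t status, uint8_t errors)
{
    if (!(status & RXD_STAT_IPCS))
        return CSUM_NOT_CHECKED;
    return (errors & RXD_ERR_IPE) ? CSUM_BAD : CSUM_OK;
}

/* TCPE is only valid when TCPCS indicates the checksum was calculated. */
static enum csum_result rx_l4_csum(uint8_t status, uint8_t errors)
{
    if (!(status & RXD_STAT_TCPCS))
        return CSUM_NOT_CHECKED;
    return (errors & RXD_ERR_TCPE) ? CSUM_BAD : CSUM_OK;
}
```

A result of `CSUM_NOT_CHECKED` means software must verify the checksum itself if it needs one.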

Supported Frame Types include:

• Ethernet II

• Ethernet SNAP

Note: See Table 3-6 for the 82544GC/EI supported receive checksum capabilities.

Table 3-5. Supported Receive Checksum Capabilities

Packet Type | HW IP Checksum Calculation | HW TCP/UDP Checksum Calculation
IPv4 packets | Yes | Yes
IPv6 packets | No (n/a) | Yes
IPv6 packet with next header options: Hop-by-Hop options, Destination options, Routing, Fragment | No (n/a), No (n/a), No (n/a), No (n/a) | Yes, Yes, Yes, No
IPv4 tunnels: IPv4 packet in an IPv4 tunnel, IPv6 packet in an IPv4 tunnel | No, Yes (IPv4) | Yes
IPv6 tunnels: IPv4 packet in an IPv6 tunnel, IPv6 packet in an IPv6 tunnel | No, No | No, No
Packet is an IPv4 fragment | Yes | No
Packet is greater than 1552 bytes; (LPE=1b) | Yes | Yes
Packet has 802.3ac tag | Yes | Yes
IPv4 Packet has IP options (IP header is longer than 20 bytes) | Yes | Yes
Packet has TCP or UDP options | Yes | Yes
IP header’s protocol field contains a protocol # other than TCP or UDP. | Yes | No

a. The IPv6 header portion can include supported extension headers as described in the IPv6 Filter section.

b. For the 82541xx and 82547GI/EI, frame sizes greater than 2 KB require full-duplex operation.

Table 3-6. 82544GC/EI Supported Receive Checksum Capabilities

Packet Type | HW IP Checksum Calculation | HW TCP/UDP Checksum Calculation
IPv4 packets | Yes | Yes
IPv6 packets (no IP checksum in IPv6) | No | No
Packet is an IP fragment | Yes | No
Packet is greater than 1552 bytes; (LPE=1) | Yes | Yes
Packet has 802.3ac tag | Yes | Yes
Packet has IP options (IP header is longer than 20 bytes) | Yes | Yes
Packet has TCP or UDP options | Yes | Yes
IP header’s protocol field contains a protocol other than TCP or UDP. | Yes | No

Table 3-5 lists the general details about what packets are processed. In more detail, the packets are passed through a series of filters (Section 3.2.9.1 through Section 3.2.9.5) to determine if a receive checksum is calculated.

Note: Section 3.2.9.1 through Section 3.2.9.5 do not apply to the 82544GC/EI.

3.2.9.1 MAC Address Filter

This filter checks the MAC destination address to be sure it is valid (IA match, broadcast, multicast, etc.). The receive configuration settings determine which MAC addresses are accepted. See the various receive control configuration registers such as RCTL (RCTL.UPE, RCTL.MPE, RCTL.BAM), MTA, RAL, and RAH.


3.2.9.2 SNAP/VLAN Filter

This filter checks the next headers looking for an IP header. It is capable of decoding Ethernet II, Ethernet SNAP, and IEEE 802.3ac headers. It skips past any of these intermediate headers and looks for the IP header. The receive configuration settings determine which next headers are accepted. See the various receive control configuration registers such as RCTL (RCTL.VFE), VET, and VFTA.

3.2.9.3 IPv4 Filter

This filter checks for valid IPv4 headers. The version field is checked for a correct value (4). IPv4 headers are accepted if their length is greater than or equal to 5 dwords. If the IPv4 header is properly decoded, the IP checksum is checked for validity. The RXCSUM.IPOFLD bit must be set for this filter to pass.
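The two header checks named above (version and header length) both come from the first byte of the IPv4 header. The following sketch mirrors them in software; the function name is hypothetical, and the bounds check against the available bytes is an addition of this example rather than something the filter description states.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * ip    : pointer to the first byte of the IPv4 header
 * avail : number of bytes available at ip
 */
static bool ipv4_header_plausible(const uint8_t *ip, size_t avail)
{
    if (avail < 1)
        return false;
    uint8_t version = ip[0] >> 4;    /* upper nibble: IP version */
    uint8_t ihl     = ip[0] & 0x0F;  /* lower nibble: header length in dwords */
    return version == 4 && ihl >= 5 && (size_t)ihl * 4 <= avail;
}
```

A first byte of 45h therefore describes the common 20-byte header with no options, while 46h describes a 24-byte header carrying one dword of options.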

3.2.9.4 IPv6 Filter

This filter checks for valid IPv6 headers, which are a fixed size and have no checksum. The IPv6 extension headers accepted are: Hop-by-Hop, Destination Options, and Routing. The maximum size next header accepted is 16 dwords (64 bytes).

All of the IPv6 extension headers supported by the Ethernet controller have the same header structure:

Byte 0        Byte 1        Byte 2   Byte 3
Next Header   Hdr Ext Len

• NEXT HEADER is a value that identifies the header type. The supported IPv6 next header values are:

— Hop-by-Hop = 00h

— Destination Options = 3Ch

— Routing = 2Bh

• HDR EXT LEN is the 8-byte count of the header length, not including the first 8 bytes. For example, a value of 3 means that the total header size including the NEXT HEADER and HDR EXT LEN fields is 32 bytes (8 + 3*8).

— The RXCSUM.IPV6OFL bit must be set for this filter to pass.
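The extension header rules above translate directly into code. This sketch uses hypothetical function names and applies the 64-byte limit to a single extension header, which is one reading of "maximum size next header accepted".

```c
#include <stdbool.h>
#include <stdint.h>

/* Next header values listed above: Hop-by-Hop, Destination Options, Routing. */
static bool ipv6_ext_supported(uint8_t next_header)
{
    return next_header == 0x00 || next_header == 0x3C || next_header == 0x2B;
}

/* Total extension header size in bytes: the first 8 bytes plus 8 per unit. */
static uint32_t ipv6_ext_len_bytes(uint8_t hdr_ext_len)
{
    return 8u + 8u * hdr_ext_len;
}

/* Accepted if the type is supported and the header is at most 16 dwords. */
static bool ipv6_ext_accepted(uint8_t next_header, uint8_t hdr_ext_len)
{
    return ipv6_ext_supported(next_header) &&
           ipv6_ext_len_bytes(hdr_ext_len) <= 64u;
}
```

A HDR EXT LEN of 3 gives 32 bytes, matching the worked example in the text, and 7 is the largest value that still fits within 64 bytes.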

3.2.9.5 UDP/TCP Filter

This filter checks for a valid UDP or TCP header. The protocol (next header) values are 11h and 06h, respectively. The RXCSUM.TUOFLD bit must be set for this filter to pass.

3.3 Packet Transmission

The transmission process for regular (non-TCP Segmentation) packets involves:

• The protocol stack receives from an application a block of data that is to be transmitted.

• The protocol stack calculates the number of packets required to transmit this block based on the MTU size of the media and required packet headers.

• For each packet of the data block:

— Ethernet, IP and TCP/UDP headers are prepared by the stack.

— The stack interfaces with the software device driver and commands the driver to send the individual packet.

— The driver gets the frame and interfaces with the hardware.

— The hardware reads the packet from host memory (via DMA transfers).

— The driver returns ownership of the packet to the Network Operating System (NOS) when the hardware has completed the DMA transfer of the frame (indicated by an interrupt).

Output packets are made up of pointer-length pairs constituting a descriptor chain (so-called descriptor-based transmission). Software forms transmit packets by assembling the list of pointer-length pairs, storing this information in the transmit descriptor, and then updating the on-chip transmit tail pointer to the descriptor. The transmit descriptor and buffers are stored in host memory. Hardware typically transmits the packet only after it has completely fetched all packet data from host memory and deposited it into the on-chip transmit FIFO. This permits TCP or UDP checksum computation, and avoids problems with PCI underruns.

3.3.1 Transmit Data Storage

Data are stored in buffers pointed to by the descriptors. Alignment of data is on an arbitrary byte boundary with the maximum size per descriptor limited only to the maximum allowed packet size (16288 bytes). A packet typically consists of two (or more) descriptors, one (or more) for the header and one for the actual data. Some software implementations copy the header(s) and packet data into one buffer and use only one descriptor per transmitted packet.

3.3.2 Transmit Descriptors

The Ethernet controller provides three types of transmit descriptor formats.

The original descriptor is referred to as the “legacy” descriptor format. The two other descriptor types are collectively referred to as extended descriptors. One of them is similar to the legacy descriptor in that it points to a block of packet data. This descriptor type is called the TCP/IP Data Descriptor and is a replacement for the legacy descriptor since it offers access to new offloading capabilities. The other descriptor type is fundamentally different as it does not point to packet data. It merely contains control information which is loaded into registers of the controller and affects the processing of future packets. The following sections describe the three descriptor formats.

The extended descriptor types are accessed by setting the TDESC.DEXT bit to 1b. If this bit is set, the TDESC.DTYP field is examined to control the interpretation of the remaining bits of the descriptor. Table 3-7 shows the generic layout for all extended descriptors. Fields marked as NR are not reserved for any particular function and are defined on a per-descriptor type basis. Notice that the DEXT and DTYP fields are non-contiguous in order to accommodate legacy mode operation. For legacy mode operation, bit 29 is set to 0b and the descriptor is defined in Section 3.3.3.

Table 3-7. Transmit Descriptor (TDESC) Layout

     63        30   29     28   24   23   20   19        0
0    Buffer Address [63:0]
8    NR             DEXT   NR        DTYP      NR

3.3.3 Legacy Transmit Descriptor Format

To select legacy mode operation, bit 29 (TDESC.DEXT) should be set to 0b. In this case, the descriptor format is defined as shown in Table 3-8. The address and length must be supplied by software. Bits in the command byte are optional, as are the Checksum Offset (CSO), and Checksum Start (CSS) fields.

Table 3-8. Transmit Descriptor (TDESC) Layout – Legacy Mode

     63      48   47   40   39   36   35   32   31   24   23   16   15      0
0    Buffer Address [63:0]
8    Special      CSS       RSV       STA       CMD       CSO       Length

Table 3-9. Transmit Descriptor Legacy Descriptions

Transmit Descriptor Legacy   Description

Buffer Address
Address of the transmit data buffer in the host memory. Descriptors with a null address transfer no data. If they have the RS bit in the command byte set (TDESC.CMD), then the DD field in the status word (TDESC.STATUS) is written when the hardware processes them.

Length
Length is per segment. The maximum length associated with any single legacy descriptor is 16288 bytes. Although a buffer as short as one byte is allowed, the total length of the packet, before padding and CRC insertion, must be at least 48 bytes. Length can be up to a default value of 16288 bytes per descriptor, and 16288 bytes total. In other words, the length of the buffer pointed to by one descriptor, or the sum of the lengths of the buffers pointed to by the descriptors, can be as large as the maximum allowed transmit packet.

Descriptors with zero length transfer no data. If they have the RS bit in the command byte set (TDESC.CMD), then the DD field in the status word (TDESC.STATUS) is written when the hardware processes them.

CSO
Checksum Offset. The Checksum offset field indicates where, relative to the start of the packet, to insert a TCP checksum if this mode is enabled (Insert Checksum bit (IC) is set in TDESC.CMD). Hardware ignores CSO unless EOP is set in TDESC.CMD. CSO is provided in units of bytes and must be in the range of the data provided to the Ethernet controller in the descriptor (CSO < length - 1). Should be written with 0b for future compatibility.

CMD
Command field. See Section 3.3.3.1 for a detailed field description.

STA
Status field. See Section 3.3.3.2 for a detailed field description.

RSV
Reserved. Should be written with 0b for future compatibility.

CSS
Checksum Start Field. The Checksum start field (TDESC.CSS) indicates where to begin computing the checksum. The software must compute this offset to back out the bytes that should not be included in the TCP checksum. CSS is provided in units of bytes and must be in the range of data provided to the Ethernet controller in the descriptor (CSS < length). For short packets that are padded by the software, CSS must be in the range of the unpadded data length. A value of 0b corresponds to the first byte in the packet.

CSS must be set in the first descriptor of the packet.

Special
Special Field. See the notes that follow this table for a detailed field description.

Notes:

1. Even though CSO and CSS are in units of bytes, the checksum calculation typically works on 16-bit words. Hardware does not enforce even byte alignment.

2. Hardware does not add the 802.1Q EtherType or the VLAN field following the 802.1Q EtherType to the checksum. So for VLAN packets, software can compute the values to back out only on the encapsulated packet rather than on the added fields.

3. Although the Ethernet controller can be programmed to calculate and insert TCP checksum using the legacy descriptor format as described above, it is recommended that software use the newer TCP/IP Context Transmit Descriptor Format. This newer descriptor format allows the hardware to calculate both the IP and TCP checksums for outgoing packets. See Section 3.3.5 for more information about how the new descriptor format can be used to accomplish this task.
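The range rules for CSS and CSO can be checked before a descriptor is queued. In the sketch below the function name is hypothetical. As a worked case, an untagged Ethernet II frame carrying a 20-byte IPv4 header and a TCP segment would use CSS = 14 + 20 = 34 (start of the TCP header) and CSO = 34 + 16 = 50 (the TCP checksum field); these offsets follow from the standard header sizes rather than from this manual.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when hardware would honor the IC bit for these values,
 * that is, when CSS < length and CSO < length - 1.
 */
static bool tx_csum_fields_valid(uint8_t css, uint8_t cso, uint16_t length)
{
    if (length < 2)
        return false;  /* no room for a 16-bit checksum */
    return css < length && cso < (uint16_t)(length - 1);
}
```

Since CSS and CSO are single bytes, only offsets up to 255 can be expressed in the legacy format.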


3.3.3.1 Transmit Descriptor Command Field Format

The CMD byte stores the applicable command and has fields shown in Table 3-10.

Table 3-10. Transmit Command (TDESC.CMD) Layout

7     6     5      4            3    2    1      0
IDE   VLE   DEXT   RPS(a)/RSV   RS   IC   IFCS   EOP

a. 82544GC/EI only.

TDESC.CMD   Description

IDE (bit 7)
Interrupt Delay Enable. When set, activates the transmit interrupt delay timer. The Ethernet controller loads a countdown register when it writes back a transmit descriptor that has RS and IDE set. The value loaded comes from the IDV field of the Interrupt Delay (TIDV) register. When the count reaches 0, a transmit interrupt occurs if transmit descriptor write-back interrupts (IMS.TXDW) are enabled. Hardware always loads the transmit interrupt counter whenever it processes a descriptor with IDE set even if it is already counting down due to a previous descriptor. If hardware encounters a descriptor that has RS set, but not IDE, it generates an interrupt immediately after writing back the descriptor. The interrupt delay timer is cleared.

VLE (bit 6)
VLAN Packet Enable. When set, indicates that the packet is a VLAN packet and the Ethernet controller should add the VLAN Ethertype and an 802.1q VLAN tag to the packet. The Ethertype field comes from the VET register and the VLAN tag comes from the special field of the TX descriptor. The hardware inserts the FCS/CRC field in that case.

When cleared, the Ethernet controller sends a generic Ethernet packet. The IFCS controls the insertion of the FCS field in that case.

In order to have this capability, the CTRL.VME bit should also be set; otherwise VLE capability is ignored. VLE is valid only when EOP is set.

DEXT (bit 5)
Extension (0b for legacy mode). Should be written with 0b for future compatibility.

RPS/RSV (bit 4)
Report Packet Sent. When set, the 82544GC/EI defers writing the DD bit in the status byte (TDESC.STATUS) until the packet has been sent, or transmission results in an error such as excessive collisions. It is used in cases where the software must know that the packet has been sent, and not just loaded to the transmit FIFO. The 82544GC/EI might continue to prefetch data from descriptors logically after the one with RPS set, but does not advance the descriptor head pointer or write back any other descriptor until it has sent the packet with the RPS set. RPS is valid only when EOP is set.

This bit is reserved and should be programmed to 0b for all Ethernet controllers except the 82544GC/EI.

RS (bit 3)
Report Status. When set, the Ethernet controller needs to report the status information. This ability may be used by software that does in-memory checks of the transmit descriptors to determine which ones are done and which packets have been buffered in the transmit FIFO. Software does this by looking at the descriptor status byte and checking the Descriptor Done (DD) bit.

IC (bit 2)
Insert Checksum. When set, the Ethernet controller needs to insert a checksum at the offset indicated by the CSO field. The checksum calculations are performed for the entire packet starting at the byte indicated by the CSS field. IC is ignored if CSO and CSS are out of the packet range. This occurs when (CSS ≥ length) OR (CSO ≥ length - 1). IC is valid only when EOP is set.

IFCS (bit 1)
Insert FCS. Controls the insertion of the FCS/CRC field in normal Ethernet packets. IFCS is valid only when EOP is set.

EOP (bit 0)
End Of Packet. When set, indicates the last descriptor making up the packet. One or many descriptors can be used to form a packet.

Notes:

1. VLE, IFCS, and IC are qualified by EOP. That is, hardware interprets these bits ONLY when EOP is set.

2. Hardware only sets the DD bit for descriptors with RS set.

3. Descriptors with the null address (0b) or zero length transfer no data. If they have the RS bit set then the DD field in the status word is written when hardware processes them.

4. Although the transmit interrupt may be delayed, the descriptor write-back requested by setting the RS bit is performed without delay unless descriptor write-back bursting is enabled.
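The command bits and the IC range rule above can be sketched in C. This is a minimal illustration, not driver code from Intel: the macro and function names are hypothetical, and only the bit positions listed in the TDESC.CMD table (EOP, IFCS, IC, RS) and the stated range condition are taken from the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Legacy TDESC.CMD bits from the table above (names are illustrative). */
#define TDESC_CMD_EOP  (1u << 0)  /* End Of Packet */
#define TDESC_CMD_IFCS (1u << 1)  /* Insert FCS */
#define TDESC_CMD_IC   (1u << 2)  /* Insert Checksum */
#define TDESC_CMD_RS   (1u << 3)  /* Report Status */

/* Hardware ignores IC when (CSS >= length) OR (CSO >= length - 1).
 * Returns true when both offsets fall inside the packet. */
static bool tdesc_ic_in_range(uint8_t css, uint8_t cso, uint16_t length)
{
    if (length == 0)
        return false;
    return css < length && cso < (uint16_t)(length - 1);
}

/* Command byte for the last descriptor of a packet. IFCS and IC are
 * qualified by EOP, so they are only set together with it here. */
static uint8_t tdesc_last_cmd(bool want_checksum, bool want_status)
{
    uint8_t cmd = TDESC_CMD_EOP | TDESC_CMD_IFCS;
    if (want_checksum)
        cmd |= TDESC_CMD_IC;
    if (want_status)
        cmd |= TDESC_CMD_RS;
    return cmd;
}
```

Software that wants a DD write-back for a packet would set RS on that packet's last descriptor, since hardware only sets DD for descriptors with RS set.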

3.3.3.2 Transmit Descriptor Status Field Format

The STATUS field stores the applicable transmit descriptor status and has the fields shown in Table 3-11.

The transmit descriptor status field is only present in cases where RS (or RPS for the 82544GC/EI only) is set in the command field.

Table 3-11. Transmit Status Layout

3 2 1 0

TU/RSV (a) LC EC DD

a. TU applies to the 82544GC/EI only.


TDESC.STATUS Description

TU/RSV (bit 3) Transmit Underrun. Indicates a transmit underrun event occurred. Transmit Underrun might occur if Early Transmits are enabled (based on ETT.Txthreshold value) and the 82544GC/EI was not able to complete the early transmission of the packet due to lack of data in the packet buffer. This does not necessarily mean the packet failed to be eventually transmitted. The packet is successfully re-transmitted if the TCTL.NRTU bit is cleared (and excessive collisions do not occur).

This bit is reserved and should be programmed to 0b for all Ethernet controllers except the 82544GC/EI.

LC (bit 2) Late Collision. Indicates that a late collision occurred while working in half-duplex mode. It has no meaning while working in full-duplex mode. Note that the collision window is speed dependent: 64 bytes for 10/100 Mb/s and 512 bytes for 1000 Mb/s operation.

EC (bit 1) Excess Collisions. Indicates that the packet has experienced more than the maximum number of collisions as defined by the TCTL.CT control field and was not transmitted. It has no meaning while working in full-duplex mode.

DD (bit 0) Descriptor Done. Indicates that the descriptor is finished and is written back either after the descriptor has been processed (with RS set) or, for the 82544GC/EI, after the packet has been transmitted on the wire (with RPS set).

Note: The DD bit reflects status of all descriptors up to and including the one with the RS bit set (or RPS for the 82544GC/EI).
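As an illustration of how software might interpret the status bits above, the following C sketch classifies a written-back status value. The enum and function names are hypothetical; the bit positions come from Table 3-11, and the half-duplex qualification reflects the statement that LC and EC have no meaning in full-duplex mode.

```c
#include <stdbool.h>
#include <stdint.h>

#define TDESC_STA_DD (1u << 0)  /* Descriptor Done */
#define TDESC_STA_EC (1u << 1)  /* Excess Collisions */
#define TDESC_STA_LC (1u << 2)  /* Late Collision */
#define TDESC_STA_TU (1u << 3)  /* Transmit Underrun, 82544GC/EI only */

typedef enum {
    TX_PENDING,           /* DD not yet written back */
    TX_DONE,              /* descriptor processed */
    TX_DONE_EXCESS_COLL,  /* not transmitted: too many collisions */
    TX_DONE_LATE_COLL     /* late collision observed */
} tx_result;

/* Interpret the low status nibble of a transmit descriptor. */
static tx_result tdesc_classify(uint8_t status, bool half_duplex)
{
    if (!(status & TDESC_STA_DD))
        return TX_PENDING;
    if (half_duplex && (status & TDESC_STA_EC))
        return TX_DONE_EXCESS_COLL;
    if (half_duplex && (status & TDESC_STA_LC))
        return TX_DONE_LATE_COLL;
    return TX_DONE;
}
```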

3.3.4 Transmit Descriptor Special Field Format

The SPECIAL field is used to provide the 802.1q/802.3ac tagging information.

When CTRL.VME is set to 1b, all packets transmitted from the Ethernet controller that have VLE set in the TDESC.CMD are sent with an 802.1Q header added to the packet. The contents of the header come from the transmit descriptor special field and from the VLAN type register. The special field is ignored if the VLE bit in the transmit descriptor command field is 0b. The special field is valid only for descriptors with EOP set to 1b in TDESC.CMD.

Table 3-12. Special Field (TDESC.SPECIAL) Layout

15:13 12 11:0

PRI CFI VLAN

TDESC.SPECIAL Description

PRI User Priority. 3 bits that provide the VLAN user priority field to be inserted in the 802.1Q tag.

CFI Canonical Form Indicator.

VLAN VLAN Identifier. 12 bits that provide the VLAN identifier field to be inserted in the 802.1Q tag.
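The special field layout can be expressed as a small packing helper. This is a sketch with a hypothetical function name; the bit positions (PRI in 15:13, CFI in 12, VLAN in 11:0) are those of Table 3-12, and out-of-range inputs are simply masked.

```c
#include <stdint.h>

/* Pack PRI (3 bits), CFI (1 bit), and VLAN ID (12 bits) into the
 * 16-bit TDESC.SPECIAL value. Excess input bits are masked off. */
static uint16_t tdesc_special(uint8_t pri, uint8_t cfi, uint16_t vlan_id)
{
    return (uint16_t)(((pri & 0x7u) << 13) |
                      ((cfi & 0x1u) << 12) |
                      (vlan_id & 0x0FFFu));
}
```

For example, priority 5 with VLAN ID 100 packs to 0xA064. The value only takes effect when VLE is set in the command field and CTRL.VME is 1b.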


3.3.5 TCP/IP Context Transmit Descriptor Format

The TCP/IP context transmit descriptor provides access to the enhanced checksum offload facility available in the Ethernet controller. This feature allows TCP and UDP packet types to be handled more efficiently by performing additional work in hardware, thus reducing the software overhead associated with preparing these packets for transmission.

The TCP/IP context transmit descriptor does not point to packet data as a data descriptor does. Instead, this descriptor provides access to an on-chip context that supports the transmit checksum offloading feature of the controller. A “context” refers to a set of registers loaded or unloaded as a group to provide a particular function.

The context is explicit and directly accessible via the TCP/IP context transmit descriptor. The context is used to control the checksum offloading feature for normal packet transmission.

The Ethernet controller automatically selects the appropriate legacy or normal context to use based on the current packet transmission.

While the architecture supports arbitrary ordering rules for the various descriptors, there are restrictions including:

• Context descriptors should not occur in the middle of a packet.

• Data descriptors of different packet types (legacy or normal) should not be intermingled except at the packet level.

All contexts control calculation and insertion of up to two checksums. This portion of the context is referred to as the checksum context.

In addition to checksum context, the segmentation context adds information specific to the segmentation capability. This additional information includes the total payload for the message (TDESC.PAYLEN), the total size of the header (TDESC.HDRLEN), the amount of payload data that should be included in each packet (TDESC.MSS), and information about what type of protocol (TCP, IPv4, IPv6, etc.) is used. This information is specific to the segmentation capability and is therefore ignored for context descriptors that do not have the TSE bit set.

Because there are dedicated resources on-chip for the normal context, the context remains constant until it is modified by another context descriptor. This means that a context can be used for multiple packets (or multiple segmentation blocks) unless a new context is loaded prior to each new packet. Depending on the environment, it may be completely unnecessary to load a new context for each packet. For example, if most traffic generated from a given node is standard TCP frames, this context could be set up once and used for many frames. Only when some other frame type is required would a new context need to be loaded by software. After the “non-standard” frame is transmitted, the “standard” context would be set up once more by software. This method avoids the “extra descriptor per packet” penalty for most frames. The penalty can be eliminated altogether if software elects to use TCP/IP checksum offloading only for a single frame type, and thus performs those operations in software for other frame types.

This same logic can also be applied to the segmentation context, though the environment is a more restrictive one. In this scenario, the host is commonly asked to send a message of the same type, TCP/IP for instance, and these messages also have the same total length and same maximum segment size (MSS). In this instance, the same segmentation context could be used for multiple TCP messages that require hardware segmentation. The limitations of this scenario and the relatively small performance advantage make this approach unlikely; however, it is useful in understanding the underlying mechanism.
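The context-reuse idea described above can be sketched as a small piece of driver bookkeeping. Everything here is an assumption for illustration: the structure and function names are hypothetical, and the sketch only decides whether a new context descriptor is needed; it does not write one to the ring.

```c
#include <stdbool.h>
#include <stdint.h>

/* Checksum-context parameters carried by a context descriptor. */
struct csum_ctx {
    uint8_t  ipcss, ipcso, tucss, tucso;
    uint16_t ipcse, tucse;
};

/* Hypothetical per-queue state: the context last handed to hardware. */
struct tx_state {
    bool            valid;
    struct csum_ctx last;
    unsigned        ctx_descs_queued;  /* context descriptors emitted so far */
};

static bool csum_ctx_equal(const struct csum_ctx *a, const struct csum_ctx *b)
{
    return a->ipcss == b->ipcss && a->ipcso == b->ipcso &&
           a->tucss == b->tucss && a->tucso == b->tucso &&
           a->ipcse == b->ipcse && a->tucse == b->tucse;
}

/* Returns true when a new context descriptor must precede the packet.
 * The on-chip context persists, so an unchanged context costs nothing. */
static bool tx_need_context(struct tx_state *st, const struct csum_ctx *want)
{
    if (st->valid && csum_ctx_equal(&st->last, want))
        return false;
    st->last = *want;
    st->valid = true;
    st->ctx_descs_queued++;
    return true;
}
```

With this scheme a stream of identical TCP frames pays for one context descriptor, and a new one is queued only when the frame type changes.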


3.3.6 TCP/IP Context Descriptor Layout

The following section describes the layout of the TCP/IP context transmit descriptor.

To select this descriptor format, bit 29 (TDESC.DEXT) must be set to 1b and TDESC.DTYP must be set to 0000b. In this case, the descriptor format is defined as shown in Table 3-13.

Note that the TCP/IP context descriptor does not transfer any packet data. It merely prepares the checksum hardware for the TCP/IP Data descriptors that follow.

Table 3-13. Transmit Descriptor (TDESC) Layout – (Type = 0000b)

Offset 0 (first quadword):

Bits: 63:48 47:40 39:32 31:16 15:8 7:0

Field: TUCSE TUCSO TUCSS IPCSE IPCSO IPCSS

Offset 8 (second quadword):

Bits: 63:48 47:40 39:36 35:32 31:24 23:20 19:0

Field: MSS HDRLEN RSV STA TUCMD DTYP PAYLEN

Note: The first quadword of this descriptor type contains parameters used to calculate the two checksums which may be offloaded.
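The layout in Table 3-13 can be sketched as two 64-bit words assembled with shifts. The structure and function names below are hypothetical, and the sketch assumes a little-endian host so that each `uint64_t` can be stored directly at byte offsets 0 and 8 of the descriptor. STA and RSV are left at zero.

```c
#include <stdint.h>

/* The two quadwords of a TCP/IP context descriptor (offsets 0 and 8). */
struct tx_context_desc {
    uint64_t lo;
    uint64_t hi;
};

/* Field values in software form (illustrative container). */
struct tx_context_fields {
    uint8_t  ipcss, ipcso;
    uint16_t ipcse;
    uint8_t  tucss, tucso;
    uint16_t tucse;
    uint32_t paylen;   /* 20 bits used */
    uint8_t  tucmd;    /* must include DEXT (bit 5) */
    uint8_t  hdrlen;
    uint16_t mss;
};

static struct tx_context_desc tx_context_pack(const struct tx_context_fields *f)
{
    struct tx_context_desc d;

    d.lo = (uint64_t)f->ipcss |
           ((uint64_t)f->ipcso << 8)  |
           ((uint64_t)f->ipcse << 16) |
           ((uint64_t)f->tucss << 32) |
           ((uint64_t)f->tucso << 40) |
           ((uint64_t)f->tucse << 48);

    d.hi = (uint64_t)(f->paylen & 0xFFFFFu) |  /* PAYLEN, bits 19:0  */
           ((uint64_t)0x0 << 20)            |  /* DTYP = 0000b       */
           ((uint64_t)f->tucmd << 24)       |  /* TUCMD, bits 31:24  */
           ((uint64_t)f->hdrlen << 40)      |  /* HDRLEN, bits 47:40 */
           ((uint64_t)f->mss << 48);           /* MSS, bits 63:48    */
    return d;
}
```

Because TUCMD occupies bits 31:24, its DEXT bit (bit 5 of TUCMD) lands on bit 29 of the second quadword, which matches the "bit 29 (TDESC.DEXT)" selection rule stated above.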

Table 3-14. Transmit Descriptor (TDESC) Layout

Transmit Descriptor Offload Description

TUCSE TCP/UDP Checksum Ending. Defines the ending byte for the TCP/UDP checksum offload feature. Setting the TUCSE field to 0b indicates that the checksum covers from TUCSS to the end of the packet.

TUCSO TCP/UDP Checksum Offset. Defines the offset where to insert the TCP/UDP checksum field in the packet data buffer. This is used in situations where the software needs to calculate partial checksums (TCP pseudo-header, for example) to include bytes which are not contained within the range of start and end.

If no partial checksum is required, software must write a value of 0b.

TUCSS TCP/UDP Checksum Start. Defines the starting byte for the TCP/UDP checksum offload feature. It must be defined even if checksum insertion is not desired for some reason. When setting the TCP segmentation context, TUCSS is used to indicate the start of the TCP header.

IPCSE IP Checksum Ending. Defines the ending byte for the IP checksum offload feature. It specifies where the checksum should stop. A 16-bit value supports checksum offloading of packets as large as 64 KB. Setting the IPCSE field to 0b indicates that the checksum covers from IPCSS to the end of the packet. In this way, the length of the packet does not need to be calculated.

IPCSO IP Checksum Offset. The IPCSO field specifies where the resulting IP checksum should be placed. It is limited to the first 256 bytes of the packet and must be less than or equal to the total length of a given packet. If this is not the case, the checksum is not inserted.

IPCSS IP Checksum Start. IPCSS specifies the byte offset from the start of the transferred data to the first byte to be included in the checksum. Setting this value to 0b means the first byte of the data would be included in the checksum.

Note that the maximum value for this field is 255. This is adequate for typical applications.

The IPCSS value needs to be less than the total transferred length of the packet. If this is not the case, the results are unpredictable.

IPCSS must be defined even if checksum insertion is not desired for some reason. When setting the TCP segmentation context, IPCSS is used to indicate the start of the IP header.

MSS Maximum Segment Size. Controls the Maximum Segment Size. This specifies the maximum TCP or UDP payload “segment” sent per frame, not including any header. The total length of each frame (or “section”) sent by the TCP Segmentation mechanism (excluding 802.3ac tagging and Ethernet CRC) is MSS bytes + HDRLEN. The one exception is the last packet of a TCP segmentation context which is (typically) shorter than “MSS+HDRLEN”. This field is ignored if TDESC.TSE is not set.

HDRLEN Header Length. Specifies the length (in bytes) of the header to be used for each frame (or “section”) of a TCP Segmentation operation. The first HDRLEN bytes fetched from data descriptor(s) are stored internally and used as a prototype header for each section, and are pre-pended to each payload segment to form individual frames. For UDP packets this is normally equal to “UDP checksum offset + 2”. For TCP packets it is normally equal to “TCP checksum offset + 4 + TCP header option bytes”. This field is ignored if TDESC.TSE is not set.

RSV Reserved. Should be programmed to 0b for future compatibility.

STA TCP/UDP Status field. Provides transmit status indication. Section 3.3.6.2 provides the bit definition for the TDESC.STA field.

TUCMD TCP/UDP command field. The command field provides options that control the checksum offloading, along with some of the generic descriptor processing functions. Section 3.3.6.1 provides the bit definitions for the TDESC.TUCMD field.

DTYP Descriptor Type. Set to 0000b for TCP/IP context transmit descriptor type.

PAYLEN The packet length field (TDESC.PAYLEN) is the total number of payload bytes for this TCP Segmentation offload context (i.e., the total number of payload bytes that could be distributed across multiple frames after TCP segmentation is performed). Following the fetch of the prototype header, PAYLEN specifies the length of data that is fetched next from data descriptor(s). This field is also used to determine when “last-frame” processing needs to be performed. Typically, a new data descriptor is used to denote the start of the payload data buffer(s), but this is not required. PAYLEN specification should not include any header bytes. There is no restriction on the overall PAYLEN specification with respect to the transmit FIFO size, once the MSS and HDRLEN specifications are legal. This field is ignored if TDESC.TSE is not set. Refer to Section 3.5 for details on the TCP Segmentation off-loading feature.

Notes:

1. A number of the fields are ignored if the TCP Segmentation enable bit (TDESC.TSE) is cleared, denoting that the descriptor does not refer to the TCP segmentation context.

2. Maximum limits for the HDRLEN and MSS fields are dictated by the lengths variables. However, there is a further restriction that for any TCP Segmentation operation, the hardware must be capable of storing a complete section (completely-built frame) in the transmit FIFO prior to transmission. Therefore, the sum of MSS + HDRLEN must be at least 80 bytes less than the allocated size of the transmit FIFO.
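The length rules above lend themselves to a few arithmetic helpers. This is a sketch under stated assumptions: the function names are hypothetical, the transmit FIFO size is passed in by the caller because it depends on how the packet buffer is allocated, and the frame count is derived from the statement that every frame carries MSS payload bytes except a (typically) shorter last one.

```c
#include <stdbool.h>
#include <stdint.h>

/* HDRLEN for TCP: "TCP checksum offset + 4 + TCP header option bytes". */
static uint32_t tso_hdrlen_tcp(uint32_t tcp_csum_offset, uint32_t tcp_option_bytes)
{
    return tcp_csum_offset + 4u + tcp_option_bytes;
}

/* HDRLEN for UDP: "UDP checksum offset + 2". */
static uint32_t tso_hdrlen_udp(uint32_t udp_csum_offset)
{
    return udp_csum_offset + 2u;
}

/* MSS + HDRLEN must be at least 80 bytes less than the transmit FIFO size. */
static bool tso_fits_fifo(uint32_t mss, uint32_t hdrlen, uint32_t tx_fifo_bytes)
{
    return tx_fifo_bytes >= 80u && mss + hdrlen <= tx_fifo_bytes - 80u;
}

/* Number of frames produced for PAYLEN bytes of payload: ceil(PAYLEN / MSS). */
static uint32_t tso_frame_count(uint32_t paylen, uint32_t mss)
{
    if (mss == 0)
        return 0;
    return (paylen + mss - 1u) / mss;
}
```

For an untagged Ethernet/IPv4/TCP frame without options, the TCP checksum sits at byte offset 14 + 20 + 16 = 50, so the first helper yields a 54-byte prototype header.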

3.3.6.1 TCP/UDP Offload Transmit Descriptor Command Field

The command field (TDESC.TUCMD) provides options to control the TCP segmentation, along with some of the generic descriptor processing functions.


Table 3-15. Command Field (TDESC.TUCMD) Layout

7 6 5 4 3 2 1 0

IDE RSV DEXT RSV RS TSE IP TCP

TDESC.TUCMD Description

IDE (bit 7) Interrupt Delay Enable. IDE activates the transmit interrupt delay timer. Hardware loads a countdown register when it writes back a transmit descriptor that has RS and IDE set. The value loaded comes from the IDV field of the Interrupt Delay (TIDV) register. When the count reaches 0, a transmit interrupt occurs. Hardware always loads the transmit interrupt counter whenever it processes a descriptor with IDE set even if it is already counting down due to a previous descriptor. If hardware encounters a descriptor that has RS set, but not IDE, it generates an interrupt immediately after writing back the descriptor. The interrupt delay timer is cleared.

RSV (bit 6) Reserved. Set to 0b for future compatibility.

DEXT (bit 5) Descriptor Extension. Must be 1b for this descriptor type.

RSV (bit 4) Reserved. Set to 0b for future compatibility.

RS (bit 3) Report Status. RS tells the hardware to report the status information for this descriptor. Because this descriptor does not transmit data, only the DD bit in the status word is valid. Refer to Section 3.3.6.2 for the layout of the status field.

TSE (bit 2) TCP Segmentation Enable. TSE indicates that this descriptor is setting the TCP segmentation context. If this bit is not set, the checksum offloading context for normal (non-“TCP Segmentation”) packets is written. When a descriptor of this type is processed the Ethernet controller immediately updates the context in question (TCP Segmentation or checksum offloading) with values from the descriptor. This means that if any normal packets or TCP Segmentation packets are in progress (a descriptor with EOP set has not been received for the given context), the results are likely to be undesirable.

IP (bit 1) Packet Type (IPv4 = 1b, IPv6 = 0b). Identifies what type of IP packet is used in the segmentation process. This is necessary for hardware to know where the IP Payload Length field is located. This does not override the checksum insertion bit, IXSM.

IP (bit 1), 82544GC/EI only. Packet Type (IP = 1b). Identifies the packet as an IP packet. The purpose of this bit is to enable/disable the updating of the IP header during the segmentation process. This does not override the checksum insertion bit, IXSM.

TCP (bit 0) Packet Type (TCP = 1b). Identifies the packet as either TCP or UDP (non-TCP). This affects the processing of the header information.

Note:

1. The IDE, DEXT, and RS bits are valid regardless of the state of TSE. All other bits are ignored if TSE = 0b.

2. The TCP Segmentation feature also provides access to a generic block send function and may be useful for performing “segmentation offload” in which the header information is constant. By clearing both the TCP and IP bits, a block of data may be broken down into frames of a given size, a constant, arbitrary length header may be pre-pended to each frame, and two checksums optionally added.
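A minimal sketch of assembling the TUCMD byte from the bit positions in Table 3-15. The macro and function names are hypothetical. DEXT is always set because it must be 1b for this descriptor type; the IP and TCP bits are passed through as requested, since hardware ignores them when TSE is 0b.

```c
#include <stdbool.h>
#include <stdint.h>

#define TUCMD_TCP  (1u << 0)  /* Packet Type: TCP = 1b */
#define TUCMD_IP   (1u << 1)  /* Packet Type: IPv4 = 1b, IPv6 = 0b */
#define TUCMD_TSE  (1u << 2)  /* TCP Segmentation Enable */
#define TUCMD_RS   (1u << 3)  /* Report Status */
#define TUCMD_DEXT (1u << 5)  /* Descriptor Extension, must be 1b */
#define TUCMD_IDE  (1u << 7)  /* Interrupt Delay Enable */

/* Build TDESC.TUCMD for a context descriptor. Reserved bits 4 and 6 stay 0b. */
static uint8_t tucmd_build(bool tse, bool ipv4, bool tcp, bool rs, bool ide)
{
    uint8_t cmd = TUCMD_DEXT;
    if (tse)  cmd |= TUCMD_TSE;
    if (ipv4) cmd |= TUCMD_IP;
    if (tcp)  cmd |= TUCMD_TCP;
    if (rs)   cmd |= TUCMD_RS;
    if (ide)  cmd |= TUCMD_IDE;
    return cmd;
}
```

Clearing both the TCP and IP arguments while setting TSE corresponds to the generic block send function mentioned in note 2.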


3.3.6.2 TCP/UDP Offload Transmit Descriptor Status Field

Four bits are reserved to provide transmit status, although only one is currently assigned for this specific descriptor type. The status word is only written back to host memory in cases where the RS bit is set in the command field.

Table 3-16. Transmit Status Layout

3:1 0

RSV DD

TDESC.STA Description

RSV (bits 3:1) Reserved. Reserved for future use. Reads as 0b.

DD (bit 0) Descriptor Done. Indicates that the descriptor is finished and is written back after the descriptor has been processed.

3.3.7 TCP/IP Data Descriptor Format

The TCP/IP data descriptor is the companion to the TCP/IP context transmit descriptor described in the previous section. This descriptor type provides similar functionality to the legacy mode descriptor but also integrates the checksum offloading and TCP Segmentation feature.

To select this descriptor format, bit 29 in the command field (TDESC.DEXT) must be set to 1b and TDESC.DTYP must be set to 0001b. In this case, the descriptor format is defined as shown in Table 3-17.


Table 3-17. Transmit Descriptor (TDESC) Layout – (Type = 0001b)

Offset 0 (first quadword):

Address [63:0]

Offset 8 (second quadword):

Bits: 63:48 47:40 39:36 35:32 31:24 23:20 19:0

Field: Special POPTS RSV STA DCMD DTYP DTALEN

Transmit Descriptor Description

Address Data buffer address. Address of the data buffer in the host memory which contains a portion of the transmit packet.

DTALEN Data Length Field. Total length of the data pointed to by this descriptor, in bytes. For data descriptors not associated with a TCP Segmentation operation (TDESC.TSE not set), the descriptor lengths are subject to the same restrictions specified for legacy descriptors (the sum of the lengths of the data descriptors comprising a single packet must be at least 80 bytes less than the allocated size of the transmit FIFO).

DTYP Data Type. Set to 0001b to identify this descriptor as a TCP/IP data descriptor.

DCMD Descriptor Command Field. Provides options that control some of the generic descriptor processing features. Refer to Section 3.3.7.1 for bit definitions of the DCMD field.

STA TCP/IP Status field. Provides transmit status indication. Section 3.3.7.2 provides the bit definition for the TDESC.STA field.

RSV Reserved. Set to 0b for future compatibility.

POPTS Packet Option Field. Provides a number of options which control the handling of this packet. This field is ignored except on the first data descriptor of a packet. Section 3.3.7.3 provides the bit definition for the TDESC.POPTS field.

Special Special field. The Special field is used to provide 802.1q tagging information. This field is only valid in the last descriptor of the given packet (qualified by the EOP bit).


3.3.7.1 TCP/IP Data Descriptor Command Field

The Command field provides options that control checksum offloading and TCP segmentation features along with some of the generic descriptor processing features.

Table 3-18. Command Field (TDESC.DCMD) Layout

7 6 5 4 3 2 1 0

IDE VLE DEXT RPS/RSV (a) RS TSE IFCS EOP

a. RPS applies to the 82544GC/EI only.

TDESC.DCMD Description

IDE (bit 7) Interrupt Delay Enable. When set, activates the transmit interrupt delay timer. Hardware loads a countdown register when it writes back a transmit descriptor that has RS and IDE set. The value loaded comes from the IDV field of the Interrupt Delay (TIDV) register. When the count reaches 0, a transmit interrupt occurs if enabled. Hardware always loads the transmit interrupt counter whenever it processes a descriptor with IDE set even if it is already counting down due to a previous descriptor. If hardware encounters a descriptor that has RS set, but not IDE, it generates an interrupt immediately after writing back the descriptor. The interrupt delay timer is cleared.

VLE (bit 6) VLAN Enable. When set, indicates that the packet is a VLAN packet and the hardware should add the VLAN Ethertype and an 802.1q VLAN tag to the packet. The Ethertype should come from the VET register and the VLAN data comes from the special field of the TX descriptor. The hardware in that case appends the FCS/CRC.

Note that the CTRL.VME bit should also be set. If the CTRL.VME bit is not set, the Ethernet controller does not insert VLAN tags on outgoing packets and it sends generic Ethernet packets. The IFCS controls the insertion of the FCS/CRC in that case.

VLE is only valid in the last descriptor of the given packet (qualified by the EOP bit).

DEXT (bit 5) Descriptor Extension. Must be 1b for this descriptor type.

RPS/RSV (bit 4) Report Packet Sent. RPS is used in cases where software must know that a packet has been sent on the wire, not just that it has been loaded into the 82544GC/EI controller’s internal packet buffer.

When set, hardware defers writing the DD bit in the status byte until the packet has been sent, or transmission results in an error such as excess collisions. Hardware can continue to pre-fetch data from descriptors logically after the one with RPS set, but does not advance the head pointer or write back any other descriptors until it has sent the packet with RPS set.

For a TCP Segmentation context, the RPS bit indicates to the 82544GC/EI that the descriptor status should only be written back once all packets that make up the given TCP Segmentation context have been sent.

This bit is reserved and should be programmed to 0b for all Ethernet controllers except the 82544GC/EI.

RS (bit 3) Report Status. When set, tells the hardware to report the status information for this descriptor as soon as the corresponding data buffer has been fetched and stored in the controller’s internal packet buffer.

TSE (bit 2) TCP Segmentation Enable. TSE indicates that this descriptor is part of the current TCP Segmentation context. If this bit is not set, the descriptor is part of the “normal” context.

IFCS (bit 1) Insert IFCS. Controls the insertion of the FCS/CRC field in normal Ethernet packets. IFCS is only valid in the last descriptor of the given packet (qualified by the EOP bit).

EOP (bit 0) End Of Packet. The EOP bit indicates that the buffer associated with this descriptor contains the last data for the packet or for the given TCP Segmentation context. In the case of a TCP Segmentation context, the DTALEN length of this descriptor should match the amount remaining of the original PAYLEN. If it does not, the TCP Segmentation context is terminated but the end of packet processing may be incorrectly performed. These abnormal termination events are counted in the TSCTFC statistics register.

Note: The VLE, IFCS, and VLAN fields are only valid in certain descriptors. If TSE is enabled, the VLE, IFCS, and VLAN fields are only valid in the first data descriptor of the TCP segmentation context. If TSE is not enabled, then these fields are only valid in the last descriptor of the given packet (qualified by the EOP bit).

3.3.7.2 TCP/IP Data Descriptor Status Field

Four bits are reserved to provide transmit status, although only the DD bit is valid unless the RPS bit is set in the descriptor (82544GC/EI only). The status word is only written back to host memory in cases where the RS bit is set in the command field. The DD bit indicates that the descriptor is finished and is written back after the descriptor has been processed.

Table 3-19. Transmit Status Layout

3 2 1 0

RSV LC (a) EC (a) DD

a. 82544GC/EI only.

TDESC.STA Description

RSV (bit 3) Reserved.

LC (bit 2) Late Collision. Indicates that a late collision occurred while working in half-duplex mode. It has no meaning while working in full-duplex mode. Note that the collision window is speed dependent: 64 bytes for 10/100 Mb/s and 512 bytes for 1000 Mb/s operation.

EC (bit 1) Excess Collision. Indicates that the packet has experienced more than the maximum number of collisions as defined by the TCTL.CT control field and was not transmitted. It has no meaning while working in full-duplex mode.

DD (bit 0) Descriptor Done. Indicates that the descriptor is done and is written back either after the descriptor has been processed (with RS set), or for the 82544GC/EI only, after the packet has been transmitted on the wire (with RPS set).

3.3.7.3 TCP/IP Data Descriptor Option Field

The POPTS field provides a number of options which control the handling of this packet. This field is ignored except on the first data descriptor of a packet.

Table 3-20. Packet Options Field (TDESC.POPTS) Layout

7 6 5 4 3 2 1 0

RSV RSV RSV RSV RSV RSV TXSM IXSM

TDESC.POPTS Description

RSV (bits 7:2) Reserved. Should be written with 0b for future compatibility.

TXSM (bit 1) Insert TCP/UDP Checksum. Controls the insertion of the TCP/UDP checksum. If not set, the value placed into the checksum field of the packet data is not modified, and is placed on the wire. When set, the TCP/UDP checksum field is modified by the hardware.

Valid only in the first data descriptor for a given packet or TCP Segmentation context.

IXSM (bit 0) Insert IP Checksum. Controls the insertion of the IP checksum. If not set, the value placed into the checksum field of the packet data is not modified and is placed on the wire. When set, the IP checksum field is modified by the hardware.

Valid only in the first data descriptor for a given packet or TCP Segmentation context.

3.3.7.4 TCP/IP Data Descriptor Special Field

The SPECIAL field is used to provide the 802.1q/802.3ac tagging information.


When CTRL.VME is set to 1b, all packets transmitted from the Ethernet controller that have VLE set in the DCMD field are sent with an 802.1Q header added to the packet. The contents of the header come from the transmit descriptor special field and from the VLAN type register. The special field is ignored if the VLE bit in the transmit descriptor command field is 0b. The special field is valid only when EOP is set.

Table 3-21. Special Field (TDESC.SPECIAL) Layout

15:13 12 11:0

PRI CFI VLAN

TDESC.SPECIAL Description

PRI User Priority. Three bits that provide the VLAN user priority field to be inserted in the 802.1Q tag.

CFI Canonical Form Indicator.

VLAN VLAN Identifier. 12 bits that provide the VLAN identifier field to be inserted in the 802.1Q tag.
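Putting the TCP/IP data descriptor fields together, the following C sketch packs one descriptor from the layouts in Tables 3-17, 3-18, and 3-20. The names are hypothetical, and a little-endian host is assumed so that the two 64-bit words map to byte offsets 0 and 8. STA and RSV are left at zero for hardware to write back.

```c
#include <stdint.h>

/* TDESC.DCMD bits (Table 3-18). */
#define DCMD_EOP  (1u << 0)
#define DCMD_IFCS (1u << 1)
#define DCMD_TSE  (1u << 2)
#define DCMD_RS   (1u << 3)
#define DCMD_DEXT (1u << 5)
#define DCMD_VLE  (1u << 6)
#define DCMD_IDE  (1u << 7)

/* TDESC.POPTS bits (Table 3-20). */
#define POPTS_IXSM (1u << 0)  /* Insert IP Checksum */
#define POPTS_TXSM (1u << 1)  /* Insert TCP/UDP Checksum */

struct tx_data_desc {
    uint64_t addr;  /* offset 0: data buffer address */
    uint64_t hi;    /* offset 8: DTALEN, DTYP, DCMD, STA, RSV, POPTS, Special */
};

static struct tx_data_desc tx_data_pack(uint64_t buf_addr, uint32_t dtalen,
                                        uint8_t dcmd, uint8_t popts,
                                        uint16_t special)
{
    struct tx_data_desc d;

    d.addr = buf_addr;
    d.hi = (uint64_t)(dtalen & 0xFFFFFu)          |  /* DTALEN, bits 19:0 */
           ((uint64_t)0x1 << 20)                  |  /* DTYP = 0001b      */
           ((uint64_t)(dcmd | DCMD_DEXT) << 24)   |  /* DCMD, DEXT forced */
           ((uint64_t)popts << 40)                |  /* POPTS, bits 47:40 */
           ((uint64_t)special << 48);                /* Special, 63:48    */
    return d;
}
```

A single-buffer packet with both checksums offloaded would pass `DCMD_EOP | DCMD_IFCS | DCMD_RS` and `POPTS_IXSM | POPTS_TXSM`; the helper adds DEXT itself.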

3.4 Transmit Descriptor Ring Structure

The transmit descriptor ring structure is shown in Figure 3-4. A pair of hardware registers maintains the transmit queue. New descriptors are added to the ring by writing descriptors into the circular buffer memory region and moving the ring’s tail pointer. The tail pointer points one entry beyond the last hardware owned descriptor (but at a point still within the descriptor ring). Transmission continues up to the descriptor where head equals tail at which point the queue is empty.

Descriptors passed to hardware should not be manipulated by software until the head pointer has advanced past them.


[Figure: a circular buffer extending from Base to Base + Size. Head and Tail bound the transmit queue, the region owned by hardware.]

Figure 3-4. Transmit Descriptor Ring Structure

Shaded boxes in Figure 3-4 represent descriptors that have been transmitted but not yet reclaimed by software. Reclaiming involves freeing up buffers associated with the descriptors.

The transmit descriptor ring is described by the following registers:

• Transmit Descriptor Base Address registers (TDBAL and TDBAH)

These registers indicate the start of the descriptor ring buffer. This 64-bit address is aligned on a 16-byte boundary and is stored in two consecutive 32-bit registers. TDBAL contains the lower 32-bits; TDBAH contains the upper 32 bits. Hardware ignores the lower 4 bits in TDBAL.

• Transmit Descriptor Length register (TDLEN)

This register determines the number of bytes allocated to the circular buffer. This value must be 128-byte aligned.

• Transmit Descriptor Head register (TDH)

This register holds a value which is an offset from the base, and indicates the in-progress descriptor. There can be up to 64K descriptors in the circular buffer. Reading this register returns the value of “head” corresponding to descriptors already loaded in the output FIFO.

• Transmit Descriptor Tail register (TDT)

This register holds a value which is an offset from the base, and indicates the location beyond the last descriptor hardware can process. This is the location where software writes the first new descriptor.

The base register indicates the start of the circular descriptor queue and the length register indicates the maximum size of the descriptor ring. The lower seven bits of length are hard-wired to 0b. Byte addresses within the descriptor buffer are computed as follows:

address = base + (ptr * 16), where ptr is the value in the hardware head or tail register.

The size chosen for the head and tail registers permits a maximum of 64 K descriptors, or approximately 16 K packets for the transmit queue given an average of four descriptors per packet.
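The ring arithmetic above can be sketched in a few C helpers. Function names are hypothetical. The address formula and the 128-byte length alignment are taken from the text; the occupancy calculation follows from the statements that the tail points one entry beyond the last hardware-owned descriptor and that the queue is empty when head equals tail.

```c
#include <stdbool.h>
#include <stdint.h>

#define TX_DESC_SIZE 16u  /* bytes per transmit descriptor */

/* address = base + (ptr * 16), where ptr is a head or tail register value. */
static uint64_t tx_desc_addr(uint64_t base, uint32_t ptr)
{
    return base + (uint64_t)ptr * TX_DESC_SIZE;
}

/* TDLEN must be a non-zero multiple of 128 bytes (8 descriptors). */
static bool tdlen_is_valid(uint32_t tdlen_bytes)
{
    return tdlen_bytes != 0 && (tdlen_bytes % 128u) == 0;
}

/* Descriptors currently owned by hardware: head up to, not including, tail. */
static uint32_t tx_ring_hw_owned(uint32_t head, uint32_t tail, uint32_t n)
{
    return (tail + n - head) % n;
}

/* Advance an index by one descriptor, wrapping at the end of the ring. */
static uint32_t tx_ring_next(uint32_t idx, uint32_t n)
{
    return (idx + 1u) % n;
}
```

Here `n` is the ring size in descriptors, that is, TDLEN divided by 16.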


Once activated, hardware fetches the descriptor indicated by the hardware head register. The hardware tail register points one beyond the last valid descriptor.

Software can determine if a packet has been sent by setting the RS bit (or the RPS bit for the 82544GC/EI only) in the transmit descriptor command field. Checking the transmit descriptor DD bit in memory eliminates a potential race condition. All descriptor data is written to the IO bus prior to incrementing the head register, but a read of the head register could “pass” the data write in systems performing IO write buffering. Updates to transmit descriptors use the same IO write path and follow all data writes. Consequently, they are not subject to the race condition. Other potential conditions also prohibit software reading the head pointer.

In general, hardware prefetches packet data prior to transmission. Hardware typically updates the value of the head pointer after storing data in the transmit FIFO.

The process of checking for completed packets consists of one of the following:

• Scan memory for descriptor status write-backs.

• Take an interrupt. An interrupt condition can be generated whenever a transmit queue goes empty (ICR.TXQE). Interrupts can also be triggered in other ways.
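The first option, scanning memory for status write-backs, might look like the following C sketch. It is a simplified software model, not the hardware descriptor layout: `struct tx_slot` and all names are hypothetical, each slot records only the status byte and whether RS was requested, and a real driver would also need the appropriate DMA read ordering before trusting the status byte. It relies on the rule that DD on a descriptor with RS set covers all descriptors up to and including it.

```c
#include <stdbool.h>
#include <stdint.h>

#define TX_STA_DD 0x1u

/* Simplified software view of one ring entry (illustrative only). */
struct tx_slot {
    volatile uint8_t status;  /* written back by hardware */
    bool             rs;      /* software set RS on this descriptor */
};

/* Reclaim completed descriptors starting at 'clean', stopping at
 * 'next_to_use' (the first slot not yet handed to hardware).
 * Returns the new value of 'clean'. */
static uint32_t tx_reclaim(struct tx_slot *ring, uint32_t n,
                           uint32_t clean, uint32_t next_to_use)
{
    while (clean != next_to_use) {
        /* Find the next descriptor that requested a status write-back. */
        uint32_t j = clean;
        while (j != next_to_use && !ring[j].rs)
            j = (j + 1u) % n;
        if (j == next_to_use || !(ring[j].status & TX_STA_DD))
            break;  /* nothing further has been written back yet */

        /* DD on slot j covers every slot from 'clean' through j. */
        for (;;) {
            uint32_t k = clean;
            ring[k].status = 0;
            ring[k].rs = false;
            clean = (clean + 1u) % n;
            if (k == j)
                break;
        }
    }
    return clean;
}
```

A driver would free the buffers attached to each reclaimed slot at the point where the sketch clears the status byte.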

3.4.1 Transmit Descriptor Fetching

The descriptor processing strategy for transmit descriptors is essentially the same as for receive descriptors except that a different set of thresholds is used. As for receives, the on-chip transmit descriptor buffer holds 64 descriptors.

When the on-chip buffer is empty, a fetch happens as soon as any descriptors are made available (software writes to the tail pointer). When the on-chip buffer is nearly empty (TXDCTL.PTHRESH), a prefetch is performed whenever enough valid descriptors (TXDCTL.HTHRESH) are available in host memory and no other DMA activity of greater priority is pending (descriptor fetches and write-backs or packet data transfers).

The descriptor prefetch policy is aggressive to maximize performance. If descriptors reside in an external cache, the system must ensure cache coherency before changing the tail pointer.

When the number of descriptors in host memory is greater than the available on-chip descriptor storage, the chip may elect to perform a fetch which is not a multiple of cache line size. The hardware performs this non-aligned fetch if doing so results in the next descriptor fetch being aligned on a cache line boundary. This allows the descriptor fetch mechanism to be most efficient in the cases where it has fallen behind software.

3.4.2 Transmit Descriptor Write-back

The descriptor write-back policy for transmit descriptors is similar to that for receive descriptors, with a few additional factors. First, since transmit descriptor write-backs are optional (controlled by RS, and by RPS for the 82544GC/EI only, in the transmit descriptor), only descriptors that have one (or both) of these bits set start the accumulation of write-back descriptors. Second, to preserve backward compatibility with the 82542, if the TXDCTL.WTHRESH value is 0b, the Ethernet controller writes back a single byte of the descriptor (TDESCR.STA) and all other bytes of the descriptor are left unchanged.

1. With the RPS bit set, the head is not advanced until after the packet is transmitted or rejected due to excess collisions (82544GC/EI only).


Since the benefit of delaying and then bursting transmit descriptor write-backs is small at best, it is likely that the threshold is left at the default value (0b) to force immediate write-back of transmit descriptors and to preserve backward compatibility.

Descriptors are written back in one of three conditions:

• TXDCTL.WTHRESH = 0b and a descriptor which has RS (or RPS for the 82544GC/EI only) set is ready to be written back

• Transmit Interrupt Delay timer expires

• TXDCTL.WTHRESH > 0b and TXDCTL.WTHRESH descriptors have accumulated

For the first condition, write-backs are immediate. This is the default operation and is backward compatible. For this case, the Transmit Interrupt delay function works as described in Section 3.4.3.1.

The other two conditions are only valid if descriptor bursting is enabled (see Section 13.4.44). In the second condition, the Transmit Interrupt Delay timer (TIDV) is used to force timely write-back of descriptors. The first packet after timer initialization starts the timer. Timer expiration flushes any accumulated descriptors and sets an interrupt event (TXDW).

For the final condition, if TXDCTL.WTHRESH descriptors are ready for write-back, the write-back is performed.
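The three write-back conditions can be expressed as a small predicate. This is a sketch of the rules as stated above, not a model of the hardware; `should_write_back` and its parameters are hypothetical names.

```python
def should_write_back(wthresh: int, pending: int, timer_expired: bool) -> bool:
    """Decide whether accumulated transmit descriptors are written back now.

    wthresh       -- value of TXDCTL.WTHRESH
    pending       -- completed descriptors with RS set that await write-back
    timer_expired -- True once the Transmit Interrupt Delay timer has expired
    """
    if pending == 0:
        return False              # nothing has accumulated
    if wthresh == 0:
        return True               # default: immediate write-back
    if timer_expired:
        return True               # timer expiration flushes what has accumulated
    return pending >= wthresh     # burst once WTHRESH descriptors are ready
```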

3.4.3 Transmit Interrupts

Hardware supplies three transmit interrupts. These interrupts are initiated through the following conditions:

• Transmit queue empty (TXQE) - All descriptors have been processed. The head pointer is equal to the tail pointer.

• Descriptor done [Transmit Descriptor Write-back (TXDW)] - Set when hardware writes back a descriptor with RS (or RPS for the 82544GC/EI only) set. This is only expected to be used in cases where, for example, the streams interface has run out of descriptors and wants to be interrupted whenever progress is made.

• Transmit Delayed Interrupt (TXDW) - In conjunction with IDE (Interrupt Delay Enable), the TXDW indication is delayed by a specific time per the TIDV register. This interrupt is set when the transmit interrupt countdown register expires. The countdown register is loaded with the value of the IDV field of the TIDV register when a transmit descriptor with its RS bit (or RPS bit for the 82544GC/EI only) and the IDE bit set is written back. When a Transmit Delayed Interrupt occurs, the TXDW interrupt cause bit is set (just as when a Transmit Descriptor Write-back interrupt occurs). This interrupt may be masked in the same manner as the TXDW interrupt. This interrupt is used frequently by software that performs dynamic transmit chaining, by adding packets one at a time to the transmit chain.

Note: The transmit delay interrupt is indicated with the same interrupt bit as the transmit write-back interrupt, TXDW. The transmit delay interrupt is only delayed in time as discussed above.

• Link status change (LSC) - Set when the link status changes. When using the internal PHY, link status changes are determined and indicated by the PHY via a change in its LINK indication.

When using an external TBI device (82544GC/EI only), the device might indicate a link status change using its LOS (loss of sync) indication. In this TBI mode, if HW AutoNegotiation is enabled, the MAC can also detect and signal a link status change if the Configuration Base Page register is received (0b), or if either the LRST or ANE bits are changed by software.

• Transmit Descriptor Ring Low Threshold Hit (TXD_LOW) (not applicable to the 82544GC/EI) - Set when the total number of transmit descriptors available (as measured by the difference between the Tx descriptor ring Head and Tail pointer) hits the low threshold specified in the TXDCTL.LWTHRESH field.

3.4.3.1 Delayed Transmit Interrupts

This mechanism allows software the flexibility of delaying transmit interrupts until no more descriptors are added to a transmit chain for a certain amount of time, rather than when the Ethernet controller’s head pointer catches the tail pointer. This occurs if the Ethernet controller is processing packets slightly faster than the software, a likely scenario for gigabit operations.

A software driver usually has no knowledge of when it is going to be asked to send another frame. For performance reasons, it is best to generate only one transmit interrupt after a burst of packets has been sent.


Refer to Section 3.3.3.1 for specific details.

3.5 TCP Segmentation

Hardware TCP Segmentation is one of the off-loading options of most modern TCP/IP stacks. This is often referred to as “Large Send” offloading. This feature enables the TCP/IP stack to pass to the Ethernet controller software driver a message to be transmitted that is bigger than the Maximum Transmission Unit (MTU) of the medium. It is then the responsibility of the software driver and hardware to carve the TCP message into MTU size frames that have appropriate layer 2 (Ethernet), 3 (IP), and 4 (TCP) headers. These headers must include sequence number, checksum fields, options and flag values as required. Note that some of these values (such as the checksum values) are unique for each packet of the TCP message, while other fields, such as the source IP address, are constant for all packets associated with the TCP message.

The offloading of these processes from the software driver to the Ethernet controller saves significant CPU cycles. The software driver shares the additional tasks to support these options with the Ethernet controller.

Although the Ethernet controller’s TCP segmentation offload implementation was specifically designed to take advantage of new “TCP Segmentation offload” features, the hardware implementation was made generic enough so that it could also be used to “segment” traffic from other protocols. For instance this feature could be used any time it is desirable for hardware to segment a large block of data for transmission into multiple packets that contain the same generic header.


3.5.1 Assumptions

The following assumptions apply to the TCP Segmentation implementation in the Ethernet controller:

• The RS bit operation is not changed. Interrupts are set after data in buffers pointed to by individual descriptors is transferred to hardware.

• Checksums are not accurate above a 12 K frame size.

• The function of the RPS bit (82544GC/EI only) in the Transmit Descriptor is applicable to all of the packets that make up the “TCP Segmentation” context, not the individual packets segmented by hardware.

3.5.2 Transmission Process

The transmission process for regular (non-TCP Segmentation) packets involves:

• The protocol stack receives from an application a block of data that is to be transmitted.

• The protocol stack calculates the number of packets required to transmit this block based on the MTU size of the media and required packet headers.

• For each packet of the data block:

• Ethernet, IP and TCP/UDP headers are prepared by the stack.

• The stack interfaces with the software device driver and commands the driver to send the individual packet.

• The driver gets the frame and interfaces with the hardware.

• The hardware reads the packet from host memory (via DMA transfers).

• The driver returns ownership of the packet to the operating system when the hardware has completed the DMA transfer of the frame (indicated by an interrupt).

The transmission process for the Ethernet controller TCP segmentation offload implementation involves:

• The protocol stack receives from an application a block of data that is to be transmitted.

• The stack interfaces to the software device driver and passes the block down with the appropriate header information.

• The software device driver sets up the interface to the hardware (via descriptors) for the TCP Segmentation context.

• The hardware transfers the packet data and performs the Ethernet packet segmentation and transmission based on offset and payload length parameters in the TCP/IP context descriptor including:

— Packet encapsulation

— Header generation & field updates including IP and TCP/UDP checksum generation

• The driver returns ownership of the block of data to the operating system when the hardware has completed the DMA transfer of the entire data block (indicated by an interrupt).

3.5.2.1 TCP Segmentation Data Fetch Control

To perform TCP Segmentation in the Ethernet controller, the DMA unit must ensure that the entire payload of the segmented packet fits into the available space in the on-chip Packet Buffer. The segmentation process is performed without interruption. The DMA performs various comparisons between the payload and the Packet Buffer to ensure that no interruptions occur. The TCP Segmentation Pad & Minimum Threshold (TSPMT) register is used to allow software to program the minimum threshold required for a TCP Segmentation payload. Consideration should be made for the MTU value when writing this field. The TSPMT register is also used to program the threshold padding overhead. This padding is necessary due to the indeterminate nature of the MTU and the associated headers.

3.5.3 TCP Segmentation Performance

A hardware implementation of TCP Segmentation offload improves performance in the following ways:

• The operating system stack does not need to partition the block to fit the MTU size, saving CPU cycles.

• The operating system stack only computes one Ethernet, IP, and TCP header per segment, saving CPU cycles.

• The operating system stack interfaces with the software device driver only once per block transfer, instead of once per frame.

• Larger PCI bursts are used, which improves bus efficiency.

• Interrupts are easily reduced to one per TCP message instead of one per packet.

• Fewer I/O accesses are required to command the hardware.

3.5.4 Packet Format

Typical TCP/IP transmit window size is 8760 bytes (about 6 full size frames). A TCP message can be as large as 64 KB and is generally fragmented across multiple pages in host memory. The Ethernet controller partitions the data packet into standard Ethernet frames prior to transmission. The Ethernet controller supports calculating the Ethernet, IP, TCP, and even UDP headers, including checksum, on a frame by frame basis.

Ethernet IPv4 TCP/UDP DATA FCS

Figure 3-5. TCP/IP Packet Format

Frame formats supported by the Ethernet controller’s TCP segmentation include:

• Ethernet 802.3

• IEEE 802.1q VLAN (Ethernet 802.3ac)

• Ethernet Type 2

• Ethernet SNAP

• IPv4 headers with options

• IPv6 headers with IP option next headers

• IPv6 packet tunneled in IPv4


• TCP with options

• UDP with limitations.

UDP (unlike TCP) is not a “reliable protocol”, and fragmentation is not supported at the UDP level. UDP messages that are larger than the MTU size of the given network medium are normally fragmented at the IP layer. This is different from TCP, where large TCP messages can be fragmented at either the IP or TCP layers depending on the software implementation. The Ethernet controller has the ability to segment UDP traffic (in addition to TCP traffic). This process has limited usefulness.

IP tunneled packets are not supported for TCP Segmentation operation.

3.5.5 TCP Segmentation Indication

Software indicates a TCP Segmentation transmission context to the hardware by setting up a TCP/IP Context Transmit Descriptor. The purpose of this descriptor is to provide information to the hardware to be used during the TCP segmentation offload process. The layout of this descriptor is reproduced in Section 3.3.6.

Byte offset 0:

Bits    63-48   47-40   39-32   31-16   15-8    7-0
Field   TUCSE   TUCSO   TUCSS   IPCSE   IPCSO   IPCSS

Byte offset 8:

Bits    63-48   47-40    39-36   35-32   31-24   23-20   19-0
Field   MSS     HDRLEN   RSV     STA     TUCMD   DTYP    PAYLEN

TUCMD command field:

Bit     7     6     5      4     3    2     1    0
Field   IDE   RSV   DEXT   RSV   RS   TSE   IP   TCP

Figure 3-6. TCP/IP Context Transmit Descriptor & Command Layout
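As an illustration of the layout in Figure 3-6, the descriptor can be packed into its two little-endian 64-bit words as shown below. This is a minimal sketch: the function name, the constant names, and the example offsets (a 14-byte Ethernet header followed by 20-byte IPv4 and TCP headers) are assumptions, and the values a given driver must use for DTYP and DEXT should be taken from the descriptor definition in Section 3.3.6.

```python
import struct

# Bit positions of the command field, as shown in Figure 3-6.
TUCMD_TCP = 1 << 0
TUCMD_IP = 1 << 1
TUCMD_TSE = 1 << 2
TUCMD_RS = 1 << 3
TUCMD_DEXT = 1 << 5
TUCMD_IDE = 1 << 7


def pack_context_descriptor(ipcss, ipcso, ipcse, tucss, tucso, tucse,
                            paylen, dtyp, tucmd, sta, hdrlen, mss) -> bytes:
    """Pack the fields of Figure 3-6 into 16 bytes (two little-endian quadwords)."""
    word0 = ((ipcss & 0xFF)
             | ((ipcso & 0xFF) << 8)
             | ((ipcse & 0xFFFF) << 16)
             | ((tucss & 0xFF) << 32)
             | ((tucso & 0xFF) << 40)
             | ((tucse & 0xFFFF) << 48))
    word1 = ((paylen & 0xFFFFF)          # bits 19-0
             | ((dtyp & 0xF) << 20)      # bits 23-20
             | ((tucmd & 0xFF) << 24)    # bits 31-24
             | ((sta & 0xF) << 32)       # bits 35-32, bits 39-36 stay reserved
             | ((hdrlen & 0xFF) << 40)   # bits 47-40
             | ((mss & 0xFFFF) << 48))   # bits 63-48
    return struct.pack("<QQ", word0, word1)


# Hypothetical example: 3000 bytes of TCP payload behind a 54-byte header.
example = pack_context_descriptor(
    ipcss=14, ipcso=24, ipcse=33, tucss=34, tucso=50, tucse=0,
    paylen=3000, dtyp=0, tucmd=TUCMD_TSE | TUCMD_IP | TUCMD_TCP | TUCMD_DEXT,
    sta=0, hdrlen=54, mss=1460)
```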

Setting the TSE bit in the Command field to 1b indicates that this descriptor refers to the TCP Segmentation context (as opposed to the normal checksum offloading context). This causes the checksum offloading, packet length, header length, and maximum segment size parameters to be loaded from the descriptor into the Ethernet controller.

The TCP Segmentation prototype header is taken from the packet data itself. Software must identify the type of packet that is being sent (IP/TCP, IP/UDP, other), calculate appropriate checksum offloading values for the desired checksums, and calculate the length of the header which is prepended. The header may be up to 240 bytes in length.

Once the TCP Segmentation context has been set, the next descriptor provides the initial data to transfer. This first descriptor(s) must point to a packet of the type indicated. Furthermore, the data it points to may need to be modified by software as it serves as the prototype header for all packets within the TCP Segmentation context. The following sections describe the supported packet types and the various updates which are performed by hardware. This should be used as a guide to determine what must be modified in the original packet header to make it a suitable prototype header.

The following summarizes the fields considered by the driver for modification in constructing the prototype header:


• IPv4 Header

— Length should be set to zero

— Identification Field should be set as appropriate for first packet of send (if not already)

— Header Checksum should be zeroed out unless some adjustment is needed by the driver

• IPv6 Header

— Length should be set to zero

• TCP Header

— Sequence Number should be set as appropriate for first packet of send (if not already)

— PSH and FIN flags should be set as appropriate for last packet of send

— TCP Checksum should be set to the partial pseudo-header checksum as follows:

IPv4: IP Source Address | IP Destination Address | Zero | Layer 4 Protocol | Zero

IPv6: IP Source Address | IP Destination Address | Zero | Zero | Next Header

a. 82544GC/EI only

Figure 3-7. TCP Partial Pseudo-Header Checksum

• UDP Header

— Checksum should be set as in TCP header, above

The Ethernet controller’s DMA function fetches the Ethernet, IP, and TCP/UDP prototype header information from the initial descriptor(s) and saves it on-chip for individual packet header generation. The following sections describe the updating process performed by the hardware for each frame sent using the TCP Segmentation capability.
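The partial pseudo-header sum that software seeds into the TCP (or UDP) checksum field can be sketched for IPv4 as follows. This is an illustration under assumptions: the helper names are hypothetical, the length word is left at zero as in Figure 3-7, and the value is treated as an uncomplemented one's complement sum to be stored in network byte order.

```python
import socket
import struct


def fold16(total: int) -> int:
    """Fold carries back into the low 16 bits (one's complement addition)."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def partial_pseudo_sum_ipv4(src: str, dst: str, protocol: int) -> int:
    """Sum the IPv4 source address, destination address, and protocol.

    The TCP/UDP length is deliberately left out (zero); the text states
    that hardware accounts for the length on a frame by frame basis.
    """
    addresses = socket.inet_aton(src) + socket.inet_aton(dst)
    words = struct.unpack("!4H", addresses)   # four big-endian 16-bit words
    # The zero byte and the protocol byte form the 16-bit word 0x00PP.
    return fold16(sum(words) + (protocol & 0xFF))


# Hypothetical usage: seed the checksum field of a TCP prototype header.
seed = struct.pack("!H", partial_pseudo_sum_ipv4("192.168.0.1", "192.168.0.2", 6))
```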

3.5.6 TCP Segmentation Use of Multiple Data Descriptors

TCP Segmentation allows a packet to be described by more than one data descriptor. A large packet contained in a single virtual-address buffer is better described as a series of data descriptors, each referencing a single physical address page.

The only requirement when using multiple data descriptors for TCP segmentation is the following guideline:

• If multiple data descriptors are used to describe the IP/TCP/UDP header section, each descriptor must describe one or more complete headers; descriptors referencing only parts of headers are not supported.

Note: It is recommended that the entire header section, as described by the TCP Context Descriptor HDRLEN field, be coalesced into a single buffer and described using a single data descriptor.
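Splitting a buffer into per-page pieces, one data descriptor each, might look like the sketch below. The 4 KB page size and the `split_by_page` helper are assumptions for illustration; a real driver would also translate each piece to a physical (DMA) address, and would keep the header section in a single descriptor as recommended above.

```python
PAGE_SIZE = 4096  # assumed host page size in bytes


def split_by_page(addr: int, length: int, page_size: int = PAGE_SIZE):
    """Split [addr, addr + length) into chunks that never cross a page boundary.

    Returns a list of (address, size) pairs, one per data descriptor.
    """
    chunks = []
    while length > 0:
        room = page_size - (addr % page_size)  # bytes left in the current page
        size = min(room, length)
        chunks.append((addr, size))
        addr += size
        length -= size
    return chunks
```

For example, a 0x300-byte buffer starting at 0x1F00 yields one descriptor for the last 0x100 bytes of the first page and one for the first 0x200 bytes of the next page.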

3.5.7 IP and TCP/UDP Headers

This section outlines the format and content for the IP, TCP and UDP headers. The Ethernet controller requires baseline information from the software device driver in order to construct the appropriate header information during the segmentation process.

Header fields that are modified by the Ethernet controller are highlighted in the figures that follow.

The IPv4 header is first shown in the traditional (RFC 791) representation, and because byte and bit ordering is confusing in that representation, the IP header is also shown in little-endian format. The actual data is fetched from memory in little-endian format.

Word 0: Version (bits 0-3) | IP Hdr Length (bits 4-7) | TYPE of service (bits 8-15) | Total length (bits 16-31)

Word 1: Identification (bits 0-15) | Flags (bits 16-18) | Fragment Offset (bits 19-31)

Word 2: Time to Live (bits 0-7) | Layer 4 Protocol ID (bits 8-15) | Header Checksum (bits 16-31)

Word 3: Source Address

Word 4: Destination Address

Word 5+: Options

Figure 3-8. IPv4 Header (Traditional Representation)


Byte 3 | Byte 2 | Byte 1 | Byte 0 (bits 7 to 0 within each byte)

Word 0: Total length LSB | Total length MSB | TYPE of service | Version, IP Hdr Length

Word 1: Fragment Offset Low | RES, NF, MF, Fragment Offset High | Identification LSB | Identification MSB

Word 2: Header Checksum (Bytes 3 and 2) | Layer 4 Protocol ID | Time to Live

Word 3: Source Address

Word 4: Destination Address

Word 5+: Options

Figure 3-9. IPv4 Header (Little-Endian Order)

Flags Field Definition:

The Flags field is defined below. Note that hardware does not evaluate or change these bits.

• MF More Fragments

• NF No Fragments

• Reserved

Note: The IPv6 header is first shown in the traditional (RFC 2460), big-endian representation. The actual data is fetched from memory in little-endian format.

Word 0: Version (bits 0-3) | Traffic Class (bits 4-11) | Flow Label (bits 12-31)

Word 1: Payload Length (bits 0-15) | Next Header (bits 16-23) | Hop Limit (bits 24-31)

Source Address (128 bits)

Destination Address (128 bits)

Figure 3-10. IPv6 TCP Header (Traditional Representation)

A TCP or UDP frame uses a 16-bit wide one’s complement checksum. The checksum word is computed on the outgoing TCP or UDP header and payload, and on the Pseudo Header. Details on checksum computations are provided in Section 3.5. TCP requires the use of a checksum, whereas it is optional for UDP.


The TCP header is first shown in the traditional (RFC 793) representation. Because byte and bit ordering is confusing in that representation, the TCP header is also shown in little-endian format. The actual data is fetched from memory in little-endian format.

Word 0: Source Port (bits 0-15) | Destination Port (bits 16-31)

Word 1: Sequence Number

Word 2: Acknowledgement Number

Word 3: TCP Header Length (bits 0-3) | Reserved (bits 4-9) | Flags URG, ACK, PSH, RST, SYN, FIN (bits 10-15) | Window (bits 16-31)

Word 4: Checksum (bits 0-15) | Urgent Pointer (bits 16-31)

Word 5+: Options

Figure 3-11. TCP Header (Traditional Representation)

Byte3 Byte2 | Byte1 Byte0 (bits 7 to 0 within each byte)

Word 0: Destination Port | Source Port

Word 1: LSB Sequence Number MSB

Word 2: Acknowledgement Number

Word 3: Window (Byte3, Byte2) | Reserved, Flags (Byte1) | TCP Header Length, Reserved (Byte0)

Word 4: Urgent Pointer | Checksum

Word 5+: Options

Figure 3-12. TCP Header (Little-Endian)

The TCP header is always a multiple of 32 bit words. TCP options may occupy space at the end of the TCP header and are a multiple of 8 bits in length. All options are included in the checksum.

The checksum also covers a 96-bit pseudo header conceptually prefixed to the TCP Header (see Figure 3-13 and Figure 3-14). The IPv4 pseudo header contains the IPv4 Source Address, the IPv4 Destination Address, the IPv4 Protocol field, and TCP Length. The IPv6 pseudo header contains the IPv6 Source Address, the IPv6 Destination Address, the IPv6 Payload Length, and the IPv6 Next Header field. Software pre-calculates the partial pseudo header sum, which includes IPv4 SA, DA and protocol types, but not the TCP length, and stores this value into the TCP checksum field of the packet.

The Protocol ID field should always be added to the least significant byte (LSB) of the 16-bit pseudo header sum, where the most significant byte (MSB) of the 16-bit sum is the byte that corresponds to the first checksum byte out on the wire.

The TCP Length field is the TCP Header Length including option fields plus the data length in bytes, which is calculated by hardware on a frame by frame basis. The TCP Length does not count the 12 bytes of the pseudo header. The TCP length of the packet is determined by hardware as:


TCP Length = Payload + HDRLEN - TUCSS

“Payload” is normally MSS except for the last packet where it represents the remainder of the payload.

Bits 0 to 31:

IP Source Address

IP Destination Address

Zero | Layer 4 Protocol | TCP Length

Figure 3-13. TCP Pseudo Header Content (Traditional Representation)

IP Source Address

IP Destination Address

Upper Layer Packet Length

Zero Next Header

Figure 3-14. TCP PseudoHeader Content for IPv6

Note: The IP Destination address is the final destination of the packet. Therefore, if a routing header is used, the last address in the route list is used in this calculation. The upper-layer packet length is the length of the TCP header and the TCP payload.

The UDP header is always 8 bytes in size with no options.

Word 0: Source Port (bits 0-15) | Destination Port (bits 16-31)

Word 1: Length (bits 0-15) | Checksum (bits 16-31)

Figure 3-15. UDP Header (Traditional Representation)

Byte3 Byte2 | Byte1 Byte0

Word 0: Destination Port | Source Port

Word 1: Checksum | Length

Figure 3-16. UDP Header (Little-Endian Order)


UDP pseudo header has the same format as the TCP pseudo header. The IPv4 pseudo header conceptually prefixed to the UDP header contains the IPv4 source address, the IPv4 destination address, the IPv4 protocol field, and the UDP length (same as the TCP Length discussed above). The IPv6 pseudo header for UDP is the same as the IPv6 pseudo header for TCP. This checksum procedure is the same as is used in TCP.

IP Source Address

IP Destination Address

Zero | Layer 4 Protocol ID | UDP Length

Figure 3-17. UDP Pseudo Header Diagram for IPv4

IP Source Address

IP Destination Address

Upper Layer Packet Length

Zero | Next Header

Figure 3-18. UDP PseudoHeader Diagram for IPv6

Note: The IP Destination Address is the final destination of the packet. Therefore, if a routing header is used, the last address in the route list is used in this calculation. The upper-layer packet length is the length of the UDP header and UDP payload.

Unlike the TCP checksum, the UDP checksum is optional. Software must set the TXSM bit in the TCP/IP Context Transmit Descriptor to indicate that a UDP checksum should be inserted. Hardware does not overwrite the UDP checksum unless the TXSM bit is set.

3.5.8 Transmit Checksum Offloading with TCP Segmentation

The Ethernet controller supports checksum off-loading as a component of the TCP Segmentation offload feature and as a standalone capability. Section 3.6 describes the interface for controlling the checksum off-loading feature. This section describes the feature as it relates to TCP Segmentation.

The Ethernet controller supports IP and TCP/UDP header options in the checksum computation for packets that are derived from the TCP Segmentation feature. The Ethernet controller is capable of computing one level of IP header checksum and one TCP/UDP header and payload checksum. In case of multiple IP headers, the driver has to compute all but one IP header checksum. The Ethernet controller calculates checksums on the fly on a frame by frame basis and inserts the result in the IP/TCP/UDP headers of each frame. TCP and UDP checksum are a result of performing the checksum on all bytes of the payload and the pseudo header.


Three specific types of checksum are supported by the hardware in the context of the TCP Segmentation offload feature:

• IPv4 checksum (IPv6 does not have a checksum)

• TCP checksum

• UDP checksum

Each packet that is sent via the TCP segmentation offload feature optionally includes the IPv4 checksum and either the TCP or UDP checksum.

All checksum calculations use a 16-bit wide one’s complement checksum. The checksum word is calculated on the outgoing data. The checksum field is written with the 16 bit one’s complement of the one’s complement sum of all 16-bit words in the range of CSS to CSE, including the checksum field itself.
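The calculation just described can be sketched in software as follows. This is not the hardware implementation: the function names are hypothetical, CSS, CSO and CSE are taken as byte offsets, the end offset is assumed to be inclusive, and an end value of zero is assumed to mean end of packet.

```python
def ones_complement_sum(data: bytes) -> int:
    """One's complement sum of big-endian 16-bit words, odd length zero-padded."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                      # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return total


def insert_checksum(packet: bytearray, css: int, cso: int, cse: int = 0) -> int:
    """Checksum bytes css..cse and write the result at offset cso.

    The current contents of the checksum field are part of the sum, so a
    zeroed field gives a plain checksum and a pre-seeded field (such as a
    partial pseudo-header sum) is folded into the result.
    """
    end = len(packet) if cse == 0 else cse + 1   # assumed inclusive end offset
    checksum = ~ones_complement_sum(bytes(packet[css:end])) & 0xFFFF
    packet[cso:cso + 2] = checksum.to_bytes(2, "big")
    return checksum
```

Applied to a 20-byte IPv4 header with a zeroed checksum field, `insert_checksum(header, 0, 10, 19)` fills in the header checksum; summing the finished header afterwards gives FFFFh, which is the usual receive-side check.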

3.5.9 IP/TCP/UDP Header Updating

The IP/TCP/UDP header is updated for each outgoing frame based on the IP/TCP header prototype, which hardware transfers from the first descriptor(s) and stores on chip. The IP/TCP/UDP headers are fetched from host memory into an on-chip 240 byte header buffer once for each TCP segmentation context (for performance reasons, this header is not fetched again for each additional packet that is derived from the TCP segmentation process). The checksum fields and other header information are later updated on a frame by frame basis. The updating process is performed concurrently with the packet data fetch.


The following sections define which fields are modified by hardware during the TCP Segmentation process by the Ethernet controller. Figure 3-19 illustrates the overall data flow.


[Figure: TCP Segmentation data flow. Blocks shown: host memory (descriptors, IP/TCP header, packet data), PCI FIFO, IP/TCP header buffer, and TX packet FIFO. Events scheduled over time include: descriptor fetch, IP/TCP header prototype fetch, header processing, packet data fetch, checksum calculation, data fetch pause, header update, checksum header insertion, and data fetch resume.]

Figure 3-19. Overall Data Flow


3.5.9.1 TCP/IP/UDP Header for the First Frame

The hardware makes the following changes to the headers of the first packet that is derived from each TCP segmentation context.

• IPv4 Header

— IP Total Length = MSS + HDRLEN – IPCSS

— IP Checksum

• IPv6 Header

— Payload Length = MSS + HDRLEN - IPCSS

• TCP Header

— Sequence Number: The value is the Sequence Number of the first TCP byte in this frame.

— If FIN flag = 1b, it is cleared in the first frame.

— If PSH flag =1b, it is cleared in the first frame.

— TCP Checksum

• UDP Header

— UDP length: MSS + HDRLEN - TUCSS


— UDP Checksum

3.5.9.2 TCP/IP/UDP Header for the Subsequent Frames

The hardware makes the following changes to the headers for subsequent packets that are derived as part of a TCP segmentation context:

Note: Number of bytes left for transmission = PAYLEN – (N * MSS), where N is the number of frames that have been transmitted.

• IPv4 Header

— IP Identification: incremented from last value (wrap around)

— IP Total Length = MSS + HDRLEN – IPCSS

— IP Checksum

• IPv6 Header

— Payload Length = MSS + HDRLEN - IPCSS

• TCP Header

— Sequence Number update: Add previous TCP payload size to the previous sequence number value. This is equivalent to adding the MSS to the previous sequence number.

— If FIN flag = 1b, it is cleared in these frames.

— If PSH flag =1b, it is cleared in these frames.

— TCP Checksum

• UDP Header

— UDP Length: MSS + HDRLEN – TUCSS

— UDP Checksum


3.5.9.3 TCP/IP/UDP Header for the Last Frame

The controller makes the following changes to the headers for the last frame of a TCP segmentation context:

Note: Last frame payload bytes = PAYLEN – (N * MSS)

• IPv4 Header

— IP Total Length = (last frame payload bytes + HDRLEN) – IPCSS

— IP Identification: incremented from last value (wrap around)

— IP Checksum

• IPv6 Header

— Payload Length = (last frame payload bytes + HDRLEN) - IPCSS

• TCP Header

— Sequence Number update: Add previous TCP payload size to the previous sequence number value. This is equivalent to adding the MSS to the previous sequence number.

— If FIN flag = 1b, set it in this last frame

— If PSH flag =1b, set it in this last frame

— TCP Checksum

• UDP Header

— UDP length: (last frame payload bytes + HDRLEN) - TUCSS

— UDP Checksum
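The length and sequence number updates for the first, subsequent, and last frames can be worked through with a short sketch. The function and dictionary keys are hypothetical, and the example numbers (54-byte header, IPCSS = 14, TUCSS = 34) assume an untagged Ethernet frame carrying IPv4 and TCP without options.

```python
def segment_frames(paylen: int, mss: int, hdrlen: int, ipcss: int, tucss: int):
    """List the per-frame values hardware derives for one segmentation context."""
    frames = []
    sent = 0
    while sent < paylen:
        payload = min(mss, paylen - sent)   # the last frame carries the remainder
        frames.append({
            "payload": payload,
            "ip_total_length": payload + hdrlen - ipcss,
            "l4_length": payload + hdrlen - tucss,   # TCP Length or UDP length
            "seq_offset": sent,                      # added to the prototype sequence number
            "last": sent + payload == paylen,        # FIN/PSH only survive in this frame
        })
        sent += payload
    return frames


# Hypothetical example: PAYLEN = 3000, MSS = 1460.
frames = segment_frames(paylen=3000, mss=1460, hdrlen=54, ipcss=14, tucss=34)
```

With these inputs the context produces three frames carrying 1460, 1460, and 80 payload bytes, with IP total lengths of 1500, 1500, and 120.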

3.6 IP/TCP/UDP Transmit Checksum Offloading

The previous section on TCP Segmentation offload describes the IP/TCP/UDP checksum offloading mechanism used in conjunction with TCP Segmentation. The same underlying mechanism can also be applied as a standalone feature. The main difference in normal packet mode (non-TCP Segmentation) is that only the checksum fields in the IP/TCP/UDP headers need to be updated.

Before taking advantage of the Ethernet controller’s enhanced checksum offload capability, a checksum context must be initialized. For the normal transmit checksum offload feature, this task is performed by providing the Ethernet controller with a TCP/IP Context Descriptor with TSE = 0b to denote a non-segmentation context. For additional details on contexts, refer to Section 3.3.5. Enabling the checksum offloading capability without first initializing the appropriate checksum context leads to unpredictable results. Once the checksum context has been set, that context is used for all normal packet transmissions until a new context is loaded. Also, since checksum insertion is controlled on a per packet basis, there is no need to clear/reset the context.

The Ethernet controller is capable of performing two transmit checksum calculations. Typically, these would be used for TCP/IP and UDP/IP packet types, however, the mechanism is general enough to support other checksums as well. Each checksum operates independently and provides identical functionality. Only the IP checksum case is discussed as follows.


Three fields in the TCP/IP Context Descriptor set the context of the IP checksum offloading feature:

• IPCSS

This field specifies the byte offset from the start of the transferred data to the first byte to be included in the checksum. Setting this value to 0b means that the first byte of the data is included in the checksum. The maximum value for this field is 255. This is adequate for typical applications.

Note: The IPCSS value needs to be less than the total DMA length of a packet. If this is not the case, the result will be unpredictable.

• IPCSO

This field specifies where the resulting checksum should be placed. Again, this is limited to the first 256 bytes of the packet and must be less than or equal to the total length of a given packet. If this is not the case, the checksum is not inserted.

• IPCSE

This field specifies where the checksum should stop. A 16-bit value supports checksum offloading of packets as large as 64 KB. Setting the IPCSE field to all zeros means End-of-Packet. In this way, the length of the packet does not need to be calculated.

As mentioned above, it is not necessary to set a new context for each new packet. In many cases, the same checksum context can be used for a majority of the packet stream. In this case, software can avoid reloading the context for every packet, or can use the offload feature only for a particular traffic type, thereby avoiding all context descriptors except for the initial one.
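The constraints stated for IPCSS, IPCSO, and IPCSE lend themselves to a simple pre-flight check in a driver. The sketch below only restates the limits from the field descriptions above; the function name and the wording of the messages are hypothetical.

```python
def check_ip_checksum_context(ipcss: int, ipcso: int, ipcse: int, packet_len: int):
    """Return a list of problems with an IP checksum context (empty if none)."""
    problems = []
    if not 0 <= ipcss <= 255:
        problems.append("IPCSS does not fit in the field (maximum 255)")
    elif ipcss >= packet_len:
        problems.append("IPCSS must be less than the total DMA length of the packet")
    if not 0 <= ipcso <= 255:
        problems.append("IPCSO must lie within the first 256 bytes of the packet")
    elif ipcso > packet_len:
        problems.append("IPCSO exceeds the packet length; checksum is not inserted")
    if not 0 <= ipcse <= 0xFFFF:
        problems.append("IPCSE does not fit in 16 bits")
    # IPCSE == 0 is valid and means End-of-Packet.
    return problems
```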


4 PCI Local Bus Interface

The PCI/PCI-X Family of Gigabit Ethernet Controllers are PCI 2.2 or 2.3 compliant devices and implement the PCI-X Addendum to the PCI Local Bus Specification, Revision 1.0.

Note: The 82540EP/EM, 82541xx, and 82547GI/EI do not support PCI-X mode.

4.1 PCI Configuration

The PCI Specification requires implementation of PCI Configuration registers. After a system reset, these registers are initially configured by the BIOS, and/or a “Plug and Play” aware Operating System (OS). Device drivers read these registers to determine what resources (interrupt number, memory mapping location, etc.) the BIOS and/or OS assigned to the Ethernet controller.

The 82547GI/EI uses a dedicated CSA port for its system bus connection. Logically, it still follows PCI configuration. However, some configuration parameters, such as cache line, are irrelevant. Additionally, the 82547GI/EI requires special interrupt configuration in the BIOS (see Section 4.5).

Note: The 82547GI/EI does not support 64-bit addressing.

Four different regions of the PCI configuration space are used.

Address    Item                          Description
00h-3Ch    PCI                           Section 2.3.1
DCh-E0h    PCI Power Management          Section 6.3.3
E4h-E8h    PCI-X                         Section 4.1.1
F0h-FCh    Message Signaled Interrupt    Section 4.1.3

a. Not applicable to the 82541xx and 82547GI/EI.

These spaces are linked into a linked list using the Capabilities Pointer field (Cap_Ptr) in the PCI Configuration section.

The implementation of the PCI registers for the PCI/PCI-X Family of Gigabit Ethernet Controllers is listed in Table 4-1:

Table 4-1. Mandatory PCI Registers

Byte Offset   Byte 3              Byte 2              Byte 1                Byte 0
0h            Device ID                               Vendor ID
4h            Status Register                         Command Register
8h            Class Code (020000h)                                          Revision ID
Ch            BIST (00h)          Header Type (00h)   Latency Timer         Cache Line Size
10h           Base Address 0
14h           Base Address 1
18h           Base Address 2
1Ch           Base Address 3 (unused)
20h           Base Address 4 (unused)
24h           Base Address 5 (unused)
28h           Cardbus CIS Pointer (not used)
2Ch           Subsystem ID                            Subsystem Vendor ID
30h           Expansion ROM Base Address
34h           Reserved                                                      Cap_Ptr
38h           Reserved
3Ch           Max_Latency (00h)   Min_Grant (FFh)     Interrupt Pin (01h)   Interrupt Line

a. Refer to Table 4-2.

The following list provides explanations of the various PCI registers and their bit fields:

Vendor ID          This uniquely identifies all Intel PCI products. This field may be auto-loaded from the EEPROM at power on or upon the assertion of PCI_RST#. A value of 8086h is the default for this field upon power up if the EEPROM does not respond or is not programmed.

Device ID          This uniquely identifies the Ethernet controller. This field may be auto-loaded from the EEPROM at power on or upon the assertion of RST#. The default value for this field is used upon power up if the EEPROM does not respond or is not programmed.

Command Reg.       The layout is listed in Table 4-3. Shaded bits are not used by this implementation and are hard wired to 0b.

Status Register    The layout is listed in Table 4-4. Shaded bits are not used by this implementation and are hard wired to 0b.

Revision           Sequential stepping number starting with 00h for the A0 revision of the Ethernet controller. Refer to the PCI/PCI-X Family of Gigabit Ethernet Controllers Specification Update for the latest stepping information.

Class Code         The class code, 020000h, identifies the Ethernet controller as an Ethernet adapter.
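As an illustration of the identification fields described above, the following C sketch splits the configuration DWORDs at offsets 0h and 8h into their fields and checks for the default Vendor ID and the Ethernet class code. The structure and function names are hypothetical, and how the two DWORDs are read from configuration space is left to the platform.

```c
#include <stdint.h>

/* Hypothetical container for the identification fields of Table 4-1. */
struct pci_ident {
    uint16_t vendor_id;    /* offset 0h, bytes 1:0 */
    uint16_t device_id;    /* offset 0h, bytes 3:2 */
    uint8_t  revision_id;  /* offset 8h, byte 0 */
    uint32_t class_code;   /* offset 8h, bytes 3:1 */
};

/* Split the DWORDs read at offsets 0h and 8h into their fields. */
static struct pci_ident pci_ident_decode(uint32_t dword_00h, uint32_t dword_08h)
{
    struct pci_ident id;

    id.vendor_id   = (uint16_t)(dword_00h & 0xFFFFu);
    id.device_id   = (uint16_t)(dword_00h >> 16);
    id.revision_id = (uint8_t)(dword_08h & 0xFFu);
    id.class_code  = dword_08h >> 8;
    return id;
}

/* True when the Vendor ID is 8086h and the class code is 020000h. */
static int pci_ident_is_intel_ethernet(const struct pci_ident *id)
{
    return id->vendor_id == 0x8086u && id->class_code == 0x020000u;
}
```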

Cache Line Size(1) Used to store the cache line size. The value is in units of 4 bytes. A system with a cache line size of 64 bytes sets the value of this register to 10h. The only sizes that are supported are 16, 32, 64, and 128 bytes. All other sizes are treated as 0b. See the information about exceptions in Section 4.4.

Unsupported values affect PCI cache line support. All writes default to using the memory write (MW) command, and memory read command determination uses a cache line size of 32 bytes.
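The unit conversion and the supported sizes can be sketched in C as follows. The function name is hypothetical; it only mirrors the rule stated above that the register counts 4-byte units and that sizes other than 16, 32, 64, and 128 bytes are treated as 0b.

```c
#include <stdint.h>

/* Returns the cache line size in bytes that the controller honors,
 * or 0 when the register holds an unsupported value. */
static unsigned cache_line_bytes(uint8_t cls_reg)
{
    unsigned bytes = (unsigned)cls_reg * 4u; /* register is in 4-byte units */

    switch (bytes) {
    case 16:
    case 32:
    case 64:
    case 128:
        return bytes;
    default:
        return 0; /* all other sizes are treated as 0b */
    }
}
```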

Latency Timer      The lower two bits are not implemented and return 0b. The upper six bits are Read/Write.

Header Type        This is for a normal single function Ethernet controller and reads 00h.

BIST               Built in Self-test is not implemented as supportable from PCI configuration space in this version of the Ethernet controller.

Base Address Registers

The Base Address Registers (or BARs) are used to map the Ethernet controller’s register space and flash to system memory space. In PCI-X mode or in PCI mode when the BAR32 bit of the EEPROM is 0b, two registers are used for each of the register space and the flash memory in order to map 64-bit addresses. In PCI mode, if the BAR32 bit in the EEPROM is 1b, one register is used for each to map 32-bit addresses.

Table 4-2. Base Address Registers

64-bit BARs: PCI-X mode with BAR32 bit in the EEPROM set to 0b

BAR   Addr.   Bits 31:4                                    Bit 3   Bits 2:1     Bit 0
0     10h     Memory Register Base Address (bits 31:4)     pref.   type         mem
1     14h     Memory Register Base Address (bits 63:32)
2     18h     Memory Flash Base Address (bits 31:4)        pref.   type         mem
3     1Ch     Memory Flash Base Address (bits 63:32)
4     20h     IO Register Base Address (bits 31:2)                 0b (bit 1)   mem
5     24h     Reserved (read as all 0b’s)

32-bit BARs: Conventional PCI mode with BAR32 bit in the EEPROM set to 1b

BAR   Addr.   Bits 31:4                                    Bit 3   Bits 2:1     Bit 0
0     10h     Memory Register Base Address                 pref.   type         mem
1     14h     Memory Flash Base Address                    pref.   type         mem
2     18h     IO Register Base Address (bits 31:2)                 0b (bit 1)   mem
3     1Ch     Reserved (read as all 0b’s)
4     20h     Reserved (read as all 0b’s)
5     24h     Reserved (read as all 0b’s)

1. Not applicable to the 82547GI/EI.


All base address registers have the following fields:

Field      Bit(s)   Read/Write   Initial Value                   Description
Mem        0        R            0b for mem, 1b for I/O          0b indicates memory space. 1b indicates I/O.
Type       2:1      R            00b for 32-bit, 10b for 64-bit   Indicates the address space size. 00b = 32-bit. 10b = 64-bit.
Prefetch   3        R            0b                              0b = non-prefetchable space. 1b = prefetchable space. The Ethernet controller implements non-prefetchable space since it has read side-effects.
Address    31:4     R/W          0b                              The lower bits of the address are hard-wired to 0b. The upper bits can be written by the system software to set the base address of the register or flash address space.

The memory register space is 128 KB. The Memory Register BAR has:

• Bits 16:4 hard-wired to 0b

• Bits 63:17 or 31:17 as read/write

The size of the flash space can vary between 64 KB and 512 KB depending on the FLASH size read from the EEPROM. The Memory Flash BAR has these characteristics:

Flash Size   Valid Bits (R/W)   Zero Bits (RO)
64 KB        63/31:16           15:4
128 KB       63/31:17           16:4
256 KB       63/31:18           17:4
512 KB       63/31:19           18:4

The size of the IO register space is 8 bytes. The I/O Register BAR has:

• Bit 2 hard-wired to 0b

• Bits 31:3 as read/write
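The BAR fields above can be decoded with a short C sketch. The type and function names are hypothetical. `bar_decode` takes the DWORD of the BAR and, for a 64-bit BAR, the DWORD of the following register; `bar_first_rw_bit` shows why a 128 KB region leaves bits 16:4 hard-wired to 0b and an 8-byte I/O region starts its writable bits at bit 3.

```c
#include <stdint.h>

enum bar_kind { BAR_MEM32, BAR_MEM64, BAR_IO, BAR_UNKNOWN };

struct bar_info {
    enum bar_kind kind;
    int           prefetchable; /* bit 3; reads 0 on this controller */
    uint64_t      base;         /* base address with flag bits masked off */
};

/* lo is the BAR itself; hi is the next register (used for 64-bit BARs). */
static struct bar_info bar_decode(uint32_t lo, uint32_t hi)
{
    struct bar_info b = { BAR_UNKNOWN, 0, 0 };

    if (lo & 0x1u) {                  /* bit 0: 1b indicates I/O */
        b.kind = BAR_IO;
        b.base = lo & ~0x3u;          /* I/O base address is bits 31:2 */
        return b;
    }
    b.prefetchable = (int)((lo >> 3) & 0x1u);
    switch ((lo >> 1) & 0x3u) {       /* bits 2:1: address space size */
    case 0x0:                         /* 00b = 32-bit */
        b.kind = BAR_MEM32;
        b.base = lo & ~0xFu;
        break;
    case 0x2:                         /* 10b = 64-bit */
        b.kind = BAR_MEM64;
        b.base = ((uint64_t)hi << 32) | (lo & ~0xFu);
        break;
    default:
        break;                        /* other encodings are not used here */
    }
    return b;
}

/* Lowest read/write address bit for a region of size_bytes (a power of
 * two); every address bit below it is hard-wired to 0b. */
static unsigned bar_first_rw_bit(uint32_t size_bytes)
{
    unsigned bit = 0;

    while (bit < 31 && (1u << bit) < size_bytes)
        bit++;
    return bit;
}
```

For example, `bar_first_rw_bit(128 * 1024)` evaluates to 17, which matches the Memory Register BAR description above.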

Expansion ROM Base Address

This register is used to define the address and size information for boot-time access to the optional Flash memory.

31                           11 10        1 0
Expansion ROM Base Address      Reserved    En

Field      Bit(s)   Read/Write   Initial Value   Description
En         0        R/W          0b              1b = Enables expansion ROM access. 0b = Disables expansion ROM access.
Reserved   10:1     R            0b              Always read as 0b. Writes are ignored.
Address    31:11    R/W          0b              The lower bits of the address are hard-wired to 0b. The upper bits can be written by the system software to set the base address of the register or flash address space.

Since the flash is used as the expansion ROM, the size of the expansion ROM can vary between 64 KB and 512 KB, depending on the FLASH size read from the EEPROM.

Flash Size   Valid Bits   Zero Bits
64 KB        63/31:16     15:11
128 KB       63/31:17     16:11
256 KB       63/31:18     17:11
512 KB       63/31:19     18:11

CardBus CIS Pointer (82541PI/GI/EI and 82540EP Only)

When the Enable CLK_RUN# bit of the EEPROM’s Initialization Control Word 2 and the 64/32 BAR bit of the EEPROM Initialization Control Word 1 (indicating a 32-bit BAR) are both set to 1b, the Cardbus CIS Pointer contains a value of 00000022h. Otherwise, it contains a value of 00000000h.

31        3 2     0
Offset      Space

Field    Bit(s)   Read/Write   Initial Value   Description
Space    2:0      R/W          0 or 2          Indicates the address space where the CIS is located.
                                                 0 = Configuration Space
                                                 1 = BAR0
                                                 2 = BAR1
                                                 3 = BAR2
                                                 4 = BAR3
                                                 5 = BAR4
                                                 6 = BAR5
                                                 7 = Expansion ROM
Offset   31:3     R            0 or 4          Offset within the specified address space, multiplied by eight. When enabled, the value indicates that the CIS (Card Information Structure) is at an offset of 4*8, or 32 bytes into the Flash memory.
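The arithmetic behind the value 00000022h can be checked with a small C sketch. The function names are hypothetical; the field positions are those given above (Space in bits 2:0, Offset in bits 31:3, with the offset counted in units of eight bytes).

```c
#include <stdint.h>

/* Address space selector, bits 2:0 (2 = BAR1, the flash BAR when
 * 32-bit BARs are in use). */
static unsigned cis_space(uint32_t cis_ptr)
{
    return cis_ptr & 0x7u;
}

/* Byte offset of the CIS within that space: the Offset field in
 * bits 31:3 is multiplied by eight. */
static uint32_t cis_byte_offset(uint32_t cis_ptr)
{
    return (cis_ptr >> 3) * 8u;
}
```

With `cis_ptr` equal to 00000022h, the Space field is 2 and the Offset field is 4, so the CIS is 32 bytes into the space selected by BAR1.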

Subsystem ID       This value can be loaded automatically from the EEPROM upon power-up or PCI reset. A value of 1008h is the default for this field upon power-up if the EEPROM does not respond or is not programmed.

Subsystem Vendor ID    This value can be loaded automatically from the EEPROM upon power-up or PCI reset. A value of 8086h is the default for this field upon power-up if the EEPROM does not respond or is not programmed.

Cap_Ptr            The Capabilities Pointer field (Cap_Ptr) is an 8-bit field that provides an offset in the Ethernet controller’s PCI Configuration Space for the location of the first item in the Capabilities Linked List. The Ethernet controller sets the New Capabilities bit in the Status Register and implements a capabilities list to indicate that it supports PCI Power Management, PCI-X, and Message Signaled Interrupts. The value of Cap_Ptr is DCh, which is the address of the first entry: ACPI Power Management.

Address    Item                          Next Pointer
DCh-E0h    ACPI Power Management         E4h
E4h-E8h    PCI-X                         F0h
F0h-FCh    Message Signaled Interrupt    00h

Figure 4-1. Capabilities Linked List

In conventional PCI mode, Message Signaled Interrupts can be disabled in the EEPROM. If disabled, the Message Signaled Interrupt entry does not appear on the linked list and PCI-X’s “Next Pointer” is 0b.

1. Not applicable to the 82541xx or 82547GI/EI.

2. Not applicable to the 82541ER.
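Walking the capabilities linked list can be sketched in C as follows. This is a minimal illustration that operates on a 256-byte snapshot of configuration space; the function name and the snapshot approach are assumptions, and a real driver would use its platform's configuration access routines instead. The capability IDs used in the usage note are 01h (PCI Power Management, as defined by the PCI specification), 07h (PCI-X), and 05h (MSI).

```c
#include <stdint.h>

#define PCI_STATUS_OFFSET    0x06u   /* Status Register, upper half of 4h */
#define PCI_STATUS_NEW_CAP   0x0010u /* Status bit 4: New Capabilities */
#define PCI_CAP_PTR_OFFSET   0x34u   /* Cap_Ptr */

/* Returns the configuration-space offset of the capability with the
 * given ID, or 0 if it is not on the list. cfg is a 256-byte snapshot
 * of the function's configuration space. */
static uint8_t pci_find_capability(const uint8_t cfg[256], uint8_t cap_id)
{
    unsigned status = cfg[PCI_STATUS_OFFSET] |
                      ((unsigned)cfg[PCI_STATUS_OFFSET + 1] << 8);
    uint8_t ptr;
    int guard;

    if (!(status & PCI_STATUS_NEW_CAP))
        return 0;                       /* no capabilities list */

    ptr = cfg[PCI_CAP_PTR_OFFSET];
    /* The guard bounds the walk in case the list is malformed. */
    for (guard = 0; ptr != 0 && guard < 48; guard++) {
        ptr = (uint8_t)(ptr & 0xFCu);   /* entries are DWORD aligned */
        if (ptr < 0x40u)
            break;                      /* pointer into the header: stop */
        if (cfg[ptr] == cap_id)
            return ptr;                 /* byte 0 of an entry is its ID */
        ptr = cfg[ptr + 1];             /* byte 1 is the Next Pointer */
    }
    return 0;
}
```

With the list shown in Figure 4-1, a search for ID 07h returns E4h and a search for ID 05h returns F0h. If Message Signaled Interrupts are disabled in the EEPROM, the PCI-X Next Pointer reads 0b and the search for 05h returns 0.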

Max_Lat/Min_Gnt    The Ethernet controller places a very high load on the PCI bus during peak transmit and receive traffic. In full duplex mode, it has a peak throughput demand of 250 MB/sec. The peak delivered bandwidth on a 64-bit PCI bus at 33 MHz is 264 MB/sec, so the bus is fully saturated when transmit and receive are operating simultaneously. In half duplex operation, the Ethernet controller has a peak throughput demand of 125 MB/sec, which still puts an enormous load on the PCI bus. Consequently, Max_Lat should be small and is set to 00h, and Min_Gnt is set to FFh, indicating that the Ethernet controller requires a very high priority and time slice.

Interrupt Pin      Read-only register indicating which interrupt line (INTA# vs. INTB#) the 82546GB/EB uses. A value of 1b indicates that the 82546GB/EB uses INTA# (as with all single-port Ethernet controllers). A value of 10b indicates that the 82546GB/EB uses INTB#.

For each separate device/function within the Ethernet controller, the value reported here is based on the EEPROM Initialization Control Word 3 associated with this controller, as well as whether both device/functions are enabled. Provided both functions are enabled, the value reported for each specific function is based on the Interrupt Pin field of each Ethernet controller’s Initialization Control Word 3.

If only a single internal device/function is enabled, then the value reported here is 1b regardless of EEPROM configuration.

Interrupt Line     Read/write register programmed by software to indicate which of the system interrupt request lines this Ethernet controller’s interrupt pin is bound to. See the PCI definition for more details.

Table 4-3. Command Register Layout

15        10 9              0
Reserved     Command Bits

Bit(s)              Initial Value   Description
0                   0b              I/O Access Enable.
1                   0b              Memory Access Enable.
2                   0b              Enable Mastering. Ethernet controller in PCI-X mode is permitted to initiate a split completion transaction regardless of the state of this bit.
3                   0b              Special Cycle Monitoring.
4                   0b              Memory Write and Invalidate Enable (not applicable to the 82547GI/EI).
5                   0b              Palette Snoop Enable.
6                   0b              Parity Error Response (not applicable to the 82547GI/EI).
7                   0b              Wait Cycle Enable.
8                   0b              SERR# Enable (not applicable to the 82547GI/EI).
9                   0b              Fast Back-to-Back Enable.
10(a)               0b              Interrupt Disable (INTA# or CSA signaled).
15:10 or 15:11(a)   0b              Reserved.

1. This bit is a don’t care for the 82547GI/EI.

a. 82541xx and 82547GI/EI only.

Table 4-4. Status Register Layout

15            4 3        0
Status Bits     Reserved

Bit(s)            Initial Value   Description
3:0 or 2:0(a)     0b              Reserved.
3(a)                              Interrupt Status. This bit is 1b when the Ethernet controller is generating an interrupt internally. When Interrupt Disable in the Command Register is also cleared, the Ethernet controller asserts INTA# or signals an interrupt over CSA.
4                 1b              New Capabilities: Indicates that an Ethernet controller implements Extended Capabilities. The Ethernet controller sets this bit and implements a capabilities list to indicate that it supports PCI Power Management, PCI-X Bus, and message signaled interrupts.
5                 1b              66 MHz Capable (don’t care for the 82547GI/EI).
6                 0b              UDF Supported. Hardwired to 0b for PCI 2.3(a).
7                 0b              Fast Back-to-Back Capable. This bit must be cleared to 0b in PCI-X mode (not applicable to the 82547GI/EI).
8                 0b              Data Parity Reported.
10:9              01b             DEVSEL Timing (indicates medium device). Not applicable to the 82547GI/EI.
11                0b              Signaled Target Abort.
12                0b              Received Target Abort.
13                0b              Received Master Abort.
14                0b              Signaled System Error (not applicable to the 82547GI/EI).
15                0b              Detected Parity Error (not applicable to the 82547GI/EI).

a. 82541xx and 82547GI/EI only.
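A few of the Command and Status Register bits can be expressed as C masks, as in the sketch below. The macro and function names are hypothetical. Bit positions 10 (Interrupt Disable) and 3 (Interrupt Status) are taken from the PCI 2.3 definition and, per the footnote above, apply only to the 82541xx and 82547GI/EI.

```c
#include <stdint.h>

/* Command Register bits (Table 4-3) */
#define PCI_CMD_IO_ENABLE      (1u << 0)
#define PCI_CMD_MEM_ENABLE     (1u << 1)
#define PCI_CMD_BUS_MASTER     (1u << 2)
#define PCI_CMD_MWI_ENABLE     (1u << 4)
#define PCI_CMD_PARITY_RESP    (1u << 6)
#define PCI_CMD_SERR_ENABLE    (1u << 8)
#define PCI_CMD_INT_DISABLE    (1u << 10) /* 82541xx and 82547GI/EI only */

/* Status Register bits (Table 4-4) */
#define PCI_STS_INT_STATUS     (1u << 3)  /* 82541xx and 82547GI/EI only */
#define PCI_STS_NEW_CAP        (1u << 4)
#define PCI_STS_DEVSEL_SHIFT   9
#define PCI_STS_DEVSEL_MASK    (3u << PCI_STS_DEVSEL_SHIFT)

/* Command value with memory access and bus mastering turned on,
 * leaving every other bit as it was. */
static uint16_t pci_cmd_enable_dma(uint16_t cmd)
{
    return (uint16_t)(cmd | PCI_CMD_MEM_ENABLE | PCI_CMD_BUS_MASTER);
}

/* True when the controller would signal INTA# (or CSA): an interrupt
 * is pending internally and Interrupt Disable is cleared. */
static int pci_int_signaled(uint16_t cmd, uint16_t status)
{
    return (status & PCI_STS_INT_STATUS) != 0 &&
           (cmd & PCI_CMD_INT_DISABLE) == 0;
}

/* DEVSEL timing field; 01b indicates a medium device. */
static unsigned pci_devsel_timing(uint16_t status)
{
    return (status & PCI_STS_DEVSEL_MASK) >> PCI_STS_DEVSEL_SHIFT;
}
```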

4.1.1 PCI-X Configuration Registers

The Ethernet controller supports additional configuration registers that are specific to PCI-X. These registers are visible in conventional PCI and PCI-X modes, although they only affect the operation of PCI-X mode. The PCI-X registers are linked into the Capabilities linked list.

Note: The 82540EP/EM, 82541xx, and 82547GI/EI do not support PCI-X mode.

Byte Offset   Byte 3   Byte 2     Byte 1            Byte 0
E4h           PCI-X Command       Next Capability   PCI-X Capability ID
E8h           PCI-X Status

Figure 4-2. PCI-X Capability Registers

4.1.1.1 PCI-X Capability ID

Bits   Read/Write   Initial Value   Description
7:0    R            07h             Capability ID - Identifies the PCI-X register set in the capabilities linked list.

4.1.1.2 Next Capability

Bits   Read/Write   Initial Value   Description
7:0    R            F0h(a)          Next Capability – points to the next capability in the capabilities linked list.

a. In conventional PCI mode, Message Signaled Interrupts can also be disabled in the EEPROM. If disabled, the Message Signaled Interrupt registers are not visible, and PCI-X’s “Next Capability” pointer is 0b.

4.1.1.3 PCI-X Command

15        7 6                       4 3          2 1    0
Reserved    Max. Split Transactions   Read Count   RO   DP

Bits   Read/Write   Initial Value   Description
0      RW           0b              Data Parity Error Recovery Enable. If this bit is 1b, the Ethernet controller attempts to recover from Parity errors. If this bit is 0b, the Ethernet controller asserts SERR# (if enabled) whenever the Master Data Parity Error bit (Status Register, bit 8) is set.
1      RW           1b              Enable Relaxed Ordering. If this bit is set, the Ethernet controller sets the Relaxed Ordering attribute bit in some transactions.
3:2    RW           0b              Maximum Memory Read Byte Count. This register sets the maximum byte count the Ethernet controller uses for a Memory Read Sequence. The allowable values are:
                                      Value   Maximum Byte Count
                                      0       512
                                      1       1024
                                      2       2048
                                      3       4096
6:4    RW           0b              Maximum Outstanding Split Transactions. This register sets the maximum number of outstanding split transactions that the Ethernet controller uses. The Ethernet controller is only allowed to have one outstanding split transaction at any time.
                                      Value   Maximum Outstanding Transactions
                                      0       1
                                      1       2
                                      2       3
                                      3       4
                                      4       8
                                      5       12
                                      6       16
                                      7       32
15:7   R            0b              Reserved. Reads as 0b.
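The field encodings of the PCI-X Command register can be decoded as in the C sketch below. The function names are hypothetical. The byte count encoding doubles with each step from 512, and the split transaction encoding follows the 1, 2, 3, 4, 8, 12, 16, 32 sequence defined for this field.

```c
#include <stdint.h>

/* Bit 0: Data Parity Error Recovery Enable. */
static int pcix_parity_recovery(uint16_t cmd)
{
    return (int)(cmd & 0x1u);
}

/* Bit 1: Enable Relaxed Ordering. */
static int pcix_relaxed_ordering(uint16_t cmd)
{
    return (int)((cmd >> 1) & 0x1u);
}

/* Bits 3:2: Maximum Memory Read Byte Count, in bytes. */
static unsigned pcix_max_read_bytes(uint16_t cmd)
{
    return 512u << ((cmd >> 2) & 0x3u);
}

/* Bits 6:4: Maximum Outstanding Split Transactions. */
static unsigned pcix_max_split_transactions(uint16_t cmd)
{
    static const unsigned char table[8] = { 1, 2, 3, 4, 8, 12, 16, 32 };

    return table[(cmd >> 4) & 0x7u];
}
```

With the initial register value (only bit 1 set), these return a 512-byte read count, one outstanding split transaction, and relaxed ordering enabled.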


4.1.1.4 PCI-X Status

31   29 28       26 25        23 22    21 20     19    18    17    16    15         8 7             3 2           0
Res.    Read Size   Max. Split   RdByte   Cplx   USC   SCD   133   64b   Bus Number   Device Number   Func. Num.

Bits    Read/Write                Initial Value       Description
2:0     R                         0b                  Function Number. This number forms part of the Requester and Completer IDs for PCI-X transactions.
7:3     R                         1Fh                 Device Number. The system assigns a device number (other than 0b) to the Ethernet controller. It forms part of the Requester and Completer IDs for PCI-X transactions. The Ethernet controller updates this register with the contents of AD[15:11] on any Type 0 Configuration Write cycle.
15:8    R                         FFh                 Bus Number. This indicates the bus the Ethernet controller is placed on. It forms part of the Requester and Completer IDs for PCI-X transactions. The Ethernet controller updates this register with the contents of AD[7:0] on any Type 0 Configuration Write cycle.
16      R                         1b                  64-bit Device. This indicates the Ethernet controller is a 64-bit device. It does not indicate the current bus width. It is loaded from the EEPROM Initialization Control Word 2 (see Section 5.6.12).
17      R                         1b                  133 MHz Capable. A 1b indicates that the Ethernet controller is capable of operating at 133 MHz in PCI-X mode. A 0b indicates 66 MHz capability. This bit is loaded from the EEPROM Initialization Control Word 2 (see Section 5.6.12).
18      Read, write 1b to clear                       Split Completion Discarded. This bit is set if the Ethernet controller discards a Split Completion because the requester would not accept it.
19      Read, write 1b to clear                       Unexpected Split Completion. This bit indicates whether the Ethernet controller received an unexpected Split Completion with its requester ID.
20      R                         0b                  Device Complexity. A 0b indicates the Ethernet controller is a simple device. A 1b indicates that the Ethernet controller is a bridge.
22:21   R                         2b                  Designed Maximum Memory Read Byte Count. Indicates the maximum memory read byte count the Ethernet controller is designed to generate.
                                                        Value   Maximum Byte Count
                                                        0       512
                                                        1       1024
                                                        2       2048
                                                        3       4096
                                                      The value of this register depends on the Max_Read bit in the EEPROM’s Initialization Control Word 2 (see Section 5.6.12).
                                                        • Max_Read = 0b then value = 2 (2 KB)
                                                        • Max_Read = 1b then value = 3 (4 KB)
25:23   R                         0b                  Designed Maximum Outstanding Split Transactions. A 0b indicates that the Ethernet controller is designed to have at most one outstanding transaction.
                                                        Value   Maximum Outstanding Transactions
                                                        0       1
                                                        1       2
                                                        2       3
                                                        3       4
                                                        4       8
                                                        5       12
                                                        6       16
                                                        7       32
28:26   R                         (see Description)   Designed Maximum Cumulative Read Size. Indicates a number that is greater than or equal to the maximum cumulative outstanding bytes to be read at one time.
                                                        Value   Maximum Outstanding Bytes
                                                        0       1 KB
                                                        1       2 KB
                                                        2       4 KB
                                                        3       8 KB
                                                        4       16 KB
                                                        5       32 KB
                                                        6       64 KB
                                                        7       128 KB
                                                      The value of this register depends on the DMCR_Map and Max_Read bits in the EEPROM’s Initialization Control Word 2 (see Section 5.6.12).
                                                        • DMCR_Map = 0b: The value of this register reflects the number of bytes programmed in the Maximum Memory Read Byte Count (MMRBC) field of the PCI-X Command Register as follows:
                                                            • MMRBC = 0 (512) - DMCRS = 0 (1 KB)
                                                            • MMRBC = 1 (1K) - DMCRS = 0 (1 KB)
                                                            • MMRBC = 2 (2K) - DMCRS = 1 (2 KB)
                                                            • MMRBC = 3 (4K) - DMCRS = 2 (4 KB)
                                                        • DMCR_Map = 1b and Max_Read = 0b: DMCRS = 1 (2 KB)
                                                        • DMCR_Map = 1b and Max_Read = 1b: DMCRS = 2 (4 KB)
29      Read, write 1b to clear                       Received Split Completion Error Message. This bit is set if the Ethernet controller receives a Split Completion Message with the Split Completion Error attribute bit set.
31:30   R                         0b                  Reserved. Reads as 0b.

a. Loaded from EEPROM.
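The ID fields of the PCI-X Status register and the rule for the Designed Maximum Cumulative Read Size (DMCRS) can be sketched in C as follows. The structure and function names are hypothetical. `dmcrs_value` takes the DMCR_Map and Max_Read EEPROM bits and the MMRBC field value as plain integers; how they are obtained is outside the scope of the sketch.

```c
#include <stdint.h>

/* Hypothetical container for the ID fields in bits 15:0. */
struct pcix_id {
    unsigned function; /* bits 2:0 */
    unsigned device;   /* bits 7:3 */
    unsigned bus;      /* bits 15:8 */
};

static struct pcix_id pcix_status_id(uint32_t status)
{
    struct pcix_id id;

    id.function = status & 0x7u;
    id.device   = (status >> 3) & 0x1Fu;
    id.bus      = (status >> 8) & 0xFFu;
    return id;
}

/* DMCRS encoding (bits 28:26) as a function of the EEPROM bits
 * DMCR_Map and Max_Read and of the MMRBC field (0 to 3). */
static unsigned dmcrs_value(int dmcr_map, int max_read, unsigned mmrbc)
{
    if (dmcr_map)
        return max_read ? 2u : 1u;        /* fixed: 4 KB or 2 KB */
    /* DMCR_Map = 0b: follows MMRBC; 512 and 1K both report 1 KB. */
    return mmrbc <= 1u ? 0u : mmrbc - 1u;
}

/* Size in KB represented by a DMCRS encoding (0 = 1 KB ... 7 = 128 KB). */
static unsigned dmcrs_kbytes(unsigned dmcrs)
{
    return 1u << (dmcrs & 0x7u);
}
```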

4.1.2 Reserved and Undefined Addresses

Any PCI or PCI-X register address space not explicitly declared in this specification should be considered to be reserved, and should not be written. Writing to reserved or undefined configuration register addresses can cause indeterminate behavior. Reads from reserved or undefined configuration register addresses can return indeterminate values.


4.1.3 Message Signaled Interrupts

Message Signaled Interrupt (MSI) capability is optional for PCI 2.2 or 2.3, but required for PCI-X. When Message Signaled Interrupts are enabled, instead of asserting an interrupt pin, the Ethernet controller generates an interrupt using a memory write command. The address and most of the data of the command are determined by the system and programmed in configuration registers. This permits the system to program a different message for each function so it can speed up interrupt delivery.

To enable Message Signaled Interrupts, the system software writes to the “MSI Enable” bit in the MSI “Message Control” register. When Message Signaled Interrupts are enabled, the Ethernet controller no longer asserts its INTA# pin to signal interrupts.

MSI systems allow a function to request up to 32 messages, but do not guarantee that all of them are allocated. The Ethernet controller supports only a single message. When Message Signaled Interrupts are enabled, the Ethernet controller generates a message when any of the unmasked bits in the Interrupt Cause Read register (ICR) are set to 1b. The Ethernet controller does not generate the message again until the ICR is read and a subsequent interrupt event occurs.

In conventional PCI mode, Message Signaled Interrupts can also be disabled in the EEPROM. If MSI is disabled, the Message Signaled Interrupt registers are not visible.

4.1.3.1 Message Signaled Interrupt Configuration Registers

Byte Offset   Byte 3   Byte 2      Byte 1            Byte 0
F0h           Message Control      Next Capability   MSI Capability ID
F4h           Message Address
F8h           Message Upper Address
FCh           Reserved             Message Data

Figure 4-3. Message Signaled Interrupt Configuration Registers

4.1.3.1.1 MSI Capability ID

Bits   Read/Write   Initial Value   Description
7:0    R            05h             Capability ID - Identifies the Message Signaled Interrupt register set in the capabilities linked list.

4.1.3.1.2 Next Capability

Bits   Read/Write   Initial Value   Description
7:0    R            00h             Next Capability – points to the next capability in the capabilities linked list. Its value is 0b since the Message Signaled Interrupt is the last item in the list.

1. Not applicable to the 82541xx or 82547GI/EI.

4.1.3.1.3 Message Control

15        8 7     6               4 3                1 0
Reserved    64b   Multiple Enable   Multiple Capable   MSI Enable

Bits   Read/Write   Initial Value   Description
0      RW           0b              MSI Enable. If 1b, Message Signaled Interrupts are enabled and the Ethernet controller generates Message Signaled Interrupts instead of asserting INTA#.
3:1    R            0b              Multiple Message Capable. Indicates the number of messages requested. The Ethernet controller only requests one message.
                                      Value   Number of messages
                                      0       1
                                      1       2
                                      2       4
                                      3       8
                                      4       16
                                      5       32
                                      6       Reserved
                                      7       Reserved
6:4    RW           0b              Multiple Message Enable. Written by the system to indicate the number of messages allocated. Since the Ethernet controller only supports one message, the system should never write a value other than 0b.
7      R            1b              64-bit capable. A value of 1b indicates that the Ethernet controller is capable of generating 64-bit message addresses.
15:8   R            0b              Reserved. Reads as 0b.

a. Not applicable to the 82541xx or 82547GI/EI.

4.1.3.1.4 Message Address

Bits   Read/Write   Initial Value   Description
31:0   RW           0b              Message Address – Written by the system to indicate the lower 32 bits of the address to use for the MSI memory write transaction. The lower two bits are always written as 0b.

4.1.3.1.5 Message Upper Address

Bits   Read/Write   Initial Value   Description
31:0   RW           0b              Message Upper Address – Written by the system to indicate the upper 32 bits of the address to use for the MSI memory write transaction.

4.1.3.1.6 Message Data

Bits   Read/Write   Initial Value   Description
15:0   RW           0b              Message Data – Written by the system to indicate the lower 16 bits of the data written in the MSI memory write DWORD transaction. The upper 16 bits of the transaction are written as 0b.
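How the MSI registers combine into the memory write that signals an interrupt can be sketched in C as follows. The structures and function names are hypothetical, and the register values in the usage note are arbitrary example values rather than anything a particular platform programs.

```c
#include <stdint.h>

/* Hypothetical snapshot of the MSI registers at F0h-FCh. */
struct msi_regs {
    uint16_t control;  /* Message Control */
    uint32_t addr_lo;  /* Message Address */
    uint32_t addr_hi;  /* Message Upper Address */
    uint16_t data;     /* Message Data */
};

/* The memory write DWORD transaction that signals the interrupt. */
struct msi_write {
    uint64_t address;
    uint32_t data;
};

/* Bit 0 of Message Control: MSI Enable. */
static int msi_enabled(uint16_t control)
{
    return (int)(control & 0x1u);
}

/* Bits 3:1 of Message Control: number of messages requested,
 * or 0 for the reserved encodings 6 and 7. */
static unsigned msi_messages_requested(uint16_t control)
{
    unsigned enc = (control >> 1) & 0x7u;

    return enc <= 5u ? (1u << enc) : 0u;
}

/* Combine the address and data registers into the write the
 * controller issues when an unmasked interrupt cause is set. */
static struct msi_write msi_compose(const struct msi_regs *r)
{
    struct msi_write w;

    /* The lower two address bits are always 0b. */
    w.address = ((uint64_t)r->addr_hi << 32) | (r->addr_lo & ~0x3u);
    /* The upper 16 bits of the data DWORD are written as 0b. */
    w.data = r->data;
    return w;
}
```

With Message Control equal to 0081h (64-bit capable, MSI enabled), `msi_messages_requested` returns 1, which matches the single message the Ethernet controller supports.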

4.2 Commands


The Ethernet controller is capable of decoding and encoding commands for both PCI and PCI-X modes. The difference between PCI and PCI-X commands is noted in Table 4-5.

Table 4-5. PCI and PCI-X Encoding Difference

C/BE Encoding   PCI Commands                Abr.   PCI-X Commands          Abr.
0h              Interrupt Acknowledge              Interrupt Acknowledge
1h              Special Cycle                      Special Cycle
2h              I/O Read                    IOR    I/O Read                IOR
3h              I/O Write                   IOW    I/O Write               IOW
4h              Reserved                           Reserved
5h              Reserved                           Reserved
6h              Memory Read                 MR     Memory Read DWORD       MRD
7h              Memory Write                MW     Memory Write            MW
8h              Reserved                           Alias to MRB            AMR
9h              Reserved                           Alias to MWB            AMW
Ah              Configuration Read          CFR    Configuration Read      CFR
Bh              Configuration Write         CFW    Configuration Write     CFW
Ch              Memory Read Multiple        MRM    Split Completion        SC
Dh              Dual Address Cycle          DAC    Dual Address Cycle      DAC
Eh              Memory Read Line            MRL    Memory Read Block       MRB
Fh              Memory Write & Invalidate   MWI    Memory Write Block      MWB
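The encoding difference between the two modes lends itself to a lookup table, for example when annotating a bus trace. The C sketch below is illustrative only: the array and function names are hypothetical, and the PCI-X entry for encoding 7h (Memory Write, MW) is taken from the PCI-X definition of that encoding.

```c
#include <stddef.h>

/* Abbreviations from Table 4-5, indexed by the C/BE encoding.
 * NULL marks encodings that are reserved or have no abbreviation. */
static const char *const pci_cmd_abbr[16] = {
    NULL,  NULL,  "IOR", "IOW", NULL,  NULL,  "MR",  "MW",
    NULL,  NULL,  "CFR", "CFW", "MRM", "DAC", "MRL", "MWI"
};

static const char *const pcix_cmd_abbr[16] = {
    NULL,  NULL,  "IOR", "IOW", NULL,  NULL,  "MRD", "MW",
    "AMR", "AMW", "CFR", "CFW", "SC",  "DAC", "MRB", "MWB"
};

/* Returns the abbreviation for a 4-bit C/BE encoding in the given bus
 * mode, or NULL if the encoding has none. */
static const char *cbe_abbr(unsigned cbe, int pcix_mode)
{
    cbe &= 0xFu;
    return pcix_mode ? pcix_cmd_abbr[cbe] : pci_cmd_abbr[cbe];
}
```

For instance, encoding Eh decodes as MRL on a conventional PCI bus and as MRB in PCI-X mode.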

As a target, the Ethernet controller only accepts transactions that address its BARs or a configuration transaction in which its IDSEL input is asserted. In PCI-X mode, the Ethernet controller also accepts split completion for an outstanding memory read command that it has requested. The Ethernet controller does not respond to Interrupt Acknowledge or Special Cycle in either mode.

Table 4-6. Accepted PCI/PCI-X Command as a Target

Transaction Target        PCI Commands   PCI-X Commands
Configuration Read        CFR            CFR
Configuration Write       CFW            CFW
Memory Read Completion    N/A            SC

As a master, the Ethernet controller generates Read and Write commands for different causes as listed in Table 4-7. The addresses of these transactions are programmed either by system software or the software driver. The Ethernet controller always expects that they are claimed by one of the devices on the bus segment. The Ethernet controller never generates Interrupt Acknowledge, Special Cycle, I/O commands, or Configuration Commands.

Table 4-7. Generated PCI/PCI-X as a Master

Transaction Cause               PCI Commands    PCI-X Commands
                                                CMD    RO
Tx Descriptor Read              MR, MRL, MRM    MRB    1
Tx Descriptor Write back        MW, MWI         MWB    0
Tx Data Read                    MR, MRL, MRM    MRB    1
Rx Descriptor Read              MR, MRL, MRM    MRB    1
Rx Descriptor Write back        MW, MWI         MWB    0
Rx Data Write                   MW, MWI         MWB    1
Message Signaled Interrupt(a)   MW              MWB    0
Split Completion                N/A             SC     N/A

a. Not applicable to the 82541xx or 82547GI/EI.

Transaction burst length on PCI is determined by several factors, including the PCI latency timer expiration, the type of bus transfer (descriptor read/write or data read/write) made, the size of the data transfer (for data transfers), and whether the cycle is initiated by the receive or transmit logic.

