Intel E5-4600, E5-1600, E5-2600, CM8062101038606 User Manual

Intel® Xeon® Processor E5-1600/ E5-2600/E5-4600 Product Families

Datasheet - Volume One

May 2012

Reference Number: 326508, Revision: 002

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

The Intel® Xeon® Processor E5-1600/ E5-2600/E5-4600 Product Families, Intel® C600 series chipset, and the Intel® Xeon® Processor E5-1600/ E5-2600/E5-4600 Product Families-based Platform described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/#/en_US_01

Hyper-Threading Technology requires a computer system with a processor supporting HT Technology and an HT Technology enabled chipset, BIOS and operating system. Performance will vary depending on the specific hardware and software you use. For more information including details on which processors support HT Technology, see http://www.intel.com/products/ht/hyperthreading_more.htm.

Enabling Execute Disable Bit functionality requires a PC with a processor with Execute Disable Bit capability and a supporting operating system. Check with your PC manufacturer on whether your system delivers Execute Disable Bit functionality.

Intel® Virtualization Technology requires a computer system with an enabled Intel® processor, BIOS, virtual machine monitor (VMM) and, for some uses, certain computer system software enabled for it. Functionality, performance or other benefits will vary depending on hardware and software configurations and may require a BIOS update. Software applications may not be compatible with all operating systems. Please check with your application vendor.

Intel® Turbo Boost Technology requires a PC with a processor with Intel Turbo Boost Technology capability. Intel Turbo Boost Technology performance varies depending on hardware, software and overall system configuration. Check with your PC manufacturer on whether your system delivers Intel Turbo Boost Technology. For more information, see http://www.intel.com/technology/turboboost/.

64-bit computing on Intel architecture requires a computer system with a processor, chipset, BIOS, operating system, device drivers and applications enabled for Intel® 64 architecture. Performance will vary depending on your hardware and software configurations. Consult with your system vendor for more information.

Δ Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. See http://www.intel.com/products/processor%5Fnumber/ for details.

I2C is a two-wire communications bus/protocol developed by Philips. SMBus is a subset of the I2C bus/protocol and was developed by Intel. Implementations of the I2C bus/protocol may require licenses from various entities, including Philips Electronics N.V. and North American Philips Corporation.

Intel, Xeon, Intel SpeedStep, Intel Core, and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries.

*Other names and brands may be claimed as the property of others.

Copyright © 2009-2012, Intel Corporation. All rights reserved.

Contents

1 Overview ..... 13
1.1 Introduction ..... 13
1.1.1 Processor Feature Details ..... 14
1.1.2 Supported Technologies ..... 14
1.2 Interfaces ..... 15
1.2.1 System Memory Support ..... 15
1.2.2 PCI Express* ..... 16
1.2.3 Direct Media Interface Gen 2 (DMI2) ..... 17
1.2.4 Intel® QuickPath Interconnect (Intel® QPI) ..... 18
1.2.5 Platform Environment Control Interface (PECI) ..... 18
1.3 Power Management Support ..... 19
1.3.1 Processor Package and Core States ..... 19
1.3.2 System States Support ..... 19
1.3.3 Memory Controller ..... 19
1.3.4 PCI Express ..... 19
1.3.5 Intel QPI ..... 19
1.4 Thermal Management Support ..... 19
1.5 Package Summary ..... 20
1.6 Terminology ..... 20
1.7 Related Documents ..... 22
1.8 State of Data ..... 23

2 Interfaces ..... 25
2.1 System Memory Interface ..... 25
2.1.1 System Memory Technology Support ..... 25
2.1.2 System Memory Timing Support ..... 25
2.2 PCI Express* Interface ..... 26
2.2.1 PCI Express* Architecture ..... 26
2.2.2 PCI Express* Configuration Mechanism ..... 27
2.3 DMI2/PCI Express* Interface ..... 28
2.3.1 DMI2 Error Flow ..... 28
2.3.2 Processor/PCH Compatibility Assumptions ..... 28
2.3.3 DMI2 Link Down ..... 28
2.4 Intel QuickPath Interconnect ..... 28
2.5 Platform Environment Control Interface (PECI) ..... 30
2.5.1 PECI Client Capabilities ..... 30
2.5.2 Client Command Suite ..... 31
2.5.3 Client Management ..... 69
2.5.4 Multi-Domain Commands ..... 74
2.5.5 Client Responses ..... 75
2.5.6 Originator Responses ..... 76
2.5.7 DTS Temperature Data ..... 76

3 Technologies ..... 79
3.1 Intel® Virtualization Technology (Intel® VT) ..... 79
3.1.1 Intel VT-x Objectives ..... 79
3.1.2 Intel VT-x Features ..... 80
3.1.3 Intel VT-d Objectives ..... 80
3.1.4 Intel Virtualization Technology Processor Extensions ..... 81
3.2 Security Technologies ..... 81
3.2.1 Intel® Trusted Execution Technology ..... 81
3.2.2 Intel Trusted Execution Technology – Server Extensions ..... 82
3.2.3 Intel® Advanced Encryption Standard Instructions (Intel® AES-NI) ..... 82

3.2.4 Execute Disable Bit ..... 83
3.3 Intel® Hyper-Threading Technology ..... 83
3.4 Intel® Turbo Boost Technology ..... 83
3.4.1 Intel® Turbo Boost Operating Frequency ..... 83
3.5 Enhanced Intel SpeedStep® Technology ..... 84
3.6 Intel® Intelligent Power Technology ..... 84
3.7 Intel® Advanced Vector Extensions (Intel® AVX) ..... 84
3.8 Intel Dynamic Power Technology ..... 85

4 Power Management ..... 87
4.1 ACPI States Supported ..... 87
4.1.1 System States ..... 87
4.1.2 Processor Package and Core States ..... 87
4.1.3 Integrated Memory Controller States ..... 88
4.1.4 DMI2/PCI Express* Link States ..... 89
4.1.5 Intel QuickPath Interconnect States ..... 89
4.1.6 G, S, and C State Combinations ..... 90
4.2 Processor Core/Package Power Management ..... 90
4.2.1 Enhanced Intel SpeedStep Technology ..... 90
4.2.2 Low-Power Idle States ..... 91
4.2.3 Requesting Low-Power Idle States ..... 92
4.2.4 Core C-states ..... 92
4.2.5 Package C-States ..... 94
4.2.6 Package C-State Power Specifications ..... 97
4.3 System Memory Power Management ..... 98
4.3.1 CKE Power-Down ..... 98
4.3.2 Self Refresh ..... 98
4.3.3 DRAM I/O Power Management ..... 99
4.4 DMI2/PCI Express* Power Management ..... 99

5 Thermal Management Specifications ..... 101
5.1 Package Thermal Specifications ..... 101
5.1.1 Thermal Specifications ..... 101
5.1.2 TCASE and DTS Based Thermal Specifications ..... 103
5.1.3 Processor Thermal Profiles ..... 104
5.1.4 Embedded Server Processor Thermal Profiles ..... 130
5.1.5 Thermal Metrology ..... 133
5.2 Processor Core Thermal Features ..... 135
5.2.1 Processor Temperature ..... 135
5.2.2 Adaptive Thermal Monitor ..... 135
5.2.3 On-Demand Mode ..... 137
5.2.4 PROCHOT_N Signal ..... 137
5.2.5 THERMTRIP_N Signal ..... 138
5.2.6 Integrated Memory Controller (IMC) Thermal Features ..... 138

6 Signal Descriptions ..... 141
6.1 System Memory Interface Signals ..... 141
6.2 PCI Express* Based Interface Signals ..... 142
6.3 DMI2/PCI Express* Port 0 Signals ..... 144
6.4 Intel QuickPath Interconnect Signals ..... 144
6.5 PECI Signal ..... 145
6.6 System Reference Clock Signals ..... 145
6.7 JTAG and TAP Signals ..... 145
6.8 Serial VID Interface (SVID) Signals ..... 146
6.9 Processor Asynchronous Sideband and Miscellaneous Signals ..... 146
6.10 Processor Power and Ground Supplies ..... 149

7 Electrical Specifications ..... 151
7.1 Processor Signaling ..... 151
7.1.1 System Memory Interface Signal Groups ..... 151
7.1.2 PCI Express* Signals ..... 151
7.1.3 DMI2/PCI Express* Signals ..... 151
7.1.4 Intel QuickPath Interconnect (Intel QPI) ..... 151
7.1.5 Platform Environmental Control Interface (PECI) ..... 152
7.1.6 System Reference Clocks (BCLK{0/1}_DP, BCLK{0/1}_DN) ..... 152
7.1.7 JTAG and Test Access Port (TAP) Signals ..... 153
7.1.8 Processor Sideband Signals ..... 153
7.1.9 Power, Ground and Sense Signals ..... 153
7.1.10 Reserved or Unused Signals ..... 158
7.2 Signal Group Summary ..... 158
7.3 Power-On Configuration (POC) Options ..... 162
7.4 Fault Resilient Booting (FRB) ..... 163
7.5 Mixing Processors ..... 163
7.6 Flexible Motherboard Guidelines (FMB) ..... 164
7.7 Absolute Maximum and Minimum Ratings ..... 164
7.7.1 Storage Conditions Specifications ..... 165
7.8 DC Specifications ..... 166
7.8.1 Voltage and Current Specifications ..... 167
7.8.2 Die Voltage Validation ..... 173
7.8.3 Signal DC Specifications ..... 174
7.9 Waveforms ..... 180
7.10 Signal Quality ..... 181
7.10.1 DDR3 Signal Quality Specifications ..... 182
7.10.2 I/O Signal Quality Specifications ..... 182
7.10.3 Intel QuickPath Interconnect Signal Quality Specifications ..... 182
7.10.4 Input Reference Clock Signal Quality Specifications ..... 182
7.10.5 Overshoot/Undershoot Tolerance ..... 182

8 Processor Land Listing ..... 187
8.1 Listing by Land Name ..... 187
8.2 Listing by Land Number ..... 212

9 Package Mechanical Specifications ..... 237
9.1 Package Mechanical Drawing ..... 237
9.2 Processor Component Keep-Out Zones ..... 241
9.3 Package Loading Specifications ..... 241
9.4 Package Handling Guidelines ..... 241
9.5 Package Insertion Specifications ..... 241
9.6 Processor Mass Specification ..... 242
9.7 Processor Materials ..... 242
9.8 Processor Markings ..... 242

10 Boxed Processor Specifications ..... 243
10.1 Introduction ..... 243
10.1.1 Available Boxed Thermal Solution Configurations ..... 243
10.1.2 Intel Thermal Solution STS200C (Passive/Active Combination Heat Sink Solution) ..... 243
10.1.3 Intel Thermal Solution STS200P and STS200PNRW (Boxed 25.5 mm Tall Passive Heat Sink Solutions) ..... 244
10.2 Mechanical Specifications ..... 245
10.2.1 Boxed Processor Heat Sink Dimensions and Baseboard Keepout Zones ..... 245
10.2.2 Boxed Processor Retention Mechanism and Heat Sink Support (ILM-RS) ..... 254
10.3 Fan Power Supply [STS200C] ..... 254
10.3.1 Boxed Processor Cooling Requirements ..... 255
10.4 Boxed Processor Contents ..... 257

Figures

1-1 Intel® Xeon® Processor E5-2600 Product Family on the 2 Socket Platform ..... 14
1-2 PCI Express* Lane Partitioning and Direct Media Interface Gen 2 (DMI2) ..... 17
2-1 PCI Express* Layering Diagram ..... 26
2-2 Packet Flow through the Layers ..... 27
2-3 Ping() ..... 32
2-4 Ping() Example ..... 32
2-5 GetDIB() ..... 32
2-6 Device Info Field Definition ..... 33
2-7 Revision Number Definition ..... 33
2-8 GetTemp() ..... 34
2-9 GetTemp() Example ..... 35
2-10 RdPkgConfig() ..... 36
2-11 WrPkgConfig() ..... 37
2-12 DRAM Thermal Estimation Configuration Data ..... 40
2-13 DRAM Rank Temperature Write Data ..... 41
2-14 The Processor DIMM Temperature Read / Write ..... 42
2-15 Ambient Temperature Reference Data ..... 42
2-16 Processor DRAM Channel Temperature ..... 43
2-17 Accumulated DRAM Energy Data ..... 43
2-18 DRAM Power Info Read Data ..... 44
2-19 DRAM Power Limit Data ..... 45
2-20 DRAM Power Limit Performance Data ..... 45
2-21 CPUID Data ..... 49
2-22 Platform ID Data ..... 49
2-23 PCU Device ID ..... 49
2-24 Maximum Thread ID ..... 50
2-25 Processor Microcode Revision ..... 50
2-26 Machine Check Status ..... 50
2-27 Package Power SKU Unit Data ..... 50
2-28 Package Power SKU Data ..... 52
2-29 Package Temperature Read Data ..... 52
2-30 Temperature Target Read ..... 53
2-31 Thermal Status Word ..... 54
2-32 Thermal Averaging Constant Write / Read ..... 54
2-33 Current Config Limit Read Data ..... 55
2-34 Accumulated Energy Read Data ..... 55
2-35 Power Limit Data for VCC Power Plane ..... 56
2-36 Package Turbo Power Limit Data ..... 57
2-37 Package Power Limit Performance Data ..... 57
2-38 Efficient Performance Indicator Read ..... 58
2-39 ACPI P-T Notify Data ..... 58
2-40 Caching Agent TOR Read Data ..... 59
2-41 DTS Thermal Margin Read ..... 59
2-42 Processor ID Construction Example ..... 61
2-43 RdIAMSR() ..... 61
2-44 PCI Configuration Address ..... 64
2-45 RdPCIConfig() ..... 64
2-46 PCI Configuration Address for local accesses ..... 66

2-47 RdPCIConfigLocal()............................................................................................66

2-48 WrPCIConfigLocal() ...........................................................................................68

2-49 The Processor PECI Power-up Timeline() ..............................................................70

2-50 Temperature Sensor Data Format........................................................................76

4-1 Idle Power Management Breakdown of the Processor Cores.....................................91

4-2 Thread and Core C-State Entry and Exit ...............................................................91

4-3 Package C-State Entry and Exit................................................................... .. .. ....95

5-1 Tcase: 8-Core 150W Thermal Profile, Workstation Platform SKU Only..................... 105

5-2 DTS: 8-Core 150W Thermal Profile, Workstation Platform SKU Only ....................... 105

5-3 Tcase: 8-Core 135W Thermal Profile 2U ............................................................. 107

5-4 DTS: 8-Core 135W Thermal Profile 2U......................... ....................................... 108

5-5 Tcase: 8/6-Core 130W Thermal Profile 1U .......................................................... 110

5-6 DTS: 8-Core 130W Thermal Profile 1U......................... ....................................... 110

5-7 DTS: 6-Core 130W Thermal Profile 1U......................... ....................................... 111

5-8 Tcase: 6-Core 130W 1S WS Thermal Profile........................................................ 112

5-9 DTS: 6-Core 130W 1S WS Thermal Profile ............... .. .. .. .. .. ................................. 113

5-10 Tcase: 8-Core 115W Thermal Profile 1U ............................................................. 115

5-11 DTS: 8-Core 115W Thermal Profile 1U................................................................ 115

5-12 Tcase: 8/6-Core 95W Thermal Profile 1U ............................................................ 117

5-13 DTS: 8-Core 95W Thermal Profile 1U................................................................. 117

5-14 DTS: 6-Core 95W Thermal Profile 1U................................................................. 118

5-15 Tcase: 8-Core 70W Thermal Profile 1U........................... .. ... .. ........................... .. 119

5-16 DTS: 8-Core 70W Thermal Profile 1U................................................................. 120

5-17 Tcase: 6-Core 60W Thermal Profile 1U........................... .. ... .. ........................... .. 121

5-18 DTS: 6-Core 60W Thermal Profile 1U................................................................. 122

5-19 Tcase: 4-Core 130W Thermal Profile 2U ............................................................. 123

5-20 DTS: 4-Core 130W Thermal Profile 2U................................................................ 124

5-21 Tcase: 4-Core 130W 1S WS Thermal Profile........................................................ 126

5-22 DTS: 4-Core 130W 1S WS Thermal Profile .......................................................... 126

5-23 Tcase: 4/2-Core 80W Thermal Profile 1U ............................................................ 128

5-24 DTS: 4-Core 80W Thermal Profile 1U................................................................. 128

5-25 DTS: 2-Core 80W Thermal Profile 1U................................................................. 129

5-26 Tcase: 8-Core LV95W Thermal Profile, Embedded Server SKU ......... ...................... 131

5-27 Tcase: 8-Core LV70W Thermal Profile, Embedded Server SKU ......... ...................... 132

5-28 Case Temperature (TCASE) Measurement Location.................................... .. .. .. .. .. 134

5-29 Frequency and Voltage Ordering........................................................................ 136

7-1 Input Device Hysteresis ................................................................................... 152

7-2 VR Power-State Transitions............................................................................... 156

7-3 8/6-Core: VCC Static and Transient Tolerance Loadlines....................................... 170

7-4 4/2-Core: Processor VCC Static and Transient Tolerance Loadlines......................... 172

7-5 Load Current Versus Time .......................................... .. ............................ .. ...... 173

7-6 VCC Overshoot Example Waveform.................................................................... 174

7-7 BCLK{0/1} Differential Clock Crosspoint Specification .......................................... 180

7-8 BCLK{0/1} Differential Clock Measurement Point for Ringback .............................. 180

7-9 BCLK{0/1} Single Ended Clock Measurement Points for Absolute Cross Point and Swing ............ 181

7-10 BCLK{0/1} Single Ended Clock Measurement Points for Delta Cross Point ............... 181

7-11 Maximum Acceptable Overshoot/Undershoot Waveform........................................ 185

9-1 Processor Package Assembly Sketch.................................................................. 237

9-2 Processor Package Drawing Sheet 1 of 2 ............................................................ 239

9-3 Processor Package Drawing Sheet 2 of 2 ............................................................ 240

9-4 Processor Top-Side Markings ........................................................................... 242


Page 8

10-1 STS200C Passive/Active Combination Heat Sink (with Removable Fan)...................244

10-2 STS200C Passive/Active Combination Heat Sink (with Fan Removed)......................244

10-3 STS200P and STS200PNRW 25.5 mm Tall Passive Heat Sinks ................................245

10-4 Boxed Processor Motherboard Keepout Zones (1 of 4) ..........................................246

10-5 Boxed Processor Motherboard Keepout Zones (2 of 4) ..........................................247

10-6 Boxed Processor Motherboard Keepout Zones (3 of 4) ..........................................248

10-7 Boxed Processor Motherboard Keepout Zones (4 of 4) ..........................................249

10-8 Boxed Processor Heat Sink Volumetric (1 of 2) ....................................................250

10-9 Boxed Processor Heat Sink Volumetric (2 of 2) ....................................................251

10-10 4-Pin Fan Cable Connector (For Active Heat Sink) .......................................... .. .. ..252

10-11 4-Pin Base Baseboard Fan Header (For Active Heat Sink) .....................................253

10-12 Fan Cable Connector Pin Out For 4-Pin Active Thermal Solution.............................255

Tables

1-1 Referenced Documents....................................... .. .. ............................................22

2-1 Summary of Processor-specific PECI Commands ....................................................30

2-2 Minor Revision Number Meaning..........................................................................33

2-3 GetTemp() Response Definition ................ .. .. .. ............................ .. .. .. ...................35

2-4 RdPkgConfig() Response Definition............ ............................ .. .. ...........................36

2-5 WrPkgConfig() Response Definition ......................................................................37

2-6 RdPkgConfig() & WrPkgConfig() DRAM Thermal and Power Optimization Services Summary.............39

2-7 Channel & DIMM Index Decoding.........................................................................41

2-8 RdPkgConfig() & WrPkgConfig() CPU Thermal and Power Optimization Services Summary..............46

2-9 Power Control Register Unit Calculations...............................................................51

2-10 RdIAMSR() Response Definition ...........................................................................62

2-11 RdIAMSR() Services Summary.............................................................................62

2-12 RdPCIConfig() Response Definition.......................................................................65

2-13 RdPCIConfigLocal() Response Definition................................................................67

2-14 WrPCIConfigLocal() Response Definition................................................................68

2-15 WrPCIConfigLocal() Memory Controller and IIO Device/Function Support...................69

2-16 PECI Client Response During Power-Up.................................................................69

2-17 SOCKET ID Strapping.........................................................................................71

2-18 Power Impact of PECI Commands vs. C-states.......................................................71

2-19 Domain ID Definition........................................................................... ... ............74

2-20 Multi-Domain Command Code Reference...............................................................74

2-21 Completion Code Pass/Fail Mask..........................................................................75

2-22 Device Specific Completion Code (CC) Definition....................................................75

2-23 Originator Response Guidelines................... .. .......................................................76

2-24 Error Codes and Descriptions...............................................................................77

4-1 System States..............................................................................................87

4-2 Package C-State Support....................................................................................87

4-3 Core C-State Support.........................................................................................88

4-4 System Memory Power States .............................................................................88

4-5 DMI2/PCI Express* Link States................... .. .. ............................ .........................89

4-6 Intel QPI States.................................................................. ...............................89

4-7 G, S and C State Combinations............................................................................90

4-8 P_LVLx to MWAIT Conversion..............................................................................92

4-9 Coordination of Core Power States at the Package Level..........................................95


Page 9

4-10 Package C-State Power Specifications ....... .. .............................. ...........................97

5-1 Processor SKU Summary Table ......................................................................... 104

5-2 Tcase: 8-Core 150W Thermal Specifications, Workstation Platform SKU Only........... 104

5-3 8-Core 150W Thermal Profile, Workstation Platform SKU Only ............................. 106

5-4 Tcase: 8-Core 135W Thermal Specifications 2U................................................... 107

5-5 8-Core 135W Thermal Profile Table 2U............................................................... 108

5-6 Tcase: 8/6-Core 130W Thermal Specifications, Workstation/Server Platform ........... 109

5-7 8/6-Core 130W Thermal Profile Table 1U............................................................ 111

5-8 Tcase: 6-Core 130W 1S WS Thermal Specifications.............................................. 112

5-9 6-Core 130W 1S WS Thermal Profile Table..................................................... .. ... 113

5-10 Tcase: 8-Core 115W Thermal Specifications 1U................................................... 114

5-11 8-Core 115W Thermal Profile Table 1U.................... .. .. .. .. ................................... 116

5-12 Tcase: 8/6-Core 95W Thermal Specifications, Workstation/Server Platform ........ ..... 116

5-13 8/6-Core 95W Thermal Profile Table 1U.............................................................. 118

5-14 Tcase: 8-Core 70W Thermal Specifications 1U..................................................... 119

5-15 8-Core 70W Thermal Profile Table 1U................................................................. 120

5-16 Tcase: 6-Core 60W Thermal Specifications 1U..................................................... 121

5-17 6-Core 60W Thermal Profile Table 1U................................................................. 122

5-18 Tcase: 4-Core 130W Thermal Specifications 2U................................................... 123

5-19 4-Core 130W Thermal Profile Table 2U.................... .. .. .. .. ................................... 124

5-20 Tcase: 4-Core 130W 1S WS Thermal Specifications, Workstation/Server Platform .... 125

5-21 4-Core 130W 1S WS Thermal Profile Table................................ .. .. .... .. .. .............. 127

5-22 Tcase: 4/2-Core 80W Thermal Specifications 1U.................................................. 127

5-23 4/2-Core 80W Thermal Profile Table 1U.............................................................. 129

5-24 Embedded Server Processor Elevated Tcase SKU Summary Table .......................... 130

5-25 Tcase: 8-Core LV95W Thermal Specifications, Embedded Server SKU..................... 130

5-26 8-Core LV95W Thermal Profile Table, Embedded Server SKU ................................. 131

5-27 Tcase: 8-Core LV70W Thermal Specifications, Embedded Server SKU..................... 132

5-28 8-Core LV70W Thermal Profile Table, Embedded Server SKU ................................. 133

6-1 Memory Channel DDR0, DDR1, DDR2, DDR3....................................................... 141

6-2 Memory Channel Miscellaneous............................................ .. .. .. ....................... 142

6-3 PCI Express* Port 1 Signals.............................................................................. 142

6-4 PCI Express* Port 2 Signals.............................................................................. 142

6-5 PCI Express* Port 3 Signals.............................................................................. 143

6-6 PCI Express* Miscellaneous Signals ................................................................... 143

6-7 DMI2 and PCI Express* Port 0 Signals................................................................ 144

6-8 Intel QPI Port 0 and 1 Signals........................................................................... 144

6-9 Intel QPI Miscellaneous Signals......................................................................... 144

6-10 PECI Signals................................................................................................... 145

6-11 System Reference Clock (BCLK{0/1}) Signals ..................................................... 145

6-12 JTAG and TAP Signals ................................................ .. ............................ ........ 145

6-13 SVID Signals .................................................................................................. 146

6-14 Processor Asynchronous Sideband Signals.......................................................... 146

6-15 Miscellaneous Signals ................................. ............................ ......................... 148

6-16 Power and Ground Signals................................................................................ 149

7-1 Power and Ground Lands.................................................................................. 154

7-2 SVID Address Usage........................................................................................ 157

7-3 VR12.0 Reference Code Voltage Identification (VID) Table .................................... 157

7-4 Signal Description Buffer Types......................................................................... 158

7-5 Signal Groups..................... ............................ ................................................ 159

7-6 Signals with On-Die Termination ................................... .. ... ............................. .. 162

7-7 Power-On Configuration Option Lands................................................................ 162


Page 10

7-8 Fault Resilient Booting (Output Tri-State) Signals.................................................163

7-9 Processor Absolute Minimum and Maximum Ratings .............................................164

7-10 Storage Condition Ratings.................................................................................165

7-11 Voltage Specification................................................ ........................... ... .. ........167

7-12 Processor Current Specifications ........................................................................168

7-13 8/6 Core: Processor VCC Static and Transient Tolerance .......................................169

7-14 4/2-Core: Processor VCC Static and Transient Tolerance .......................................170

7-15 VCC Overshoot Specifications............................................................................173

7-16 DDR3 and DDR3L Signal DC Specifications...................................................... .. ..174

7-17 PECI DC Specifications .....................................................................................176

7-18 System Reference Clock (BCLK{0/1}) DC Specifications........................................176

7-19 SMBus DC Specifications...................................................................................176

7-20 JTAG and TAP Signals DC Specifications..............................................................177

7-21 Serial VID Interface (SVID) DC Specifications ......................................................177

7-22 Processor Asynchronous Sideband DC Specifications.............................................178

7-23 Miscellaneous Signals DC Specifications..............................................................179

7-24 Processor I/O Overshoot/Undershoot Specifications..............................................182

7-25 Processor Sideband Signal Group Overshoot/Undershoot Tolerance ......... .. ... .. .. ......184

8-1 Land Name.....................................................................................................187

8-2 Land Number..................................................................................................212

9-1 Processor Loading Specifications........................................................................241

9-2 Package Handling Guidelines.............................................................................241

9-3 Processor Materials..........................................................................................242

10-1 PWM Fan Frequency Specifications For 4-Pin Active Thermal Solution......................254

10-2 8 Core / 6 Core Server Thermal Solution Boundary Conditions ...............................256

10-3 4 Core Server Thermal Solution Boundary Conditions ...........................................256


Page 11

Revision History

Revision Number Description Revision Date

001 Initial Release March 2012

002 Added Intel® Xeon® Processor E5-4600 Product Family May 2012


Page 12


Page 13

Overview

1 Overview

1.1 Introduction

The Intel® Xeon® Processor E5-1600/E5-2600/E5-4600 Product Families Datasheet Volume One provides DC specifications, signal integrity, differential signaling specifications, land and signal definitions, and an overview of additional processor feature interfaces.

The Intel® Xeon® processor E5-1600/E5-2600/E5-4600 product families are the next generation of 64-bit, multi-core enterprise processors built on 32-nanometer process technology. Throughout this document, the Intel® Xeon® processor E5-1600/E5-2600/E5-4600 product families may be referred to simply as the processor. Where information differs between the EP and EP 4S SKUs, this document uses specific Intel® Xeon® processor E5-1600 product family, Intel® Xeon® processor E5-2600 product family, and Intel® Xeon® processor E5-4600 product family notation.

Based on the low-power/high-performance 2nd Generation Intel® Core™ Processor Family microarchitecture, the processor is designed for a two-chip platform consisting of a processor and a Platform Controller Hub (PCH), enabling higher performance, easier validation, and improved x-y footprint. The Intel® Xeon® processor E5-1600 product family and the Intel® Xeon® processor E5-2600 product family are designed for Efficient Performance server, workstation and HPC platforms. The Intel® Xeon® processor E5-4600 product family supports scalable server and HPC platforms of two or more processors, including “glueless” 4-way platforms. Note: Some processor features are not available on all platforms.

Each socket provides two Intel® QuickPath Interconnect point-to-point links capable of up to 8.0 GT/s, up to 40 lanes of PCI Express* 3.0 links capable of 8.0 GT/s, and 4 lanes of DMI2/PCI Express* 2.0 interface with a peak transfer rate of 5.0 GT/s. The processor supports up to 46 bits of physical address space and 48 bits of virtual address space.
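As a rough illustration of what these address widths mean in practice, the following Python sketch converts the 46-bit physical and 48-bit virtual widths quoted above into addressable capacity. The function name `addressable_bytes` and the constant names are illustrative only and do not come from the datasheet.

```python
# Address widths quoted in the introduction above.
PHYSICAL_ADDRESS_BITS = 46
VIRTUAL_ADDRESS_BITS = 48

TIB = 1 << 40  # one tebibyte, in bytes


def addressable_bytes(address_bits: int) -> int:
    """Return the number of distinct byte addresses for a given address width."""
    return 1 << address_bits


# Integer division is exact here because both values are powers of two.
physical_tib = addressable_bytes(PHYSICAL_ADDRESS_BITS) // TIB
virtual_tib = addressable_bytes(VIRTUAL_ADDRESS_BITS) // TIB

print(f"physical: {physical_tib} TiB, virtual: {virtual_tib} TiB")
```

These figures are upper bounds set by the address width alone; the memory a platform can actually populate depends on the DIMM configurations described later in this chapter.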

Each processor in this family includes an integrated memory controller (IMC) and integrated I/O (IIO) (such as PCI Express* and DMI2) on a single silicon die. This single-die solution is known as a monolithic processor.

Figure 1-1 and Figure 1-2 show the processor 2-socket and 4-socket platform configurations. The “Legacy CPU” is the boot processor that is connected to the PCH component; this socket is set to NodeID[0]. In the 4-socket configuration, the “Remote CPU” is the processor that is not connected to the Legacy CPU.


Page 14

Figure 1-1. Intel® Xeon® Processor E5-2600 Product Family on the 2 Socket Platform

Figure 1-2. Intel® Xeon® Processor E5-4600 Product Family on the 4 Socket Platform

1.1.1 Processor Feature Details

• Up to 8 execution cores

• Each core supports two threads (Intel® Hyper-Threading Technology), up to 16 threads per socket


Page 15


• 46-bit physical addressing and 48-bit virtual addressing

• 1 GB large page support for server applications

• A 32-KB instruction and 32-KB data first-level cache (L1) for each core

• A 256-KB shared instruction/data mid-level (L2) cache for each core

• Up to 20 MB last level cache (LLC): up to 2.5 MB per core instruction/data last level cache (LLC), shared among all cores

• The Intel® Xeon® processor E5-4600 product family supports Directory Mode, Route Through, and Node IDs to reduce unnecessary Intel QuickPath Interconnect traffic by tracking cache lines present in remote sockets.
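The per-socket maxima in the list above follow from simple per-core figures. The sketch below reproduces that arithmetic in Python; the helper names are hypothetical, and because the datasheet states these values as "up to" limits, a given SKU may expose fewer cores or less cache than this calculation suggests.

```python
# Per-core figures taken from the feature list above.
MAX_CORES = 8
THREADS_PER_CORE = 2      # Intel Hyper-Threading Technology
LLC_MB_PER_CORE = 2.5     # "up to 2.5 MB per core"


def max_threads(cores: int) -> int:
    """Upper bound on hardware threads per socket for a given core count."""
    return cores * THREADS_PER_CORE


def max_llc_mb(cores: int) -> float:
    """Upper bound on last level cache (MB) for a given core count."""
    return cores * LLC_MB_PER_CORE


print(f"{MAX_CORES} cores: up to {max_threads(MAX_CORES)} threads, "
      f"up to {max_llc_mb(MAX_CORES)} MB LLC")
```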

1.1.2 Supported Technologies

• Intel® Virtualization Technology (Intel® VT)

• Intel® Virtualization Technology (Intel® VT) for Directed I/O (Intel® VT-d)

• Intel Virtualization Technology Processor Extensions

• Intel® Trusted Execution Technology (Intel® TXT)

• Intel® Advanced Encryption Standard Instructions (Intel® AES-NI)

• Intel 64 Architecture

• Intel® Streaming SIMD Extensions 4.1 (Intel SSE4.1)

• Intel Streaming SIMD Extensions 4.2 (Intel SSE4.2)

• Intel Advanced Vector Extensions (Intel AVX)

• Intel® Hyper-Threading Technology (Intel® HT Technology)

• Execute Disable Bit

• Intel® Turbo Boost Technology

• Intel® Intelligent Power Technology

• Enhanced Intel SpeedStep® Technology

• Intel® Dynamic Power Technology (Intel® DPT) (Memory Power Management)

1.2 Interfaces

1.2.1 System Memory Support

• Intel® Xeon® processor E5-1600/E5-2600/E5-4600 product families support 4 DDR3 channels

• Unbuffered DDR3 and registered DDR3 DIMMs

• LR DIMM (Load Reduced DIMM) for buffered memory solutions demanding higher capacity memory subsystems

• Independent channel mode or lockstep mode

• Data burst length of eight cycles for all memory organization modes

• Memory DDR3 data transfer rates of 800, 1066, 1333, and 1600 MT/s

• 64-bit wide channels plus 8-bits of ECC support for each channel

• DDR3 standard I/O Voltage of 1.5 V and DDR3 Low Voltage of 1.35 V

• 1-Gb, 2-Gb and 4-Gb DDR3 DRAM technologies supported for these devices:


Page 16


— UDIMMs x8, x16

— RDIMMs x4, x8

— LRDIMM x4, x8 (2-Gb and 4-Gb only)

• Up to 8 ranks supported per memory channel, 1, 2 or 4 ranks per DIMM

• Open with adaptive idle page close timer or closed page policy

• Per channel memory test and initialization engine can initialize DRAM to all logical zeros with valid ECC (with or without data scrambler) or a predefined test pattern

• Isochronous access support for Quality of Service (QoS), native 1 and 2 socket platforms - Intel® Xeon® processor E5-1600 and E5-2600 product families only

• Minimum memory configuration: independent channel support with 1 DIMM populated

• Integrated dual SMBus master controllers

• Command launch modes of 1n/2n

• RAS Support (including but not limited to):

— Rank Level Sparing and Device Tagging

— Demand and Patrol Scrubbing

— DRAM Single Device Data Correction (SDDC) for any single x4 or x8 DRAM device failure. Independent channel mode supports x4 SDDC. x8 SDDC requires lockstep mode

— Lockstep mode where channels 0 & 1 and channels 2 & 3 are operated in lockstep mode

— The combination of memory channel pair lockstep and memory mirroring is not supported

— Data scrambling with address to ease detection of write errors to an incorrect address

— Error reporting via Machine Check Architecture

— Read Retry during CRC error handling checks by iMC

— Channel mirroring within a socket. Channel Mirroring mode is supported on memory channels 0 & 1 and channels 2 & 3

— Corrupt Data Containment

— MCA Recovery

• Improved Thermal Throttling with dynamic Closed Loop Thermal Throttling (CLTT)

• Memory thermal monitoring support for DIMM temperature via two memory signals, MEM_HOT_C{01/23}_N
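The channel count, channel width, and transfer rates listed above determine a theoretical peak memory bandwidth. The Python sketch below works through that arithmetic under stated assumptions: every transfer carries the full 64 data bits, the 8 ECC bits carry no payload, the nominal rates are used exactly as listed, and 1 MB is 10**6 bytes. The function and constant names are illustrative; sustained bandwidth on real systems is lower than these peaks.

```python
# Values from the system memory feature list above.
DATA_BYTES_PER_TRANSFER = 64 // 8   # 64-bit data path; ECC bits are excluded
CHANNELS_PER_SOCKET = 4
SUPPORTED_RATES_MT_S = (800, 1066, 1333, 1600)


def peak_mb_per_s(rate_mt_s: int, channels: int = 1) -> int:
    """Theoretical peak in MB/s (1 MB = 10**6 bytes) across the given channels."""
    return rate_mt_s * DATA_BYTES_PER_TRANSFER * channels


for rate in SUPPORTED_RATES_MT_S:
    per_channel = peak_mb_per_s(rate)
    per_socket = peak_mb_per_s(rate, CHANNELS_PER_SOCKET)
    print(f"DDR3-{rate}: {per_channel} MB/s per channel, "
          f"{per_socket} MB/s per socket")
```

Lockstep mode pairs channels, so this per-socket figure assumes independent channel mode.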

1.2.2 PCI Express*

• The PCI Express* port(s) are fully compliant with the PCI Express* Base Specification, Revision 3.0 (PCIe* 3.0)

• Support for PCI Express* 3.0 (8.0 GT/s), 2.0 (5.0 GT/s), and 1.0 (2.5 GT/s)

• Up to 40 lanes of PCI Express* interconnect for general purpose PCI Express* devices at PCIe* 3.0 speeds that are configurable for up to 10 independent ports

• 4 lanes of PCI Express* at PCIe* 2.0 speeds when not using DMI2 port (Port 0), also can be downgraded to x2 or x1

• Negotiating down to narrower widths is supported, see Figure 1-3:


Page 17


— x16 port (Port 2 & Port 3) may negotiate down to x8, x4, x2, or x1.

— x8 port (Port 1) may negotiate down to x4, x2, or x1.

— x4 port (Port 0) may negotiate down to x2, or x1.

— When negotiating down to narrower widths, there are caveats as to how lane reversal is supported.

• Non-Transparent Bridge (NTB) is supported by PCIe* Port3a/IOU1. For more details on NTB mode operation refer to PCI Express Base Specification - Revision 3.0:

— x4 or x8 widths and at PCIe* 1.0, 2.0, 3.0 speeds

— Two usage models: NTB attached to a Root Port or NTB attached to another NTB

— Supports three 64-bit BARs

— Supports posted writes and non-posted memory read transactions across the NTB

— Supports INTx, MSI and MSI-X mechanisms for interrupts on both sides of the NTB in upstream direction only

• Address Translation Services (ATS) 1.0 support

• Hierarchical PCI-compliant configuration mechanism for downstream devices.

• Traditional PCI style traffic (asynchronous snooped, PCI ordering).

• PCI Express* extended configuration space. The first 256 bytes of configuration space aliases directly to the PCI compatibility configuration space. The remaining portion of the fixed 4-KB block of memory-mapped space above that (starting at 100h) is known as extended configuration space.

• PCI Express* Enhanced Access Mechanism. Accessing the device configuration space in a flat memory mapped fashion.

• Automatic discovery, negotiation, and training of link out of reset.

• Supports receiving and decoding 64 bits of address from PCI Express*.

— Memory transactions received from PCI Express* that go above the top of physical address space (when Intel VT-d is enabled, the check would be against the translated HPA (Host Physical Address) address) are reported as errors by the processor.

— Outbound access to PCI Express* will always have address bits 63 to 46 cleared.

• Re-issues Configuration cycles that have been previously completed with the Configuration Retry status.

• Power Management Event (PME) functions.

• Message Signaled Interrupt (MSI and MSI-X) messages

• Degraded Mode support and Lane Reversal support

• Static lane numbering reversal and polarity inversion support
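The extended configuration space and Enhanced Access Mechanism bullets above describe a flat, memory-mapped layout with one 4-KB block per function. The Python sketch below shows how such an address is commonly formed, using the bus/device/function bit positions defined by the PCI Express Base Specification. The base address `0x8000_0000` and all function names are hypothetical; the actual base is platform-specific and is not stated in this section.

```python
CONFIG_SPACE_BYTES = 4096        # fixed 4-KB block per function
EXTENDED_SPACE_START = 0x100     # first 256 bytes alias PCI-compatible space

HYPOTHETICAL_BASE = 0x8000_0000  # assumption: platform-specific in practice


def config_region(offset: int) -> str:
    """Classify a register offset as compatibility or extended space."""
    if not 0 <= offset < CONFIG_SPACE_BYTES:
        raise ValueError("offset must be within the 4-KB configuration block")
    return "compatibility" if offset < EXTENDED_SPACE_START else "extended"


def enhanced_access_address(base: int, bus: int, device: int,
                            function: int, offset: int) -> int:
    """Memory-mapped address of a configuration register."""
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8):
        raise ValueError("bus/device/function out of range")
    config_region(offset)  # validates the offset range
    return base + ((bus << 20) | (device << 15) | (function << 12) | offset)


addr = enhanced_access_address(HYPOTHETICAL_BASE, bus=1, device=2,
                               function=3, offset=0x10)
print(hex(addr), config_region(0x10), config_region(0x100))
```

Because each function occupies exactly 4 KB, the function number lands at bit 12, and the first extended register of any function sits at offset 100h within its block.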


Page 18

[Figure 1-3 content: Port 0 (DMI2/PCIe*) uses lanes 0…3. Port 1 (IOU2) is an x8 PCIe* port that can be split into Port 1a and Port 1b. Port 2 (IOU0) and Port 3 (IOU1) are x16 PCIe* ports that can each be split into four ports (a through d) over lanes 0…3, 4…7, 8…11, and 12…15. Each port has its own Transaction, Link, and Physical layers.]

Figure 1-3. PCI Express* Lane Partitioning and Direct Media Interface Gen 2 (DMI2)
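To make the partitioning in Figure 1-3 concrete, the Python sketch below splits each general-purpose port into x4 sub-ports and counts the result. It assumes the finest partition is x4 and that sub-port letters map to lane groups in ascending order (a = lanes 0 to 3, b = lanes 4 to 7, and so on); that mapping is inferred from the figure labels rather than stated in the text, and the names used here are illustrative.

```python
import string

SUBPORT_LANES = 4  # assumption: finest partition shown in the figure is x4

# Port widths from the width-negotiation bullets in Section 1.2.2.
# Port 0 (DMI2/PCIe 2.0, x4) is not part of the 40 general-purpose lanes.
PORT_WIDTHS = {1: 8, 2: 16, 3: 16}


def lane_map(port: int, width: int) -> dict:
    """Map each x4 sub-port name to its (first_lane, last_lane) pair."""
    mapping = {}
    for index in range(width // SUBPORT_LANES):
        first = index * SUBPORT_LANES
        name = f"Port {port}{string.ascii_lowercase[index]}"
        mapping[name] = (first, first + SUBPORT_LANES - 1)
    return mapping


total_lanes = sum(PORT_WIDTHS.values())
total_ports = sum(len(lane_map(p, w)) for p, w in PORT_WIDTHS.items())

print(f"{total_lanes} lanes, up to {total_ports} independent ports")
print(lane_map(2, PORT_WIDTHS[2]))
```

The totals agree with the "up to 40 lanes" and "up to 10 independent ports" figures quoted in Section 1.2.2.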

1.2.3 Direct Media Interface Gen 2 (DMI2)

• Serves as the chip-to-chip interface to the Intel® C600 Chipset

• The DMI2 port supports x4 link width and only operates in a x4 mode when in DMI2

• Operates at PCI Express* 1.0 or 2.0 speeds

• Transparent to software

• Processor and peer-to-peer writes and reads with 64-bit address support

• APIC and Message Signaled Interrupt (MSI) support. Will send Intel-defined “End of Interrupt” broadcast message when initiated by the processor.

• System Management Interrupt (SMI), SCI, and SERR error indication

• Static lane numbering reversal support

• Supports DMI2 virtual channels VC0, VC1, VCm, and VCp
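The x4 width and PCI Express* 1.0/2.0 speeds above translate into a per-direction payload rate once line encoding is taken into account. The Python sketch below assumes DMI2 uses the same 8b/10b encoding as PCI Express 1.0 and 2.0; this section gives only the link speeds, so treat the encoding and the resulting numbers as an estimate. The names are illustrative.

```python
from fractions import Fraction

DMI2_LANES = 4  # DMI2 operates only in x4 mode

# Raw signaling rate per lane, in GT/s, for the two supported speeds.
RATE_GT_S = {"1.0": Fraction(5, 2), "2.0": Fraction(5)}

# Assumption: 8b/10b encoding, as used by PCI Express 1.0 and 2.0.
ENCODING_EFFICIENCY = Fraction(8, 10)


def payload_gb_per_s(speed: str, lanes: int = DMI2_LANES) -> Fraction:
    """Per-direction rate in GB/s after encoding overhead (1 GB = 10**9 bytes)."""
    bits_per_s = RATE_GT_S[speed] * ENCODING_EFFICIENCY * lanes
    return bits_per_s / 8


for speed in RATE_GT_S:
    print(f"DMI2 at PCIe {speed} speed: "
          f"{float(payload_gb_per_s(speed))} GB/s per direction")
```

Protocol overhead (packet headers, flow control) reduces the usable rate further and is not modeled here.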

1.2.4 Intel® QuickPath Interconnect (Intel® QPI)

• Compliant with Intel QuickPath Interconnect v1.1 standard packet formats

• Implements two full width Intel QPI ports

• Full width port includes 20 data lanes and 1 clock lane

• 64 byte cache-lines

• Isochronous access support for Quality of Service (QoS), native 1 and 2 socket platforms - Intel® Xeon® processor E5-1600 and E5-2600 product families only


Page 19


• Home snoop based coherency

• 3-bit Node ID

• 46-bit physical addressing support

• No Intel QuickPath Interconnect bifurcation support

• Differential signaling

• Forwarded clocking

• Up to 8.0 GT/s data rate (up to 16 GB/s per direction peak bandwidth per port)

— All ports run at same operational frequency

— Reference Clock is 100 MHz

— Slow boot speed initialization at 50 MT/s

• Common reference clocking (same clock generator for both sender and receiver)

• Intel® Interconnect Built-In-Self-Test (Intel® IBIST) for high-speed testability

• Polarity and Lane reversal (Rx side only)
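The 16 GB/s peak figure in the list above can be checked with a short calculation. The Python sketch below assumes that 16 of the 20 bits moved per transfer on a full-width port are data payload, which is the usual description of Intel QPI but is not stated in this section. It also uses the 80-bit flit size from the Terminology table and the 3-bit Node ID width; the names are illustrative.

```python
DATA_LANES = 20                  # full-width port, from the list above
FLIT_BITS = 80                   # "1 Flit = 80-bits" (Terminology table)
PAYLOAD_BITS_PER_TRANSFER = 16   # assumption: remaining bits are overhead
NODE_ID_BITS = 3


def peak_gb_per_s_per_direction(rate_gt_s: float) -> float:
    """Peak payload bandwidth per port, per direction, in GB/s."""
    return rate_gt_s * PAYLOAD_BITS_PER_TRANSFER / 8


transfers_per_flit = FLIT_BITS // DATA_LANES
addressable_nodes = 1 << NODE_ID_BITS

print(f"{peak_gb_per_s_per_direction(8.0)} GB/s per direction at 8.0 GT/s")
print(f"{transfers_per_flit} transfers per flit, "
      f"{addressable_nodes} distinct Node ID values")
```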

1.2.5 Platform Environment Control Interface (PECI)

The PECI is a one-wire interface that provides a communication channel between a PECI client (the processor) and a PECI master (the PCH).

• Supports operation at up to 2 Mbps data transfers

• Link layer improvements to support additional services and higher efficiency over PECI 2.0 generation

• Services include CPU thermal and estimated power information, control functions for power limiting, P-state and T-state control, and access for Machine Check Architecture registers and PCI configuration space (both within the processor package and downstream devices)

• PECI address determined by SOCKET_ID configuration

• Single domain (Domain 0) is supported

1.3 Power Management Support

1.3.1 Processor Package and Core States

• ACPI C-states as implemented by the following processor C-states:

— Package: PC0, PC1/PC1E, PC2, PC3, PC6 (Package C7 is not supported)

— Core: CC0, CC1, CC1E, CC3, CC6, CC7

• Enhanced Intel SpeedStep® Technology

1.3.2 System States Support

• S0, S1, S3, S4, S5

1.3.3 Memory Controller

• Multiple CKE power down modes

• Multiple self-refresh modes


Page 20

• Memory thermal monitoring via MEM_HOT_C01_N and MEM_HOT_C23_N Signals

1.3.4 PCI Express

• L0s is not supported

• L1 ASPM power management capability

1.3.5 Intel QuickPath Interconnect

• L0s is not supported

• L0p and L1 power management capabilities

1.4 Thermal Management Support

• Digital Thermal Sensor with multiple on-die temperature zones

• Adaptive Thermal Monitor

• THERMTRIP_N and PROCHOT_N signal support

• On-Demand mode clock modulation

• Open and Closed Loop Thermal Throttling (OLTT/CLTT) support for system memory in addition to Hybrid OLTT/CLTT mode

• Fan speed control with DTS

• Two integrated SMBus masters for accessing thermal data from DIMMs

• New Memory Thermal Throttling features via MEM_HOT_C{01/23}_N signals

• Running Average Power Limit (RAPL), Processor and DRAM Thermal and Power Optimization Capabilities


1.5 Package Summary

The processor socket is a 52.5 x 45 mm FCLGA package (LGA2011-0 land FCLGA10).

1.6 Terminology

Term Description

ASPM Active State Power Management

BMC Baseboard Management Controllers

Cbo Cache and Core Box. It is a term used for internal logic providing ring interface to LLC and Core.

DDR3 Third generation Double Data Rate SDRAM memory technology that is the successor to DDR2 SDRAM

DMA Direct Memory Access

DMI Direct Media Interface

DMI2 Direct Media Interface Gen 2

DTS Digital Thermal Sensor

ECC Error Correction Code

Page 21


Term Description

Enhanced Intel SpeedStep® Technology Allows the operating system to reduce power consumption when performance is not needed.

Execute Disable Bit The Execute Disable bit allows memory to be marked as executable or non-executable, when combined with a supporting operating system. If code attempts to run in non-executable memory the processor raises an error to the operating system. This feature can prevent some classes of viruses or worms that exploit buffer overrun vulnerabilities and can thus help improve the overall security of the system. See the Intel® 64 and IA-32 Architectures Software Developer's Manuals for more detailed information.

Flit Flow Control Unit. The Intel QPI Link layer’s unit of transfer; 1 Flit = 80-bits.

Functional Operation Refers to the normal operating conditions in which all processor specifications, including DC, AC, system bus, signal quality, mechanical, and thermal, are satisfied.

IMC The Integrated Memory Controller. A Memory Controller that is integrated in the processor die.

IIO The Integrated I/O Controller. An I/O controller that is integrated in the processor die.

Intel® ME Intel® Management Engine (Intel® ME)

Intel® QuickData Technology Intel QuickData Technology is a platform solution designed to maximize the throughput of server data traffic across a broader range of configurations and server environments to achieve faster, scalable, and more reliable I/O.

Intel® QuickPath Interconnect (Intel® QPI) A cache-coherent, link-based Interconnect specification for Intel processors, chipsets, and I/O bridge components.

Intel® 64 Technology 64-bit memory extensions to the IA-32 architecture. Further details on Intel 64

Intel® Turbo Boost Technology

Intel® TXT Intel® Trusted Execution Technology

Intel® Virtualization Technology (Intel® VT)

Intel® VT-d Intel® Virtualization Technology (Intel® VT) for Directed I/O. Intel VT-d is a

Intel® Xeon® processor E5-1600 product family and Intel® Xeon® processor E5-2600 product family

Intel® Xeon® processor E5-4600 product family

Integrated Heat Spreader (IHS)

Jitter Any timing variation of a transition edge or edges from the defined Unit Interval

IOV I/O Virtualization

LGA2011-0 land FCLGA10 Socket

architecture and programming model can be found at

http://developer.intel.com/technology/intel64/.

Intel® Turbo Boost Technology is a way to automatically run the processor core faster than the marked frequency if the part is operating under power, temperature, and current specifications limits of the Thermal Design Power (TDP). This results in increased performance of both single and multi-threaded applications.

Processor virtualization which when used in conjunction with Virtual Machine Monitor software enables multiple, robust independent software environments inside a single platform.

hardware assist, under system software (Virtual Machine Manager or OS) control, for enabling I/O device virtualization. Intel VT-d also brings robust security by providing protection from errant DMAs by using DMA remapping, a key feature of Intel VT-d.

Intel’s 32-nm processor design, follow-on to the 32-nm 2nd Generation Intel® Core™ Processor Family design. It is the fir st pr oce sso r for us e in Intel® Xeon® processor E5-1600 and E5-2600 product families-based platforms. Intel® Xeon® processor E5-1600 product family and Intel® Xeon® processor E5-2600 product family supports Efficient Performance server, workstation and HPC platforms

Intel’s 32-nm processor design, follow-on to the 32-nm processor design. It is the first processor for use in Intel® Xeon® processor E5-4600 product familybased platforms. Intel® Xeon® processor E5-4600 product family supports scalable server and HPC platforms for two or mor e processors, i ncluding gluele ss four-way platforms.

A component of the processor package used to enhance the thermal performance of the package. Component thermal solutions interface with the processor at the IHS surface.

(UI).

The processor mates with the system board through this surface mount, LGA2011-0 land FCLGA10 contact socket, for the Intel® Xeon® processor E5 product family-based platform.

Intel® Xeon® Processor E5-1600/E5-2600/E5-4600 Product Families 21 Datasheet Volume One

Page 22

Overview

Term Description

LLC Last Level Cache LRDIMM Load Reduced Dual In-line Memory Module NCTF Non-Critical to Function: NCTF locations are typically redundant ground or non-

NEBS Network Equipment Building System. NEBS is the most common set of

PCH Platform Controller Hub (Intel® C600 Chipset). The next generation chipset with

PCU Power Control Unit PCI Express* 3.0 The third generation PCI Express* specification that oper ates at twice the speed

PCI Express* 3 PCI Express* Generation 3.0 PCI Express* 2 PCI Express* Generation 2.0 PCI Express* PCI Express* Generation 2.0/3.0 PECI Platform Environment Control Interface Phit Physical Unit. An Intel® QPI terminology defining units of tr ansfer at the physical

Processor The 64-bit, single-core or multi-core component (package) Processor Core The term “processor core” refers to silicon die itself which can contain multiple

RDIMM Registered Dual In-line Memory Module Rank A unit of DRAM corresponding four to eight devices in parallel, ignoring ECC.

Scalable-2S Intel® Xeon® processor E5 product family-based platform targeted for scalable

SCI System Control Interrupt. Used in ACPI protocol. SSE Intel® Streaming SIMD Extensions (Intel® SSE) SKU A processor Stock Keeping Unit (SKU) to be installed in either server or

SMBus System Management Bus. A two-wire interface through which simpl e system and

Storage Conditions A non-operational state. The processor may be installed in a platform, in a tray,

TAC Thermal Averaging Constant

critical reserved, so the loss of the solder joint continuity at end of life co nditions will not affect the overall product functionality.

environmental design guidelines applied to telecommunications equipment in the United States.

centralized platform capabilities including the main I/O interfaces along with display connectivity , audio features, power management, manageability , security and storage features.

of PCI Express* 2.0 (8 Gb/s); however, PCI Express* 3.0 is completely backward compatible with PCI Express* 1.0 and 2.0.

layer. 1 Phit is equal to 20 bits in ‘full width mode’ and 10 bits in ‘half width mode’

execution cores. Each execution core has an instruction cache, data cache, and 256-KB L2 cache. All execution cores share the L3 cache. All DC and signal integrity specifications are measured at the processor die (pads), unless otherwise noted.

These devices are usually, but not always, mounted on a single side of a DDR3 DIMM.

designs using third party Node Controller chip . In the se designs, Node Controlle r is used to scale the design beyond one/two/four sockets.

workstation platforms. Electrical, power and thermal specifications for these SKU’s are based on specific use condition assumptions. Server processors may be further categorized as Efficient Performance server, workstation and HPC SKUs. For further details on use condition assumptions, please refer to the latest Product Release Qualification (PRQ) Report available via your Customer Quality Engineer (CQE) contact.

power management related devices can communicate with the rest of the system. It is based on the principals of the operation of the I2C* two-wire serial bus from Philips Semiconductor.

or loose. Processors may be sealed in packaging or exposed to free air. Under these conditions, processor landings should not be connected to any supply voltages, have any I/Os biased or receive any clocks. Upon exposure to “free air” (i.e., unsealed packaging or a device removed from packaging material) the processor must be handled in accordance with moisture sensitivity labeling (MSL) as indicated on the packaging material.

22 Intel® Xeon® Processor E5-1600/E5-2600/E5-4600 Product Families

Datasheet Volume One

Page 23

Overview

Term Description

TDP Thermal Design Power TSOD Thermal Sensor on DIMM UDIMM Unbuffered Dual In-line Module Uncore The portion of the processor comprising the shared cache, IMC, HA, PCU, UBox,

and Intel QPI link interface.

Unit Interval Signaling convention that is binary and unidirectional. In this binary signaling,

CCD_01, VCCD_23

one bit is sent for every edge of the forwarded clock, whether it be a rising edge or a falling edge. If a number of edges are collected at instances t then the UI at instance “n” is defined as:

= t n - t n - 1

Processor core power supply Processor ground Variable power supply for the processor system memory interface. VCCD is the

generic term for V

CCD_01, VCCD_23.

, t2, tn,...., t

x1 Refers to a Link or Port with one Physical Lane x4 Refers to a Link or Port with four Physical Lanes x8 Refers to a Link or Port with eight Physical Lanes x16 Refers to a Link or Port with sixteen Physical Lanes

1.7 Related Documents

Refer to the following documents for additional information.

Table 1-1. Referenced Documents

| Document | Location |
|---|---|
| Intel® Xeon® Processor E5 Product Family Datasheet Volume Two | http://www.intel.com |
| Intel® Xeon® Processor E5-1600/E5-2600/E5-4600 Product Families Thermal/Mechanical Design Guide | http://www.intel.com |
| Intel® Xeon® Processor E5-1600/E5-2600/E5-4600 Product Families – BSDL (Boundary Scan Description Language) | http://www.intel.com |
| Intel® C600 Series Chipset Data Sheet | http://www.intel.com |
| Intel® 64 and IA-32 Architectures Software Developer’s Manual (SDM) Volumes 1, 2, and 3 | http://www.intel.com |
| Advanced Configuration and Power Interface Specification 3.0 | http://www.acpi.info |
| PCI Local Bus Specification 3.0 | http://www.pcisig.com/specifications |
| PCI Express Base Specification - Revision 2.1 and 1.1 | http://www.pcisig.com |
| PCI Express Base Specification - Revision 3.0 | http://www.pcisig.com |
| System Management Bus (SMBus) Specification | http://smbus.org/ |
| DDR3 SDRAM Specification | http://www.jedec.org |
| Low (JESD22-A119) and High (JESD-A103) Temperature Storage Life Specifications | http://www.jedec.org |
| Intel 64 and IA-32 Architectures Software Developer's Manuals: Volume 1: Basic Architecture; Volume 2A: Instruction Set Reference, A-M; Volume 2B: Instruction Set Reference, N-Z; Volume 3A: System Programming Guide; Volume 3B: System Programming Guide | http://www.intel.com/products/processor/manuals/index.htm |
| Intel® 64 and IA-32 Architectures Optimization Reference Manual | http://www.intel.com/products/processor/manuals/index.htm |
| Intel® Virtualization Technology Specification for Directed I/O Architecture Specification | http://download.intel.com/technology/computing/vptech/Intel(r)_VT_for_Direct_IO.pdf |
| Intel® Trusted Execution Technology Software Development Guide | http://www.intel.com/technology/security/ |

1.8 State of Data

The data contained within this document is the most accurate information available by the publication date of this document.

2 Interfaces

This chapter describes the interfaces supported by the processor.

2.1 System Memory Interface

2.1.1 System Memory Technology Support

The Integrated Memory Controller (IMC) supports DDR3 protocols with four independent 64-bit memory channels with 8 bits of ECC for each channel (total of 72-bits) and supports 1 to 3 DIMMs per channel depending on the type of memory installed. The type of memory supported by the processor is dependent on the target platform:

• Intel® Xeon® processor E5 product family-based platforms support:
— ECC registered DIMMs: with a maximum of three DIMMs per channel allowing up to eight device ranks per channel.
— ECC and non-ECC unbuffered DIMMs: with a maximum of two DIMMs per channel thus allowing up to four device ranks per channel. Support for mixed non-ECC with ECC un-buffered DIMM configurations.

2.1.2 System Memory Timing Support

The IMC supports the following DDR3 Speed Bin, CAS Write Latency (CWL), and command signal mode timings on the main memory interface:

• tCL = CAS Latency

• tRCD = Activate Command to READ or WRITE Command delay

• tRP = PRECHARGE Command Period

• CWL = CAS Write Latency

• Command Signal modes = 1n indicates a new command may be issued every clock and 2n indicates a new command may be issued every 2 clocks. Command launch mode programming depends on the transfer rate and memory configuration.

2.2 PCI Express* Interface

This section describes the PCI Express* 3.0 interface capabilities of the processor. See the PCI Express* Base Specification for details of PCI Express* 3.0.

2.2.1 PCI Express* Architecture

Compatibility with the PCI addressing model is maintained to ensure that all existing applications and drivers operate unchanged. The PCI Express* configuration uses standard mechanisms as defined in the PCI Plug-and-Play specification.

The PCI Express* architecture is specified in three layers: Transaction Layer, Data Link Layer, and Physical Layer. The partitioning in the component is not necessarily along these same boundaries. Refer to Figure 2-1 for the PCI Express* Layering Diagram.

Figure 2-1. PCI Express* Layering Diagram

[Figure: layer stacks showing the Transaction, Data Link, and Physical layers; the Physical layer contains a Logical Sub-Block and an Electrical Sub-Block with RX and TX.]

PCI Express* uses packets to communicate information between components. Packets are formed in the Transaction and Data Link Layers to carry the information from the transmitting component to the receiving component. As the transmitted packets flow through the other layers, they are extended with additional information necessary to handle packets at those layers. At the receiving side, the reverse process occurs and packets get transformed from their Physical Layer representation to the Data Link Layer representation and finally (for Transaction Layer Packets) to the form that can be processed by the Transaction Layer of the receiving device.

Figure 2-2. Packet Flow through the Layers

[Figure: packet fields Framing, Sequence Number, Header, Data, ECRC, LCRC, Framing, annotated with the Transaction Layer, Data Link Layer, and Physical Layer.]

2.2.1.1 Transaction Layer

The upper layer of the PCI Express* architecture is the Transaction Layer. The Transaction Layer's primary responsibility is the assembly and disassembly of Transaction Layer Packets (TLPs). TLPs are used to communicate transactions, such as read and write, as well as certain types of events. The Transaction Layer also manages flow control of TLPs.

2.2.1.2 Data Link Layer

The middle layer in the PCI Express* stack, the Data Link Layer, serves as an intermediate stage between the Transaction Layer and the Physical Layer. Responsibilities of Data Link Layer include link management, error detection, and error correction.

The transmission side of the Data Link Layer accepts TLPs assembled by the Transaction Layer, calculates and applies data protection code and TLP sequence number, and submits them to Physical Layer for transmission across the Link. The receiving Data Link Layer is responsible for checking the integrity of received TLPs and for submitting them to the Transaction Layer for further processing. On detection of TLP error(s), this layer is responsible for requesting retransmission of TLPs until information is correctly received, or the Link is determined to have failed. The Data Link Layer also generates and consumes packets which are used for Link management functions.

2.2.1.3 Physical Layer

The Physical Layer includes all circuitry for interface operation, including driver and input buffers, parallel-to-serial and serial-to-parallel conversion, PLL(s), and impedance matching circuitry. It also includes logical functions related to interface initialization and maintenance. The Physical Layer exchanges data with the Data Link Layer in an implementation-specific format, and is responsible for converting this to an appropriate serialized format and transmitting it across the PCI Express* Link at a frequency and width compatible with the remote device.
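The layering described in Sections 2.2.1.1 through 2.2.1.3 can be illustrated with a minimal Python sketch. It is not the PCI Express* encoding: the framing bytes are hypothetical placeholders, `zlib.crc32` stands in for the link CRC, and all function names are assumptions made for this example. It only shows how each layer wraps the packet on transmit and unwraps it on receive.

```python
import struct
import zlib

# Hypothetical framing markers, used here only to show where framing goes.
START_OF_FRAME = b"\xfb"
END_OF_FRAME = b"\xfd"


def transaction_layer(header: bytes, data: bytes) -> bytes:
    """Assemble a simplified TLP: header followed by the data payload."""
    return header + data


def data_link_layer(tlp: bytes, sequence_number: int) -> bytes:
    """Prepend a sequence number and append a 32-bit data protection code."""
    seq = struct.pack(">H", sequence_number & 0x0FFF)
    # zlib.crc32 is a stand-in, not the exact LCRC algorithm.
    lcrc = struct.pack(">I", zlib.crc32(seq + tlp))
    return seq + tlp + lcrc


def physical_layer(dll_packet: bytes) -> bytes:
    """Add framing around the Data Link Layer packet."""
    return START_OF_FRAME + dll_packet + END_OF_FRAME


def receive(frame: bytes):
    """Reverse the process: strip framing, check the CRC, return (seq, TLP)."""
    body = frame[len(START_OF_FRAME):-len(END_OF_FRAME)]
    seq_bytes, tlp, lcrc = body[:2], body[2:-4], body[-4:]
    if struct.pack(">I", zlib.crc32(seq_bytes + tlp)) != lcrc:
        # A real Data Link Layer would request retransmission here.
        raise ValueError("link CRC mismatch")
    return struct.unpack(">H", seq_bytes)[0], tlp
```

A round trip through `physical_layer(data_link_layer(tlp, n))` and `receive()` returns the original sequence number and TLP, while a corrupted byte raises `ValueError`, mirroring the retransmission request described above.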

2.2.2 PCI Express* Configuration Mechanism

The PCI Express* link is mapped through a PCI-to-PCI bridge structure. PCI Express* extends the configuration space to 4096 bytes per-device/function, as compared to 256 bytes allowed by the Conventional PCI Specification. PCI Express* configuration space is divided into a PCI-compatible region (which consists of the first 256 bytes of a logical device's configuration space) and an extended PCI Express* region (which consists of the remaining configuration space). The PCI-compatible region can be accessed using either the mechanisms defined in the PCI specification or using the enhanced PCI Express* configuration access mechanism described in the PCI Express* Enhanced Configuration Mechanism section.

The PCI Express* Host Bridge is required to translate the memory-mapped PCI Express* configuration space accesses from the host processor to PCI Express* configuration cycles. To maintain compatibility with PCI configuration addressing mechanisms, it is recommended that system software access the enhanced configuration space using 32-bit operations (32-bit aligned) only.

See the PCI Express* Base Specification for details of both the PCI-compatible and PCI Express* Enhanced configuration mechanisms and transaction rules.
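As an illustration of the memory-mapped configuration access described above, the following Python sketch computes the address of a configuration register. The bus/device/function bit positions follow the enhanced configuration mechanism of the PCI Express* Base Specification rather than anything stated in this section, and the base address used in the usage note is a hypothetical value; on a real platform it is assigned by system firmware.

```python
CONFIG_SPACE_BYTES = 4096    # per-function configuration space (from the text)
PCI_COMPATIBLE_BYTES = 256   # first 256 bytes form the PCI-compatible region


def ecam_address(base: int, bus: int, device: int, function: int, offset: int) -> int:
    """Return the memory-mapped address of a configuration register."""
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8):
        raise ValueError("bus/device/function out of range")
    if not 0 <= offset < CONFIG_SPACE_BYTES:
        raise ValueError("offset outside the 4096-byte configuration space")
    if offset % 4 != 0:
        # The text recommends 32-bit aligned, 32-bit wide accesses only.
        raise ValueError("offset must be 32-bit aligned")
    return base + ((bus << 20) | (device << 15) | (function << 12) | offset)


def region(offset: int) -> str:
    """Classify an offset as PCI-compatible or extended PCI Express* space."""
    return "PCI-compatible" if offset < PCI_COMPATIBLE_BYTES else "extended"
```

With an assumed base of 0x80000000, `ecam_address(0x80000000, 0, 5, 0, 0x100)` addresses the first register of the extended region for bus 0, device 5, function 0.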

2.3 DMI2/PCI Express* Interface

Direct Media Interface 2 (DMI2) connects the processor to the Platform Controller Hub (PCH). DMI2 is similar to a four-lane PCI Express* supporting a speed of 5 GT/s per lane. This interface can be configured at power-on to serve as a x4 PCI Express* link based on the setting of the SOCKET_ID[1:0] and FRMAGENT signal for processors not connected to a PCH.

Note: Only DMI2 x4 configuration is supported.
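A rough estimate of the raw DMI2 bandwidth follows from the lane count and transfer rate given above. The sketch below assumes 8b/10b line encoding, as used by PCI Express* at 5 GT/s; this section does not state the DMI2 encoding, so treat the encoding and the resulting figure as assumptions. Protocol overhead is ignored.

```python
def link_bandwidth_bytes_per_s(lanes: int, gt_per_s: float,
                               payload_bits: int = 8, symbol_bits: int = 10) -> float:
    """Per-direction bandwidth in bytes/s after line-encoding overhead."""
    raw_bits_per_s = lanes * gt_per_s * 1e9
    return raw_bits_per_s * payload_bits / symbol_bits / 8


# Four lanes at 5 GT/s with the assumed 8b/10b encoding.
dmi2_bandwidth = link_bandwidth_bytes_per_s(lanes=4, gt_per_s=5.0)
```

Under these assumptions the x4 link carries about 2 GB/s in each direction.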

2.3.1 DMI2 Error Flow

DMI2 can only generate SERR in response to errors, never SCI, SMI, MSI, PCI INT, or GPE. Any DMI2 related SERR activity is associated with Device 0.

2.3.2 Processor/PCH Compatibility Assumptions

The processor is compatible with the PCH and is not compatible with any previous MCH or ICH products.

2.3.3 DMI2 Link Down

The DMI2 link going down is a fatal, unrecoverable error. If the DMI2 data link goes to data link down, after the link was up, then the DMI2 link hangs the system by not allowing the link to retrain to prevent data corruption. This is controlled by the PCH.

Downstream transactions that had been successfully transmitted across the link prior to the link going down may be processed as normal. No completions from downstream, non-posted transactions are returned upstream over the DMI2 link after a link down event.

2.4 Intel QuickPath Interconnect

The Intel QuickPath Interconnect is a high speed, packetized, point-to-point interconnect used in the 2nd Generation Intel® Core™ Processor Family. The narrow high-speed links stitch together processors in distributed shared memory and integrated I/O platform architecture. It offers much higher bandwidth with low latency. The Intel QuickPath Interconnect has an efficient architecture allowing more interconnect performance to be achieved in real systems. It has a snoop protocol optimized for low latency and high scalability, as well as packet and lane structures enabling quick completions of transactions. Reliability, availability, and serviceability features (RAS) are built into the architecture.

The physical connectivity of each interconnect link is made up of twenty differential signal pairs plus a differential forwarded clock. Each port supports a link pair consisting of two uni-directional links to complete the connection between two components. This supports traffic in both directions simultaneously. To facilitate flexibility and longevity, the interconnect is defined as having five layers: Physical, Link, Routing, Transport, and Protocol.

• The Physical layer consists of the actual wires carrying the signals, as well as circuitry and logic to support ancillary features required in the transmission and receipt of the 1s and 0s. The unit of transfer at the Physical layer is 20 bits, which is called a Phit (for Physical unit).

• The Link layer is responsible for reliable transmission and flow control. The Link layer’s unit of transfer is 80 bits, which is called a Flit (for Flow control unit).

• The Routing layer provides the framework for directing packets through the fabric.

• The Transport layer is an architecturally defined layer (not implemented in the initial products) providing advanced routing capability for reliable end-to-end transmission.

• The Protocol layer is the high-level set of rules for exchanging packets of data between devices. A packet is comprised of an integral number of Flits.
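The relationship between the transfer units can be worked through in a few lines of Python. The 80-bit Flit and 20-bit Phit come from the list above; the 10-bit half-width Phit comes from the Terminology table. The function names are illustrative only.

```python
FLIT_BITS = 80
PHIT_BITS = {"full": 20, "half": 10}  # bits per Phit in each width mode


def phits_per_flit(width_mode: str = "full") -> int:
    """Number of Phits needed to carry one Flit."""
    return FLIT_BITS // PHIT_BITS[width_mode]


def phits_for_packet(flits: int, width_mode: str = "full") -> int:
    """A packet is an integral number of Flits, so Phits scale linearly."""
    return flits * phits_per_flit(width_mode)
```

In full width mode one Flit takes four Phits; in half width mode the same Flit takes eight.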

The Intel QuickPath Interconnect includes a cache coherency protocol to keep the distributed memory and caching structures coherent during system operation. It supports both low-latency source snooping and a scalable home snoop behavior. The coherency protocol provides for direct cache-to-cache transfers for optimal latency.

2.5 Platform Environment Control Interface (PECI)

The Platform Environment Control Interface (PECI) uses a single wire for self-clocking and data transfer. The bus requires no additional control lines. The physical layer is a self-clocked one-wire bus that begins each bit with a driven, rising edge from an idle level near zero volts. The duration of the signal driven high depends on whether the bit value is a logic ‘0’ or logic ‘1’. PECI also includes variable data transfer rate established with every message. In this way, it is highly flexible even though underlying logic is simple.

The interface design was optimized for interfacing to Intel processor and chipset components in both single processor and multiple processor environments. The single wire interface provides low board routing overhead for the multiple load connections in the congested routing area near the processor and chipset components. Bus speed, error checking, and low protocol overhead provide adequate link bandwidth and reliability to transfer critical device operating conditions and configuration information.

The PECI bus offers:

• A wide speed range from 2 Kbps to 2 Mbps

• CRC check byte used to efficiently and atomically confirm accurate data delivery

• Synchronization at the beginning of every message minimizes device timing accuracy requirements

Note: The PECI commands described in this document apply primarily to the Intel® Xeon® processor E5-1600/E5-2600/E5-4600 product families. The processors utilize the capabilities described in this document to indicate support for four memory channels. Refer to Table 2-1 for the list of PECI commands supported by the processors.

Table 2-1. Summary of Processor-specific PECI Commands

| Command | Supported on the Processor |
|---|---|
| Ping() | Yes |
| GetDIB() | Yes |
| GetTemp() | Yes |
| RdPkgConfig() | Yes |
| WrPkgConfig() | Yes |
| RdIAMSR() | Yes |
| WrIAMSR() | No |
| RdPCIConfig() | Yes |
| WrPCIConfig() | No |
| RdPCIConfigLocal() | Yes |
| WrPCIConfigLocal() | Yes |

2.5.1 PECI Client Capabilities

The processor PECI client is designed to support the following sideband functions:

• Processor and DRAM thermal management

• Platform manageability functions including thermal, power, and error monitoring
— The platform ‘power’ management includes monitoring and control for both the processor and DRAM subsystem to assist with data center power limiting.

• Processor interface tuning and diagnostics capabilities (Intel® Interconnect BIST).

2.5.1.1 Thermal Management

Processor fan speed control is managed by comparing Digital Thermal Sensor (DTS) thermal readings acquired via PECI against the processor-specific fan speed control reference point, or TCONTROL. Both TCONTROL and DTS thermal readings are accessible via the processor PECI client. These variables are referenced to a common temperature, the TCC activation point, and are both defined as negative offsets from that reference.

PECI-based access to the processor package configuration space provides a means for Baseboard Management Controllers (BMCs) or other platform management devices to actively manage the processor and memory power and thermal features. Details on the list of available power and thermal optimization services can be found in Section 2.5.2.6.
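Because the DTS reading and TCONTROL are both negative offsets from the same reference, they can be compared directly. The Python sketch below shows one way a management controller might do so; the function names and the simple "more cooling needed" rule are assumptions for illustration, and the actual fan speed control policy is defined in the Thermal/Mechanical Design Guide.

```python
def thermal_margin(dts_offset_c: float, tcontrol_offset_c: float) -> float:
    """Degrees of headroom below TCONTROL; zero or negative means none."""
    if dts_offset_c > 0 or tcontrol_offset_c > 0:
        raise ValueError("both values are negative offsets from the TCC activation point")
    return tcontrol_offset_c - dts_offset_c


def needs_more_cooling(dts_offset_c: float, tcontrol_offset_c: float) -> bool:
    """True when the DTS reading has reached or passed TCONTROL."""
    return thermal_margin(dts_offset_c, tcontrol_offset_c) <= 0
```

For example, a DTS reading of -25 against a TCONTROL of -10 leaves 15 degrees of margin, whereas a reading of -5 has passed TCONTROL.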

2.5.1.2 Platform Manageability

PECI allows read access to certain error registers in the processor MSR space and status monitoring registers in the PCI configuration space within the processor and downstream devices. Details are covered in subsequent sections.

PECI permits writes to certain Memory Controller RAS-related registers in the processor PCI configuration space. Details are covered in Section 2.5.2.10.

2.5.1.3 Processor Interface Tuning and Diagnostics

The processor Intel® Interconnect Built In Self Test (Intel® IBIST) allows for in-field diagnostic capabilities in the Intel® QPI and memory controller interfaces. PECI provides a port to execute these diagnostics via its PCI Configuration read and write capabilities in the BMC INIT mode. Refer to Section 2.5.3.7 for more details.

2.5.2 Client Command Suite

Each PECI command requires at least one frame check sequence (FCS) byte to ensure reliable data exchange between originator and client. The PECI message protocol defines two FCS bytes that are returned by the client to the message originator. The first FCS byte covers the client address byte, the Read and Write Length bytes, and all bytes in the write data block. The second FCS byte covers the read response data returned by the PECI client. Each FCS byte is the result of a cyclic redundancy check (CRC) of the corresponding data block.
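The FCS calculation can be sketched in Python as follows. This section does not state the CRC polynomial; the sketch assumes an 8-bit CRC with polynomial x^8 + x^2 + x + 1 (0x07), a zero initial value, and most-significant-bit-first processing. That assumption reproduces the FCS bytes that appear in the Ping() and GetTemp() examples in this section, but the PECI specification remains the authority.

```python
def peci_fcs(data: bytes) -> int:
    """CRC-8 over a data block (assumed polynomial 0x07, initial value 0)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ 0x07) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


# Write FCS for Ping() to client address 0x30: address, Write Length, Read Length.
ping_fcs = peci_fcs(bytes([0x30, 0x00, 0x00]))
```

The first FCS covers the address, length, and write data bytes; the second FCS is computed the same way over the read data alone.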

2.5.2.1 Ping()

Ping() is a required message for all PECI devices. This message is used to enumerate devices or determine if a device has been removed, been powered-off, etc. A Ping() sent to a device address always returns a non-zero Write FCS if the device at the targeted address is able to respond.

2.5.2.1.1 Command Format

The Ping() format is as follows:

Write Length: 0x00
Read Length: 0x00

Figure 2-3. Ping()

| Byte # | Byte Definition |
|---|---|
| 0 | Client Address |
| 1 | Write Length (0x00) |
| 2 | Read Length (0x00) |
| 3 | FCS |

An example Ping() command to PECI device address 0x30 is shown below.

Figure 2-4. Ping() Example

| Byte # | Byte Definition | Value |
|---|---|---|
| 0 | Client Address | 0x30 |
| 1 | Write Length | 0x00 |
| 2 | Read Length | 0x00 |
| 3 | FCS | 0xe1 |

2.5.2.2 GetDIB()

The processor PECI client implementation of GetDIB() includes an 8-byte response and provides information regarding client revision number and the number of supported domains. All processor PECI clients support the GetDIB() command.

2.5.2.2.1 Command Format

The GetDIB() format is as follows:

Write Length: 0x01
Read Length: 0x08
Command: 0xf7

Figure 2-5. GetDIB()

| Byte # | Byte Definition |
|---|---|
| 0 | Client Address |
| 1 | Write Length (0x01) |
| 2 | Read Length (0x08) |
| 3 | Cmd Code (0xf7) |
| 4 | FCS |
| 5 | Device Info |
| 6 | Revision Number |
| 7-12 | Reserved |
| 13 | FCS |

2.5.2.2.2 Device Info

The Device Info byte gives details regarding the PECI client configuration. At a minimum, all clients supporting GetDIB will return the number of domains inside the package via this field. With any client, at least one domain (Domain 0) must exist. Therefore, the Number of Domains reported is defined as the number of domains in addition to Domain 0. For example, if bit 2 of the Device Info byte returns a ‘1’, that would indicate that the PECI client supports two domains.

Figure 2-6. Device Info Field Definition

[Figure: Byte# 5, bits 7 down to 0: Reserved, # of Domains, Reserved.]

2.5.2.2.3 Revision Number

All clients that support the GetDIB command also support Revision Number reporting. The revision number may be used by a host or originator to manage different command suites or response codes from the client. Revision Number is always reported in the second byte of the GetDIB() response. The ‘Major Revision’ number in Figure 2-7 always maps to the revision number of the PECI specification that the PECI client processor is designed to. The ‘Minor Revision’ number value depends on the exact command suite supported by the PECI client as defined in Table 2-2.

Figure 2-7. Revision Number Definition

[Figure: Byte# 6, bits 7 down to 0: Major Revision#, Minor Revision#.]
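The two GetDIB() data bytes can be decoded with a short Python sketch. It assumes the Major and Minor Revision fields are each four bits wide (upper and lower nibble) and that the domain count is the single bit 2 of Device Info, as the text above suggests; the names are illustrative. The command lists mirror Table 2-2, where each minor revision adds commands to the previous one.

```python
# Commands introduced at each minor revision, per Table 2-2.
COMMANDS_ADDED_BY_MINOR_REVISION = {
    0: ["Ping()", "GetDIB()", "GetTemp()"],
    1: ["WrPkgConfig()", "RdPkgConfig()"],
    2: ["RdIAMSR()"],
    3: ["RdPCIConfigLocal()", "WrPCIConfigLocal()"],
    4: ["RdPCIConfig()"],
    5: ["WrPCIConfig()"],
    6: ["WrIAMSR()"],
}


def decode_revision(revision_byte: int):
    """Split the Revision Number byte into (major, minor), assuming 4-bit fields."""
    return (revision_byte >> 4) & 0x0F, revision_byte & 0x0F


def supported_commands(minor_revision: int):
    """Cumulative command suite for a given minor revision."""
    commands = []
    for level in range(minor_revision + 1):
        commands.extend(COMMANDS_ADDED_BY_MINOR_REVISION.get(level, []))
    return commands


def domain_count(device_info_byte: int) -> int:
    """Domain 0 always exists; bit 2 reports domains in addition to it."""
    return 1 + ((device_info_byte >> 2) & 0x1)
```

A revision byte of 0011 0100b decodes to major revision 3 and minor revision 4 under these assumptions, which corresponds to a command suite that includes RdPCIConfig() but not WrPCIConfig() or WrIAMSR().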

Table 2-2. Minor Revision Number Meaning

| Minor Revision | Supported Command Suite |
|---|---|
| 0 | Ping(), GetDIB(), GetTemp() |
| 1 | Ping(), GetDIB(), GetTemp(), WrPkgConfig(), RdPkgConfig() |
| 2 | Ping(), GetDIB(), GetTemp(), WrPkgConfig(), RdPkgConfig(), RdIAMSR() |
| 3 | Ping(), GetDIB(), GetTemp(), WrPkgConfig(), RdPkgConfig(), RdIAMSR(), RdPCIConfigLocal(), WrPCIConfigLocal() |
| 4 | Ping(), GetDIB(), GetTemp(), WrPkgConfig(), RdPkgConfig(), RdIAMSR(), RdPCIConfigLocal(), WrPCIConfigLocal(), RdPCIConfig() |
| 5 | Ping(), GetDIB(), GetTemp(), WrPkgConfig(), RdPkgConfig(), RdIAMSR(), RdPCIConfigLocal(), WrPCIConfigLocal(), RdPCIConfig(), WrPCIConfig() |
| 6 | Ping(), GetDIB(), GetTemp(), WrPkgConfig(), RdPkgConfig(), RdIAMSR(), RdPCIConfigLocal(), WrPCIConfigLocal(), RdPCIConfig(), WrPCIConfig(), WrIAMSR() |

For the processor PECI client the Revision Number will return ‘0011 0100b’.

2.5.2.3 GetTemp()

The GetTemp() command is used to retrieve the maximum die temperature from a target PECI address. The temperature is used by the external thermal management system to regulate the temperature on the die. The data is returned as a negative value representing the number of degrees centigrade below the maximum processor junction temperature (Tjmax). The maximum PECI temperature value of zero corresponds to the processor Tjmax. This also represents the default temperature at which the processor Thermal Control Circuit activates. The actual value that the thermal management system uses as a control set point (TCONTROL) is also defined as a negative number below Tjmax. TCONTROL may be extracted from the processor by issuing a PECI RdPkgConfig() command as described in Section 2.5.2.4 or using a RDMSR instruction. TCONTROL application to fan speed control management is defined in the Intel® Xeon® Processor E5-1600/E5-2600/E5-4600 Product Families Thermal/Mechanical Design Guide.

Please refer to Section 2.5.7 for details regarding PECI temperature data formatting.

2.5.2.3.1 Command Format

The GetTemp() format is as follows:

Write Length: 0x01
Read Length: 0x02
Command: 0x01
Description: Returns the highest die temperature for addressed processor PECI client.

Figure 2-8. GetTemp()

| Byte # | Byte Definition |
|---|---|
| 0 | Client Address |
| 1 | Write Length (0x01) |
| 2 | Read Length (0x02) |
| 3 | Cmd Code (0x01) |
| 4 | FCS |
| 5 | Temp[7:0] |
| 6 | Temp[15:8] |
| 7 | FCS |

Example bus transaction for a thermal sensor device located at address 0x30 returning a value of negative 10 counts is shown in Figure 2-9.

Figure 2-9. GetTemp() Example

| Byte # | Byte Definition | Value |
|---|---|---|
| 0 | Client Address | 0x30 |
| 1 | Write Length | 0x01 |
| 2 | Read Length | 0x02 |
| 3 | Cmd Code | 0x01 |
| 4 | FCS | 0xef |
| 5 | Temp[7:0] | 0x80 |
| 6 | Temp[15:8] | 0xfd |
| 7 | FCS | 0x4b |
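The example response bytes can be converted to a temperature offset with a short Python sketch. The data format is defined in Section 2.5.7, which is outside this excerpt; the sketch assumes a 16-bit two's complement value with a resolution of 1/64 degree, which is consistent with the example above (0x80, 0xfd for negative 10). The function name is illustrative.

```python
def decode_get_temp(temp_lsb: int, temp_msb: int) -> float:
    """Convert Temp[7:0] and Temp[15:8] into degrees relative to Tjmax."""
    raw = (temp_msb << 8) | temp_lsb
    if raw >= 0x8000:          # interpret as 16-bit two's complement
        raw -= 0x10000
    return raw / 64.0          # assumed resolution: 1/64 degree per count


# Bytes 5 and 6 of the example transaction.
offset = decode_get_temp(0x80, 0xFD)
```

A result of 0.0 corresponds to the processor running at Tjmax; more negative values indicate more thermal headroom.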

2.5.2.3.2 Supported Responses

The typical client response is a passing FCS and valid thermal data. Under some conditions, the client’s response will indicate a failure. GetTemp() response definitions are listed in Table 2-3. Refer to Section 2.5.7.4 for more details on sensor errors.

Table 2-3. GetTemp() Response Definition

Response | Meaning
General Sensor Error (GSE) | Thermal scan did not complete in time. Retry is appropriate.
Bad Write FCS | Electrical error
Abort FCS | Illegal command formatting (mismatched RL/WL/Command Code)
0x0000 | Processor is running at its maximum temperature or is currently being reset.
All other data | Valid temperature reading, reported as a negative offset from the processor Tjmax.

Notes:

1. This response will be reflected in Bytes 5 & 6 in Figure 2-9.

2.5.2.4 RdPkgConfig()

The RdPkgConfig() command provides read access to the package configuration space (PCS) within the processor, including various power and thermal management functions. Typical PCS read services supported by the processor may include access to temperature data, energy status, run time information, DIMM temperatures and so on. Refer to Section 2.5.2.6 for more details on processor-specific services supported through this command.

2.5.2.4.1 Command Format

The RdPkgConfig() format is as follows:

Write Length: 0x05
Read Length: 0x05 (dword)
Command: 0xa1
Description: Returns the data maintained in the processor package configuration space for the PCS entry as specified by the ‘index’ and ‘parameter’ fields. The ‘index’ field contains the encoding for the requested service and is used in conjunction with the ‘parameter’ field to specify the exact data being requested. The Read Length dictates the desired data return size. This command supports only dword responses on the processor PECI clients. All command responses are prepended with a completion code that contains additional pass/fail status information. Refer to Section 2.5.5.2 for details regarding completion codes.

Figure 2-10. RdPkgConfig()

Note: The 2-byte parameter field and 4-byte read data field defined in Figure 2-10 are sent in standard PECI ordering with LSB first and MSB last.

2.5.2.4.2 Supported Responses

The typical client response is a passing FCS, a passing Completion Code and valid data. Under some conditions, the client’s response will indicate a failure.

Table 2-4. RdPkgConfig() Response Definition

Response | Meaning
Bad Write FCS | Electrical error
Abort FCS | Illegal command formatting (mismatched RL/WL/Command Code)
CC: 0x40 | Command passed, data is valid.
CC: 0x80 | Response timeout. The processor is not able to generate the required response in a timely fashion. Retry is appropriate.
CC: 0x81 | Response timeout. The processor is not able to allocate resources for servicing this command at this time. Retry is appropriate.
CC: 0x90 | Unknown/Invalid/Illegal Request
CC: 0x91 | PECI control hardware, firmware or associated logic error. The processor is unable to process the request.
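The completion codes in Table 2-4 lend themselves to a small dispatch routine on the host side. The Python sketch below is one possible way to handle a RdPkgConfig() response; it assumes the FCS has already been checked and stripped by a lower layer, so the payload is the completion code followed by the four data bytes. The function name and the choice of exception types are hypothetical.

```python
# Completion codes listed in Table 2-4.
CC_PASS = 0x40
CC_RETRY = {0x80, 0x81}  # response timeouts; retry is appropriate
CC_FAIL = {0x90, 0x91}   # illegal request, or PECI hardware/firmware error


def parse_rdpkgconfig_response(payload: bytes) -> int:
    """Return the 32-bit PCS data from a 5-byte RdPkgConfig() response.

    Assumption: payload = completion code + 4 data bytes, FCS already removed.
    """
    if len(payload) != 5:
        raise ValueError("expected 5 bytes (Read Length 0x05)")
    cc = payload[0]
    if cc == CC_PASS:
        # Data is sent in standard PECI ordering, LSB first and MSB last.
        return int.from_bytes(payload[1:5], "little")
    if cc in CC_RETRY:
        raise TimeoutError(f"completion code 0x{cc:02x}: retry is appropriate")
    if cc in CC_FAIL:
        raise RuntimeError(f"completion code 0x{cc:02x}: request failed")
    raise RuntimeError(f"completion code 0x{cc:02x}: not listed in Table 2-4")
```

A caller would typically catch `TimeoutError` and reissue the command a bounded number of times, while treating the other errors as final.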

2.5.2.5 WrPkgConfig()

The WrPkgConfig() command provides write access to the package configuration space (PCS) within the processor, including various power and thermal management functions. Typical PCS write services supported by the processor may include power limiting, thermal averaging constant programming and so on. Refer to Section 2.5.2.6 for more details on processor-specific services supported through this command.

2.5.2.5.1 Command Format

The WrPkgConfig() format is as follows:

Write Length: 0x0a (dword)
Read Length: 0x01
Command: 0xa5
AW FCS Support: Yes
Description: Writes data to the processor PCS entry as specified by the ‘index’ and ‘parameter’ fields. This command supports only dword data writes on the processor PECI clients. All command responses include a completion code that provides additional pass/fail status information. Refer to Section 2.5.5.2 for details regarding completion codes.

The Assured Write FCS (AW FCS) support provides the processor client a high degree of confidence that the data it received from the host is correct. This is especially critical where the consumption of bad data might result in improper or non-recoverable operation.

Figure 2-11. WrPkgConfig()

Note: The 2-byte parameter field and 4-byte write data field defined in Figure 2-11 are sent in standard PECI ordering with LSB first and MSB last.
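To make the byte ordering in the note concrete, the following Python sketch serializes only the two multi-byte WrPkgConfig() fields (parameter word, then data dword) in LSB-first order. It does not build the complete message: the position of these fields relative to the other bytes, and the AW FCS computation, follow Figure 2-11 and are not modeled here. The function name is hypothetical.

```python
import struct


def encode_pcs_fields(parameter: int, data: int) -> bytes:
    """Serialize the 2-byte parameter and 4-byte write data, LSB first."""
    # "<" selects little-endian with no padding; H = 16 bits, I = 32 bits.
    return struct.pack("<HI", parameter, data)


# Illustrative values only: parameter 0x00FF and data 0x12345678.
print(encode_pcs_fields(0x00FF, 0x12345678).hex())  # ff0078563412
```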

2.5.2.5.2 Supported Responses

The typical client response is a passing FCS, a passing Completion Code and valid data. Under some conditions, the client’s response will indicate a failure.

Table 2-5. WrPkgConfig() Response Definition

Response | Meaning
Bad Write FCS | Electrical error or AW FCS failure
Abort FCS | Illegal command formatting (mismatched RL/WL/Command Code)
CC: 0x40 | Command passed, data is valid.
CC: 0x80 | Response timeout. The processor was not able to generate the required response in a timely fashion. Retry is appropriate.
CC: 0x81 | Response timeout. The processor is not able to allocate resources for servicing this command at this time. Retry is appropriate.
CC: 0x90 | Unknown/Invalid/Illegal Request
CC: 0x91 | PECI control hardware, firmware or associated logic error. The processor is unable to process the request.

2.5.2.6 Package Configuration Capabilities

Table 2-6 combines both read and write services. Any service listed as a “read” would use the RdPkgConfig() command and a service listed as a “write” would use the WrPkgConfig() command. PECI requests for memory temperature or other data generated outside the processor package do not trigger special polling cycles on the processor memory or SMBus interfaces to procure the required information.

2.5.2.6.1 DRAM Thermal and Power Optimization Capabilities

DRAM thermal and power optimization (also known as RAPL or “Running Average Power Limit”) services provide a way for platform thermal management solutions to program and access DRAM power, energy and temperature parameters. Memory temperature information is typically used to regulate fan speeds, tune refresh rates and throttle the memory subsystem as appropriate. Memory temperature data may be derived from a variety of sources including on-die or on-board DIMM sensors, DRAM activity information or a combination of the two. Though memory temperature data is one byte long, the range of actual temperature values is determined by the DIMM specifications and operating range.

Note: DRAM related PECI services described in this section apply only to the memory connected to the specific processor PECI client in question and not the overall platform memory in general. For estimating DRAM thermal information in closed loop throttling mode, a dedicated SMBus is required between the CPU and the DIMMs. The processor PCU requires access to the VR12 voltage regulator for reading average output current information through the SVID bus for initial DRAM RAPL related power tuning.

Table 2-6 provides a summary of the DRAM power and thermal optimization capabilities that can be accessed over PECI on the processor. The Index values referenced in Table 2-6 are in decimal format. Table 2-6 also provides information on alternate inband mechanisms to access similar or equivalent information through register reads and writes where applicable. The user should consult the Intel® 64 and IA-32 Architectures Software Developer’s Manual (SDM) Volumes 1, 2, and 3 or Intel® Xeon® Processor E5 Product Family Datasheet Volume Two for details on MSR and CSR register contents.


Table 2-6. RdPkgConfig() & WrPkgConfig() DRAM Thermal and Power Optimization Services Summary

Service | Index Value (decimal) | Parameter Value (word) | RdPkgConfig() Data (dword) | WrPkgConfig() Data (dword) | Description | Alternate Inband MSR or CSR Access
DRAM Rank Temperature Write | 18 | Channel Index & DIMM Index | N/A | Absolute temperature in Degrees Celsius for ranks 0, 1, 2 & 3 | Write temperature for each rank within a single DIMM. | N/A
DIMM Temperature Read | 14 | Channel Index | Absolute temperature in Degrees Celsius for DIMMs 0, 1, & 2 | N/A | Read temperature of each DIMM within a channel. | CSR: DIMMTEMPSTAT_[0:2]
DIMM Ambient Temperature Write / Read | 19 | 0x0000 | N/A | Absolute temperature in Degrees C to be used as ambient temperature reference | Write ambient temperature reference for activity-based rank temperature estimation. | N/A
DIMM Ambient Temperature Write / Read | 19 | 0x0000 | Absolute temperature in Degrees C to be used as ambient temperature reference | N/A | Read ambient temperature reference for activity-based rank temperature estimation. | N/A
DRAM Channel Temperature Read | 22 | 0x0000 | Maximum of all rank temperatures for each channel in Degrees Celsius | N/A | Read the maximum DRAM channel temperature. | N/A
Accumulated DRAM Energy Read | | Channel Index; 0x00FF - All Channels | DRAM energy consumed by the DIMMs | N/A | Read the DRAM energy consumed by all the DIMMs in all the channels or all the DIMMs within a specified channel. | MSR 619h: DRAM_ENERGY_STATUS; CSR: DRAM_ENERGY_STATUS_CH[0:3]
DRAM Power Info Read | 35 | 0x0000 | Typical and minimum DRAM power settings | N/A | Read DRAM power settings info to be used by power limiting entity. | MSR 61Ch: DRAM_POWER_INFO; CSR: DRAM_POWER_INFO
DRAM Power Info Read | 36 | 0x0000 | Maximum DRAM power settings & maximum time window | N/A | Read DRAM power settings info to be used by power limiting entity. | MSR 61Ch: DRAM_POWER_INFO; CSR: DRAM_POWER_INFO
DRAM Power Limit Data Write / Read | 34 | 0x0000 | N/A | DRAM Plane Power Limit Data | Write DRAM Power Limit Data | MSR 618h: DRAM_POWER_LIMIT; CSR: DRAM_PLANE_POWER_LIMIT
DRAM Power Limit Data Write / Read | 34 | 0x0000 | DRAM Plane Power Limit Data | N/A | Read DRAM Power Limit Data | MSR 618h: DRAM_POWER_LIMIT; CSR: DRAM_PLANE_POWER_LIMIT
DRAM Power Limit Performance Status Read | 38 | 0x0000 | Accumulated DRAM throttle time | N/A | Read sum of all time durations for which each DIMM has been throttled | CSR: DRAM_RAPL_PERF_STATUS

Notes:

1. Time, energy and power units should be assumed, where applicable, to be based on values returned by a read of the PACKAGE_POWER_SKU_UNIT MSR or through the Package Power SKU Unit PCS read service.

2.5.2.6.2 DRAM Thermal Estimation Configuration Data Read/Write

This feature is relevant only when activity-based DRAM temperature estimation methods are being utilized and would apply to all the DIMMs on all the memory channels. The write allows the PECI host to configure the ‘β’ and ‘θ’ variables in Figure 2-12 for DRAM channel temperature filtering as per the equation below:

T_N = β ∗ T_{N-1} + θ ∗ ΔEnergy

T_N and T_{N-1} are the current and previous DRAM temperature estimates respectively in degrees Celsius, ‘β’ is the DRAM temperature decay factor, ‘ΔEnergy’ is the energy difference between the current and previous memory transactions as determined by the processor power control unit and ‘θ’ is the DRAM energy-to-temperature translation coefficient. The default value of ‘β’ is 0x3FF. ‘θ’ is defined by the equation:

θ = (1 - β) ∗ (Thermal Resistance) ∗ (Scaling Factor)

The ‘Thermal Resistance’ serves as a multiplier for translation of DRAM energy changes to corresponding temperature changes and may be derived from actual platform characterization data. The ‘Scaling Factor’ is used to convert memory transaction information to energy units in Joules and can be derived from system/memory configuration information. Refer to the Intel® 64 and IA-32 Architectures Software Developer’s Manual (SDM) Volumes 1, 2, and 3 for methods to program and access ‘Scaling Factor’ information.
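The two equations above can be exercised numerically with a short Python sketch. It treats ‘β’ as a plain fraction between 0 and 1; how the raw register value (default 0x3FF) maps onto that fraction is not stated in this section, so no such conversion is attempted. All function names and sample numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def theta_from_platform(beta: float, thermal_resistance: float,
                        scaling_factor: float) -> float:
    """theta = (1 - beta) * (Thermal Resistance) * (Scaling Factor)."""
    return (1.0 - beta) * thermal_resistance * scaling_factor


def next_temperature(prev_temp_c: float, delta_energy: float,
                     beta: float, theta: float) -> float:
    """One filter step: T_N = beta * T_{N-1} + theta * delta_energy."""
    return beta * prev_temp_c + theta * delta_energy


# Illustrative numbers only, not platform characterization data.
theta = theta_from_platform(beta=0.5, thermal_resistance=2.0, scaling_factor=1.0)
print(theta)                                      # 1.0
print(next_temperature(40.0, 3.0, 0.5, theta))    # 23.0
```

A decay factor close to 1 makes the estimate change slowly, since the previous temperature dominates and the (1 - β) term shrinks the contribution of each new energy sample.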

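Figure 2-12. DRAM Thermal Estimation Configuration Data

Memory Thermal Estimation Configuration Data (dword), most significant field first: RESERVED (bits 31:20), BETA VARIABLE, THETA VARIABLE.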


2.5.2.6.3 DRAM Rank Temperature Write

This feature allows the PECI host to program into the processor, the temperature for all the ranks within a DIMM up to a maximum of four ranks as shown in Figure 2-13. The DIMM index and Channel index are specified through the parameter field as shown in Table 2-7. This write is relevant in platforms that do not have on-die or on-board DIMM thermal sensors to provide memory temperature information or if the processor does not have direct access to the DIMM thermal sensors. This temperature information is used by the processor in conjunction with the activity-based DRAM temperature estimations.

Table 2-7. Channel & DIMM Index Decoding

Index Encoding | Physical Channel# | Physical DIMM#
000 | 0 | 0
001 | 1 | 1
010 | 2 | 2
011 | 3 | Reserved

Figure 2-13. DRAM Rank Temperature Write Data

Rank Temperature Data (dword): bits 31:24 Rank# 3, bits 23:16 Rank# 2, bits 15:8 Rank# 1, bits 7:0 Rank# 0; each field is an absolute temperature in Degrees C.
Parameter format (word): bits 15:6 Reserved, bits 5:3 DIMM Index, bits 2:0 Channel Index.
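A host-side helper for this write might look like the Python sketch below. It assumes the Figure 2-13 layout of one byte per rank with Rank# 0 in the least significant byte, and a parameter word with the Channel Index in bits 2:0 and the DIMM Index in bits 5:3; the function name is hypothetical. The resulting pair would be sent with WrPkgConfig() using index 18 from Table 2-6.

```python
def rank_temperature_write(channel: int, dimm: int, rank_temps_c: list) -> tuple:
    """Return (parameter_word, data_dword) for a DRAM Rank Temperature Write."""
    # Table 2-7: channels 0-3 and DIMMs 0-2 are valid; DIMM encoding 011 is reserved.
    if not 0 <= channel <= 3:
        raise ValueError("channel index must be 0-3")
    if not 0 <= dimm <= 2:
        raise ValueError("DIMM index must be 0-2")
    if len(rank_temps_c) > 4:
        raise ValueError("at most four ranks per DIMM")

    data = 0
    for rank, temp in enumerate(rank_temps_c):
        if not 0 <= temp <= 0xFF:
            raise ValueError("each rank temperature must fit in one byte")
        data |= temp << (8 * rank)  # Rank# 0 occupies the lowest byte

    parameter = (dimm << 3) | channel  # assumed parameter bit positions
    return parameter, data


parameter, data = rank_temperature_write(channel=1, dimm=2,
                                         rank_temps_c=[40, 41, 42, 43])
print(hex(parameter), hex(data))  # 0x11 0x2b2a2928
```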

2.5.2.6.4 DIMM Temperature Read

This feature allows the PECI host to read the temperature of all the DIMMs within a channel up to a maximum of three DIMMs. This read is not limited to platforms using a particular memory temperature source or temperature estimation method. For platforms using DRAM thermal estimation, the PCU will provide the estimated temperatures. Otherwise, the data represents the latest DIMM temperature provided by the TSOD or on-board DIMM sensor and requires that CLTT (closed loop throttling mode) be enabled and OLTT (open loop throttling mode) be disabled. Refer to Table 2-7 for channel index encodings.


Figure 2-14. The Processor DIMM Temperature Read / Write

DIMM Temperature Data (dword): bits 31:24 Reserved, bits 23:16 DIMM# 2, bits 15:8 DIMM# 1, bits 7:0 DIMM# 0; each field is an absolute temperature in Degrees C.
Parameter format (word): bits 15:3 Reserved, bits 2:0 Channel Index.
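Unpacking the DIMM Temperature Read data is a matter of splitting the dword into bytes. The Python sketch below assumes one byte per DIMM with DIMM# 0 in the least significant byte and the top byte reserved, as suggested by Figure 2-14; the function name and the sample value are hypothetical.

```python
def decode_dimm_temperatures(data: int) -> list:
    """Return [DIMM0, DIMM1, DIMM2] absolute temperatures in degrees C."""
    # Bits 31:24 are reserved and ignored here.
    return [(data >> (8 * dimm)) & 0xFF for dimm in range(3)]


# Illustrative dword: DIMM# 0 = 70 C, DIMM# 1 = 75 C, DIMM# 2 = 80 C.
print(decode_dimm_temperatures(0x00504B46))  # [70, 75, 80]
```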

2.5.2.6.5 DIMM Ambient Temperature Write / Read

This feature allows the PECI host to provide an ambient temperature reference to be used by the processor for activity-based DRAM temperature estimation. This write is used only when no DIMM temperature information is available from on-board or on-die DIMM thermal sensors. It is also possible for the PECI host controller to read back the DIMM ambient reference temperature.

Since the ambient temperature may vary over time within a system, it is recommended that systems monitoring and updating the ambient temperature at a fast rate use the ‘maximum’ temperature value while those updating the ambient temperature at a slow rate use an ‘average’ value. The ambient temperature assumes a single value for all memory channel/DIMM locations and does not account for possible temperature variations based on DIMM location.

Figure 2-15. Ambient Temperature Reference Data

Ambient Temperature Reference Data (dword): bits 31:8 Reserved, bits 7:0 Ambient Temperature (in Degrees C).

2.5.2.6.6 DRAM Channel Temperature Read

This feature enables a PECI host read of the maximum temperature of each channel. This would include all the DIMMs within the channel and all the ranks within each of the DIMMs. Channels that are not populated will return the ‘ambient temperature’ on systems using activity-based temperature estimations or alternatively return a ‘zero’ for systems using sensor-based temperatures.


Figure 2-16. Processor DRAM Channel Temperature

Channel Temperature Data (dword): bits 31:24 Channel 3, bits 23:16 Channel 2, bits 15:8 Channel 1, bits 7:0 Channel 0; each field is the maximum temperature for that channel in Degrees C.

2.5.2.6.7 Accumulated DRAM Energy Read

This feature allows the PECI host to read the DRAM energy consumed by all the DIMMs within all the channels or all the DIMMs within just a specified channel. The parameter field is used to specify the channel index. Units used are defined as per the Package Power SKU Unit read described in Section 2.5.2.6.11. This information is tracked by a 32-bit counter that wraps around. The channel index in Figure 2-17 is specified as per the index encoding described in Table 2-7. A channel index of 0x00FF is used to specify the “all channels” case. While Intel requires reading the accumulated energy data at least once every 16 seconds to ensure functional correctness, a more realistic polling rate recommendation is once every 100 ms for better accuracy. This feature assumes a 200W memory capacity. In general, as the power capability decreases, so will the minimum polling rate requirement.

When determining energy changes by subtracting energy values between successive reads, Intel advocates using the 2’s complement method to account for counter wraparounds. Alternatively, adding all ‘F’s (‘0xFFFFFFFF’) to a negative result from the subtraction will accomplish the same goal.

Figure 2-17. Accumulated DRAM Energy Data

Accumulated DRAM Energy Data (dword): bits 31:0 Accumulated DRAM Energy.
Parameter format (word): Reserved, Channel Index (low-order bits).
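The wraparound handling described above can be sketched in a few lines of Python. Because Python integers are unbounded, the 2’s complement behavior of a 32-bit subtraction is emulated by masking the difference to 32 bits. The function name is hypothetical, and the result is in raw counter units; converting to Joules would require the energy unit from the Package Power SKU Unit read.

```python
MASK32 = 0xFFFFFFFF


def energy_delta(previous: int, current: int) -> int:
    """Counts accumulated between two reads of the 32-bit energy counter.

    Valid as long as the counter wrapped at most once between the reads,
    which is why a minimum polling rate is required.
    """
    return (current - previous) & MASK32


print(energy_delta(100, 250))            # 150, no wraparound
print(energy_delta(0xFFFFFFF0, 0x10))    # 32, counter wrapped once
```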

2.5.2.6.8 DRAM Power Info Read

This read returns the minimum, typical and maximum DRAM power settings and the maximum time window over which the power can be sustained for the entire DRAM domain and is inclusive of all the DIMMs within all the memory channels. Any power values specified by the power limiting entity that is outside of the range specified through these settings cannot be guaranteed. Since this data is 64 bits wide, PECI facilitates access to this register by allowing two requests to read the lower 32 bits and upper 32 bits separately as shown in Table 2-6. Power and time units for this read are defined as per the Package Power SKU Unit settings described in Section 2.5.2.6.11.


The minimum DRAM power in Figure 2-18 corresponds to a minimum bandwidth setting of the memory interface. It does ‘not’ correspond to a processor IDLE or memory self-refresh state. The ‘time window’ in Figure 2-18 is representative of the rate at which the power control unit (PCU) samples the DRAM energy consumption information and reactively takes the necessary measures to meet the imposed power limits. Programming too small a time window may not give the PCU enough time to sample energy information and enforce the limit while too large a time window runs the risk of the PCU not being able to monitor and take timely action on energy excursions. While the DRAM power setting in Figure 2-18 provides a maximum value for the ‘time window’ (typically a few seconds), the minimum value may be assumed to be ~100 ms.

The PCU programs the DRAM power settings described in Figure 2-18 when DRAM characterization has been completed by the memory reference code (MRC) during boot as indicated by the setting of the RST_CPL bit of the BIOS_RESET_CPL register. The DRAM power settings will be programmed during boot independent of the ‘DRAM Power Limit Enable’ bit setting. Please refer to the Intel® Xeon® Processor E5 Product Family Datasheet Volume Two for information on memory energy estimation methods and energy tuning options used by BIOS and other utilities for determining the range specified in the DRAM power settings. In general, any tuning of the power settings is done by polling the voltage regulators supplying the DIMMs.

Figure 2-18. DRAM Power Info Read Data

DRAM_POWER_INFO (lower bits): bit 31 Reserved, bits 30:16 Minimum DRAM Power, bit 15 Reserved, bits 14:0 TDP DRAM Power (Typical Value).
DRAM_POWER_INFO (upper bits): bits 63:55 Reserved, bits 54:48 Maximum Time Window, bit 47 Reserved, bits 46:32 Maximum DRAM Power.

2.5.2.6.9 DRAM Power Limit Data Write / Read

This feature allows the PECI host to program the power limit over a specified time or control window for the entire DRAM domain covering all the DIMMs within all the memory channels. Actual values are chosen based on DRAM power consumption characteristics. The units for the DRAM Power Limit and Control Time Window are determined as per the Package Power SKU Unit settings described in Section 2.5.2.6.11. The DRAM Power Limit Enable bit in Figure 2-19 should be set to activate this feature. Exact DRAM power limit values are largely determined by platform memory configuration. As such, this feature is disabled by default and there are no defaults associated with the DRAM power limit values. The PECI host may be used to enable and initialize the power limit fields for the purposes of DRAM power budgeting. Alternatively, this can also be accomplished through inband writes to the appropriate registers. Both power limit enabling and initialization of power limit values can be done in the same command cycle. All RAPL parameter values including the power limit value, control time window, and enable bit will have to be specified correctly even if the intent is to change just one parameter value when programming over PECI.

The following conversion formula should be used for encoding or programming the ‘Control Time Window’ in bits [23:17].

Control Time Window (in seconds) = ([1 + 0.25 * ‘x’] * 2^‘y’) * ‘z’ where

‘x’ = integer value of bits[23:22]
‘y’ = integer value of bits[21:17]
‘z’ = Package Power SKU Time Unit[19:16] (see Section 2.5.2.6.13 for details on Package Power SKU Unit)

For example, using this formula, a control time value of 0x0A will correspond to a ‘1-second’ time window. A valid range for the value of the ‘Control Time Window’ in Figure 2-19 that can be programmed into bits [23:17] is 250 ms - 40 seconds.

From a DRAM power management standpoint, all post-boot DRAM power management activities (also referred to as ‘DRAM RAPL’ or ‘DRAM Running Average Power Limit’) should be managed exclusively through a single interface like PECI or alternatively an inband mechanism. If PECI is being used to manage DRAM power budgeting activities, BIOS should lock out all subsequent inband DRAM power limiting accesses by setting bit 31 of the DRAM_POWER_LIMIT MSR or DRAM_PLANE_POWER_LIMIT CSR to ‘1’.

Figure 2-19. DRAM Power Limit Data

DRAM_POWER_LIMIT Data (dword): bits 31:24 RESERVED, bits 23:17 Control Time Window, bit 16 RESERVED, bit 15 DRAM Power Limit Enable, bits 14:0 DRAM Power Limit.
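The Control Time Window formula can be checked against the document's own example with a short Python sketch. It assumes that the Package Power SKU Time Unit field is an exponent, so that ‘z’ equals 1/2^(time unit) seconds; this interpretation is not spelled out in this section, but it is the one under which a control time value of 0x0A yields the stated 1-second window when the time unit field is also 0x0A. The packing helper assumes the Figure 2-19 bit positions (limit in bits 14:0, enable in bit 15, time window in bits 23:17). All function names are hypothetical.

```python
def control_time_window_seconds(field: int, time_unit_exponent: int) -> float:
    """Decode the 7-bit Control Time Window field (dword bits [23:17])."""
    x = (field >> 5) & 0x3   # dword bits [23:22]
    y = field & 0x1F         # dword bits [21:17]
    z = 1.0 / (1 << time_unit_exponent)  # assumption: seconds per time unit
    return (1 + 0.25 * x) * (2 ** y) * z


def encode_dram_power_limit(power_limit: int, time_window_field: int,
                            enable: bool) -> int:
    """Pack the DRAM_POWER_LIMIT dword using the assumed Figure 2-19 layout."""
    if not 0 <= power_limit <= 0x7FFF:
        raise ValueError("power limit must fit in 15 bits")
    if not 0 <= time_window_field <= 0x7F:
        raise ValueError("time window field must fit in 7 bits")
    return (time_window_field << 17) | (int(enable) << 15) | power_limit


print(control_time_window_seconds(0x0A, time_unit_exponent=10))  # 1.0
print(hex(encode_dram_power_limit(0x100, 0x0A, True)))           # 0x148100
```

Since all RAPL parameters must be specified together when programming over PECI, a single packing helper of this kind avoids accidentally clearing the enable bit or time window when only the limit is meant to change.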

2.5.2.6.10 DRAM Power Limit Performance Status Read

This service allows the PECI host to assess the performance impact of the currently active DRAM power limiting modes. The read return data contains the sum of all the time durations for which each of the DIMMs has been operating in a low power state. This information is tracked by a 32-bit counter that wraps around. The unit for time is determined as per the Package Power SKU Unit settings described in Section 2.5.2.6.11. The DRAM performance data does not account for stalls on the memory interface.

In general, for the purposes of DRAM RAPL, the DRAM power management entity should use PECI accesses to DRAM energy and performance status in conjunction with the power limiting feature to budget power between the various memory sub-systems in the server system.

Figure 2-20. DRAM Power Limit Performance Data

DRAM Power Limit Performance Data (dword): bits 31:0 Accumulated DRAM Throttle Time.


2.5.2.6.11 CPU Thermal and Power Optimization Capabilities

Table 2-8 provides a summary of the processor power and thermal optimization capabilities that can be accessed over PECI.

Note: The Index values referenced in Table 2-8 are in decimal format.

Table 2-8 also provides information on alternate inband mechanisms to access similar or equivalent information for register reads and writes where applicable. The user should consult the appropriate Intel® 64 and IA-32 Architectures Software Developer’s Manual (SDM) Volumes 1, 2, and 3 or Intel® Xeon® Processor E5 Product Family Datasheet Volume Two for exact details on MSR or CSR register content.

Table 2-8. RdPkgConfig() & WrPkgConfig() CPU Thermal and Power Optimization Services Summary

Service | Index Value (decimal) | Parameter Value (word) | RdPkgConfig() Data (dword) | WrPkgConfig() Data (dword) | Description | Alternate Inband MSR or CSR Access
Package Identifier Read | | 0x0000 | CPUID Information | N/A | Returns processor-specific information including CPU family, model and stepping information. | Execute CPUID instruction to get processor signature
Package Identifier Read | | 0x0001 | Platform ID | N/A | Used to ensure microcode update compatibility with processor. | MSR 17h: IA32_PLATFORM_ID
Package Identifier Read | | 0x0002 | PCU Device ID | N/A | Returns the Device ID information for the processor Power Control Unit. | CSR: DID
Package Identifier Read | | 0x0003 | Max Thread ID | N/A | Returns the maximum ‘Thread ID’ value supported by the processor. | MSR: RESOLVED_CORES_MASK; CSR: RESOLVED_CORES_MASK
Package Identifier Read | | 0x0004 | CPU Microcode Update Revision | N/A | Returns processor microcode and PCU firmware revision information. | MSR 8Bh: IA32_BIOS_SIGN_ID
Package Identifier Read | | 0x0005 | MCA Error Source Log | N/A | Returns the MCA Error Source Log | CSR: MCA_ERR_SRC_LOG
Package Power SKU Unit Read | 30 | 0x0000 | Time, Energy and Power Units | N/A | Read units for power, energy and time used in power control registers. | MSR 606h: PACKAGE_POWER_SKU_UNIT; CSR: PACKAGE_POWER_SKU_UNIT
Package Power SKU Read | 28 | 0x0000 | Package Power SKU[31:0] | N/A | Returns Thermal Design Power and minimum package power values for the processor SKU. | MSR 614h: PACKAGE_POWER_SKU; CSR: PACKAGE_POWER_SKU
Package Power SKU Read | 29 | 0x0000 | Package Power SKU[63:32] | N/A | Returns the maximum package power value for the processor SKU and the maximum time interval for which it can be sustained. | MSR 614h: PACKAGE_POWER_SKU; CSR: PACKAGE_POWER_SKU
“Wake on PECI” Mode Bit Write / Read | 05 | 0x0001 - Set; 0x0000 - Reset | N/A | “Wake on PECI” mode bit | Enables package pop-up to C2 to service PECI PCIConfig() accesses if appropriate. | N/A
“Wake on PECI” Mode Bit Write / Read | 05 | 0x0000 | “Wake on PECI” mode bit | N/A | Read status of “Wake on PECI” mode bit | N/A
Accumulated Run Time Read | 31 | 0x0000 | Total reference time | N/A | Returns the total run time. | MSR 10h: IA32_TIME_STAMP_COUNTER
Package Temperature Read | 02 | 0x00FF | Processor package Temperature | N/A | Returns the maximum processor die temperature in PECI format. | MSR 1B1h: IA32_PACKAGE_THERM_STATUS
Per Core DTS Temperature Read | | 0x0000-0x0007 (cores 0-7); 0x00FF - System Agent | Per core DTS maximum temperature | N/A | Read the maximum DTS temperature of a particular core or the System Agent within the processor die in relative PECI temperature format | MSR 19Ch: IA32_THERM_STATUS
Temperature Target Read | 16 | 0x0000 | Processor Tjmax and TCONTROL | N/A | Returns the maximum processor junction temperature and processor TCONTROL. | MSR 1A2h: TEMPERATURE_TARGET; CSR: TEMPERATURE_TARGET
Package Thermal Status Read / Clear | 20 | 0x0000 | Thermal Status | N/A | Read the thermal status register and optionally clear any log bits. The register includes status and log bits for TCC activation, PROCHOT_N assertion and Critical Temperature. | MSR 1B1h: IA32_PACKAGE_THERM_STATUS
Thermal Averaging Constant Write / Read | 21 | 0x0000 | Thermal Averaging Constant | N/A | Reads the Thermal Averaging Constant | N/A
Thermal Averaging Constant Write / Read | 21 | 0x0000 | N/A | Thermal Averaging Constant | Writes the Thermal Averaging Constant | N/A
Thermally Constrained Time Read | 32 | 0x0000 | Thermally Constrained Time | N/A | Read the time for which the processor has been operating in a lowered power state due to internal TCC activation. | N/A
Current Limit Read | 17 | 0x0000 | Current Limit per power plane | N/A | Reads the current limit on the VCC power plane | CSR: PRIMARY_PLANE_CURRENT_CONFIG_CONTROL
Accumulated Energy Status Read | | 0x0000 - VCC; 0x00FF - CPU package | Accumulated CPU energy | N/A | Returns the value of the energy consumed by just the VCC power plane or entire CPU package. | MSR 639h: PP0_ENERGY_STATUS; CSR: PP0_ENERGY_STATUS; MSR 611h: PACKAGE_ENERGY_STATUS; CSR: PACKAGE_ENERGY_STATUS
Power Limit for the VCC Power Plane Write / Read | 25 | 0x0000 | N/A | Power Limit Data | Program power limit for VCC power plane | MSR 638h: PP0_POWER_LIMIT; CSR: PP0_POWER_LIMIT
Power Limit for the VCC Power Plane Write / Read | 25 | 0x0000 | Power Limit Data | N/A | Read power limit data for VCC power plane | MSR 638h: PP0_POWER_LIMIT; CSR: PP0_POWER_LIMIT
Package Power Limits For Multiple Turbo Modes | 26 | 0x0000 | N/A | Power Limit 1 Data | Write power limit data 1 in multiple turbo mode. | MSR 610h: PACKAGE_POWER_LIMIT; CSR: PACKAGE_POWER_LIMIT
Package Power Limits For Multiple Turbo Modes | 27 | 0x0000 | N/A | Power Limit 2 Data | Write power limit data 2 in multiple turbo mode. | MSR 610h: PACKAGE_POWER_LIMIT; CSR: PACKAGE_POWER_LIMIT
Package Power Limits For Multiple Turbo Modes | 26 | 0x0000 | Power Limit 1 Data | N/A | Read power limit 1 data in multiple turbo mode. | MSR 610h: PACKAGE_POWER_LIMIT; CSR: PACKAGE_POWER_LIMIT
Package Power Limits For Multiple Turbo Modes | 27 | 0x0000 | Power Limit 2 Data | N/A | Read power limit 2 data in multiple turbo mode. | MSR 610h: PACKAGE_POWER_LIMIT; CSR: PACKAGE_POWER_LIMIT
Package Power Limit Performance Status Read | | 0x00FF - CPU package | Accumulated CPU throttle time | N/A | Read the total time for which the processor package was throttled due to power limiting. | CSR: PACKAGE_RAPL_PERF_STATUS
Efficient Performance Indicator Read | 06 | 0x0000 | Number of productive processor cycles | N/A | Read number of productive cycles for power budgeting purposes. | N/A
ACPI P-T Notify Write & Read | 33 | 0x0000 | N/A | New p-state equivalent of P1 used in conjunction with package power limiting | Notify the processor PCU of the new p-state that is one state below the turbo frequency as specified through the last ACPI Notify | N/A
ACPI P-T Notify Write & Read | 33 | 0x0000 | New p-state equivalent of P1 used in conjunction with package power limiting | N/A | Read the processor PCU to determine the p-state that is one state below the turbo frequency as specified through the last ACPI Notify | N/A
Caching Agent TOR Read | | Cbo Index, TOR Index, Bank#; Read Mode | Caching Agent (Cbo) Table of Requests (TOR) data; Core ID & associated valid bit | N/A | Read the Cbo TOR data for all enabled cores in the event of a 3-strike timeout. Can alternatively be used to read ‘Core ID’ data to confirm that IERR was caused by a core timeout | N/A
Thermal Margin Read | 10 | 0x0000 | Thermal margin to processor thermal profile or load line | N/A | Read margin to processor thermal load line | N/A

2.5.2.6.12 Package Identifier Read

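CPUID Data fields (Figure 2-21): RESERVED [31:28], Extended Family ID [27:20], Extended Model [19:16], RESERVED [15:14], Processor Type [13:12], Family ID [11:8], Model [7:4], Stepping ID [3:0].

Platform ID Data fields (Figure 2-22): Reserved [31:3], Processor Flag [2:0].

PCU Device ID Data fields (Figure 2-23): RESERVED [31:16], PCU Device ID [15:0].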

This feature enables the PECI host to uniquely identify the PECI client processor. The parameter field encodings shown in Table 2-8 allow the PECI host to access the relevant processor information as described below.

• CPUID data: This is the equivalent of data that can be accessed through the CPUID instruction execution. It contains processor type, stepping, model and family ID information as shown in Figure 2-21.

Figure 2-21. CPUID Data

• Platform ID data: The Platform ID data can be used to ensure processor microcode updates are compatible with the processor. The value of the Platform ID or Processor Flag[2:0] as shown in Figure 2-22 is typically unique to the platform type and processor stepping. Refer to the Intel® 64 and IA-32 Architectures Software Developer’s Manual (SDM) Volumes 1, 2, and 3 for more information.

Figure 2-22. Platform ID Data

• PCU Device ID: This information can be used to uniquely identify the processor power control unit (PCU) device when combined with the Vendor Identification register content and remains constant across all SKUs. Refer to the appropriate register description for the exact processor PCU Device ID value.

Figure 2-23. PCU Device ID

• Max Thread ID: The maximum Thread ID data provides the number of supported processor threads. This value is dependent on the number of cores within the processor as determined by the processor SKU and is independent of whether certain cores or corresponding threads are enabled or disabled.


Figure 2-24. Maximum Thread ID

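Maximum Thread ID Data fields (Figure 2-24): Reserved [31:4], Max Thread ID [3:0].

CPU microcode and PCU firmware revision data (Figure 2-25): CPU code patch revision [31:0].

MCA Error Source Log fields (Figure 2-26): CATERR [31], IERR [30], MCERR [29], Reserved [28:0].

Package Power SKU Unit Data fields (Figure 2-27): Reserved [31:20], Time Unit [19:16], Reserved [15:13], Energy Unit [12:8], Reserved [7:4], Power Unit [3:0].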

• CPU Microcode Update Revision: Reflects the revision number for the microcode update and power control unit firmware updates on the processor sample. The revision data is a unique 32-bit identifier that reflects a combination of specific versions of the processor microcode and PCU control firmware.

Figure 2-25. Processor Microcode Revision

• Machine Check Status: Returns error information as logged by the MCA Error Source Log register. See Figure 2-26 for details. The power control unit will assert the relevant bit when the error condition represented by the bit occurs. For example, bit 29 will be set if the package asserted MCERR, bit 30 is set if the package asserted IERR and bit 31 is set if the package asserted CAT_ERR_N. The CAT_ERR_N may be used to signal the occurrence of a MCERR or IERR.

Figure 2-26. Machine Check Status
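As an illustration of the CPUID data and Machine Check Status reads described above, the following Python sketch splits each returned dword into named fields. The function names and sample values are hypothetical. The CPUID bit positions assume the standard layout returned by the CPUID instruction, which this read is described as mirroring; the error bits follow the bit 29/30/31 assignment given in the text.

```python
def decode_cpuid(data):
    """Split a 32-bit CPUID data dword into its fields (assumed standard layout)."""
    return {
        "stepping": data & 0xF,                    # bits [3:0]
        "model": (data >> 4) & 0xF,                # bits [7:4]
        "family": (data >> 8) & 0xF,               # bits [11:8]
        "processor_type": (data >> 12) & 0x3,      # bits [13:12]
        "extended_model": (data >> 16) & 0xF,      # bits [19:16]
        "extended_family": (data >> 20) & 0xFF,    # bits [27:20]
    }


def decode_mca_error_source(data):
    """Report which error signals the package asserted, per the MCA Error Source Log."""
    return {
        "mcerr": bool(data & (1 << 29)),   # bit 29: package asserted MCERR
        "ierr": bool(data & (1 << 30)),    # bit 30: package asserted IERR
        "caterr": bool(data & (1 << 31)),  # bit 31: package asserted CAT_ERR_N
    }


# Hypothetical sample values, chosen only to exercise the decoders.
cpuid_fields = decode_cpuid(0x000206D7)
error_flags = decode_mca_error_source(0xC0000000)
```

Because CAT_ERR_N may signal either MCERR or IERR, a host would typically inspect all three flags together rather than bit 31 alone.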

2.5.2.6.13 Package Power SKU Unit Read

This feature enables the PECI host to read the units of time, energy and power used in the processor and DRAM power control registers for calculating power and timing parameters. In Figure 2-27, the default value of the power unit field [3:0] is 0011b, energy unit [12:8] is 10000b and the time unit [19:16] is 1010b. Actual unit values are calculated as shown in Table 2-9.

Figure 2-27. Package Power SKU Unit Data
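The unit calculation can be sketched in Python as follows. This is a minimal example that assumes the field positions stated above (power unit [3:0], energy unit [12:8], time unit [19:16]) and that each unit is 1 divided by 2 raised to the field value, as summarized in Table 2-9. The function name and the dictionary keys are hypothetical.

```python
def decode_power_sku_unit(data):
    """Convert a Package Power SKU Unit dword into watts, joules and seconds per count."""
    power_field = data & 0xF            # bits [3:0]
    energy_field = (data >> 8) & 0x1F   # bits [12:8]
    time_field = (data >> 16) & 0xF     # bits [19:16]
    return {
        "power_w": 1.0 / (1 << power_field),
        "energy_j": 1.0 / (1 << energy_field),
        "time_s": 1.0 / (1 << time_field),
    }


# Default field values from the text: power 0011b, energy 10000b, time 1010b.
default_raw = (0b1010 << 16) | (0b10000 << 8) | 0b0011
units = decode_power_sku_unit(default_raw)
```

With the default field values this yields 1/8 W, 1/65536 J and 1/1024 s per count.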


Table 2-9. Power Control Register Unit Calculations

Unit Field | Value Calculation | Default Value
Time | 1 s / 2^(TIME UNIT) | 1 s / 2^10 = 976 µs
Energy | 1 J / 2^(ENERGY UNIT) | 1 J / 2^16 = 15.3 µJ
Power | 1 W / 2^(POWER UNIT) | 1 W / 2^3 = 1/8 W

2.5.2.6.14 Package Power SKU Read

This read allows the PECI host to access the minimum, Thermal Design Power and maximum power settings for the processor package SKU. It also returns the maximum time interval or window over which the power can be sustained. If the power limiting entity specifies a power limit value outside of the range specified through these settings, power regulation cannot be guaranteed. Since this data is 64 bits wide, PECI facilitates access to this register by allowing two requests to read the lower 32 bits and upper 32 bits separately as shown in Table 2-8. Power units for this read are determined as per the Package Power SKU Unit settings described in Section 2.5.2.6.13.

‘Package Power SKU data’ is programmed by the PCU firmware during boot time based on SKU dependent power-on default values set during manufacturing. The TDP package power specified through bits [14:0] in Figure 2-28 is the maximum value of the ‘Power Limit1’ field in Section 2.5.2.6.26 while the maximum package power in bits [46:32] is the maximum value of the ‘Power Limit2’ field.

The minimum package power in bits [30:16] is applicable to both the ‘Power Limit1’ & ‘Power Limit2’ fields and corresponds to a mode when all the cores are operational and in their lowest frequency mode. Attempts to program the power limit below the minimum power value may not be effective since BIOS/OS, and not the PCU, controls disabling of cores and core activity.

The ‘maximum time window’ in bits [54:48] is representative of the maximum rate at which the power control unit (PCU) can sample the package energy consumption and reactively take the necessary measures to meet the imposed power limits. Programming too large a time window runs the risk of the PCU not being able to monitor and take timely action on package energy excursions. On the other hand, programming too small a time window may not give the PCU enough time to sample energy information and enforce the limit. The minimum value of the ‘time window’ can be obtained by reading bits [21:15] of the PWR_LIMIT_MISC_INFO CSR using the PECI RdPCIConfigLocal() command.
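A short Python sketch of how a host might combine the two 32-bit reads and extract the fields named above. The function name, the dictionary keys and the sample register values are hypothetical. The power unit defaults to 1/8 W as given in Table 2-9, and the time window is left as a raw field because its conversion formula is defined elsewhere in the datasheet.

```python
def decode_package_power_sku(lower, upper, power_unit_w=0.125):
    """Decode the 64-bit Package Power SKU data from its lower and upper dwords."""
    data = ((upper & 0xFFFFFFFF) << 32) | (lower & 0xFFFFFFFF)
    return {
        "tdp_w": (data & 0x7FFF) * power_unit_w,          # bits [14:0]
        "min_w": ((data >> 16) & 0x7FFF) * power_unit_w,  # bits [30:16]
        "max_w": ((data >> 32) & 0x7FFF) * power_unit_w,  # bits [46:32]
        "max_time_window_raw": (data >> 48) & 0x7F,       # bits [54:48], not converted
    }


# Hypothetical example values: 130 W TDP, 64 W minimum, 200 W maximum.
sku = decode_package_power_sku(lower=0x02000410, upper=0x00120640)
```

A power limiting entity could use `min_w`, `tdp_w` and `max_w` to range-check a requested limit before writing it, since regulation is not guaranteed outside that range.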


Figure 2-28. Package Power SKU Data

Package Power SKU (lower bits) fields: Reserved [31], Minimum Package Power [30:16], Reserved [15], TDP Package Power [14:0].

Package Power SKU (upper bits) fields: Reserved [63:55], Maximum Time Window [54:48], Reserved [47], Maximum Package Power [46:32].

Package Temperature Read Data fields (Figure 2-29): RESERVED [31:16], Sign Bit [15], PECI Temperature (Integer Value) [14:6], PECI Temperature (Fractional Value) [5:0].

2.5.2.6.15 “Wake on PECI” Mode Bit Write / Read

Setting the “Wake on PECI” mode bit enables successful completion of the WrPCIConfigLocal(), RdPCIConfigLocal(), WrPCIConfig() and RdPCIConfig() PECI commands by forcing a package ‘pop-up’ to the C2 state to service these commands if the processor is in a low-power state. The exact power impact of such a ‘pop-up’ is determined by the product SKU, the C-state from which the pop-up is initiated and the negotiated PECI bit rate. A ‘reset’ or ‘clear’ of this bit or simply not setting the “Wake on PECI” mode bit could result in a “timeout” response (completion code of 0x82) from the processor indicating that the resources required to service the command are in a low power state.

Alternatively, this mode bit can also be read to determine PECI behavior in package states C3 or deeper.

2.5.2.6.16 Accumulated Run Time Read

This read returns the total time for which the processor has been executing with a resolution of 1 ms per count. This is tracked by a 32-bit counter that rolls over on reaching the maximum value. This counter activates and starts counting for the first time at RESET_N de-assertion.

2.5.2.6.17 Package Temperature Read

This read returns the maximum processor die temperature in 16-bit PECI format. The upper 16 bits of the response data are reserved. The PECI temperature data returned by this read is the ‘instantaneous’ value and not the ‘average’ value as returned by the PECI GetTemp() described in Section 2.5.2.3.

Figure 2-29. Package Temperature Read Data
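The 16-bit temperature word can be converted to degrees with a few lines of Python. This sketch assumes the value is a two's complement number whose six low-order bits are the fraction, so one count equals 1/64 °C; that interpretation is an assumption based on the sign, integer and fractional fields of Figure 2-29. The function name is hypothetical.

```python
def decode_peci_temperature(response):
    """Convert a Package Temperature Read response to degrees Celsius (relative)."""
    raw = response & 0xFFFF   # upper 16 bits of the response are reserved
    if raw & 0x8000:          # sign bit [15] set: negative two's complement value
        raw -= 0x10000
    return raw / 64.0         # 6 fractional bits, assumed 1/64 degree per count


# Hypothetical sample: 0xFB00 is -1280 counts, or -20.0 degrees.
temperature = decode_peci_temperature(0xFB00)
```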


2.5.2.6.18 Per Core DTS Temperature Read

Temperature Target Read data fields (Figure 2-30): RESERVED [31:24], Processor Tjmax [23:16], TCONTROL [15:8], RESERVED [7:0].

This feature enables the PECI host to read the maximum value of the DTS temperature for any specific core within the processor. Alternatively, this service can be used to read the System Agent temperature. Temperature is returned in the same format as the Package Temperature Read described in Section 2.5.2.6.17. Data is returned in relative PECI temperature format.

Reads to a parameter value outside the supported range will return an error as indicated by a completion code of 0x90. The supported range of parameter values can vary depending on the number of cores within the processor. The temperature data returned through this feature is the instantaneous value and not an averaged value. It is updated once every 1 ms.

2.5.2.6.19 Temperature Target Read

The Temperature Target Read allows the PECI host to access the maximum processor junction temperature (Tjmax) in degrees Celsius. This is also the default temperature value at which the processor thermal control circuit activates. The Tjmax value may vary from processor part to part to reflect manufacturing process variations. The Temperature Target read also returns the processor TCONTROL value. TCONTROL is returned in standard PECI temperature format and represents the threshold temperature used by the thermal management system for fan speed control.

Figure 2-30. Temperature Target Read

2.5.2.6.20 Package Thermal Status Read / Clear

The Thermal Status Read provides information on package level thermal status. Data includes:

• Thermal Control Circuit (TCC) activation

• Bidirectional PROCHOT_N signal assertion

• Critical Temperature

Both status and sticky log bits are managed in this status word. All sticky log bits are set upon a rising edge of the associated status bit and the log bits are cleared only by Thermal Status reads or a processor reset. A read of the Thermal Status word always includes a log bit clear mask that allows the host to clear any or all of the log bits that it is interested in tracking.

A bit set to ‘0’ in the log bit clear mask will result in clearing the associated log bit. If a mask bit is set to ‘0’ and that bit is not a legal mask, a failing completion code will be returned. A bit set to ‘1’ is ignored and results in no change to any sticky log bits. For example, to clear the TCC Activation Log bit and retain all other log bits, the Thermal Status Read should send a mask of 0xFFFFFFFD.
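The clear-mask convention (a ‘0’ clears a log bit, a ‘1’ leaves it alone) can be sketched in Python as below. The bit positions are read from Figure 2-31 and agree with the 0xFFFFFFFD example for the TCC Activation Log bit. The constant and function names are hypothetical.

```python
# Bit positions in the Thermal Status word, as read from Figure 2-31.
TCC_ACTIVATION_STATUS = 0
TCC_ACTIVATION_LOG = 1
PROCHOT_STATUS = 2
PROCHOT_LOG = 3
CRITICAL_TEMP_STATUS = 4
CRITICAL_TEMP_LOG = 5

LOG_BITS = (TCC_ACTIVATION_LOG, PROCHOT_LOG, CRITICAL_TEMP_LOG)


def log_clear_mask(*bits_to_clear):
    """Build a 32-bit mask that clears only the requested sticky log bits."""
    mask = 0xFFFFFFFF                 # all ones: no log bit is changed
    for bit in bits_to_clear:
        if bit not in LOG_BITS:       # a '0' in a non-log position is not a legal mask
            raise ValueError("bit %d is not a sticky log bit" % bit)
        mask &= ~(1 << bit)
    return mask & 0xFFFFFFFF


tcc_only = log_clear_mask(TCC_ACTIVATION_LOG)
```

Rejecting non-log positions on the host side avoids sending a mask that the processor would answer with a failing completion code.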


Figure 2-31. Thermal Status Word

Thermal Status Word fields: Reserved [31:6], Critical Temperature Log [5], Critical Temperature Status [4], Bidirectional PROCHOT# Log [3], Bidirectional PROCHOT# Status [2], TCC Activation Log [1], TCC Activation Status [0].

Thermal Averaging Constant fields (Figure 2-32): RESERVED, PECI Temperature Averaging Constant.

2.5.2.6.21 Thermal Averaging Constant Write / Read

This feature allows the PECI host to control the window over which the estimated processor PECI temperature is filtered. The host may configure this window as a power of two. For example, programming a value of 5 results in a filtering window of 2^5 or 32 samples. The maximum programmable value is 8 or 256 samples. Programming a value of zero would disable the PECI temperature averaging feature. The default value of the thermal averaging constant is 4 which translates to an averaging window size of 2^4 or 16 samples. More details on the PECI temperature filtering function can be found in Section 2.5.7.3.

Figure 2-32. Thermal Averaging Constant Write / Read
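The relation between the programmed constant and the filtering window is a simple power of two, sketched here in Python. The function name is hypothetical, and treating a constant of zero as a one-sample window is this sketch's way of representing "averaging disabled".

```python
def averaging_window_samples(constant):
    """Return the filtering window, in samples, for a thermal averaging constant."""
    if not 0 <= constant <= 8:   # 8 is the maximum programmable value
        raise ValueError("thermal averaging constant must be in the range 0..8")
    return 1 << constant         # 2**constant; zero means no averaging (one sample)


default_window = averaging_window_samples(4)
```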

2.5.2.6.22 Thermally Constrained Time Read

This feature allows the PECI host to access the total time for which the processor has been operating in a lowered power state due to TCC activation. The returned data includes the time required to ramp back up to the original P-state target after TCC activation expires. This timer does not include TCC activation as a result of an external assertion of PROCHOT_N. This is tracked by a 32-bit counter with a resolution of 1 ms per count that rolls over or wraps around. On the processor PECI clients, the only logic that can be thermally constrained is that supplied by VCC.

2.5.2.6.23 Current Limit Read

This read returns the current limit for the processor VCC power plane in 1/8A increments. Actual current limit data is contained only in the lower 13 bits of the response data. The default return value of 0x438 corresponds to a current limit value of 135A.
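The conversion from the raw response to amperes is shown in this small Python sketch, which masks the lower 13 bits and scales by 1/8 A. The function name is hypothetical.

```python
def decode_current_limit(response):
    """Convert a Current Limit Read response to amperes."""
    raw = response & 0x1FFF   # only the lower 13 bits carry the limit
    return raw / 8.0          # 1/8 A per count


default_limit_a = decode_current_limit(0x438)
```

The default value 0x438 is 1080 counts, which gives the 135 A quoted above.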


Figure 2-33. Current Config Limit Read Data

Current Config Limit Data fields: RESERVED [31:13], Current Limit for processor VCC [12:0].

Accumulated Energy Status data (Figure 2-34): Accumulated CPU Energy [31:0].

2.5.2.6.24 Accumulated Energy Status Read

This service can return the value of the total energy consumed by the entire processor package or just the logic supplied by the VCC power plane as specified through the parameter field in Table 2-8. This information is tracked by a 32-bit counter that wraps around and continues counting on reaching its limit. Energy units for this read are determined as per the Package Power SKU Unit settings described in Section 2.5.2.6.13.

While Intel requires reading the accumulated energy data at least once every 16 seconds to ensure functional correctness, a more realistic polling rate recommendation is once every 100 ms for better accuracy. This feature assumes a 150 W processor. In general, as the power capability decreases, so will the minimum polling rate requirement.

Figure 2-34. Accumulated Energy Read Data
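A host that polls the accumulated energy counter usually wants the energy consumed between two samples, allowing for one wraparound of the 32-bit counter. The Python sketch below shows one way to do that. The function names are hypothetical, and the default energy unit of 1/2^16 J is the default from Table 2-9; a real host would use the unit read through the Package Power SKU Unit service.

```python
DEFAULT_ENERGY_UNIT_J = 1.0 / (1 << 16)   # default energy unit from Table 2-9


def energy_delta_joules(previous, current, energy_unit_j=DEFAULT_ENERGY_UNIT_J):
    """Energy consumed between two counter samples, tolerating one wraparound."""
    counts = (current - previous) & 0xFFFFFFFF   # modulo 2**32 handles the wrap
    return counts * energy_unit_j


def average_power_watts(previous, current, interval_s,
                        energy_unit_j=DEFAULT_ENERGY_UNIT_J):
    """Average power over the polling interval, in watts."""
    return energy_delta_joules(previous, current, energy_unit_j) / interval_s


# Hypothetical samples taken 0.5 s apart: 10 J consumed gives 20 W.
power_w = average_power_watts(0, 10 * 65536, 0.5)
```

The modulo subtraction is only correct if the counter wrapped at most once between samples, which is why the polling interval matters.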

2.5.2.6.25 Power Limit for the VCC Power Plane Write / Read

This feature allows the PECI host to program the power limit over a specified time or control window for the processor logic supplied by the VCC power plane. This typically includes all the cores, home agent and last level cache. The processor does not support power limiting on a per-core basis. Actual power limit values are chosen based on the external VR (voltage regulator) capabilities. The units for the Power Limit and Control Time Window are determined as per the Package Power SKU Unit settings described in Section 2.5.2.6.13.

Since the exact VCC plane power limit value is a function of the platform VR, this feature is not enabled by default and there are no default values associated with the power limit value or the control time window. The Power Limit Enable bit in Figure 2-35 should be set to activate this feature. The Clamp Mode bit is also required to be set to allow the cores to go into power states below what the operating system originally requested. In general, this feature provides an improved mechanism for VR protection compared to the input PROCHOT_N signal assertion method. Both power limit enabling and initialization of power limit values can be done in the same command cycle. Setting a power limit for the VCC plane enables turbo modes for associated logic. External VR protection is guaranteed during boot through operation at safe voltage and frequency. All RAPL parameter values including the power limit value, control time window, clamp mode and enable bit will have to be specified correctly even if the intent is to change just one parameter value when programming over PECI.

VCC Power Plane Power Limit Data fields: RESERVED [31:24], Control Time Window [23:17], Clamp Mode [16], Power Limit Enable [15], VCC Plane Power Limit [14:0].

The usefulness of the VCC power plane RAPL may be somewhat limited if the platform has a fully compliant external voltage regulator. However, platforms using lower cost voltage regulators may find this feature useful. The VCC RAPL value is generally expected to be a static value after initialization and there may not be any use cases for dynamic control of VCC plane power limit values during run time. BIOS may be ideally used to read the VR (and associated heat sink) capabilities and program the PCU with the power limit information during boot. No matter what the method is, Intel recommends exclusive use of just one entity or interface, PECI for instance, to manage VCC plane power limiting needs. If PECI is being used to manage VCC plane power limiting activities, BIOS should lock out all subsequent inband VCC plane power limiting accesses by setting bit 31 of the PP0_POWER_LIMIT MSR and CSR to ‘1’.

The same conversion formula used for DRAM Power Limiting (see Section 2.5.2.6.9) should be applied for encoding or programming the ‘Control Time Window’ in bits [23:17].

Figure 2-35. Power Limit Data for VCC Power Plane
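Because every RAPL parameter must be specified correctly on each write, a host that wants to change only the power limit has to carry the other fields over from the current value. The Python sketch below illustrates that read-modify-write step for the VCC plane dword. The function name and sample values are hypothetical, and the 1/8 W power unit is the Table 2-9 default.

```python
def set_vcc_power_limit(current, new_limit_w, power_unit_w=0.125):
    """Return a new VCC power limit dword with only the limit field changed."""
    counts = round(new_limit_w / power_unit_w)
    if not 0 <= counts <= 0x7FFF:      # the limit field is 15 bits wide, [14:0]
        raise ValueError("power limit does not fit in bits [14:0]")
    # Keep enable [15], clamp mode [16] and control time window [23:17];
    # reserved bits [31:24] are written as zero.
    preserved = current & 0x00FF8000
    return preserved | counts


# Hypothetical current value: 100 W limit, enable and clamp set, window field 0x0A.
updated = set_vcc_power_limit(0x00158320, 90.0)
```

The control time window is passed through untouched here because its encoding follows the conversion formula of Section 2.5.2.6.9.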

2.5.2.6.26 Package Power Limits For Multiple Turbo Modes

This feature allows the PECI host to program two power limit values to support multiple turbo modes. The operating systems and drivers can balance the power budget using these two limits. Two separate PECI requests are available to program the lower and upper 32 bits of the power limit data shown in Figure 2-36. The units for the Power Limit and Control Time Window are determined as per the Package Power SKU Unit settings described in Section 2.5.2.6.13 while the valid range for power limit values is determined by the Package Power SKU settings described in Section 2.5.2.6.14. Setting the Clamp Mode bits is required to allow the cores to go into power states below what the operating system originally requested. The Power Limit Enable bits should be set to enable the power limiting function. Power limit values, enable and clamp mode bits can all be set in the same command cycle. All RAPL parameter values including the power limit value, control time window, clamp mode and enable bit will have to be specified correctly even if the intent is to change just one parameter value when programming over PECI.

Intel recommends exclusive use of just one entity or interface, PECI for instance, to manage all processor package power limiting and budgeting needs. If PECI is being used to manage package power limiting activities, BIOS should lock out all subsequent inband package power limiting accesses by setting bit 31 of the PACKAGE_POWER_LIMIT MSR and CSR to ‘1’. The ‘power limit 1’ is intended to limit processor power consumption to any reasonable value below TDP and defaults to TDP. ‘Power Limit 1’ values may be impacted by the processor heat sinks and system air flow. Processor ‘power limit 2’ can be used as appropriate to limit the current drawn by the processor to prevent any external power supply unit issues. The ‘Power Limit 2’ should always be programmed to a value (typically 20%) higher than ‘Power Limit 1’ and has no default value associated with it.

Package Power Limit 1 fields: RESERVED [31:24], Control Time Window #1 [23:17], Clamp Mode #1 [16], Power Limit Enable #1 [15], Power Limit #1 [14:0].

Package Power Limit 2 fields: RESERVED [63:56], Control Time Window #2 [55:49], Clamp Mode #2 [48], Power Limit Enable #2 [47], Power Limit #2 [46:32].

Accumulated CPU Throttle Time data (Figure 2-37): Accumulated CPU Throttle Time [31:0].

Though this feature is disabled by default and external programming is required to enable, initialize and control package power limit values and time windows, the processor package will still turbo to TDP if ‘Power Limit 1’ is not enabled or initialized. ‘Control Time Window#1’ (Power_Limit_1_Time also known as Tau) values may be programmed to be within a range of 250 ms to 40 seconds. ‘Control Time Window#2’ (Power_Limit_2_Time) values should be in the range 3 ms to 10 ms.

The same conversion formula used for the DRAM Power Limiting feature (see Section 2.5.2.6.9) should be applied when programming the ‘Control Time Window’ bits [23:17] for ‘power limit 1’ in Figure 2-36. The ‘Control Time Window’ for ‘power limit 2’ can be directly programmed into bits [55:49] in units of ms without the aid of any conversion formulas.

Figure 2-36. Package Turbo Power Limit Data
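The following Python sketch assembles the lower and upper dwords for the two PECI write requests, with the enable and clamp mode bits set in the same cycle. The function name and sample values are hypothetical. The window for power limit 1 is taken as an already encoded 7-bit field (its conversion formula is in Section 2.5.2.6.9), the window for power limit 2 is given directly in milliseconds, and the 1/8 W power unit is the Table 2-9 default.

```python
def encode_package_power_limits(pl1_w, pl1_window_field, pl2_w, pl2_window_ms,
                                power_unit_w=0.125):
    """Return (lower, upper) dwords for the package power limit data."""
    if pl2_w <= pl1_w:
        raise ValueError("Power Limit 2 should be higher than Power Limit 1")
    if not 3 <= pl2_window_ms <= 10:
        raise ValueError("Control Time Window #2 should be 3 ms to 10 ms")
    if not 0 <= pl1_window_field <= 0x7F:
        raise ValueError("Control Time Window #1 field is 7 bits wide")

    def dword(limit_w, window_field):
        counts = round(limit_w / power_unit_w)
        if not 0 <= counts <= 0x7FFF:
            raise ValueError("power limit does not fit in 15 bits")
        # limit [14:0], enable [15], clamp mode [16], control time window [23:17]
        return counts | (1 << 15) | (1 << 16) | (window_field << 17)

    lower = dword(pl1_w, pl1_window_field)   # data bits [31:0]
    upper = dword(pl2_w, pl2_window_ms)      # data bits [63:32], same field layout
    return lower, upper


# Hypothetical example: 100 W for limit 1, 120 W (20% higher) with a 5 ms window for limit 2.
lower_dword, upper_dword = encode_package_power_limits(100.0, 0x0A, 120.0, 5)
```

A fuller implementation would also check both limits against the range returned by the Package Power SKU read.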

2.5.2.6.27 Package Power Limit Performance Status Read

This service allows the PECI host to assess the performance impact of the currently active power limiting modes. The read return data contains the total amount of time for which the entire processor package has been operating in a power state that is lower than what the operating system originally requested. This information is tracked by a 32-bit counter that wraps around. The unit for time is determined as per the Package Power SKU Unit settings described in Section 2.5.2.6.13.

Figure 2-37. Package Power Limit Performance Data


2.5.2.6.28 Efficient Performance Indicator Read

Efficient Performance Indicator Data fields: Efficient Performance Cycles [31:0].

ACPI P-T Notify Data fields (Figure 2-39): Reserved [31:8], New P1 state [7:0].

The Efficient Performance Indicator (EPI) Read provides an indication of the total number of productive cycles. Specifically, these are the cycles when the processor is engaged in any activity to retire instructions and, as a result, consuming energy. Any power management entity monitoring this indicator should sample it at least once every 4 seconds to enable detection of wraparounds. Refer to the Intel® 64 and IA-32 Architectures Software Developer’s Manual (SDM) Volumes 1, 2, and 3, for details on programming the Energy/Performance Bias (MSR_MISC_PWR_MGMT) register to set the ‘Energy Efficiency’ policy of the processor.

Figure 2-38. Efficient Performance Indicator Read

2.5.2.6.29 ACPI P-T Notify Write & Read

This feature enables the processor turbo capability when used in conjunction with the PECI package RAPL or power limit. When the BMC sets the package power limit to a value below TDP, it also determines a new corresponding turbo frequency and notifies the OS using the ‘ACPI Notify’ mechanism as supported by the _PPC or performance present capabilities object. The BMC then notifies the processor PCU using the PECI ‘ACPI P-T Notify’ service by programming a new state that is one p-state below the turbo frequency sent to the OS via the _PPC method.

When the OS requests a p-state higher than what is specified in bits [7:0] of the PECI ACPI P-T Notify data field, the CPU will treat it as a request for P0 or turbo. The PCU will use the IA32_ENERGY_PERFORMANCE_BIAS register settings to determine the exact extent of turbo. Any OS p-state request that is equal to or below what is specified in the PECI ACPI P-T Notify will be granted as long as the RAPL power limit does not impose a lower p-state. However, turbo will not be enabled in this instance even if there is headroom between the processor energy consumption and the RAPL power limit.

This feature does not affect the Thermal Monitor behavior of the processor nor is it impacted by the setting of the power limit clamp mode bit.

Figure 2-39. ACPI P-T Notify Data


2.5.2.6.30 Caching Agent TOR Read

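Caching Agent TOR Read data (Figure 2-40): with Read Mode (bit 11) = ‘0’, Cbo TOR Data [31:0]; with Read Mode (bit 11) = ‘1’, RESERVED, Core ID and Valid bit.

Thermal Margin data fields (Figure 2-41): RESERVED [31:16], Sign Bit [15], Thermal Margin (Integer Value) [14:6], Thermal Margin (Fractional Value) [5:0].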

This feature allows the PECI host to read the Caching Agent (Cbo) Table of Requests (TOR). This information is useful for debug in the event of a 3-strike timeout that results in a processor IERR assertion. The 16-bit parameter field is used to specify the Cbo index, TOR array index and bank number according to the following bit assignments.

• Bits [1:0] - Bank Number - legal values from 0 to 2

• Bits [6:2] - TOR Array Index - legal values from 0 to 19

• Bits [10:7] - Cbo Index - legal values from 0 to 7

• Bit [11] - Read Mode - should be set to ‘0’ for TOR reads

• Bits [15:12] - Reserved

Bit [11] is the Read Mode bit and should be set to ‘0’ for TOR reads. The Read Mode bit can alternatively be set to ‘1’ to read the ‘Core ID’ (with associated valid bit as shown in Figure 2-40) that points to the first core that asserted the IERR. In this case bits [10:0] of the parameter field are ignored. The ‘Core ID’ read may not return valid data until at least 1 ms after the IERR assertion.

Figure 2-40. Caching Agent TOR Read Data

Note: Reads to caching agents that are not enabled will return all zeroes. Refer to the debug handbook for details on methods to interpret the crash dump results using the Cbo TOR data shown in Figure 2-40.
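The 16-bit parameter field for a TOR read can be packed as in the Python sketch below, which follows the bit assignments and legal ranges listed above. The function and constant names are hypothetical.

```python
CORE_ID_READ_PARAMETER = 1 << 11   # Read Mode = '1'; bits [10:0] are ignored


def tor_read_parameter(cbo_index, tor_array_index, bank_number):
    """Pack the parameter field for a Cbo TOR read (Read Mode bit 11 = '0')."""
    if not 0 <= bank_number <= 2:
        raise ValueError("bank number must be 0..2")
    if not 0 <= tor_array_index <= 19:
        raise ValueError("TOR array index must be 0..19")
    if not 0 <= cbo_index <= 7:
        raise ValueError("Cbo index must be 0..7")
    # bank [1:0], TOR array index [6:2], Cbo index [10:7]; bits [15:11] stay zero
    return (cbo_index << 7) | (tor_array_index << 2) | bank_number


# Hypothetical example: Cbo 3, TOR entry 19, bank 2.
parameter = tor_read_parameter(3, 19, 2)
```

A crash dump routine could loop over all Cbo, TOR index and bank combinations and skip results that come back as all zeroes, since disabled caching agents return zero.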

2.5.2.6.31 Thermal Margin Read

This service allows the PECI host to read the margin to the processor thermal profile or load line. Thermal margin data is returned in the format shown in Figure 2-41 with a sign bit, an integer part and a fractional part. A negative thermal margin value implies that the processor is operating in violation of its thermal load line and may be indicative of a need for more aggressive cooling mechanisms through a fan speed increase or other means. This PECI service will continue to return valid margin values even when the processor die temperature exceeds Tjmax.

Figure 2-41. DTS Thermal Margin Read


2.5.2.7 RdIAMSR()

The RdIAMSR() PECI command provides read access to Model Specific Registers (MSRs) defined in the processor’s Intel® Architecture (IA). MSR definitions may be found in the Intel® 64 and IA-32 Architectures Software Developer’s Manual (SDM) Volumes 1, 2, and 3. Refer to Table 2-11 for the exact listing of processor registers accessible through this command.

2.5.2.7.1 Command Format

The RdIAMSR() format is as follows:

Write Length: 0x05

Read Length: 0x09 (qword)

Command: 0xb1

Description: Returns the data maintained in the processor IA MSR space as specified by the ‘Processor ID’ and ‘MSR Address’ fields. The Read Length dictates the desired data return size. This command supports only qword responses. All command responses are prepended with a completion code that contains additional pass/fail status information. Refer to Section 2.5.5.2 for details regarding completion codes.

2.5.2.7.2 Processor ID Enumeration

The ‘Processor ID’ field that is used to address the IA MSR space refers to a specific logical processor within the CPU. The ‘Processor ID’ always refers to the same physical location in the processor silicon regardless of configuration as shown in the example in Figure 2-42. For example, if certain logical processors are disabled by BIOS, the Processor ID mapping will not change. The total number of Processor IDs on a CPU is product-specific.

‘Processor ID’ enumeration involves discovering the logical processors enabled within the CPU package. This can be accomplished by reading the ‘Max Thread ID’ value through the RdPkgConfig() command (Index 0, Parameter 3) described in Section 2.5.2.6.12 and subsequently querying each of the supported processor threads. Unavailable processor threads will return a completion code of 0x90. Alternatively, this information may be obtained from the RESOLVED_CORES_MASK register (see Section 2.5.2.9) or other means. Bits [7:0] and [9:8] of this register contain the ‘Core Mask’ and ‘Thread Mask’ information respectively. The ‘Thread Mask’ applies to all the enabled cores within the processor package as indicated by the ‘Core Mask’. For the processor PECI clients, the ‘Processor ID’ may take on values in the range 0 through 15.
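The mask-based enumeration can be sketched in Python as follows. The sketch assumes that each core contributes two consecutive Processor IDs (thread 0, then thread 1), which is how Figure 2-42 appears to lay out IDs 0 through 15; the function name and sample masks are hypothetical.

```python
def enabled_processor_ids(resolved_cores_mask):
    """List the Processor IDs enabled according to the Core Mask and Thread Mask."""
    core_mask = resolved_cores_mask & 0xFF            # bits [7:0]
    thread_mask = (resolved_cores_mask >> 8) & 0x3    # bits [9:8]
    ids = []
    for core in range(8):
        if not core_mask & (1 << core):
            continue                                  # core disabled
        for thread in range(2):
            if thread_mask & (1 << thread):
                ids.append(core * 2 + thread)         # assumed ID = 2 * core + thread
    return ids


# Hypothetical example: cores 0 and 4 enabled, only thread 0 enabled on each.
ids = enabled_processor_ids(0x111)
```

Because the ID always refers to the same physical location, disabled cores leave gaps in the list rather than shifting the remaining IDs down.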


Figure 2-42. Processor ID Construction Example

Processor ID (0..15), in order: C0 T0, C0 T1, C1 T0, C1 T1, C2 T0, C2 T1, C3 T0, C3 T1, C4 T0, C4 T1, C5 T0, C5 T1, C6 T0, C6 T1, C7 T0, C7 T1. The figure covers cores 0, 1, 2 ... 7 and highlights the thread (0,1) mask for Core 4.

Figure 2-43. RdIAMSR()

Note: The 2-byte MSR Address field and read data field defined in Figure 2-43 are sent in standard PECI ordering with LSB first and MSB last.


2.5.2.7.3 Supported Responses

The typical client response is a passing FCS, a passing Completion Code and valid data. Under some conditions, the client’s response will indicate a failure.

Table 2-10. RdIAMSR() Response Definition

Response | Meaning
Bad FCS | Electrical error
Abort FCS | Illegal command formatting (mismatched RL/WL/Command Code)
CC: 0x40 | Command passed, data is valid.
CC: 0x80 | Response timeout. The processor was not able to generate the required response in a timely fashion. Retry is appropriate.
CC: 0x81 | Response timeout. The processor is not able to allocate resources for servicing this command at this time. Retry is appropriate.
CC: 0x82 | The processor hardware resources required to service this command are in a low power state. Retry may be appropriate after modification of PECI wake mode behavior if appropriate.
CC: 0x90 | Unknown/Invalid/Illegal Request
CC: 0x91 | PECI control hardware, firmware or associated logic error. The processor is unable to process the request.
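A host driver might map the completion codes of Table 2-10 to a follow-up action as in this Python sketch. The function name and the action labels are hypothetical; the code values and their meanings are taken from the table.

```python
def classify_completion_code(cc):
    """Map a RdIAMSR() completion code to a suggested host action."""
    if cc == 0x40:
        return "pass"                           # command passed, data is valid
    if cc in (0x80, 0x81):
        return "retry"                          # response timeout, retry is appropriate
    if cc == 0x82:
        return "retry_after_wake_mode_change"   # resources are in a low power state
    if cc in (0x90, 0x91):
        return "fail"                           # illegal request or PECI logic error
    return "unknown"                            # not listed in Table 2-10


action = classify_completion_code(0x82)
```

For 0x82, the retry only makes sense once the "Wake on PECI" mode bit has been set, as described in Section 2.5.2.6.15.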

2.5.2.7.4 RdIAMSR() Capabilities

The processor PECI client allows PECI RdIAMSR() access to the registers listed in Table 2-11. These registers pertain to the processor core and uncore error banks (machine check banks 0 through 19). Information on the exact number of accessible banks for the processor device may be obtained by reading the IA32_MCG_CAP[7:0] MSR (0x0179). This register may be alternatively read using a RDMSR BIOS instruction. Please consult the Intel® 64 and IA-32 Architectures Software Developer’s Manual (SDM) Volumes 1, 2, and 3 for more information on the exact number of cores supported by a particular processor SKU. Any attempt to read processor MSRs that are not accessible over PECI or simply not implemented will result in a completion code of 0x90.

PECI access to these registers is expected only when in-band access mechanisms are not available.

Table 2-11. RdIAMSR() Services Summary

Processor ID (byte) is 0x0-0xF for every register listed. All entries are MSR addresses (dword).

Bank | IA32_MCi_CTL | IA32_MCi_CTL2 | IA32_MCi_STATUS | IA32_MCi_ADDR | IA32_MCi_MISC
0 | 0x0400 | 0x0280 | 0x0401 | 0x0402 | 0x0403
1 | 0x0404 | 0x0281 | 0x0405 | 0x0406 | 0x0407
2 | 0x0408 | 0x0282 | 0x0409 | 0x040A | 0x040B
3 | 0x040C | 0x0283 | 0x040D | 0x040E | 0x040F
4 | 0x0410 | 0x0284 | 0x0411 | 0x0412 | 0x0413
5 | 0x0414 | 0x0285 | 0x0415 | 0x0416 | 0x0417
6 | 0x0418 | 0x0286 | 0x0419 | 0x041A | 0x041B
7 | 0x041C | 0x0287 | 0x041D | 0x041E | 0x041F
8 | 0x0420 | 0x0288 | 0x0421 | 0x0422 | 0x0423
9 | 0x0424 | 0x0289 | 0x0425 | 0x0426 | 0x0427
10 | 0x0428 | 0x028A | 0x0429 | 0x042A | 0x042B
11 | 0x042C | 0x028B | 0x042D | 0x042E | 0x042F
12 | 0x0430 | 0x028C | 0x0431 | 0x0432 | 0x0433
13 | 0x0434 | 0x028D | 0x0435 | 0x0436 | 0x0437
14 | 0x0438 | 0x028E | 0x0439 | 0x043A | 0x043B
15 | 0x043C | 0x028F | 0x043D | 0x043E | 0x043F
16 | 0x0440 | 0x0290 | 0x0441 | 0x0442 | 0x0443
17 | 0x0444 | 0x0291 | 0x0445 | 0x0446 | 0x0447
18 | 0x0448 | 0x0292 | 0x0449 | 0x044A | 0x044B
19 | 0x044C | 0x0293 | 0x044D | 0x044E | (not listed)

Register | MSR Address (dword)
IA32_MCG_CONTAIN | 0x0178
IA32_MCG_CAP | 0x0179
IA32_MCG_STATUS | 0x017A

Notes:

1. The IA32_MC0_MISC register details will be available upon implementation in a future processor stepping.

2. The MCi_ADDR and MCi_MISC registers for machine check banks 2 & 4 are not implemented on the processors. The MCi_CTL register for machine check bank 2 is also not implemented.

3. The PECI host must determine the total number of machine check banks and the validity of the MCi_ADDR and MCi_MISC register contents prior to issuing a read to the machine check bank similar to standard machine check architecture enumeration and accesses.

4. The information presented in Table 2-11 is applicable to the processor only. No association between bank numbers and logical functions should be assumed for any other processor devices (past, present or future) based on the information presented in Table 2-11.

5. The processor machine check banks 4 through 19 reside in the processor uncore and hence will return the same value independent of the processor ID used to access these banks.

6. The IA32_MCG_STATUS, IA32_MCG_CONTAIN and IA32_MCG_CAP are located in the uncore and will return the same value independent of the processor ID used to access them.

7. The processor machine check banks 0 through 3 are core-specific. Since the processor ID is thread-specific and not core-specific, machine check banks 0 through 3 will return the same value for a particular core independent of the thread referenced by the processor ID.

8. PECI accesses to the machine check banks may not be possible in the event of a core hang. A warm reset of the processor may be required to read any sticky machine check banks.

9. Valid processor ID values may be obtained by using the enumeration methods described in Section 2.5.2.7.2.

10. Reads to a machine check bank within a core or thread that is disabled will return all zeroes with a completion code of 0x90.

11. For SKUs where Intel QPI is disabled or absent, reads to the corresponding machine check banks will return all zeros with a completion code of 0x40.
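The MSR addresses listed in Table 2-11 follow a regular pattern, so a PECI host could compute them instead of storing the whole table. The following is a minimal sketch under that assumption: the function name `mc_bank_msr` is hypothetical, and the arithmetic (four consecutive addresses per bank starting at 0x0400, plus a CTL2 block starting at 0x0280) is inferred from the table entries rather than stated by this datasheet. It does not check whether a given register is implemented (see Notes 1 and 2) or listed in the table.

```python
# Offsets within each bank's block of four consecutive MSR addresses.
# This pattern is inferred from Table 2-11; it is an assumption of this sketch.
_BANK_OFFSETS = {"CTL": 0, "STATUS": 1, "ADDR": 2, "MISC": 3}


def mc_bank_msr(bank: int, register: str) -> int:
    """Return the MSR address of IA32_MC<bank>_<register> (hypothetical helper)."""
    if not 0 <= bank <= 19:
        raise ValueError("Table 2-11 lists machine check banks 0 through 19")
    if register == "CTL2":
        # The CTL2 registers sit in their own block, one address per bank.
        return 0x0280 + bank
    if register not in _BANK_OFFSETS:
        raise ValueError(f"unknown register name: {register}")
    return 0x0400 + 4 * bank + _BANK_OFFSETS[register]
```

A host would still need to apply Note 3 and confirm the bank count and register validity before issuing the read.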


2.5.2.8 RdPCIConfig()

The RdPCIConfig() command provides sideband read access to the PCI configuration space maintained in downstream devices external to the processor. PECI originators may conduct a device/function/register enumeration sweep of this space by issuing reads in the same manner that the BIOS would. A response of all 1’s may indicate that the device/function/register is unimplemented even with a ‘passing’ completion code. Alternatively, reads to unimplemented registers may return a completion code of 0x90 indicating an invalid request. Responses will follow normal PCI protocol.

PCI configuration addresses are constructed as shown in Figure 2-44. Under normal inband procedures, the Bus number would be used to direct a read or write to the proper device. Actual PCI bus numbers for all PCI devices including the PCH are programmable by BIOS. The bus number for PCH devices may be obtained by reading the CPUBUSNO CSR. Refer to the Intel® Xeon® Processor E5 Product Family Datasheet Volume Two document for details on this register.

Figure 2-44. PCI Configuration Address

Bit fields: Reserved [31:28], Bus [27:20], Device [19:15], Function [14:12], Register [11:0].

PCI configuration reads may be issued in byte, word or dword granularities.
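As an illustration of the address construction, the sketch below packs a bus/device/function/register tuple into the 4-byte address and serializes it LSB first. It assumes the field boundaries marked in Figure 2-44 (Bus in bits [27:20], Device in [19:15], Function in [14:12], Register in [11:0]); the function names are hypothetical and not part of any Intel software interface.

```python
def pci_config_address(bus: int, device: int, function: int, register: int) -> int:
    """Pack the four fields into a 32-bit PCI configuration address."""
    fields = (
        (bus, 8, "bus"),
        (device, 5, "device"),
        (function, 3, "function"),
        (register, 12, "register"),
    )
    for value, width, name in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
    # Bits [31:28] are reserved and left as zero.
    return (bus << 20) | (device << 15) | (function << 12) | register


def address_bytes(address: int, length: int = 4) -> bytes:
    """Serialize the address LSB first, as PECI ordering requires."""
    return address.to_bytes(length, "little")
```

For example, bus 1, device 15, function 0, register 0x104 packs to 0x00178104, which goes on the wire as the bytes 0x04, 0x81, 0x17, 0x00.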

2.5.2.8.1 Command Format

The RdPCIConfig() format is as follows:

Write Length: 0x06
Read Length: 0x05 (dword)
Command: 0x61

Description: Returns the data maintained in the PCI configuration space at the requested PCI configuration address. The Read Length dictates the desired data return size. This command supports only dword responses with a completion code on the processor PECI clients. All command responses are prepended with a completion code that includes additional pass/fail status information. Refer to Section 2.5.5.2 for details regarding completion codes.

Figure 2-45. RdPCIConfig()

Note: The 4-byte PCI configuration address and read data field defined in Figure 2-45 are sent in standard PECI ordering with LSB first and MSB last.

2.5.2.8.2 Supported Responses

The typical client response is a passing FCS, a passing Completion Code and valid data. Under some conditions, the client’s response will indicate a failure.

The PECI client response can also vary depending on the address and data. It will respond with a passing completion code if it successfully submits the request to the appropriate location and gets a response.

Table 2-12. RdPCIConfig() Response Definition

Response Meaning
Bad FCS Electrical error
Abort FCS Illegal command formatting (mismatched RL/WL/Command Code)
CC: 0x40 Command passed, data is valid.
CC: 0x80 Response timeout. The processor was not able to generate the required response in a timely fashion. Retry is appropriate.
CC: 0x81 Response timeout. The processor is not able to allocate resources for servicing this command at this time. Retry is appropriate.
CC: 0x82 The processor hardware resources required to service this command are in a low power state. Retry may be appropriate after modification of PECI wake mode behavior if appropriate.
CC: 0x90 Unknown/Invalid/Illegal Request
CC: 0x91 PECI control hardware, firmware or associated logic error. The processor is unable to process the request.

2.5.2.9 RdPCIConfigLocal()

The RdPCIConfigLocal() command provides sideband read access to the PCI configuration space that resides within the processor. This includes all processor IIO and uncore registers within the PCI configuration space as described in the Intel® Xeon® Processor E5 Product Family Datasheet Volume Two document.

PECI originators may conduct a device/function enumeration sweep of this space by issuing reads in the same manner that the BIOS would. A response of all 1’s may indicate that the device/function/register is unimplemented even with a ‘passing’ completion code. Alternatively, reads to unimplemented or hidden registers may return a completion code of 0x90 indicating an invalid request. It is also possible that reads to function 0 of non-existent IIO devices issued prior to BIOS POST may return all ‘0’s with a passing completion code. PECI originators can access this space even prior to BIOS enumeration of the system buses. There is no read restriction on accesses to locked registers.

PCI configuration addresses are constructed as shown in Figure 2-46. Under normal inband procedures, the Bus number would be used to direct a read or write to the proper device. PECI reads to the processor IIO devices should specify a bus number of ‘0000’ and reads to the rest of the processor uncore should specify a bus number of ‘0001’ for bits [23:20] in Figure 2-46. Any request made with a bad Bus number is ignored and the client will respond with all ‘0’s and a ‘passing’ completion code.

Figure 2-46. PCI Configuration Address for local accesses

Bit fields: Bus [23:20], Device [19:15], Function [14:12], Register [11:0].


2.5.2.9.1 Command Format

The RdPCIConfigLocal() format is as follows:

Write Length: 0x05
Read Length: 0x02 (byte), 0x03 (word), 0x05 (dword)
Command: 0xe1
Description: Returns the data maintained in the PCI configuration space within the processor at the requested PCI configuration address. The Read Length dictates the desired data return size. This command supports byte, word and dword responses as well as a completion code. All command responses are prepended with a completion code that includes additional pass/fail status information. Refer to Section 2.5.5.2 for details regarding completion codes.

Figure 2-47. RdPCIConfigLocal()

Byte fields, in transmission order: Client Address; Write Length (0x05); Read Length ({0x02, 0x03, 0x05}); Cmd Code (0xe1); Host ID[7:1] & Retry[0]; PCI Configuration Address (LSB to MSB); FCS; Completion Code; Data (1, 2 or 4 bytes, LSB to MSB); FCS.

Note: The 3-byte PCI configuration address and read data field defined in Figure 2-47 are sent in standard PECI ordering with LSB first and MSB last.

2.5.2.9.2 Supported Responses

The typical client response is a passing FCS, a passing Completion Code and valid data. Under some conditions, the client’s response will indicate a failure.

Table 2-13. RdPCIConfigLocal() Response Definition

Response Meaning
Bad FCS Electrical error
Abort FCS Illegal command formatting (mismatched RL/WL/Command Code)
CC: 0x40 Command passed, data is valid.
CC: 0x80 Response timeout. The processor was not able to generate the required response in a timely fashion. Retry is appropriate.
CC: 0x81 Response timeout. The processor is not able to allocate resources for servicing this command at this time. Retry is appropriate.
CC: 0x82 The processor hardware resources required to service this command are in a low power state. Retry may be appropriate after modification of PECI wake mode behavior if appropriate.
CC: 0x90 Unknown/Invalid/Illegal Request
CC: 0x91 PECI control hardware, firmware or associated logic error. The processor is unable to process the request.
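To make the RdPCIConfigLocal() request layout concrete, the sketch below assembles the bytes an originator would send up to, but not including, the write FCS. It is illustrative only: the function name and parameters are hypothetical, the 3-byte address packing assumes Bus in bits [23:20], Device in [19:15], Function in [14:12] and Register in [11:0], and FCS generation is left out because it is not defined in this section.

```python
# Read Length values for byte, word and dword reads (from the command format).
READ_LENGTH = {1: 0x02, 2: 0x03, 4: 0x05}


def rd_pci_config_local_request(client_addr: int, bus: int, device: int,
                                function: int, register: int, size: int = 4,
                                host_id: int = 0, retry: bool = False) -> bytes:
    """Build a RdPCIConfigLocal() request without its trailing FCS byte."""
    if bus not in (0, 1):
        # Bus 0 selects IIO devices and bus 1 the rest of the uncore.
        raise ValueError("bus must be 0 (IIO) or 1 (uncore)")
    if not (0 <= device < 32 and 0 <= function < 8 and 0 <= register < 0x1000):
        raise ValueError("device, function or register out of range")
    if size not in READ_LENGTH:
        raise ValueError("size must be 1, 2 or 4 bytes")
    address = (bus << 20) | (device << 15) | (function << 12) | register
    host_byte = ((host_id & 0x7F) << 1) | int(retry)  # Host ID[7:1] & Retry[0]
    header = bytes([client_addr, 0x05, READ_LENGTH[size], 0xE1, host_byte])
    return header + address.to_bytes(3, "little")  # address goes LSB first
```

Rejecting bus numbers other than 0 and 1 locally is a design choice of this sketch; the client itself would answer such a request with all zeros and a passing completion code.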

2.5.2.10 WrPCIConfigLocal()

The WrPCIConfigLocal() command provides sideband write access to the PCI configuration space that resides within the processor. PECI originators can access this space even before BIOS enumeration of the system buses. The exact listing of supported devices and functions for writes using this command on the processor is defined in Table 2-15. Write accesses to locked registers will not take effect but will still return a completion code of 0x40. However, write accesses to hidden registers will return a completion code of 0x90.

Because a WrPCIConfigLocal() command results in an update to potentially critical registers inside the processor, it includes an Assured Write FCS (AW FCS) byte as part of the write data payload. In the event that the AW FCS mismatches with the client-calculated FCS, the client will abort the write and will always respond with a bad write FCS.

PCI Configuration addresses are constructed as shown in Figure 2-46. The write command is subject to the same address configuration rules as defined in Section 2.5.2.9. PCI configuration writes may be issued in byte, word or dword granularity.

2.5.2.10.1 Command Format

The WrPCIConfigLocal() format is as follows:

Write Length: 0x07 (byte), 0x08 (word), 0x0a (dword)
Read Length: 0x01
Command: 0xe5
AW FCS Support: Yes
Description: Writes the data sent to the requested register address. Write Length dictates the desired write granularity. The command always returns a completion code indicating pass/fail status. Refer to Section 2.5.5.2 for details on completion codes.
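The relationship between write granularity and Write Length can be sketched as follows. The helper below is hypothetical and builds only the portion of a WrPCIConfigLocal() message that precedes the AW FCS byte; computing the AW FCS and the FCS is outside the scope of this section and is deliberately omitted.

```python
# Write Length values for byte, word and dword writes (from the command format).
WRITE_LENGTH = {1: 0x07, 2: 0x08, 4: 0x0A}


def wr_pci_config_local_prefix(client_addr: int, address: int, value: int,
                               size: int, host_id: int = 0,
                               retry: bool = False) -> bytes:
    """Build a WrPCIConfigLocal() message up to, but excluding, the AW FCS."""
    if size not in WRITE_LENGTH:
        raise ValueError("size must be 1, 2 or 4 bytes")
    if not 0 <= address < (1 << 24):
        raise ValueError("local PCI configuration addresses are 3 bytes")
    if not 0 <= value < (1 << (8 * size)):
        raise ValueError("value does not fit in the requested size")
    host_byte = ((host_id & 0x7F) << 1) | int(retry)  # Host ID[7:1] & Retry[0]
    header = bytes([client_addr, WRITE_LENGTH[size], 0x01, 0xE5, host_byte])
    # Address and data are both sent LSB first.
    return header + address.to_bytes(3, "little") + value.to_bytes(size, "little")
```

The Write Length counts the command code, the host ID byte, the three address bytes, the data bytes and the AW FCS, which is why a one-byte write uses 0x07.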


Figure 2-48. WrPCIConfigLocal()

Byte fields, in transmission order: Client Address; Write Length ({0x07, 0x08, 0x0a}); Read Length (0x01); Cmd Code (0xe5); Host ID[7:1] & Retry[0]; PCI Configuration Address (LSB to MSB); Data (1, 2 or 4 bytes, LSB to MSB); AW FCS; FCS; Completion Code; FCS.

Note: The 3-byte PCI configuration address and write data field defined in Figure 2-48 are sent in standard PECI ordering with LSB first and MSB last.

2.5.2.10.2 Supported Responses

The typical client response is a passing FCS, a passing Completion Code and valid data. Under some conditions, the client’s response will indicate a failure.

Table 2-14. WrPCIConfigLocal() Response Definition

Response Meaning

Bad FCS Electrical error or AW FCS failure

Abort FCS Illegal command formatting (mismatched RL/WL/Command Code)

CC: 0x40 Command passed, data is valid.
CC: 0x80 Response timeout. The processor was not able to generate the required response in a timely fashion. Retry is appropriate.
CC: 0x81 Response timeout. The processor is not able to allocate resources for servicing this command at this time. Retry is appropriate.
CC: 0x82 The processor hardware resources required to service this command are in a low power state. Retry may be appropriate after modification of PECI wake mode behavior if appropriate.
CC: 0x90 Unknown/Invalid/Illegal Request
CC: 0x91 PECI control hardware, firmware or associated logic error. The processor is unable to process the request.


2.5.2.10.3 WrPCIConfigLocal() Capabilities

On the processor PECI clients, the PECI WrPCIConfigLocal() command provides a method for programming certain integrated memory controller and IIO functions as described in Table 2-15. Refer to the Intel® Xeon® Processor E5 Product Family Datasheet Volume Two for more details on specific register definitions. It also enables writing to processor REUT (Robust Electrical Unified Test) registers associated with the Intel QPI, PCIe* and DDR3 functions.

Table 2-15. WrPCIConfigLocal() Memory Controller and IIO Device/Function Support

Bus | Device | Function | Offset Range | Description
0000 | 0-5 | 0-7 | 000-FFFh | Integrated I/O (IIO) Configuration Registers
0001 | 15 | 0 | 104h-127h | Integrated Memory Controller MemHot Registers
0001 | 15 | 0 | 180h-1AFh | Integrated Memory Controller SMBus Registers
0001 | 15 | 1 | 080h-0CFh | Integrated Memory Controller RAS Registers (Scrub/Spare)
0001 | 16 | 0, 1, 4, 5 | 104h-18Bh, 1F4h-1FFh | Integrated Memory Controller Thermal Control Registers
0001 | 16 | 2, 3, 6, 7 | 104h-147h | Integrated Memory Controller Error Registers

2.5.3 Client Management

2.5.3.1 Power-up Sequencing

The PECI client will not be available when the PWRGOOD signal is de-asserted. Any transactions on the bus during this time will be completely ignored, and the host will read the response from the client as all zeroes. PECI client initialization is completed approximately 100 µs after the PWRGOOD assertion. This is represented by the start of the PECI Client “Data Not Ready” (DNR) phase in Figure 2-49. While in this phase, the PECI client will respond normally to the Ping() and GetDIB() commands and return the highest processor die temperature of 0x0000 to the GetTemp() command. All other commands will get a ‘Response Timeout’ completion in the DNR phase as shown in Table 2-16. All PECI services with the exception of core MSR space accesses become available ~500 µs after RESET_N de-assertion as shown in Figure 2-49. PECI will be fully functional with all services including core accesses being available when the core comes out of reset upon completion of the RESET microcode execution.

In the event of the occurrence of a fatal or catastrophic error, all PECI services with the exception of core MSR space accesses will be available during the DNR phase to facilitate debug through configuration space accesses.

Table 2-16. PECI Client Response During Power-Up

Command | Response During ‘Data Not Ready’ | Response During ‘Available Except Core Services’
Ping() | Fully functional | Fully functional
GetDIB() | Fully functional | Fully functional
GetTemp() | Client responds with a ‘hot’ reading or 0x0000 | Fully functional
RdPkgConfig() | Client responds with a timeout completion code of 0x81 | Fully functional
WrPkgConfig() | Client responds with a timeout completion code of 0x81 | Fully functional
RdIAMSR() | Client responds with a timeout completion code of 0x81 | Client responds with a timeout completion code of 0x81
RdPCIConfigLocal() | Client responds with a timeout completion code of 0x81 | Fully functional
WrPCIConfigLocal() | Client responds with a timeout completion code of 0x81 | Fully functional
RdPCIConfig() | Client responds with a timeout completion code of 0x81 | Fully functional

In the event that the processor is tri-stated using power-on-configuration controls, the PECI client will also be tri-stated. Processor tri-state controls are described in Section 7.3, “Power-On Configuration (POC) Options”.

Figure 2-49. The Processor PECI Power-up Timeline()

Timeline signals: PWRGOOD; RESET_N; Core execution (idle, Reset uCode, Boot BIOS, running); PECI Client Status (In Reset, Data Not Ready, Available except core services, Fully Operational); SOCKET_ID[1:0] (X, SOCKET ID Valid).

2.5.3.2 Device Discovery

The PECI client is available on all processors. The presence of a PECI enabled processor in a CPU socket can be confirmed by using the Ping() command described in Section 2.5.2.1. Positive identification of the PECI revision number can be achieved by issuing the GetDIB() command. The revision number acts as a reference to the PECI specification document applicable to the processor client definition. Please refer to Section 2.5.2.2 for details on GetDIB response formatting.

2.5.3.3 Client Addressing

The PECI client assumes a default address of 0x30. The PECI client address for the processor is configured through the settings of the SOCKET_ID[1:0] signals. Each processor socket in the system requires that the two SOCKET_ID signals be configured to a different PECI address. Strapping the SOCKET_ID[1:0] pins results in the client addresses shown in Table 2-17. These package strap(s) are evaluated at the assertion of PWRGOOD (as depicted in Figure 2-49). Refer to the appropriate Platform Design Guide (PDG) for recommended resistor values for establishing non-default SOCKET_ID settings.

The client address may not be changed after PWRGOOD assertion, until the next power cycle on the processor. Removal of a processor from its socket or tri-stating a processor will have no impact to the remaining non-tri-stated PECI client addresses. Since each socket in the system should have a unique PECI address, the SOCKET_ID strapping is required to be unique for each socket.
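The strapping rule amounts to adding the 2-bit SOCKET_ID value to the default address of 0x30. A minimal sketch, assuming a strap tied high contributes a 1 and a strap tied to ground contributes a 0 (the function name is hypothetical):

```python
def peci_client_address(socket_id1_high: bool, socket_id0_high: bool) -> int:
    """Return the PECI client address selected by the SOCKET_ID[1:0] straps."""
    socket_id = (int(socket_id1_high) << 1) | int(socket_id0_high)
    return 0x30 + socket_id


def addresses_are_unique(addresses: list) -> bool:
    """Check the requirement that every socket has a distinct PECI address."""
    return len(set(addresses)) == len(addresses)
```

A board-level check could run `addresses_are_unique` over the addresses of all populated sockets before the originator starts polling.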

Table 2-17. SOCKET ID Strapping

SOCKET_ID[1] Strap SOCKET_ID[0] Strap PECI Client Address
Ground Ground 0x30
Ground VTT 0x31
VTT Ground 0x32
VTT VTT 0x33

2.5.3.4 C-states

The processor PECI client may be fully functional in most core and package C-states.

• The Ping(), GetDIB(), GetTemp(), RdPkgConfig() and WrPkgConfig() commands have no measurable impact on CPU power in any of the core or package C-states.

• The RdIAMSR() command will complete normally unless the targeted core is in a C-state that is C3 or deeper. The PECI client will respond with a completion code of 0x82 (see Table 2-22 for definition) for RdIAMSR() accesses in core C-states that are C3 or deeper.

• The RdPCIConfigLocal(), WrPCIConfigLocal(), and RdPCIConfig() commands will not impact the core C-states but may have a measurable impact on the package C-state. The PECI client will successfully return data without impacting package C-state if the resources needed to service the command are not in a low power state.

- If the resources required to service the command are in a low power state, the PECI client will respond with a completion code of 0x82 (see Table 2-22 for definition). If this is the case, setting the “Wake on PECI” mode bit as described in Section 2.5.2.6 can cause a package ‘pop-up’ to the C2 state and enable successful completion of the command. The exact power impact of a pop-up to C2 will vary by product SKU, the C-state from which the pop-up is initiated and the negotiated PECI bit rate.

Table 2-18. Power Impact of PECI Commands vs. C-states

Command Power Impact

Ping() Not measurable
GetDIB() Not measurable
GetTemp() Not measurable
RdPkgConfig() Not measurable
WrPkgConfig() Not measurable
RdIAMSR() Not measurable. PECI client will not return valid data in core C-state that is C3 or deeper
RdPCIConfigLocal() May require package ‘pop-up’ to C2 state
WrPCIConfigLocal() May require package ‘pop-up’ to C2 state
RdPCIConfig() May require package ‘pop-up’ to C2 state


2.5.3.5 S-states

The processor PECI client is always guaranteed to be operational in the S0 sleep state.

• The Ping(), GetDIB(), GetTemp(), RdPkgConfig(), WrPkgConfig(), RdPCIConfigLocal() and WrPCIConfigLocal() will be fully operational in S0 and S1. Responses in S3 or deeper states are dependent on POWERGOOD assertion status.

• The RdPCIConfig() and RdIAMSR() responses are guaranteed in S0 only. Behavior in S1 or deeper states is indeterminate.

• PECI behavior is indeterminate in the S3, S4 and S5 states and responses to PECI originator requests when the PECI client is in these states cannot be guaranteed.

2.5.3.6 Processor Reset

The processor PECI client is fully reset on all RESET_N assertions. Upon deassertion of RESET_N where power is maintained to the processor (otherwise known as a ‘warm reset’), the following are true:

• The PECI client assumes a bus Idle state.

• The Thermal Filtering Constant is retained.

• PECI SOCKET_ID is retained.

• GetTemp() reading resets to 0x0000.

• Any transaction in progress is aborted by the client (as measured by the client no longer participating in the response).

• The processor client is otherwise reset to a default configuration.

The assertion of the CPU_ONLY_RESET signal does not reset the processor PECI client. As such, it will have no impact on the basic PECI commands, namely Ping(), GetTemp() and GetDIB(). However, other PECI commands that use processor resources being reset are likely to receive a ‘resource unavailable’ response until the reset sequence is completed.

2.5.3.7 System Service Processor (SSP) Mode Support

Sockets in SSP mode have limited PECI command support. Only the following PECI commands will be supported while in SSP mode. Other PECI commands are not guaranteed to complete in this mode.

• Ping

• RdPCIConfigLocal

• WrPCIConfigLocal (all uncore and IIO CSRs within the processor PCI configuration space will be accessible)

• RdPkgConfig (Index 0 only)

Sockets remain in SSP mode until the "Go" handshake is received. This is applicable to the following SSP modes.

2.5.3.7.1 BMC INIT Mode

The BMC INIT boot mode is used to provide a quick and efficient means to transfer responsibility for uncore configuration to a service processor like the BMC. In this mode, the socket performs a minimal amount of internal configuration and then waits for the BMC or service processor to complete the initialization.


2.5.3.7.2 Link Init Mode

In cases where the socket is not one Intel QPI hop away from the Firmware Agent socket, or a working link to the Firmware Agent socket cannot be resolved, the socket is placed in Link Init mode. The socket performs a minimal amount of internal configuration and waits for complete configuration by BIOS.

2.5.3.8 Processor Error Handling

Availability of PECI services may be affected by the processor PECI client error status. Server manageability requirements place a strong emphasis on continued availability of PECI services to facilitate logging and debug of the error condition.

• Most processor PECI client services are available in the event of a CAT_ERR_N assertion though they cannot be guaranteed.

• The Ping(), GetDIB(), GetTemp(), RdPkgConfig() and WrPkgConfig() commands will be serviced if the source of the CAT_ERR_N assertion is not in the processor power control unit hardware, firmware or associated register logic. Additionally, the RdPCIConfigLocal() and WrPCIConfigLocal() commands may also be serviced in this case.

• It is recommended that the PECI originator read Index 0/Parameter 5 using the RdPkgConfig() command to debug the CAT_ERR_N assertion.

- The PECI client will return the 0x91 completion code if the CAT_ERR_N assertion is caused by the PCU hardware, firmware or associated logic errors. In such an event, only the Ping(), GetTemp() and GetDIB() PECI commands may be serviced. All other processor PECI services will be unavailable and further debug of the processor error status will not be possible.

- If the PECI client returns a passing completion code, the originator should use the response data to determine the cause of the CAT_ERR_N assertion. In such an event, it is also recommended that the PECI originator determine the exact suite of available PECI client services by issuing each of the PECI commands. The processor will issue ‘timeout’ responses for those services that may not be available.

- If the PECI client continues to return the 0x81 completion code in response to multiple retries of the RdPkgConfig() command, no PECI services, with the exception of the Ping(), GetTemp() and GetDIB(), will be guaranteed.

• The RdIAMSR() command may be serviced during a CAT_ERR_N assertion though it cannot be guaranteed.

2.5.3.9 Originator Retry and Timeout Policy

The PECI originator may need to retry a command if the processor PECI client responds with a ‘response timeout’ completion code or a bad Read FCS. In each instance, the processor PECI client may have started the operation but not completed it yet. When the 'retry' bit is set, the PECI client will ignore a new request if it exactly matches a previous valid request.

The processor PECI client will not clear the semaphore that was acquired to service the request until the originator sends the ‘retry’ request in a timely fashion to successfully retrieve the response data. In the absence of any automatic timeouts, this could tie up shared resources and result in artificial bandwidth conflicts.


2.5.3.10 Enumerating PECI Client Capabilities

The PECI host originator should be designed to support all optional but desirable features from all processors of interest. Each feature has a discovery method and response code that indicates availability on the destination PECI client.

The first step in the enumeration process would be for the PECI host to confirm the Revision Number through the use of the GetDIB() command. The revision number returned by the PECI client processor always maps to the revision number of the PECI specification that it is designed to. The Minor Revision Number as described in Table 2-2 may be used to identify the subset of PECI commands that the processor in question supports for any major PECI revision.

The next step in the enumeration process is to utilize the desired command suite in a real execution context. If the Write FCS response is an Abort FCS or if the data returned includes an “Unknown/Invalid/Illegal Request” completion code (0x90), then the command is unsupported.

Enumerating known commands without real, execution context data, or attempting undefined commands, is dangerous because a write command could result in unexpected behavior if the data is not properly formatted. Methods for enumerating write commands using carefully constructed and innocuous data are possible, but are not guaranteed by the PECI client definition.

This enumeration procedure is not robust enough to detect differences in bit definitions or data interpretation in the message payload or client response. Instead, it is only designed to enumerate discrete features.

2.5.4 Multi-Domain Commands

The processor does not support multiple domains, but it is possible that future products will, and the following tables are included as a reference for domain-specific definitions.

Table 2-19. Domain ID Definition

Domain ID Domain Number

0b01 0

0b10 1

Table 2-20. Multi-Domain Command Code Reference

Command Name | Domain 0 Code | Domain 1 Code
GetTemp() | 0x01 | 0x02
RdPkgConfig() | 0xa1 | 0xa2
WrPkgConfig() | 0xa5 | 0xa6
RdIAMSR() | 0xb1 | 0xb2
RdPCIConfig() | 0x61 | 0x62
RdPCIConfigLocal() | 0xe1 | 0xe2
WrPCIConfigLocal() | 0xe5 | 0xe6
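For reference, the command codes in Table 2-20 can be held in a small lookup, sketched below. The names `DOMAIN0_CODES` and `command_code` are hypothetical; the observation that each Domain 1 code equals the Domain 0 code plus one is read off the table values and is not stated as a rule by this datasheet.

```python
# Domain 0 command codes as listed in Table 2-20.
DOMAIN0_CODES = {
    "GetTemp()": 0x01,
    "RdPkgConfig()": 0xA1,
    "WrPkgConfig()": 0xA5,
    "RdIAMSR()": 0xB1,
    "RdPCIConfig()": 0x61,
    "RdPCIConfigLocal()": 0xE1,
    "WrPCIConfigLocal()": 0xE5,
}


def command_code(name: str, domain: int = 0) -> int:
    """Return the command code for the given command name and domain (0 or 1)."""
    if domain not in (0, 1):
        raise ValueError("Table 2-20 defines codes for domains 0 and 1 only")
    # In Table 2-20 every Domain 1 code is the Domain 0 code plus one.
    return DOMAIN0_CODES[name] + domain
```

Since the processor does not support multiple domains, an originator for this processor would only ever use the default `domain=0`.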


2.5.5 Client Responses

2.5.5.1 Abort FCS

The Client responds with an Abort FCS under the following conditions:

• The decoded command is not understood or not supported on this processor (this includes good command codes with bad Read Length or Write Length bytes).

• Assured Write FCS (AW FCS) failure. Under most circumstances, an Assured Write failure will appear as a bad FCS. However, when an originator issues a poorly formatted command with a miscalculated AW FCS, the client will intentionally abort the FCS in order to guarantee originator notification.

2.5.5.2 Completion Codes

Some PECI commands respond with a completion code byte. These codes are designed to communicate the pass/fail status of the command and may also provide more detailed information regarding the class of pass or fail. For all commands listed in Section 2.5.2 that support completion codes, the definition in the following table applies. Throughout this document, a completion code reference may be abbreviated with ‘CC’.

An originator that is decoding these commands can apply a simple mask as shown in Table 2-21 to determine a pass or fail. Bit 7 is always set on a command that did not complete successfully and is cleared on a passing command.

Table 2-21. Completion Code Pass/Fail Mask

0xxx xxxxb Command passed
1xxx xxxxb Command failed
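The mask in Table 2-21 reduces to a single bit test. A minimal sketch (the function name is hypothetical):

```python
def command_passed(completion_code: int) -> bool:
    """Return True when bit 7 of the completion code is clear (command passed)."""
    if not 0 <= completion_code <= 0xFF:
        raise ValueError("completion code must be a single byte")
    return (completion_code & 0x80) == 0
```

Because the test looks only at bit 7, it also gives an answer for reserved or undefined codes.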

Table 2-22. Device Specific Completion Code (CC) Definition

Completion Code Description
0x40 Command Passed
CC: 0x80 Response timeout. The processor was not able to generate the required response in a timely fashion. Retry is appropriate.
CC: 0x81 Response timeout. The processor was not able to allocate resources for servicing this command. Retry is appropriate.
CC: 0x82 The processor hardware resources required to service this command are in a low power state. Retry may be appropriate after modification of PECI wake mode behavior if appropriate.
CC: 0x83-8F Reserved
CC: 0x90 Unknown/Invalid/Illegal Request
CC: 0x91 PECI control hardware, firmware or associated logic error. The processor is unable to process the request.
CC: 0x92-9F Reserved

Note: The codes explicitly defined in Table 2-22 may be useful in PECI originator response algorithms. Reserved or undefined codes may also be generated by a PECI client device, and the originating agent must be capable of tolerating any code. The Pass/Fail mask defined in Table 2-21 applies to all codes, and general response policies may be based on this information. Refer to Section 2.5.6 for originator response policies and recommendations.
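One way an originator could tolerate any code is to look up the explicitly defined codes and fall back to the pass/fail mask for everything else. The sketch below illustrates that idea; the names and the shortened description strings are hypothetical paraphrases of Table 2-22, not text an implementation is required to use.

```python
# Short labels for the codes explicitly defined in Table 2-22.
DEFINED_CODES = {
    0x40: "command passed",
    0x80: "response timeout (response not generated in time)",
    0x81: "response timeout (resources not allocated)",
    0x82: "resources in low power state",
    0x90: "unknown/invalid/illegal request",
    0x91: "PECI control hardware, firmware or logic error",
}


def describe_completion_code(cc: int) -> tuple:
    """Return (status, description) for any single-byte completion code."""
    if not 0 <= cc <= 0xFF:
        raise ValueError("completion code must be a single byte")
    status = "fail" if cc & 0x80 else "pass"  # Table 2-21 mask: bit 7 set means fail
    return status, DEFINED_CODES.get(cc, "reserved or undefined")
```

A reserved code such as 0x85 is still classified as a failure, which lets a general response policy proceed without knowing its exact meaning.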


2.5.6 Originator Responses

The simplest policy that an originator may employ in response to receipt of a failing completion code is to retry the request. However, certain completion codes or FCS responses are indicative of an error in command encoding and a retry will not result in a different response from the client. Furthermore, the message originator must have a response policy in the event of successive failure responses. Refer to Table 2-23 for originator response guidelines.

Refer to the definition of each command in Section 2.5.2 for a specific definition of possible command codes or FCS responses for a given command. The following response policy definition is generic, and more advanced response policies may be employed at the discretion of the originator developer.

Table 2-23. Originator Response Guidelines

Response | After 1 Attempt | After 3 Attempts
Bad FCS | Retry | Fail with PECI client device error.
Abort FCS | Retry | Fail with PECI client device error if command was not illegal or malformed.
CC: 0x8x | Retry | The PECI client has failed in its attempts to generate a response. Notify application layer.
CC: 0x9x | Abandon any further attempts and notify application layer | N/A
None (all 0’s) | Force bus idle (drive low) for 1 mS and retry | Fail with PECI client device error. Client may not be alive or may be otherwise unresponsive (for example, it could be in RESET).
CC: 0x4x | Pass | N/A
Good FCS | Pass | N/A
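The guidelines in Table 2-23 can be sketched as a small decision function. This is only an illustration under stated assumptions: the response labels ("bad_fcs", "abort_fcs", "none", "cc"), the `Action` names and the choice to keep retrying until the third attempt are hypothetical, since the table only gives guidance after 1 and after 3 attempts.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    PASS = "pass"
    RETRY = "retry"
    IDLE_THEN_RETRY = "force bus idle for 1 ms, then retry"
    NOTIFY_APP = "abandon and notify application layer"
    CLIENT_ERROR = "fail with PECI client device error"


MAX_ATTEMPTS = 3  # Table 2-23 gives guidance after 1 and after 3 attempts.


def next_action(kind: str, attempts: int, cc: Optional[int] = None) -> Action:
    """Pick a response to the latest result of a PECI request.

    kind     -- "good_fcs", "bad_fcs", "abort_fcs", "none" or "cc"
    attempts -- number of attempts made so far, starting at 1
    cc       -- completion code, required when kind == "cc"
    """
    exhausted = attempts >= MAX_ATTEMPTS
    if kind == "good_fcs":
        return Action.PASS
    if kind in ("bad_fcs", "abort_fcs"):
        # For Abort FCS the table fails only if the command was well formed.
        return Action.CLIENT_ERROR if exhausted else Action.RETRY
    if kind == "none":
        return Action.CLIENT_ERROR if exhausted else Action.IDLE_THEN_RETRY
    if kind == "cc" and cc is not None:
        family = cc & 0xF0
        if family == 0x40:
            return Action.PASS
        if family == 0x90:
            return Action.NOTIFY_APP
        if family == 0x80:
            return Action.NOTIFY_APP if exhausted else Action.RETRY
        # Codes not listed in Table 2-23: fall back to the Table 2-21 mask
        # (an assumption of this sketch).
        if (cc & 0x80) == 0:
            return Action.PASS
        return Action.NOTIFY_APP if exhausted else Action.RETRY
    raise ValueError("unknown response kind")
```

For example, `next_action("cc", 1, 0x81)` suggests a retry, while the same code on the third attempt suggests notifying the application layer.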

2.5.7 DTS Temperature Data

2.5.7.1 Format

The temperature is formatted in a 16-bit, 2’s complement value representing a number of 1/64 degrees centigrade. This format allows temperatures in a range of ±512° C to be reported to approximately a 0.016° C resolution.

Figure 2-50. Temperature Sensor Data Format

[Figure: 16-bit field, MSB first. Bit 15: Sign (S). Bits 14:6: Integer Value (0-511). Bits 5:0: Fractional Value (units of 1/64, ~0.016).]
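The format above can be decoded with a few lines of arithmetic. The sketch below assumes the raw reading has already been assembled into a 16-bit unsigned integer; the function name `decode_peci_temp` is hypothetical.

```python
def decode_peci_temp(raw: int) -> float:
    """Convert a 16-bit two's complement PECI reading to degrees centigrade.

    The value counts units of 1/64 degree, so dividing the signed
    integer by 64 yields the temperature.
    """
    if not 0 <= raw <= 0xFFFF:
        raise ValueError("raw must be a 16-bit value")
    # Bit 15 is the sign bit: fold the upper half into negative numbers.
    signed = raw - 0x10000 if raw & 0x8000 else raw
    return signed / 64.0
```

As a usage example, a raw value of 0xFD80 is -640 as a signed integer, which decodes to -10.0.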

2.5.7.2 Interpretation

The resolution of the processor’s Digital Thermal Sensor (DTS) is approximately 1°C, which can be confirmed by a RDMSR from the IA32_THERM_STATUS MSR where it is architecturally defined. The MSR read will return only bits [13:6] of the PECI temperature sensor data defined in Figure 2-50. PECI temperatures are sent through a configurable low-pass filter prior to delivery in the GetTemp() response data. The output of this filter produces temperatures at the full 1/64°C resolution even though the DTS itself is not this accurate.


Temperature readings from the processor are always negative in a 2’s complement format, and imply an offset from the processor Tjmax (PECI = 0). For example, if the processor Tjmax is 100°C, a PECI thermal reading of -10 implies that the processor is running at approximately 10°C below Tjmax or at 90°C. PECI temperature readings are not reliable at temperatures above Tjmax since the processor is outside its operating range and hence, PECI temperature readings are never positive.

The changes in PECI data counts are approximately linear in relation to changes in temperature in degrees centigrade. A change of ‘1’ in the PECI count represents roughly a temperature change of 1 degree centigrade. This linearity is approximate and cannot be guaranteed over the entire range of PECI temperatures, especially as the offset from the maximum PECI temperature (zero) increases.

2.5.7.3 Temperature Filtering

The processor digital thermal sensor (DTS) provides an improved capability to monitor device hot spots, which inherently leads to more varying temperature readings over short time intervals. Coupled with the fact that typical fan speed controllers may only read temperatures at 4Hz, it is necessary for the thermal readings to reflect thermal trends and not instantaneous readings. Therefore, PECI supports a configurable low-pass temperature filtering function that is expressed by the equation:

$$T_N = (1-\alpha) \cdot T_{N-1} + \alpha \cdot T_{SAMPLE}$$

where $T_N$ and $T_{N-1}$ are the current and previous averaged PECI temperature values respectively, $T_{SAMPLE}$ is the current PECI temperature sample value and the variable $\alpha = 1/2^X$, where ‘X’ is the ‘Thermal Averaging Constant’ that is programmable as described in Section 2.5.2.6.21.
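To show the effect of the filter equation, the following sketch applies it to a short series of samples. It is a software model only, not the processor's implementation; the function names and the sample values are hypothetical.

```python
from typing import Iterable, List


def filter_step(prev_avg: float, sample: float, x: int) -> float:
    """One filter update: T_N = (1 - alpha) * T_(N-1) + alpha * T_SAMPLE."""
    alpha = 1.0 / (1 << x)  # alpha = 1 / 2**X, X = Thermal Averaging Constant
    return (1.0 - alpha) * prev_avg + alpha * sample


def filter_series(samples: Iterable[float], x: int) -> List[float]:
    """Run the filter over a sample series, seeding with the first sample."""
    out: List[float] = []
    avg = None
    for s in samples:
        avg = s if avg is None else filter_step(avg, s, x)
        out.append(avg)
    return out
```

With X = 0 the filter passes each sample through unchanged; larger values of X weight the previous average more heavily, so the output follows trends rather than instantaneous readings.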

2.5.7.4 Reserved Values

Several values well out of the operational range are reserved to signal temperature sensor errors. These are summarized in Table 2-24.

Table 2-24. Error Codes and Descriptions

Error Code | Description
0x8000 | General Sensor Error (GSE)
0x8001 | Reserved
0x8002 | Sensor is operational, but has detected a temperature below its operational range (underflow)
0x8003-0x81ff | Reserved
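A reader of PECI temperature data would typically check for the reserved error range before treating a value as a temperature. The sketch below combines that check with the Tjmax offset described in Section 2.5.7.2. The function name, the return convention and the `tjmax_c` parameter are hypothetical; the actual Tjmax must be obtained from the processor.

```python
from typing import Optional, Tuple

# Defined entries from Table 2-24; other values in the range are reserved.
SENSOR_ERRORS = {
    0x8000: "General Sensor Error (GSE)",
    0x8002: "Temperature below operational range (underflow)",
}


def interpret_reading(raw: int, tjmax_c: float) -> Tuple[Optional[float], Optional[str]]:
    """Return (absolute temperature in degrees C, None) or (None, error text)."""
    if 0x8000 <= raw <= 0x81FF:
        return None, SENSOR_ERRORS.get(raw, f"Reserved error code 0x{raw:04X}")
    # 16-bit two's complement value in units of 1/64 degree, relative to Tjmax.
    signed = raw - 0x10000 if raw & 0x8000 else raw
    return tjmax_c + signed / 64.0, None
```

For instance, with an assumed Tjmax of 100°C, a raw reading of 0xFD80 (an offset of -10) yields 90.0, matching the example in Section 2.5.7.2.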


3 Technologies

3.1 Intel® Virtualization Technology (Intel® VT)

Intel® Virtualization Technology (Intel® VT) makes a single system appear as multiple independent systems to software. This allows multiple, independent operating systems to run simultaneously on a single system. Intel VT comprises technology components to support virtualization of platforms based on Intel architecture microprocessors and chipsets.

• Intel® Virtualization Technology (Intel® VT) for Intel® 64 and IA-32 Intel® Architecture (Intel® VT-x) adds hardware support in the processor to improve the virtualization performance and robustness. Intel VT-x specifications and functional descriptions are included in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3B and is available at http://www.intel.com/products/processor/manuals/index.htm

• Intel® Virtualization Technology (Intel® VT) for Directed I/O (Intel® VT-d) adds processor and uncore implementations to support and improve I/O virtualization performance and robustness. The Intel VT-d spec and other Intel VT documents can be referenced at http://www.intel.com/technology/virtualization/index.htm.

3.1.1 Intel VT-x Objectives

Intel VT-x provides hardware acceleration for virtualization of IA platforms. A Virtual Machine Monitor (VMM) can use Intel VT-x features to provide an improved, more reliable virtualized platform. By using Intel VT-x, a VMM is:

• Robust: VMMs no longer need to use para-virtualization or binary translation. This means that they will be able to run off-the-shelf OS’s and applications without any special steps.

• Enhanced: Intel VT enables VMMs to run 64-bit guest operating systems on IA x86 processors.

• More reliable: Due to the hardware support, VMMs can now be smaller, less complex, and more efficient. This improves reliability and availability and reduces the potential for software conflicts.

• More secure: The use of hardware transitions in the VMM strengthens the isolation of VMs and further prevents corruption of one VM from affecting others on the same system.


3.1.2 Intel VT-x Features

The processor core supports the following Intel VT-x features:

• Extended Page Tables (EPT)
— hardware assisted page table virtualization
— eliminates VM exits from guest OS to the VMM for shadow page-table maintenance

• Virtual Processor IDs (VPID)
— Ability to assign a VM ID to tag processor core hardware structures (for example, TLBs)
— This avoids flushes on VM transitions to give a lower-cost VM transition time and an overall reduction in virtualization overhead.

• Guest Preemption Timer
— Mechanism for a VMM to preempt the execution of a guest OS after an amount of time specified by the VMM. The VMM sets a timer value before entering a guest
— The feature aids VMM developers in flexibility and Quality of Service (QoS) guarantees

• Descriptor-Table Exiting
— Descriptor-table exiting allows a VMM to protect a guest OS from internal (malicious software based) attack by preventing relocation of key system data structures like IDT (interrupt descriptor table), GDT (global descriptor table), LDT (local descriptor table), and TSS (task segment selector).
— A VMM using this feature can intercept (by a VM exit) attempts to relocate these data structures and prevent them from being tampered with by malicious software.

• Pause Loop Exiting (PLE)
— PLE aims to improve virtualization performance and enhance the scaling of virtual machines with multiple virtual processors
— PLE attempts to detect lock-holder preemption in a VM and helps the VMM to make better scheduling decisions


3.1.3 Intel VT-d Objectives

The key Intel VT-d objectives are domain-based isolation and hardware-based virtualization. A domain can be abstractly defined as an isolated environment in a platform to which a subset of host physical memory is allocated. Virtualization allows for the creation of one or more partitions on a single system. This could be multiple partitions in the same operating system, or there can be multiple operating system instances running on the same system – offering benefits such as system consolidation, legacy migration, activity partitioning or security.

3.1.3.1 Intel VT-d Features Supported

The processor supports the following Intel VT-d features:

• Root entry, context entry, and default context

• Support for 4-K page sizes only

• Support for register-based fault recording only (for single entry only) and support for MSI interrupts for faults


— Support for fault collapsing based on Requester ID

• Support for both leaf and non-leaf caching

• Support for boot protection of default page table
— Support for non-caching of invalid page table entries

• Support for hardware based flushing of translated but pending writes and pending reads upon IOTLB invalidation.

• Support for page-selective IOTLB invalidation.

• Support for ARI (Alternative Requester ID - a PCI SIG ECR for increasing the function number count in a PCIe* device) to support IOV devices.

• Improved invalidation architecture

• End point caching support (ATS)

• Interrupt remapping

3.1.4 Intel Virtualization Technology Processor Extensions

The processor supports the following Intel VT Processor Extensions features:

• Large Intel VT-d Pages
— Adds 2 MB and 1 GB page sizes to Intel VT-d implementations
— Matches current support for Extended Page Tables (EPT)
— Ability to share CPU's EPT page-table (with super-pages) with Intel VT-d
— Benefits:
  • Less memory foot-print for I/O page-tables when using super-pages
  • Potential for improved performance - Due to shorter page-walks, allows hardware optimization for IOTLB

• Transition latency reductions expected to improve virtualization performance without the need for VMM enabling. This reduces the VMM overheads further and increases virtualization performance.

3.2 Security Technologies

3.2.1 Intel® Trusted Execution Technology

Intel® Trusted Execution Technology (Intel® TXT) defines platform-level enhancements that provide the building blocks for creating trusted platforms.

The Intel TXT platform helps to provide the authenticity of the controlling environment such that those wishing to rely on the platform can make an appropriate trust decision. The Intel TXT platform determines the identity of the controlling environment by accurately measuring and verifying the controlling software.

Another aspect of the trust decision is the ability of the platform to resist attempts to change the controlling environment. The Intel TXT platform will resist attempts by software processes to change the controlling environment or bypass the bounds set by the controlling environment.

Intel TXT is a set of extensions designed to provide a measured and controlled launch of system software that will then establish a protected environment for itself and any additional software that it may execute.


These extensions enhance two areas:

• The launching of the Measured Launched Environment (MLE).

• The protection of the MLE from potential corruption.

The enhanced platform provides these launch and control interfaces using Safer Mode Extensions (SMX).

The SMX interface includes the following functions:

• Measured/Verified launch of the MLE.

• Mechanisms to ensure the above measurement is protected and stored in a secure location.

• Protection mechanisms that allow the MLE to control attempts to modify itself.

For more information refer to the Intel® Trusted Execution Technology Software Development Guide. For more information on Intel Trusted Execution Technology, see http://www.intel.com/technology/security/

3.2.2 Intel Trusted Execution Technology – Server Extensions

• Software binary compatible with Intel Trusted Execution Technology Server Extensions

• Provides measurement of runtime firmware, including SMM

• Enables run-time firmware in trusted session: BIOS and SSP

• Covers support for existing and expected future Server RAS features

• Only requires portions of BIOS to be trusted, for example, Option ROMs need not be trusted

• Supports S3 State without teardown: Since BIOS is part of the trust chain

3.2.3 Intel® Advanced Encryption Standard Instructions (Intel® AES-NI)

The Intel® AES New Instructions (Intel® AES-NI) enable fast and secure data encryption and decryption using the Advanced Encryption Standard (AES), which is defined by FIPS Publication number 197. Since AES is the dominant block cipher, and it is deployed in various protocols, the new instructions will be valuable for a wide range of applications.

The architecture consists of six instructions that offer full hardware support for AES. Four instructions support AES encryption and decryption, and the other two instructions support AES key expansion. Together, they offer a significant increase in performance compared to pure software implementations.

The Intel AES-NI instructions have the flexibility to support all three standard AES key lengths, all standard modes of operation, and even some nonstandard or future variants.

Beyond improving performance, the Intel AES-NI instructions provide important security benefits. Since the instructions run in data-independent time and do not use lookup tables, they help in eliminating the major timing and cache-based attacks that threaten table-based software implementations of AES. In addition, these instructions make AES simple to implement, with reduced code size. This helps reduce the risk of inadvertent introduction of security flaws, such as difficult-to-detect side channel leaks.


3.2.4 Execute Disable Bit

Intel's Execute Disable Bit functionality can help prevent certain classes of malicious buffer overflow attacks when combined with a supporting operating system.

• Allows the processor to classify areas in memory by where application code can execute and where it cannot.

• When a malicious worm attempts to insert code in the buffer, the processor disables code execution, preventing damage and worm propagation.

3.3 Intel® Hyper-Threading Technology

The processor supports Intel® Hyper-Threading Technology (Intel® HT Technology), which allows an execution core to function as two logical processors. While some execution resources such as caches, execution units, and buses are shared, each logical processor has its own architectural state with its own set of general-purpose registers and control registers. This feature must be enabled via the BIOS and requires operating system support. For more information on Intel Hyper-Threading Technology, see http://www.intel.com/products/ht/hyperthreading_more.htm.

3.4 Intel® Turbo Boost Technology

Intel® Turbo Boost Technology is a feature that allows the processor to opportunistically and automatically run faster than its rated operating frequency if it is operating below power, temperature, and current limits. The result is increased performance in multi-threaded and single threaded workloads. It should be enabled in the BIOS for the processor to operate with maximum performance.

3.4.1 Intel® Turbo Boost Operating Frequency

The processor’s rated frequency assumes that all execution cores are running an application at the thermal design power (TDP). However, under typical operation, not all cores are active. Therefore most applications are consuming less than the TDP at the rated frequency. To take advantage of the available TDP headroom, the active cores can increase their operating frequency.

To determine the highest performance frequency amongst active cores, the processor takes the following into consideration:

• The number of cores operating in the C0 state.

• The estimated current consumption.

• The estimated power consumption.

• The die temperature.

Any of these factors can affect the maximum frequency for a given workload. If the power, current, or thermal limit is reached, the processor will automatically reduce the frequency to stay within its TDP limit.

Note: Intel Turbo Boost Technology is only active if the operating system is requesting the P0 state. For more information on P-states and C-states refer to Section 4, “Power Management”.


3.5 Enhanced Intel SpeedStep® Technology

The processor supports Enhanced Intel SpeedStep® Technology as an advanced means of enabling very high performance while also meeting the power-conservation needs of the platform.

Enhanced Intel SpeedStep Technology builds upon that architecture using design strategies that include the following:

• Separation between Voltage and Frequency Changes. By stepping voltage up and down in small increments separately from frequency changes, the processor is able to reduce periods of system unavailability (which occur during frequency change). Thus, the system is able to transition between voltage and frequency states more often, providing improved power/performance balance.

• Clock Partitioning and Recovery. The bus clock continues running during state transition, even when the core clock and Phase-Locked Loop are stopped, which allows logic to remain active. The core clock is also able to restart more quickly under Enhanced Intel SpeedStep Technology.

For additional information on Enhanced Intel SpeedStep Technology see Section 4.2.1.

3.6 Intel® Intelligent Power Technology


Intel® Intelligent Power Technology conserves power while delivering advanced power-management capabilities at the rack, group, and data center level. It provides the highest system-level performance per watt with “Automated Low Power States” and “Integrated Power Gates”. Improvements to this processor generation are:

• Intel Network Power Management Technology

• Intel Power Tuning Technology

For more information on Intel Intelligent Power Technology, see http://www.intel.com/technology/intelligentpower/.

3.7 Intel® Advanced Vector Extensions (Intel® AVX)

Intel® Advanced Vector Extensions (Intel® AVX) is a new 256-bit vector SIMD extension of Intel Architecture. The introduction of Intel AVX starts with the 2nd Generation Intel® Core™ Processor Family. Intel AVX accelerates the trend of parallel computation in general purpose applications like image, video, and audio processing, engineering applications such as 3D modeling and analysis, scientific simulation, and financial analytics.

Intel AVX is a comprehensive ISA extension of the Intel® 64 Architecture. The main elements of Intel AVX are:

• Support for wider vector data (up to 256-bit) for floating-point computation.

• Efficient instruction encoding scheme that supports 3 operand syntax and headroom for future extensions.

• Flexibility in programming environment, ranging from branch handling to relaxed memory alignment requirements.

• New data manipulation and arithmetic compute primitives, including broadcast, permute, fused-multiply-add, and so forth.


The key advantages of Intel AVX are:

• Performance - Intel AVX can accelerate application performance via data parallelism and scalable hardware infrastructure across existing and new application domains:

— 256-bit vector data sets can be processed up to twice the throughput of 128-bit data sets.
— Application performance can scale up with number of hardware threads and number of cores.
— Application domain can scale out with advanced platform interconnect fabrics, such as Intel QPI.

• Power Efficiency - Intel AVX is extremely power efficient. Incremental power is insignificant when the instructions are unused or scarcely used. Combined with the high performance that it can deliver, applications that lend themselves heavily to using Intel AVX can be much more energy efficient and realize a higher performance-per-watt.

• Extensibility - Intel AVX has built-in extensibility for the future vector extensions:
— OS context management for vector-widths beyond 256 bits is streamlined.
— Efficient instruction encoding allows unlimited functional enhancements:

• Vector width support beyond 256 bits

• 256-bit Vector Integer processing

• Additional computational and/or data manipulation primitives.

• Compatibility - Intel AVX is backward compatible with previous ISA extensions including Intel® SSE4:

— Existing Intel SSE applications/library can:

• Run unmodified and benefit from processor enhancements

• Recompile existing Intel SSE intrinsic using compilers that generate Intel AVX code

• Inter-operate with library ported to Intel AVX

— Applications compiled with Intel AVX can inter-operate with existing Intel SSE libraries.
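Because Intel AVX and Intel SSE code can coexist, software commonly selects a code path at run time based on what the processor and operating system report. The following is a minimal sketch of such a check on Linux, assuming the `flags` line of `/proc/cpuinfo` uses the names `avx` and `sse4_2`. The function names and the path labels are hypothetical, and other operating systems expose this information differently.

```python
from typing import Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Extract the feature flags from the text of /proc/cpuinfo."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def pick_code_path(flags: Set[str]) -> str:
    """Choose the widest vector path that the reported flags allow."""
    if "avx" in flags:
        return "avx"
    if "sse4_2" in flags:
        return "sse4"
    return "scalar"
```

A caller would pass the contents of `/proc/cpuinfo` (for example, read with `open("/proc/cpuinfo").read()`) to `cpu_flags` and dispatch on the result of `pick_code_path`.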

3.8 Intel® Dynamic Power Technology (Intel® DPT)

Intel® Dynamic Power Technology (Intel® DPT) (Memory Power Management) is a platform feature with the ability to transition memory components into various low power states based on workload requirements. The Intel® Xeon® processor E5-1600/E5-2600/E5-4600 product families platform supports Dynamic CKE (hardware assisted) and Memory Self Refresh (software assisted). For further details refer to the ACPI Specifications for Memory Power Management document.

4 Power Management

This chapter provides information on the following power management topics:

• ACPI States

• System States

• Processor Core/Package States

• Integrated Memory Controller (IMC) and System Memory States

• Direct Media Interface Gen 2 (DMI2)/PCI Express* Link States

• Intel QuickPath Interconnect States

4.1 ACPI States Supported

The ACPI states supported by the processor are described in this section.

4.1.1 System States

Table 4-1. System States

State | Description
G0/S0 | Full On
G1/S3-Cold | Suspend-to-RAM (STR). Context saved to memory
G1/S4 | Suspend-to-Disk (STD). All power lost (except wakeup on PCH).
G2/S5 | Soft off. All power lost (except wakeup on PCH). Total reboot.
G3 | Mechanical off. All power removed from system.

4.1.2 Processor Package and Core States

Table 4-2 lists the package C-state support as: 1) the shallowest core C-state that allows entry into the package C-state, 2) the additional factors that will restrict the state from going any deeper, and 3) the actions taken with respect to the Ring Vcc, PLL state and LLC.

Table 4-3 lists the processor core C-states support.

Table 4-2. Package C-State Support

Package C-State | Core States | Limiting Factors | Retention and PLL-Off | LLC Fully Flushed | Notes
PC0 - Active | CC0 | N/A | No | No | 2
PC2 - Snoopable Idle | CC3-CC7 | PCIe/PCH and Remote Socket Snoops; PCIe/PCH and Remote Socket Accesses; Interrupt response time requirement; DMI Sidebands; Configuration Constraints | VccMin, Freq = MinFreq, PLL = ON | No | 2
PC3 - Light Retention | at least one Core in C3 | Core C-state; Snoop Response Time; Interrupt Response Time; Non Snoop Response Time | Vcc = retention, PLL = OFF | No | 2,3,4
PC6 - Deeper Retention | CC6-CC7 | LLC ways open; Snoop Response Time; Non Snoop Response Time; Interrupt Response Time | Vcc = retention, PLL = OFF | No | 2,3,4

Notes:

1. Package C7 is not supported.

2. All package states are defined to be "E" states - such that they always exit back into the LFM point upon execution resume

3. The mapping of actions for PC3, and PC6 are suggestions - microcode will dynamically determine which actions should be taken based on the desired exit latency parameters.

4. CC3/CC6 will all use a voltage below the VccMin operational point; The exact voltage selected will be a function of the snoop and interrupt response time requirements made by the devices (PCIe* and DMI) and the operating system.

Table 4-3. Core C-State Support

Core C-State | Global Clock | PLL | L1/L2 Cache | Core VCC | Context
CC0 | Running | On | Coherent | Active | Maintained
CC1 | Stopped | On | Coherent | Active | Maintained
CC1E | Stopped | On | Coherent | Request LFM | Maintained
CC3 | Stopped | On | Flushed to LLC | Request Retention | Maintained
CC6 | Stopped | Off | Flushed to LLC | Power Gate | Flushed to LLC
CC7 | Stopped | Off | Flushed to LLC | Power Gate | Flushed to LLC

4.1.3 Integrated Memory Controller States

Table 4-4. System Memory Power States

State: Power Up/Normal Operation
Description: CKE asserted. Active Mode, highest power consumption.

State: CKE Power Down
Description: Opportunistic, per rank control after idle time:
• Active Power Down (APD) (default mode)
— CKE de-asserted. Power savings in this mode, relative to active idle state is about 55% of the memory power. Exiting this mode takes 3 – 5 DCLK cycles.
• Pre-charge Power Down Fast Exit (PPDF)
— CKE de-asserted. DLL - On. Also known as Fast CKE. Power savings in this mode, relative to active idle state is about 60% of the memory power. Exiting this mode takes 3 – 5 DCLK cycles.
• Pre-charge Power Down Slow Exit (PPDS)
— CKE de-asserted. DLL - Off. Also known as Slow CKE. Power savings in this mode, relative to active idle state is about 87% of the memory power. Exiting this mode takes 3 – 5 DCLK cycles until the first command is allowed and 16 cycles until first data is allowed.
• Register CKE Power Down:
— IBT-ON mode: Both CKE’s are de-asserted, the Input Buffer Terminators (IBTs) are left “on”.
— IBT-OFF mode: Both CKE’s are de-asserted, the Input Buffer Terminators (IBTs) are turned “off”.

State: Self-Refresh
Description: CKE de-asserted. In this mode, no transactions are executed and the system memory consumes the minimum possible power. Self refresh modes apply to all memory channels for the processor.
• IO-MDLL Off: Option that sets the IO master DLL off when self refresh occurs.
• PLL Off: Option that sets the PLL off when self refresh occurs.
In addition, the register component found on registered DIMMs (RDIMMs) is complemented with the following power down states:
— Clock Stopped Power Down with IBT-On
— Clock Stopped Power Down with IBT-Off

4.1.4 DMI2/PCI Express Link States

Table 4-5. DMI2/PCI Express* Link States

State | Description
L0 | Full on – Active transfer state.
L1 | Lowest Active State Power Management (ASPM) - Longer exit latency.

Note: L1 is only supported when the DMI2/PCI Express* port is operating as a PCI Express* port.

4.1.5 Intel QuickPath Interconnect States

Table 4-6. Intel QPI States

State | Description
L0 | Link on. This is the power on active working state.
L0p | A lower power state from L0 that reduces the link from full width to half width
L1 | A low power state with longer latency and lower power than L0s and is activated in conjunction with package C-states below C0.

4.1.6 G, S, and C State Combinations

Table 4-7. G, S and C State Combinations

Global (G) State | Sleep (S) State | Processor Core | Processor State | System Clocks | Description
G0 | S0 | C0 | Full On | On | Full On
G0 | S0 | C1/C1E | Auto-Halt | On | Auto-Halt
G0 | S0 | C3 | Deep Sleep | On | Deep Sleep
G0 | S0 | C6/C7 | Deep Power Down | On | Deep Power Down
G1 | S3 | Power off | | Off, except RTC | Suspend to RAM
G1 | S4 | Power off | | Off, except RTC | Suspend to Disk
G2 | S5 | Power off | | Off, except RTC | Soft Off
G3 | N/A | Power off | | Power off | Hard off

4.2 Processor Core/Package Power Management

While executing code, Enhanced Intel SpeedStep Technology optimizes the processor’s frequency and core voltage based on workload. Each frequency and voltage operating point is defined by ACPI as a P-state. When the processor is not executing code, it is idle. A low-power idle state is defined by ACPI as a C-state. In general, lower power C-states have longer entry and exit latencies.

4.2.1 Enhanced Intel SpeedStep® Technology

The following are the key features of Enhanced Intel SpeedStep Technology:

• Multiple frequency and voltage points for optimal performance and power efficiency. These operating points are known as P-states.

• Frequency selection is software controlled by writing to processor MSRs. The voltage is optimized based on temperature, leakage, power delivery loadline and dynamic capacitance.
— If the target frequency is higher than the current frequency, VCC is ramped up to an optimized voltage. This voltage is signaled by the SVID Bus to the voltage regulator. Once the voltage is established, the PLL locks on to the target frequency.
— If the target frequency is lower than the current frequency, the PLL locks to the target frequency, then transitions to a lower voltage by signaling the target voltage on the SVID Bus.
— All active processor cores share the same frequency and voltage. In a multi-core processor, the highest frequency P-state requested amongst all active cores is selected.
— Software-requested transitions are accepted at any time. New in this processor generation, the processor can preempt a transition that is in progress and complete the new request without waiting for the earlier request to complete.

• The processor controls voltage ramp rates internally to ensure glitch-free transitions.

• Because there is low transition latency between P-states, a significant number of transitions per second are possible.
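The ordering rules in the list above (voltage first when speeding up, frequency first when slowing down, and the highest request among active cores wins) can be modelled as follows. This is a descriptive sketch only: the function names, step strings and frequencies are hypothetical, and real transitions are carried out by the processor, not by software.

```python
from typing import Iterable, List


def resolve_target_mhz(active_core_requests_mhz: Iterable[int]) -> int:
    """All active cores share one frequency: the highest request is selected."""
    return max(active_core_requests_mhz)


def transition_steps(current_mhz: int, target_mhz: int) -> List[str]:
    """Return the order of voltage and PLL actions for a P-state change."""
    if target_mhz > current_mhz:
        # Going faster: raise the voltage before relocking the PLL.
        return ["ramp voltage up via SVID", "lock PLL to target frequency"]
    if target_mhz < current_mhz:
        # Going slower: relock the PLL first, then lower the voltage.
        return ["lock PLL to target frequency", "signal lower voltage via SVID"]
    return []
```

For example, if two active cores request 1200 and 2600 (hypothetical values in MHz), the shared target is 2600, and a move up from 1200 ramps the voltage before the PLL relocks.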

4.2.2 Low-Power Idle States

When the processor is idle, low-power idle states (C-states) are used to save power. More power savings actions are taken for numerically higher C-states. However, higher C-states have longer exit and entry latencies. Resolution of C-states occurs at the thread, processor core, and processor package level. Thread level C-states are available if Hyper-Threading Technology is enabled. Entry and exit of the C-States at the thread and core level are shown in Figure 4-2.

Figure 4-1. Idle Power Management Breakdown of the Processor Cores

[Figure: hierarchy of the Processor Package State, containing Core 0 State through Core N State, each with Thread 0 and Thread 1.]

Figure 4-2. Thread and Core C-State Entry and Exit

[Figure: entry requests for each idle state. C1: MWAIT(C1), HLT. C1E: MWAIT(C1), HLT (C1E Enabled). C3: MWAIT(C3), P_LVL2 I/O Read. C6: MWAIT(C6), P_LVL3 I/O Read. C7: MWAIT(C7), P_LVL4 I/O Read.]

While individual threads can request low power C-states, power saving actions only take place once the core C-state is resolved. Core C-states are automatically resolved by the processor. For thread and core C-states, a transition to and from C0 is required before entering any other C-state.
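One way to picture core C-state resolution is as taking the shallowest state requested by any thread on the core. This section does not state the rule explicitly, so the sketch below treats it as an assumption; the depth ordering and the function name are likewise hypothetical.

```python
from typing import Sequence

# Assumed ordering from shallowest to deepest idle state.
DEPTH = {"C0": 0, "C1": 1, "C1E": 2, "C3": 3, "C6": 4, "C7": 5}


def resolve_core_cstate(thread_requests: Sequence[str]) -> str:
    """Resolve a core C-state as the shallowest state any thread requests.

    A core cannot go deeper than its least idle thread, so a thread
    in C0 keeps the whole core in C0.
    """
    return min(thread_requests, key=lambda state: DEPTH[state])
```

Under this assumption, a core whose two threads request C6 and C1 resolves to C1, and power saving actions for C6 are taken only once both threads request C6 or deeper.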

4.2.3 Requesting Low-Power Idle States

The core C-state will be C1E if all actives cores have also resolved a core C1 state or higher.


The primary software interfaces for requesting low power idle states are through the MWAIT instruction with sub-state hints and the HLT instruction (for C1 and C1E). However, software may make C-state requests using the legacy method of I/O reads from the ACPI-defined processor clock control registers, referred to as P_LVLx. This method of requesting C-states provides legacy support for operating systems that initiate C-state transitions via I/O reads.


For legacy operating systems, P_LVLx I/O reads are converted within the processor to the equivalent MWAIT C-state request. Therefore, P_LVLx reads do not directly result in I/O reads to the system. The feature, known as I/O MWAIT redirection, must be enabled in the BIOS. To enable it, refer to the Intel® 64 and IA-32 Architectures Software Developer’s Manual (SDM) Volumes 1, 2, and 3.

Note: The P_LVLx I/O Monitor address needs to be set up before using the P_LVLx I/O read interface. Each P_LVLx is mapped to the supported MWAIT(Cx) instruction as follows.

Table 4-8. P_LVLx to MWAIT Conversion

P_LVLx    MWAIT(Cx)    Notes
P_LVL2    MWAIT(C3)    The P_LVL2 base address is defined in the PMG_IO_CAPTURE MSR, described in the Intel® 64 and IA-32 Architectures Software Developer’s Manual (SDM) Volumes 1, 2, and 3.
P_LVL3    MWAIT(C6)    C6. No sub-states allowed.
P_LVL4    MWAIT(C7)    C7. No sub-states allowed.

The BIOS can write to the C-state range field of the PMG_IO_CAPTURE MSR to restrict the range of I/O addresses that are trapped and emulate MWAIT-like functionality. Any P_LVLx reads outside of this range do not cause an I/O redirection to an MWAIT(Cx)-like request. They fall through like a normal I/O instruction.

Note: When P_LVLx I/O instructions are used, MWAIT substates cannot be defined. The MWAIT substate is always zero if I/O MWAIT redirection is used. By default, P_LVLx I/O redirections enable the MWAIT “break on EFLAGS.IF” feature, which triggers a wakeup on an interrupt even if interrupts are masked by EFLAGS.IF.
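The conversion in Table 4-8, together with the range restriction and the fixed zero sub-state, can be sketched as a small decision function. This is an illustrative model, not the processor's implementation: the function name, the `trapped_levels` parameter (standing in for the C-state range field), and the assumption that P_LVL3 and P_LVL4 occupy the I/O addresses immediately after P_LVL2 are all hypothetical.

```python
from typing import Optional, Tuple

# Table 4-8 mapping, indexed by offset from the P_LVL2 address.
# Assumption: P_LVL3 and P_LVL4 follow P_LVL2 at consecutive I/O addresses.
_OFFSET_TO_CSTATE = ("C3", "C6", "C7")


def redirect_io_read(port: int, lvl2_base: int, trapped_levels: int) -> Optional[Tuple[str, int]]:
    """Model I/O MWAIT redirection for one I/O read.

    Returns (target C-state, MWAIT sub-state) when the read is trapped,
    or None when it falls through as a normal I/O instruction.
    trapped_levels is a hypothetical stand-in for the C-state range field:
    1 traps only P_LVL2, 2 adds P_LVL3, 3 adds P_LVL4.
    """
    offset = port - lvl2_base
    limit = min(trapped_levels, len(_OFFSET_TO_CSTATE))
    if 0 <= offset < limit:
        # The sub-state is always zero when I/O MWAIT redirection is used.
        return (_OFFSET_TO_CSTATE[offset], 0)
    return None
```

With a hypothetical base address of 0x414 and all three levels trapped, a read of 0x415 would be treated as MWAIT(C6) with sub-state 0, while a read of 0x420 would fall through.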

4.2.4 Core C-states

The following are general rules for all core C-states, unless specified otherwise:

• A core C-State is determined by the lowest numerical thread state (for example, Thread 0 requests C1E while Thread 1 requests C3, resulting in a core C1E state). See Table 4-7.

• A core transitions to C0 state when:

— an interrupt occurs.

— there is an access to the monitored address if the state was entered via an MWAIT instruction.

• For core C1/C1E, and core C3, an interrupt directed toward a single thread wakes only that thread. However, since both threads are no longer at the same core C-state, the core resolves to C0.

• An interrupt only wakes the target thread for both C3 and C6 states. Any interrupt coming into the processor package may wake any core.
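The first rule above (the core C-state is the lowest numerical thread state) can be sketched as follows. The names are hypothetical, and placing C1E between C1 and C3 in the ordering is an assumption drawn from the example in that rule.

```python
from typing import Sequence

# Thread C-states from shallowest to deepest.
# Assumption: C1E is ordered between C1 and C3.
THREAD_ORDER = ("C0", "C1", "C1E", "C3", "C6", "C7")


def resolve_core_cstate(thread_states: Sequence[str]) -> str:
    """Return the core C-state: the shallowest state requested by any thread."""
    return min(thread_states, key=THREAD_ORDER.index)
```

This reproduces the example in the text: Thread 0 requesting C1E and Thread 1 requesting C3 resolves to a core C1E state. A thread that wakes to C0 likewise pulls the whole core to C0.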

4.2.4.1 Core C0 State

The normal operating state of a core where code is being executed.

4.2.4.2 Core C1/C1E State

C1/C1E is a low power state entered when all threads within a core execute a HLT or MWAIT(C1/C1E) instruction.


A System Management Interrupt (SMI) handler returns execution to either Normal state or the C1/C1E state. See the Intel® 64 and IA-32 Architectures Software Developer’s Manual (SDM) Volumes 1, 2, and 3 for more information.

While a core is in C1/C1E state, it processes bus snoops and snoops from other threads. For more information on C1E, see Section 4.2.5.2, “Package C1/C1E”.

4.2.4.3 Core C3 State

Individual threads of a core can enter the C3 state by initiating a P_LVL2 I/O read to the P_BLK or an MWAIT(C3) instruction. A core in C3 state flushes the contents of its L1 instruction cache, L1 data cache, and L2 cache to the shared L3 cache, while maintaining its architectural state. All core clocks are stopped at this point. Because the core’s caches are flushed, the processor does not wake any core that is in the C3 state when either a snoop is detected or when another core accesses cacheable memory.

4.2.4.4 Core C6 State

Individual threads of a core can enter the C6 state by initiating a P_LVL3 I/O read or an MWAIT(C6) instruction. Before entering core C6, the core flushes its caches and saves its architectural state to a dedicated SRAM; the saved state is held in the uncore. Once the state save is complete, the core voltage is reduced to zero volts. During exit, the core is powered on and its architectural state is restored.

4.2.4.5 Core C7 State

Individual threads of a core can enter the C7 state by initiating a P_LVL4 I/O read to the P_BLK or by an MWAIT(C7) instruction. Core C7 and core C7 substate are the same as Core C6. The processor does not support LLC flush under any condition.

4.2.4.6 C-State Auto-Demotion

In general, deeper C-states such as C6 or C7 have long latencies and have higher energy entry/exit costs. The resulting performance and energy penalties become significant when the entry/exit frequency of a deeper C-state is high. In order to increase residency in deeper C-states, the processor supports C-state auto-demotion.

There are two C-State auto-demotion options:

• C6/C7 to C3

• C3/C6/C7 to C1

The decision to demote a core from C6/C7 to C3 or C3/C6/C7 to C1 is based on each core’s immediate residency history. Upon each core C6/C7 request, the core C-state is demoted to C3 or C1 until a sufficient amount of residency has been established. At that point, a core is allowed to go into C3/C6 or C7. Each option can be run concurrently or individually.

This feature is disabled by default. BIOS must enable it in the PMG_CST_CONFIG_CONTROL register. The auto-demotion policy is also configured by this register. See the Intel® 64 and IA-32 Architectures Software Developer’s Manual (SDM) Volumes 1, 2, and 3 for C-state configurations.
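The two auto-demotion options can be sketched as a simple decision function. This is only a model under stated assumptions: the datasheet does not specify how residency history is evaluated, so it is reduced here to a single boolean, and giving the C1 option precedence when both options are enabled is an assumption. All names are hypothetical.

```python
def apply_auto_demotion(requested: str,
                        c3_demotion_enabled: bool,
                        c1_demotion_enabled: bool,
                        residency_established: bool) -> str:
    """Return the core C-state actually entered for a given request.

    residency_established is a hypothetical summary of the core's
    immediate residency history; True means no demotion is needed.
    """
    if residency_established:
        return requested
    # Option "C3/C6/C7 to C1" (assumed to take precedence when both are on).
    if c1_demotion_enabled and requested in ("C3", "C6", "C7"):
        return "C1"
    # Option "C6/C7 to C3".
    if c3_demotion_enabled and requested in ("C6", "C7"):
        return "C3"
    return requested
```

With only the C6/C7-to-C3 option enabled, a C6 request from a core without sufficient residency is entered as C3; once residency is established, the same request is honored as C6.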


4.2.5 Package C-States

The processor supports C0, C1/C1E, C2, C3, and C6 power states. The following is a summary of the general rules for package C-state entry. These apply to all package C-states unless specified otherwise:

• A package C-state request is determined by the lowest numerical core C-state amongst all cores.

• A package C-state is automatically resolved by the processor depending on the core idle power states and the status of the platform components.

— Each core can be at a lower idle power state than the package if the platform does not grant the processor permission to enter a requested package C-state.

— The platform may allow additional power savings to be realized in the processor.

• For package C-states, the processor is not required to enter C0 before entering any other C-state.
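The entry rules above can be sketched in a few lines. The sketch is restricted to the states C0, C1, C3 and C6 and ignores C1E and the interim C2 state; the `platform_limit` parameter is a hypothetical stand-in for the deepest package state the platform currently permits.

```python
from typing import Sequence

# Package-level ordering, shallowest to deepest (C1E and C2 omitted).
PACKAGE_ORDER = ("C0", "C1", "C3", "C6")


def resolve_package_cstate(core_states: Sequence[str], platform_limit: str = "C6") -> str:
    """Return the package C-state for the given core C-states.

    The request is the lowest numerical core C-state amongst all cores;
    the result is capped by what the platform grants (hypothetical input).
    """
    request = min(core_states, key=PACKAGE_ORDER.index)
    return min(request, platform_limit, key=PACKAGE_ORDER.index)
```

For a dual-core package with one core in C3 and the other in C6, the request resolves to C3. If the platform grants nothing deeper than C1, the package stays at C1 even though both cores may sit in C6.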

The processor exits a package C-state when a break event is detected. Depending on the type of break event, the processor does the following:

• If a core break event is received, the target core is activated and the break event message is forwarded to the target core.

— If the break event is not masked, the target core enters the core C0 state and the processor enters package C0.

— If the break event is masked, the processor attempts to re-enter its previous package state.

• If the break event was due to a memory access or snoop request:

— But the platform did not request to keep the processor in a higher package C-state, the package returns to its previous C-state.

— And the platform requests a higher power C-state, the memory access or snoop request is serviced and the package remains in the higher power C-state.


The package C-states fall into two categories: independent and coordinated. C0/C1/C1E are independent, while C2/C3/C6 are coordinated.

Starting with the 2nd Generation Intel® Core™ Processor Family, package C-states are based on exit latency requirements which are accumulated from the PCIe* devices, PCH, and software sources. The level of power savings that can be achieved is a function of the exit latency requirement from the platform. As a result, there is no fixed relationship between the coordinated C-state of a package and the power savings that will be obtained from the state. Coordinated package C-states offer a range of power savings which is a function of the guaranteed exit latency requirement from the platform.

There is also a concept of Execution Allowed (EA). When EA status is 0, the cores in a socket are in C3 or a deeper state, and the socket initiates a request to enter a coordinated package C-state. The coordination is across all sockets and the PCH.

Table 4-9 shows an example of a dual-core processor package C-state resolution. Figure 4-3 summarizes package C-state transitions with package C2 as the interim state between PC0 and PC1 prior to PC3 and PC6.


Table 4-9. Coordination of Core Power States at the Package Level

Package C-State    Core 1: C0    Core 1: C1    Core 1: C3    Core 1: C6
Core 0: C0         C0            C0            C0            C0
Core 0: C1         C0            C1 (1)        C1 (1)        C1 (1)
Core 0: C3         C0            C1 (1)        C3            C3
Core 0: C6         C0            C1 (1)        C3            C6

Notes:

1. The package C-state will be C1E if all active cores have resolved a core C1 state or higher.

Figure 4-3. Package C-State Entry and Exit

4.2.5.1 Package C0

The normal operating state for the processor. The processor remains in the normal state when at least one of its cores is in the C0 or C1 state or when the platform has not granted permission to the processor to go into a low power state. Individual cores may be in lower power idle states while the package is in C0.

4.2.5.2 Package C1/C1E

No additional power reduction actions are taken in the package C1 state. However, if the C1E substate is enabled, the processor automatically transitions to the lowest supported core clock frequency, followed by a reduction in voltage. Autonomous power reduction actions, which are based on idle timers, can trigger depending on the activity in the system.

The package enters the C1 low power state when:

• At least one core is in the C1 state.

• The other cores are in a C1 or lower power state.

The package enters the C1E state when:

• All cores have directly requested C1E via MWAIT(C1) with a C1E sub-state hint.

• All cores are in a power state lower than C1/C1E but the package low power state is limited to C1/C1E via the PMG_CST_CONFIG_CONTROL MSR.

• All cores have requested C1 using HLT or MWAIT(C1) and C1E auto-promotion is enabled in POWER_CTL.

No notification to the system occurs upon entry to C1/C1E.
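The three package C1E entry conditions listed above can be expressed as a single predicate. Treating the three bullets as alternatives (any one is sufficient) is an interpretation of the list, and every parameter name below is hypothetical; each boolean summarizes a condition that the processor evaluates internally.

```python
def package_enters_c1e(all_cores_requested_c1e: bool,
                       all_cores_deeper_than_c1e: bool,
                       package_limited_to_c1e: bool,
                       all_cores_requested_c1: bool,
                       c1e_auto_promotion: bool) -> bool:
    """Return True if any of the three listed C1E entry conditions holds."""
    return (
        # All cores requested C1E directly via MWAIT(C1) with a C1E hint.
        all_cores_requested_c1e
        # All cores are deeper than C1/C1E, but the package limit is C1/C1E.
        or (all_cores_deeper_than_c1e and package_limited_to_c1e)
        # All cores requested C1 and C1E auto-promotion is enabled.
        or (all_cores_requested_c1 and c1e_auto_promotion)
    )
```

For instance, cores that all requested plain C1 through HLT lead to package C1E only when auto-promotion is enabled.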

4.2.5.3 Package C2 State

Package C2 state is an intermediate state which represents the point at which the system level coordination is in progress. The package cannot reach this state unless all cores are in at least C3.

The package will remain in C2 when:

• It is waiting for a coordinated response.

• The coordinated exit latency requirements are too stringent for the package to take any power saving actions.


If the exit latency requirements are high enough, the package will transition to C3 or C6 depending on the state of the cores.
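The role of package C2 as a waiting point can be sketched as follows. This is a simplified model with several assumptions: exit latencies are passed in by the caller because this section gives no values, the fallback from C6 to C3 when only the C3 latency fits the budget is an assumption, and all names are hypothetical.

```python
from typing import Mapping

# Coordinated package states reachable from C2, shallowest to deepest.
_COORDINATED = ("C3", "C6")


def resolve_from_c2(coordinated_response: bool,
                    latency_budget_us: float,
                    deepest_allowed: str,
                    exit_latency_us: Mapping[str, float]) -> str:
    """Decide whether the package stays in C2 or moves to C3/C6.

    deepest_allowed is the deepest state the cores permit ("C3" or "C6").
    exit_latency_us maps each state to a hypothetical exit latency.
    """
    if not coordinated_response:
        return "C2"  # still waiting for the coordinated response
    candidates = _COORDINATED[: _COORDINATED.index(deepest_allowed) + 1]
    for state in reversed(candidates):  # try the deepest state first
        if exit_latency_us[state] <= latency_budget_us:
            return state
    return "C2"  # requirements too stringent for any power saving action
```

With hypothetical exit latencies of 50 µs for C3 and 100 µs for C6, a budget of 60 µs would allow C3 but not C6, and a budget of 10 µs would keep the package in C2.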

4.2.5.4 Package C3 State

A processor enters the package C3 low power state when:

• At least one core is in the C3 state.

• The other cores are in a C3 or lower power state, and the processor has been granted permission by the platform.

• L3 shared cache retains context and becomes inaccessible in this state.

• Additional power savings actions, as allowed by the exit latency requirements, include putting Intel QPI and PCIe* links in L1; the uncore is not available, and further voltage reduction can be taken.

In package C3, the ring will be off and as a result no accesses to the LLC are possible. The content of the LLC is preserved.

4.2.5.5 Package C6 State

A processor enters the package C6 low power state when:

• At least one core is in the C6 state.

• The other cores are in a C6 or lower power state, and the processor has been granted permission by the platform.

• L3 shared cache retains context and becomes inaccessible in this state.


In package C6 state, all cores have saved their architectural state and have had their core voltages reduced to zero volts. The LLC retains context, but no accesses can be made to the LLC in this state; the cores must break out to the internal state package C2 for snoops to occur.

4.2.6 Package C-State Power Specifications

The table below lists the processor package C-state power specifications for various processor SKUs.

Table 4-10. Package C-State Power Specifications

TDP SKUs                 C1E (W)    C3 (W)    C6 (W)

8-Core / 6-Core
150W (8-core)            58         27        15
135W (8-core)            47         22        15
130W (8-core)            47         22        15
130W (6-core)            53         35        21
130W (6-core 1S WS)      53         35        21
115W (8-core)            47         22        15
95W (8-core)             47         22
    35 (E5-2660)
95W (6-core)             48         22
    35 (E5-2620)
70W (8-core)             39         20        14
60W (6-core)             38         20        14
LV95W-8C (8-core)        47         22        15
LV70W-8C (8-core)        39         20        14

4-Core / 2-Core
130W (4-core)            53         28        16
130W (4-Core 1S WS)      53         28        16
95W (4-core)             47         22        15
80W (4-core)             42         21
    30 (E5-2603)
80W (2-core)             42         30        21
    21 (E5-2620)

Notes:

1. Package C1E power specified at Tcase = 60°C.

2. Package C3/C6 power specified at Tcase = 50°C.
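As a worked example of reading Table 4-10, the relative saving between two package C-states can be computed directly from the listed values. The helper name `reduction_pct` is hypothetical; the figures used in the note after the code are the 150W (8-core) row (C1E 58 W, C3 27 W, C6 15 W). Note that C1E and C3/C6 are specified at different case temperatures, so the comparison is only indicative.

```python
def reduction_pct(from_watts: float, to_watts: float) -> float:
    """Percentage power reduction when moving from one state to another."""
    return 100.0 * (from_watts - to_watts) / from_watts
```

For the 150W (8-core) SKU, moving from package C1E to C3 gives `reduction_pct(58, 27)`, roughly 53%, and moving from C1E to C6 gives `reduction_pct(58, 15)`, roughly 74%.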

4.3 System Memory Power Management

The DDR3 power states can be summarized as the following:

• Normal operation (highest power consumption).


• CKE Power-Down: Opportunistic, per rank control after idle time. There may be different levels.

— Active Power-Down.

— Precharge Power-Down with Fast Exit.

— Precharge Power-Down with Slow Exit.

• Self Refresh: In this mode no transaction is executed. The DDR consumes the minimum possible power.

4.3.1 CKE Power-Down

The CKE input land is used to enter and exit different power-down modes. The memory controller has a configurable activity timeout for each rank. Whenever no reads are present to a given rank for the configured interval, the memory controller will transition the rank to power-down mode.

The memory controller transitions the DRAM to power-down by de-asserting CKE and driving a NOP command. The memory controller will tri-state all DDR interface lands except CKE (de-asserted) and ODT while in power-down. The memory controller will transition the DRAM out of power-down state by synchronously asserting CKE and driving a NOP command.

When CKE is off the internal DDR clock is disabled and the DDR power is significantly reduced.


The DDR defines three levels of power-down:

• Active power-down.

• Precharge power-down fast exit.

• Precharge power-down slow exit.

4.3.2 Self Refresh

The Power Control Unit (PCU) may request the memory controller to place the DRAMs in self refresh state. Self refresh per channel is supported. The BIOS can put the channel in self refresh if software remaps memory to use a subset of all channels. Also, processor channels can enter self refresh autonomously, without PCU instruction, when the package is in a package C0 state.

4.3.2.1 Self Refresh Entry

Self refresh entrance can be either disabled or triggered by an idle counter. The idle counter always clears with any access to the memory controller and remains clear as long as the memory controller is not drained. As soon as the memory controller is drained, the counter starts counting, and when it reaches the idle-count, the memory controller will place the DRAMs in self refresh state.

Power may be removed from the memory controller core at this point. But the VCCD supply (1.5 V or 1.35 V) to the DDR IO must be maintained.
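The idle-counter behavior described in this section can be modeled with a small class. This is an illustrative sketch only: the class name, the per-cycle `tick` interface, and the two boolean inputs are hypothetical simplifications of what the memory controller observes.

```python
class SelfRefreshIdleCounter:
    """Model of the self refresh entry trigger described above."""

    def __init__(self, idle_count: int, enabled: bool = True) -> None:
        self.idle_count = idle_count  # configured threshold
        self.enabled = enabled        # entry can be disabled entirely
        self.counter = 0

    def tick(self, access: bool, drained: bool) -> bool:
        """Advance one cycle; return True when self refresh entry triggers."""
        if not self.enabled:
            return False
        if access or not drained:
            # Any access clears the counter, and it stays clear
            # until the memory controller is drained.
            self.counter = 0
            return False
        self.counter += 1
        return self.counter >= self.idle_count
```

With `idle_count=2`, the trigger fires on the second consecutive cycle in which the memory controller is drained and sees no access; any access in between restarts the count.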

4.3.2.2 Self Refresh Exit

Self refresh exit can be triggered either by a message from an external unit or as a reaction to an incoming transaction.


4.3.2.3 DLL and PLL Shutdown

Self refresh, according to configuration, may be a trigger for master DLL shut-down and PLL shut-down. The master DLL shut-down is issued by the memory controller after the DRAMs have entered self refresh.

The PLL shut-down and wake-up are issued by the PCU. The memory controller gets a signal from the PLL indicating that the memory controller can start working again.

4.3.3 DRAM I/O Power Management

Unused signals are tristated to save power. This includes all signals associated with an unused memory channel.

The I/O buffer for an unused signal should be tristated (output driver disabled), and the input receiver (differential sense-amp) should be disabled. The input path must be gated to prevent spurious results due to noise on the unused signals (typically handled automatically when the input receiver is disabled).

4.4 DMI2/PCI Express* Power Management

Active State Power Management (ASPM) is supported using the L1 state; L0s is not supported.

