Kontron S4600 SEL Troubleshooting

System Event Log Troubleshooting Guide for EPSD Platforms Based on Intel® Xeon® Processor E5 4600/2600/2400/1600/1400 Product Families
Intel order number G90620-002
Revision 1.1
September 2013
Enterprise Platforms and Services Division – Marketing
Revision History

Date | Revision Number | Modifications
January 2013 | 1.0 | Initial release
September 2013 | 1.1 | Added MIC Thermal Margin sensors C4 through C7. Added MIC Status sensors A2, A3, A6, and A7. Added voltage sensors EA, EB, EC, ED, and EF. Corrected typographical errors. Made corrections to Firmware Update Status table. Made corrections to Catastrophic Error Sensor table. Added support for S1400FP, S1400SP, S1600JP, and S4600LH.
Disclaimers
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.
Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.
The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.
Table of Contents
1. Introduction
1.1 Purpose
1.2 Industry Standard
1.2.1 Intelligent Platform Management Interface (IPMI)
1.2.2 Baseboard Management Controller (BMC)
1.2.3 Intel® Intelligent Power Node Manager Version 2.0
2. Basic Decoding of a SEL Record
2.1 Default Values in the SEL Records
2.2 Notes on SEL Logs and Collecting SEL Information
2.2.1 Examples of Decoding BIOS Timestamp Events
2.2.2 Example of Decoding a PCI Express* Correctable Error Events
2.2.3 Example of Decoding a Power Supply Predictive Failure Event
3. Sensor Cross Reference List
3.1 BMC owned Sensors (GID = 0020h)
3.2 BIOS POST owned Sensors (GID = 0001h)
3.3 BIOS SMI Handler owned Sensors (GID = 0033h)
3.4 Node Manager / ME Firmware owned Sensors (GID = 002Ch or 602Ch)
3.5 Microsoft* OS owned Events (GID = 0041)
3.6 Linux* Kernel Panic Events (GID = 0021)
4. Power Subsystems
4.1 Threshold-based Voltage Sensors
4.2 Voltage Regulator Watchdog Timer Sensor
4.2.1 Voltage Regulator Watchdog Timer Sensor – Next Steps
4.3 Power Unit
4.3.1 Power Unit Status Sensor
4.3.2 Power Unit Redundancy Sensor
4.3.3 Node Auto Shutdown Sensor
4.4 Power Supply
4.4.1 Power Supply Status Sensors
4.4.2 Power Supply Power In Sensors
4.4.3 Power Supply Current Out % Sensors
4.4.4 Power Supply Temperature Sensors
4.4.5 Power Supply Fan Tachometer Sensors
5. Cooling Subsystem
5.1 Fan Sensors
5.1.1 Fan Tachometer Sensors
5.1.2 Fan Presence and Redundancy Sensors
5.2 Temperature Sensors
5.2.1 Threshold-based Temperature Sensors
5.2.2 Thermal Margin Sensors
5.2.3 Processor Thermal Control Sensors
5.2.4 Processor DTS Thermal Margin Sensors
5.2.5 Discrete Thermal Sensors
5.2.6 DIMM Thermal Trip Sensors
5.3 System Air Flow Monitoring Sensor
6. Processor Subsystem
6.1 Processor Status Sensor
6.2 Catastrophic Error Sensor
6.3 CPU Missing Sensor
6.3.1 CPU Missing Sensor – Next Steps
6.4 Quick Path Interconnect Sensors
6.4.1 QPI Link Width Reduced Sensor
6.4.2 QPI Correctable Error Sensor
6.4.3 QPI Fatal Error and Fatal Error #2
6.5 Processor ERR2 Timeout Sensor
6.5.1 Processor ERR2 Timeout – Next Steps
6.6 Processor MSID Mismatch Sensor
6.6.1 Processor MSID Mismatch Sensor – Next Steps
7. Memory Subsystem
7.1 Memory RAS Configuration Status
7.2 Memory RAS Mode Select
7.3 Mirroring Redundancy State
7.3.1 Mirroring Redundancy State Sensor – Next Steps
7.4 Sparing Redundancy State
7.4.1 Sparing Redundancy State Sensor – Next Steps
7.5 ECC and Address Parity
7.5.1 Memory Correctable and Uncorrectable ECC Error
7.5.2 Memory Address Parity Error
8. PCI Express* and Legacy PCI Subsystem
8.1 PCI Express* Errors
8.1.1 Legacy PCI Errors
8.1.2 PCI Express* Fatal Errors and Fatal Error #2
8.1.3 PCI Express* Correctable Errors
9. System BIOS Events
9.1 System Events
9.1.1 System Boot
9.1.2 Timestamp Clock Synchronization
9.2 System Firmware Progress (Formerly Post Error)
9.2.1 System Firmware Progress (Formerly Post Error) – Next Steps
10. Chassis Subsystem
10.1 Physical Security
10.1.1 Chassis Intrusion
10.1.2 LAN Leash Lost
10.2 FP (NMI) Interrupt
10.2.1 FP (NMI) Interrupt – Next Steps
10.3 Button Sensor
11. Miscellaneous Events
11.1 IPMI Watchdog
11.2 SMI Timeout
11.2.1 SMI Timeout – Next Steps
11.3 System Event Log Cleared
11.4 System Event – PEF Action
11.4.1 System Event – PEF Action – Next Steps
11.5 BMC Watchdog Sensor
11.5.1 BMC Watchdog Sensor – Next Steps
11.6 BMC FW Health Sensor
11.6.1 BMC FW Health Sensor – Next Steps
11.7 Firmware Update Status Sensor
11.8 Add-In Module Presence Sensor
11.8.1 Add-In Module Presence – Next Steps
11.9 Intel® Xeon Phi™ Coprocessor Management Sensors
11.9.1 Intel® Xeon Phi™ Coprocessor (MIC) Thermal Margin Sensors
11.9.2 Intel® Xeon Phi™ Coprocessor (MIC) Status Sensors
12. Hot-Swap Controller Backplane Events
12.1 HSC Backplane Temperature Sensor
12.2 Hard Disk Drive Monitoring Sensor
12.3 Hot-Swap Controller Health Sensor
12.3.1 HSC Health Sensor – Next Steps
13. Manageability Engine (ME) Events
13.1 ME Firmware Health Event
13.1.1 ME Firmware Health Event – Next Steps
13.2 Node Manager Exception Event
13.2.1 Node Manager Exception Event – Next Steps
13.3 Node Manager Health Event
13.3.1 Node Manager Health Event – Next Steps
13.4 Node Manager Operational Capabilities Change
13.4.1 Node Manager Operational Capabilities Change – Next Steps
13.5 Node Manager Alert Threshold Exceeded
13.5.1 Node Manager Alert Threshold Exceeded – Next Steps
14. Microsoft Windows* Records
14.1 Boot up Event Records
14.2 Shutdown Event Records
14.3 Bug Check / Blue Screen Event Records
15. Linux* Kernel Panic Records
List of Tables
Table 1: SEL Record Format
Table 2: Event Request Message Event Data Field Contents
Table 3: OEM SEL Record (Type C0h-DFh)
Table 4: OEM SEL Record (Type E0h-FFh)
Table 5: BMC owned Sensors
Table 6: BIOS POST owned Sensors
Table 7: BIOS SMI Handler owned Sensors
Table 8: Management Engine Firmware owned Sensors
Table 9: Microsoft* OS owned Events
Table 10: Linux* Kernel Panic Events
Table 11: Threshold-based Voltage Sensors Typical Characteristics
Table 12: Threshold-based Voltage Sensors Event Triggers – Description
Table 13: Threshold-based Voltage Sensors – Next Steps
Table 14: Voltage Regulator Watchdog Timer Sensor Typical Characteristics
Table 15: Power Unit Status Sensors Typical Characteristics
Table 16: Power Unit Status Sensor – Sensor Specific Offsets – Next Steps
Table 17: Power Unit Redundancy Sensors Typical Characteristics
Table 18: Power Unit Redundancy Sensor – Event Trigger Offset – Next Steps
Table 19: Node Auto Shutdown Sensor Typical Characteristics
Table 20: Power Supply Status Sensors Typical Characteristics
Table 21: Power Supply Status Sensor – Sensor Specific Offsets – Next Steps
Table 22: Power Supply Power In Sensors Typical Characteristics
Table 23: Power Supply Power In Sensor – Event Trigger Offset – Next Steps
Table 24: Power Supply Current Out % Sensors Typical Characteristics
Table 25: Power Supply Current Out % Sensor – Event Trigger Offset – Next Steps
Table 26: Power Supply Temperature Sensors Typical Characteristics
Table 27: Power Supply Temperature Sensor – Event Trigger Offset – Next Steps
Table 28: Power Supply Fan Tachometer Sensors Typical Characteristics
Table 29: Fan Tachometer Sensors Typical Characteristics
Table 30: Fan Tachometer Sensor – Event Trigger Offset – Next Steps
Table 31: Fan Presence Sensors Typical Characteristics
Table 32: Fan Presence Sensors – Event Trigger Offset – Next Steps
Table 33: Fan Redundancy Sensors Typical Characteristics
Table 34: Fan Redundancy Sensor – Event Trigger Offset – Next Steps
Table 35: Temperature Sensors Typical Characteristics
Table 36: Temperature Sensors Event Triggers – Description
Table 37: Temperature Sensors – Next Steps
Table 38: Thermal Margin Sensors Typical Characteristics
Table 39: Thermal Margin Sensors Event Triggers – Description
Table 40: Thermal Margin Sensors – Next Steps
Table 41: Processor Thermal Control Sensors Typical Characteristics
Table 42: Processor Thermal Control Sensors Event Triggers – Description
Table 43: Processor DTS Thermal Margin Sensors Typical Characteristics
Table 44: Discrete Thermal Sensors Typical Characteristics
Table 45: Discrete Thermal Sensors – Next Steps
Table 46: DIMM Thermal Trip Typical Characteristics
Table 47: Processor Status Sensors Typical Characteristics
Table 48: Processor Status Sensors – Next Steps
Table 49: Catastrophic Error Sensor Typical Characteristics
Table 50: Catastrophic Error Sensor – Event Data 2 Values – Next Steps
Table 51: CPU Missing Sensor Typical Characteristics
Table 52: QPI Link Width Reduced Sensor Typical Characteristics
Table 53: QPI Correctable Error Sensor Typical Characteristics
Table 54: QPI Fatal Error Sensor Typical Characteristics
Table 55: QPI Fatal #2 Error Sensor Typical Characteristics
Table 56: Processor ERR2 Timeout Sensor Typical Characteristics
Table 57: Processor MSID Mismatch Sensor Typical Characteristics
Table 58: Memory RAS Configuration Status Sensor Typical Characteristics
Table 59: Memory RAS Configuration Status Sensor – Event Trigger Offset – Next Steps
Table 60: Memory RAS Mode Select Sensor Typical Characteristics
Table 61: Mirroring Redundancy State Sensor Typical Characteristics
Table 62: Sparing Redundancy State Sensor Typical Characteristics
Table 63: Correctable and Uncorrectable ECC Error Sensor Typical Characteristics
Table 64: Correctable and Uncorrectable ECC Error Sensor Event Trigger Offset – Next Steps
Table 65: Address Parity Error Sensor Typical Characteristics
Table 66: Legacy PCI Error Sensor Typical Characteristics
Table 67: PCI Express* Fatal Error Sensor Typical Characteristics
Table 68: PCI Express* Fatal Error #2 Sensor Typical Characteristics
Table 69: PCI Express* Correctable Error Sensor Typical Characteristics
Table 70: System Event Sensor Typical Characteristics
Table 71: POST Error Sensor Typical Characteristics
Table 72: POST Error Codes
Table 73: Physical Security Sensor Typical Characteristics
Table 74: Physical Security Sensor Event Trigger Offset – Next Steps
Table 75: FP (NMI) Interrupt Sensor Typical Characteristics
Table 76: Button Sensor Typical Characteristics
Table 77: IPMI Watchdog Sensor Typical Characteristics
Table 78: IPMI Watchdog Sensor Event Trigger Offset – Next Steps
Table 79: SMI Timeout Sensor Typical Characteristics
Table 80: System Event Log Cleared Sensor Typical Characteristics
Table 81: System Event – PEF Action Sensor Typical Characteristics
Table 82: BMC Watchdog Sensor Typical Characteristics
Table 83: BMC FW Health Sensor Typical Characteristics
Table 84: Firmware Update Status Sensor Typical Characteristics
Table 85: Add-In Module Presence Sensor Typical Characteristics
Table 86: MIC Status Sensors – Typical Characteristics
Table 87: HSC Backplane Temperature Sensor Typical Characteristics
Table 88: HSC Backplane Temperature Sensor – Event Trigger Offset – Next Steps
Table 89: Hard Disk Drive Monitoring Sensor Typical Characteristics
Table 90: Hard Disk Drive Monitoring Sensor – Event Trigger Offset – Next Steps
Table 91: HSC Health Sensor Typical Characteristics
Table 92: ME Firmware Health Event Sensor Typical Characteristics
Table 93: ME Firmware Health Event Sensor – Next Steps
Table 94: Node Manager Exception Sensor Typical Characteristics
Table 95: Node Manager Health Event Sensor Typical Characteristics
Table 96: Node Manager Operational Capabilities Change Sensor Typical Characteristics
Table 97: Node Manager Alert Threshold Exceeded Sensor Typical Characteristics
Table 98: Boot up Event Record Typical Characteristics
Table 99: Boot up OEM Event Record Typical Characteristics
Table 100: Shutdown Reason Code Event Record Typical Characteristics
Table 101: Shutdown Reason OEM Event Record Typical Characteristics
Table 102: Shutdown Comment OEM Event Record Typical Characteristics
Table 103: Bug Check/Blue Screen – OS Stop Event Record Typical Characteristics
Table 104: Bug Check/Blue Screen code OEM Event Record Typical Characteristics
Table 105: Linux* Kernel Panic Event Record Characteristics
Table 106: Linux* Kernel Panic String Extended Record Characteristics

1. Introduction

The server management hardware on Intel® Server Boards and Intel® Server Platforms is a vital part of the overall server management strategy. It provides essential information to the system administrator and lets the administrator control the server remotely, even when the operating system is not running.
The Intel® Server Boards and Intel® Server Platforms offer comprehensive hardware- and software-based solutions. The server management features make the servers simple to manage and provide alerting on system events. From entry-level to enterprise systems, good server management is essential to reducing total cost of ownership.
This Troubleshooting Guide is intended to help users better understand the events that are logged in the Baseboard Management Controller (BMC) System Event Log (SEL) on these Intel® Server Boards.
A separate User’s Guide covers general server management and the server management software offered on the Intel® Server Boards and Intel® Server Platforms.
Server boards currently supported by this document:
- Intel® S1400FP Server Boards
- Intel® S1400SP Server Boards
- Intel® S1600JP Server Boards
- Intel® S2400BB Server Boards
- Intel® S2400EP Server Boards
- Intel® S2400GP Server Boards
- Intel® S2400LP Server Boards
- Intel® S2400SC Server Boards
- Intel® S2600CO Server Boards
- Intel® S2600CP Server Boards
- Intel® S2600GZ/S2600GL Server Boards
- Intel® S2600IP Server Boards
- Intel® S2600JF Server Boards
- Intel® S2600WP Server Boards
- Intel® S4600LH Server Boards
- Intel® W2600CR Workstation Boards

1.1 Purpose

The purpose of this document is to list all possible events generated by the Intel platform. It may be possible that other sources (not under our control) also generate events, which will not be described in this document.

1.2 Industry Standard

1.2.1 Intelligent Platform Management Interface (IPMI)

The key characteristic of the Intelligent Platform Management Interface (IPMI) is that the inventory, monitoring, logging, and recovery control functions are available independently of the main processors, BIOS, and operating system. Platform management functions can also be made available when the system is in a power-down state.
IPMI works by interfacing with the BMC, which extends management capabilities in the server system and operates independently of the main processor by monitoring the on-board instrumentation. Through the BMC, IPMI also allows administrators to control power to the server, and remotely access BIOS configuration and operating system console information.
IPMI defines a common platform instrumentation interface to enable interoperability between:
- The baseboard management controller and chassis
- The baseboard management controller and systems management software
- Between servers
IPMI enables the following:
- Common access to platform management information, consisting of:
  - Local access from systems management software
  - Remote access from LAN
  - Inter-chassis access from Intelligent Chassis Management Bus
  - Access from LAN, serial/modem, IPMB, PCI SMBus*, or ICMB, available even if the processor is down
- IPMI interface isolates systems management software from hardware.
- Hardware advancements can be made without impacting the systems management software.
- IPMI facilitates cross-platform management software.
You can find more information on IPMI at the following URL:
http://www.intel.com/design/servers/ipmi

1.2.2 Baseboard Management Controller (BMC)

A baseboard management controller (BMC) is a specialized microcontroller embedded on most Intel® Server Boards. The BMC is the heart of the IPMI architecture and provides the intelligence behind intelligent platform management, that is, the autonomous monitoring and recovery features implemented directly in platform management hardware and firmware.
Different types of sensors built into the computer system report to the BMC on parameters such as temperature, cooling fan speeds, power mode, operating system status, and so on. The BMC monitors the system for critical events by communicating with various sensors on the system board; it sends alerts and logs events when certain parameters exceed their preset thresholds, indicating a potential failure of the system. The administrator can also remotely communicate with the BMC to take some corrective action such as resetting or power cycling the system to get a hung OS running again. These abilities save on the total cost of ownership of a system.
For Intel® Server Boards and Intel® Server Platforms, the BMC supports the industry standard IPMI 2.0 Specification, enabling you to configure, monitor, and recover systems remotely.
1.2.2.1 System Event Log (SEL)
The BMC provides a centralized, non-volatile repository for critical, warning, and informational system events called the System Event Log or SEL. By having the BMC manage the SEL and logging functions, it helps to ensure that “post-mortem” logging information is available if a failure occurs that disables the system processor(s).
The BMC allows access to the SEL through both in-band and out-of-band mechanisms. Various tools and utilities can be used to access the SEL, including the Intel® SELView utility and several open source IPMI tools.

1.2.3 Intel® Intelligent Power Node Manager Version 2.0

Intel® Intelligent Power Node Manager Version 2.0 (NM) is a platform-resident technology that enforces power and thermal policies for the platform. These policies are applied by exploiting subsystem knobs (such as processor P and T states) that can be used to control power consumption. Intel® Intelligent Power Node Manager enables data center power and thermal management by exposing an external interface to management software through which platform policies can be specified. It also enables specific data center power management usage models such as power limiting.
The configuration and control commands are used by the external management software or BMC to configure and control the Intel® Intelligent Power Node Manager feature. Because Platform Services firmware does not have any external interface, external commands are first received by the BMC over LAN and then relayed to the Platform Services firmware over IPMB channel. The BMC acts as a relay and the transport conversion device for these commands. For simplicity, the commands from the management console might be encapsulated in a generic CONFIG packet format (configuration data length, configuration data blob) to the BMC so that the BMC doesn’t even have to parse the actual configuration data.
The BMC provides the access point for remote commands from external management software and generates alerts to that software. Intel® Intelligent Power Node Manager on Intel® Manageability Engine (Intel® ME) is an IPMI satellite controller. A mechanism exists to forward commands to Intel® ME and send the response back to the originator. Similarly, events from Intel® ME are sent as alerts outside of the BMC.

2. Basic Decoding of a SEL Record

The System Event Log (SEL) record format is defined in the IPMI Specification. The following section provides a basic definition for each of the fields in a SEL record. For more details see the IPMI Specification.
The definitions for the standard SEL can be found in Table 1. The definitions for the OEM defined event logs can be found in Table 3 and Table 4.

2.1 Default Values in the SEL Records

Unless otherwise noted in the event record descriptions, the following are the default values in all SEL entries:
- Byte [3] = Record Type (RT) = 02h = System event record
- Byte [9:8] = Generator ID = 0020h = BMC Firmware
- Byte [10] = Event Message Revision (ER) = 04h = IPMI 2.0

Table 1: SEL Record Format

| Byte | Field | Description |
|---|---|---|
| 1-2 | Record ID (RID) | ID used for SEL Record access. |
| 3 | Record Type (RT) | [7:0] – Record Type |
| | | 02h = System event record |
| | | C0h-DFh = OEM timestamped, bytes 8-16 OEM defined (See Table 3) |
| | | E0h-FFh = OEM non-timestamped, bytes 4-16 OEM defined (See Table 4) |
| 4-7 | Timestamp (TS) | Time when event was logged. LS byte first. |
| | | Example: TS:[29][76][68][4C] = 4C687629h = 1281914409 = Sun, 15 Aug 2010 23:20:09 UTC |
| | | Note: There are various websites that will convert the raw number to a date/time. |
| 8-9 | Generator ID (GID) | RqSA and LUN if event was generated from IPMB. Software ID if event was generated from system software. |
| | | Byte 1 [7:1] – 7-bit I2C Slave Address, or 7-bit system software ID |
| | | Byte 1 [0] – 0b = ID is IPMB Slave Address; 1b = System software ID |
| | | Software ID values: |
| | | 0001h – BIOS POST for POST errors, RAS Configuration/State, Timestamp Synch, OS Boot events |
| | | 0033h – BIOS SMI Handler |
| | | 0020h – BMC Firmware |
| | | 002Ch – ME Firmware |
| | | 0041h – Server Management Software |
| | | 00C0h – HSC Firmware – HSBP A |
| | | 00C2h – HSC Firmware – HSBP B |
| | | Byte 2 [7:4] – Channel number. Channel that event message was received over. 0h if the event message was received from the system interface, primary IPMB, or internally generated by the BMC. |
| | | Byte 2 [3:2] – Reserved. Write as 00b. |
| | | Byte 2 [1:0] – IPMB device LUN if byte 1 holds Slave Address. 00b otherwise. |
| 10 | EvM Rev (ER) | Event Message format version. 04h = IPMI v2.0; 03h = IPMI v1.0 |
| 11 | Sensor Type (ST) | Sensor Type Code for sensor that generated the event |
| 12 | Sensor # (SN) | Number of sensor that generated the event (From SDR) |
| 13 | Event Dir / Event Type (EDIR) | [7] – Event Dir: 0b = Assertion event; 1b = Deassertion event. |
| | | [6:0] – Event Type: Type of trigger for the event, for example, critical threshold going high, state asserted, and so on. Also indicates class of the event, for example, discrete, threshold, or OEM. The Event Type field is encoded using the Event/Reading Type Code. |
| | | Event Type Codes: 01h = Threshold (States = 00h-0Bh); 02h-0Ch = Discrete; 6Fh = Sensor-Specific; 70h-7Fh = OEM |
| 14 | Event Data 1 (ED1) | Per Table 2 |
| 15 | Event Data 2 (ED2) | Per Table 2 |
| 16 | Event Data 3 (ED3) | Per Table 2 |
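As a rough illustration of the layout in Table 1, the following Python sketch splits a 16-byte system event record into its fields. The function name `parse_sel_record`, the dictionary keys, and the `GENERATOR_IDS` lookup are illustrative choices and not part of any Intel utility. The sample bytes are taken from the power supply record in section 2.2.3.

```python
from datetime import datetime, timezone

# Software ID values listed in Table 1 (Generator ID field).
GENERATOR_IDS = {
    0x0001: "BIOS POST",
    0x0033: "BIOS SMI Handler",
    0x0020: "BMC Firmware",
    0x002C: "ME Firmware",
    0x0041: "Server Management Software",
    0x00C0: "HSC Firmware - HSBP A",
    0x00C2: "HSC Firmware - HSBP B",
}


def parse_sel_record(raw: bytes) -> dict:
    """Split a 16-byte system event record (Record Type 02h) into fields."""
    if len(raw) != 16:
        raise ValueError("a SEL record is 16 bytes long")
    if raw[2] != 0x02:
        raise ValueError("only system event records (02h) are handled here")
    # Multi-byte fields are stored least significant byte first.
    seconds = int.from_bytes(raw[3:7], "little")
    generator_id = int.from_bytes(raw[7:9], "little")
    edir = raw[12]
    return {
        "record_id": int.from_bytes(raw[0:2], "little"),
        "record_type": raw[2],
        "timestamp": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "generator_id": generator_id,
        "generator": GENERATOR_IDS.get(generator_id, "unknown"),
        "evm_rev": raw[9],
        "sensor_type": raw[10],
        "sensor_number": raw[11],
        "deassertion": bool(edir & 0x80),  # EDIR bit 7
        "event_type": edir & 0x7F,         # EDIR bits 6:0
        "event_data": (raw[13], raw[14], raw[15]),
    }


# Record from section 2.2.3, bytes 1 through 16 in order.
sample = bytes.fromhex("5D 00 02 D3 B1 AE 4E 20 00 04 08 50 6F A2 06 30")
fields = parse_sel_record(sample)
print(f"{fields['record_id']:04X}h", fields["generator"], fields["timestamp"])
```

The timestamp is treated as seconds since 1 January 1970 UTC, which is consistent with the worked example in Table 1 (4C687629h = 15 Aug 2010 23:20:09 UTC).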

Table 2: Event Request Message Event Data Field Contents

| Sensor Class | Event Data |
|---|---|
| Threshold | Event Data 1 |
| | [7:6] – 00b = Unspecified Event Data 2; 01b = Trigger reading in Event Data 2; 10b = OEM code in Event Data 2; 11b = Sensor-specific event extension code in Event Data 2 |
| | [5:4] – 00b = Unspecified Event Data 3; 01b = Trigger threshold value in Event Data 3; 10b = OEM code in Event Data 3; 11b = Sensor-specific event extension code in Event Data 3 |
| | [3:0] – Offset from Event/Reading Code for threshold event. |
| | Event Data 2 – Reading that triggered event, FFh or not present if unspecified. |
| | Event Data 3 – Threshold value that triggered event, FFh or not present if unspecified. If present, Event Data 2 must be present. |
| Discrete | Event Data 1 |
| | [7:6] – 00b = Unspecified Event Data 2; 01b = Previous state and/or severity in Event Data 2; 10b = OEM code in Event Data 2; 11b = Sensor-specific event extension code in Event Data 2 |
| | [5:4] – 00b = Unspecified Event Data 3; 01b = Reserved; 10b = OEM code in Event Data 3; 11b = Sensor-specific event extension code in Event Data 3 |
| | [3:0] – Offset from Event/Reading Code for discrete event state |
| | Event Data 2 [7:4] – Optional offset from “Severity” Event/Reading Code (0Fh if unspecified). |
| | Event Data 2 [3:0] – Optional offset from Event/Reading Type Code for previous discrete event state (0Fh if unspecified). |
| | Event Data 3 – Optional OEM code. FFh or not present if unspecified. |
| OEM | Event Data 1 |
| | [7:6] – 00b = Unspecified Event Data 2; 01b = Previous state and/or severity in Event Data 2; 10b = OEM code in Event Data 2; 11b = Reserved |
| | [5:4] – 00b = Unspecified Event Data 3; 01b = Reserved; 10b = OEM code in Event Data 3; 11b = Reserved |
| | [3:0] – Offset from Event/Reading Type Code |
| | Event Data 2 [7:4] – Optional OEM code bits or offset from “Severity” Event/Reading Type Code (0Fh if unspecified). |
| | Event Data 2 [3:0] – Optional OEM code or offset from Event/Reading Type Code for previous event state (0Fh if unspecified). |
| | Event Data 3 – Optional OEM code. FFh or not present if unspecified. |
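The bit fields of Event Data 1 in Table 2 can be unpacked with a few shifts and masks. The sketch below is only an illustration: the names `sensor_class` and `decode_event_data1` are hypothetical, and treating the sensor-specific event type 6Fh as the discrete class is an assumption based on the IPMI Specification rather than on Table 2 itself.

```python
# Meaning of ED1 bits 7:6 (contents of Event Data 2), indexed by the 2-bit value.
ED2_CONTENTS = {
    "threshold": ["unspecified", "trigger reading", "OEM code",
                  "sensor-specific event extension code"],
    "discrete": ["unspecified", "previous state and/or severity", "OEM code",
                 "sensor-specific event extension code"],
    "oem": ["unspecified", "previous state and/or severity", "OEM code",
            "reserved"],
}

# Meaning of ED1 bits 5:4 (contents of Event Data 3), indexed by the 2-bit value.
ED3_CONTENTS = {
    "threshold": ["unspecified", "trigger threshold value", "OEM code",
                  "sensor-specific event extension code"],
    "discrete": ["unspecified", "reserved", "OEM code",
                 "sensor-specific event extension code"],
    "oem": ["unspecified", "reserved", "OEM code", "reserved"],
}


def sensor_class(event_type: int) -> str:
    """Map the 7-bit Event Type code (EDIR bits 6:0) to a Table 2 sensor class."""
    if event_type == 0x01:
        return "threshold"
    # Assumption: sensor-specific (6Fh) events use the discrete layout.
    if event_type in range(0x02, 0x0D) or event_type == 0x6F:
        return "discrete"
    if event_type in range(0x70, 0x80):
        return "oem"
    raise ValueError(f"unhandled event type {event_type:02X}h")


def decode_event_data1(ed1: int, event_type: int) -> dict:
    """Report what ED2 and ED3 hold and the event offset, per Table 2."""
    cls = sensor_class(event_type)
    return {
        "class": cls,
        "event_data2": ED2_CONTENTS[cls][(ed1 >> 6) & 0x3],
        "event_data3": ED3_CONTENTS[cls][(ed1 >> 4) & 0x3],
        "offset": ed1 & 0x0F,
    }


# ED1 = A2h with EDIR = 6Fh, as in the power supply example of section 2.2.3.
print(decode_event_data1(0xA2, 0x6F))
```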

Table 3: OEM SEL Record (Type C0h-DFh)

| Byte | Field | Description |
|---|---|---|
| 1-2 | Record ID (RID) | ID used for SEL Record access. |
| 3 | Record Type (RT) | [7:0] – Record Type: C0h-DFh = OEM timestamped, bytes 8-16 OEM defined |
| 4-7 | Timestamp (TS) | Time when event was logged. LS byte first. |
| | | Example: TS:[29][76][68][4C] = 4C687629h = 1281914409 = Sun, 15 Aug 2010 23:20:09 UTC |
| | | Note: There are various websites that will convert the raw number to a date/time. |
| 8-10 | Manufacturer ID | LS byte first. The manufacturer ID is a 20-bit value that is derived from the IANA “Private Enterprise” ID. |
| | | Most significant four bits = Reserved (0000b). 000000h = Unspecified. 0FFFFFh = Reserved. |
| | | This value is binary encoded. For example, the ID for the IPMI forum is 7154 decimal, which is 1BF2h, which will be stored in this record as F2h, 1Bh, 00h for bytes 8 through 10, respectively. |
| 11-16 | OEM Defined | OEM Defined. This is defined according to the manufacturer identified by the Manufacturer ID field. |
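The Manufacturer ID byte order described in Table 3 can be checked with a short Python sketch. The helper names `decode_manufacturer_id` and `encode_manufacturer_id` are made up for this illustration.

```python
def decode_manufacturer_id(stored: bytes) -> int:
    """Convert bytes 8-10 of an OEM record (LS byte first) to the IANA ID."""
    if len(stored) != 3:
        raise ValueError("the Manufacturer ID field is 3 bytes long")
    # Only the low 20 bits are defined; the top four bits are reserved.
    return int.from_bytes(stored, "little") & 0x0FFFFF


def encode_manufacturer_id(iana_id: int) -> bytes:
    """Produce the 3-byte, LS-byte-first form stored in the record."""
    return (iana_id & 0x0FFFFF).to_bytes(3, "little")


# The IPMI forum example from Table 3: 7154 decimal = 1BF2h.
print(decode_manufacturer_id(bytes([0xF2, 0x1B, 0x00])))
print(encode_manufacturer_id(7154).hex())
```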

Table 4: OEM SEL Record (Type E0h-FFh)

| Byte | Field | Description |
|---|---|---|
| 1-2 | Record ID (RID) | ID used for SEL Record access. |
| 3 | Record Type (RT) | [7:0] – Record Type: E0h-FFh = OEM system event record |
| 4-16 | OEM | OEM Defined. This is defined by the system integrator. |

2.2 Notes on SEL Logs and Collecting SEL Information

Whenever you capture the SEL log, you should always collect both the text/human readable version and the hex version. Because some of the data is OEM-specific, some utilities cannot decode the information correctly. In addition with some OEM-specific data there may be additional variables that are not decoded at all.
One example of incomplete decoding is the BIOS timestamp synchronization event. This event can be logged by the BIOS during POST, or it can be logged by the BIOS SMI Handler when the operating system (OS) requests a shutdown or restart. See section 2.2.1 for examples. Most utilities report this as just a BIOS event and do not differentiate between the two sources. Knowing the source is sometimes useful because it makes the sequence of events clearer. For example, if there are multiple timestamp synchronization events in a row, the source helps answer whether power was lost after booting to the OS and the system then restarted, whether there were multiple POST events, or whether the system was restarted from the OS.
Another example of incomplete decoding is the PCI Express* errors and some of the Power Supply events. For the PCI Express* errors, the type of error and the PCI Bus, Device, and Function are all a part of Event Data 1 through Event Data 3. See section 2.2.2. For the Power Supply events, when there is a failure, predictive failure, or a configuration error, Event Data 2 and Event Data 3 hold additional information that describes the power supply's PMBus* Command Registers and values for that particular event. See section 2.2.3.

2.2.1 Examples of Decoding BIOS Timestamp Events

The following are some samples of BIOS timestamp events during POST and during an OS shutdown.
2.2.1.1 BIOS POST Timestamp Events
RID[19][01] RT[02] TS[57][49][6A][4E] GID[01][00] ER[04] ST[12] SN[83] EDIR[6F] ED1[05] ED2[00] ED3[FF]
RID[1A][01] RT[02] TS[57][49][6A][4E] GID[01][00] ER[04] ST[12] SN[83] EDIR[6F] ED1[05] ED2[80] ED3[FF]
RID (Record ID) = 0119h
RT (Record Type) = 02h = System event record
TS (Timestamp) = 4E6A4957h
GID (Generator ID) = 0001h = BIOS POST
ER (Event Message Revision) = 04h = IPMI v2.0
ST (Sensor Type) = 12h = System Event (From IPMI Specification Table 42-3, Sensor Type Codes)
SN (Sensor Number) = 83h
EDIR (Event Direction/Event Type) = 6Fh
    [7] = 0 = Assertion Event
    [6:0] = 6Fh = Sensor specific
ED1 (Event Data 1) = 05h = Timestamp Clock Synchronization
ED2 (Event Data 2) = 00h = First in pair

RID (Record ID) = 011Ah
RT (Record Type) = 02h = System event record
TS (Timestamp) = 4E6A4957h
GID (Generator ID) = 0001h = BIOS POST
ER (Event Message Revision) = 04h = IPMI v2.0
ST (Sensor Type) = 12h = System Event (From IPMI Specification Table 42-3, Sensor Type Codes)
SN (Sensor Number) = 83h
EDIR (Event Direction/Event Type) = 6Fh
    [7] = 0 = Assertion Event
    [6:0] = 6Fh = Sensor specific
ED1 (Event Data 1) = 05h = Timestamp Clock Synchronization
ED2 (Event Data 2) = 80h = Second in pair
2.2.1.2 BIOS SMI Handler Timestamp Events
RID[1F][00] RT[02] TS[C3][70][8D][4F] GID[33][00] ER[04] ST[12] SN[83] EDIR[6F] ED1[05] ED2[00] ED3[FF]
RID[20][00] RT[02] TS[C4][70][8D][4F] GID[33][00] ER[04] ST[12] SN[83] EDIR[6F] ED1[05] ED2[80] ED3[FF]
RID (Record ID) = 001Fh
RT (Record Type) = 02h = System event record
TS (Timestamp) = 4F8D70C3h
GID (Generator ID) = 0033h = BIOS SMI Handler
ER (Event Message Revision) = 04h = IPMI v2.0
ST (Sensor Type) = 12h = System Event (From IPMI Specification Table 42-3, Sensor Type Codes)
SN (Sensor Number) = 83h
EDIR (Event Direction/Event Type) = 6Fh
    [7] = 0 = Assertion Event
    [6:0] = 6Fh = Sensor specific
ED1 (Event Data 1) = 05h = Timestamp Clock Synchronization
ED2 (Event Data 2) = 00h = First in pair

RID (Record ID) = 0020h
RT (Record Type) = 02h = System event record
TS (Timestamp) = 4F8D70C4h
GID (Generator ID) = 0033h = BIOS SMI Handler
ER (Event Message Revision) = 04h = IPMI v2.0
ST (Sensor Type) = 12h = System Event (From IPMI Specification Table 42-3, Sensor Type Codes)
SN (Sensor Number) = 83h
EDIR (Event Direction/Event Type) = 6Fh
    [7] = 0 = Assertion Event
    [6:0] = 6Fh = Sensor specific
ED1 (Event Data 1) = 05h = Timestamp Clock Synchronization
ED2 (Event Data 2) = 80h = Second in pair
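The distinction between the POST and SMI Handler timestamp events can be automated once the hex dump is available. The Python sketch below reads the bracketed notation used in this section and reports the source and the position in the pair. Both function names are hypothetical, and the parser assumes every byte of the record appears in the text as a two-digit hex value in square brackets.

```python
import re


def parse_bracket_record(text: str) -> bytes:
    """Collect the bytes from the 'RID[19][01] RT[02] ...' notation."""
    raw = bytes(int(h, 16) for h in re.findall(r"\[([0-9A-Fa-f]{2})\]", text))
    if len(raw) != 16:
        raise ValueError("expected 16 bracketed bytes")
    return raw


def describe_timestamp_sync(raw: bytes) -> str:
    """Name the source and pair position of a timestamp synchronization event."""
    generator_id = int.from_bytes(raw[7:9], "little")  # bytes 8-9
    sensor_type, edir, ed1, ed2 = raw[10], raw[12], raw[13], raw[14]
    # Sensor type 12h (System Event), sensor-specific event type, offset 05h.
    if sensor_type != 0x12 or (edir & 0x7F) != 0x6F or (ed1 & 0x0F) != 0x05:
        raise ValueError("not a Timestamp Clock Synchronization event")
    sources = {0x0001: "BIOS POST", 0x0033: "BIOS SMI Handler"}
    source = sources.get(generator_id, f"generator {generator_id:04X}h")
    position = "second in pair" if ed2 & 0x80 else "first in pair"
    return f"{source}: {position}"


post_first = ("RID[19][01] RT[02] TS[57][49][6A][4E] GID[01][00] ER[04] "
              "ST[12] SN[83] EDIR[6F] ED1[05] ED2[00] ED3[FF]")
smi_second = ("RID[20][00] RT[02] TS[C4][70][8D][4F] GID[33][00] ER[04] "
              "ST[12] SN[83] EDIR[6F] ED1[05] ED2[80] ED3[FF]")
print(describe_timestamp_sync(parse_bracket_record(post_first)))
print(describe_timestamp_sync(parse_bracket_record(smi_second)))
```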

2.2.2 Example of Decoding a PCI Express* Correctable Error Event

The following is an example of decoding a PCI Express* correctable error event. This particular event recorded a receiver error on Bus 0, Device 2, Function 2. Note that correctable errors are acceptable and normal at a low rate of occurrence.
RID[27][00] RT[02] TS[0A][9B][2E][50] GID[33][00] ER[04] ST[13] SN[05] EDIR[71] ED1[A0] ED2[00] ED3[12]
RID (Record ID) = 0027h
RT (Record Type) = 02h = System event record
TS (Timestamp) = 502E9B0Ah
GID (Generator ID) = 0033h = BIOS SMI Handler
ER (Event Message Revision) = 04h = IPMI v2.0
ST (Sensor Type) = 13h = Critical Interrupt (From IPMI Specification Table 42-3, Sensor Type Codes)
SN (Sensor Number) = 05h
EDIR (Event Direction/Event Type) = 71h
    [7] = 0 = Assertion Event
    [6:0] = 71h = OEM Specific for PCI Express* correctable errors
ED1 (Event Data 1) = A0h
    [7:6] = 10b = OEM code in Event Data 2
    [5:4] = 10b = OEM code in Event Data 3
    [3:0] = Event Trigger Offset = 0h = Receiver Error
ED2 (Event Data 2) = 00h = PCI Bus number 0
ED3 (Event Data 3) = 12h
    [7:3] = PCI Device number = 02h
    [2:0] = PCI Function number = 2
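The bus, device, and function fields in the example above can be extracted as follows. This is a minimal Python sketch; the function name `decode_pcie_location` and the `bus:device.function` output format are illustrative choices.

```python
from typing import Tuple


def decode_pcie_location(ed2: int, ed3: int) -> Tuple[int, int, int]:
    """Return (bus, device, function) from Event Data 2 and Event Data 3."""
    bus = ed2                   # ED2 holds the PCI bus number
    device = (ed3 >> 3) & 0x1F  # ED3 bits 7:3
    function = ed3 & 0x07       # ED3 bits 2:0
    return bus, device, function


# ED2 = 00h and ED3 = 12h, as in the record above.
bus, device, function = decode_pcie_location(0x00, 0x12)
print(f"{bus:02X}:{device:02X}.{function}")
```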

2.2.3 Example of Decoding a Power Supply Predictive Failure Event

The following is an example of decoding a Power Supply predictive failure event. In this example, power supply 1 saw an A/C power loss event with both the input under-voltage warning and fault bits set. In most cases this means that the A/C input dropped below the minimum warning and fault thresholds for over 20 milliseconds but the system remained powered on. If these events continue to occur, it is advisable to check your power source.
RID[5D][00] RT[02] TS[D3][B1][AE][4E] GID[20][00] ER[04] ST[08] SN[50] EDIR[6F] ED1[A2] ED2[06] ED3[30]
RID (Record ID) = 005Dh
RT (Record Type) = 02h = System event record
TS (Timestamp) = 4EAEB1D3h
GID (Generator ID) = 0020h = BMC
ER (Event Message Revision) = 04h = IPMI v2.0
ST (Sensor Type) = 08h = Power Supply (From IPMI Specification Table 42-3, Sensor Type Codes)
SN (Sensor Number) = 50h = Power Supply 1
EDIR (Event Direction/Event Type) = 6Fh
    [7] = 0 = Assertion Event
    [6:0] = 6Fh = Sensor specific
ED1 (Event Data 1) = A2h
    [7:6] = 10b = OEM code in Event Data 2
    [5:4] = 10b = OEM code in Event Data 3
    [3:0] = Event Trigger Offset = 2h = Predictive Failure
ED2 (Event Data 2) = 06h = Input under-voltage warning
ED3 (Event Data 3) = 30h; From PMBus* Specification STATUS_INPUT command
    [5] = VIN_UV_WARNING (Input Under-voltage Warning) = 1
    [4] = VIN_UV_FAULT (Input Under-voltage Fault) = 1
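The STATUS_INPUT bits carried in Event Data 3 can be listed with a simple mask table. The Python sketch below covers only the two bits named in this example; the other STATUS_INPUT bits are defined in the PMBus* Specification and are deliberately left out here. The function name `decode_status_input` is hypothetical.

```python
# Bit masks for the two STATUS_INPUT bits described in this example.
STATUS_INPUT_BITS = {
    0x20: "VIN_UV_WARNING",  # bit 5, input under-voltage warning
    0x10: "VIN_UV_FAULT",    # bit 4, input under-voltage fault
}


def decode_status_input(ed3: int) -> list:
    """Return the names of the known STATUS_INPUT bits set in Event Data 3."""
    return [name for mask, name in STATUS_INPUT_BITS.items() if ed3 & mask]


# ED3 = 30h, as in the record above.
print(decode_status_input(0x30))
```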
12 Intel order number G90620-002 Revision 1.1
System Event Log Troubleshooting Guide for EPSD Platforms Based on Intel® Xeon® Processor E5 4600/2600/2400/1600/1400 Product Families
Sensor
Number
Sensor Name
Details Section
Next Steps
01h
Power Unit Status (Pwr Unit Status)
Power Unit Status Sensor
Table 16: Power Unit Status Sensor – Sensor Specific Offsets – Next Steps
02h
Power Unit Redundancy (Pwr Unit Redund)
Power Unit Redundancy Sensor
Table 18: Power Unit Redundancy Sensor – Event Trigger Offset – Next Steps
03h
IPMI Watchdog (IPMI Watchdog)
IPMI Watchdog
Table 78: IPMI Watchdog Sensor Event Trigger Offset – Next Steps
04h
Physical Security (Physical Scrty)
Physical Security
Table 74: Physical Security Sensor Event Trigger Offset – Next Steps
05h
FP Interrupt (FP NMI Diag Int)
FP (NMI) Interrupt
FP (NMI) Interrupt – Next Steps
06h
SMI Timeout (SMI Timeout)
SMI Timeout
SMI Timeout – Next Steps
07h
System Event Log (System Event Log)
System Event Log Cleared
Not applicable
08h
System Event (System Event)
System Event – PEF Action
System Event – PEF Action – Next Steps
09h
Button Sensor (Button)
Button Sensor
Not applicable
Sensor Cross Reference List

3. Sensor Cross Reference List

This section contains a cross reference to help find details on any specific SEL entry.

3.1 BMC owned Sensors (GID = 0020h)

The following table can be used to find the details of sensors owned by the BMC.
Table 5: BMC owned Sensors
Revision 1.1 Intel order number G90620-002 13
Sensor Cross Reference List
Sensor
Number
Sensor Name
Details Section
Next Steps
0Ah
BMC Watchdog (BMC Watchdog)
BMC Watchdog Sensor
BMC Watchdog Sensor – Next Steps
0Bh
Voltage Regulator Watchdog (VR Watchdog)
Voltage Regulator Watchdog Timer Sensor
Voltage Regulator Watchdog Timer Sensor – Next Steps
0Ch
Fan Redundancy (Fan Redundancy)
Fan Presence and Redundancy Sensors
Table 34: Fan Redundancy Sensor – Event Trigger Offset – Next Steps
0Dh
SSB Thermal Trip (SSB Thermal Trip)
Discrete Thermal Sensors
Table 45: Discrete Thermal Sensors – Next Steps
0Eh
IO Module Presence (IO Mod Presence)
Add-In Module Presence Sensor
Add-In Module Presence – Next Steps
0Fh
SAS Module Presence (SAS Mod Presence)
Add-In Module Presence Sensor
Add-In Module Presence – Next Steps
10h
BMC Firmware Health (BMC FW Health)
BMC FW Health Sensor
BMC FW Health Sensor – Next Steps
11h
System Airflow (System Airflow)
System Air Flow Monitoring Sensor
Not applicable
12h
Firmware Update Status (FW Update Status)
Firmware Update Status Sensor
Not applicable
13h
IO Module2 Presence (IO Mod2 Presence)
Add-In Module Presence Sensor
Add-In Module Presence – Next Steps
14h
Baseboard Temperature 5 (Platform Specific)
Threshold-based Temperature Sensors
Table 37: Temperature Sensors – Next Steps
15h
Baseboard Temperature 6 (Platform Specific)
Threshold-based Temperature Sensors
Table 37: Temperature Sensors – Next Steps
16h
IO Module2 Temperature (I/O Mod2 Temp)
Threshold-based Temperature Sensors
Table 37: Temperature Sensors – Next Steps
17h
PCI Riser 3 Temperature (PCI Riser 3 Temp)
Threshold-based Temperature Sensors
Table 37: Temperature Sensors – Next Steps
System Event Log Troubleshooting Guide for EPSD Platforms Based on Intel® Xeon® Processor E5 4600/2600/2400/1600/1400 Product Families
14 Intel order number G90620-002 Revision 1.1
System Event Log Troubleshooting Guide for EPSD Platforms Based on Intel® Xeon® Processor E5 4600/2600/2400/1600/1400 Product Families
Sensor
Number
Sensor Name
Details Section
Next Steps
18h
PCI Riser 4 Temperature (PCI Riser 4 Temp)
Threshold-based Temperature Sensors
Table 37: Temperature Sensors – Next Steps
19h
Baseboard +1.05V Processor3 Vccp
(BB +1.05Vccp P3)
Threshold-based Voltage Sensors
Table 13: Threshold-based Voltage Sensors – Next Steps
1Ah
Baseboard +1.05V Processor4 Vccp
(BB +1.05Vccp P4)
Threshold-based Voltage Sensors
Table 13: Threshold-based Voltage Sensors – Next Steps
20h
Baseboard Temperature 1 (Platform Specific)
Threshold-based Temperature Sensors
Table 37: Temperature Sensors – Next Steps
21h
Front Panel Temperature (Front Panel Temp)
Threshold-based Temperature Sensors
Table 37: Temperature Sensors – Next Steps
22h
SSB Temperature (SSB Temp)
Threshold-based Temperature Sensors
Table 37: Temperature Sensors – Next Steps
23h
Baseboard Temperature 2 (Platform Specific)
Threshold-based Temperature Sensors
Table 37: Temperature Sensors – Next Steps
24h
Baseboard Temperature 3 (Platform Specific)
Threshold-based Temperature Sensors
Table 37: Temperature Sensors – Next Steps
25h
Baseboard Temperature 4 (Platform Specific)
Threshold-based Temperature Sensors
Table 37: Temperature Sensors – Next Steps
26h
IO Module Temperature (I/O Mod Temp)
Threshold-based Temperature Sensors
Table 37: Temperature Sensors – Next Steps
27h
PCI Riser 1 Temperature (PCI Riser 1 Temp)
Threshold-based Temperature Sensors
Table 37: Temperature Sensors – Next Steps
28h
IO Riser Temperature (IO Riser Temp)
Threshold-based Temperature Sensors
Table 37: Temperature Sensors – Next Steps
29h–2Bh
Hot-Swap Back Plane 1-3 Temperature
(HSBP 1-3 Temp)
HSC Backplane Temperature Sensor
Table 88: HSC Backplane Temperature Sensor – Event Trigger Offset – Next Steps
Sensor Cross Reference List
Revision 1.1 Intel order number G90620-002 15
Sensor Cross Reference List
Sensor
Number
Sensor Name
Details Section
Next Steps
2Ch
PCI Riser 2 Temperature (PCI Riser 2 Temp)
Threshold-based Temperature Sensors
Table 37: Temperature Sensors – Next Steps
2Dh
SAS Module Temperature (SAS Mod Temp)
Threshold-based Temperature Sensors
Table 37: Temperature Sensors – Next Steps
2Eh
Exit Air Temperature (Exit Air Temp)
Threshold-based Temperature Sensors
Table 37: Temperature Sensors – Next Steps
2Fh
Network Interface Controller Temperature
(LAN NIC Temp)
Threshold-based Temperature Sensors
Table 37: Temperature Sensors – Next Steps
30h–3Fh
Fan Tachometer Sensors (Chassis specific sensor names)
Fan Tachometer Sensors
Table 30: Fan Tachometer Sensor – Event Trigger Offset – Next Steps
40h–4Fh
Fan Present Sensors (Fan x Present)
Fan Presence and Redundancy Sensors
Table 32: Fan Presence Sensors – Event Trigger Offset – Next Steps
50h
Power Supply 1 Status (PS1 Status)
Power Supply Status Sensors
Table 16: Power Unit Status Sensor – Sensor Specific Offsets – Next Steps
51h
Power Supply 2 Status (PS2 Status)
Power Supply Status Sensors
Table 16: Power Unit Status Sensor – Sensor Specific Offsets – Next Steps
54h
Power Supply 1 AC Power Input (PS1 Power In)
Power Supply Power In Sensors
Table 23: Power Supply Power In Sensor – Event Trigger Offset – Next Steps
55h
Power Supply 2 AC Power Input (PS2 Power In)
Power Supply Power In Sensors
Table 23: Power Supply Power In Sensor – Event Trigger Offset – Next Steps
58h
Power Supply 1 +12V % of Maximum Current Output
(PS1 Curr Out %)
Power Supply Current Out % Sensors
Table 25: Power Supply Current Out % Sensor – Event Trigger Offset – Next Steps
59h
Power Supply 2 +12V % of Maximum Current Output
(PS2 Curr Out %)
Power Supply Current Out % Sensors
Table 25: Power Supply Current Out % Sensor – Event Trigger Offset – Next Steps
5Ch
Power Supply 1 Temperature (PS1 Temperature)
Power Supply Temperature Sensors
Table 27: Power Supply Temperature Sensor – Event Trigger Offset – Next Steps
System Event Log Troubleshooting Guide for EPSD Platforms Based on Intel® Xeon® Processor E5 4600/2600/2400/1600/1400 Product Families
16 Intel order number G90620-002 Revision 1.1
System Event Log Troubleshooting Guide for EPSD Platforms Based on Intel® Xeon® Processor E5 4600/2600/2400/1600/1400 Product Families
Sensor
Number
Sensor Name
Details Section
Next Steps
5Dh
Power Supply 2 Temperature (PS2 Temperature)
Power Supply Temperature Sensors
Table 27: Power Supply Temperature Sensor – Event Trigger Offset – Next Steps
60h-68h
Hard Disk Drive 15 – 23 Status (HDD 15 – 23 Status)
Hard Disk Drive Monitoring Sensor
Table 90: Hard Disk Drive Monitoring Sensor - Event Trigger Offset – Next Steps
69h-6Bh
Hot-Swap Controller 1-3 Status (HSC1 – 3 Status)
Hot-Swap Controller Health Sensor
HSC Health Sensor – Next Steps
70h
Processor 1 Status (P1 Status)
Processor Status Sensor
Table 48: Processor Status Sensors – Next Steps
71h
Processor 2 Status (P2 Status)
Processor Status Sensor
Table 48: Processor Status Sensors – Next Steps
72h
Processor 3 Status (P3 Status)
Processor Status Sensor
Table 48: Processor Status Sensors – Next Steps
73h
Processor 4 Status (P4 Status)
Processor Status Sensor
Table 48: Processor Status Sensors – Next Steps
74h
Processor 1 Thermal Margin (P1 Therm Margin)
Thermal Margin Sensors
Table 40: Thermal Margin Sensors – Next Steps
75h
Processor 2 Thermal Margin (P2 Therm Margin)
Thermal Margin Sensors
Table 40: Thermal Margin Sensors – Next Steps
76h
Processor 3 Thermal Margin (P3 Therm Margin)
Thermal Margin Sensors
Table 40: Thermal Margin Sensors – Next Steps
77h
Processor 4 Thermal Margin (P4 Therm Margin)
Thermal Margin Sensors
Table 40: Thermal Margin Sensors – Next Steps
78h-7Bh
Processor 1 – 3 Thermal Control % (P1 – P4 Therm Ctrl %)
Processor Thermal Control Sensors
Processor Thermal Control % Sensors – Next Steps
7Ch
Processor 1 ERR2 Timeout (P1 ERR2)
Processor ERR2 Timeout Sensor
Processor ERR2 Timeout – Next Steps
7Dh
Processor 2 ERR2 Timeout (P2 ERR2)
Processor ERR2 Timeout Sensor
Processor ERR2 Timeout – Next Steps
Sensor Cross Reference List
Revision 1.1 Intel order number G90620-002 17
Sensor Cross Reference List
Sensor
Number
Sensor Name
Details Section
Next Steps
7Eh
Processor 3 ERR2 Timeout (P3 ERR2)
Processor ERR2 Timeout Sensor
Processor ERR2 Timeout – Next Steps
7Fh
Processor 4 ERR2 Timeout (P4 ERR2)
Processor ERR2 Timeout Sensor
Processor ERR2 Timeout – Next Steps
80h
Catastrophic Error (CATERR)
Catastrophic Error Sensor
Table 50: Catastrophic Error Sensor – Event Data 2 Values – Next Steps
81h
Processor 1 MSID Mismatch (P1 MSID Mismatch)
Processor MSID Mismatch Sensor
Processor MSID Mismatch Sensor – Next Steps
82h
Processor Population Fault (CPU Missing)
CPU Missing Sensor
CPU Missing Sensor – Next Steps
83h-86h
Processor 1 – 4 DTS Thermal Margin
(P1 – P4 DTS Therm Mgn)
Processor DTS Thermal Margin Sensors
Not applicable
87h
Processor 2 MSID Mismatch (P2 MSID Mismatch)
Processor MSID Mismatch Sensor
Processor MSID Mismatch Sensor – Next Steps
88h
Processor 3 MSID Mismatch (P3 MSID Mismatch)
Processor MSID Mismatch Sensor
Processor MSID Mismatch Sensor – Next Steps
89h
Processor 4 MSID Mismatch (P4 MSID Mismatch)
Processor MSID Mismatch Sensor
Processor MSID Mismatch Sensor – Next Steps
90h
Processor 1 VRD Temp (P1 VRD Hot)
Discrete Thermal Sensors
Table 45: Discrete Thermal Sensors – Next Steps
91h
Processor 2 VRD Temp (P2 VRD Hot)
Discrete Thermal Sensors
Table 45: Discrete Thermal Sensors – Next Steps
92h
Processor 3 VRD Temp (P3 VRD Hot)
Discrete Thermal Sensors
Table 45: Discrete Thermal Sensors – Next Steps
93h
Processor 4 VRD Temp (P4 VRD Hot)
Discrete Thermal Sensors
Table 45: Discrete Thermal Sensors – Next Steps
94h
Processor 1 Memory VRD Hot 0-1 (P1 Mem01 VRD Hot)
Discrete Thermal Sensors
Table 45: Discrete Thermal Sensors – Next Steps
95h
Processor 1 Memory VRD Hot 2-3 (P1 Mem23 VRD Hot)
Discrete Thermal Sensors
Table 45: Discrete Thermal Sensors – Next Steps
96h
Processor 2 Memory VRD Hot 0-1 (P2 Mem01 VRD Hot)
Discrete Thermal Sensors
Table 45: Discrete Thermal Sensors – Next Steps
97h
Processor 2 Memory VRD Hot 2-3 (P2 Mem23 VRD Hot)
Discrete Thermal Sensors
Table 45: Discrete Thermal Sensors – Next Steps
98h
Processor 3 Memory VRD Hot 0-1 (P3 Mem01 VRD Hot)
Discrete Thermal Sensors
Table 45: Discrete Thermal Sensors – Next Steps
99h
Processor 3 Memory VRD Hot 2-3 (P3 Mem23 VRD Hot)
Discrete Thermal Sensors
Table 45: Discrete Thermal Sensors – Next Steps
9Ah
Processor 4 Memory VRD Hot 0-1 (P4 Mem01 VRD Hot)
Discrete Thermal Sensors
Table 45: Discrete Thermal Sensors – Next Steps
9Bh
Processor 4 Memory VRD Hot 2-3 (P4 Mem23 VRD Hot)
Discrete Thermal Sensors
Table 45: Discrete Thermal Sensors – Next Steps
A0h
Power Supply 1 Fan Tachometer 1 (PS1 Fan Tach 1)
Power Supply Fan Tachometer Sensors
Power Supply Fan Tachometer Sensors – Next Steps
A1h
Power Supply 1 Fan Tachometer 2 (PS1 Fan Tach 2)
Power Supply Fan Tachometer Sensors
Power Supply Fan Tachometer Sensors – Next Steps
A2h
Intel® Xeon Phi™ Coprocessor Status 1 (MIC 1 Status)
Intel® Xeon Phi™ Coprocessor (MIC) Status Sensors
Intel® Xeon Phi™ Coprocessor (MIC) Status Sensors Next Steps
A3h
Intel® Xeon Phi™ Coprocessor Status 2 (MIC 2 Status)
Intel® Xeon Phi™ Coprocessor (MIC) Status Sensors
Intel® Xeon Phi™ Coprocessor (MIC) Status Sensors Next Steps
A4h
Power Supply 2 Fan Tachometer 1 (PS2 Fan Tach 1)
Power Supply Fan Tachometer Sensors
Power Supply Fan Tachometer Sensors – Next Steps
A5h
Power Supply 2 Fan Tachometer 2 (PS2 Fan Tach 2)
Power Supply Fan Tachometer Sensors
Power Supply Fan Tachometer Sensors – Next Steps
A6h
Intel® Xeon Phi™ Coprocessor Status 3 (MIC 3 Status)
Intel® Xeon Phi™ Coprocessor (MIC) Status Sensors
Intel® Xeon Phi™ Coprocessor (MIC) Status Sensors Next Steps
A7h
Intel® Xeon Phi™ Coprocessor Status 4 (MIC 4 Status)
Intel® Xeon Phi™ Coprocessor (MIC) Status Sensors
Intel® Xeon Phi™ Coprocessor (MIC) Status Sensors Next Steps
B0h
Processor 1 DIMM Aggregate Thermal Margin 1 (P1 DIMM Thrm Mrgn1)
Thermal Margin Sensors
Table 40: Thermal Margin Sensors – Next Steps
B1h
Processor 1 DIMM Aggregate Thermal Margin 2 (P1 DIMM Thrm Mrgn2)
Thermal Margin Sensors
Table 40: Thermal Margin Sensors – Next Steps
B2h
Processor 2 DIMM Aggregate Thermal Margin 1 (P2 DIMM Thrm Mrgn1)
Thermal Margin Sensors
Table 40: Thermal Margin Sensors – Next Steps
B3h
Processor 2 DIMM Aggregate Thermal Margin 2 (P2 DIMM Thrm Mrgn2)
Thermal Margin Sensors
Table 40: Thermal Margin Sensors – Next Steps
B4h
Processor 3 DIMM Aggregate Thermal Margin 1 (P3 DIMM Thrm Mrgn1)
Thermal Margin Sensors
Table 40: Thermal Margin Sensors – Next Steps
B5h
Processor 3 DIMM Aggregate Thermal Margin 2 (P3 DIMM Thrm Mrgn2)
Thermal Margin Sensors
Table 40: Thermal Margin Sensors – Next Steps
B6h
Processor 4 DIMM Aggregate Thermal Margin 1 (P4 DIMM Thrm Mrgn1)
Thermal Margin Sensors
Table 40: Thermal Margin Sensors – Next Steps
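The cross reference list above maps a sensor number to its details section, with some rows covering a range of numbers (for example 78h-7Bh). The following is a minimal sketch, in Python, of how such a list might be consulted from a script. Only a handful of rows are copied from the list, and the names SENSOR_XREF, parse_sensor_number, and lookup_sensor are hypothetical; they are not part of any Intel utility.

```python
# Hypothetical helper: a few rows copied from the sensor cross reference list.
# Each entry is (first sensor number, last sensor number, sensor name, details section).
SENSOR_XREF = [
    (0x78, 0x7B, "P1 – P4 Therm Ctrl %", "Processor Thermal Control Sensors"),
    (0x80, 0x80, "CATERR", "Catastrophic Error Sensor"),
    (0x82, 0x82, "CPU Missing", "CPU Missing Sensor"),
    (0x83, 0x86, "P1 – P4 DTS Therm Mgn", "Processor DTS Thermal Margin Sensors"),
    (0x90, 0x90, "P1 VRD Hot", "Discrete Thermal Sensors"),
]


def parse_sensor_number(text):
    """Convert the guide's hexadecimal notation (for example '7Ch') to an integer."""
    text = text.strip()
    if not text.lower().endswith("h"):
        raise ValueError("expected a value such as '7Ch', got %r" % text)
    return int(text[:-1], 16)


def lookup_sensor(number):
    """Return (sensor name, details section) for a sensor number, or None if not listed."""
    for first, last, name, section in SENSOR_XREF:
        if first <= number <= last:
            return name, section
    return None


if __name__ == "__main__":
    # 79h falls inside the 78h-7Bh range row.
    print(lookup_sensor(parse_sensor_number("79h")))
```

A list of ranges is used rather than a dictionary keyed by single numbers so that range rows can be stored exactly as they appear in the list. Numbers that are not in the copied rows return None rather than a guess.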