Dell PowerEdge 3250 Product Manual

Intel® Server Platform SR870BH2
Field Error Reference Guide
Revision 1.1
March 2004
Enterprise Platforms and Services Division
Revision History
Date      Revision Number   Modifications
03/2003   0.5               Initial Release.
07/2003   1.0               Production Update
03/2004   1.1               Update
Disclaimers
THIS TEST REPORT IS PROVIDED "AS IS" WITH NO WARRANTIES WHATSOEVER, INCLUDING ANY WARRANTY OF MERCHANTABILITY, FITNESS FOR ANY PARTICULAR PURPOSE, OR ANY WARRANTY OTHERWISE ARISING OUT OF ANY PROPOSAL, SPECIFICATION OR SAMPLE.
Information in this document is provided in connection with Intel® products. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document. Except as provided in Intel's Terms and Conditions of Sale for such products, Intel assumes no liability whatsoever, and Intel disclaims any express or implied warranty, relating to sale and/or use of Intel products including liability or warranties relating to fitness for a particular purpose, merchantability, or infringement of any patent, copyright or other intellectual property right. Intel products are not intended for use in medical, life saving, or life sustaining applications.
Intel retains the right to make changes to its test specifications at any time, without notice.
The hardware vendor remains solely responsible for the design, sale and functionality of its product, including any liability arising from product infringement or product warranty.
Intel and Itanium are registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
Copyright © Intel Corporation 2004. *Other names and brands may be claimed as the property of others.
Table of Contents
1. Introduction
2. SEL Overview
3. EFI-Based SELViewer Task
4. SR870BH2 SEL Data Tables
5. SR870BH2 Machine Check Error Handling
   5.1 Classification of Errors
   5.2 Error Types
   5.3 Error Signaling
   5.4 Error Reporting
   5.5 Thresholding
   5.6 SEL Event Log Format for Machine Check Errors
6. SR870BH2 PCI Device IDs
7. BIOS POST Error Codes and Messages
   7.1 Error Code Classification
8. Debug Methodology and Failure Isolation
   8.1 Memory
      8.1.1 Memory Debug Methodology
      8.1.2 Memory Component Isolation
   8.2 Processor
      8.2.1 Processor Debug Methodology
      8.2.2 Processor Component Isolation
   8.3 Processor - Late Self-test
      8.3.1 Late Self-test Display
      8.3.2 Late Self-test Usage Notes
   8.4 Watch Dog Timer
      8.4.1 Watch dog timer Debug Methodology
      8.4.2 Watchdog Timer Failure Isolation
   8.5 Fault Resilient Boot (FRB)
      8.5.1 FRB3 – BSP Reset Failures
      8.5.2 FRB2 – BSP POST Failures
      8.5.3 FRB1 – BSP Self-Test Failures
      8.5.4 FRB Debug Methodology
      8.5.5 FRB Failure Isolation
9. POST Codes
   9.1 North and South Port 80/81 Cards
10. Beep Codes
   10.1 Recovery Beep Codes
   10.2 BMC Beep Code Generation
11. Clearing CMOS and BIOS Recovery
   11.1 CMOS Clear
   11.2 BIOS Recovery Mode
Glossary
Appendix B: Reference Documents
List of Figures
Figure 1. SEL Viewer
List of Tables
Table 1. SR870BH2 Generator ID Codes
Table 2. SR870BH2 Sensor Codes
Table 3. SAL 3.0 MCA Records
Table 4. SEL Event Logs for Machine Check Errors
Table 5. Onboard PCI Devices and Slots
Table 6. Error Code Classification
Table 7. Error Beep Codes
Table 8. Recovery Mode Beep Codes
Table 9. BMC Beep Codes
1. Introduction
This document is designed to familiarize the field technician with the error handling architecture of the Intel® Server Platform SR870BH2 and to provide a quick reference to aid in the diagnosis of system failures.
It presents an overview of the applicable EFI-based System Management Utility (SMU), the System Event Log (SEL), Machine Check Architecture (MCA) error handling, and error messaging. In addition, many of the error messages are mapped to a possible point of failure and include a brief comment on debug methodology.
The document is organized into the following chapters:
1. Introduction
2. SEL overview
3. EFI-Based SEL viewer task
4. SR870BH2 SEL data tables
5. SR870BH2 machine check error handling
6. SR870BH2 PCI device IDs
7. BIOS POST error codes and messages
8. Debug methodology and failure isolation
9. POST codes
10. Beep codes
11. Clearing CMOS and BIOS recovery
2. SEL Overview
The System Event Log (SEL) is a non-volatile repository for event messages. Event messages contain information about system events and anomalies that occur on the server, BIOS, and event generators. System sensors can also trigger events that are logged in the SEL.
Some event messages are the result of normal events, such as a normal server boot, or of possible minor problems, such as a disconnected keyboard. Other events may indicate internal failures, such as a component over-temperature condition in which thresholds (ranges of acceptable values) have been exceeded. Whenever a component crosses one of these defined thresholds, an event message is generated.
Regardless of the event, the appropriate management controller generates an event message. Event messages are passed to the Baseboard Management Controller (BMC), the primary management controller on Intel® server systems. The BMC passes the event message to the SEL, where it becomes available for querying by the SEL Viewer utility.
The SEL Viewer provides an interface for the server administrator to view information in the SEL. The SEL Viewer is available through Intel® Server Management (ISM) or as the EFI-based SEL Viewer utility in the System Management Utility (SMU), which ships on the standard platform resource CD. The system administrator can use this information to monitor the server for warnings and potential critical problems.
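This guide does not document the raw layout of a SEL entry. As a minimal sketch, assuming the SR870BH2 BMC stores entries in the standard 16-byte IPMI v1.5 system event record format (record type 02h), a raw entry such as one read from the SEL Viewer's hexadecimal display could be decoded as follows. The names `SEL_RECORD` and `decode_sel_record` and the sample bytes are hypothetical.

```python
import struct
from datetime import datetime, timezone

# Assumed layout: IPMI v1.5 system event record (record type 02h), 16 bytes,
# multi-byte integers stored least-significant byte first.
#   Record ID (2) | Record Type (1) | Timestamp (4) | Generator ID (2) |
#   EvM Rev (1) | Sensor Type (1) | Sensor Number (1) | Event Dir/Type (1) |
#   Event Data 1-3 (3)
SEL_RECORD = struct.Struct("<HBI2sBBBB3s")


def decode_sel_record(raw: bytes) -> dict:
    """Hypothetical helper: decode one raw 16-byte SEL entry into named fields."""
    if len(raw) != SEL_RECORD.size:
        raise ValueError(f"expected {SEL_RECORD.size} bytes, got {len(raw)}")
    (record_id, record_type, timestamp, generator, evm_rev,
     sensor_type, sensor_number, dir_type, event_data) = SEL_RECORD.unpack(raw)
    return {
        "record_id": record_id,
        "record_type": record_type,
        # Seconds counted from 1970-01-01 00:00:00 (IPMI timestamp format).
        "timestamp": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        # Kept in stored byte order, matching Table 1 notation such as "20 00".
        "generator_id": generator.hex(" ").upper(),
        "evm_rev": evm_rev,
        "sensor_type": sensor_type,
        "sensor_number": sensor_number,
        # Bit 7 is the event direction; bits 6:0 are the event/reading type code.
        "direction": "deassertion" if dir_type & 0x80 else "assertion",
        "event_type": dir_type & 0x7F,
        "event_data": list(event_data),
    }


# Usage with a made-up record: BMC generator, sensor type 01h, sensor number 20h.
sample = SEL_RECORD.pack(0x0001, 0x02, 1_000_000_000, b"\x20\x00",
                         0x04, 0x01, 0x20, 0x01, b"\x00\x00\x00")
record = decode_sel_record(sample)
print(record["generator_id"], f"{record['sensor_type']:02X}h",
      f"{record['sensor_number']:02X}h")  # prints: 20 00 01h 20h
```

The decoded sensor type, sensor number, and generator ID are the fields that the tables in Section 4 interpret.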
3. EFI-Based SELViewer Task
The EFI-based SEL Viewer task is available only in the local version of the SMU; it is not available when running the remote version. The EFI SEL Viewer allows the user to:
o Examine all SEL entries stored in the non-volatile storage area of the server, in text form or in hexadecimal.
o Examine previously stored SEL entries from a file, in text form or in hexadecimal.
o Save the SEL entries to a file.
o Clear the SEL entries from the non-volatile storage area.
o Sort the SEL records by various fields, such as timestamp, sensor type and number, event description, and generator ID.
Five columns of SEL data can be viewed in the EFI SEL Viewer utility:
o Number of Event
o Time Stamp
o Sensor Type and Number
o Event Description
o Generator ID
Figure 1. SEL Viewer
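The sort options listed above can be reproduced offline once SEL entries have been saved and decoded. The sketch below is a hypothetical model of the five displayed columns, not the viewer's implementation; `SelRow`, `SORT_KEYS`, `sort_rows`, `format_row`, and the sample rows are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SelRow:
    """One row of the five-column SEL display (hypothetical model)."""
    number: int          # Number of Event
    timestamp: datetime  # Time Stamp
    sensor_type: int     # Sensor Type and Number: type part
    sensor_number: int   # Sensor Type and Number: number part
    description: str     # Event Description
    generator_id: str    # Generator ID, e.g. "20 00"


# One key function per sort field named in the task description.
SORT_KEYS = {
    "timestamp": lambda row: row.timestamp,
    "sensor": lambda row: (row.sensor_type, row.sensor_number),
    "description": lambda row: row.description,
    "generator": lambda row: row.generator_id,
}


def sort_rows(rows, field):
    """Return the rows ordered by one of the SORT_KEYS fields."""
    return sorted(rows, key=SORT_KEYS[field])


def format_row(row):
    """Render a row in the same column order as the viewer."""
    return (f"{row.number:4d}  {row.timestamp:%m/%d/%Y %H:%M:%S}  "
            f"{row.sensor_type:02X}h {row.sensor_number:02X}h  "
            f"{row.description:<24}  {row.generator_id}")


# Made-up sample rows.
rows = [
    SelRow(1, datetime(2004, 3, 1, 8, 0, 5), 0x0F, 0x06, "POST Error", "33 00"),
    SelRow(2, datetime(2004, 3, 1, 8, 0, 1), 0x01, 0x81, "Processor 1 Temp", "20 00"),
    SelRow(3, datetime(2004, 3, 1, 8, 0, 3), 0x0D, 0x02, "Hot Swap Drive 1 Status", "C0 00"),
]
for row in sort_rows(rows, "timestamp"):
    print(format_row(row))
```

Fixed-width uppercase hexadecimal strings such as "20 00" and "C0 00" sort correctly as text, so the generator key needs no numeric conversion.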
4. SR870BH2 SEL Data Tables
The tables in this section decode the Generator ID and Sensor Type and Number fields displayed by the SEL Viewer utility.
Table 1. SR870BH2 Generator ID Codes
Generator ID      Generator
20 00             BMC
C0 00             HSC
31 00 – 3F 00     System BIOS or system software
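A lookup against Table 1 can be sketched as follows. The sketch assumes the Generator ID is shown as two hexadecimal bytes, as in the table. It keys only on the first byte because the second byte is 00 in every row, which is an assumption rather than a documented rule. `generator_name` is a hypothetical helper.

```python
def generator_name(generator_id: str) -> str:
    """Hypothetical helper: map a Generator ID such as "20 00" to Table 1.

    Only the first byte is examined; the second byte (00 in every Table 1
    row) is ignored, which is an assumption of this sketch.
    """
    first = int(generator_id.split()[0], 16)
    if first == 0x20:
        return "BMC"
    if first == 0xC0:
        return "HSC"
    if 0x31 <= first <= 0x3F:
        return "System BIOS or system software"
    return "Unknown generator"


for gid in ("20 00", "C0 00", "33 00", "2C 00"):
    print(gid, "->", generator_name(gid))
```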
Table 2. SR870BH2 Sensor Codes
Sensor Type                       Sensor Number   Sensor Name
01 (Temperature)                  20h             Memory Board Temp
                                  21h             Memory Board SNC Temp
                                  22h             PCI Riser SIOH Temp
                                  23h             Peripheral Board AMB Temp
                                  24h             PCI Riser board Temp
                                  25h             CPU Area Temp
                                  26h             Memory Area Temp
                                  81h             Processor 1 Temp
                                  82h             Processor 2 Temp
02 (Voltage)                      10h             MB Bd +1.25V
                                  11h             MB Bd +1.5V
                                  12h             MB Bd +1.8V
                                  13h             MB Bd +3.3V
                                  14h             MB Bd +3.3V SB
                                  15h             MB Bd +5V
                                  16h             MB Bd +12V
                                  17h             MB Bd –12V
                                  18h             MB Bd +1.2V
                                  19h             MB Bd +1.3V
                                  1Ah             MB Bd –1.5V SB
                                  1Bh             MB Bd +2.5V
                                  1Ch             MB Bd +2.5V SB
                                  1Dh             MB Bd 2 +5V SB
                                  50h             LVDS SCSI channel 1 terminator 1
                                  51h             LVDS SCSI channel 1 terminator 2
                                  52h             LVDS SCSI channel 1 terminator 3
                                  53h             LVDS SCSI channel 2 terminator 1
                                  54h             LVDS SCSI channel 2 terminator 2
                                  55h             LVDS SCSI channel 2 terminator 3
                                  86h             Proc 1 Power Pod Good
                                  87h             Proc 2 Power Pod Good
04 (Fan)                          30h             Tach Fan 1
                                  31h             Tach Fan 2
                                  32h             Tach Fan 3
                                  33h             Tach Fan 4
                                  34h             Tach Fan 5
                                  35h             Tach Fan 6
                                  70h             Fan 1 Present
                                  71h             Fan 2 Present
                                  72h             Fan 3 Present
                                  73h             Fan 4 Present
                                  74h             Fan 5 Present
                                  75h             Fan 6 Present
05 (Physical Security)            05h             LAN Leash Lost
06 (Security Violation Attempt)   04h             Platform Security Violation
07 (Processor)                    80h             Proc 1 Status
                                  81h             Proc 2 Status
08 (Power Supply)                 60h             Power Supply 1
                                  61h             Power Supply 2
                                  62h             Power Supply 3
09 (Power Unit)                   01h             Power Unit Status
                                  02h             Power Unit Redundancy
0D (Hot Swap Drive Sensors)       01h             SCSI BP Temperature
                                  02h             Hot Swap Drive 1 Status
                                  03h             Hot Swap Drive 2 Status
                                  05h             Hot Swap Drive 1 Present
                                  06h             Hot Swap Drive 2 Present
0F (POST Error)                   06h             POST Error
10 (Event Logging)                09h             Event Logging Disabled
12 (System Event)                 12h             OEM System Boot Event PEF Action
13 (Critical Interrupt)           07h             FP Diag Interrupt (Front Panel SD Init)
15 (Module / Board)               77h             System Board Interlock
23 (Watchdog)                     03h             BMC Watchdog2
C7 (OEM)                          40h             Fan Boost Mem Board Temp
                                  41h             Fan Boost Mem Board SNC Temp
                                  42h             Fan Boost PCI Riser SIOH Temp
                                  43h             Fan Boost Peripheral Board AMB Temp
                                  44h             Fan Boost PCI Riser Board Temp
                                  45h             Fan Boost CPU Area Temp
                                  46h             Fan Boost Mem Area Temp
                                  84h             Fan Boost Proc 1 Temp
                                  85h             Fan Boost Proc 2 Temp
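Sensor numbers alone are not unique in Table 2. For example, 81h is Processor 1 Temp under type 01 but Proc 2 Status under type 07, and 01h appears under both Power Unit (09) and Hot Swap Drive Sensors (0D). A lookup therefore has to key on the (sensor type, sensor number) pair. The sketch below holds only an excerpt of the table, and `SENSOR_TYPES`, `SENSORS`, and `describe_sensor` are hypothetical names.

```python
# Sensor type names from Table 2.
SENSOR_TYPES = {
    0x01: "Temperature", 0x02: "Voltage", 0x04: "Fan",
    0x05: "Physical Security", 0x06: "Security Violation Attempt",
    0x07: "Processor", 0x08: "Power Supply", 0x09: "Power Unit",
    0x0D: "Hot Swap Drive Sensors", 0x0F: "POST Error",
    0x10: "Event Logging", 0x12: "System Event",
    0x13: "Critical Interrupt", 0x15: "Module / Board",
    0x23: "Watchdog", 0xC7: "OEM",
}

# Excerpt of Table 2, keyed on (sensor type, sensor number) because
# sensor numbers repeat across types.
SENSORS = {
    (0x01, 0x81): "Processor 1 Temp",
    (0x01, 0x82): "Processor 2 Temp",
    (0x07, 0x80): "Proc 1 Status",
    (0x07, 0x81): "Proc 2 Status",
    (0x09, 0x01): "Power Unit Status",
    (0x0D, 0x01): "SCSI BP Temperature",
    (0x0D, 0x06): "Hot Swap Drive 2 Present",
    (0x0F, 0x06): "POST Error",
    (0x23, 0x03): "BMC Watchdog2",
}


def describe_sensor(sensor_type: int, sensor_number: int) -> str:
    """Return "<type name>: <sensor name>", with fallbacks for unlisted codes."""
    type_name = SENSOR_TYPES.get(sensor_type, f"Type {sensor_type:02X}h")
    name = SENSORS.get((sensor_type, sensor_number),
                       f"sensor {sensor_number:02X}h (not in excerpt)")
    return f"{type_name}: {name}"


print(describe_sensor(0x01, 0x81))  # Temperature: Processor 1 Temp
print(describe_sensor(0x07, 0x81))  # Processor: Proc 2 Status
```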