Sun Microsystems X4240, X4440, X4140 User Manual

Sun Fire™ X4140, X4240, and X4440
Servers Diagnostics Guide
Sun Microsystems, Inc. www.sun.com
Part No. 820-3067-11 August 2008, Revision A
Submit comments about this document at: http://www.sun.com/hwdocs/feedback
Copyright ©2008 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved.
Unpublished - rights reserved under the Copyright Laws of the United States.
THIS PRODUCT CONTAINS CONFIDENTIAL INFORMATION AND TRADE SECRETS OF SUN MICROSYSTEMS, INC. USE, DISCLOSURE OR REPRODUCTION IS PROHIBITED WITHOUT THE PRIOR EXPRESS WRITTEN PERMISSION OF SUN MICROSYSTEMS, INC.
This distribution may include materials developed by third parties.
Sun, Sun Microsystems, the Sun logo, Java, Solaris, Sun Fire 4140, Sun Fire 4240 and Sun Fire 4440 are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries.
AMD Opteron and Opteron are trademarks of Advanced Micro Devices, Inc. Intel is a registered trademark of Intel Corporation.
This product is covered and controlled by U.S. Export Control laws and may be subject to the export or import laws in other countries. Nuclear, missile, chemical biological weapons or nuclear maritime end uses or end users, whether direct or indirect, are strictly prohibited. Export or reexport to countries subject to U.S. embargo or to entities identified on U.S. export exclusion lists, including, but not limited to, the denied persons and specially designated nationals lists is strictly prohibited.
Use of any spare or replacement CPUs is limited to repair or one-for-one replacement of CPUs in products exported in compliance with U.S. export laws. Use of CPUs as product upgrades unless authorized by the U.S. Government is strictly prohibited.

Contents

Preface vii
1. Initial Inspection of the Server 1
Service Troubleshooting Flowchart 1
Gathering Service Information 2
System Inspection 3
Troubleshooting Power Problems 3
Externally Inspecting the Server 3
Internally Inspecting the Server 4
2. Using SunVTS Diagnostic Software 7
Running SunVTS Diagnostic Tests 7
SunVTS Documentation 8
Diagnosing Server Problems With the Bootable Diagnostics CD 8
Requirements 8
Using the Bootable Diagnostics CD 9
3. Troubleshooting DIMM Problems 11
DIMM Population Rules 11
DIMM Replacement Policy 12
How DIMM Errors Are Handled by the System 12
Uncorrectable DIMM Errors 12
Correctable DIMM Errors 14
BIOS DIMM Error Messages 15
DIMM Fault LEDs 15
Isolating and Correcting DIMM ECC Errors 18
A. Event Logs and POST Codes 21
Viewing Event Logs 21
Power-On Self-Test (POST) 25
How BIOS POST Memory Testing Works 25
Redirecting Console Output 26
Changing POST Options 28
POST Codes 31
POST Code Checkpoints 33
B. Status Indicator LEDs 37
External Status Indicator LEDs 37
Front Panel LEDs 38
Back Panel LEDs 38
Hard Drive LEDs 39
Internal Status Indicator LEDs 39
C. Using the ILOM Service Processor GUI to View System Information 43
Making a Serial Connection to the SP 44
Viewing ILOM SP Event Logs 45
Interpreting Event Log Time Stamps 47
Viewing Replaceable Component Information 48
Viewing Sensors 50
D. Error Handling 53
iv Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide • August 2008
Handling of Uncorrectable Errors 53
Handling of Correctable Errors 56
Handling of Parity Errors (PERR) 59
Handling of System Errors (SERR) 61
Handling Mismatching Processors 63
Hardware Error Handling Summary 64
Index 69

Preface

The Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide contains information and procedures for using available tools to diagnose problems with the servers.
Before You Read This Document
It is important that you review the safety guidelines in the Sun Fire X4140, X4240, and X4440 Safety and Compliance Guide.
Related Documentation
The document set for the Sun Fire X4140, X4240, and X4440 Servers is described in the Where To Find Sun Fire X4140, X4240, and X4440 Servers Documentation sheet that is packed with your system. You can also find the documentation at http://docs.sun.com.
Translated versions of some of these documents are available at http://docs.sun.com. Select a language from the drop-down list and navigate to the Sun Fire X4140, X4240, and X4440 Servers document collection using the Product category link. Available translations for the Sun Fire X4140, X4240, and X4440 Servers include Simplified Chinese, Traditional Chinese, French, Japanese, and Korean.
English documentation is revised more frequently and might be more up-to-date than the translated documentation. For all Sun documentation, go to the following URL:
http://docs.sun.com
Typographic Conventions
Typeface* | Meaning | Examples
AaBbCc123 | The names of commands, files, and directories; onscreen computer output | Edit your .login file. Use ls -a to list all files. % You have mail.
AaBbCc123 | What you type, when contrasted with onscreen computer output | % su Password:
AaBbCc123 | Book titles, new words or terms, words to be emphasized. Replace command-line variables with real names or values. | Read Chapter 6 in the User’s Guide. These are called class options. You must be superuser to do this. To delete a file, type rm filename.
* The settings on your browser might differ from these settings.
Third-Party Web Sites
Sun™ is not responsible for the availability of third-party web sites mentioned in this document. Sun does not endorse and is not responsible or liable for any content, advertising, products, or other materials that are available on or through such sites or resources. Sun will not be responsible or liable for any actual or alleged damage or loss caused by or in connection with the use of or reliance on any such content, goods, or services that are available on or through such sites or resources.
Sun Welcomes Your Comments
Sun is interested in improving its documentation and welcomes your comments and suggestions. You can submit your comments by going to:
http://www.sun.com/hwdocs/feedback
Please include the title and part number of your document with your feedback:
Sun Fire X4140, X4240, and X4440 Servers Diagnostics Guide, part number 820-3067-11
CHAPTER
1

Initial Inspection of the Server

This chapter includes the following topics:
“Service Troubleshooting Flowchart” on page 1
“Gathering Service Information” on page 2
“System Inspection” on page 3

Service Troubleshooting Flowchart

Use the following flowchart as a guideline for using the subjects in this book to troubleshoot the server.
TABLE 1-1 Troubleshooting Flowchart
To perform this task | Refer to this section
Gather initial service information. | “Gathering Service Information” on page 2
Investigate any powering-on problems. | “Troubleshooting Power Problems” on page 3
Perform external visual inspection and internal visual inspection. | “Externally Inspecting the Server” on page 3; “Internally Inspecting the Server” on page 4; Chapter 3
View BIOS event logs and POST messages. | “Viewing Event Logs” on page 21; “Power-On Self-Test (POST)” on page 25
View service processor logs and sensor information... | “Using the ILOM Service Processor GUI to View System Information” on page 43
...or view service processor logs and sensor information. | “Using IPMItool to View System Information” on page 55
Run SunVTS diagnostics | “Diagnosing Server Problems With the Bootable Diagnostics CD” on page 8

Gathering Service Information

The first step in determining the cause of a problem with the server is to gather information from the service-call paperwork or the onsite personnel. Use the following general guideline steps when you begin troubleshooting.
To gather service information:
1. Collect information about the following items:
Events that occurred prior to the failure
Whether any hardware or software was modified or installed
Whether the server was recently installed or moved
How long the server exhibited symptoms
The duration or frequency of the problem
2. Document the server settings before you make any changes.
If possible, make one change at a time in order to isolate potential problems. In this way, you can maintain a controlled environment and reduce the scope of troubleshooting.
3. Take note of the results of any change that you make. Include any errors or informational messages.
4. Check for potential device conflicts before you add a new device.
5. Check for version dependencies, especially with third-party software.

System Inspection

Controls that have been improperly set and cables that are loose or improperly connected are common causes of problems with hardware components.

Troubleshooting Power Problems

If the server will power on, skip this section and go to “Externally Inspecting the Server” on page 3.
If the server will not power on, check the following:
1. Check that AC power cords are attached firmly to the server’s power supplies and to the AC sources.
2. Check that the main cover is firmly in place.
There is an intrusion switch on the motherboard that automatically shuts down the server power to standby mode when the cover is removed.

Externally Inspecting the Server

To perform a visual inspection of the external system:
1. Inspect the external status indicator LEDs, which can indicate component malfunction.
For the LED locations and descriptions of their behavior, see “External Status Indicator LEDs” on page 37.
2. Verify that nothing in the server environment is blocking air flow or making a contact that could short out power.
3. If the problem is not evident, continue with the next section, “Internally Inspecting the Server” on page 4.

Internally Inspecting the Server

To perform a visual inspection of the internal system:
1. Choose a method for shutting down the server from main power mode to standby power mode. See FIGURE 1-1 and FIGURE 1-2.
Graceful shutdown – Use a ballpoint pen or other stylus to press and release the Power button on the front panel. This causes Advanced Configuration and Power Interface (ACPI) enabled operating systems to perform an orderly shutdown of the operating system. Servers not running ACPI-enabled operating systems will shut down to standby power mode immediately.
Emergency shutdown – Use a ballpoint pen or other stylus to press and hold the Power button for four seconds to force main power off and enter standby power mode.
Caution – Performing an emergency shutdown can cause open files to become corrupt. Use an emergency shutdown only when necessary.
When main power is off, the Power/OK LED on the front panel will begin flashing, indicating that the server is in standby power mode.
Caution – When you use the Power button to enter standby power mode, power is still directed to service processor and power supply fans, indicated when the Power/OK LED is flashing. To completely power off the server, you must disconnect the AC power cords from the back panel of the server.
FIGURE 1-1 X4140 Server Front Panel (callouts: Locate Button/LED, Power Button)
FIGURE 1-2 X4440 Server Front Panel (callouts: Locate Button/LED, Power Button)
2. Remove the server cover.
For instructions on removing the server cover, refer to your server’s service manual.
3. Inspect the internal status indicator LEDs. These can indicate component malfunction.
For the LED locations and descriptions of their behavior, see “Internal Status Indicator LEDs” on page 39.
Note – The server must be in standby power mode for viewing the internal LEDs.
You can hold down the Locate button on the server back panel or front panel for 5 seconds to initiate a “push-to-test” mode that illuminates all other LEDs both inside and outside of the chassis for 15 seconds.
4. Verify that there are no loose or improperly seated components.
5. Verify that all cable connectors inside the system are firmly and correctly attached to their appropriate connectors.
6. Verify that any after-factory components are qualified and supported.
For a list of supported PCI cards and DIMMs, refer to your server’s service manual.
7. Check that the installed DIMMs comply with the supported DIMM population rules and configurations, as described in “DIMM Population Rules” on page 11.
8. Replace the server cover.
9. To restore the server to main power mode (all components powered on), use a ballpoint pen or other stylus to press and release the Power button on the server front panel. See FIGURE 1-1 and FIGURE 1-2.
When main power is applied to the full server, the Power/OK LED next to the Power button lights and remains lit.
10. If the problem with the server is not evident, you can obtain additional information by viewing the power-on self test (POST) messages and BIOS event logs during system startup. Continue with “Viewing Event Logs” on page 21.
CHAPTER
2

Using SunVTS Diagnostic Software

This chapter contains information about the SunVTS™ diagnostic software tool.

Running SunVTS Diagnostic Tests

The servers are shipped with a Bootable Diagnostics CD that contains the Sun Validation Test Suite (SunVTS) software.
SunVTS provides a comprehensive diagnostic tool that tests and validates Sun hardware by verifying the connectivity and functionality of most hardware controllers and devices on Sun platforms. SunVTS software can be tailored with modifiable test instances and processor affinity features.
The following tests are supported on x86 platforms:
CD DVD Test (cddvdtest)
CPU Test (cputest)
Cryptographics Test (cryptotest)
Disk and Diskette Drives Test (disktest)
Data Translation Look-aside Buffer (dtlbtest)
Emulex HBA Test (emlxtest)
Floating Point Unit Test (fputest)
InfiniBand Host Channel Adapter Test (ibhcatest)
Level 1 Data Cache Test (l1dcachetest)
Level 2 SRAM Test (l2sramtest)
Ethernet Loopback Test (netlbtest)
Network Hardware Test (nettest)
Physical Memory Test (pmemtest)
QLogic Host Bus Adapter Test (qlctest)
RAM Test (ramtest)
Serial Port Test (serialtest)
System Test (systest)
Tape Drive Test (tapetest)
Universal Serial Board Test (usbtest)
Virtual Memory Test (vmemtest)
SunVTS software has a sophisticated graphical user interface (GUI) that provides test configuration and status monitoring. The user interface can be run on one system to display the SunVTS testing of another system on the network. SunVTS software also provides a TTY-mode interface for situations in which running a GUI is not possible.

SunVTS Documentation

For the most up-to-date information on SunVTS software, go to:
http://docs.sun.com/app/docs/prod/test.validate

Diagnosing Server Problems With the Bootable Diagnostics CD

SunVTS 6.4 or later software is preinstalled on your server. The server is also shipped with the Bootable Diagnostics CD. This CD is designed so that the server will boot from the CD. This CD boots and starts SunVTS software. Diagnostic tests run and write output to log files that the service technician can use to determine the problem with the server.
Requirements
To use the diagnostics CD you must have a keyboard, mouse, and monitor attached to the server on which you are performing diagnostics, or available through a remote KVM.
Using the Bootable Diagnostics CD
To use the diagnostics CD to perform diagnostics:
1. With the server powered on, insert the CD into the DVD-ROM drive.
2. Reboot the server, and press F2 during the start of the reboot so that you can change the BIOS setting for boot-device priority.
3. When the BIOS Main menu appears, navigate to the BIOS Boot menu.
Instructions for navigating within the BIOS screens appear on the BIOS screens.
4. On the BIOS Boot menu screen, select Boot Device Priority.
The Boot Device Priority screen appears.
5. Select the DVD-ROM drive to be the primary boot device.
6. Save and exit the BIOS screens.
7. Reboot the server.
When the server reboots from the CD in the DVD-ROM drive, the Solaris Operating System boots and SunVTS software starts and opens its first GUI window.
8. In the SunVTS GUI, press Enter or click the Start button when you are prompted to start the tests.
The test suite will run until it encounters an error or the test is completed.
Note – The CD will take approximately nine minutes to boot.
9. When SunVTS software completes the test, review the log files generated during the test.
SunVTS provides access to four different log files:
SunVTS test error log contains time-stamped SunVTS test error messages. The log file path name is /var/opt/SUNWvts/logs/sunvts.err. This file is not created until a SunVTS test failure occurs.
SunVTS kernel error log contains time-stamped SunVTS kernel and SunVTS probe errors. SunVTS kernel errors are errors that relate to running SunVTS, and not to testing of devices. The log file path name is /var/opt/SUNWvts/logs/vtsk.err. This file is not created until SunVTS reports a SunVTS kernel error.
SunVTS information log contains informative messages that are generated when you start and stop the SunVTS test sessions. The log file path name is /var/opt/SUNWvts/logs/sunvts.info. This file is not created until a SunVTS test session runs.
Solaris system message log is a log of all the general Solaris events logged by syslogd. The path name of this log file is /var/adm/messages.
a. Click the Log button.
The Log file window is displayed.
b. Specify the log file that you want to view by selecting it from the Log file window.
The content of the selected log file is displayed in the window.
c. With the three lower buttons you can perform the following actions:
Print the log file – A dialog box appears for you to specify your printer options and printer name.
Delete the log file – The file remains on the display, but it will not be available the next time you try to display it.
Close the Log file window – The window is closed.
Note – If you want to save the log files: When you use the Bootable Diagnostics CD, the server boots from the CD. Therefore, the test log files are not on the server’s hard disk drive and they will be deleted when you power cycle the server. To save the log files, you must save them to a removable media device or FTP them to another system.
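Because the log files do not survive a power cycle, it can help to copy them in one pass before shutting down. The following is a minimal sketch only, not part of SunVTS: it assumes a Python interpreter is available in the environment where the logs live, and the function name save_logs is hypothetical. The four path names are the ones listed in this chapter.

```python
import shutil
from pathlib import Path

# Log file path names as listed in this chapter.
SUNVTS_LOGS = [
    "/var/opt/SUNWvts/logs/sunvts.err",
    "/var/opt/SUNWvts/logs/vtsk.err",
    "/var/opt/SUNWvts/logs/sunvts.info",
    "/var/adm/messages",
]


def save_logs(dest_dir, log_paths=SUNVTS_LOGS):
    """Copy every log file that exists into dest_dir.

    Returns (copied, missing) as two lists of file names.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied, missing = [], []
    for name in log_paths:
        src = Path(name)
        if src.is_file():
            shutil.copy2(src, dest / src.name)
            copied.append(src.name)
        else:
            # sunvts.err and vtsk.err are created only after an error occurs,
            # so a missing file is not necessarily a problem.
            missing.append(src.name)
    return copied, missing
```

For example, calling save_logs with the mount point of a removable media device (a path such as /mnt/usb/sunvts-logs is only an illustration) copies whichever of the four files exist and reports the rest as missing.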
CHAPTER
3

Troubleshooting DIMM Problems

This chapter describes how to detect and correct problems with the server’s Dual Inline Memory Modules (DIMMs). It includes the following sections:
“DIMM Population Rules” on page 11
“DIMM Replacement Policy” on page 12
“How DIMM Errors Are Handled by the System” on page 12
“Isolating and Correcting DIMM ECC Errors” on page 18

DIMM Population Rules

The DIMM population rules for the server are as follows:
Each CPU can support a maximum of eight DIMMs.
The DIMM slots are paired and the DIMMs must be installed in pairs (0-1, 2-3, 4-5, and 6-7). See FIGURE 3-1 and FIGURE 3-2. The memory sockets are colored black or white to indicate which slots are paired by matching colors.
DIMMs are populated starting from the outside (away from the CPU) and working toward the inside.
CPUs with only a single pair of DIMMs must have those DIMMs installed in that CPU’s outside white DIMM slots (6 and 7). See FIGURE 3-1 and FIGURE 3-2.
Only DDR2 800 MHz, 667 MHz, and 533 MHz DIMMs are supported.
Each pair of DIMMs must be identical (same manufacturer, size, and speed).
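The population rules above can be expressed as a small checker for one CPU's DIMM slots. This is an illustrative sketch, not a Sun tool. The function name check_cpu_dimms and the input layout are hypothetical. The guide states only that slots 6 and 7 are the outside pair; the order 6-7, 4-5, 2-3, 0-1 for the remaining pairs is an assumption made here.

```python
# Pairs listed from the outside inward. Only 6-7 is confirmed as the outside
# pair by the rules above; the order of the other pairs is an assumption.
PAIRS = [(6, 7), (4, 5), (2, 3), (0, 1)]
SUPPORTED_SPEEDS_MHZ = {800, 667, 533}


def check_cpu_dimms(dimms):
    """Check one CPU's DIMM population.

    dimms maps slot number (0-7) to (manufacturer, size_gb, speed_mhz).
    Returns a list of rule violations; an empty list means compliant.
    """
    problems = []
    for slot in dimms:
        if slot not in range(8):
            problems.append(f"slot {slot}: each CPU has slots 0-7 only")
    seen_empty_pair = False
    for a, b in PAIRS:
        has_a, has_b = a in dimms, b in dimms
        if has_a != has_b:
            problems.append(f"pair {a}-{b}: DIMMs must be installed in pairs")
            continue
        if not has_a:
            seen_empty_pair = True
            continue
        if seen_empty_pair:
            problems.append(f"pair {a}-{b}: populated while an outer pair is empty")
        if dimms[a] != dimms[b]:
            problems.append(f"pair {a}-{b}: DIMMs in a pair must be identical")
        for slot in (a, b):
            speed = dimms[slot][2]
            if speed not in SUPPORTED_SPEEDS_MHZ:
                problems.append(f"slot {slot}: unsupported speed {speed} MHz")
    return problems
```

A CPU with a single identical pair in slots 6 and 7 produces an empty list, while the same pair placed in slots 0 and 1 is reported as populated ahead of an empty outer pair.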

DIMM Replacement Policy

Replace a DIMM when one of the following events takes place:
The DIMM fails memory testing under BIOS due to Uncorrectable Memory Errors (UCEs).
UCEs occur and investigation shows that the errors originated from memory.
In addition, a DIMM should be replaced whenever more than 24 Correctable Errors (CEs) originate in 24 hours from a single DIMM and no other DIMM is showing further CEs.
If more than one DIMM has experienced multiple CEs, other possible causes of CEs have to be ruled out by a qualified Sun Support specialist before replacing any DIMMs.
Retain copies of the logs showing the memory errors per the above rules to send to Sun for verification prior to calling Sun.
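The correctable-error part of this policy can be sketched as a sliding-window count. This is a hedged illustration only: the record format (a timestamp plus a DIMM label) and the function names are hypothetical, and you would have to build such records yourself from operating system logs. The threshold follows the "more than 24 in 24 hours" wording of this section and is left as a parameter.

```python
from collections import defaultdict
from datetime import timedelta


def dimms_over_threshold(events, threshold=24, window=timedelta(hours=24)):
    """Return the DIMM labels with more than `threshold` CEs in any `window`.

    events is a list of (timestamp, dimm_label) tuples, timestamps being
    datetime objects.
    """
    by_dimm = defaultdict(list)
    for ts, dimm in events:
        by_dimm[dimm].append(ts)
    flagged = set()
    for dimm, stamps in by_dimm.items():
        stamps.sort()
        start = 0
        for end, ts in enumerate(stamps):
            # Advance the window start until the span fits inside `window`.
            while ts - stamps[start] > window:
                start += 1
            if end - start + 1 > threshold:
                flagged.add(dimm)
                break
    return flagged


def replacement_advice(events):
    """Apply the CE policy: replace only when a single DIMM is implicated."""
    flagged = dimms_over_threshold(events)
    dimms_with_ces = {dimm for _, dimm in events}
    if len(flagged) == 1 and dimms_with_ces == flagged:
        return f"replace {next(iter(flagged))}"
    if len(dimms_with_ces) > 1:
        return "escalate: multiple DIMMs show CEs"
    return "no action"
```

The second function keeps the two policy branches separate on purpose: a single DIMM over the threshold is a replacement candidate, while CEs on several DIMMs are routed to support rather than triggering a replacement.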

How DIMM Errors Are Handled by the System

This section describes system behavior for the two types of DIMM errors: UCEs and CEs, and also describes BIOS DIMM error messages.

Uncorrectable DIMM Errors

For all operating systems (OSs), the behavior is the same for UCEs:
1. When a UCE occurs, the memory controller causes an immediate reboot of the system.
2. During reboot, the BIOS checks the Machine Check registers and determines that the previous reboot was due to a UCE, then reports this in POST after the memtest stage:
A Hypertransport Sync Flood occurred on last boot
3. BIOS reports this event in the service processor’s system event log (SEL) as shown in the sample IPMItool output below:
# ipmitool -H 10.6.77.249 -U root -P changeme -I lanplus sel list
8 | 09/25/2007 | 03:22:03 | System Boot Initiated #0x02 | Initiated by warm reset | Asserted
9 | 09/25/2007 | 03:22:03 | Processor #0x04 | Presence detected | Asserted
a | 09/25/2007 | 03:22:03 | OEM #0x12 | | Asserted
b | 09/25/2007 | 03:22:03 | System Event #0x12 | Undetermined system hardware failure | Asserted
c | OEM record e0 | 00000002000000000029000002
d | OEM record e0 | 00000004000000000000b00006
e | OEM record e0 | 00000048000000000011110322
f | OEM record e0 | 00000058000000000000030000
10 | OEM record e0 | 000100440000000000fefff000
11 | OEM record e0 | 00010048000000000000ff3efa
12 | OEM record e0 | 10ab0000000010000006040012
13 | OEM record e0 | 10ab0000001111002011110020
14 | OEM record e0 | 0018304c00f200002000020c0f
15 | OEM record e0 | 0019304c00f200004000020c0f
16 | OEM record e0 | 001a304c00f45aa10015080a13
17 | OEM record e0 | 001a3054000000000320004880
18 | OEM record e0 | 001b304c00f200001000020c0f
19 | OEM record e0 | 80000002000000000029000002
1a | OEM record e0 | 80000004000000000000b00006
1b | OEM record e0 | 80000048000000000011110322
1c | OEM record e0 | 80000058000000000000030000
1d | OEM record e0 | 800100440000000000fefff000
1e | OEM record e0 | 80010048000000000000ff3efa
1f | 09/25/2007 | 03:22:06 | System Boot Initiated #0x03 | Initiated by warm reset | Asserted
20 | 09/25/2007 | 03:22:06 | Processor #0x04 | Presence detected | Asserted
21 | 09/25/2007 | 03:22:15 | System Firmware Progress #0x01 | Memory initialization | Asserted
22 | 09/25/2007 | 03:22:16 | Memory | Uncorrectable ECC | Asserted | CPU 2 DIMM 0
23 | 09/25/2007 | 03:22:16 | Memory | Uncorrectable ECC | Asserted | CPU 2 DIMM 1
24 | 09/25/2007 | 03:22:16 | Memory | Memory Device Disabled | Asserted | CPU 2 DIMM 0
25 | 09/25/2007 | 03:22:16 | Memory | Memory Device Disabled | Asserted | CPU 2 DIMM 1
The lines in the display start with event numbers (in hex), followed by a description of the event. TABLE 3-1 describes the contents of the display:
TABLE 3-1 Lines in IPMI Output
Event (hex) | Description
8 | UCE caused a Hypertransport sync flood which led to system's warm reset. #0x02 refers to a reboot count maintained since the last AC power reset.
9 | BIOS detected and initiated 4 processors in system.
a | BIOS detected a Sync Flood caused this reboot.
b | BIOS detected a hardware error caused the Sync Flood.
c to 1e | BIOS retrieved and reported some hardware evidence, including all processors' Machine Check Error registers (events 14 to 18).
1f | After BIOS detected that a UCE had occurred, it located the DIMM and reset. 0x03 refers to reboot count.
21 to 25 | BIOS off-lined faulty DIMMs from system memory space and reported them. Each DIMM of a pair is being reported, since hardware UCE evidence cannot lead BIOS any further than detection of a faulty pair.
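When a SEL listing is long, the memory events can be pulled out programmatically. The sketch below assumes the pipe-separated layout of the sample output in this section, with one event per line; the function names are hypothetical and other firmware or IPMItool versions might format lines differently.

```python
def parse_sel_line(line):
    """Split one line of `sel list` output into stripped fields."""
    return [field.strip() for field in line.split("|")]


def memory_events(sel_text):
    """Extract timestamped Memory events from SEL list text.

    Returns tuples of (event_number, date, time, description, location).
    The event number is converted from hex, as in the first column of the
    listing. Lines that do not match the seven-field Memory layout of the
    sample (OEM records, boot events, the command line) are skipped.
    """
    found = []
    for line in sel_text.splitlines():
        fields = parse_sel_line(line)
        if len(fields) >= 7 and fields[3].startswith("Memory"):
            found.append(
                (int(fields[0], 16), fields[1], fields[2], fields[4], fields[6])
            )
    return found
```

Applied to the sample listing, this would return the four Memory events (22 through 25 in hex), each carrying the CPU and DIMM location from the last column.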

Correctable DIMM Errors

If a DIMM has 24 or more correctable errors in 24 hours, it is considered defective and should be replaced.
At this time, CEs are not logged in the server’s system event logs. They are reported or handled in the supported OSs as follows:
Windows Server:
a. A Machine Check error-message bubble appears on the task bar.
b. The user must manually open Event Viewer to view errors. Access Event Viewer through this menu path:
Start-->Administration Tools-->Event Viewer
c. The user can then view individual errors (by time) to see details of the error.
Solaris:
Solaris FMA reports and (sometimes) retires memory with correctable Error Correction Code (ECC) errors. See your Solaris Operating System documentation for details. Use the command:
fmdump -eV