This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual
property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license,
transmit, distribute, exhibit, perform, publish or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software,
unless required by law for interoperability, is prohibited.
The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing.
If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, the following notice is
applicable:
U.S. GOVERNMENT RIGHTS. Programs, software, databases, and related documentation and technical data delivered to U.S. Government customers are
"commercial computer software" or "commercial technical data" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental
regulations. As such, the use, duplication, disclosure, modification, and adaptation shall be subject to the restrictions and license terms set forth in the applicable
Government contract, and, to the extent applicable by the terms of the Government contract, the additional rights set forth in FAR 52.227-19, Commercial
Computer Software License (December 2007). Oracle America, Inc., 500 Oracle Parkway, Redwood City, CA 94065.
This software or hardware is developed for general use in a variety of information management applications. It is not developed or intended for use in any inherently
dangerous applications, including applications that may create a risk of personal injury. If you use this software or hardware in dangerous applications, then you shall
be responsible to take all appropriate fail-safe, backup, redundancy, and other measures to ensure its safe use. Oracle Corporation and its affiliates disclaim any
liability for any damages caused by use of this software or hardware in dangerous applications.
Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.
Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered
trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro
Devices. UNIX is a registered trademark of The Open Group.
This software or hardware and documentation may provide access to or information on content, products, and services from third parties. Oracle Corporation and
its affiliates are not responsible for and expressly disclaim all warranties of any kind with respect to third-party content, products, and services. Oracle Corporation
and its affiliates will not be responsible for any loss, costs, or damages incurred due to your access to or use of third-party content, products, or services.
Contents
Using This Documentation
How to Diagnose Server Problems With the Bootable Diagnostics CD
Creating a Data Collector Snapshot
How To Create a Snapshot With the ILOM Web Interface
How To Create a Snapshot With the ILOM Command-Line Interface
Resetting the SP
How to Reset the ILOM SP Using the Web Interface
How to Reset the ILOM SP Using the Command-Line Interface
Index
Sun Fire X4640 Server Diagnostics Guide • December 2010, Rev A
Using This Documentation
■ “Product Downloads” on page 5
■ “About This Documentation (PDF and HTML)” on page 6
■ “We Welcome Your Comments” on page 6
■ “Change History” on page 6
Product Downloads
You can find downloads for all Oracle x86 servers and server modules (blades) on My Oracle Support
(MOS). On MOS you can find two types of downloads:
■ Software release bundles specific to the rackmount server, server module, modular system
  (blade chassis), or NEM. These software release bundles include Oracle ILOM, Oracle
  Hardware Installation Assistant, and other platform software and firmware.
■ Standalone software common across multiple types of hardware. This includes the
  Hardware Management Pack and Hardware Management Connectors.
▼ Get Software and Firmware Downloads
1. Go to http://support.oracle.com.
2. Sign in to My Oracle Support.
3. At the top of the page, click the Patches and Updates tab.
4. In the Patch Search box, click Product or Family (Advanced Search).
5. In the Product field, type a full or partial product name, for example, Sun Fire X4640, until a list of
   matches is displayed, and select the product of interest.
6. In the Release pull-down list, click the Down arrow.
7. In the window that appears, click the triangle (>) by the product folder icon to show the choices,
   and then select the release of interest and click Close.
8. In the Patches Search box, click Search.
   A list of product downloads (listed as patches) appears.
9. Select the Patch name of interest, for example, 12980209, for the Sun Fire X4640 1.3.1 Firmware.
10. In the right-side pane that appears, click Download.
About This Documentation (PDF and HTML)
This documentation set is available in both PDF and HTML. The information is presented in
topic-based format (similar to online help) and therefore does not include chapters, appendices
or section numbering.
We Welcome Your Comments
Oracle is interested in improving its documentation and welcomes your comments and
suggestions. To share your comments, go to http://www.oracle.com/goto/docfeedback.
Change History
The following changes have been made to the documentation set.
■ October 2009, initial publication
■ January 2010, two documents revised
  ■ Service Manual - Revised DIMM population rules and addressed illustration issues
  ■ Product Notes - Revised software information and fixed bugs
■ April 2010, one document revised
  ■ Installation Guide - Revised power specifications
■ December 2010, two documents revised
  ■ Service Manual - Revised motherboard FRUID update instructions
  ■ Product Notes - Revised software information and fixed bugs
■ January 2012, two documents revised
  ■ Service Manual - Added CPLD reset procedure
  ■ Product Notes - Revised software information
Overview of the Diagnostics Guide
The following topics are covered in this document.
Description | Link
Learn about troubleshooting procedures and diagnostics tools available for the server. | “Introduction to System Diagnostics” on page 9
Troubleshoot system problems. | “Troubleshooting the Server” on page 11
Troubleshoot DIMM problems. | “Troubleshooting DIMM Problems” on page 15
Use ILOM to monitor the host. | “Using the ILOM to Monitor the Host” on page 23
Use SunVTS to diagnose server problems. | “Using SunVTS Diagnostics Software” on page 33
Create a data collector snapshot. | “Creating a Data Collector Snapshot” on page 37
Reset the service processor (SP). | “Resetting the SP” on page 41
Introduction to System Diagnostics
This section contains an introduction to Oracle's Sun Fire X4640 server diagnostics and covers
the following topics:
■ “Troubleshooting Options” on page 9
■ “Diagnostic Tools” on page 10
Troubleshooting Options
The following table lists the suggested order of troubleshooting procedures when you have an
issue with the server.
Step | Troubleshooting Task | Link
1 | Gather initial service visit information. | “How to Gather Service Visit Information” on page 11
2 | Investigate any power-on problems. | “How to Troubleshoot Power Problems” on page 11
3 | Perform external visual inspection and internal visual inspection. | “How to Inspect the Outside of the Server” on page 12 and “How to Inspect the Inside of the Server” on page 12
4 | Troubleshoot DIMM problems. | “Troubleshooting DIMM Problems” on page 15
5 | View BIOS event logs and POST messages. | Sun Fire X4640 Server Service Manual
6 | View service processor logs and sensor information. | “Using the ILOM to Monitor the Host” on page 23
7 | Run SunVTS diagnostics. | “Using SunVTS Diagnostics Software” on page 33
8 | Collect diagnostics data for Oracle Service. | “Creating a Data Collector Snapshot” on page 37
Diagnostic Tools
The following diagnostic tools are available for the Sun Fire X4640 server.
BIOS/POST
From the point that the host subsystem is powered on and begins executing code, BIOS code is
executed. The sequence that BIOS goes through, from the first point where code is executed to
the point that the operating system booting begins, is referred to as POST (power-on self-test).
POST works in conjunction with other processes to complete initialization and booting. POST
focuses on thoroughly testing devices as well as providing meaningful information to users
when errors occur. For more information on BIOS/POST, refer to Sun Fire X4640 Server Service
Manual.
Service Processor ILOM
You can use the Integrated Lights Out Manager (ILOM) to diagnose system problems by
doing the following:
■ View component information to determine component status.
■ View the ILOM system event log.
For more information on using the ILOM to diagnose system issues, see “Using the ILOM to
Monitor the Host” on page 23.
SunVTS Diagnostics
SunVTS is the Sun Validation Test Suite, a comprehensive diagnostic tool that
tests and validates Sun hardware by verifying the connectivity and functionality of most
hardware controllers and devices on Sun platforms. SunVTS software can be tailored with
modifiable test instances and processor affinity features.
For more information on using SunVTS to diagnose system issues, see “Using SunVTS
Diagnostics Software” on page 33.
Troubleshooting the Server
This section covers the following procedures:
■ “How to Gather Service Visit Information” on page 11
■ “How to Troubleshoot Power Problems” on page 11
■ “How to Inspect the Outside of the Server” on page 12
■ “How to Inspect the Inside of the Server” on page 12
▼ How to Gather Service Visit Information
Use the following general guideline steps when you begin troubleshooting.
1. Collect information about the following items:
   ■ Events that occurred prior to the failure
   ■ Whether any hardware or software was modified or installed
   ■ Whether the server was recently installed or moved
   ■ How long the server exhibited symptoms
   ■ The duration or frequency of the problem
2. Document the server settings before you make any changes.
   If possible, make one change at a time, in order to isolate potential problems. In this way, you
   can maintain a controlled environment and reduce the scope of troubleshooting.
3. Take note of the results of any change you make. Include any errors or informational messages.
4. Check for potential device conflicts before you add a new device.
5. Check for version dependencies, especially with third-party software.
▼ How to Troubleshoot Power Problems
■ If the server powers on, skip this section and go to “How to Inspect the Outside of the
  Server” on page 12.
■ If the server does not power on, check the following:
1. Check that AC power cords are attached firmly to the server’s power supplies and to the AC
   sources.
2. Check that the main cover is firmly in place.
   There is an intrusion switch on the motherboard that automatically shuts down the server
   power to standby mode when the cover is removed.
▼ How to Inspect the Outside of the Server
1. Inspect the external status indicator LEDs, which can indicate component malfunction.
   For the LED locations and descriptions of their behavior, see “Sun Fire X4640 Server External
   LED Locations and Meanings” in Sun Fire X4640 Server Service Manual.
2. Verify that nothing in the server environment is blocking air flow or making a contact that could
   short out power.
3. If the problem is not evident, continue with “How to Inspect the Inside of the Server” on
   page 12.
▼ How to Inspect the Inside of the Server
1. Prepare the server for service. See “Preparing the Server for Service and Operation” in Sun Fire
   X4640 Server Service Manual.
2. Choose a method for shutting down the server from main power mode to standby power mode.
   ■ Graceful shutdown: Use a ballpoint pen or other nonconducting stylus to press and release
     the Power button on the front panel. This causes Advanced Configuration and Power
     Interface (ACPI) enabled operating systems to perform an orderly shutdown of the
     operating system. Servers not running ACPI-enabled operating systems will shut down to
     standby power mode immediately.
   ■ Emergency shutdown: Use a ballpoint pen or other nonconducting stylus to press and hold
     the Power button for four seconds to force main power off and enter standby power mode.
   When main power is off, the Power/OK LED on the front panel will begin flashing,
   indicating that the server is in standby power mode.
   Caution – When you use the Power button to enter standby power mode, power is still
   directed to the service processor board and power supply fans, indicated when the
   Power/OK LED is flashing. To completely power off the server, you must disconnect the AC
   power cords from the back panel of the server.
3. Remove the server cover, as required.
   For instructions on removing the server cover, refer to the Sun Fire X4640 Server Service
   Manual.
4. Inspect the internal status indicator LEDs, which can indicate component malfunction.
   Note – The server must be in standby power mode to view the internal LEDs.
   For the LED locations and descriptions of their behavior, see “Troubleshooting DIMM
   Problems” on page 15.
   Note – You can hold down the Locate button on the server back panel or front panel for 5
   seconds to initiate a “push-to-test” mode that illuminates all other LEDs both inside and
   outside of the chassis for 15 seconds.
5. Verify that there are no loose or improperly seated components.
6. Verify that all cable connectors inside the system are firmly and correctly attached to their
   appropriate connectors.
7. Verify that any after-factory components are qualified and supported.
   For a list of supported PCI cards and DIMMs, refer to the Sun Fire X4640 Server Service
   Manual.
8. Check that the installed DIMMs comply with the supported DIMM population rules and
   configurations, as described in “Troubleshooting DIMM Problems” on page 15.
9. Replace the server cover.
10. To restore main power mode to the server (all components powered on), use a ballpoint pen or
    other nonconducting stylus to press and release the Power button on the server front panel.
    When main power is applied to the full server, the Power/OK LED next to the Power button
    blinks intermittently until BIOS POST finishes.
11. If the problem with the server is not evident, you can try viewing the power-on self-test (POST)
    messages and BIOS event logs during system startup.
    Refer to Sun Fire X4640 Server Service Manual for more information on POST and BIOS event
    logs.
Troubleshooting DIMM Problems
This section contains information on how to troubleshoot DIMM problems.
Note – For information on Sun’s DIMM replacement policy for x64 servers, contact your Oracle
Service representative.
This section covers the following topics:
■ “DIMM Fault LEDs” on page 15
■ “DIMM Population Rules” on page 17
■ “How to Isolate and Correct DIMM ECC Errors” on page 17
■ “Identifying Correctable DIMM Errors (CEs)” on page 19
■ “Identifying BIOS DIMM Error Messages” on page 21
DIMM Fault LEDs
In the Sun Fire X4640 servers, eight DIMM slots are on each removable CPU module. The
DIMM fault LEDs in the DIMM slot ejector levers indicate which DIMM pair has failed. These
DIMM fault LEDs can be lit for up to one minute by a capacitor on the CPU module, even after
the CPU module is removed from the server.
[Figure: CPU module with callouts 1 through 7]
Legend
1 | DIMM slot DIMM 0
2 | DIMM slot DIMM 7
3 | CPU module fault LED
4 | DIMM fault LEDs (16)
5 | Mechanical key
6 | Fault Remind switch
7 | Power connector
The CPU fault LED indicates which CPU module contains the faulty DIMM.
To light the fault LED from the capacitor, push the small button on the CPU module labelled
“FAULT REMIND BUTTON.”
The DIMM ejector levers contain LEDs that can indicate a faulty DIMM.
■ DIMM fault LED is off – The DIMM is operating properly.
■ DIMM fault LED is on (amber) – At least one of the DIMMs in this DIMM pair is faulty and
  should be replaced.
DIMM Population Rules
See Sun Fire X4640 Server Service Manual for the DIMM population rules.
▼ How to Isolate and Correct DIMM ECC Errors
If the ILOM reports an ECC error or a problem with a DIMM, first complete the steps in the
following procedure.
In this example, ILOM reports an error with the DIMM in CPU0, slot 1. The fault LEDs on
CPU0, slots 1 and 0, are lit.
Refer to “Using the ILOM to Monitor the Host” on page 23 for information on locating
component errors.
Caution – Before handling components, attach an antistatic wrist strap to a chassis ground (any
unpainted metal surface). The system’s printed circuit boards and hard disk drives contain
components that are extremely sensitive to static electricity.
1. If you have not already done so, shut down your server to standby power mode and remove the
   cover.
   Refer to the Sun Fire X4640 Server Service Manual.
2. Inspect the CPU fault LEDs for each CPU module. The CPU fault LED will be lit on the CPU module
   that has the faulty DIMM.
3. Disconnect the AC power cords from the server.
4. Remove the CPU module that has the DIMM problem.
   Refer to the Sun Fire X4640 Server Service Manual.
5. Inspect the installed DIMMs to ensure that they comply with the DIMM Population Rules in the
   Sun Fire X4640 Server Service Manual.
6. Press the Fault Remind button on the CPU module to light the faulty DIMM LEDs.
   See “DIMM Fault LEDs” on page 15 for the location of the Fault Remind button and DIMM
   fault LEDs.
7. Inspect the fault LEDs on the DIMM slot ejectors.
   If any of these LEDs are lit, they can indicate the component with the fault.
8. Remove the DIMMs from the CPU module.
   Refer to the Sun Fire X4640 Server Service Manual.
9. Visually inspect the DIMMs for physical damage, dust, or any other contamination on the
   connector or circuits.
10. Visually inspect the DIMM slot for physical damage. Look for cracked or broken plastic on the
    slot.
11. Dust off the DIMMs, clean the contacts, and reseat them.
12. If there is no obvious damage, exchange the individual DIMMs between the two slots of a given
    pair. Ensure that they are inserted correctly with ejector latches secured. Using the slot numbers
    from the example:
    a. Remove the DIMMs from CPU0, slots 1 and 0.
    b. Reinstall the DIMM from slot 1 into slot 0.
    c. Reinstall the DIMM from slot 0 into slot 1.
13. Reinstall the CPU module that has the DIMM problem.
    Refer to the Sun Fire X4640 Server Service Manual.
14. Reconnect AC power cords to the server.
15. Power on the server and run the diagnostics test again.
16. Review the log file.
    ■ If the error now appears in CPU0, slot 0 (the opposite of the original error in slot 1), the
      problem is related to the individual DIMM. In this case, return both DIMMs (the pair) to
      the Support Center for replacement.
    ■ If the error still appears in CPU0, slot 1 (as the original error did), the problem is not related
      to an individual DIMM. Instead, it might be caused by CPU0 or by the DIMM slot.
      Continue with the rest of the procedure.
17. Shut down the server again and disconnect the AC power cords.
18. Remove the CPU module that has the DIMM problem, and remove another CPU module that
    does not indicate a DIMM problem.
    Refer to the Sun Fire X4640 Server Service Manual.
19. Remove both DIMMs of the pair and install them into paired slots on the second CPU module
    that did not indicate a DIMM problem.
    Using the slot numbers in the example, install the two DIMMs from CPU0, slots 1 and 0 into
    CPU1, slots 1 and 0 or CPU1, slots 3 and 2.
20. Reinstall both CPU modules that you removed.
    Refer to the Sun Fire X4640 Server Service Manual.
21. Reconnect AC power cords to the server.
22. Power on the server and run the diagnostics test again.
23. Review the log file.
    ■ If the error now appears under the CPU that manages the DIMM slots you just installed, the
      problem is with the DIMMs. Return both DIMMs (the pair) to the Support Center for
      replacement.
    ■ If the error remains with the original CPU, there is a problem with that CPU module.
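The two log reviews in the procedure above amount to a small decision table. The following Python sketch restates that logic for readers who track results in a script. It is an illustration only, not an Oracle tool: the function names, the Verdict enumeration, and the assumption that slots are paired as (0,1), (2,3), and so on (inferred from the slot numbers in the example) are all hypothetical.

```python
from enum import Enum


class Verdict(Enum):
    REPLACE_DIMM_PAIR = "return both DIMMs of the pair for replacement"
    SUSPECT_CPU_OR_SLOT = "CPU or DIMM slot suspected; continue with the cross-module test"
    SUSPECT_CPU_MODULE = "problem is with the original CPU module"
    INCONCLUSIVE = "error location matches neither case in the procedure"


def paired_slot(slot: int) -> int:
    """Return the other slot of a pair, assuming pairs are (0,1), (2,3), ..."""
    return slot ^ 1


def after_slot_swap(cpu: int, slot: int, seen_cpu: int, seen_slot: int) -> Verdict:
    """Interpret the first log review, after the two DIMMs of a pair were swapped."""
    if seen_cpu != cpu:
        return Verdict.INCONCLUSIVE
    if seen_slot == paired_slot(slot):
        # The error followed the DIMM to the other slot of the pair.
        return Verdict.REPLACE_DIMM_PAIR
    if seen_slot == slot:
        # The error stayed with the slot, so the DIMM itself is not implicated.
        return Verdict.SUSPECT_CPU_OR_SLOT
    return Verdict.INCONCLUSIVE


def after_module_move(original_cpu: int, target_cpu: int, seen_cpu: int) -> Verdict:
    """Interpret the second log review, after the pair moved to another CPU module."""
    if seen_cpu == target_cpu:
        return Verdict.REPLACE_DIMM_PAIR
    if seen_cpu == original_cpu:
        return Verdict.SUSPECT_CPU_MODULE
    return Verdict.INCONCLUSIVE
```

With the example from the procedure (original error in CPU0, slot 1), `after_slot_swap(0, 1, 0, 0)` corresponds to the case where the error follows the DIMM, and `after_slot_swap(0, 1, 0, 1)` to the case where the rest of the procedure applies.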
Identifying Correctable DIMM Errors (CEs)
CEs occur rarely, so during a short POST the BIOS might not catch a CE and log
it in the server's IPMI SEL (system event log). Memory correctable errors are usually reported
and handled by the supporting operating systems. See the following procedures for more
information:
■ “How to Identify CEs on a Solaris Server” on page 19
■ “How to Identify CEs on a Linux Server” on page 20
■ “How to Identify CEs on a Windows Server” on page 20
▼ How to Identify CEs on a Solaris Server
● Use Solaris FMA to identify memory CEs.
  Refer to Solaris 10 documentation for more information at:
  http://docs.sun.com/app/docs/prod/solaris10#hic
▼ How to Identify CEs on a Linux Server
● Use HERD to detect Linux memory CEs.
  Refer to the x64 Servers Applications and Utilities Documentation at:
  http://docs.sun.com/app/docs/coll/x64apps-utilities?l=en
▼ How to Identify CEs on a Windows Server
● Use one of the following methods to detect Windows CEs:
  ■ Use the HERD Utility.
    See the x64 Servers Applications and Utilities Documentation at:
    http://docs.sun.com/app/docs/coll/x64apps-utilities?l=en
  ■ Use Windows native support such as WHEA (if available).
    See http://www.microsoft.com/whdc/system/pnppwr/whea/default.mspx for more
    information.
  ■ Use a combination of the AMD MCAT utility and system events as follows:
    a. Watch for a Machine Check error message that pops up on the task bar.
    b. Manually go into the Event Viewer's System Events to view errors. Access the Event
       Viewer through this menu path: Start-->Administration Tools-->Event Viewer-->System
       events list.
    c. View individual errors by right-clicking on the event, and selecting "Properties" to see
       details of the error.
    d. Save the complete logs through this menu path: Event Viewer tree-->System
       (right-click)-->Save Log File As...-->(*.evt).
    e. Run the AMD Machine Check Analysis Tool (MCAT) using the saved log, to find the potential
       whereabouts of a faulty DIMM.
    Note – The MCAT utility is available as part of the Windows supplemental software from
    the Tools and Drivers CD/DVD for your server.
Identifying BIOS DIMM Error Messages
The system BIOS displays four types of DIMM error messages on the system screen and logs them in
ILOM's IPMI SEL. The ILOM SEL format is as follows:
Event# | Date | Time | Memory #0x(error type) | Configuration Error | CPU Y DIMM Z
Where Y represents the processor socket that the DIMM is associated with and Z is the DIMM
socket that displays the error.
The following table describes the error details.
SEL Display | Screen Display | Cause of Error Message
Memory #0x01 | NODE-n Paired DIMMs Size Mismatch | The paired DIMMs are not of the same size.
Memory #0x02 | NODE-n Paired DIMMs SPD Checksum Mismatch | The SPD checksum value (byte 63) of a DIMM differs from that of other DIMMs of the same processor.
Memory #0x03 | NODE-n DIMMs Manufacturer Mismatch | The DIMM manufacturer is not supported or recognized.
Memory #0x04 | NODE-n single DIMM slot is left unpopulated | The DIMM slot "z" of processor "y" is left unpopulated while its pairing slot has a DIMM installed.
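If you export SEL entries as text, the pipe-separated format and the four error types described above can be decoded with a few lines of Python. This is a minimal sketch under stated assumptions: the function and class names are hypothetical, the sample line in the usage note is invented for illustration, and the exact spacing of real SEL output might differ from the format line shown in this section.

```python
import re
from typing import NamedTuple, Optional

# Causes keyed by the error type that follows "Memory #0x" in the SEL entry.
SEL_MEMORY_ERRORS = {
    0x01: "Paired DIMMs are not of the same size",
    0x02: "SPD checksum mismatch with other DIMMs of the same processor",
    0x03: "DIMM manufacturer is not supported or recognized",
    0x04: "DIMM slot left unpopulated while its pairing slot is populated",
}


class DimmSelEvent(NamedTuple):
    event_id: str
    date: str
    time: str
    error_type: int
    cpu: int
    dimm: int
    cause: str


_MEMORY = re.compile(r"Memory\s+#0x([0-9A-Fa-f]+)")
_LOCATION = re.compile(r"CPU\s+(\d+)\s+DIMM\s+(\d+)")


def parse_dimm_sel_line(line: str) -> Optional[DimmSelEvent]:
    """Parse one 'Event# | Date | Time | Memory #0x.. | ... | CPU Y DIMM Z' line."""
    fields = [field.strip() for field in line.split("|")]
    if len(fields) != 6:
        return None
    memory = _MEMORY.fullmatch(fields[3])
    location = _LOCATION.fullmatch(fields[5])
    if memory is None or location is None:
        return None
    error_type = int(memory.group(1), 16)
    return DimmSelEvent(
        event_id=fields[0],
        date=fields[1],
        time=fields[2],
        error_type=error_type,
        cpu=int(location.group(1)),
        dimm=int(location.group(2)),
        cause=SEL_MEMORY_ERRORS.get(error_type, "unknown error type"),
    )
```

For example, a hypothetical entry such as `7 | 01/12/2010 | 09:41:07 | Memory #0x02 | Configuration Error | CPU 1 DIMM 3` would be decoded as an SPD checksum mismatch on the DIMM in socket 3 of processor 1. Lines that do not match the format return `None` rather than raising an error, so the function can be run over a mixed log.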
In addition, the following error message is displayed to the screen only (not in the SEL): NODE-n
Memory Configuration Mismatch
The following conditions cause this error message:
■ The DIMM mode is not paired (running in 64-bit mode instead of 128-bit mode).
■ The DIMMs’ speed is not the same.
■ The DIMMs do not support ECC.
■ The DIMMs are not registered.
■ The MCT stopped due to errors in the DIMM.
■ The DIMM module type (buffer) is mismatched.
■ The DIMM generation (I or II) is mismatched.
■ The DIMM CL/T is mismatched.
■ The banks on a two-sided DIMM are mismatched.
■ The DIMM organization is mismatched (128-bit).
■ The SPD is missing Trc or Trfc information.
Using the ILOM to Monitor the Host
This section describes how to view server sensor readings and view the system event log.
This section contains the following procedures:
■ “Viewing the ILOM Sensor Readings” on page 23
■ “Viewing the ILOM System Event Log” on page 26
■ “Clearing the Faults from the System Event Log” on page 30
■ “Interpreting Event Log Time Stamps” on page 31
Viewing the ILOM Sensor Readings
Three ambient temperature sensors are monitored. Each CPU board has two
temperature sensors. Readings from the ambient temperature sensors and the core temperature
sensors on the CPU boards are fed to the IPMI stack to adjust fan speed. See the Sun ILOM 3.0
Supplement for the Sun Fire X4640 Server for more information about the sensors.
This section contains the following procedures:
■ “How to Use the ILOM Web Interface to View the Sensor Readings” on page 23
■ “How to Use the ILOM Command-Line Interface to View the Sensor Readings” on page 25
BeforeYou Begin
▼
How to Use the ILOMWeb Interface to Viewthe Sensor Readings
To view sensor readings, you need the Read Only (o) role enabled.
Log in to the SP as Administrator or Operator to reach the ILOM web interface:
1
a. Typethe IP address of the server’sSP into your web browser.
The Sun Integrated Lights Out Manager Login screen is displayed.
b. Type your user name and password.
When you rst try to access the ILOM Service Processor, you are prompted to type the
default user name and password:
Default user name: root
Default password: changeme
23
Page 24
Viewing the ILOM Sensor Readings
From the System Monitoring tab, select Sensor Readings.
2
The sensor readings appear.
Note – If the server is powered o, many components will have no readings.
In the Sensor Readings page,do the following:
3
a. Locate the name of the sensor you want to view.
Sun Fire X4640 Server Diagnostics Guide • December2010,Rev A24
Page 25
Viewing the ILOM Sensor Readings
b. Clickthe name of the sensor to view the property values associated with that sensor.
BeforeYou Begin
For specic details about the type of discrete sensor targets you can access, as well as the
paths to access them, see
4
If the problem with the server is not evident after viewing sensor readings information,
continue with
▼
How to Use the ILOM Command-Line Interface to View the Sensor
“Using SunVTS Diagnostics Software ”on page 33.
Sun ILOM 3.0 Supplement for the Sun Fire X4640 Server.
Readings
To view sensor readings, you need the Read Only (o) role enabled.
1. Log in to the ILOM CLI.
2. Type the following commands to navigate to the sensor target and then to view the sensor properties:
-> cd target
-> show
For example, on some server platforms, you can specify the following path to view a temperature reading of a server's ambient air intake:
-> cd /SYS/MB/T_AMB0
-> show
The properties that describe the sensor target appear. For example:
type = Temperature
ipmi_name = MB/T_AMB0
class = Threshold Sensor
value = 39.000 degree C
upper_nonrecov_threshold = 100.000 degree C
upper_critical_threshold = 90.000 degree C
upper_noncritical_threshold = 80.000 degree C
lower_noncritical_threshold = N/A
lower_critical_threshold = N/A
lower_nonrecov_threshold = N/A
alarm_status = cleared
3. To view a discrete sensor reading, type the following commands:
-> cd target
-> show
The properties that describe the discrete sensor target appear. For example:
■ Type = Entity Presence
■ Class = Discrete Indicator
■ Value = Present
For specific details about the type of discrete or threshold sensor targets you can access, as well as the paths to access them, see the Sun ILOM 3.0 Supplement for the Sun Fire X4640 Server.
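If you capture the output of the show command as text, the threshold sensor properties can be checked with a short script. The following Python sketch is illustrative only and is not part of ILOM. The function names (parse_properties, to_number, classify) are hypothetical, and the rule that a reading at or above an upper threshold counts as exceeding it is an assumption, not documented ILOM alarm logic. Only the upper thresholds are examined.

```python
import re

def parse_properties(show_output):
    """Turn 'key = value' lines from captured `show` output into a dict."""
    props = {}
    for line in show_output.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
    return props

def to_number(text):
    """Return the leading number of a reading such as '39.000 degree C', or None for 'N/A'."""
    match = re.match(r"-?\d+(?:\.\d+)?", text)
    return float(match.group()) if match else None

def classify(props):
    """Compare the reading with the upper thresholds, most severe first (assumed rule)."""
    value = to_number(props.get("value", ""))
    if value is None:
        return "unknown"
    for name, label in (
        ("upper_nonrecov_threshold", "non-recoverable"),
        ("upper_critical_threshold", "critical"),
        ("upper_noncritical_threshold", "non-critical"),
    ):
        limit = to_number(props.get(name, ""))
        if limit is not None and value >= limit:
            return label
    return "ok"

# Sample text copied from the example output in this procedure.
SAMPLE = """\
type = Temperature
ipmi_name = MB/T_AMB0
class = Threshold Sensor
value = 39.000 degree C
upper_nonrecov_threshold = 100.000 degree C
upper_critical_threshold = 90.000 degree C
upper_noncritical_threshold = 80.000 degree C
lower_noncritical_threshold = N/A
alarm_status = cleared
"""

if __name__ == "__main__":
    print(classify(parse_properties(SAMPLE)))
```

The alarm_status property reported by ILOM remains the authoritative indication; a script like this is useful mainly for comparing readings collected from several sensors.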
Viewing the ILOM System Event Log
This section contains the following procedures:
■ “How to View the System Event Log Using the ILOM Web Interface” on page 26
■ “How to View the System Event Log With the ILOM Command-Line Interface” on page 28
▼ How to View the System Event Log Using the ILOM Web Interface
Events are notifications that occur in response to some actions. The IPMI system event log (SEL) provides status information about the server’s hardware and software to the ILOM software, which displays the events in the ILOM web interface. To view event logs:
1. Log in to the SP as Administrator or Operator to reach the ILOM web interface:
a. Type the IP address of the server’s SP into your web browser.
The Sun Integrated Lights Out Manager Login screen appears.
b. Type your user name and password.
When you first try to access the ILOM SP, you are prompted to type the default user name and password:
Default user name: root
Default password: changeme
2. From the System Monitoring tab, select Event Logs.
The System Event Logs page appears.
3. View the Event Log page in one of the following ways:
■ Page through entries - Use the page navigation controls at the top and bottom of the table to navigate forward and back through the available data in the table.
Note that selecting a greater number of entries might cause the web interface to respond more slowly than if you select fewer entries.
■ View the entries in the display by scrolling through the list - The following table provides descriptions about each column appearing in the log.
Column Label    Description
Event ID        The number of the event, in sequence from number 1.
Class/Type      ■ Audit/Log - Commands that result in a configuration change. Description includes user, command parameters, and success/fail.
                ■ IPMI/Log - Any event that is placed in the IPMI SEL is also put in the management log.
                ■ Chassis/State - For changes to the inventory and general system state changes.
                ■ Chassis/Action - Category for shutdown events for server module/chassis, hot insert/removal of a FRU, and Reset Parameters button pushed.
Severity        Debug, Down, Critical, Major, or Minor.
Date/Time       The day and time the event occurred. If the Network Time Protocol (NTP) server is enabled to set the ILOM time, the ILOM clock uses Universal Coordinated Time (UTC).
Description     A description of the event.
Note – The ILOM event log accumulates many types of events, including copies of IPMI entries.
Clearing the ILOM event log clears all entries in the log, including the IPMI entries. However,
clearing the ILOM event log entries does not clear the actual entries posted directly to an IPMI
log.
▼ How to View the System Event Log With the ILOM Command-Line Interface
Before You Begin
To view or clear the event log, you need the Admin (a) role enabled.
1. Establish a local serial console connection or SSH connection to the server SP.
See the Sun ILOM 3.0 Supplement for the Sun Fire X4640 Server for more information.
2. Type the following command to set the working directory:
-> cd /SP/logs/event
3. Type the following command to display the event log list:
-> show list
root : Open Session : object = /session/type : value = www : success
lk66251 : Open Session : object = /session/type : value = www : error
root : Close Session : object = /session/type : value = www : success
root : Open Session : object = /session/type : value = www : success
66251 : Open Session : object = /session/type : value = www : error
root : Close Session : object = /session/type : value = www : success
root : Open Session : object = /session/type : value = shell : success
root : Open Session : object = /session/type : value = www : success
root : Close Session : object = /session/type : value = www : success
root : Open Session : object = /session/type : value = www : success
4. To scroll down the list to view entries, press any key except q. The following table provides descriptions about each column appearing in the log.
Column Label    Description
Event ID        The number of the event, in sequence from number 1.
Class/Type      ■ Audit/Log - Commands that result in a configuration change. Description includes user, command parameters, and success/fail.
                ■ IPMI/Log - Any event that is placed in the IPMI SEL is also put in the management log.
                ■ Chassis/State - For changes to the inventory and general system state changes.
                ■ Chassis/Action - Category for shutdown events for server module/chassis, hot insert/removal of a FRU, and Reset Parameters button pushed.
Severity        Debug, Down, Critical, Major, or Minor.
Date/Time       The day and time the event occurred. If the Network Time Protocol (NTP) server is enabled to set the ILOM time, the ILOM clock uses Universal Coordinated Time (UTC).
Description     A description of the event.
5. To dismiss the event log (stop displaying the log), press the q key.
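The description text of Audit/Log entries, such as the session lines in the example output of this procedure, follows a regular "user : action : object = ... : value = ... : result" pattern. The following Python sketch shows one way to split captured description lines into fields and count the actions that did not succeed. It is illustrative only: the function names are hypothetical, it handles the description text alone (not the Event ID, Class/Type, Severity, or Date/Time columns), and the five-field layout is inferred from the example lines rather than from a documented format.

```python
from collections import Counter

def parse_audit_description(line):
    """Split one description line into fields; return None if it has another shape."""
    parts = [part.strip() for part in line.split(" : ")]
    if len(parts) != 5:
        return None
    user, action, obj, value, result = parts
    return {
        "user": user,
        "action": action,
        "object": obj.partition("=")[2].strip(),
        "value": value.partition("=")[2].strip(),
        "result": result,
    }

def failed_actions(lines):
    """Count (user, action) pairs whose result is anything other than 'success'."""
    counts = Counter()
    for line in lines:
        entry = parse_audit_description(line)
        if entry and entry["result"] != "success":
            counts[(entry["user"], entry["action"])] += 1
    return counts

# Lines copied from the example output in this procedure.
SAMPLE_LINES = [
    "root : Open Session : object = /session/type : value = www : success",
    "lk66251 : Open Session : object = /session/type : value = www : error",
    "root : Close Session : object = /session/type : value = www : success",
]

if __name__ == "__main__":
    for key, count in failed_actions(SAMPLE_LINES).items():
        print(key, count)
```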
Clearing the Faults from the System Event Log
This section contains the following procedures:
■ “How to Clear Faults From the System Event Log Using the ILOM Web Interface” on page 30
■ “How to Clear Faults From the System Event Log Using the ILOM Command-Line Interface” on page 31
▼ How to Clear Faults From the System Event Log Using the ILOM Web Interface
1. Navigate to the Event Log from the ILOM System Management tab.
2. Click the Clear Event Log button on the bottom of the Event Log page.
A confirmation dialog appears.
3. Click OK to clear the entries.
▼ How to Clear Faults From the System Event Log Using the ILOM Command-Line Interface
1. Type the following commands:
-> cd /SP/logs/event/
-> set clear=true
A confirmation message appears.
2. Type one of the following:
■ To clear the entries, type: y
■ To cancel clearing the log, type: n
Interpreting Event Log Time Stamps
The time stamps in the event log are related to the service processor clock settings. If the clock settings change, the change is reflected in the time stamps.
When the service processor reboots, the SP clock is set to Thu Jan 1 00:00:00 UTC 1970.
The SP reboots as a result of the following:
■ A complete system unplug/replug power cycle
■ An IPMI command; for example, mc reset cold
■ A command-line interface (CLI) command; for example, reset /SP
■ ILOM web interface operation; for example, from the Maintenance tab, selecting Reset SP
■ An SP firmware upgrade
After an SP reboot, the SP clock is changed in these circumstances:
■ When the host is booted. The host’s BIOS unconditionally sets the SP time to that indicated by the host’s real time clock (RTC). The host’s RTC is set by the following operations:
    ■ When the host’s CMOS is cleared as a result of changing the host’s RTC battery or inserting the CMOS-clear jumper on the motherboard. The host’s RTC starts at Jan 1 00:01:00 2002.
    ■ When the host’s operating system sets the host’s RTC. The BIOS does not consider time zones. Solaris and Linux software respect time zones and set the system clock to UTC. Therefore, after the OS adjusts the RTC, the time set by the BIOS is UTC.
    ■ When the user sets the RTC using the host BIOS Setup screen.
■ Continuously through NTP if NTP is enabled on the SP. NTP jumping is enabled to recover quickly from an erroneous update from the BIOS or user. NTP servers provide UTC time. Therefore, if NTP is enabled on the SP, the SP clock is in UTC.
■ Through the CLI, ILOM web interface, and IPMI
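Because the SP clock restarts at Thu Jan 1 00:00:00 UTC 1970 after an SP reboot, and the host RTC restarts at Jan 1 00:01:00 2002 after a CMOS clear, event time stamps that fall shortly after either date usually mean the clock had not yet been set when the event was logged. The following Python sketch flags such time stamps. It is illustrative only: the function names and the 30-day window are hypothetical choices, the input format is assumed to match the date string quoted in this section, and the 2002 start time is treated as UTC for simplicity.

```python
from datetime import datetime, timezone

# Default start times taken from this section.
SP_RESET_TIME = datetime(1970, 1, 1, tzinfo=timezone.utc)
CMOS_CLEAR_TIME = datetime(2002, 1, 1, 0, 1, tzinfo=timezone.utc)

def parse_sp_time(text):
    """Parse a string such as 'Thu Jan 1 00:00:00 UTC 1970' as a UTC datetime."""
    parsed = datetime.strptime(text, "%a %b %d %H:%M:%S UTC %Y")
    return parsed.replace(tzinfo=timezone.utc)

def clock_looks_unset(stamp, window_days=30):
    """Return True if stamp falls within window_days after either default start time."""
    for start in (SP_RESET_TIME, CMOS_CLEAR_TIME):
        seconds = (stamp - start).total_seconds()
        if 0 <= seconds <= window_days * 86400:
            return True
    return False

if __name__ == "__main__":
    stamp = parse_sp_time("Thu Jan 1 00:00:00 UTC 1970")
    print(stamp.isoformat(), clock_looks_unset(stamp))
```

A flagged time stamp does not identify which reset occurred; it only indicates that the entry should be ordered by Event ID rather than by time.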
Using SunVTS Diagnostics Software
SunVTS is the Sun Validation Test Suite, which provides a comprehensive diagnostic tool that
tests and validates Sun hardware by verifying the connectivity and functionality of most
hardware controllers and devices on Sun platforms.
This section contains the following procedures:
■ “Introduction to SunVTS Diagnostic Test Suite” on page 33
■ “SunVTS Documentation” on page 34
■ “How to Diagnose Server Problems With the Bootable Diagnostics CD” on page 34
Introduction to SunVTS Diagnostic Test Suite
The servers are shipped with a bootable diagnostics CD that contains SunVTS software. SunVTS provides a comprehensive diagnostic tool that tests and validates Sun hardware by verifying the connectivity and functionality of most hardware controllers and devices on Sun platforms. SunVTS software can be tailored with modifiable test instances and processor affinity features.
The following tests are available in SunVTS:
■ Processor
■ Memory
■ Disk
■ Graphics
■ Media
■ Ioports
■ Interconnects
■ Network
■ Environment
■ HBA
SunVTS software has a sophisticated graphical user interface (GUI) that provides test configuration and status monitoring. The user interface can be run on one system to display the SunVTS testing of another system on the network. SunVTS software also provides a TTY-mode interface for situations in which running a GUI is not possible.
SunVTS Documentation
For the most up-to-date information on SunVTS software, go to:
and scroll down to the Validation Test Suite listing.
▼ How to Diagnose Server Problems With the Bootable Diagnostics CD
SunVTS 7.0 ps5 or later software is preinstalled on the server. The server is also shipped with the bootable diagnostics CD containing SunVTS 6.0 ps6 or later. This CD is designed so that the server boots from the CD. This CD boots the Solaris operating system and starts SunVTS software. Diagnostic tests run and write output to log files that a service technician can use to determine the problem with the server.
Before You Begin
You must have a keyboard, mouse, and monitor attached to the server on which you are performing diagnostics.
1. With the server powered on, insert the bootable diagnostics CD into the CD/DVD drive.
2. Reboot the server, but press F2 during the start of the reboot so that you can change the BIOS setting for boot-device priority.
3. When the BIOS Main menu appears, navigate to the BIOS Boot menu.
Instructions for navigating within the BIOS screens are printed on the BIOS screens.
4. On the BIOS Boot menu screen, select Boot Device Priority.
The Boot Device Priority screen appears.
5. Select the DVD-ROM drive to be the primary boot device.
6. Save and exit the BIOS screens.
7. Reboot the server.
When the server reboots from the CD in the DVD-ROM drive, the Solaris OS boots and the SunVTS software starts and opens its first GUI window.
8. In the SunVTS GUI, press Enter or click the Start button when you are prompted to start the tests.
The test suite runs until it encounters an error or the test is completed.
Note – The CD takes approximately nine minutes to boot.
9. When the test is completed, review the log files generated during the test.
SunVTS software provides access to four different log files:
■ SunVTS test error log: contains time-stamped SunVTS test error messages. The log file path name is /var/sunvts/logs/sunvts.err. This file is not created until a SunVTS test failure occurs.
■ SunVTS kernel error log: contains time-stamped SunVTS kernel and SunVTS probe errors. SunVTS kernel errors are errors that relate to running SunVTS, and not to testing of devices. The log file path name is /var/sunvts/logs/vtsk.err. This file is not created until SunVTS reports a SunVTS kernel error.
■ SunVTS information log: contains informative messages that are generated when you start and stop the SunVTS test sessions. The log file path name is /var/sunvts/logs/sunvts.info. This file is not created until a SunVTS test session runs.
■ Solaris system message log: a log of all the general Solaris events logged by syslogd. The path name of this log file is /var/adm/messages.
To view a log file:
a. Click the Log button.
The log file window appears.
b. Specify the log file that you want to view by selecting it from the Log file window.
The content of the selected log file is displayed in the window.
c. Use the three lower buttons to perform the following actions:
■ Print the log file.
A dialog box appears for you to specify your printer options and printer name.
■ Delete the log file.
The file remains displayed, but it will be gone the next time you try to display it.
■ Close the log file window.
The window closes.
Note – To save the log files, you must save them to another networked system or a removable media device. When you use the Bootable Diagnostics CD, the server boots from the CD. Therefore, the test log files are not on the server's hard disk drive and they will be deleted when you power cycle the server.
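Because the log files are lost on a power cycle, and three of them exist only after the corresponding event has occurred, a copy step has to tolerate missing files. The following Python sketch copies whichever of the four log files are present to a destination directory, such as a mounted removable device. It is illustrative only: it assumes a Python interpreter is available in the environment where the logs reside, which this guide does not state, and the function name copy_logs and the destination path are hypothetical.

```python
import shutil
from pathlib import Path

# Log file paths as listed in this procedure.
LOG_PATHS = [
    "/var/sunvts/logs/sunvts.err",
    "/var/sunvts/logs/vtsk.err",
    "/var/sunvts/logs/sunvts.info",
    "/var/adm/messages",
]

def copy_logs(dest_dir, log_paths=LOG_PATHS):
    """Copy each log file that exists into dest_dir; return (copied, missing) path lists."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied, missing = [], []
    for name in log_paths:
        src = Path(name)
        if src.is_file():
            shutil.copy2(src, dest / src.name)  # copy2 also preserves file time stamps
            copied.append(name)
        else:
            missing.append(name)
    return copied, missing

if __name__ == "__main__":
    # "/mnt/usb/sunvts-logs" is a placeholder destination, not a path from this guide.
    done, absent = copy_logs("/mnt/usb/sunvts-logs")
    print("copied:", done)
    print("not present:", absent)
```

A file reported as not present is not necessarily a problem: for example, sunvts.err is absent when no test failure occurred.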
Creating a Data Collector Snapshot
The purpose of the ILOM Service Snapshot utility is to collect data for use by Sun Services
personnel to diagnose system problems. Customers should not run this utility unless requested
to do so by Sun Services.
This section contains the following procedures:
■ “How To Create a Snapshot With the ILOM Web Interface” on page 37
■ “How To Create a Snapshot With the ILOM Command-Line Interface” on page 39
▼ How To Create a Snapshot With the ILOM Web Interface
Caution – Customers should not run this utility unless requested to do so by Sun Services.
Before You Begin
To collect SP data using the Service Snapshot utility, you need the Admin (a) role enabled.
1. Log in to the ILOM web interface.
2. Select Maintenance –> Snapshot.
The Service Snapshot Utility page appears.
3. Select the desired Data Set: Normal, Full, or Custom.
■ Normal - Specifies that ILOM, operating system, and hardware information is collected.
■ Full - Specifies that all data is collected. Selecting Full might reset the system.
■ Custom - Allows you to choose one or more of the following data sets:
    ■ ILOM Data
    ■ Hardware Data
    ■ Basic OS Data
    ■ Diagnostic Data
4. (Optional) Select the Enabled check box to collect only log files from the data set.
5. (Optional) Select the Enabled check box to encrypt the output file.
6. Select one of the following methods to transfer the output file:
■ Browser
■ SFTP
■ FTP
7. Click Run.
A Save As dialog box appears.
8. In the dialog box, specify the directory to which to save the file and the file name.
9. Click OK.
The file is saved to the specified directory.
▼ How To Create a Snapshot With the ILOM Command-Line Interface
Caution – Customers should not run this utility unless requested to do so by Sun Services.
Before You Begin
To collect SP data using the Service Snapshot utility, you need the Admin (a) role enabled.
1. Log in to the ILOM CLI.
2. Type the following commands:
-> set /SP/diag/snapshot dataset=data
-> set /SP/diag/snapshot dump_uri=URI
Where data and URI are one of the following:
Value   Option                                Description
data    normal                                Specifies that ILOM, operating system, and hardware information is collected.
        full                                  Specifies that all data is collected (“full” collection). Note: Using this option might reset the running host.
        normal-logonly or full-logonly        Specifies that only log files are to be collected.
URI     Any valid target directory location   Specifies the URI of the target directory. The URI format is as follows:
                                              protocol://username:password@host/directory
                                              Where protocol can be one of these transfer methods: SFTP or FTP.
                                              For example, to store the snapshot information in the directory named data on the host, define the URI as follows:
                                              ftp://joe:mypasswd@host_ip_address/data
                                              The directory data is relative to the user's login, so the directory would probably be /home/joe/data.
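The two set commands and the URI format described above can be assembled programmatically, which helps avoid typing errors in the dump_uri value. The following Python sketch builds the command strings only; it does not connect to the SP or run anything. It is illustrative: the function name snapshot_commands is hypothetical, the lowercase sftp scheme is assumed by analogy with the ftp example, and no escaping is applied to the user name or password because this guide does not describe how ILOM treats special characters in the URI.

```python
# Option values listed in the table for this procedure.
DATASETS = {"normal", "full", "normal-logonly", "full-logonly"}
PROTOCOLS = {"sftp", "ftp"}

def snapshot_commands(dataset, protocol, username, password, host, directory):
    """Return the two ILOM CLI command strings for a snapshot, after validating the options."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if protocol not in PROTOCOLS:
        raise ValueError(f"unsupported protocol: {protocol!r}")
    # The directory is relative to the user's login, so a leading slash is dropped.
    uri = f"{protocol}://{username}:{password}@{host}/{directory.lstrip('/')}"
    return [
        f"set /SP/diag/snapshot dataset={dataset}",
        f"set /SP/diag/snapshot dump_uri={uri}",
    ]

if __name__ == "__main__":
    # Values taken from the example in the table; host_ip_address is a placeholder.
    for command in snapshot_commands("normal", "ftp", "joe", "mypasswd",
                                     "host_ip_address", "data"):
        print("->", command)
```

Note that the password appears in clear text in the URI, so the generated commands should not be stored in shared files or shell history.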
Resetting the SP
If you need to reset your ILOM service processor (SP), you can do so without affecting the host OS. However, resetting an SP disconnects your current ILOM session and renders the SP unmanageable during reset.
This section contains the following procedures:
■ “How to Reset the ILOM SP Using the Web Interface” on page 41
■ “How to Reset the ILOM SP Using the Command-Line Interface” on page 42
▼ How to Reset the ILOM SP Using the Web Interface
Before You Begin
■ To reset the SP, you need the Reset and Host Control (r) role enabled.
■ After updating the ILOM/BIOS firmware, you must reset the ILOM SP.
1. Log in to the ILOM SP web interface.
2. Select Maintenance –> Reset SP.
The Reset service processor page appears.
3. Click the Reset SP button.
The ILOM reboots. The web interface is unavailable while the ILOM reboots.
▼ How to Reset the ILOM SP Using the Command-Line Interface
Before You Begin
■ To reset the SP, you need the Reset and Host Control (r) role enabled.
■ After updating the ILOM/BIOS firmware, you must reset the ILOM SP.
1. Log in to the ILOM CLI.
2. Type the following command:
-> reset /SP
The ILOM reboots. The command-line interface is unavailable while the ILOM reboots.
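Because the SP is unmanageable while it reboots, a management script typically needs to wait until the SP answers again before issuing further commands. The following Python sketch polls until a probe succeeds or the attempts run out. It is illustrative only: the function names are hypothetical, the use of TCP port 22 assumes that the SP accepts SSH connections on the default port, and the retry count and delay are arbitrary values, not documented reboot times.

```python
import socket
import time

def tcp_probe(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def wait_until_reachable(probe, attempts=30, delay=10.0, sleep=time.sleep):
    """Call probe() up to `attempts` times, sleeping `delay` seconds after each failed try
    except the last. Return the number of the successful attempt, or None."""
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(delay)
    return None

if __name__ == "__main__":
    # 192.0.2.10 is a documentation placeholder; substitute the SP's IP address.
    result = wait_until_reachable(lambda: tcp_probe("192.0.2.10"))
    print("SP reachable" if result else "SP did not respond")
```

Passing the probe as a function keeps the retry logic independent of the connection method, so the same loop can be reused with a web interface check instead of SSH.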
Index
B
BIOS DIMM errors, 21
BIOS/POST, 10
C
clearing faults
    with the ILOM command-line interface, 31
    with the ILOM web interface, 30–31
correctable DIMM errors, 19
correcting DIMM errors, 17–19
D
data collector snapshot, 37–40
data snapshot
    creating with the ILOM command-line interface, 39–40
    creating with the ILOM web interface, 37–39
diagnostic tools, description, 10
DIMM errors
E
emergency shutdown, 12
externally inspecting the server, 12
F
fan sensor readings, 23–32
finding your product on My Oracle Support (support.oracle.com), 5–6
G
gathering service visit information, 11
graceful shutdown, 12
guidelines for troubleshooting, 11
I
ILOM
    description, 10
    sensor readings, 23–32
    system event log, 26
    time stamps, 31
    using to monitor the host, 23–32
ILOM command-line interface
    using to clear faults, 31
    using to create a data snapshot, 39–40
    using to reset the ILOM SP, 42
    using to view sensor readings, 25–26
    using to view system event log, 28–30
ILOM web interface
    using to clear faults, 30–31
    using to create a data snapshot, 37–39
    using to reset the ILOM SP, 41
    using to view sensor readings, 23–25
    using to view system event log, 26–28
inspecting the server
    externally, 12
    internally, 12–14
internally inspecting the server, 12–14
introduction to SunVTS diagnostics software, 33
isolating DIMM errors, 17–19
M
monitoring the host using ILOM, 23–32
My Oracle Support, how to use, 5–6
O
overview of diagnostics guide, 7
P
power off procedure, 12
power problems, troubleshooting, 11–12
R
resetting the ILOM SP
    using the ILOM command-line interface, 42
    using the ILOM web interface, 41
S
sensor readings, 23–32
    using the ILOM command-line interface, 25–26
    using the ILOM web interface, 23–25
Service Processor ILOM, description, 10
service visit information, gathering, 11
shutdown procedure, 12
snapshot
    creating with the ILOM command-line interface, 39–40
    creating with the ILOM web interface, 37–39
SP SEL, time stamps, 31
SunVTS, description, 10
SunVTS diagnostics software, 33–36
    documentation, 34
    introduction, 33
    using to diagnose problems, 34–36
support.oracle.com, 5–6
system event log, 26
    viewing with the ILOM command-line interface, 28–30
    viewing with the ILOM web interface, 26–28
T
temperature sensor readings, 23–32
time stamps in ILOM SP SEL, 31
troubleshooting
    DIMM problems, 15–21
    guidelines, 11
    power problems, 11–12
troubleshooting options, 9
V
voltage sensor readings, 23–32