System Memory Troubleshooting Best Practices for HP ProLiant Servers
Accurately troubleshooting system memory issues in ProLiant server configurations is an important process that
can help prevent unnecessary replacement of hardware components. In addition, accurate problem diagnosis
prevents customers from experiencing unnecessary downtime while waiting for hardware that may not need
to be replaced. Following standard troubleshooting guidelines each time a memory issue is
suspected helps ensure an accurate diagnosis.
HP has developed several methods for troubleshooting memory problems in ProLiant servers.
The purpose of this white paper is to assist HP customers in troubleshooting system memory problems by
successfully isolating the specific DIMMs causing the problem. This helps to prevent nonessential replacement
of unaffected DIMMs or, in some cases, entire banks of memory. In addition, effective troubleshooting can
help determine if a firmware or other software download can resolve a problem without replacing hardware.
This white paper covers the following topics:
• Why should I troubleshoot every system memory problem?
• How can I tell if a memory problem has occurred?
• What tools are available from HP to help identify a failing DIMM?
• Troubleshooting using HP Insight Diagnostics Online Edition
• Troubleshooting using HP Insight Diagnostics Offline Edition
• Troubleshooting flowchart for bootable systems
• Troubleshooting flowchart for non-bootable systems
• What role does firmware play in solving system memory problems?
• Why Buy HP Memory?
• Other troubleshooting resources
Why Should I Troubleshoot Every Memory Problem?
Accurate diagnosis of system memory problems in ProLiant servers has many advantages, including:
• Preventing unnecessary hardware replacement.
• Preventing the return of parts that test NFF (No Fault Found).
• Preventing server downtime.
Best Practice: Many product issues that result in hardware replacement are preventable or
correctable with a firmware update. HP strongly recommends checking for a firmware update before
sending a part back to HP for replacement. Based on the HP ProLiant product return rates, a
significant percentage of all returned hardware products were functioning properly and only needed
a firmware update. Although not all products fall into this category, server downtime and time spent
removing, returning, and ultimately replacing hardware may have been avoided if an attempt had
been made to flash the firmware during the troubleshooting process.
How can I tell if a Memory Problem has Occurred?
There are many indicators that a problem has occurred within the memory subsystem. HP has several tools
used to identify the status of hardware and software within a system. Using these tools is a good first step in
the troubleshooting process. When a system memory problem is suspected, check one or all of these
common places to find information:
• HP System Management Homepage (SMH)
• HP Systems Insight Manager (HP SIM)
• Server Logs
• DIMM Slot LEDs
IMPORTANT:
When a memory error is detected, the firmware illuminates the fault LEDs located near each DIMM
slot on the system board.
If the system isolates an error to a specific slot, only that slot's LED illuminates. However, if the system
can only identify an error within a bank, but cannot isolate a specific DIMM, all the LEDs in the bank
illuminate.
In addition, if the system cannot identify the bank in which the error has occurred, all the LEDs in all
banks illuminate, making the task of isolating the failing DIMM more difficult and the chance of
replacing functioning banks of memory more likely.
Therefore, further troubleshooting is necessary to determine which specific DIMM is failing. Use the
LEDs as a tool in identifying that a memory problem may exist, but do not rely solely on the status of
the LEDs to determine if hardware should be replaced.
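The three LED patterns described above can be summarized in a short sketch. This is an illustration only, not HP firmware logic: the function name, the bank labels, and the returned category strings are hypothetical, and the input is assumed to be a manually recorded observation of which fault LEDs are lit.

```python
from typing import Dict, List


def interpret_fault_leds(lit: Dict[str, List[bool]]) -> str:
    """Classify an observed DIMM fault-LED pattern.

    `lit` maps a (hypothetical) bank label to one boolean per DIMM slot,
    where True means the fault LED next to that slot is illuminated.
    """
    lit_banks = [bank for bank, slots in lit.items() if any(slots)]
    if not lit_banks:
        return "none"  # no fault LED is lit

    # Every LED in every bank lit: the bank could not be identified.
    if len(lit) > 1 and all(all(slots) for slots in lit.values()):
        return "unknown"

    if len(lit_banks) == 1:
        slots = lit[lit_banks[0]]
        # Every LED in one bank lit: error isolated to the bank only.
        if len(slots) > 1 and all(slots):
            return "bank"
        return "slot"  # error isolated to specific slot(s)

    return "multiple"  # LEDs lit in more than one bank


if __name__ == "__main__":
    observed = {"A": [False, True], "B": [False, False]}
    print(interpret_fault_leds(observed))
```

Whatever category results, the LED pattern only narrows the search. Diagnostics are still needed to confirm which DIMM is failing before any hardware is replaced.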
What Tools are Available from HP to Help Identify a Failing DIMM?
Refer to any of the following HP system tools whenever a memory problem is suspected.
HP System Management Homepage
The HP System Management Homepage supplies a consolidated view of system hardware health,
configuration, performance and status information for individual HP servers. Details are provided on total
system health, including system memory. Information on system memory can be found under the
“Performance” section on the main page (see Figure 1). For Linux and Windows, HP SMH is available in the
ProLiant Support Pack and Integrity Support Pack. To download the latest version of the ProLiant Support Pack
or Integrity Support Pack, navigate to the Support and Troubleshooting link on
http://www.hp.com.
Figure 1: Overview of the HP System Management Homepage
HP Systems Insight Manager (HP SIM)
HP Systems Insight Manager monitors the health of the hardware in the system and polls installed hardware
for its status every few minutes. Refer to Figure 2 below for an example of events displayed on the System
page. For more information on HP SIM, refer to the following URL:
Server system logs record the status of hardware events, including memory issues. For servers running
Microsoft Windows operating systems, either of the following logs can be a valuable resource:
• Integrated Management Log (IML)
• Event Viewer
For servers running Linux operating systems, refer to either of the following:
• Integrated Management Log (IML)
• /var/log/messages file
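On Linux, a first pass over the messages file can be scripted. The sketch below filters log lines for memory-related keywords. The keyword list is an assumption made for illustration: actual message text depends on the kernel version and the drivers installed, so adjust the pattern to match what the server actually logs. The function name is hypothetical.

```python
import re
from typing import Iterable, List

# Assumed keywords; real message text varies by kernel and driver.
MEMORY_PATTERN = re.compile(
    r"\b(EDAC|ECC|DIMM|machine check|memory error)\b", re.IGNORECASE
)


def find_memory_events(lines: Iterable[str]) -> List[str]:
    """Return the log lines that mention a memory-related keyword."""
    return [line.rstrip("\n") for line in lines if MEMORY_PATTERN.search(line)]


if __name__ == "__main__":
    # Reading the system log usually requires root privileges.
    try:
        with open("/var/log/messages", errors="replace") as log:
            for event in find_memory_events(log):
                print(event)
    except OSError as exc:
        print(f"could not read log: {exc}")
```

A match is only a prompt for further investigation. Compare any hits against the IML before concluding that a DIMM is at fault.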
Microsoft Windows Operating Systems: Using the IML
The IML Viewer is a software tool created by HP and is available as a downloadable component pack from
HP.COM. It can also be accessed via the HP System Management Homepage (SMH). Navigate to this tool
through SMH by clicking on the Logs tab or through the operating system from HP System Tools. Figure 3
below shows the Integrated Management Log accessed via SMH. System memory issues, if present, will be
recorded and will be visible in the IML.
Figure 3: Integrated Management Log