HP ProLiant BL20p G3, ProLiant ML110, ProLiant ML570 G3, ProLiant ML150 G3, ProLiant DL560: System Memory Troubleshooting Best Practices

System Memory Troubleshooting Best Practices for HP ProLiant Servers
Accurately troubleshooting system memory issues in ProLiant server configurations is an important process that can help prevent unnecessary replacement of hardware components. In addition, accurate problem diagnosis prevents customers from experiencing unnecessary downtime while waiting for hardware that may not need to be replaced. Applying standard troubleshooting guidelines consistently, every time a memory issue is suspected, helps achieve both goals.
HP has developed several methods for troubleshooting memory problems in ProLiant servers.
The purpose of this white paper is to assist HP customers in troubleshooting system memory problems by successfully isolating the specific DIMMs causing the problem. This helps to prevent nonessential replacement of unaffected DIMMs or, in some cases, entire banks of memory. In addition, effective troubleshooting can help determine if a firmware or other software download can resolve a problem without replacing hardware.
This white paper covers the following topics:
Why should I troubleshoot every system memory problem?
How can I tell if a memory problem has occurred?
What tools are available from HP to help identify a failing DIMM?
Troubleshooting using HP Insight Diagnostics Online Edition
Troubleshooting using HP Insight Diagnostics Offline Edition
Troubleshooting flowchart for bootable systems
Troubleshooting flowchart for non-bootable systems
What role does firmware play in solving system memory problems?
Why Buy HP Memory?
Other troubleshooting resources

Why Should I Troubleshoot Every Memory Problem?

Accurate diagnosis of system memory problems in ProLiant servers has many advantages, including:
Prevents unnecessary hardware replacement.
Prevents the return of parts that test NFF (No Fault Found).
Prevents server downtime.
Best Practice: Many product issues that result in hardware replacement are preventable or correctable with a firmware update. HP strongly recommends checking for a firmware update before sending a part back to HP for replacement. Based on HP ProLiant product return rates, a significant percentage of all returned hardware products were functioning properly and needed only a firmware update. Although not all products fall into this category, the server downtime and the time spent removing, returning, and ultimately replacing hardware may have been avoided if an attempt had been made to flash the firmware during the troubleshooting process.

How can I tell if a Memory Problem has Occurred?

There are many indicators that a problem has occurred within the memory subsystem. HP provides several tools that report the status of hardware and software within a system, and using them is a good first step in the troubleshooting process. When a system memory problem is suspected, check any or all of the following sources of information:
HP System Management Homepage (SMH)
HP Systems Insight Manager (HP SIM)
Server Logs
DIMM Slot LEDs
IMPORTANT:
When a memory error is detected, the firmware illuminates the fault LEDs located near each DIMM slot on the system board.
If the system identifies an error to a specific slot, that LED illuminates. However, if the system can only identify an error within a bank, but cannot isolate a specific DIMM, all the LEDs in the bank will illuminate.
In addition, if the system cannot identify the bank in which the error has occurred, all the LEDs in all banks illuminate, making the task of isolating the failing DIMM more difficult and the chance of replacing functioning banks of memory more likely.
Therefore, further troubleshooting is necessary to determine which specific DIMM is failing. Use the LEDs as a tool in identifying that a memory problem may exist, but do not rely solely on the status of the LEDs to determine if hardware should be replaced.
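The LED rules above amount to a small decision procedure. The Python sketch below models it under stated assumptions: the `interpret_dimm_leds` function and the bank/slot data layout are hypothetical (they are not an HP tool or API), and real system boards may label banks and slots differently. It only reports how far the firmware narrowed the fault. Consistent with the guidance above, any result other than a single lit slot still leaves several candidate DIMMs, and even a single lit slot should be confirmed with the diagnostic tools before hardware is replaced.

```python
from typing import Dict, List, Tuple

# Hypothetical model: each bank name maps to a list of fault-LED states,
# one entry per DIMM slot in that bank (True = LED illuminated).
LedMap = Dict[str, List[bool]]


def interpret_dimm_leds(leds: LedMap) -> Tuple[str, List[Tuple[str, int]]]:
    """Return (isolation_level, candidate_slots) for a fault-LED pattern.

    isolation_level is one of:
      "none"      - no fault LEDs are lit
      "slot"      - exactly one slot is lit: one DIMM was isolated
      "bank"      - every slot in a single bank is lit: error isolated to a bank
      "all_banks" - every slot in every bank is lit: bank not identified
      "unclear"   - any other pattern; troubleshoot further
    candidate_slots lists every (bank, slot_index) whose LED is lit.
    """
    lit = [(bank, i) for bank, states in leds.items()
           for i, on in enumerate(states) if on]
    total_slots = sum(len(states) for states in leds.values())

    if not lit:
        return "none", []
    if len(lit) == 1:
        return "slot", lit
    # All LEDs in all banks: the firmware could not identify the bank.
    if len(lit) == total_slots and len(leds) > 1:
        return "all_banks", lit
    # All LEDs within one bank only: the error is isolated to that bank.
    lit_banks = {bank for bank, _ in lit}
    if len(lit_banks) == 1:
        bank = next(iter(lit_banks))
        if all(leds[bank]):
            return "bank", lit
    return "unclear", lit
```

For example, `interpret_dimm_leds({"A": [False, True], "B": [False, False]})` returns `("slot", [("A", 1)])`, while a pattern with every LED lit in every bank returns `"all_banks"` together with every slot as a candidate, which is exactly the case where further troubleshooting is needed.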
What Tools are Available from HP to Help Identify a Failing DIMM?
Refer to any of the following HP system tools whenever a memory problem is suspected.
HP System Management Homepage
The HP System Management Homepage supplies a consolidated view of system hardware health, configuration, performance, and status information for individual HP servers. It reports overall system health, including system memory. Information on system memory can be found under the “Performance” section on the main page (see Figure 1). For Linux and Windows, HP SMH is available in the ProLiant Support Pack and Integrity Support Pack. To download the latest version of the ProLiant Support Pack or Integrity Support Pack, navigate to the Support and Troubleshooting link on http://www.hp.com.
Figure 1: Overview of the HP System Management Homepage

HP Systems Insight Manager (HP SIM)

HP Systems Insight Manager monitors the health of the hardware in the system and polls installed hardware for its status every few minutes. Refer to Figure 2 below for an example of events displayed on the System page. For more information on HP SIM, refer to the following URL:
http://h18013.www1.hp.com/products/servers/management/hpsim/index.html
Figure 2: HP Systems Insight Manager

System Logs

Server system logs record the status of hardware events, including memory issues. For servers running Microsoft Windows operating systems, either of the following logs can be a valuable resource:
Integrated Management Log (IML)
Event Viewer
For servers running Linux operating systems, refer to either of the following:
Integrated Management Log (IML)
/var/log/messages file
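On Linux, a quick first pass over /var/log/messages is to filter it for memory-related keywords. The Python sketch below assumes the keyword list shown (memory, DIMM, ECC, EDAC); the exact wording of memory events depends on the kernel, drivers, and HP management agents installed, so treat the pattern as a hypothetical starting point rather than a list of HP event messages. The function names `find_memory_events` and `scan_messages` are illustrative only.

```python
import re
from typing import Iterable, List

# Assumed keywords: the actual wording of memory events depends on the
# kernel, drivers, and management agents installed, so adjust as needed.
MEMORY_PATTERN = re.compile(r"\b(memory|dimm|ecc|edac)\b", re.IGNORECASE)


def find_memory_events(lines: Iterable[str]) -> List[str]:
    """Return the log lines that mention one of the assumed keywords."""
    return [line.rstrip("\n") for line in lines if MEMORY_PATTERN.search(line)]


def scan_messages(path: str = "/var/log/messages") -> List[str]:
    """Read a syslog file (usually requires root) and filter it."""
    with open(path, "r", errors="replace") as log:
        return find_memory_events(log)
```

Run `scan_messages()` as root and review the matching lines alongside the IML. A keyword match marks a line worth reading, not a confirmed DIMM failure; ordinary boot messages that mention memory will also match.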
Microsoft Windows Operating Systems: Using the IML
The IML Viewer is a software tool created by HP and is available as a downloadable component pack from HP.COM. It can also be accessed through the HP System Management Homepage (SMH). Navigate to this tool through SMH by clicking the Logs tab, or through the operating system from HP System Tools. Figure 3 below shows the Integrated Management Log accessed through SMH. System memory issues, if present, will be recorded and visible in the IML.
Figure 3: Integrated Management Log