HP PROLIANT DL360 G4, PROLIANT DL380 G4, PROLIANT ML370 G4 User Manual

Advanced memory protection for HP ProLiant 300 series G4 servers

Introduction

Protection from memory failures

Benefits of online spare memory

Deployment considerations

Implementation differences between ProLiant 300 series G3 and G4 servers

Dual rank vs. single rank DIMMs

Configuration rules for online spare

Configuring the AMP mode

Managing failures in a system with online spare enabled

Symmetric memory mode

Memory Scrubbing

What is memory scrubbing?

Detailed description of memory scrubbing

Introduction

Advanced Memory Protection (AMP) consists of memory features that provide increased tolerance and protection from memory failures. There are varying levels of AMP that are supported on ProLiant servers, depending on the class of server. Refer to the product QuickSpecs for specific information on the level of features supported on each ProLiant server.

AMP features include Advanced ECC, Online Spare Memory, Memory Mirroring, and RAID. Advanced ECC and Online Spare are supported on 300 series platforms. This whitepaper details Advanced ECC and Online Spare support for the 300 series platforms: how these features are enabled, the configuration rules for using them, what utilities can be used for monitoring failures, and how failures can be repaired.

Memory failures defined

There are differing degrees of memory failures that impact the severity of the state of the server. Memory errors can be classified into correctable errors and uncorrectable errors.

Correctable errors can be detected and corrected if the chipset and DIMM support this functionality. Error detection and correction is implemented by storing data and ECC bits on the DIMM. By utilizing the data and ECC bits, the system can detect memory errors and correct certain types of failures. Correctable errors are generally single-bit errors. All ProLiant 300-series servers are capable of detecting and correcting single-bit errors. In addition, ProLiant servers with Advanced ECC support can detect and correct some multi-bit errors. HP’s Advanced ECC allows detection and correction of multi-bit failures if all failed bits are contained within a single DRAM device on the DIMM.
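The way data bits and ECC bits work together can be illustrated with a toy single-error-correcting, double-error-detecting (SECDED) code. The sketch below protects only 4 data bits with 4 check bits using an extended Hamming code. It is an illustration of the general principle only, not HP's implementation: the code used by ProLiant chipsets, its word width, and the function names `encode` and `decode` are all assumptions made for this example.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit SECDED codeword (a list of bits)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    word = [0] * 8                  # index 0 holds the overall parity bit
    word[3], word[5], word[6], word[7] = d
    # Hamming check bits live at positions 1, 2 and 4.
    word[1] = word[3] ^ word[5] ^ word[7]
    word[2] = word[3] ^ word[6] ^ word[7]
    word[4] = word[5] ^ word[6] ^ word[7]
    word[0] = sum(word[1:]) % 2     # makes the total number of 1 bits even
    return word


def decode(word):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    word = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if word[pos]:
            syndrome ^= pos         # XOR of the positions of all set bits
    overall = sum(word) % 2         # 0 when the overall parity is consistent
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # An odd number of bits flipped: treat it as one and repair it.
        # A syndrome of 0 here means the overall parity bit itself flipped.
        word[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with consistent overall parity: two bits flipped.
        return "uncorrectable", None
    nibble = word[3] | (word[5] << 1) | (word[6] << 2) | (word[7] << 3)
    return status, nibble
```

Flipping any one bit of an encoded word yields a "corrected" result with the original data, while flipping any two bits is reported as "uncorrectable", which mirrors the Standard ECC behavior described in this paper.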

Correctable errors can be classified as “hard” and “soft” errors. With a hard error, every access to the memory location will return an error. A hard error typically indicates a problem with the DIMM. With a soft error, the data and/or ECC bits on the DIMM are incorrect, but the error will not continue to occur once the data and/or ECC bits on the DIMM have been corrected. Soft errors are typically caused by cosmic rays. They are rare but expected occurrences.

Although hard correctable memory errors are corrected by the system and do not result in system downtime or data corruption, they indicate a problem with the hardware. Soft errors, on the other hand, do not indicate any issue with the hardware. For this reason, HP ProLiant servers track the rate of correctable errors through correctable error thresholding, which allows the system to differentiate between hard and soft errors. A soft error will not typically cause a DIMM to exceed HP’s correctable error threshold, whereas a hard error typically will. As a result, the user is warned about hard correctable errors but is not notified about soft errors, which do not indicate any issue with the hardware. HP suggests that corrective action be taken if a DIMM is receiving correctable errors at a rate higher than HP’s correctable error threshold rate. Even after a DIMM has exceeded the correctable threshold, future errors will continue to be corrected, and the system will not shut down or crash due to additional correctable errors. However, a DIMM that is receiving correctable errors at a high rate has a higher probability of receiving an uncorrectable error, which would result in a system crash or shutdown for systems not configured for the Mirroring or RAID AMP modes.
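Rate-based thresholding of this kind can be sketched as a sliding-window counter per DIMM. The example below is only a sketch: the class name `CorrectableErrorMonitor`, the DIMM labels, and the default limits are hypothetical, and this paper does not state the threshold values HP actually uses.

```python
from collections import deque


class CorrectableErrorMonitor:
    """Track correctable errors per DIMM over a sliding time window."""

    def __init__(self, max_errors=10, window_seconds=3600.0):
        # Both limits are hypothetical values chosen for illustration.
        self.max_errors = max_errors
        self.window = window_seconds
        self.events = {}  # DIMM label -> deque of error timestamps

    def record(self, dimm, timestamp):
        """Record one correctable error; return True if the DIMM is over threshold."""
        history = self.events.setdefault(dimm, deque())
        history.append(timestamp)
        # Forget errors that have aged out of the window.
        while history and timestamp - history[0] > self.window:
            history.popleft()
        return len(history) > self.max_errors
```

With this model, an isolated soft error never trips the threshold because it ages out of the window, while a hard error that repeats on every access quickly exceeds it.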

The user is warned about a DIMM exceeding the correctable error threshold in multiple ways. The system’s internal Health LED will indicate a caution condition. On most ProLiant 300-series servers, an LED next to the DIMM exceeding the threshold will be illuminated. In addition, if the System Management Driver and agents are loaded, a message will be logged to both the console and Systems Insight Manager. Correctable memory errors can typically be isolated to the actual failed DIMM.

While correctable errors do not affect the normal operation of the system, uncorrectable memory errors will immediately result in a system crash or shutdown when the system is not configured for the Mirroring or RAID AMP modes. Uncorrectable errors are detected by ProLiant 300-series servers, but cannot be corrected. ProLiant 500-series and 700-series platforms with Mirroring or RAID AMP support are capable of protecting against uncorrectable memory errors. Uncorrectable errors are always multi-bit memory errors. For systems with Advanced ECC support, multi-bit memory errors within the same DRAM device on the DIMM are not uncorrectable. However, if multiple bits have failed on different DRAM devices on a DIMM, the error will be uncorrectable. When a system receives an uncorrectable error and is not in an AMP mode providing protection against these errors, the system will NMI. The internal Health LED will indicate a critical condition, and on most systems, the LEDs next to the failed DIMMs will be illuminated. In addition, the error will be logged if the Systems Management Driver is loaded. In certain cases (typically when the failed memory is in the first Bank of memory), the NMI handler will be incapable of running because the memory where the NMI handler resides will be corrupted. In these cases, the system will typically hard lock without any additional indication regarding the failure. Uncorrectable memory errors can typically only be isolated down to a failed Bank of DIMMs, rather than the DIMM itself.

Protection from memory failures

There are six levels of protection from memory errors that are supported by HP. In this whitepaper, the focus will be on those levels of protection supported by the 300-series G4 class of servers. Each level of protection requires server support.

The base level of memory protection available is parity protection. All ProLiant 300-series platforms provide memory protection beyond that provided by parity. Parity can detect when a single-bit error occurs, but cannot correct it. When a single-bit error occurs on a system with parity protection, the system will generate a non-maskable interrupt (NMI) and hard lock. Thus, single-bit errors are uncorrectable errors on a system with parity protection. In parity mode, there is no protection from any level of memory failure because the ability to correct the failure does not exist.
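The limitation of parity can be seen in a few lines. The following sketch uses even parity over a single byte; the helper names `parity_bit` and `check` are hypothetical and the byte-wide granularity is an assumption made for illustration.

```python
def parity_bit(byte):
    """Even-parity bit for an 8-bit value."""
    return bin(byte & 0xFF).count("1") % 2


def check(byte, stored_parity):
    """Return True when the stored parity bit still matches the data."""
    return parity_bit(byte) == stored_parity


data = 0b10110010
stored = parity_bit(data)

single_flip = data ^ 0b00000100   # one bit inverted
double_flip = data ^ 0b00000110   # two bits inverted

single_detected = not check(single_flip, stored)
double_detected = not check(double_flip, stored)
```

A single flipped bit changes the parity and is detected, but nothing in the parity bit identifies which bit flipped, so the data cannot be repaired. In this simple scheme a second flipped bit restores the parity, so the double flip is not detected at all.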

The next level of protection is Standard ECC. Standard ECC requires chipset and DIMM level support and provides the capability to detect and correct a single-bit error on a memory access. When a single-bit error occurs, the system will detect the error and correct the data. Thus, the system will continue to operate normally. With Standard ECC, all multi-bit memory errors will be detected, but not corrected. Multi-bit errors are uncorrectable and will result in a system crash and NMI.

A more robust level of protection is provided by Advanced ECC, also known in the industry as “Chipkill.” Advanced ECC requires chipset and DIMM support and provides a higher level of protection over Standard ECC. Like Standard ECC, Advanced ECC will detect and correct single-bit errors. However, Advanced ECC will also detect and correct multi-bit errors if all failed bits are within a single DRAM device on the DIMM. An entire DRAM device on the DIMM can fail, and the system will continue to operate normally. If failed bits are spread across multiple DRAM devices on the DIMM, the error cannot be corrected with Advanced ECC support, and the system will crash and NMI.
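The difference between the Standard ECC and Advanced ECC rules can be expressed as a small classifier. This is a sketch of the rules as stated above, not of the underlying error-correcting code. The function name `classify_error` is hypothetical, and the default of 4 bits per DRAM device is an assumption chosen for illustration.

```python
def classify_error(failed_bits, mode, device_width=4):
    """Classify the failed bit positions within one ECC word.

    failed_bits: iterable of bit positions that hold wrong values.
    mode: "standard" or "advanced".
    device_width: bits each DRAM device contributes to the word
                  (hypothetical value; real widths depend on the DIMM).
    """
    failed = set(failed_bits)
    if not failed:
        return "no error"
    if mode == "standard":
        # Standard ECC corrects single-bit errors only.
        return "correctable" if len(failed) == 1 else "uncorrectable"
    if mode == "advanced":
        # Advanced ECC corrects any pattern confined to one DRAM device.
        devices = {bit // device_width for bit in failed}
        return "correctable" if len(devices) == 1 else "uncorrectable"
    raise ValueError(f"unknown mode: {mode}")
```

For example, bits 4 through 7 all belong to the same 4-bit device, so that pattern is correctable in Advanced ECC but not in Standard ECC, while bits 3 and 4 straddle two devices and are uncorrectable in both.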

HP offers memory protection beyond those features listed above. ProLiant 300-series servers support Online Spare Mode. With Online Spare enabled, the system still takes advantage of Advanced ECC. In Online Spare Mode, one bank of memory is designated as the spare bank, and that bank is not counted in the total available system memory. If the correctable error threshold is exceeded by a DIMM in a particular bank of memory, that bank will be taken offline and the spare bank activated instead. Once the original bank is deactivated, the system will not utilize the memory that exhibited the failure. After switching to the spare bank of memory, the system will continue to monitor correctable threshold errors and log any failures. If an uncorrectable memory error occurs before or after the online spare switchover, the system will crash and NMI. However, the memory that exceeded the correctable threshold and was deactivated cannot result in an uncorrectable error once the online spare switchover is complete.

Benefits of online spare memory

With Online Spare Memory, degraded memory is automatically disengaged and a fresh set of memory is used in its place. This brings the reliability of the system to the pre-failure level without any service interruption and without compromising system availability.

This solution is beneficial to businesses that do not have a permanent IT staff, do not have replacement memory on hand, or cannot bring down the server for any reason until a scheduled downtime. If a memory module has reached its pre-defined threshold of correctable memory errors, its chance of encountering uncorrectable errors increases dramatically. Online Spare allows the system to automatically deactivate memory that is at a high risk of receiving an uncorrectable error, and replace it with good memory. No interruption to system operation occurs. An uncorrectable error, by contrast, would result in a system crash and unscheduled downtime. Thus, Online Spare Mode decreases the chances of unscheduled downtime and system crashes due to uncorrectable memory errors.

Online Spare Memory is a higher level of memory protection that complements Advanced ECC support. Online Spare Memory is a user selectable option. Users can choose to disable Online Spare and make all installed memory available to the operating system and applications, or they can choose to enable Online Spare and reduce the amount of memory available to the OS and applications in return for a higher level of protection against uncorrectable memory errors. By default, Online Spare Mode is disabled.
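The capacity trade-off can be worked out with simple arithmetic. The sketch below assumes the spare is the last populated bank, as the configuration rules later in this paper state; the function name `split_memory` and the bank sizes in the example are hypothetical.

```python
def split_memory(bank_sizes_mb, online_spare):
    """Return (system_mb, reserved_mb) for the populated banks, in order."""
    if not bank_sizes_mb:
        return 0, 0
    if online_spare and len(bank_sizes_mb) >= 2:
        reserved = bank_sizes_mb[-1]   # the last populated bank is the spare
    else:
        reserved = 0
    return sum(bank_sizes_mb) - reserved, reserved


# Three populated banks of 1 GB, 1 GB and 2 GB (illustrative sizes).
with_spare = split_memory([1024, 1024, 2048], online_spare=True)
without_spare = split_memory([1024, 1024, 2048], online_spare=False)
```

In this example, enabling Online Spare leaves 2048 MB for the operating system and reserves 2048 MB, while disabling it makes all 4096 MB available.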

Deployment considerations

There are a few key factors to consider when determining what level of AMP support should be enabled:

• What features are supported on the ProLiant server being deployed?

• What level of protection is desired?

• What is the cost of implementing the AMP mode?

To determine what AMP features are supported on your ProLiant server, refer to the Product QuickSpecs. The above sections detail the various protection modes and the benefits of each. The cost of implementation for Online Spare over Advanced ECC is the hardware cost of the extra DIMMs required for the spare bank. If Standard ECC or Advanced ECC is implemented, there is no cost associated with extra hardware.

Implementation differences between ProLiant 300 series G3 and G4 servers

The implementation of AMP support for G4 300-series ProLiant servers is very similar to the implementation on G3 servers with the following exceptions:

• The configuration rules have changed with regard to dual-rank DIMMs (see “Configuration rules for online spare,” below).

• Memory will automatically be tested in POST whenever the memory configuration has changed (see “Configuring the AMP mode,” below).

• RBSU will now allow the system to be configured for Online Spare, even when the DIMM configuration is invalid (see “Configuring the AMP mode,” below).

• Correctable threshold errors are still monitored after an Online Spare switchover. On the previous generation (G3) systems, after an Online Spare switchover had occurred, the next time any DIMM exceeded its correctable threshold, it would be detected and reported. After that, though, if any more DIMMs exceeded their correctable threshold on a G3 system, this would not be detected by the system and would not be reported (see “Managing Failures in a system with online spare enabled,” below). G4 systems will continue to monitor correctable threshold errors until all banks of memory have exceeded the correctable error threshold.

• Symmetric Memory Mode feature added (see the “Symmetric memory mode” section).

• Background Scrubbing feature added (see the “What is memory scrubbing?” section).

• The Online Spare Bank is periodically tested during normal system operation. If an uncorrectable error is detected in the Online Spare Bank prior to an Online Spare switchover, the system will continue to operate normally, but the system will not switch to the Online Spare Bank if a DIMM exceeds the correctable error threshold. This prevents the system from switching over to memory that would result in an uncorrectable memory error and subsequent crash.

There is no additional software required to enable AMP features. Previous generations (G2) required the system management (health) driver to be loaded. The health driver is not required on 300 series G4 servers, but it is required to provide console messaging and messaging with Systems Insight Manager.

Dual rank vs. single rank DIMMs

DIMMs can be classified as single- or dual-rank. Single rank and dual rank DIMMs (also known as single and double-sided DIMMs) may be the same capacity, but are not equivalent in all cases. For instance, many ProLiant servers require installing DIMMs in pairs. In this instance, a 1 GB single-rank DIMM is not equivalent to a 1 GB dual-rank DIMM. Also, single- and dual-rank DIMMs of the same capacity may not be equivalent for certain population rules in AMP modes. On some ProLiant products, combinations of these will work and on other products they may not. There may also be certain configuration rules to allow combinations of single and dual-rank DIMMs to operate together. Refer to the product QuickSpecs for specific details on what DIMMs are supported on your server.

Configuration rules for online spare

The configuration rules for Online Spare are very simple. If these rules are not followed and the system is configured for Online Spare mode, the system will automatically boot into Advanced ECC mode. On each reboot, the system will attempt to enter Online Spare mode as long as the system is configured for Online Spare mode. A warning message will be displayed at POST and logged to the Integrated Management Log (IML). When the user configures the system for Online Spare mode, the system will remain configured for this mode even if the system boots in Advanced ECC mode.

The general configuration rules for Advanced Memory Protection are:

• The banks are designated A, B, C, and D.

• Memory must be populated sequentially, starting with bank A.

• DIMMs must be the same capacity and have the same number of ranks within a bank.

• On systems that support DDRII memory, dual-rank DIMMs are not supported.

• On systems that support DDRI DIMMs, dual-rank DIMMs must be populated after single-rank DIMMs.

• The last bank populated is the Online Spare bank.

• The Online Spare bank must have DIMMs of equal or larger capacity than all other DIMMs in the system. See “Note,” below.

Note

This simple rule is true with HP Memory Option kits. For third-party memory, the true requirement is that each rank of the Online Spare bank must hold at least as much memory as each rank of every other bank. If a dual-rank Online Spare bank is used, and another bank in the system is populated with single-rank DIMMs, each rank of the Online Spare bank must be at least as large as the rank in the single-rank DIMM. To simplify, if single- and dual-rank DIMMs are mixed in a system, Online Spare can only be supported if the Online Spare bank is at least twice as large as the bank with single-rank DIMMs. HP Memory Option kits are configured such that the previous simple rule can be followed.

• The system will attempt to boot in Online Spare with every reboot and will downgrade to Advanced ECC if the configuration rules are not met.
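The rules above can be collected into a configuration check. The following is a sketch only: the `Dimm` type, the function name `online_spare_violations`, and the data layout are hypothetical, and two points are assumptions rather than statements from this paper (that at least two banks must be populated, and that a DIMM's capacity is split evenly across its ranks). The product QuickSpecs remain the authority for any given server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimm:
    capacity_mb: int
    ranks: int  # 1 = single-rank, 2 = dual-rank


BANK_ORDER = ["A", "B", "C", "D"]


def online_spare_violations(banks, memory_type="DDRI"):
    """Return a list of rule violations; an empty list means the rules are met.

    banks maps a bank letter to the list of Dimm objects installed in it.
    """
    problems = []
    populated = [name for name in BANK_ORDER if banks.get(name)]
    if populated != BANK_ORDER[:len(populated)]:
        problems.append("banks are not populated sequentially from bank A")
    if len(populated) < 2:
        # Assumption: a spare needs at least one other bank to protect.
        problems.append("fewer than two banks are populated")
    seen_dual = False
    for name in populated:
        dimms = banks[name]
        if len({(d.capacity_mb, d.ranks) for d in dimms}) > 1:
            problems.append(f"bank {name} mixes DIMM capacities or rank counts")
        has_dual = any(d.ranks == 2 for d in dimms)
        has_single = any(d.ranks == 1 for d in dimms)
        if has_dual and memory_type == "DDRII":
            problems.append(f"bank {name} uses dual-rank DIMMs on a DDRII system")
        if has_single and seen_dual and memory_type == "DDRI":
            problems.append(f"bank {name} has single-rank DIMMs after a dual-rank bank")
        seen_dual = seen_dual or has_dual
    if len(populated) >= 2:
        def rank_sizes(name):
            # Assumption: capacity is split evenly across a DIMM's ranks.
            return [d.capacity_mb // d.ranks for d in banks[name]]
        spare = populated[-1]  # the last populated bank is the spare
        smallest_spare_rank = min(rank_sizes(spare))
        for name in populated[:-1]:
            if max(rank_sizes(name)) > smallest_spare_rank:
                problems.append(f"spare bank {spare} has smaller ranks than bank {name}")
    return problems
```

A non-empty result corresponds to the case in which the system would boot in Advanced ECC mode instead of Online Spare mode. Note that a bank of 512 MB single-rank DIMMs followed by a spare bank of 1024 MB dual-rank DIMMs passes the check, matching the "at least twice as large" simplification in the note.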

Configuring the AMP mode

Configuring the AMP mode requires very little work once the user determines the desired mode. The desired AMP mode is selected via ROM-Based Setup Utility (RBSU). It is recommended that the system’s DIMM configuration be set up properly to support the desired AMP mode prior to running RBSU to enable the desired AMP mode.

Also, it is highly recommended that the memory test be run with the system configured for Advanced ECC once the memory is added to the system. This helps to verify that the memory in the online spare bank has been tested and is working properly. The spare memory is not fully tested again after the system is configured for Online Spare and will not be utilized until the switchover occurs between the primary bank and the designated spare bank. With ProLiant 300-series G4 platforms, a very basic verification of the Online Spare bank is periodically run during normal operation. However, it is still recommended that a more thorough test be run on the spare bank prior to configuring the system for Online Spare Mode. The POST memory test can be used for this purpose.

The POST memory test is executed automatically any time the memory configuration is changed. When memory is replaced with DIMMs of the same capacity, a memory configuration change will not be detected, so the user should use the following steps to test the memory manually:

1. Under Advanced Options in RBSU, change the POST Speed Up setting to Disabled (it is enabled by default).

2. Make sure that the selected AMP Mode is Advanced ECC (this is the default). This option is also in RBSU under Advanced Options and Advanced Memory Protection.

3. Reboot. All the memory will be tested. This may take a few minutes, depending on how much memory is installed in your system. Once the memory has been tested, you can enable POST Speed Up again for faster system boot.

Another option for verifying the Online Spare bank is to run the ROM-based Diagnostics Memory Test when the system is configured for Advanced ECC Mode. The ROM-based Diagnostics Memory Test is reached from the System Maintenance menu (press the F10 key at the prompt late in POST). Using this memory test saves the user from having to enter RBSU to disable POST Speed Up, reboot to allow POST to test the memory, and then enter RBSU again to re-enable POST Speed Up. However, just as with the POST memory test, the system must be configured for Advanced ECC to allow the Online Spare bank to be tested.

Once the memory has been tested, enable Online Spare:

1. At the prompt at the end of POST, press the F9 key to enter RBSU.

2. From the RBSU main menu, select System Options.

3. Using the arrow key, select Advanced Memory Protection.

4. To activate Online Spare Memory, highlight Online Spare and press the Enter key. Online Spare is selected as soon as Enter is pressed. (The default option is Advanced ECC, providing maximum memory size for applications that require a large memory footprint.)

5. Press the F10 key to exit RBSU; the server will automatically reboot. Upon reboot, the system will be in Online Spare mode if supported by the installed DIMM configuration.

As the server reboots after Online Spare Memory is enabled, the following message displays:

Advanced Memory Protection Mode: Online Spare with Advanced ECC xxxxMB System Memory and xxxxMB memory reserved for Online Spare

Note

RBSU will allow the user to configure an AMP mode that the current DIMM configuration does not support, but it will display a warning message when the selection is made. If the user enables an AMP mode not supported by the current DIMM configuration, the system will boot in Advanced ECC mode on the next reboot (though the system will still be configured for Online Spare mode), and an error message will be displayed during POST indicating that the desired AMP mode is not supported by the current DIMM configuration.

Online Spare mode does not require operating system support. It can be enabled and function properly with any operating system. In addition, no special software beyond the System BIOS is required for the proper function of Online Spare mode. However, if messaging and logging are desired at the console along with messages in Insight Manager, an operating system must be used that has system management and agent support for AMP. On ProLiant 300-series G4 servers, these operating systems include:

• Microsoft® Windows® 2000 Server

• Microsoft Windows 2000 Advanced Server

• Microsoft Windows 2003 Server

• Microsoft Windows 2003 Enterprise Server

• Linux

• Novell NetWare 6.x

Managing failures in a system with online spare enabled

When the correctable error threshold is exceeded for a DIMM with Online Spare enabled, the system will copy the contents of the failing bank to the online spare bank. The health driver will log a message to the console and to the event log indicating that a threshold has been exceeded and a copy has been initiated. If the agents are loaded, Systems Insight Manager will indicate the memory is in a degraded state. In addition, the following LEDs will indicate that the error has occurred:

• The internal health LED will indicate caution.

• The LED next to the DIMM exceeding the correctable error threshold will illuminate amber.

• The Online Spare Status LED will illuminate amber, indicating an online spare copy has been initiated.

After the copy is completed, the failing bank will be deactivated and the Online Spare bank will become active. The health driver will log a message to the console and to the event log to indicate that the copy is complete.

As mentioned previously, the ProLiant 300-series G4 servers periodically verify the online spare bank during normal operation. If a potential uncorrectable error in the spare bank is detected, support for switching over to the spare bank will be disabled. The health driver will log a message to the console and to the IML. In addition, the internal health LED will indicate a degraded state, the Online Spare Status LED will illuminate amber, and the DIMM LEDs for the spare DIMMs will illuminate amber. If a potential uncorrectable error is detected in the online spare bank prior to an online spare switchover, the system will continue to operate normally, just without the protection of Online Spare mode. The system will not crash or NMI.

A system in Online Spare mode does not have full protection from uncorrectable errors, since Online Spare does not provide this level of protection. An uncorrectable memory error will result in a system crash and NMI. However, a system in Online Spare mode has a reduced probability of receiving an uncorrectable error, because DIMMs that exceed the correctable error threshold, and are thus at higher risk of receiving an uncorrectable error, are deactivated. Once the switchover to the spare bank has occurred, or once switchover support has been disabled because the system detected a potential uncorrectable error in the non-active online spare bank, the system no longer has a spare bank to switch over to if the correctable error threshold is exceeded on another bank. Such events are then treated as if the system were in Advanced ECC mode: the health driver reports the event and the failed DIMM’s LED illuminates, but no switchover occurs and the failing memory remains active. The system can be powered off at the user’s convenience to replace the failed DIMMs. It is important to note that with each reboot, the system will again attempt to boot using the original memory. It is expected in this case that the memory will fail again at some point and the spare bank will again become active.
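The failure-handling behavior described in this section can be summarized as a small state model. This is a sketch of the described behavior, not firmware logic: the class name `OnlineSpareSystem`, its method names, and the log strings are all hypothetical.

```python
class OnlineSpareSystem:
    """Toy model of bank state in Online Spare mode; bank names are labels."""

    def __init__(self, populated_banks):
        # The last populated bank is held back as the spare.
        self.active = list(populated_banks[:-1])
        self.spare = populated_banks[-1]
        self.spare_ready = True
        self.log = []

    def spare_verification_failed(self):
        """Periodic check found a potential uncorrectable error in the idle spare."""
        if self.spare_ready:
            self.spare_ready = False
            self.log.append(f"spare bank {self.spare} failed verification; switchover disabled")

    def threshold_exceeded(self, bank):
        """A DIMM in `bank` crossed the correctable error threshold."""
        if bank not in self.active:
            raise ValueError(f"bank {bank} is not active")
        if self.spare_ready:
            self.log.append(f"copying bank {bank} to spare bank {self.spare}")
            self.active[self.active.index(bank)] = self.spare
            self.spare_ready = False
            self.log.append(f"copy complete; bank {bank} deactivated")
            return "switched"
        # No spare left: the event is handled as in Advanced ECC mode.
        self.log.append(f"bank {bank} over threshold; memory remains active")
        return "reported"
```

The first threshold event consumes the spare and deactivates the failing bank. Any later threshold event, or any threshold event after the spare has failed verification, is only reported and the affected memory stays in use.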

Symmetric memory mode

Symmetric Memory mode is a performance enhancement supported with certain DIMM configurations on the ProLiant 300-series G4 servers. With either four identical dual-rank DIMMs or eight identical single-rank DIMMs installed in a G4 system configured for Advanced ECC Mode, Symmetric Memory mode will automatically be enabled. When enabled, the system takes advantage of the particular DIMM configuration to improve performance. This mode cannot be entered when configured for Online Spare mode.
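The eligibility conditions above can be written as a simple predicate. In this sketch, "identical" is modeled as equal capacity and equal rank count, which is an assumption; the function name `symmetric_mode_enabled` and the tuple layout are hypothetical.

```python
def symmetric_mode_enabled(dimms, amp_mode):
    """Return True when the described conditions for Symmetric Memory mode hold.

    dimms: list of (capacity_mb, ranks) tuples, one per installed DIMM.
    amp_mode: the configured AMP mode, e.g. "Advanced ECC" or "Online Spare".
    """
    if amp_mode != "Advanced ECC":
        return False          # not available in Online Spare mode
    if len(set(dimms)) != 1:
        return False          # DIMMs must be identical (also rejects an empty list)
    ranks = dimms[0][1]
    return (ranks == 2 and len(dimms) == 4) or (ranks == 1 and len(dimms) == 8)
```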

Memory Scrubbing

What is memory scrubbing?

There are two types of memory scrubbing, demand and background. ProLiant 300-series G4 servers support both types of memory scrubbing. Previous generations of 300-series platforms typically supported only demand scrubbing.

Demand scrubbing is a feature that allows the system to differentiate between soft and hard correctable memory errors. It allows the system to ignore soft errors while still notifying the customer of a true DIMM failure. When a correctable memory error occurs due to a soft error, correct data is written back to the memory device. This prevents the same soft error from occurring again in the future, and therefore prevents a soft error from causing a DIMM to exceed the correctable error threshold.

Background scrubbing reduces the chances that multiple soft errors will result in an uncorrectable memory error. With background scrubbing, the system is constantly correcting any potential soft errors “in the background.” Without affecting normal system operation, the memory controller will continuously perform read/write operations on memory, correcting any soft errors that may exist. By doing this, a soft error that would result in a single-bit failure will likely be corrected before another soft error occurs that might result in multiple bits of corrupted data. If multiple bits of data are corrupted, an uncorrectable error could result, causing the system to crash (see the description of Advanced ECC protection).

See the section below for additional detail on how demand and background scrubbing work.

Detailed description of memory scrubbing

As mentioned previously, Advanced ECC and Standard ECC utilize data and ECC bits to perform memory error detection and correction. Through the data and ECC bits, the system can correct certain memory errors. For Standard ECC, the system can correct single-bit errors. For Advanced ECC, the system can correct single or multi-bit errors as long as all failed bits are on the same DRAM device on the DIMM. If the ECC bits are correct for the corresponding data, then no error occurs when a memory read occurs. However, if the data does not match the ECC bits, then an error occurs when a memory read occurs. In many cases, the proper data can be reconstructed through using the ECC check bits, resulting in a correctable error. The data and ECC bits are checked on memory reads to detect and potentially correct errors. The system writes correct data and ECC bits on memory writes.

As also mentioned previously, soft errors are errors that occur in memory but do not indicate a hardware problem. A cosmic ray hitting the DRAM device on a DIMM can in rare cases cause one or more bits to change state. This can result in a soft correctable or soft uncorrectable error. It is extremely rare for multiple soft errors to occur at the same memory location and produce an uncorrectable error. Since soft memory errors do not indicate a problem with the hardware, and are simply the result of a bit in memory changing state due to cosmic rays, the memory error will persist only until the data and ECC bits are properly written back to the DIMM.

The data and ECC bits are written to the DIMM on memory writes and checked on memory reads. A soft error could therefore result in multiple correctable memory errors if the processor continually read a memory location containing a soft memory error. If a write to that memory location occurred, the error would disappear. However, once the soft error has left the data and ECC bits out of sync, every read of that memory location would result in a correctable error until a write to that location occurred. In this way, a soft error could cause a DIMM to exceed the correctable error threshold.

Memory scrubbing is a method of solving this problem. There are two types of memory scrubbing supported by the ProLiant 300-series G4 platforms. Demand scrubbing is supported by the G4 systems and by previous generations. The G4 systems are the first ProLiant servers to support background scrubbing, also known as patrol scrubbing.

Demand scrubbing solves the problem of obtaining multiple correctable errors due to a single soft error, and thus the problem of potentially reporting a correctable threshold error due to soft errors. Whenever the system detects a correctable error, the system will correct the data and pass the data to the requester, whether that be the processor or a DMA capable device. With demand scrubbing, the correct data and ECC check bits will also be written back to memory. In other words, when the system detects a correctable error via the data and ECC bits, it writes back the proper data and ECC bits to memory. Thus, subsequent reads of the same memory location will not result in a correctable error if the error was simply a soft error. If there was a hard error and something actually wrong with the DIMM, writing the correct data and ECC bits back to memory would typically not correct the problem, and additional correctable errors will occur on subsequent reads.
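The effect of demand scrubbing on the correctable error count can be shown with a very small simulation of one memory location that starts with a single bad bit. The model and the function name `count_correctable_errors` are hypothetical simplifications: a soft error is treated as repaired by one write-back, and a hard fault is treated as never repaired.

```python
def count_correctable_errors(reads, demand_scrub, hard_fault):
    """Count correctable errors over `reads` consecutive reads of one location.

    The location starts with one wrong bit. With a hard fault, writing the
    corrected data back does not fix the location.
    """
    bit_is_bad = True
    errors = 0
    for _ in range(reads):
        if bit_is_bad:
            errors += 1  # ECC corrects the data handed to the requester
            if demand_scrub and not hard_fault:
                bit_is_bad = False  # the write-back repairs a soft error
    return errors


soft_without_scrub = count_correctable_errors(100, demand_scrub=False, hard_fault=False)
soft_with_scrub = count_correctable_errors(100, demand_scrub=True, hard_fault=False)
hard_with_scrub = count_correctable_errors(100, demand_scrub=True, hard_fault=True)
```

Without demand scrubbing, the soft error is reported on all 100 reads and could push the DIMM over the threshold. With demand scrubbing it is reported once, while the hard fault is still reported on every read, which is what lets thresholding tell the two apart.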

Background scrubbing (also known as patrol scrubbing) is a very similar process. Instead of only reading the data and ECC bits, correcting them, and writing them back to memory when a correctable memory error occurs, the system will constantly be reading and writing memory locations. Thus, the system will be constantly scrubbing all of the contents of memory in an effort to correct soft errors before a correctable error even occurs. Even if a particular section of memory is not being accessed by software or DMA capable devices, background scrubbing will correct any soft errors that exist in the memory. Background scrubbing occurs at a very slow rate and only when the memory bus is available. Thus, the memory accesses due to background scrubbing do not affect normal system operation or system performance. If background scrubbing detects an uncorrectable memory error, it does not cause the system to crash or result in an NMI.

Background scrubbing serves two purposes. First, it reduces the chances of the system receiving a correctable error on memory reads initiated by software or DMA-capable devices. While demand scrubbing prevents multiple correctable errors due to a soft error, one correctable error will occur on the initial memory read access. If a soft error occurs in memory (i.e., if a cosmic ray inverts a bit on a DRAM device), the background scrub may correct the error before any normal memory read occurs to the memory location that had been affected. Second, and more importantly, background scrubbing reduces the chances of an uncorrectable error occurring due to a soft error. Although rare, it is possible that a portion of memory that is not being accessed for a long time could have multiple bit positions inverted by cosmic rays. For instance, if a bit in memory is inverted by a cosmic ray, but the memory is never read or written for a relatively long period of time, this would leave a window where an additional bit in the same memory location could be inverted by cosmic rays. In this case, multiple bits in the memory location could be inverted, which could potentially result in a system crash and NMI when the memory is read (in Advanced ECC, the system crash would occur if both inverted bits were not in the same DRAM device on the DIMM).
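The second purpose can be illustrated with a timeline model of one long-idle memory location. The sketch below is a simplification under stated assumptions: each soft error is assumed to land in a different DRAM device (the worst case for Advanced ECC), the patrol scrub is assumed to visit the location at a fixed period, and a scrub visit is assumed to repair the location only when exactly one bit is wrong. The function name `outcome_at_read` and the time units are hypothetical.

```python
def outcome_at_read(flip_times, read_time, scrub_period=None):
    """Return the ECC outcome when software finally reads an idle location.

    flip_times: times at which a soft error inverts one more bit.
    read_time: time of the first software read.
    scrub_period: interval between patrol-scrub visits, or None for no
                  background scrubbing. Times are in arbitrary units.
    """
    if scrub_period is not None and scrub_period <= 0:
        raise ValueError("scrub_period must be positive")
    flipped = 0
    next_scrub = scrub_period

    def run_scrubs(until):
        nonlocal flipped, next_scrub
        while next_scrub is not None and next_scrub <= until:
            if flipped == 1:
                flipped = 0  # a single wrong bit is corrected and written back
            next_scrub += scrub_period

    for t in sorted(flip_times):
        if t > read_time:
            break
        run_scrubs(t)        # scrub visits that happen before this flip
        flipped += 1
    run_scrubs(read_time)    # scrub visits between the last flip and the read

    if flipped == 0:
        return "no error"
    return "correctable" if flipped == 1 else "uncorrectable"
```

With two flips at times 10 and 50 and a read at time 100, the model reports an uncorrectable error when there is no scrubbing, and no error when the location is scrubbed every 20 time units, because the first flip is repaired before the second one occurs.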

© 2004 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.

Microsoft and Windows are U.S. registered trademarks of Microsoft Corporation. Linux is a U.S. registered trademark of Linus Torvalds.

374481-001, 7/2004