What is memory scrubbing?
Detailed description of memory scrubbing
Introduction
Advanced Memory Protection (AMP) consists of memory features that provide increased tolerance and
protection from memory failures. There are varying levels of AMP that are supported on ProLiant
servers, depending on the class of server. Refer to the product QuickSpecs for specific information on
the level of features supported on each ProLiant server.
AMP features include Advanced ECC, Online Spare Memory, Memory Mirroring and RAID.
Advanced ECC and Online Spare are supported on 300 series platforms. This whitepaper details
Advanced ECC and Online Spare support for the 300 series platforms: how these features are
enabled, the configuration rules for using them, which utilities can be used to monitor
failures, and how failures can be repaired.
Memory failures defined
Memory failures differ in severity and in their impact on the state of the server.
Memory errors can be classified into correctable errors and uncorrectable errors.
Correctable errors can be detected and corrected if the chipset and DIMM support this functionality.
Error detection and correction is implemented by storing data and ECC bits on the DIMM. By utilizing
the data and ECC bits, the system can detect memory errors and correct certain types of failures.
Correctable errors are generally single-bit errors. All ProLiant 300-series servers are capable of
detecting and correcting single-bit errors. In addition, ProLiant servers with Advanced ECC support
can detect and correct some multi-bit errors. HP’s Advanced ECC allows detection and correction of
multi-bit failures if all failed bits are contained within a single DRAM device on the DIMM.
Correctable errors can be classified as “hard” and “soft” errors. With a hard error, every access to
the memory location will return an error. A hard error typically indicates a problem with the DIMM.
With a soft error, the data and/or ECC bits on the DIMM are incorrect, but the error will not continue
to occur once the data and/or ECC bits on the DIMM have been corrected. Soft errors are typically
caused by cosmic rays. They are rare but expected occurrences.
Although hard correctable memory errors are corrected by the system and will not result in system
downtime or data corruption, they indicate a problem with the hardware. On the other hand, soft
errors do not indicate any issue with the hardware. Due to this, HP ProLiant servers track the rate of
correctable errors through correctable error thresholding. This allows the system to differentiate
between hard and soft errors. A soft error will not typically cause a DIMM to exceed HP’s correctable
error threshold. On the other hand, a hard error will typically cause a DIMM to exceed HP’s
correctable error threshold. Due to HP’s correctable error thresholding, the user is warned about hard
correctable errors, but is not notified about soft errors which don’t indicate any issue with the
hardware. HP suggests that corrective action be taken if a DIMM is receiving correctable errors at a
rate higher than HP’s correctable error threshold rate. Even though a DIMM has exceeded the
correctable threshold, future errors will continue to be corrected. The system will not shut down or
crash due to additional correctable errors. However, a DIMM that is receiving correctable errors at a
high rate has a higher probability of receiving an uncorrectable error, which would result in a system
crash or shutdown for systems not configured for the Mirroring or RAID AMP modes.
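The thresholding idea can be illustrated with a minimal sketch. HP's actual threshold values and algorithm are not given in this paper, so the class name, the window length, and the error limit below are all hypothetical; the sketch only shows how counting errors over time separates an isolated soft error from a DIMM that fails on every access.

```python
from collections import deque


class CorrectableErrorTracker:
    """Counts correctable errors per DIMM inside a sliding time window."""

    def __init__(self, max_errors=10, window_seconds=3600.0):
        # Both limits are illustrative assumptions, not HP's real values.
        self.max_errors = max_errors
        self.window_seconds = window_seconds
        self._events = {}  # DIMM label -> deque of error timestamps

    def record_error(self, dimm, timestamp):
        """Record one correctable error; return True if the DIMM is over threshold."""
        events = self._events.setdefault(dimm, deque())
        events.append(timestamp)
        # Drop errors that have aged out of the window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        return len(events) > self.max_errors


tracker = CorrectableErrorTracker(max_errors=3, window_seconds=60.0)

# A soft error: one isolated event never reaches the threshold.
soft_flag = tracker.record_error("DIMM 1A", timestamp=0.0)

# A hard error: the same DIMM reports an error on every access.
hard_flags = [tracker.record_error("DIMM 2A", timestamp=float(t)) for t in range(5)]
```

In this sketch the DIMM with the hard error is flagged on its fourth error inside the window, while the DIMM with the single soft error is never flagged.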
The user is warned about a DIMM exceeding the correctable error threshold in multiple ways. The
system's internal Health LED will indicate a caution condition. On most ProLiant 300-series servers, an
LED next to the DIMM exceeding the threshold will be illuminated. In addition, if the System
Management Driver and agents are loaded, a message will be logged to both the console and
Systems Insight Manager. Correctable memory errors can typically be isolated to the actual failed
DIMM.
While correctable errors do not affect the normal operation of the system, uncorrectable memory
errors will immediately result in a system crash or shutdown of the system when not configured for
Mirroring or RAID AMP modes. Uncorrectable errors are detected by ProLiant 300-series servers, but
cannot be corrected. ProLiant 500-series and 700-series platforms with Mirroring or RAID AMP
support are capable of protecting against uncorrectable memory errors. Uncorrectable errors are
always multi-bit memory errors. For systems with Advanced ECC support, multi-bit memory errors
within the same DRAM device on the DIMM are not uncorrectable. However, if multiple bits are failed
on different DRAM devices on a DIMM, the error will be uncorrectable. When a system receives an
uncorrectable error and is not in an AMP mode providing protection against these errors, the system
will NMI. The internal Health LED will indicate a critical condition, and on most systems, the LEDs next
to the failed DIMMs will be illuminated. In addition, the error will be logged if the System
Management Driver is loaded. In certain cases (typically when the failed memory is in the first Bank of
memory), the NMI handler will be incapable of running because the memory where the NMI handler
resides will be corrupted. In these cases, the system will typically hard lock without any additional
indication regarding the failure. Uncorrectable memory errors can typically only be isolated down to
a failed Bank of DIMMs, rather than the DIMM itself.
Protection from memory failures
There are six levels of protection from memory errors that are supported by HP. In this whitepaper, the
focus will be on those levels of protection supported by the 300-series G4 class of servers. Each level
of protection requires server support.
The base level of memory protection available is parity protection. All ProLiant 300-series platforms
provide memory protection beyond that provided by parity. Parity can detect when a single-bit error
occurs, but cannot correct it. When a single-bit error occurs on a system with parity protection, the
system will generate a non-maskable interrupt (NMI) and hard lock. Thus, single-bit errors are uncorrectable
errors on a system with parity protection. In parity mode, there is no protection from any level of
memory failures because the ability to correct the failure does not exist.
The next level of protection is Standard ECC. Standard ECC requires chipset and DIMM level support
and provides the capability to detect and correct a single-bit error on a memory access. When a
single-bit error occurs, the system will detect the error and correct the data. Thus, the system will
continue to operate normally. With Standard ECC, all multi-bit memory errors will be detected, but not
corrected. Multi-bit errors are uncorrectable and will result in a system crash and NMI.
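The single-bit-correct, double-bit-detect behavior described for Standard ECC can be sketched with an extended Hamming code. This is a toy illustration on 4 data bits, not the code used by any ProLiant chipset; the function names are hypothetical, and real ECC DIMMs protect much wider words. The sketch demonstrates correction of any one flipped bit and detection of any two flipped bits.

```python
DATA_POSITIONS = (3, 5, 6, 7)   # codeword positions that carry data bits


def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming codeword."""
    bits = [0] * 8               # index 0 = overall parity, 1..7 = Hamming(7,4)
    for i, pos in enumerate(DATA_POSITIONS):
        bits[pos] = (nibble >> i) & 1
    for p in (1, 2, 4):          # each check bit covers positions with that bit set
        bits[p] = sum(bits[pos] for pos in range(1, 8) if pos & p and pos != p) % 2
    bits[0] = sum(bits[1:]) % 2  # overall parity over the seven Hamming bits
    return bits


def decode(bits):
    """Return (status, nibble); status is 'ok', 'corrected', or 'uncorrectable'."""
    bits = list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos      # XOR of set positions is 0 for a valid codeword
    overall = sum(bits) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd overall parity means one bit flipped; syndrome 0 points at the
        # overall parity bit itself.
        bits[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        return "uncorrectable", None
    nibble = sum(bits[pos] << i for i, pos in enumerate(DATA_POSITIONS))
    return status, nibble


word = encode(0b1011)
one_flip = list(word)
one_flip[5] ^= 1
two_flips = list(one_flip)
two_flips[6] ^= 1

print(decode(word))        # ('ok', 11)
print(decode(one_flip))    # ('corrected', 11)
print(decode(two_flips))   # ('uncorrectable', None)
```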
A more robust level of protection is provided by Advanced ECC, also known in the industry as
“Chipkill.” Advanced ECC requires chipset and DIMM support and provides a higher level of
protection over Standard ECC. Like Standard ECC, Advanced ECC will detect and correct single-bit
errors. However, Advanced ECC will also detect and correct multi-bit errors if all failed bits are within
a single DRAM device on the DIMM. An entire DRAM device on the DIMM can be failed, and the
system will continue to operate normally. If there are multiple bits of failure that occur on multiple
DRAM devices on the DIMM, the error cannot be corrected with Advanced ECC support, and the
system will crash and NMI.
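The difference between the Standard ECC and Advanced ECC rules can be expressed as a small classification sketch. The function names are hypothetical, and the assumption that each DRAM device supplies 4 bits of a memory word is illustrative only; the actual device width depends on the DIMM.

```python
def classify_standard_ecc(failed_bits):
    """Standard ECC: only a single failed bit is correctable."""
    if not failed_bits:
        return "no error"
    return "correctable" if len(set(failed_bits)) == 1 else "uncorrectable"


def classify_advanced_ecc(failed_bits, bits_per_dram=4):
    """Advanced ECC: correctable if all failed bits sit in one DRAM device.

    failed_bits: positions of the failed bits within one memory word.
    bits_per_dram: bits of the word supplied by each DRAM device
                   (4 is an illustrative assumption).
    """
    if not failed_bits:
        return "no error"
    devices = {bit // bits_per_dram for bit in failed_bits}
    return "correctable" if len(devices) == 1 else "uncorrectable"


whole_device = [4, 5, 6, 7]     # every bit of the second DRAM device has failed
across_devices = [3, 4]         # adjacent bits, but in two different devices

print(classify_standard_ecc(whole_device))     # uncorrectable
print(classify_advanced_ecc(whole_device))     # correctable
print(classify_advanced_ecc(across_devices))   # uncorrectable
```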
HP offers memory protection beyond those features listed above. ProLiant 300-series servers support
Online Spare Mode. With Online Spare enabled, the system still takes advantage of Advanced ECC.
In Online Spare Mode, one bank of memory is designated as the spare bank. In this mode, the
designated bank is not used for total available system memory. If the correctable error threshold is
exceeded by a DIMM in a particular bank of memory, that bank will be taken offline and the spare
bank activated instead. Once the original bank is deactivated, the system will not utilize the memory
that exhibited the failure. After switching to the spare bank of memory, the system will continue to
monitor correctable threshold errors and log any failures. If an uncorrectable memory error occurs
before or after the online spare switchover, the system will crash and NMI. However, the memory
which exceeded the correctable threshold and was deactivated cannot result in an uncorrectable error
once the online spare switchover is complete.
Benefits of online spare memory
With Online Spare Memory, degraded memory is automatically disengaged and a fresh set of
memory is used in its place. This brings the reliability of the system to the pre-failure level without any
service interruption and without compromising system availability.
This solution is beneficial to businesses that do not have a permanent IT staff, do not have
replacement memory on hand, or cannot bring down the server for any reason until a scheduled
downtime. If a memory module has reached its pre-defined threshold of correctable memory errors,
its chance of encountering uncorrectable errors increases dramatically. Online Spare allows the
system to automatically deactivate memory that is at a high risk of receiving an uncorrectable error,
and replace it with good memory. No interruption to system operation occurs. An uncorrectable error
would result in a system crash and unscheduled downtime. Thus, Online Spare Mode decreases the
chances of unscheduled downtime and system crashes due to uncorrectable memory errors.
Online Spare Memory is a higher level of memory protection that complements Advanced ECC
support. Online Spare Memory is a user selectable option. Users can choose to disable Online Spare
and make all installed memory available to the operating system and applications, or they can
choose to enable Online Spare and reduce the amount of memory available to the OS and
applications in return for a higher level of protection against uncorrectable memory errors. By default,
Online Spare Mode is disabled.
Deployment considerations
There are a few key factors to consider when determining what level of AMP support should be
enabled:
• What features are supported on the ProLiant server being deployed?
• What level of protection is desired?
• What is the cost of implementing the AMP mode?
To determine what AMP features are supported on your ProLiant server, refer to the Product
QuickSpecs. The above sections detail the various protection modes and the benefits of each. The
cost of implementation for Online Spare over Advanced ECC is the hardware cost of the extra DIMMs
required for the spare bank. If Standard ECC or Advanced ECC is implemented, there is no cost
associated with extra hardware.
Implementation differences between ProLiant 300 series G3 and G4 servers
The implementation of AMP support for G4 300-series ProLiant servers is very similar to the
implementation on G3 servers with the following exceptions:
• The configuration rules have changed with regard to dual-ranked DIMMs (see “Configuration rules,”
below).
• Memory will automatically be tested in POST whenever the memory configuration has changed (see
“Configuring AMP,” below).
• RBSU will now allow the system to be configured for Online Spare, even when the DIMM
configuration is invalid (see “Configuring AMP,” below).
• Correctable threshold errors are still monitored after an Online Spare switchover. On the previous
generation (G3) systems, after an Online Spare switchover had occurred, the next time any DIMM
exceeded its correctable threshold, it would be detected and reported. After that, though, if any
more DIMMs exceeded their correctable threshold on a G3 system, this would not be detected by
the system and would not be reported (see “Managing Failures in a system with online spare
enabled,” below). G4 systems will continue to monitor correctable threshold errors until all banks of
memory have exceeded the correctable error threshold.
• Symmetric Memory Mode feature added (see the “Symmetric memory mode” section).
• Background Scrubbing feature added (see the “What is memory scrubbing?” section).
• The Online Spare Bank is periodically tested during normal system operation. If an uncorrectable
error is detected in the Online Spare Bank prior to an Online Spare switchover, the system will
continue to operate normally, but the system will not switch to the Online Spare Bank if a DIMM
exceeds the correctable error threshold. This prevents the system from switching over to memory that
would result in an uncorrectable memory error and subsequent crash.
There is no additional software required to enable AMP features. Previous generations (G2) required
the system management (health) driver to be loaded. The health driver is not required on 300 series
G4 servers to enable AMP features, but it is needed for console messaging and messaging with
Systems Insight Manager.
Dual rank vs. single rank DIMMs
DIMMs can be classified as single- or dual-rank. Single rank and dual rank DIMMs (also known as
single and double-sided DIMMs) may be the same capacity, but are not equivalent in all cases. For
instance, many ProLiant servers require installing DIMMs in pairs. In this instance, a 1 GB single-rank
DIMM is not equivalent to a 1 GB dual-rank DIMM. Also, single- and dual-rank DIMMs of the same
capacity may not be equivalent for certain population rules in AMP modes. On some ProLiant
products, combinations of these will work and on other products they may not. There may also be
certain configuration rules to allow combinations of single and dual-rank DIMMs to operate together.
Refer to the product QuickSpecs for specific details on what DIMMs are supported on your server.
Configuration rules for online spare
The configuration rules for Online Spare are very simple. If these rules are not followed and the
system is configured for Online Spare mode, the system will automatically boot into Advanced ECC
mode. On each reboot, the system will attempt to enter Online Spare mode as long as the system is
configured for Online Spare mode. A warning message will be displayed at POST and logged to the
Integrated Management Log (IML). When the user configures the system for Online Spare mode, the
system will remain configured for this mode even if the system boots in Advanced ECC mode.
The general configuration rules for Advanced Memory Protection are:
• The banks are designated A, B, C, and D.
• Memory must be populated sequentially, starting with bank A.
• DIMMs must be the same capacity and have the same number of ranks within a bank.
• On systems that support DDRII memory, dual-rank DIMMs are not supported.
• On systems that support DDRI DIMMs, dual-rank DIMMs must be populated after single-rank
DIMMs.
• The last bank populated is the Online Spare bank.
• The Online Spare bank must contain DIMMs of equal or larger capacity than all other DIMMs in the
system. See “Note,” below.
Note
This simple rule is true with HP Memory Option kits. For third party
memory, the true requirement is that each rank of the Online Spare bank
must hold at least as much memory as each rank of every other bank in the
system. If a dual-rank Online Spare bank is used, and another bank in
the system is populated with single-rank DIMMs, each rank of the Online
Spare bank must be at least as large as the rank in the single-rank DIMM.
To simplify, if single and dual-rank DIMMs are mixed in a system, Online
Spare can only be supported if the Online Spare bank is at least twice as
large as the bank with single-rank DIMMs. HP Memory Option kits are
configured such that the previous simple rule can be followed.
• The system will attempt to boot in Online Spare with every reboot and will downgrade to Advanced
ECC if the configuration rules are not met.
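The configuration rules above can be summarized as a validation sketch. This is not HP's firmware logic; the function name, the data layout, and the mode strings are hypothetical. It also makes several simplifying assumptions: every bank holds the same number of DIMMs, at least two banks must be populated for a spare to exist, and a dual-rank DIMM on a DDRII system is treated simply as an invalid Online Spare configuration.

```python
BANK_ORDER = ("A", "B", "C", "D")


def boot_mode(banks, memory_type="DDR1"):
    """Return 'Online Spare' if the rules are met, else 'Advanced ECC'.

    banks: dict mapping a bank label to a list of (capacity_mb, ranks) DIMMs;
           a missing label or an empty list means the bank is unpopulated.
    """
    populated = [label for label in BANK_ORDER if banks.get(label)]
    # Banks must be filled sequentially from A (assumption: a spare needs 2+ banks).
    if len(populated) < 2 or populated != list(BANK_ORDER[:len(populated)]):
        return "Advanced ECC"
    # Within a bank, all DIMMs must match in capacity and rank count.
    if any(len(set(banks[label])) != 1 for label in populated):
        return "Advanced ECC"
    kinds = [banks[label][0] for label in populated]  # one (capacity, ranks) per bank
    ranks = [r for _, r in kinds]
    # Simplification: dual-rank on DDRII is treated as an invalid configuration.
    if memory_type == "DDR2" and 2 in ranks:
        return "Advanced ECC"
    # Dual-rank banks must be populated after single-rank banks.
    if ranks != sorted(ranks):
        return "Advanced ECC"
    # Each rank of the spare (last) bank must cover each rank of every other bank.
    rank_sizes = [capacity / r for capacity, r in kinds]
    if any(rank_sizes[-1] < size for size in rank_sizes[:-1]):
        return "Advanced ECC"
    return "Online Spare"


matched = {"A": [(512, 1)] * 2, "B": [(512, 1)] * 2}
small_spare = {"A": [(1024, 1)] * 2, "B": [(512, 1)] * 2}
mixed_ok = {"A": [(512, 1)] * 2, "B": [(1024, 2)] * 2}
mixed_bad = {"A": [(1024, 1)] * 2, "B": [(1024, 2)] * 2}

print(boot_mode(matched))       # Online Spare
print(boot_mode(small_spare))   # Advanced ECC
print(boot_mode(mixed_ok))      # Online Spare
print(boot_mode(mixed_bad))     # Advanced ECC
```

The last two cases mirror the note above: a dual-rank spare bank works alongside single-rank DIMMs only when it is at least twice their capacity.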
Configuring the AMP mode
Configuring the AMP mode requires very little work once the user determines the desired mode. The
desired AMP mode is selected via ROM-Based Setup Utility (RBSU). It is recommended that the
system’s DIMM configuration be set up properly to support the desired AMP mode prior to running
RBSU to enable the desired AMP mode.
Also, it is highly recommended that the memory test is run with the system configured for Advanced
ECC once the memory is added to the system. This helps to verify that the memory in the online spare
bank has been tested and is working properly. The spare memory is not fully tested again after the
system is configured for Online Spare and will not be utilized until the switchover occurs between the
primary bank and the designated spare bank. With ProLiant 300-series G4 platforms, a very basic
verification of the Online Spare bank is periodically run during normal operation. However, it is still
recommended that a more thorough test be run on the spare bank prior to configuring the system for
Online Spare Mode. The POST memory test can be used for this purpose.
The POST memory test will automatically be executed in POST anytime the memory configuration is
changed. When memory is replaced with DIMMs of the same capacity, a memory configuration
change will not be detected, so the user should use the following steps to manually test the
memory:
1. Under Advanced Options in RBSU, set POST Speed Up to Disabled (it is enabled by
default).
2. Make sure that the selected AMP Mode is Advanced ECC (this is the default). This option is also in
RBSU under Advanced Options and Advanced Memory Protection.
3. Reboot. All the memory will be tested. This may take a few minutes, depending on how much
memory is installed in your system. Once the memory has been tested, you can enable POST Speed
Up again for faster system boot.
Another option for verifying the Online Spare bank is to run the ROM-based Diagnostics Memory Test
when the system is configured for Advanced ECC Mode. The ROM-based Diagnostics Memory Test is
reached through the System Maintenance menu, which is entered by pressing the F10 key at the
prompt late in POST. Using this memory test
prevents the user from having to enter RBSU to disable POST Speed Up, reboot to allow POST to test
the memory, and then enter RBSU once again to re-enable POST Speed Up. However, just as with the
POST memory test, the system must be configured for Advanced ECC to allow the Online Spare bank
to be tested.
Once the memory has been tested, enable Online Spare:
1. At the prompt at the end of POST, press the F9 key to enter RBSU.
2. From the RBSU main menu, select System Options.
3. Using the arrow key, select Advanced Memory Protection.
4. To activate Online Spare Memory, highlight Online Spare and press the Enter key. Online Spare
is selected as soon as Enter is pressed. (The default option is Advanced ECC, providing maximum
memory size for applications that require a large memory footprint.)
5. Press the F10 key to exit RBSU; the server will automatically reboot. Upon reboot, the system
will be in Online Spare mode if supported by the installed DIMM configuration.
As the server reboots subsequent to enabling Online Spare Memory, the following message displays:
Advanced Memory Protection Mode: Online Spare with Advanced ECC xxxxMB System Memory and
xxxxMB memory reserved for Online Spare
Note
RBSU will allow the user to configure an AMP mode that the current
DIMM configuration does not support, but it will display a warning
message when the selection is made. If the user enables an AMP mode not
supported by the current DIMM configuration, the system will boot in
Advanced ECC mode (though the system will still be configured for Online
Spare mode) on the next reboot and an error message will be displayed
during POST indicating that the desired AMP mode is not supported by the
current DIMM configuration.
Online Spare mode does not require operating system support. It can be enabled and function
properly with any operating system. In addition, no special software beyond the System BIOS is
required for the proper function of Online Spare mode. However, if messaging and logging are
desired at the console along with messages in Insight Manager, an operating system must be used
that has system management and agent support for AMP. On ProLiant 300-series G4 servers, these
operating systems include:
• Microsoft® Windows® 2000 Server
• Microsoft Windows 2000 Advanced Server
• Microsoft Windows 2003 Server
• Microsoft Windows 2003 Enterprise Server
• Linux
• Novell NetWare 6.x
Managing failures in a system with online spare enabled
When the correctable error threshold is exceeded for a DIMM with Online Spare enabled, the system
will copy the contents of the failing bank to the online spare bank. The health driver will log a
message to the console and to the event log indicating that a threshold has been exceeded and a
copy has been initiated. If the agents are loaded, Systems Insight Manager will indicate the memory is
in a degraded state. In addition, the following LEDs will indicate that the error has occurred:
• The internal health LED will indicate caution.
• The LED next to the DIMM exceeding the correctable error threshold will illuminate amber.
• The Online Spare Status LED will illuminate amber, indicating an online spare copy has been
initiated.
After the copy is completed, the failing bank will be deactivated and the Online Spare bank will
become active. The health driver will log a message to the console and to the event log to indicate
that the copy is complete.
As mentioned previously, the ProLiant 300-series G4 servers periodically verify the online spare bank
during normal operation. If a potential uncorrectable error in the spare bank is detected, support for
switching over to the spare bank will be disabled. The health driver will log a message to the console
and to the IML. In addition, the internal health LED will indicate a degraded state, the Online Spare
Status LED will illuminate amber, and the DIMM LEDs for the spare DIMMs will illuminate amber. If a
potential uncorrectable error is detected in the online spare bank prior to an online spare switchover,
the system will continue to operate normally, just without the protection of Online Spare mode. The
system will not crash or NMI.
A system in Online Spare mode does not have full protection from uncorrectable errors since Online
Spare does not provide this level of protection. An uncorrectable memory error will result in a system
crash and NMI. However, a system in Online Spare mode has a reduction in the probability of
receiving an uncorrectable error because DIMMs that are exceeding the correctable error threshold,
and thus at higher risk of receiving an uncorrectable error, are deactivated. Once the switchover has
occurred to the spare bank or once support for switching over to the spare bank has been disabled in
the case that the system detects a potential uncorrectable error in the non-active online spare bank,
the system no longer has a spare bank to switch over to in the event that the correctable error
threshold is exceeded on another bank. These events would then be treated as if the system were in
Advanced ECC mode. The health driver would report the event and the failed DIMM’s LED will
illuminate, but no switchover will occur and the failing memory will remain active. The system can be
powered off at the user’s convenience to replace the failed DIMMs. It is important to note that with
each reboot, the system will continue to attempt to boot off the original memory. It is expected in this
case that the memory will fail again at some point and the spare bank will again become active.
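The failure handling described in this section can be modeled as a small state machine. The sketch below is a toy model, not firmware code; the class, method, and message names are hypothetical. It covers three behaviors from the text: a switchover to the spare bank, a spare bank that is disabled after failing its periodic verification, and threshold events that can only be reported once no spare is available.

```python
class OnlineSpareSystem:
    """Toy model of Online Spare bank handling on a G4 system."""

    def __init__(self, banks):
        # The last populated bank is held back as the spare.
        self.active = list(banks[:-1])
        self.spare = banks[-1]
        self.spare_usable = True
        self.log = []

    def spare_check_failed(self):
        """Periodic verification found a potential uncorrectable error in the spare."""
        self.spare_usable = False
        self.log.append("spare bank failed verification; switchover disabled")

    def threshold_exceeded(self, bank):
        """Handle a DIMM in `bank` exceeding the correctable error threshold."""
        if bank not in self.active:
            raise ValueError(f"bank {bank!r} is not active")
        if self.spare is not None and self.spare_usable:
            slot = self.active.index(bank)
            self.active[slot] = self.spare   # contents copied, spare takes over
            self.spare = None
            self.log.append(f"bank {bank} deactivated; spare bank now active")
            return "switched"
        # No usable spare: behave as in Advanced ECC mode and only report.
        self.log.append(f"bank {bank} over threshold; no switchover possible")
        return "reported"


system = OnlineSpareSystem(["A", "B", "C"])
first = system.threshold_exceeded("A")    # spare bank C replaces bank A
second = system.threshold_exceeded("B")   # no spare left, bank B stays active
print(first, second, system.active)       # switched reported ['C', 'B']
```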
Symmetric memory mode
Symmetric Memory mode is a performance enhancement supported with certain DIMM configurations
on the ProLiant 300-series G4 servers. With either four identical dual-rank DIMMs or eight identical
single-rank DIMMs installed in a G4 system configured for Advanced ECC Mode, Symmetric Memory
mode will automatically be enabled. When enabled, the system takes advantage of the particular
DIMM configuration to improve performance. This mode cannot be entered when configured for
Online Spare mode.
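The enabling condition for Symmetric Memory mode can be written as a short predicate. This is a sketch only: the function name and mode strings are hypothetical, and "identical" is approximated here as equal capacity and equal rank count.

```python
def symmetric_mode_enabled(dimms, amp_mode):
    """Return True if the DIMM population qualifies for Symmetric Memory mode.

    dimms: list of (capacity_mb, ranks) tuples for all installed DIMMs.
    amp_mode: the configured AMP mode, e.g. "Advanced ECC" or "Online Spare".
    """
    # Symmetric Memory mode requires Advanced ECC and identical DIMMs.
    if amp_mode != "Advanced ECC" or len(set(dimms)) != 1:
        return False
    ranks = dimms[0][1]
    # Four dual-rank DIMMs or eight single-rank DIMMs qualify.
    return (ranks == 2 and len(dimms) == 4) or (ranks == 1 and len(dimms) == 8)


print(symmetric_mode_enabled([(1024, 2)] * 4, "Advanced ECC"))   # True
print(symmetric_mode_enabled([(512, 1)] * 8, "Advanced ECC"))    # True
print(symmetric_mode_enabled([(1024, 2)] * 4, "Online Spare"))   # False
```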
Memory Scrubbing
What is memory scrubbing?
There are two types of memory scrubbing: demand and background. ProLiant 300-series G4 servers
support both types of memory scrubbing. Previous generations of 300-series platforms typically
supported only demand scrubbing.
Demand scrubbing is a feature that allows the system to differentiate between soft and hard
correctable memory errors. It allows the system to ignore soft errors while still notifying the customer of
a true DIMM failure. When a correctable memory error occurs due to a soft error, correct data is
written back to the memory device. This prevents the same soft error from occurring again in the
future, and thus prevents a soft error from causing a DIMM to exceed the correctable error threshold.
Background scrubbing reduces the chances that multiple soft errors will result in an uncorrectable
memory error. With background scrubbing, the system is constantly correcting any potential soft
errors “in the background.” Without affecting normal system operation, the memory controller will
continuously perform read/write operations on memory correcting any soft errors that may exist in
memory. By doing this, a soft error that would result in a single-bit failure will likely be corrected
before another soft error potentially occurs which might result in multiple bits of corrupted data. If
multiple bits of data are corrupted, an uncorrectable error could result causing the system to crash
(see description of Advanced ECC protection).
See the section below for additional detail on how demand and background scrubbing work.
Detailed description of memory scrubbing
As mentioned previously, Advanced ECC and Standard ECC utilize data and ECC bits to perform
memory error detection and correction. Through the data and ECC bits, the system can correct certain
memory errors. For Standard ECC, the system can correct single-bit errors. For Advanced ECC, the
system can correct single or multi-bit errors as long as all failed bits are on the same DRAM device on
the DIMM. If the ECC bits are correct for the corresponding data, then no error occurs when a
memory read occurs. However, if the data does not match the ECC bits, then an error occurs when a
memory read occurs. In many cases, the proper data can be reconstructed through using the ECC
check bits, resulting in a correctable error. The data and ECC bits are checked on memory reads to
detect and potentially correct errors. The system writes correct data and ECC bits on memory writes.
Also mentioned previously, soft errors are errors that occur in memory but which do not indicate a
hardware problem. A cosmic ray hitting the DRAM device on a DIMM can in rare cases cause one or
more bits to change states. This can result in a soft correctable or soft uncorrectable error. It is
extremely rare for multiple soft errors to occur in the same memory location and produce an
uncorrectable error. Since soft memory errors don’t indicate a problem with the
hardware, and are simply the result of a bit in memory changing state due to cosmic rays, the
memory error will persist only until the data and ECC bits are properly written back to the DIMM.
Since the data and ECC bits are written to the DIMM on memory writes and checked on memory
reads, a soft error could result in multiple correctable memory errors if the processor
continually read a memory location containing a soft memory error. If a write to that memory location
occurred, the error would disappear. However, after the soft error results in the data and ECC bits
being out of sync, every read to that memory location would result in a correctable error until a
write to that memory location occurred. This could cause a soft error to push a DIMM past
the correctable error threshold.
Memory scrubbing is a method of solving this problem. There are two types of memory scrubbing
supported by the ProLiant 300-series G4 platforms. The G4 systems and previous generations have
supported something known as demand scrubbing. The G4 systems are the first ProLiant servers to
support what is known as background or patrol scrubbing.
Demand scrubbing solves the problem of obtaining multiple correctable errors due to a single soft
error, and thus the problem of potentially reporting a correctable threshold error due to soft errors.
Whenever the system detects a correctable error, the system will correct the data and pass the data to
the requester, whether that be the processor or a DMA capable device. With demand scrubbing, the
correct data and ECC check bits will also be written back to memory. In other words, when the system
detects a correctable error via the data and ECC bits, it writes back the proper data and ECC bits to
memory. Thus, subsequent reads of the same memory location will not result in a correctable error if
the error was simply a soft error. If there was a hard error and something actually wrong with the
DIMM, writing the correct data and ECC bits back to memory would typically not correct the problem,
and additional correctable errors will occur on subsequent reads.
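The effect of demand scrubbing on the correctable error count can be shown with a minimal simulation. The model is deliberately simple and all names are hypothetical: a location is either healthy, carries a transient soft flip that any write-back clears, or has a hard fault that no write-back can clear.

```python
class MemoryLocation:
    """One memory location that may hold a soft flip or a hard fault."""

    def __init__(self, hard_fault=False):
        self.hard_fault = hard_fault   # a defect that persists across writes
        self.soft_flip = False         # a transient flip, cleared by any write

    def write_back(self):
        # Rewriting correct data and ECC bits repairs a soft flip only.
        self.soft_flip = False

    def read(self, demand_scrub):
        """Return True if this read produced a correctable error."""
        error = self.hard_fault or self.soft_flip
        if error and demand_scrub:
            self.write_back()
        return error


def count_errors(location, reads, demand_scrub):
    """Read the location repeatedly and count the correctable errors seen."""
    return sum(location.read(demand_scrub) for _ in range(reads))


soft = MemoryLocation()
soft.soft_flip = True
without_scrub = count_errors(soft, reads=100, demand_scrub=False)

soft = MemoryLocation()
soft.soft_flip = True
with_scrub = count_errors(soft, reads=100, demand_scrub=True)

hard = MemoryLocation(hard_fault=True)
hard_with_scrub = count_errors(hard, reads=100, demand_scrub=True)

print(without_scrub, with_scrub, hard_with_scrub)   # 100 1 100
```

With demand scrubbing, the soft error is counted once and then disappears, while the hard error keeps producing errors and can still drive the DIMM over the correctable error threshold.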
Background scrubbing (also known as patrol scrubbing) is a very similar process. Instead of only
reading the data and ECC bits, correcting them, and writing them back to memory when a
correctable memory error occurs, the system will constantly be reading and writing memory locations.
Thus, the system will be constantly scrubbing all of the contents of memory in an effort to correct soft
errors before a correctable error even occurs. Even if a particular section of memory is not being
accessed by software or DMA capable devices, background scrubbing will correct any soft errors that
exist in the memory. Background scrubbing occurs at a very slow rate and only when the memory bus
is available. Thus, the memory accesses due to background scrubbing do not affect normal system
operation or system performance. If background scrubbing detects an uncorrectable memory error, it
does not cause the system to crash or result in an NMI.
Background scrubbing serves two purposes. First, it reduces the chances of the system receiving a
correctable error on memory reads initiated by software or DMA-capable devices. While demand
scrubbing prevents multiple correctable errors due to a soft error, one correctable error will occur on
the initial memory read access. If a soft error occurs in memory (i.e., if a cosmic ray inverts a bit on a
DRAM device), the background scrub may correct the error before any normal memory read occurs to
the memory location that had been affected. Second and more importantly, background scrubbing
reduces the chances of an uncorrectable error occurring due to a soft error. Although rare, it is
possible that a portion of memory that is not being accessed for a long time could have multiple bit
positions inverted by cosmic rays. For instance, if a bit in memory is inverted by a cosmic ray, but the
memory is never read or written for a relatively long period of time, this would leave a window where
an additional bit in the same memory location could be inverted by cosmic rays. In this case, multiple
bits in the memory location could be inverted, which could potentially result in a system crash and
NMI when the memory is read (in Advanced ECC, the system crash would occur if both inverted bits
were not in the same DRAM device on the DIMM).
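The second benefit can be illustrated by counting how many soft bit flips accumulate in an idle memory location with and without a patrol scrubber. The sketch below is a simplified model with hypothetical names and made-up event times; in particular, it assumes the scrubber finishes a pass over all of memory at each multiple of a fixed period, which is not how a real memory controller schedules its work.

```python
def worst_accumulation(flip_events, scrub_period=None):
    """Return the largest number of soft flips that pile up in one location.

    flip_events: list of (time, address) soft-error events on idle memory.
    scrub_period: a full scrub pass completes at each multiple of this
                  period; None disables background scrubbing.
    """
    pending = {}      # address -> flips not yet scrubbed away
    worst = 0
    passes_done = 0
    for time, address in sorted(flip_events):
        if scrub_period is not None and time // scrub_period > passes_done:
            pending.clear()                       # a scrub pass has completed
            passes_done = time // scrub_period
        pending[address] = pending.get(address, 0) + 1
        worst = max(worst, pending[address])
    return worst


# Two flips hit the same idle address far apart in time (illustrative values).
events = [(10, 0x100), (250, 0x100), (300, 0x200)]

print(worst_accumulation(events))                     # 2
print(worst_accumulation(events, scrub_period=100))   # 1
```

A result of 2 means two inverted bits coexist in one location, which is the situation that can produce an uncorrectable error on the next read; a result of 1 means every flip was still a correctable single-bit error when it was scrubbed.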