Abstract
The widening performance gap between processors and memory, along with the growth of memory-intensive business applications, is driving the need for better memory technologies for servers and
workstations. Consequently, there are several memory technologies on the market at any given time.
HP evaluates developing memory technologies in terms of price, performance, and backward
compatibility and implements the most promising technologies in ProLiant servers. HP is committed to
providing customers with the most reliable memory at the lowest possible cost.
This paper summarizes the evolution of memory technology and provides an overview of some of the
newest memory technologies that HP is evaluating for servers and workstations. The purpose is to
allay some of the confusion about the performance and benefits of the dynamic random access
memory (DRAM) technologies on the market.
Introduction
Processors use system memory to temporarily store the operating system, mission-critical applications,
and the data they use and manipulate. Therefore, the performance of the applications and reliability
of the data are intrinsically tied to the speed and bandwidth of the system memory. Over the years,
these factors have driven the evolution of system memory from asynchronous DRAM technologies,
such as Fast Page Mode (FPM) memory and Extended Data Out (EDO) memory, to high-bandwidth
synchronous DRAM (SDRAM) technologies. Yet, system memory bandwidth has not kept pace with
improvements in processor performance, thus creating a “performance gap.” Processor performance,
which is often equated to the number of transistors in a chip, doubles every couple of years. On the
other hand, memory bandwidth doubles roughly every three years. Therefore, if processor and
memory performance continue to increase at these rates, the performance gap between them will
widen.
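The doubling rates above can be turned into a quick back-of-the-envelope calculation. This is a minimal sketch using the illustrative growth rates from the text, not HP measurements:

```python
def relative_performance(years, doubling_period):
    """Performance relative to today, assuming it doubles every `doubling_period` years."""
    return 2 ** (years / doubling_period)

def performance_gap(years, cpu_doubling=2, mem_doubling=3):
    """Ratio of processor growth to memory-bandwidth growth after `years` years."""
    return relative_performance(years, cpu_doubling) / relative_performance(years, mem_doubling)

# After 6 years: processors improve 8x, memory bandwidth only 4x,
# so the performance gap between them has doubled.
print(performance_gap(6))  # → 2.0
```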
Why is the processor-memory performance gap important? The processor is forced to idle while it
waits for data from system memory. Thus, the performance gap prevents many applications from
effectively using the full computing power of modern processors. In an attempt to narrow the
performance gap, the industry vigorously pursues the development of new memory technologies. HP
works with Joint Electron Device Engineering Council (JEDEC) memory vendors and chipset
developers during memory technology development to ensure that new memory products fulfill
customer needs with regard to reliability, cost, and backward compatibility.
This paper describes the benefits and drawbacks regarding price, performance, and compatibility of
DRAM technologies. Some descriptions are very technical. For readers who are not familiar with
memory technology, the paper begins with a description of basic DRAM operation and terminology.
Basic DRAM operation
Before a computer can perform any useful task, it copies applications and data from the hard disk
drive to the system memory. Computers use two types of system memory—cache memory and main
memory. Cache memory consists of very fast static RAM (SRAM) and is usually integrated with the
processor. Main memory consists of DRAM chips that can be packaged in a variety of ways on dual
inline memory modules (DIMMs) for the notebook, desktop PC, and server markets.
Each DRAM chip contains millions of memory locations, or cells, which are arranged in a matrix of
rows and columns (Figure 1). On the periphery of the array of memory cells are transistors that read,
amplify, and transfer the data from the memory cells to the memory bus. Each DRAM row, called a
page, consists of several DRAM cells. Each DRAM cell on a page contains a capacitor capable of
storing an electrical charge for a very short time. A charged cell represents a “1” data bit, and an
uncharged cell represents a “0” data bit. The capacitors discharge over time so they must be
recharged, or refreshed, thousands of times per second to maintain the validity of the data. These
refresh mechanisms are described later in this section.
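The charge-and-refresh behavior of a single cell can be sketched in miniature. The retention window below is a hypothetical number chosen for illustration; real retention times are measured in milliseconds:

```python
class DRAMCell:
    """Toy model of one DRAM cell: a capacitor whose charge leaks away unless refreshed."""
    RETENTION_TICKS = 64  # hypothetical retention window, for illustration only

    def __init__(self):
        self.charged = False          # charged = "1" data bit, uncharged = "0"
        self.ticks_since_refresh = 0

    def write(self, bit):
        self.charged = bool(bit)
        self.ticks_since_refresh = 0

    def tick(self):
        """One unit of time passes; beyond the retention window the charge is lost."""
        self.ticks_since_refresh += 1
        if self.ticks_since_refresh > self.RETENTION_TICKS:
            self.charged = False      # data decays without a refresh

    def refresh(self):
        """Re-drive the stored value, restarting the retention window."""
        self.ticks_since_refresh = 0

cell = DRAMCell()
cell.write(1)
for _ in range(100):   # too long without a refresh: the "1" decays to "0"
    cell.tick()
print(cell.charged)    # → False
```

Refreshing the cell periodically, as the memory controller does thousands of times per second, keeps the stored bit valid indefinitely.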
Figure 1. Representation of a single DRAM chip on a DIMM
The memory subsystem operates at the memory bus speed. Typically, a DRAM cell is accessed when
the memory controller sends electronic address signals that specify the row address and column
address of the target cell. The memory controller sends these signals to the DRAM chip by way of the
memory bus. The memory bus consists of two sub-buses: the address/command bus and the data bus.
The data bus is a set of lines (traces) that carry the data to and from DRAM. Each trace carries one
data bit at a time. The throughput (bandwidth) of the data bus depends on its width (in bits) and its
frequency. The data width of a memory bus is usually 64-bits, which means that the bus has 64
traces, each of which transports one bit at a time. Each 64-bit unit of data is called a data word.
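Peak data-bus throughput follows directly from the width and frequency figures above. A minimal sketch, assuming one transfer per clock:

```python
def peak_bandwidth_mb_per_s(bus_width_bits, bus_mhz, transfers_per_clock=1):
    """Peak data-bus throughput in MB/s: width in bytes x frequency x transfers per clock."""
    return (bus_width_bits // 8) * bus_mhz * transfers_per_clock

# A 64-bit bus at 100 MHz moves one 8-byte data word per clock: 800 MB/s peak.
print(peak_bandwidth_mb_per_s(64, 100))  # → 800
```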
The address portion of the address/command bus is a set of traces that carry signals identifying the
location of data in memory. The command portion of the address/command bus conveys instructions
such as read, write, or refresh.
When FPM or EDO memory writes data to a particular cell, the location where the data will be
written is selected by the memory controller. The memory controller first selects the page by strobing
the Row Address onto the address/command bus. It then selects the exact location by strobing the
Column Address onto the address/command bus (see Figure 2). These actions are called Row
Address Strobe (RAS) and Column Address Strobe (CAS). The Write Enable (WE) signal is activated
at the same time as the CAS to specify that a write operation is to be performed. The memory
controller then drives the data onto the memory bus. The DRAM devices latch the data and store it
into the respective cells.
During a DRAM read operation, RAS followed by CAS are driven onto the memory bus. The WE
signal is held inactive, indicating a read operation. After a delay called CAS Latency, the DRAM
devices drive the data onto the memory bus.
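The RAS/CAS sequence for reads and writes can be sketched as a toy memory controller. This is a simplified model that ignores real signal timing and the CAS Latency delay on reads; the class and method names are invented for illustration:

```python
class ToyDRAM:
    """Toy model of FPM/EDO-style access: RAS opens a page, CAS selects the column."""
    def __init__(self, rows, cols):
        self.cells = [[0] * cols for _ in range(rows)]
        self.open_row = None

    def ras(self, row):
        """Row Address Strobe: open the page (row) to be accessed."""
        self.open_row = row

    def cas(self, col, we=False, data=None):
        """Column Address Strobe. With Write Enable (WE) active this is a write;
        with WE inactive it is a read (real DRAM drives the data onto the bus
        only after the CAS Latency delay)."""
        assert self.open_row is not None, "RAS must precede CAS"
        if we:
            self.cells[self.open_row][col] = data
            return None
        return self.cells[self.open_row][col]

dram = ToyDRAM(rows=4, cols=8)
dram.ras(2)                   # strobe the row address onto the bus
dram.cas(5, we=True, data=1)  # strobe the column address with WE active: write
print(dram.cas(5))            # WE inactive: read the bit back → 1
```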
While DRAM is being refreshed, it cannot be accessed. If the processor makes a data request while
the DRAM is being refreshed, the data will not be available until after the refresh is complete. There
are many mechanisms to refresh DRAM, including RAS only refresh, CAS before RAS (CBR) refresh,
and Hidden refresh. CBR, which involves driving CAS active before driving RAS active, is used most
often.
Figure 2. Representation of a write operation for FPM or EDO RAM
DRAM storage density and power consumption
The storage capacity (density) of DRAM is inversely proportional to the cell geometry. In other words,
storage density increases as cell geometry shrinks. Over the past few years, improvements in DRAM
storage density have increased capacity from 1 kilobit (Kb) per chip to 2 gigabits (Gb) per
chip. In the near future, it is expected that capacity will increase even further to 4 Gb per chip.
The industry-standard operating voltage for computer memory components was originally at 5 volts.
However, as cell geometries decreased, memory circuitry became smaller and more sensitive.
Likewise, the industry-standard operating voltage has decreased. Today, computer memory
components operate at 1.8 volts, which allows them to run faster and consume less power.
Memory access time
The length of time it takes for DRAM to produce the data, from the CAS signal until the data is
available on the data bus, is called the memory access time or CAS Latency. Memory access time is
measured in billionths of a second (nanoseconds, ns) for asynchronous DRAM. For synchronous
DRAM, the time is converted to number of memory bus clocks.
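Converting between the two conventions is simple arithmetic. A minimal sketch with illustrative values:

```python
import math

def cas_latency_clocks(access_time_ns, bus_mhz):
    """Convert an asynchronous access time in ns to whole memory-bus clocks (rounded up)."""
    clock_period_ns = 1000.0 / bus_mhz   # e.g. 100 MHz -> 10 ns per clock
    return math.ceil(access_time_ns / clock_period_ns)

# A 20 ns access time on a 100 MHz memory bus corresponds to a CAS Latency of 2 clocks.
print(cas_latency_clocks(20, 100))  # → 2
```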
Chipsets and system bus timing
All computer components that execute instructions or transfer data are controlled by a system bus
clock. The system chipset controls the speed, or frequency, of the system bus clock and thus regulates
the traffic between the processor, main memory, PCI bus, and other peripheral buses.
The bus clock is an electronic signal that alternates between two voltages (designated as “0” and “1”
in Figure 3) at a specific frequency. The bus frequency is measured in millions of cycles per second,
or megahertz (MHz). During each clock cycle, the voltage signal transitions from "0" to "1" and back
to "0". A complete clock cycle is measured from one rising edge to the next rising edge. Data transfer
along the memory bus can be triggered on either the rising edge or falling edge of the clock signal.
Figure 3. Representation of a bus clock signal
Over the years, some computer components have gained speed faster than others. For this
reason, the components in a typical server are controlled by different clocks that run at different, but
related, speeds. These clocks are created by using various clock multiplier and divider circuits to
generate multiple signals based on the main system bus clock. For example, if the main system bus
operates at 100 MHz, a divider circuit can generate a PCI bus frequency of 33 MHz (system clock ÷
3) and a multiplier circuit can generate a processor frequency of 400 MHz (system clock x 4).
Computer components that operate in whole multiples of the system clock are termed synchronous
because they are “in sync” with the system clock.
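The divider and multiplier arithmetic from the example can be written out directly:

```python
def derived_clock_mhz(system_mhz, multiplier=1, divider=1):
    """Derive a component clock from the system bus clock via multiplier/divider circuits."""
    return system_mhz * multiplier / divider

system_clock = 100  # MHz main system bus
pci_clock = derived_clock_mhz(system_clock, divider=3)     # divider: 100 / 3 ≈ 33 MHz
cpu_clock = derived_clock_mhz(system_clock, multiplier=4)  # multiplier: 100 x 4 = 400 MHz
print(round(pci_clock), int(cpu_clock))  # → 33 400
```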
Synchronous components operate more efficiently than components that are not synchronized
(asynchronous) with the system bus clock. With asynchronous components, either the rest of the
system or the component itself must wait one or more additional clock cycles for data or instructions
due to clock resynchronization. In contrast, synchronized components know on which clock cycle data
will be available, thus eliminating these timing delays.
Memory bus speed
The speed of the DRAM is not the same as the true speed (or frequency) of the overall memory
subsystem. The memory subsystem operates at the memory bus speed, which may not be the same
frequency (in MHz) as the main system bus clock. The two main factors that control the speed of the
memory subsystem are the memory timing and the maximum DRAM speed.
Burst mode access
The original DRAM took approximately six system bus clock cycles for each memory access. During
memory access, the RAS and CAS were sent first and then 64 bits of data were transferred through
the memory bus. The next sequential address access required a repeat of the RAS-CAS-Data
sequence. As a result, most of the overhead occurred while transferring row and column addresses,
rather than the data.
FPM and EDO improved performance by automatically retrieving data from sequential memory
locations on the assumption that they too will be requested. Using this process called burst mode
access, four consecutive 64-bit sections of memory are accessed, one after the other, based on the
address of the first section. So instead of taking six clock cycles to access each of the last three 64-bit
sections, it may take from one to three clock cycles each (see Figure 4).
Burst mode access timing is normally stated in the format “x-y-y-y” where “x” represents the number of
clock cycles to read/write the first 64 bits and “y” represents the number of clock cycles required for
the second, third, and fourth reads/writes. For example, prior to burst mode access, DRAM took up to
24 clock cycles (6-6-6-6) to access four 64-bit memory sections. With burst mode access, three
additional data sections are accessed with every clock cycle after the first access (6-1-1-1) before the
memory controller has to send another CAS.
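Total clock counts for an "x-y-y-y" timing pattern can be tallied directly:

```python
def burst_cycles(pattern):
    """Total clocks to transfer four 64-bit data words, given an 'x-y-y-y' timing string."""
    return sum(int(part) for part in pattern.split("-"))

# Without burst mode, each access repeats the full RAS-CAS-Data sequence;
# with it, the three follow-on words cost as little as one clock each.
print(burst_cycles("6-6-6-6"))  # → 24
print(burst_cycles("6-1-1-1"))  # → 9
```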
Figure 4. Burst mode access. NOP is a “No Operation” instruction.
SDRAM technology
FPM and EDO DRAMs are controlled asynchronously, that is, without a memory bus clock. The
memory controller determines when to assert signals and when to expect data based on absolute
timing. The inefficiencies of transferring data between a synchronous system bus and an
asynchronous memory bus result in longer latency.
Consequently, JEDEC—the electronics industry standards agency for memory devices and modules—
developed the synchronous DRAM standard to reduce the number of system clock cycles required to
read or write data. SDRAM uses a memory bus clock to synchronize the input and output signals on
the memory chip. This simplified the memory controller and reduced the latency from CPU to memory.
In addition to synchronous operation and burst mode access, SDRAM has other features that
accelerate data retrieval and increase memory capacity—multiple memory banks, greater bandwidth,
and register logic chips. Figure 5 shows SDRAM DIMMs with two key notches that prevent incorrect
insertion and indicate a particular feature of the module.
Figure 5. SDRAM DIMM with two notches