HP DL585 Introduction Manual


The AMD processor roadmap for industry-standard servers

technology brief, 6th Edition

Abstract
Introduction
X86 architecture
  32-bit operations
  AMD64 technology
    Instruction set and registers
    Operating modes
    Memory addressability
Naming conventions
Direct Connect I/O Architecture
  Integrated memory controller and dedicated memory banks
  HyperTransport Technology
Multi-core technologies
  Dual-core Revision F processors
  Quad-Core AMD Opteron processors
    AMD Smart Fetch Technology
    Enhanced AMD PowerNow! Technology
    Rapid Virtualization Indexing
    Average CPU Power metric
    Independent and combined memory channel modes
  Six-Core AMD Opteron processors
    HT Assist
Future AMD Opteron processors
Software licensing
Conclusion
For more information
Call to action


Abstract

Since 1996, HP and AMD have collaborated to provide high-performance, energy-efficient solutions that deliver quality, variety, and value in industry-standard servers. This collaboration includes the adoption of the latest multi-core AMD Opteron™ processors. This technology brief discusses current and near-future AMD Opteron processors and the evolving AMD Opteron processor microarchitecture.

Introduction

The AMD Opteron family of processors is AMD’s offering for the industry-standard server market. HP is providing AMD processors in the ProLiant server product line to offer enterprise customers expanded options for improved performance while maintaining cost-effective infrastructures.

AMD Opteron processors are based on the X86 architecture with AMD64 technology and feature an integrated memory controller and the Direct Connect I/O Architecture, which uses HyperTransport™ technology. In addition, AMD multi-core processors feature AMD Virtualization™ and AMD PowerNow!™ technologies.

X86 architecture

AMD Opteron processors adhere to the x86 instruction set architecture to be compatible with the wealth of 32-bit software applications available. In other words, the interface that the AMD Opteron processor presents to software (memory addressing size, instruction sets, and register designs) remains the same as that of the x86 architecture.

32-bit operations

A 32-bit processor has general-purpose registers (GPRs) that are 32 bits wide and can operate on an integer data stream that is 32 bits wide. In addition, a 32-bit processor can hold 32 bits of memory address data in a single register, for a maximum of 4 GB of addressable memory.

The x86 architecture supports Physical Address Extension (PAE), which extends addressing to 36 bits for a maximum of 64 GB of physically addressable memory. However, the OS and applications must be written to take advantage of the additional memory addressing.

As shown in Table 1, the x86, 32-bit instruction set of the AMD Opteron family of processors includes the following:

• Standard x86 instructions, which are general-purpose arithmetic functions

• Single Instruction, Multiple Data (SIMD) instructions, which let one command work simultaneously on multiple data items. This includes Streaming SIMD Extensions (SSE), SSE2, SSE3, and SSE4a instructions.

• x87 floating point instructions

AMD Opteron processors support 32-bit addressing as well as the 36-bit PAE.


Table 1. 32-bit x86 instructions common to AMD processors

Instruction name | Description | Register type | Size of registers | Number of registers
Standard x86 | Instructions for logical and arithmetic operations and address calculations; includes 16-bit index registers for memory pointers | GPR | 32-bit | 8
MMX | Multimedia instructions that allow the processor to do 64-bit SIMD operations | MMX | 64-bit | 8
x87 | Instructions for floating point calculations | FP | 80-bit* | 8
SSE, SSE2, SSE3, and SSE4a | SSE improved upon the MMX instructions and allowed processors to do 128-bit SIMD floating-point operations. SSE2 added 64-bit parallel floating point numeric support and new instructions to support 128-bit SIMD integer operations. SSE3 includes 13 instructions that accelerate performance of SSE technology, SSE2 technology, and x87 floating-point math capabilities. SSE4a includes two new SSE instructions and adds support for unaligned SSE load operations, which formerly required 16-byte alignment. | XMM | 128-bit | 8

* According to the article “An Introduction to 64-bit Computing and x86-64” by Jon Stokes [1], the “x87 uses 80-bit registers to do double-precision floating point. The floats themselves are 64-bit, but the processor converts them to an internal, 80-bit format for increased precision when doing computations.”

AMD Opteron processors support the AMD 3DNow!™ instruction set, AMD’s version of multimedia instructions. The 3DNow! set added SIMD instructions to improve the vector-processing (floating point) performance of graphic-intensive and multimedia applications.

AMD64 technology

Introduced in 2003, AMD64 technology is the AMD microarchitecture and instruction set that provides full support for 64-bit operating systems and applications. The most important feature of AMD64 is support for very large virtual and physical memory in a flat address space.

Instruction set and registers

AMD64 instructions can take advantage of the 64-bit wide registers in AMD Opteron processors. To support the AMD64 instructions, the registers expand to include the following:

• Eight new 64-bit GPRs

• Extensions of the eight original 32-bit GPRs to 64 bits

• Eight new 128-bit registers for SSE, SSE2, and SSE3 instructions

These registers are used by the applications only when running the processors in 64-bit long mode.

[1] Available at http://arstechnica.com/cpu/03q1/x86-64/x86-64-1.html


Operating modes

AMD Opteron processors use three different operating modes: 64-bit long mode, 64-bit compatibility mode, and 32-bit legacy mode. The 64-bit long mode requires a 64-bit OS and an application recompiled to use the 64-bit registers. In other words, the full capabilities of the expanded register set are available only when both the OS and the application support 64 bits. The 64-bit compatibility mode requires a 64-bit OS, but can use a 32-bit application. The additional registers are available to the OS, but not to the 32-bit application, because it cannot make use of them. When running in legacy mode, the processor acts just like a 32-bit processor, and the extra registers are not available (Table 2).

Table 2. Operating modes for AMD Opteron processors [2]

Mode | OS required | Application recompile required? | Additional registers available? | GPR width (bits)
64-bit long mode | 64-bit OS | Yes | Yes | 64
64-bit compatibility mode | 64-bit OS | No | Yes (to OS); No (to application) | 32
32-bit legacy mode | 32-bit OS | No | No | 32
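
The mode rules described above can be summarized as a small lookup. The following Python sketch is only an illustration of the text: the names `OperatingMode` and `select_mode` are hypothetical and do not come from AMD or HP documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingMode:
    name: str
    os_bits: int                   # width of the OS the mode requires
    recompile_required: bool       # must the application be rebuilt for 64 bits?
    extra_registers_for_app: bool  # can the application use the added registers?


LONG_MODE = OperatingMode("64-bit long mode", 64, True, True)
COMPATIBILITY_MODE = OperatingMode("64-bit compatibility mode", 64, False, False)
LEGACY_MODE = OperatingMode("32-bit legacy mode", 32, False, False)


def select_mode(os_bits: int, app_bits: int) -> OperatingMode:
    """Return the mode implied by the OS width and the application width."""
    if os_bits not in (32, 64) or app_bits not in (32, 64):
        raise ValueError("os_bits and app_bits must each be 32 or 64")
    if os_bits == 32:
        if app_bits == 64:
            raise ValueError("a 64-bit application needs a 64-bit OS")
        return LEGACY_MODE
    # A 64-bit OS runs recompiled applications in long mode and
    # unmodified 32-bit applications in compatibility mode.
    return LONG_MODE if app_bits == 64 else COMPATIBILITY_MODE
```

For example, `select_mode(64, 32)` returns the compatibility-mode entry, in which the application cannot use the additional registers.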

Memory addressability

The AMD Opteron registers are at least 64 bits wide. When operating in 64-bit long mode, the AMD Opteron processors support up to 48 bits (256 terabytes) for physical memory and use 64 bits for virtual memory.
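
The capacities quoted in this brief (4 GB for 32 bits, 64 GB for 36-bit PAE, 256 terabytes for 48 bits) all follow from raising 2 to the number of address bits. The short Python sketch below checks that arithmetic; the helper names are illustrative, and the units are binary (1 GB = 2^30 bytes), as in the text.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for a given address width."""
    if address_bits <= 0:
        raise ValueError("address_bits must be positive")
    return 2 ** address_bits


def format_capacity(num_bytes: int) -> str:
    """Express a power-of-two byte count in the largest whole binary unit."""
    units = ["bytes", "KB", "MB", "GB", "TB"]
    value, index = num_bytes, 0
    while value >= 1024 and value % 1024 == 0 and index < len(units) - 1:
        value //= 1024
        index += 1
    return f"{value} {units[index]}"


for bits in (32, 36, 48):
    print(bits, "bits ->", format_capacity(addressable_bytes(bits)))
# 32 bits -> 4 GB
# 36 bits -> 64 GB
# 48 bits -> 256 TB
```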

Naming conventions

First-generation single-core AMD Opteron processors (Socket 940 and Socket 939) have three-digit model numbers in the form XZZ, and third-generation Quad-Core AMD Opteron processors (Socket F and Socket AM2) have four-digit model numbers XYZZ. AMD Opteron processor “generations” are called Revisions.

For all AMD Opteron processors, the first digit “X” specifies the number of CPUs on the target machine:

• 1000 Series - Single-processor systems

• 2000 Series - Dual-processor systems

• 8000 Series - Systems with up to 8 processors

The second digit, Y, indicates socket generation, where “2” indicates Socket AM2 or Socket F (1207). Series 12ZZ processors are based on Socket AM2; Series 22ZZ and 82ZZ processors are based on Socket F (1207). If the second digit is “3,” it stands for third-generation AMD Opteron processors for Socket AM2 and Socket F (1207). If the second digit is “4,” it indicates Six-Core AMD Opteron processors.

[2] From the document titled “AMD64 Architecture Programmer’s Manual, Vol. 1: Application Programming,” available at www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24592.pdf


The last two digits, ZZ, indicate the relative performance within the series. Higher numbers indicate higher performance.

In addition, the model number can include a suffix designator to indicate a non-standard power level. HE designates a lower power version, and SE a higher power version. For example, Model 2220, Model 2220 HE, and Model 2220 SE all offer equivalent performance, but differ in power consumption.

The AMD website includes a quick reference guide that details each processor part number by socket, revision (stepping), core frequency, manufacturing process (45 nm, 65 nm, or 90 nm), HyperTransport frequency, and wattage.
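
As an illustration of the naming convention, the following Python sketch decodes a model string such as "2220 SE" using only the rules stated in this section. The function `decode_model` and the wording of its lookup tables are hypothetical; this is not an AMD tool, and it does not validate that a given model number actually exists.

```python
import re
from typing import NamedTuple, Optional


class OpteronModel(NamedTuple):
    series: str
    socket_generation: Optional[str]  # None for three-digit (XZZ) models
    relative_performance: int         # the ZZ digits; higher means faster
    power_variant: str


SERIES = {
    "1": "single-processor systems",
    "2": "dual-processor systems",
    "8": "systems with up to 8 processors",
}
GENERATION = {
    "2": "Socket AM2 or Socket F (1207)",
    "3": "third-generation, Socket AM2 or Socket F (1207)",
    "4": "Six-Core AMD Opteron",
}
POWER = {"": "standard power", "HE": "lower power", "SE": "higher power"}


def decode_model(model: str) -> OpteronModel:
    """Split a model number (XZZ or XYZZ, optional HE/SE suffix) into its parts."""
    match = re.fullmatch(r"(\d{3,4})(?:\s+(HE|SE))?", model.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized model number: {model!r}")
    digits, suffix = match.group(1), match.group(2) or ""
    if digits[0] not in SERIES:
        raise ValueError(f"unknown series digit in {model!r}")
    generation = None
    if len(digits) == 4:
        # Second digit Y identifies the socket generation.
        generation = GENERATION.get(digits[1], f"unknown (second digit {digits[1]})")
    return OpteronModel(SERIES[digits[0]], generation, int(digits[-2:]), POWER[suffix])
```

For example, `decode_model("2220 SE")` reports a dual-processor part for Socket AM2 or Socket F (1207), relative performance 20, in the higher power variant.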

Direct Connect I/O Architecture

The AMD Direct Connect I/O Architecture replaces the traditional front side bus with point-to-point HyperTransport Technology links and an integrated memory controller connected to dedicated memory banks for each processor.

Integrated memory controller and dedicated memory banks

Each AMD Opteron processor contains an integrated dual-channel SDRAM memory controller that is directly connected to dedicated memory banks. Integrating the controller into the processor means that memory performance can scale linearly based on the number of processors in a multiprocessor system. For example, in a multi-processor system, the integrated memory controller allows for multiple memory requests in parallel, thereby increasing the effective memory bandwidth and decreasing average memory latency.

The memory controller operates at a frequency independent of, and usually slower than, the processor core. It has a 128-bit interface that is capable of supporting up to eight DDR2 DIMMs. With four DDR2-800 DIMMs per channel, the memory bandwidth is up to 12.8 GB/s. The 128-bit interface can be divided into two independent 64-bit memory channels for better memory controller utilization and memory performance.
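
The 12.8 GB/s figure follows from the transfer rate and the interface width. The Python sketch below reproduces the arithmetic, assuming that "DDR2-800" means 800 million transfers per second and that GB/s is decimal (10^9 bytes per second); the function name is illustrative.

```python
def peak_bandwidth_gb_s(megatransfers_per_s: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s for a given transfer rate and bus width."""
    bytes_per_transfer = bus_width_bits / 8
    return megatransfers_per_s * 1e6 * bytes_per_transfer / 1e9


# Full 128-bit interface with DDR2-800 memory.
print(peak_bandwidth_gb_s(800, 128))  # 12.8
# One of the two independent 64-bit channels.
print(peak_bandwidth_gb_s(800, 64))   # 6.4
```

These are theoretical peaks; sustained bandwidth in a real system is lower.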

The AMD quick reference guide is available at http://www.amdcompare.com/us-en/AMD Opteron/


HyperTransport Technology

HyperTransport is a point-to-point interconnect with two unidirectional links (see Figure 1) that directly connect the processors to each other and connect each processor to its dedicated memory banks, as well as to other I/O chipsets. Compared to a shared, parallel front-side bus, HyperTransport has the advantages of no overhead for bus arbitration and easier signal integrity maintenance, resulting in a scalable, high-bandwidth architecture.

Each 16-bit (2-byte) HyperTransport link is double-pumped, performing two data transfers per clock cycle. From HyperTransport 1.0 in 2001 to HyperTransport 3.0 in 2008, the maximum clock speed and transfer rate increased from 800 MHz (1,600 MT/s, or 1.6 GT/s) to a maximum of 2.4 GHz (4.8 GT/s) in each direction. This gives each HyperTransport 3.0 link a maximum data rate of 4.8 GT/s × 2 bytes per transfer, or 9.6 GB/s in each direction (19.2 GB/s aggregate data rate).

Figure 1. The HyperTransport interconnect separates memory and I/O traffic and directly attaches memory to each processor, allowing memory capacity to scale with the number of processors.

HyperTransport Technology was invented at AMD with contributions from industry partners and is managed and licensed by the HyperTransport Technology Consortium, a Texas non-profit corporation.

MT/s, or megatransfers per second, equals the speed of the link in millions of cycles per second times the number of transfers per cycle.
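
The link data rates above can be derived from the clock speed. The following Python sketch applies the stated rules (two transfers per clock, 2 bytes per transfer, two unidirectional links); the function name and its defaults are illustrative.

```python
def ht_link_rates(clock_mhz: float, width_bits: int = 16, transfers_per_clock: int = 2):
    """Return (GT/s, GB/s per direction, GB/s aggregate) for one HyperTransport link."""
    gigatransfers = clock_mhz * transfers_per_clock / 1000
    per_direction = gigatransfers * width_bits / 8
    aggregate = 2 * per_direction  # two unidirectional links, one each way
    return gigatransfers, per_direction, aggregate


# 2.4 GHz link: about 4.8 GT/s, 9.6 GB/s per direction, 19.2 GB/s aggregate.
print(ht_link_rates(2400))
# 800 MHz link: about 1.6 GT/s, 3.2 GB/s per direction, 6.4 GB/s aggregate.
print(ht_link_rates(800))
```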


Multi-core technologies

In the past, the most common way to improve processor performance was to increase core frequency and/or cache size. However, both of these solutions increase power consumption (and heat generation) and have other limitations. Alternatively, higher performance can be achieved by using multiple execution cores per processor. Multi-core processors run applications more efficiently and allow multi-threaded software to achieve higher performance, while maintaining a similar power budget to single-core processors. Also, multi-core processors are increasingly attractive with reductions in the manufacturing process (for example, from 90 nm to 65 nm to 45 nm). This is because smaller cores require less power, which permits more cores to be built into a single processor.

AMD introduced its first dual-core AMD64 processor in 2005; it was manufactured using a 90 nm process. The AMD Opteron processor is essentially divided into two parts: execution and communications, with a system request interface and crossbar switch linking these two parts (see Figure 2). The crossbar switch architecture enabled AMD Opteron processors to transition easily from single-core to dual-core processors without fundamental design changes.

Each execution core includes a 64-KB/64-KB data/instruction L1 cache and a 1-MB L2 cache. The system request interface manages and prioritizes the processor requests to the crossbar switch. The crossbar switch connects both processor cores directly to communications: I/O (through HyperTransport links) and the integrated memory controller. The memory controller and the HyperTransport links remain the same as in a single core system.

Figure 2. The Socket F (1207) and Socket AM2 designs support dual-core AMD Revision F processors.

The primary difference between the processors designed for single-, dual-, or multi-processor systems is in the way the processor uses the HyperTransport link(s). In the 1000 series AMD Opteron processors, the single HyperTransport link can only connect to I/O as a non-coherent link. This means that the 1000 series processors are limited to single-processor systems. In the 2000 series, one of three HyperTransport links can connect to one other AMD Opteron processor as a coherent link. The other links can connect to I/O (non-coherent links); thus, 2000 series AMD Opteron processors can be used in dual-processor systems. With the 8000 series AMD Opteron processors, all three HyperTransport links can connect to other AMD Opteron processors or to I/O.

For more information about multi-core processors, see the AMD whitepaper titled “Multi-Core Processors—The Next Evolution in Computing.”

Dual-core Revision F processors

The dual-core Revision F (Rev F) processor was introduced in 2006. A number of features from previous revisions remained unchanged with the introduction of the Rev F processor:

• 64-KB/64-KB data/instruction L1 cache per core

• 1-MB L2 cache per core

• 1-GHz HyperTransport

Revision F also included key improvements:

• DDR2 memory support

• Hardware-assisted virtualization (AMD-V™)

• Power management (PowerNow!) improvements

• Quad-core upgradeability

DDR2 operates at 1.8 V compared to 2.5 V for DDR, reducing power requirements by up to 30 percent.

AMD Virtualization™ hardware assistance directly supports virtualization with industry-standard servers, which reduces complexity and improves performance.

PowerNow! Technology with Optimized Power Management reduces power requirements and heat generation by reducing the processor’s clock speed and voltage during periods when the CPU is not fully utilized. Up to five power states are supported. Power consumption at idle is reduced by up to 75 percent.

Quad-core upgradeability means that Revision F sockets are pin-compatible with and will support quad-core processors within the same power and thermal envelopes.

The AMD whitepaper “Multi-Core Processors—The Next Evolution in Computing” is accessible at http://multicore.amd.com/Resources/33211A_Multi-Core_WP_en.pdf.

HP ProLiant DL145 G3 servers do not support quad-core Revision F processor upgradeability.


Quad-Core AMD Opteron processors

AMD introduced the quad-core AMD Opteron (see Figure 3) in September of 2007. It included several innovations:

• A new core microarchitecture – K8L (true quad-core on a single die)

• Extensions to AMD64 instruction set – bit manipulation and SSE, SSE2, SSE3, and SSE4a

• 128-bit FPU for improved floating point and graphics performance

• AMD Smart Fetch Technology

• Support for DDR2 memory

• Dedicated 64-KB L1 cache and 512-KB L2 cache for each core

• 2-MB to 6-MB L3 cache shared among all cores

• 65 nm silicon process technology

• Enhanced AMD PowerNow! with Independent Dynamic Core Technology and Dual Dynamic Power Management

• AMD-V™ with Rapid Virtualization Indexing

Figure 3. The Socket F (1207) design supports Quad-Core AMD Opteron™ processors.


AMD Smart Fetch Technology

Smart Fetch Technology allows cores to enter a "halt" state during idle processing times, causing them to draw less power. Before entering the halt state, data from the L1 and L2 caches are transferred to the shared L3 cache so that the contents of the idle cores can be retrieved.

Enhanced AMD PowerNow! Technology

Native quad-core technology enables enhancements to AMD PowerNow! Technology across all four cores. Two power management enhancements, Independent Dynamic Core Technology and Dual Dynamic Power Management™, provide optimum performance-per-watt and power savings.

Independent Dynamic Core Technology

AMD’s Independent Dynamic Core Technology allows each core to independently adjust its frequency to reduce power use based on application requirements (Figure 4). This enables more precise power management, which can reduce the total cost of ownership (TCO) of a data center.

Figure 4. Independently controlled cores reduce power use. The voltage is locked to the core with the highest P-state.
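
The behavior in Figure 4 can be sketched as follows: each core runs at the frequency of its own P-state, while the shared core voltage follows the most demanding core. This Python sketch rests on loud assumptions: the P-state table values are invented for illustration (they are not AMD specifications), P0 is taken to be the highest-performance state, and `plan_core_settings` is a hypothetical name.

```python
# Hypothetical P-state table: index -> (frequency in MHz, voltage in volts).
# P0 is assumed to be the highest-performance state.
P_STATES = {
    0: (2300, 1.20),
    1: (2000, 1.15),
    2: (1700, 1.10),
    3: (1400, 1.05),
    4: (1150, 1.00),
}


def plan_core_settings(requested_p_states):
    """Return (per-core frequencies, shared core voltage) for the requested P-states."""
    if not requested_p_states:
        raise ValueError("at least one core is required")
    frequencies = [P_STATES[p][0] for p in requested_p_states]
    # The cores share one voltage plane, so the voltage must satisfy the
    # core in the highest-performance state (the lowest index here).
    shared_voltage = P_STATES[min(requested_p_states)][1]
    return frequencies, shared_voltage


# One busy core and three mostly idle cores.
print(plan_core_settings([0, 3, 4, 4]))  # ([2300, 1400, 1150, 1150], 1.2)
```

The idle cores still save power through their lower frequencies, even though the voltage stays at the level the busy core needs.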

Dual Dynamic Power Management

Dual Dynamic Power Management provides separate (split) power planes for the cores and memory controller. This can reduce idle power consumption and allow individual processors to be managed in multi-socket systems, thereby creating power-saving opportunities without compromising performance.

Rapid Virtualization Indexing

Rapid Virtualization Indexing is an innovation in AMD-V technology that reduces the overhead associated with software virtualization. With software virtualization, processor overhead increases as each guest OS and application vies for the host machine’s physical resources; this results in decreased performance. Also, memory latency increases as the virtual machine monitor, or hypervisor, dynamically translates the memory addresses sent to and received from the memory controller. The hypervisor does this so that each guest application does not realize that it is being virtualized. The translation from virtual machine memory address to host machine physical address is achieved by using “shadow page tables” (Figure 5).


Rapid Virtualization Indexing is the AMD implementation of nested page tables technology, which allows virtual machines to manage memory more directly. Rapid Virtualization Indexing eliminates the time the hypervisor spends managing shadow pages in software, and accelerates this task with much faster hardware-based page management. This hardware-based management reduces hypervisor overhead and improves the speed of the guest OS.

Figure 5. Hardware-based management using nested page tables reduces hypervisor overhead, compared to software-based management of shadow page tables, thus improving the speed of the guest OS.
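
The difference between the two approaches can be shown with a toy model. In the Python sketch below, dictionaries stand in for page tables and whole addresses stand in for pages; all names and addresses are hypothetical, and real hardware walks multi-level tables rather than flat maps.

```python
# Guest page table, maintained by the guest OS: guest virtual -> guest physical.
guest_page_table = {0x1000: 0x5000, 0x2000: 0x6000}
# Nested page table, maintained by the hypervisor: guest physical -> host physical.
nested_page_table = {0x5000: 0x9000, 0x6000: 0xA000}


def build_shadow_table(guest_pt, host_map):
    """Software approach: the hypervisor precomputes guest virtual -> host physical."""
    return {gva: host_map[gpa] for gva, gpa in guest_pt.items()}


def nested_translate(gva, guest_pt, nested_pt):
    """Nested approach: both tables are walked at translation time."""
    return nested_pt[guest_pt[gva]]


shadow = build_shadow_table(guest_page_table, nested_page_table)
assert shadow[0x1000] == nested_translate(0x1000, guest_page_table, nested_page_table)

# The guest OS remaps a page. The shadow table is now stale until the
# hypervisor intercepts the change and rebuilds it in software.
guest_page_table[0x1000] = 0x6000
stale = shadow[0x1000]                                                   # still 0x9000
fresh = nested_translate(0x1000, guest_page_table, nested_page_table)   # 0xA000
```

Keeping the shadow table synchronized with every guest update is the software work that nested page tables remove.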

Average CPU Power metric

Because of rising power and cooling costs in data centers, organizations are adopting a new paradigm that focuses on maximizing system energy efficiency down to the component level. This is especially true for the processor, which represents a significant percentage of power use and heat generation. AMD’s introduction of power management enhancements such as Dual Dynamic Power Management and Independent Dynamic Core Technology helps to reduce processor power use. However, if data center planners do not know the actual power required by the processor, they must use the maximum power ratings listed in the engineering specifications.

To accurately measure processor power consumption, its power use must be isolated from the power use of other components on the motherboard. To accomplish this, AMD developed specially instrumented motherboards with voltage regulators that deliver power to individual processor power rails. This special instrumentation allows AMD to measure processor power use of all processor rails during standard test workloads such as floating point, integer, Web, and transaction processing.

From these test results, AMD developed an average CPU power (ACP) metric to more accurately estimate the power consumption of AMD Opteron processors during peak workloads (Table 3). The ACP metric allows data centers to more accurately forecast their power requirements and reap the benefits of lower power and cooling costs.


Table 3. Thermal Design Power versus Average CPU Power

Thermal Design Power (watts) | Average CPU Power (watts)
137 | 105
115 | 75
75 | 55
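
To show how the two ratings change a power forecast, the following Python sketch compares processor power budgets computed from the Table 3 values. The function name and the four-socket example are illustrative, and the result covers processors only, not memory, disks, or cooling.

```python
# Table 3 values: Thermal Design Power (W) -> Average CPU Power (W).
TDP_TO_ACP_WATTS = {137: 105, 115: 75, 75: 55}


def processor_power_budget(tdp_watts: int, sockets: int) -> dict:
    """Compare processor power budgets based on TDP and on ACP."""
    if tdp_watts not in TDP_TO_ACP_WATTS:
        raise ValueError(f"no ACP rating listed for a {tdp_watts} W TDP")
    if sockets <= 0:
        raise ValueError("sockets must be positive")
    tdp_total = tdp_watts * sockets
    acp_total = TDP_TO_ACP_WATTS[tdp_watts] * sockets
    return {
        "tdp_total_watts": tdp_total,
        "acp_total_watts": acp_total,
        "difference_watts": tdp_total - acp_total,
    }


# A four-socket server with 115 W TDP processors.
print(processor_power_budget(115, 4))
# {'tdp_total_watts': 460, 'acp_total_watts': 300, 'difference_watts': 160}
```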

Independent and combined memory channel modes

The Quad-Core AMD Opteron processor includes two DRAM controllers that support DDR2 DIMMs. Each DRAM controller controls one 64-bit DDR DIMM channel that connects to a series of DIMMs. The DRAM controllers can be configured to behave as a single channel (called ganged, or combined, mode) or as two channels (called unganged, or independent, mode). Configuring the DRAM controllers in unganged mode creates two 64-bit logical DIMMs, each equivalent to one 64-bit physical DIMM. Configuring the DRAM controllers in ganged mode creates one 128-bit logical DIMM. Each physical DIMM of a 128-bit logical DIMM must be identical (same size and same timing parameters).

The configuration requirements for the DRAM controllers and DIMMs are as follows:

• Both DRAM controllers must be programmed to the same frequency. All DIMMs must operate at the same memory clock frequency, regardless of the channel on which they are connected.

• The DRAM controllers do not support mixing unbuffered and registered DIMMs on the same channel or between channels.

• The DRAM controllers do not support mixing ECC and non-ECC DIMMs on the same channel or between channels.
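
The configuration requirements above can be expressed as a simple check. The Python sketch below is a hypothetical illustration: the `Dimm` fields, the `check_config` function, and the assumption that ganged mode pairs the DIMM in slot i of one channel with slot i of the other are all choices made for the example, not part of any AMD or HP tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Dimm:
    size_gb: int
    clock_mhz: int
    registered: bool  # False means unbuffered
    ecc: bool
    timings: str      # for example "5-5-5"


def check_config(channel_a: List[Dimm], channel_b: List[Dimm], ganged: bool) -> List[str]:
    """Return a list of rule violations; an empty list means the layout passes."""
    problems = []
    dimms = channel_a + channel_b
    if len({d.clock_mhz for d in dimms}) > 1:
        problems.append("all DIMMs must run at the same memory clock frequency")
    if len({d.registered for d in dimms}) > 1:
        problems.append("unbuffered and registered DIMMs cannot be mixed")
    if len({d.ecc for d in dimms}) > 1:
        problems.append("ECC and non-ECC DIMMs cannot be mixed")
    if ganged:
        if len(channel_a) != len(channel_b):
            problems.append("ganged mode needs the same number of DIMMs per channel")
        else:
            for slot, (a, b) in enumerate(zip(channel_a, channel_b)):
                # Each 128-bit logical DIMM is built from two identical physical DIMMs.
                if (a.size_gb, a.timings) != (b.size_gb, b.timings):
                    problems.append(f"slot {slot}: paired DIMMs differ in size or timing")
    return problems
```

A layout that passes in unganged mode can still fail in ganged mode if the paired DIMMs differ in size or timing.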

Six-Core AMD Opteron processors

AMD introduced the six-core Opteron processor, formerly code-named "Istanbul," in June 2009. According to AMD, the six-core Opteron processor (Figure 6) operates within the same power and thermal envelope as the Quad-Core Opteron processor. However, it provides a 20% to 50% performance increase, specifically for virtualization, database, and high-performance computing applications. The six-core processor includes several innovations:

• A dedicated 64-KB L1 cache and 512-KB L2 cache for each core

• A 6-MB L3 cache shared among all cores

• 45 nm silicon process technology

• Support for DDR2 memory

• HyperTransport™ 3 technology

• HyperTransport (HT) Assist technology

• AMD Smart Fetch Technology

• Enhanced AMD PowerNow! with Independent Dynamic Core Technology and Dual Dynamic Power Management

• AMD-V™ with Rapid Virtualization Indexing


Figure 6. The Six-Core AMD Opteron processor operates in the same power and thermal envelope as the Quad-Core Opteron processor while improving performance by up to 50%.

HT Assist

HT Assist helps increase performance of six-core AMD Opteron processor-based systems with four or eight sockets. It is designed to maintain data correctness (coherence) between the processors and minimize inter-processor communication traffic on the HyperTransport links.

In a multi-socket system, each processor has to ensure that it is executing the latest data, or cache line, to maintain coherence. Before a processor can execute a transaction, it probes the caches of the other processors by broadcasting a coherence protocol and only requests data from system memory if there is a cache miss. All of these latency-sensitive messages (probe requests, probe responses, data requests, and data responses) are transmitted over the HyperTransport links. For example, one cache line coherency check in a 4-socket system can generate 10 or more messages over the four HyperTransport links between the processors. In a 4- or 8-socket system with six-core AMD Opteron processors (a total of 24 or 48 processor cores), this traffic can severely load the HyperTransport links.

HT Assist uses 1 MB of each processor's 6-MB L3 cache as a directory cache to track all cache lines stored in the multi-socket system. This allows a multi-core processor to probe its own L3 cache when checking a cache line, called a Probe Filter Lookup, instead of broadcasting numerous cache probes over the HyperTransport links. With HT Assist, a cache line coherency check in the previously mentioned 4-socket system may only generate two to three messages. The Probe Filter Lookup also reduces latency for accesses to local DRAM because there is no need to broadcast probe requests and wait for responses.

The performance benefits of HT Assist in 4- and 8-socket systems outweigh the small decrease in available L3 data cache. HT Assist does not need to be enabled on 2-socket systems where there is much less cache probe traffic.
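
The idea of replacing broadcast probes with a directory lookup can be sketched with a toy model. The Python code below is a simplified illustration only: the `ProbeFilter` class and its methods are hypothetical, sockets are numbered from 0, and the model counts probe targets rather than reproducing the actual HT Assist protocol or the message counts quoted above.

```python
class ProbeFilter:
    """Toy directory: cache-line address -> set of sockets caching that line."""

    def __init__(self, socket_count: int):
        self.socket_count = socket_count
        self.directory = {}

    def record_fill(self, line: int, socket: int) -> None:
        """Note that a socket has brought a cache line into its caches."""
        self.directory.setdefault(line, set()).add(socket)

    def sockets_to_probe(self, line: int, requester: int) -> set:
        """Directory lookup: probe only the sockets known to hold the line."""
        return self.directory.get(line, set()) - {requester}


def broadcast_targets(socket_count: int, requester: int) -> set:
    """Without a directory, every other socket must be probed."""
    return set(range(socket_count)) - {requester}


probe_filter = ProbeFilter(socket_count=4)
probe_filter.record_fill(line=0x40, socket=2)

print(broadcast_targets(4, requester=0))                  # {1, 2, 3}
print(probe_filter.sockets_to_probe(0x40, requester=0))   # {2}
print(probe_filter.sockets_to_probe(0x80, requester=0))   # set(): go straight to memory
```

When the directory shows that no other socket holds the line, no probes are sent at all, which is the case that shortens local DRAM accesses.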


Future AMD Opteron processors

A new generation of processor socket (called G34) is planned for the first half of 2010. It will feature DDR3 memory, the AMD RD890 chipset, and an additional HT link. New 8- and 12-core AMD Opteron processors, codenamed Magny-Cours, are planned for socket G34.

AMD is expected to continue improving the AMD Opteron processor family with faster memory and HyperTransport speeds. AMD has also announced the Torrenza initiative, which will provide an additional chip socket for a co-processor on the motherboard. This socket will include a HyperTransport bus connection, and it will support graphics and other more specialized third-party co-processors.

Software licensing

Customers should be aware of possible changes in software licensing for use of multi-core processors. At this writing, major OS vendors, such as Microsoft, treat multi-core processors as performance improvements to a single processor; they are not making a distinction for licensing purposes among processors with one, two, four, or more cores. However, customers should check with their OS and application vendors to determine particular licensing requirements.

In April 2009, AMD announced a ROM feature called AMD Core Select that enables IT managers to turn off one or more cores to fine tune hardware for specific operating conditions and workloads, or to address software licensing issues.

Conclusion

HP ProLiant servers continue to offer both AMD Opteron and Intel® Xeon™ processor architectures to deliver the best possible choice to customers. HP ProLiant servers using the AMD Opteron processor family have proven their performance in numerous benchmarks and systems. Multi-core AMD Opteron technology takes advantage of multi-threaded applications and reduces latencies, providing higher performance within the same power budget.

For Microsoft multi-core licensing information, refer to the Microsoft website at http://www.microsoft.com/licensing/highlights/multicore.mspx


For more information

For additional information, refer to the resources listed below.

HyperTransport Consortium: http://www.hypertransport.org

Multi-Core Processors—The Next Evolution in Computing: http://multicore.amd.com/Resources/33211A_Multi-Core_WP_en.pdf

ISS Technology Papers: http://www.hp.com/servers/technology

Call to action

Send comments about this paper to TechCom@HP.com.

© 2009 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.

AMD and AMD Opteron are trademarks of Advanced Micro Devices, Inc.

HyperTransport is a licensed trademark of the HyperTransport Technology Consortium.

Intel is a registered trademark and Xeon is a trademark of Intel Corporation in the U.S. and other countries.

TC091005TB, October 2009
