HP COMPAQ PROLIANT CL380, PROLIANT BL35P, PROLIANT BL30P, PROLIANT BL25P, PROLIANT DL145 G3 The Intel® processor roadmap for industry-standard servers technology brief, 8th edition


The Intel® processor roadmap for industry-standard servers

technology brief, 8th edition

Abstract

Introduction

Intel processor architecture and microarchitectures

NetBurst microarchitecture

Hyper-pipeline and clock frequency

Hyper-Threading Technology

NetBurst microarchitecture on 90nm silicon process technology

Extended hyper-pipeline

SSE3 instructions

64-bit extensions: Intel 64

Dual-core technology

Intel Core™ microarchitecture

Processors

Xeon dual-core processors

Xeon quad-core processors

Enhanced SpeedStep® Technology

Intel Virtualization® Technology

Performance comparisons

TPC-C performance

SPEC performance

Intel Nehalem microarchitecture

Conclusion

For more information

Call to action

Abstract

Intel® continues to introduce processor technologies that boost the performance of x86 processors in multi-threaded environments. This paper describes these processors and some of the more important innovations as they affect HP industry-standard enterprise servers.

Introduction

As standards-based computing has pushed into the enterprise server market, the demand for increased performance and greater variety in processor solutions has grown with it. To meet this demand, Intel continues to introduce processor innovations and new speeds. This paper summarizes the recent history and near-term plans for Intel processors as they relate to the industry-standard enterprise server market.

Intel processor architecture and microarchitectures

The Intel processor architecture refers to the x86 instruction set and the registers that are exposed to programmers. The x86 instruction set is the list of all instructions and their variations that can be executed by processors derived from the original 16-bit 8086 processor architecture. Processor manufacturers, such as Intel and AMD, use a common processor architecture to maintain backward and forward compatibility of the instruction set among generations of their processors. Intel refers to the 32-bit version of the x86 processor architecture as Intel Architecture (IA)-32 and to its 64-bit extensions as Intel 64; the name IA-64 belongs to the separate Itanium architecture, which is not an x86 design. In comparison, the term “microarchitecture” refers to each processor’s physical design that implements the instruction set. Processors with different microarchitectures, Intel and AMD x86 processors for example, can still use a common instruction set.

Figure 1 shows the relationship between the x86 processor architecture and Intel’s evolving microarchitectures, as well as processors based on these microarchitectures.

Figure 1. Intel processor architecture and microarchitectures for industry-standard enterprise servers

Intel processor sequences are intended to help developers select the best processor for a particular platform design. Intel offers three processor number sequences for server applications (see Table 1). Intel processor series numbers within a sequence (for example, 5100 series) help differentiate processor features such as number of cores, architecture, cache, power dissipation, and embedded Intel technologies.

Table 1. Intel processor sequences

Processor sequence | Platform
Dual-Core Intel® Xeon™ processor 3000 sequence | Uni-processor servers
Dual-Core and Quad-Core Intel® Xeon™ processor 5000 sequence | Dual-processor high-volume servers and workstations
Dual-Core and Quad-Core Intel® Xeon™ processor 7000 sequence | Enterprise servers with 4 to 32 processors

Intel enhances the microarchitecture of a family of processors over time to improve performance and capability while maintaining compatibility with the processor architecture. One method to enhance the microarchitectures involves changing the silicon process technology. For example, Figure 2 shows that Intel enhanced NetBurst-based processors in 2004 by changing the manufacturing process from 130nm to 90nm silicon process technology.

In the second half of 2006, Intel launched the Core® microarchitecture, which is the basis for the multi-core Xeon 5000 Sequence processors, including the first quad-core Xeon processor (Clovertown). Beginning with the Penryn family of processors, Intel plans to enhance the performance and energy efficiency of Intel Core microarchitecture-based processors by switching from 65nm to 45nm Hi-k process technology with the hafnium-based high-k + metal gate transistor design. In 2008, Intel plans initial production of processors based on the “next generation” Nehalem microarchitecture.

Figure 2. Intel microarchitecture introductions and associated silicon process technologies for industry-standard servers

Hi-k, or High-k, stands for high dielectric constant, a measure of how much charge a material can hold. For more information, refer to http://www.intel.com/technology/silicon/high-k.htm?iid=tech_arch_45nm+body_hik.

Table 2 includes more details about the release dates and features of previously released Intel x86 processors as well as processors projected to be available through 2007.

Table 2. Release dates and features of Intel x86 processors

Code name | Market name | Feature size (nm) | Description | Date available / projected availability | Cache | Max. bus speed (MT/s)
Smithfield | Pentium D | 90 | Dual-core uni-processor | 2H2005 | 1MB L2 per core | 800
Irwindale | Xeon | 90 | 2MB L2 version of Nocona | 1Q2005 | 2MB L2 | 800
Cranford | Xeon MP | 90 | Xeon MP | 1Q2005 | 1MB L2 | 667
Prescott 2M | Xeon | 90 | 2MB L2 version of Prescott | 1Q2005 | 2MB L2 | 800
Potomac | Xeon MP | 90 | Xeon MP | 1Q2005 | 8MB L3 | 667
Paxville | Xeon MP | 90 | Dual-core Xeon MP | 4Q2005 | 2x1MB L2 | 800
Paxville | Xeon MP | 90 | Dual-core Xeon MP | 4Q2005 | 2x2MB |
Presler | Pentium D | 65 | Dual-core uni-processor | 1Q2006 | 2MB L2 per core | >800
Dempsey | Xeon 5000 | 65 | Dual-core Xeon | 1H2006 | 2MB L2 per core | 1066
Woodcrest | Xeon 5100 | 65 | Dual-core Xeon | 1H2006 | 4MB L2 shared | 1333
Conroe | Core 2 Duo | 65 | Dual-core, uni-processor | Mid-2006 | 4MB L2 shared | 1333
Conroe | Xeon | 65 | Dual-core | 3Q2006 | 4MB L2 shared |
Tulsa | Xeon MP | 65 | Dual-core Xeon MP | 4Q2006 | 16MB | 800
Clovertown | Xeon | 65 | Quad-core Xeon | 4Q2006 | 2x4MB | 1333
Tigerton | Xeon | 65 | Quad-core Xeon | 2H2007 | 8MB L2 | 1066
Wolfdale | Xeon | 45 | Dual-core | 1Q2008 | 1x6MB | 1600*
Harpertown | Xeon | 45 | Quad-core Xeon | 4Q2007 | 2x6MB | 1333/1600*

MT/s is an abbreviation for Mega-Transfers per second. A bus operating at 200 MHz and transferring four data packets on each clock (referred to as quad-pumped) would have 800 MT/s.

* Selected chipsets only

NetBurst® microarchitecture

The NetBurst-based processor for low-cost, single-processor servers is the Pentium® 4 processor. The original 180nm version of the Pentium 4 was known as Willamette, and the subsequent 130nm version was known as Northwood. NetBurst-based processors intended for multi-processor environments are referred to as Intel® Xeon™ (for dual-processor systems) and Xeon MP (for systems using more than two processors).

The NetBurst microarchitecture included the following enhancements:

• Higher bandwidth for instruction fetches

• 256-KB Level 2 (L2) cache with 64-byte cache lines

• NetBurst system bus: a 64-bit, 100-MHz bus capable of providing 3.2 GB/s of bandwidth by double-pumping the address and quad-pumping the data. The 100-MHz quad-pumped data bus is also referred to as a 400-MHz data bus. To provide higher levels of performance, Intel added support for a 533-MHz front side bus to the Pentium 4 and Xeon processors and later added support for 800 MHz to the Pentium 4.

• Integer arithmetic logic unit (ALU) running at twice the clock speed (double data rate)

• Modified floating point unit (FPU)

• Streaming SIMD Extensions 2 (SSE2): New instructions bring the total to 144 SIMD instructions to manage floating point, application, and multimedia performance.

• Advanced dynamic execution

• Deeper instruction window for out-of-order, speculative execution and improved branch prediction over the P6 dynamic execution core

• Execution trace cache (stores pre-decoded micro-operations)

• Enhanced floating point/multimedia engine

• Hyper-Threading (HT) Technology in Xeon processors and Pentium 4 processors (described below)
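The bus figures in the list above follow from simple arithmetic: transfers per second equal the bus clock multiplied by the number of data transfers per clock, and bandwidth equals transfers per second multiplied by the bus width in bytes. The sketch below is only an illustration of that arithmetic. The function name `bus_bandwidth` is hypothetical, decimal units (1 GB = 10^9 bytes) are assumed, and the 6.4 GB/s figure for a 200-MHz quad-pumped bus is derived here under the assumption of the same 64-bit width rather than quoted from the brief.

```python
from typing import Tuple


def bus_bandwidth(width_bits: int, clock_mhz: float, transfers_per_clock: int) -> Tuple[float, float]:
    """Return (mega-transfers per second, gigabytes per second) for a parallel bus.

    Assumes decimal units: 1 GB = 10**9 bytes.
    """
    mega_transfers = clock_mhz * transfers_per_clock
    bytes_per_transfer = width_bits / 8
    gigabytes_per_s = mega_transfers * 1e6 * bytes_per_transfer / 1e9
    return mega_transfers, gigabytes_per_s


# 64-bit data bus, quad-pumped (four data transfers per clock).
for clock_mhz in (100.0, 200.0):
    mts, gbs = bus_bandwidth(64, clock_mhz, 4)
    print(f"{clock_mhz:.0f} MHz quad-pumped: {mts:.0f} MT/s, {gbs:.1f} GB/s")
```

With a 100-MHz clock this reproduces the 400 MT/s ("400-MHz") data bus and 3.2 GB/s quoted above; with a 200-MHz clock it gives the 800 MT/s used in the Table 2 footnote.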

Hyper-pipeline and clock frequency

One performance-enhancing feature of the NetBurst microarchitecture was its hyper-pipeline, a 20-stage branch-prediction pipeline. Previous 32-bit processors had a 10-stage pipeline. The hyper-pipeline can contain more than 100 instructions at once and can handle up to 48 loads and stores concurrently.

The pipeline in a processor is analogous to a factory assembly line where production is split into multiple stages to keep all factory workers busy and to complete multiple stages in parallel. Likewise, the work to execute program code is split into stages to keep the processor busy and allow it to execute more code during each clock cycle. In this case, the processor must complete the operation for each stage within a single clock cycle. The processor can achieve this by splitting the task into smaller tasks and using more (shorter) stages to execute the instructions (Figure 3). Each stage can then be completed more quickly, which allows the processor to run at a higher clock frequency. However, splitting each stage into smaller stages to achieve a higher clock frequency does not mean that more work is done in the pipeline per clock cycle; each cycle now covers a smaller slice of the same work.

Figure 3. Decreasing the amount of work done in each stage allows the clock frequency to increase
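The trade-off can be sketched with a toy timing model. Everything in it is an assumption made for illustration: the function name `max_clock_ghz`, the 10 ns of total logic delay, and the 0.05 ns of per-stage latch overhead are hypothetical values, not measurements of any Intel processor. The model splits a fixed amount of logic evenly across the stages, so the clock period is the per-stage logic delay plus the latch overhead.

```python
def max_clock_ghz(total_logic_ns: float, stages: int, latch_overhead_ns: float) -> float:
    """Highest clock frequency (GHz) when the logic is split evenly across `stages`."""
    stage_delay_ns = total_logic_ns / stages + latch_overhead_ns
    return 1.0 / stage_delay_ns  # a period in ns corresponds to a frequency in GHz


TOTAL_LOGIC_NS = 10.0     # hypothetical time to process one instruction with no pipelining
LATCH_OVERHEAD_NS = 0.05  # hypothetical fixed cost added by each stage boundary

for stages in (10, 20):
    freq = max_clock_ghz(TOTAL_LOGIC_NS, stages, LATCH_OVERHEAD_NS)
    logic_per_cycle = TOTAL_LOGIC_NS / stages
    latency_ns = stages / freq
    print(f"{stages} stages: {freq:.2f} GHz, {logic_per_cycle:.2f} ns of logic per cycle, "
          f"{latency_ns:.2f} ns from fetch to retirement")
```

In this model, doubling the stage count nearly doubles the clock frequency, but each cycle performs half as much logic and the time for one instruction to traverse the whole pipeline does not shrink.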

A basic structure for a computer pipeline consists of the following four steps, which are performed repeatedly to execute a program:

1. Fetch the next instruction from the address stored in the program counter.

2. Store that instruction in the instruction register and decode it, and increment the address in the program counter.

3. Execute the instruction currently in the instruction register.

4. Write the results of that instruction from the execution unit back into the destination register.
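The four steps above can be sketched as a small interpreter loop. This is a toy model only: the tuple-based instruction format, the two opcodes (`add`, `mul`), the register names, and the function `run` are all hypothetical, and the loop handles one instruction at a time, whereas a pipelined processor overlaps these steps for different instructions.

```python
def run(program, registers):
    """Execute a list of (op, dest, src_a, src_b) tuples and return the final registers."""
    registers = dict(registers)  # work on a copy so the caller's registers are untouched
    program_counter = 0
    while program_counter < len(program):
        # 1. Fetch the next instruction from the address in the program counter.
        instruction_register = program[program_counter]
        # 2. Decode the instruction and increment the program counter.
        op, dest, src_a, src_b = instruction_register
        program_counter += 1
        # 3. Execute the instruction currently in the instruction register.
        if op == "add":
            result = registers[src_a] + registers[src_b]
        elif op == "mul":
            result = registers[src_a] * registers[src_b]
        else:
            raise ValueError(f"unknown opcode: {op}")
        # 4. Write the result back into the destination register.
        registers[dest] = result
    return registers


program = [
    ("add", "r2", "r0", "r1"),  # r2 = r0 + r1
    ("mul", "r3", "r2", "r2"),  # r3 = r2 * r2
]
final = run(program, {"r0": 2, "r1": 3, "r2": 0, "r3": 0})
print(final["r2"], final["r3"])
```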

Typical processor architectures split the pipeline into segments that perform those basic steps: the “front end” of the microprocessor, the execution engine, and the retire unit, as shown in Figure 4. The front end fetches the instruction and decodes it into smaller instructions (commonly referred to as micro-ops). These decoded instructions are sent to one of the three types of execution units (integer, load/store, or floating point) to be executed. Finally, the instruction is retired and the result is written back to its destination register.

Figure 4. Basic 4-stage pipeline schematic
