Hp COMPAQ PROLIANT CL380, PROLIANT BL35P, PROLIANT BL30P, PROLIANT BL25P, PROLIANT DL145 G3 The Intel® processor roadmap for industry-standard servers technology brief, 8th edition

The Intel® processor roadmap for industry-standard servers

technology brief, 8th edition

Abstract
Introduction
Intel processor architecture and microarchitectures
NetBurst® microarchitecture
    Hyper-pipeline and clock frequency
    Hyper-Threading Technology
    NetBurst microarchitecture on 90nm silicon process technology
        Extended hyper-pipeline
        SSE3 instructions
        64-bit extensions: Intel 64
    Dual-core technology
Intel Core™ microarchitecture
    Processors
    Xeon dual-core processors
    Xeon quad-core processors
        Enhanced SpeedStep® Technology
        Intel Virtualization® Technology
Performance comparisons
    TPC-C performance
    SPEC performance
Intel Nehalem microarchitecture
Conclusion
For more information
Call to action

Abstract

Intel® continues to introduce processor technologies that boost the performance of x86 processors in multi-threaded environments. This paper describes these processors and some of the more important innovations as they affect HP industry-standard enterprise servers.

Introduction

As standards-based computing has pushed into the enterprise server market, the demand for increased performance and greater variety in processor solutions has grown with it. To meet this demand, Intel continues to introduce processor innovations and new speeds. This paper summarizes the recent history and near-term plans for Intel processors as they relate to the industry-standard enterprise server market.

Intel processor architecture and microarchitectures

The Intel processor architecture refers to its x86 instruction set and the registers that are exposed to programmers. The x86 instruction set is the list of all instructions and their variations that can be executed by processors derived from the original 16-bit 8086 processor architecture. Processor manufacturers, such as Intel and AMD, use a common processor architecture to maintain backward and forward compatibility of the instruction set among generations of their processors. Intel refers to its 32-bit and 64-bit versions of the x86 processor architecture as Intel Architecture (IA)-32 and Intel 64, respectively. (The similarly named IA-64 is the separate Itanium architecture, not an x86 extension.) In comparison, the term “microarchitecture” refers to each processor’s physical design that implements the instruction set. Processors with different microarchitectures, Intel and AMD x86 processors for example, can still use a common instruction set.

Figure 1 shows the relationship between the x86 processor architecture and Intel’s evolving microarchitectures, as well as processors based on these microarchitectures.

Figure 1. Intel processor architecture and microarchitectures for industry-standard enterprise servers

Intel processor sequences are intended to help developers select the best processor for a particular platform design. Intel offers three processor number sequences for server applications (see Table 1). Intel processor series numbers within a sequence (for example, 5100 series) help differentiate processor features such as number of cores, architecture, cache, power dissipation, and embedded Intel technologies.

Table 1. Intel processor sequences

Processor sequence                                             Platform
Dual-Core Intel® Xeon™ processor 3000 sequence                 Uni-processor servers
Dual-Core and Quad-Core Intel® Xeon™ processor 5000 sequence   Dual-processor high-volume servers and workstations
Dual-Core and Quad-Core Intel® Xeon™ processor 7000 sequence   Enterprise servers with 4 to 32 processors

Intel enhances the microarchitecture of a family of processors over time to improve performance and capability while maintaining compatibility with the processor architecture. One method to enhance the microarchitectures involves changing the silicon process technology. For example, Figure 2 shows that Intel enhanced NetBurst-based processors in 2004 by changing the manufacturing process from 130nm to 90nm silicon process technology.

In the second half of 2006, Intel launched the Core® microarchitecture, which is the basis for the multi-core Xeon 5000 Sequence processors, including the first quad-core Xeon processor (Clovertown). Beginning with the Penryn family of processors, Intel plans to enhance the performance and energy efficiency of Intel Core microarchitecture-based processors by switching from 65nm to 45nm Hi-k¹ process technology, which uses a hafnium-based high-k and metal gate transistor design. In 2008, Intel plans initial production of processors based on the “next generation” Nehalem microarchitecture.

Figure 2. Intel microarchitecture introductions and associated silicon process technologies for industry-standard servers

1 Hi-k, or High-k, stands for high dielectric constant, a measure of how much charge a material can hold. For more information, refer to http://www.intel.com/technology/silicon/high-k.htm?iid=tech_arch_45nm+body_hik.

Table 2 includes more details about the release dates and features of previously released Intel x86 processors as well as processors projected to be available through 2007.

Table 2. Release dates and features of Intel x86 processors

Code name    Market name  Feature size (nm)  Description                 Date available/projected availability  Cache             Max. bus speed¹ (MT/s)
Smithfield   Pentium D    90                 Dual-core uni-processor     2H2005                                 1MB L2 per core   800
Irwindale    Xeon         90                 2MB L2 version of Nocona    1Q2005                                 2MB L2            800
Cranford     Xeon MP      90                 Xeon MP                     1Q2005                                 1MB L2            667
Prescott 2M  Xeon         90                 2MB L2 version of Prescott  1Q2005                                 2MB L2            800
Potomac      Xeon MP      90                 Xeon MP                     1Q2005                                 8MB L3            667
Paxville     Xeon MP      90                 Dual-core Xeon MP           4Q2005                                 2x1MB L2          800
Paxville     Xeon MP      90                 Dual-core Xeon MP           4Q2005                                 2x2MB L2          800
Presler      Pentium D    65                 Dual-core uni-processor     1Q2006                                 2MB L2 per core   >800
Dempsey      Xeon 5000    65                 Dual-core Xeon              1H2006                                 2MB L2 per core   1066
Woodcrest    Xeon 5100    65                 Dual-core Xeon              1H2006                                 4MB L2 shared     1333
Conroe       Core 2 Duo   65                 Dual-core, uni-processor    Mid-2006                               4MB L2 shared     1333 MHz
Conroe       Xeon         65                 Dual-core, uni-processor    3Q2006                                 4MB L2 shared     1333 MHz
Tulsa        Xeon MP      65                 Dual-core Xeon MP           4Q2006                                 16MB L3           800 MHz
Clovertown   Xeon         65                 Quad-core Xeon              4Q2006                                 2x4MB L2          1333 MHz
Tigerton     Xeon         65                 Quad-core Xeon              2H2007                                 8MB L2            1066 MHz
Wolfdale     Xeon         45                 Dual-core                   1Q2008                                 1x6MB L2          1600 MHz*
Harpertown   Xeon         45                 Quad-core Xeon              4Q2007                                 2x6MB L2          1333/1600 MHz*


1 MT/s is an abbreviation for Mega-Transfers per second. A bus operating at 200 MHz and transferring four data packets on each clock (referred to as quad-pumped) would have 800 MT/s.

* Selected chipsets only

NetBurst® microarchitecture

The NetBurst-based processor for low-cost, single-processor servers is the Pentium® 4 processor. The original 180nm version of the Pentium 4 was known as Willamette, and the subsequent 130nm version was known as Northwood. NetBurst-based processors intended for multi-processor environments are referred to as Intel® Xeon™ (for dual-processor systems) and Xeon MP (for systems using more than two processors).

The NetBurst microarchitecture included the following enhancements:

Higher bandwidth for instruction fetches

256-KB Level 2 (L2) cache with 64-byte cache lines

NetBurst system bus: a 64-bit, 100-MHz bus capable of providing 3.2 GB/s of bandwidth by double pumping the address and quad pumping the data. The 100-MHz quad pumped data bus is also referred to as a 400-MHz data bus. To provide higher levels of performance, Intel added support for a 533-MHz front side bus to the Pentium 4 and Xeon processors and later added support for 800 MHz to the Pentium 4.

Integer arithmetic logic unit (ALU) running at twice the clock speed (double data rate)

Modified floating point unit (FPU)

Streaming SIMD extension 2 (SSE2): New instructions bring the total to 144 SIMD instructions to manage floating point, application, and multimedia performance.

Advanced dynamic execution

Deeper instruction window for out-of-order, speculative execution and improved branch prediction over the P6 dynamic execution core

Execution trace cache (stores pre-decoded micro-operations)

Enhanced floating point/multimedia engine

Hyper-threading (HT) in Xeon processors and Pentium 4 processors (described below)
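
The bus figures quoted in the list above follow from simple arithmetic: the effective transfer rate is the base clock multiplied by the number of transfers per clock, and the peak bandwidth is that rate multiplied by the bus width in bytes. The following Python sketch reproduces the 400 MT/s and 3.2 GB/s figures. The function names are illustrative only and do not come from the brief, and the result is a theoretical peak in decimal units (1 GB = 1000 MB), not a measured value.

```python
def transfers_per_second_mt(base_clock_mhz, transfers_per_clock):
    """Effective transfer rate in MT/s (mega-transfers per second)."""
    return base_clock_mhz * transfers_per_clock


def peak_bandwidth_gb_per_s(bus_width_bits, base_clock_mhz, transfers_per_clock):
    """Theoretical peak data bandwidth in GB/s (decimal units)."""
    bytes_per_transfer = bus_width_bits / 8
    rate_mt = transfers_per_second_mt(base_clock_mhz, transfers_per_clock)
    return bytes_per_transfer * rate_mt / 1000  # MB/s -> GB/s


# Original NetBurst system bus: 64 bits wide, 100 MHz, quad-pumped data.
print(transfers_per_second_mt(100, 4))        # 400 (the "400-MHz" data bus)
print(peak_bandwidth_gb_per_s(64, 100, 4))    # 3.2

# A 200 MHz quad-pumped bus gives the 800 MT/s figure.
print(transfers_per_second_mt(200, 4))        # 800
```

The same calculation applies to the later bus speeds mentioned above, since only the base clock changes.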

Hyper-pipeline and clock frequency

One performance-enhancing feature of the NetBurst microarchitecture was its hyper-pipeline, a 20-stage branch-prediction pipeline. Previous 32-bit processors had a 10-stage pipeline. The hyper-pipeline can contain more than 100 instructions at once and can handle up to 48 loads and stores concurrently. The pipeline in a processor is analogous to a factory assembly line, where production is split into multiple stages to keep all factory workers busy and to complete multiple stages in parallel. Likewise, the work to execute program code is split into stages to keep the processor busy and allow it to execute more code during each clock cycle. In this case, the processor must complete the operation for each stage within a single clock cycle. The processor can achieve this by splitting the task into smaller tasks and using more (shorter) stages to execute the instructions (Figure 3). Thus, each stage can be completed more quickly, allowing the processor to run at a higher clock frequency. However, it is important to understand that splitting each stage into smaller stages to achieve a higher clock frequency does not mean that more work is being done in the pipeline per clock cycle.

Figure 3. Decreasing the amount of work done in each stage allows the clock frequency to increase
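
The trade-off shown in Figure 3 can be sketched numerically. The Python example below assumes an idealized model in which a fixed amount of logic delay is divided evenly among the stages and the clock period equals the delay of one stage. The 10 ns delay and the function names are hypothetical values chosen for illustration; real designs also pay per-stage latch overhead and stall penalties, which this sketch ignores.

```python
def clock_frequency_ghz(total_logic_delay_ns, stages):
    """Idealized clock frequency when the logic is split evenly across stages."""
    stage_delay_ns = total_logic_delay_ns / stages
    return 1.0 / stage_delay_ns  # a period in ns gives a frequency in GHz


def work_per_cycle_ns(total_logic_delay_ns, stages):
    """Amount of logic (in ns of delay) completed by one stage per clock cycle."""
    return total_logic_delay_ns / stages


TOTAL_DELAY_NS = 10.0  # hypothetical delay to process one instruction end to end

for stages in (10, 20):
    print(stages,
          clock_frequency_ghz(TOTAL_DELAY_NS, stages),
          work_per_cycle_ns(TOTAL_DELAY_NS, stages))
# 10 stages: 1.0 GHz, 1.0 ns of work per stage per cycle
# 20 stages: 2.0 GHz, 0.5 ns of work per stage per cycle
```

Doubling the number of stages doubles the clock frequency in this model, but each stage does half as much work per cycle, which is why a higher clock frequency alone does not imply more work per clock.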

A basic structure for a computer pipeline consists of the following four steps, which are performed repeatedly to execute a program:

1. Fetch the next instruction from the address stored in the program counter.

2. Store that instruction in the instruction register and decode it, and increment the address in the program counter.

3. Execute the instruction currently in the instruction register.

4. Write the results of that instruction from the execution unit back into the destination register.
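
The four steps above can be sketched as a loop over a toy instruction set. The Python example below is an illustration only: the LOADI, ADD, and HALT instructions, the four-register machine, and the run function are all hypothetical and are not part of the x86 instruction set or of any Intel design.

```python
def run(program):
    """Execute a toy program one instruction at a time and return the registers."""
    registers = [0, 0, 0, 0]
    program_counter = 0
    while True:
        # 1. Fetch the next instruction from the address in the program counter.
        instruction_register = program[program_counter]
        # 2. Decode the instruction and increment the program counter.
        opcode, *operands = instruction_register
        program_counter += 1
        if opcode == "HALT":
            break
        # 3. Execute the instruction currently in the instruction register.
        if opcode == "LOADI":
            dest, value = operands
            result = value
        elif opcode == "ADD":
            dest, src_a, src_b = operands
            result = registers[src_a] + registers[src_b]
        else:
            raise ValueError(f"unknown opcode: {opcode}")
        # 4. Write the result back into the destination register.
        registers[dest] = result
    return registers


program = [
    ("LOADI", 0, 2),   # r0 = 2
    ("LOADI", 1, 3),   # r1 = 3
    ("ADD", 2, 0, 1),  # r2 = r0 + r1
    ("HALT",),
]
print(run(program))  # [2, 3, 5, 0]
```

This loop handles one instruction at a time; a pipelined processor overlaps these steps so that different instructions occupy different stages in the same clock cycle.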

Typical processor architectures split the pipeline into segments that perform those basic steps: the “front end” of the microprocessor, the execution engine, and the retire unit, as shown in Figure 4. The front end fetches the instruction and decodes it into smaller instructions (commonly referred to as micro-ops). These decoded instructions are sent to one of the three types of execution units (integer, load/store, or floating point) to be executed. Finally, the instruction is retired and the result is written back to its destination register.

Figure 4. Basic 4-stage pipeline schematic
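
To show how a pipeline like the one in Figure 4 keeps every stage busy, the following Python sketch builds a cycle-by-cycle occupancy table for an idealized 4-stage pipeline. It assumes one instruction enters per clock cycle and that there are no stalls or branch mispredictions; the stage names follow the four basic steps listed earlier, and the pipeline_schedule function is a hypothetical helper written for this illustration.

```python
STAGES = ["fetch", "decode", "execute", "write back"]


def pipeline_schedule(num_instructions):
    """Return, for each clock cycle, which instruction index occupies each stage.

    Idealized model: instruction i enters the first stage on cycle i and
    advances one stage per cycle. None marks an empty stage.
    """
    total_cycles = num_instructions + len(STAGES) - 1
    schedule = []
    for cycle in range(total_cycles):
        row = {}
        for stage_index, stage_name in enumerate(STAGES):
            instruction = cycle - stage_index
            in_flight = 0 <= instruction < num_instructions
            row[stage_name] = instruction if in_flight else None
        schedule.append(row)
    return schedule


for cycle, row in enumerate(pipeline_schedule(3)):
    print(cycle, row)
```

In this model, three instructions finish in six cycles instead of the twelve that would be needed if each instruction had to complete all four stages before the next one started.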
