The Intel® processor roadmap for industry-standard servers
technology brief, 8th edition
Abstract
Introduction
Intel processor architecture and microarchitectures
NetBurst® microarchitecture
Hyper-pipeline and clock frequency
Hyper-Threading Technology
NetBurst microarchitecture on 90nm silicon process technology
Extended hyper-pipeline
SSE3 instructions
64-bit extensions — Intel 64
Dual-core technology
Intel Core™ microarchitecture
Processors
Xeon dual-core processors
Xeon quad-core processors
Enhanced SpeedStep® Technology
Intel® Virtualization Technology
Performance comparisons
TPC-C performance
SPEC performance
Intel Nehalem microarchitecture
Conclusion
For more information
Call to action

Abstract

Intel® continues to introduce processor technologies that boost the performance of x86 processors in multi-threaded environments. This paper describes these processors and some of the more important innovations as they affect HP industry-standard enterprise servers.

Introduction

As standards-based computing has pushed into the enterprise server market, the demand for increased performance and greater variety in processor solutions has grown with it. To meet this demand, Intel continues to introduce processor innovations and new speeds. This paper summarizes the recent history and near-term plans for Intel processors as they relate to the industry-standard enterprise server market.

Intel processor architecture and microarchitectures

The Intel processor architecture refers to the x86 instruction set and the registers that are exposed to programmers. The x86 instruction set is the list of all instructions, and their variations, that can be executed by processors derived from the original 16-bit 8086 processor architecture. Processor manufacturers such as Intel and AMD use a common processor architecture to maintain backward and forward compatibility of the instruction set across generations of their processors. Intel refers to the 32-bit x86 processor architecture as Intel Architecture 32-bit (IA-32) and to its 64-bit extensions as Intel 64. The name IA-64 belongs to Intel's separate Itanium architecture, which is not x86-based. In comparison, the term “microarchitecture” refers to each processor’s physical design that implements the instruction set. Processors with different microarchitectures, such as Intel and AMD x86 processors, can still use a common instruction set.
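The following Python sketch makes the architecture/microarchitecture distinction concrete. It treats a 32-bit add instruction as the architectural contract and two hypothetical cores as microarchitectures that implement it in different ways. The class names (`AddInstruction`, `RippleCarryCore`, `FullWidthCore`) are illustrative assumptions and do not correspond to real Intel or AMD designs. The point is only that software written against the contract gets identical results from either implementation.

```python
from typing import Protocol

MASK32 = 0xFFFFFFFF  # architectural rule (assumed here): results wrap to 32 bits


class AddInstruction(Protocol):
    """The 'architecture': the behavior programmers rely on."""

    def add(self, a: int, b: int) -> int: ...


class RippleCarryCore:
    """Hypothetical microarchitecture A: computes the sum one bit at a time."""

    def add(self, a: int, b: int) -> int:
        a, b = a & MASK32, b & MASK32
        result, carry = 0, 0
        for i in range(32):
            x, y = (a >> i) & 1, (b >> i) & 1
            result |= (x ^ y ^ carry) << i        # sum bit for position i
            carry = (x & y) | (carry & (x ^ y))   # carry into position i + 1
        return result                             # carry out of bit 31 is dropped


class FullWidthCore:
    """Hypothetical microarchitecture B: a single full-width adder."""

    def add(self, a: int, b: int) -> int:
        return (a + b) & MASK32


def run_add(core: AddInstruction, a: int, b: int) -> int:
    """Software written against the architecture runs unchanged on either core."""
    return core.add(a, b)
```

For example, `run_add(RippleCarryCore(), 0xFFFFFFFF, 1)` and `run_add(FullWidthCore(), 0xFFFFFFFF, 1)` both return `0`, because the architectural rule (32-bit wraparound) is the same even though the internal designs differ.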
Figure 1 shows the relationship between the x86 processor architecture and Intel’s evolving microarchitectures, as well as processors based on these microarchitectures.
Figure 1. Intel processor architecture and microarchitectures for industry-standard enterprise servers
Intel processor sequences are intended to help developers select the best processor for a particular platform design. Intel offers three processor number sequences for server applications (see Table 1). Intel processor series numbers within a sequence (for example, 5100 series) help differentiate processor features such as number of cores, architecture, cache, power dissipation, and embedded Intel technologies.
Table 1. Intel processor sequences
Processor sequence | Platform
Dual-Core Intel® Xeon™ processor 3000 sequence | Uni-processor servers
Dual-Core and Quad-Core Intel® Xeon™ processor 5000 sequence | Dual-processor high-volume servers and workstations
Dual-Core and Quad-Core Intel® Xeon™ processor 7000 sequence | Enterprise servers with 4 to 32 processors
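The Python sketch below shows one possible way to read the numbering in Table 1. It maps a four-digit server processor number to its sequence, series, and platform. The rule it applies is an assumption inferred from Table 1 and the 5100-series example: the leading digit selects the sequence, and the number rounded down to the nearest hundred names the series. The `classify` helper is hypothetical and is not an Intel tool.

```python
# Hypothetical lookup; the numbering rule is an assumption based on Table 1.
SEQUENCES = {
    3: ("3000 sequence", "Uni-processor servers"),
    5: ("5000 sequence", "Dual-processor high-volume servers and workstations"),
    7: ("7000 sequence", "Enterprise servers with 4 to 32 processors"),
}


def classify(processor_number: int) -> tuple[str, str, str]:
    """Return (sequence, series, platform) for a four-digit server processor number."""
    if not 1000 <= processor_number <= 9999:
        raise ValueError("expected a four-digit processor number")
    leading_digit = processor_number // 1000
    if leading_digit not in SEQUENCES:
        raise ValueError(f"no server sequence for {processor_number}")
    sequence, platform = SEQUENCES[leading_digit]
    series = f"{processor_number // 100 * 100} series"  # e.g. 5160 -> "5100 series"
    return sequence, series, platform
```

For example, `classify(5160)` returns `("5000 sequence", "5100 series", "Dual-processor high-volume servers and workstations")`.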
Intel enhances the microarchitecture of a processor family over time to improve performance and capability while maintaining compatibility with the processor architecture. One way to enhance a microarchitecture is to change the silicon process technology. For example, Figure 2 shows that Intel enhanced NetBurst-based processors in 2004 by moving the manufacturing process from 130nm to 90nm silicon process technology.
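A rough calculation shows why a process change helps. Assume, ideally, that every linear dimension of a design shrinks by the ratio of the feature sizes. Under that assumption, the die area of the same circuit scales with the square of the ratio. Real shrinks rarely scale this cleanly, so treat the result as a best-case estimate rather than a measured figure.

```latex
\frac{A_{90\,\text{nm}}}{A_{130\,\text{nm}}} = \left(\frac{90}{130}\right)^{2} \approx 0.48
```

The 65nm to 45nm transition described in the next paragraph has the same ratio (45/65 = 90/130), so the same idealized estimate of roughly half the area applies.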
In the second half of 2006, Intel launched the Core™ microarchitecture, which is the basis for the multi-core Xeon 5000 sequence processors, including the first quad-core Xeon processor (Clovertown). Beginning with the Penryn family of processors, Intel plans to enhance the performance and energy efficiency of Intel Core microarchitecture-based processors by switching from 65nm to 45nm Hi-k¹ process technology with the hafnium-based high-k metal gate transistor design. In 2008, Intel plans initial production of processors based on the “next generation” Nehalem microarchitecture.
Figure 2. Intel microarchitecture introductions and associated silicon process technologies for industry-standard servers
¹ Hi-k, or High-k, stands for high dielectric constant, a measure of how much charge a material can hold. For more information, refer to http://www.intel.com/technology/silicon/high-k.htm?iid=tech_arch_45nm+body_hik.
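A simple parallel-plate approximation of the gate capacitance links the footnote's definition to transistor design. In it, κ is the dielectric constant, ε₀ the permittivity of free space, A the gate area, and t the dielectric thickness. Holding C and A fixed shows that a higher-κ material allows a physically thicker dielectric, and a thicker dielectric reduces gate leakage. The value κ ≈ 20 for the hafnium-based dielectric is an assumed round number for illustration only. κ ≈ 3.9 is the usual figure for silicon dioxide.

```latex
C = \frac{\kappa\,\varepsilon_0\,A}{t}
\quad\Longrightarrow\quad
t_{\text{high-}k} = \frac{\kappa_{\text{high-}k}}{\kappa_{\mathrm{SiO_2}}}\, t_{\mathrm{SiO_2}}
\approx \frac{20}{3.9}\, t_{\mathrm{SiO_2}} \approx 5.1\, t_{\mathrm{SiO_2}}
```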
Table 2 includes more details about the release dates and features of previously released Intel x86 processors as well as processors projected to be available through 2007.
Table 2. Release dates and features of Intel x86 processors
Code name | Market name | Feature size (nm) | Description | Date available/projected availability | Cache | Max. bus speed (MT/s)²
Smithfield | Pentium D | 90 | Dual-core uni-processor | 2H2005 | 1MB L2 per core | 800
Irwindale | Xeon | 90 | 2MB L2 version of Nocona | 1Q2005 | 2MB L2 | 800
Cranford | Xeon MP | 90 | Xeon MP | 1Q2005 | 1MB L2 | 667
Prescott 2M | Xeon | 90 | 2MB L2 version of Prescott | 1Q2005 | 2MB L2 | 800
Potomac | Xeon MP | 90 | Xeon MP | 1Q2005 | 8MB L3 | 667
Paxville | Xeon MP | 90 | Dual-core Xeon MP | 4Q2005 | 2x1MB L2 | 800
Paxville | Xeon MP | 90 | Dual-core Xeon MP | 4Q2005 | 2x2MB L2 | 800
Presler | Pentium D | 65 | Dual-core uni-processor | 1Q2006 | 2MB L2 per core | >800
Dempsey | Xeon 5000 | 65 | Dual-core Xeon | 1H2006 | 2MB L2 per core | 1066
Woodcrest | Xeon 5100 | 65 | Dual-core Xeon | 1H2006 | 4MB L2 shared | 1333
Conroe | Core 2 Duo | 65 | Dual-core, uni-processor | Mid-2006 | 4MB L2 shared | 1333
Conroe | Xeon | 65 | Dual-core, uni-processor | 3Q2006 | 4MB L2 shared | 1333
Tulsa | Xeon MP | 65 | Dual-core Xeon MP | 4Q2006 | 16MB L3 | 800
Clovertown | Xeon | 65 | Quad-core Xeon | 4Q2006 | 2x4MB L2 | 1333
Tigerton | Xeon | 65 | Quad-core Xeon | 2H2007 | 8MB L2 | 1066
Wolfdale | Xeon | 45 | Dual-core | 1Q2008 | 1x6MB L2 | 1600*
Harpertown | Xeon | 45 | Quad-core Xeon | 4Q2007 | 2x6MB L2 | 1333/1600*

² MT/s is an abbreviation for mega-transfers per second. A bus operating at 200 MHz and transferring four data packets on each clock (referred to as quad-pumped) would have 800 MT/s.
* Selected chipsets only
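The footnote's arithmetic fits in a one-line function: the transfer rate equals the bus clock multiplied by the number of transfers per clock. The sketch below reproduces the 800 MT/s example. It also shows that the 1333 MT/s entries in the table correspond to a quad-pumped bus clock of roughly 333 MHz. The function and constant names are hypothetical.

```python
def mega_transfers_per_second(bus_clock_mhz: float, transfers_per_clock: int) -> float:
    """Data transfer rate in MT/s: bus clock (MHz) times transfers per clock."""
    return bus_clock_mhz * transfers_per_clock


QUAD_PUMPED = 4  # four data packets per bus clock

print(mega_transfers_per_second(200, QUAD_PUMPED))             # prints 800
print(round(mega_transfers_per_second(1000 / 3, QUAD_PUMPED)))  # prints 1333
```

The MHz figures that appear in marketing names (for example, a "1333-MHz" bus) are therefore transfer rates, not the bus clock itself.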

NetBurst® microarchitecture

The NetBurst-based processor for low-cost, single-processor servers is the Pentium® 4 processor. The original 180nm version of the Pentium 4 was known as Willamette, and the subsequent 130nm version was known as Northwood. NetBurst-based processors intended for multi-processor environments are referred to as Intel® Xeon™ (for dual-processor systems) and Xeon MP (for systems using more than two processors).
The NetBurst microarchitecture included the following enhancements:
Higher bandwidth for instruction fetches
256-KB Level 2 (L2) cache with 64-byte cache lines
NetBurst system bus: a 64-bit, 100-MHz bus capable of providing 3.2 GB/s of bandwidth by double pumping the address and quad pumping the data. The 100-MHz quad-pumped data bus is also referred to as a 400-MHz data bus. To provide higher levels of performance, Intel added support for a 533-MHz front side bus to the Pentium 4 and Xeon processors and later added support for an 800-MHz front side bus to the Pentium 4.
Integer arithmetic logic unit (ALU) running at twice the clock speed (double data rate)
Modified floating point unit (FPU)
Streaming SIMD Extensions 2 (SSE2): 144 new SIMD instructions to improve floating-point, application, and multimedia performance
Advanced dynamic execution
Deeper instruction window for out-of-order, speculative execution and improved branch prediction over the P6 dynamic execution core
Execution trace cache (stores pre-decoded micro-operations)
Enhanced floating point/multimedia engine
Hyper-Threading (HT) Technology in Xeon and Pentium 4 processors (described below)
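The 3.2 GB/s figure for the NetBurst system bus follows from the data rate and the 64-bit bus width. Below is a minimal Python sketch that assumes decimal units (1 GB = 10⁹ bytes) and ignores protocol overhead, so its results are theoretical peaks rather than measured throughput. The names are illustrative.

```python
def peak_bandwidth_gb_per_s(mega_transfers_per_s: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s (10**9 bytes), ignoring protocol overhead."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


BUS_WIDTH_BITS = 64  # NetBurst system bus data width
BUS_CLOCK_MHZ = 100  # base bus clock
QUAD_PUMPED = 4      # four data transfers per clock

data_rate = BUS_CLOCK_MHZ * QUAD_PUMPED  # 400 MT/s, marketed as a "400-MHz" data bus
print(f"{data_rate} MT/s -> {peak_bandwidth_gb_per_s(data_rate, BUS_WIDTH_BITS):.1f} GB/s")

# Later front side bus speeds named above (marketed in MHz, effectively MT/s).
for rate in (533, 800):
    print(f"{rate} MT/s -> {peak_bandwidth_gb_per_s(rate, BUS_WIDTH_BITS):.1f} GB/s")
```

Under the same assumptions, the 533-MHz bus peaks at about 4.3 GB/s and the 800-MHz bus at 6.4 GB/s.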

Hyper-pipeline and clock frequency

One performance-enhancing feature of the NetBurst microarchitecture was its hyper-pipeline, a 20-stage branch-prediction pipeline. Previous 32-bit processors had a 10-stage pipeline. The hyper-pipeline can contain more than 100 instructions at once and can handle up to 48 loads and stores concurrently. A processor pipeline is analogous to a factory assembly line, where production is split into multiple stages to keep all factory workers busy and to complete multiple stages in parallel. Likewise, the work to execute program code is split into stages to keep the processor busy and allow it to work on several instructions at once. Each stage must complete its operation within a single clock cycle. The processor can achieve a shorter cycle by splitting the work into smaller tasks and using more, shorter stages to execute the instructions (Figure 3). Each stage then completes more quickly, allowing the processor to run at a higher clock frequency. However, splitting each stage into smaller stages to achieve a higher clock frequency does not mean that more work is done in the pipeline per clock cycle.
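A few lines of Python can model the trade-off described above. The sketch assumes that a fixed amount of logic delay per instruction is divided evenly among the stages. It also assumes that every stage adds a constant overhead for its pipeline register. Both delay values are invented for illustration and are not NetBurst measurements.

```python
# Illustrative model only: both delay constants are assumptions, not Intel data.

def clock_frequency_ghz(total_logic_ns: float, stages: int, latch_overhead_ns: float) -> float:
    """Clock frequency when `total_logic_ns` of work is split evenly over `stages`.

    Every stage must finish within one clock cycle, so the cycle time is the
    per-stage share of the logic delay plus a fixed pipeline-register overhead.
    """
    cycle_ns = total_logic_ns / stages + latch_overhead_ns
    return 1.0 / cycle_ns  # cycles per nanosecond == GHz


def work_fraction_per_stage(stages: int) -> float:
    """Share of one instruction's total work that a single stage performs."""
    return 1.0 / stages


TOTAL_LOGIC_NS = 5.0      # assumed end-to-end logic delay for one instruction
LATCH_OVERHEAD_NS = 0.05  # assumed fixed overhead added to every stage

for n in (10, 20):
    f = clock_frequency_ghz(TOTAL_LOGIC_NS, n, LATCH_OVERHEAD_NS)
    share = work_fraction_per_stage(n)
    print(f"{n} stages: {f:.2f} GHz, {share:.0%} of an instruction's work per stage")
```

With these assumed numbers, 10 stages give about 1.82 GHz and 20 stages about 3.33 GHz. Doubling the stage count raises the clock, but by less than 2x because of the fixed per-stage overhead. Each stage, and therefore each clock cycle, also covers only 5% of an instruction's work instead of 10%.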
Figure 3. Decreasing the amount of work done in each stage allows the clock frequency to increase
A basic structure for a computer pipeline consists of the following four steps, which are performed repeatedly to execute a program:
1. Fetch the next instruction from the address stored in the program counter.
2. Store that instruction in the instruction register and decode it, and increment the address in the program counter.
3. Execute the instruction currently in the instruction register.
4. Write the results of that instruction from the execution unit back into the destination register.
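A toy interpreter can trace the four steps. The sketch below uses a hypothetical three-opcode instruction set (`add`, `sub`, `halt`) with register operands only. It processes one instruction completely before starting the next, so it models the basic cycle rather than an overlapped pipeline.

```python
from dataclasses import dataclass


@dataclass
class Instruction:
    op: str        # hypothetical opcodes: "add", "sub", or "halt"
    dest: int = 0  # destination register index
    src1: int = 0  # first source register index
    src2: int = 0  # second source register index


def run(program: list[Instruction], registers: list[int]) -> list[int]:
    """Execute `program` one instruction at a time and return the register file."""
    pc = 0  # program counter
    while True:
        # 1. Fetch the next instruction from the address in the program counter.
        fetched = program[pc]
        # 2. Store it in the instruction register, decode it, and increment the PC.
        instruction_register = fetched
        op = instruction_register.op
        dest = instruction_register.dest
        a, b = instruction_register.src1, instruction_register.src2
        pc += 1
        if op == "halt":
            return registers
        # 3. Execute the instruction held in the instruction register.
        if op == "add":
            result = registers[a] + registers[b]
        elif op == "sub":
            result = registers[a] - registers[b]
        else:
            raise ValueError(f"unknown opcode: {op!r}")
        # 4. Write the result back into the destination register.
        registers[dest] = result
```

For example, `run([Instruction("add", 2, 0, 1), Instruction("sub", 3, 2, 0), Instruction("halt")], [5, 7, 0, 0])` returns `[5, 7, 12, 7]`.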
Typical processor microarchitectures split the pipeline into segments that perform those basic steps: the “front end” of the microprocessor, the execution engine, and the retire unit, as shown in Figure 4. The front end fetches each instruction and decodes it into smaller operations, commonly referred to as micro-ops. The micro-ops are sent to one of three types of execution units (integer, load/store, or floating point) to be executed. Finally, the instruction is retired and its result is written back to the destination register.
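The flow from front end to execution units to retirement can be sketched in Python. The decode table below is an assumption for illustration only. In it, an instruction with a memory operand splits into load, integer-add, and store micro-ops, and a floating-point add becomes one floating-point micro-op. The instruction strings and micro-op names are hypothetical and do not reflect Intel's actual micro-op encoding. This sketch does not model out-of-order completion.

```python
from enum import Enum


class Unit(Enum):
    INTEGER = "integer"
    LOAD_STORE = "load/store"
    FLOATING_POINT = "floating point"


# Hypothetical decode table: instruction -> list of (micro-op, execution unit).
DECODE_TABLE = {
    "add eax, ebx": [("int_add", Unit.INTEGER)],
    "add [mem], eax": [
        ("load", Unit.LOAD_STORE),
        ("int_add", Unit.INTEGER),
        ("store", Unit.LOAD_STORE),
    ],
    "fadd st0, st1": [("fp_add", Unit.FLOATING_POINT)],
}


def front_end(program: list[str]) -> list[tuple[str, Unit]]:
    """Fetch each instruction and decode it into micro-ops, keeping program order."""
    micro_ops: list[tuple[str, Unit]] = []
    for instruction in program:
        micro_ops.extend(DECODE_TABLE[instruction])
    return micro_ops


def dispatch(micro_ops: list[tuple[str, Unit]]) -> dict[Unit, list[str]]:
    """Send each micro-op to the queue for the execution-unit type it needs."""
    queues: dict[Unit, list[str]] = {unit: [] for unit in Unit}
    for uop, unit in micro_ops:
        queues[unit].append(uop)
    return queues


def retire(micro_ops: list[tuple[str, Unit]]) -> list[str]:
    """Retire micro-ops in original program order."""
    return [uop for uop, _ in micro_ops]
```

For `front_end(["add [mem], eax", "fadd st0, st1"])`, the load/store queue receives `["load", "store"]`, the integer queue `["int_add"]`, and the floating-point queue `["fp_add"]`. Retirement still follows program order: `["load", "int_add", "store", "fp_add"]`.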
Figure 4. Basic 4-stage pipeline schematic