For more information
Call to action
Abstract
Intel® continues to introduce processor technologies that boost the performance of x86 processors in
multi-threaded environments. This paper describes these processors and some of the more important
innovations as they affect HP industry-standard enterprise servers.
Introduction
As standards-based computing has pushed into the enterprise server market, the demand for
increased performance and greater variety in processor solutions has grown with it. To meet this
demand, Intel continues to introduce processor innovations and new speeds. This paper summarizes
the recent history and near-term plans for Intel processors as they relate to the industry-standard
enterprise server market.
Intel processor architecture and microarchitectures
The Intel processor architecture refers to its x86 instruction set and registers that are exposed to
programmers. The x86 instruction set is the list of all instructions and their variations that can be
executed by processors derived from the original 16-bit 8086 processor architecture. Processor
manufacturers, such as Intel and AMD, use a common processor architecture to maintain backward
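and forward compatibility of the instruction set among generations of their processors. Intel refers to
its 32-bit version of the x86 processor architecture as Intel Architecture (IA)-32 and to its 64-bit
extension as Intel 64 (formerly EM64T). The name IA-64 denotes the separate Itanium® architecture,
which is not an x86 design.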
In comparison, the term “microarchitecture” refers to each processor’s physical design that implements
the instruction set. Processors with different microarchitectures, Intel and AMD x86 processors for
example, can still use a common instruction set.
Figure 1 shows the relationship between the x86 processor architecture and Intel’s evolving
microarchitectures, as well as processors based on these microarchitectures.
Figure 1. Intel processor architecture and microarchitectures for industry-standard enterprise servers
Intel processor sequences are intended to help developers select the best processor for a particular
platform design. Intel offers three processor number sequences for server applications (see Table 1).
Intel processor series numbers within a sequence (for example, 5100 series) help differentiate
processor features such as number of cores, architecture, cache, power dissipation, and embedded
Intel technologies.
Dual-Core and Quad-Core Intel® Xeon™ processor 5000 sequence: dual-processor high-volume servers and workstations
Dual-Core and Quad-Core Intel® Xeon™ processor 7000 sequence: enterprise servers with 4 to 32 processors
Intel enhances the microarchitecture of a family of processors over time to improve performance and
capability while maintaining compatibility with the processor architecture. One method to enhance
the microarchitectures involves changing the silicon process technology. For example, Figure 2 shows
that Intel enhanced NetBurst-based processors in 2004 by changing the manufacturing process from
130nm to 90nm silicon process technology.
In the second half of 2006, Intel launched the Core® microarchitecture, which is the basis for the
multi-core Xeon 5000 Sequence processors, including the first quad-core Xeon processor
(Clovertown). Beginning with the Penryn family of processors, Intel plans to enhance the performance
and energy efficiency of Intel Core microarchitecture-based processors by switching from 65nm to
45nm Hi-k¹ process technology with the hafnium-based high-k + metal gate transistor design. In
2008, Intel plans initial production of processors based on the “next generation” Nehalem
microarchitecture.
Figure 2. Intel microarchitecture introductions and associated silicon process technologies for industry-standard
servers
¹ Hi-k, or High-k, stands for high dielectric constant, a measure of how much charge a material can hold. For
more information, refer to http://www.intel.com/technology/silicon/high-k.htm?iid=tech_arch_45nm+body_hik.
Table 2 includes more details about the release dates and features of previously released Intel x86
processors as well as processors projected to be available through 2007.
Table 2. Release dates and features of Intel x86 processors
[Table body only partly recoverable. Surviving entries: 65, Dual-core Xeon, 1H2006, 4MB L2 shared; 65, Dual-core uni-processor, Mid-2006, 4MB L2 shared; uni-processor, 3Q2006, 4MB L2 shared. Surviving bus speed values: 800 MHz, 1333 MHz, 1600 MHz*, 1333/1600 MHz*.]
MT/s is an abbreviation for mega-transfers per second. A bus operating at 200 MHz and
transferring four data packets on each clock (referred to as quad-pumped) would have 800 MT/s.
* Selected chipsets only
NetBurst® microarchitecture
The NetBurst-based processor for low-cost, single-processor servers is the Pentium® 4 processor. The
original 180nm version of the Pentium 4 was known as Willamette, and the subsequent 130nm
version was known as Northwood. NetBurst-based processors intended for multi-processor
environments are referred to as Intel® Xeon™ (for dual-processor systems) and Xeon MP (for systems
using more than two processors).
The NetBurst microarchitecture included the following enhancements:
• Higher bandwidth for instruction fetches
• 256-KB Level 2 (L2) cache with 64-byte cache lines
• NetBurst system bus: a 64-bit, 100-MHz bus capable of providing 3.2 GB/s of bandwidth by
double pumping the address and quad pumping the data. The 100-MHz quad pumped data bus is
also referred to as a 400-MHz data bus. To provide higher levels of performance, Intel added
support for a 533-MHz front side bus to the Pentium 4 and Xeon processors and later added
support for 800 MHz to the Pentium 4.
• Integer arithmetic logic unit (ALU) running at twice the clock speed (double data rate)
• Modified floating point unit (FPU)
• Streaming SIMD Extensions 2 (SSE2): 144 new SIMD instructions to improve floating point,
application, and multimedia performance.
• Advanced dynamic execution
• Deeper instruction window for out-of-order, speculative execution and improved branch prediction
• Hyper-threading (HT) in Xeon processors and Pentium 4 processors (described below)
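The bus figures quoted in the list above follow from simple arithmetic, sketched here in Python. The function names are illustrative and do not come from any Intel or HP tool. The sketch assumes peak (theoretical) bandwidth, with every clock carrying the full number of data transfers.

```python
def transfer_rate_mt_s(clock_mhz: float, transfers_per_clock: int) -> float:
    """Transfer rate in MT/s: bus clock (MHz) times data transfers per clock."""
    return clock_mhz * transfers_per_clock


def peak_bandwidth_gb_s(clock_mhz: float, transfers_per_clock: int,
                        bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s (1 GB = 10**9 bytes) for a bus of the given width."""
    transfers_per_second = transfer_rate_mt_s(clock_mhz, transfers_per_clock) * 1e6
    bytes_per_transfer = bus_width_bits / 8
    return transfers_per_second * bytes_per_transfer / 1e9


# 64-bit, 100-MHz quad-pumped NetBurst system bus from the text.
print(transfer_rate_mt_s(100.0, 4))       # 400.0 (the "400-MHz" data bus)
print(peak_bandwidth_gb_s(100.0, 4, 64))  # 3.2
```

The same calculation gives 800 MT/s for a quad-pumped 200-MHz bus.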
Hyper-pipeline and clock frequency
One performance-enhancing feature of the NetBurst microarchitecture was its hyper-pipeline, a 20-stage
branch-prediction pipeline. Previous 32-bit processors had a 10-stage pipeline. The hyper-pipeline can
contain more than 100 instructions at once and can handle up to 48 loads and stores
concurrently. The pipeline in a processor is analogous to a factory assembly line where production is
split into multiple stages to keep all factory workers busy and to complete multiple stages in parallel.
Likewise, the work to execute program code is split into stages to keep the processor busy and allow
it to execute more code during each clock cycle. In this case, the processor must complete the
operation for each stage within a single clock cycle. The processor can achieve this by splitting the
task into smaller tasks and using more (shorter) stages to execute the instructions (Figure 3). Thus,
each stage can be completed more quickly, allowing the processor to run at a higher clock frequency.
However, it is important to understand that splitting each stage into smaller stages to achieve a higher
clock frequency does not mean that more work is being done in the pipeline per clock cycle.
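The relationship between pipeline depth and clock frequency can be illustrated with a deliberately simplified model. The 10 ns total logic delay and the function name are hypothetical, chosen only to make the arithmetic easy; they are not measurements of any Intel processor. The model assumes the work divides evenly across stages and that an ideal scalar pipeline completes at most one instruction per clock.

```python
def pipeline_model(total_logic_delay_ns: float, stages: int,
                   latch_overhead_ns: float = 0.0) -> dict:
    """Idealized pipeline: the total work is divided evenly across the stages."""
    work_per_stage_ns = total_logic_delay_ns / stages
    # The clock period must cover one stage plus any per-stage latch overhead.
    cycle_time_ns = work_per_stage_ns + latch_overhead_ns
    return {
        "stages": stages,
        "work_per_stage_ns": work_per_stage_ns,
        "clock_ghz": 1.0 / cycle_time_ns,
        # Assumption: an ideal scalar pipeline retires one instruction per clock.
        "peak_instructions_per_cycle": 1,
    }


shallow = pipeline_model(10.0, 10)
deep = pipeline_model(10.0, 20)
print(shallow["clock_ghz"], deep["clock_ghz"])                  # 1.0 2.0
print(shallow["work_per_stage_ns"], deep["work_per_stage_ns"])  # 1.0 0.5
```

In this model, doubling the stage count doubles the clock frequency, but each cycle performs half as much work and the peak number of instructions completed per cycle is unchanged.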
Figure 3. Decreasing the amount of work done in each stage allows the clock frequency to increase
A basic structure for a computer pipeline consists of the following four steps, which are performed
repeatedly to execute a program:
1. Fetch the next instruction from the address stored in the program counter.
2. Store that instruction in the instruction register and decode it, and increment the address in the
program counter.
3. Execute the instruction currently in the instruction register.
4. Write the results of that instruction from the execution unit back into the destination register.
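The four steps above can be sketched as a toy interpreter loop. The instruction format (opcode, destination, two sources), the register names, and the two opcodes are invented for illustration and are not x86. The loop processes one instruction at a time, so it shows the sequence of steps rather than the overlap a real pipeline achieves.

```python
def run(program, registers):
    """Execute a list of (opcode, dest, src_a, src_b) tuples on a register dict."""
    program_counter = 0
    while program_counter < len(program):
        # 1. Fetch the next instruction from the address in the program counter.
        instruction_register = program[program_counter]
        # 2. Decode the instruction and increment the program counter.
        opcode, dest, src_a, src_b = instruction_register
        program_counter += 1
        # 3. Execute the instruction currently in the instruction register.
        if opcode == "add":
            result = registers[src_a] + registers[src_b]
        elif opcode == "sub":
            result = registers[src_a] - registers[src_b]
        else:
            raise ValueError(f"unknown opcode: {opcode}")
        # 4. Write the result back into the destination register.
        registers[dest] = result
    return registers


program = [("add", "r0", "r1", "r2"), ("sub", "r1", "r0", "r2")]
print(run(program, {"r0": 0, "r1": 5, "r2": 7}))  # {'r0': 12, 'r1': 5, 'r2': 7}
```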
Typical processor architectures split the pipeline into segments that perform those basic steps: the
“front end” of the microprocessor, the execution engine, and the retire unit, as shown in Figure 4. The
front end fetches the instruction and decodes it into smaller instructions (commonly referred to as
micro-ops). These decoded instructions are sent to one of the three types of execution units (integer,
load/store, or floating point) to be executed. Finally, the instruction is retired and the result is written
back to its destination register.
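As a rough sketch of the front end and execution engine described above, the following Python code decodes one instruction into micro-ops and routes each micro-op to one of the three execution unit types. The instruction tuples, the `tmp` register, and the opcode-to-unit table are hypothetical simplifications; real x86 decoders and schedulers are far more elaborate.

```python
# Hypothetical mapping from micro-op type to execution unit type.
UNIT_FOR_MICRO_OP = {
    "load": "load/store",
    "store": "load/store",
    "add": "integer",
    "fadd": "floating point",
}


def decode(instruction):
    """Front end: split one (opcode, dest, src) instruction into micro-ops."""
    opcode, dest, src = instruction
    micro_ops = []
    if src.startswith("["):
        # A memory source operand becomes a separate load into a temporary.
        micro_ops.append(("load", "tmp", src))
        src = "tmp"
    micro_ops.append((opcode, dest, src))
    return micro_ops


def dispatch(micro_ops):
    """Execution engine: pair each micro-op with the unit type that runs it."""
    return [(UNIT_FOR_MICRO_OP[op[0]], op) for op in micro_ops]


for unit, micro_op in dispatch(decode(("add", "eax", "[0x10]"))):
    print(unit, micro_op)
```

Here a single add with a memory operand becomes a load micro-op for the load/store unit followed by an add micro-op for the integer unit.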
Figure 4. Basic 4-stage pipeline schematic