Intel NetBurst User Manual

A Detailed Look Inside the
Intel® NetBurst™ Micro-Architecture of
the Intel Pentium® 4 Processor
November, 2000
Information in this document is provided in connection with Intel products. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document. Except as provided in Intel’s Terms and Conditions of Sale for such products, Intel assumes no liability whatsoever, and Intel disclaims any express or implied warranty, relating to sale and/or use of Intel products including liability or warranties relating to fitness for a particular purpose, merchantability, or infringement of any patent, copyright or other intellectual property right. Intel products are not intended for use in medical, life saving, or life sustaining applications.
Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked “reserved” or “undefined.”
Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.
The Intel® Pentium® 4 processor may contain design defects or errors known as errata. Current characterized errata are available on request.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an ordering number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725 or by visiting Intel’s Website at http://www.intel.com. Copyright © 2000 Intel Corporation.
* Third-party brands and names are the property of their respective owners.
Revision History
Revision Date    Revision    Major Changes
11/2000          1.0         Release
Table of Contents
ABOUT THIS DOCUMENT
INTRODUCTION
SIMD TECHNOLOGY AND STREAMING SIMD EXTENSIONS 2
Summary of SIMD Technologies
INTEL® NETBURST™ MICRO-ARCHITECTURE
The Design Considerations of the Intel NetBurst Micro-architecture
Overview of the Intel NetBurst Micro-architecture Pipeline
The Front End
The Out-of-order Core
Retirement
Front End Pipeline Detail
Prefetching
Decoder
Execution Trace Cache
Branch Prediction
Branch Hints
Execution Core Detail
Instruction Latency and Throughput
Execution Units and Issue Ports
Caches
Data Prefetch
Loads and Stores
Store Forwarding
About this Document
The Intel® NetBurst™ micro-architecture is the foundation for the Intel® Pentium® 4 processor. It includes several important new features and innovations that will allow the Intel Pentium 4 processor and future IA-32 processors to deliver industry-leading performance for the next several years. This paper provides an in-depth examination of the features and functions of the Intel NetBurst micro-architecture.
Introduction
The Intel® Pentium® 4 processor, utilizing the Intel® NetBurst™ micro-architecture, is a complete processor redesign that delivers new technologies and capabilities while advancing many of the innovative features, such as “out-of-order speculative execution” and “super-scalar execution”, introduced on prior Intel® micro-architectural generations. Many of these new innovations and advances were made possible with the improvements in processor technology, process technology and circuit design and could not previously be implemented in high-volume, manufacturable solutions. The features and resulting benefits of the new micro-architecture are defined in the following sections.
This paper begins with a brief introduction of three generations of single-instruction, multiple-data (SIMD) technology. The rest of this paper describes the principles of operation behind the innovations of the Intel Pentium 4 processor with respect to the Intel NetBurst micro-architecture, and the implementation characteristics of the Pentium 4 processor.
SIMD Technology and Streaming SIMD Extensions 2
One way to increase processor performance is to execute several computations in parallel, so that multiple computations are done with a single instruction. The way to achieve this type of parallel execution is to use the single-instruction, multiple-data (SIMD) computation technique.
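The idea of one instruction carrying out several computations can be sketched in portable C. This is only a scalar model for illustration: the type `packed4f`, the function `packed_add`, and the choice of four single-precision elements with addition as the operation are assumptions made here, not names or definitions from this paper. On SIMD hardware the loop body would correspond to a single instruction applied to all elements at once.

```c
#include <stddef.h>

#define LANES 4 /* assumed element count per packed operand */

/* Hypothetical type: one operand holding LANES packed elements. */
typedef struct {
    float lane[LANES];
} packed4f;

/*
 * Scalar model of a single SIMD add. The same operation is applied to
 * each corresponding pair of elements, and the result is again a set
 * of LANES packed elements.
 */
static packed4f packed_add(packed4f x, packed4f y)
{
    packed4f r;
    for (size_t i = 0; i < LANES; i++) {
        r.lane[i] = x.lane[i] + y.lane[i];
    }
    return r;
}
```

Calling `packed_add` on two operands produces all four results in one call, which is the programming-model view of what a SIMD instruction provides. The loop exists only because this sketch avoids processor-specific instructions.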
Figure 1 shows a typical SIMD computation. Here two sets of four packed data elements (X1, X2, X3, and X4, and Y1, Y2, Y3, and Y4) are operated on in parallel, with the same operation being performed on each corresponding pair of data elements (X1 and Y1, X2 and Y2, X3 and Y3, and X4 and Y4). The results of the four parallel computations are a set of four packed data elements.
Figure 1 Typical SIMD Operations
    X4          X3          X2          X1
    Y4          Y3          Y2          Y1
    op          op          op          op
    X4 op Y4    X3 op Y3    X2 op Y2    X1 op Y1
SIMD computations like those shown in Figure 1 were introduced into the Intel IA-32 architecture with the Intel MMX™ technology. The Intel MMX technology allows SIMD computations to be performed on packed byte, word, and doubleword integers that are contained in a set of eight 64-bit registers called the MMX registers (see Figure 2). The Pentium III processor extended this initial SIMD computation model with the introduction of the Streaming SIMD Extensions (SSE). The Streaming SIMD Extensions allow SIMD computations to be performed on operands that contain four packed single-precision floating-point data elements. The operands can be either in memory or in a set of eight 128-bit registers called the XMM registers (see Figure 2). The SSE also extended SIMD computational capability with additional 64-bit MMX instructions.
The Pentium 4 processor further extends the SIMD computation model with the introduction of the Streaming SIMD Extensions 2 (SSE2). The SSE2 extensions also work with operands in either memory or in the XMM registers. The SSE2 extends SIMD
Figure 2 Registers available to SIMD Instructions
    64-bit MMX™ registers: MM7 MM6 MM5 MM4 MM3 MM2 MM1 MM0
    128-bit XMM registers: XMM7 XMM6 XMM5 XMM4 XMM3 XMM2 XMM1 XMM0
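The register sizes in Figure 2 determine how many packed elements each register holds. The following C sketch works through that arithmetic. The names `mmx_reg_model`, `xmm_reg_model`, and `packed_lanes` are hypothetical and introduced here for illustration only. It also assumes the usual IA-32 sizes of 8 bits for a byte, 16 for a word, and 32 for a doubleword or a single-precision value.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of one 64-bit MMX register: the same 8 bytes of
 * storage viewed as packed bytes, packed words, or packed doublewords.
 */
typedef union {
    uint8_t  bytes[8];
    uint16_t words[4];
    uint32_t dwords[2];
} mmx_reg_model;

/*
 * Hypothetical model of one 128-bit XMM register as SSE uses it:
 * four packed single-precision floating-point elements.
 */
typedef struct {
    float singles[4];
} xmm_reg_model;

/* Number of elements of elem_bits each that fit in reg_bits. */
static size_t packed_lanes(size_t reg_bits, size_t elem_bits)
{
    return reg_bits / elem_bits;
}
```

Under these assumptions, `packed_lanes(64, 8)`, `packed_lanes(64, 16)`, and `packed_lanes(64, 32)` give the element counts for the three MMX integer formats, and `packed_lanes(128, 32)` gives the four single-precision elements of an SSE operand.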