NOTE: The Pentium® Pro Family Developer’s Manual consists of three
books: Specifications, Order Number 242690; Programmer’s Reference
Manual, Order Number 242691; and the Operating System Writer’s Guide,
Order Number 242692.
Please refer to all three volumes when evaluating your design needs.
1996
PATENT NOTICE
Through its investment in computer technology, Intel Corporation (Intel) has acquired numerous
proprietary rights, including patents issued by the U.S. Patent and Trademark Office. Intel has
patents covering the use or implementation of processors in combination with other products,
e.g., certain computer systems. System and method patents or pending patents, of Intel and
others, may apply to these systems. A separate license may be required for their use (see Intel
Terms and Conditions for details). Specific Intel patents include U.S. patent 4,972,338.
Information in this document is provided in connection with Intel products. Intel assumes no liability whatsoever,
including infringement of any patent or copyright, for sale and use of Intel products except as provided in Intel’s Terms
and Conditions of Sale for such products.
No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted herein.
Intel retains the right to make changes to these specifications at any time, without notice. Microcomputer Products
may have minor variations to this specification known as errata.
*Other brands and names are the property of their respective owners.
†Since publication of documents referenced in this document, registration of the Pentium, OverDrive and iCOMP
trademarks has been issued to Intel Corporation.
Contact your local Intel sales office or your distributor to obtain the latest specifications before placing your product
order.
Copies of documents which have an ordering number and are referenced in this document, or other Intel literature,
may be obtained from:
Intel Corporation
P.O. Box 7641
Mt. Prospect, IL 60056-7641
The Pentium® Pro microprocessor is the next generation in the Intel386™, Intel486™, and Pentium family of processors. The Pentium Pro processor implements a Dynamic Execution microarchitecture, a unique combination of multiple branch prediction, data flow analysis, and
speculative execution, while maintaining binary compatibility with the 8086/88, 80286,
Intel386, Intel486, and Pentium processors. The Pentium Pro processor integrates the second
level cache, the APIC, and the memory bus controller found in previous Intel processor families
into a single component, as shown in Figure 1-1.
[Figure 1-1 labels: earlier Intel processors with separate Bus Controller, Cache Controller, Cache SRAMs, and APIC; Pentium Pro Processor with L2 Cache, Bus Interface Unit, and APIC]
Figure 1-1. The Pentium® Pro Processor Integrating the CPU, L2 Cache, APIC and Bus
Controller
A significant new feature of the Pentium Pro processor, from a system perspective, is the built-in direct multi-processing support. In order to achieve multi-processing for up to four processors
and maintain the memory and I/O bandwidth to support them, new system designs are needed
which consider the additional power requirements and signal integrity issues of supporting up
to eight loads on a high speed bus.
The Pentium Pro processor may be upgraded by a future OverDrive® processor and matching
voltage regulator module described in Chapter 17, OverDrive® Processor Socket Specification.
COMPONENT INTRODUCTION
Since increasing clock frequencies and silicon density can complicate system designs, the Pentium
Pro processor integrates several system components which alleviate some of the previous
system requirements. The second level cache, cache controller, and Advanced Programmable
Interrupt Controller (APIC) are some of the components that existed in previous Intel processor
family systems which are integrated into this single component. This integration results in the
Pentium Pro processor bus more closely resembling a symmetric multi-processing (SMP) system bus rather than a previous generation processor-to-cache bus. This added level of integration
and improved performance results in higher power consumption and a new bus technology. This
means it is more important than ever to ensure adherence to the specifications contained in this
document.
1.1.BUS FEATURES
The design of the external Pentium Pro processor bus enables it to be “multiprocessor ready.”
Bus arbitration and control, cache coherency circuitry, an MP interrupt controller and other system-level functions are integrated into the bus interface.
To relax timing constraints, the Pentium Pro processor implements a synchronous, latched bus
protocol to enable a full clock cycle for signal transmission and a full clock cycle for signal interpretation and generation. This latched protocol simplifies interconnect timing requirements
and supports higher frequency system designs using inexpensive ASIC interconnect technology.
The Pentium Pro processor bus uses low-voltage-swing GTL+ I/O buffers, making
high-frequency signal communication easier.
All output pins are actually implemented in the Pentium Pro processor as I/O buffers. This buffer
design complies with IEEE 1149.1 Boundary Scan Specification, allowing all pins to be sampled and tested. An output-only buffer is used only for TDO, which is not sampled in the boundary scan chain. A pin is an output pin when it is not an input for normal operation or FRC.
Most of the Pentium Pro processor cache protocol complexity is handled by the processor. A
non-caching I/O bridge on the Pentium Pro processor bus does not need to recognize the cache
protocol and does not need snoop logic. The I/O bridge can issue standard memory accesses on
the Pentium Pro processor bus, which are transparently snooped by all Pentium Pro processor
bus agents. If data is modified in a Pentium Pro processor cache, the processor transparently provides data on the bus, instead of the memory controller. This functionality eliminates the need
for a back-off capability that existing I/O bridges require to enable cache writeback cycles. The
memory controller must observe snoop response signals driven by the Pentium Pro processor
bus agents, absorb writeback data on a modified hit, and merge any write data.
The Pentium Pro processor integrates memory type range registers (MTRRs) to replace the external address decode logic used to decode cacheability attributes.
The Pentium Pro processor bus protocol enables a near linear increase in system performance
with an increase in the number of processors. The Pentium Pro processor interfaces to a multiprocessor system without any support logic. This “glueless” interface enables a desktop system
to be built with an upgrade socket for another Pentium Pro processor.
The external Pentium Pro processor bus and Pentium Pro processor use a ratio clock design that
provides modularity and an upgrade path. The processor internal clock frequency is an n/2 multiple of the bus clock frequency, where n is an integer equal to or greater than 4, but only certain
bus and processor frequency combinations are supported. Additional combinations are reserved
by this specification to provide future upgrade paths. See Section 9.2., “Clock Frequencies and
Ratios” for the bus and processor frequencies and combinations.
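The n/2 relationship can be illustrated with a short calculation. The following is a minimal sketch only: the function name and the example frequencies are hypothetical, and the sketch does not check whether a combination is one of the supported ones listed in Section 9.2.

```python
from fractions import Fraction

def core_clock_mhz(bus_mhz, n):
    """Return the internal clock as an n/2 multiple of the bus clock.

    n must be an integer >= 4, as stated in the text. Whether a given
    bus/processor pair is actually supported is NOT checked here.
    """
    if not isinstance(n, int) or n < 4:
        raise ValueError("n must be an integer >= 4")
    return Fraction(bus_mhz) * Fraction(n, 2)

# Hypothetical values, chosen only to show the arithmetic.
print(core_clock_mhz(50, 5))   # 125
print(core_clock_mhz(60, 4))   # 120
```

Exact fractions are used so that half-integer ratios such as 5/2 do not pick up floating-point rounding.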
The ratio clock approach reduces the tight coupling between the processor clock and the external
bus clock. For a fixed system bus clock frequency, Pentium Pro processors introduced later with
higher processor clock frequencies can use the same support chip-set at the same bus frequency.
An investment in a Pentium Pro processor chip-set is protected for a longer time and for a greater
range of processor frequencies. The ratio clock approach also preserves system modularity, allowing the system electrical topology to determine the system bus clock frequency while process technology can determine the processor clock frequency.
The Pentium Pro processor bus architecture provides a number of features to support high reliability and high availability designs. Most of these additional features can be disabled, if necessary. For example, the bus architecture allows the data bus to be unprotected or protected with
an error correcting code (ECC). Error detection and limited recovery are built into the bus
protocol.
A Pentium Pro processor bus can contain up to four Pentium Pro processors, and a combination
of four other loads consisting primarily of bus clusters, memory controllers, I/O bridges, and
custom attachments.
In a four-processor system, the data bus is the most critical resource. To account for this situation, the Pentium Pro processor bus implements several features to maximize available bus
bandwidth, including pipelined transactions in which bus transactions in different phases overlap, an increase in transaction pipeline depth over previous generations, and support for deferring a transaction for later completion.
The Pentium Pro processor bus architecture is therefore adaptable to various classes of systems.
In desktop multiprocessor systems, a subset of the bus features can be used. In server designs,
the Pentium Pro processor bus provides an entry into low-end multiprocessing, offering linear
increases in performance as CPUs are added. This scaling allows Pentium Pro
processor systems to be superior for applications that would otherwise indicate a downsized
solution.
1.2.BUS DESCRIPTION
The Pentium Pro processor bus is a demultiplexed bus with a 64-bit data path and a 36-bit
address path. This section provides more details on the bus features introduced in the preceding
section:
• Ease of system design
• Efficient bus utilization
• Multiprocessor ready
• Data integrity
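As a quick worked example of what the 64-bit data path and 36-bit address path stated above imply, the sketch below computes the addressable range and the bytes moved per data transfer. It assumes that each 36-bit address value names one byte; the constant names are illustrative only.

```python
ADDRESS_BITS = 36   # address path width, from the text
DATA_BITS = 64      # data path width, from the text

# Assumption: byte addressing across the full 36-bit range.
addressable_bytes = 1 << ADDRESS_BITS
bytes_per_transfer = DATA_BITS // 8

print(addressable_bytes // 2**30, "GiB addressable")    # 64 GiB addressable
print(bytes_per_transfer, "bytes per data transfer")    # 8 bytes per data transfer
```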
1.2.1.System Design Aspects
The Pentium Pro processor bus clock and the Pentium Pro processor internal execution clock
run at different frequencies, related by a ratio. Section 9.2., “Clock Frequencies and Ratios” provides more information about bus frequency and processor frequency.
The Pentium Pro processor bus uses GTL+. The GTL+ low voltage swing reduces both power
consumption and electromagnetic interference (EMI). The low voltage swing GTL+ I/O buffers
also enable direct drive by ASICs and make high-frequency signal communication easier and
cheaper to implement.
The Pentium Pro processor bus is a synchronous, latched bus. The bus protocol latches all inputs
on the bus clock rising edge; the latched values are used internally in the following cycle. The Pentium Pro
processor and other bus agents drive outputs on the bus clock rising edge. The bus protocol
therefore provides a full cycle for signal transmission, and an agent also has a full clock period
to determine its output.
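The timing consequence of the latched protocol can be sketched with a small cycle-level model. This is an illustrative simplification, not the bus protocol itself: it models a single hypothetical responder, and the names `simulate`, `requests`, and `respond` are invented for the example. A value driven on one rising edge is latched on the next edge and answered on the edge after that.

```python
def simulate(requests, cycles, respond=lambda v: v + 1):
    """Model one responder on a synchronous, latched bus.

    requests maps a clock number to the value a requester drives on that
    rising edge. Returns (clock, value) pairs for the responder's outputs.
    """
    wire = None      # value driven on the bus during the current clock
    latched = None   # value the responder sampled at the last rising edge
    driven = []
    for clk in range(cycles):
        # 1. Responder drives the result computed from the previous latch.
        if latched is not None:
            driven.append((clk, respond(latched)))
        # 2. Responder latches what was on the wire during the previous clock.
        latched = wire
        # 3. Requester drives a new value (or nothing) for this clock.
        wire = requests.get(clk)
    return driven

print(simulate({0: 10, 3: 20}, 8))   # [(2, 11), (5, 21)]
```

Each request is answered two clocks after it is driven: one full clock for transmission and one full clock for the responder to determine its output.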
1.2.2.Efficient Bus Utilization
The Pentium Pro processor bus supports multiple outstanding bus transactions. The transaction
pipeline depth is limited to the smallest depth supported by any agent (processors, memory, or
I/O). The Pentium Pro processor bus can be configured at power-on to support a maximum of
eight outstanding bus transactions depending on the amount of buffering available in the system.
Each Pentium Pro processor is capable of issuing up to four outstanding transactions.
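The depth rule above amounts to taking a minimum. The sketch below assumes each agent reports the depth it supports as an integer; the function name and the example depths are hypothetical.

```python
BUS_MAX_OUTSTANDING = 8         # bus-wide maximum, from the text
PROCESSOR_MAX_OUTSTANDING = 4   # per-processor maximum, from the text

def effective_pipeline_depth(agent_depths):
    """Return the usable depth: the smallest depth supported by any agent,
    never more than the bus-wide maximum."""
    if not agent_depths:
        raise ValueError("at least one agent is required")
    return min(BUS_MAX_OUTSTANDING, *agent_depths)

# Hypothetical system: two deep agents and one I/O bridge that buffers one transaction.
print(effective_pipeline_depth([8, 8, 1]))   # 1
print(effective_pipeline_depth([8, 16]))     # 8
```

A single shallow agent limits the whole bus, which is why the depth is tied to the buffering available in the system.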
The Pentium Pro processor bus enables transactions with long latencies to be completed at a later time using separate deferred reply transactions. The same Pentium Pro processor bus agent or
other Pentium Pro processor bus agents can continue with subsequent reads and writes while a
slow agent is processing an outstanding request.
1.2.3.Multiprocessor Ready
The Pentium Pro processor bus enables multiple Pentium Pro processors to operate on one bus,
with no external support logic. The Pentium Pro processor requires no separate snoop generation
logic. The processor I/O buffers can drive the Pentium Pro processor bus in an MP system.
The Pentium Pro processors and bus support a MESI cache protocol in the internal caches. The
cache protocol enables direct cache-to-cache line transfers with memory reflection.
The Pentium Pro processors and bus support fair, symmetric, round-robin bus arbitration that
minimizes overhead associated with bus ownership exchange. An I/O agent may generate a high
priority bus request.
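A round-robin choice among symmetric agents, with a high priority request taking precedence, might be sketched as follows. This is an assumption-laden illustration: the function and parameter names are hypothetical, and the signal-level arbitration protocol is not modeled.

```python
def next_owner(last_owner, requests, num_agents=4, priority_request=False):
    """Pick the next bus owner.

    last_owner: agent number that most recently owned the bus.
    requests: set of symmetric agent numbers currently requesting the bus.
    priority_request: True when the priority (I/O) agent requests the bus.
    """
    if priority_request:
        return "priority"
    # Scan agents in rotating order, starting just after the last owner,
    # so that the last owner has the lowest priority for the next grant.
    for offset in range(1, num_agents + 1):
        candidate = (last_owner + offset) % num_agents
        if candidate in requests:
            return candidate
    return None   # nobody is requesting

print(next_owner(1, {0, 1, 3}))                       # 3
print(next_owner(3, {3}))                             # 3
print(next_owner(0, {1, 2}, priority_request=True))   # priority
```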
1.2.4.Data Integrity
The Pentium Pro processor bus provides parity signals for address, request, and response signals. The bus protocol supports retrying bus requests.
The Pentium Pro processor bus supports error correcting code (ECC) on the data bus and has
correction capability at the receiver.
The Pentium Pro processor supports functional redundancy checking (FRC), similar to that of
the Pentium processor. FRC support enables the Pentium Pro processor to be used in high data-integrity, fault-tolerant applications. In addition, two Pentium Pro processors can be configured
at power-on as an FRC pair or a multiprocessor-ready pair.
1.3.SYSTEM OVERVIEW
Figure 1-2 illustrates the Pentium Pro processor system environment, containing multiple processors (MP), memory, and I/O. This particular architectural view is not intended to imply any
implementation trade-offs.
[Figure 1-2 labels: Pentium Pro Processor Agents 0 through 3; System Interface; High Speed I/O Interface; Memory Interface]
Figure 1-2. Pentium® Pro Processor System Interface Block Diagram
Up to four Pentium Pro processors can be gluelessly interconnected on the Pentium Pro processor bus. These agents are bus masters, capable of supporting all the features described in this
document. The interface to the remainder of the system is represented by the high-speed I/O interface and memory interface blocks. The memory interface block represents a path to system
memory capable of supporting over 500 Mbytes/second data bandwidth. The high-speed I/O interface block provides a fast path to system I/O. Various implementations of these two blocks
can provide different cost vs. performance trade-offs. For example, more than one memory interface or high-speed I/O interface may be included.
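The 500 Mbytes/second figure can be related to the 64-bit data path with simple arithmetic. The sketch below assumes, purely for illustration, at most one 64-bit transfer per bus clock and 1 Mbyte = 10^6 bytes; the function names are hypothetical and actual bus frequencies are given in Section 9.2.

```python
DATA_PATH_BYTES = 64 // 8   # 64-bit data path, from Section 1.2

def peak_bandwidth_mbytes(bus_mhz):
    """Peak Mbytes/second, assuming one full-width transfer per bus clock."""
    return DATA_PATH_BYTES * bus_mhz

def min_bus_mhz(target_mbytes):
    """Bus clock needed to reach a target bandwidth under the same assumption."""
    return target_mbytes / DATA_PATH_BYTES

print(min_bus_mhz(500))            # 62.5
print(peak_bandwidth_mbytes(64))   # 512
```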
An MP system containing more than four Pentium Pro processors can be created based on clusters that each contain four processors. Such a system can use cluster controllers that connect
Pentium Pro processor buses to a global memory bus. The Pentium Pro processor bus provides
appropriate protocol support for building external caches and memory directory-based systems.
1.4.TERMINOLOGY CLARIFICATION
Some key definitions and concepts are introduced here to aid the understanding of this
document.
A ‘#’ symbol after a signal name refers to an active low signal. This means that a signal is in
the active state (based on the name of the signal) when driven low. For example, when FLUSH#
is low a flush has been requested. When NMI is high, a non-maskable interrupt has occurred.
In the case of lines where the name does not imply an active state but describes part of a binary
sequence (such as address or data), the ‘#’ symbol implies that the signal is inverted. For
example, D[3:0] = ‘HLHL’ refers to a hex ‘A’, and D#[3:0] = ‘LHLH’ also refers to a hex
‘A’. (H = High logic level, L = Low logic level.)
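The D[3:0] and D#[3:0] example can be checked mechanically. The helper below is a hypothetical illustration, not part of the specification; it reads a string of H/L levels, most significant bit first, and inverts each bit when the signal group carries the ‘#’ suffix.

```python
def decode_levels(levels, active_low=False):
    """Convert a string of H/L logic levels (MSB first) to an integer."""
    value = 0
    for level in levels:
        if level not in ("H", "L"):
            raise ValueError(f"unexpected level: {level!r}")
        bit = 1 if level == "H" else 0
        if active_low:
            bit ^= 1   # a '#' signal carries the inverted value
        value = (value << 1) | bit
    return value

print(hex(decode_levels("HLHL")))                    # 0xa
print(hex(decode_levels("LHLH", active_low=True)))   # 0xa
```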
Pentium Pro processor bus agents issue transactions to transfer data and system information.
A bus agent is any device that connects to the processor bus, including the Pentium Pro processors themselves.
This specification refers to several classifications of bus agents.
• Central Agent. Handles reset, hardware configuration and initialization, special transactions,
  and centralized hardware error detection and handling.
• I/O Agent. Interfaces to I/O devices using I/O port addresses. Can be a bus bridge to
  another bus used for I/O devices, such as a PCI bridge.
• Memory Agent. Provides access to main memory.
A particular bus agent can have one or more of several roles in a transaction.
• Requesting Agent. The agent that issues the transaction.
• Addressed Agent. The agent that is addressed by the transaction. Also called the Target
  Agent. A memory or I/O transaction is addressed to the memory or I/O agent that
  recognizes the specified memory or I/O address. A Deferred Reply transaction is addressed
  to the agent that issued the original transaction. Special transactions are considered to be
  issued to the central agent.
• Snooping Agent. A caching bus agent that observes (“snoops”) bus transactions to
  maintain cache coherency.
• Responding Agent. The agent that provides the response on the RS[2:0]# signals to the
  transaction. Typically the addressed agent.
Each transaction has several phases that include some or all of the following phases.
• Arbitration Phase. No transactions can be issued until the bus agent owns the bus. A
  transaction only needs to have this phase if the agent that wants to drive the transaction
  doesn’t already own the bus. Note that there is a distinction between a symmetric bus owner
  and the actual bus owner. The actual bus owner is the one and only bus agent that is
  allowed to drive a transaction at that time. The symmetric bus owner is the bus owner
  unless the priority agent owns the bus.
• Request Phase. This is the phase in which the transaction is actually issued to the bus. The
  request agent drives ADS# and the address in this phase. All transactions must have this
  phase.
• Error Phase. Any errors that occur during the Request Phase are reported in the Error
  Phase. All transactions have this phase (1 clock).
• Snoop Phase. This is the phase in which cache coherency is enforced. All caching agents
  (snoop agents) drive HIT# and HITM# to appropriate values in this phase. All memory
  transactions have this phase.
• Response Phase. The response agent drives the transaction response during this phase.
  The response agent is the target device addressed during the Request Phase unless a
  transaction is deferred for later completion. All transactions have this phase.
• Data Phase. The response agent drives or accepts the transaction data, if there is any. Not
  all transactions have this phase.
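The phase rules listed above can be summarized as a small function that returns the phases a transaction passes through. This is a sketch under stated assumptions: the names are hypothetical, the Snoop Phase is included only for memory transactions (the text guarantees it for memory transactions and does not say here whether other types have it), and the overlap of phases between pipelined transactions is not modeled.

```python
from enum import Enum

class Phase(Enum):
    ARBITRATION = 1
    REQUEST = 2
    ERROR = 3
    SNOOP = 4
    RESPONSE = 5
    DATA = 6

def phases_for(owns_bus, is_memory, has_data):
    """List the phases of one transaction, in the order given in the text."""
    phases = []
    if not owns_bus:
        phases.append(Phase.ARBITRATION)   # only needed to gain ownership
    phases += [Phase.REQUEST, Phase.ERROR]  # every transaction has these
    if is_memory:
        phases.append(Phase.SNOOP)          # assumption: memory transactions only
    phases.append(Phase.RESPONSE)           # every transaction has this
    if has_data:
        phases.append(Phase.DATA)
    return phases

# A memory read issued by an agent that does not yet own the bus.
print([p.name for p in phases_for(owns_bus=False, is_memory=True, has_data=True)])
```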
Other commonly used terms include:
A request initiated data transfer means that the request agent has write data to transfer. A
request initiated data transfer has a request initiated TRDY# assertion.
A response initiated data transfer means that the response agent must provide the read data to
the request agent.
A snoop initiated data transfer means that there was a hit to a modified line during the snoop
phase, and the agent that asserted HITM# is going to drive the modified data to the bus. This is
also called an implicit writeback because every time HITM# is asserted, the addressed memory
agent knows that writeback data will follow. A snoop initiated data transfer has a snoop initiated
TRDY# assertion.
There is a DEFER# signal that is sampled during the Snoop Phase to determine if a transaction
can be guaranteed in-order completion at that time. If the DEFER# signal is asserted, only two
responses are allowed by the bus protocol during the Response Phase: the Deferred Response
or the Retry Response. If the Deferred Response is given, the response agent must later complete
the transaction with a Deferred Reply transaction.
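The DEFER# rule can be expressed as a pair of checks. The sketch below uses hypothetical string labels for the responses; only the two responses named above are taken from the text, and no restriction is modeled for the case where DEFER# is not asserted because this section does not describe one.

```python
# Responses permitted when DEFER# is sampled asserted (from the text).
DEFER_ALLOWED = {"DEFERRED", "RETRY"}

def response_is_allowed(defer_asserted, response):
    """Return True if the response obeys the DEFER# rule."""
    if defer_asserted:
        return response in DEFER_ALLOWED
    return True   # no restriction is described in this section

def needs_deferred_reply(defer_asserted, response):
    """A Deferred Response obliges the response agent to issue a
    Deferred Reply transaction later."""
    return defer_asserted and response == "DEFERRED"

print(response_is_allowed(True, "RETRY"))       # True
print(response_is_allowed(True, "NORMAL"))      # False ("NORMAL" is a placeholder label)
print(needs_deferred_reply(True, "DEFERRED"))   # True
```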
1.5.COMPATIBILITY NOTE
In this document, some register bits are Intel Reserved. When reserved bits are documented,
treat them as fully undefined. This is essential for software compatibility with future processors.
Follow the guidelines below:
1. Do not depend on the states of any undefined bits when testing the values of defined
register bits. Mask them out when testing.
2. Do not depend on the states of any undefined bits when storing them to memory or another
register.
3. Do not depend on the ability to retain information written into any undefined bits.
4. When loading registers, always load the undefined bits as zeros.
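Guidelines 1 and 4 translate directly into bit masking. The following sketch uses a made-up register layout in which only bits 7:0 are defined; the mask value and function names are hypothetical and do not describe any actual Pentium Pro register.

```python
# Hypothetical layout: bits 7:0 are defined, all other bits are reserved.
DEFINED_MASK = 0x0000_00FF

def defined_bits_set(reg_value, field_mask):
    """Guideline 1: test defined bits only; reserved bits are masked out."""
    return reg_value & field_mask & DEFINED_MASK

def value_to_load(new_bits):
    """Guideline 4: load reserved bits as zeros."""
    return new_bits & DEFINED_MASK

raw = 0xFFFF_FF01   # reserved bits happen to read back as ones
print(hex(defined_bits_set(raw, 0x01)))   # 0x1
print(hex(value_to_load(0x1234)))         # 0x34
```

Comparing `raw` directly against 0x01 would fail here, which is the kind of dependence on undefined bits the guidelines warn against.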
CHAPTER 2
PENTIUM® PRO PROCESSOR
ARCHITECTURE OVERVIEW
The Pentium Pro processor has a decoupled, 12-stage, superpipelined implementation, trading
less work per pipestage for more stages. The Pentium Pro processor also has a pipestage time
33 percent less than the Pentium processor, which helps achieve a higher clock rate on any given
process.
The approach used by the Pentium Pro processor removes the constraint of linear instruction sequencing between the traditional “fetch” and “execute” phases, and opens up a wide instruction
window using an instruction pool. This approach allows the “execute” phase of the Pentium Pro
processor to have much more visibility into the program’s instruction stream so that better
scheduling may take place. It requires the instruction “fetch/decode” phase of the Pentium Pro
processor to be much more intelligent in terms of predicting program flow. Optimized scheduling requires the fundamental “execute” phase to be replaced by decoupled “dispatch/execute”
and “retire” phases. This allows instructions to be started in any order but always be completed
in the original program order. The Pentium Pro processor is implemented as three independent
engines coupled with an instruction pool as shown in Figure 2-1.
[Figure 2-1 labels: Fetch/Decode Unit, Dispatch/Execute Unit, Retire Unit, Instruction Pool]
Figure 2-1. Three Engines Communicating Using an Instruction Pool
2.1.FULL CORE UTILIZATION
The three independent-engine approach was taken to more fully utilize the CPU core. Consider
the code fragment in Figure 2-2:
The first instruction in this example is a load of r1 that, at run time, causes a cache miss. A traditional CPU core must wait for its bus interface unit to read this data from main memory and
return it before moving on to instruction 2. This CPU stalls while waiting for this data and is thus
being under-utilized.
To avoid this memory latency problem, the Pentium Pro processor “looks-ahead” into its instruction pool at subsequent instructions and will do useful work rather than be stalled. In the example in Figure 2-2, instruction 2 is not executable since it depends upon the result of instruction
1; however both instructions 3 and 4 are executable. The Pentium Pro processor executes instructions 3 and 4 out-of-order. The results of this out-of-order execution cannot be committed
to permanent machine state (i.e., the programmer-visible registers) immediately since the original program order must be maintained. The results are instead stored back in the instruction
pool awaiting in-order retirement. The core executes instructions depending upon their readiness
to execute, and not on their original program order, and is therefore a true dataflow engine. This
approach has the side effect that instructions are typically executed out-of-order.
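The behavior described above, out-of-order completion with in-order retirement, can be sketched with a toy scheduler. Everything specific in this sketch is an assumption made for illustration: the four-instruction fragment is a hypothetical one consistent with the description (a load of r1 that misses, a dependent instruction, and two independent instructions), the latencies are invented, execution resources are unlimited, and only true data dependencies are tracked.

```python
def schedule(program):
    """program: list of (dest, sources, latency) in program order.

    Returns (done, retired): the clock at which each instruction finishes
    executing and the clock at which it can retire in program order.
    """
    last_writer = {}   # register -> index of the most recent earlier writer
    done, retired = [], []
    for i, (dest, sources, latency) in enumerate(program):
        # An instruction is ready once all of its producers have completed.
        producers = [last_writer[r] for r in sources if r in last_writer]
        start = max((done[p] for p in producers), default=0)
        done.append(start + latency)
        # Retirement is in order: never earlier than the previous instruction.
        previous = retired[-1] if retired else 0
        retired.append(max(done[i], previous))
        last_writer[dest] = i
    return done, retired

program = [
    ("r1", ["r0"], 20),        # 1: load r1, assumed 20-clock cache miss
    ("r2", ["r1", "r2"], 1),   # 2: depends on the result of instruction 1
    ("r5", ["r5"], 1),         # 3: independent
    ("r6", ["r6", "r3"], 1),   # 4: independent
]
done, retired = schedule(program)
print(done)      # [20, 21, 1, 1]
print(retired)   # [20, 21, 21, 21]
```

Instructions 3 and 4 finish at clock 1, long before the load returns, but their results are not retired until instructions 1 and 2 have retired.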
The cache miss on instruction 1 will take many internal clocks, so the Pentium Pro processor
core continues to look ahead for other instructions that could be speculatively executed, and is
typically looking 20 to 30 instructions in front of the instruction pointer. Within this 20 to 30
instruction window there will be, on average, five branches that the fetch/decode unit must correctly predict if the dispatch/execute unit is to do useful work. The sparse register set of an Intel
Architecture (IA) processor will create many false dependencies on registers, so the dispatch/execute unit will rename the IA registers into a larger register set to enable additional forward
progress. The retire unit owns the programmer’s IA register set and results are only committed
to permanent machine state in these registers when it removes completed instructions from the
pool in original program order.
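Register renaming, as mentioned above, removes false dependencies by giving every write its own destination in a larger register set. The sketch below is a generic textbook illustration rather than the Pentium Pro mechanism: the `rename` function, the `p0`, `p1`, ... physical register names, and the three-instruction fragment are all hypothetical.

```python
from itertools import count

def rename(program):
    """program: list of (dest, sources) using architectural register names.

    Returns the same instructions rewritten to physical register names.
    """
    fresh = count()   # supplies unused physical register numbers
    mapping = {}      # architectural name -> current physical register

    def phys(reg):
        # A register read before any write in this fragment gets its own mapping.
        if reg not in mapping:
            mapping[reg] = f"p{next(fresh)}"
        return mapping[reg]

    renamed = []
    for dest, sources in program:
        src = [phys(s) for s in sources]     # read the current mappings first
        mapping[dest] = f"p{next(fresh)}"    # every write gets a new register
        renamed.append((mapping[dest], src))
    return renamed

program = [
    ("eax", ["ebx"]),   # eax = f(ebx)
    ("ecx", ["eax"]),   # ecx = f(eax): true dependency on the first write
    ("eax", ["edx"]),   # eax = f(edx): reuses eax, a false dependency
]
print(rename(program))
# [('p1', ['p0']), ('p2', ['p1']), ('p4', ['p3'])]
```

After renaming, the third instruction writes p4 instead of overwriting p1, so it no longer has to wait for the second instruction to read its source.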
Dynamic Execution technology can be summarized as optimally adjusting instruction execution
by predicting program flow, having the ability to speculatively execute instructions in any
order, and then analyzing the program’s dataflow graph to choose the best order to execute
the instructions.
2.2.THE PENTIUM® PRO PROCESSOR PIPELINE
In order to get a closer look at how the Pentium Pro processor implements Dynamic Execution,
Figure 2-3 shows a block diagram including cache and memory interfaces. The “Units” shown
in Figure 2-3 represent stages of the Pentium Pro processor pipeline.
[Figure 2-3 labels: System Bus; L2 Cache; Bus Interface Unit; L1 ICache; L1 DCache; Fetch, Load, and Store paths; Fetch/Decode Unit; Dispatch/Execute Unit; Retire Unit; Instruction Pool]
Figure 2-3. The Three Core Engines Interface with Memory via Unified Caches