The Pentium® Pro Family Developer’s Manual consists of three books: Specifications, Order Number
242690; Programmer’s Reference Manual, Order Number 242691; and the Operating System Writer’s
Guide, Order Number 242692. Please refer to all three volumes when evaluating your design needs.
1996
Information in this document is provided in connection with Intel products. No license, express or implied, by estoppel or
otherwise, to any intellectual property rights is granted by this document. Except as provided in Intel's Terms and Conditions
of Sale for such products, Intel assumes no liability whatsoever, and Intel disclaims any express or implied warranty, relating
to sale and/or use of Intel products including liability or warranties relating to fitness for a particular purpose, merchantability,
or infringement of any patent, copyright or other intellectual property right. Intel products are not intended for use in medical,
life saving, or life sustaining applications.
Intel may make changes to specifications and product descriptions at any time, without notice.
Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined."
Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising
from future changes to them.
The Pentium® Pro processor may contain design defects or errors known as errata which may cause the product to deviate
from published specifications. Such errata are not covered by Intel’s warranty. Current characterized errata are available on
request.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product
order.
Copies of documents which have an ordering number and are referenced in this document, or other Intel literature, may be
obtained from:
Intel Corporation
P.O. Box 7641
Mt. Prospect IL 60056-7641
or call 1-800-879-4683
or visit Intel’s website at http://www.intel.com
The Pentium® Pro microprocessor is the next generation in the Intel386™, Intel486™, and Pentium
family of processors. The Pentium Pro processor implements a Dynamic Execution microarchitecture,
a unique combination of multiple branch prediction, data flow analysis, and speculative execution,
while maintaining binary compatibility with the 8086/88, 80286, Intel386, Intel486, and Pentium
processors. The Pentium Pro processor integrates the second level cache, the APIC, and the memory
bus controller found in previous Intel processor families into a single component, as shown in
Figure 1-1.
Figure 1-1. The Pentium® Pro Processor Integrating the CPU, L2 Cache, APIC and Bus Controller
A significant new feature of the Pentium Pro processor, from a system perspective, is the built-in
direct multi-processing support. In order to achieve multi-processing for up to four processors
and maintain the memory and I/O bandwidth to support them, new system designs are needed
which consider the additional power requirements and signal integrity issues of supporting up
to eight loads on a high speed bus.
The Pentium Pro processor may be upgraded by a future OverDrive® processor and matching
voltage regulator module described in Chapter 17, OverDrive® Processor Socket Specification.
COMPONENT INTRODUCTION
Since increasing clock frequencies and silicon density can complicate system designs, the Pentium
Pro processor integrates several system components which alleviate some of the previous
system requirements. The second level cache, cache controller, and Advanced Programmable
Interrupt Controller (APIC) are some of the components that existed in previous Intel processor
family systems which are integrated into this single component. This integration results in the
Pentium Pro processor bus more closely resembling a symmetric multi-processing (SMP) system
bus rather than a previous generation processor-to-cache bus. This added level of integration
and improved performance results in higher power consumption and a new bus technology. This
means it is more important than ever to ensure adherence to the specifications contained in this
document.
The Pentium Pro processor may contain design defects or errors known as errata. Current characterized errata are available upon request.
1.1. BUS FEATURES
The design of the external Pentium Pro processor bus enables it to be “multiprocessor ready.”
Bus arbitration and control, cache coherency circuitry, an MP interrupt controller and other system-level functions are integrated into the bus interface.
To relax timing constraints, the Pentium Pro processor implements a synchronous, latched bus
protocol to enable a full clock cycle for signal transmission and a full clock cycle for signal interpretation and generation. This latched protocol simplifies interconnect timing requirements
and supports higher frequency system designs using inexpensive ASIC interconnect technology.
The Pentium Pro processor bus uses low-voltage-swing GTL+ I/O buffers, making
high-frequency signal communication easier.
All output pins are actually implemented in the Pentium Pro processor as I/O buffers. This buffer
design complies with IEEE 1149.1 Boundary Scan Specification, allowing all pins to be sampled and tested. An output only buffer is used only for TDO, which is not sampled in the boundary scan chain. A pin is an output pin when it is not an input for normal operation or FRC.
Most of the Pentium Pro processor cache protocol complexity is handled by the processor. A
non-caching I/O bridge on the Pentium Pro processor bus does not need to recognize the cache
protocol and does not need snoop logic. The I/O bridge can issue standard memory accesses on
the Pentium Pro processor bus, which are transparently snooped by all Pentium Pro processor
bus agents. If data is modified in a Pentium Pro processor cache, the processor transparently provides data on the bus, instead of the memory controller. This functionality eliminates the need
for a back-off capability that existing I/O bridges require to enable cache writeback cycles. The
memory controller must observe snoop response signals driven by the Pentium Pro processor
bus agents, absorb writeback data on a modified hit, and merge any write data.
The Pentium Pro processor integrates memory type range registers (MTRRs) to replace the external address decode logic used to decode cacheability attributes.
The Pentium Pro processor bus protocol enables a near linear increase in system performance
with an increase in the number of processors. The Pentium Pro processor interfaces to a multiprocessor system without any support logic. This “glueless” interface enables a desktop system
to be built with an upgrade socket for another Pentium Pro processor.
The external Pentium Pro processor bus and Pentium Pro processor use a ratio clock design that
provides modularity and an upgrade path. The processor internal clock frequency is an n/2 multiple of the bus clock frequency where n is an integer equal to or greater than 4 but only certain
The ratio clock approach reduces the tight coupling between the processor clock and the external
bus clock. For a fixed system bus clock frequency, Pentium Pro processors introduced later with
higher processor clock frequencies can use the same support chip-set at the same bus frequency.
An investment in a Pentium Pro processor chip-set is protected for a longer time and for a greater
range of processor frequencies. The ratio clock approach also preserves system modularity, allowing
the system electrical topology to determine the system bus clock frequency while process
technology can determine the processor clock frequency.
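The n/2 relationship between the bus clock and the processor clock can be illustrated with a short calculation. This is a minimal sketch only: the function name and the 60 MHz bus clock are assumptions chosen for illustration, and the sketch does not know which values of n a given processor actually supports.

```python
from fractions import Fraction

def core_clock_mhz(bus_clock_mhz, n):
    """Processor clock for an n/2 ratio, where n is an integer >= 4 (per the text)."""
    if not isinstance(n, int) or n < 4:
        raise ValueError("n must be an integer equal to or greater than 4")
    # Exact arithmetic avoids rounding surprises with half-integer ratios.
    return Fraction(bus_clock_mhz) * Fraction(n, 2)

# Hypothetical 60 MHz bus clock; which ratios are supported is not modeled.
for n in (4, 5, 6):
    print(f"ratio {n}/2 -> {float(core_clock_mhz(60, n)):.0f} MHz processor clock")
```

With the bus clock held fixed, only n changes between processor speed grades, which is the modularity argument made above.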
The Pentium Pro processor bus architecture provides a number of features to support high reliability and high availability designs. Most of these additional features can be disabled, if necessary. For example, the bus architecture allows the data bus to be unprotected or protected with
an error correcting code (ECC). Error detection and limited recovery are built into the bus
protocol.
A Pentium Pro processor bus can contain up to four Pentium Pro processors, and a combination
of four other loads consisting primarily of bus clusters, memory controllers, I/O bridges, and
custom attachments.
In a four-processor system, the data bus is the most critical resource. To account for this situation,
the Pentium Pro processor bus implements several features to maximize available bus
bandwidth, including pipelined transactions in which bus transactions in different phases overlap,
an increase in transaction pipeline depth over previous generations, and support for deferring
a transaction for later completion.
The Pentium Pro processor bus architecture is therefore adaptable to various classes of systems.
In desktop multiprocessor systems, a subset of the bus features can be used. In server designs,
the Pentium Pro processor bus provides an entry into low-end multiprocessing, offering linear
increases in performance as CPUs are added. Scaling performance upward in this way allows
Pentium Pro processor systems to be superior for applications that would otherwise indicate a
downsized solution.
1.2. BUS DESCRIPTION
The Pentium Pro processor bus is a demultiplexed bus with a 64-bit data path and a 36-bit
address path. This section provides more details on the bus features introduced in the preceding
section:
• Ease of system design
• Efficient bus utilization
• Multiprocessor ready
• Data integrity
1.2.1. System Design Aspects
The Pentium Pro processor bus clock and the Pentium Pro processor internal execution clock
run at different frequencies, related by a ratio. Section 9.2., “Clock Frequencies and Ratios” provides
more information about bus frequency and processor frequency.
The Pentium Pro processor bus uses GTL+. The GTL+ low voltage swing reduces both power
consumption and electromagnetic interference (EMI). The low voltage swing GTL+ I/O buffers
also enable direct drive by ASICs and make high-frequency signal communication easier and
cheaper to implement.
The Pentium Pro processor bus is a synchronous, latched bus. The bus protocol latches all inputs
on the bus clock rising edge; the latched values are used internally in the following cycle. The
Pentium Pro processor and other bus agents drive outputs on the bus clock rising edge. The bus
protocol therefore provides a full cycle for signal transmission, and an agent also has a full clock
period to determine its output.
1.2.2. Efficient Bus Utilization
The Pentium Pro processor bus supports multiple outstanding bus transactions. The transaction
pipeline depth is limited to the smallest depth supported by any agent (processors, memory, or
I/O). The Pentium Pro processor bus can be configured at power-on to support a maximum of
eight outstanding bus transactions depending on the amount of buffering available in the system.
Each Pentium Pro processor is capable of issuing up to four outstanding transactions.
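The depth rule in the paragraph above can be written out as a small calculation. This is a sketch under stated assumptions: the agent names and their supported depths are hypothetical, and only the limits of eight (bus) and four (per processor) come from the text.

```python
BUS_MAX_OUTSTANDING = 8        # bus-wide maximum stated in the text
PROCESSOR_MAX_OUTSTANDING = 4  # per-processor maximum stated in the text

def effective_pipeline_depth(agent_depths):
    """Depth is limited by the shallowest agent and by the bus maximum."""
    if not agent_depths:
        raise ValueError("at least one bus agent is required")
    return min(min(agent_depths.values()), BUS_MAX_OUTSTANDING)

def processor_may_issue(own_outstanding, bus_outstanding, depth):
    """A processor may issue only while both limits leave room."""
    return own_outstanding < PROCESSOR_MAX_OUTSTANDING and bus_outstanding < depth

# Hypothetical system: the I/O bridge has the least buffering.
agents = {"cpu0": 8, "cpu1": 8, "memory": 8, "io_bridge": 2}
depth = effective_pipeline_depth(agents)
print(depth, processor_may_issue(1, 1, depth), processor_may_issue(1, 2, depth))
```

In this hypothetical configuration the I/O bridge, not the processors, sets the depth for the whole bus.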
The Pentium Pro processor bus enables transactions with long latencies to be completed at a later
time using separate deferred reply transactions. The same Pentium Pro processor bus agent or
other Pentium Pro processor bus agents can continue with subsequent reads and writes while a
slow agent is processing an outstanding request.
1.2.3. Multiprocessor Ready
The Pentium Pro processor bus enables multiple Pentium Pro processors to operate on one bus,
with no external support logic. The Pentium Pro processor requires no separate snoop generation
logic. The processor I/O buffers can drive the Pentium Pro processor bus in an MP system.
The Pentium Pro processors and bus support a MESI cache protocol in the internal caches. The
cache protocol enables direct cache-to-cache line transfers with memory reflection.
The Pentium Pro processors and bus support fair, symmetric, round-robin bus arbitration that
minimizes overhead associated with bus ownership exchange. An I/O agent may generate a high
priority bus request.
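One way to picture round-robin arbitration with a priority override is the sketch below. It is an illustration only, not the bus's arbitration logic: the rotation order, the agent numbering, and the "priority" return value are all assumptions; the text above states only that arbitration is fair, symmetric, and round-robin, and that an I/O agent may make a high priority request.

```python
def next_owner(current_owner, requests, priority_request=False, n_agents=4):
    """Pick the next bus owner.

    current_owner: id (0..n_agents-1) of the current symmetric owner.
    requests: set of symmetric agent ids that are requesting the bus.
    priority_request: True when the (hypothetical) priority agent is requesting.
    """
    if priority_request:
        return "priority"  # assumed: the priority agent wins over symmetric agents
    # Rotate starting just after the current owner, so the current owner
    # is considered last and every requester is eventually served.
    for offset in range(1, n_agents + 1):
        candidate = (current_owner + offset) % n_agents
        if candidate in requests:
            return candidate
    return current_owner  # no requests: ownership does not change

print(next_owner(0, {0, 2}))        # agent 2 is next in rotation
print(next_owner(2, {0, 2}))        # then agent 0
print(next_owner(1, {3}, True))     # priority request overrides
```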
1.2.4. Data Integrity
The Pentium Pro processor bus provides parity signals for address, request, and response signals.
The bus protocol supports retrying bus requests.
The Pentium Pro processor bus supports error correcting code (ECC) on the data bus and has
correction capability at the receiver.
The Pentium Pro processor supports functional redundancy checking (FRC), similar to that of
the Pentium processor. FRC support enables the Pentium Pro processor to be used in high
data-integrity, fault-tolerant applications. In addition, two Pentium Pro processors can be configured
at power-on as an FRC pair or a multiprocessor-ready pair.
1.3. SYSTEM OVERVIEW
Figure 1-2 illustrates the Pentium Pro processor system environment, containing multiple
processors (MP), memory, and I/O. This particular architectural view is not intended to imply any
implementation trade-offs.
Figure 1-2. Pentium® Pro Processor System Interface Block Diagram
Up to four Pentium Pro processors can be gluelessly interconnected on the Pentium Pro processor
bus. These agents are bus masters, capable of supporting all the features described in this
document. The interface to the remainder of the system is represented by the high-speed I/O
interface and memory interface blocks. The memory interface block represents a path to system
memory capable of supporting over 500 Mbytes/second data bandwidth. The high-speed I/O
interface block provides a fast path to system I/O. Various implementations of these two blocks
can provide different cost vs. performance trade-offs. For example, more than one memory
interface or high-speed I/O interface may be included.
An MP system containing more than four Pentium Pro processors can be created based on clusters
that each contain four processors. Such a system can use cluster controllers that connect
Pentium Pro processor buses to a global memory bus. The Pentium Pro processor bus provides
appropriate protocol support for building external caches and memory directory-based systems.
1.4. TERMINOLOGY CLARIFICATION
Some key definitions and concepts are introduced here to aid the understanding of this
document.
A ‘#’ symbol after a signal name refers to an active low signal. This means that a signal is in
the active state (based on the name of the signal) when driven low. For example, when FLUSH#
is low, a flush has been requested. When NMI is high, a non-maskable interrupt has occurred.
In the case of lines where the name does not imply an active state but describes part of a binary
sequence (such as address or data), the ‘#’ symbol implies that the signal is inverted. For
example, D[3:0] = ‘HLHL’ refers to a hex ‘A’, and D#[3:0] = ‘LHLH’ also refers to a hex
‘A’. (H = High logic level, L = Low logic level)
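The naming convention can be checked with a few lines of code. This is a minimal sketch: the helper names are hypothetical, and signal levels are represented as the characters H and L, most significant bit first, as in the example above.

```python
def is_active(signal_name, level):
    """A name ending in '#' is active when low; any other name is active when high."""
    active_level = "L" if signal_name.endswith("#") else "H"
    return level == active_level

def decode_levels(levels, inverted=False):
    """Decode a string of H/L levels (MSB first) into an integer.

    inverted=True models a '#' group such as D#[3:0], where L represents a 1.
    """
    one = "L" if inverted else "H"
    value = 0
    for level in levels:
        if level not in ("H", "L"):
            raise ValueError(f"unexpected logic level: {level!r}")
        value = (value << 1) | int(level == one)
    return value

print(is_active("FLUSH#", "L"), is_active("NMI", "H"))
print(hex(decode_levels("HLHL")), hex(decode_levels("LHLH", inverted=True)))
```

Both decode calls yield the same value, matching the D[3:0] and D#[3:0] example in the text.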
Pentium Pro processor bus agents issue transactions to transfer data and system information.
A bus agent is any device that connects to the processor bus including the Pentium Pro processors themselves.
This specification refers to several classifications of bus agents.
• Central Agent. Handles reset, hardware configuration and initialization, special transactions,
and centralized hardware error detection and handling.
• I/O Agent. Interfaces to I/O devices using I/O port addresses. Can be a bus bridge to
another bus used for I/O devices, such as a PCI bridge.
• Memory Agent. Provides access to main memory.
A particular bus agent can have one or more of several roles in a transaction.
• Requesting Agent. The agent that issues the transaction.
• Addressed Agent. The agent that is addressed by the transaction. Also called the Target
Agent. A memory or I/O transaction is addressed to the memory or I/O agent that
recognizes the specified memory or I/O address. A Deferred Reply transaction is addressed
to the agent that issued the original transaction. Special transactions are considered to be
issued to the central agent.
• Snooping Agent. A caching bus agent that observes (“snoops”) bus transactions to
maintain cache coherency.
• Responding Agent. The agent that provides the response on the RS[2:0]# signals to the
transaction. Typically the addressed agent.
Each transaction has several phases that include some or all of the following phases.
• Arbitration Phase. No transactions can be issued until the bus agent owns the bus. A
transaction only needs to have this phase if the agent that wants to drive the transaction
doesn’t already own the bus. Note that there is a distinction between a symmetric bus
owner and the actual bus owner. The actual bus owner is the one and only bus agent that is
allowed to drive a transaction at that time. The symmetric bus owner is the bus owner
unless the priority agent owns the bus.
• Request Phase. This is the phase in which the transaction is actually issued to the bus. The
request agent drives ADS# and the address in this phase. All transactions must have this
phase.
• Error Phase. Any errors that occur during the Request Phase are reported in the Error
Phase. All transactions have this phase (1 clock).
• Snoop Phase. This is the phase in which cache coherency is enforced. All caching agents
(snoop agents) drive HIT# and HITM# to appropriate values in this phase. All memory
transactions have this phase.
• Response Phase. The response agent drives the transaction response during this phase.
The response agent is the target device addressed during the Request Phase unless a
transaction is deferred for later completion. All transactions have this phase.
• Data Phase. The response agent drives or accepts the transaction data, if there is any. Not
all transactions have this phase.
Other commonly used terms include:
A request initiated data transfer means that the request agent has write data to transfer. A
request initiated data transfer has a request initiated TRDY# assertion.
A response initiated data transfer means that the response agent must provide the read data to
the request agent.
A snoop initiated data transfer means that there was a hit to a modified line during the snoop
phase, and the agent that asserted HITM# is going to drive the modified data to the bus. This is
also called an implicit writeback because every time HITM# is asserted, the addressed memory
agent knows that writeback data will follow. A snoop initiated data transfer has a snoop initiated
TRDY# assertion.
There is a DEFER# signal that is sampled during the Snoop Phase to determine if a transaction
can be guaranteed in-order completion at that time. If the DEFER# signal is asserted, only two
responses are allowed by the bus protocol during the Response Phase, the Deferred Response
or the Retry Response. If the Deferred Response is given, the response agent must later complete
the transaction with a Deferred Reply transaction.
1.5. COMPATIBILITY NOTE
In this document, some register bits are Intel Reserved. When reserved bits are documented,
treat them as fully undefined. This is essential for software compatibility with future processors.
Follow the guidelines below:
1. Do not depend on the states of any undefined bits when testing the values of defined
register bits. Mask them out when testing.
2. Do not depend on the states of any undefined bits when storing them to memory or another
register.
3. Do not depend on the ability to retain information written into any undefined bits.
4. When loading registers, always load the undefined bits as zeros.
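The guidelines above translate into a masking discipline in software. The sketch below assumes a hypothetical 32-bit register in which only bits 0, 1, and 4 are defined; the mask value, the bit name, and the helper names are illustrative and do not describe any real register.

```python
DEFINED_MASK = 0x0000_0013  # hypothetical: only bits 0, 1 and 4 are defined
ENABLE_BIT = 1 << 4         # hypothetical defined bit

def is_set(raw_value, bit):
    """Guideline 1: mask out undefined bits before testing a defined bit."""
    return bool(raw_value & DEFINED_MASK & bit)

def value_to_store(raw_value):
    """Guideline 2: keep only defined bits when saving a copy elsewhere."""
    return raw_value & DEFINED_MASK

def value_to_load(defined_fields):
    """Guideline 4: load undefined bits as zeros."""
    return defined_fields & DEFINED_MASK

raw = 0xFFFF_FFF0  # reserved bits happen to read back as ones
print(is_set(raw, ENABLE_BIT), hex(value_to_store(raw)), hex(value_to_load(0xFFFF_FFFF)))
```

Guideline 3 has no code counterpart here: it simply means software should never read back a value it wrote to an undefined bit and rely on it.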
CHAPTER 2
PENTIUM® PRO PROCESSOR ARCHITECTURE OVERVIEW
The Pentium Pro processor has a decoupled, 12-stage, superpipelined implementation, trading
less work per pipestage for more stages. The Pentium Pro processor also has a pipestage time
33 percent less than the Pentium processor, which helps achieve a higher clock rate on any given
process.
The approach used by the Pentium Pro processor removes the constraint of linear instruction
sequencing between the traditional “fetch” and “execute” phases, and opens up a wide instruction
window using an instruction pool. This approach allows the “execute” phase of the Pentium Pro
processor to have much more visibility into the program’s instruction stream so that better
scheduling may take place. It requires the instruction “fetch/decode” phase of the Pentium Pro
processor to be much more intelligent in terms of predicting program flow. Optimized scheduling
requires the fundamental “execute” phase to be replaced by decoupled “dispatch/execute”
and “retire” phases. This allows instructions to be started in any order but always be completed
in the original program order. The Pentium Pro processor is implemented as three independent
engines coupled with an instruction pool as shown in Figure 2-1.
Figure 2-1. Three Engines Communicating Using an Instruction Pool
2.1. FULL CORE UTILIZATION
The three independent-engine approach was taken to more fully utilize the CPU core. Consider
the code fragment in Figure 2-2:
The first instruction in this example is a load of r1 that, at run time, causes a cache miss. A
traditional CPU core must wait for its bus interface unit to read this data from main memory and
return it before moving on to instruction 2. This CPU stalls while waiting for this data and is
thus being under-utilized.
To avoid this memory latency problem, the Pentium Pro processor “looks-ahead” into its
instruction pool at subsequent instructions and will do useful work rather than be stalled. In the
example in Figure 2-2, instruction 2 is not executable since it depends upon the result of
instruction 1; however both instructions 3 and 4 are executable. The Pentium Pro processor executes
instructions 3 and 4 out-of-order. The results of this out-of-order execution can not be committed
to permanent machine state (i.e., the programmer-visible registers) immediately since the
original program order must be maintained. The results are instead stored back in the instruction
pool awaiting in-order retirement. The core executes instructions depending upon their readiness
to execute, and not on their original program order, and is therefore a true dataflow engine.
This approach has the side effect that instructions are typically executed out-of-order.
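The behavior described above can be reproduced with a toy dataflow simulation. This is a sketch under stated assumptions, not a model of the processor: the four-instruction fragment is a hypothetical stand-in that merely matches the description (instruction 1 is a load of r1 that misses, instruction 2 needs r1, instructions 3 and 4 are independent), latencies are invented, execution resources are unlimited, and only true (read-after-write) dependencies are tracked.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    num: int        # position in original program order
    dest: str       # register written
    srcs: tuple     # registers read
    latency: int    # clocks until the result is ready (assumed values)

def simulate(program):
    """Return (order in which instructions start, clock at which each retires)."""
    producer, deps = {}, []
    for i, ins in enumerate(program):
        # An instruction depends on the latest earlier writer of each source.
        deps.append([producer[r] for r in ins.srcs if r in producer])
        producer[ins.dest] = i
    finish, start_order, cycle = {}, [], 0
    while len(finish) < len(program):
        for i, ins in enumerate(program):
            ready = all(d in finish and finish[d] <= cycle for d in deps[i])
            if i not in finish and ready:
                finish[i] = cycle + ins.latency
                start_order.append(ins.num)
        cycle += 1
    # Retirement is in program order: nothing retires before its predecessor.
    retire_at, last = [], 0
    for i in range(len(program)):
        last = max(last, finish[i])
        retire_at.append(last)
    return start_order, retire_at

program = [
    Instr(1, "r1", ("r0",), 10),       # load that misses the cache
    Instr(2, "r2", ("r1", "r2"), 1),   # needs the result of instruction 1
    Instr(3, "r5", ("r5",), 1),        # independent
    Instr(4, "r6", ("r6", "r3"), 1),   # independent
]
print(simulate(program))  # ([1, 3, 4, 2], [10, 11, 11, 11])
```

Instructions 3 and 4 finish early but retire only after instructions 1 and 2, which is the role the instruction pool plays in the text.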
The cache miss on instruction 1 will take many internal clocks, so the Pentium Pro processor
core continues to look ahead for other instructions that could be speculatively executed, and is
typically looking 20 to 30 instructions in front of the instruction pointer. Within this 20 to 30
instruction window there will be, on average, five branches that the fetch/decode unit must
correctly predict if the dispatch/execute unit is to do useful work. The sparse register set of an Intel
Architecture (IA) processor will create many false dependencies on registers so the
dispatch/execute unit will rename the IA registers into a larger register set to enable additional forward
progress. The retire unit owns the programmer’s IA register set and results are only committed
to permanent machine state in these registers when it removes completed instructions from the
pool in original program order.
Dynamic Execution technology can be summarized as optimally adjusting instruction execution
by predicting program flow, having the ability to speculatively execute instructions in any
order, and then analyzing the program’s dataflow graph to choose the best order to execute
the instructions.
2.2. THE PENTIUM® PRO PROCESSOR PIPELINE
In order to get a closer look at how the Pentium Pro processor implements Dynamic Execution,
Figure 2-3 shows a block diagram including cache and memory interfaces. The “Units” shown
in Figure 2-3 represent stages of the Pentium Pro processor pipeline.
Figure 2-3. The Three Core Engines Interface with Memory via Unified Caches
• The FETCH/DECODE unit: An in-order unit that takes as input the user program
instruction stream from the instruction cache, and decodes them into a series of
micro-operations (µops) that represent the dataflow of that instruction stream. The pre-fetch is
speculative.
• The DISPATCH/EXECUTE unit: An out-of-order unit that accepts the dataflow stream,
schedules execution of the µops subject to data dependencies and resource availability and
temporarily stores the results of these speculative executions.
• The RETIRE unit: An in-order unit that knows how and when to commit (“retire”) the
temporary, speculative results to permanent architectural state.
• The BUS INTERFACE unit: A partially ordered unit responsible for connecting the three
internal units to the real world. The bus interface unit communicates directly with the L2
(second level) cache supporting up to four concurrent cache accesses. The bus interface
unit also controls a transaction bus, with MESI snooping protocol, to system memory.
2.2.1. The Fetch/Decode Unit
Figure 2-4 shows a more detailed view of the Fetch/Decode Unit.
Figure 2-4. Inside the Fetch/Decode Unit
BIU - Bus Interface Unit
ID - Instruction Decoder
BTB - Branch Target Buffer
MIS - Microcode Instruction Sequencer
RAT - Register Alias Table
ROB - ReOrder Buffer
The ICache is a local instruction cache. The Next_IP unit provides the ICache index, based on
inputs from the Branch Target Buffer (BTB), trap/interrupt status, and branch-misprediction
indications from the integer execution section.
The ICache fetches the cache line corresponding to the index from the Next_IP, and the next line,
and presents 16 aligned bytes to the decoder. The prefetched bytes are rotated so that they are
justified for the instruction decoders (ID). The beginning and end of the IA instructions are
marked.
Three parallel decoders accept this stream of marked bytes, and proceed to find and decode the
IA instructions contained therein. The decoder converts the IA instructions into triadic µops
(two logical sources, one logical destination per µop). Most IA instructions are converted
directly into single µops; some instructions are decoded into one-to-four µops; and the complex
instructions require microcode (the box labeled MIS in Figure 2-4). This microcode is just a set of
preprogrammed sequences of normal µops. The µops are queued, and sent to the Register Alias
Table (RAT) unit, where the logical IA-based register references are converted into Pentium Pro
processor physical register references, and to the Allocator stage, which adds status information
to the µops and enters them into the instruction pool. The instruction pool is implemented as an
array of Content Addressable Memory called the ReOrder Buffer (ROB).
This is the end of the in-order pipe.
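The conversion of logical register references into physical ones can be sketched with a simple alias table. This is a generic register-renaming illustration, not the Pentium Pro RAT: the physical register names (p0, p1, ...), the allocation policy, and the sample µops are assumptions. Each µop is triadic, written as (destination, source 1, source 2).

```python
import itertools

def rename(uops, logical_regs):
    """Rename triadic µops (dest, src1, src2) onto fresh physical registers."""
    fresh = itertools.count()
    # Alias table: current physical home of each logical register.
    rat = {reg: f"p{next(fresh)}" for reg in logical_regs}
    renamed = []
    for dest, src1, src2 in uops:
        s1, s2 = rat[src1], rat[src2]   # sources read the current mapping
        rat[dest] = f"p{next(fresh)}"   # every write gets a new physical register
        renamed.append((rat[dest], s1, s2))
    return renamed

uops = [
    ("eax", "ebx", "ecx"),  # eax = ebx op ecx
    ("ebx", "edx", "edx"),  # overwrites ebx, which the first µop still reads
    ("eax", "eax", "ebx"),  # overwrites eax again
]
for uop in rename(uops, ["eax", "ebx", "ecx", "edx"]):
    print(uop)
```

After renaming, the second µop writes p5 rather than the register the first µop reads, so the two no longer conflict; only the true dependencies of the third µop on p4 and p5 remain.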
2.2.2. The Dispatch/Execute Unit
The dispatch unit selects µops from the instruction pool depending upon their status. If the status
indicates that a µop has all of its operands then the dispatch unit checks to see if the execution
resource needed by that µop is also available. If both are true, the Reservation Station removes
that µop and sends it to the resource where it is executed. The results of the µop are later returned
to the pool. There are five ports on the Reservation Station, and the multiple resources are
accessed as shown in Figure 2-5.
Figure 2-5. Inside the Dispatch/Execute Unit
RS - Reservation Station
EU - Execution Unit
FEU - Floating Point EU
IEU - Integer EU
JEU - Jump EU
AGU - Address Generation Unit
ROB - ReOrder Buffer
The Pentium Pro processor can schedule at a peak rate of 5 µops per clock, one to each resource
port, but a sustained rate of 3 µops per clock is typical. The activity of this scheduling process
is the out-of-order process; µops are dispatched to the execution resources strictly according to
dataflow constraints and resource availability, without regard to the original ordering of the
program.
Note that the actual algorithm employed by this execution-scheduling process is vitally important
to performance. If only one µop per resource becomes data-ready per clock cycle, then there
is no choice. But if several are available, it must choose. The Pentium Pro processor uses a
pseudo FIFO scheduling algorithm favoring back-to-back µops.
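The selection step can be sketched as follows. This is a deliberate simplification and not the processor's scheduler: it picks the oldest data-ready µop for each free port, which approximates FIFO behavior but does not model the preference for back-to-back µops. The µop records and port assignments are hypothetical.

```python
def dispatch(pool, busy_ports=()):
    """Choose at most one data-ready µop per free port, oldest first.

    pool: list of dicts, oldest µop first, with keys 'id', 'port', 'ready'.
    Returns a dict mapping port number to the id of the µop sent to it.
    """
    chosen = {}
    for uop in pool:
        port = uop["port"]
        if uop["ready"] and port not in busy_ports and port not in chosen:
            chosen[port] = uop["id"]
    return chosen

pool = [
    {"id": "u1", "port": 2, "ready": False},  # still waiting for an operand
    {"id": "u2", "port": 0, "ready": True},
    {"id": "u3", "port": 0, "ready": True},   # loses port 0 to the older u2
    {"id": "u4", "port": 1, "ready": True},
]
print(dispatch(pool))                  # {0: 'u2', 1: 'u4'}
print(dispatch(pool, busy_ports={0}))  # {1: 'u4'}
```

With five ports, at most five µops can be chosen in one call, which corresponds to the peak rate quoted above.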
Note that many of the µops are branches. The Branch Target Buffer will correctly predict most
of these branches but it can’t correctly predict them all. Consider a BTB that is correctly
predicting the backward branch at the bottom of a loop; eventually that loop is going to terminate, and
when it does, that branch will be mispredicted. Branch µops are tagged (in the in-order pipeline)
with their fall-through address and the destination that was predicted for them. When the branch
executes, what the branch actually did is compared against what the prediction hardware said it
would do. If those coincide, then the branch eventually retires, and most of the speculatively
executed work behind it in the instruction pool is good.
But if they do not coincide, then the Jump Execution Unit (JEU) changes the status of all of the
µops behind the branch to remove them from the instruction pool. In that case the proper branch
destination is provided to the BTB which restarts the whole pipeline from the new target address.
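The comparison and flush can be expressed compactly. This is an illustrative sketch: the pool is modeled as a Python list in program order, and the field name predicted_target and the function name are assumptions.

```python
def resolve_branch(pool, branch_index, actual_target):
    """Compare a branch's actual outcome with its prediction.

    Returns (surviving pool, restart address). The restart address is None
    when the prediction was correct and no pipeline restart is needed.
    """
    branch = pool[branch_index]
    if actual_target == branch["predicted_target"]:
        return list(pool), None
    # Misprediction: discard every µop behind (younger than) the branch.
    return pool[: branch_index + 1], actual_target

pool = [
    {"id": "add"},
    {"id": "jcc", "predicted_target": 0x40},
    {"id": "spec1"},   # speculatively fetched from the predicted path
    {"id": "spec2"},
]
print(resolve_branch(pool, 1, 0x40)[1])   # None: prediction was right
survivors, restart = resolve_branch(pool, 1, 0x80)
print([u["id"] for u in survivors], hex(restart))
```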
2.2.3. The Retire Unit
Figure 2-6 shows a more detailed view of the Retire Unit.
Figure 2-6. Inside the Retire Unit
RS - Reservation Station
MIU - Memory Interface Unit
RRF - Retirement Register File
The retire unit is also checking the status of µops in the instruction pool. It is looking for µops
that have executed and can be removed from the pool. Once removed, the original architectural
target of the µops is written as per the original IA instruction. The retirement unit must not only
notice which µops are complete, it must also re-impose the original program order on them. It
must also do this in the face of interrupts, traps, faults, breakpoints and mispredictions.
The retirement unit must first read the instruction pool to find the potential candidates for retirement and determine which of these candidates are next in the original program order. Then it
writes the results of this cycle’s retirements to both the Instruction Pool and the Retirement Register File (RRF). The retirement unit is capable of retiring 3 µops per clock.
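In-order retirement at up to three µops per clock can be sketched with a queue. This is a minimal illustration under assumptions: the instruction pool is modeled as a deque in program order, the RRF as a dictionary, and the field names are hypothetical; interrupts, faults, and mispredictions are not modeled.

```python
from collections import deque

RETIRE_WIDTH = 3  # µops retired per clock, as stated in the text

def retire_one_clock(pool, rrf):
    """Retire completed µops from the head of the pool, in program order."""
    retired = []
    while pool and pool[0]["done"] and len(retired) < RETIRE_WIDTH:
        uop = pool.popleft()
        rrf[uop["dest"]] = uop["value"]  # commit to architectural state
        retired.append(uop["id"])
    return retired

pool = deque([
    {"id": 1, "dest": "eax", "value": 7, "done": True},
    {"id": 2, "dest": "ebx", "value": 9, "done": True},
    {"id": 3, "dest": "ecx", "value": 0, "done": False},  # blocks younger µops
    {"id": 4, "dest": "edx", "value": 5, "done": True},
    {"id": 5, "dest": "eax", "value": 8, "done": True},
])
rrf = {}
print(retire_one_clock(pool, rrf))  # [1, 2]
pool[0]["done"] = True              # µop 3 finally completes
print(retire_one_clock(pool, rrf))  # [3, 4, 5]
```

µops 4 and 5 were complete during the first clock but could not retire ahead of µop 3, which is how program order is re-imposed.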
2.2.4. The Bus Interface Unit
Figure 2-7 shows a more detailed view of the Bus Interface Unit.
Figure 2-7. Inside the Bus Interface Unit
MOB - Memory Order Buffer
AGU - Address Generation Unit
ROB - ReOrder Buffer
There are two types of memory access: loads and stores. Loads only need to specify the memory
address to be accessed, the width of the data being retrieved, and the destination register. Loads
are encoded into a single µop.
Stores need to provide a memory address, a data width, and the data to be written. Stores therefore require two µops, one to generate the address, and one to generate the data. These µops must
later re-combine for the store to complete.
Stores are never performed speculatively since there is no transparent way to undo them. Stores
are also never re-ordered among themselves. A store is dispatched only when both the address
and the data are available and there are no older stores awaiting dispatch.
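The store dispatch rule above can be expressed as a small check. This is a minimal sketch, assuming a hypothetical `Store` record with an `age` field (smaller means older in program order); none of these names come from the processor documentation.

```python
from dataclasses import dataclass


@dataclass
class Store:
    age: int             # hypothetical field: smaller value means older store
    address_ready: bool  # the address µop has produced its result
    data_ready: bool     # the data µop has produced its result


def next_store_to_dispatch(pending):
    """Return the store that may dispatch now, or None.

    Only the oldest pending store is a candidate, so stores never pass
    one another; it dispatches only when both address and data are ready.
    """
    if not pending:
        return None
    oldest = min(pending, key=lambda s: s.age)
    if oldest.address_ready and oldest.data_ready:
        return oldest
    return None
```

A younger store that is fully ready still waits in this sketch while an older store lacks its address or data.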
A study of the importance of memory access reordering concluded:
• Stores must be constrained from passing other stores, for only a small impact on performance.
• Stores can be constrained from passing loads, for an inconsequential performance loss.
• Constraining loads from passing other loads or stores has a significant impact on performance.
The Memory Order Buffer (MOB) allows loads to pass other loads and stores by acting like a
reservation station and re-order buffer. It holds suspended loads and stores and re-dispatches
them when a blocking condition (dependency or resource) disappears.
2.3.ARCHITECTURE SUMMARY
Dynamic Execution is this combination of improved branch prediction, speculative execution and data flow analysis that enables the Pentium Pro processor to deliver its superior
performance.
CHAPTER 3
BUS OVERVIEW
This chapter provides an overview of the Pentium Pro processor bus protocol, transactions, and
bus signals. The Pentium Pro processor supports two other synchronous busses, APIC and
JTAG. It also has PC compatibility signals and implementation specific signals. This chapter
provides a functional description of the Pentium Pro processor bus only. For the Pentium Pro
processor bus protocol specifications, see Chapter 4, Bus Protocol. For details on the Pentium
Pro processor bus transactions, see Chapter 5, Bus Transactions and Operations. For the full
Pentium Pro processor signal specifications, see Appendix A, Signals Reference and Table 11-2.
3.1.SIGNAL AND DIAGRAM CONVENTIONS
Signal names use uppercase letters, such as ADS#. Signals in a set of related signals are distinguished by numeric suffixes, such as AP1 for address parity bit 1. A set of signals covering a
range of numeric suffixes is denoted as AP[1:0], for address parity bits 1 and 0. A # suffix indicates that the signal is active low. A signal name without a # suffix indicates that the signal is
active high.
In many cases, signals are mapped one-to-one to physical pins with the same names. In other
cases, different signals are mapped onto the same pin. For example, this is the case with the address pins A[35:3]#. During the first clock of the Request Phase, the address signals are driven.
The first clock is indicated by the lower case a, or just the pin name itself: Aa[35:3]#, or
A[35:3]#. During the second clock of the Request Phase other information is driven on the request pins. These signals are referenced either by their functional signal names DID[7:0]#, or by
using a lower case b with the pin name: Ab[23:16]#. Note also that several pins have configuration functions at the active to inactive edge of RESET#.
The term asserted implies that a signal is driven to its active level (logic 1, FRCERR high, or
ADS# low). The term deasserted implies that a signal is driven to its inactive level (logic 0,
FRCERR low, or ADS# high). A signal driven to its active level is said to be active; a signal
driven to its inactive level is said to be inactive.
In timing diagrams, square and circle symbols indicate the clock in which particular signals of
interest are driven and sampled. The square indicates that a signal is driven in that clock. The
circle indicates that a signal is sampled in that clock.
All timing diagrams in this specification show signals as they are driven asserted or deasserted
on the Pentium Pro processor bus. There is a one-clock delay in the signal values observed by
bus agents. Any signal names that appear in lower case letters in brackets {rcnt} are internal signals only, and are not driven to the bus. Upper case letters that appear in brackets represent a
group of signals such as the Request Phase signals {REQUEST}. The timing diagrams sometimes include internal signals to indicate internal states and show how they affect external signals.
When signal values are referenced in tables, a 0 indicates inactive and a 1 indicates active. 0 and
1 do not reflect voltage levels. Remember, a # after a signal name indicates active low. An entry
of 1 for ADS# means that ADS# is active, with a low voltage level.
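The naming and table conventions above can be captured in a short helper. This is an illustrative sketch only; the function names `is_active_low` and `electrical_level` are invented for this example.

```python
def is_active_low(signal_name):
    """A '#' suffix on the signal name marks an active-low signal."""
    return signal_name.endswith("#")


def electrical_level(signal_name, table_value):
    """Map a table entry (1 = active, 0 = inactive) to 'low' or 'high'."""
    if table_value not in (0, 1):
        raise ValueError("table value must be 0 or 1")
    active = table_value == 1
    if is_active_low(signal_name):
        return "low" if active else "high"
    return "high" if active else "low"


print(electrical_level("ADS#", 1))    # active, active-low signal
print(electrical_level("FRCERR", 1))  # active, active-high signal
```

The helper keeps the logical value (active or inactive) separate from the voltage level, which is the distinction the convention relies on.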
3.2.SIGNALING ON THE PENTIUM® PRO PROCESSOR BUS
The Pentium Pro processor bus supports a synchronous latched protocol. On the rising edge of
the bus clock, all agents on the Pentium Pro processor bus are required to drive their active outputs and sample required inputs. No additional logic is located in the output and input paths between the buffer and the latch stage, thus keeping setup and hold times constant for all bus
signals following the latched protocol. The Pentium Pro processor bus requires that every input
be sampled during a valid sampling window on a rising clock edge and its effect be driven
out no sooner than the next rising clock edge. This approach allows one full clock for intercomponent communication and at least one full clock at the receiver to compute a response.
Figure 3-1 illustrates the latched bus protocol as it appears on the bus. In subsequent descriptions, the protocol is described as “B# is asserted in the clock after A# is observed active”, or
“B# is asserted two clocks after A# is asserted”. Note that A# is asserted in T1, but not observed
active until T2. The receiving agent uses T2 to determine its response and asserts B# in T3. Other agents observe B# active in T4.
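The clock arithmetic of the latched protocol can be written out as follows. This is a sketch of the timing relationship only; the function names are invented for this example and clocks are represented as plain integers.

```python
def observed_in(asserted_clock):
    """A signal driven in clock N is latched by other agents in clock N + 1."""
    return asserted_clock + 1


def earliest_response(asserted_clock):
    """The receiver uses the observing clock to compute its response,
    so the response is driven no sooner than the following clock."""
    return observed_in(asserted_clock) + 1


a_asserted = 1                              # A# asserted in T1
a_observed = observed_in(a_asserted)        # A# observed active in T2
b_asserted = earliest_response(a_asserted)  # B# asserted in T3
b_observed = observed_in(b_asserted)        # B# observed active in T4
print(a_observed, b_asserted, b_observed)
```

This reproduces the two equivalent phrasings in the text: B# is asserted in the clock after A# is observed, which is two clocks after A# is asserted.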
The square and circle symbols are used in the timing diagrams to indicate the clock in which
particular signals of interest are driven and sampled. The square indicates that a signal is driven
(asserted, initiated) in that clock. The circle indicates that a signal is sampled (observed, latched)
in that clock.
Figure 3-1. Latched Bus Protocol
(Timing diagram of BCLK clocks 1 to 4 with signals A# and B#. Events shown: Assert A#, Latch A#, Assert B#, Latch B#. Annotations: full clock allowed for signal propagation; full clock allowed for logic delays.)
Any signal names that appear in brackets {} are internal signals only, and are not driven to the
bus. The timing diagrams sometimes include internal signals to indicate internal state and show
how it affects external signals. All timing diagrams in this specification show bus signals as they
are driven asserted or deasserted on the Pentium Pro processor bus. Internal signals are shown
to change state in the clock that they would be driven to the bus if they were external signals.
Internal signals actually change state internally one clock earlier.
Signals that are driven in the same clock by multiple Pentium Pro processor bus agents exhibit
a “wired-OR glitch” on the electrical-low-to-electrical-high transition. To account for this situation, these signal state transitions are specified to have two clocks of settling time when deasserted before they can be safely observed. The bus signals that must meet this criteria are:
BINIT#, HIT#, HITM#, BNR#, AERR#, BERR#.
3.3.PENTIUM® PRO PROCESSOR BUS PROTOCOL OVERVIEW
Bus activity is hierarchically organized into operations, transactions, and phases.
An operation is a bus procedure that appears atomic to software even though it may not be atomic on the bus. An operation may consist of a single bus transaction, but sometimes may involve
multiple bus transactions or a single transaction with multiple data transfers. Examples of complex bus operations include: locked read/modify/write operations and deferred operations.
A transaction is the set of bus activities related to a single bus request. A transaction begins with
bus arbitration, and the assertion of ADS# and a transaction address. Transactions are driven to
transfer data, to inquire about or change cache state, or to provide the system with information.
A transaction contains up to six phases. A phase uses a specific set of signals to communicate a
particular type of information. The six phases of the Pentium Pro processor bus protocol are:
• Arbitration
• Request
• Error
• Snoop
• Response
• Data
Not all transactions contain all phases, and some phases can be overlapped.
3.3.1.Transaction Phase Description
Figure 3-2 shows all of the Pentium Pro processor bus transaction phases for two transactions
with data transfers.
Figure 3-2. Pentium® Pro Processor Bus Transaction Phases
(Timing diagram of BCLK clocks 1 to 17 with rows for the Arbitration, Request, Error, Snoop, Response, and Data Transfer phases of transactions 1 and 2.)
NOTE:
* The shaded vertical bar indicates one or more clock cycles are allowed between different phases.
When the requesting agent does not own the bus, transactions begin with an Arbitration Phase,
in which a requesting agent becomes the bus owner.
After the requesting agent becomes the bus owner, the transaction enters the Request Phase. In
the Request Phase, the bus owner drives request and address information on the bus. The Request Phase is two clocks long. In the first clock, ADS# is driven along with the transaction address and sufficient information to begin snooping and memory access. In the second clock, the
byte enables, deferred ID, transaction length, and other transaction information are driven.
Every transaction’s third phase is an Error Phase which occurs three clocks after the Request
Phase begins. The Error Phase indicates any parity errors triggered by the request.
Every transaction that isn’t cancelled because an error was indicated in the Error Phase has a
Snoop Phase, four or more clocks from the Request Phase. The snoop results indicate if the address driven for a transaction references a valid or modified (dirty) cache line in any bus agent’s
cache. The snoop results also indicate whether a transaction will be completed in-order or may
be deferred for possible out-of-order completion.
Every transaction that isn’t cancelled because an error was indicated in the Error Phase has a
Response Phase. The Response Phase indicates whether the transaction has failed or succeeded,
whether transaction completion is immediate or deferred, whether the transaction will be retried,
and whether the transaction contains a Data Phase. The valid transaction responses are:
• Normal Data
• Implicit Writeback
• No Data
• Hard Failure
• Deferred
• Retry
If the transaction does not have a Data Phase, that transaction is complete after the Response
Phase. If the request agent has write data to transfer or is requesting read data, the transaction
has a Data Phase which may extend beyond the Response Phase.
Not all transactions contain all phases, not all phases occur in order, and some phases can be
overlapped.
• All transactions that are not cancelled in the Error Phase have the Request, Error, Snoop, and Response Phases.
• Arbitration can be explicit or implicit. The Arbitration Phase only needs to occur if the agent that is driving the next transaction does not already own the bus.
• The Data Phase only occurs if a transaction requires a data transfer. The Data Phase can be absent, response initiated, request initiated, snoop initiated, or request and snoop initiated.
• The Response Phase overlaps with the beginning of the Data Phase for read transactions.
• The Response Phase (TRDY#) triggers the Data Phase for write transactions.
•
In addition, since the Pentium Pro processor bus supports bus transaction pipelining, phases
from one transaction can overlap phases from another transaction (see Figure 3-2).
3.3.2.Bus Transaction Pipelining and Transaction Tracking
The Pentium Pro processor bus architecture supports pipelined transactions in which bus transactions in different phases overlap. The Pentium Pro processor bus may be configured to support
a maximum of 1 or 8 outstanding transactions simultaneously. Each Pentium Pro processor is
capable of issuing up to four outstanding transactions.
In order to track transactions, all bus agents must track certain transaction information. The
transaction information that must be tracked by each bus agent is:
• Number of transactions outstanding
• What transaction is next to be snooped
• What transaction is next to receive a response
• If the transaction was issued to or from this agent
This information is tracked in a queue called an In-order Queue (IOQ). All bus agents maintain
identical In-order Queue status to track every transaction that is issued to the bus. When a transaction is issued to the bus, it is also entered in the IOQ of each agent. The depth of the smallest
IOQ is the limit of how many transactions can be outstanding on the bus simultaneously. Because transactions receive their responses and data in the same order as they were issued, the
transaction at the top of the IOQ is the next transaction to enter the Response and Data Phases.
A transaction is removed from the IOQ after the Response Phase is complete or after an error is
detected in the Error Phase. The simplest bus agents can simply count events rather than implement a queue.
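The In-order Queue behavior described above can be modeled with a simple first-in, first-out structure. The sketch below is a toy model under stated assumptions: the class `InOrderQueue`, its method names, and the helper `bus_outstanding_limit` are invented for this example and do not describe any agent's real implementation.

```python
from collections import deque


class InOrderQueue:
    """Toy model of one agent's In-order Queue (IOQ)."""

    def __init__(self, depth):
        self.depth = depth      # number of transactions this agent can track
        self.entries = deque()  # oldest transaction sits at the left

    def issue(self, txn):
        """Record a transaction that was just issued to the bus."""
        if len(self.entries) >= self.depth:
            raise RuntimeError("IOQ full: no further transaction may be issued")
        self.entries.append(txn)

    def next_for_response(self):
        """The oldest entry is next to enter the Response and Data Phases."""
        return self.entries[0] if self.entries else None

    def complete_response(self):
        """Remove the oldest entry once its Response Phase is complete."""
        return self.entries.popleft()

    def cancel_on_error(self, txn):
        """Remove a transaction whose Error Phase reported an error."""
        self.entries.remove(txn)


def bus_outstanding_limit(agents):
    """The smallest IOQ depth bounds the transactions outstanding on the bus."""
    return min(agent.depth for agent in agents)
```

Because every agent applies the same issue and removal events, each copy of the queue stays identical, which is why a very simple agent could track the same state with counters alone.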
Other, agent specific, bus information must be tracked as well. Note that not every agent needs
to track all of this additional information. Examples of additional information that might be
tracked follow.
Request agents (agents that issue transactions) might track:
• How many more transactions this agent can still issue?
• Is this transaction a read or a write?
• Does this bus agent need to provide or accept data?
Response agents (agents that can provide transaction response and data) might track:
• Does this agent own the response for the transaction at the top of the IOQ?
• Does this transaction contain an implicit writeback data and does this agent have to receive the writeback data?
• If the transaction is a read, does this agent own the data transfer?
• If the transaction is a write, must this agent accept the data?
• Availability of buffer resources so it can stall further transactions if it needs to.
Snooping agents (agents with a cache) might track:
• If the transaction needs to be snooped.
• If the Snoop Phase needs to be extended.
• Does this transaction contain an implicit writeback data to be supplied by this agent?
• How many snoop requests are in the queue.
Agents whose transactions can be deferred might track:
• The deferred transaction and its agent ID.
• Availability of buffer resources.
This transaction information can be tracked by implementing multiple queues or one all encompassing In-order Queue. This document refers to these internal queue(s) as the Transaction
Queues (TQ), unless the In-order Queue is specifically being referenced. Note that the IOQ
is completely visible from the bus protocol, but the Transaction Queues use internal state
information.
3.3.3.Bus Transactions
The Pentium Pro processor bus supports the following types of bus transactions.
• Read and write a cache line.
• Read and write any combination of bytes in an aligned 8-byte span.
• Read and write multiple 8-byte spans.
• Read a cache line and invalidate it in other caches.
• Invalidate a cache line in other caches.
• I/O read and write.
• Interrupt Acknowledge (requiring a 1 byte interrupt vector).
• Special transactions are used to send various messages on the bus. The special transactions for the Pentium Pro processor are:
— Shutdown
— Flush
— Halt
— Sync
— Flush Acknowledge
— Stop Clock Acknowledge
— SMI Acknowledge
— Branch trace message (providing an 8-byte branch trace address)
• Deferred reply to an earlier read or write that received a deferred response.
Specific descriptions of each transaction can be found in Chapter 5, Bus Transactions and Operations.
3.3.4.Data Transfers
The Pentium Pro processor bus distinguishes between memory and I/O transactions.
Memory transactions are used to transfer data to and from memory. Memory transactions address memory using the full width of the address bus. The Pentium Pro processor can address
up to 64 Gbytes of physical memory.
I/O transactions are used to transfer data to and from the I/O address space. The Pentium Pro
processor limits I/O accesses to a 64K + 3 byte I/O address space. I/O transactions use A[16:3]#
to address I/O ports and always deassert A[35:17]#. A16# is zero except when the first three
bytes above the 64KByte address space are accessed (I/O wraparound). This is required for compatibility with previous Intel processors.
The Pentium Pro processor bus distinguishes between different transfer lengths.
3.3.4.1.LINE TRANSFERS
A line transfer reads or writes a cache line, the unit of caching in a Pentium Pro processor system. On the Pentium Pro processor this is 32 bytes aligned on a 32-byte boundary. While a line
is always aligned on a 32-byte boundary, a line transfer need not begin on that boundary. For a
line transfer on the Pentium Pro processor, A[35:3]# carry the upper 33 bits of a 36-bit physical
address. Address bits A[4:3]# determine the transfer order, called burst order. A line is transferred in four eight-byte chunks, each of which can be identified by address bits 4:3. The chunk
size is 64 bits. Table 3-1 specifies the transfer order used for a 32-byte line, based on address
bits A[4:3]# specified in the transaction’s Request Phase.
Table 3-1. Burst Order Used For Pentium® Pro Processor Bus Line Transfers
A[4:3]#   Requested  1st Address  2nd Address  3rd Address  4th Address
(binary)  Address    Transferred  Transferred  Transferred  Transferred
          (hex)      (hex)        (hex)        (hex)        (hex)
00        0          0            8            10           18
01        8          8            0            18           10
10        10         10           18           0            8
11        18         18           10           8            0
Note that the requested read data is always transferred first. Unlike the Pentium processor, which
always transfers writeback data starting at address 0, the Pentium Pro processor transfers writeback
data starting at the requested address.
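The burst order in Table 3-1 can be generated rather than looked up. The sketch below relies on an observation about the table values, not on a rule stated in the text: each row equals the starting chunk index exclusive-ORed with 0, 1, 2, 3. The function name `burst_order` is invented for this example.

```python
CHUNK_BYTES = 8      # the chunk size is 64 bits
CHUNKS_PER_LINE = 4  # a 32-byte line is four chunks


def burst_order(requested_address):
    """Return the four chunk offsets within the line, in transfer order."""
    first_chunk = (requested_address >> 3) & 0b11  # address bits A[4:3]
    # Assumption: XOR with the transfer index reproduces Table 3-1.
    return [(first_chunk ^ i) * CHUNK_BYTES for i in range(CHUNKS_PER_LINE)]


for a43 in range(4):
    offsets = [format(x, "x") for x in burst_order(a43 << 3)]
    print(format(a43, "02b"), offsets)
```

Only address bits 4:3 influence the result, so any address within the same chunk yields the same order.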
3.3.4.2.PART LINE ALIGNED TRANSFERS
A part-line aligned transfer moves a quantity of data smaller than a cache line but an even multiple of the chunk size between a bus agent and memory using the burst order. A part-line transfer affects no more than one line in a cache.
A 16-byte transfer on a 64-bit data bus with a 32-byte cache line size is a part-line transfer, where
a chunk is eight bytes aligned on an eight-byte boundary. All chunks in the span of a part-line
transfer are moved across the data bus. Address bits A[4:3]# determine the transfer order for
the included chunks, using the burst order specified in Table 3-1 for line transfers.
A 16-byte aligned transfer requires two data transfer clocks on a 64-bit bus. Note that the Pentium Pro processor will not issue 16-byte transactions.
3.3.4.3.PARTIAL TRANSFERS
On a 64-bit data bus, a partial transfer moves from 0-8 bytes within an aligned 8-byte span to or
from a memory or I/O address. The byte enable signals, BE[7:0]#, select which bytes in the span
are transferred.
The Pentium Pro processor converts non-cacheable misaligned memory accesses that cross 8-byte boundaries into two partial transfers. For example, a non-cacheable, misaligned 8-byte read
requires two Read Data Partial transactions. Similarly, the Pentium Pro processor converts I/O
write accesses that cross 4-byte boundaries into 2 partial transfers. I/O reads are treated the same
as memory reads.
On the Pentium Pro processor, I/O Read and I/O Write transactions are 1 to 4 byte partial transactions.
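The splitting of a misaligned memory access into partial transfers can be sketched as follows. This is an illustration under stated assumptions: the function name `split_into_partials` is invented, the mask uses logical values (bit n set means byte n of the aligned span is transferred, corresponding to BEn# being active), and only the 8-byte memory case is modeled, not the 4-byte boundary rule for I/O writes.

```python
SPAN_BYTES = 8  # partial transfers stay within an aligned 8-byte span


def split_into_partials(address, length):
    """Split a byte range into partial transfers.

    Returns a list of (span_address, byte_enable_mask) pairs, where
    span_address is 8-byte aligned and bit n of the mask selects byte n.
    """
    if address < 0 or length <= 0:
        raise ValueError("address must be non-negative and length positive")
    partials = []
    end = address + length
    while address < end:
        span = address - (address % SPAN_BYTES)
        last = min(end, span + SPAN_BYTES)
        mask = 0
        for byte in range(address - span, last - span):
            mask |= 1 << byte
        partials.append((span, mask))
        address = last
    return partials


# A misaligned 8-byte read starting at offset 5 needs two partial transfers.
for span, mask in split_into_partials(0x1005, 8):
    print(hex(span), format(mask, "08b"))
```

The first transfer in the example carries bytes 5 to 7 of one span and the second carries bytes 0 to 4 of the next span.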
3.4.SIGNAL OVERVIEW
This section describes the function of the Pentium Pro processor bus signals. In this section, the
signals are grouped according to function.
In many cases, signals are mapped one-to-one to physical pins with the same names. In other
cases, different signals are mapped onto the same pin. For example, this is the case with the address pins A[35:3]#. During the first clock of the Request Phase, the address signals are driven.
The first clock is indicated by the lower case a, or just the pin name itself: Aa[35:3]#, or
A[35:3]#. During the second clock of the Request Phase, other information is driven on the request pins. These signals are referenced either by their functional signal names DID[7:0]#, or by
using a lower case b with the pin name: Ab[23:16]#. Note that several pins also have configuration functions at the active to inactive transition of RESET#.
3.4.1.Execution Control Signals
Table 3-2. Execution Control Signals
Pin/Signal Name                              Pin/Signal Mnemonic            Number
Bus Clock                                    BCLK                           1
Initialization                               INIT#, RESET#                  2
Flush                                        FLUSH#                         1
Stop Clock                                   STPCLK#                        1
Interprocessor Communication and Interrupts  PICCLK, PICD[1:0]#, LINT[1:0]  5
The BCLK (Bus Clock) input signal is the Pentium Pro processor bus clock. All agents drive
their outputs and latch their inputs on the BCLK rising edge. Each Pentium Pro processor derives its internal clock from BCLK by multiplying the BCLK frequency by a multiplier determined at configuration. See Chapter 9, Configuration for configuration specifications.
The RESET# input signal resets all Pentium Pro processor bus agents to known states and invalidates their internal caches. Modified or dirty cache lines are NOT written back. After RESET# is deasserted, each Pentium Pro processor begins execution at the power on reset vector
defined during configuration. On observing active RESET#, all bus agents must deassert their
outputs within two clocks. Configuration parameters are sampled on the clock following the
sampling of RESET# inactive. (Two clocks following the deassertion of RESET#.)
The INIT# input signal resets all Pentium Pro processor bus agents without affecting their internal (L1 or L2) caches or their floating-point registers. Each Pentium Pro processor begins execution at the address vector as defined during power on configuration. INIT# has another
meaning on RESET#’s active to inactive transition: if INIT# is sampled active on RESET#’s active to inactive transition, then the Pentium Pro processor executes its built-in self test (BIST).
If the FLUSH# input signal is asserted, the Pentium Pro processor bus agent writes back all internal cache lines in the Modified state (L1 and L2 caches) and invalidates all internal cache
lines (L1 and L2 caches). The flush operation puts all internal cache lines in the Invalid state.
After all lines are written back and invalidated, the Pentium Pro processor drives a special transaction, the Flush Acknowledge transaction, to indicate completion of the flush operation. The
FLUSH# signal has a different meaning when it is sampled asserted on the active to inactive
transition of RESET#. If FLUSH# is sampled asserted on the active to inactive transition of RESET#, then the Pentium Pro processor tristates all of its outputs. This function is used during
board testing.
The Pentium Pro processor supplies a STPCLK# pin to enable the processor to enter a low power state. When STPCLK# is asserted, the Pentium Pro processor puts itself into the stop grant
state, issues a Stop Grant Acknowledge special transaction, and optionally stops providing internal clock signals to all units except the bus unit and the APIC unit. The processor continues
to snoop bus transactions while in stop grant state. When STPCLK# is deasserted, the processor
restarts its internal clock to all units and resumes execution. The assertion of STPCLK# has no
effect on the bus clock.
The PICCLK and PICD[1:0]# signals support the Advanced Programmable Interrupt Controller
(APIC) interface. The PICCLK signal is an input clock to the Pentium Pro processor for synchronous operation of the APIC bus. The PICD[1:0]# signals are used for bidirectional serial
message passing on the APIC bus.
LINT[1:0] are local interrupt signals, also defined by the APIC interface. In APIC disabled
mode, LINT0 defaults to INTR, a maskable interrupt request signal. LINT1 defaults to NMI, a
non-maskable interrupt. Both signals are asynchronous inputs. In the APIC enabled mode, LINT0
and LINT1 are defined with the local vector table.
LINT[1:0] are also used along with the A20M# and IGNNE# signals to determine the multiplier
for the internal clock frequency as described in Chapter 9, Configuration.
3.4.2.Arbitration Phase Signals
This signal group is used to arbitrate for the bus.
Table 3-3. Arbitration Phase Signals
Pin/Signal Name              Pin Mnemonic  Signal Mnemonic  Number
Symmetric Agent Bus Request  BR[3:0]#      BREQ[3:0]#       4
Priority Agent Bus Request   BPRI#         BPRI#            1
Block Next Request           BNR#          BNR#             1
Lock                         LOCK#         LOCK#            1
Up to five agents can simultaneously arbitrate for the bus, one to four symmetric agents (on
BREQ[3:0]#) and one priority agent (on BPRI#). Pentium Pro processors arbitrate as symmetric
agents. The priority agent normally arbitrates on behalf of the I/O subsystem (I/O agents) and
memory subsystem (memory agents).
Owning the bus is a necessary condition for initiating a bus transaction.
The symmetric agents arbitrate for the bus based on a round-robin rotating priority scheme. The
arbitration is fair and symmetric. After reset, agent 0 has the highest priority followed by agents
1, 2, and 3. All bus agents track the current bus owner. A symmetric agent requests the bus by
asserting its BREQn# signal. Based on the values sampled on BREQ[3:0]#, and the last symmetric bus owner, all agents simultaneously determine the next symmetric bus owner.
The priority agent asks for the bus by asserting BPRI#. The assertion of BPRI# temporarily
overrides, but does not otherwise alter the symmetric arbitration scheme. When BPRI# is sampled active, no symmetric agent issues another unlocked bus transaction until BPRI# is sampled
inactive. The priority agent is always the next bus owner.
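A rotating-priority choice of the next bus owner might look like the sketch below. Several points are assumptions made for illustration and are not taken from the text: that priority rotates so the agent after the last owner is considered first, that a `last_owner` value of 3 models the state after reset (which yields the order 0, 1, 2, 3), and the function names themselves. Locked operations and BNR# stalls are not modeled.

```python
NUM_SYMMETRIC_AGENTS = 4


def next_symmetric_owner(breq, last_owner):
    """Pick the next symmetric bus owner.

    breq: list of 4 booleans; breq[n] is True when BREQn# is sampled active.
    last_owner: ID of the last symmetric bus owner (assumed 3 after reset).
    Returns the winning agent ID, or None if no agent is requesting.
    """
    # Assumption: scanning starts just after the last owner, so the last
    # owner is considered only after every other agent.
    for offset in range(1, NUM_SYMMETRIC_AGENTS + 1):
        candidate = (last_owner + offset) % NUM_SYMMETRIC_AGENTS
        if breq[candidate]:
            return candidate
    return None


def next_owner(breq, last_owner, bpri_active):
    """The priority agent, when requesting, is always the next bus owner."""
    if bpri_active:
        return "priority"
    return next_symmetric_owner(breq, last_owner)


print(next_owner([True, True, False, False], last_owner=3, bpri_active=False))
```

Because every agent samples the same request signals and tracks the same last owner, each one can run this computation locally and reach the same answer.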
BNR# can be asserted by any bus agent to block further transactions from being issued to the
bus. It is typically asserted when system resources (such as address and/or data buffers) are
about to become temporarily busy or filled and cannot accommodate another transaction. After
bus initialization, BNR# can be asserted to delay the first bus transaction until all bus agents are
initialized.
The assertion of the LOCK# signal indicates that the bus agent is executing an atomic sequence
of bus transactions that must not be interrupted. A locked operation cannot be interrupted by another transaction regardless of the assertion of BREQ[3:0]# or BPRI#. LOCK# can be used to
implement memory-based semaphores. LOCK# is asserted from the first transaction’s Request
Phase through the last transaction’s Response Phase.
3.4.3.Request Signals
The request signals transfer request information, including the transaction address. A Request
Phase is two clocks long beginning with the assertion of ADS#, the Address Strobe signal, as
shown in Table 3-4.
Table 3-4. Request Signals
Signal Name             Signal Mnemonic           Number
Request (1)             REQa[4:0]#                5
Extended Request (2)    REQb[4:0]#
Address (1)             Aa[35:3]#                 33
Debug (optional) (2)    Ab[35:32]#
Attributes (2)          ATTR[7:0]# or Ab[31:24]#
Deferred ID (2)         DID[7:0]# or Ab[23:16]#
Byte Enables (2)        BE[7:0]# or Ab[15:8]#
Extended Functions (2)  EXF[4:0]# or Ab[7:3]#
Address Parity          AP[1:0]#                  2
Request Parity          RP#                       1
NOTES:
1. These signals are driven on the indicated pin during the first clock of the Request Phase (the clock in
which ADS# is driven asserted).
2. These signals are driven on the indicated pin during the second clock of the Request Phase (the clock
after ADS# is driven asserted).
The assertion of ADS# defines the beginning of the Request Phase. The REQa[4:0]# and
Aa[35:3]# signals are valid in the clock that ADS# is asserted. The REQb[4:0]#, ATTR[7:0]#,
DID[7:0]#, BE[7:0]#, and the EXF[4:0]# signals are all valid in the clock after ADS# is asserted.
RP# and AP[1:0]# are valid in both clocks of the Request Phase. The LOCK# signal from the
Arbitration Phase is asserted in the clock that ADS# is asserted for a bus locked operation.
The REQa[4:0]# and the REQb[4:0]# signals identify the transaction type as defined by Table
3-5. Note that partial memory read/write transactions can be locked on the bus by asserting
the LOCK# signal. Transactions are described in detail in Chapter 5, Bus Transactions and Operations.
Table 3-5. Transaction Types Defined by REQa#/REQb# Signals
Transaction            REQa[4:0]#  REQb[4:0]#
Deferred Reply         0 0 0 0 0   x x x x x
Rsvd (Ignore)
Interrupt Acknowledge  0 1 0 0 0   DSZ# x 0 0
Special Transactions   0 1 0 0 0   DSZ# x 0 1
8. LEN# indicates the length of the data transfer. See Table 3-7.
9. REQa0# active indicates the bus agent will have to provide write data and must have a TRDY#.
10. REQa1# or REQa2# active indicate that the transaction is to memory.
11. DSZ# is driven by the initiator and ignored by the responder. For the Pentium Pro processor, DSZ# =
00.
Pro processor, x implies “don’t care.”
Table 3-6. Address Space Size
ASZ[1:0]#  Memory Address Space  Observing Agents
00         32-bit                32 & 36 bit agents
01         36-bit                36 bit agents only
10         Reserved              None
11         Reserved              None
If the memory access is within the 0-to-(4GByte - 1) address space, ASZ[1:0]# must be 00B. If
the memory access is within the 4GByte-to-(64GByte - 1) address space, ASZ[1:0]# must be
01B. All observing bus agents that support the 4GByte (32-bit) address space must respond to
the transaction only when ASZ[1:0]# equals 00B. All observing bus agents that support the
64GByte (36-bit) address space must respond to the transaction when ASZ[1:0]# equals 00B or
01B.
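The ASZ[1:0]# rules above translate directly into two small functions. This is an illustrative sketch; the function names are invented, and the values are the logical encodings from Table 3-6, not pin voltages.

```python
FOUR_GBYTE = 1 << 32
SIXTY_FOUR_GBYTE = 1 << 36


def asz_for_address(address):
    """Return the ASZ[1:0]# encoding required for a memory address."""
    if address < 0 or address >= SIXTY_FOUR_GBYTE:
        raise ValueError("address outside the 36-bit physical address space")
    return 0b00 if address < FOUR_GBYTE else 0b01


def agent_responds(asz, agent_address_bits):
    """Decide whether an observing agent responds to the transaction.

    agent_address_bits is 32 or 36, the address space the agent supports.
    """
    if agent_address_bits == 32:
        return asz == 0b00
    if agent_address_bits == 36:
        return asz in (0b00, 0b01)
    raise ValueError("agent_address_bits must be 32 or 36")


print(asz_for_address(0xFFFF_FFFF), asz_for_address(0x1_0000_0000))
```

The reserved encodings 10 and 11 are accepted by no agent in this sketch, matching the "None" entries in the table.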
Table 3-7. Length of Data Transfer
LEN[1:0]#  Length     BE[7:0]#
00         0-8 bytes  Specify granularity
01         16 bytes   All active
10         32 bytes   All active
11         Reserved
The LEN[1:0]# signals determine the length of the transfer. The Pentium Pro processor will not
issue a request for a 16 byte data transfer.
In the clock that ADS# is asserted, the Aa[35:3]# signals provide a 36-bit, active-low
address as part of the request. The Pentium Pro processor physical address space is 2^36 bytes
or 64 Gigabytes (64 Gbyte). Address bits 2, 1, and 0 are mapped into byte enable signals
for 0 to 8 byte transfers.
The address signals are protected by the AP[1:0]# pins. AP1# covers A[35:24]#, AP0# covers
A[23:3]#. AP[1:0]# must be valid for two clocks beginning when ADS# is asserted. A parity
error detected on AP[1:0]# is indicated in the Error Phase. A parity signal on the Pentium Pro
processor bus is correct if there are an even number of electrically low signals in the set consisting of the covered signals plus the parity signal. Parity is computed using voltage levels, regardless of whether the covered signals are active high or active low.
The Request Parity pin RP# covers the request pins REQ[4:0]# and the address strobe, ADS#.
RP# must be valid for two clocks beginning when ADS# is asserted. A parity error detected on
RP# is indicated in the Error Phase.
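The parity rule stated above (an even number of electrically low signals among the covered signals plus the parity signal) can be checked with a few lines. This sketch works on electrical levels, with 0 meaning low and 1 meaning high; the function names are invented for this example.

```python
def count_lows(levels):
    """Count the electrically low (0) signals in a sequence of levels."""
    return sum(1 for level in levels if level == 0)


def parity_level(covered_levels):
    """Return the electrical level the parity pin must take (0 low, 1 high)."""
    # With an even number of lows already present the parity pin stays high;
    # otherwise it is driven low to make the total even.
    return 1 if count_lows(covered_levels) % 2 == 0 else 0


def parity_ok(covered_levels, parity_pin_level):
    """Check the rule: covered signals plus parity hold an even count of lows."""
    return count_lows(list(covered_levels) + [parity_pin_level]) % 2 == 0


covered = [0, 1, 1, 0, 0]  # example electrical levels of the covered signals
pin = parity_level(covered)
print(pin, parity_ok(covered, pin))
```

For AP1#, the covered set would be the levels on A[35:24]#; for AP0#, A[23:3]#; for RP#, the levels on REQ[4:0]# and ADS#.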
In the clock after ADS# is asserted, the A[35:3]# pins supply cache attribute information, a
deferred ID, the byte enables and other information regarding the transaction. Specifically,
the following signals are supported: ATTR[7:0]#, DID[7:0]#, BE[7:0]#, and EXF[4:0]#. The
description for these signals follows.
The ATTR[7:0]# pins describe the cache attributes. They are driven based on the Memory Type
Range Register attributes and the Page Table attributes as described in Table 3-8. See Chapter
6, Range Registers for a description of the memory types.
The DID[7:0]# signals contain the request agent ID on bits DID[6:4]#, the transaction ID on
DID[3:0]#, and the agent type on DID[7]#. Symmetric agents use an agent type of 0. All priority
agents use an agent type of 1. Every deferrable transaction (DEN# asserted) issued on the Pentium Pro processor bus which has not been guaranteed completion will have a unique Deferred
ID. After one of these transactions passes its Snoop Result Phase without DEFER# asserted, its
Deferred ID may be reused. During a deferred reply transaction, the Deferred ID of the agent
that deferred the original transaction is driven instead of an address.
Table 3-9. DID[7:0]# Encoding
DID[7]#       DID[6:4]#    DID[3:0]#
Agent Type    Agent ID     Transaction ID
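As an illustration of the Table 3-9 field layout, the following Python sketch packs and unpacks a Deferred ID. The names `DeferredId`, `decode_did`, and `encode_did` are hypothetical, and the 8-bit value is assumed to hold logical bit values (asserted = 1) rather than pin voltages.

```python
from typing import NamedTuple


class DeferredId(NamedTuple):
    agent_type: int      # DID[7]#: 0 = symmetric agent, 1 = priority agent
    agent_id: int        # DID[6:4]#
    transaction_id: int  # DID[3:0]#


def decode_did(did):
    """Split an 8-bit Deferred ID into its three fields."""
    if not 0 <= did <= 0xFF:
        raise ValueError("DID[7:0]# is an 8-bit field")
    return DeferredId((did >> 7) & 0x1, (did >> 4) & 0x7, did & 0xF)


def encode_did(agent_type, agent_id, transaction_id):
    """Pack the three fields back into an 8-bit Deferred ID."""
    if agent_type not in (0, 1) or not 0 <= agent_id <= 7 or not 0 <= transaction_id <= 15:
        raise ValueError("field out of range")
    return (agent_type << 7) | (agent_id << 4) | transaction_id
```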
The Byte Enables BE[7:0]# are used to determine which bytes of data should be transferred if
the data transfer is less than 8 bytes wide. BE7# applies to D[63:56], BE0# applies to D[7:0].
The byte enables are also used for special transaction encoding (see Table 3-10).
Table 3-10. Special Transaction Encoding on Byte Enables
Special Transaction        Byte Enables[7:0]#
Shutdown                   0000 0001
Flush                      0000 0010
Halt                       0000 0011
Sync                       0000 0100
Flush Acknowledge          0000 0101
Stop Grant Acknowledge     0000 0110
SMI Acknowledge            0000 0111
Reserved                   all other encodings
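The encodings in Table 3-10 lend themselves to a simple lookup. The sketch below is illustrative: the dictionary and function names are hypothetical, and the byte enable value is assumed to be the logical 8-bit pattern exactly as written in the table.

```python
# Table 3-10 encodings, keyed by the logical BE[7:0]# value.
SPECIAL_TRANSACTIONS = {
    0b00000001: "Shutdown",
    0b00000010: "Flush",
    0b00000011: "Halt",
    0b00000100: "Sync",
    0b00000101: "Flush Acknowledge",
    0b00000110: "Stop Grant Acknowledge",
    0b00000111: "SMI Acknowledge",
}


def special_transaction_name(byte_enables):
    """Return the special transaction name for a BE[7:0]# pattern.
    Every pattern not listed in Table 3-10 is reported as Reserved."""
    return SPECIAL_TRANSACTIONS.get(byte_enables & 0xFF, "Reserved")
```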
The Extended Functions, EXF[4:0]#, supported are listed in Table 3-11.
Table 3-11. Extended Function Pins
Extended Function Pin    Extended Function Signal    Function
EXF4#                    SMMEM#                      Accessing SMRAM space
EXF3#                    SPLCK#                      Split Lock
EXF2#                    Reserved
EXF1#                    DEN#                        Defer Enable
EXF0#                    Reserved
EXF4# (SMM Memory) is asserted by the Pentium Pro processor if the processor is in System
Management Mode and indicates that the processor is accessing a separate “shadow” memory,
the SMRAM. Each memory or I/O agent must observe this signal and only accept a transaction
involving SMRAM if the agent provides the SMRAM.
EXF3# (Split Lock) is asserted to indicate that a locked operation is split across 32-byte boundaries for writeback memory or 8-byte boundaries for uncacheable memory. Note that SPLCK#
is asserted for the first transaction in a locked operation only.
EXF1# is asserted if the transaction can be deferred by the responding agent. EXF1# is always
deasserted for the transactions in a locked operation, deferred reply transactions, and bus
Writeback Line transactions.
3.4.4. Error Phase Signals
The Error Phase signal group (see Table 3-12) contains signals driven in the Error Phase. This
phase is one clock long and always begins three clocks after the Request Phase begins (3 clocks
after ADS# is asserted).
Table 3-12. Error Phase Signals
Type                    Signal Names    Number
Address Parity Error    AERR#           1
The AERR# driver can be enabled or disabled as part of the power-on configuration (see Chapter
9, Configuration). If the AERR# driver of all bus agents is disabled, request and address parity
errors are ignored and no action is taken by the Pentium Pro processor bus agents. If the AERR#
driver of at least one bus agent is enabled, the agents observing a Request Phase check the
Address Parity signals (AP[1:0]#) and assert AERR# in the Error Phase if an address parity error
is detected. AERR# is also asserted if an RP# parity error is detected in the Request Phase.
AERR# must not be asserted by an agent for an upper address parity error (AP1#) when the
transaction address is not in the address range of the agent. Thus 32-bit agents must ignore
memory transactions unless ASZ[1:0]# = 00B. 36-bit agents must ignore memory transactions unless
ASZ[1:0]# = 00B or 01B.
The Pentium Pro processor supports two modes of response when the AERR# driver is enabled.
This is the “AERR# observation” which may be configured at power-up. AERR# observation
configuration must be consistent between all bus agents. If AERR# observation is disabled,
AERR# is ignored and no action is taken by the bus agents. If AERR# observation is enabled
and AERR# is sampled asserted, the request is cancelled. In addition, the request agent may
retry the transaction at a later time up to its retry limit. The Pentium Pro processor has a retry limit
of 1, after which the error becomes a hard error as determined by the initiating processor.
If a transaction is cancelled by AERR# assertion, then the transaction is aborted, removed from
the In-order Queue and there are no further valid phases for that transaction. Snoop results are
ignored if they cannot be cancelled in time. All agents reset their rotating ID for bus arbitration
to the state at reset (such that bus agent 0 has highest priority).
3.4.5. Snoop Signals
The snoop signal group (see Table 3-13) provides snoop result information to the Pentium Pro
processor bus agents in the Snoop Phase. The Snoop Phase is four clocks after a transaction’s
Request Phase begins (4 clocks after ADS# is asserted), or the 3rd clock after the previous snoop
results, whichever is later.
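The phase timing stated in Sections 3.4.4 and 3.4.5 can be written as simple clock arithmetic. The sketch below is illustrative only: the function names are hypothetical, clocks are plain integers, and snoop stalls (which extend the Snoop Phase) are not modeled.

```python
def error_phase_clock(ads_clock):
    """The Error Phase begins three clocks after ADS# is asserted."""
    return ads_clock + 3


def snoop_phase_clock(ads_clock, previous_snoop_clock=None):
    """The Snoop Phase is four clocks after ADS# is asserted, or the third
    clock after the previous snoop results, whichever is later."""
    earliest = ads_clock + 4
    if previous_snoop_clock is None:
        return earliest
    return max(earliest, previous_snoop_clock + 3)
```

For a request with ADS# asserted in clock 1 and no earlier snoop pending, this gives an Error Phase in clock 4 and a Snoop Phase in clock 5.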
Table 3-13. Snoop Signals
Type                                 Signal Names    Number
Keeping a Non-Modified Cache Line    HIT#            1
Hit to a Modified Cache Line         HITM#           1
Defer Transaction Completion         DEFER#          1
On observing a Request Phase (ADS# active) for a memory access, all caching agents are
required to perform an internal snoop operation and appropriately return HIT# and HITM# in the
Snoop Phase. HIT# and HITM# are used to indicate that the line is valid or invalid in the
snooping agent, whether the line is in the modified (dirty) state in the caching agent, or whether
the Snoop Phase needs to be extended. The HIT# and HITM# signals are used to maintain cache
coherency at the system level. A caching agent must assert HIT# and deassert HITM# in the
Snoop Phase if the agent plans to retain the line in its cache after the snoop. Otherwise, unless
the caching agent wishes to stall the Snoop Phase, the HIT# signal should be deasserted. The
requesting agent determines the highest permissible cache state of the line using the HIT# signal.
If HIT# is asserted, the requester may cache the line in the Shared state. If HIT# is deasserted,
the requester may cache the line in the Exclusive or Shared state. Multiple caching agents can
assert HIT# in the same Snoop Phase.
A snooping agent asserts HITM# if the line is in the Modified state. After asserting HITM#, the
agent assumes responsibility for writing back the modified line during the Data Phase (this is
called an implicit writeback).
The memory agent must observe HITM# in the Snoop Phase. If the memory agent observes
HITM# active, it relinquishes responsibility for the data return and becomes a target for the
implicit cache line writeback. The memory agent must merge the cache line being written back
with any write data and update memory. The memory agent must also provide the implicit
writeback response for the transaction.
The Pentium Pro processor and bus support self snooping. Self snooping means that an
agent can snoop its own request and drive the snoop result in the Snoop Phase. The Pentium
Pro processor uses self-snooping to resolve certain boundary conditions associated with
bus-lock operations that hit Modified cache lines, and conflicts associated with page table
aliasing. Because the Pentium Pro processor uses self-snooping, the memory agent must
always provide support for implicit writebacks, even in uniprocessor systems.
If HIT# and HITM# are sampled asserted together in the Snoop Phase, it means that a caching
agent is not ready to indicate snoop status, and it needs to stall the Snoop Phase. The snoop
signals (HIT#, HITM#, and DEFER#) are sampled again two clocks later. This process continues
as long as the stall state is sampled. The snoop stall is provided to stretch the completion of a
snoop as needed by any agent that needs to block further progress of snoops.
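The HIT#/HITM# combinations described above can be summarized in a small Python sketch. The function names and the returned labels ("stall", "modified", "shared", "clean") are illustrative choices, not terms defined by this manual, and the arguments are logical values (True = asserted), not pin voltages.

```python
def classify_snoop(hit_asserted, hitm_asserted):
    """Classify one Snoop Phase sample of HIT# and HITM#."""
    if hit_asserted and hitm_asserted:
        return "stall"      # snoop signals are sampled again two clocks later
    if hitm_asserted:
        return "modified"   # snooping agent will perform an implicit writeback
    if hit_asserted:
        return "shared"     # some agent keeps a non-modified copy of the line
    return "clean"          # no agent reports keeping the line


def requester_states(hit_asserted):
    """Cache states the requester may use, based on HIT# alone.
    Assumes the sample is not a stall and HITM# is deasserted."""
    return ("Shared",) if hit_asserted else ("Exclusive", "Shared")
```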
The DEFER# signal is also driven in the Snoop Phase. DEFER# is deasserted to indicate that
the transaction can be guaranteed in-order completion. An agent asserting DEFER# ensures
proper removal of the transaction from the In-order Queue by generating the appropriate
response. There are three valid responses when DEFER# is sampled asserted (and HITM# is
sampled deasserted): the deferred response, implying that the operation will be completed at a
later time; a retry response, implying that the transaction should be retried; or a hard error
response.
HITM# overrides DEFER# to determine the response type. DEFER# may still affect a locked
operation. See Chapter 5, Bus Transactions and Operations for details.
The requesting agent observes HIT#, HITM#, and DEFER# to determine the line’s final state in
its cache. DEFER# inactive enables the requesting agent to complete the transaction in order and
make the transition to the final cache state. A transaction with DEFER# active (and HITM# inactive) can be completed with a deferred reply transaction (and a delayed transition to final
cache state) or can be retried.
3.4.6. Response Signals
The response signal group (see Table 3-14) provides response information to the requesting
agent in the Response Phase. The Response Phase of a transaction occurs after the Snoop Phase
of the same transaction, and after the Response Phase of a previous transaction. Also, if the
transaction includes a data transfer, the data transfer of a previous transaction must be complete
before the Response Phase for the new transaction is entered.
Table 3-14. Response Signals
Type                         Signal Names    Number
Response Status              RS[2:0]#        3
Response Parity              RSP#            1
Target Ready (for writes)    TRDY#           1
Requests initiated in the Request Phase enter the In-order Queue, which is maintained by every
agent. The response agent is the agent responsible for completing the transaction at the top of
the In-order Queue. The response agent is the agent addressed by the transaction.
For write transactions, TRDY# is asserted by the response agent to indicate that it is ready to
accept write or writeback data. For write transactions with an implicit writeback, TRDY#
is asserted twice, first for the write data transfer and then again for the implicit writeback
data transfer.
The response agent asserts RS[2:0]# to indicate one of the valid transaction responses indicated
in Table 3-15.
Table 3-15. Transaction Response Encodings
RS2#  RS1#  RS0#  Description and Required Snoop Result
0     0     0     Idle state. (The RS[2:0]# pins must be driven inactive after being
                  sampled asserted)
0     0     1     Retry response.
0     1     0     Deferred response. The data bus is used only by a writing agent.
0     1     1     Reserved.
1     0     0     Hard failure response.
1     0     1     No Data response.
1     1     0     Implicit writeback response. A snooping agent will transfer writeback
                  data on the data bus. Memory agent must merge writeback data
                  with any transaction data and provide the response. (HITM#=1)
1     1     1     Normal data response
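Because the three response pins are only meaningful as a group, a decoder naturally takes all three at once. The following Python sketch is illustrative: the names are hypothetical, and each argument is the logical value of the pin as listed in Table 3-15 (1 = asserted).

```python
# Table 3-15 encodings, keyed by the 3-bit value RS2# RS1# RS0#.
RESPONSES = {
    0b000: "Idle state",
    0b001: "Retry response",
    0b010: "Deferred response",
    0b011: "Reserved",
    0b100: "Hard failure response",
    0b101: "No Data response",
    0b110: "Implicit writeback response",
    0b111: "Normal data response",
}


def decode_response(rs2, rs1, rs0):
    """Combine RS2#, RS1#, and RS0# into one code and look it up."""
    code = (int(bool(rs2)) << 2) | (int(bool(rs1)) << 1) | int(bool(rs0))
    return RESPONSES[code]
```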
The RS2#, RS1#, and RS0# signals must be interpreted together and cannot be interpreted
individually.
The RSP# signal provides parity for RS[2:0]#. RSP# must be valid on all clocks, not just
response clocks. A parity signal on the Pentium Pro processor bus is correct if there is an even
number of low signals in the set consisting of the covered signals plus the parity signal. Parity
is computed using voltage levels, regardless of whether the covered signals are active high or
active low.
3.4.7. Data Phase Signals
The data transfer signals group (see Table 3-16) contains signals driven in the Data Phase. Some
transactions do not transfer data and have no Data Phase. A Data Phase ranges from one to four
clocks of actual data being transferred. A cache line transfer takes four data transfers on a 64-bit
bus. A transfer can contain wait states, which extend the length of the Data Phase. Read
transactions have zero or one Data Phase; write transactions have zero, one, or two Data Phases.
Table 3-16. Data Phase Signals
Type                   Signal Names    Number
Data Ready             DRDY#           1
Data Bus Busy          DBSY#           1
Data                   D[63:0]#        64
Data ECC Protection    DEP[7:0]#       8
DRDY# indicates that valid data is on the bus and must be latched. The data bus owner asserts
DRDY# for each clock in which valid data is to be transferred. DRDY# can be deasserted to
insert wait states in the Data Phase.
DBSY# is used to hold the bus before the first DRDY# and between DRDY# assertions for a
multiple clock data transfer. DBSY# need not be asserted for single clock data transfers if no
wait states are needed.
During deferred reply transactions, the agent that initiates the deferred reply provides the
response for the transaction. If there is data to transfer, it is transferred with the same protocol as
read data (in other words, no TRDY# is needed).
The D[63:0]# signals provide a 64-bit data path between bus agents. For a partial transfer, including I/O Read and I/O Write, the byte enable signals, BE[7:0]# determine which bytes of the
data bus will contain the valid data.
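As a small illustration of how byte enables select data bus lanes (BE0# for D[7:0] up to BE7# for D[63:56]), consider the Python sketch below. The function name is hypothetical, and the argument is the logical BE[7:0]# pattern (bit set = byte enable asserted).

```python
def enabled_byte_lanes(byte_enables):
    """Return the (msb, lsb) data-bit range for every asserted byte enable,
    ordered from BE0# upward."""
    lanes = []
    for i in range(8):
        if (byte_enables >> i) & 1:
            lanes.append((8 * i + 7, 8 * i))  # BEi# covers D[8i+7:8i]
    return lanes
```

For instance, a pattern with only BE7# and BE0# asserted selects D[7:0] and D[63:56].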
The DEP[7:0]# signals provide optional ECC (error correcting code) covering D[63:0]#. As
described in Chapter 9, Configuration, the Pentium Pro processor data bus can be configured with
either no checking or ECC. If ECC is enabled, then DEP[7:0]# provides valid ECC for the entire
data bus on each data clock, regardless of which bytes are enabled. The error correcting code
can correct single bit errors and detect double bit errors.
3.4.8. Error Signals
The error signals group (see Table 3-17) contains error signals that are not part of the Error
Phase.
Table 3-17. Error Signals
Type                  Signal Names    Number
Bus Initialization    BINIT#          1
Bus Error             BERR#           1
Internal Error        IERR#           1
FRC Error             FRCERR          1
BINIT# is used to signal any bus condition that prevents reliable future operation of the bus.
Like the AERR# pin, the BINIT# driver can be enabled or disabled as part of the power-on
configuration (see Chapter 9, Configuration). If the BINIT# driver is disabled, BINIT# is never
asserted and no action is taken by the Pentium Pro processor on bus errors.
Regardless of whether the BINIT# driver is enabled, the Pentium Pro processor bus agent
supports two modes of operation that may be configured at power on. These are the BINIT#
observation and driving modes. If BINIT# observation is disabled, BINIT# is ignored and no action
is taken by the processor even if BINIT# is sampled asserted. If BINIT# observation is enabled
and BINIT# is sampled asserted, all bus state machines are reset. All agents reset their rotating
ID for bus arbitration, and internal state information is lost. L1 and L2 cache contents are not
affected. BINIT# observation and driving must be enabled for proper Pentium Pro processor
operation.
A machine-check exception may or may not be taken for each assertion of BINIT# as configured
in software.
The BERR# pin is used to signal any error condition caused by a bus transaction that will not
impact the reliable operation of the bus protocol (for example, memory data error, non-modified
snoop error). A bus error that causes the assertion of BERR# can be detected by the processor,
or by another bus agent. The BERR# driver can be enabled or disabled at power-on reset. If the
BERR# driver is disabled, BERR# is never asserted. If the BERR# driver is enabled, the Pentium
Pro processor may assert BERR#.
A machine check exception may or may not be taken for each assertion of BERR# as configured
at power on. The Pentium Pro processor will always disable the machine check exception by
default.
If a Pentium Pro processor detects an internal error unrelated to bus operation, it asserts IERR#.
For example, a parity error in an L1 or L2 cache causes a Pentium Pro processor to assert IERR#.
A machine check exception may or may not be taken for each assertion of IERR# as configured
with software.
Two Pentium Pro processors may be configured as an FRC (functional redundancy checking)
pair. In this configuration, one processor acts as the master and the other acts as a checker, and
the pair operates as a single, logical Pentium Pro processor. If the checker Pentium Pro processor
detects a mismatch between its internally sampled outputs and the master Pentium Pro processor’s outputs, the checker asserts FRCERR. FRCERR observation can be enabled at the master
processor with software. The master enters machine check on an FRCERR provided that Machine Check Execution is enabled.
The FRCERR signal is also toggled during the Pentium Pro processor’s reset action. A Pentium
Pro processor asserts FRCERR one clock after RESET# transitions from its active to inactive
state. If the Pentium Pro processor executes its built-in self test (BIST), then FRCERR is
asserted throughout that test. When BIST completes, the Pentium Pro processor deasserts FRCERR if
BIST succeeds and continues to assert FRCERR if BIST fails. If the Pentium Pro processor does
not execute the BIST action, then it keeps FRCERR asserted for less than 20 clocks and then
deasserts it.
3.4.9. Compatibility Signals
The compatibility signals group (see Table 3-18) contains signals defined for compatibility
within the Intel architecture processor family.
The Pentium Pro processor asserts FERR# when it detects an unmasked floating-point error.
FERR# is included for compatibility with systems using DOS-type floating-point error
reporting.
If the IGNNE# input signal is asserted, the Pentium Pro processor ignores a numeric error and
continues to execute non-control floating-point instructions. If the IGNNE# input signal is
deasserted, the Pentium Pro processor freezes on a non-control floating-point instruction if a
previous instruction caused an error.
If the A20M# input signal is asserted, the Pentium Pro processor masks physical address bit 20
(A20#) before looking up a line in any internal cache and before driving a memory read/write
transaction on the bus. Asserting A20M# emulates the 8086 processor’s address wraparound at
the one Mbyte boundary. A20M# must only be asserted when the processor is in real mode.
A20M# is not used to mask external snoop addresses.
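The effect of A20M# on an address can be illustrated with a one-line mask. This Python sketch is a model of the address arithmetic only; the function name is hypothetical and `a20m_asserted` is a logical value (True = A20M# asserted).

```python
A20_BIT = 1 << 20  # physical address bit 20, the 1-Mbyte boundary


def apply_a20m(physical_address, a20m_asserted):
    """Force address bit 20 to zero when A20M# is asserted, emulating the
    8086 address wraparound at the one Mbyte boundary."""
    if a20m_asserted:
        return physical_address & ~A20_BIT
    return physical_address
```

With A20M# asserted, address 0x100000 (one byte past the first megabyte) wraps to 0x000000; with A20M# deasserted the address is unchanged.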
The IGNNE# and A20M# signals are valid at all times. These signals are normally not guaranteed recognition at specific boundaries. However, to guarantee recognition of A20M#, and the
trailing edge of IGNNE# following an I/O write instruction, these signals must be valid in the
Response Phase of the corresponding I/O Write bus transaction.
The A20M# and IGNNE# signals have different meanings during a reset. A20M# and IGNNE#
are sampled on the active to inactive transition of RESET# to determine the multiplier for the
internal clock frequency, as described in Chapter 9, Configuration.
System Management Interrupt is asserted asynchronously by system logic. On accepting a System Management Interrupt, the Pentium Pro processor saves the current state and enters SMM
mode. It issues an SMI Acknowledge Bus transaction and then begins program execution from
the SMM handler.
The BP[3:2]# signals are the System Support group Breakpoint signals. They are outputs from
the Pentium Pro processor that indicate the status of breakpoints.
The BPM[1:0]# signals are more System Support group breakpoint and performance monitor
signals. They are outputs from the Pentium Pro processor that indicate the status of breakpoints
and programmable counters used for monitoring Pentium Pro processor performance.
The diagnostic signals group shown in Table 3-19 provides signals for probing the Pentium Pro
processor, monitoring Pentium Pro processor performance, and implementing an IEEE 1149.1
boundary scan.
PM[1:0]# are the Performance Monitor signals. These signals are outputs from the Pentium Pro
processor that indicate the status of four programmable counters for monitoring Pentium Pro
processor performance.
TCK is the Test Clock, used to clock activity on the five-signal Test Access Port (TAP). TDI is
the Test Data In signal, transferring serial test data into the Pentium Pro processor. TDO is the
Test Data Out signal, transferring serial test data out of the Pentium Pro processor. TMS is used
to control the sequence of TAP controller state changes. TRST# is used to asynchronously initialize the TAP controller.
3.4.11. Power, Ground, and Reserved Pins
The Pentium Pro processor bus and Pentium Pro processor dedicate many pins to power and
ground signals. Refer to Chapter 15, Mechanical Specifications for the pin assignment.
CHAPTER 4
BUS PROTOCOL
This chapter describes the protocol followed by bus agents in a transaction’s six phases. The
phases are:
• Arbitration Phase
• Request Phase
• Error Phase
• Snoop Phase
• Response Phase
• Data Phase
4.1. ARBITRATION PHASE
A bus agent must have bus ownership before it can initiate a transaction. If the agent is not the
bus owner, it enters the Arbitration Phase to obtain ownership. Once ownership is obtained, the
agent can enter the Request Phase and issue a transaction to the bus.
4.1.1. Protocol Overview
The Pentium Pro processor bus arbitration protocol supports two classes of bus agents:
symmetric agents and priority agents.
The symmetric agents support fair, distributed arbitration using a round-robin algorithm. Each
symmetric agent has a unique Agent ID between zero and three assigned at reset. The algorithm
arranges the four symmetric agents in a circular order of priority: 0, 1, 2, 3, 0, 1, 2, etc. Each
symmetric agent also maintains a common Rotating ID that reflects the symmetric Agent ID of
the most recent bus owner. On every arbitration event, the symmetric agent with the highest
priority becomes the symmetric owner. Note that the symmetric owner is not necessarily the overall
bus owner. The symmetric owner is allowed to enter the Request Phase provided no other action
of higher priority is preventing the use of the bus.
The priority agent(s) has higher priority than the symmetric owner. Once the priority agent
arbitrates for the bus, it prevents the symmetric owner from entering into a new Request Phase
unless the new transaction is part of an ongoing bus locked operation. The priority agent is
allowed to enter the Request Phase provided no other action of higher priority is preventing the
use of the bus.
Pentium Pro processors are symmetric agents. The priority agent normally arbitrates on behalf
of the I/O and possibly memory subsystems.
Besides the two classes of arbitration agents, each bus agent has two actions available that act
as arbitration modifiers: the bus lock and the request stall.
The bus lock action is available to the current symmetric owner to block other agents, including
the priority agent from acquiring the bus. Typically a bus locked operation consists of two or
more transactions issued on the bus as an indivisible sequence (this is indicated on the bus by
the assertion of the LOCK# pin). Once the symmetric bus owner has successfully initiated the
first bus locked transaction it continues to issue remaining requests that are part of the same indivisible operation without releasing the bus.
The request stall action is available to any bus agent that is unable to accept new bus transactions. By asserting a signal (BNR#) any agent can prevent the current bus owner from issuing
new transactions.
In summary, the priority for entering the Request Transfer Phase, assuming there is no bus stall
or arbitration reset event, is:
1. The current bus owner retains ownership until it completes an ongoing indivisible bus
locked operation.
2. The priority agent gains bus ownership over a symmetric owner.
3. Otherwise, the current symmetric owner as determined by the rotating priority is allowed
to generate new transactions.
4.1.2. Bus Signals
The Arbitration Phase signals are BREQ[3:0]#, BPRI#, BNR#, and LOCK#.
BREQ[3:0]# bus signals are connected to the four symmetric agents in a rotating manner as
shown in Figure 4-1. This arrangement initializes every symmetric agent with a unique Agent
ID during power-on configuration. Every symmetric agent has one input/output pin, BR0#, to
arbitrate for the bus during normal operation. The remaining three pins, BR1#, BR2#, and BR3#,
are input only and are used to observe the arbitration requests of the remaining three symmetric
agents.
At reset, the central agent is responsible for asserting the BREQ0# bus signal. BREQ[3:1]#
remain deasserted. All Pentium Pro processors sample BR[3:1]# on the active to inactive
transition of RESET# to determine their arbitration ID as follows:
• The BR1#, BR2#, and BR3# pins are all inactive on Agent 0.
• Agent 1 has BR3# active.
• Agent 2 has BR2# active.
• Agent 3 has BR1# active.
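The Agent ID assignment listed above amounts to a lookup on the BR[3:1]# pins sampled at the active to inactive transition of RESET#. The Python sketch below is illustrative only; the function name is hypothetical and the arguments are logical values (True = pin sampled active).

```python
def agent_id_from_reset_sample(br1_active, br2_active, br3_active):
    """Map the BR1#, BR2#, BR3# levels sampled at RESET# deassertion to a
    symmetric Agent ID."""
    table = {
        (False, False, False): 0,  # no pin active
        (False, False, True): 1,   # BR3# active
        (False, True, False): 2,   # BR2# active
        (True, False, False): 3,   # BR1# active
    }
    sample = (bool(br1_active), bool(br2_active), bool(br3_active))
    if sample not in table:
        # More than one pin active is not one of the listed combinations.
        raise ValueError("unexpected BR[3:1]# combination at reset")
    return table[sample]
```

This works because only BREQ0# is asserted at reset, and the rotated wiring routes BREQ0# to a different BRn# pin on each agent.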
The BPRI# signal is an output from the priority agent by which it arbitrates for the bus ownership
and an input to the symmetric agents. The LOCK# and BNR# signals are bi-directional signals
bused among all agents. The current bus owner uses LOCK# to define an indivisible bus
locked operation. BNR# is used by any bus agent to stall further request phase initiation.
[Block diagram: the priority agent drives BPRI#; Agents 0 through 3 each connect their BR0# through BR3# pins to the bus signals BREQ0# through BREQ3# in rotated order; system interface logic drives BREQ0# during reset.]
Figure 4-1. BR[3:0]# Physical Interconnection
4.1.3. Internal Bus States
In order to maintain a glueless MP interface, some bus state is distributed and must be tracked
by all agents on the bus. This section describes the bus state that needs to be tracked internally
by Pentium Pro processor bus agents.
4.1.3.1. SYMMETRIC ARBITRATION STATES
As described before, each symmetric agent must maintain a two-bit Agent ID and a two-bit
Rotating ID to perform distributed round-robin arbitration. In addition, each symmetric agent
must also maintain a symmetric ownership state bit that describes if the bus ownership is being
retained by the current symmetric owner (“busy” state) or being returned to a state where no
symmetric agent currently owns the bus (“idle” state). The Pentium Pro processor will enter the
idle state after AERR#, BINIT# and RESET#. The notion of idle state enables a shorter, two-clock
arbitration latency from bus request to its ownership. The notion of busy state enables bus
parking but increases arbitration latency to a minimum of four clocks due to a handshake with
the current symmetric owner. Bus parking means that the current bus owner maintains bus
ownership even if it currently does not have a pending transaction. If a transaction becomes pending
before that bus owner relinquishes bus ownership, it can drive the transaction without having to
arbitrate for the bus. The Pentium Pro processor implements bus parking.
4.1.3.1.1. Agent ID
An agent’s Agent ID is determined at reset and cannot change without the assertion of RESET#.
The Agent ID is unique for every symmetric agent.
4.1.3.1.2. Rotating ID
The Rotating ID points to the agent that will be the lowest priority agent in the next arbitration
event with active requests (this is the Agent ID of the current symmetric bus owner). All
symmetric agents maintain the same Rotating ID. The Rotating ID is initialized to 3 at reset. It is
assigned the Agent ID of the new symmetric owner after an arbitration event so that the new
owner becomes the lowest priority agent on the next arbitration event.
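The round-robin selection implied by the Rotating ID can be sketched as follows. This is a simplified model, not the full protocol: the function name is hypothetical, and it deliberately ignores the symmetric ownership state, the priority agent, bus lock, and request stall.

```python
def arbitrate(rotating_id, requesting_agents):
    """Pick the next symmetric owner among agents 0..3.

    rotating_id is the lowest-priority agent; priority then runs
    rotating_id + 1, rotating_id + 2, ... wrapping modulo 4.
    Returns the winning Agent ID (which becomes the new Rotating ID),
    or None if no agent is requesting.
    """
    requesting = set(requesting_agents)
    for offset in range(1, 5):
        candidate = (rotating_id + offset) % 4
        if candidate in requesting:
            return candidate
    return None
```

With the reset value of 3, `arbitrate(3, {0, 1, 2, 3})` selects agent 0, which matches the statement that agent 0 has the highest symmetric priority after reset.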
4.1.3.1.3. Symmetric Ownership State
The symmetric ownership state is reset to idle on an arbitration reset. The state becomes busy
when any symmetric agent completes the Arbitration Phase and becomes symmetric owner. The
state remains busy while the current symmetric owner retains bus ownership or transfers it to a
different symmetric agent on the next arbitration event. When the state is busy, the Rotating ID
is the same as the symmetric owner Agent ID. When the state is idle, the Rotating ID is the same
as the last symmetric owner Agent ID. Note that the symmetric ownership state refers only to
the symmetric bus owner. The priority agent can have actual physical ownership of the request
bus, even while the state is busy and there is also a symmetric bus owner.
4.1.3.2. REQUEST STALL PROTOCOL
Any bus agent can stop all agents from issuing transactions via the BNR# (block next request)
pin. This is typically done when the agent has one free request buffer remaining and cannot rely
on the In-order Queue depth limit to sufficiently limit the number of transactions initiated on the
bus. BNR# can be used to stall transactions for a user-defined amount of time, or it can be used
to throttle the frequency of the transactions issued to the bus. BNR# can also be used to prevent
any transactions from being issued after RESET# or BINIT# to block transactions while bus
agents initialize themselves. For debugging, performance monitoring, or test purposes, an agent
can assert BNR# to issue one transaction to the bus at a time (no pipelining). When stalling the
bus, the stalling condition must be able to clear without requiring access to the bus.
4.1.3.2.1. Request Stall States
The request stall protocol can be described using three states: the “free” state, in which
transactions can be driven to the bus normally, one every 3 clocks; the “stalled” state, in which no
transactions are driven to the bus; and the “throttled” state, in which one transaction may be driven to
the bus. The throttled state is a temporary state which will transition to either free or stalled at
the next sample point.
If BNR# is always active when sampled, then no transactions are driven to the bus because all
agents remain in the stalled state.
To get to the free state where transactions are driven normally to the bus (a maximum of one
ADS# every three clocks), BNR# must be sampled inactive on two consecutive sample points.
The existence of the throttled state enables one transaction to be sent to the bus every time BNR#
is sampled deasserted. When the processor is in the throttled state, one transaction can be driven
to the bus. The throttled state is a temporary state.
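The three request stall states can be modeled as a small state machine driven by BNR# sample points. The Python sketch below is illustrative, and all names are hypothetical. The transitions out of the stalled and throttled states follow the description above; the transition taken when BNR# is sampled active in the free state is not spelled out in this section, so it is exposed as the parameter `free_on_assert`, whose default is an assumption.

```python
from enum import Enum


class StallState(Enum):
    FREE = "free"
    THROTTLED = "throttled"
    STALLED = "stalled"


def next_stall_state(state, bnr_active, free_on_assert=StallState.STALLED):
    """Return the request stall state after one BNR# sample point."""
    if bnr_active:
        if state is StallState.FREE:
            # Assumption: this section does not state the target of this
            # transition, so the caller may override it.
            return free_on_assert
        return StallState.STALLED      # stalled stays, throttled falls back
    if state is StallState.STALLED:
        return StallState.THROTTLED    # first inactive sample
    return StallState.FREE             # second consecutive inactive sample


def run(samples, start):
    """Apply a sequence of BNR# samples (True = active) and return the
    state reached after each sample."""
    trace = []
    state = start
    for bnr_active in samples:
        state = next_stall_state(state, bnr_active)
        trace.append(state)
    return trace
```

Starting from the stalled state, two consecutive inactive samples reach the free state, while alternating inactive and active samples bounce between throttled and stalled, allowing one transaction per deassertion.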
4.1.3.2.2. BNR# Sampling
BNR# is deasserted with RESET# and BINIT#. After RESET#, BNR# is first sampled 2 clocks
after RESET# is sampled deasserted. After BINIT#, BNR# is first sampled 4 clocks after
BINIT# is sampled asserted. BNR# is a wired-OR signal and must not be driven active for two
consecutive clocks; if it is asserted in one clock, it must be deasserted in the next clock.
BNR# has two sampling modes. It is sampled every other clock while in the stalled or throttled
state, and it is sampled in the third clock after ADS# is sampled asserted in the free state.
BNR# must be driven active only during a valid sampling window and should be deasserted in
the following clock. Bus agents must ignore BNR# in the clock after a valid sampling window.
4.1.4. Arbitration Protocol Description
This section describes the arbitration protocol using examples. For reference, Section 4.1.5.,
“Symmetric Agent Arbitration Protocol Rules” through Section 4.1.7., “Bus Lock Protocol
Rules” list the rules.
4.1.4.1. SYMMETRIC ARBITRATION OF A SINGLE AGENT AFTER RESET#
Figure 4-2 illustrates bus arbitration initiated after a reset sequence. BREQ[3:0]#, BPRI#,
LOCK#, and BNR# must be deasserted during RESET#. (BREQ0# is asserted 2 clocks before
RESET# is deasserted for initialization reasons as described in Section 4.1.2., “Bus Signals”.)
Symmetric agents can begin arbitration after BIST and MP initialization by driving the
BREQ[3:0]# signals. Once ownership is obtained, the symmetric owner can park on the bus as
long as no other symmetric agent is requesting it. The symmetric owner can voluntarily release
the bus to idle.
[Timing diagram over clocks 1 through 17 showing CLK, BREQ0#, BREQ1#, BREQ2#, BREQ3#, BPRI#, RESET#, the Rotating ID, and the ownership state (I = idle, B = busy).]
Figure 4-2. Symmetric Arbitration of a Single Agent After RESET#
RESET# is asserted in T1, which is observed by all agents in T2. This signal forces all agents to
initialize their internal states and bus signals. In T3 or T4, all agents deassert their arbitration
request signals BREQ[3:0]#, BPRI# and arbitration modifier signals BNR# and LOCK#. The
symmetric agents reset the ownership state to idle and the Rotating ID to three (so that bus agent
0 has the highest symmetric priority after RESET# is deasserted).
In T9, after BIST and MP initialization, agent 1 asserts BREQ1# to arbitrate for the bus. In T10,
all agents observe acti ve BREQ1# and inactiv e BREQ[0,2,3]#. During T10, all agents determine
that agent 1 is the only symmetric agent arbitrating for the b us and therefore h as the highest priority. As a result, in T11, all agents update their Rotating ID to “1”, the Agent ID of the new
symmetric owner and its ownership state to busy, indicating that the bus is busy.
Starting from T10, agent 1 continually monitors BREQ[0,2,3]# to determine if it can park on the
bus. Since BREQ[0,2,3]# are observed inacti ve, it continues to maintain bus o wnership by keeping BREQ1# asserted.
In T16, agent 1 voluntarily deasserts BREQ1# to release bus ownership, which is observed by
all agents in T17. In T18 all agents up date the ownership state from busy to idle. This action
reduces the arbitration latency of a new symmetric agent to two clocks on the next arbitration
event .
4.1.4.2.SIGNAL DEASSERTION AFTER BUS RESET
Figure 4-3 illustrates how signals are deasserted after a bus reset. This relaxed deassertion
protocol gives all bus agents time to initialize. Since agents must deassert bus signals in response
to both BINIT# and RESET#, agents will respond to both reset assertions in the same fashion.
[Timing diagram: CLK, BINIT#, BNR#, wire-or signals, other signals]
Figure 4-3. Signal Deassertion After Bus Reset
On observation of the start of the reset event, all bus signals must be deasserted as indicated in
Figure 4-3. This event is the deasserted to asserted transition of RESET# or BINIT#. In T1 the
first agent asserts BINIT#. In T2 all agents sample RESET# or BINIT# active. In response to
observing BINIT# active in T2 any agent driving BINIT# from the first or second clock must
deassert BINIT# in T4 (see Chapter 8, Data Integrity for details on the BINIT# protocol). Also
in T4, at the latest, all agents must deassert the wired-or control signals HIT#, HITM#, AERR#,
BERR# and BNR#.
In T5, BINIT#, BNR#, HIT#, HITM#, AERR# and BERR# may have invalid signal level due
to wired-or glitches. T5 is the latest that an agent can deassert all other non wired-or bus signals.
In T6 all signals should have a valid inactive level.
All bus signals are sampled two clocks after the end of the reset event. This event for RESET#
is sampling the asserted to deasserted transition. For BINIT#, this event is the fourth clock of
BINIT# assertion. BNR# must be asserted in the clock after the end of reset event, if the agent
intends to block ADS#.
All bus drivers must be aware of potential wired-or glitches due to power on configuration. If a
signal could be driven due to power on configuration, a driver must wait one additional cycle
after the end of the reset event before the signal can be asserted for normal operation.
4.1.4.3.DELAY OF TRANSACTION GENERATION AFTER RESET
Figure 4-4 illustrates how transactions can be prevented from being issued to the bus after reset
in order to give all bus agents time to initialize. Note that symmetric arbitration is not affected
by the state of BNR#.
[Timing diagram: CLK, BREQ0#, BREQ1#, BREQ2#, BREQ3#, BPRI#, BNR#, RESET#, {rotating id}, {ownership}, {request stall state}]
Figure 4-4. Delay of Transaction Generation After Reset
Figure 4-4 is identical to Figure 4-2 except that BNR# is sampled asserted at its first sampling
point in T8. This keeps the request stall state in the stalled state (S) where no transactions are
allowed to be generated. Note that this does not affect the arbitration event starting with
BREQ1# assertion in T7. Agent 1 wins symmetric ownership in T8, even though no transactions
may be generated.
BNR# is sampled deasserted in its next two sampling points and the request stall state transitions
through the throttled state (T) in T11 to the free state (F) in T13. Transactions can be issued by
agent 1 in any clock starting from T11 through T15.
4.1.4.4.SYMMETRIC ARBITRATION WITH NO LOCK#
Figure 4-5 illustrates arbitration between two or more symmetric agents while LOCK# and
BPRI# stay inactive. Because LOCK# and BPRI# remain inactive, bus ownership is determined
based on a Rotating ID and bus ownership state. The symmetric agent that wins the bus releases
it to the other agent as soon as possible (the Pentium Pro processor limits it to one transaction,
unless the outstanding operation is locked). The symmetric agent may re-arbitrate one clock
after releasing the bus. Also note that when a symmetric agent n issues a transaction to the bus,
BREQn# must stay asserted until the clock in which ADS# is asserted.
[Timing diagram: CLK, BREQ0#, BREQ1#, BREQ2#, BREQ3#, BPRI#, BNR#, LOCK#, ADS#, {REQUEST}, {rotating id}, {ownership}; requests 0a, 1a, 2a, 0b]
Figure 4-5. Symmetric Bus Arbitration with no LOCK#
In T1, all arbitration requests BREQ[3:0]# and BPRI# are inactive. The bus is not stalled by
BNR#. The Rotating ID is 3 and bus ownership state is idle (I). Hence, the round-robin
arbitration priority is 0,1,2,3.
In T2, agent 0 and agent 1 activate BREQ0# and BREQ1# respectively to arbitrate for the bus.
In T3, all agents observe inactive BREQ[3:2]# and active BREQ[1:0]#. Since the Rotating ID
is 3, during T3, all agents determine that agent 0 has the highest priority and is the next
symmetric owner. In T4, all agents update the Rotating ID to zero and the bus ownership state to
busy (B).
Since BPRI# is observed inactive in T3 and the bus is not stalled, in T4, agent 0 can begin a new
Request Phase. (If BPRI# had been asserted in T3, the arbitration event, the updating of the
Rotating ID, and ownership states would not have been affected. However, agent 0 would not be
able to drive a transaction in T4.) In T4, agent 0 initiates request phase 0a.
In response to active BREQ1# observed in T3, agent 0 deasserts BREQ0# in T4 to release bus
ownership. Since it has another internal request, it immediately reasserts BREQ0# after one
clock in T5.
In T5, all symmetric agents observe BREQ0# deassertion, the release of bus ownership by the
current symmetric owner. During T5, all symmetric agents recognize that agent 1 now remains
the only symmetric agent arbitrating for the bus. In T6, they update the Rotating ID to 1. The
ownership state remains busy.
Agent 1 assumes bus ownership in T6 and generates request phase 1a in T7 (three cycles from
request 0a). In response to active BREQ0# observed in T5, agent 1 deasserts BREQ1# in T7
along with the first clock of the Request Phase and releases symmetric ownership. Meanwhile,
agent 2 asserts BREQ2# to arbitrate for the bus. In T8, all agents observe inactive BREQ1#, the
release of ownership by the current symmetric owner. Since the Rotating ID is one, and
BREQ0#, BREQ2# are active, all agents determine that agent 2 is the next symmetric owner. In
T9, all agents update the Rotating ID to 2. The ownership state remains busy.
In T10, (three cycles from request 1a) agent 2 drives request 2a. In response to active BREQ0#
observed in T9, agent 2 deasserts BREQ2# in T10. In T11 all agents observe inactive BREQ2#
and active BREQ0#. During T11, they recognize that agent 0 is the only symmetric agent
arbitrating for the bus. In T12, all agents update the Rotating ID to 0. The ownership state remains
busy.
In T12, agent 0 assumes bus ownership. In T13 agent 0 initiates request 0b (three cycles from
request 2a). Because no other agent has requested the bus, agent 0 parks on the bus by keeping
its BREQ0# signal active.
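The round-robin selection used in this walk-through can be sketched in a few lines of Python.
This is an illustrative model only, not part of the bus specification: the function name
next_symmetric_owner and the use of a set of agent IDs to represent the observed active
BREQn# signals are assumptions made for the example.

```python
def next_symmetric_owner(rotating_id, active_requests, num_agents=4):
    """Return the requesting agent that follows rotating_id in round-robin order.

    rotating_id     -- Agent ID of the most recent symmetric owner (3 after reset)
    active_requests -- set of agent IDs whose BREQn# is observed active
    Returns None when no BREQn# is active.
    """
    for offset in range(1, num_agents + 1):
        candidate = (rotating_id + offset) % num_agents
        if candidate in active_requests:
            return candidate
    return None


# Arbitration events of Figure 4-5 as described in the text:
# (Rotating ID before the event, agents observed with active BREQn#)
events = [(3, {0, 1}), (0, {1}), (1, {0, 2}), (2, {0})]
owners = [next_symmetric_owner(rid, reqs) for rid, reqs in events]
print(owners)  # [0, 1, 2, 0]
```

The printed sequence matches the Rotating ID updates described for T4, T6, T9 and T12.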
4.1.4.5.SYMMETRIC BUS ARBITRATION WITH NO TRANSACTION GENERATION
Figure 4-6 is a modification of Figure 4-5 to illustrate what happens if an agent n asserts
BREQn#, but does not drive a transaction. Note that once bus ownership is requested by an
agent by asserting its BREQn# signal, BREQn# must not be deasserted until bus ownership is
gained by agent n. Bus agent n need not drive a transaction; however, bus ownership must be
acquired. Notice that because transaction 2a is not driven, transaction 0b can be driven sooner
than it was in Figure 4-5.
Figure 4-6. Symmetric Arbitration with no Transaction Generation
This figure is the same as Figure 4-5 up until T9.
In T9, the clock that bus agent 2 wins bus ownership, bus agent 2 deasserts BREQ2# because
the need to drive the transaction was removed (for example, on the Pentium Pro processor, if a
transaction is pending to writeback a replaced cache line and it gets snooped, HITM# will be
asserted and the line will be written out as an implicit writeback. The pending transaction to
writeback the line gets cancelled).
In T10, all agents observe an inactive BREQ2# and an active BREQ0#. During T10 they
recognize that agent 0 is the only symmetric agent arbitrating for the bus. In T11, all agents update
the Rotating ID to 0. The ownership remains busy and agent 0 initiates request 0b. Because no
other agent has requested the bus, agent 0 parks on the bus by keeping its BREQ0# signal active.
4.1.4.6.BUS EXCHANGE AMONG SYMMETRIC AND PRIORITY AGENTS WITH NO LOCK#
Figure 4-7 illustrates bus exchange between a priority agent and two symmetric agents. A
symmetric agent relinquishes physical bus ownership to a priority agent as soon as possible. A
maximum of one unlocked ADS# can be generated by the current symmetric bus owner in the clock
after BPRI# is asserted because BPRI# has not yet been observed. Note that the symmetric bus
owner (Rotating ID) does not change due to the assertion of BPRI#. BPRI# does not affect
symmetric agent arbitration, or the symmetric bus owner. Finally, note that in this example BREQ0#
must remain asserted until T12 because transaction 0b has not yet been driven. An agent can not
drive a transaction unless it owns the bus in the clock in which ADS# is to be driven for that
transaction.
[Timing diagram: CLK, BREQ0#, BREQ1#, BREQ2#, BREQ3#, BPRI#, LOCK#, ADS#, {REQUEST}, {rotating id}; requests 0a, I/Oa, I/Ob, 0b]
Figure 4-7. Bus Exchange Among Symmetric and Priority Agent with no LOCK#
In Figure 4-7, before T1, agent 0 owns the bus. The Rotating ID is zero, the ownership state is
busy.
In T3, the priority agent asserts BPRI# to request bus ownership. In T4, agent 0, the current
owner, issues its last request 0a. In T4, all symmetric agents observe BPRI# active, and guarantee no
new unlocked request generation starting in T5.
In T3, the priority agent observes inactive ADS# and inactive LOCK# and determines that it
may not gain request bus ownership in T5 because the current request bus owner might issue
one last request in T4. In T5, the priority agent observes inactive LOCK# and determines that it
owns the bus and may begin issuing requests starting in T7, four clocks from BPRI# assertion
and three clocks from previous request generation.
The priority agent issues two requests, I/Oa, and I/Ob, and continues to assert BPRI# through
T10. In T10, the priority agent deasserts BPRI# to release bus ownership back to the symmetric
agents. In T10, agent 1 asserts BREQ1# to arbitrate for the bus.
In T11, agent 0, the current symmetric owner, observes inactive BPRI# and initiates request 0b
in T13 (three clocks from previous request). In response to active BREQ1#, agent 0 deasserts
BREQ0# in T13 to release symmetric ownership. In T14 all symmetric agents observe inactive
BREQ0#, the release of ownership by the current symmetric owner. Since BREQ1# is the only
active bus request, they assign agent 1 as the next symmetric owner. In T15 symmetric agents
update the Rotating ID to one, the Agent ID of the new symmetric owner.
4.1.4.7.SYMMETRIC AND PRIORITY BUS EXCHANGE DURING LOCK#
Figure 4-8 illustrates an ownership request made by both a symmetric and a priority agent
during an ongoing indivisible sequence by a symmetric owner. When this is the case, LOCK# takes
priority over BPRI#. That is, the symmetric bus owner does not give up the bus to the priority
agent while it is driving an indivisible locked operation. Note that bus agent 1 can hold bus
ownership even though BPRI# is asserted. Like the BREQ[3:0]# signals, if the priority agent is going
to issue a transaction, BPRI# must not be driven inactive until the clock in which ADS# is driven
asserted.
[Timing diagram: CLK, BREQ0#, BREQ1#, BREQ2#, BREQ3#, BPRI#, LOCK#, ADS#, {REQUEST}, {rotating id}; requests 0a, 0b, I/Oa, 1a]
Figure 4-8. Symmetric and Priority Bus Exchange During LOCK#
Before T1, agent 0 owns the bus. In T1, agent 0 initiates the first transaction in a bus locked
operation by asserting LOCK# along with request 0a. Also in T1, the priority agent and agent 1
assert BPRI# and BREQ1# respectively to arbitrate for the bus. Agent 0 does not deassert
BREQ0# or LOCK# since it is in the middle of a bus locked operation.
In T7, agent 0 initiates the last transaction in the bus locked operation. At the request’s
successful completion the indivisible sequence is complete and agent 0 deasserts LOCK# in T11. Since
BREQ1# is observed active in T10, agent 0 also deasserts BREQ0# in T11 to release symmetric
ownership.
The deassertion of LOCK# is observed by the priority agent in T12 and it begins new-request
generation from T13. The deassertion of BREQ0# is observed by all symmetric agents and they
assign the symmetric ownership to agent 1, the agent with active bus request. In T13, all
symmetric agents update the Rotating ID to one, the Agent ID of the new symmetric owner.
Since agent 1 observed active BPRI# in T12, it guarantees no new request generation beginning
T13. In T13, the priority agent deasserts BPRI#. In T15, three clocks from the previous request
and at least two clocks from BPRI# deassertion, agent 1, the current symmetric owner, issues
request 1a.
4.1.4.8.BNR# SAMPLING
This section illustrates how BNR# is sampled by all agents, and how the stall protocol is
implemented. Figure 4-9 illustrates BNR# sampling as it begins after the processor is brought out of
reset. Figure 4-10 illustrates how BNR# is sampled once the stall protocol state machine
reaches the free state. Section 4.1.3.2., “Request Stall Protocol” may be useful as reference
when reading this section.
[Timing diagram: CLK, RESET#, BINIT#, ADS#, BNR#, {request stall state}]
Figure 4-9. BNR# Sampling After RESET#
RESET# is asserted in T1, and observed by all agents in T2. In T3 or T4, BNR# must be
deasserted and the request stall state is initialized to the stalled state.
In T5, RESET# is driven inactive, and in T6, RESET# is sampled inactive. Any agent that
requires more time to initialize its bus unit logic after reset is allowed to delay transaction
generation by asserting BNR# in T7. In T7, the clock after RESET# is sampled inactive, BNR# is
driven to a valid level. In T8, two clocks after RESET# is sampled inactive, BNR# is sampled
active, causing the processor to remain in the stalled state in T9.
Because the processor is in the stalled state, BNR# is sampled every 2 clocks. BNR# is sampled
asserted again in T10, so the state remains stalled. In T12, BNR# is sampled inactive. In T13,
the request stall state transitions to the throttled state. One transaction can be issued to the bus
in the throttled state, so ADS# is driven active in T13. In the throttled state, BNR# continues to
be sampled every other clock.
In T14, BNR# is again sampled asserted, so the state transitions to stalled in T15 and no further
transactions are issued. In T16, BNR# is sampled deasserted, which causes the state machine to
transition to throttled in T17. In T18, BNR# is again sampled deasserted, which transitions the
state machine to free in T19. BNR# is not sampled again until after ADS#, RESET#, or BINIT#.
A transaction may be issued in T17 or any time after.
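The state transitions traced above can be summarized as a small state machine. The following
Python sketch is an illustration only: the function name next_stall_state and the
single-letter state codes are assumptions chosen to match the S, T and F labels used in the
figures, and the model covers only what happens at a valid BNR# sampling point.

```python
STALLED, THROTTLED, FREE = "S", "T", "F"


def next_stall_state(state, bnr_asserted):
    """Request stall state entered after one BNR# sampling point."""
    if bnr_asserted:
        return STALLED      # an asserted sample always stalls the bus
    if state == STALLED:
        return THROTTLED    # first deasserted sample: one transaction allowed
    return FREE             # deasserted sample in throttled or free state


# BNR# samples described for Figure 4-9 at T8, T10, T12, T14, T16 and T18
samples = [True, True, False, True, False, False]
state = STALLED             # state after RESET#
trace = []
for bnr in samples:
    state = next_stall_state(state, bnr)
    trace.append(state)
print("".join(trace))  # SSTSTF
```

The trace reproduces the sequence in the text: stalled, stalled, throttled (T13), stalled (T15),
throttled (T17) and free (T19).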
Once the request stall state moves into the free state, BNR# sampling no longer occurs
every other clock; it occurs 3 clocks after ADS# is driven asserted. Figure 4-10 illustrates
this occurrence.
[Timing diagram: CLK, RESET#, BINIT#, ADS#, BNR#, {request stall state}]
Figure 4-10. BNR# Sampling After ADS#
In T1, the request stall state is in the throttled state and a transaction is issued. BNR# is sampled
every other clock. BNR# is sampled asserted in T2, so the request stall state transitions to the
stalled state in T3 and no further transactions are issued. BNR# sampling continues every other
clock.
In T4, BNR# is sampled deasserted, so the throttled state is entered again in T5, and a transaction
is issued. In T6, BNR# is sampled deasserted again, so the request stall state machine moves
into the free state in T7. BNR# sampling changes to the 3rd clock after ADS# is sampled active.
In T8 (3 clocks after the last ADS# is driven), another Request Phase is driven. In T9, 3 clocks
after the last ADS# is sampled active, BNR# is again sampled. Because BNR# is sampled
deasserted, the state remains free in T10. ADS# could have been driven asserted in T11, but a
transaction was not internally pending in time, so a new transaction is driven to the bus in T12.
BNR# is sampled again in T12 (3 clocks after the last ADS# was sampled active). BNR# is
sampled asserted, so in T13, the request stall state transitions to the stalled state, and BNR# sampling
returns to every other clock. Note that the ADS# driven in T12 is the last time a transaction can
be driven to the bus after BNR# is sampled active.
In T14, BNR# is sampled deasserted so the request stall state transitions to throttled in T15. In
T16, BNR# is again sampled deasserted, so the state transitions to free in T17 (not shown).
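The free-state sampling points in this example follow from simple clock arithmetic. The
short Python sketch below is an illustration under the assumption, taken from the
walk-through, that a signal driven in one clock is sampled in the next; the function name
bnr_sample_clock_free is hypothetical.

```python
def bnr_sample_clock_free(ads_driven_clock):
    """Clock in which BNR# is sampled for an ADS# driven in the free state."""
    ads_sampled_clock = ads_driven_clock + 1  # driven in T(n), sampled in T(n+1)
    return ads_sampled_clock + 3              # third clock after ADS# sampled


# ADS# driven in T5 and T8 in Figure 4-10
print([bnr_sample_clock_free(t) for t in (5, 8)])  # [9, 12]
```

The results correspond to the BNR# sampling points in T9 and T12 described above.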
4.1.5.Symmetric Agent Arbitration Protocol Rules
4.1.5.1.RESET CONDITIONS
On observation of active RESET# or BINIT#, all BREQ[3:0]# signals must be deasserted in one
or two clocks. On observation of active AERR# (with AERR# observation enabled), all
BREQ[3:0]# signals must be deasserted in the next clock. All agents also re-initialize Rotating
ID to three and ownership state to idle. Based on this situation, the new arbitration priority is
0,1,2,3 and there is no current symmetric owner.
When a reset condition is generated by the activation of BINIT#, BREQn# must remain
deasserted until 4 clocks after BINIT# is driven inactive. The first BREQ# sample point is 4 clocks
after BINIT# is sampled inactive.
When the reset condition is generated by the activation of RESET#, BREQn# as driven by
symmetric agents must remain deasserted until 2 clocks after RESET# is driven inactive. The first
BREQ# sample point is 2 clocks after RESET# is sampled inactive. For power-on configuration,
the system interface logic must assert BREQ0# for at least two clocks before the clock in which
RESET# is deasserted. BREQ0# must be deasserted by the system interface logic in the clock
after RESET# is sampled deasserted. Agent 0 must delay BREQ0# assertion for a minimum
of three clocks after the clock in which RESET# is deasserted to guarantee wire-or glitch
free operation.
When a reset condition is generated by AERR#, all agents except for a symmetric owner that
has issued the second or subsequent transaction of a bus-locked operation must keep BREQn#
inactive for a minimum of four clocks. The bus owner n that has issued the second or subsequent
transaction of bus locked operation must activate its BREQn# two clocks from inactive
BREQn#. This approach ensures that the locked operation remains indivisible.
4.1.5.2.BUS REQUEST ASSERTION
A symmetric agent n can activate BREQn# to arbitrate for the bus provided the reset conditions
described in Section 4.1.5.1., “Reset Conditions” are satisfied. Once activated, BREQn# must
remain active until the agent becomes the symmetric owner. Becoming the symmetric owner is
a precondition to entering the Request Phase.
4.1.5.3.OWNERSHIP FROM IDLE STATE
When the ownership state is idle, a new arbitration event begins with activation of at least one
BREQ[3:0]#. During the next clock, all symmetric agents assign ownership to the highest
priority symmetric agent with active bus request. In the following clock, all symmetric agents
update the Rotating ID to the new symmetric owner Agent ID and the ownership state to busy. The
new symmetric owner may enter the Request Phase as early as the clock the Rotating ID is
updated.
4.1.5.4.OWNERSHIP FROM BUSY STATE
When the ownership state is busy, the next arbitration event begins with the deassertion of
BREQn# by the current symmetric owner.
4.1.5.4.1.Bus Parking and Release with a Single Bus Request
When the ownership state is busy, bus parking is an accepted mode of operation. The symmetric
owner can retain ownership even if it has no pending requests, provided no other symmetric
agent has an active arbitration request.
The symmetric owner “n” may eventually deassert BREQn# to release symmetric ownership
even when other requests are not active. When the owner deasserts BREQn#, all agents update
the ownership state to idle, but maintain the same Rotating ID.
4.1.5.4.2.Bus Exchange with Multiple Bus Requests
When the ownership state is busy, on observing at least one other BREQm# active, the current
symmetric owner n can hold the bus for back-to-back transactions by simply keeping BREQn#
active. This mechanism must be used for bus-lock operations and can be used for unlocked
operations, with care to prevent other symmetric agents from gaining ownership. (The Pentium Pro
processor limits the number of additional unlocked requests to one.)
A new arbitration event begins with deactivation of BREQn#. On observing release of
ownership by the current symmetric owner, all agents assign the ownership to the highest priority
symmetric agent arbitrating for the bus. In the following clock, all agents update the Rotating ID to
the new symmetric owner Agent ID and maintain bus ownership state as busy.
A symmetric agent n shall deassert BREQn# for a minimum of one clock.
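The idle and busy ownership rules above can be combined into one per-agent state model.
The Python class below is a minimal sketch, not a description of any actual arbiter
implementation: the class name SymmetricArbiter, its methods, and the representation of
observed BREQ[3:0]# as a set of agent IDs are all assumptions, and clock-level latencies,
BPRI# and LOCK# are deliberately left out.

```python
class SymmetricArbiter:
    """Each symmetric agent's copy of the Rotating ID and ownership state."""

    def __init__(self, num_agents=4):
        self.num_agents = num_agents
        self.reset()

    def reset(self):
        # Reset condition: Rotating ID three, ownership idle (priority 0,1,2,3)
        self.rotating_id = self.num_agents - 1
        self.busy = False

    def _highest_priority(self, requests):
        for offset in range(1, self.num_agents + 1):
            agent = (self.rotating_id + offset) % self.num_agents
            if agent in requests:
                return agent
        return None

    def observe(self, requests):
        """Apply one observation of BREQ[3:0]# (set of active agent IDs)."""
        if self.busy and self.rotating_id in requests:
            return                      # owner keeps BREQn# active: no new event
        winner = self._highest_priority(requests)
        if winner is None:
            self.busy = False           # released with no requester: idle, same ID
        else:
            self.rotating_id = winner   # new symmetric owner
            self.busy = True


# Replay of Figure 4-2: agent 1 requests, parks, then releases the bus
arb = SymmetricArbiter()
for breq in ({1}, {1}, set()):
    arb.observe(breq)
print(arb.rotating_id, arb.busy)  # 1 False
```

After the release, the Rotating ID stays at 1 while the ownership state returns to idle, as
required by Section 4.1.5.4.1.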
4.1.6.Priority Agent Arbitration Protocol Rules
4.1.6.1.RESET CONDITIONS
On observation of active RESET# or BINIT#, BPRI# must be deasserted in one or two clocks.
On observation of active AERR# (with AERR# observation enabled), BPRI# must be deasserted in the next clock.
When the reset condition is generated by the activation of BINIT#, BPRI# must remain
deasserted until 4 clocks after BINIT# is driven inactive. The first BPRI# sample point is 4 clocks
after BINIT# is sampled inactive.
When the reset condition is generated by AERR#, the priority agent must keep BPRI# inactive for
a minimum of four clocks unless it has issued the second or subsequent transaction of a locked
operation. The priority owner that has issued the second or subsequent transaction of a locked
operation must activate its BPRI# two clocks from inactive BPRI#. This ensures that the locked
operation remains indivisible.
4.1.6.2.BUS REQUEST ASSERTION
The priority agent can activate BPRI# to seek bus ownership provided the reset conditions
described in Section 4.1.6.1., “Reset Conditions” are satisfied. BPRI# can be deactivated at any
time.
On observing active BPRI#, all symmetric agents guarantee no new non-locked requests are
generated.
4.1.6.3.BUS EXCHANGE FROM AN UNLOCKED BUS
If LOCK# is observed inactive in the two clocks after BPRI# is driven asserted, the priority agent
has permission to drive ADS# four clocks after BPRI# assertion. The priority agent can further
reduce its arbitration latency by observing the bus protocol and determining that no other agent
could drive a request. For example, arbitration latency can be reduced to two clocks by
observing ADS# active and LOCK# inactive on the same clock BPRI# is driven asserted, or it can
be reduced to three clocks by observing ADS# active and LOCK# inactive in the clock after
BPRI# is driven asserted.
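The latency cases in this rule can be written as a small decision function. The Python sketch
below is one possible reading of the rule, not a definitive model: the function name and its
boolean arguments are hypothetical, and it assumes that t0 is the clock in which BPRI# is
driven asserted, that t1 and t2 are the two clocks after it, and that True means the signal is
observed active.

```python
def priority_agent_ads_latency(ads_t0, lock_t0, ads_t1, lock_t1, lock_t2):
    """Clocks from BPRI# assertion to the earliest ADS# of the priority agent.

    Returns None when LOCK# is active and the priority agent must keep
    waiting until LOCK# is observed inactive.
    """
    if ads_t0 and not lock_t0:
        return 2    # ADS# active, LOCK# inactive in the BPRI# assertion clock
    if ads_t1 and not lock_t1:
        return 3    # ADS# active, LOCK# inactive in the clock after
    if not lock_t1 and not lock_t2:
        return 4    # default: LOCK# inactive in the two following clocks
    return None


# No ADS# or LOCK# activity seen in the observation window: default latency
print(priority_agent_ads_latency(False, False, False, False, False))  # 4
```

The default case gives the four-clock latency stated at the start of this section.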
4.1.6.4.BUS RELEASE
The priority agent can deassert BPRI# and release bus ownership in the same cycle that it
generates its last request. It can keep BPRI# active even after the last request generation provided
it can guarantee forward progress of the symmetric agents. When deasserted, BPRI# must stay
inactive for a minimum of two clocks.
4.1.7.Bus Lock Protocol Rules
4.1.7.1.BUS OWNERSHIP EXCHANGE FROM A LOCKED BUS
The current symmetric owner n can retain ownership of the bus by keeping the LOCK# signal
active (even if BPRI# is asserted). This mechanism is used during bus lock operations. After the
lock operation is complete, the symmetric owner deasserts LOCK# and guarantees no new
request generation until BPRI# is observed inactive.
On asserting BPRI#, the priority agent observes LOCK# for the next two clocks to monitor
request bus activity. If the current symmetric owner is performing locked requests (LOCK#
active), the priority agent must wait until LOCK# is observed inactive.
4.2.REQUEST PHASE
After completion of the Arbitration Phase, an agent is allowed to enter the Request Phase. This
phase is used to initiate new transactions on the bus, and lasts for two consecutive clocks. During
the first clock, the information required to snoop a transaction and start a memory access becomes available. During the next clock, complete information required for the entire transaction
becomes available.
4.2.1.Bus Signals
The Request Phase bus signals are ADS#, A[35:3]#, REQa[4:0]#, REQb[4:0]#, ATTR[7:0]#,
DID[7:0]#, BE[7:0]#, EXF[4:0]#, AP[1:0]#, and RP#. In addition, the LOCK# signal is driven
during this phase. Request Phase signals are bused among all agents. Since information is
carried during two clocks, the first clock is identified with the suffix a and the second clock is
identified with the suffix b. For example, RPa# and RPb#.
4.2.2.Request Phase Protocol Description
The Request Phase occurs when a transaction is actually issued to the bus. ADS# is asserted
and the transaction information is driven. Figure 4-11 shows the Request Phase of several
transactions.
[Timing diagram: CLK, BREQ0#, BPRI#, BNR#, ADS#, A[35:3]#, REQ[4:0]#, {rcnt}]
Figure 4-11. Request Generation Phase
In T1, only one bus agent (agent 0) drives a request for the bus. In T2, BREQ[3:0]#, BPRI# and
BNR# are sampled and it is determined that agent 0 becomes the bus owner in T3.
In T3, agent 0 drives a transaction by asserting ADS#. Also in T3, A[35:3]#, REQa[4:0]#,
AP[1:0]# and RP# are driven valid. REQa0# indicates that the transaction is a write transaction.
In T4, the second clock of the Request Phase, the rest of the transaction information is driven
out on the following signals: REQb[4:0]#, ATTR[7:0]#, DID[7:0]#, BE[7:0]#, and EXF[4:0]#.
AP[1:0]#, and RP# remain valid in this clock.
When a transaction is driven to the bus, the internal state must be updated in the clock after
ADS# is observed asserted. Therefore, in T5 the internal request count {rcnt} is incremented by
one.
In T6, agent 0 issues another transaction, and in T8, the internal state is updated appropriately.
In the series of clocks indicated in the diagram by T10, five more transactions become
outstanding (this status is indicated by the {rcnt}). In T13, the 8th transaction is issued as indicated on
the bus by ADS# assertion in T13. In T15, the {rcnt} is incremented to 8, the highest possible
value for {rcnt}. No additional transactions can be issued until a response is given for
transaction 0.
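The timing of the {rcnt} updates can be checked with a short calculation. The Python sketch
below is illustrative only: the helper name rcnt_at is hypothetical, it models increments
caused by ADS# only, and it ignores the decrements that occur when responses are given.

```python
def rcnt_at(clock, ads_clocks, initial=0):
    """{rcnt} value visible in `clock`, given the clocks in which ADS# is driven."""
    # ADS# driven in clock t is observed in t + 1; the count changes in t + 2.
    return initial + sum(1 for t in ads_clocks if t + 2 <= clock)


ads = [3, 6]  # first two requests of Figure 4-11
print([rcnt_at(t, ads) for t in range(3, 9)])  # [0, 0, 1, 1, 1, 2]
```

The list covers T3 through T8 and shows the increments in T5 and T8 described in the text.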
4.2.3.Request Phase Protocol Rules
4.2.3.1.REQUEST GENERATION
The Request Phase is always one clock of active ADS# followed by one clock of inactive ADS#.
There is always an idle clock between request phases for bus turnaround. Address, command,
and parity information is transferred on the first two clocks on pins A[35:3]#, REQ[4:0]#, and
AP[1:0]# and RP#. Refer to Chapter 3, Bus Overview for a description of which signals are
driven on these pins. Although LOCK# is part of the Arbitration Phase, it is driven during the first
clock of the Request Phase. AP[1:0]# and RP# are valid during a valid Request Phase.
On observation of a new request, the transaction counts including {rcnt} and {scnt} are updated
with the new transaction.
4.2.3.2.REQUEST PHASE QUALIFIERS
The Request Phase for a new transaction may be initiated when:
• The agent contains one or more pending requests.
• The agent owns the bus as described in the Arbitration Phase section.
• The internal request count state is less than the maximum number of entries in the IOQ.
• The bus is not stalled. In other words, the Request Stall state (as described in Section 4.1.,
  “Arbitration Phase”) is free or throttled.
• The preceding transaction’s Request Phase is complete. In other words, ADS# is observed
  inactive on the previous clock.
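The five qualifiers above amount to a single boolean condition. The Python sketch below
states it directly; the function and parameter names are hypothetical, and the default
ioq_depth of 8 is an assumption based on the highest {rcnt} value given in Section 4.2.2.
The sketch does not model the limit of one transaction per throttled state.

```python
def may_start_request_phase(pending_requests, owns_bus, rcnt, stall_state,
                            ads_observed_prev_clock, ioq_depth=8):
    """True when all Request Phase qualifiers listed above are met."""
    return (pending_requests > 0                         # something to issue
            and owns_bus                                 # won arbitration
            and rcnt < ioq_depth                         # room in the IOQ
            and stall_state in ("free", "throttled")     # bus not stalled
            and not ads_observed_prev_clock)             # prior phase complete


print(may_start_request_phase(1, True, 7, "free", False))     # True
print(may_start_request_phase(1, True, 8, "free", False))     # False
print(may_start_request_phase(1, True, 0, "stalled", False))  # False
```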
4.3.ERROR PHASE
Receiving agents use the Error Phase to indicate parity errors in the Request Phase. Parity is
checked during a valid Request Phase (one clock of active ADS# followed by one clock of inactive
ADS#) on the AP[1:0]# and RP# signals.
If request parity is enabled in the power-on configuration as described in Chapter 9,
Configuration, then the agent checks parity in the two clocks. If transaction cancellation due to AERR#
is enabled (AERR# observation) in the power-on configuration and AERR# is observed active
during the Error Phase, then all agents remove the transaction from their In-order Queue, cancel
subsequent transaction phases, remove bus requests, and reset their bus arbiters. Reset of the bus
arbiters enables errors in the Arbitration Phase to be corrected. The transaction may be retried.
4.3.1.Bus Signals
The only signal driven in this state is AERR#. AERR# is bused among all agents.
4.4.SNOOP PHASE
In the Snoop Phase, all caching agents drive their snoop results and participate in coherency
resolution. The agents generate internal snoop requests for all memory transactions. An agent is
also allowed to snoop its own bus requests and participate in the Snoop Phase along with other
bus agents. The Pentium Pro processor snoops its own transactions. The snoop results are driven
on HIT# and HITM# signals in this phase.
In addition, during the Snoop Phase, the memory agent or I/O agent drives DEFER# to indicate
whether the transaction is committed for completion immediately or if the commitment is
deferred.
The results of the Snoop Phase are used to determine the final state of the cache line in all agents
and which agent is responsible for completion of Data Phase and Response Phase of the current
transaction.
4.4.1.Snoop Phase Bus Signals
The bus signals driven in this phase are HIT#, HITM# and DEFER#. These signals are bused
among all agents. The requesting agent uses the HIT# signal to determine the permissible cache
state of the line. The HITM# signal is used to indicate what agent will provide the requested
data. The DEFER# signal indicates whether the transaction will be committed for completion
immediately or if the commitment is deferred.
The results of combinations of HIT# and HITM# signal encodings during a valid Snoop Phase
are shown in Table 4-1.
Table 4-1. HIT# and HITM# During Snoop Phase
Snoop Result    HIT#    HITM#
CLEAN           0       0
MODIFIED        0       1
SHARED          1       0
STALL           1       1
NOTE:
1. 0 indicates inactive, 1 indicates active.
The CLEAN result means that at the end of the transaction, no other caching agent will retain
the addressed line in its cache, and that the requesting agent can store the cache line in any state
(Modified, Exclusive, Shared or Invalid).
The MODIFIED result means that the addressed line is in the modified state in an agent on the
Pentium Pro processor bus. The agent that “owns” the line will writeback th e line to memory.
The requesting agent will pick the line off the bus as it is written back.
The SHARED result means that addressed line is valid in the cache of another agent on the Pentium Pro processor bus, but that it is not modified. The requesting agent therefore can store the
cache line in the shared state only.
The STALL result means that not all agents on the Pentium Pro processor bus are ready yet
to provide a snoop result, and that the Snoop Phase will be stalled for another 2 clocks. Any
agent on the bus may use the STALL state on any transaction as a stall mechanism.
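As an illustration, the encodings in Table 4-1 can be decoded as in the following minimal Python sketch. The function name decode_snoop and the use of booleans for the sampled signal levels (True meaning active) are assumptions made for this example; they are not part of the bus specification.

```python
def decode_snoop(hit: bool, hitm: bool) -> str:
    """Map sampled HIT#/HITM# levels (True = active) to a Table 4-1 result.

    Hypothetical helper for illustration only.
    """
    if hit and hitm:
        return "STALL"      # both active: Snoop Phase stalled for 2 more clocks
    if hitm:
        return "MODIFIED"   # an agent holds the line in the modified state
    if hit:
        return "SHARED"     # another agent holds a valid, unmodified copy
    return "CLEAN"          # no other caching agent retains the line


# Print the four combinations in the same order as Table 4-1.
for hit, hitm in ((False, False), (False, True), (True, False), (True, True)):
    print(int(hit), int(hitm), decode_snoop(hit, hitm))
```

Checking for STALL first matters because both signals are active in that case, so testing HITM# alone would misreport a stall as MODIFIED.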
4.4.2.Snoop Phase Protocol Description
This section describes the Snoop Phase using examples.
4.4.2.1.NORMAL SNOOP PHASE
Figure 4-12 illustrates a four-clock Snoop Result Phase for pipelined requests. The snoop results
are driven four clocks after ADS# is asserted and at least three clocks from the Snoop Phase of
a previous transaction. Note that no snoop results are stalled and the maximum request generation rate is one request every three clocks.
[Timing diagram, clocks T1 through T16. Signals shown: CLK, ADS#, {REQUEST}, AERR#, HIT#, HITM#, DEFER#, {scnt}. {scnt} per clock: 0 0 0 1 1 1 2 1 1 2 1 1 1 0 0 0.]
Figure 4-12. Four-Clock Snoop Phase
In T1, there are no transactions outstanding on the bus and {scnt} is 0. In T2, transaction 1 is
issued. In T4, as a result of the transaction driven in T2, {scnt} is incremented.
In T5, transaction 2 is issued. In T6, which is four clocks after the corresponding ADS# in T2,
the snoop results for transaction 1 are driven. In T7, {scnt} is incremented indicating that there
are two transactions on the bus that have not completed the Snoop Phase. Also in T7, the snoop
results for transaction 1 are observed. As a result, in T8, {scnt} is decremented.
In T8, the third transaction is issued. Two clocks later in T10, {scnt} is incremented. In T11,
{scnt} is decremented because the snoop results from transaction 2 are observed in T10.
In T13, the snoop results for transaction 3 are observed and in T14 {scnt} is again decremented.
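The {scnt} behavior in this example can be reproduced with a short Python sketch. It assumes, based only on the clocks quoted above, that {scnt} is incremented two clocks after ADS# and decremented one clock after valid snoop results are observed. The function name scnt_trace and its parameters are hypothetical.

```python
def scnt_trace(ads_clocks, snoop_observed_clocks, last_clock=16):
    """Return the {scnt} value for each clock T1..last_clock.

    Assumed timing (read off the example, not a general rule): {scnt}
    rises two clocks after ADS# and falls one clock after the valid
    snoop results are observed.
    """
    trace = []
    scnt = 0
    for t in range(1, last_clock + 1):
        scnt += sum(1 for ads in ads_clocks if ads + 2 == t)
        scnt -= sum(1 for obs in snoop_observed_clocks if obs + 1 == t)
        trace.append(scnt)
    return trace


# ADS# in T2, T5, T8; snoop results observed in T7, T10, T13.
print(scnt_trace([2, 5, 8], [7, 10, 13]))
# -> [0, 0, 0, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 0, 0, 0]
```

The printed sequence matches the increments in T4, T7, and T10 and the decrements in T8, T11, and T14 described in the text.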
4.4.2.2.STALLED SNOOP PHASE
Figure 4-13 illustrates how a slower snooping agent can delay the Snoop Phase if it is unable to
deliver valid snoop results within four clocks after ADS# is asserted. The figure also illustrates
that the Snoop Phases of subsequent transactions are also stalled and occur two clocks late due to
the stall of transaction one’s Snoop Phase.
[Timing diagram, clocks T1 through T16. Signals shown: CLK, ADS#, {REQUEST}, AERR#, HIT#, HITM#, DEFER#, {scnt}. {scnt} per clock: 0 0 0 1 1 1 2 2 2 2 2 2 1 1 1 0.]
Figure 4-13. Snoop Phase Stall Due to a Slower Agent
Transactions 1, 2 and 3 are initiated with ADS# activation in T2, T5, and T8.
The Snoop Phase for transaction 1 begins in T6, four clocks from ADS#. All agents capable of
driving a valid snoop response in four clocks drive appropriate levels on the snoop signals HIT#,
HITM#, and DEFER#. A slower agent that is unable to generate a snoop response in four clocks
asserts both HIT# and HITM# together in T6 to extend the Snoop Phase. Note that if the Snoop
Phase is extended, {scnt} is not decremented. Because the Snoop Phase is extended, the value
of DEFER# is a “don’t care”.
On observing active HIT# and HITM# in T7, all agents determine that the transaction’s Snoop
Phase is extended by two additional clocks through T8. In T8, the slower snooping agent is
ready with valid snoop results and needs no additional Snoop Phase extensions. In T8, all agents
drive valid snoop results on the snoop signals. In T9, all agents observe that HIT# and HITM#
are not asserted in the same clock and determine that the valid snoop results for transaction 1 are
available on the snoop signals.
The Snoop Phase for transaction 2 begins in T11, three clocks from Snoop Phase of transaction
1 or four clocks from Request Phase of transaction 2, whichever is later. Since the Snoop Phase
for transaction 2 is not extended, the Snoop Phase for transaction 2 completes in one clock.
The Snoop Phase for transaction 3 begins in T14, the later of three clocks from Snoop Phase of
transaction 2, and four clocks from Request Phase of transaction 3. Since the Snoop Phase for
transaction 3 is not extended, the Snoop Phase for transaction 3 completes in one clock.
For the example shown, the Snoop Phase is always six clocks from the Request Phase due to the
initial Snoop Phase stall from Transaction 1. However, the maximum request generation rate is
still one request every three clocks.
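The scheduling in this example can be sketched in Python as follows. The sketch assumes that each STALL moves the sample point exactly two clocks later and that the "three clocks" spacing is measured from the clock in which the previous transaction's valid snoop results are driven. The function name snoop_schedule and its arguments are hypothetical.

```python
def snoop_schedule(ads_clocks, stalls):
    """Return (first_snoop_clock, valid_result_clock) per transaction.

    ads_clocks: clock of ADS# for each transaction, in issue order.
    stalls:     number of two-clock STALL extensions for each transaction.
    Illustrative model only; names and structure are assumptions.
    """
    schedule = []
    prev_valid = None
    for ads, n_stalls in zip(ads_clocks, stalls):
        start = ads + 4                          # 4 clocks after ADS#
        if prev_valid is not None:
            start = max(start, prev_valid + 3)   # 3 clocks after previous results
        valid = start + 2 * n_stalls             # each STALL adds 2 clocks
        schedule.append((start, valid))
        prev_valid = valid
    return schedule


# Stalled example: ADS# in T2, T5, T8; transaction 1 stalls once.
print(snoop_schedule([2, 5, 8], [1, 0, 0]))
# -> [(6, 8), (11, 11), (14, 14)]
```

With no stalls, the same function gives valid results in T6, T9, and T12, which is the four-clock case of the previous section.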
4.4.3.Snoop Phase Protocol Rules
This section lists the Snoop Phase protocol rules for reference.
4.4.3.1.SNOOP PHASE RESULTS
During a valid Snoop Phase (as defined below), snoop results are presented on HIT#, HITM#,
and DEFER# signals for one clock. If the snooping agent contains a MODIFIED copy of the
cache line, then HITM# must be asserted. If the snooping agent does not assert HITM# and it
plans to retain a SHARED copy of the cache line at the end of the Snoop Phase, it must assert
HIT#. HIT# and HITM# are asserted together to indicate that the agent is requesting a STALL.
All non-memory accesses will indicate CLEAN or STALL. DEFER# must be asserted by an addressed memory or I/O agent if the agent is unable to guarantee in-order completion of the requested transaction.
The results of the Snoop Phase require specific behavior from the addressed and snooping
agents for future phases of the transaction. The agent asserting HITM# normally must write back
the modified cache line. The addressed agent must accept the writeback line from the snooping
agent, merge it with any write data, and drive an implicit writeback response.
If HITM# is inactive, the agent asserting DEFER# must reply with a deferred or retry response
for the transaction. Only the addressed agent can assert DEFER#. The requesting agent must not
begin another order-dependent transaction until either DEFER# is sampled deasserted in the
Snoop Phase, or the deferred transaction receives a successful completion via a deferred reply
or a retry.
For all transactions with LOCK# inactive, HITM# active guarantees in-order completion. During unlocked transactions, HITM# overrides the assertion of DEFER#.
If DEFER# is asserted during the Snoop Phase of a locked operation, the locked operation is prematurely aborted. During the first transaction of a locked operation, if HITM# and DEFER# are
active together, the transaction completes with cache line writeback and implicit writeback response, but the request agent must begin a new locked operation starting from a new Arbitration
Phase (BREQn# of the requesting agent must be deasserted if a symmetric agent issued the
locked operation). The assertion of DEFER# during the second or subsequent transaction of a
locked operation is a protocol violation. If DEFER# is asserted and HITM# is not asserted, a
Retry Response is driven in the Response Phase to force a retry of the entire locked operation.
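The interaction of HITM#, DEFER#, and LOCK# described in this section can be summarized in a small Python sketch. The function name snoop_outcome, its parameters, and the returned strings are hypothetical labels chosen for this illustration. The sketch also assumes that a locked transaction with HITM# active and DEFER# inactive completes with an implicit writeback like any other transaction.

```python
def snoop_outcome(hitm, defer, locked=False, first_of_lock=True):
    """Summarize what the snoop results require for the rest of the transaction.

    hitm, defer:   sampled HITM#/DEFER# levels (True = active).
    locked:        True if the transaction is part of a locked operation.
    first_of_lock: True for the first transaction of that locked operation.
    Returned strings are illustrative labels, not specification terms.
    """
    if locked and defer:
        if not first_of_lock:
            return "protocol violation"
        if hitm:
            return "implicit writeback, then restart locked operation"
        return "retry response, then retry entire locked operation"
    if hitm:
        return "implicit writeback"        # HITM# overrides DEFER# when unlocked
    if defer:
        return "deferred or retry response"
    return "in-order completion"


print(snoop_outcome(hitm=True, defer=True))                 # unlocked
print(snoop_outcome(hitm=False, defer=True, locked=True))   # first locked transaction
```

The locked cases are tested first because DEFER# changes the outcome of a locked operation even when HITM# is active, whereas HITM# simply overrides DEFER# for unlocked transactions.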
4.4.3.2.VALID SNOOP PHASE
The Snoop Phase for a transaction begins 4 clocks after ADS# is driven asserted or 3 clocks after
the snoop results of the previous transaction are driven, whichever is later.
4.4.3.3.SNOOP PHASE STALL
A slow snooping agent can request a two-clock STALL in a valid Snoop Phase by activating
both HIT# and HITM#. In the case of a STALL, snoop results are sampled again 2 clocks after
the previous sample point. This process continues as long as the STALL state is sampled. When
stalling the bus, the stalling condition must be able to clear without requiring access to the bus.
4.4.3.4.SNOOP PHASE COMPLETION
If no STALL is requested during the valid Snoop Phase, the Snoop Phase is completed in the
clock after the snoop results are driven.
4.4.3.5.SNOOP RESULTS SAMPLING
Snoop Results are sampled during the valid snoop phase. Bus agents must ignore Snoop Results
in the clock after a valid sampling window.
4.5.RESPONSE PHASE
4.5.1.Response Phase Overview
A transaction enters the Response Phase when it is at the head of the In-order Queue. The agent
responsible for the response is referred to as the response agent. The agent decoded by the address
in the Request Phase determines the response agent for the transaction.
After completion of the Response Phase, the transaction is removed from the In-order Queue.
4.5.1.1.BUS SIGNALS
The Response Phase signals are TRDY#, RS[2:0]#, and RSP#. These signals are bused. RSP#
provides parity support only for RS[2:0]#. The transaction response is encoded on the RS[2:0]#
signals. TRDY# is only asserted for transactions with write or writeback data to transfer. The
response encodings are indicated in Table 4-2.
Table 4-2. Response Phase Encodings
Response             RS2#    RS1#    RS0#
Idle                 0       0       0
Retry                0       0       1
Deferred             0       1       0
reserved             0       1       1
Hard Failure         1       0       0
No data              1       0       1
Implicit Writeback   1       1       0
Normal Data          1       1       1
NOTE:
1. 0 indicates inactive, 1 indicates active.
There is no single response strobe signal. The response value is Idle until the response is driven.
A response is driven when any one of RS[2:0]# is asserted.
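As an illustration, the Table 4-2 encodings and the "any one of RS[2:0]# asserted" rule can be expressed in Python as follows. The names RESPONSES, decode_response, and response_driven are hypothetical, and signal levels are passed as 0 (inactive) or 1 (active).

```python
# Table 4-2 encodings, indexed by the value formed from RS2#, RS1#, RS0#.
RESPONSES = {
    0b000: "Idle",
    0b001: "Retry",
    0b010: "Deferred",
    0b011: "reserved",
    0b100: "Hard Failure",
    0b101: "No data",
    0b110: "Implicit Writeback",
    0b111: "Normal Data",
}


def decode_response(rs2, rs1, rs0):
    """Return the response name for sampled RS[2:0]# levels (1 = active)."""
    return RESPONSES[(rs2 << 2) | (rs1 << 1) | rs0]


def response_driven(rs2, rs1, rs0):
    """There is no response strobe: any active RS line means a response."""
    return bool(rs2 or rs1 or rs0)


print(decode_response(1, 1, 0), response_driven(1, 1, 0))
print(decode_response(0, 0, 0), response_driven(0, 0, 0))
```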
4.5.2.Response Phase Protocol Description
The Response Phase is described in this section using examples. The rules for the Response
Phase are listed in the next section for reference.
4.5.2.1.RESPONSE FOR A TRANSACTION WITHOUT WRITE DATA
Figure 4-14 shows several transactions that have no write or writeback data to transfer. Therefore the TRDY# signal is not asserted. The DBSY# signal is observed in this phase because if
there is read data to transfer, DBSY# must be sampled inactive before the response for transaction n can be driven (this ensures that any data transfers from transaction n-1 are complete before
the response is driven for transaction n).
[Timing diagram, clocks T1 through T16. Signals shown: CLK, ADS#, REQ0#, HITM#, TRDY#, RS[2:0]#, DBSY#, {rcnt}. {rcnt} per clock: 0 0 1 1 1 2 2 2 2 2 2 1 1 1 1 0.]
Figure 4-14. RS[2:0]# Activation with no TRDY#
Three transactions are issued in clocks T1, T4, and T7. None of these transactions have write
data to transfer as indicated by the REQa0# signal.
The Snoop Phase for each transaction indicates that no implicit writeback data will be transferred and the response agent indicated by the address will provide the transaction response and
the read data if there is any.
Because the transactions have no write or implicit writeback data, the TRDY# signal is not
asserted.
Initially, {rcnt} is 0, indicating that the In-order Queue is empty. The ADS# for transaction 1 is driven in T1.
The snoop results for transaction 1 are driven four clocks later in T5 (observed in T6). Note that
the Response and Data Phases for transaction n-1 have to be complete before the response for
transaction n can be driven. Since transaction 1 is at the top of the IOQ and DBSY# is inactive
in T6, RS[2:0]# can be driven for transaction 1 in T7, two clocks after the snoop results are driven. Transaction 1 is removed from the IOQ after T8, and transaction 2 is now at the top of the
IOQ. The {rcnt} is not decremented in T9 because transaction 3 was issued in the same clock that
transaction 1 received its response.
Transaction 2 is issued to the bus in T4 (three clocks after Transaction 1). The snoop results for
transaction 2 are driven four clocks later in T8. Transaction 2 is at the top of the IOQ. RS[2:0]#
for transaction 2 is driven two clocks later in T10 because DBSY# and RS[2:0]# were sampled
deasserted in T9.
The response for transaction 3 cannot be driven two clocks after the snoop results are driven in
T11 because DBSY# is asserted in T11. DBSY# is sampled deasserted in T13 and RS[2:0]# for
transaction 3 is driven in T14.
The response driven for each of these transactions is the Normal Data Response.
4.5.2.2.WRITE DATA TRANSACTION RESPONSE
Figure 4-15 shows a transaction with a simple request initiated data transfer. A request initiated
data transfer means that the request agent issuing the transaction has write data to transfer. Note
that TRDY# is always asserted after the response for transaction n-1 is driven and before the
transaction response for transaction n is driven.
[Timing diagram, clocks T1 through T9. Signals shown: CLK, ADS#, REQ0#, HITM#, TRDY#, RS[2:0]#, DBSY#, {rcnt}. {rcnt} per clock: 0 0 1 1 1 1 1 1 0.]
Figure 4-15. RS[2:0]# Activation with Request Initiated TRDY#
Before T1, the IOQ is empty. A write transaction as indicated by active ADS# and REQa0# is
issued in T1.
Since the Response Phase for the previous transaction is complete, the Response Phase for transaction 1 can begin with the assertion of TRDY# as early as T4, 3 clocks after ADS# is asserted.
In T4, DBSY# is observed inactive on the clock TRDY# is asserted and TRDY# had previously
been inactive for 3 clocks, so the TRDY# agent is allowed to deassert TRDY# within one clock
as a special optimization. Data is driven the clock after TRDY# is sampled and the data bus is
free. TRDY# need not be deasserted until the response is driven.
The snoop results are driven in T5 and sampled in T6.
Since RS[2:0]# is deasserted in T6, TRDY# has been asserted and deasserted, and the snoop results
were observed in T6, the response for the transaction is driven on RS[2:0]# in T7. Notice that
even if TRDY# is only asserted for one clock, the response may still be asserted when TRDY#
is deasserted (assuming snoop results have been observed). Because this is a simple write transaction, the response driven is the No Data Response.
4.5.2.3.IMPLICIT WRITEBACK ON A READ TRANSACTION
Figure 4-16 shows a read transaction with an implicit writeback. TRDY# is asserted in this operation because there is writeback data to transfer. Note that the implicit writeback response
must be asserted exactly one clock after valid TRDY# assertion is sampled. That is, TRDY# is
sampled active and DBSY# is sampled inactive.
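[Timing diagram, clocks T1 through T12. Signals shown: CLK, ADS#, REQ0#, HITM#, TRDY#, RS[2:0]#, DBSY#, {scnt}, {rcnt}. {scnt} per clock: 0 0 1 1 1 1 0 0 0 0 0 0. {rcnt} per clock: 0 0 1 1 1 1 1 1 1 1 0 0.]
Figure 4-16. RS[2:0]# Activation with Snoop Initiated TRDY#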
A transaction is issued in T1. The REQa0# pin indicates a read transaction, so TRDY# is assumed not needed for this transaction.
But snoop results observed in T6 indicate that an implicit writeback will occur (HITM# is asserted), therefore a TRDY# assertion is needed. Since the response for the previous transaction
is complete, and no request initiated TRDY# assertion is needed, TRDY# for the implicit writeback is asserted in T7. (TRDY# assertion due to an implicit writeback is called a snoop initiated
TRDY#.) Since DBSY# is observed inactive in T7, TRDY# can be deasserted in one clock in
T8, but need not be deasserted until the response is driven on RS[2:0]#.
In T9, one clock after the observation of active TRDY# with inactive DBSY# for the implicit
writeback, the Implicit Writeback Response must be driven on RS[2:0]# and the data is driven
on the data bus. This makes the data transfer and response behave like both a read (for the requesting agent) and a write (for the addressed agent).
4.5.2.4.IMPLICIT WRITEBACK WITH A WRITE TRANSACTION
Figure 4-17 shows a write transaction combined with a hit to a modified line that requires an
implicit writeback. This operation has two data transfers and requires two assertions of TRDY#.
The first TRDY# is asserted by the receiver of the write data whenever it is ready to receive the
write data. Once active TRDY# and inactive DBSY# are observed, the first TRDY# is deasserted
to allow the second TRDY#. The second TRDY# is asserted by the receiver whenever it is ready
to receive the writeback data. The second TRDY# may be deasserted when active TRDY# and
inactive DBSY# are sampled or when the response is driven on RS[2:0]#. One clock after observation of active TRDY# (and inactive DBSY#) for the implicit writeback, the implicit writeback
response is driven on RS[2:0]# at the same time data is driven for the implicit writeback.
[Timing diagram, clocks T1 through T14. Signals shown: CLK, ADS#, REQa0#, HITM#, TRDY#, RS[2:0]#, DBSY#, {rcnt}.]
Figure 4-17. RS[2:0]# Activation After Two TRDY# Assertions
In T1, a write transaction is issued as indicated by active ADS# and REQa0#. At this point, the
transaction appears to be a normal write transaction, so TRDY# is asserted 3 clocks later in T4.
TRDY# is deasserted in T5. Since DBSY# was observed inactive in T4, TRDY# can be deasserted in one clock as a special optimization to allow a faster implicit writeback TRDY#.
In T5, the snoop results are driven, and in T6, they are observed. In T7, TRDY# is asserted again
for the implicit writeback. TRDY# can be asserted immediately because the TRDY# for the request initiated data transfer was already deasserted.
In T9, one clock after observation of active TRDY# with inactive DBSY# for the implicit writeback, TRDY# must be deasserted and the implicit writeback response is driven on RS[2:0]#.
Since DBSY# was observed active in T7, but inactive in T8, TRDY# is deasserted in T9.
4.5.3.Response Phase Protocol Rules
4.5.3.1.REQUEST INITIATED TRDY# ASSERTION
A request initiated transaction is a transaction where the request agent has write data to transfer.
The addressed agent asserts TRDY# to indicate its ability to receive data from the request
agent intending to perform a write data operation. Request initiated TRDY# for transaction
“n” is asserted:
• when the transaction has a write data transfer,
• a minimum of 3 clocks after ADS# of transaction “n”, and
• a minimum of 1 clock after RS[2:0]# active assertion for transaction “n-1” (after the
response for transaction n-1 is driven).
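The two timing bounds above can be combined into a single "earliest clock" calculation, sketched below in Python. The function name earliest_request_trdy and its parameters are hypothetical. The result is only a lower bound, since the addressed agent may assert TRDY# later, when it is actually ready to receive data.

```python
def earliest_request_trdy(ads_clock, prev_response_clock=None):
    """Earliest clock in which request initiated TRDY# may be asserted.

    ads_clock:           clock in which ADS# for transaction n is asserted.
    prev_response_clock: clock in which RS[2:0]# for transaction n-1 is
                         driven, or None if no earlier response constrains it.
    """
    earliest = ads_clock + 3                     # at least 3 clocks after ADS#
    if prev_response_clock is not None:
        # at least 1 clock after the response for transaction n-1
        earliest = max(earliest, prev_response_clock + 1)
    return earliest


# Write issued in T1 with no earlier response pending.
print(earliest_request_trdy(1))      # -> 4
# Write issued in T4 while the previous response is driven in T10.
print(earliest_request_trdy(4, 10))  # -> 11
```

The first call corresponds to the write example of Figure 4-15, where ADS# in T1 allows TRDY# as early as T4.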
4.5.3.2.SNOOP INITIATED TRDY# PROTOCOL
The response agent asserts TRDY# to indicate its ability to receive the modified cache line from
a snooping agent. Snoop Initiated TRDY# for transaction “n” is asserted when:
• the transaction has an implicit writeback data transfer indicated in the Snoop Result Phase,
• in the case of a request initiated transfer, the request initiated TRDY# was asserted and
then deasserted (TRDY# must be deasserted for at least one clock between the TRDY# for
the write and the TRDY# for the implicit writeback), and
• at least 1 clock has passed after RS[2:0]# active assertion for transaction “n-1” (after the
response for transaction n-1 is driven).
4.5.3.3.TRDY# DEASSERTION PROTOCOL
The agent asserting TRDY# can deassert it as soon as it can ensure that TRDY# deassertion
meets the following conditions.
• TRDY# may be deasserted when inactive DBSY# and active TRDY# are observed for one
clock.
• TRDY# can be deasserted within one clock if DBSY# was observed inactive on the clock
TRDY# is asserted and the deassertion is at least three clocks from previous TRDY#
deassertion.
• TRDY# does not need to be deasserted until the response on RS[2:0]# is asserted.
• TRDY# for a request initiated transfer must be deasserted to allow the TRDY# for an
implicit writeback.
4.5.3.4.RS[2:0]# ENCODING
Valid response encodings are determined by the request and the snoop results as follows:
• Hard Failure is a valid response for all transactions and indicates transaction failure. The
requesting agent is required to take recovery action.
• Implicit Writeback is a required response when HITM# is asserted during the Snoop
Phase. The snooping agent is required to transfer the modified cache line. The memory
agent is required to drive the response and accept the modified cache line.
• Deferred Response is only allowed when DEN# is asserted in the Request Phase and
DEFER# (with HITM# inactive) is asserted during Snoop Phase. With the Deferred
Response, the response agent promises to complete the transaction in the future using the
Deferred Reply transaction.
• Retry Response is only allowed when DEFER# (with HITM# inactive) is asserted during
the Snoop Phase. With the Retry Response, the response agent informs the request agent
that the transaction must be retried.
• Normal Data Response is required when the REQ[4:0]# encoding in the Request Phase
requires a read data response and HITM# and DEFER# are both inactive during Snoop
Phase. With the Normal Data Response, the response agent is required to transfer read data
along with the response.
• No Data Response is required when no data will be returned by the addressed agent and
DEFER# and HITM# are inactive during the Snoop Phase.
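The rules above can be read as a mapping from request and snoop conditions to the set of permitted responses. The Python sketch below is one such reading. The function name allowed_responses and the boolean parameters are hypothetical, and the sketch ignores locked operations, which impose further restrictions described in the Snoop Phase rules.

```python
def allowed_responses(hitm, defer, den, read_data):
    """Return the set of responses permitted for one transaction.

    hitm, defer: sampled HITM#/DEFER# levels in the Snoop Phase (True = active).
    den:         True if DEN# was asserted in the Request Phase.
    read_data:   True if the REQ[4:0]# encoding requires a read data response.
    Simplified illustration; locked operations are not modeled.
    """
    allowed = {"Hard Failure"}           # valid for all transactions
    if hitm:
        allowed.add("Implicit Writeback")
    elif defer:
        allowed.add("Retry")
        if den:
            allowed.add("Deferred")
    elif read_data:
        allowed.add("Normal Data")
    else:
        allowed.add("No Data")
    return allowed


print(sorted(allowed_responses(hitm=False, defer=True, den=True, read_data=True)))
print(sorted(allowed_responses(hitm=True, defer=True, den=True, read_data=True)))
```

HITM# is tested before DEFER# because the Deferred and Retry responses are only allowed when HITM# is inactive.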
4.5.3.5.RS[2:0]#, RSP# PROTOCOL
The response signals are normally in idle state when not being driven active by any agent. The
response agent asserts RS[2:0]# and RSP# for one clock to indicate the type of response used
for transaction completion. In the next clock, the response agent must drive the signals inactive
to the idle state.
Response for transaction “n” is asserted when the following are true:
• Snoop Phase for transaction “n” is observed.
• RS[2:0]# for transaction “n-1” were asserted to an active response state and then sampled
inactive in the idle state (the response for transaction “n” is driven no sooner than three
clocks after the response for transaction “n-1”).
• If the transaction contains a write data transfer, TRDY# deassertion conditions have been
met.
• If the transaction contains an implicit writeback data transfer, snoop initiated TRDY# is
asserted for transaction “n” and TRDY# is sampled active with inactive DBSY#.
• DBSY# is observed inactive if RS[2:0]# response is Normal Data Response.
• A response that does not require the data bus (no data response, deferred response, retry
response, or hard failure response) may be driven even if DBSY# is active due to a
previous transaction.
On observation of active RS[2:0]# response, the Transaction Queues are updated and {rcnt} is
decremented.
4.6.DATA PHASE
4.6.1.Data Phase Overview
During the Data Phase, data is transferred between different bus agents. Data transfer responsibilities are negotiated between bus agents as the transaction proceeds through various phases.
Based on the Request Phase, a transaction either contains a “request-initiated” (write) data transfer, a “response-initiated” (read) data transfer, or no data transfer. On a modified hit during the
Snoop Phase, a “snoop-initiated” data transfer may be added to the request or substituted
in place of the “response-initiated” data transfer. On a deferred completion response in the Response Phase, the “response-initiated” data transfer is deferred.
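As a rough illustration of how the data transfers of a transaction follow from the earlier phases, consider the Python sketch below. The function name data_transfers, the request_kind strings, and the flag names are hypothetical. The sketch is a simplification of the overview above, not a complete statement of the protocol.

```python
def data_transfers(request_kind, hitm=False, deferred=False):
    """List the data transfers a transaction performs, in order.

    request_kind: "write", "read", or "none" (from the Request Phase).
    hitm:         True on a modified hit in the Snoop Phase.
    deferred:     True if the Response Phase gives a deferred completion.
    """
    if request_kind not in ("write", "read", "none"):
        raise ValueError("unknown request kind: " + repr(request_kind))
    transfers = []
    if request_kind == "write":
        transfers.append("request-initiated")
    if hitm:
        # Added after the write data, or taking the place of the read data.
        transfers.append("snoop-initiated")
    elif request_kind == "read" and not deferred:
        transfers.append("response-initiated")
    return transfers


print(data_transfers("write", hitm=True))     # two transfers
print(data_transfers("read", hitm=True))      # writeback data replaces read data
print(data_transfers("read", deferred=True))  # read data is deferred
```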
4.6.1.1.BUS SIGNALS
The bus signals driven in this phase are D[63:0]#, DEP[7:0]#, DRDY#, and DBSY#.
All Data Phase signals are bused.
4.6.2.Data Phase Protocol Description
4.6.2.1.SIMPLE WRITE TRANSFER
Figure 4-18 shows a simple write transaction (request-initiated data transfer). Note that the data
is transferred before the response is driven.
[Timing diagram, clocks T1 through T9. Signals shown: CLK, ADS#, REQa0#, HITM#, TRDY#, DBSY#, D[63:0]#, DRDY#, RS[2:0]#.]
Figure 4-18. Request Initiated Data Transfer
The write transaction is driven in T1 as indicated by active ADS# and REQa0#. TRDY# is driven 3 clocks later in T4. The No Data response is driven in T7 after inactive HITM# sampled in
T6 indicates no implicit writeback.
In the example, the data transfer only takes one clock, so DBSY# is not asserted.
TRDY# is observed active and DBSY# is observed inactive in T5. Therefore the data transfer
can begin in T6 as indicated by DRDY# assertion. Note that since DBSY# was also observed
inactive in T4, the same clock that TRDY# was asserted, TRDY# can be deasserted in T6. Refer
to Section 4.5.3.3., “TRDY# Deassertion Protocol” for further details.
RS[2:0]# is driven to No Data Response in T7, two clocks after the snoop phase.
4.6.2.2.SIMPLE READ TRANSACTION
Figure 4-19 shows a simple read transaction (response-initiated data transfer). Note that the data
transfer begins in the same clock that the response is driven on RS[2:0]#.