NOTE: The IA-32 Intel Architecture Software Developer's Manual consists
of five volumes: Basic Architecture, Order Number 253665; Instruction
Set Reference A-M, Order Number 253666; Instruction Set Reference N-Z,
Order Number 253667; System Programming Guide, Part 1, Order
Number 253668; System Programming Guide, Part 2, Order Number
253669. Refer to all five volumes when evaluating your design needs.
Order Number: 253668-019
March 2006
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY
THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS,
INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR
OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE
SAVING, OR LIFE SUSTAINING APPLICATIONS.
Intel may make changes to specifications and product descriptions at any time, without notice.
Developers must not rely on the absence or characteristics of any features or instructions marked “reserved” or “undefined.”
Improper use of reserved or undefined features or instructions may cause unpredictable behavior or failure in developer's
software code when running on an Intel processor. Intel reserves these features or instructions for future definition and shall
have no responsibility whatsoever for conflicts or incompatibilities arising from their unauthorized use.
The Intel® IA-32 architecture processors (e.g., Pentium® 4 and Pentium III processors) may contain design defects or errors
known as errata. Current characterized errata are available on request.
Hyper-Threading Technology requires a computer system with an Intel® Pentium® 4 processor supporting Hyper-Threading
Technology and an HT Technology enabled chipset, BIOS and operating system. Performance will vary depending on the
specific hardware and software you use. See http://www.intel.com/techtrends/technologies/hyperthreading.htm for more
information including details on which processors support HT Technology.
Intel® Virtualization Technology requires a computer system with an enabled Intel® processor, BIOS, virtual machine monitor
(VMM) and for some uses, certain platform software enabled for it. Functionality, performance or other benefits will vary
depending on hardware and software configurations. Intel® Virtualization Technology-enabled BIOS and VMM applications are
currently in development.
Intel® Extended Memory 64 Technology (Intel® EM64T) requires a computer system with a processor, chipset, BIOS, OS,
device drivers and applications enabled for Intel EM64T. Processor will not operate (including 32-bit operation) without
an Intel EM64T-enabled BIOS. Performance will vary depending on your hardware and software configurations. Intel
EM64T-enabled OS, BIOS, device drivers and applications may not be available. Check with your vendor for more
information.
trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
*Other names and brands may be claimed as the property of others.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
Copies of documents which have an ordering number and are referenced in this document, or other Intel literature, may be
obtained from:
Intel Corporation
P.O. Box 5937
Denver, CO 80217-9808
or call 1-800-548-4725
or visit Intel’s website at http://www.intel.com
The IA-32 Intel® Architecture Software Developer’s Manual, Volume 3A: System Programming
Guide, Part 1 (order number 253668) and the IA-32 Intel® Architecture Software Developer’s
Manual, Volume 3B: System Programming Guide, Part 2 (order number 253669) are part of a
set that describes the architecture and programming environment of all IA-32 Intel Architecture
processors. The other volumes in this set are:
The IA-32 Intel® Architecture Software Developer’s Manual, Volume 1, describes the basic
architecture and programming environment of an IA-32 processor. The IA-32 Intel® Architecture
Software Developer’s Manual, Volumes 2A & 2B, describe the instruction set of the
processor and the opcode structure. These volumes apply to application programmers and to
programmers who write operating systems or executives. The IA-32 Intel® Architecture Software
Developer’s Manual, Volumes 3A & 3B, describe the operating-system support environment
of an IA-32 processor and IA-32 processor compatibility information. These volumes
target operating-system and BIOS designers. In addition, the IA-32 Intel® Architecture Software
Developer’s Manual, Volume 3B, addresses the programming environment for classes of
software that host operating systems.
1.1 IA-32 PROCESSORS COVERED IN THIS MANUAL
This manual includes information pertaining primarily to the most recent IA-32 processors,
which include the Pentium® processors, the P6 family processors, the Pentium 4 processors, the
Intel® Xeon® processors, the Pentium M processors, the Pentium D processors, and the Pentium
processor Extreme Edition. The P6 family processors are those IA-32 processors based on the
P6 family microarchitecture, which include the Pentium Pro, Pentium II, and Pentium III
processors. The Pentium 4, Intel Xeon, Pentium D processors, and Pentium processor Extreme
Editions are based on the Intel NetBurst® microarchitecture.
Vol. 3A 1-1
ABOUT THIS MANUAL
1.2 OVERVIEW OF THE SYSTEM PROGRAMMING GUIDE
A description of this manual’s content follows:
Chapter 1 — About This Manual. Gives an overview of all three volumes of the IA-32 Intel
Architecture Software Developer’s Manual. It also describes the notational conventions in these
manuals and lists related Intel manuals and documentation of interest to programmers and hardware designers.
Chapter 2 — System Architecture Overview. Describes the modes of operation of an IA-32
processor and the mechanisms provided in the IA-32 architecture to support operating systems
and executives, including the system-oriented registers and data structures and the system-oriented
instructions. The steps necessary for switching between real-address and protected
modes are also identified.
Chapter 3 — Protected-Mode Memory Management. Describes the data structures, registers,
and instructions that support segmentation and paging. The chapter explains how they can be
used to implement a “flat” (unsegmented) memory model or a segmented memory model.
Chapter 4 — Protection. Describes the support for page and segment protection provided in
the IA-32 architecture. This chapter also explains the implementation of privilege rules, stack
switching, pointer validation, and user and supervisor modes.
Chapter 5 — Interrupt and Exception Handling. Describes the basic interrupt mechanisms
defined in the IA-32 architecture, shows how interrupts and exceptions relate to protection, and
describes how the architecture handles each exception type. Reference information for each
IA-32 exception is given at the end of this chapter.
Chapter 6 — Task Management. Describes mechanisms the IA-32 architecture provides to
support multitasking and inter-task protection.
Chapter 7 — Multiple-Processor Management. Describes the instructions and flags that
support multiple processors with shared memory, memory ordering, and Hyper-Threading Technology.
Chapter 8 — Advanced Programmable Interrupt Controller (APIC). Describes the
programming interface to the local APIC and gives an overview of the interface between the
local APIC and the I/O APIC.
Chapter 9 — Processor Management and Initialization. Defines the state of an IA-32
processor after reset initialization. This chapter also explains how to set up an IA-32 processor
for real-address mode operation and protected-mode operation, and how to switch between
modes.
Chapter 10 — Memory Cache Control. Describes the general concept of caching and the
caching mechanisms supported by the IA-32 architecture. This chapter also describes the
memory type range registers (MTRRs) and how they can be used to map memory types of physical memory. Information on using the new cache control and memory streaming instructions
introduced with the Pentium III, Pentium 4, and Intel Xeon processors is also given.
Chapter 11 — Intel® MMX™ Technology System Programming. Describes those aspects of
the Intel® MMX™ technology that must be handled and considered at the system programming
level, including: task switching, exception handling, and compatibility with existing system
environments.
Chapter 12 — SSE, SSE2 and SSE3 System Programming. Describes those aspects of
SSE/SSE2/SSE3 extensions that must be handled and considered at the system programming
level, including task switching, exception handling, and compatibility with existing system
environments.
Chapter 13 — Power and Thermal Management. Describes the IA-32 architecture’s power
management and thermal monitoring facilities.
Chapter 14 — Machine-Check Architecture. Describes the machine-check architecture.
Chapter 15 — 8086 Emulation. Describes the real-address and virtual-8086 modes of the
IA-32 architecture.
Chapter 16 — Mixing 16-Bit and 32-Bit Code. Describes how to mix 16-bit and 32-bit code
modules within the same program or task.
Chapter 17 — IA-32 Architecture Compatibility. Describes architectural compatibility
among the IA-32 processors, which include the Intel 286, Intel386™, Intel486™, Pentium, P6
family, Pentium 4, and Intel Xeon processors. The differences among the 32-bit IA-32 processors are also described throughout the three volumes of the IA-32 Software Developer’s
Manual, as relevant to particular features of the architecture. This chapter provides a collection
of all the relevant compatibility information for all IA-32 processors and also describes the basic
differences with respect to the 16-bit IA-32 processors (the Intel 8086 and Intel 286 processors).
Chapter 18 — Debugging and Performance Monitoring. Describes the debugging registers
and other debug mechanisms provided in the IA-32 architecture. This chapter also describes the
time-stamp counter and the performance-monitoring counters.
Chapter 19 — Introduction to Virtual-Machine Extensions. Describes the basic elements of
virtual machine architecture and the virtual-machine extensions of IA-32 Intel Architecture.
Chapter 20 — Virtual-Machine Control Structures. Describes components that manage
VMX operation. These include the working-VMCS pointer and the controlling-VMCS pointer.
Chapter 21 — VMX Non-Root Operation. Describes processor operation in VMX non-root mode.
Processor operation in VMX non-root mode can be restricted programmatically such that
certain operations, events or conditions can cause the processor to transfer control from the guest
(running in VMX non-root mode) to the monitor software (running in VMX root mode).
Chapter 22 — VM Entries. Describes VM-entries. VM-entry transitions the processor from
the VMM running in VMX root-mode to a VM running in VMX non-root mode. VM-Entry is
performed by the execution of VMLAUNCH or VMRESUME instructions.
Chapter 23 — VM Exits. Describes VM-exits. Certain events, operations or situations while
the processor is in VMX non-root operation may cause VM-exit transitions. In addition,
VM-exits can also occur on failed VM-entries.
Chapter 24 — System Management. Describes the IA-32 architecture’s system management
mode (SMM) facilities.
Chapter 26 — Virtualization of System Resources. Describes the virtualization of the system
resources. These include: debugging facilities, address translation, physical memory, and microcode update facilities.
Chapter 27 — Handling Boundary Conditions in a Virtual Machine Monitor. Describes
what a VMM must consider when handling exceptions, interrupts, error conditions, and transitions between activity states.
Appendix A — Performance-Monitoring Events. Lists the events that can be counted with
the performance-monitoring counters and the codes used to select these events. Both Pentium
processor and P6 family processor events are described.
Appendix B — Model-Specific Registers (MSRs). Lists the MSRs available in the Pentium
processors, the P6 family processors, and the Pentium 4 and Intel Xeon processors and describes
their functions.
Appendix C — MP Initialization For P6 Family Processors. Gives an example of how to use
the MP protocol to boot P6 family processors in an MP system.
Appendix D — Programming the LINT0 and LINT1 Inputs. Gives an example of how to
program the LINT0 and LINT1 pins for specific interrupt vectors.
Appendix E — Interpreting Machine-Check Error Codes. Gives an example of how to interpret the error codes for a machine-check error that occurred on a P6 family processor.
Appendix F — APIC Bus Message Formats. Describes the message formats for messages
transmitted on the APIC bus for P6 family and Pentium processors.
Appendix G — VMX Capability Reporting Facility. Describes the VMX capability MSRs.
Support for specific VMX features is determined by reading capability MSRs.
Appendix H — Field Encoding in VMCS. Enumerates all fields in the VMCS and their encodings. Fields are grouped by width (16-bit, 32-bit, etc.) and type (guest-state, host-state, etc.).
Appendix I — VM Basic Exit Reasons. Describes the 32-bit fields that encode reasons for a
VM-Exit. Examples of exit reasons include, but are not limited to: software interrupts, processor
exceptions, software traps, NMIs, external interrupts, and triple faults.
Appendix J — VM Instruction Error Numbers. Describes the VM-instruction error codes
generated by failed VM instruction executions (that have a valid working-VMCS pointer).
1.3 NOTATIONAL CONVENTIONS
This manual uses specific notation for data-structure formats, for symbolic representation of
instructions, and for hexadecimal and binary numbers. A review of this notation makes the
manual easier to read.
1.3.1 Bit and Byte Order
In illustrations of data structures in memory, smaller addresses appear toward the bottom of the
figure; addresses increase toward the top. Bit positions are numbered from right to left. The
numerical value of a set bit is equal to two raised to the power of the bit position. IA-32 processors are “little endian” machines; this means the bytes of a word are numbered starting from the
least significant byte. Figure 1-1 illustrates these conventions.
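As a minimal sketch of these conventions (written in Python purely for illustration; the variable names are not part of the manual), the following packs a 32-bit value in little-endian order and checks the numerical value of a set bit:

```python
import struct

# A 32-bit value whose four bytes are easy to tell apart.
value = 0x12345678

# "<I" packs an unsigned 32-bit integer in little-endian byte order,
# so Byte 0 (the least significant byte) lands at the lowest address.
memory = struct.pack("<I", value)
byte0, byte1, byte2, byte3 = memory

# The numerical value of a set bit is two raised to its bit position.
bit_position = 9
bit_value = 1 << bit_position

print([hex(b) for b in memory])
print(bit_value == 2 ** bit_position)
```

Reading `memory` from index 0 upward corresponds to walking from the lowest address toward the highest.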
1.3.2 Reserved Bits and Software Compatibility
In many register and memory layout descriptions, certain bits are marked as reserved. When
bits are marked as reserved, it is essential for compatibility with future processors that software
treat these bits as having a future, though unknown, effect. The behavior of reserved bits should
be regarded as not only undefined, but unpredictable. Software should follow these guidelines
in dealing with reserved bits:
• Do not depend on the states of any reserved bits when testing the values of registers which
contain such bits. Mask out the reserved bits before testing.
• Do not depend on the states of any reserved bits when storing to memory or to a register.
• Do not depend on the ability to retain information written into any reserved bits.
• When loading a register, always load the reserved bits with the values indicated in the
documentation, if any, or reload them with values previously read from the same register.
NOTE
Avoid any software dependence upon the state of reserved bits in IA-32
registers. Depending upon the values of reserved register bits will make
software dependent upon the unspecified manner in which the processor
handles these bits. Programs that depend upon reserved values risk incompatibility with future processors.
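The guidelines above amount to a mask-before-test and read-modify-write discipline. The sketch below illustrates it in Python; the function names and the reserved-bit mask are hypothetical and do not describe any real IA-32 register.

```python
# Hypothetical layout: the upper 16 bits of a 32-bit register are reserved.
RESERVED_MASK = 0xFFFF0000


def defined_bits_set(value, bits_to_test, reserved_mask=RESERVED_MASK):
    """Test defined bits only, masking out the reserved bits first."""
    return (value & ~reserved_mask & bits_to_test) != 0


def update_register(current, new_fields, reserved_mask=RESERVED_MASK):
    """Return a value that changes defined bits and reloads reserved bits."""
    preserved = current & reserved_mask       # reserved bits as previously read
    changed = new_fields & ~reserved_mask     # caller's values, defined bits only
    return preserved | changed


current = 0xABCD0001
print(hex(update_register(current, 0xFFFF0302)))
```

Whatever the caller passes in the reserved positions is discarded, so the value written back carries the reserved bits exactly as they were read.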
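[Figure 1-1 shows a data structure drawn as 32-bit rows, with the highest address at the top and the lowest address at the bottom. Each row holds Byte 3 (bits 31-24), Byte 2 (bits 23-16), Byte 1 (bits 15-8), and Byte 0 (bits 7-0), labeled with bit offsets; the rows are labeled with byte offsets 28, 24, 20, 16, 12, 8, 4, and 0.]
Figure 1-1. Bit and Byte Order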
1.3.3 Instruction Operands
When instructions are represented symbolically, a subset of the IA-32 assembly language is
used. In this subset, an instruction has the following format:
label: mnemonic argument1, argument2, argument3
where:
• A label is an identifier which is followed by a colon.
• A mnemonic is a reserved name for a class of instruction opcodes which have the same
function.
• The operands argument1, argument2, and argument3 are optional. There may be from
zero to three operands, depending on the opcode. When present, they take the form of
either literals or identifiers for data items. Operand identifiers are either reserved names of
registers or are assumed to be assigned to data items declared in another part of the
program (which may not be shown in the example).
When two operands are present in an arithmetic or logical instruction, the right operand is the
source and the left operand is the destination.
For example:
LOADREG: MOV EAX, SUBTOTAL
In this example LOADREG is a label, MOV is the mnemonic identifier of an opcode, EAX is
the destination operand, and SUBTOTAL is the source operand. Some assembly languages put
the source and destination in reverse order.
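To make the format concrete, here is a small Python sketch that splits a symbolic instruction into label, mnemonic, and operands. The function name `parse_instruction` is hypothetical, and the parser handles only the simple form described in this section.

```python
def parse_instruction(line):
    """Split 'label: mnemonic argument1, argument2, argument3' into parts."""
    label = None
    text = line.strip()
    head, sep, rest = text.partition(":")
    # Treat the text before the first colon as a label only if it is one word.
    if sep and " " not in head.strip():
        label, text = head.strip(), rest.strip()
    mnemonic, _, operand_text = text.partition(" ")
    operands = [op.strip() for op in operand_text.split(",") if op.strip()]
    return {"label": label, "mnemonic": mnemonic, "operands": operands}


parsed = parse_instruction("LOADREG: MOV EAX, SUBTOTAL")
destination, source = parsed["operands"]  # left is destination, right is source
print(parsed["label"], parsed["mnemonic"], destination, source)
```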
1.3.4 Hexadecimal and Binary Numbers
Base 16 (hexadecimal) numbers are represented by a string of hexadecimal digits followed by
the character H (for example, F82EH). A hexadecimal digit is a character from the following
set: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F.
Base 2 (binary) numbers are represented by a string of 1s and 0s, sometimes followed by the
character B (for example, 1010B). The “B” designation is only used in situations where confusion as to the type of number might arise.
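A short Python sketch of reading this notation follows. The helper name `parse_number` is hypothetical, and treating an unsuffixed string as decimal is an assumption made for the example; the manual may also write binary numbers without the B suffix.

```python
def parse_number(text):
    """Convert a manual-style literal such as 'F82EH' or '1010B' to an int."""
    token = text.strip().upper()
    if token.endswith("H"):
        return int(token[:-1], 16)   # hexadecimal digits followed by H
    if token.endswith("B"):
        return int(token[:-1], 2)    # 1s and 0s followed by B
    return int(token, 10)            # assumption: no suffix means decimal


print(parse_number("F82EH"), parse_number("1010B"))
```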
1.3.5 Segmented Addressing
The processor uses byte addressing. This means memory is organized and accessed as a
sequence of bytes. Whether one or more bytes are being accessed, a byte address is used to
locate the byte or bytes in memory. The range of memory that can be addressed is called an
address space.
The processor also supports segmented addressing. This is a form of addressing where a
program may have many independent address spaces, called segments. For example, a program
can keep its code (instructions) and stack in separate segments. Code addresses would always
refer to the code space, and stack addresses would always refer to the stack space. The following
notation is used to specify a byte address within a segment:
Segment-register:Byte-address
For example, the following segment address identifies the byte at address FF79H in the segment
pointed to by the DS register:
DS:FF79H
The following segment address identifies an instruction address in the code segment. The CS
register points to the code segment and the EIP register contains the address of the instruction.
CS:EIP
1.3.6 Syntax for CPUID, CR, and MSR Values
Obtain feature flags, status, and system information by using the CPUID instruction, by
checking control register bits, and by reading model-specific registers. We are moving toward a
single syntax to represent this type of information. See Figure 1-2.
[Figure 1-2 shows three annotated examples of the notation:]
CPUID Input and Output
    CPUID.01H:ECX.SSE [bit 25] = 1
    • 01H: input value for EAX register
    • ECX.SSE [bit 25]: output register and feature flag or field name with bit position(s)
    • = 1: value (or range) of output
Control Register Values
    CR4.OSFXSR[bit 9] = 1
    • CR4: example CR name
    • OSFXSR[bit 9]: feature flag or field name with bit position(s)
    • = 1: value (or range) of output
Model-Specific Register Values
    IA32_MISC_ENABLES.ENABLEFOPCODE[bit 2] = 1
    • IA32_MISC_ENABLES: example MSR name
    • ENABLEFOPCODE[bit 2]: feature flag or field name with bit position(s)
    • = 1: value (or range) of output
Figure 1-2. Syntax for CPUID, CR, and MSR Data Presentation
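The single-bit form of this notation can be read mechanically. The Python sketch below parses an expression such as CR4.OSFXSR[bit 9] = 1 and checks it against a register value supplied by the caller; the pattern and the function names are hypothetical, and bit ranges are not handled.

```python
import re

# Matches single-bit expressions such as "CR4.OSFXSR[bit 9] = 1".
FLAG_PATTERN = re.compile(
    r"^(?P<source>[\w.:]+)\.(?P<field>\w+)\s*\[bit (?P<bit>\d+)\]"
    r"\s*=\s*(?P<value>[01])$"
)


def parse_flag(expr):
    """Return (source, field name, bit position, expected value)."""
    match = FLAG_PATTERN.match(expr.strip())
    if match is None:
        raise ValueError("not a single-bit flag expression: " + expr)
    return (match.group("source"), match.group("field"),
            int(match.group("bit")), int(match.group("value")))


def flag_matches(register_value, expr):
    """Check whether register_value satisfies the expression."""
    _, _, bit, expected = parse_flag(expr)
    return ((register_value >> bit) & 1) == expected


print(parse_flag("CR4.OSFXSR[bit 9] = 1"))
print(flag_matches(1 << 9, "CR4.OSFXSR[bit 9] = 1"))
```

The register value itself must come from privileged code or from the CPUID instruction; the sketch only interprets a value it is handed.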
1.3.7 Exceptions
An exception is an event that typically occurs when an instruction causes an error. For example,
an attempt to divide by zero generates an exception. However, some exceptions, such as breakpoints,
occur under other conditions. Some types of exceptions may provide error codes. An
error code reports additional information about the error. An example of the notation used to
show an exception and error code is shown below:
#PF(fault code)
This example refers to a page-fault exception under conditions where an error code naming a
type of fault is reported. Under some conditions, exceptions which produce error codes may not
be able to report an accurate code. In this case, the error code is zero, as shown below for a
general-protection exception.
#GP(0)
1.4 RELATED LITERATURE
Literature related to IA-32 processors is listed on-line at this link:
http://developer.intel.com/design/processor/
Some of the documents listed at this web site can be viewed on-line; others can be ordered. The
literature available is listed by Intel processor and then by the following literature types: application notes, data sheets, manuals, papers, and specification updates.
See also:
• the data sheet for a particular Intel IA-32 processor
• the specification update for a particular Intel IA-32 processor
• AP-485, Intel Processor Identification and the CPUID Instruction, Order Number 241618
• IA-32 Intel® Architecture Optimization Reference Manual, Order Number 248966
CHAPTER 2
SYSTEM ARCHITECTURE OVERVIEW
IA-32 architecture (beginning with the Intel386 processor family) provides extensive support
for operating-system and system-development software. This support offers multiple modes of
operation, which include:
• Real mode, protected mode, virtual 8086 mode, and system management mode. These are
sometimes referred to as legacy modes.
• IA-32e mode (added by Intel® EM64T), in which the processor operates
in one of two sub-modes: 64-bit mode or compatibility mode.
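The IA-32 system-level architecture includes features to assist in system-level operations such as
memory management, interrupt and exception handling, task management, and control of multiple
processors (see Section 2.1).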
This chapter provides a description of each part of this architecture. It also describes the system
registers that are used to set up and control the processor at the system level and gives a brief
overview of the processor’s system-level (operating system) instructions.
Many features of the IA-32 system-level architecture are used only by system programmers.
However, application programmers may need to read this chapter and the following chapters in
order to create a reliable and secure environment for application programs.
This overview and most subsequent chapters of this book focus on protected-mode operation of
the IA-32 architecture. IA-32e mode operation, as it differs from protected mode operation, is
also described.
All IA-32 processors enter real-address mode following a power-up or reset (see Chapter 9,
“Processor Management and Initialization”). Software then initiates the switch from real-address
mode to protected mode. If IA-32e mode operation is desired, software also initiates a
switch from protected mode to IA-32e mode.
2.1 OVERVIEW OF THE SYSTEM-LEVEL ARCHITECTURE
IA-32 system-level architecture consists of a set of registers, data structures, and instructions
designed to support basic system-level operations such as memory management, interrupt and
exception handling, task management, and control of multiple processors.
Figure 2-1 provides a summary of system registers and data structures that apply to 32-bit
modes. System registers and data structures that apply to IA-32e mode are shown in Figure 2-2.
[Figure 2-1 shows how the system registers and data structures relate in 32-bit modes: the EFLAGS register; control registers CR0 through CR4; the task register, GDTR, LDTR, and IDTR; the global descriptor table (GDT) and local descriptor table (LDT) with their segment, TSS, and LDT descriptors and call gates; the interrupt descriptor table (IDT) with interrupt, task, and trap gates selected by an interrupt vector; task-state segments (TSSs); the code, data, and stack segments of tasks, interrupt handlers, exception handlers, and protected procedures in the linear address space; and the translation of a linear address (Dir, Table, Offset) through the page directory and page table, located from the physical address in CR3, to a physical address within a page. The page mapping example is for 4-KByte pages and the normal 32-bit physical address size.]
Figure 2-1. IA-32 System-Level Registers and Data Structures
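For the 4-KByte page mapping example in Figure 2-1, a 32-bit linear address divides into a page-directory index, a page-table index, and a byte offset. The Python sketch below shows that split; the 10/10/12-bit field widths are the standard ones for this paging mode rather than something stated in this overview, and the function name is hypothetical.

```python
def split_linear_address(linear):
    """Split a 32-bit linear address into (Dir, Table, Offset) fields."""
    directory = (linear >> 22) & 0x3FF   # bits 31-22: page-directory entry
    table = (linear >> 12) & 0x3FF       # bits 21-12: page-table entry
    offset = linear & 0xFFF              # bits 11-0: byte within the 4-KByte page
    return directory, table, offset


print(split_linear_address(0x00403025))
```

The processor uses the first field to select an entry in the page directory located through CR3, the second to select an entry in the page table that the directory entry points to, and the third to select a byte in the page.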
[Figure 2-2 shows how the system registers and data structures relate in IA-32e mode: the RFLAGS register; control registers CR0 through CR4 and CR8; the task register (TR), GDTR, LDTR, and IDTR; the global descriptor table (GDT) and local descriptor table (LDT) with their segment, TSS, and LDT descriptors and call gates; the interrupt descriptor table (IDT) with interrupt and trap gates selected by an interrupt vector; the current TSS with its interrupt stack table (IST); code, data, and stack segments with base = 0, with NULL stack segment selectors for interrupt handlers, exception handlers, and protected procedures; and the translation of a linear address (PML4, Directory Pointer, Directory, Table, Offset) through the PML4 table, page-directory-pointer table, page directory, and page table, located from the physical address in CR3, to a physical address within a page. The page mapping example is for 4-KByte pages and 40-bit physical address size.]
Figure 2-2. System-Level Registers and Data Structures in IA-32e Mode
2.1.1 Global and Local Descriptor Tables
When operating in protected mode, all memory accesses pass through either the global
descriptor table (GDT) or an optional local descriptor table (LDT) as shown in Figure 2-1. These
tables contain entries called segment descriptors. Segment descriptors provide the base address
of segments as well as access rights, type, and usage information.
Each segment descriptor has an associated segment selector. A segment selector provides the
software that uses it with an index into the GDT or LDT (the offset of its associated segment
descriptor), a global/local flag (determines whether the selector points to the GDT or the LDT),
and access rights information.
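The three parts of a segment selector can be pulled apart with shifts and masks. The Python sketch below assumes the standard 16-bit selector layout (index in bits 15-3, table indicator in bit 2, requested privilege level in bits 1-0), which this overview does not spell out; the function name is hypothetical.

```python
def decode_selector(selector):
    """Split a 16-bit segment selector into (index, table, rpl)."""
    index = (selector >> 3) & 0x1FFF            # descriptor index within the table
    table = "LDT" if selector & 0x4 else "GDT"  # global/local flag
    rpl = selector & 0x3                        # requested privilege level
    return index, table, rpl


print(decode_selector(0x0010))
print(decode_selector(0x002F))
```

Because each descriptor in these tables is 8 bytes long in 32-bit modes, the byte offset of the descriptor within its table is the index multiplied by 8, which is the selector value with its low three bits cleared.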
To access a byte in a segment, a segment selector and an offset must be supplied. The segment
selector provides access to the segment descriptor for the segment (in the GDT or LDT). From
the segment descriptor, the processor obtains the base address of the segment in the linear
address space. The offset then provides the location of the byte relative to the base address. This
mechanism can be used to access any valid code, data, or stack segment, provided the segment
is accessible from the current privilege level (CPL) at which the processor is operating. The CPL
is defined as the protection level of the currently executing code segment.
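The access path just described can be sketched as a lookup followed by an addition. In the Python sketch below, the descriptor tables, their contents, and the function name are all hypothetical; the privilege test is the simplified rule for data segments (the numerically larger of the CPL and the selector's requested privilege level must not exceed the descriptor's privilege level), and the limit is treated as the largest valid offset.

```python
# Hypothetical descriptor tables: index -> (base, limit, dpl).
GDT = {1: (0x00000000, 0xFFFFF, 0), 2: (0x00100000, 0x0FFFF, 3)}
LDT = {0: (0x00200000, 0x00FFF, 3)}


def linear_address(selector, offset, cpl):
    """Translate selector:offset to a linear address (simplified model)."""
    index = selector >> 3
    table = LDT if selector & 0x4 else GDT
    rpl = selector & 0x3
    base, limit, dpl = table[index]
    if max(cpl, rpl) > dpl:
        raise PermissionError("segment not accessible at this privilege level")
    if offset > limit:
        raise ValueError("offset is beyond the segment limit")
    return base + offset


# Selector for GDT entry 2 with requested privilege level 3.
print(hex(linear_address((2 << 3) | 3, 0xFF79, cpl=3)))
```

A real processor reports these failures as exceptions and applies further type and granularity checks that the sketch omits.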
See Figure 2-1. The solid arrows in the figure indicate a linear address, dashed lines indicate a
segment selector, and the dotted arrows indicate a physical address. For simplicity, many of the
segment selectors are shown as direct pointers to a segment. However, the actual path from a
segment selector to its associated segment is always through a GDT or LDT.
The linear address of the base of the GDT is contained in the GDT register (GDTR); the linear
address of the LDT is contained in the LDT register (LDTR).
2.1.1.1 Global and Local Descriptor Tables in IA-32e Mode
The GDTR and LDTR registers are expanded to 64 bits wide in both IA-32e sub-modes (64-bit mode
and compatibility mode). For more information, see Section 3.5.2, “Segment Descriptor Tables
in IA-32e Mode.”
Global and local descriptor tables are expanded in 64-bit mode to support 64-bit base addresses
(16-byte LDT descriptors hold a 64-bit base address and various attributes). In compatibility
mode, descriptors are not expanded.
2.1.2 System Segments, Segment Descriptors, and Gates
Besides code, data, and stack segments that make up the execution environment of a program or
procedure, the architecture defines two system segments: the task-state segment (TSS) and the
LDT. The GDT is not considered a segment because it is not accessed by means of a segment
selector and segment descriptor. TSSs and LDTs have segment descriptors defined for them.
The architecture also defines a set of special descriptors called gates (call gates, interrupt gates,
trap gates, and task gates). These provide protected gateways to system procedures and handlers
that may operate at a different privilege level than application programs and most procedures.
For example, a CALL to a call gate can provide access to a procedure in a code segment that is
at the same or a numerically lower privilege level (more privileged) than the current code
segment. To access a procedure through a call gate, the calling procedure¹ supplies the selector
for the call gate. The processor then performs an access rights check on the call gate, comparing
the CPL with the privilege level of the call gate and the destination code segment pointed to by
the call gate.
If access to the destination code segment is allowed, the processor gets the segment selector for
the destination code segment and an offset into that code segment from the call gate. If the call
requires a change in privilege level, the processor also switches to the stack for the targeted privilege level. The segment selector for the new stack is obtained from the TSS for the currently
running task. Gates also facilitate transitions between 16-bit and 32-bit code segments, and vice
versa.
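The privilege comparison for a CALL through a call gate can be written as a pair of predicates. The Python sketch below uses the usual IA-32 rules, which are detailed in the protection chapter rather than in this overview: the CPL and the selector's requested privilege level (RPL) must be numerically no greater than the gate's privilege level, and the destination code segment must be at the same or a more privileged level than the caller. The function names are hypothetical.

```python
def call_gate_allows(cpl, rpl, gate_dpl, target_dpl):
    """Return True if a CALL through the gate passes the privilege checks."""
    gate_ok = cpl <= gate_dpl and rpl <= gate_dpl   # caller may use the gate
    target_ok = target_dpl <= cpl                   # target is same or more privileged
    return gate_ok and target_ok


def needs_stack_switch(cpl, target_dpl, conforming=False):
    """A call into a more privileged nonconforming segment changes stacks."""
    return (not conforming) and target_dpl < cpl


# Application code at privilege level 3 calling a level-0 procedure
# through a gate that level-3 code is allowed to use.
print(call_gate_allows(cpl=3, rpl=3, gate_dpl=3, target_dpl=0))
print(needs_stack_switch(cpl=3, target_dpl=0))
```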
2.1.2.1 Gates in IA-32e Mode
In IA-32e mode, the following descriptors are 16-byte descriptors (expanded to allow a 64-bit
base): LDT descriptors, 64-bit TSSs, call gates, interrupt gates, and trap gates.
Call gates facilitate transitions between 64-bit mode and compatibility mode. Task gates are not
supported in IA-32e mode. On privilege level changes, stack segment selectors are not read from
the TSS. Instead, they are set to NULL.
2.1.3 Task-State Segments and Task Gates
The TSS (see Figure 2-1) defines the state of the execution environment for a task. It includes
the state of general-purpose registers, segment registers, the EFLAGS register, the EIP register,
and segment selectors with stack pointers for three stack segments (one stack for each privilege
level). The TSS also includes the segment selector for the LDT associated with the task and the
page-table base address.
All program execution in protected mode happens within the context of a task (called the current
task). The segment selector for the TSS for the current task is stored in the task register. The
simplest method for switching to a task is to make a call or jump to the new task. Here, the
segment selector for the TSS of the new task is given in the CALL or JMP instruction. In
switching tasks, the processor performs the following actions:
1. Stores the state of the current task in the current TSS.
2. Loads the task register with the segment selector for the new task.
3. Accesses the new TSS through a segment descriptor in the GDT.
4. Loads the state of the new task from the new TSS into the general-purpose registers, the
segment registers, the LDTR, control register CR3 (page-table base address), the EFLAGS
register, and the EIP register.
5. Begins execution of the new task.
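The five actions above can be modeled with plain dictionaries. In the Python sketch below every name is hypothetical: the GDT is reduced to a mapping from a TSS selector directly to a TSS, and the whole state dictionary is copied in both directions, which is coarser than what the hardware saves and loads.

```python
def switch_task(cpu, gdt, new_selector):
    """Model a task switch; returns the EIP at which the new task resumes."""
    # 1. Store the state of the current task in the current TSS.
    current_tss = gdt[cpu["task_register"]]
    current_tss.update(cpu["state"])
    # 2. Load the task register with the segment selector for the new task.
    cpu["task_register"] = new_selector
    # 3. Access the new TSS through its entry in the (modeled) GDT.
    new_tss = gdt[new_selector]
    # 4. Load the state of the new task from the new TSS.
    cpu["state"] = dict(new_tss)
    # 5. Begin execution of the new task at its saved EIP.
    return cpu["state"]["eip"]


gdt = {
    0x28: {"eip": 0x1000, "eflags": 0x202, "cr3": 0xA000, "ldtr": 0},
    0x30: {"eip": 0x8000, "eflags": 0x202, "cr3": 0xB000, "ldtr": 0},
}
cpu = {"task_register": 0x28,
       "state": {"eip": 0x1234, "eflags": 0x246, "cr3": 0xA000, "ldtr": 0}}
print(hex(switch_task(cpu, gdt, 0x30)))
```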
1. The word “procedure” is commonly used in this document as a general term for a logical unit or block of
code (such as a program, procedure, function, or routine).
SYSTEM ARCHITECTURE OVERVIEW
A task can also be accessed through a task gate. A task gate is similar to a call gate, except that
it provides access (through a segment selector) to a TSS rather than a code segment.
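The state that the processor stores and reloads during a task switch can be pictured as a C structure. The sketch below assumes the 104-byte 32-bit TSS image described in the task-management chapter of this manual; the structure and field names are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative 32-bit TSS image; all names here are hypothetical. Every field
 * is naturally aligned, so the compiler is not expected to insert padding. */
struct tss32 {
    uint16_t prev_task_link, reserved0;
    uint32_t esp0; uint16_t ss0, reserved1;   /* stack for privilege level 0 */
    uint32_t esp1; uint16_t ss1, reserved2;   /* stack for privilege level 1 */
    uint32_t esp2; uint16_t ss2, reserved3;   /* stack for privilege level 2 */
    uint32_t cr3;                             /* page-table base address */
    uint32_t eip, eflags;
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
    uint16_t es, reserved4, cs, reserved5, ss, reserved6;
    uint16_t ds, reserved7, fs, reserved8, gs, reserved9;
    uint16_t ldt_selector, reserved10;        /* LDT associated with the task */
    uint16_t trap_flag;                       /* bit 0: debug trap (T) flag */
    uint16_t io_map_base;
};

/* Compile-time check that the sketch matches the assumed 104-byte image. */
static_assert(sizeof(struct tss32) == 104, "32-bit TSS image is 104 bytes");
```

With this layout, `offsetof(struct tss32, cr3)` is 28 and `offsetof(struct tss32, io_map_base)` is 102, which is a quick way to compare the sketch against the TSS figure.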
2.1.3.1 Task-State Segments in IA-32e Mode
Hardware task switches are not supported in IA-32e mode. However, TSSs continue to exist.
The base address of a TSS is specified by its descriptor.
A 64-bit TSS holds the following information that is important to 64-bit operation:
• Stack pointer addresses for each privilege level
• Pointer addresses for the interrupt stack table
• Offset address of the IO-permission bitmap (from the TSS base)
The task register is expanded to hold 64-bit base addresses in IA-32e mode. See also: Section 6.7,
“Task Management in 64-bit Mode.”
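The three items held by a 64-bit TSS can be sketched as a packed C structure. This is an illustration only: it assumes the 104-byte 64-bit TSS format with three privilege-level stack pointers and seven interrupt-stack-table entries, the names are hypothetical, and `#pragma pack` is a compiler extension (supported by GCC, Clang, and MSVC) rather than standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#pragma pack(push, 1)            /* fields are not naturally aligned */
struct tss64 {
    uint32_t reserved0;
    uint64_t rsp[3];             /* stack pointers for privilege levels 0-2 */
    uint64_t reserved1;
    uint64_t ist[7];             /* interrupt stack table pointers IST1-IST7 */
    uint64_t reserved2;
    uint16_t reserved3;
    uint16_t io_map_base;        /* bitmap offset, measured from the TSS base */
};
#pragma pack(pop)

static_assert(sizeof(struct tss64) == 104, "64-bit TSS image is 104 bytes");

/* Returns the address of the I/O-permission bitmap for a TSS image. */
static const uint8_t *tss64_io_bitmap(const struct tss64 *tss)
{
    return (const uint8_t *)tss + tss->io_map_base;
}
```

Setting `io_map_base` to `sizeof(struct tss64)` places the bitmap immediately after the fixed part of the TSS, which is one common arrangement.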
2.1.4 Interrupt and Exception Handling
External interrupts, software interrupts and exceptions are handled through the interrupt
descriptor table (IDT). The IDT stores a collection of gate descriptors that provide access to
interrupt and exception handlers. Like the GDT, the IDT is not a segment. The linear address for
the base of the IDT is contained in the IDT register (IDTR).
Gate descriptors in the IDT can be interrupt, trap, or task gate descriptors. To access an interrupt
or exception handler, the processor first receives an interrupt vector (interrupt number) from
internal hardware, an external interrupt controller, or from software by means of an INT, INTO,
INT 3, or BOUND instruction. The interrupt vector provides an index into the IDT. If the
selected gate descriptor is an interrupt gate or a trap gate, the associated handler procedure is
accessed in a manner similar to calling a procedure through a call gate. If the descriptor is a task
gate, the handler is accessed through a task switch.
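The use of the interrupt vector as an index into the IDT can be sketched as follows. The example assumes 8-byte gate descriptors in protected mode and 16-byte descriptors in IA-32e mode, and treats the IDT limit as the offset of the last valid byte; the function and constant names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { GATE_SIZE_PROTECTED = 8, GATE_SIZE_IA32E = 16 };

/* Computes the byte offset of the gate descriptor selected by `vector` and
 * reports whether the whole descriptor lies within the IDT limit. */
static bool idt_gate_offset(uint8_t vector, uint16_t idt_limit,
                            unsigned gate_size, uint32_t *offset_out)
{
    assert(gate_size == GATE_SIZE_PROTECTED || gate_size == GATE_SIZE_IA32E);
    assert(offset_out != 0);

    uint32_t offset = (uint32_t)vector * gate_size;
    *offset_out = offset;
    return offset + gate_size - 1 <= idt_limit;
}
```

A vector whose descriptor falls outside the limit has no usable gate; on real hardware that case is reported as an exception rather than a return value.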
2.1.4.1 Interrupt and Exception Handling IA-32e Mode
In IA-32e mode, interrupt descriptors are expanded to 16 bytes to support 64-bit base addresses.
This is true for 64-bit mode and compatibility mode.
The IDTR register is expanded to hold a 64-bit base address. Task gates are not supported.
2.1.5 Memory Management
System architecture supports either direct physical addressing of memory or virtual memory
(through paging). When physical addressing is used, a linear address is treated as a physical
address. When paging is used, all code, data, stack, and system segments (including the GDT
and IDT) can be paged with only the most recently accessed pages being held in physical
memory.
The location of pages (sometimes called page frames) in physical memory is contained in two
types of system data structures: page directories and page tables. Both structures reside in physical memory (see Figure 2-1).
The base physical address of the page directory is contained in control register CR3. An entry
in a page directory contains the physical address of the base of a page table, access rights and
memory management information. An entry in a page table contains the physical address of a
page frame, access rights and memory management information.
To use this paging mechanism, a linear address is broken into three parts. The parts provide separate offsets into the page directory, the page table, and the page frame. A system can have a
single page directory or several. For example, each task can have its own page directory.
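The three-part split of a linear address can be sketched in C. The example assumes 32-bit linear addresses and 4-KByte pages without the physical address extension, so the fields are 10, 10, and 12 bits wide; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct linear_parts {
    uint32_t dir_index;    /* bits 31:22, selects a page-directory entry */
    uint32_t table_index;  /* bits 21:12, selects a page-table entry */
    uint32_t page_offset;  /* bits 11:0, byte offset within the page frame */
};

/* Splits a linear address into its three offsets. */
static struct linear_parts split_linear(uint32_t linear)
{
    struct linear_parts p;
    p.dir_index   = linear >> 22;
    p.table_index = (linear >> 12) & 0x3FFu;
    p.page_offset = linear & 0xFFFu;
    return p;
}

/* Inverse of split_linear(), useful for checking the field boundaries. */
static uint32_t join_linear(struct linear_parts p)
{
    assert(p.dir_index < 1024 && p.table_index < 1024 && p.page_offset < 4096);
    return (p.dir_index << 22) | (p.table_index << 12) | p.page_offset;
}
```

For example, the linear address 00403025H splits into directory index 1, table index 3, and page offset 25H.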
2.1.5.1 Memory Management in IA-32e Mode
In IA-32e mode, physical memory pages are managed by a set of system data structures. In
compatibility mode and 64-bit mode, four levels of system data structures are used. These
include:
• The page map level 4 (PML4) — An entry in a PML4 table contains the physical address
of the base of a page directory pointer table, access rights, and memory management information. The base physical address of the PML4 is stored in CR3.
• A set of page directory pointers — An entry in a page directory pointer table contains the
physical address of the base of a page directory table, access rights, and memory
management information.
• Sets of page directories — An entry in a page directory table contains the physical
address of the base of a page table, access rights, and memory management information.
• Sets of page tables — An entry in a page table contains the physical address of a page
frame, access rights, and memory management information.
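The four levels listed above each consume part of the linear address. A minimal sketch, assuming 4-KByte pages, 48-bit linear addresses, and 512 eight-byte entries per table (so each index is 9 bits wide); all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct ia32e_parts {
    unsigned pml4_index;   /* bits 47:39 */
    unsigned pdpt_index;   /* bits 38:30 */
    unsigned pd_index;     /* bits 29:21 */
    unsigned pt_index;     /* bits 20:12 */
    unsigned page_offset;  /* bits 11:0 */
};

/* Splits a linear address into one index per paging level plus an offset. */
static struct ia32e_parts split_ia32e(uint64_t linear)
{
    struct ia32e_parts p;
    p.pml4_index  = (unsigned)((linear >> 39) & 0x1FF);
    p.pdpt_index  = (unsigned)((linear >> 30) & 0x1FF);
    p.pd_index    = (unsigned)((linear >> 21) & 0x1FF);
    p.pt_index    = (unsigned)((linear >> 12) & 0x1FF);
    p.page_offset = (unsigned)(linear & 0xFFF);
    return p;
}

/* Physical address of entry `index` in a table that starts at `table_base`. */
static uint64_t entry_address(uint64_t table_base, unsigned index)
{
    assert(index < 512);
    return table_base + (uint64_t)index * 8;
}
```

A table walk would call `entry_address()` once per level, starting from the PML4 base held in CR3 and using the base address read from each entry as the next `table_base`.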
2.1.6 System Registers
To assist in initializing the processor and controlling system operations, the system architecture
provides system flags in the EFLAGS register and several system registers:
• The system flags and IOPL field in the EFLAGS register control task and mode switching,
interrupt handling, instruction tracing, and access rights. See also: Section 2.3, “System
Flags and Fields in the EFLAGS Register.”
• The control registers (CR0, CR2, CR3, and CR4) contain a variety of flags and data fields
for controlling system-level operations. Other flags in these registers are used to indicate
support for specific processor capabilities within the operating system or executive. See
also: Section 2.5, “Control Registers.”
• The debug registers (not shown in Figure 2-1) allow the setting of breakpoints for use in
debugging programs and systems software. See also: Chapter 18, “Debugging and
Performance Monitoring.”
• The GDTR, LDTR, and IDTR registers contain the linear addresses and sizes (limits) of
their respective tables. See also: Section 2.4, “Memory-Management Registers.”
• The task register contains the linear address and size of the TSS for the current task. See
also: Section 2.4, “Memory-Management Registers.”
• Model-specific registers (not shown in Figure 2-1).
The model-specific registers (MSRs) are a group of registers available primarily to operating-system or executive procedures (that is, code running at privilege level 0). These registers
control items such as the debug extensions, the performance-monitoring counters, the machine-check architecture, and the memory type ranges (MTRRs).
The number and function of these registers vary among different members of the IA-32
processor families. See also: Section 9.4, “Model-Specific Registers (MSRs),” and Appendix B,
“Model-Specific Registers (MSRs).”
Most systems restrict access to system registers (other than the EFLAGS register) by application
programs. Systems can be designed, however, where all programs and procedures run at the
most privileged level (privilege level 0). In such a case, application programs would be allowed
to modify the system registers.
2.1.6.1 System Registers in IA-32e Mode
In IA-32e mode, the four system-descriptor-table registers (GDTR, IDTR, LDTR, and TR) are
expanded in hardware to hold 64-bit base addresses. EFLAGS becomes the 64-bit RFLAGS
register. CR0-CR4 are expanded to 64 bits. CR8 becomes available. CR8 provides read-write
access to the task priority register (TPR) so that the operating system can control the priority
classes of external interrupts.
In 64-bit mode, debug registers DR0–DR7 are 64 bits. In compatibility mode, address-matching
in DR0-DR3 is also done at 64-bit granularity.
On systems that support IA-32e mode, the extended feature enable register (IA32_EFER) is
available. This model-specific register controls activation of IA-32e mode and other IA-32e
mode operations. In addition, there are several model-specific registers that govern IA-32e
mode instructions:
• IA32_KernelGSbase — Used by SWAPGS instruction.
• IA32_LSTAR — Used by SYSCALL instruction.
• IA32_SYSCALL_FLAG_MASK — Used by SYSCALL instruction.
• IA32_STAR_CS — Used by SYSCALL and SYSRET instructions.
2.1.7 Other System Resources
Besides the system registers and data structures described in the previous sections, system architecture provides the following additional resources:
• Operating system instructions (see also: Section 2.6, “System Instruction Summary”).
• Performance-monitoring counters (not shown in Figure 2-1).
• Internal caches and buffers (not shown in Figure 2-1).
Performance-monitoring counters are event counters that can be programmed to count processor
events such as the number of instructions decoded, the number of interrupts received, or the
number of cache loads. See also: Chapter 18, “Debugging and Performance Monitoring.”
The processor provides several internal caches and buffers. The caches are used to store both
data and instructions. The buffers are used to store things like decoded addresses to system and
application segments and write operations waiting to be performed. See also: Chapter 10,
“Memory Cache Control.”
2.2 MODES OF OPERATION
The IA-32 architecture supports four operating modes and one quasi-operating mode:
• Protected mode — This is the native operating mode of the processor. It provides a rich
set of architectural features, flexibility, high performance and backward compatibility with
the existing software base.
• Real-address mode — This operating mode provides the programming environment of
the Intel 8086 processor, with a few extensions (such as the ability to switch to protected or
system management mode).
• System management mode (SMM) — SMM is a standard architectural feature in all
IA-32 processors, beginning with the Intel386 SL processor. This mode provides an
operating system or executive with a transparent mechanism for implementing power
management and OEM differentiation features. SMM is entered through activation of an
external system interrupt pin (SMI#), which generates a system management interrupt
(SMI). In SMM, the processor switches to a separate address space while saving the
context of the currently running program or task. SMM-specific code may then be
executed transparently. Upon returning from SMM, the processor is placed back into its
state prior to the SMI.
• Virtual-8086 mode — In protected mode, the processor supports a quasi-operating mode
known as virtual-8086 mode. This mode allows the processor to execute 8086 software in a
protected, multitasking environment.
• IA-32e mode — In IA-32e mode, the processor supports two sub-modes: compatibility
mode and 64-bit mode. 64-bit mode provides 64-bit linear addressing and support for
physical address space larger than 64 GBytes. Compatibility mode allows most legacy
protected-mode applications to run unchanged.
Figure 2-3 shows how the processor moves among these operating modes.
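[Figure 2-3: state diagram. Reset places the processor in real-address mode. PE=1 moves it from real-address mode to protected mode; reset or PE=0 moves it back. VM=1 moves it from protected mode to virtual-8086 mode; VM=0 moves it back. LME=1, CR0.PG=1 (see Section 9.8.5) moves it from protected mode to IA-32e mode; for the return transition, see Section 9.8.5.4. SMI# moves the processor from any of these modes to system management mode; RSM returns it.]
Figure 2-3. Transitions Among the Processor’s Operating Modes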
The processor is placed in real-address mode following power-up or a reset. The PE flag in
control register CR0 then controls whether the processor is operating in real-address or protected
mode. See also: Section 9.9, “Mode Switching.”
The VM flag in the EFLAGS register determines whether the processor is operating in protected
mode or virtual-8086 mode. Transitions between protected mode and virtual-8086 mode are
generally carried out as part of a task switch or a return from an interrupt or exception handler.
See also: Section 15.2.5, “Entering Virtual-8086 Mode.”
The LMA bit (IA32_EFER.LMA[bit 10]) determines whether the processor is operating
in IA-32e mode. When running in IA-32e mode, 64-bit or compatibility sub-mode operation is
determined by the CS.L bit of the code segment. The processor enters IA-32e mode from
protected mode by enabling paging and setting the LME bit (IA32_EFER.LME[bit 8]). See also:
Chapter 9, “Processor Management and Initialization.”
The processor switches to SMM whenever it receives an SMI while the processor is in real-address, protected, virtual-8086, or IA-32e modes. Upon execution of the RSM instruction, the
processor always returns to the mode it was in when the SMI occurred.
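The way these flags select the operating mode can be summarized in a small classifier. This is a sketch under stated assumptions: it takes register values as plain integers, ignores system management mode (which is entered by SMI rather than by a flag), and uses hypothetical names throughout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR0_PE    (1u << 0)    /* protection enable */
#define EFLAGS_VM (1u << 17)   /* virtual-8086 mode */
#define EFER_LMA  (1u << 10)   /* IA-32e mode active */

enum cpu_mode {
    MODE_REAL, MODE_PROTECTED, MODE_VIRTUAL_8086,
    MODE_COMPATIBILITY, MODE_64BIT
};

/* Derives the operating mode from CR0.PE, EFLAGS.VM, IA32_EFER.LMA, CS.L. */
static enum cpu_mode classify_mode(uint32_t cr0, uint32_t eflags,
                                   uint64_t efer, bool cs_l)
{
    /* IA-32e mode is entered from protected mode, so LMA implies PE. */
    assert(!(efer & EFER_LMA) || (cr0 & CR0_PE));

    if (efer & EFER_LMA)
        return cs_l ? MODE_64BIT : MODE_COMPATIBILITY;
    if (!(cr0 & CR0_PE))
        return MODE_REAL;
    return (eflags & EFLAGS_VM) ? MODE_VIRTUAL_8086 : MODE_PROTECTED;
}
```

LMA is tested first because the VM bit cannot be set in IA-32e mode, so only CS.L distinguishes the two sub-modes there.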
2.3 SYSTEM FLAGS AND FIELDS IN THE EFLAGS REGISTER
The system flags and IOPL field of the EFLAGS register control I/O, maskable hardware interrupts, debugging, task switching, and the virtual-8086 mode (see Figure 2-4). Only privileged
code (typically operating system or executive code) should be allowed to modify these bits.
The system flags and IOPL are:
TF Trap (bit 8) — Set to enable single-step mode for debugging; clear to disable single-step
mode. In single-step mode, the processor generates a debug exception after each
instruction. This allows the execution state of a program to be inspected after each
instruction. If an application program sets the TF flag using a POPF, POPFD, or IRET
instruction, a debug exception is generated after the instruction that follows the POPF,
POPFD, or IRET.
[Figure 2-4: EFLAGS register bit layout. System flags and fields: ID, Identification Flag (bit 21); VIP, Virtual Interrupt Pending (bit 20); VIF, Virtual Interrupt Flag (bit 19); AC, Alignment Check (bit 18); VM, Virtual-8086 Mode (bit 17); RF, Resume Flag (bit 16); NT, Nested Task Flag (bit 14); IOPL, I/O Privilege Level (bits 13:12); IF, Interrupt Enable Flag (bit 9); TF, Trap Flag (bit 8). Bits 31:22 are reserved (set to 0).]
Figure 2-4. System Flags in the EFLAGS Register
IF Interrupt enable (bit 9) — Controls the response of the processor to maskable hardware
interrupt requests (see also: Section 5.3.2, “Maskable Hardware Interrupts”). The
flag is set to respond to maskable hardware interrupts; cleared to inhibit maskable hardware interrupts. The IF flag does not affect the generation of exceptions or
nonmaskable interrupts (NMI interrupts). The CPL, IOPL, and the state of the VME
flag in control register CR4 determine whether the IF flag can be modified by the CLI,
STI, POPF, POPFD, and IRET instructions.
IOPL I/O privilege level field (bits 12 and 13) — Indicates the I/O privilege level (IOPL)
of the currently running program or task. The CPL of the currently running program
or task must be less than or equal to the IOPL to access the I/O address space. This
field can only be modified by the POPF and IRET instructions when operating at a
CPL of 0.
The IOPL is also one of the mechanisms that controls the modification of the IF flag
and the handling of interrupts in virtual-8086 mode when virtual mode extensions are
in effect (when CR4.VME = 1). See also: Chapter 13, “Input/Output,” in the IA-32
NT Nested task (bit 14) — Controls the chaining of interrupted and called tasks. The
processor sets this flag on calls to a task initiated with a CALL instruction, an interrupt,
or an exception. It examines and modifies this flag on returns from a task initiated with
the IRET instruction. The flag can be explicitly set or cleared with the POPF/POPFD
instructions; however, changing the state of this flag can generate unexpected exceptions
in application programs.
See also: Section 6.4, “Task Linking.”
RF Resume (bit 16) — Controls the processor’s response to instruction-breakpoint conditions.
When set, this flag temporarily disables debug exceptions (#DB) from being
generated for instruction breakpoints (although other exception conditions can
cause an exception to be generated). When clear, instruction breakpoints will
generate debug exceptions.
The primary function of the RF flag is to allow the restarting of an instruction following
a debug exception that was caused by an instruction breakpoint condition. Here, debug
software must set this flag in the EFLAGS image on the stack just prior to returning to
the interrupted program with IRETD (to prevent the instruction breakpoint from
causing another debug exception). The processor then automatically clears this flag
after the instruction returned to has been successfully executed, enabling instruction
breakpoint faults again.
See also: Section 18.3.1.1, “Instruction-Breakpoint Exception Condition.”
VM Virtual-8086 mode (bit 17) — Set to enable virtual-8086 mode; clear to return to
protected mode.
See also: Section 15.2.1, “Enabling Virtual-8086 Mode.”
AC Alignment check (bit 18) — Set this flag and the AM flag in control register CR0 to
enable alignment checking of memory references; clear the AC flag and/or the AM flag
to disable alignment checking. An alignment-check exception is generated when refer-
ence is made to an unaligned operand, such as a word at an odd byte address or a
doubleword at an address which is not an integral multiple of four. Alignment-check
exceptions are generated only in user mode (privilege level 3). Memory references that
default to privilege level 0, such as segment descriptor loads, do not generate this
exception even when caused by instructions executed in user-mode.
The alignment-check exception can be used to check alignment of data. This is useful
when exchanging data with processors which require all data to be aligned. The align-
ment-check exception can also be used by interpreters to flag some pointers as special
by misaligning the pointer. This eliminates overhead of checking each pointer and only
handles the special pointer when used.
VIF Virtual Interrupt (bit 19) — Contains a virtual image of the IF flag. This flag is used
in conjunction with the VIP flag. The processor only recognizes the VIF flag when
either the VME flag or the PVI flag in control register CR4 is set and the IOPL is less
than 3. (The VME flag enables the virtual-8086 mode extensions; the PVI flag enables
the protected-mode virtual interrupts.)
See also: Section 15.3.3.5, “Method 6: Software Interrupt Handling,” and Section 15.4,
“Protected-Mode Virtual Interrupts.”
VIP Virtual interrupt pending (bit 20) — Set by software to indicate that an interrupt is
pending; cleared to indicate that no interrupt is pending. This flag is used in conjunction
with the VIF flag. The processor reads this flag but never modifies it. The processor
only recognizes the VIP flag when either the VME flag or the PVI flag in control
register CR4 is set and the IOPL is less than 3. The VME flag enables the virtual-8086
mode extensions; the PVI flag enables the protected-mode virtual interrupts.
See Section 15.3.3.5, “Method 6: Software Interrupt Handling,” and Section 15.4,
“Protected-Mode Virtual Interrupts.”
ID Identification (bit 21) — The ability of a program or procedure to set or clear this flag
indicates support for the CPUID instruction.
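The bit positions given for the system flags translate directly into masks. The sketch below defines them and shows the CPL-versus-IOPL comparison described for the IOPL field; it deliberately ignores the I/O-permission bitmap, and the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EFLAGS_TF         (1u << 8)
#define EFLAGS_IF         (1u << 9)
#define EFLAGS_IOPL_SHIFT 12
#define EFLAGS_IOPL_MASK  (3u << EFLAGS_IOPL_SHIFT)   /* bits 12 and 13 */
#define EFLAGS_NT         (1u << 14)
#define EFLAGS_RF         (1u << 16)
#define EFLAGS_VM         (1u << 17)
#define EFLAGS_AC         (1u << 18)
#define EFLAGS_VIF        (1u << 19)
#define EFLAGS_VIP        (1u << 20)
#define EFLAGS_ID         (1u << 21)

/* Extracts the two-bit I/O privilege level from an EFLAGS image. */
static unsigned eflags_iopl(uint32_t eflags)
{
    return (eflags & EFLAGS_IOPL_MASK) >> EFLAGS_IOPL_SHIFT;
}

/* True when the CPL is numerically less than or equal to the IOPL. */
static bool io_access_permitted(unsigned cpl, uint32_t eflags)
{
    assert(cpl <= 3);
    return cpl <= eflags_iopl(eflags);
}
```

Because lower numbers mean higher privilege, an IOPL of 0 restricts this check to privilege level 0 code, while an IOPL of 3 lets every level pass.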
2.3.1 System Flags and Fields in IA-32e Mode
In 64-bit mode, the RFLAGS register expands to 64 bits with the upper 32 bits reserved. System
flags in RFLAGS (64-bit mode) or EFLAGS (compatibility mode) are shown in Figure 2-4.
In IA-32e mode, the processor does not allow the VM bit to be set because virtual-8086 mode
is not supported (attempts to set the bit are ignored). Also, the processor will not set the NT bit.
The processor does, however, allow software to set the NT bit (note that an IRET causes a
general protection fault in IA-32e mode if the NT bit is set).
In IA-32e mode, the SYSCALL/SYSRET instructions have a programmable method of specifying which bits are cleared in RFLAGS/EFLAGS. These instructions save/restore
EFLAGS/RFLAGS.
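The programmable clearing of flags on SYSCALL can be illustrated with a one-line mask operation. This sketch assumes that each bit set in the flag-mask MSR clears the corresponding RFLAGS bit and that the prior RFLAGS value is saved for the later SYSRET; the function and parameter names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Models the flag handling on SYSCALL: stores the current RFLAGS in *saved
 * and returns the value left in RFLAGS after the masked bits are cleared. */
static uint64_t syscall_mask_rflags(uint64_t rflags, uint64_t flag_mask,
                                    uint64_t *saved)
{
    assert(saved != NULL);
    *saved = rflags;
    return rflags & ~flag_mask;
}
```

An operating system that wants its system-call entry point to run with interrupts disabled would include the IF bit (bit 9) in the mask.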
2.4 MEMORY-MANAGEMENT REGISTERS
The processor provides four memory-management registers (GDTR, LDTR, IDTR, and TR)
that specify the locations of the data structures which control segmented memory management
(see Figure 2-5). Special instructions are provided for loading and storing these registers.
[Figure 2-5: register layouts. GDTR and IDTR each hold a 32(64)-bit linear base address in bits 47(79):16 and a 16-bit table limit in bits 15:0. The task register and LDTR each hold a 16-bit segment selector together with a 32(64)-bit linear base address, a segment limit, and attributes.]
Figure 2-5. Memory Management Registers
2.4.1 Global Descriptor Table Register (GDTR)
The GDTR register holds the base address (32 bits in protected mode; 64 bits in IA-32e mode)
and the 16-bit table limit for the GDT. The base address specifies the linear address of byte 0 of
the GDT; the table limit specifies the number of bytes in the table.
The LGDT and SGDT instructions load and store the GDTR register, respectively. On power up
or reset of the processor, the base address is set to the default value of 0 and the limit is set to
0FFFFH. A new base address must be loaded into the GDTR as part of the processor initialization process for protected-mode operation.
See also: Section 3.5.1, “Segment Descriptor Tables.”
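The value loaded and stored by LGDT and SGDT in protected mode can be sketched as a 6-byte structure: a 16-bit limit followed by a 32-bit base. The example assumes, as Section 3.5.1 describes, that the limit is the offset of the last valid byte, so a table of N eight-byte descriptors has a limit of 8N - 1. Names are hypothetical, and `#pragma pack` is a compiler extension.

```c
#include <assert.h>
#include <stdint.h>

#pragma pack(push, 1)
struct gdtr32 {
    uint16_t limit;   /* offset of the last valid byte in the GDT */
    uint32_t base;    /* linear address of byte 0 of the GDT */
};
#pragma pack(pop)

static_assert(sizeof(struct gdtr32) == 6, "GDTR image is 6 bytes");

/* Number of complete 8-byte descriptors covered by a limit value. */
static unsigned gdt_descriptor_count(uint16_t limit)
{
    return ((unsigned)limit + 1) / 8;
}

/* Limit value for a table holding `count` descriptors. */
static uint16_t gdt_limit_for(unsigned count)
{
    assert(count >= 1 && count <= 8192);
    return (uint16_t)(count * 8 - 1);
}
```

The reset limit of 0FFFFH therefore corresponds to the maximum of 8192 descriptors.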
2.4.2 Local Descriptor Table Register (LDTR)
The LDTR register holds the 16-bit segment selector, base address (32 bits in protected mode;
64 bits in IA-32e mode), segment limit, and descriptor attributes for the LDT. The base address
specifies the linear address of byte 0 of the LDT segment; the segment limit specifies the number
of bytes in the segment. See also: Section 3.5.1, “Segment Descriptor Tables.”
The LLDT and SLDT instructions load and store the segment selector part of the LDTR register,
respectively. The segment that contains the LDT must have a segment descriptor in the GDT.
When the LLDT instruction loads a segment selector in the LDTR, the base address, limit, and
descriptor attributes from the LDT descriptor are automatically loaded in the LDTR.
When a task switch occurs, the LDTR is automatically loaded with the segment selector and
descriptor for the LDT for the new task. The contents of the LDTR are not automatically saved
prior to writing the new LDT information into the register.
On power up or reset of the processor, the segment selector and base address are set to the
default value of 0 and the limit is set to 0FFFFH.
2.4.3 IDTR Interrupt Descriptor Table Register
The IDTR register holds the base address (32 bits in protected mode; 64 bits in IA-32e mode)
and 16-bit table limit for the IDT. The base address specifies the linear address of byte 0 of the
IDT; the table limit specifies the number of bytes in the table. The LIDT and SIDT instructions
load and store the IDTR register, respectively. On power up or reset of the processor, the base
address is set to the default value of 0 and the limit is set to 0FFFFH. The base address and limit
in the register can then be changed as part of the processor initialization process.
See also: Section 5.10, “Interrupt Descriptor Table (IDT).”
2.4.4 Task Register (TR)
The task register holds the 16-bit segment selector, base address (32 bits in protected mode; 64
bits in IA-32e mode), segment limit, and descriptor attributes for the TSS of the current task.
The selector references the TSS descriptor in the GDT. The base address specifies the linear
address of byte 0 of the TSS; the segment limit specifies the number of bytes in the TSS. See
also: Section 6.2.4, “Task Register.”
The LTR and STR instructions load and store the segment selector part of the task register,
respectively. When the LTR instruction loads a segment selector in the task register, the base
address, limit, and descriptor attributes from the TSS descriptor are automatically loaded into
the task register. On power up or reset of the processor, the base address is set to the default
value of 0 and the limit is set to 0FFFFH.
When a task switch occurs, the task register is automatically loaded with the segment selector
and descriptor for the TSS for the new task. The contents of the task register are not automatically saved prior to writing the new TSS information into the register.
2.5 CONTROL REGISTERS
Control registers (CR0, CR1, CR2, CR3, and CR4; see Figure 2-6) determine the operating mode
of the processor and the characteristics of the currently executing task. These registers are 32
bits in all 32-bit modes and compatibility mode.
In 64-bit mode, control registers are expanded to 64 bits. The MOV CRn instructions are used
to manipulate the register bits. Operand-size prefixes for these instructions are ignored. The
following is also true:
• Bits 63:32 of CR0 and CR4 are reserved and must be written with zeros. Writing a nonzero
value to any of the upper 32 bits results in a general-protection exception, #GP(0).
• All 64 bits of CR2 are writable by software.
• Bits 51:40 of CR3 are reserved and must be 0.
• The MOV CRn instructions do not check that addresses written to CR2 and CR3 are within
the linear-address or physical-address limitations of the implementation.
• Register CR8 is available in 64-bit mode only.
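The reserved-bit rules in this list can be expressed as simple mask tests. The following sketch checks a candidate value before it would be written; it models only the rules stated above, and the function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits 51:40 of CR3, which the list above describes as reserved. */
#define CR3_RESERVED_51_40 (0xFFFull << 40)

/* True if writing `value` to CR0 or CR4 in 64-bit mode would raise #GP(0)
 * because one of the upper 32 bits is nonzero. */
static bool cr0_cr4_upper_bits_fault(uint64_t value)
{
    return (value >> 32) != 0;
}

/* True if bits 51:40 of a candidate CR3 value are all zero. */
static bool cr3_reserved_bits_clear(uint64_t cr3)
{
    return (cr3 & CR3_RESERVED_51_40) == 0;
}
```

System software can run checks of this kind on values derived from untrusted input before executing the MOV CRn instruction.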
The control registers are summarized below, and each architecturally defined control field in
these control registers is described individually. In Figure 2-6, the width of the register in
64-bit mode is indicated in parentheses (except for CR0).
• CR0 — Contains system control flags that control operating mode and states of the
processor.
• CR1 — Reserved.
• CR2 — Contains the page-fault linear address (the linear address that caused a page fault).
• CR3 — Contains the physical address of the base of the page directory and two flags (PCD
and PWT). This register is also known as the page-directory base register (PDBR). Only
the most-significant bits (less the lower 12 bits) of the base address are specified; the lower
12 bits of the address are assumed to be 0. The page directory must thus be aligned to a
page (4-KByte) boundary. The PCD and PWT flags control caching of the page directory
in the processor’s internal data caches (they do not control TLB caching of page-directory
information).
When using the physical address extension, the CR3 register contains the base address of
the page-directory-pointer table. In IA-32e mode, the CR3 register contains the base
address of the PML4 table.
See also: Section 3.8, “36-Bit Physical Addressing Using the PAE Paging Mechanism.”
• CR4 — Contains a group of flags that enable several architectural extensions, and indicate
operating system or executive support for specific processor capabilities. The control
registers can be read and loaded (or modified) using the move-to-or-from-control-registers
forms of the MOV instruction. In protected mode, the MOV instructions allow the control
registers to be read or loaded (at privilege level 0 only). This restriction means that
application programs or operating-system procedures (running at privilege levels 1, 2, or
3) are prevented from reading or loading the control registers.
• CR8 — Provides read and write access to the Task Priority Register (TPR). It specifies the
priority threshold value that operating systems use to control the priority class of external
interrupts allowed to interrupt the processor. This register is available only in 64-bit mode.
However, interrupt filtering continues to apply in compatibility mode.
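The CR3 description in the list above (a page-aligned base address plus the PCD and PWT flags) can be sketched as follows. The example assumes the 32-bit format without the physical address extension, with PWT at bit 3 and PCD at bit 4; the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR3_PWT       (1u << 3)       /* page-level write-through */
#define CR3_PCD       (1u << 4)       /* page-level cache disable */
#define CR3_BASE_MASK 0xFFFFF000u     /* upper 20 bits of the base address */

/* Returns the page-directory base; the low 12 bits are taken to be 0. */
static uint32_t cr3_page_directory_base(uint32_t cr3)
{
    return cr3 & CR3_BASE_MASK;
}

/* Builds a CR3 value from a 4-KByte-aligned base and the two flags. */
static uint32_t make_cr3(uint32_t pd_base, bool pcd, bool pwt)
{
    assert((pd_base & ~CR3_BASE_MASK) == 0);   /* must be page aligned */
    return pd_base | (pcd ? CR3_PCD : 0u) | (pwt ? CR3_PWT : 0u);
}
```

Because each task can have its own page directory, a task switch amounts to loading a different value of this form into CR3.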
[Figure 2-6: control register layouts, each 32 bits wide (64 bits in 64-bit mode). CR4: flag bits including OSXMMEXCPT and OSFXSR; remaining bits reserved (set to 0). CR3 (PDBR): Page-Directory Base, PCD, PWT. CR2: Page-Fault Linear Address. CR1: reserved. CR0: PG (bit 31), CD (bit 30), NW (bit 29), AM (bit 18), WP (bit 16), NE (bit 5), ET (bit 4), TS (bit 3), EM (bit 2), MP (bit 1), PE (bit 0); remaining bits reserved.]
Figure 2-6. Control Registers
When loading a control register, reserved bits should always be set to the values previously read.
The flags in control registers are:
PG Paging (bit 31 of CR0) — Enables paging when set; disables paging when clear.
When paging is disabled, all linear addresses are treated as physical addresses. The PG
flag has no effect if the PE flag (bit 0 of register CR0) is not also set; setting the PG
flag when the PE flag is clear causes a general-protection exception (#GP). See also:
Section 3.6, “Paging (Virtual Memory) Overview.”
On IA-32 processors that support Intel® EM64T, enabling and disabling IA-32e mode
operation also requires modifying CR0.PG.
CD Cache Disable (bit 30 of CR0) — When the CD and NW flags are clear, caching of
memory locations for the whole of physical memory in the processor’s internal (and
external) caches is enabled. When the CD flag is set, caching is restricted as described
in Table 10-5. To prevent the processor from accessing and updating its caches, the CD
flag must be set and the caches must be invalidated so that no cache hits can occur.
See also: Section 10.5.3, “Preventing Caching,” and Section 10.5, “Cache Control.”
NW Not Write-through (bit 29 of CR0) — When the NW and CD flags are clear, write-back
(for Pentium 4, Intel Xeon, P6 family, and Pentium processors) or write-through
(for Intel486 processors) is enabled for writes that hit the cache and invalidation cycles
are enabled. See Table 10-5 for detailed information about the effect of the NW flag on
caching for other settings of the CD and NW flags.
AM Alignment Mask (bit 18 of CR0) — Enables automatic alignment checking when set;
disables alignment checking when clear. Alignment checking is performed only when
the AM flag is set, the AC flag in the EFLAGS register is set, CPL is 3, and the
processor is operating in either protected or virtual-8086 mode.
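The conditions under which alignment checking is performed can be combined into a single predicate. This sketch covers the AM flag, the AC flag, and the CPL, and leaves out the operating-mode condition; the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR0_AM    (1u << 18)   /* alignment mask, CR0 bit 18 */
#define EFLAGS_AC (1u << 18)   /* alignment check, EFLAGS bit 18 */

/* True when AM and AC are both set and the code runs at privilege level 3. */
static bool alignment_check_active(uint32_t cr0, uint32_t eflags, unsigned cpl)
{
    assert(cpl <= 3);
    return (cr0 & CR0_AM) && (eflags & EFLAGS_AC) && cpl == 3;
}

/* True when `address` is not a multiple of a power-of-two operand size. */
static bool is_misaligned(uint32_t address, unsigned operand_size)
{
    assert(operand_size != 0 && (operand_size & (operand_size - 1)) == 0);
    return (address & (operand_size - 1)) != 0;
}
```

A memory reference would be expected to raise an alignment-check exception only when both functions return true for it.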
WP Write Protect (bit 16 of CR0) — Inhibits supervisor-level procedures from writing
into user-level read-only pages when set; allows supervisor-level procedures to write
into user-level read-only pages when clear. This flag facilitates implementation of the
copy-on-write method of creating a new process (forking) used by operating systems
such as UNIX*.
NE Numeric Error (bit 5 of CR0) — Enables the native (internal) mechanism for
reporting x87 FPU errors when set; enables the PC-style x87 FPU error reporting
mechanism when clear. When the NE flag is clear and the IGNNE# input is asserted,
x87 FPU errors are ignored. When the NE flag is clear and the IGNNE# input is deas-
serted, an unmasked x87 FPU error causes the processor to assert the FERR# pin to
generate an external interrupt and to stop instruction execution immediately before
executing the next waiting floating-point instruction or WAIT/FWAIT instruction.
The FERR# pin is intended to drive an input to an external interrupt controller (the
FERR# pin emulates the ERROR# pin of the Intel 287 and Intel 387 DX math coprocessors).
The NE flag, IGNNE# pin, and FERR# pin are used with external logic to
implement PC-style error reporting.
See also: “Software Exception Handling” in Chapter 8, “Programming with the x87
FPU,” and Appendix A, “Eflags Cross-Reference,” in the IA-32 Intel® Architecture
Software Developer’s Manual, Volume 1.
ET Extension Type (bit 4 of CR0) — Reserved in the Pentium 4, Intel Xeon, P6 family,
and Pentium processors. In the Pentium 4, Intel Xeon, and P6 family processors, this
flag is hardcoded to 1. In the Intel386 and Intel486 processors, this flag indicates
support of Intel 387 DX math coprocessor instructions when set.
TS Task Switched (bit 3 of CR0) — Allows the saving of the x87 FPU/MMX/SSE/SSE2/
SSE3 context on a task switch to be delayed until an x87 FPU/MMX/SSE/SSE2/SSE3
instruction is actually executed by the new task. The processor sets this flag on every
task switch and tests it when executing x87 FPU/MMX/SSE/SSE2/SSE3 instructions.
• If the TS flag is set and the EM flag (bit 2 of CR0) is clear, a device-not-available
exception (#NM) is raised prior to the execution of any x87 FPU/MMX/SSE/
SSE2/SSE3 instruction, with the exception of PAUSE, PREFETCHh, SFENCE,
LFENCE, MFENCE, MOVNTI, and CLFLUSH. See the paragraph below for the
special case of the WAIT/FWAIT instructions.
• If the TS flag is set and the MP flag (bit 1 of CR0) and EM flag are clear, an #NM
exception is not raised prior to the execution of an x87 FPU WAIT/FWAIT
instruction.
• If the EM flag is set, the setting of the TS flag has no effect on the execution of
x87 FPU/MMX/SSE/SSE2/SSE3 instructions.
Table 2-1 shows the actions taken when the processor encounters an x87 FPU instruction based on the settings of the TS, EM, and MP flags. Tables 11-1 and 12-1 show the
actions taken when the processor encounters an MMX/SSE/SSE2/SSE3 instruction.
The processor does not automatically save the context of the x87 FPU, XMM, and
MXCSR registers on a task switch. Instead, it sets the TS flag, which causes the
processor to raise an #NM exception whenever it encounters an x87 FPU/MMX/SSE/
SSE2/SSE3 instruction in the instruction stream for the new task (with the exception
of the instructions listed above).
The fault handler for the #NM exception can then be used to clear the TS flag (with the CLTS
instruction) and save the context of the x87 FPU, XMM, and MXCSR registers. If the task never
encounters an x87 FPU/MMX/SSE/SSE2/SSE3 instruction, the x87 FPU/MMX/SSE/SSE2/
SSE3 context is never saved.
Table 2-1. Action Taken By x87 FPU Instructions for Different
EM Emulation (bit 2 of CR0) — Indicates that the processor does not have an internal or
external x87 FPU when set; indicates an x87 FPU is present when clear. This flag also
affects the execution of MMX/SSE/SSE2/SSE3 instructions.
When the EM flag is set, execution of an x87 FPU instruction generates a device-not-available exception (#NM). This flag must be set when the processor does not have an
internal x87 FPU or is not connected to an external math coprocessor. Setting this flag
forces all floating-point instructions to be handled by software emulation. Table 9-2
shows the recommended setting of this flag, depending on the IA-32 processor and x87
FPU or math coprocessor present in the system. Table 2-1 shows the interaction of the
EM, MP, and TS flags.
Also, when the EM flag is set, execution of an MMX instruction causes an invalid-
opcode exception (#UD) to be generated (see Table 11-1). Thus, if an IA-32 processor
incorporates MMX technology, the EM flag must be set to 0 to enable execution of
MMX instructions.
Similarly for SSE/SSE2/SSE3 extensions, when the EM flag is set, execution of most
SSE/SSE2/SSE3 instructions causes an invalid-opcode exception (#UD) to be generated
(see Table 12-1). If an IA-32 processor incorporates the SSE/SSE2/SSE3 extensions,
the EM flag must be set to 0 to enable execution of these extensions.
SSE/SSE2/SSE3 instructions not affected by the EM flag include: PAUSE,
PREFETCHh, SFENCE, LFENCE, MFENCE, MOVNTI, and CLFLUSH.
MP Monitor Coprocessor (bit 1 of CR0) — Controls the interaction of the WAIT (or
FWAIT) instruction with the TS flag (bit 3 of CR0). If the MP flag is set, a WAIT
instruction generates a device-not-available exception (#NM) if the TS flag is also set.
If the MP flag is clear, the WAIT instruction ignores the setting of the TS flag. Table 9-2
shows the recommended setting of this flag, depending on the IA-32 processor and x87
FPU or math coprocessor present in the system. Table 2-1 shows the interaction of the
MP, EM, and TS flags.
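The interaction of the TS, EM, and MP flags described in the three entries above can be summarized as a small decision function. The following C sketch is illustrative only: the bit positions come from the flag descriptions, while the function and enumerator names (x87_fp_action, wait_action, ACT_EXECUTE, ACT_NM) are hypothetical and model the decision in software rather than reading the real CR0.

```c
#include <stdint.h>

/* CR0 bit positions as given in the flag descriptions above. */
#define CR0_MP (1u << 1)  /* Monitor Coprocessor */
#define CR0_EM (1u << 2)  /* Emulation */
#define CR0_TS (1u << 3)  /* Task Switched */

/* Hypothetical result type: execute normally or raise #NM. */
enum fpu_action { ACT_EXECUTE, ACT_NM };

/* x87 FPU floating-point instruction (anything except WAIT/FWAIT):
 * EM set always gives #NM; with EM clear, #NM is raised only if TS is set. */
enum fpu_action x87_fp_action(uint32_t cr0)
{
    return (cr0 & (CR0_EM | CR0_TS)) ? ACT_NM : ACT_EXECUTE;
}

/* WAIT/FWAIT: #NM only when both MP and TS are set; EM is not consulted. */
enum fpu_action wait_action(uint32_t cr0)
{
    return ((cr0 & CR0_MP) && (cr0 & CR0_TS)) ? ACT_NM : ACT_EXECUTE;
}
```

For example, with only TS set, x87_fp_action() reports #NM while wait_action() lets WAIT/FWAIT execute, which matches the special case called out for the MP flag.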
PE Protection Enable (bit 0 of CR0) — Enables protected mode when set; enables real-
address mode when clear. This flag does not enable paging directly. It only enables
segment-level protection. To enable paging, both the PE and PG flags must be set.
See also: Section 9.9, “Mode Switching.”
PCD Page-level Cache Disable (bit 4 of CR3) — Controls caching of the current page
directory. When the PCD flag is set, caching of the page-directory is prevented; when
the flag is clear, the page-directory can be cached. This flag affects only the processor’s
internal caches (both L1 and L2, when present). The processor ignores this flag if
paging is not used (the PG flag in register CR0 is clear) or the CD (cache disable) flag
in CR0 is set.
See also: Chapter 10, “Memory Cache Control” (for more about the use of the PCD
flag) and Section 3.7.6, “Page-Directory and Page-Table Entries” (for a description of
a companion PCD flag in page-directory and page-table entries).
PWT Page-level Writes Transparent (bit 3 of CR3) — Controls the write-through or
write-back caching policy of the current page directory. When the PWT flag is set,
write-through caching is enabled; when the flag is clear, write-back caching is enabled. This
flag affects only internal caches (both L1 and L2, when present). The processor ignores
this flag if paging is not used (the PG flag in register CR0 is clear) or the CD (cache
disable) flag in CR0 is set.
See also: Section 10.5, “Cache Control” (for more information about the use of this
flag), and Section 3.7.6, “Page-Directory and Page-Table Entries” (for a description of
a companion PWT flag in the page-directory and page-table entries).
VME Virtual-8086 Mode Extensions (bit 0 of CR4) — Enables interrupt- and exception-handling
extensions in virtual-8086 mode when set; disables the extensions when clear.
Use of the virtual mode extensions can improve the performance of virtual-8086 applications by eliminating the overhead of calling the virtual-8086 monitor to handle interrupts and exceptions that occur while executing an 8086 program and, instead,
redirecting the interrupts and exceptions back to the 8086 program’s handlers. It also
provides hardware support for a virtual interrupt flag (VIF) to improve reliability of
running 8086 programs in multitasking and multiple-processor environments.
See also: Section 15.3, “Interrupt and Exception Handling in Virtual-8086 Mode.”
PVI Protected-Mode Virtual Interrupts (bit 1 of CR4) — Enables hardware support for
a virtual interrupt flag (VIF) in protected mode when set; disables the VIF flag in
protected mode when clear.
See also: Section 15.4, “Protected-Mode Virtual Interrupts.”
TSD Time Stamp Disable (bit 2 of CR4) — Restricts the execution of the RDTSC instruction
to procedures running at privilege level 0 when set; allows the RDTSC instruction to
be executed at any privilege level when clear.
DE Debugging Extensions (bit 3 of CR4) — References to debug registers DR4 and DR5
cause an undefined opcode (#UD) exception to be generated when set; when clear, the
processor aliases references to registers DR4 and DR5 for compatibility with software
written to run on earlier IA-32 processors.
See also: Section 18.2.2, “Debug Registers DR4 and DR5.”
PSE Page Size Extensions (bit 4 of CR4) — Enables 4-MByte pages when set; restricts
pages to 4 KBytes when clear.
See also: Section 3.6.1, “Paging Options.”
PAE Physical Address Extension (bit 5 of CR4) — When set, enables the paging mechanism
to reference physical addresses of 36 bits or more. When clear, restricts
physical addresses to 32 bits. PAE must be enabled to enable IA-32e mode operation.
Enabling and disabling IA-32e mode operation also requires modifying CR4.PAE.
See also: Section 3.8, “36-Bit Physical Addressing Using the PAE Paging
Mechanism.”
MCE Machine-Check Enable (bit 6 of CR4) — Enables the machine-check exception
when set; disables the machine-check exception when clear.
See also: Chapter 14, “Machine-Check Architecture.”
PGE Page Global Enable (bit 7 of CR4) — (Introduced in the P6 family processors.)
Enables the global page feature when set; disables the global page feature when clear.
The global page feature allows frequently used or shared pages to be marked as global
to all users (done with the global flag, bit 8, in a page-directory or page-table entry).
Global pages are not flushed from the translation-lookaside buffer (TLB) on a task
switch or a write to register CR3.
When enabling the global page feature, paging must be enabled (by setting the PG flag
in control register CR0) before the PGE flag is set. Reversing this sequence may affect
program correctness, and processor performance will be impacted.
See also: Section 3.12, “Translation Lookaside Buffers (TLBs).”
PCE Performance-Monitoring Counter Enable (bit 8 of CR4) — Enables execution of
the RDPMC instruction for programs or procedures running at any protection level
when set; the RDPMC instruction can be executed only at protection level 0 when clear.
OSFXSR
Operating System Support for FXSAVE and FXRSTOR instructions (bit 9 of
CR4) — When set, this flag: (1) indicates to software that the operating system
supports the use of the FXSAVE and FXRSTOR instructions, (2) enables the FXSAVE
and FXRSTOR instructions to save and restore the contents of the XMM and MXCSR
registers along with the contents of the x87 FPU and MMX registers, and (3) enables
the processor to execute SSE/SSE2/SSE3 instructions, with the exception of PAUSE,
PREFETCHh, SFENCE, LFENCE, MFENCE, MOVNTI, and CLFLUSH.
If this flag is clear, the FXSAVE and FXRSTOR instructions will save and restore the
contents of the x87 FPU and MMX registers, but they may not save and restore the
contents of the XMM and MXCSR registers. Also, the processor will generate an
invalid opcode exception (#UD) if it attempts to execute any SSE/SSE2/SSE3 instruction,
with the exception of PAUSE, PREFETCHh, SFENCE, LFENCE, MFENCE,
MOVNTI, and CLFLUSH. The operating system or executive must explicitly set this
flag.
NOTE
CPUID feature flags FXSR, SSE, SSE2, and SSE3 indicate availability
of the FXSAVE/FXRSTOR instructions, SSE extensions, SSE2
extensions, and SSE3 extensions respectively. The OSFXSR bit
provides operating system software with a means of enabling these
features and indicating that the operating system supports the features.
OSXMMEXCPT
Operating System Support for Unmasked SIMD Floating-Point Exceptions (bit 10
of CR4) — When set, indicates that the operating system supports the handling of
unmasked SIMD floating-point exceptions through an exception handler that is invoked
when a SIMD floating-point exception (#XF) is generated. SIMD floating-point exceptions
are only generated by SSE/SSE2/SSE3 SIMD floating-point instructions.
The operating system or executive must explicitly set this flag. If this flag is not set, the
processor will generate an invalid opcode exception (#UD) whenever it detects an
unmasked SIMD floating-point exception.
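The combined effect of the EM and TS flags in CR0 and the OSFXSR and OSXMMEXCPT flags in CR4 on an SSE/SSE2/SSE3 instruction can be sketched as follows. This is a simplified software model, not a definitive statement of processor behavior: the names (sse_instruction_result, SSE_*) are hypothetical, the check order (#UD before #NM) is an assumption, and the instructions listed as exceptions above (PAUSE, PREFETCHh, and so on) are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_EM          (1u << 2)
#define CR0_TS          (1u << 3)
#define CR4_OSFXSR      (1u << 9)
#define CR4_OSXMMEXCPT  (1u << 10)

/* Hypothetical outcome of attempting one SSE/SSE2/SSE3 instruction. */
enum sse_result { SSE_EXECUTE, SSE_UD, SSE_NM, SSE_XF };

enum sse_result sse_instruction_result(uint32_t cr0, uint32_t cr4,
                                       bool unmasked_simd_fp_exception)
{
    if (cr0 & CR0_EM)
        return SSE_UD;              /* EM set: invalid opcode */
    if (!(cr4 & CR4_OSFXSR))
        return SSE_UD;              /* OS has not enabled SSE support */
    if (cr0 & CR0_TS)
        return SSE_NM;              /* context switch pending: #NM */
    if (unmasked_simd_fp_exception) /* instruction hit an unmasked SIMD FP fault */
        return (cr4 & CR4_OSXMMEXCPT) ? SSE_XF : SSE_UD;
    return SSE_EXECUTE;
}
```

The last branch reflects the OSXMMEXCPT description: without that flag, an unmasked SIMD floating-point exception is reported as #UD instead of #XF.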
TPL Task Priority Level (bits 3:0 of CR8) — Sets the threshold value corresponding
to the highest-priority interrupt to be blocked. A value of 0 means all interrupts are
enabled; a value of 15 means all interrupts are disabled. This field is available in
64-bit mode.
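As a rough illustration of the threshold behavior, the sketch below decides whether an external interrupt is held off by a given TPL value. It assumes the local APIC convention that an interrupt's priority class is bits 7:4 of its vector, and that class 0 vectors are not used for external interrupts; neither detail is stated in this section. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if an interrupt with the given vector is blocked by the
 * task priority level held in CR8[3:0].
 * Assumption: priority class = vector bits 7:4 (local APIC convention). */
bool interrupt_blocked_by_tpl(uint8_t tpl, uint8_t vector)
{
    uint8_t priority_class = (uint8_t)(vector >> 4);
    return priority_class <= (tpl & 0x0Fu);
}
```

Under these assumptions a TPL of 0 blocks no usable external vector, and a TPL of 15 blocks every vector, which agrees with the two boundary cases given above.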
2.5.1 CPUID Qualification of Control Register Flags
The VME, PVI, TSD, DE, PSE, PAE, MCE, PGE, PCE, OSFXSR, and OSXMMEXCPT flags
in control register CR4 are model specific. All of these flags (except the PCE flag) can be qualified with the CPUID instruction to determine if they are implemented on the processor before
they are used.
The CR8 register is available on processors that support Intel EM64T. Support for Intel EM64T
can be determined using CPUID.
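One way to apply this qualification is to derive, from the CPUID feature flags, the set of CR4 flags that software may safely set. The sketch below takes the EDX value returned by CPUID with EAX = 1 as a plain argument, so it does not execute CPUID itself. The feature-bit positions and the pairing of feature flags with CR4 flags are taken from the CPUID instruction description rather than from this section, and should be treated as assumptions here; the PCE flag is left out because it cannot be qualified this way. All identifiers are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Feature flags returned in EDX by CPUID with EAX = 1 (assumed positions). */
#define CPUID_EDX_VME   (1u << 1)
#define CPUID_EDX_DE    (1u << 2)
#define CPUID_EDX_PSE   (1u << 3)
#define CPUID_EDX_TSC   (1u << 4)
#define CPUID_EDX_PAE   (1u << 6)
#define CPUID_EDX_MCE   (1u << 7)
#define CPUID_EDX_PGE   (1u << 13)
#define CPUID_EDX_FXSR  (1u << 24)
#define CPUID_EDX_SSE   (1u << 25)

/* CR4 flag positions as given in Section 2.5. */
#define CR4_VME         (1u << 0)
#define CR4_PVI         (1u << 1)
#define CR4_TSD         (1u << 2)
#define CR4_DE          (1u << 3)
#define CR4_PSE         (1u << 4)
#define CR4_PAE         (1u << 5)
#define CR4_MCE         (1u << 6)
#define CR4_PGE         (1u << 7)
#define CR4_OSFXSR      (1u << 9)
#define CR4_OSXMMEXCPT  (1u << 10)

struct cr4_qualifier {
    uint32_t cpuid_edx_bit;  /* feature flag that must be reported */
    uint32_t cr4_bits;       /* CR4 flag(s) it qualifies */
};

static const struct cr4_qualifier cr4_qualifiers[] = {
    { CPUID_EDX_VME,  CR4_VME | CR4_PVI },
    { CPUID_EDX_DE,   CR4_DE },
    { CPUID_EDX_PSE,  CR4_PSE },
    { CPUID_EDX_TSC,  CR4_TSD },
    { CPUID_EDX_PAE,  CR4_PAE },
    { CPUID_EDX_MCE,  CR4_MCE },
    { CPUID_EDX_PGE,  CR4_PGE },
    { CPUID_EDX_FXSR, CR4_OSFXSR },
    { CPUID_EDX_SSE,  CR4_OSXMMEXCPT },
};

/* Returns the mask of CR4 flags qualified by the given CPUID.1:EDX value. */
uint32_t cr4_flags_supported(uint32_t cpuid1_edx)
{
    uint32_t cr4 = 0;
    for (size_t i = 0; i < sizeof cr4_qualifiers / sizeof cr4_qualifiers[0]; i++) {
        if (cpuid1_edx & cr4_qualifiers[i].cpuid_edx_bit)
            cr4 |= cr4_qualifiers[i].cr4_bits;
    }
    return cr4;
}
```

System software would typically mask any value it intends to write to CR4 with the result, so that unimplemented flags are never set.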
2.6 SYSTEM INSTRUCTION SUMMARY
System instructions handle system-level functions such as loading system registers, managing
the cache, managing interrupts, or setting up the debug registers. Many of these instructions can
be executed only by operating-system or executive procedures (that is, procedures running at
privilege level 0). Others can be executed at any privilege level and are thus available to application programs.
Table 2-2 lists the system instructions and indicates whether they are available and useful for
application programs. These instructions are described in Chapter 3 and Chapter 4 of the IA-32
Table 2-2. Summary of System Instructions

Instruction     Description                          Useful to      Protected from
                                                     Application?   Application?
LLDT            Load LDT Register                    No             Yes
SLDT            Store LDT Register                   No             No
LGDT            Load GDT Register                    No             Yes
SGDT            Store GDT Register                   No             No
LTR             Load Task Register                   No             Yes
STR             Store Task Register                  No             No
LIDT            Load IDT Register                    No             Yes
SIDT            Store IDT Register                   No             No
MOV CRn         Load and store control registers     No             Yes
SMSW            Store MSW                            Yes            No
LMSW            Load MSW                             No             Yes
CLTS            Clear TS flag in CR0                 No             Yes
ARPL            Adjust RPL                           Yes (1, 5)     No
LAR             Load Access Rights                   Yes            No
LSL             Load Segment Limit                   Yes            No
VERR            Verify for Reading                   Yes            No
VERW            Verify for Writing                   Yes            No
MOV DRn         Load and store debug registers       No             Yes
INVD            Invalidate cache, no writeback       No             Yes
WBINVD          Invalidate cache, with writeback     No             Yes
INVLPG          Invalidate TLB entry                 No             Yes
HLT             Halt Processor                       No             Yes
LOCK (Prefix)   Bus Lock                             Yes            No
RSM             Return from system management mode   No             Yes
RDMSR (3)
WRMSR
RDPMC
RDTSC
NOTES:
1. Useful to application programs running at a CPL of 1 or 2.
2. The TSD and PCE flags in control register CR4 control access to these instructions by application
3. These instructions were introduced into the IA-32 Architecture with the Pentium processor.
4. This instruction was introduced into the IA-32 Architecture with the Pentium Pro processor and the Pentium processor with MMX technology.
5. This instruction is not supported in 64-bit mode.
The GDTR, LDTR, IDTR, and TR registers each have a load and store instruction for loading
data into and storing data from the register:
• LGDT (Load GDTR Register) — Loads the GDT base address and limit from memory
into the GDTR register.
• SGDT (Store GDTR Register) — Stores the GDT base address and limit from the GDTR
register into memory.
• LIDT (Load IDTR Register) — Loads the IDT base address and limit from memory into
the IDTR register.
• SIDT (Store IDTR Register) — Stores the IDT base address and limit from the IDTR
register into memory.
• LLDT (Load LDT Register) — Loads the LDT segment selector and segment descriptor
from memory into the LDTR. (The segment selector operand can also be located in a
general-purpose register.)
• SLDT (Store LDT Register) — Stores the LDT segment selector from the LDTR register
into memory or a general-purpose register.
• LTR (Load Task Register) — Loads segment selector and segment descriptor for a TSS
from memory into the task register. (The segment selector operand can also be located in a
general-purpose register.)
• STR (Store Task Register) — Stores the segment selector for the current task TSS from
the task register into memory or a general-purpose register.
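The memory operand used by LGDT, SGDT, LIDT, and SIDT holds a base address and a limit. The C sketch below decodes such an operand from a byte buffer, assuming the 6-byte layout used with 32-bit operand size (a 16-bit limit followed by a 32-bit base, both little-endian) and 8-byte descriptors. The layout and the names (table_register, decode_pseudo_descriptor, descriptor_count) are assumptions for illustration and are not defined in this section.

```c
#include <stdint.h>

/* Hypothetical in-memory view of the GDTR or IDTR contents. */
struct table_register {
    uint32_t base;   /* linear base address of the table */
    uint16_t limit;  /* offset of the last valid byte in the table */
};

/* Decodes the 6 bytes that SGDT/SIDT would store (assumed layout:
 * bytes 0-1 = limit, bytes 2-5 = base, little-endian). */
struct table_register decode_pseudo_descriptor(const uint8_t mem[6])
{
    struct table_register r;
    r.limit = (uint16_t)(mem[0] | (mem[1] << 8));
    r.base  = (uint32_t)mem[2]
            | ((uint32_t)mem[3] << 8)
            | ((uint32_t)mem[4] << 16)
            | ((uint32_t)mem[5] << 24);
    return r;
}

/* Number of 8-byte descriptors that fit within the limit. */
unsigned descriptor_count(struct table_register r)
{
    return ((unsigned)r.limit + 1u) / 8u;
}
```

Because the limit is the offset of the last valid byte, a limit of 03FFH describes a 1024-byte table, that is, 128 descriptors.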
The LMSW (load machine status word) and SMSW (store machine status word) instructions
operate on bits 0 through 15 of control register CR0. These instructions are provided for compatibility with the 16-bit Intel 286 processor. Programs written to run on 32-bit IA-32 processors
should not use these instructions. Instead, they should access the control register CR0 using the
MOV instruction.
The CLTS (clear TS flag in CR0) instruction is provided for use in handling a device-not-available exception (#NM) that occurs when the processor attempts to execute a floating-point
instruction when the TS flag is set. This instruction allows the TS flag to be cleared after the x87
FPU context has been saved, preventing further #NM exceptions. See Section 2.5, “Control
Registers,” for more information on the TS flag.
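The deferred context save that the TS flag and CLTS make possible can be modeled in a few lines. The sketch below is a software simulation only: the structure sim_cpu and the functions sim_task_switch, sim_nm_handler, and sim_fpu_instruction are hypothetical names, and the actual saving and restoring of registers (for example with FXSAVE and FXRSTOR) is reduced to a counter.

```c
#include <stdbool.h>

/* Hypothetical model of the state relevant to lazy x87 FPU context saving. */
struct sim_cpu {
    bool ts;           /* models the TS flag in CR0 */
    int  fpu_owner;    /* task whose context is in the FPU registers, -1 if none */
    int  current_task; /* task currently running */
    int  saves;        /* number of context saves performed so far */
};

/* The processor sets TS on every task switch; nothing is saved yet. */
void sim_task_switch(struct sim_cpu *cpu, int new_task)
{
    cpu->current_task = new_task;
    cpu->ts = true;
}

/* Models the #NM handler: clear TS (CLTS), then save the previous owner's
 * context only if a different task now wants the FPU. */
static void sim_nm_handler(struct sim_cpu *cpu)
{
    cpu->ts = false;
    if (cpu->fpu_owner != cpu->current_task) {
        if (cpu->fpu_owner >= 0)
            cpu->saves++;                   /* save old owner's context */
        cpu->fpu_owner = cpu->current_task; /* load or initialize new context */
    }
}

/* Models execution of an x87 FPU instruction by the current task. */
void sim_fpu_instruction(struct sim_cpu *cpu)
{
    if (cpu->ts)
        sim_nm_handler(cpu);  /* device-not-available exception */
    /* the instruction itself would execute here */
}
```

In this model a task that never executes an FPU instruction never causes a save, and switching away from and back to the FPU owner costs only the #NM exception.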
The control registers (CR0, CR1, CR2, CR3, CR4, and CR8) are loaded using the MOV instruction. The instruction loads a control register from a general-purpose register or stores the content
of a control register in a general-purpose register.
2.6.2 Verifying of Access Privileges
The processor provides several instructions for examining segment selectors and segment
descriptors to determine if access to their associated segments is allowed. These instructions
duplicate some of the automatic access rights and type checking done by the processor, thus
allowing operating-system or executive software to prevent exceptions from being generated.
The ARPL (adjust RPL) instruction adjusts the RPL (requestor privilege level) of a segment
selector to match that of the program or procedure that supplied the segment selector. See
Section 4.10.4, “Checking Caller Access Privileges (ARPL Instruction),” for a detailed explanation of the function and use of this instruction. Note that ARPL is not supported in 64-bit
mode.
The LAR (load access rights) instruction verifies the accessibility of a specified segment and
loads access rights information from the segment’s segment descriptor into a general-purpose
register. Software can then examine the access rights to determine if the segment type is compatible with its intended use. See Section 4.10.1, “Checking Access Rights (LAR Instruction),” for
a detailed explanation of the function and use of this instruction.
The LSL (load segment limit) instruction verifies the accessibility of a specified segment and
loads the segment limit from the segment’s segment descriptor into a general-purpose register.
Software can then compare the segment limit with an offset into the segment to determine
whether the offset lies within the segment. See Section 4.10.3, “Checking That the Pointer
Offset Is Within Limits (LSL Instruction),” for a detailed explanation of the function and use of
this instruction.
The VERR (verify for reading) and VERW (verify for writing) instructions verify if a selected
segment is readable or writable, respectively, at a given CPL. See Section 4.10.2, “Checking
Read/Write Rights (VERR and VERW Instructions),” for a detailed explanation of the function
and use of these instructions.
2.6.3 Loading and Storing Debug Registers
Internal debugging facilities in the processor are controlled by a set of 8 debug registers
(DR0-DR7). The MOV instruction allows setup data to be loaded to and stored from these
registers.
On processors that support Intel EM64T, debug registers DR0-DR7 are 64 bits. In 32-bit modes
and compatibility mode, writes to a debug register fill the upper 32 bits with zeros. Reads return
the lower 32 bits. In 64-bit mode, the upper 32 bits of DR6-DR7 are reserved and must be written
with zeros. Writing one to any of the upper 32 bits causes an exception, #GP(0).
In 64-bit mode, MOV DRn instructions read or write all 64 bits of a debug register (operand-size prefixes are ignored). All 64 bits of DR0-DR3 are writable by software. However,
MOV DRn instructions do not check that addresses written to DR0-DR3 are in the limits of the
implementation. Address matching is supported only on valid addresses generated by the
processor implementation.
2.6.4 Invalidating Caches and TLBs
The processor provides several instructions for use in explicitly invalidating its caches and TLB
entries. The INVD (invalidate cache with no writeback) instruction invalidates all data and
instruction entries in the internal caches and sends a signal to the external caches indicating that
they should also be invalidated.
The WBINVD (invalidate cache with writeback) instruction performs the same function as the
INVD instruction, except that it writes back modified lines in its internal caches to memory
before it invalidates the caches. After invalidating the internal caches, WBINVD signals
external caches to write back modified data and invalidate their contents.
The INVLPG (invalidate TLB entry) instruction invalidates (flushes) the TLB entry for a specified page.
2.6.5 Controlling the Processor
The HLT (halt processor) instruction stops the processor until an enabled interrupt (such as NMI
or SMI, which are normally enabled), a debug exception, the BINIT# signal, the INIT# signal,
or the RESET# signal is received. The processor generates a special bus cycle to indicate that
the halt mode has been entered.
Hardware may respond to this signal in a number of ways. An indicator light on the front panel
may be turned on. An NMI interrupt for recording diagnostic information may be generated.
Reset initialization may be invoked (note that the BINIT# pin was introduced with the Pentium
Pro processor). If any non-wake events are pending during shutdown, they will be handled after
the wake event from shutdown is processed (for example, A20M# interrupts).
The LOCK prefix invokes a locked (atomic) read-modify-write operation when modifying a
memory operand. This mechanism is used to allow reliable communications between processors
in multiprocessor systems, as described below:
• In the Pentium processor and earlier IA-32 processors, the LOCK prefix causes the
processor to assert the LOCK# signal during the instruction. This always causes an explicit
bus lock to occur.
• In the Pentium 4, Intel Xeon, and P6 family processors, the locking operation is handled
with either a cache lock or bus lock. If a memory access is cacheable and affects only a
single cache line, a cache lock is invoked and the system bus and the actual memory
location in system memory are not locked during the operation. Here, other Pentium 4,
Intel Xeon, or P6 family processors on the bus write-back any modified data and invalidate
their caches as necessary to maintain system memory coherency. If the memory access is
not cacheable and/or it crosses a cache line boundary, the processor’s LOCK# signal is
asserted and the processor does not respond to requests for bus control during the locked
operation.
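The rule in the second bullet can be expressed as a simple predicate on the address, size, and cacheability of the locked operand. The following C sketch is a model for reasoning about alignment, not a description of the hardware implementation; the names (lock_kind, locked_access_kind) are hypothetical, the cache line size is passed in as a parameter because it is model specific, and the operand size is assumed to be at least 1 byte.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classification of how a locked access is carried out. */
enum lock_kind { CACHE_LOCK, BUS_LOCK };

/* Cacheable accesses confined to one cache line use a cache lock;
 * uncacheable or line-crossing accesses fall back to a bus lock.
 * Assumes size >= 1 and line_size > 0. */
enum lock_kind locked_access_kind(uint64_t addr, size_t size,
                                  bool cacheable, size_t line_size)
{
    uint64_t first_line = addr / line_size;
    uint64_t last_line  = (addr + size - 1) / line_size;

    if (cacheable && first_line == last_line)
        return CACHE_LOCK;
    return BUS_LOCK;
}
```

With an assumed 64-byte line, a 4-byte operand at 1000H stays within one line, while the same operand at 103EH spans two lines and would force a bus lock, which is one reason to keep locked operands naturally aligned.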
The RSM (return from SMM) instruction restores the processor (from a context dump) to the
state it was in prior to a system management mode (SMM) interrupt.
2.6.6 Reading Performance-Monitoring and Time-Stamp Counters
The RDPMC (read performance-monitoring counter) and RDTSC (read time-stamp counter)
instructions allow application programs to read the processor’s performance-monitoring and
time-stamp counters, respectively. Pentium 4 and Intel Xeon processors have eighteen 40-bit
performance-monitoring counters; P6 family processors have two 40-bit counters.
Use these counters to record either the occurrence or duration of events. Events that can be
monitored are model specific; they may include the number of instructions decoded, interrupts
received, or the number of cache loads. Individual counters can be set up to monitor different
events. Use the system instruction WRMSR to set up values in one of the 45 ESCRs and one
of the 18 CCCR MSRs (for Pentium 4 and Intel Xeon processors); or in the PerfEvtSel0 or the
PerfEvtSel1 MSR (for the P6 family processors). The RDPMC instruction loads the current
count from the selected counter into the EDX:EAX registers.
The time-stamp counter is a model-specific 64-bit counter that is reset to zero each time the
processor is reset. If not reset, the counter will increment ~9.5 x 10^16 times per year when
the processor is operating at a clock rate of 3 GHz. At this clock frequency, it would take
over 190 years for the counter to wrap around. The RDTSC instruction loads the current
count of the time-stamp counter into the EDX:EAX registers.
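The wrap-around figures quoted for the time-stamp counter follow from simple arithmetic, which the sketch below reproduces. It assumes a year of 365.25 days and a counter that increments once per clock at the given rate; the function names are hypothetical.

```c
/* Seconds in a year of 365.25 days (assumption made for this estimate). */
#define SECONDS_PER_YEAR (365.25 * 24.0 * 60.0 * 60.0)

/* 2^64, the number of distinct values of a 64-bit counter. */
#define TSC_RANGE 18446744073709551616.0

/* Counter increments per year at the given clock rate in Hz. */
double tsc_ticks_per_year(double clock_hz)
{
    return clock_hz * SECONDS_PER_YEAR;
}

/* Years until a 64-bit counter that starts at zero wraps around. */
double tsc_years_to_wrap(double clock_hz)
{
    return TSC_RANGE / tsc_ticks_per_year(clock_hz);
}
```

At 3 GHz this gives roughly 9.47 x 10^16 increments per year and a little under 195 years to wrap, consistent with the rounded values in the text.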
See Section 18.10, “Performance Monitoring Overview,” and Section 18.9, “Time-Stamp
Counter,” for more information about the performance monitoring and time-stamp counters.
The RDTSC instruction was introduced into the IA-32 architecture with the Pentium processor.
The RDPMC instruction was introduced into the IA-32 architecture with the Pentium Pro
processor and the Pentium processor with MMX technology. Earlier Pentium processors have
two performance-monitoring counters, but they can be read only with the RDMSR instruction,
and only at privilege level 0.
2.6.6.1 Reading Counters in 64-Bit Mode
In 64-bit mode, RDTSC operates the same as in protected mode. The count in the time-stamp
counter is stored in EDX:EAX (or RDX[31:0]:RAX[31:0] with RDX[63:32]:RAX[63:32]
cleared).
RDPMC requires an index to specify the offset of the performance-monitoring counter. In 64-bit
mode for Pentium 4 or Intel Xeon processor families, the index is specified in ECX[30:0]. The
current count of the performance-monitoring counter is stored in EDX:EAX (or
RDX[31:0]:RAX[31:0] with RDX[63:32]:RAX[63:32] cleared).
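Software that reads a counter with RDTSC or RDPMC normally combines the two 32-bit halves into one 64-bit value. The helper below sketches that step in C; it takes the register contents as arguments instead of executing the instruction, and the name combine_edx_eax is hypothetical. Masking both inputs means the same helper works whether the values come from EDX and EAX or from RDX and RAX in 64-bit mode.

```c
#include <stdint.h>

/* Builds the 64-bit count from the high half (EDX or RDX[31:0]) and the
 * low half (EAX or RAX[31:0]); any upper register bits are ignored. */
uint64_t combine_edx_eax(uint64_t rdx, uint64_t rax)
{
    return ((rdx & 0xFFFFFFFFu) << 32) | (rax & 0xFFFFFFFFu);
}
```

For example, EDX = 00000012H and EAX = 89ABCDEFH yield the count 0000001289ABCDEFH.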
2.6.7 Reading and Writing Model-Specific Registers
The RDMSR (read model-specific register) and WRMSR (write model-specific register)
instructions allow a processor’s 64-bit model-specific registers (MSRs) to be read and written,
respectively. The MSR to be read or written is specified by the value in the ECX register.
RDMSR reads the value from the specified MSR to the EDX:EAX registers; WRMSR writes
the value in the EDX:EAX registers to the specified MSR. RDMSR and WRMSR were introduced into the IA-32 architecture with the Pentium processor.
See Section 9.4, “Model-Specific Registers (MSRs),” for more information.
2.6.7.1 Reading and Writing Model-Specific Registers in 64-Bit Mode
RDMSR and WRMSR require an index to specify the address of an MSR. In 64-bit mode, the
index is 32 bits; it is specified using ECX.
CHAPTER 3
PROTECTED-MODE MEMORY MANAGEMENT
This chapter describes the IA-32 architecture’s protected-mode memory management facilities,
including the physical memory requirements, segmentation mechanism, and paging mechanism.
See also: Chapter 4, “Protection” (for a description of the processor’s protection mechanism)
and Chapter 15, “8086 Emulation” (for a description of memory addressing protection in real-address and virtual-8086 modes).
3.1 MEMORY MANAGEMENT OVERVIEW
The memory management facilities of the IA-32 architecture are divided into two parts: segmentation and paging. Segmentation provides a mechanism of isolating individual code, data, and
stack modules so that multiple programs (or tasks) can run on the same processor without interfering with one another. Paging provides a mechanism for implementing a conventional
demand-paged, virtual-memory system where sections of a program’s execution environment
are mapped into physical memory as needed. Paging can also be used to provide isolation
between multiple tasks. When operating in protected mode, some form of segmentation must be
used. There is no mode bit to disable segmentation. The use of paging, however, is optional.
These two mechanisms (segmentation and paging) can be configured to support simple single-program (or single-task) systems, multitasking systems, or multiple-processor systems that use
shared memory.
As shown in Figure 3-1, segmentation provides a mechanism for dividing the processor’s
addressable memory space (called the linear address space) into smaller protected address
spaces called segments. Segments can be used to hold the code, data, and stack for a program
or to hold system data structures (such as a TSS or LDT). If more than one program (or task) is
running on a processor, each program can be assigned its own set of segments. The processor
then enforces the boundaries between these segments and ensures that one program does not
interfere with the execution of another program by writing into the other program’s segments.
The segmentation mechanism also allows typing of segments so that the operations that may be
performed on a particular type of segment can be restricted.
All the segments in a system are contained in the processor’s linear address space. To locate a
byte in a particular segment, a logical address (also called a far pointer) must be provided. A
logical address consists of a segment selector and an offset. The segment selector is a unique
identifier for a segment. Among other things it provides an offset into a descriptor table (such
as the global descriptor table, GDT) to a data structure called a segment descriptor. Each
segment has a segment descriptor, which specifies the size of the segment, the access rights and
privilege level for the segment, the segment type, and the location of the first byte of the segment
in the linear address space (called the base address of the segment). The offset part of the logical
address is added to the base address for the segment to locate a byte within the segment. The
base address plus the offset thus forms a linear address in the processor’s linear address space.
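The translation from a logical address to a linear address can be sketched in C as two small steps: splitting the segment selector into its fields, and adding the offset to the segment base. The selector field layout used here (index in bits 15:3, table indicator in bit 2, requested privilege level in bits 1:0) is described elsewhere in the manual and is an assumption in this context; the type and function names are hypothetical, and limit and access-rights checks are omitted.

```c
#include <stdint.h>

/* Hypothetical decoded view of a 16-bit segment selector. */
struct selector_fields {
    uint16_t index;  /* descriptor index within the GDT or LDT */
    uint8_t  ti;     /* table indicator: 0 = GDT, 1 = LDT */
    uint8_t  rpl;    /* requested privilege level */
};

struct selector_fields decode_selector(uint16_t selector)
{
    struct selector_fields f;
    f.index = (uint16_t)(selector >> 3);
    f.ti    = (uint8_t)((selector >> 2) & 1u);
    f.rpl   = (uint8_t)(selector & 3u);
    return f;
}

/* Linear address = segment base + offset (wraps modulo 2^32). */
uint32_t linear_address(uint32_t segment_base, uint32_t offset)
{
    return segment_base + offset;
}
```

Because descriptors are 8 bytes long, the byte offset of a descriptor within its table is the index multiplied by 8, which is the selector with its low three bits cleared.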
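[Figure 3-1 shows both translation stages. Segmentation: a logical address (or far pointer) supplies a segment selector and an offset; the selector locates a segment descriptor in the global descriptor table (GDT), and the segment base address plus the offset gives a linear address in the linear address space. Paging: the Dir, Table, and Offset fields of the linear address select a page-directory entry, a page-table entry, and a byte within a page in the physical address space.]
Figure 3-1. Segmentation and Paging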
If paging is not used, the linear address space of the processor is mapped directly into the physical address space of the processor. The physical address space is defined as the range of addresses
that the processor can generate on its address bus.
Because multitasking computing systems commonly define a linear address space much larger
than it is economically feasible to contain all at once in physical memory, some method of
“virtualizing” the linear address space is needed. This virtualization of the linear address space
is handled through the processor’s paging mechanism.
Paging supports a “virtual memory” environment where a large linear address space is simulated
with a small amount of physical memory (RAM and ROM) and some disk storage. When using
paging, each segment is divided into pages (typically 4 KBytes each in size), which are stored
either in physical memory or on the disk. The operating system or executive maintains a page
directory and a set of page tables to keep track of the pages. When a program (or task) attempts
to access an address location in the linear address space, the processor uses the page directory
and page tables to translate the linear address into a physical address and then performs the
requested operation (read or write) on the memory location.
If the page being accessed is not currently in physical memory, the processor interrupts execution of the program (by generating a page-fault exception). The operating system or executive
then reads the page into physical memory from the disk and continues executing the program.
When paging is implemented properly in the operating system or executive, the swapping of
pages between physical memory and the disk is transparent to the correct execution of a
program. Even programs written for 16-bit IA-32 processors can be paged (transparently) when
they are run in virtual-8086 mode.
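The page-directory and page-table lookup described above can be modeled in C as follows. The sketch assumes 4-KByte pages and the two-level 10/10/12-bit split of the linear address (Dir, Table, Offset) shown in Figure 3-1, with bit 0 of an entry acting as the present flag; physical address extension is not modeled. To stay self-contained, page-directory entries are represented by host pointers rather than physical addresses, and all names are hypothetical. A false return stands in for a page-fault exception.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_PRESENT 0x00000001u  /* bit 0: page is in physical memory */
#define FRAME_MASK  0xFFFFF000u  /* bits 31:12: page frame address */

/* Hypothetical model of one page table with 1024 entries. */
struct page_table {
    uint32_t entry[1024];
};

/* Hypothetical model of a page directory; NULL means "not present". */
struct page_directory {
    const struct page_table *table[1024];
};

/* Translates a linear address. Returns false where the processor would
 * generate a page-fault exception. */
bool translate(const struct page_directory *pd, uint32_t linear,
               uint32_t *physical)
{
    uint32_t dir    = linear >> 22;            /* bits 31:22 */
    uint32_t table  = (linear >> 12) & 0x3FFu; /* bits 21:12 */
    uint32_t offset = linear & 0xFFFu;         /* bits 11:0  */

    const struct page_table *pt = pd->table[dir];
    if (pt == NULL)
        return false;                          /* page table not present */

    uint32_t pte = pt->entry[table];
    if (!(pte & PTE_PRESENT))
        return false;                          /* page not present */

    *physical = (pte & FRAME_MASK) | offset;
    return true;
}
```

In a real system the operating system would respond to the failure case by bringing the page into memory, marking the entry present, and restarting the faulting instruction.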
3.2 USING SEGMENTS
The segmentation mechanism supported by the IA-32 architecture can be used to implement a
wide variety of system designs. These designs range from flat models that make only minimal
use of segmentation to protect programs to multi-segmented models that employ segmentation
to create a robust operating environment in which multiple programs and tasks can be executed
reliably.
The following sections give several examples of how segmentation can be employed in a system
to improve memory management performance and reliability.
3.2.1 Basic Flat Model
The simplest memory model for a system is the basic “flat model,” in which the operating
system and application programs have access to a continuous, unsegmented address space. To
the greatest extent possible, this basic flat model hides the segmentation mechanism of the architecture from both the system designer and the application programmer.
To implement a basic flat memory model with the IA-32 architecture, at least two segment
descriptors must be created, one for referencing a code segment and one for referencing a data
segment (see Figure 3-2). Both of these segments, however, are mapped to the entire linear
address space: that is, both segment descriptors have the same base address value of 0 and the
same segment limit of 4 GBytes. By setting the segment limit to 4 GBytes, the segmentation
mechanism is kept from generating exceptions for out-of-limit memory references, even if no
physical memory resides at a particular address. ROM (EPROM) is generally located at the top
of the physical address space, because the processor begins execution at FFFF_FFF0H. RAM
(DRAM) is placed at the bottom of the address space because the initial base address for the DS
data segment after reset initialization is 0.
3.2.2 Protected Flat Model
The protected flat model is similar to the basic flat model, except the segment limits are set to
include only the range of addresses for which physical memory actually exists (see Figure 3-3).
A general-protection exception (#GP) is then generated on any attempt to access nonexistent
memory. This model provides a minimum level of hardware protection against some kinds of
program bugs.
[Figure 3-2. Flat Model: the segment registers (CS, SS, DS, ES, FS, and GS) reference code- and data-segment descriptors (access, limit, and base address) that map the linear address space (or physical memory) from 0 to FFFFFFFFH, which is shown holding code, a not-present region, and data and stack.]
[Figure 3-3. Protected Flat Model: CS references one segment descriptor and ES, SS, DS, FS, and GS reference another (each with access, limit, and base address fields); the linear address space (or physical memory) from 0 to FFFFFFFFH is shown holding code, a not-present region, memory I/O, and data and stack.]
More complexity can be added to this protected flat model to provide more protection. For
example, for the paging mechanism to provide isolation between user and supervisor code and
data, four segments need to be defined: code and data segments at privilege level 3 for the user,
and code and data segments at privilege level 0 for the supervisor. Usually these segments all
overlay each other and start at address 0 in the linear address space. This flat segmentation
model along with a simple paging structure can protect the operating system from applications,
and by adding a separate paging structure for each task or process, it can also protect applications from each other. Similar designs are used by several popular multitasking operating
systems.
3.2.3 Multi-Segment Model
A multi-segment model (such as the one shown in Figure 3-4) uses the full capabilities of the
segmentation mechanism to provide hardware-enforced protection of code, data structures, and
programs and tasks. Here, each program (or task) is given its own table of segment descriptors
and its own segments. The segments can be completely private to their assigned programs or
shared among programs. Access to all segments and to the execution environments of individual
programs running on the system is controlled by hardware.
[Figure 3-4. Multi-Segment Model: the segment registers (CS, SS, DS, ES, FS, and GS) select among multiple segment descriptors (each with access, limit, and base address fields) that map separate stack, code, and data segments in the linear address space (or physical memory).]
Access checks can be used to protect not only against referencing an address outside the limit
of a segment, but also against performing disallowed operations in certain segments. For
example, since code segments are designated as read-only segments, hardware can be used to
prevent writes into code segments. The access rights information created for segments can also
be used to set up protection rings or levels. Protection levels can be used to protect operating-system procedures from unauthorized access by application programs.
3.2.4 Segmentation in IA-32e Mode
In IA-32e mode, the effects of segmentation depend on whether the processor is running in
compatibility mode or 64-bit mode. In compatibility mode, segmentation functions just as it
does using legacy 16-bit or 32-bit protected mode semantics.
In 64-bit mode, segmentation is generally (but not completely) disabled, creating a flat 64-bit
linear-address space. The processor treats the segment base of CS, DS, ES, and SS as zero, creating
a linear address that is equal to the effective address. The FS and GS segments are exceptions.
These segment registers (which hold the segment base) can be used as additional base registers in linear address calculations. They facilitate addressing local data and certain operating
system data structures.
Note that the processor does not perform segment limit checks at runtime in 64-bit mode.
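The 64-bit mode behavior described above can be modeled with a short C sketch. The enumeration and function names are hypothetical and exist only for this illustration; the model covers only the base selection and the addition, with no limit check, as stated in the text.

```c
#include <stdint.h>

/* Hypothetical names for the six segment registers. */
enum segment { SEG_CS, SEG_SS, SEG_DS, SEG_ES, SEG_FS, SEG_GS };

/* Linear address formed in 64-bit mode for a given segment and
 * effective address. */
uint64_t linear_address_64(enum segment seg, uint64_t fs_base,
                           uint64_t gs_base, uint64_t effective_address)
{
    uint64_t base = 0;                 /* CS, SS, DS, ES: base treated as zero */

    if (seg == SEG_FS)
        base = fs_base;                /* FS base takes part in the sum */
    else if (seg == SEG_GS)
        base = gs_base;                /* GS base takes part in the sum */

    return base + effective_address;   /* no segment limit check is applied */
}
```

For CS, SS, DS, and ES the result always equals the effective address; only an FS or GS reference moves the address by a base value.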
3.2.5 Paging and Segmentation
Paging can be used with any of the segmentation models described in Figures 3-2, 3-3, and 3-4.
The processor’s paging mechanism divides the linear address space (into which segments are
mapped) into pages (as shown in Figure 3-1). These linear-address-space pages are then mapped
to pages in the physical address space. The paging mechanism offers several page-level protection facilities that can be used with or instead of the segment-protection facilities. For example,
it lets read-write protection be enforced on a page-by-page basis. The paging mechanism also
provides two-level user-supervisor protection that can also be specified on a page-by-page basis.
3.3 PHYSICAL ADDRESS SPACE
In protected mode, the IA-32 architecture provides a normal physical address space of 4 GBytes
(2^32 bytes). This is the address space that the processor can address on its address bus. This
address space is flat (unsegmented), with addresses ranging continuously from 0 to
FFFFFFFFH. This physical address space can be mapped to read-write memory, read-only
memory, and memory mapped I/O. The memory mapping facilities described in this chapter can
be used to divide this physical memory up into segments and/or pages.
Starting with the Pentium Pro processor, the IA-32 architecture also supports an extension of the
physical address space to 2^36 bytes (64 GBytes), with a maximum physical address of
FFFFFFFFFH. This extension is invoked in either of two ways:
• Using the physical address extension (PAE) flag, located in bit 5 of control register CR4.
• Using the 36-bit page size extension (PSE-36) feature (introduced in the Pentium III
processors).
See Section 3.8, “36-Bit Physical Addressing Using the PAE Paging Mechanism” and Section 3.9,
“36-Bit Physical Addressing Using the PSE-36 Paging Mechanism” for more information about
36-bit physical addressing.
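The relationship between address width and the maximum physical address quoted above can be checked with a small C sketch. The function name is hypothetical, and the sketch assumes widths below 64 bits.

```c
#include <stdint.h>

/* Highest physical address reachable with the given number of address
 * bits (assumed to be less than 64). */
uint64_t max_physical_address(unsigned address_bits)
{
    return ((uint64_t)1 << address_bits) - 1;
}
```

With 32 address bits the result is FFFFFFFFH (4 GBytes of address space); with 36 bits it is FFFFFFFFFH (64 GBytes).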
3.3.1 Physical Address Space for Processors with Intel® EM64T
On processors that support Intel EM64T (CPUID.80000001.EDX[29] = 1), the size of the physical
address range is implementation-specific and indicated by CPUID.80000008H. The physical
address size supported by a given implementation is available to IA-32e mode and enhanced
legacy PAE paging.
See also: Section 3.8.1, “Enhanced Legacy PAE Paging”.
3.4 LOGICAL AND LINEAR ADDRESSES
At the system-architecture level in protected mode, the processor uses two stages of address
translation to arrive at a physical address: logical-address translation and linear address space
paging.
Even with the minimum use of segments, every byte in the processor’s address space is accessed
with a logical address. A logical address consists of a 16-bit segment selector and a 32-bit offset
(see Figure 3-5). The segment selector identifies the segment the byte is located in and the offset
specifies the location of the byte in the segment relative to the base address of the segment.
The processor translates every logical address into a linear address. A linear address is a 32-bit
address in the processor’s linear address space. Like the physical address space, the linear
address space is a flat (unsegmented), 2^32-byte address space, with addresses ranging from 0 to
FFFFFFFFH. The linear address space contains all the segments and system tables defined for
a system.
To translate a logical address into a linear address, the processor does the following:
1. Uses the offset in the segment selector to locate the segment descriptor for the segment in
the GDT or LDT and reads it into the processor. (This step is needed only when a new
segment selector is loaded into a segment register.)
2. Examines the segment descriptor to check the access rights and range of the segment to
ensure that the segment is accessible and that the offset is within the limits of the segment.
3. Adds the base address of the segment from the segment descriptor to the offset to form a
linear address.
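The three steps above can be sketched in C as follows. This is a simplified model under stated assumptions, not the processor's actual algorithm: the structure and function names are hypothetical, descriptors are reduced to a base, a byte-granular limit for an expand-up segment, and a present flag, and privilege and type checks are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified descriptor: only the fields the three steps need. */
struct descriptor {
    uint32_t base;     /* linear address of byte 0 of the segment */
    uint32_t limit;    /* highest valid offset (expand-up, byte units) */
    bool     present;
};

struct descriptor_table {
    const struct descriptor *entries;
    size_t count;
};

/* Returns true and stores the linear address on success; returns false
 * where the processor would raise an exception instead. */
bool logical_to_linear(uint16_t selector, uint32_t offset,
                       const struct descriptor_table *gdt,
                       const struct descriptor_table *ldt,
                       uint32_t *linear)
{
    /* Step 1: locate the descriptor in the GDT (TI = 0) or LDT (TI = 1). */
    const struct descriptor_table *table = (selector & 0x4u) ? ldt : gdt;
    size_t index = selector >> 3;
    if (table == NULL || index >= table->count)
        return false;
    const struct descriptor *d = &table->entries[index];

    /* Step 2: check that the segment is usable and the offset is in range. */
    if (!d->present || offset > d->limit)
        return false;

    /* Step 3: add the segment base to the offset. */
    *linear = d->base + offset;
    return true;
}
```

A real processor performs step 1 only when a segment register is loaded and then works from the cached descriptor; the sketch repeats the lookup on every call to keep the example short.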
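[Figure 3-5. Logical Address to Linear Address Translation: a logical address consists of a segment selector (bits 15 through 0) and an offset, or effective address (bits 31(63) through 0). The segment selector picks a segment descriptor from the descriptor table; the base address from that descriptor is added to the offset to form the linear address (bits 31(63) through 0).]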
If paging is not used, the processor maps the linear address directly to a physical address (that
is, the linear address goes out on the processor’s address bus). If the linear address space is
paged, a second level of address translation is used to translate the linear address into a physical
address.
See also: Section 3.6, “Paging (Virtual Memory) Overview”.
3.4.1 Logical Address Translation in IA-32e Mode
In IA-32e mode, the processor uses the steps described above to translate a logical address to a
linear address. In 64-bit mode, the offset and base address of the segment are 64 bits instead of
32 bits. The linear address format is also 64 bits wide and is subject to the canonical form
requirement.
Each code segment descriptor provides an L bit. This bit selects, on a per-code-segment basis, whether
the segment executes 64-bit code or legacy 32-bit code.
3.4.2 Segment Selectors
A segment selector is a 16-bit identifier for a segment (see Figure 3-6). It does not point directly
to the segment, but instead points to the segment descriptor that defines the segment. A segment
selector contains the following items:
Index
(Bits 3 through 15) - Selects one of 8192 descriptors in the GDT or LDT. The
processor multiplies the index value by 8 (the number of bytes in a segment
descriptor) and adds the result to the base address of the GDT or LDT (from
the GDTR or LDTR register, respectively).
TI (table indicator) flag
(Bit 2) - Specifies the descriptor table to use: clearing this flag selects the
GDT; setting this flag selects the current LDT.
Requested Privilege Level (RPL)
(Bits 0 and 1) - Specifies the privilege level of the selector. The privilege level
can range from 0 to 3, with 0 being the most privileged level. See Section 4.5,
“Privilege Levels”, for a description of the relationship of the RPL to the CPL
of the executing program (or task) and the descriptor privilege level (DPL) of
the descriptor the segment selector points to.
[Figure 3-6. Segment Selector: bits 15 through 3 hold the Index, bit 2 holds the Table Indicator (TI; 0 = GDT, 1 = LDT), and bits 1 and 0 hold the Requested Privilege Level (RPL).]
The first entry of the GDT is not used by the processor. A segment selector that points to this
entry of the GDT (that is, a segment selector with an index of 0 and the TI flag set to 0) is used
as a “null segment selector.” The processor does not generate an exception when a segment
register (other than the CS or SS registers) is loaded with a null selector. It does, however,
generate an exception when a segment register holding a null selector is used to access memory.
A null selector can be used to initialize unused segment registers. Loading the CS or SS register
with a null segment selector causes a general-protection exception (#GP) to be generated.
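The selector fields and the null-selector rule described above can be expressed in a few lines of C. The structure and function names are hypothetical. The null test follows the wording above (index of 0 with the TI flag clear) and therefore ignores the RPL bits, which is an interpretation made for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

struct selector_fields {
    uint16_t index;   /* bits 15:3 */
    bool     ti;      /* bit 2: false = GDT, true = LDT */
    uint8_t  rpl;     /* bits 1:0 */
};

struct selector_fields decode_selector(uint16_t selector)
{
    struct selector_fields f;
    f.index = selector >> 3;
    f.ti    = (selector & 0x4u) != 0;
    f.rpl   = selector & 0x3u;
    return f;
}

/* Byte offset of the descriptor from the table base: index * 8. */
uint32_t descriptor_offset(uint16_t selector)
{
    return (uint32_t)(selector >> 3) * 8u;
}

/* Null selector: index 0 with TI = 0; the RPL bits are not examined. */
bool is_null_selector(uint16_t selector)
{
    return (selector & 0xFFFCu) == 0;
}
```

For example, the selector 002BH decodes to index 5, TI = 0 (GDT), and RPL = 3, and its descriptor starts 40 bytes from the GDT base.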
Segment selectors are visible to application programs as part of a pointer variable, but the values
of selectors are usually assigned or modified by link editors or linking loaders, not application
programs.
3.4.3 Segment Registers
To reduce address translation time and coding complexity, the processor provides registers for
holding up to 6 segment selectors (see Figure 3-7). Each of these segment registers supports a
specific kind of memory reference (code, stack, or data). For virtually any kind of program
execution to take place, at least the code-segment (CS), data-segment (DS), and stack-segment
(SS) registers must be loaded with valid segment selectors. The processor also provides three
additional data-segment registers (ES, FS, and GS), which can be used to make additional data
segments available to the currently executing program (or task).
For a program to access a segment, the segment selector for the segment must have been loaded
in one of the segment registers. So, although a system can define thousands of segments, only 6
can be available for immediate use. Other segments can be made available by loading their
segment selectors into these registers during program execution.
[Figure 3-7. Segment Registers: each of the CS, SS, DS, ES, FS, and GS registers has a visible part (the segment selector) and a hidden part (base address, limit, and access information).]
Every segment register has a “visible” part and a “hidden” part. (The hidden part is sometimes
referred to as a “descriptor cache” or a “shadow register.”) When a segment selector is loaded
into the visible part of a segment register, the processor also loads the hidden part of the segment
register with the base address, segment limit, and access control information from the segment
descriptor pointed to by the segment selector. The information cached in the segment register
(visible and hidden) allows the processor to translate addresses without taking extra bus cycles
to read the base address and limit from the segment descriptor. In systems in which multiple
processors have access to the same descriptor tables, it is the responsibility of software to reload
the segment registers when the descriptor tables are modified. If this is not done, an old segment
descriptor cached in a segment register might be used after its memory-resident version has been
modified.
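The caching behavior described above, including the stale-descriptor hazard, can be illustrated with a small C model. All names are hypothetical, and the descriptor is reduced to a base and a limit; the point is only that translation reads the cached copy, so a change to the table has no effect until the register is reloaded.

```c
#include <stdint.h>

struct cached_descriptor {
    uint32_t base;
    uint32_t limit;
};

struct segment_register {
    uint16_t selector;                 /* visible part */
    struct cached_descriptor hidden;   /* hidden part (descriptor cache) */
};

/* Loading a selector also copies the descriptor into the hidden part. */
void load_segment_register(struct segment_register *reg,
                           const struct cached_descriptor *table,
                           uint16_t selector)
{
    reg->selector = selector;
    reg->hidden   = table[selector >> 3];
}

/* Translation uses only the cached copy, never the table in memory. */
uint32_t cached_linear_address(const struct segment_register *reg,
                               uint32_t offset)
{
    return reg->hidden.base + offset;
}
```

If software changes a table entry after the load, `cached_linear_address` keeps returning addresses based on the old base until `load_segment_register` is called again, which mirrors the software responsibility stated above.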
Two kinds of load instructions are provided for loading the segment registers:
1. Direct load instructions such as the MOV, POP, LDS, LES, LSS, LGS, and LFS instructions. These instructions explicitly reference the segment registers.
2. Implied load instructions such as the far pointer versions of the CALL, JMP, and RET
instructions, the SYSENTER and SYSEXIT instructions, and the IRET, INT n, INTO, and
INT3 instructions. These instructions change the contents of the CS register (and
sometimes other segment registers) as an incidental part of their operation.
The MOV instruction can also be used to store the visible part of a segment register in a general-purpose register.
3.4.4 Segment Loading Instructions in IA-32e Mode
Because ES, DS, and SS segment registers are not used in 64-bit mode, their fields (base, limit,
and attribute) in segment descriptor registers are ignored. Some forms of segment load instructions are also invalid (for example, LDS, POP ES). Address calculations that reference the ES,
DS, or SS segments are treated as if the segment base is zero.
The processor checks that all linear-address references are in canonical form instead of
performing limit checks. Mode switching does not change the contents of the segment registers
or the associated descriptor registers. These registers are also not changed during 64-bit mode
execution, unless explicit segment loads are performed.
In order to set up compatibility mode for an application, segment-load instructions (MOV to
Sreg, POP Sreg) work normally in 64-bit mode. An entry is read from the system descriptor
table (GDT or LDT) and is loaded in the hidden portion of the segment descriptor register. The
descriptor-register base, limit, and attribute fields are all loaded. However, the contents of the
data and stack segment selector and the descriptor registers are ignored.
When FS and GS segment overrides are used in 64-bit mode, their respective base addresses are
used in the linear address calculation: (FS or GS).base + index + displacement. FS.base and
GS.base are then expanded to the full linear-address size supported by the implementation. The
resulting effective address calculation can wrap across positive and negative addresses; the
resulting linear address must be canonical.
In 64-bit mode, memory accesses using FS-segment and GS-segment overrides are not checked
for a runtime limit nor subjected to attribute-checking. Normal segment loads (MOV to Sreg and
POP Sreg) into FS and GS load a standard 32-bit base value in the hidden portion of the segment
descriptor register. The base address bits above the standard 32 bits are cleared to 0 to allow
consistency for implementations that use less than 64 bits.
The hidden descriptor register fields for FS.base and GS.base are physically mapped to MSRs
in order to load all address bits supported by a 64-bit implementation. Software with CPL = 0
(privileged software) can load all supported linear-address bits into FS.base or GS.base using
WRMSR. Addresses written into the 64-bit FS.base and GS.base registers must be in canonical
form. A WRMSR instruction that attempts to write a non-canonical address to those registers
causes a #GP fault.
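The canonical-form requirement on FS.base and GS.base can be sketched in C. The text above does not state how many linear-address bits are implemented, so the width is a parameter; the 48-bit value used in the usage note is an assumption, and the function names are hypothetical. An address is taken to be canonical when bits 63 down to the most significant implemented bit are all equal.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when bits 63 through (implemented_bits - 1) are all 0 or all 1.
 * implemented_bits is assumed to be in the range 1 to 64. */
bool is_canonical(uint64_t address, unsigned implemented_bits)
{
    uint64_t upper    = address >> (implemented_bits - 1);
    uint64_t all_ones = UINT64_MAX >> (implemented_bits - 1);
    return upper == 0 || upper == all_ones;
}

/* Model of a write to FS.base or GS.base: returns false (standing in for
 * the #GP fault) and leaves the register unchanged when the value is not
 * canonical. */
bool write_segment_base(uint64_t *segment_base, uint64_t value,
                        unsigned implemented_bits)
{
    if (!is_canonical(value, implemented_bits))
        return false;
    *segment_base = value;
    return true;
}
```

Assuming 48 implemented bits, 00007FFFFFFFFFFFH and FFFF800000000000H are accepted, while 0000800000000000H is rejected.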
When in compatibility mode, FS and GS overrides operate as defined by 32-bit mode behavior
regardless of the value loaded into the upper 32 linear-address bits of the hidden descriptor
register base field. Compatibility mode ignores the upper 32 bits when calculating an effective
address.
A new 64-bit mode instruction, SWAPGS, can be used to load GS base. SWAPGS exchanges
the kernel data structure pointer from the IA32_KernelGSBase MSR with the GS base register.
The kernel can then use the GS prefix on normal memory references to access the kernel data
structures. An attempt to write a non-canonical value (using WRMSR) to the
IA32_KernelGSBase MSR causes a #GP fault.
3.4.5 Segment Descriptors
A segment descriptor is a data structure in a GDT or LDT that provides the processor with the
size and location of a segment, as well as access control and status information. Segment
descriptors are typically created by compilers, linkers, loaders, or the operating system or executive, but not application programs. Figure 3-8 illustrates the general descriptor format for all
types of segment descriptors.
Second doubleword (byte offset 4):
  Bits 31:24   Base 31:24
  Bit 23       G
  Bit 22       D/B
  Bit 21       L
  Bit 20       AVL
  Bits 19:16   Seg. Limit 19:16
  Bit 15       P
  Bits 14:13   DPL
  Bit 12       S
  Bits 11:8    Type
  Bits 7:0     Base 23:16
First doubleword (byte offset 0):
  Bits 31:16   Base Address 15:00
  Bits 15:0    Segment Limit 15:00
L: 64-bit code segment (IA-32e mode only)
AVL: Available for use by system software
S: Descriptor type (0 = system; 1 = code or data)
TYPE: Segment type
Figure 3-8. Segment Descriptor
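The layout in Figure 3-8 can be unpacked with a short C sketch. The structure and function names are hypothetical, and the descriptor is assumed to be held in a 64-bit integer with the first doubleword in the low 32 bits.

```c
#include <stdint.h>

struct descriptor_fields {
    uint32_t base;    /* Base 31:24, 23:16, and 15:00 combined */
    uint32_t limit;   /* Limit 19:16 and 15:00 combined (20 bits) */
    unsigned type, s, dpl, p, avl, l, db, g;
};

struct descriptor_fields decode_descriptor(uint64_t raw)
{
    uint32_t low  = (uint32_t)raw;           /* first doubleword  */
    uint32_t high = (uint32_t)(raw >> 32);   /* second doubleword */
    struct descriptor_fields f;

    f.base  = (high & 0xFF000000u) | ((high & 0xFFu) << 16) | (low >> 16);
    f.limit = (high & 0x000F0000u) | (low & 0xFFFFu);
    f.type  = (high >> 8)  & 0xFu;
    f.s     = (high >> 12) & 1u;
    f.dpl   = (high >> 13) & 3u;
    f.p     = (high >> 15) & 1u;
    f.avl   = (high >> 20) & 1u;
    f.l     = (high >> 21) & 1u;
    f.db    = (high >> 22) & 1u;
    f.g     = (high >> 23) & 1u;
    return f;
}
```

Decoding the value 00CF9A000000FFFFH, for instance, yields base 0, limit FFFFFH, type AH, S = 1, DPL = 0, P = 1, D/B = 1, and G = 1.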
The flags and fields in a segment descriptor are as follows:
Segment limit field
Specifies the size of the segment. The processor puts together the two segment
limit fields to form a 20-bit value. The processor interprets the segment limit
in one of two ways, depending on the setting of the G (granularity) flag:
• If the granularity flag is clear, the segment size can range from 1 byte to
1 MByte, in byte increments.
• If the granularity flag is set, the segment size can range from 4 KBytes to
4 GBytes, in 4-KByte increments.
The processor uses the segment limit in two different ways, depending on
whether the segment is an expand-up or an expand-down segment. See Section
3.4.5.1, “Code- and Data-Segment Descriptor Types”, for more information
about segment types. For expand-up segments, the offset in a logical address
can range from 0 to the segment limit. Offsets greater than the segment limit
generate general-protection exceptions (#GP). For expand-down segments, the
segment limit has the reverse function; the offset can range from the segment
limit to FFFFFFFFH or FFFFH, depending on the setting of the B flag. Offsets
less than the segment limit generate general-protection exceptions. Decreasing
the value in the segment limit field for an expand-down segment allocates new
memory at the bottom of the segment's address space, rather than at the top.
IA-32 architecture stacks always grow downwards, making this mechanism
convenient for expandable stacks.
Base address fields
Defines the location of byte 0 of the segment within the 4-GByte linear address
space. The processor puts together the three base address fields to form a single
32-bit value. Segment base addresses should be aligned to 16-byte boundaries.
Although 16-byte alignment is not required, this alignment allows programs to
maximize performance by aligning code and data on 16-byte boundaries.
Type field
Indicates the segment or gate type and specifies the kinds of access that can be
made to the segment and the direction of growth. The interpretation of this field
depends on whether the descriptor type flag specifies an application (code or
data) descriptor or a system descriptor. The encoding of the type field is
different for code, data, and system descriptors (see Figure 4-1). See Section
3.4.5.1, “Code- and Data-Segment Descriptor Types”, for a description of how
this field is used to specify code and data-segment types.
S (descriptor type) flag
Specifies whether the segment descriptor is for a system segment (S flag is
clear) or a code or data segment (S flag is set).
DPL (descriptor privilege level) field
Specifies the privilege level of the segment. The privilege level can range from
0 to 3, with 0 being the most privileged level. The DPL is used to control access
to the segment. See Section 4.5, “Privilege Levels”, for a description of the
relationship of the DPL to the CPL of the executing code segment and the RPL
of a segment selector.
P (segment-present) flag
Indicates whether the segment is present in memory (set) or not present (clear).
If this flag is clear, the processor generates a segment-not-present exception
(#NP) when a segment selector that points to the segment descriptor is loaded
into a segment register. Memory management software can use this flag to
control which segments are actually loaded into physical memory at a given
time. It offers a control in addition to paging for managing virtual memory.
Figure 3-9 shows the format of a segment descriptor when the segment-present
flag is clear. When this flag is clear, the operating system or executive is free
to use the locations marked “Available” to store its own data, such as information regarding the whereabouts of the missing segment.
D/B (default operation size/default stack pointer size and/or upper bound) flag
Performs different functions depending on whether the segment descriptor is
an executable code segment, an expand-down data segment, or a stack
segment. (This flag should always be set to 1 for 32-bit code and data segments
and to 0 for 16-bit code and data segments.)
• Executable code segment. The flag is called the D flag and it indicates the
default length for effective addresses and operands referenced by instructions in the segment. If the flag is set, 32-bit addresses and 32-bit or 8-bit
operands are assumed; if it is clear, 16-bit addresses and 16-bit or 8-bit
operands are assumed.
The instruction prefix 66H can be used to select an operand size other than
the default, and the prefix 67H can be used to select an address size other than
the default.
• Stack segment (data segment pointed to by the SS register). The flag is
called the B (big) flag and it specifies the size of the stack pointer used for
implicit stack operations (such as pushes, pops, and calls). If the flag is set,
a 32-bit stack pointer is used, which is stored in the 32-bit ESP register; if
the flag is clear, a 16-bit stack pointer is used, which is stored in the 16-bit
SP register. If the stack segment is set up to be an expand-down data
segment (described in the next paragraph), the B flag also specifies the
upper bound of the stack segment.
• Expand-down data segment. The flag is called the B flag and it specifies
the upper bound of the segment. If the flag is set, the upper bound is
FFFFFFFFH (4 GBytes); if the flag is clear, the upper bound is FFFFH
(64 KBytes).
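The effect of the D/B flag in the three cases above can be summarized in a small C sketch. The function names are hypothetical. Byte (8-bit) operands are left out, since they are selected by the instruction encoding rather than by the D flag or the prefixes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Operand size in bits for a non-byte operand in a code segment.
 * The 66H prefix selects the size that is not the default. */
unsigned operand_size_bits(bool d_flag, bool prefix_66h)
{
    return (d_flag != prefix_66h) ? 32u : 16u;
}

/* Address size in bits; the 67H prefix selects the non-default size. */
unsigned address_size_bits(bool d_flag, bool prefix_67h)
{
    return (d_flag != prefix_67h) ? 32u : 16u;
}

/* Stack pointer width used for implicit stack operations (ESP or SP). */
unsigned stack_pointer_bits(bool b_flag)
{
    return b_flag ? 32u : 16u;
}

/* Upper bound of an expand-down data segment. */
uint32_t expand_down_upper_bound(bool b_flag)
{
    return b_flag ? 0xFFFFFFFFu : 0xFFFFu;
}
```

In a segment with the D flag set, `operand_size_bits(true, true)` gives 16; in a segment with the flag clear, the same prefix gives 32.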
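[Figure 3-9. Segment Descriptor When Segment-Present Flag Is Clear: in the second doubleword (byte offset 4), bits 31:16 and bits 7:0 are Available, bit 15 (P) is 0, bits 14:13 hold the DPL, bit 12 holds S, and bits 11:8 hold the Type; the entire first doubleword (byte offset 0) is Available.]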
G (granularity) flag
Determines the scaling of the segment limit field. When the granularity flag is
clear, the segment limit is interpreted in byte units; when the flag is set, the
segment limit is interpreted in 4-KByte units. (This flag does not affect the
granularity of the base address; it is always byte granular.) When the granularity flag is set, the twelve least significant bits of an offset are not tested when
checking the offset against the segment limit. For example, when the granularity flag is set, a limit of 0 results in valid offsets from 0 to 4095.
L (64-bit code segment) flag
In IA-32e mode, bit 21 of the second doubleword of the segment descriptor
indicates whether a code segment contains native 64-bit code. A value of 1
indicates instructions in this code segment are executed in 64-bit mode. A
value of 0 indicates the instructions in this code segment are executed in
compatibility mode. If L-bit is set, then D-bit must be cleared. When not in
IA-32e mode or for non-code segments, bit 21 is reserved and should always
be set to 0.
Available and reserved bits
Bit 20 of the second doubleword of the segment descriptor is available for use
by system software.
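The interaction of the segment limit field with the G flag and the expansion direction, as described in the field list above, can be sketched in C. The function names are hypothetical and the check covers a single byte offset only. For expand-down segments the sketch assumes the stricter reading, in which the limit itself is not a valid offset and valid offsets start at limit + 1; the wording above ("offsets less than the segment limit") leaves the boundary case open.

```c
#include <stdbool.h>
#include <stdint.h>

/* Scale the 20-bit limit field by the G flag. With G = 1 the twelve
 * least significant bits of an offset are not tested, which is the same
 * as a byte limit whose low 12 bits are all ones. */
uint32_t effective_limit(uint32_t limit20, bool g_flag)
{
    limit20 &= 0xFFFFFu;
    return g_flag ? ((limit20 << 12) | 0xFFFu) : limit20;
}

/* Check one byte offset against an effective (byte) limit. */
bool offset_in_segment(uint32_t offset, uint32_t limit,
                       bool expand_down, bool b_flag)
{
    if (!expand_down)
        return offset <= limit;                 /* expand-up: 0 .. limit */

    uint32_t upper = b_flag ? 0xFFFFFFFFu : 0xFFFFu;
    return offset > limit && offset <= upper;   /* assumed: limit+1 .. upper */
}
```

With G set, `effective_limit(0, true)` is 4095, which matches the example of valid offsets from 0 to 4095 given for a limit of 0.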
3.4.5.1 Code- and Data-Segment Descriptor Types
When the S (descriptor type) flag in a segment descriptor is set, the descriptor is for either a code
or a data segment. The highest order bit of the type field (bit 11 of the second double word of
the segment descriptor) then determines whether the descriptor is for a data segment (clear) or
a code segment (set).
For data segments, the three low-order bits of the type field (bits 8, 9, and 10) are interpreted as
accessed (A), write-enable (W), and expansion-direction (E). See Table 3-1 for a description of
the encoding of the bits in the type field for code and data segments. Data segments can be read-only or read/write segments, depending on the setting of the write-enable bit.
Stack segments are data segments which must be read/write segments. Loading the SS register
with a segment selector for a nonwritable data segment generates a general-protection exception
(#GP). If the size of a stack segment needs to be changed dynamically, the stack segment can be
an expand-down data segment (expansion-direction flag set). Here, dynamically changing the
segment limit causes stack space to be added to the bottom of the stack. If the size of a stack
segment is intended to remain static, the stack segment may be either an expand-up or expand-down type.
The accessed bit indicates whether the segment has been accessed since the last time the operating system or executive cleared the bit. The processor sets this bit whenever it loads a segment
selector for the segment into a segment register, assuming that the type of memory that contains
the segment descriptor supports processor writes. The bit remains set until explicitly cleared.
This bit can be used both for virtual memory management and for debugging.
For code segments, the three low-order bits of the type field are interpreted as accessed (A), read
enable (R), and conforming (C). Code segments can be execute-only or execute/read, depending
on the setting of the read-enable bit. An execute/read segment might be used when constants or
other static data have been placed with instruction code in a ROM. Here, data can be read from
the code segment either by using an instruction with a CS override prefix or by loading a
segment selector for the code segment in a data-segment register (the DS, ES, FS, or GS registers). In protected mode, code segments are not writable.
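The type-field bits for code and data segments can be decoded as in the following C sketch. The names are hypothetical; the argument is the 4-bit type field, so its bit 3 corresponds to bit 11 of the second doubleword and its bits 0, 1, and 2 correspond to descriptor bits 8, 9, and 10.

```c
#include <stdbool.h>
#include <stdint.h>

struct segment_type {
    bool code;       /* type bit 3: clear = data segment, set = code segment */
    bool ec;         /* type bit 2: E (expand-down) for data, C (conforming) for code */
    bool rw;         /* type bit 1: W (write-enable) for data, R (read-enable) for code */
    bool accessed;   /* type bit 0: A */
};

struct segment_type decode_type(uint8_t type)
{
    struct segment_type t;
    t.code     = (type & 0x8u) != 0;
    t.ec       = (type & 0x4u) != 0;
    t.rw       = (type & 0x2u) != 0;
    t.accessed = (type & 0x1u) != 0;
    return t;
}

/* Data segments can always be read; code segments only when R is set. */
bool segment_readable(uint8_t type)
{
    struct segment_type t = decode_type(type);
    return !t.code || t.rw;
}

/* Only data segments with W set can be written; code segments never. */
bool segment_writable(uint8_t type)
{
    struct segment_type t = decode_type(type);
    return !t.code && t.rw;
}
```

Because a stack segment must be a read/write data segment, `segment_writable` also serves as a first check on a selector intended for the SS register.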
Code segments can be either conforming or nonconforming. A transfer of execution into a more-privileged conforming segment allows execution to continue at the current privilege level. A
transfer into a nonconforming segment at a different privilege level results in a general-protection exception (#GP), unless a call gate or task gate is used (see Section 4.8.1, “Direct Calls or
Jumps to Code Segments”, for more information on conforming and nonconforming code
segments). System utilities that do not access protected facilities and handlers for some types of
exceptions (such as divide error or overflow) may be loaded in conforming code segments. Utilities that need to be protected from less privileged programs and procedures should be placed in
nonconforming code segments.
NOTE
Execution cannot be transferred by a call or a jump to a less-privileged
(numerically higher privilege level) code segment, regardless of whether the
target segment is a conforming or nonconforming code segment. Attempting
such an execution transfer will result in a general-protection exception.
All data segments are nonconforming, meaning that they cannot be accessed by less privileged
programs or procedures (code executing at numerically higher privilege levels). Unlike code
segments, however, data segments can be accessed by more privileged programs or procedures
(code executing at numerically lower privilege levels) without using a special access gate.
If the segment descriptors in the GDT or an LDT are placed in ROM, the processor can enter an
indefinite loop if software or the processor attempts to update (write to) the ROM-based
segment descriptors. To prevent this problem, set the accessed bits for all segment descriptors
placed in a ROM. Also, remove operating-system or executive code that attempts to modify
segment descriptors located in ROM.
3.5 SYSTEM DESCRIPTOR TYPES
When the S (descriptor type) flag in a segment descriptor is clear, the descriptor type is a system
descriptor. The processor recognizes the following types of system descriptors:
• Local descriptor-table (LDT) segment descriptor.
• Task-state segment (TSS) descriptor.
• Call-gate descriptor.
• Interrupt-gate descriptor.
• Trap-gate descriptor.
• Task-gate descriptor.
These descriptor types fall into two categories: system-segment descriptors and gate descriptors.
System-segment descriptors point to system segments (LDT and TSS segments). Gate descriptors are in themselves “gates,” which hold pointers to procedure entry points in code segments
(call, interrupt, and trap gates) or which hold segment selectors for TSS’s (task gates).
Table 3-2 shows the encoding of the type field for system-segment descriptors and gate descriptors. Note that system descriptors in IA-32e mode are 16 bytes instead of 8 bytes.
Table 3-2. System-Segment and Gate-Descriptor Types
See also: Section 3.5.1, “Segment Descriptor Tables”, and Section 6.2.2, “TSS Descriptor”
(for more information on the system-segment descriptors); see Section 4.8.3, “Call Gates”,
Section 5.11, “IDT Descriptors”, and Section 6.2.5, “Task-Gate Descriptor” (for more information on the gate descriptors).
3.5.1 Segment Descriptor Tables
A segment descriptor table is an array of segment descriptors (see Figure 3-10). A descriptor
table is variable in length and can contain up to 8192 (2^13) 8-byte descriptors. There are two
kinds of descriptor tables:
• The global descriptor table (GDT)
• The local descriptor tables (LDT)
[Figure 3-10. Global and Local Descriptor Tables: the TI flag of a segment selector chooses between the global descriptor table (GDT, TI = 0) and a local descriptor table (LDT, TI = 1). Each table is shown as an array of 8-byte descriptors at byte offsets 0, 8, 16, 24, 32, 40, 48, and 56; the first descriptor in the GDT is not used. The GDTR register holds the base address and limit of the GDT; the LDTR register holds the segment selector, base address, and limit of the LDT.]
Each system must have one GDT defined, which may be used for all programs and tasks in the
system. Optionally, one or more LDTs can be defined. For example, an LDT can be defined for
each separate task being run, or some or all tasks can share the same LDT.
The GDT is not a segment itself; instead, it is a data structure in linear address space. The base
linear address and limit of the GDT must be loaded into the GDTR register (see Section 2.4,
“Memory-Management Registers”). The base address of the GDT should be aligned on an
eight-byte boundary to yield the best processor performance. The limit value for the GDT is
expressed in bytes. As with segments, the limit value is added to the base address to get the
address of the last valid byte. A limit value of 0 results in exactly one valid byte. Because
segment descriptors are always 8 bytes long, the GDT limit should always be one less than an
integral multiple of eight (that is, 8N – 1).
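The limit arithmetic described above can be checked with a short C sketch. The function names are hypothetical. The bounds test on a selector reflects an assumption made for this example: all eight bytes of the selected descriptor must lie at or below the table limit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Limit value for a table holding n descriptors (1 <= n <= 8192): 8N - 1. */
uint16_t table_limit(unsigned n)
{
    return (uint16_t)(8u * n - 1u);
}

/* The limit is added to the base to give the address of the last valid byte. */
uint32_t last_valid_byte(uint32_t base, uint16_t limit)
{
    return base + limit;
}

/* A descriptor occupies bytes index*8 through index*8 + 7 of its table. */
bool selector_within_limit(uint16_t selector, uint16_t limit)
{
    uint32_t last_byte = (uint32_t)(selector >> 3) * 8u + 7u;
    return last_byte <= limit;
}
```

A table of three descriptors therefore has a limit of 23, and a full table of 8192 descriptors has a limit of FFFFH.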
The first descriptor in the GDT is not used by the processor. A segment selector to this “null
descriptor” does not generate an exception when loaded into a data-segment register (DS, ES,
FS, or GS), but it always generates a general-protection exception (#GP) when an attempt is
made to access memory using the descriptor. By initializing the segment registers with this
segment selector, accidental reference to unused segment registers can be guaranteed to generate
an exception.
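A selector refers to the null descriptor when its index is 0 and its TI flag is 0, whatever its requested privilege level. The following minimal sketch assumes the standard selector layout (index in bits 15:3, TI in bit 2, RPL in bits 1:0); the helper names are hypothetical.

```c
#include <stdint.h>

/* Index of the descriptor within its table (bits 15:3). */
unsigned selector_index(uint16_t sel) { return (unsigned)sel >> 3; }

/* Table indicator (bit 2): 0 selects the GDT, 1 selects the current LDT. */
unsigned selector_ti(uint16_t sel) { return ((unsigned)sel >> 2) & 1u; }

/* Requested privilege level (bits 1:0). */
unsigned selector_rpl(uint16_t sel) { return (unsigned)sel & 3u; }

/* A null selector has index 0 and TI = 0; the RPL bits are ignored,
 * so the values 0 through 3 are all null selectors. */
int selector_is_null(uint16_t sel) { return (sel & 0xFFFCu) == 0; }

/* Nonzero if all 8 bytes of the selected descriptor lie within a
 * table whose limit (offset of the last valid byte) is given. */
int selector_within_limit(uint16_t sel, uint16_t limit)
{
    return selector_index(sel) * 8u + 7u <= limit;
}
```

Note that selector value 4 is not a null selector: its index is 0, but TI = 1 makes it refer to the first entry of the LDT.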
The LDT is located in a system segment of the LDT type. The GDT must contain a segment
descriptor for the LDT segment. If the system supports multiple LDTs, each must have a separate segment selector and segment descriptor in the GDT. The segment descriptor for an LDT
can be located anywhere in the GDT. See Section 3.5, “System Descriptor Types”, for information
on the LDT segment-descriptor type.
An LDT is accessed with its segment selector. To eliminate address translations when accessing
the LDT, the segment selector, base linear address, limit, and access rights of the LDT are stored
in the LDTR register (see Section 2.4, “Memory-Management Registers”).
When the GDTR register is stored (using the SGDT instruction), a 48-bit “pseudo-descriptor”
is stored in memory (see top diagram in Figure 3-11). To avoid alignment check faults in user
mode (privilege level 3), the pseudo-descriptor should be located at an odd word address (that
is, address MOD 4 is equal to 2). This causes the processor to store an aligned word, followed
by an aligned doubleword. User-mode programs normally do not store pseudo-descriptors, but
the possibility of generating an alignment check fault can be avoided by aligning pseudo-descriptors in this way. The same alignment should be used when storing the IDTR register
using the SIDT instruction. When storing the LDTR or task register (using the SLDT or STR
instruction, respectively), the pseudo-descriptor should be located at a doubleword address (that
is, address MOD 4 is equal to 0).
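The 48-bit layout and the odd-word alignment rule can be illustrated with a short C sketch. It assumes a compiler that honors #pragma pack (GCC, Clang, and MSVC do); the structure and function names are hypothetical, and the code only checks addresses. It does not execute SGDT.

```c
#include <stddef.h>
#include <stdint.h>

/* 48-bit pseudo-descriptor: a 16-bit limit followed by a 32-bit base.
 * Packing is a compiler extension and is assumed here; without it the
 * compiler would pad the structure to 8 bytes. */
#pragma pack(push, 1)
struct pseudo_descriptor32 {
    uint16_t limit;   /* bits 15:0  */
    uint32_t base;    /* bits 47:16 */
};
#pragma pack(pop)

/* Nonzero if addr is an odd word address (addr MOD 4 == 2). The limit
 * word then sits on a word boundary and the base doubleword, at
 * addr + 2, sits on a doubleword boundary. */
int pseudo_descriptor_is_aligned(uintptr_t addr)
{
    return addr % 4u == 2u;
}

/* Smallest address at or above addr that satisfies addr MOD 4 == 2. */
uintptr_t next_pseudo_descriptor_slot(uintptr_t addr)
{
    return ((addr + 1u) & ~(uintptr_t)3) + 2u;
}
```

One simple way to obtain such an address is to reserve an 8-byte, doubleword-aligned buffer and place the pseudo-descriptor at offset 2 within it.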
[Figure 3-11. Pseudo-Descriptor Formats. Top diagram: 32-bit base address in bits 47:16,
limit in bits 15:0. Bottom diagram: 64-bit base address in bits 79:16, limit in bits 15:0.]
3.5.2 Segment Descriptor Tables in IA-32e Mode
In IA-32e mode, a segment descriptor table can contain up to 8192 (2^13) 8-byte descriptors. An
entry in the segment descriptor table can be 8 bytes. System descriptors are expanded to 16 bytes
(occupying the space of two entries).
The GDTR and LDTR registers are expanded to hold a 64-bit base address. The corresponding
pseudo-descriptor is 80 bits (see the bottom diagram in Figure 3-11).
The following system descriptors expand to 16 bytes:
— Call gate descriptors (see Section 4.8.3.1, “IA-32e Mode Call Gates”)
— IDT gate descriptors (see Section 5.14.1, “64-Bit Mode IDT”)
— LDT and TSS descriptors (see Section 6.2.3, “TSS Descriptor in 64-bit mode”).
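Because each expanded system descriptor occupies two 8-byte entries, the table limit in IA-32e mode depends on the mix of descriptors. The following sketch shows the bookkeeping only; the names are hypothetical and the layout policy (which entries go where) is left to the operating system.

```c
#include <stdint.h>

#define ENTRY_SIZE  8u      /* size of one table entry in bytes   */
#define MAX_ENTRIES 8192u   /* a table holds at most 2^13 entries */

/* Entries consumed by n8 ordinary 8-byte descriptors plus n16
 * expanded 16-byte system descriptors (two entries each). */
unsigned entries_needed(unsigned n8, unsigned n16)
{
    return n8 + 2u * n16;
}

/* Nonzero if the given mix of descriptors fits in one table. */
int fits_in_table(unsigned n8, unsigned n16)
{
    return entries_needed(n8, n16) <= MAX_ENTRIES;
}

/* Table limit (offset of the last valid byte) for the given mix.
 * The caller is expected to check fits_in_table() first. */
uint16_t table_limit_ia32e(unsigned n8, unsigned n16)
{
    return (uint16_t)(entries_needed(n8, n16) * ENTRY_SIZE - 1u);
}
```

As a hypothetical example, a GDT holding the null descriptor, one code-segment descriptor, one data-segment descriptor, and one 16-byte TSS descriptor uses five entries, giving a limit of 39.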
3.6 PAGING (VIRTUAL MEMORY) OVERVIEW
When operating in protected mode, IA-32 architecture permits linear address space to be
mapped directly into a large physical memory (for example, 4 GBytes of RAM) or indirectly
(using paging) into a smaller physical memory and disk storage. This latter method of mapping
the linear address space is referred to as virtual memory or demand-paged virtual memory.
When paging is used, the processor divides the linear address space into fixed-size pages (of
4 KBytes, 2 MBytes, or 4 MBytes in length) that can be mapped into physical memory and/or
disk storage. When a program (or task) references a logical address in memory, the processor
translates the address into a linear address and then uses its paging mechanism to translate the
linear address into a corresponding physical address.
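Because pages have a fixed, power-of-two size, a linear address splits into a page base and an offset within the page by simple masking. The sketch below shows only that arithmetic for the three page sizes named above; the macro and function names are hypothetical, and the actual translation through the page directory and page tables is not modeled here.

```c
#include <stdint.h>

#define PAGE_4K (UINT32_C(1) << 12)   /* 4-KByte page */
#define PAGE_2M (UINT32_C(1) << 21)   /* 2-MByte page */
#define PAGE_4M (UINT32_C(1) << 22)   /* 4-MByte page */

/* Linear address of the first byte of the page containing 'linear'.
 * page_size must be a power of two. */
uint32_t page_base(uint32_t linear, uint32_t page_size)
{
    return linear & ~(page_size - 1u);
}

/* Offset of 'linear' within its page. */
uint32_t page_offset(uint32_t linear, uint32_t page_size)
{
    return linear & (page_size - 1u);
}
```

For the linear address 00403ABCH, a 4-KByte page gives a page base of 00403000H and an offset of ABCH, while a 4-MByte page gives a page base of 00400000H and an offset of 3ABCH.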
If the page containing the linear address is not currently in physical memory, the processor
generates a page-fault exception (#PF). The exception handler for the page-fault exception typically directs the operating system or executive to load the page from disk storage into physical
memory (perhaps writing a different page from physical memory out to disk in the process).
When the page has been loaded in physical memory, a return from the exception handler causes
the instruction that generated the exception to be restarted. The information that the processor
uses to map linear addresses into the physical address space and to generate page-fault exceptions (when necessary) is contained in page directories and page tables stored in memory.
Paging differs from segmentation in its use of fixed-size pages. Unlike segments,
which usually are the same size as the code or data structures they hold, pages have a fixed size.
If segmentation is the only form of address translation used, a data structure present in physical
memory will have all of its parts in memory. If paging is used, a data structure can be partly in
memory and partly in disk storage.
To minimize the number of bus cycles required for address translation, the most recently
accessed page-directory and page-table entries are cached in the processor in devices called
translation lookaside buffers (TLBs). The TLBs satisfy most requests for reading the current
page directory and page tables without requiring a bus cycle. Extra bus cycles occur only when
the TLBs do not contain a page-table entry, which typically happens when a page has not been