THIS DOCUMENT IS PROVIDED “AS IS” WITH NO WARRANTIES WHATSOEVER, INCLUDING ANY WARRANTY OF MERCHANTABILITY,
FITNESS FOR ANY PARTICULAR PURPOSE, OR ANY WARRANTY OTHERWISE ARISING OUT OF ANY PROPOSAL, SPECIFICATION OR
SAMPLE.
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY
ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN
INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS
ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES
RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER
INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING
APPLICATIONS.
Intel may make changes to specifications and product descriptions at any time, without notice.
Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for
future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.
Intel processors based on the Itanium architecture may contain design defects or errors known as errata which may cause the product to deviate from
published specifications. Current characterized errata are available on request.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling
1-800-548-4725, or by visiting Intel's website at http://www.intel.com.
Intel, Intel486, Itanium, Pentium, VTune and MMX are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States
The Intel® Itanium® architecture is a unique combination of innovative features such as explicit
parallelism, predication, speculation and more. The architecture is designed to be highly scalable to
fill the ever increasing performance requirements of various server and workstation market
segments. The Itanium architecture features a revolutionary 64-bit instruction set architecture (ISA)
which applies a new processor architecture technology called EPIC, or Explicitly Parallel
Instruction Computing. A key feature of the Itanium architecture is IA-32 instruction set
compatibility.
The Intel® Itanium® Architecture Software Developer’s Manual provides a comprehensive
description of the programming environment, resources, and instruction set visible to both the
application and system programmer. In addition, it also describes how programmers can take
advantage of the features of the Itanium architecture to help them optimize code.
1.1 Overview of Volume 1: Application Architecture
This volume defines the Itanium application architecture, including application level resources,
programming environment, and the IA-32 application interface. This volume also describes
optimization techniques used to generate high performance software.
1.1.1 Part 1: Application Architecture Guide
Chapter 1, “About this Manual” provides an overview of all volumes in the Intel® Itanium®
Architecture Software Developer’s Manual.
Chapter 2, “Introduction to the Intel® Itanium® Architecture” provides an overview of the
architecture.
Chapter 3, “Execution Environment” describes the Itanium register set used by applications and the
memory organization models.
Chapter 4, “Application Programming Model” gives an overview of the behavior of Itanium
application instructions (grouped into related functions).
Chapter 5, “Floating-point Programming Model” describes the Itanium floating-point architecture
(including integer multiply).
Chapter 6, “IA-32 Application Execution Model in an Intel® Itanium® System Environment”
describes the operation of IA-32 instructions within the Itanium System Environment from the
perspective of an application programmer.
Volume 2: About this Manual 2:1
1.1.2 Part 2: Optimization Guide for the Intel® Itanium® Architecture
Chapter 1, “About the Optimization Guide” gives an overview of the optimization guide.
Chapter 2, “Introduction to Programming for the Intel® Itanium® Architecture” provides an
overview of the application programming environment for the Itanium architecture.
Chapter 3, “Memory Reference” discusses features and optimizations related to control and data
speculation.
Chapter 4, “Predication, Control Flow, and Instruction Stream” describes optimization features
related to predication, control flow, and branch hints.
Chapter 5, “Software Pipelining and Loop Support” provides a detailed discussion on optimizing
loops through use of software pipelining.
Chapter 6, “Floating-point Applications” discusses current performance limitations in
floating-point applications and features that address these limitations.
1.2 Overview of Volume 2: System Architecture
This volume defines the Itanium system architecture, including system level resources and
programming state, interrupt model, and processor firmware interface. This volume also provides a
useful system programmer's guide for writing high performance system software.
1.2.1 Part 1: System Architecture Guide
Chapter 1, “About this Manual” provides an overview of all volumes in the Intel® Itanium®
Architecture Software Developer’s Manual.
Chapter 2, “Intel® Itanium® System Environment” introduces the environment designed to support
execution of Itanium architecture-based operating systems running IA-32 or Itanium
architecture-based applications.
Chapter 3, “System State and Programming Model” describes the Itanium architectural state which
is visible only to an operating system.
Chapter 4, “Addressing and Protection” defines the resources available to the operating system for
virtual to physical address translation, virtual aliasing, physical addressing, and memory ordering.
Chapter 5, “Interruptions” describes all interruptions that can be generated by a processor based on
the Itanium architecture.
Chapter 6, “Register Stack Engine” describes the architectural mechanism which automatically
saves and restores the stacked subset (GR32 – GR127) of the general register file.
Chapter 7, “Debugging and Performance Monitoring” is an overview of the performance
monitoring and debugging resources that are available in the Itanium architecture.
Chapter 8, “Interruption Vector Descriptions” lists all interruption vectors.
Chapter 9, “IA-32 Interruption Vector Descriptions” lists IA-32 exceptions, interrupts and
intercepts that can occur during IA-32 instruction set execution in the Itanium System
Environment.
Chapter 10, “Itanium® Architecture-based Operating System Interaction Model with IA-32
Applications” defines the operation of IA-32 instructions within the Itanium System Environment
from the perspective of an Itanium architecture-based operating system.
Chapter 11, “Processor Abstraction Layer” describes the firmware layer which abstracts processor
implementation-dependent features.
1.2.2 Part 2: System Programmer’s Guide
Chapter 1, “About the System Programmer’s Guide” gives an introduction to the second section of
the system architecture guide.
Chapter 2, “MP Coherence and Synchronization” describes multiprocessing synchronization
primitives and the Itanium memory ordering model.
Chapter 3, “Interruptions and Serialization” describes how the processor serializes execution
around interruptions and what state is preserved and made available to low-level system code when
interruptions are taken.
Chapter 4, “Context Management” describes how operating systems need to preserve Itanium
register contents and state. This chapter also describes system architecture mechanisms that allow
an operating system to reduce the number of registers that need to be spilled/filled on interruptions,
system calls, and context switches.
Chapter 5, “Memory Management” introduces various memory management strategies.
Chapter 6, “Runtime Support for Control and Data Speculation” describes the operating system
support that is required for control and data speculation.
Chapter 7, “Instruction Emulation and Other Fault Handlers” describes a variety of instruction
emulation handlers that Itanium architecture-based operating systems are expected to support.
Chapter 8, “Floating-point System Software” discusses how processors based on the Itanium
architecture handle floating-point numeric exceptions and how the software stack provides
complete IEEE-754 compliance.
Chapter 9, “IA-32 Application Support” describes the support an Itanium architecture-based
operating system needs to provide to host IA-32 applications.
Chapter 10, “External Interrupt Architecture” describes the external interrupt architecture with a
focus on how external asynchronous interrupt handling can be controlled by software.
Chapter 11, “I/O Architecture” describes the I/O architecture with a focus on platform issues and
support for the existing IA-32 I/O port space.
Chapter 12, “Performance Monitoring Support” describes the performance monitor architecture
with a focus on what kind of support is needed from Itanium architecture-based operating systems.
Chapter 13, “Firmware Overview” introduces the firmware model, and how various firmware
layers (PAL, SAL, EFI) work together to enable processor and system initialization, and operating
system boot.
1.2.3 Appendices
Appendix A, “Code Examples” provides OS boot flow sample code.
1.3 Overview of Volume 3: Instruction Set Reference
This volume is a comprehensive reference to the Itanium instruction set, including instruction
format/encoding.
1.3.1 Part 1: Intel® Itanium® Instruction Set Descriptions
Chapter 1, “About this Manual” provides an overview of all volumes in the Intel® Itanium®
Architecture Software Developer’s Manual.
Chapter 2, “Instruction Reference” provides a detailed description of all Itanium instructions,
organized in alphabetical order by assembly language mnemonic.
Chapter 3, “Pseudo-Code Functions” provides a table of pseudo-code functions which are used to
define the behavior of the Itanium instructions.
Chapter 4, “Instruction Formats” describes the instruction formats and their encodings.
Chapter 5, “Resource and Dependency Semantics” summarizes the dependency rules that are
applicable when generating code for processors based on the Itanium architecture.
1.3.2 Part 2: IA-32 Instruction Set Descriptions
Chapter 1, “Base IA-32 Instruction Reference” provides a detailed description of all base IA-32
instructions, organized in alphabetical order by assembly language mnemonic.
Chapter 2, “IA-32 Intel® MMX™ Technology Instruction Reference” provides a detailed
description of all IA-32 Intel® MMX™ technology instructions designed to increase performance
of multimedia intensive applications. Organized in alphabetical order by assembly language
mnemonic.
Chapter 3, “IA-32 SSE Instruction Reference” provides a detailed description of all IA-32
Streaming SIMD Extension (SSE) instructions designed to increase performance of multimedia
intensive applications, and is organized in alphabetical order by assembly language mnemonic.
1.4 Terminology
The following definitions are for terms related to the Itanium architecture and will be used
throughout this document:
Instruction Set Architecture (ISA) – Defines application and system level resources. These
resources include instructions and registers.
Itanium Architecture – The new ISA with 64-bit instruction capabilities, new performance-enhancing
features, and support for the IA-32 instruction set.
IA-32 Architecture – The 32-bit and 16-bit Intel architecture as described in the IA-32 Intel®
Architecture Software Developer’s Manual.
Itanium System Environment – The operating system environment that supports the execution of
both IA-32 and Itanium architecture-based code.
IA-32 System Environment – The operating system privileged environment and resources as
defined by the IA-32 Intel® Architecture Software Developer’s Manual. Resources include virtual
paging, control registers, debugging, performance monitoring, machine checks, and the set of
privileged instructions.
Itanium Architecture-based Firmware – The Processor Abstraction Layer (PAL) and System
Abstraction Layer (SAL).
Processor Abstraction Layer (PAL) – The firmware layer which abstracts processor features that
are implementation dependent.
System Abstraction Layer (SAL) – The firmware layer which abstracts system features that are
implementation dependent.
1.5 Related Documents
The following documents can be downloaded at Intel’s Developer Site at
http://developer.intel.com:
• Intel® Itanium® 2 Processor Reference Manual for Software Development and
Optimization – This document (Document number 251110) describes model-specific
architectural features incorporated into the Intel® Itanium® 2 processor, the second processor
based on the Itanium architecture.
• Intel® Itanium® Processor Reference Manual for Software Development – This document
(Document number 245320) describes model-specific architectural features incorporated into
the Intel® Itanium® processor, the first processor based on the Itanium architecture.
• IA-32 Intel® Architecture Software Developer’s Manual – This set of manuals describes the
Intel 32-bit architecture. They are available from the Intel Literature Department by calling
1-800-548-4725 and requesting Document Numbers 243190, 243191 and 243192.
• Intel® Itanium® Software Conventions and Runtime Architecture Guide – This document
(Document number 245358) defines general information necessary to compile, link, and
execute a program on an Itanium architecture-based operating system.
• Intel® Itanium® Processor Family System Abstraction Layer Specification – This document
(Document number 245359) specifies requirements to develop platform firmware for Itanium
architecture-based systems.
• Extensible Firmware Interface Specification – This document defines a new model for the
interface between operating systems and platform firmware.
1.6 Revision History
Date of Revision: December 2005
Revision Number: 2.2
Description:
Added TF instruction in Vol 3 Ch 2.
Updated IA-32 CPUID I-page in Vol 4 Ch 2.
Add support for the absence of INIT, PMI, and LINT pins in Vol 2, Part I, Section 5.8.
Add text to "ev" field of Vol 2, Section 7.2.1 Table 7.4 to define a PMU external notification mechanism as implementation dependent.
Extensions to PAL procedures to support data poisoning in Vol 2, Part I, Ch 11.
Virtualization Addendum - Requires that processors have a way to enable/disable vmsw instruction in Vol 2, Part I, Sections 2.2, 3.4 and 11.9.3.
Change the description of CR[IFA] and CR[ITIR] to provide hardware the option of checking them for reserved values on a write. Also mention this option in the description of the Translation Insertion Format.
Addition of new return status to PAL_TEST_PROC in Vol 2, Part I, Ch 11.
Fix small holes in INTA/XTP definition in Vol 2, Part I, Sections 5.8.4.3 and Ch 2.
Fix small discrepancies in the cmp8xchg16 definition in Vol 3 Ch 2.
Change rules about overlapping inserts to allow Itanium 2 behavior in Vol 2, Part I, Section 4.1.8.
Update PAL_BUS_GET/SET_FEATURES bit 52 definition in Vol 2 Ch 11.
Allow register fields in CR.LID register to be read-only and CR.LID checking on interruption messages by processors optional. See Vol 2, Part I, Ch 5 “Interruptions” and Section 11.2.2 PALE_RESET Exit State for details.
Relaxed reserved and ignored fields checkings in IA-32 application registers in Vol 1 Ch 6 and Vol 2, Part I, Ch 10.
Introduced visibility constraints between stores and local purges to ensure TLB consistency for UP VHPT update and local purge scenarios. See Vol 2, Part I, Ch 4 and description of ptc.l instruction in Vol 3 for details.
Architecture extensions for processor Power/Performance states (P-states). See Vol 2 PAL Chapter for details.
Introduced Unimplemented Instruction Address fault.
Relaxed ordering constraints for VHPT walks. See Vol 2, Part I, Ch 4 and 5 for details.
Architecture extensions for processor virtualization.
All instructions which must be last in an instruction group results in undefined behavior when this rule is violated.
Added architectural sequence that guarantees increasing ITC and PMD values on successive reads.
Addition of PAL_BRAND_INFO, PAL_GET_HW_POLICY, PAL_MC_ERROR_INJECT, PAL_MEMORY_BUFFER, PAL_SET_HW_POLICY and PAL_SHUTDOWN procedures.
Allows IPI-redirection feature to be optional.
Undefined behavior for 1-byte accesses to the non-architected regions in the IPI block.
Modified insertion behavior for TR overlaps. See Vol 2, Part I, Ch 4 for details.
“Bus parking” feature is now optional for PAL_BUS_GET_FEATURES.
FR32-127 is now preserved in PAL calling convention.
New return value from PAL_VM_SUMMARY procedure to indicate the number of multiple concurrent outstanding TLB purges.
Performance Monitor Data (PMD) registers are no longer sign-extended.
New memory attribute transition sequence for memory on-line delete. See Vol 2, Part I, Ch 4 for details.
Added 'shared error' (se) bit to the Processor State Parameter (PSP) in PAL_MC_ERROR_INFO procedure.
Clarified PMU interrupts as edge-triggered.
Modified ‘proc_number’ parameter in PAL_LOGICAL_TO_PHYSICAL procedure.
Modified pal_copy_info alignment requirements.
New bit in PAL_PROC_GET_FEATURES for variable P-state performance.
Clarified descriptions for check_target_register and check_target_register_sof.
Various fixes in dependency tables in Vol 3 Ch 5.
Clarified effect of sending IPIs to non-existent processor in Vol 2, Part I, Ch 5.
Clarified instruction serialization requirements for interruptions in Vol 2, Part II, Ch 3.
Updated performance monitor context switch routine in Vol 2, Part I, Ch 7.
Date of Revision: October 2002
Revision Number: 2.1
Description:
Added New fc.i Instruction (Sections 4.4.6.1 and 4.4.6.2, Part I, Vol. 1; Sections 4.3.3, 4.4.1, 4.4.5, 4.4.7, 5.5.2, and 7.1.2, Part I, Vol. 2; Sections 2.5, 2.5.1, 2.5.2, 2.5.3, and 4.5.2.1, Part II, Vol. 2; and Sections 2.2, 3, 4.1, 4.4.6.5, and 4.4.10.10, Part I, Vol. 3).
Added New Atomic Operations ld16, st16, cmp8xchg16 (Sections 3.1.8, 3.1.8.6, 4.4.1, 4.4.2, and 4.4.3, Part I, Vol. 1; Section 4.5, Part I, Vol. 2; and Sections 2.2, 3, 5.3.2, and 5.4, Part I, Vol. 3).
Added Spontaneous NaT Generation on Speculative Load (Sections 5.5.5 and 11.9, Part I, Vol. 2 and Sections 2.2 and 3, Part I, Vol. 3).
Added New Hint Instruction (Section 2.2, Part I, Vol. 3).
Added Fault Handling Semantics for lfetch.fault Instruction (Section 2.2, Part I, Vol. 3).
Added Capability for Allowing Multiple PAL_A_SPEC and PAL_B Entries in the Firmware Interface Table (Section 11.1.6, Part I, Vol. 2).
Added BR1 to Min-state Save Area and Clarified Alignment (Sections 11.3.2.3 and 11.3.3, Part I, Vol. 2).
Added New PAL Procedures: PAL_LOGICAL_TO_PHYSICAL and PAL_CACHE_SHARED_INFO (Section 11.9.1, Part I, Vol. 2).
Added Op Fields to PAL_MC_ERROR_INFO (Section 11.9, Part I, Vol. 2).
Added New Error Exit States (Section 11.2.2.2, Part I, Vol. 2).
Added Performance Counter Standardization (Sections 7.2.3 and 11.6, Part I, Vol. 2).
Modified CPUID[4] for Atomic Operations and Spontaneous Deferral (Section 3.1.11, Part I, Vol. 1).
Modified PAL_FREQ_RATIOS (Section 11.2.2, Part I, Vol. 2).
Modified PAL_VERSION (Section 11.9, Part I, Vol. 2).
Modified PAL_CACHE_INFO Store Hints (Section 11.9, Part I, Vol. 2).
Modified PAL_MC_RESUME (Sections 11.3.3 and 11.4, Part I, Vol. 2).
Modified IA_32_Exception (Debug) IIPA Description (Section 9.2, Part I, Vol. 2).
Clarified Predicate Behavior of alloc Instruction (Section 4.1.2, Part I, Vol. 1 and Section 2.2, Part I, Vol. 3).
Clarified ITC clocking (Section 3.1.8.10, Part I, Vol. 1; Section 3.3.4.2, Part I, Vol. 2; and Section 10.5.5, Part II, Vol. 2).
Clarified Interval Time Counter (ITC) Fault (Section 3.3.2, Part I, Vol. 2).
Clarified Interruption Control Registers (Section 3.3.5, Part I, Vol. 2).
Clarified Freeze Bit Functionality in Context Switching and Interrupt Generation (Sections 7.2.1, 7.2.2, 7.2.4.1, and 7.2.4.2, Part I, Vol. 2).
Clarified PAL_BUS_GET/SET_FEATURES (Section 11.9.3, Part I, Vol. 2).
Clarified PAL_CACHE_FLUSH (Section 11.9, Part I, Vol. 2).
Clarified Cache State upon Recovery Check (Section 11.2, Part I, Vol. 2).
Clarified PALE_INIT Exit State (Section 11.4.2, Part I, Vol. 2).
Clarified Processor State Parameter (Section 11.4.2.1, Part I, Vol. 2).
Clarified Firmware Address Space at Reset (Section 11.1, Part I, Vol. 2).
Clarified PAL PMI, AR.ITC, and PMD Register Values (Sections 11.3, 11.5.1, and 11.5.2, Part I, Vol. 2).
Clarified Invalid Arguments to PAL (Section 11.9.2.4, Part I, Vol. 2).
Clarified itr/itc Instructions (Section 2.2, Part I, Vol. 3).
Date of Revision: December 2001
Revision Number: 2.0
Description:
Volume 1:
Faults in ld.c that hits ALAT clarification (Section 4.4.5.3.1).
IA-32 related changes (Section 6.2.5.4, Section 6.2.3, Section 6.2.4, Section
6.2.5.3).
Load instructions change (Section 4.4.1).
Volume 2:
Class pr-writers-int clarification (Table A-5).
PAL_MC_DRAIN clarification (Section 4.4.6.1).
VHPT walk and forward progress change (Section 4.1.1.2).
IA-32 IBR/DBR match clarification (Section 7.1.1).
ISR figure changes (pp. 8-5, 8-26, 8-33 and 8-36).
PAL_CACHE_FLUSH return argument change - added new status return
argument (Section 11.8.3).
PAL self-test Control and PAL_A procedure requirement change - added new
references (Section 4.4.6).
PAL memory accesses and restrictions clarification (Section 11.9).
PSP validity on INITs from PAL_MC_ERROR_INFO clarification (Section
Section 5.5, Section 8.3, and Section 2.2).
Volume 3:
IA-32 CPUID clarification (p. 5-71).
Revised figures for extract, deposit, and alloc instructions (Section 2.2).
RCPPS, RCPSS, RSQRTPS, and RSQRTSS clarification (Section 7.12).
IA-32 related changes (Section 5.3).
tak, tpa change (Section 2.2).
Processor Serial Number feature removed (Chapter 3).
Clarification on exceptions to instruction dependency (Section 3.4.3).
Volume 2:
Clarifications regarding “reserved” fields in ITIR (Chapter 3).
Instruction and Data translation must be enabled for executing IA-32
instructions (Chapters 3,4 and 10).
FCR/FDR mappings, and clarification to the value of PSR.ri after an RFI
(Chapters 3 and 4).
Clarification regarding ordering data dependency.
Out-of-order IPI delivery is now allowed (Chapters 4 and 5).
Content of EFLAG field changed in IIM (p. 9-24).
PAL_CHECK and PAL_INIT calls – exit state changes (Chapter 11).
PAL_CHECK processor state parameter changes (Chapter 11).
PAL_BUS_GET/SET_FEATURES calls – added two new bits (Chapter 11).
PAL_MC_ERROR_INFO call – Changes made to enhance and simplify the
call to provide more information regarding machine check (Chapter 11).
PAL_ENTER_IA_32_Env call changes – entry parameter represents the entry
order; SAL needs to initialize all the IA-32 registers properly before making
this call (Chapter 11).
PAL_CACHE_FLUSH – added a new cache_type argument (Chapter 11).
PAL_SHUTDOWN – removed from list of PAL calls (Chapter 11).
Clarified memory ordering changes (Chapter 13).
Clarification in dependence violation table (Appendix A).
Date of Revision: July 2000 (continued)
Revision Number: 1.1
Description:
Volume 3:
fmix instruction page figures corrected (Chapter 2).
Clarification of “reserved” fields in ITIR (Chapters 2 and 3).
Modified conditions for alloc/loadrs/flushrs instruction placement in bundle/instruction group (Chapters 2 and 4).
IA-32 JMPE instruction page typo fix (p. 5-238).
Processor Serial Number feature removed (Chapter 5).
Date of Revision: January 2000
Revision Number: 1.0
Description:
Initial release of document.
2 Intel® Itanium® System Environment
As described in Section 2.1, “Operating Environments” on page 1:11, the Itanium architecture
features two full operating system environments: the IA-32 System Environment supports IA-32
operating systems, and the Itanium System Environment supports Itanium architecture-based
operating systems. The architectural model also supports a mixture of IA-32 and Itanium
architecture-based application code within an Itanium architecture-based operating system.
The system environment determines the set of processor system resources seen by the operating
system. These resources include: virtual memory management, physical memory attributes,
external interrupt mechanisms, exception and interrupt delivery, machine check architectures,
debug, performance monitoring, control registers, and the set of privileged instructions.
The choice of system environment is made when a processor boots, and is described in Section 2.1,
“Processor Boot Sequence” on page 2:11. Section 2.2 in this chapter defines the Itanium System
Environment.
2.1 Processor Boot Sequence
Figure 2-1 shows the defined boot sequence. Unlike IA-32 processors, which power up in 32-bit
Real Mode, processors in the Itanium processor family power up in the Itanium System
Environment running Itanium architecture-based code. Processor initialization, testing, memory,
and platform initialization/testing are performed by processor firmware. Mechanisms are provided
to execute Real Mode IA-32 boot BIOSs and device drivers during the boot sequence. After the
boot sequence, a determination is made by boot software to continue executing in the Itanium System
Environment (for example, to boot an Itanium architecture-based operating system) or to enter the
IA-32 operating system environment through the PAL_ENTER_IA_32_ENV firmware call. Refer
to Chapter 11, “Processor Abstraction Layer” for details.
Figure 2-1. System Environment Boot Flow
[Figure: flowchart. In the Intel® Itanium® System Environment: Processor Reset; Test & Initialization
(Intel® Itanium® Instructions); Platform Test & Initialization (Intel® Itanium® or IA-32 Instructions);
decision “IA-32_boot?”. No: Itanium® architecture-based OS Boot (Intel® Itanium® Instructions & IA-32
Instructions). Yes: Firmware Call to PAL_ENTER_IA_32_ENV, entering the IA-32 System Environment for
IA-32 OS Boot (IA-32 Instructions Only).]
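The boot flow of Figure 2-1 reduces to one decision taken by boot software once firmware initialization is complete. The Python sketch below models that decision for illustration only. The function name `boot_flow`, the `ia32_boot` flag, and the step strings are hypothetical labels chosen here; only PAL_ENTER_IA_32_ENV is a name taken from the text.

```python
from typing import List


def boot_flow(ia32_boot: bool) -> List[str]:
    """Return the ordered boot steps described for Figure 2-1.

    Illustrative model only: the step labels are hypothetical and simply
    mirror the boxes of the figure.
    """
    # Every boot starts in the Itanium System Environment, where firmware
    # performs processor and platform test and initialization.
    steps = [
        "processor reset",
        "processor test and initialization (Itanium instructions)",
        "platform test and initialization (Itanium or IA-32 instructions)",
    ]
    if ia32_boot:
        # Boot software leaves the Itanium System Environment through the
        # PAL_ENTER_IA_32_ENV firmware call named in the text.
        steps.append("firmware call to PAL_ENTER_IA_32_ENV")
        steps.append("IA-32 OS boot (IA-32 instructions only)")
    else:
        # Boot software stays in the Itanium System Environment.
        steps.append(
            "Itanium architecture-based OS boot (Itanium and IA-32 instructions)"
        )
    return steps
```

Both paths share the same first three steps; the environments diverge only after platform initialization, which is the point the section makes about where the choice of system environment is taken.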
2.2 Intel® Itanium® System Environment Overview
The Itanium System Environment is designed to support execution of Itanium architecture-based
operating systems running IA-32 or Itanium architecture-based applications. IA-32 applications
can interact with Itanium architecture-based operating systems, applications and libraries within
this environment. Both IA-32 application level code and Itanium instructions can be executed by
the operating system and user level software. The entire machine state, including the IA-32 general
registers and floating-point registers, segment selectors and descriptors is accessible to Itanium
architecture-based code. As shown in Figure 2-2, all major IA-32 operating modes are fully
supported.
Figure 2-2. Intel® Itanium® System Environment
[Figure: block diagram. Columns: Real Mode (IA-32 Real Mode Instructions and Segmentation), VM86
(IA-32 VM86 Instructions and Segmentation), Protected Mode (IA-32 PM Instructions and Segmentation),
and Intel® Itanium® Architecture (Intel® Itanium® Instructions). Beneath them: Interruption &
Intercepts; Paging & Interruption Handling in the Intel® Itanium® Architecture.]
In the Itanium system environment, Itanium architecture operating system resources supersede all
IA-32 system resources. Specifically, the IA-32 defined set of control, test, debug, machine check
registers, privilege instructions, and virtual paging algorithms are replaced by the Itanium
architecture system resources. When IA-32 code is running on an Itanium architecture-based
operating system, the processor directly executes all performance critical but non-sensitive IA-32
application level instructions. Accesses to sensitive system resources (interrupt flags, control
registers, TLBs, etc.) are intercepted into the Itanium architecture-based operating system. Using
this set of intervention hooks, an Itanium architecture-based operating system can emulate or
virtualize an IA-32 system resource for an IA-32 application, OS, or device driver.
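The split between directly executed IA-32 instructions and intercepted accesses can be pictured as a simple classification step. The following Python sketch is a rough model under stated assumptions: the resource labels, the `Outcome` enum, and `dispatch_ia32_access` are hypothetical names introduced for illustration and are not architectural identifiers.

```python
from enum import Enum


class Outcome(Enum):
    EXECUTED_DIRECTLY = "executed directly by the processor"
    INTERCEPTED = "intercepted into the Itanium architecture-based OS"


# Hypothetical labels for the sensitive resources listed in the text
# (interrupt flags, control registers, TLBs); not architectural names.
SENSITIVE_RESOURCES = {"interrupt_flag", "control_register", "tlb"}


def dispatch_ia32_access(resource: str) -> Outcome:
    """Classify an IA-32 access by the kind of resource it touches."""
    if resource in SENSITIVE_RESOURCES:
        # The operating system can then emulate or virtualize the resource.
        return Outcome.INTERCEPTED
    # Performance critical, non-sensitive application level instructions.
    return Outcome.EXECUTED_DIRECTLY
```

For example, `dispatch_ia32_access("tlb")` yields `Outcome.INTERCEPTED`, while an access to an ordinary application resource such as a general register is classified as directly executed.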
The Itanium system architecture features are presented in the following chapters:
• Chapter 3, “System State and Programming Model” describes system resources.
• Chapter 4, “Addressing and Protection” describes the virtual memory architecture.
• Chapter 5, “Interruptions” defines the interrupt and exception architecture.
• Chapter 10, “Itanium® Architecture-based Operating System Interaction Model with IA-32
Applications” describes how IA-32 applications interact with Itanium architecture-based
operating systems.
3 System State and Programming Model
This chapter describes the architectural state visible only to an operating system and defines system
state programming models. It covers the functional descriptions of all the system state registers,
descriptions of individual fields in each register, and their serialization requirements. The virtual
and physical memory management details are described in Chapter 4, “Addressing and Protection.”
Interruptions are described in Chapter 5, “Interruptions.”
Note: Unless otherwise noted, references to “interruption” in this chapter refer to IVA-based
interruptions. See “Interruption Definitions” on page 2:89.
3.1 Privilege Levels
Four privilege levels, numbered from 0 to 3, are provided to control access to system instructions,
system registers and system memory areas. Level 0 is the most privileged and level 3 the least
privileged. Application instructions and registers can be accessed at any privilege level. System
instructions and registers defined in this chapter can only be accessed at privilege level 0;
otherwise, a Privileged Operation fault is raised. The processor maintains a Current Privilege Level
(CPL) in the cpl field of the Processor Status Register (PSR). CPL can only be modified by
controlled entry and exit points managed by the operating system. Virtual memory protection
mechanisms control memory accesses based on the Privilege Level (PL) of the virtual page and the
CPL.
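The privilege rule for system instructions and registers can be sketched as a small check. This is a simplified model, not processor pseudo-code: the exception class and both function names are hypothetical Python stand-ins for the fault and the comparison described above.

```python
class PrivilegedOperationFault(Exception):
    """Hypothetical stand-in for the fault raised on a disallowed access."""


def check_system_access(cpl: int) -> None:
    """Allow access to system instructions and registers only at level 0."""
    if cpl not in (0, 1, 2, 3):
        raise ValueError("privilege levels are numbered 0 to 3")
    if cpl != 0:
        raise PrivilegedOperationFault(
            f"system resource accessed at privilege level {cpl}"
        )


def is_more_privileged(a: int, b: int) -> bool:
    """Level 0 is the most privileged, so a lower number means more privilege."""
    return a < b
```

Calling `check_system_access(0)` returns normally, while any call with a level from 1 to 3 raises the modeled fault. Application instructions and registers are outside this check because they can be accessed at any privilege level.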
3.2 Serialization
For all application and system level resources, apart from the control register file, the processor
ensures values written to a register are observed by instructions in subsequent instruction groups.
This is termed data dependency. For example, writes to general registers, floating-point and
application registers are observed by subsequent reads of the same register. (See “Control
Registers” on page 2:26 for control register serialization requirements.) For modifications of
application level resources with side effects, the side effects are ensured by the processor to be
observed by subsequent instruction groups. This is termed implicit serialization. Application
registers (ARs), with the exception of the Interval Time Counter, the User Mask, when modified by
sum, rum, and mov to psr.um, and the Current Frame Marker (CFM), are implicitly serialized. PMD
registers have special serialization requirements as described in “Generic Performance Counter
Registers” on page 2:148. All other application-level resources (GRs, FRs, PRs, BRs, IP, CPUID)
have no side effects and so need not be serialized.
To avoi d serialization overhead in privileged operating system code, system register resources are
not implicitly serialized. The processor does not ensure modification of registers with side effects
are observed by subsequent instruction groups. For system register resources other than control
registers, the processor ensures data dependencies are honored (reads see the results of prior writes
to the same register). See Section 3.3.3, “Control Registers” and Table 3-3 on page 2:26 for control
register serialization requirements. This approach simplifies hardware and allows for more efficient
software operations. For example, during a low level context switch where there is no immediate
use of loaded system registers, these registers can be loaded without any serialization overhead. To
ensure side effects are observed before a dependent instruction is fetched or executed, two
serialization operations are provided: instruction serialization and data serialization.
3.2.1 Instruction Serialization
Instruction serialization ensures that modifications to processor resources are observed before
subsequent instruction group fetches are re-initiated. Software must use an instruction serialization
operation before any instruction group that is dependent upon the modified system resource.
Resource side effects may be observed at any point before the explicit serialization operation.
Modification of the following system resources (if the modification affects instruction fetching)
requires instruction serialization: RR, PKR, ITR, ITC, IBR, PMC, PMD, PSR bits as defined in
“Processor Status Register (PSR)” on page 2:20, and Control Registers as defined in “Control
Registers” on page 2:26.
The instructions Return from Interruption (rfi) and Instruction Serialize (srlz.i) perform
explicit instruction serialization.
An interruption performs an implicit instruction serialization operation, so the first instruction
group in the interruption handler will observe the serialized state.
Instruction Serialization Example:
mov ibr[reg] = reg    // move to instruction debug register
;;                    // end of instruction group
srlz.i                // ensure subsequent instruction fetches observe
                      // modification
;;                    // end of instruction group
inst                  // dependent instruction
Note: The serializing instruction, the instruction to be serialized, and any operations dependent
on the serialization must be in three separate instruction groups.
3.2.2 Data Serialization
Data serialization ensures that modifications to processor resources affecting both execution and
data memory accesses are observed. Software must issue a data serialize operation prior to the
instruction dependent upon the modified resource. Data serialization can be issued within the same
instruction group as the dependent instruction. Resource side effects may be observed at any point
before the explicit serialization operation.
Modification of the following system resources requires data serialization: RR, PKR, DTR, DTC,
DBR, PMC, PMD, PSR bits as defined in “Processor Status Register (PSR)” on page 2:20, and
Control Registers as defined in “Control Registers” on page 2:26.
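As a rough aid, the two resource lists in Sections 3.2.1 and 3.2.2 can be captured in a lookup. The following is an illustrative Python sketch, not part of the architecture definition: the function name required_serialization and the set names are assumptions made for this example, and PSR bits and control registers are left out because their requirements are given per field in the referenced tables.

```python
# Resources whose modification requires instruction serialization when the
# modification affects instruction fetching (Section 3.2.1). ITC here is the
# instruction translation cache, not the Interval Time Counter.
INSTRUCTION_SERIALIZED = {"RR", "PKR", "ITR", "ITC", "IBR", "PMC", "PMD"}

# Resources whose modification requires data serialization (Section 3.2.2).
DATA_SERIALIZED = {"RR", "PKR", "DTR", "DTC", "DBR", "PMC", "PMD"}


def required_serialization(resource, dependent_use):
    """Return 'srlz.i', 'srlz.d', or None for a modified resource.

    dependent_use is 'fetch' when the dependent operation is an instruction
    fetch, or 'data' when it is an execution or data memory access.
    None means the resource is not in the corresponding list above.
    """
    if dependent_use not in ("fetch", "data"):
        raise ValueError("dependent_use must be 'fetch' or 'data'")
    if dependent_use == "fetch" and resource in INSTRUCTION_SERIALIZED:
        return "srlz.i"
    if dependent_use == "data" and resource in DATA_SERIALIZED:
        return "srlz.d"
    return None
```

For example, required_serialization("RR", "fetch") selects srlz.i, while the same register with a dependent data access selects srlz.d.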
The control registers are different from the general registers and other registers. Most control
registers require an explicit data serialization between the writing of a control register and the
reading of that same control register. (See Table 3-3 on page 2:26 for serialization requirements for
specific control registers.)
The Data Serialize (srlz.d) instruction performs explicit data serialization. Instruction
serialization operations (rfi, srlz.i, and interruptions) also perform a data serialization
operation.
Data Serialization Example:
mov rr[reg] = reg     // move into region register
;;                    // end of instruction group
srlz.d                // serialize region register modification
ld                    // perform a dependent load
The serializing instruction and the instruction to be serialized (the one writing the resource) must be
in two different instruction groups. Operations dependent on the serialization and the serialization
can be in the same instruction group, but the srlz instruction must be before the dependent
instruction slot.
3.2.3 Definition of In-flight Resources
When the value of a resource that requires an explicit instruction or data serialization is changed by
one or more writers, that resource is said to be in-flight until the required serialization is
performed. There can be multiple in-flight values if multiple writers have occurred since the last
serialization.
An instruction that reads an in-flight resource will see one of the in-flight values or the state prior to
any of the unserialized writers. However, whether such a reader sees the original or one of the
in-flight values is not predictable.
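The read behavior described above can be illustrated with a toy model. This is a minimal Python sketch under stated assumptions: the class InFlightResource and its methods are hypothetical names for this example, and the random choice only stands in for "not predictable"; real hardware makes no such uniform choice.

```python
import random


class InFlightResource:
    """Toy model of a resource that requires explicit serialization."""

    def __init__(self, value, seed=None):
        self.serialized = value      # most-recently-serialized value
        self.in_flight = []          # values from unserialized writers
        self._rng = random.Random(seed)

    def write(self, value):
        # The new value stays in-flight until serialize() is called.
        self.in_flight.append(value)

    def read(self):
        # A reader may see the serialized value or any in-flight value;
        # which one is unpredictable, so the model picks one at random.
        return self._rng.choice([self.serialized] + self.in_flight)

    def serialize(self):
        # After serialization, the latest written value is the one observed.
        if self.in_flight:
            self.serialized = self.in_flight[-1]
            self.in_flight = []


rr = InFlightResource(0x10)
rr.write(0x20)
rr.write(0x30)
before = {rr.read() for _ in range(100)}   # some subset of {0x10, 0x20, 0x30}
rr.serialize()
after = {rr.read() for _ in range(100)}    # only the latest value remains
```

In this model, reads issued before serialize() can return different values on each call, which is why software cannot rely on any particular in-flight value.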
For a reader of an in-flight resource, this definition includes (but is not limited to) the following
possible outcomes:
• The reader of an in-flight resource may see the most-recently-serialized value or any of the
in-flight values each time it is executed – seeing the value from a particular writer one time
does not guarantee that the same writer’s value will be seen by that reader the next time.
• Multiple readers of an in-flight resource may see different values – each may see the
most-recently-serialized value or any of the in-flight values, independent of what other readers
may see.
• If a single execution of an instruction reads an in-flight resource more than once during its
execution, each read may see a different value.
Thus, the only way to guarantee that the latest value is seen by a reader is to perform the required
serialization.
3.3 System State
The architecture provides a rich set of system register resources for process control, interruption
handling, protection, debugging, and performance monitoring. This section gives an overview of
these resources.
3.3.1 System State Overview
Figure 3-1 shows the set of all defined privileged system register resources. Application state as
defined in “Application Register State” on page 1:21 is also accessible.
• Processor Status Register (PSR) – 64-bit register that maintains control information for the
currently running process. See “Processor Status Register (PSR)” on page 2:20 for complete
details.
• Control Registers (CR) – This register name space contains several 64-bit registers that
capture the state of the processor on an interruption, enable system-wide features, and specify
global processor parameters for interruptions and memory management. See “Control
Registers” on page 2:26 for complete information.
• Interrupt Registers – These registers provide the capability of masking external interrupts,
reading external interrupt vector numbers, programming vector numbers for internal processor
asynchronous events and external interrupt sources. For complete information, see “Interrupts”
on page 2:108.
• Interval Timer Facilities – A 64-bit interval timer is provided for privileged and
non-privileged use and as a time base for performance measurements. Timing facilities are
defined in detail in “Interval Time Counter and Match Register (ITC – AR44 and ITM – CR1)”
on page 2:29.
• Debug Breakpoint Registers (DBR/IBR) – 64-bit Data and 64-bit Instruction Breakpoint
Register pairs (DBR, IBR) can be programmed to fault on reference to a range of virtual and
physical addresses generated by either Itanium or IA-32 instructions. See “Debugging” on
page 2:143 for details. The minimum number of DBR register pairs and IBR register pairs is 4
in any implementation. On some implementations, a hardware debugger may use two or more
of these register pairs for its own use; see “Data and Instruction Breakpoint Registers”.
• Performance Monitors (PMC/PMD) – Performance monitors can be programmed to measure a wide
range of user, operating system, or processor performance values. Performance monitors can be
programmed to measure performance values from either IA-32 or Itanium instructions.
Performance monitors are defined in “Performance Monitoring” on page 2:147. The minimum
number of generic PMC/PMD register pairs in any implementation is 4.
• Banked General Registers – A set of 16 banked 64-bit general purpose registers, GR 16-GR
31, are available as temporary storage and register context when operating in low level
interruption code. See “Banked General Registers” on page 2:37 for complete details.
• Region Registers (RR) – Eight 64-bit region registers specify the identifiers and preferred
page sizes for multiple virtual address spaces. Refer to “Region Registers (RR)” on page 2:53
for complete information.
• Protection Key Registers (PKR) – At least sixteen 64-bit protection key registers contain
protection keys and read, write, execute permissions for virtual memory protection domains.
Please see the processor-specific documentation for further information on the number of
Protection Key Registers implemented on the Itanium processor. Refer to “Protection Keys”
on page 2:54 for details.
Figure 3-1. System Register Model
[Figure: register diagram in two parts. Application register set: general registers (gr0–gr127,
with gr16–gr31 banked) and their NaT bits, floating-point registers (fr0–fr127), predicates
(pr0–pr63), branch registers (br0–br7), Instruction Pointer (IP), Current Frame Marker (CFM),
User Mask, application registers (ar0–ar127), Advanced Load Address Table, processor identifiers
(cpuid), and performance monitor data registers (pmd). System register set: Processor Status
Register (PSR), control registers (cr0–cr81, including the external interrupt control registers),
region registers (rr0–rr7), protection key registers (pkr), debug breakpoint registers (ibr, dbr),
performance monitor configuration registers (pmc), and the Translation Lookaside Buffer
(itr, dtr, itc, dtc).]
• Translation Lookaside Buffer (TLB) – Holds recently used virtual to physical address
mappings. The TLB is divided into Instruction (ITLB), Data (DTLB), Translation Registers
(TR) and Translation Cache (TC) sections. See “Translation Lookaside Buffer (TLB)” on
page 2:43 for complete details. Translation Registers are software managed portions of the
TLB and the Translation Cache section of the TLB is directly managed by the processor.
3.3.2 Processor Status Register (PSR)
The PSR maintains the current execution environment. The PSR is divided into four overlapping
sections (see Figure 3-2): user mask bits (PSR{5:0}), system mask bits (PSR{23:0}), the lower
half (PSR{31:0}), and the entire PSR (PSR{63:0}). PSR fields are defined in Table 3-2 along with
serialization requirements for modification of each field and the state of the field after an
interruption.
The PSR instructions and their serialization requirements are defined in Table 3-1. These
instructions explicitly read or write portions of the PSR. Other instructions also read and write
portions of the PSR as described in Table 3-2 and Table 5-2.
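The four overlapping PSR sections can be expressed as bit masks. The sketch below is illustrative Python only; the helper names bit_range and psr_section are assumptions made for this example, and the bit ranges are taken directly from the PSR{hi:lo} notation used in this section.

```python
def bit_range(hi, lo):
    """Return a mask with bits lo..hi (inclusive) set."""
    return ((1 << (hi - lo + 1)) - 1) << lo


USER_MASK = bit_range(5, 0)      # PSR{5:0}
SYSTEM_MASK = bit_range(23, 0)   # PSR{23:0}
LOWER_HALF = bit_range(31, 0)    # PSR{31:0}
FULL_PSR = bit_range(63, 0)      # PSR{63:0}


def psr_section(psr, mask):
    """Extract one section of a 64-bit PSR value."""
    return psr & mask
```

Because the sections overlap, every user mask bit is also a system mask bit, and every system mask bit is also part of the lower half.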
Table 3-1. Processor Status Register Instructions
Mnemonic          Description                        Operation                      Instr.  Serialization
                                                                                    Type    Required
sum imm           Set user mask from immediate       PSR{5:0} ← PSR{5:0} | imm      M       implicit
rum imm           Reset user mask from immediate     PSR{5:0} ← PSR{5:0} & ~imm     M       implicit
mov psr.um = r2   Move to user mask                  PSR{5:0} ← GR[r2]              M       implicit
mov r1 = psr.um   Move from user mask                GR[r1] ← PSR{5:0}              M       none
ssm imm           Set system mask from immediate     PSR{23:0} ← PSR{23:0} | imm    M       data/inst (a)
rsm imm           Reset system mask from immediate   PSR{23:0} ← PSR{23:0} & ~imm   M       data/inst (a)
mov psr.l = r2    Move to lower PSR                  PSR{31:0} ← GR[r2]             M       data/inst (a)
mov r1 = psr      Move from PSR                      GR[r1] ← PSR{36:35,31:0} (b)   M       none
bsw.0, bsw.1      Bank switch                        PSR{44} ← 0 or 1               B       implicit
rfi               Return From Interruption           PSR{63:0} ← IPSR               B       implicit
a. Based upon the resource being serialized, use data or instruction serialization.
b. All other bits of the PSR read as zero.
The user mask, PSR{5:0}, can be set and cleared by the Set User Mask (sum), Reset User Mask
(rum) and Move to User Mask (mov psr.um=) instructions at any privilege level. For user mask
modifications by sum, rum and mov, the processor ensures all side effects are observed before
subsequent instruction groups.
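The Operation column of Table 3-1 can be modeled on a plain 64-bit integer. This is a hedged Python sketch, not a definition of the instructions: the function names are invented for this example, immediates are simply masked to the affected field (an assumption of the sketch), and privilege checks, faults and serialization are not modeled.

```python
USER_MASK = 0x3F                            # PSR{5:0}
SYSTEM_MASK = 0xFFFFFF                      # PSR{23:0}
LOWER_HALF = 0xFFFFFFFF                     # PSR{31:0}
MOV_FROM_PSR = LOWER_HALF | (0b11 << 35)    # PSR{36:35,31:0}


def sum_(psr, imm):
    """sum imm: PSR{5:0} <- PSR{5:0} | imm."""
    return psr | (imm & USER_MASK)


def rum(psr, imm):
    """rum imm: PSR{5:0} <- PSR{5:0} & ~imm."""
    return psr & ~(imm & USER_MASK)


def ssm(psr, imm):
    """ssm imm: PSR{23:0} <- PSR{23:0} | imm."""
    return psr | (imm & SYSTEM_MASK)


def rsm(psr, imm):
    """rsm imm: PSR{23:0} <- PSR{23:0} & ~imm."""
    return psr & ~(imm & SYSTEM_MASK)


def mov_to_psr_um(psr, gr):
    """mov psr.um = r2: PSR{5:0} <- GR[r2]."""
    return (psr & ~USER_MASK) | (gr & USER_MASK)


def mov_to_psr_l(psr, gr):
    """mov psr.l = r2: PSR{31:0} <- GR[r2]."""
    return (psr & ~LOWER_HALF) | (gr & LOWER_HALF)


def mov_from_psr_um(psr):
    """mov r1 = psr.um: GR[r1] <- PSR{5:0}."""
    return psr & USER_MASK


def mov_from_psr(psr):
    """mov r1 = psr: only PSR{36:35,31:0} is returned; other bits read as zero."""
    return psr & MOV_FROM_PSR
```

For instance, a PSR value with bits 44, 36 and 33 set reads back through mov_from_psr with only bit 36 visible, matching footnote b of Table 3-1.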
The system mask, PSR{23:0}, can be set and cleared by the Set System Mask (ssm) and Reset
System Mask (rsm) instructions. Software must issue the appropriate serialization operation before
dependent instructions. The system mask instructions are privileged.
The lower half of the PSR, PSR{31:0}, can be written with the Move to Lower PSR (mov psr.l=)
instruction. Software must issue the appropriate serialization operation before dependent
instructions. The Move to Lower PSR instruction is privileged.
The PSR can be read with the Move from PSR (mov =psr) instruction. Only PSR{36:35} and
PSR{31:0} are written to the target register by Move from PSR. PSR{63:37} and PSR{34:32} can
only be read after an interruption by reading the state in IPSR. The entire PSR is updated from
IPSR by the Return from Interruption (rfi) instruction. An rfi also implicitly serializes the PSR.
Both Move from PSR and Return from Interruption are privileged.
Table 3-2. Processor Status Register Fields
Field  Bits      Interruption State   Serialization Required
       Description
User Mask = PSR{5:0}
rv     0
       reserved
be     1         DCR.be               data (a)
       Big-Endian – When 1, data memory references are big-endian. When 0, data memory
       references are little-endian. This bit is ignored for IA-32 data references, which are
       always performed little-endian. Instruction fetches are always performed little-endian.
up     2         unchanged            data (a), inst (b)
       User Performance monitor enable – When 1, performance monitors configured as user
       monitors are enabled to count events (including IA-32). When 0, user configured monitors
       are disabled. See “Performance Monitoring” on page 2:147 for details.
ac     3         0                    data (a)
       Alignment Check – When 1, all unaligned data memory references result in an Unaligned
       Data Reference fault. When 0, unaligned data memory references may or may not result in
       an Unaligned Data Reference fault. See “Memory Datum Alignment and Atomicity” on
       page 2:86 for details. Unaligned semaphore references also result in an Unaligned Data
       Reference fault, regardless of the state of PSR.ac. For IA-32 instructions, if PSR.ac is 1
       an unaligned IA-32 data memory reference raises an IA_32_Exception(AlignmentCheck)
       fault. When 0, additional IA-32 control bits as defined in Section 10.6.7, “Memory
       Alignment” also generate alignment checks.
mfl    4         unchanged            data (a)
       Lower (f2 .. f31) floating-point registers written – This bit is set to one when an Intel
       Itanium instruction completes that uses register f2..f31 as a target register. This bit is
       sticky and only cleared by an explicit write of the user mask. When leaving the IA-32
       instruction set, PSR.mfl is set to 1 if PSR.dfl is 0, otherwise PSR.mfl is unmodified.
mfh    5         unchanged            data (a)
       Upper (f32 .. f127) floating-point registers written – This bit is set to one when an Intel
       Itanium instruction completes that uses register f32..f127 as a target register. This bit
       is sticky and only cleared by an explicit write of the user mask. PSR.mfh is unmodified by
       IA-32 instruction set execution.
System Mask = PSR{23:0}
Table 3-2. Processor Status Register Fields (Continued)
Field  Bits      Interruption State   Serialization Required
       Description
ic     13        0                    inst/data (c)
       Interruption Collection – When 1 and an interruption occurs, the current state of the
       processor is loaded in IIP, IPSR, IIM and IFS; and additional registers defined in
       “Interruption Vector Descriptions” on page 2:157. When 0, IIP, IPSR, IIM and IFS are not
       modified on an interruption (see “Writing of Interruption Resources by Vector” on
       page 2:158 for details). When 0, speculative load exceptions result in deferred exception
       behavior, regardless of the state of the DCR and ITLB deferral bits. Processor operation
       is undefined if PSR.ic is 0 and a transition is made to execute IA-32 code.
i      14        0                    clear: implicit serialization; set: data (d)
       Interrupt Bit – When 1 and executing Intel Itanium instructions, unmasked pending
       external interrupts will interrupt the processor by transferring control to the external
       interrupt handler. When 0, pending external interrupts do not interrupt the processor. The
       effect of clearing PSR.i via Reset System Mask (rsm) instructions is observed by the next
       instruction. Toggling PSR.i from one to zero via Move to PSR.l requires data
       serialization. When executing IA-32 instructions, external interrupts are enabled if PSR.i
       and (CFLG.if is 0 or EFLAG.if is 1). NMI interrupts are enabled if PSR.i is 1 regardless
       of EFLAG.if.
pk     15        unchanged            inst/data (e)
       Protection Key enable – When 1 and PSR.it is 1, instruction references (including IA-32)
       check for valid protection keys. When 1 and PSR.dt is 1, data references (including
       IA-32) check for valid protection keys. When 1 and PSR.rt is 1, protection key checks are
       enabled for register stack references. When 0, neither instruction, data, nor register
       stack references are checked for valid protection keys. When PSR.dt, PSR.rt or PSR.it are
       0, PSR.pk is ignored for the corresponding reference.
rv     12:6, 16
       reserved
dt     17        unchanged/0 (j)      inst/data (c)
       Data address Translation – When 1, virtual data addresses are translated and access
       rights checked. When 0, data accesses use physical addressing. PSR.dt must be 1 when
       entering IA-32 code, otherwise processor operation is undefined.
dfl    18        0                    data
       Disabled Floating-point Low register set – When 1, a read or write access to f2 through
       f31 results in a Disabled Floating-Point Register fault. When 1, all IA-32 FP, Intel SSE
       and Intel MMX technology instructions raise a Disabled FP Register fault (regardless of
       whether the instruction actually references f2-31).
dfh    19        0                    data
       Disabled Floating-point High register set – When 1, a read or write access to f32 through
       f127 results in a Disabled Floating-Point Register fault. When 1, a Disabled FP Register
       fault is raised on the first IA-32 target instruction following a br.ia or rfi,
       regardless of whether f32-127 are referenced.
Table 3-2. Processor Status Register Fields (Continued)
Field  Bits      Interruption State   Serialization Required
       Description
sp     20        0                    data
       Secure Performance monitors – Controls the ability of non-privileged code (including
       IA-32 code) to read non-privileged performance monitors. See Table 7-5 on page 2:150 for
       values returned by PMD read instructions. Also, when 0, PSR.up can be modified by user
       mask instructions; otherwise, PSR.up is unchanged by user mask instructions. When 1 or
       CFLG.pce is 0, non-privileged IA-32 performance monitor reads (via rdpmc) raise an
       IA_32_Exception(GPFault).
pp     21        DCR.pp               inst/data (e)
       Privileged Performance monitor enable – When 1, monitors configured as privileged
       monitors are enabled to count events (including IA-32 events). When 0, privileged monitors
       are disabled. See “Performance Monitoring” on page 2:147 for details.
di     22        0                    data
       Disable Instruction set transition – When 1, attempts to switch instruction sets via the
       IA-32 jmpe or br.ia instructions result in a Disabled Instruction Set Transition fault.
       This bit doesn’t restrict instruction set transitions due to interruptions or rfi.
si     23        0                    data
       Secure Interval timer – When 1, the Interval Time Counter (ITC) register is readable only
       by privileged code; non-privileged reads result in a Privileged Register fault. When 0,
       ITC is readable at any privilege level. System software can secure the ITC from
       non-privileged IA-32 access by setting either PSR.si or CFLG.tsd to 1. When secured, an
       IA-32 rdtsc (read time stamp counter) instruction at any privilege level other than the
       most privileged raises an IA_32_Exception(GPfault).
PSR.l = PSR{31:0}
db     24        0                    inst/data (e)
       Debug Breakpoint fault – When 1, data and instruction address breakpoints are enabled and
       can cause a Data/Instruction Debug fault. When 1, IA-32 instruction address breakpoints
       are enabled and can cause an IA_32_Exception(Debug) fault. When 1, IA-32 data address
       breakpoints are enabled and can cause an IA_32_Exception(Debug) trap. When 0, address
       breakpoint faults and traps are disabled.
lp     25        0                    data
       Lower Privilege transfer trap – When 1, a Lower Privilege Transfer trap occurs whenever a
       taken branch lowers the current privilege level (numerically increases). This bit is
       ignored during IA-32 instruction set execution.
tb     26        0                    data
       Taken Branch trap – When 1, the successful completion of a taken branch results in a
       Taken Branch trap. rfi and interruptions cannot raise a Taken Branch trap. When 1,
       successful completion of a taken IA-32 branch results in an IA_32_Exception(Debug) trap.
Table 3-2. Processor Status Register Fields (Continued)
Field  Bits      Interruption State   Serialization Required
       Description
rt     27        unchanged            data
       Register stack Translation – When 1, register stack accesses are translated and access
       rights are checked. When 0, register stack accesses use physical addressing. PSR.dt is
       ignored for register stack accesses. The register stack engine must be in enforced lazy
       mode (RSC.mode = 00) when modifying this bit; otherwise, processor behavior is undefined.
       During IA-32 instruction execution this bit is ignored and the register stack is disabled.
rv     31:28
       reserved
PSR{63:0}
cpl (f) 33:32    0                    rfi (g)
       Current Privilege Level – The current privilege level of the processor (including IA-32).
       Controls accessibility to system registers, instructions and virtual memory pages. A
       value of 0 is most privileged, a value of 3 is least privileged. Written by the rfi, epc,
       and br.ret instructions. PSR.cpl is unchanged by the jmpe and br.ia instructions. PSR.cpl
       cannot be updated by any IA-32 instructions.
is     34        0                    rfi (g), br.ia (h)
       Instruction Set – When 0, Intel Itanium instructions are executing. When 1, IA-32
       instructions are executing. Written by the rfi and br.ia instructions and the IA-32 jmpe
       instruction.
mc     35        unchanged/1 (i)      rfi (g)
       Machine Check abort mask – When 1, machine check aborts are masked. When 0, machine check
       aborts can be delivered (including IA-32 instruction set execution). Processor operation
       is undefined if PSR.mc is 1 and a transition is made to execute IA-32 code.
it     36        unchanged/0 (j)      rfi (g)
       Instruction address Translation – When 1, virtual instruction addresses are translated
       and access rights checked. When 0, instruction accesses use physical addressing. PSR.it
       must be 1 when entering IA-32 code, otherwise processor operation is undefined.
id     37        0                    rfi (g)
       Instruction Debug fault disable – When 1, Instruction Debug faults are disabled on the
       first restart instruction in the current bundle. (k) When PSR.id is 1 or EFLAG.rf is 1,
       IA-32 instruction debug faults are disabled for one IA-32 instruction. PSR.id and
       EFLAG.rf are set to 0 after the successful execution of each IA-32 instruction.
da     38        0                    rfi (g)
       Disable Data Access and Dirty-bit faults – When 1, Data Access and Dirty-Bit faults are
       disabled on the first restart instruction in the current bundle or for the first mandatory
       RSE reference following the rfi. (k) IA-32 Access/Dirty-bit faults are not affected by
       PSR.da. (l)
dd     39        0                    rfi (g)
       Data Debug fault disable – When 1, Data Debug faults are disabled on the first restart
       instruction in the current bundle or for the first mandatory RSE reference. (k) IA-32
       Data Debug traps are not affected by PSR.dd. (l)
ss     40        0                    rfi (g)
       Single Step enable – When 1, a Single Step trap occurs following the successful execution
       of the first restart instruction in the current bundle. Instruction slots 0, 1, and 2 can
       be single stepped. When 1 or EFLAG.tf is 1, an IA_32_Exception(Debug) trap is taken after
       each IA-32 instruction.
Table 3-2. Processor Status Register Fields (Continued)
Field  Bits      Interruption State   Serialization Required
       Description
ri     42:41     instruction pointer  rfi (g)
       Restart Instruction – Set on an interruption, indicating the next instruction in the
       bundle to be executed. When the next instruction is the L+X instruction of an MLX, this
       field is set to the value 1.
       When restarting instructions with rfi, this field in IPSR specifies which instruction(s)
       in the bundle are restarted. The specified and subsequent instructions are restarted; all
       instructions prior to the restart point are ignored.
       0 – restart execution at instruction slot 0
       1 – restart execution at instruction slot 1
       2 – restart execution at instruction slot 2
       3 – reserved
       Except at an interruption and for the first restart instruction following an rfi, the
       value of this field is undefined.
       This field is set to 0 after any interruption from the IA-32 instruction set and is
       ignored when IA-32 instructions are restarted.
ed     43        0                    rfi (g)
       Exception Deferral – When 1, if the first restart instruction in the current bundle is a
       speculative load, the operation is forced to indicate a deferred exception by setting the
       load target register to NaT or NaTVal. No memory references are performed; however, any
       address post increments are performed. If the operation is a speculative advanced load,
       the ALAT entry corresponding to the load address and target register is purged. If the
       operation is an lfetch instruction, memory promotion is not performed; however, any
       address post increments are performed. When 0, exception deferral is not forced on
       restarted speculative loads. If the first restart instruction is not a speculative load or
       lfetch instruction, this bit is ignored. (k, l)
bn     44        0                    implicit (m)
       register Bank – When 1, registers GR16 to GR31 for bank 1 are accessible. When 0,
       registers GR16 to GR31 for bank 0 are accessible. Written by rfi and bsw instructions.
ia     45        0                    rfi (g)
       Disable Instruction Access-bit faults – When 1, Instruction Access-Bit faults are
       disabled on the first restart instruction in the current bundle. (k) IA-32 Access-bit
       faults are not affected by PSR.ia. (l)
vm     46        0                    rfi (g)
       Virtual Machine – When 1, an attempt to execute certain instructions results in a
       Virtualization fault. Implementation of this bit is optional. If the bit is not
       implemented, it is treated as a reserved bit. Written by the rfi and vmsw instructions.
rv     63:47
       reserved
a. User mask bits are implicitly serialized if accessed via user mask instructions: sum, rum, and
move to User Mask. If modified with system mask instructions (rsm, ssm and move to PSR.l),
software must explicitly serialize to ensure side effects are observed before dependent
instructions.
b. User mask modification serialization is implicit only for monitoring data execution events.
Software should issue instruction serialization operations before monitoring instruction events
to achieve better accuracy.
c. Requires instruction serialization to guarantee that VHPT walks initiated on behalf of an
instruction reference observe the new value of this bit. Otherwise, data serialization is
sufficient to guarantee that the new value is observed.
d. The effect of masking external interrupts with rsm is observed by the next instruction.
However, the processor does not ensure unmasking interruptions with ssm is immediately observed.
Software can issue a data serialization operation to ensure the effects of setting PSR.i are
observed before a given point in program execution.
e. Requires instruction or data serialization, based on whether the dependent “use” is an
instruction fetch access or data access.
f. CPL can be modified due to interruptions, Return From Interruption (rfi), Enter Privilege
Code (epc), and Branch Return (br.ret) instructions.
g. Can only be modified by the Return From Interruption (rfi) instruction. rfi performs an
explicit instruction and data serialization operation.
h. Modification of the PSR.is bit by a br.ia instruction is implicitly instruction serialized.
i. PSR.mc is set to 1 after a machine check abort or INIT; otherwise, unmodified on interruptions.
j. After an interruption this bit is normally unchanged; however, after a PAL-based interruption
this bit is set to 0.
k. This bit is set to 0 after the successful execution of each instruction in a bundle except for
rfi which may set it to 1.
l. This bit is ignored when restarting IA-32 instructions and set to zero when br.ia or rfi
successfully complete and before the first IA-32 instruction starts execution.
m. After an interruption, rfi, or bsw the processor ensures register accesses are made to the new
register bank. For interruptions, rfi and bsw, the processor ensures all register accesses and
outstanding loads prior to the bank switch operate on the prior register bank.
3.3.3 Control Registers
Table 3-3 defines all registers in the control register name space along with serialization
requirements to ensure side effects are observed by subsequent instructions. However, reads of a
control register must be data serialized with prior writes to the same register. The serialization
required column only refers to the side effects of the data value.
Writes to read-only registers (IVR, IRR0-3) result in an Illegal Operation fault; accesses to
reserved registers also result in an Illegal Operation fault. Accesses can only be performed by
mov to/from instructions defined in Table 3-4 at privilege level 0; otherwise, a Privileged
Operation fault is raised.
Table 3-3. Control Registers
Register   Name   Description                    Serialization Required
Global Control Registers
CR0        DCR    Default Control Register       inst/data
CR1        ITM    Interval Timer Match register  data (a)
CR2        IVA    Interruption Vector Address    inst (a)
CR3-CR7           reserved
CR8        PTA    Page Table Address             inst/data (b)
CR9-15            reserved
Table 3-3. Control Registers (Continued)
Register   Name   Description                              Serialization Required
Interruption Control Registers
CR16       IPSR   Interruption Processor Status Register   implied (d)
CR17       ISR    Interruption Status Register             implied (c)
CR18              reserved
CR19       IIP    Interruption Instruction Pointer         implied (d)
CR20       IFA    Interruption Faulting Address            implied (d)
CR21       ITIR   Interruption TLB Insertion Register      implied (d)
CR22       IIPA   Interruption Instruction Previous Address implied (c)
CR23       IFS    Interruption Function State              implied (d, e)
CR24       IIM    Interruption Immediate Register          implied (c)
a. Serialization is needed to ensure external interrupt masking, new interval timer match values or
new interruption table addresses are observed before a given point in program execution.
b. Serialization is needed to ensure new values in PTA are visible to the hardware Virtual Hash
Page Table (VHPT) walker before a dependent instruction fetch or data access.
c. These registers are modified by the processor on an interruption or by an explicit move to these
registers. There are no side effects when written.
d. These registers are implied operands to the rfi and/or TLB insert instructions. The processor
ensures writes in previous instruction groups are observed by rfi and/or TLB insert instructions
in subsequent instruction groups. These registers are also modified by the processor on an
interruption; subsequent reads return the results of the interruption. There are no other side
effects.
e. IFS written by a cover instruction followed by a move-from IFS is implicitly serialized.
Table 3-4. Control Register Instructions
Mnemonic       Description                       Operation                            Format
mov cr3 = r2   Move to control register          CR[r3] ← GR[r2]                      M
mov r1 = cr3   Move from control register        GR[r1] ← CR[r3]                      M
srlz.i, rfi    Serialize instruction references  Ensure side effects are observed by  M
                                                 the instruction fetch stream
srlz.d         Serialize data references         Ensure side effects are observed by  M
                                                 the execute and data streams
3.3.4 Global Control Registers
3.3.4.1 Default Control Register (DCR – CR0)
The DCR specifies default parameters for PSR values on interruption, some additional global
controls, and whether speculative load faults can be deferred. Figure 3-3 and Table 3-5 define and
describe the DCR fields.
Figure 3-3. Default Control Register (DCR – CR0)
Bits   63:15  14  13  12  11  10  9   8   7:3  2   1   0
Field  rv     dd  da  dr  dx  dk  dp  dm  rv   lc  be  pp
Width  49     1   1   1   1   1   1   1   5    1   1   1
Table 3-5. Default Control Register Fields
Field  Bit     Serialization  Description
               Required
pp     0       data           Privileged Performance monitor default – On interruption, DCR.pp is
                              loaded into PSR.pp.
be     1       inst           Big-endian default – When 1, Virtual Hash Page Table (VHPT) walker
                              accesses are performed big-endian; otherwise, little-endian. On
                              interruption, DCR.be is loaded into PSR.be.
lc     2       data           IA-32 Lock Check enable – When 1, and an IA-32 atomic memory
                              reference is defined as requiring a read-modify-write operation external to
                              the processor under an external bus lock, an IA_32_Intercept(Lock) is
                              raised. (IA-32 atomic memory references are defined to require an
                              external bus lock for atomicity when the memory transaction is made to
                              non-write-back memory or is unaligned across an
                              implementation-specific non-supported alignment boundary.) When 0,
                              and an IA-32 atomic memory reference is defined as requiring a
                              read-modify-write operation external to the processor under external bus
                              lock, the processor may either execute the transaction as a series of
                              non-atomic transactions or perform the transaction with an external bus
                              lock, depending on the processor implementation. Intel Itanium
                              semaphore accesses ignore this bit. All unaligned Intel Itanium
                              semaphore references generate an Unaligned Data Reference fault. All
                              aligned Intel Itanium semaphore references made to memory that is
                              neither write-back cacheable nor a NaTPage result in an Unsupported
                              Data Reference fault.
dm     8       data           Defer TLB Miss faults only (VHPT data, Data TLB, and Alternate Data
                              TLB faults) – When 1, and a TLB miss is deferred, lower priority Debug
                              faults may still be delivered. A TLB miss fault, deferred or not, precludes
                              concurrent Page not Present, Key Miss, Key Permission, Access Rights,
                              or Access Bit faults. This bit is ignored by IA-32 instructions.
dp     9       data           Defer Page not Present faults only – When 1, and a Page not Present
                              fault is deferred, lower priority Debug faults may still be delivered. A Page
                              not Present fault, deferred or not, precludes concurrent Key Miss, Key
                              Permission, Access Rights, or Access Bit faults. This bit is ignored by
                              IA-32 instructions.
dk     10      data           Defer Key Miss faults only – When 1, and a Key Miss fault is deferred,
                              lower priority Access Bit, Access Rights or Debug faults may still be
                              delivered. A Key Miss fault, deferred or not, precludes concurrent Key
                              Permission faults. This bit is ignored by IA-32 instructions.
dx     11      data           Defer Key Permission faults only – When 1, and a Key Permission fault is
                              deferred, lower priority Access Bit, Access Rights or Debug faults may
                              still be delivered. This bit is ignored by IA-32 instructions.
dr     12      data           Defer Access Rights faults only – When 1, and an Access Rights fault is
                              deferred, lower priority Access Bit or Debug faults may still be delivered.
                              This bit is ignored by IA-32 instructions.
da     13      data           Defer Access Bit faults only – When 1, and an Access Bit fault is
                              deferred, lower priority Debug faults may still be delivered. This bit is
                              ignored by IA-32 instructions.
dd     14      data           Defer Debug faults – When 1, Data Debug faults on speculative loads are
                              deferred. This bit is ignored by IA-32 instructions.
rv     7:3,    reserved       reserved
       63:15
For the DCR exception deferral bits, when the bit is 1, and a speculative load results in the specified
fault condition, and the speculative load’s code page exception deferral bit (ITLB.ed) is 1, the
exception is deferred by setting the speculative load target register to NaT or NaTVal. Otherwise,
the specified fault is taken on the speculative load. For a description of faults on speculative loads
see “Deferral of Speculative Load Faults” on page 2:98.
Since DCR.be also controls byte ordering of VHPT references that are the result of instruction
misses, DCR.be requires instruction serialization. Other DCR bits require data serialization only.
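The deferral rule for the DCR exception deferral bits can be sketched in a few lines of Python. This is a minimal model of only the two conditions stated in this section (the DCR deferral bit for the fault class and ITLB.ed); the function name `speculative_load_outcome` and its string results are illustrative assumptions, and other architectural conditions that affect deferral are not modeled.

```python
# Bit positions of the DCR exception deferral fields, taken from Table 3-5.
DCR_DEFERRAL_BIT = {"dm": 8, "dp": 9, "dk": 10, "dx": 11, "dr": 12, "da": 13, "dd": 14}


def speculative_load_outcome(dcr: int, fault_field: str, itlb_ed: int) -> str:
    """Model the deferral rule for one fault class on a speculative load.

    Returns "deferred" (target register set to NaT or NaTVal) when both the
    DCR deferral bit for the fault class and ITLB.ed are 1; otherwise "fault".
    Hypothetical helper: only the conditions named in this section are modeled.
    """
    dcr_bit = (dcr >> DCR_DEFERRAL_BIT[fault_field]) & 1
    return "deferred" if dcr_bit == 1 and itlb_ed == 1 else "fault"
```

For example, with only DCR.dm set, a TLB miss on a speculative load from a code page with ITLB.ed equal to 1 is deferred, while a Page not Present fault on the same load is taken.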
3.3.4.2 Interval Time Counter and Match Register (ITC – AR44 and ITM – CR1)
The Interval Time Counter (ITC) and Interval Timer Match (ITM) register support elapsed time
notification; see Figure 3-4 and Figure 3-5.
Figure 3-4. Interval Time Counter (ITC – AR44)
Bits   63:0
Field  ITC
Width  64
Figure 3-5. Interval Timer Match Register (ITM – CR1)
Bits   63:0
Field  ITM
Width  64
The ITC is a free-running 64-bit counter that counts up at a fixed relationship to the input clock to
the processor. The ITC may be clocked at a somewhat lower frequency than the instruction
execution frequency. This clocking relationship is described in the PAL procedure
PAL_FREQ_RATIOS on page 2:380. The ITC is guaranteed to be clocked at a constant rate, even
if the instruction execution frequency may vary. The ITC counting rate is not affected by power
management mechanisms.
A sequence of reads of the ITC is guaranteed to return ever-increasing values (except for the case
of the counter wrapping back to 0) corresponding to the program order of the reads. Applications
can directly sample the ITC for time-based calculations.
A 64-bit overflow condition can occur without notification. The ITC can be read at any privilege
level if PSR.si is zero. The timer can be secured from non-privileged access by setting PSR.si to
one. When secured, a read of the ITC by non-privileged code results in a Privileged Register fault.
Writes to the ITC can only be performed at privilege level 0; otherwise, a Privileged Register fault
is raised.
The IA-32 Time Stamp Counter (TSC) is similar to ITC. The ITC can be read by the IA-32 rdtsc
(read time stamp counter) instruction. System software can secure the ITC from non-privileged
IA-32 access by setting either PSR.si or CFLG.tsd to 1. When secured, an IA-32 read of the ITC at
any privilege level other than the most privileged raises an IA_32_Exception(GPfault).
When the value in the ITC is equal to the value in the ITM, an Interval Timer Interrupt is raised.
Once the interruption is taken by the processor and serviced by software, the ITC may not
necessarily be equal to the ITM. The ITM is accessible only at privilege level 0; otherwise, a
Privileged Operation fault is raised.
The interval counter can be written, for initialization purposes, by privileged code. The ITC is not
architecturally guaranteed to be synchronized with any other processor’s interval time counter in a
multiprocessor system, nor is it synchronized with the wall clock. Software must calibrate interval
timer ticks to wall clock time and periodically adjust for drift. In a multiprocessor system, a
processor's ITC is not architecturally guaranteed to be clocked synchronously with the ITCs on
other processors, and may not be clocked at the same nominal clock rate as the ITCs on other
processors. The platform firmware provides information on the clocking of processors in a
multiprocessor system.
Modification of the ITC or ITM is not necessarily serialized with respect to instruction execution.
Software can issue a data serialization operation to ensure the ITC or ITM updates and possible
side effects are observed by a given point in program execution. Software must accept a level of
sampling error when reading the interval timer due to various machine stall conditions,
interruptions, bus contention effects, etc. Please see the processor-specific documentation for
further information on the level of sampling error of the Itanium processor.
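Because the ITC is a free-running 64-bit counter that can wrap to 0 without notification, elapsed-time calculations are usually done with modular subtraction. The sketch below illustrates that idea; `elapsed_ticks`, `ticks_to_seconds`, and the `itc_hz` parameter are hypothetical names, and the tick rate is assumed to come from software calibration against wall clock time as described above.

```python
MASK64 = (1 << 64) - 1  # the ITC is a 64-bit counter


def elapsed_ticks(start: int, end: int) -> int:
    """Ticks between two ITC samples, tolerating a single wrap back to 0."""
    return (end - start) & MASK64


def ticks_to_seconds(ticks: int, itc_hz: float) -> float:
    """Convert ticks to seconds using a calibrated ITC rate (assumed input)."""
    return ticks / itc_hz
```

The result is only meaningful when the two samples are taken on the same processor and less than one full counter period apart, and it still carries the sampling error discussed above.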
3.3.4.3 Interruption Vector Address (IVA – CR2)
The IVA specifies the location of the interruption vector table in the virtual address space, or the
physical address space if PSR.it is 0; see Figure 3-6. The vector table is 32K bytes in size and is
32K-byte aligned. The lower 15 bits of the IVA are ignored when written; reads return zeros. All
upper 49 address bits of IVA must be implemented regardless of the size of the physical and virtual
address space. If an unimplemented virtual or physical address (see “Unimplemented Address Bits”
on page 2:67) is loaded into IVA, and an interruption occurs, processor behavior is unpredictable.
See “IVA-based Interruption Vectors” on page 2:106 for a description of an interruption table
layout.
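The alignment rule for IVA can be illustrated with a small mask calculation. This is only a sketch of the stated behavior (lower 15 bits ignored on write and read back as zero); the helper name `iva_read_after_write` is an assumption made for illustration.

```python
IVT_SIZE = 32 * 1024        # vector table size and alignment, in bytes (2**15)
MASK64 = (1 << 64) - 1      # IVA is a 64-bit register


def iva_read_after_write(value: int) -> int:
    """Value read back from IVA after writing `value`: the lower 15 bits read as zero."""
    return value & MASK64 & ~(IVT_SIZE - 1)
```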
3.3.4.4 Page Table Address (PTA – CR8)
The PTA anchors the Virtual Hash Page Table (VHPT) in the virtual address space. See “Virtual
Hash Page Table (VHPT)” on page 2:56 for a complete definition of the VHPT. Operating systems
must ensure that the table is aligned on a natural boundary; otherwise, processor operation is
undefined. See Figure 3-7 and Table 3-6 for the PTA field definitions.
Figure 3-7. Page Table Address (PTA – CR8)
Bits   63:15  14:9  8   7:2   1   0
Field  base   rv    vf  size  rv  ve
Width  49     6     1   6     1   1
Table 3-6. Page Table Address Fields
Field  Bits     Description
ve     0        VHPT Enable – When 1, the processor is enabled to walk the VHPT.
size   7:2      VHPT Size – VHPT table size in power of 2 increments; table size is 2^size bytes. Size
                generates a mask that is logically AND’ed with the result of the VHPT hash function.
                Minimum VHPT table size is 32K bytes; otherwise, a Reserved Register/Field fault is
                raised (see “Virtual Hash Page Table (VHPT)” on page 2:56). The maximum size is 2^61
                bytes for long format VHPTs, and 2^52 bytes for short format VHPTs.
vf     8        VHPT Format – When 0, 8-byte short format entries are used; when 1, 32-byte long
                format entries are used.
base   63:15    VHPT Base virtual address – Defines the starting virtual address of the VHPT table. Base
                is logically OR’ed with the hash index produced by the VHPT hash function when
                referencing the VHPT. Base must be on a 2^size boundary; otherwise, processor operation is
                undefined. All base address bits of PTA must be implemented regardless of the size of
                the physical and virtual address space. If an unimplemented virtual address (see
                “Unimplemented Address Bits” on page 2:67) is used by the processor as a page table
                base, all VHPT walks generate an Instruction/Data TLB miss (see “Translation Searching”
                on page 2:63).
rv     1, 14:9  reserved
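As an illustration of the PTA layout, the sketch below splits a 64-bit PTA value into the fields of Table 3-6 and checks the size and alignment constraints stated there. The function names `decode_pta` and `pta_problems` and the wording of the returned messages are assumptions made for this example; the sketch does not model the fault delivery itself.

```python
MASK64 = (1 << 64) - 1


def decode_pta(pta: int) -> dict:
    """Split a 64-bit PTA value into the fields of Table 3-6."""
    return {
        "ve": pta & 1,                     # bit 0
        "size": (pta >> 2) & 0x3F,         # bits 7:2
        "vf": (pta >> 8) & 1,              # bit 8
        "base": pta & MASK64 & ~0x7FFF,    # bits 63:15, kept in place as an address
    }


def pta_problems(pta: int) -> list:
    """List the constraints from Table 3-6 that a PTA value violates."""
    fields = decode_pta(pta)
    problems = []
    if fields["size"] < 15:                # 2**15 bytes is the 32K-byte minimum
        problems.append("size below 32K bytes: Reserved Register/Field fault")
    max_size = 61 if fields["vf"] else 52  # long format vs. short format
    if fields["size"] > max_size:
        problems.append("size above the maximum for this VHPT format")
    if fields["base"] & ((1 << fields["size"]) - 1):
        problems.append("base not on a 2**size boundary: operation undefined")
    return problems
```

For instance, a long-format VHPT of 2^16 bytes based at address 2^20 decodes cleanly and reports no problems, while the same size with a base of 2^15 is flagged as misaligned.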
3.3.5 Interruption Control Registers
Registers CR16 - CR25 record information at the time of an interruption (including from the IA-32
instruction set) and are used by handlers to process the interruption.
The interruption control registers can only be read or written while PSR.ic is 0; otherwise, an
Illegal Operation fault is raised. These registers are only guaranteed to retain their values when
PSR.ic is 0. When PSR.ic is 1, the processor does not preserve their contents.
The contents of the interruption control registers are defined only when the PSR.ic bit is cleared by
an interruption. If the PSR.ic bit is explicitly cleared (e.g., by using rsm, or mov to PSR), then the
contents of these registers are undefined. If the PSR.ic bit is explicitly set (e.g., by using ssm, or
mov to PSR), then the contents of these registers are undefined until the PSR.ic bit has been
serialized and an interruption occurs.
IIPA has special behavior in case of an rfi to a fault. Refer to “Interruption Instruction Previous
Address (IIPA – CR22)” on page 2:35.
3.3.5.1 Interruption Processor Status Register (IPSR – CR16)
On an interruption and if PSR.ic is 1, the IPSR receives the value of the PSR. The IPSR, IIP and
IFS are used to restore processor state on a Return From Interruption (rfi). The IPSR has the same
format as PSR; see “Processor Status Register (PSR)” on page 2:20 for details.
3.3.5.2 Interruption Status Register (ISR – CR17)
The ISR receives information related to the nature of the interruption, and is written by the
processor on all interruption events regardless of the state of PSR.ic, except for Data Nested TLB
faults. The ISR contains information about the excepting instruction and its properties such as
whether it was doing a read, write, execute, speculative, or non-access operation; see Figure 3-8
and Table 3-7. Multiple bits may be concurrently set in the ISR; for example, a faulting semaphore
operation will set both ISR.r and ISR.w, and faults on speculative loads will set ISR.sp and ISR.r.
Additional fault- or trap-specific information is available in ISR.code and ISR.vector. Refer to
Section 8.2, “ISR Settings” for complete definition of the ISR field settings.
Figure 3-8. Interruption Status Register (ISR – CR17)
Table 3-7. Interruption Status Register Fields
Field   Bits     Description
code    15:0     Interruption Code – 16 bit code providing additional information specific to the current
                 interruption. For IA-32 specific exceptions and software interrupts, contains the IA-32
                 interruption error code or zero.
vector  23:16    IA-32 exception/interception vector number. For IA-32 exceptions and software
                 interrupts, contains the IA-32 vector number (e.g., GPFault has a vector number of
                 13). See Chapter 9, “IA-32 Interruption Vector Descriptions” for details.
x       32       Execute exception – Interruption is associated with an instruction fetch (including
                 IA-32).
w       33       Write exception – Interruption is associated with a write operation. Both ISR.r and
                 ISR.w are set for IA-32 read-modify-write instructions.
r       34       Read exception – Interruption is associated with a read operation. Both ISR.r and
                 ISR.w are set for IA-32 read-modify-write instructions.
na      35       Non-access exception – See Section 5.5.2, “Non-access Instructions and
                 Interruptions” on page 2:97. This bit is always 0 for interruptions taken in the IA-32
                 instruction set.
sp      36       Speculative load exception – Interruption is associated with a speculative load
                 instruction. This bit is always 0 for interruptions taken in the IA-32 instruction set.
rs      37       Register Stack – Interruption is associated with a mandatory RSE fill or spill. This bit is
                 always 0 for interruptions taken in the IA-32 instruction set.
ir      38       Incomplete Register frame – The current register frame was incomplete when the
                 interruption occurred. This bit is always 0 for interruptions taken in the IA-32 instruction
                 set.
ni      39       Nested Interruption – Indicates that PSR.ic was 0 or in-flight when the interruption
                 occurred. This bit is always 0 for interruptions taken in the IA-32 instruction set.
so      40       IA-32 Supervisor Override – Indicates the fault occurred during an IA-32 instruction set
                 supervisor override condition (the processor was performing a data memory access
                 to the IDT, GDT, LDT or TSS segments) or an IA-32 data memory access at a privilege
                 level of zero. This bit is always 0 for interruptions taken while executing Intel Itanium
                 instructions.
ei      42:41    Excepting Instruction –
                 0 – exception due to instruction in slot 0
                 1 – exception due to instruction in slot 1
                 2 – exception due to instruction in slot 2
                 For faults and external interrupts, ISR.ei is equal to IPSR.ri. For traps, ISR.ei defines
                 the slot of the excepting instruction. Traps on the L+X instruction of an MLX set ISR.ei
                 to 2. This field is always 0 for interruptions taken in the IA-32 instruction set.
ed      43       Exception Deferral – This bit is set to the value of the TLB exception deferral bit
                 (TLB.ed) for the instruction page containing the faulting instruction. If a translation
                 does not exist or instruction translation is disabled, or if the interruption is caused by a
                 mandatory RSE spill or fill, ISR.ed is set to 0. This bit is always 0 for interruptions taken
                 in the IA-32 instruction set.
rv      31:24,   reserved
        63:44
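An interruption handler typically pulls individual fields out of the ISR value. The sketch below shows one way to do that from the bit positions in Table 3-7; the `ISR_FIELDS` table and the `decode_isr` helper are illustrative names, not part of the architecture.

```python
# (least significant bit, width) for each ISR field, from Table 3-7.
ISR_FIELDS = {
    "code": (0, 16), "vector": (16, 8),
    "x": (32, 1), "w": (33, 1), "r": (34, 1), "na": (35, 1),
    "sp": (36, 1), "rs": (37, 1), "ir": (38, 1), "ni": (39, 1),
    "so": (40, 1), "ei": (41, 2), "ed": (43, 1),
}


def decode_isr(isr: int) -> dict:
    """Return every ISR field as an integer, keyed by field name."""
    return {name: (isr >> lsb) & ((1 << width) - 1)
            for name, (lsb, width) in ISR_FIELDS.items()}
```

A faulting semaphore operation, for example, would decode with both `r` and `w` equal to 1, matching the description in Section 3.3.5.2.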
3.3.5.3 Interruption Instruction Pointer (IIP – CR19)
On an interruption and if PSR.ic is 1, the IIP receives the value of IP. IIP contains the virtual
address (or physical if instruction translations are disabled) of the next instruction bundle or the
IA-32 instruction to be executed upon return from the interruption. For IA-32 instruction addresses,
IIP is zero extended to 64-bits and specifies a byte granular address. For traps and interrupts, IIP
points to the next instruction to execute. For faults, IIP points to the faulting instruction. As shown
in Figure 3-9, all 64-bits of the IIP must be implemented regardless of the size of the physical and
virtual address space supported by the processor model (see “Unimplemented Address Bits” on
page 2:67). IIP also receives byte-aligned IA-32 instruction pointers. The IIP, IPSR and IFS are
used to restore processor state on a Return From Interruption instruction (rfi). See “Interruption
Vector Descriptions” on page 2:157 for usages of the IIP.
An rfi to Itanium architecture-based code (IPSR.is is 0) ignores IIP{3:0}; an rfi to IA-32 code
(IPSR.is is 1) ignores IIP{63:32}. Ignored bits are assumed to be zero.
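The rule for ignored IIP bits on an rfi amounts to two masks. The following sketch shows them under the assumption that the ignored bits are simply treated as zero; `rfi_target_address` is a hypothetical helper name.

```python
MASK64 = (1 << 64) - 1


def rfi_target_address(iip: int, ipsr_is: int) -> int:
    """Address an rfi would resume at, with the ignored IIP bits treated as zero."""
    if ipsr_is:                       # returning to IA-32 code
        return iip & 0xFFFF_FFFF      # IIP{63:32} are ignored
    return iip & MASK64 & ~0xF        # IIP{3:0} are ignored (16-byte bundle address)
```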
Control transfers to unimplemented addresses (see “Unimplemented Address Bits” on page 2:67)
result in an Unimplemented Instruction Address trap or fault. When the trap or fault is delivered,
IIP is written as follows:
• If the trap is taken for an unimplemented virtual address, IIP is written in one of two ways,
depending on the implementation: 1) IIP may be written with the implemented virtual address
bits IP{63:61} and IP{IMPL_VA_MSB:0} only. Bits IIP{60:IMPL_VA_MSB+1} are set to
IP{IMPL_VA_MSB}, i.e., sign-extended. 2) IIP may be written with the full, unimplemented
virtual address from IP.
• If the trap is taken for an unimplemented physical address, IIP is written in one of two ways,
depending on the implementation: 1) IIP may be written with the physical addressing memory
attribute bit IP{63} and the implemented physical address bits IP{IMPL_PA_MSB:0} only.
Bits IIP{62:IMPL_PA_MSB+1} are set to 0. 2) IIP may be written with the full,
unimplemented physical address from IP.
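Option 1 in each of the two cases above can be written out as bit manipulation. The sketch below is a model of those two formulas only; the function names are assumptions, and the values of IMPL_VA_MSB and IMPL_PA_MSB are implementation-specific, so they are passed in as parameters rather than fixed.

```python
def iip_for_unimplemented_va(ip: int, impl_va_msb: int) -> int:
    """Option 1 for a virtual address: keep IP{63:61} and IP{IMPL_VA_MSB:0},
    and fill IIP{60:IMPL_VA_MSB+1} with copies of IP{IMPL_VA_MSB}."""
    region = ip & (0x7 << 61)
    low_mask = (1 << (impl_va_msb + 1)) - 1
    fill_mask = ((1 << 61) - 1) & ~low_mask        # bits 60:IMPL_VA_MSB+1
    sign = (ip >> impl_va_msb) & 1
    return region | (fill_mask if sign else 0) | (ip & low_mask)


def iip_for_unimplemented_pa(ip: int, impl_pa_msb: int) -> int:
    """Option 1 for a physical address: keep IP{63} and IP{IMPL_PA_MSB:0},
    and set IIP{62:IMPL_PA_MSB+1} to 0."""
    return (ip & (1 << 63)) | (ip & ((1 << (impl_pa_msb + 1)) - 1))
```

Under option 2, IIP simply holds the full unimplemented address from IP, so no transformation is needed.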
When an rfi is executed with an unimplemented address in IIP (an unimplemented virtual address
if IPSR.it is 1, or an unimplemented physical address if IPSR.it is 0), and an Unimplemented
Instruction Address trap is taken, an implementation may optionally leave IIP unchanged
(preserving the unimplemented address in IIP).
Note: Since IP{3:0} are always 0 when executing Itanium architecture-based code, IIP{3:0} will
always be 0 when any interruption is taken from Itanium architecture-based code, with the
exception of an Unimplemented Instruction Address trap on an rfi, where IIP{3:0} may
optionally be preserved as whatever value it held before executing the rfi.
3.3.5.4 Interruption Faulting Address (IFA – CR20)
On an interruption and if PSR.ic is 1, the IFA receives the virtual address (or physical address if
translations are disabled) that raised a fault. IFA reports the faulting address for both instruction and
data memory accesses (including IA-32). For faulting data references (including IA-32), IFA points
to the first byte of the faulting data memory operand. IFA reports a byte granular address. For
faulting instruction references (including IA-32), IFA contains the 16-byte aligned bundle address
(IFA{3:0} are zero) of the faulting instruction. For faulting IA-32 instructions, IIP points to the first
byte of the IA-32 instruction, and is byte granular. In the event of an IA-32 instruction spanning a
virtual page boundary, IA-32 instruction fetch faults are reported in one of two ways: (1) for faults on the first
page, IFA is set to the bundle address (IFA{3:0}=0) of the faulting instruction and IIP points to the
first byte of the faulting instruction, or (2) for faults on the second page, IFA contains the bundle
address of the second virtual page and IIP points to the first byte of the faulting IA-32 instruction.
The IFA also specifies a translation’s virtual address when a translation entry is inserted into the
instruction or data TLB. See “Interruption Vector Descriptions” on page 2:157 and “Translation
Insertion Format” on page 2:48 for usages of the IFA. As shown in Figure 3-10, all 64-bits of the
IFA must be implemented regardless of the size of the virtual and physical space supported by the
processor model (see “Unimplemented Address Bits” on page 2:67). In some implementations, a
mov to IFA instruction may raise an Unimplemented Data Address fault if an unimplemented
virtual address is used.
3.3.5.5 Interruption TLB Insertion Register (ITIR – CR21)
The ITIR receives default translation information from the referenced virtual region register on a
virtual address translation fault. See “Interruption Vector Descriptions” on page 2:157 for the fault
conditions that set the ITIR. The ITIR provides additional virtual address translation parameters on
an insertion into the instruction or data TLB. See “Translation Instructions” on page 2:55 for ITIR
usage information. Figure 3-11 and Table 3-8 define the ITIR fields.
Field  Bits     Description
rv/ci  1:0,     Reserved / Check on Insert – On a read these fields may return zeros or the value last
       63:32    written to them. If a non-zero value is written, a Reserved Register/Field fault may be
                raised on the mov to ITIR instruction. If not, a subsequent TLB insert will raise a
                Reserved Register/Field fault depending on other parameters to the insert. See
                “Translation Insertion Format” on page 2:48. On an instruction or data translation fault,
                these fields are set to zero.
ps     7:2      Page Size – On a TLB insert, specifies the size of the virtual to physical address
                mapping. If an unsupported page size is written, a Reserved Register/Field fault may be
                raised on the mov to ITIR instruction. If not, a subsequent TLB insert will raise a
                Reserved Register/Field fault. See “Translation Insertion Format” on page 2:48. On an
                instruction or data translation fault, this field is set to the accessed region’s page size
                (RR.ps).
key    31:8     Protection Key – On a TLB insert, specifies a protection key that uniquely tags
                translations to a protection domain. If non-zero values are written to unimplemented
                protection key bits, a Reserved Register/Field fault may be raised on the mov to ITIR
                instruction. If not, a subsequent TLB insert will raise a Reserved Register/Field fault
                depending on other parameters to the insert. See “Translation Insertion Format” on
                page 2:48. On an instruction or data translation fault, this field is set to the accessed
                Region Identifier (RR.rid).
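The ITIR fields can be packed and unpacked with shifts and masks. The sketch below assumes the layout given by the ps (7:2) and key (31:8) bit ranges, with all remaining bits (1:0 and 63:32) treated as the reserved / check-on-insert bits; `make_itir` and `decode_itir` are hypothetical helper names, and no check is made for unsupported page sizes or unimplemented key bits, which are implementation-specific.

```python
RESERVED_MASK = 0x3 | (0xFFFF_FFFF << 32)   # bits 1:0 and 63:32 (assumed reserved)


def make_itir(ps: int, key: int) -> int:
    """Build an ITIR value from a 6-bit page size field and a 24-bit protection key."""
    if not 0 <= ps < (1 << 6) or not 0 <= key < (1 << 24):
        raise ValueError("ps must fit in 6 bits and key in 24 bits")
    return (key << 8) | (ps << 2)


def decode_itir(itir: int) -> dict:
    """Extract ps and key, and flag any non-zero reserved bits."""
    return {
        "ps": (itir >> 2) & 0x3F,
        "key": (itir >> 8) & 0xFFFFFF,
        "reserved_nonzero": bool(itir & RESERVED_MASK),
    }
```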
3.3.5.6 Interruption Instruction Previous Address (IIPA – CR22)
For Itanium instructions, IIPA records the last successfully executed instruction bundle address. For
IA-32 instructions, IIPA records the byte granular virtual instruction address zero extended to
64-bits of the faulting or trapping IA-32 instruction. In the case of a fault, IIPA does not report the
address of the last successfully executed IA-32 instruction, but rather the address of the faulting
IA-32 instruction. IIPA preserves bits 3:0 for byte aligned IA-32 instruction addresses.
The IIPA can be used by software to locate the address of the instruction bundle or IA-32
instruction that raised a trap or the instruction executed prior to a fault or interruption. In the case of
a branch related trap, IIPA points to the instruction bundle which contained the branch instruction
that raised the trap, while IIP points to the target of the branch.
When an instruction successfully executes without a fault, and the PSR.ic bit was 1 prior to
instruction execution, it becomes the “last successfully executed instruction.” On interruptions,
IIPA contains the address of the last successfully executed instruction bundle or IA-32 instruction,
if PSR.ic was 1 prior to the interruption. Note that execution of an rfi instruction with PSR.ic
equal to 0, but which sets PSR.ic to 1, does not update IIPA, since PSR.ic was zero prior to
instruction execution.
When PSR.ic is one, accesses to IIPA cause an Illegal Operation fault. When PSR.ic is zero, IIPA is
not updated by hardware and can be read and written by software. This permits low-level code to
preserve IIPA across interruptions.
If the PSR.ic bit is explicitly cleared, e.g., by using rsm, then the contents of IIPA are undefined.
Only when the PSR.ic bit is cleared by an interruption is the value of IIPA defined. It may point at
the instruction which caused a trap, or at the instruction just prior to a faulting instruction, at an
earlier instruction that became defined by some prior interruption, or by a move to IIPA instruction
when PSR.ic was zero.
If the PSR.ic bit is explicitly set, e.g., by using ssm, then the contents of IIPA are undefined until
the PSR.ic bit has been serialized and an interruption occurs.
During instruction set transitions the following boundary cases exist:
• On faults taken on the first IA-32 instruction after a br.ia or rfi, IIPA records the faulting
IA-32 instruction address.
• On br.ia traps, IIPA records the address of the trapping instruction bundle.
• On faults taken on the first Itanium instruction after leaving the IA-32 instruction set, due to a
jmpe or interruption, IIPA contains the address of the jmpe instruction or the interrupted IA-32
instruction.
• On jmpe Data Debug, Single Step and Taken Branch traps, IIPA contains the address of the
jmpe instruction.
As shown in Figure 3-12, all 64-bits of the IIPA must be implemented regardless of the size of the
physical and virtual address space supported by the processor model (see “Unimplemented Address
Bits” on page 2:67).
3.3.5.7 Interruption Function State (IFS – CR23)
The IFS register is used to reload the current register stack frame (CFM) on a Return From
Interruption (rfi). If the IFS is accessed while PSR.ic is 1, an Illegal Operation fault is raised. The
IFS can only be accessed at privilege level 0; otherwise, a Privileged Operation fault is raised. The
IFS.v bit is cleared on interruption if PSR.ic is 1. All other fields are undefined after an
interruption. If PSR.ic is 0, the cover instruction copies CFM to IFS.ifm and sets IFS.v to 1. See
Figure 3-13 and Table 3-9 for the IFS field definitions.
Figure 3-13. Interruption Function State (IFS – CR23)
Bits   63  62:38  37:0
Field  v   rv     ifm
Width  1   25     38
Table 3-9. Interruption Function State Fields
Field  Bits   Description
ifm    37:0   Interruption Frame Marker
v      63     Valid bit, cleared to 0 on interruption if PSR.ic is 1.
rv     62:38  reserved
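The IFS layout and the effect of cover when PSR.ic is 0 can be modeled with two small functions. This is a sketch of the register value only, using the bit positions from Figure 3-13; `ifs_after_cover` and `decode_ifs` are illustrative names, and the register stack side effects of cover are not modeled.

```python
IFM_MASK = (1 << 38) - 1   # IFS.ifm occupies bits 37:0
V_BIT = 1 << 63            # IFS.v is bit 63


def ifs_after_cover(cfm: int) -> int:
    """IFS value after a cover executed with PSR.ic = 0: ifm = CFM, v = 1."""
    return V_BIT | (cfm & IFM_MASK)


def decode_ifs(ifs: int) -> dict:
    """Extract the valid bit and the interruption frame marker."""
    return {"v": (ifs >> 63) & 1, "ifm": ifs & IFM_MASK}
```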
3.3.5.8 Interruption Immediate (IIM – CR24)
If PSR.ic is 1, the IIM (Figure 3-14) records the zero-extended immediate field encoded in chk.a,
chk.s, fchkf or break instruction faults. The break.b instruction always writes a zero value and
ignores its immediate field. The IA_32_Intercept vector writes all 64-bits of IIM to indicate the
cause of the intercept. See Table 8-1 on page 2:158 for the value of IIM in other situations. For the
purpose of resource dependency, IIM is written as a result of the fault, not by the instruction itself.
Figure 3-14. Interruption Immediate (IIM – CR24)
Bits   63:0
Field  Interruption Immediate
Width  64
3.3.5.9 Interruption Hash Address (IHA – CR25)
The IHA (Figure 3-15) is loaded with the address of the Virtual Hash Page Table (VHPT) entry the
processor referenced or would have referenced to resolve a translation fault. The IHA is written on
interruptions by the processor when PSR.ic is 1. Refer to “VHPT Hashing” on page 2:59 for
complete details. See Table 8-1 on page 2:158 for the value of IHA in other situations. All upper 62
address bits of IHA must be implemented regardless of the size of the virtual address space
supported by the processor model (see “Unimplemented Address Bits” on page 2:67). The virtual
address written to IHA by the processor is guaranteed to be an implemented virtual address on all
processor models; however, if the address referenced by the VHPT is an unimplemented virtual
address, the value of IHA is undefined.
3.3.6 External Interrupt Control Registers
The external interrupt control registers (CR64-81) are defined in “External Interrupt Control
Registers” on page 2:115. They are used to prioritize and deliver external interrupts, send
inter-processor interrupts to other processors, and assign interrupt vectors for locally generated
processor interrupts.
3.3.7 Banked General Registers
Banked general registers (see Figure 3-16) provide immediate register context for low-level
interruption handlers (e.g., speculation and TLB miss handlers). Upon interruption, the processor
switches 16 general purpose registers (GR16 to GR31) to register bank 0; register bank 1 contents
are preserved.
When PSR.bn is 1, bank 1 for registers GR16 to GR31 is selected; when 0, bank 0 for registers
GR16 to GR31 is selected. Banks are switched in the following cases:
• An interruption selects bank 0,
• rfi switches to the bank specified by IPSR.bn, or
• bsw switches to the specified bank.
On an interruption or bank switch, the processor ensures all prior register accesses (reads and
writes) are performed to the prior register bank. Data values in banked registers are preserved
across bank switches and both banks maintain NaT values when loaded from general registers.
Registers from both banks cannot be addressed at the same time. However, non-banked general
registers (GR0-15, and GR32-127) are accessible regardless of the state of PSR.bn.
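The bank selection rules can be captured in a toy register-file model. The class below is a sketch for illustration only: the class and method names are assumptions, register values are plain integers, and NaT bits, privilege checks, and the ordering guarantees on bank switches are not modeled.

```python
class BankedGeneralRegisters:
    """Toy model of GR0-GR127 with two banks for GR16-GR31 (values only)."""

    def __init__(self) -> None:
        self.static = [0] * 128              # storage used for GR0-15 and GR32-127
        self.banks = [[0] * 16, [0] * 16]    # bank 0 and bank 1 copies of GR16-31
        self.bn = 1                          # models PSR.bn

    def _slot(self, n: int):
        if 16 <= n <= 31:
            return self.banks[self.bn], n - 16   # banked: selected by PSR.bn
        return self.static, n                    # non-banked: same storage either way

    def read(self, n: int) -> int:
        store, index = self._slot(n)
        return store[index]

    def write(self, n: int, value: int) -> None:
        store, index = self._slot(n)
        store[index] = value

    def interruption(self) -> None:
        self.bn = 0                          # an interruption selects bank 0

    def rfi(self, ipsr_bn: int) -> None:
        self.bn = ipsr_bn                    # rfi switches to the bank in IPSR.bn

    def bsw(self, bank: int) -> None:
        self.bn = bank                       # bsw switches to the specified bank
```

In this model a value written to GR16 in bank 1 is hidden after an interruption and visible again after an rfi with IPSR.bn equal to 1, while a value in GR5 is visible throughout.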
Figure 3-16. Banked General Registers
[Figure: the general register file gr0 to gr127 (bits 63:0, each with a NaT bit) shown beside the
banked general registers gr16 to gr31, which are drawn as two groups, gr16 to gr23 and gr24 to gr31.]
The ALAT register target tracking mechanism (see “Data Speculation” on page 1:59) does not
distinguish the two register banks; from the ALAT’s perspective GR16 in bank 0 is the same
register as GR16 in bank 1.
Operating systems should ensure that IA-32 and Itanium architecture-based application code is
executed within register bank 1. If IA-32 or Itanium architecture-based application code executes
out of register bank 0, the application register state (including IA-32) will be lost on any
interruption. During interruption processing the operating system uses register bank 0 as the initial
working register context.
Usage of these additional registers is determined by software conventions. However, registers
GR24 to GR31, of bank 0, are not preserved when PSR.ic is 1; operating system code cannot rely
on register values being preserved unless PSR.ic is 0. While PSR.ic is 1, processor-specific
firmware may use these registers for machine check or firmware interruption handling at any point
regardless of the state of PSR.i. If PSR.ic is 0, GR24 to GR31 can be used as scratch registers for
low-level interruption handlers. Registers GR16 to GR23 are always preserved; operating system
code can rely on the values being preserved.
3.4 Processor Virtualization
Processors in the Itanium Processor Family may optionally implement a mechanism to support
processor virtualization. This includes an additional PSR.vm bit (see Section 3.3.2, “Processor
Status Register (PSR)”), which, when 1, causes certain instructions to take a Virtualization fault
(see Section 5.6, “Interruption Priorities” and “Virtualization vector (0x6100)” on page 2:198).
The set of instructions virtualized by PSR.vm is listed in Table 3-10 below.
Some non-privileged instructions                thash, ttag, mov from cpuid
(virtualized at all privilege levels)
Some non-privileged instructions                cover
(virtualized at privilege level 0)
Reading AR[ITC] with PSR.si==1                  mov from ar.itc
(virtualized at all privilege levels)
Instructions which write privileged registers   mov to itc
Processors which support processor virtualization must provide an implementation-dependent
mechanism for disabling the vmsw instruction. When enabled, the vmsw instruction functions as
described on the vmsw instruction page. When disabled, the vmsw instruction always raises a
Virtualization fault when executed at the most privileged level.
Processor virtualization is largely invisible to system software, and therefore its effects on
virtualized instructions are not discussed in this document, except on the instruction description
pages themselves.
4 Addressing and Protection
This chapter defines operating system resources to translate 64-bit virtual addresses into physical
addresses, 32-bit virtual addressing, virtual aliasing, physical addressing, memory ordering and
properties of physical memory. Register state defined to support virtual memory management is
defined in Chapter 3, while Chapter 5 provides complete information on virtual memory faults.
Note: Unless otherwise noted, references to “interruption” in this chapter refer to IVA-based
interruptions. See “Interruption Definitions” on page 2:89.
The following key features are supported by the virtual memory model.
• Virtual Regions are defined to support contemporary operating system Multiple Address Space
(MAS) models of placing each process within a unique address space. Region identifiers
uniquely tag virtual address mappings to a given process.
• Protection Domain mechanisms support the Single Address Space (SAS) model, where
processes co-exist within the same virtual address space.
• Translation Lookaside Buffer (TLB) structures are defined to support high-performance paged
virtual memory systems. Software TLB fill and protection handlers are utilized to defer
translation policies and protection algorithms to the operating system.
• A Virtual Hash Page Table (VHPT) is designed to augment the performance of the TLB. The
VHPT is an extension of the processor’s TLB that resides in memory and can be automatically
searched by the processor. A particular operating system page table format is not dictated.
However, the VHPT is designed to mesh with two common translation structures: the virtual
linear page table and hashed page table. Enabling of the VHPT and the size of the VHPT are
completely under software control.
• Sparse 64-bit virtual addressing is supported by providing for large translation arrays
(including multiple levels of hierarchy similar to a cache hierarchy), efficient translation miss
handling support, multiple page sizes, pinned translations, and mechanisms to promote sharing
of TLB and page table resources.
4.1 Virtual Addressing
As seen by Itanium architecture-based application programs, the virtual addressing model is
fundamentally a 64-bit flat linear virtual address space. 64-bit general registers are used as pointers
into this address space. IA-32 32-bit virtual linear addresses are zero extended into the 64-bit
virtual address space.
As shown in Figure 4-1, the 64-bit virtual address space is divided into eight 2^61 byte virtual
regions. The region is selected by the upper 3 bits of the virtual address. Associated with each
virtual region is a region register that specifies a 24-bit region identifier (unique address space
number) for the region. Eight out of the possible 2^24 virtual address spaces are concurrently
accessible via the 8 region registers. The region identifier can be considered the high order address
bits of a large 85-bit global address space for a single address space model, or as a unique ID for a
multiple address space model.
Figure 4-1. Virtual Address Spaces
[Figure: 2^24 virtual address spaces; each 64-bit virtual address space holds 8 virtual regions of 2^61 bytes per region, mapped with pages of 4K to 256M.]
By assigning sequential region identifiers, regions can be coalesced to produce larger 62-, 63- or
64-bit spaces. For example, an operating system could implement a 62-bit region for process
private data, 62-bit region for I/O, and a 63-bit region for globally shared data. Default page sizes
and translation policies can be assigned to each virtual region.
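The region arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not part of the architecture definition: the function names and the `region_registers` list (standing in for the RID fields of rr0 to rr7) are assumptions made for the example.

```python
VRN_SHIFT = 61                        # bits 63:61 hold the virtual region number
REGION_OFFSET_MASK = (1 << 61) - 1    # bits 60:0 address a byte within a region
RID_MASK = (1 << 24) - 1              # region identifiers are 24 bits wide


def split_virtual_address(va):
    """Return (vrn, region_offset) for a 64-bit virtual address."""
    va &= (1 << 64) - 1
    return va >> VRN_SHIFT, va & REGION_OFFSET_MASK


def global_address(va, region_registers):
    """Form the 85-bit global address: 24-bit RID above the 61-bit offset.

    region_registers is a hypothetical list of 8 RIDs, one per region.
    """
    vrn, offset = split_virtual_address(va)
    rid = region_registers[vrn] & RID_MASK
    return (rid << 61) | offset


# Sequential RIDs in adjacent regions coalesce into one contiguous space.
rrs = [0x100, 0x101, 0, 0, 0, 0, 0, 0]
last_of_region0 = global_address(REGION_OFFSET_MASK, rrs)
first_of_region1 = global_address(1 << 61, rrs)
```

With RIDs 0x100 and 0x101 in regions 0 and 1, the last byte of region 0 and the first byte of region 1 are adjacent in the global address space, which is how two 61-bit regions behave as one 62-bit space.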
Figure 4-2 shows the process of mapping a virtual address into a physical address. Each virtual
address is composed of three fields: the Virtual Region Number, the Virtual Page Number, and the
page offset. The upper 3-bits select the Virtual Region Number (VRN). The least-significant bits
form the page offset. The Virtual Page Number (VPN) consists of the remaining bits. The VRN bits
are not included in the VPN. The page offset bits are passed through the translation process
unmodified. Exact bit positions for the page offset and VPN bits vary depending on the page size
used in the virtual mapping.
On a memory reference (any reference other than an insert or purge), the VRN bits select a Region
Identifier (RID) from 1 of the 8 region registers; the TLB is then searched for a translation entry
with a matching VPN and RID value. The VRN may optionally be used when searching for a
matching translation on memory references (references other than inserts and purges; see Section
4.1.1.4, “Purge Behavior of TLB Inserts and Purges”). If a matching translation entry is found, the
entry’s physical page number (PPN) is concatenated with the page offset bits to form the physical
address. Matching translations are qualified by page-granular privilege level access right checks
and optional protection domain checks by verifying the translation’s key is contained within a set of
protection key registers and read, write, execute permissions are granted.
If the required translation is not resident in the TLB, the processor may optionally search the VHPT
structure in memory for the required translation and install the entry into the TLB. If the required
entry cannot be found in the TLB and/or VHPT, the processor raises a TLB Miss fault to request
that the operating system supply the translation. After the operating system installs the translation
in the TLB and/or VHPT, the faulting instruction can be restarted and execution resumed.
Virtual addressing for instruction references is enabled when PSR.it is 1, for data references when
PSR.dt is 1, and for register stack accesses when PSR.rt is 1.
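The lookup described above can be modeled roughly as follows. This is a simplified sketch under stated assumptions: `TlbEntry`, `TlbMiss`, and `translate` are hypothetical names, the PPN is stored in units of the entry's page size rather than in the register format, and access-right checks, protection key checks, optional VRN matching, and the VHPT walk are all omitted.

```python
from typing import NamedTuple


class TlbEntry(NamedTuple):
    rid: int   # region identifier tagged onto the entry at insert time
    vpn: int   # virtual page number (VRN bits are not part of the VPN)
    ps: int    # page size is 2**ps bytes
    ppn: int   # physical page number, in units of the page size (simplification)


class TlbMiss(Exception):
    """Stands in for the TLB Miss fault delivered to the operating system."""


def translate(va, region_registers, tlb):
    """Translate a 64-bit virtual address using a list of TlbEntry objects."""
    vrn = (va >> 61) & 0x7
    rid = region_registers[vrn]           # VRN selects the region identifier
    in_region = va & ((1 << 61) - 1)      # drop the VRN bits
    for entry in tlb:
        vpn = in_region >> entry.ps       # VPN width depends on the page size
        if entry.rid == rid and entry.vpn == vpn:
            offset = va & ((1 << entry.ps) - 1)   # offset passes through unmodified
            return (entry.ppn << entry.ps) | offset
    raise TlbMiss(hex(va))


rrs = [0] * 8
rrs[1] = 0x42
tlb = [TlbEntry(rid=0x42, vpn=0x5, ps=14, ppn=0x9)]   # one 16K page
va = (1 << 61) | (0x5 << 14) | 0x123
pa = translate(va, rrs, tlb)
```

In a real system the `TlbMiss` path corresponds to the operating system installing the translation and restarting the faulting instruction.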
Figure 4-2. Conceptual Virtual Address Translation for References
[Figure: virtual address bits 63:61 (VRN) select one of the region registers rr0 to rr7, which supplies a 24-bit region ID; the region ID and the Virtual Page Number (VPN) are hashed to search the Translation Lookaside Buffer (TLB), whose entries hold region ID, key, VPN, VRN, rights, and Physical Page Number (PPN); the 24-bit key of the matching entry is searched in the protection key registers (pkr0, pkr1, pkr2, ...), which hold key and rights; the PPN is concatenated with the offset to form the physical address.]
4.1.1 Translation Lookaside Buffer (TLB)
The processor maintains two architectural TLBs as shown in Figure 4-3, the Instruction TLB
(ITLB) and Data TLB (DTLB). Each TLB services translation requests for instruction and data
memory references (including IA-32), respectively. The Data TLB also services translation
requests for references by the RSE and the VHPT walker. The TLBs are further divided into two
sub-sections: Translation Registers (TR) and Translation Cache (TC).
Figure 4-3. TLB Organization
[Figure: the ITLB comprises the instruction translation registers itr0 to itrn (ITR) and the instruction translation cache itc (ITC); the DTLB comprises the data translation registers dtr0 to dtrn (DTR) and the data translation cache dtc (DTC).]
In the remainder of this document, the term TLB refers to the combined instruction, data,
translation register, and translation cache structures.
The TLB is a local processor resource; installation of a translation or local processor purges do not
affect other processors’ TLBs. Global TLB purges are provided to purge translations from all
processors within a TLB coherence domain in a multiprocessor system.
4.1.1.1 Translation Registers (TR)
The Translation Register (TR) section of the TLB is a fully-associative array defined to hold
translations that software directly manages. Software can explicitly insert a translation into a TR by
specifying a register slot number. Translations are removed from the TRs by specifying a virtual
address, page size and a region identifier. Translation registers allow the operating system to “pin”
critical virtual memory translations in the TLB. Examples include I/O spaces, kernel memory areas,
frame buffers, page tables, sensitive interruption code, etc. Instruction fetches for interruption
handlers are performed using virtual addresses; therefore, virtual address ranges containing
software translation miss routines and critical interruption sequences should be pinned or else
additional TLB faults may occur. Other virtual mappings may be pinned for performance reasons.
Entries are placed into a specific TR slot with the Insert Translation Register (itr) instruction.
Once a translation is inserted, the processor will not replace the translation to make room for other
translations. Local translations can only be removed by software issuing the Purge Translation
Register (ptr) instruction.
TR inserts and purges may cause other TR and/or TC entries to be removed (refer to Section
4.1.1.4, “Purge Behavior of TLB Inserts and Purges” for details). Prior to inserting a TR entry,
software must ensure that no overlapping translation exists in any TR (including the one being
written); otherwise, a Machine Check abort may be raised, or the processor may exhibit other
undefined behavior. Translation register entries may be removed by the processor due to hardware
or software errors. In the presence of an error, the processor can remove TR entries; notification is
raised via a Machine Check abort.
There are at least 8 instruction and 8 data TR slots implemented on all processor models. Please see
the processor-specific documentation for further information on the number of translation registers
implemented on the Itanium processor. Translation registers support all implemented page sizes
and must be implemented in a single-level fully-associative array. Any register slot can be used to
specify any virtual address mapping. Translation registers are not directly readable.
In some processor models, translation registers are physically implemented as a subsection of the
translation cache array. Valid TR slots are ignored for purposes of processor replacement on an
insertion into the TC. However, invalid TR slots (unused slots) may be used as TC entries by the
processor. As a result, software inserts into previously invalid TR entries may invalidate a TC entry
in that slot.
Implementations may also place a floating boundary between TR and TC entries within the same
structure where any entry above the boundary is considered a TC and any entry below the boundary
a TR. To maximize TC resources, software should allocate contiguous translation registers starting
at slot 0 and continuing upwards.
4.1.1.2 Translation Cache (TC)
The Translation Cache (TC) is an implementation-specific structure defined to hold the large
working set of dynamic translations for memory references (including IA-32). Please see the
processor-specific documentation for further information on Itanium processor TC implementation
details. The processor directly controls the replacement policy of all TC entries.
Entries are installed by software into the translation cache with the Insert Data Translation Cache
(itc.d) and Insert Instruction Translation Cache (itc.i) instructions. The Purge Translation
Cache Local (ptc.l) instruction purges all ITC/DTC entries in the local processor that match the
specified virtual address range and region identifier. Purges of all ITC/DTC entries matching a
specified virtual address range and region identifier among all processors in a TLB coherence
domain can be globally performed with the Purge Translation Cache Global (ptc.g, ptc.ga)
instruction. The TLB coherence domain covers at least the processors on the same local bus on
which the purge was broadcast. Propagation between multiple TLB coherence domains is platform
dependent. Software must handle the case where a purge does not propagate to all processors in a
multiprocessor system. Translation cache purges do not invalidate TR entries.
All entries in a local processor’s ITC and DTC can be purged with a sequence of
Purge Translation Cache Entry (ptc.e) instructions. A ptc.e does not propagate to other
processors.
In all processor models, the translation cache has at least 1 instruction and 1 data entry in addition
to the specified 8 instruction and 8 data translation registers. Implementations are free to implement
translation cache arrays of larger sizes. Implementations may also choose to implement additional
hierarchies for increased performance. At least one translation cache level is required to support all
implemented page sizes. Additional hierarchy levels may or may not be performance optimized for
the preferred page size specified by the virtual region, may be set-associative or fully associative,
and may support a limited set of page sizes. Please see the processor-specific documentation for
further information on the Itanium processor implementation details of the translation cache.
The translation cache is managed by both software and hardware. In general, software cannot
assume any entry installed will remain, nor assume the lifetime of any entry since replacement
algorithms are implementation specific. The processor may discard or replace a translation at any
point in time for any reason (subject to the forward progress rules below). TC purges may remove
more entries than explicitly requested. In the presence of a processor hardware error, the processor
may remove TC entries and optionally raise a Corrected Machine Check Interrupt.
In order to ensure forward progress for Itanium architecture-based code, the following rules must
be observed by the processor and software.
• Software may insert multiple translation cache entries per TLB fault, provided that only the
last installed translation is required for forward progress.
• The processor may occasionally invalidate the last TC entry inserted. The processor must
eventually guarantee visibility of the last inserted TC entry to all references while PSR.ic is
zero. The processor must eventually guarantee visibility of the last inserted TC entry until an
rfi sets PSR.ic to 1 and at least one instruction is executed with PSR.ic equal to 1, and
completes without a fault or interrupt. The last inserted TC entry may be occasionally removed
before this point, and software must be prepared to re-insert the TC entry on a subsequent fault.
For example, eager or mandatory RSE activity, speculative VHPT walks, or other interruptions
of the restart instruction may displace the software-inserted TC entry, but when software later
re-inserts the same TC entry, the processor must eventually complete the restart instruction to
ensure forward progress, even if that restart instruction takes other faults which must be
handled before it can complete. If PSR.ic is set to 1 by instructions other than rfi, the
processor does not guarantee forward progress.
• If software inserts an entry into the TLB with an overlapping entry (same or larger size) in the
VHPT, and if the VHPT walker is enabled, forward progress is not guaranteed. See “VHPT
Searching” on page 2:57.
• Software may only make references to memory with physical addresses or with virtual
addresses which are mapped with TRs, or to addresses mapped by the just-inserted translation,
between the insertion of a TC entry, and the execution of the instruction with PSR.ic equal to 1
which is dependent on that entry for forward progress. Software may also make repeated
attempts to execute the same instruction with PSR.ic equal to 1. If software makes any other
memory references than these, the processor does not guarantee forward progress.
• Software must not defeat forward progress by consistently displacing a required TC entry
through a global or local translation cache purge.
IA-32 code has more stringent forward progress rules that must be observed by the processor and
software. IA-32 forward progress rules are defined in Section 10.6.3, “IA-32 TLB Forward
Progress Requirements” on page 2:251.
The translation cache can be used to cache TR entries if the TC maintains the instruction vs. data
distinction that is required of the TRs. A data reference cannot be satisfied by a TC entry that is a
cache of an instruction TR entry, nor can an instruction reference be satisfied by a TC entry that is a
cache of a data TR entry. This approach can be useful in a multi-level TLB implementation.
4.1.1.3 Unified Translation Lookaside Buffers
Some processor models may merge the ITC and DTC into a unified translation cache. The
minimum number of unified entries is 2 (1 for instruction, and 1 for data). Processors may service
instruction fetch memory references with TC entries originally installed into the DTC and service
data memory references with translations originally installed in the ITC. To ensure consistent
operation across processor implementations, it is recommended that software not install different
translations into the ITC or DTC for the same virtual region and virtual address. ITC inserts may
remove DTC entries. DTC inserts may remove ITC entries. TC purges remove ITC and DTC
entries.
Instruction and data translation registers cannot be unified. DTR entries cannot be used by
instruction references and ITR entries cannot be used by data references. ITR inserts and purges do
not remove DTR entries. DTR inserts and purges do not remove ITR entries.
4.1.1.4 Purge Behavior of TLB Inserts and Purges
Translations contained in the translation caches (TC) and translation registers (TR) are maintained
in a consistent state by ensuring that TLB insertions remove existing overlapping entries before
new TR or TC entries are installed. Similarly, TLB purges that partially or fully overlap with
existing translations may remove all overlapping entries. In this context, “overlap” refers to two
translations with the same region identifier (but not necessarily identical virtual region numbers),
and with partially or fully overlapping virtual address ranges (determined by the virtual address and
the page size). Examples are: two 4K-byte pages at the same virtual address, or an 8K-byte page at
virtual address 0x2000 and a 4K-byte page at 0x3000.
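The overlap definition above reduces to a range-intersection test once each translation is expanded to the aligned page it covers. The following is a minimal sketch; `page_range` and `overlaps` are hypothetical helper names, and each translation is described only by its region identifier, virtual address, and page-size exponent (page size 2**ps bytes).

```python
def page_range(va, ps):
    """Return [start, end) of the aligned 2**ps-byte page containing va.

    The VRN bits (63:61) are dropped because overlap is defined on the
    region identifier, not on the virtual region number.
    """
    size = 1 << ps
    start = (va & ((1 << 61) - 1)) & ~(size - 1)
    return start, start + size


def overlaps(rid_a, va_a, ps_a, rid_b, va_b, ps_b):
    """True if the two translations partially or fully overlap."""
    if rid_a != rid_b:
        return False
    start_a, end_a = page_range(va_a, ps_a)
    start_b, end_b = page_range(va_b, ps_b)
    return start_a < end_b and start_b < end_a


same_page = overlaps(7, 0x2000, 12, 7, 0x2000, 12)       # two 4K pages, same address
nested = overlaps(7, 0x2000, 13, 7, 0x3000, 12)          # 8K at 0x2000, 4K at 0x3000
neighbours = overlaps(7, 0x2000, 12, 7, 0x3000, 12)      # adjacent 4K pages
other_region = overlaps(7, 0x2000, 12, 8, 0x2000, 12)    # different region identifier
```

The first two cases are the examples given in the text and report an overlap; adjacent pages and pages with different region identifiers do not.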
As described in Section 4.1, “Virtual Addressing” on page 2:41, each TLB may contain a VRN
field, and virtual address bits {63:61} may be used as part of the match for memory references
(references other than inserts and purges). This binding of a translation to the VRN implies that a
lookup of a given virtual address (region identifier/VPN pair) in either the translation cache or
translation registers may result in a TLB miss if a memory reference is made through a different
VRN (even if the region identifiers in the two region registers are identical). Some processor
models may also omit the VRN field of the TLB, causing the TLB search on memory references to
find an entry independent of VRN bits. However, all processor models are required, during
translation cache purge and insert operations, to purge all possible translations matching the region
identifier and virtual address regardless of the specified VRN.
Figure 4-4. Conceptual Virtual Address Searching for Inserts and Purges
[Figure: virtual address bits 63:61 (VRN) select one of the region registers rr0 to rr7, which supplies a 24-bit region ID; the region ID and the Virtual Page Number (VPN) are hashed to search the Translation Lookaside Buffer (TLB), whose entries hold region ID, key, VPN, VRN, rights, and Physical Page Number (PPN).]
A processor may overpurge translation cache entries; i.e., it may purge a larger virtual address
range than required by the overlap. Since page sizes are powers of 2 in size and aligned on that
same power of 2 boundary, purged entries can either be a superset of, identical to, or a subset of the
specified purge range.
Table 4-1 defines the purge behavior of the different TLB insert and purge instructions, as well as
VHPT walker inserts.
Table 4-1. Purge Behavior of TLB Inserts and Purges
Case                                  Insert?     Purge?          Machine Check?
[ID]VHPT (j) overlaps [ID]TC (a)      Must (b)    Must (c)        Must not (d)
[ID]VHPT overlaps [DI]TC (e)          Must        May (f)         Must not
[ID]VHPT overlaps [ID]TR              May (g)     Must not (h)    May (k)
[ID]VHPT overlaps [DI]TR              Must        Must not        Must not
ptc.l overlaps [ID]TC                 N/A         Must            Must not
ptc.l overlaps [ID]TR                 N/A         Must not        Must (i)
ptc.g (local) (l) overlaps [ID]TC     N/A         Must            Must not
ptc.g (local) overlaps [ID]TR         N/A         Must not        Must
ptc.g (remote) overlaps [ID]TC        N/A         Must            Must not
ptc.g (remote) overlaps [ID]TR        N/A         Must not        Must not
ptc.e overlaps [ID]TC                 N/A         Must            Must not
ptc.e overlaps [ID]TR                 N/A         Must not        Must not
ptr.[id] overlaps [ID]TC              N/A         Must            Must not
ptr.[id] overlaps [DI]TC              N/A         May             Must not
ptr.[id] overlaps [ID]TR              N/A         Must            Must not
ptr.[id] overlaps [DI]TR              N/A         Must not        Must not
a. Bracketed notation is intended to specify TC and TR overlaps in the same stream, e.g. itc.i and
ITC.
b. Must Insert: requires that the translation specified by the operation is inserted into a TC or TR as
appropriate. For itc and VHPT walker inserts, there is no guarantee to software that the entry will
exist in the future, with the exception of the relevant forward-progress requirements specified in
Section 4.1.1.2, “Translation Cache (TC)”.
c. Must Purge: requires that all partially or fully overlapped translations are removed prior to the insert or
purge operation.
d. Must not Machine Check: indicates that a processor does not cause a Machine Check abort as a
result of the operation.
e. Bracketed notation is intended to specify TC and TR overlaps in the opposite stream, e.g. itc.i and
DTC.
f. May Purge: indicates that a processor may remove partially or fully overlapped translations prior to
the insert or purge operation. However, software must not rely on the purge.
g. May Insert: indicates that the translation specified by the operation may be inserted into a TC.
However, software must not rely on the insert.
h. Must not Purge: the processor does not remove (or check for) partially or fully overlapped translations
prior to the insert or purge operation. Software can rely on this behavior.
i. Must Machine Check: indicates that a processor will cause a Machine Check abort if an attempt is
made to insert or purge a partially or fully overlapped translation. The Machine Check abort may not
be delivered synchronously with the TLB insert or purge operation itself, but is guaranteed to be
delivered, at the latest, on a subsequent instruction serialization operation.
j. [ID]VHPT: These represent VHPT walker inserts into ITC and DTC entries, respectively.
k. May Machine Check: indicates that the processor may cause a Machine Check abort if an attempt is
made to insert or purge a partially or fully overlapped translation. The Machine Check abort is
required unless the implementation performs VRN matching on TLB lookups, and the VRN of the
partially or fully overlapped translation does not match the VRN of the insert.
l. ptc.g (and ptc.ga): two forms of global TLB purges are distinguished: local and remote. The local
form indicates that the ptc.g or ptc.ga was initiated on the local processor. The remote form
indicates that this is an incoming TLB shoot-down from a remote processor.
4.1.1.5 Translation Insertion Format
Figure 4-5 shows the register interface to insert entries into the TLB. TLB insertions are performed
by issuing the Insert Translation Cache (itc.d, itc.i) and Insert Translation Registers (itr.d,
itr.i) instructions. The first 64-bit field containing the physical address, attributes and
permissions is supplied by a general purpose register operand. Additional protection key and page
size information is supplied by the Interruption TLB Insertion Register (ITIR). The Interruption
Faulting Address register (IFA) specifies the virtual address for instruction and data TLB inserts.
ITIR and IFA are defined in “Control Registers” on page 2:26. The upper 3 bits of IFA (VRN
bits{63:61}) select a virtual region register that supplies the RID field for the TLB entry. The RID
of the selected region is tagged to the translation as it is inserted into the TLB.
Reserved fields or encodings are checked as follows:
• The GR[r] value is checked when a TLB insert instruction is executed, and if reserved fields or
reserved encodings are used, a Reserved Register/Field fault is raised on the TLB insert
instruction. If GR[r]{0} is zero (not-present Translation Insertion Format), the rest of GR[r] is
ignored.
• The RR[vrn] value is checked when a mov to RR instruction is executed, and if reserved fields
or reserved encodings are used, a Reserved Register/Field fault is raised on the mov to RR
instruction.
• The ITIR value is checked either when a mov to ITIR instruction is executed, or when a TLB
insert instruction is executed, depending on the processor implementation. If reserved fields or
reserved encodings are used, a Reserved Register/Field fault is raised on the mov to ITIR or
TLB insert instruction. In implementations where ITIR is checked on a TLB insert instruction,
ITIR{63:32} and ITIR{31:8} may be ignored if GR[r]{0} is zero (not-present Translation
Insertion Format).
• The IFA value is checked either when a mov to IFA instruction is executed, or when a TLB
insert instruction is executed, depending on the processor implementation. If an unimplemented
virtual address is used, an Unimplemented Data Address fault is raised on the mov to IFA or
TLB insert instruction.
Software must issue an instruction serialization operation to ensure installs into the ITLB are
observed by dependent instruction fetches and a data serialization operation to ensure installs into
the DTLB are observed by dependent memory data references.
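Figure 4-5. Translation Insertion Format
GR[r]:    ig {63:53} | ed {52} | ci {51:50} | ppn {49:12} | ar {11:9} | pl {8:7} | d {6} | a {5} | ma {4:2} | ci {1} | p {0}
ITIR:     rv/ci {63:32} | key {31:8} | ps {7:2} | rv/ci {1:0}
IFA:      vpn {63:12} | ig {11:0}
RR[vrn]:  rv {63:32} | rid {31:8} | ig {7:2} | rv {1} | ig {0}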
Table 4-2 describes all the translation interface fields.
Table 4-2. Translation Interface Fields
TLB Field   Source Field                              Description
ci          GR[r]{1,51:50}                            Checked on Insert – Checked on a TLB insert instruction. If reserved fields or encodings are used, a Reserved Register/Field fault is raised on the TLB insert instruction.
rv/ci       ITIR{1:0,63:32}                           Reserved/Checked on Insert – Depending on implementation, may be reserved (checked on a mov to ITIR instruction) or checked on a TLB insert instruction. If reserved fields or encodings are used, a Reserved Register/Field fault is raised on the mov to ITIR or TLB insert instruction. In implementations where ITIR is checked on a TLB insert instruction, ITIR{63:32} may be ignored if GR[r]{0} is zero (not-present Translation Insertion Format).
rv          RR[vrn]{1,63:32}                          Reserved – Checked on a mov to RR instruction. If reserved fields or encodings are used, a Reserved Register/Field fault is raised on the mov to RR instruction.
p           GR[r]{0}                                  Present bit – When 0, references using this translation cause an Instruction or Data Page Not Present fault. Most other fields are ignored by the processor, see Figure 4-6 for details. This bit is typically used to indicate that the mapped physical page is not resident in physical memory. The present bit is not a valid bit. For each TLB entry, the processor maintains an additional hidden valid bit indicating if the entry is enabled for matching.
ma          GR[r]{4:2}                                Memory Attribute – describes the cacheability, coherency, write-policy and speculative attributes of the mapped physical page. See “Memory Attributes” on page 2:69 for details.
a           GR[r]{5}                                  Accessed Bit – When 0 and PSR.da is 0, data references to the page cause a Data Access Bit fault. When 0 and PSR.ia is 0, instruction references to the page cause an Instruction Access Bit fault. When 0, IA-32 references to the page cause an Instruction or Data Access Bit fault. This bit can trigger a fault on reference for tracing or debugging purposes. The processor does not update the Accessed bit on a reference.
d           GR[r]{6}                                  Dirty Bit – When 0 and PSR.da is 0, Intel Itanium store or semaphore references to the page cause a Data Dirty Bit fault. When 0, IA-32 store or semaphore references to the page cause a Data Dirty Bit fault. The processor does not update the Dirty bit on a store or semaphore reference.
pl          GR[r]{8:7}                                Privilege Level – Specifies the privilege level or promotion level of the page. See “Page Access Rights” on page 2:51 for complete details.
ar          GR[r]{11:9}                               Access Rights – page granular read, write and execute permissions and privilege controls. See “Page Access Rights” on page 2:51 for details.
ppn         GR[r]{49:12}                              Physical Page Number – Most significant bits of the mapped physical address. Depending on the page size used in the mapping, some of the least significant PPN bits are ignored.
ig          GR[r]{63:53}, IFA{11:0}, RR[vrn]{0,7:2}   available – Software can use these fields for operating system defined parameters. These bits are ignored when inserted into the TLB by the processor.
ed          GR[r]{52}                                 Exception Deferral – For a speculative load that results in an exception, the speculative load’s instruction page TLB.ed bit is one of the conditions which determines whether the exception must be deferred. See “Deferral of Speculative Load Faults” on page 2:98 for complete details. This bit is ignored in the data TLB for data memory references and for IA-32 memory references.
ps          ITIR{7:2}                                 Page Size – Page size of the mapping. For page sizes larger than 4K bytes the low-order bits of PPN and VPN are ignored. Page sizes are defined as 2^ps bytes. See “Page Sizes” on page 2:52 for a list of supported page sizes.
key         ITIR{31:8}                                Protection Key – Uniquely tags the translation to a protection domain. If a translation’s Key is not found in the Protection Key Registers (PKRs), access is denied and a Data or Instruction Key Miss fault is raised. See “Protection Keys” on page 2:54 for complete details. In implementations where ITIR is checked on a TLB insert instruction, ITIR{31:8} may be ignored if GR[r]{0} is zero (not-present Translation Insertion Format).
vpn         IFA{63:12}                                Virtual Page Number – Depending on a translation’s page size, some of the least-significant VPN bits specified are ignored in the translation process. VPN{63:61} (VRN) selects the region register.
rid         RR[VRN].rid                               Virtual Region Identifier – On TLB inserts the Region Identifier selected by VPN{63:61} (VRN) is used as additional match bits for subsequent accesses and purges (much like vpn bits).
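The bit positions in Table 4-2 can be exercised with a small packing helper. This is an illustrative sketch only: `pack_gr`, `pack_itir`, and `field` are hypothetical names, the checked (ci, rv/ci) and ig bits are simply left at zero, and no claim is made about which `ma`, `ar`, or `key` values are valid on a given processor.

```python
def field(value, hi, lo):
    """Extract bits hi:lo (inclusive) from a register value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def pack_gr(ppn, ar, pl, ma, p=1, a=1, d=1, ed=0):
    """Build a GR[r] operand; ci bits {1, 51:50} and ig bits {63:53} stay 0."""
    return (
        (p & 0x1)                            # present bit, GR[r]{0}
        | (ma & 0x7) << 2                    # memory attribute, GR[r]{4:2}
        | (a & 0x1) << 5                     # accessed bit, GR[r]{5}
        | (d & 0x1) << 6                     # dirty bit, GR[r]{6}
        | (pl & 0x3) << 7                    # privilege level, GR[r]{8:7}
        | (ar & 0x7) << 9                    # access rights, GR[r]{11:9}
        | (ppn & ((1 << 38) - 1)) << 12      # physical page number, GR[r]{49:12}
        | (ed & 0x1) << 52                   # exception deferral, GR[r]{52}
    )


def pack_itir(key, ps):
    """Build an ITIR value; rv/ci bits {1:0, 63:32} stay 0."""
    return (ps & 0x3F) << 2 | (key & 0xFFFFFF) << 8


gr = pack_gr(ppn=0x12345, ar=3, pl=0, ma=0)
itir = pack_itir(key=0xABCDEF, ps=14)    # ps=14 is a 2**14 = 16K-byte page
```

Reading the fields back with `field(gr, 49, 12)` or `field(itir, 7, 2)` is a quick way to check a packing routine against the table.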
The format in Figure 4-6 is defined for not-present translations (P-bit is zero).
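Figure 4-6. Translation Insertion Format – Not Present
GR[r]:    ig {63:1} | 0 {0}
ITIR:     rv/ci {63:32} | key {31:8} | ps {7:2} | rv/ci {1:0}
IFA:      vpn {63:12} | ig {11:0}
RR[vrn]:  rv {63:32} | rid {31:8} | ig {7:2} | rv {1} | ig {0}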
4.1.1.6 Page Access Rights
Page granular access controls use 4 levels of privilege. Privilege level 0 is the most privileged and
has access to all privileged instructions; privilege level 3 is least privileged. Access (including
IA-32) to a page is determined by the TLB.ar and TLB.pl fields, and by the privilege level of the
access, as defined in Table 4-3. RSE fills and spills obt ain their privilege level from RSC.pl; all
other accesses (including IA-32) obtain their privilege level from PSR.cpl. Within each cell, “–”
means no access, “R” means read access, “W” means write access, “X” means execute access, and
“Pn” means promote PSR.cpl to privilege level “n” when an Enter Privileged Code (
instruction is executed.
Table 4-3. Page Access Rights
TLB.arTLB.pl
0 3 RRRRread only
2
1
0
13RXRXRXRXread, execute
2
1
0
23RWRWRWRWread, write
2
1
0
33RWXRWXRWXRWXread, write, execute
2
1
0
43R
2
1
0
53RXRXRX
2
1
0
3210
–RRR
––RR
–––R
– RXRXRX
––RXRX
–––RX
– RWRWRW
––RWRW
–––RW
–RWXRWXRWX
––RWXRWX
–––RWX
–RRWRW
––RRW
–––RW
–RXRXRWX
––RXRWX
–––RWX
rv/cikeypsrv/ci
ig
rvridigrv ig
epc)
Privilege Level
RWRWRWread only / read, write
a
Description
RWXread, execute / read, write, exec
Volume 2: Addressing and Protection2:51
Table 4-3. Page Access Rights (Continued)
TLB.arTLB.pl
63RWXRWRWRWread, write, execute / read, write
2
1
0
7 3 XXX
2
1
0
a. RSC.pl, for RSE fills and spills; PSR.cpl for all other accesses.
b. User execute only pages can be enforced by setting PL to 3.
–RWXRWRW
––RWXRW
–––RW
XP2 XXRX
XP1XP1XRX
XP0XP0XP0RX
Privilege Level
3210
Software can verify page level permissions by the probe instruction, which checks accessibility to
a given virtual page by verifying privilege levels, page level read and write permission, and
protection key read and write permission.
Execute-only pages (TLB.ar 7) can be used to promote the privilege level on entry into the
operating system. User level code would typically branch into a promotion page (controlled by the
operating system) and execute the Enter Privileged Code (
promotes, the next instruction group is executed at the target privilege level specified by the
promotion page. A procedure return branch type (
a
Description
RXexec, promoteb / read, execute
epc) instruction. When epc successfully
br.ret) can demote the current privilege level.
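As an illustration of how Table 4-3 is read, the following minimal Python sketch transcribes the table into a lookup keyed by (TLB.ar, TLB.pl) and indexed by the privilege level of the access. The names `page_rights` and `allows` are hypothetical helpers introduced here for illustration; they do not correspond to any instruction or architectural interface.

```python
# Rights at privilege levels 3, 2, 1, 0 (in that order) for each (TLB.ar, TLB.pl).
# "-" means no access; "XPn" means execute, with epc promoting PSR.cpl to n.
TABLE = {}

# TLB.ar 0..3: one fixed set of rights for every level at or above TLB.pl.
for ar, rights in {0: "R", 1: "RX", 2: "RW", 3: "RWX"}.items():
    for pl in range(4):
        TABLE[(ar, pl)] = tuple(rights if cpl <= pl else "-" for cpl in (3, 2, 1, 0))

# TLB.ar 4..7: rows copied literally from Table 4-3.
TABLE.update({
    (4, 3): ("R", "RW", "RW", "RW"),    (4, 2): ("-", "R", "RW", "RW"),
    (4, 1): ("-", "-", "R", "RW"),      (4, 0): ("-", "-", "-", "RW"),
    (5, 3): ("RX", "RX", "RX", "RWX"),  (5, 2): ("-", "RX", "RX", "RWX"),
    (5, 1): ("-", "-", "RX", "RWX"),    (5, 0): ("-", "-", "-", "RWX"),
    (6, 3): ("RWX", "RW", "RW", "RW"),  (6, 2): ("-", "RWX", "RW", "RW"),
    (6, 1): ("-", "-", "RWX", "RW"),    (6, 0): ("-", "-", "-", "RW"),
    (7, 3): ("X", "X", "X", "RX"),      (7, 2): ("XP2", "X", "X", "RX"),
    (7, 1): ("XP1", "XP1", "X", "RX"),  (7, 0): ("XP0", "XP0", "XP0", "RX"),
})


def page_rights(ar, pl, cpl):
    """Return the Table 4-3 cell for an access made at privilege level cpl."""
    return TABLE[(ar, pl)][3 - cpl]


def allows(ar, pl, cpl, kind):
    """kind is "R", "W" or "X"; True if that kind of access is permitted."""
    return kind in page_rights(ar, pl, cpl)
```

For example, `page_rights(7, 0, 3)` yields `"XP0"`: user code may execute the page, and an epc there promotes to privilege level 0, while `allows(7, 0, 3, "R")` is false because the page is execute-only at that level.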
4.1.1.7 Page Sizes
A range of page sizes is supported to assist software in mapping system resources and improve
TLB/VHPT utilization. Typically, operating systems will select a small range of fixed page sizes to
implement virtual memory algorithms. Larger pages may be statically allocated. For example, large
areas of the virtual address space may be reserved for operating system kernels, frame buffers, or
memory-mapped I/O regions. Software may also elect to pin these translations, by placing them in
the translation registers.
Table 4-4 lists insertable and purgeable page sizes that are supported by all processor models.
Insertable page sizes can be specified in the translation cache, the translation registers, the region
registers and the VHPT. Insertable page sizes can also be used as parameters to TLB purge
instructions (ptc.l, ptc.g, ptc.ga or ptr). Page sizes that are purgeable only may only be used
as parameters to TLB purge instructions.
Processors may also support additional insertable and purgeable page sizes. Please see the
processor-specific documentation for further information on the page sizes supported by the
Itanium processor.
Page Sizes: 4k, 8k, 16k, 64k, 256k, 1M, 4M, 16M, 64M, 256M, 4G
Page sizes are encoded in translation entries and region registers as a 6-bit encoded page size field.
Each field specifies a mapping size of 2^N bytes, thus a value of 12 represents a 4K-byte page. If
unimplemented page sizes are specified to an itc, itr or mov to region register instruction, a
Reserved Register/Field fault is raised. If unimplemented page sizes are specified for a TLB purge
instruction an implementation may raise a Machine Check abort, may under-purge translations up
to ignoring the request, or may over-purge translations up to removal of all entries from the
translation cache. If unimplemented page sizes are specified by a ptc.g or ptc.ga broadcast from
another processor, an implementation may under-purge translations up to ignoring the request, or
may over-purge translations up to removal of all entries from the translation cache. However, it
must not raise a Machine Check abort.
Virtual and physical pages are aligned on the natural boundary of the page. For example, 4K-byte
pages are aligned on 4K-byte boundaries, and 4M-byte pages on 4M-byte boundaries.
4.1.2 Region Registers (RR)
Associated with each of the 8 virtual regions is a privileged Region Register (RR). Each register
contains a Region Identifier (RID) along with several other region attributes, see Figure 4-7. The
values placed in the region register by the operating system can be viewed as a collection of process
address space identifiers.
Figure 4-7. Region Register Format
Bits    63:32  31:8  7:2  1   0
Field   rv     rid   ps   rv  ve
Width   32     24    6    1   1
Regions support multiple address space operating systems by avoiding the need to flush the TLB
on a context switch. Sharing between processes is promoted by mapping common global or shared
region identifiers into the region register working set of multiple processes. All IA-32 memory
references are through region register 0.
Table 4-5 describes the region register fields. Region Identifier (rid) bits 0 through 17 must be
implemented on all processor models. Some processor models may implement additional bits.
Additional implemented bits must be contiguous and start at bit 18. Unimplemented bits are
reserved. Please see the processor-specific documentation for further information on the size of the
Region Identifier implemented on the Itanium processor.
Table 4-5. Region Register Fields
Field  Bits      Description
rv     1, 63:32  reserved
ve     0         VHPT Walker Enable – When 1, the VHPT walker is enabled for the region. When 0,
                 disabled.
ps     7:2       Preferred page Size – Selects the virtual address bits used in hash functions for
                 set-associative TLBs or the VHPT. Encoded as 2^ps bytes. The processor may make
                 significant performance optimizations for the specified preferred page size for the
                 region. (a)
rid    31:8      Region Identifier – During TLB inserts, the region identifier from the selected region
                 register is used to tag translations to a specific address space. During TLB/VHPT
                 lookups, the region identifier is used to match translations and to distribute hash
                 indexes among VHPT and TLB sets.
a. For more details on the usage of this field, see “VHPT Hashing” on page 2:59.
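The field layout in Figure 4-7 and the 2^ps page size encoding can be exercised with a short sketch. The Python below is illustrative only; `encode_rr`, `decode_rr`, `page_bytes` and `is_page_aligned` are hypothetical helper names, and reserved bits are simply left at zero.

```python
def encode_rr(rid, ps, ve):
    """Pack a region register value: rid in bits 31:8, ps in bits 7:2, ve in bit 0."""
    assert 0 <= rid < (1 << 24) and 0 <= ps < (1 << 6) and ve in (0, 1)
    return (rid << 8) | (ps << 2) | ve


def decode_rr(value):
    """Unpack the non-reserved fields of a region register value."""
    return {
        "ve": value & 1,
        "ps": (value >> 2) & 0x3F,
        "rid": (value >> 8) & 0xFFFFFF,
    }


def page_bytes(ps):
    """A 6-bit page size field N denotes a page of 2**N bytes."""
    return 1 << ps


def is_page_aligned(addr, ps):
    """Pages are aligned on their natural boundary."""
    return addr & (page_bytes(ps) - 1) == 0
```

A page size field of 12 gives `page_bytes(12) == 4096`, matching the 4K-byte example in the text, and a 4M-byte page (field value 22) must start on a 4M-byte boundary.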
Software must issue an instruction serialization operation to ensure writes into the region registers
are observed by dependent instruction fetches and issue a data serialization operation for dependent
memory data references.
4.1.3 Protection Keys
Protection Keys provide a method to restrict permission by tagging each virtual page with a unique
protection domain identifier. The Protection Key Registers (PKR) represent a register cache of all
protection keys required by a process. The operating system is responsible for management and
replacement policies of the protection key cache. Before a memory access (including IA-32) is
permitted, the processor compares a translation’s key value against all keys contained in the PKRs.
If a matching key is not found, the processor raises a Key Miss fault. If a matching key is found,
access to the page is qualified by additional read, write and execute protection checks specified by
the matching protection key register. If these checks fail, a Key Permission fault is raised. Upon
receipt of a Key Miss or Key Permission fault, software can implement the desired security policy
for the protection domain. Figure 4-8 and Table 4-6 describe the protection key register format and
protection key register fields.
Figure 4-8. Protection Key Register Format
Bits    63:32  31:8  7:4  3   2   1   0
Field   rv     key   rv   xd  rd  wd  v
Width   32     24    4    1   1   1   1
Table 4-6. Protection Register Fields
Field  Bits        Description
v      0           Valid – When 1, the Protection Register entry is valid and is checked by the
                   processor when performing protection checks. When 0, the entry is ignored.
wd     1           Write Disable – When 1, write permission is denied to translations in the protection
                   domain.
rd     2           Read Disable – When 1, read permission is denied to translations in the protection
                   domain.
xd     3           Execute Disable – When 1, execute permission is denied to translations in the
                   protection domain.
key    31:8        Protection Key – uniquely tags translation to a given protection domain.
rv     7:4, 63:32  reserved
Processor models have at least 16 protection key registers, and at least 18 bits of protection key.
Some processor models may implement additional protection key registers and protection key bits.
Unimplemented bits and registers are reserved. Key registers have at least as many implemented
key bits as region registers have rid bits. Additional implemented bits must be contiguous and start
at bit 18. Please see the processor-specific documentation for further information on the number of
protection key registers and protection key bits implemented on the Itanium processor.
Software must issue an instruction serialization operation to ensure writes into the protection key
registers are observed by dependent instruction fetches and a data serialization operation for
dependent memory data references.
The processor ensures uniqueness of protection keys by checking new valid protection keys against
all protection key registers during the move to PKR instruction. If a valid matching key is found in
any PKR register, the processor invalidates the matching PKR register by setting PKR.v to zero,
before performing the write of the new PKR register. The other fields in any matching PKR remain
unchanged when it is invalidated.
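The key check described at the start of this section and the uniqueness rule for the move to PKR instruction can be modeled in a few lines. This is a minimal behavioral sketch, not a description of any hardware implementation; the `PKR` class and the functions `check_key` and `write_pkr` are hypothetical names, and the model assumes protection key checking is enabled.

```python
from dataclasses import dataclass


@dataclass
class PKR:
    """One protection key register (fields as in Table 4-6)."""
    v: int = 0
    wd: int = 0
    rd: int = 0
    xd: int = 0
    key: int = 0


def check_key(pkrs, key, kind):
    """kind is "R", "W" or "X". Returns "ok" or the name of the fault raised."""
    for p in pkrs:
        if p.v and p.key == key:
            denied = {"R": p.rd, "W": p.wd, "X": p.xd}[kind]
            return "Key Permission fault" if denied else "ok"
    return "Key Miss fault"


def write_pkr(pkrs, index, new):
    """Model of move to PKR: invalidate any valid register holding the same key first."""
    if new.v:
        for p in pkrs:
            if p.v and p.key == new.key:
                p.v = 0  # only the valid bit changes; other fields are kept
    pkrs[index] = new
```

With sixteen registers, writing key 5 with wd=1 into register 0 and later writing the same key into register 3 leaves register 0 invalid but otherwise unchanged, so at most one valid register ever matches a given key.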
Key Miss and Permission faults are only raised when memory translations are enabled (PSR.dt is 1
for data references, PSR.it is 1 for instruction references, PSR.rt is 1 for register stack references),
and protection key checking is enabled (PSR.pk is one).
Data TLB protection keys can be acquired with the Translation Access Key (tak) instruction.
Instruction TLB key values are not directly readable. To acquire instruction key values software
should make provisions to read memory structures.
4.1.4 Translation Instructions
Table 4-7 lists translation instructions used to manage translations. Region registers, protection key
registers and the TLBs are accessed indirectly; the register number is determined by the contents of
a general register.
The processor does not ensure that modification of the translation resources is observed by
subsequent instruction fetches or data memory references. Software must issue an instruction
serialization operation before any dependent instruction fetch and a data serialization operation
before any dependent data memory reference.
Table 4-7. Translation Instructions
Mnemonic | Description | Operation | Instr. Type | Serialization Requirement
mov rr[r3] = r2 | Move to region register | RR[GR[r3]] = GR[r2] | M | data/inst
mov r1 = rr[r3] | Move from region register | GR[r1] = RR[GR[r3]] | M | none
mov pkr[r3] = r2 | Move to protection key register | PKR[GR[r3]] = GR[r2] | M | data/inst
mov r1 = pkr[r3] | Move from protection key register | GR[r1] = PKR[GR[r3]] | M | none
itc.i r3 | Insert instruction translation cache | ITC = GR[r3], IFA, ITIR | M | inst
itc.d r3 | Insert data translation cache | DTC = GR[r3], IFA, ITIR | M | data
itr.i itr[r3] = r2 | Insert instruction translation register | ITR[GR[r3]] = GR[r2], IFA, ITIR | M | inst
itr.d dtr[r3] = r2 | Insert data translation register | DTR[GR[r3]] = GR[r2], IFA, ITIR | M | data
probe r1 = r3, r2 | Probe data TLB for translation | | M | none
ptc.l r3, r2 | Purge a translation from local processor instruction and data translation cache | | M | data/inst
ptc.g r3, r2 | Globally purge a translation from multiple processors' instruction and data translation caches | | M | data/inst
ptc.ga r3, r2 | Globally purge a translation from multiple processors' instruction and data translation caches and remove matching entries from multiple processors' ALATs | | M | data/inst
ptc.e r3 | Purge local instruction and data translation cache of all entries | | M | data/inst
ptr.i r3, r2 | Purge instruction translation registers | | M | inst
ptr.d r3, r2 | Purge data translation registers | | M | data
tak r1 = r3 | Obtain data TLB entry protection key | | M | none
thash r1 = r3 | Generate translation’s VHPT hash address | | M | none
ttag r1 = r3 | Generate translation tag for VHPT | | M | none
tpa r1 = r3 | Translate a virtual address to a physical address | | M | none
4.1.5 Virtual Hash Page Table (VHPT)
The VHPT is an extension of the TLB hierarchy designed to enhance virtual address translation
performance. The processor’s VHPT walker can optionally be configured to search the VHPT for a
translation after a failed instruction or data TLB search. The VHPT walker provides significant
performance enhancements by reducing the rate of flushing the processor’s pipelines due to a TLB
Miss fault, and by providing speculative translation fills concurrent to other processor operations.
The VHPT resides in the virtual memory space and is configurable as either the primary page table
of the operating system or as a single large translation cache in memory (see Figure 4-9). Since the
VHPT resides in the virtual address space, an additional TLB miss can be raised when the VHPT is
referenced. This property allows the VHPT to also be used as a linear page table.
Figure 4-9. Virtual Hash Page Table (VHPT)
[Figure labels: Virtual Address (vpn); Region Registers (rid, ps); Hashing Function; PTA
(PTA.base, PTA.size, table size 2^PTA.size); VHPT; TLB (TC Install); Optional Collision Search
Chain; Optional Operating System Page Tables]
The processor does not manage the VHPT or perform any writes into the table. Software is
responsible for insertion of entries into the VHPT (including replacement algorithms), dirty/access
bit updates, invalidation due to purges and coherency in a multiprocessor system. The processor
does not ensure the TLBs are coherent with the VHPT memory image.
If software needs to control the entries inserted into the TLB more explicitly, or programs the
VHPT with differing mappings for the same virtual address range, it may need to take additional
action to ensure forward progress. See “VHPT Searching” on page 2:57.
4.1.5.1 VHPT Configuration
The Page Table Address (PTA) register determines whether the processor is enabled to walk the
VHPT, anchors the VHPT in the virtual address space, and controls VHPT size and configuration
information. The VHPT can be configured as either a per-region virtual linear page table structure
(8-byte short format) or as a single large hash page table (32-byte long format). No mixing of
formats is allowed within the VHPT.
To implement a per-region linear page table structure an operating system would typically map the
leaf page table nodes with small backing virtual translations. The size of the table is expanded to
include all possible virtual mappings, effectively creating a large per-region flat page table within
the virtual address space.
To implement a single large hash page table, the entire VHPT is typically mapped with a single
large pinned virtual translation placed in the translation registers and the size of the table is reduced
such that only a subset of all virtual mappings can be resident within the table. Operating systems
can tune the size of the hash page table based on the size of physical memory and operating system
performance requirements.
4.1.5.2 VHPT Searching
When enabled, the processor’s VHPT walker searches the VHPT for a translation after a failed
instruction or data TLB search. The VHPT walker checks only the specific VHPT entry addressed
by the short- or the long-format hash function, as selected by PTA.vf. If additional TLB misses are
encountered during the VHPT access, a VHPT Translation fault is raised. If the region-based
short-format VHPT entry contains no reserved bits or encodings, it is installed into the TLB, and
the processor again attempts to translate the failed instruction or data reference. If the long-format
VHPT entry’s tag specifies the correct region identifier and virtual address, and the entry contains
no reserved bits or encodings, it is installed into the TLB, and the processor again attempts to
translate the failed instruction or data reference. Otherwise the processor raises a TLB Miss fault.
The translation is installed into the TLB even if its VHPT entry is marked as not present (p=0).
Software may optionally search additional VHPT collision chains (associativities) or search for
translations within the operating system’s primary page tables. Performance is optimized by
placing frequently referenced translations within the VHPT structure directly searched by the
processor.
The VHPT walker is optional on a given processor model. Software can neither assume the
presence of a VHPT walker, nor that the VHPT walker will find a translation in the VHPT. The
VHPT walker can abort a search at any time for implementation-specific reasons, even if the
required translation entry is in the VHPT. Operating systems must regard the VHPT walker strictly
as a performance optimization and must be prepared to handle TLB misses if the walker fails.
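The long-format search rules above can be summarized in a small behavioral model. The Python below is a sketch under stated assumptions: the table is modeled as a list of dictionaries, the `reserved` flag stands in for "contains reserved bits or encodings", and `vhpt_walk_long` is a hypothetical name. It checks only the single entry selected by the hash index, as the walker does, and treats an implementation-specific abort as a plain miss.

```python
TI_BIT = 1 << 63  # tag invalid bit: most significant bit of the 64-bit tag word


def vhpt_walk_long(table, hash_index, tag, abort=False):
    """Return ("install", entry) or ("TLB Miss fault", None) for one walk."""
    miss = ("TLB Miss fault", None)
    if abort:                       # the walker may give up at any time
        return miss
    entry = table[hash_index]       # only the hashed entry is examined
    if entry["tag"] & TI_BIT:       # entries with ti = 1 are never inserted
        return miss
    if entry["tag"] != tag or entry["reserved"]:
        return miss
    # The entry is installed even when it is marked not present (p = 0).
    return ("install", entry)
```

Because any walk can end in a miss, the software TLB miss handler has to be able to resolve every translation on its own, whether or not the entry is present in the table.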
VHPT walks may be done speculatively by the processor's VHPT walker. Additionally, VHPT
walks triggered by non-speculatively-executed instructions are not required to be done in program
order. Therefore, if the walker is enabled and if the VHPT contains multiple entries that map the
same virtual address range, software must set up these entries such that any of them can be used in
the translation of any part of this virtual address range. Additionally, if software inserts a translation
into the TLB which is needed for forward progress, and this translation has a smaller page size than
the translation which would have been inserted on a VHPT walk for the same address, then
software may need to disable the VHPT walker in order to ensure forward progress, since this
inserted translation may be displaced by a VHPT walk before it can be used.
4.1.5.3 Region-based VHPT Short Format
The region-based VHPT short format shown in Figure 4-10 uses 8-byte VHPT entries to support a
per-region linear page table configuration. To use the short-format VHPT, PTA.vf must be set to 0.
Figure 4-10. VHPT Short Format
Bits    63:53  52  51:50  49:12  11:9  8:7  6  5  4:2  1   0
Field   ig     ed  rv     ppn    ar    pl   d  a  ma   rv  p
Width   11     1   2      38     3     2    1  1  3    1   1
See “Translation Insertion Format” on page 2:48 for a description of all fields. The VHPT walker
provides the following default values when entries are installed into the TLB.
• Virtual Page Number – implied by the position of the entry in the VHPT. The hashed
short-format entry is considered to be the matching translation.
• Region Identifiers are not specified in the short format. To ensure uniqueness, software must
provide unique VHPT mappings per region. Region identifiers obtained from the referenced
region register are tagged with the translation when inserted into the TLB.
• Page Size – specified by the accessed region’s preferred page size (RR[VA{63:61}].ps)
• Protection Key – specified by the accessed region identifier value (RR[VA{63:61}].rid). As a
result, all implementations must ensure that the number of implemented key bits is greater than
or equal to the number of implemented region identifier bits.
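To make the short-format layout and the walker-supplied defaults concrete, the sketch below decodes an 8-byte entry using the bit positions of Figure 4-10 and fills in the page size and protection key from the region register, as the list above describes. The function names are hypothetical, and the region register fields are passed in as plain integers rather than read from hardware.

```python
def bits(value, hi, lo):
    """Extract value{hi:lo} as an unsigned integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_short_entry(entry, rr_ps, rr_rid):
    """Decode a short-format VHPT entry and apply the walker's defaults."""
    return {
        "p": bits(entry, 0, 0),
        "ma": bits(entry, 4, 2),
        "a": bits(entry, 5, 5),
        "d": bits(entry, 6, 6),
        "pl": bits(entry, 8, 7),
        "ar": bits(entry, 11, 9),
        "ppn": bits(entry, 49, 12),
        "ed": bits(entry, 52, 52),
        # Defaults not stored in the entry itself:
        "ps": rr_ps,     # page size: the region's preferred page size
        "key": rr_rid,   # protection key: the region identifier value
        "rid": rr_rid,   # translation is tagged with the region identifier
    }
```

Since the key defaults to the region identifier, a processor needs at least as many implemented key bits as rid bits, which is the requirement stated in the last bullet.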
If a translation is marked as not present, ignored fields are usable by software as noted in
Figure 4-11.
Figure 4-11. VHPT Not-present Short Format
Bits    63:1  0
Field   ig    0
4.1.5.4 VHPT Long Format
The long-format VHPT uses 32-byte VHPT entries to support a single large virtual hash page table.
To use the long-format VHPT, PTA.vf must be set to 1. The long format is a superset of the TLB
insertion format, as noted in Figure 4-12, and specifies full translation information (including
protection keys and page sizes). Additional fields are defined in Table 4-8. The long format is
typically used to build the hash page table configuration.
Figure 4-12. VHPT Long Format
Offset  Fields (bit positions)
+0      ig 63:53 | ed 52 | rv 51:50 | ppn 49:12 | ar 11:9 | pl 8:7 | d 6 | a 5 | ma 4:2 | rv 1 | p 0
+8      rv 63:32 | key 31:8 | ps 7:2 | rv 1:0
+16     ti 63 | tag 62:0
+24     ig 63:0
Table 4-8. VHPT Long-format Fields
Field  Offset  Description
tag    +16     Translation Tag – The tag, in conjunction with the VHPT hash index, is used to
               uniquely identify the translation. Tags are computed by hashing the virtual page
               number and the region identifier. See “VHPT Hashing” on page 2:59 for details on tag
               and hash index generation.
ti     +16     Tag Invalid Bit – If one, this bit of the tag indicates an invalid tag. On all processor
               implementations, the VHPT walker and the ttag instruction generate tags with the ti
               bit equal to 0. A VHPT entry with the ti bit equal to one will never be inserted into the
               processor’s TLBs. Software can use the ti bit to invalidate long-format VHPT entries in
               memory.
ig     +24     available – field for software use, ignored by the processor. Operating systems may
               store any value, such as a link address to extend collision chains on a hash collision.
If a translation is marked as not present, ignored fields are usable by software as noted in
Figure 4-13. Also, in some implementations, +8{63:32} and +8{31:8} may be ignored as well.
Figure 4-13. VHPT Not-present Long Format
Offset  Fields (bit positions)
+0      ig 63:1 | 0 at bit 0
+8      rv 63:32 | key 31:8 | ps 7:2 | rv 1:0
+16     ti 63 | tag 62:0
+24     ig 63:0
For multiprocessor systems, atomic updates of long-format VHPT entries may be ensured by
software as follows:
• Before making multiple non-atomic updates to a VHPT entry in memory, software is required
to set its ti bit to one.
• After making multiple non-atomic updates to a VHPT entry in memory, software may clear its
ti bit to zero to re-enable tag matches.
The updates to the VHPT entry in memory must be constrained to be observable only after the store
that sets the ti bit to one is observable. This can be accomplished with an mf instruction, or by
performing the updates to the VHPT entry with release stores. Similarly, the clearing of the ti bit
must be constrained to be observable only after all of the updates to the VHPT entry are observable.
This can be accomplished with an mf instruction, or by performing the clear of the ti bit with a
release store.
4.1.6 VHPT Hashing
The processor provides two methods for software to determine a VHPT entry’s address: the
Translation Hash (thash) instruction, and the Interruption Hash Address (IHA) register defined on
page 2:37. The virtual address of the VHPT entry is placed in the IHA register when a VHPT
Translation or TLB fault is delivered. In the long format, IHA can be used as a starting address to
scan additional collision chains (associativities) defined by the operating system or to perform a
search in software. The thash instruction is used to generate a VHPT entry’s address outside of
interruption handlers and provides the same hash function that is used to calculate IHA.
thash produces a VHPT entry’s address for a given virtual address and region identifier, depending
on the setting of the PTA.vf bit. When PTA.vf=0, thash returns the region-based short-format
index as defined in “Region-based VHPT Short-format Index” on page 2:60. When PTA.vf=1,
thash returns the long-format hash as defined in “Long-format VHPT Hash” on page 2:60. The
ttag instruction is only useful for long-format hashing, and generates a 64-bit ti/tag identifier that
the processor’s VHPT walker will check when it looks up a given virtual address and region
identifier. Software should use the ttag instruction, and either the thash instruction or the IHA
register when forming translation tags and hash addresses for the long-format VHPT. These
resources encapsulate the implementation-specific long-format hashing functionality and improve
performance.
4.1.6.1 Region-based VHPT Short-format Index
In the region-based short format, the linear page table for each region resides in the referenced
region itself. As a result, the short-format VHPT consists of separate per-region page tables, which
are anchored in each region by PTA.base{60:15}. For regions in which the VHPT is enabled, the
operating system is required to maintain a per-region linear page table. As defined in Figure 4-14,
the VHPT walker uses the virtual address, the region’s preferred page size, and the PTA.size field to
compute a linear index into the short-format VHPT.
Figure 4-14. Region-based VHPT Short-format Index Function
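The body of Figure 4-14 is not reproduced in this text. As a rough illustration only, the Python sketch below computes a linear index from the inputs the prose names (virtual address, preferred page size, PTA.base and PTA.size), assuming 8-byte entries, a table confined to the referenced region, and all 61 region-offset bits implemented. The exact bit-level form is an assumption, and `short_format_index` and `min_pta_size` are hypothetical names; the processor's own result is what thash returns.

```python
REGION_OFFSET_MASK = (1 << 61) - 1  # VA{60:0}, assuming all bits are implemented


def short_format_index(va, pta_base, pta_size, ps):
    """Assumed form of the short-format index: base within the region plus 8 * vpn."""
    region = va >> 61                             # VA{63:61} selects the region
    vpn = (va & REGION_OFFSET_MASK) >> ps         # page number within the region
    offset = vpn << 3                             # each entry is 8 bytes
    mask = (1 << pta_size) - 1                    # table spans 2**pta_size bytes
    base = pta_base & REGION_OFFSET_MASK & ~0x7FFF  # PTA.base{60:15}
    return (region << 61) | (base & ~mask) | (offset & mask)


def min_pta_size(n, ps=12):
    """Smallest PTA.size covering a 2**n byte space: 2**(n - ps) entries of 2**3 bytes."""
    return n - ps + 3
```

With 4K-byte pages, `min_pta_size(61)` gives 52 and `min_pta_size(n)` reduces to n - 9, which agrees with the table sizes worked out in the paragraph that follows.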
The size of the short-format VHPT (PTA.size) defines the size of the mapped virtual address space.
The maximum architectural table size in the short format is 2^52 bytes per region. To map an entire
region (2^61 bytes) using 4K-byte pages, 2^(61-12) = 2^49 pages must be mappable. A short-format
VHPT entry is 8 bytes = 2^3 bytes large. As a result, the maximum table size is 2^(61-12+3) = 2^52 bytes
per region. If the short format is used to map an address space smaller than 2^61, a smaller
short-format table (PTA.size<52) can be used. Mapping of an address space of 2^n with 4K-byte
pages requires a minimum PTA.size of (n-9).
In the short format, the thash instruction returns the region-based short-format index defined in
Figure 4-14. The ttag instruction is not used with the short format. VHPT translation and TLB
miss faults write the IHA register with the region-based short-format index defined in Figure 4-14.
4.1.6.2 Long-format VHPT Hash
The long-format VHPT is a single large contiguous hash table that resides in the region defined by
PTA.base. As defined in Figure 4-15, the VHPT walker uses the virtual address, the region
identifier, the region’s preferred page size, and the PTA.size field to compute a hash index into the
long-format VHPT. PTA.base{63:15} defines the base address and the region of the long-format
VHPT. PTA.size reflects the size of the hash table, and is typically set to a number significantly
smaller than 2^64; the exact number is based on operating system performance requirements.
The long-format hash function (tlb_vhpt_hash_long) and long-format tag generation function
are implementation specific. However, on all processor models the hash and tag functions must
exclude the virtual region number (virtual address bits VA{63:61}) from the hash and tag
computations. This ensures that a unique 85-bit global virtual address hashes to the same VHPT
hash address, regardless of which region the address is mapped to. All processor implementations
guarantee that the most significant bit of the tag (ti bit) is zero for all valid tags. The hash index and
tag together must uniquely identify a translation. The processor must ensure that the indices into the
hashed table, the region’s preferred page size, and the tag specified in an indexed entry can be used
in a reverse hash function to uniquely regenerate the region identifier and virtual address used to
generate the index and tag. This must be possible for all supported page sizes, implemented virtual
addresses and legal values of region identifiers. A hash function is reversible if using the hash result
and all but one input produces the missing input as the result of the reverse hash function. The
easiest hash function and reverse hash function is a simple XOR of bits. To ensure uniqueness,
software must follow these rules:
1. Software must use only one preferred page size for each unique region identifier at any
given time; otherwise, processor operation is undefined.
2. All tags for translations within a given region must be created with the preferred page size
assigned to the region; otherwise, processor operation is undefined.
3. Software is not allowed to have pages in the VHPT that are smaller than the preferred page
size for the region; otherwise, processor operation is undefined. Software can specify a page
with a page size larger than the preferred page size in the VHPT, but tag values for the entries
representing that page size must be generated using the preferred page size assigned to that
region.
4. To reuse a region identifier with a different preferred page size, software must first ensure
that the VHPT contains no insertable translations for that rid, purge all translations for that
rid from all processors that may have used it, and then update the region register with the
new preferred page size.
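The reversibility requirement on the long-format hash and tag can be illustrated with a toy XOR-based pair. This is emphatically not the processor's function, which is implementation specific and should be obtained through thash and ttag; the constants, the function names and the 1024-entry table are assumptions chosen only to keep the example small. The sketch excludes VA{63:61}, keeps bit 63 of the tag (ti) zero, and shows that the index, the tag and the preferred page size together recover the region identifier and the region-relative page address.

```python
INDEX_BITS = 10    # toy table with 2**10 entries (assumption)
VPN_HI_BITS = 39   # enough for 61 - 12 - 10 upper page-number bits


def toy_hash_and_tag(va, rid, ps=12):
    """Toy hash index and tag for (va, rid); rid is at most 24 bits."""
    vpn = (va & ((1 << 61) - 1)) >> ps             # region number is excluded
    index = (vpn ^ rid) & ((1 << INDEX_BITS) - 1)  # simple XOR of bits
    tag = (rid << VPN_HI_BITS) | (vpn >> INDEX_BITS)
    return index, tag


def toy_reverse(index, tag, ps=12):
    """Recover (rid, region-relative page address) from index, tag and page size."""
    rid = tag >> VPN_HI_BITS
    vpn_hi = tag & ((1 << VPN_HI_BITS) - 1)
    vpn_lo = (index ^ rid) & ((1 << INDEX_BITS) - 1)
    return rid, ((vpn_hi << INDEX_BITS) | vpn_lo) << ps
```

The reverse step depends on knowing the page size used to form the tag, which is one way to see why the rules above insist on a single preferred page size per region identifier.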
4.1.7 VHPT Environment
The processor’s VHPT walker can optionally be configured to search the VHPT for a translation
after a failed instruction or data TLB search. The VHPT walker is enabled for different types of
references under the following conditions:
• Data and non-access references (including IA-32): PTA.ve=1, and RR[VA{63:61}].ve=1, and
PSR.dt=1.
• Instruction fetches (including IA-32): PTA.ve=1, and RR[VA{63:61}].ve=1, and PSR.dt=1,
and PSR.it=1, and PSR.ic=1.
• RSE references: PTA.ve=1, and RR[VA{63:61}].ve=1, and PSR.dt=1, and PSR.rt=1.
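The three enabling conditions above can be written as a single predicate. The following Python sketch is illustrative; `walker_enabled` is a hypothetical name, and the PSR bits are passed as a dictionary rather than read from a processor.

```python
def walker_enabled(kind, pta_ve, rr_ve, psr):
    """kind is "data", "inst" or "rse"; psr maps "dt", "it", "ic", "rt" to 0 or 1."""
    common = pta_ve and rr_ve and psr["dt"]   # required for every reference type
    if kind == "data":                        # data and non-access references
        return bool(common)
    if kind == "inst":                        # instruction fetches
        return bool(common and psr["it"] and psr["ic"])
    if kind == "rse":                         # register stack engine references
        return bool(common and psr["rt"])
    raise ValueError("unknown reference kind: " + kind)
```

Note that PSR.dt gates all three cases, since the walker itself reads the VHPT through the data translation path.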
If the walker is not enabled, and an attempt is made to reference the VHPT, an Alternate
Instruction/Data TLB Miss fault is raised. The remainder of this section assumes that the VHPT is
enabled.
Region registers must support all implemented page sizes so software can use IHA, thash and
ttag to manage the VHPT. thash and ttag are defined to operate on all page sizes supported by
the translation cache, regardless of the VHPT walker’s supported page sizes. The PTA register must
be implemented on processor models that do not implement a VHPT walker. Software must ensure
PTA is initialized and serialized before issuing ttag, thash, before enabling the VHPT walker or
issuing a reference that may cause a VHPT walk. The minimum VHPT size is 32KBytes
(PTA.size=15), and operating systems must ensure that the VHPT is aligned on the natural
boundary of the structure; otherwise, processor operation is undefined. For example, a 64K-byte
table must be aligned on a 64K-byte boundary.
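The size and alignment constraints just stated are easy to check before programming PTA. A minimal sketch, with `vhpt_placement_ok` as a hypothetical helper name:

```python
MIN_PTA_SIZE = 15  # minimum VHPT size is 32 KBytes = 2**15 bytes


def vhpt_placement_ok(base, pta_size):
    """True if a 2**pta_size byte VHPT at base meets the size and alignment rules."""
    if pta_size < MIN_PTA_SIZE:
        return False
    return base % (1 << pta_size) == 0   # aligned on its natural boundary
```

For the 64K-byte example in the text, `vhpt_placement_ok(0x10000, 16)` holds while `vhpt_placement_ok(0x18000, 16)` does not.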
VHPT walker references to the VHPT are performed at privilege level 0, regardless of the state of
PSR.cpl. VHPT byte ordering is determined by the state of DCR.be. When DCR.be=1, VHPT
walker references are performed using big-endian memory formats; otherwise, VHPT walker
references are little-endian. A long-format VHPT reference is matched against the data break-point
registers as a 32-byte reference.
The VHPT is accessed by the processor only if the VHPT is virtually mapped into cacheable
memory areas. The walker may access the VHPT speculatively, i.e., references may be performed
that are not required by an in-order execution of the program. Any VHPT or TLB faults
encountered during a VHPT walker’s search are not reported until the faulting translation is
required by an in-order execution of the program. If the VHPT is mapped into non-cacheable
memory areas the VHPT is not referenced, and all TLB misses result in an Instruction/Data TLB
Miss fault.
The VHPT walker will abort the search and deliver an Instruction/Data TLB Miss fault if an
attempt is made to install translations that have reserved bits or encodings, or if the translation
mapping the VHPT would have taken one of the following faults: Data Page Not Present, Data NaT
Page Consumption, Data Key Miss, Data Key Permission, Data Access Bit, or Data Debug. The
VHPT walker may abort a search and deliver an Instruction/Data TLB Miss fault at any time for
implementation-specific reasons.
The processor’s VHPT walker is required to read and insert VHPT entries from memory atomically
(an 8-byte atomic read-and-insert for short format, and a 32-byte atomic read-and-insert for long
format). Some implementation strategies for achieving this atomicity are as follows:
• If the walker performs its VHPT read with multiple cache accesses which are not done as an
atomic unit, and if an update to part of the entry that is being installed is made in-between these
multiple reads, the walker must abort the insert and deliver an Instruction/Data TLB Miss.
• If the walker performs its VHPT read and the insertion of the entry into the TLB as separate
actions, and not as an atomic unit, and if an update to part of the entry that is being installed is
made in-between the read and the insert, the walker must either abort the insert and deliver an
Instruction/Data TLB Miss, or ignore the update and install the complete old entry.
• If the purge address range of a TLB purge operation (ptc.l, ptc.e, local or remote ptc.g or
ptc.ga, ptr.i, or ptr.d) overlaps the virtual address the walker is attempting to insert, then
the walker must either abort the insert and deliver an Instruction/Data TLB Miss, or delay the
purge operation until after the walker either completes the insertion or aborts the walk.
The RSE can only raise a VHPT fault on a mandatory RSE spill/fill operation as defined for
successful execution of an alloc, loadrs, flushrs, br.ret or rfi instruction. Eager RSE
operations may generate speculative VHPT walks provided encountered faults are not reported.
Data TLB Miss faults encountered during a VHPT walk are permitted and, when PSR.ic=1, are
converted into a VHPT Translation fault as defined in the next section.
4.1.8 Translation Searching
The general sequence of searching the TLB and VHPT is shown in Figure 4-16. On a failed TLB
search, if the VHPT walker is disabled for the referenced region an Alternate Instruction/Data TLB
Miss fault is raised. If the VHPT walker is enabled for the referenced region, the VHPT is accessed
to locate the missing translation. See “VHPT Environment” on page 2:61. If additional TLB misses
are encountered during the VHPT walker’s references, a VHPT Translation fault is raised. If the
VHPT walker does not find the required translation in the VHPT or the search is aborted, an
Instruction/Data TLB Miss fault is raised. Otherwise the entry is loaded into the ITC or DTC.
Provided the above fault conditions are not detected, the processor may load the entry into the ITC
or DTC even if an in-order execution of the program did not require the translation.
See Table 4-1, “Purge Behavior of TLB Inserts and Purges,” on page 2:47 for the purge behavior of
VHPT walker inserts.
After the translation entry is loaded, additional TLB faults are checked; these include in priority
order: Page Not Present, NaT page Consumption, Key Miss, Key Permission, Access Rights,
Access Bit, and Dirty Bit faults. Table 4-9 describes the TLB and VHPT walker related faults.
On a failed TLB/VHPT search, the processor loads interruption registers and translation defaults as
defined in “Interruption Vector Descriptions” on page 2:157 defining the parameters of the
translation fault. Provided the operating system accepts the defaults provided, only the physical
address portion of a TLB entry need be provided on a TLB insert.
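The search sequence described in this section can be sketched as follows. This is a minimal
illustrative model under stated assumptions, not a definitive implementation: the function
search_translation and all of its parameters are hypothetical names, the TLB and VHPT are modeled
as plain dictionaries keyed by virtual page number, and only the fault selection on a failed
search is modeled (the fault checks performed after the insert and the PSR.ic handling of data
faults are left out).

```python
def search_translation(vpage, tlb, vhpt, walker_enabled,
                       walker_tlb_miss=False, is_data=True):
    """Model one TLB/VHPT search; returns ("ok", entry) or ("fault", name).

    All names are hypothetical. `tlb` and `vhpt` map a virtual page
    number to a translation entry; `walker_tlb_miss` stands for an
    additional TLB miss on the walker's own VHPT reference.
    """
    kind = "Data" if is_data else "Instruction"

    # 1. Search the TLB first.
    if vpage in tlb:
        return ("ok", tlb[vpage])

    # 2. Failed TLB search with the walker disabled for the region.
    if not walker_enabled:
        return ("fault", f"Alternate {kind} TLB Miss")

    # 3. The walker's VHPT reference itself missed in the TLB.
    if walker_tlb_miss:
        return ("fault", f"VHPT {kind}")

    # 4. Translation not found in the VHPT (or the walk was aborted).
    entry = vhpt.get(vpage)
    if entry is None:
        return ("fault", f"{kind} TLB Miss")

    # 5. Otherwise the entry is loaded into the translation cache.
    tlb[vpage] = entry
    return ("ok", entry)
```

For example, with an empty TLB and a VHPT that holds page 0x10, a reference to page 0x10 with the
walker enabled installs the entry, while the same reference with the walker disabled reports an
Alternate Data TLB Miss.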
Figure 4-16. TLB/VHPT Search
[Figure: two flowcharts, "Instruction TLB VHPT Search" and "Data TLB VHPT Search". Each starts from
the virtual address and proceeds through Search TLB, the VHPT Walker Enabled check, Search VHPT, TC
Insert, Fault Checks, and Access Memory. The data flowchart first checks "Implemented VA?" and
raises an Unimplemented Data Address fault if the answer is no. Fault exits: Alternate
Instruction/Data TLB Miss fault (walker not enabled), VHPT Instruction/Data fault (TLB miss by the
VHPT walker), and Instruction/Data TLB Miss fault (failed search: tag mismatch or walker abort).
For each data fault exit, PSR.ic = 0 selects a Data Nested TLB fault and PSR.ic = 1/in-flight
selects the named fault.
Fault checks, instruction side: Page Not Present, NaT Page Consumption, Key Miss, Key Permission,
Access Rights, Access Bit, Debug.
Fault checks, data side: Page Not Present, NaT Page Consumption, Key Miss, Key Permission, Access
Rights, Dirty Bit, Access Bit, Debug, Unaligned Data Reference, Unsupported Data Reference.]
Table 4-9. TLB and VHPT Search Faults
Fault                                  Description
VHPT Instruction/Data                  Raised if there is an additional TLB miss when the VHPT walker
                                       attempts to access the VHPT. Typically used to construct leaf table
                                       mappings for linear page table configurations.
Alternate Instruction/Data TLB Miss    Raised when the VHPT walker is not enabled and an instruction or
                                       data reference causes a TLB miss. For example, the VHPT walker
                                       can be disabled within a given virtual region so region-specific
                                       translation algorithms can be utilized.
Instruction/Data TLB Miss              Raised when the VHPT walker is enabled, but the processor:
                                       • Cannot locate the required VHPT entry, or
                                       • The processor aborts the VHPT search for
                                         implementation-specific reasons, or
                                       • The VHPT walker is not implemented, or
                                       • The referenced region specifies a non-supported VHPT
                                         preferred page size, or
                                       • Reserved fields or unimplemented PPN bits are used in the
                                         translation, or
                                       • The hash address falls into unimplemented virtual address
                                         space, or
                                       • The hash address matches a data debug register.
                                       Instruction/Data TLB Miss handlers are essentially software walkers
                                       of the VHPT.
Data Nested TLB                        Raised when a Data TLB Miss, Alternate Data TLB Miss, or VHPT
                                       Data Translation fault occurs and PSR.ic is 0 and not in-flight (e.g.,
                                       fault within a TLB miss handler). Data Nested TLB faults enable
                                       software to avoid overheads for potential data TLB Miss faults.
Instruction/Data Page Not Present      The referenced translation’s P-bit is 0.
Instruction/Data NaT Page Consumption  A non-speculative load, store, mandatory RSE load/store, execution
                                       on, or semaphore operation accesses a page marked with the
                                       physical memory attribute NaTPage. See “Not a Thing Attribute
                                       (NaTPage)” on page 2:79 for details.
Instruction/Data Key Miss              The referenced translation’s permission key is not present in the set
                                       of valid protection key registers.
Instruction/Data Key Permission        … by the matching protection key registers.
Instruction/Data Access Rights         Page granular read, write, execute and privilege level accesses are
                                       denied.
Data Dirty Bit                         The referenced translation’s Dirty bit is 0 on a store or semaphore
                                       operation.
Instruction/Data Access Bit            The referenced translation’s Access bit is 0.
4.1.9 32-bit Virtual Addressing
32-bit virtual data addressing is supported in the Itanium instruction set architecture by three
models: zero-extension, sign-extension, and pointer “swizzling.” IA-32 memory references use the
zero-extension model: all IA-32 32-bit virtual linear addresses are zero-extended into the 64-bit
virtual address space.
The zero-extension model performs address computations with the add and shladd instructions
while software ensures that the upper 32-bits are always zeros. This model constrains 32-bit virtual
addressing to virtual region zero. In this model, regions 1 to 7 are accessible only by 64-bit
addressing.
In the sign-extension model, software ensures that the upper 32-bits of a virtual address are always
equal to bit 31. Address computations use the add, shladd, and sxt instructions. This model splits
the 32-bit address space into two halves that are spread into 2^31 bytes of virtual regions 0 and 7
within the 64-bit virtual address space. In this model, regions 1 to 6 are accessible only by 64-bit
addressing.
The pointer “swizzling” model performs address computations with the addp4 and shladdp4
instructions. These instructions generate a 32-bit address within the 64-bit virtual address space as
shown in Figure 4-17. The 32-bit virtual address space is divided into 4 sections that are spread into
2^30 bytes of virtual regions 0 to 3 within the 64-bit virtual address space. In this model, regions 4 to
7 are accessible only by 64-bit addressing.
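The three models can be compared with a short sketch. This is an illustration under stated
assumptions rather than a definition of the instructions: the helper names below are hypothetical
and model only the resulting 64-bit address, and the addp4 helper assumes that the two region bits
are taken from bits {31:30} of the 32-bit base pointer operand and placed in bits {62:61}, while
the low 32 bits are the wrapped 32-bit sum.

```python
MASK32 = 0xFFFF_FFFF


def zero_extend(addr32):
    """Zero-extension model: the upper 32 bits are always zero."""
    return addr32 & MASK32


def sign_extend(addr32):
    """Sign-extension model: the upper 32 bits all equal bit 31."""
    addr32 &= MASK32
    if addr32 & 0x8000_0000:
        return addr32 | 0xFFFF_FFFF_0000_0000
    return addr32


def addp4(base32, offset):
    """Pointer "swizzling" sketch (assumed operand roles, see lead-in)."""
    low = (base32 + offset) & MASK32      # 32-bit add, carry out is dropped
    region = (base32 >> 30) & 0x3         # upper 2 bits of the 32-bit pointer
    return (region << 61) | low           # region bits land in bits {62:61}


def region_of(va64):
    """Virtual region number, VA{63:61}."""
    return (va64 >> 61) & 0x7
```

With these helpers, region_of(zero_extend(0xC0000000)) is 0, region_of(sign_extend(0x80000000)) is
7, and region_of(addp4(0xC0000000, 0x10)) is 3. Because the region bits do not take part in the
addition, addp4(0x7FFFFFFF, 1) stays in region 1 with a low half of 0x80000000, which illustrates
how an offset can reach beyond the first 2^30 bytes of a region.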
Figure 4-17. 32-bit Address Generation using addp4
[Figure: a 32-bit Offset is added to the low 32 bits of the Base to form bits {31:0} of the 64-bit
result; bits {31:30} of the 32-bit address also appear in bits {62:61} of the result, and the
remaining upper bits are zero.]
In the pointer “swizzling” model, mappings within each region do not necessarily start at offset
zero, since the upper 2-bits of a 32-bit address serve both as the virtual region number and an offset
within each region. Virtual address bits{62:61} do not participate in the address addition, therefore
some regions may be effectively larger than 2^30 bytes due to the addition of a 32-bit offset and lack
of a carry into bits{62:61}. Note that the conversion is non-destructive: a converted 64-bit pointer
can be used as a 32-bit pointer. Flat 31 or 32 bit address spaces can be constructed by assigning the
same region identifier to contiguous region registers. Branches into another 2^30-byte region are
performed by first calculating the target address in the 32-bit virtual space and then converting to a
64-bit pointer by addp4. Otherwise, branch targets will extend above the 2^30 byte boundary within
the originating region.
4.1.10 Virtual Aliasing
Virtual aliasing (two or more virtual pages mapped to the same physical page) is functionally
supported for memory references (including IA-32); however, performance may be degraded on
some processor models where the distance between virtual aliases is less than 1 MB. To avoid any
possible performance degradation, software is advised to use aliases whose virtual addresses differ
by an integer multiple of 1 MB. The processor ensures cache coherency and data dependencies in
the presence of an alias. Stores using a virtual alias followed by a load with another alias to the
same physical location see the effects of prior stores to the same physical memory location.
To support advanced loads in the presence of a virtual alias, the processor ensures that the
Advanced Load Address Table (ALAT) is resolved using physical addresses and is coherent with
physical memory. For details, please refer to “Detailed Functionality of the ALAT and Related
Instructions” on page 1:60.
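The spacing recommendation for virtual aliases can be checked with a one-line test. The sketch
below is only an illustration; the function name aliases_well_spaced is hypothetical, and 1 MB is
taken to mean 2^20 bytes.

```python
MB = 1 << 20  # assumption: 1 MB means 2**20 bytes


def aliases_well_spaced(va_a, va_b):
    """True if two alias addresses differ by an integer multiple of 1 MB."""
    return (va_a - va_b) % MB == 0
```

For instance, aliases at 0x100000 and 0x300000 follow the recommendation, while aliases at 0x1000
and 0x3000 do not.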
4.2 Physical Addressing
Objects in memory and I/O occupy a common 63-bit physical address space that is accessed using
byte addresses. Accesses to physical memory and I/O may be performed via virtual addresses
mapped to the 63-bit physical address space or by direct physical addressing. Current page table
formats allow for mapping virtual addresses into 50 bits of physical address space (on processor
implementations that support this many physical address bits). Future extensions to the page table
formats will allow larger mappings, up to the full 63 bits of physical address space.
Physical addressing for instruction references (including IA-32) is enabled when PSR.it is 0, data
references (including IA-32) when PSR.dt is 0, and register stack references when PSR.rt is 0.
While software views the physical addressing as being 63-bits, implementations may implement
between 32 and 63 physical address bits. All processor models must implement a contiguous set of
physical address bits starting at bit 32 and continuing upwards. Please see the processor-specific
documentation for further information on the number of physical address bits implemented on the
Itanium processor. Implementations must validate that memory references are performed to
implemented physical address bits. Instruction references to unimplemented physical addresses
result either in an Unimplemented Instruction Address trap on the last valid instruction, or in an
Unimplemented Instruction Address fault on the instruction fetch of the unimplemented address.
Data references to unimplemented physical addresses result in an Unimplemented Data Address
fault. Memory references to unpopulated address ranges result in an asynchronous Machine Check
abort, when the platform signals a transaction time-out. Exact machine check behavior is model
specific.
4.3 Unimplemented Address Bits
Based on the processor model, some physical and/or virtual address bits may not be implemented.
Regardless of the number of implemented address bits, all general purpose, branch, control and
application registers implement all 64 register bits on all processors. Similarly, regardless of the
number of implemented address bits, data and instruction breakpoint registers must implement all
64 address bits and all 56 mask bits on all processors.
4.3.1 Unimplemented Physical Address Bits
As shown in Figure 4-18, a 64-bit physical address consists of three fields: physical memory
attribute (PMA), unimplemented and implemented bits.
Figure 4-18. Physical Address Bit Fields
Bits:   63  | 62 .. IMPL_PA_MSB+1 | IMPL_PA_MSB .. 0
Field:  PMA | unimplemented       | implemented
Width:  1   | 62 - IMPL_PA_MSB    | IMPL_PA_MSB + 1
All processor models implement at least 32 physical address bits, bits 0 to 31, plus the physical
memory attribute bit. Additional implemented physical bits must be contiguous starting at bit 32.
IMPL_PA_MSB is the implementation-specific position of the most significant implemented
physical address bit. In a processor that implements all physical address bits, IMPL_PA_MSB is
62. Please see the processor-specific documentation for further information on the number of
physical address bits implemented on the Itanium processor.
If unimplemented physical address bits are set by software, an Unimplemented Data Address fault
is raised during the TLB insert instructions (itc, itr). Inserts performed by the VHPT walker, as
noted in “VHPT Hashing” on page 2:59, abort the VHPT search if unimplemented or reserved
fields are used. For translations marked as Not-Present (TLB.p is 0), the processor does not check
the validity of PPN and some reserved bits as noted in Figure 4-6.
When a processor model does not implement all physical address bits, the missing bits are defined
to be zero. Physical addresses in which bits PA{62:min(IMPL_PA_MSB+1,62)} are not zero are
considered “unimplemented” physical addresses on that processor model. Physical addresses are
checked for correctness on use by ensuring that PA{62:min(IMPL_PA_MSB+1,62)} bits are zero.
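The physical address check can be sketched as a bit test. This is a minimal illustration, not a
definitive implementation: the function name is_implemented_pa is hypothetical, bit 63 is treated
as the memory attribute bit and ignored, and the sketch covers only IMPL_PA_MSB values from 31 to
61, for which min(IMPL_PA_MSB+1,62) is simply IMPL_PA_MSB+1.

```python
def is_implemented_pa(pa, impl_pa_msb):
    """True if PA{62:IMPL_PA_MSB+1} are all zero (hypothetical helper)."""
    if not 31 <= impl_pa_msb <= 61:
        raise ValueError("this sketch covers IMPL_PA_MSB in 31..61 only")
    addr = pa & ((1 << 63) - 1)           # drop bit 63, the attribute bit
    return (addr >> (impl_pa_msb + 1)) == 0
```

With IMPL_PA_MSB equal to 43 (44 implemented bits), the address 2^44 - 1 passes the check and the
address 2^44 does not.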
4.3.2 Unimplemented Virtual Address Bits
As shown in Figure 4-19, a 64-bit virtual address consists of three fields: virtual region number
(VRN), unimplemented and implemented bits.
Figure 4-19. Virtual Address Bit Fields
Bits:   63 .. 61 | 60 .. IMPL_VA_MSB+1 | IMPL_VA_MSB .. 0
Field:  VRN      | unimplemented       | implemented
Width:  3        | 60 - IMPL_VA_MSB    | IMPL_VA_MSB + 1
All processor models provide three VRN bits in VA{63:61}. IMPL_VA_MSB is the
implementation-specific bit position of the most significant implemented virtual address bit. In
addition to the three VRN bits, all processor models implement at least 51 virtual address bits; i.e.,
the smallest IMPL_VA_MSB is 50. In a processor that implements all 64 virtual address bits,
IMPL_VA_MSB is 60. Please see the processor-specific documentation for further information on
the number of virtual address bits implemented on the Itanium processor.
If the PSR.vm bit is implemented, and if PSR.vm is 1, then virtual addresses are treated as though
one additional virtual address bit were unimplemented. If the PSR.vm bit is implemented, at least
52 virtual address bits must be implemented.
When a processor model does not implement all virtual address bits, the missing bits are defined to
be a sign-extension of VA{IMPL_VA_MSB}. Virtual addresses in which bits
VA{60:min(IMPL_VA_MSB+1,60)} do not match VA{IMPL_VA_MSB} are considered
“unimplemented” virtual addresses on that processor model. Virtual addresses are checked for
correctness on use by ensuring that VA{60:min(IMPL_VA_MSB+1,60)} bits are identical to
VA{IMPL_VA_MSB}.
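The sign-extension check on virtual addresses can be sketched in the same way. The function name
is_implemented_va and its parameters are hypothetical, and the vm flag is an assumption of this
sketch that models PSR.vm = 1 by treating one additional address bit as unimplemented.

```python
def is_implemented_va(va, impl_va_msb, vm=False):
    """True if the bits above the top implemented bit, up to bit 60,
    all match that top implemented bit (hypothetical helper)."""
    if not 50 <= impl_va_msb <= 60:
        raise ValueError("IMPL_VA_MSB must be in 50..60")
    if vm and impl_va_msb < 51:
        raise ValueError("PSR.vm requires at least 52 implemented bits")
    msb = impl_va_msb - 1 if vm else impl_va_msb
    sign = (va >> msb) & 1                 # VA{msb}
    width = 60 - msb                       # number of bits in VA{60:msb+1}
    field_mask = (1 << width) - 1
    upper = (va >> (msb + 1)) & field_mask
    expected = field_mask if sign else 0   # sign-extension of VA{msb}
    return upper == expected
```

The three VRN bits in VA{63:61} are not part of the comparison, so the same offset is implemented
or unimplemented in every region.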
4.3.3 Instruction Behavior with Unimplemented Addresses
The use of an unimplemented address affects instruction execution as described in the bullet list
below. If instruction address translation is enabled, an “unimplemented address” refers to an
unimplemented virtual address. If instruction address translation is disabled, an “unimplemented
address” refers to an unimplemented physical address.
• Non-speculative memory references (non-speculative loads, stores, and semaphores), the
following non-access references: fc, fc.i, tpa, lfetch.fault, and probe.fault, and
mandatory RSE operations to unimplemented addresses result in an Unimplemented Data
Address fault.
• Virtual addresses used by instruction and data TLB purge/insert operations are checked, and if
the base address (register r3 of the purge, IFA for inserts) targets an unimplemented virtual
address, an Unimplemented Data Address fault is raised. The page size of the insert or purge is
ignored.
• Speculative loads from unimplemented addresses always return a NaT bit in the target register.
• A non-faulting probe instruction to an unimplemented address returns zero in the target
register.
• A tak instruction to an unimplemented address returns one in the target register.
• A non-faulting lfetch to an unimplemented address is silently ignored.
• Eager RSE operations to unimplemented addresses do not fault.
• Execution of a taken branch, taken chk, or an rfi to an unimplemented address, or execution
of a non-branching slot 2 instruction in a bundle at the upper edge of the implemented address
space (where the next sequential bundle address would be an unimplemented address) results
either in an Unimplemented Instruction Address trap on the branch, chk, rfi or non-branching
slot 2 instruction, or in an Unimplemented Instruction Address fault on the fetch of the
unimplemented address.
• When ptc.g or ptc.ga operations place a virtual address on the bus, the virtual address is
sign-extended to a full 64-bit format. If an incoming ptc.g or ptc.ga presents a virtual
address base that targets an unimplemented virtual address, the upper (unimplemented) virtual
address bits are dropped, and the purge is performed with the truncated address.
• The behavior of executing vmsw.1 in a bundle whose address will become unimplemented
after PSR.vm is set to 1 is undefined.
4.4 Memory Attributes
When virtual addressing is enabled, memory attributes defining the speculative, cacheability and
write-policies of the virtually mapped physical page are defined by the TLB. When physical
addressing is enabled, memory attributes are supplied as described in “Physical Addressing
Memory Attributes” on page 2:70.
4.4.1 Virtual Addressing Memory Attributes
For virtual memory references, the memory attribute field of each virtual translation describes
physical memory properties as shown in Table 4-10.
Table 4-10. [Only part of this table is legible. Recoverable entries:]
Mnemonic and encoding pairs: UCE = 101, WC = 110.
Other encodings listed: 001, 010, 011.
Column heading: Coherent with Respect to (note a). Values listed: WB, WBL; UC, UCE; Not MP coherent.
Other cell values listed: Uncacheable; Coalescing; Non-coalescing; Non-sequential & speculative;
Sequential & non-speculative.
a. The Coherency column in this table refers to multiprocessor coherence on normal, side-effect free memory.
The data dependency rules defined in “Memory Access Ordering” on page 1:68 ensure uni-processor
coherence for the memory attributes listed in each row.
b. WC is not MP coherent w.r.t. any memory attribute, but is uni-processor coherent w.r.t. itself.
c. This memory attribute is reserved for Software use.
The attribute UCE is identical to UC except when executing a fetchadd instruction. UCE
enables the exporting of the fetchadd instruction outside the processor. Support for UCE is
model-specific; see “Effects of Memory Attributes on Memory Reference Instructions” on
page 2:79 for details.
Insert TLB instructions (itc, itr) that attempt to insert reserved memory attributes (Table 4-10)
into the TLB raise Reserved Register/Field faults. External system operation is undefined if
software inserts a memory attribute supported by the processor but not supported by the external
system.
If software modifies the memory attributes for a page, it must follow the attribute transition
requirements in Section 4.4.11, “Memory Attribute Transition” on page 2:81.
It is recommended that processor models report a Machine Check abort if the following memory
attribute aliasing is detected:
• Cache hit on an uncacheable page, other than as the target of a local or remote flush cache
(fc.i) instruction (see “Effects of Memory Attributes on Memory Reference Instructions” on
page 2:79).
4.4.2 Physical Addressing Memory Attributes
Memory attributes for physical addressing are selected by bit 63 of the address
contained in the address base register as shown in Figure 4-20 and Table 4-11.
Table 4-11.
Bit{63}   Attribute   Speculation            Coherent with respect to (note a)
0         WBL         limited speculation    WBL, WB
1         UC          non-speculative        UC, UCE
a. Coherency here refers to multiprocessor coherence on normal, side-effect free memory.
See “Speculation Attributes” on page 2:73 for a description of physical addressing limited
speculation. Bit{63} is discarded when forming the physical address, effectively creating a
write-back name space and an uncached name space as shown in Figure 4-21.
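Figure 4-21. Addressing Memory Attributes
[Figure: the 2^64 range of base register values is divided at 2^63 into two name spaces that both
map onto the 2^63 physical address space: the cached write-back, limited speculation name space
(WBL), from 0 to 2^63, and the uncached, non-speculative name space (UC), from 2^63 to 2^64.]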
Software must use the correct name space when using physical addressing; otherwise, I/O devices
with side-effects may be accessed speculatively. Physical addressing accesses are ordered only if
ordered loads or ordered stores are used. Otherwise, physical addressing memory references are
unordered.
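The two physical name spaces can be sketched as a decode of the base register value. This is an
illustration only: the function name decode_physical_base is hypothetical, and the mapping of bit
63 = 0 to WBL and bit 63 = 1 to UC is read from Figure 4-21.

```python
def decode_physical_base(addr64):
    """Split a 64-bit base register value into (attribute, physical address)."""
    attribute = "UC" if (addr64 >> 63) & 1 else "WBL"
    physical = addr64 & ((1 << 63) - 1)   # bit 63 is discarded
    return attribute, physical
```

Both 0x1000 and 0x8000000000001000 therefore name physical address 0x1000, the first through the
write-back name space and the second through the uncached name space.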
4.4.3 Cacheability and Coherency Attribute
A page can be either cacheable or uncacheable. If a page is marked cacheable, the processor is
permitted to allocate a local copy of the corresponding physical memory in all levels of the
processor memory/cache hierarchy. Allocation may be modified by the cache control hints of
memory reference instructions.
A page which is cached is coherent with memory; i.e., the processor and memory system ensure
that there is a consistent view of memory from each processor. Processors support multiprocessor
cache coherence based on physical addresses between all processors in the coherence domain
(tightly coupled multiprocessors). Coherency is supported in the presence of virtual aliases,
although software is recommended to use aliases which are an integer multiple of 1 MB apart to
avoid any possible performance degradation.
Processors are not required to maintain coherency between processor local instruction and data
caches for Itanium architecture-based code; i.e., locally initiated Itanium stores may not be
observed by the local instruction cache. Processors are required to maintain coherency between
processor local instruction and data caches for IA-32 code. Instruction caches are also not required
to be coherent with multiprocessor Itanium instruction set originated memory references.
Instruction caches are required to be coherent with multiprocessor IA-32 instruction set originated
memory references. The processor must ensure that transactions from other I/O agents (such as
DMA) are physically coherent with the instruction and data cache.
For non-cacheable references the processor provides no coherency mechanisms; the memory
system must ensure that a consistent view of memory is seen by each processor. See “Coalescing
Attribute” on page 2:72 for a description of coherency for the coalescing memory attribute.
4.4.4 Cache Write Policy Attribute
Write-back cacheable pages need only modify the processor’s copy of the physical memory
location; written data need only be passed to the memory system when the processor’s copy is
displaced, or a Flush Cache (fc) instruction is issued to flush a virtual address. A cache line can
only be written back to memory if a store, semaphore (successful or not), the ld.bias, a
mandatory RSE store, or a .excl hinted lfetch instruction targeting that line has executed without a
fault. These events enable write-backs. A synchronized fc instruction disables subsequent
write-backs (after the line has been flushed).
As described in “Invalidating ALAT Entries” on page 1:62, platform visible removal of cache lines
from a processor’s caches (e.g., cache line write-backs or platform visible replacements) cause the
corresponding ALAT entries to be invalidated.
4.4.5 Coalescing Attribute
For uncacheable pages, the coalescing attribute informs the processor that multiple stores to this
page may be collected in a coalescing buffer and issued later as a single larger merged transaction.
The processor may accumulate stores for an indefinite period of time. Multiple pending loads may
also be coalesced into a single larger transaction which is placed in a coalescing buffer. Coalescing
is a performance hint for the processor; a processor may or may not implement coalescing.
A processor with multiple coalescing buffers must provide a flush policy that flushes buffers at
roughly equal rate even if some buffers are only partially full. The processor may make coalesced
buffer flushes visible in any order. Furthermore, individual bytes within a single coalesced buffer
may be flushed and made visible in any order.
Stores (including IA-32), which are coalesced, are performed out of order; coalescing may occur in
both the space and time domains. For example, a write to bytes 4 and 5 and a write to bytes 6 and 7
may be coalesced into a single write of bytes 4, 5, 6, and 7. In addition, a write of bytes 5 and 6 may
be combined with a write of bytes 6 and 7 into a single write of bytes 5, 6, and 7.
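The two merging cases in the example above can be reproduced with a small model of a coalescing
buffer. This is a sketch under stated assumptions, not a description of any processor's buffer:
the function name coalesce is hypothetical, writes are given as (starting byte offset, data)
pairs in program order, and where two writes overlap the sketch assumes the later write supplies
the byte.

```python
def coalesce(writes):
    """Merge byte writes into contiguous runs.

    `writes` is a list of (start_offset, data_bytes) pairs in program
    order. Returns a list of (start_offset, merged_bytes) runs.
    """
    buffer = {}
    for start, data in writes:
        for i, byte in enumerate(data):
            buffer[start + i] = byte      # assumption: later write wins

    runs = []
    for offset in sorted(buffer):
        if runs and runs[-1][0] + len(runs[-1][1]) == offset:
            runs[-1][1].append(buffer[offset])   # extends the current run
        else:
            runs.append((offset, [buffer[offset]]))
    return [(start, bytes(data)) for start, data in runs]
```

A write to bytes 4 and 5 followed by a write to bytes 6 and 7 yields one run covering bytes 4
through 7, and a write to bytes 5 and 6 followed by a write to bytes 6 and 7 yields one run
covering bytes 5 through 7.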
Any release operation (regardless of whether it references a page with a coalescing memory
attribute), or any fence type instruction, forces write-coalesced data to be flushed and made visible
prior to the instruction itself becoming visible. (See Table 4-14 on page 2:76 for a list of release and
fence instructions.) Any IA-32 serializing instruction, or access to an uncached memory type,
forces write-coalesced data to become flushed and made visible prior to itself becoming visible.
Even though IA-32 stores and loads are ordered, the write-coalesced data is not flushed unless the
IA-32 stores or loads are to uncached memory types.
The Flush Cache (fc, fc.i) instruction flushes all write-coalesced data whose address is within at
least 32 bytes of the 32-byte aligned address specified by the Flush Cache (fc, fc.i) instruction,
forcing the data to become visible. The Flush Cache (fc, fc.i) instruction may also flush
additional write-coalesced data. The Flush Write buffers (fwb) instruction is a “hint” to the
processor to expedite flushing (visibility) of any pending stores held in the coalescing buffer(s),
without regard to address.
No indication is given when the flushing of the stores is completed. An fwb instruction does not
ensure ordering of coalesced stores, since later stores may be flushed before prior stores. To ensure
prior coalesced stores are made visible before later stores, software must issue a release operation
between stores.
The processor may at any time flush coalesced stores in any order before explicitly requested to do
so by software.
Coalesced pages are not ensured to be coherent with other processors’ coalescing buffers or caches,
or with the local processor’s caches. Loads to coalesced memory pages by a processor see the
results of all prior stores by the same processor to the same coalesced memory page. Memory
references made by the coalescing buffer (e.g., buffer flushes) have an unordered non-sequential
memory ordering attribute. See “Sequentiality Attribute and Ordering” on page 2:75.
Data that has been read or prefetched into a coalescing buffer prior to execution of an Itanium
acquire or fence type instruction is invalidated by the acquire or fence instruction. (See Table 4-14
for a list of acquire and fence instructions.)
4.4.6 Speculation Attributes
For present pages (TLB.p=1) which are marked with a speculative or a NaTPage memory attribute,
the processor may prefetch instructions (including IA-32), perform address generation and perform
load accesses (including IA-32) without resolving prior control dependencies, including predicates,
branches and interruptions. A page should only be marked speculative if accesses to that page have
no side-effects. For example, many memory-mapped I/O devices have side-effects associated with
reads and should be marked non-speculative. If a page is marked speculative, a processor can read
any location in the page at any time independent of a programmer’s intentions or control flow
changes. As a result, software is required, at all times, to maintain valid page table attributes for the
ppn, ps and ma fields of all present translations whose memory attribute is speculative or NaTPage.
High-performance operation is only attainable on speculative pages. The speculative attribute is a
hint; a processor may behave non-speculatively.
Prefetches are enabled if a speculative translation exists. Prefetches are asynchronous data and
instruction memory accesses that appear logically to initiate and finish between some pair of
instructions. This access may not be visible to subsequent flush cache instructions. This behavior
is implementation-dependent.
The processor will not initiate memory references (16-byte instruction bundle fetches, IA-32
instruction fetches, RSE fills and spills, VHPT references, and data memory accesses) to
non-speculative pages until all previous control dependencies (predicates, branches, and
exceptions) are resolved; i.e., the memory reference is required by an in-order execution of the
program. Additionally, for references to non-speculative pages, the processor:
• May not generate any memory access for a control or data speculative data reference.
• Will generate exactly one memory access for each aligned, non-speculative data reference.
(Misaligned data references may cause multiple memory accesses, although these accesses are
guaranteed to be non-overlapping – each byte will be accessed exactly once.)
• May generate multiple 16-byte memory accesses (to the same address) for each 16-byte
instruction bundle fetch reference.
To ensure virtual and physical accesses to non-speculative pages are performed in program order
and only once per program order occurrence, the rules in Table 4-12 and Table 4-13 are defined.
Software should also ensure that RSE spill/fill transactions are not performed to non-speculative
memory that may contain I/O devices; otherwise, system behavior is undefined.
Table 4-12. [The table body is not legible. Column headings include: Load (ld); Speculative Load
(ld.s); Hardware-generated Speculative References.]
a. Includes the faulting form of line prefetch (lfetch.fault).
b. Includes the non-faulting form of line prefetch (lfetch), which does not cause a cache fill if the memory
attribute is non-speculative or limited speculation.
c. Hardware-generated speculative references include non-demand instruction prefetches (including IA-32),
hardware-generated data prefetch references, and eager RSE memory references.
d. The processor may only issue hardware-generated speculative references to a 4K-byte physical page if it is a
verified page.
Table 4-13. Register Return Values on Non-faulting Advanced/Speculative Loads
[Columns: Speculative Load (ld.s), Advanced Load (ld.a), and Speculative Advanced Load (ld.sa), each
split into Success and Failure. Cell values are Value, NaT, Zero, or N/A, qualified by notes a to c;
the row labels are not legible.]
a. Speculative or speculative advanced loads that cause deferred exceptions result in failed speculation. The
processor aborts the reference. If the target of the load is a GR, the processor sets the register’s NaT bit to
one. If the target of the load is an FR, the processor sets the target FR to NaTVal. The processor performs all
other side-effects (such as post-increment).
b. Speculative or speculative advanced loads to limited or non-speculative memory pages result in failed
speculation. The processor aborts the reference. If the target of the load is a GR, the processor sets the
register’s NaT bit to 1. If the target of the load is an FR, the processor sets the target FR to NaTVal. The
processor performs all other side-effects (such as post-increment).
c. Advanced loads to non-speculative memory pages always fail. The processor aborts the reference, sets the
target register to zero, and performs all other side-effects (such as post-increment).
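The outcomes stated in notes a to c can be collected into one small function. This is a sketch
under stated assumptions, not a reconstruction of the table: the name load_result and the page
labels are hypothetical, only general register targets are modeled (NaT is represented by the
string "NaT"), and the success of an advanced load to a limited speculation page is an assumption
based on the later statement that data-speculative loads are treated the same as normal loads.

```python
NAT = "NaT"  # stand-in for a general register with its NaT bit set
PAGES = ("speculative", "limited speculation", "non-speculative")


def load_result(kind, page, value, deferred_exception=False):
    """Register result of a non-faulting ld.s, ld.a or ld.sa (sketch)."""
    if kind not in ("ld.s", "ld.a", "ld.sa"):
        raise ValueError("unknown load kind")
    if page not in PAGES:
        raise ValueError("unknown page attribute")

    if kind in ("ld.s", "ld.sa"):
        # Notes a and b: a deferred exception, or a limited or
        # non-speculative page, results in failed speculation.
        if deferred_exception or page != "speculative":
            return NAT
        return value

    # ld.a. Note c: advanced loads to non-speculative pages always fail.
    if page == "non-speculative":
        return 0
    return value  # assumption: succeeds on the other page types
```

In every failure case the other side-effects of the load, such as post-increment, are still
performed; the sketch models only the value written to the target register.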
4.4.6.1 Limited Speculation and the WBL Physical Addressing Attribute
Processors are allowed to reference limited speculation pages (WBL pages) speculatively, in order
to increase performance, but this speculation is limited to prevent speculative references to 4Kbyte
physical pages for which there is no actual memory (which would cause spurious machine checks).
Processors must not make hardware-generated speculative references to a given WBL 4Kbyte page
until a verified reference has been made. Processors may optionally implement storage to hold the
addresses of WBL 4Kbyte pages for which verified references have been made, and may make
subsequent hardware-generated speculative references to these pages. Such pages are termed
verified pages.
A verified reference is an instruction or data reference made to the page by an in-order execution of
the program; that is, a reference which would have been made had the instructions from the
program been fetched and executed one at a time. A hardware-generated speculative reference does
not constitute a verified reference. Hardware-generated speculative references include:
• Instruction fetches when the processor has not yet determined whether prior branches were
predicted correctly
• Instruction fetches when the processor has not yet determined whether prior instructions will
raise faults or traps
• Data references by instructions when the processor has not yet determined whether prior
branches were predicted correctly
• Data references by instructions when the processor has not yet determined whether prior
instructions will raise faults or traps
For an instruction fetch to constitute a verified reference, it must only be determined that an
in-order execution of the program requires that the IP point to this address, independent of whether
the instruction at this address will subsequently take a fault or interrupt.
For a data reference to constitute a verified reference, the instruction must meet one of the
following requirements:
• It executes without any fault or interrupt
• It takes an Unaligned Data Reference fault
• It takes a Data Debug fault
• It takes an External interrupt, but if it had not taken an External interrupt, it would have met
one of the above qualifications (execute without fault, take an Unaligned Data Reference fault,
or take a Data Debug fault)
Data-speculative loads are treated the same as normal loads, and if an in-order execution of the
program requires the execution of a data speculative load, it constitutes a verified reference.
Control-speculative loads to limited-speculation pages always defer and thus never constitute
verified references.
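The data-reference rules above can be read as a small decision procedure. The following Python sketch is illustrative only: the function name, the outcome labels, and the `would_otherwise` parameter are assumptions made for this example and are not architectural names.

```python
# Outcomes that still let an in-order data reference count as verified.
# The strings are labels chosen for this sketch, not architectural names.
QUALIFYING = {"no_fault", "unaligned_data_reference_fault", "data_debug_fault"}


def is_verified_data_reference(in_order, outcome, control_speculative=False,
                               would_otherwise=None):
    """Return True if a data reference to a WBL page is a verified reference.

    in_order            -- True if an in-order execution requires this reference
                           (hardware-generated speculative references pass False)
    outcome             -- what happened to the instruction (a label string)
    control_speculative -- True for control-speculative loads, which always
                           defer on limited-speculation pages
    would_otherwise     -- for "external_interrupt": the outcome the instruction
                           would have had without the interrupt
    """
    if not in_order or control_speculative:
        return False
    if outcome == "external_interrupt":
        return would_otherwise in QUALIFYING
    return outcome in QUALIFYING
```

Data-speculative loads are passed with `control_speculative=False`, since the text treats them the same as normal loads.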
It is not necessary for a processor to determine whether a reference will complete without
generating a machine check for it to be a verified reference. If software actually references a
physical address which will cause a machine check, hardware may generate multiple speculative
references to the same page, potentially causing multiple machine checks.
Processors may access verified pages normally, as they would WB pages, including the use of
caching, pipelining and hardware-generated speculative references to improve performance.
Calling the PAL_PREFETCH_VISIBILITY procedure forces the processor to clear the storage
holding the addresses of verified pages.
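As a rough model of the optional verified-page storage, the Python sketch below tracks pages at 4-Kbyte granularity. The class and method names are hypothetical, and real implementations may bound or organize this storage quite differently.

```python
PAGE_SHIFT = 12  # verification is tracked per 4-Kbyte physical page


class VerifiedPages:
    """Optional storage for WBL pages that have seen a verified reference."""

    def __init__(self):
        self._pages = set()

    def record_verified_reference(self, phys_addr):
        """Remember the 4-Kbyte page containing phys_addr as verified."""
        self._pages.add(phys_addr >> PAGE_SHIFT)

    def may_speculate(self, phys_addr):
        """Hardware-generated speculation is allowed only to verified pages."""
        return (phys_addr >> PAGE_SHIFT) in self._pages

    def clear(self):
        """Models the effect of calling PAL_PREFETCH_VISIBILITY."""
        self._pages.clear()
```

After `clear()`, a page must see a new verified reference before hardware-generated speculative references to it are allowed again.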
4.4.7 Sequentiality Attribute and Ordering
Memory ordering is defined in Section 4.4.7, “Memory Access Ordering” on page 1:68. This
section defines additional ordering rules for non-cacheable memory, cache synchronization
(sync.i) and global TLB purge operations (ptc.g, ptc.ga).
As described in Section 4.4.7, “Memory Access Ordering” on page 1:68, read-after-write,
write-after-write, and write-after-read dependencies to the same memory location (memory
dependency) are performed in program order by the processor. Otherwise, all other memory
references may be performed in any order unless the reference is specifically marked as ordered.
IA-32 memory references follow a stronger processor consistency memory model. See “IA-32
Memory Ordering” on page 2:255 for IA-32 memory ordering details. Explicit ordering takes the
form of a set of Itanium instructions: ordered load and check load (ld.acq, ld.c.clr.acq),
ordered store (st.rel), semaphores (cmpxchg, xchg, fetchadd), memory fence (mf),
synchronization (sync.i) and global TLB purge (ptc.g, ptc.ga). The sync.i instruction is
used to maintain an ordering relationship between instruction and data caches on local and remote
processors. The global TLB purge instructions maintain multiprocessor TLB coherence.
For VHPT walks, visibility is defined by the memory read(s) which retrieves translation
information, and the associated insertion of the translation into the TLB. VHPT walks are
performed asynchronously with respect to program execution, and each walker VHPT read (which
appears as though it were performed atomically) is made visible at some single point in the
program order. Ordering constraints from Table 4-14 do not prevent VHPT walks from becoming
visible.
Table 4-14 defines a set of “Orderable Instructions” that follow one of four ordering semantics:
unordered, release, acquire or fence. The table defines the ordering semantics and the instructions
of each category. Only these Itanium instructions can be used to establish multiprocessor ordering
relations.
In the following discussion, the terms previous and subsequent are used to refer to the program
specified order. The term visible is used to refer to all architecturally visible effects of performing
an instruction. For memory accesses and semaphores this involves at least reading or writing
memory. For mf.a, visibility is defined by platform acceptance of previous memory accesses.
Visibility of sync.i is defined by visibility of previous flush cache (fc, fc.i) operations. For
ALAT lookups (ld.c, chk.a), visibility is determination of ALAT hit or miss. For global TLB
purge operations, visibility is defined by removal of an address translation from the TLBs on all
processors in the TLB coherence domain. Global TLB purge instructions (ptc.g and ptc.ga)
follow release semantics on the local processor as well as on remote processors, except with respect
to global purge instructions being executed by that remote processor. For local TLB purge
operations, visibility is defined by removal of an address translation on the local processor. Local
TLB purge instructions (ptc.l, ptc.e) ensure that all prior stores are made locally visible before
the actual purge operation is performed.
Table 4-14. Ordering Semantics and Instructions

Ordering
Semantics    Description
Unordered    Unordered instructions may become visible in any order.
Release      Release instructions guarantee that all previous orderable instructions are made
             visible prior to being made visible themselves.
Acquire      Acquire instructions guarantee that they are made visible prior to all subsequent
             orderable instructions.
Fence        Fence instructions combine the release and acquire semantics into a bi-directional
             fence; i.e., they guarantee that all previous orderable instructions are made
             visible prior to any subsequent orderable instruction being made visible.
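The four semantics reduce to a pairwise rule: an earlier operation must become visible before a later one if the earlier one is an acquire or fence, or the later one is a release or fence. The Python sketch below checks a proposed visibility order against that rule. It is a simplified illustration; the function names are hypothetical, and it deliberately ignores same-location memory dependencies and sequentiality.

```python
def must_stay_ordered(first, second):
    """True if `first` (earlier in program order) must become visible
    before `second`, judged only by their ordering semantics."""
    return first in ("acquire", "fence") or second in ("release", "fence")


def ordering_violations(program, visible_order):
    """Return the (earlier, later) name pairs whose required order is broken.

    program       -- list of (name, semantics) pairs in program order;
                     names are assumed to be unique
    visible_order -- list of the same names in the order they became visible
    """
    position = {name: i for i, name in enumerate(visible_order)}
    return [(a, b)
            for i, (a, sem_a) in enumerate(program)
            for b, sem_b in program[i + 1:]
            if must_stay_ordered(sem_a, sem_b) and position[a] > position[b]]
```

For instance, with `st [a]` (unordered) followed by `st.rel [b]` (release), a visibility order that places the release store first is reported as a violation, whereas a later unordered load may legally become visible ahead of both.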
Itanium memory accesses to sequential pages occur in program order with respect to all other
sequential pages in the same peripheral domain, but are not necessarily ordered with respect to
non-sequential page accesses. A peripheral domain is a platform-specific collection of uncacheable
addresses. An I/O device is normally contained in a peripheral domain and all sequential accesses
from one processor to that device will be ordered with respect to each other. Sequentiality ensures
that uncacheable, non-coalescing memory references from one processor to a peripheral domain
reach that domain in program order. Sequentiality does not imply visibility.
Inter-Processor Interrupt Messages (8-byte stores to a Processor Interrupt Block address, through a
UC memory attribute) are exceptions to the sequential semantics. IPIs are not ordered with respect
to other IPIs directed at the same processor. Further, fence operations do not enforce ordering
between two IPIs. See Section 5.8.4.2, “Interrupt and IPI Ordering” on page 2:124.
Table 4-15 defines the ordering between unordered, release, acquire and fence type operations to
sequential and non-sequential pages. Table 4-15 defines the minimal ordering requirements; an
implementation may enforce more restrictive ordering than required by the architecture. The actual
mechanism for enforcing memory access ordering is implementation dependent.
Table 4-15. Ordering Semantics

                                         Second Operation
                              Sequential (a)                  Non-sequential
First Operation      Fence    Acquire  Release  Unordered     Acquire  Release  Unordered
Fence                O        O        O        O             O        O        O
Sequential (a)
  Acquire            O        OS       OS       OS            O        O        O
  Release            O        S        OS       S             –        O        –
  Unordered          O        S        OS       S             –        O        –
Non-sequential
  Acquire            O        O        O        O             O        O        O
  Release            O        –        O        –             –        O        –
  Unordered          O        –        O        –             –        O        –

a. Except for IPI.
b. “O” indicates that the first and second operation become visible in program order.
c. A dash indicates no ordering is implied.
d. “S” indicates that the first and the second operation reach a peripheral domain in program order.
e. “OS” implies that both “O” and “S” ordering relations apply.
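The entries of Table 4-15 appear to follow two independent rules: “O” applies when the first operation is an acquire or fence, or the second is a release or fence; “S” applies when both operations are to sequential pages. The Python sketch below restates the table under that reading. It is an assumption-laden summary rather than an architectural definition: the function name is hypothetical, IPI stores are not modeled, and an ASCII hyphen stands in for the table's dash.

```python
def table_4_15_entry(first, second):
    """Return "O", "S", "OS" or "-" for a (first, second) operation pair.

    Each operation is a (semantics, sequential) pair, where semantics is
    "fence", "acquire", "release" or "unordered" and sequential is a bool.
    The sequential flag is ignored for fences. IPI stores are not modeled.
    """
    f_sem, f_seq = first
    s_sem, s_seq = second
    # "O": visibility order implied by the ordering semantics.
    o = f_sem in ("acquire", "fence") or s_sem in ("release", "fence")
    # "S": both accesses are to sequential pages (fences access no page).
    s = f_seq and s_seq and "fence" not in (f_sem, s_sem)
    return (("O" if o else "") + ("S" if s else "")) or "-"
```

For example, a sequential release followed by a sequential acquire yields "S": the two accesses reach the peripheral domain in program order, but no visibility order is implied.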
Table 4-15 establishes an order between operations on a particular processor. For operations to
cacheable write-back memory the order established by these rules is observed by all observers in
the coherence domain.
For example, when this sequence is executed on a processor:
st [a]
st.rel [b]
and a second processor executes this sequence:
ld.acq [b]
ld [a]
if the second processor observes the store to [b], it will also observe the store to [a].
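This guarantee can be checked by brute force in a toy model. The Python sketch below enumerates every visibility order each processor's semantics allow, interleaves the two processors, and collects the values the reader can observe. It assumes a single global visibility order and models only the acquire/release/fence rules; all names are hypothetical.

```python
from itertools import permutations


def ordered(first, second):
    """Pairwise rule implied by the acquire, release and fence semantics."""
    return first in ("acquire", "fence") or second in ("release", "fence")


def local_orders(ops):
    """Visibility orders of one processor's ops allowed by their semantics."""
    n = len(ops)
    for perm in permutations(range(n)):
        pos = {op: i for i, op in enumerate(perm)}
        if all(pos[i] < pos[j] for i in range(n) for j in range(i + 1, n)
               if ordered(ops[i][0], ops[j][0])):
            yield [ops[i] for i in perm]


def interleavings(xs, ys):
    """All merges of two sequences that keep each sequence's own order."""
    if not xs or not ys:
        yield list(xs) + list(ys)
        return
    for rest in interleavings(xs[1:], ys):
        yield [xs[0]] + rest
    for rest in interleavings(xs, ys[1:]):
        yield [ys[0]] + rest


def outcomes(writer, reader):
    """Set of (value read from b, value read from a) over all executions."""
    results = set()
    for w in local_orders(writer):
        for r in local_orders(reader):
            for trace in interleavings(w, r):
                mem, seen = {"a": 0, "b": 0}, {}
                for _sem, kind, loc in trace:
                    if kind == "st":
                        mem[loc] = 1          # every store writes 1
                    else:
                        seen[loc] = mem[loc]  # loads record what they saw
                results.add((seen["b"], seen["a"]))
    return results


writer = [("unordered", "st", "a"), ("release", "st", "b")]   # st [a]; st.rel [b]
reader = [("acquire", "ld", "b"), ("unordered", "ld", "a")]   # ld.acq [b]; ld [a]
print((1, 0) in outcomes(writer, reader))  # is "new [b], old [a]" reachable?
```

In this model the outcome (1, 0), where the reader sees the new [b] but the old [a], is unreachable for the sequence above, and becomes reachable once all four operations are made unordered.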
Unless an ordering constraint from Table 4-15 prevents a memory read¹ from becoming visible, the
read may be satisfied with values found in a store buffer (or any logically equivalent structure).
These values need not be globally visible even when the operation that created the value was a
st.rel. This local bypassing behavior may make accesses of different sizes but with overlapping
memory references appear to complete non-atomically. To ensure that a memory write is globally
observed prior to a memory read, software must place an explicit fence operation between the two
operations.
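The need for an explicit fence can be illustrated with a small operational model in which each processor owns a store buffer. The Python sketch below is a deliberate simplification: it assumes a FIFO store buffer per processor, which is stronger than the architecture requires, and the function name and instruction tuples are hypothetical. It explores every schedule of two short programs and collects the final register values.

```python
def run(programs):
    """Explore every schedule; return the set of final register states.

    Each program is a list of ("st", loc, value), ("ld", loc, reg) or ("mf",).
    A state is returned as a sorted tuple of (register, value) pairs.
    """
    results = set()

    def step(pcs, buffers, mem, regs):
        if all(pc == len(p) for pc, p in zip(pcs, programs)):
            results.add(tuple(sorted(regs.items())))
            return
        for i, prog in enumerate(programs):
            # Option 1: drain processor i's oldest buffered store to memory.
            if buffers[i]:
                (loc, val), rest = buffers[i][0], buffers[i][1:]
                step(pcs, buffers[:i] + (rest,) + buffers[i + 1:],
                     {**mem, loc: val}, regs)
            # Option 2: execute processor i's next instruction.
            if pcs[i] == len(prog):
                continue
            op = prog[pcs[i]]
            nxt = pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:]
            if op[0] == "st":
                _, loc, val = op
                grown = buffers[i] + ((loc, val),)
                step(nxt, buffers[:i] + (grown,) + buffers[i + 1:], mem, regs)
            elif op[0] == "ld":
                _, loc, reg = op
                # Local bypass: the newest matching buffered store wins.
                hits = [v for b_loc, v in buffers[i] if b_loc == loc]
                value = hits[-1] if hits else mem.get(loc, 0)
                step(nxt, buffers, mem, {**regs, reg: value})
            elif op[0] == "mf" and not buffers[i]:
                # The fence completes only once the local buffer is empty.
                step(nxt, buffers, mem, regs)

    n = len(programs)
    step((0,) * n, ((),) * n, {}, {})
    return results


no_fence = run([[("st", "x", 1), ("ld", "y", "r0")],
                [("st", "y", 1), ("ld", "x", "r1")]])
fenced = run([[("st", "x", 1), ("mf",), ("ld", "y", "r0")],
              [("st", "y", 1), ("mf",), ("ld", "x", "r1")]])
both_zero = (("r0", 0), ("r1", 0))
print(both_zero in no_fence, both_zero in fenced)
```

Without the fence, both loads can complete while both stores still sit in their buffers, so both registers can end up 0; with `mf` between the store and the load on each processor, that outcome is no longer reachable in this model.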
Aligned st.rel and semaphore operations² from multiple processors to cacheable write-back
memory become visible to all observers in a single total order (i.e., in a particular interleaving; if it
becomes visible to any observer, then it is visible to all observers), except that for st.rel each
processor may observe (via ld or ld.acq) its own update prior to it being observed globally.
The Itanium architecture ensures this single total order only for aligned st.rel and semaphore
operations to cacheable write-back memory. Other memory operations³ from multiple processors
are not required to become visible in any particular order, unless they are constrained with respect
to each other by the ordering rules defined in Table 4-15.
Ordering of loads is further constrained by data dependency. That is, if one load reads a value
written by an earlier load by the same processor (either directly or transitively, through either
registers or memory), then the two loads become visible in program order.
For example, when this sequence is executed on a processor:
st [a] = data
st.rel [b] = a
and a second processor executes this sequence:
ld x = [b]
ld y = [x]
if the second processor observes the store to [b], it will also observe the store to [a].
Also for example, when this sequence is executed on a processor:
st [a]
st.rel [b] = ‘new’
and a second processor executes this sequence:
ld x = [b]
cmp.eq p1 = x, ‘new’
(p1) ld y = [a]
if the second processor observes the store to [b], it will also observe the store to [a].
And for example, when this sequence is executed on a processor:
st [a]
st.rel [b] = ‘new’
and a second processor executes this sequence:
ld x = [b]
cmp.eq p1 = x, ‘new’
(p1) br target
target:
...
ld y = [a]
if the second processor observes the store to [b], it will also observe the store to [a].
1. This includes all types of loads (ld and ld.acq), and RSE memory reads. Note, however, that the read operation
of semaphores cannot be satisfied with values found in a store buffer.
2. Both acquire and release semaphore forms.
3. E.g., unordered stores, loads, ld.acq, or memory operations to pages with attributes other than write-back
cacheable.
The flush cache (fc, fc.i) instruction follows data dependency ordering. fc and fc.i are ordered
only with respect to previous and subsequent load, store, or semaphore instructions to the same
line, regardless of the specified memory attribute. Subsequent memory operations to the same line
need not wait for prior fc or fc.i completion before being globally visible. fc and fc.i are not
ordered with respect to memory operations to different lines. mf does not ensure visibility of fc and
fc.i operations. Instead, the sync.i instruction synchronizes fc and fc.i instructions, and the
sync.i is made visible using an mf instruction.
4.4.8 Not a Thing Attribute (NaTPage)
A NaTPage attribute prevents non-speculative references to a page, and ensures that speculative
references to the page always defer the Data NaT Page Consumption fault. However, as described
in “Speculation Attributes” on page 2:73, the processor may issue memory references to a
NaTPage. As a result, all NaTPages must be backed by a valid physical page.
Speculative or speculative advanced loads to pages marked as a NaTPage cause the deferred
exception indicator (NaT or NaTVal) to be written to the load target register, and the memory
reference is aborted. However, all other effects of the load instruction such as post-increment are
performed. Instruction fetches, loads, stores and semaphores (including IA-32 references) to pages
marked as NaTPage raise a NaT Page Consumption fault; Itanium speculative loads are the
exception.
A speculative reference to a page marked as NaTPage may still take lower priority faults, if not
explicitly deferred in the DCR. See “Deferral of Speculative Load Faults” on page 2:98.
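The NaTPage rules for integer loads can be summarized in a few lines of Python. This is a simplified sketch with hypothetical names: it returns a label rather than modeling registers, treats only ld.s and ld.sa as speculative, and ignores NaTVal for floating-point loads as well as any lower-priority faults.

```python
SPECULATIVE_LOADS = {"ld.s", "ld.sa"}


def reference_natpage(kind, base, post_increment=0):
    """Model one reference to a NaTPage.

    Returns (result, new_base). `result` is "NaT" for a deferred speculative
    load, or "NaT Page Consumption fault" for every other kind of reference.
    """
    if kind in SPECULATIVE_LOADS:
        # The memory reference is aborted, but post-increment still happens.
        return "NaT", base + post_increment
    # Non-speculative references fault, so the base register is unchanged.
    return "NaT Page Consumption fault", base
```

For example, `reference_natpage("ld.s", 0x1000, 8)` yields a NaT result with the base advanced by 8, while a store to the same page faults and leaves the base untouched.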
4.4.9 Effects of Memory Attributes on Memory Reference Instructions
Memory attributes affect the following Itanium instructions.
• ldfe, stfe: Hardware support for 10-byte memory accesses to a page that is neither a
cacheable page with write-back write policy nor a NaTPage is optional. On processor
implementations that do not support such accesses, an Unsupported Data Reference fault is
raised when an unsupported reference is attempted.
For extended floating-point loads the fault is delivered only on the normal, advanced, and
check load flavors (ldfe, ldfe.a, ldfe.c.nc, ldfe.c.clr). Control speculative flavors of
the ldfe instruction that target pages that are not cacheable with write-back policy always
defer the fault. Refer to “Deferral of Speculative Load Faults” on page 2:98 for details.
• cmpxchg and xchg: These instructions are only supported to cacheable pages with write-back
write policy. cmpxchg and xchg accesses to NaTPages cause a Data NaT Page Consumption
fault. cmpxchg and xchg accesses to pages with other memory attributes cause an
Unsupported Data Reference fault.
• fetchadd: The fetchadd instruction can be executed successfully only if the access is to a
cacheable page with write-back write policy or to a UCE page. fetchadd accesses to
NaTPages cause a Data NaT Page Consumption fault. Accesses to pages with other memory
attributes cause an Unsupported Data Reference fault. When accessing a cacheable page with
write-back write policy, atomic fetch and add operation is ensured by the processor
cache-coherence protocol. For highly contended semaphores, the cache line transactions
required to guarantee atomicity can limit performance. In such cases, a centralized “fetch and
add” semaphore mechanism may improve performance. If supported by the processor and the
platform, the UCE attribute allows the processor to “export” the fetchadd operation to the
platform as an atomic “fetch and add.” Effects of the exported fetchadd are platform
dependent. If exporting of fetchadd instructions is not supported by the processor, a
fetchadd instruction to a UCE page takes an Unsupported Data Reference fault.
• Flush Cache Instructions – fc instructions must always be “broadcast” to other processors,
independent of the memory attribute in the local processor. It is legal to use an uncacheable
memory attribute for any valid address when used as a flush cache (fc) instruction target. This
behavior is required to enable transitions from one memory attribute to another and in case
different memory attributes are associated with the address in another processor.
• Prefetch instructions – lfetch and any implicit prefetches to pages that are not cacheable are
suppressed. No transaction is initiated. This allows programs to issue prefetch instructions
even if the program is not sure the memory is cacheable.
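The semaphore rules in the list above amount to a lookup on the memory attribute. The Python sketch below captures them under stated assumptions: the function name and the `uce_export_supported` flag are hypothetical, and the attribute labels ("WB" for cacheable write-back, "UCE", "NaTPage") are shorthand for this example.

```python
def semaphore_outcome(instr, attr, uce_export_supported=False):
    """Return "ok" or the name of the fault raised by a semaphore access.

    instr -- "cmpxchg", "xchg" or "fetchadd"
    attr  -- "WB" (cacheable, write-back), "UCE", "NaTPage", or any other
             label standing for a different memory attribute
    """
    if attr == "NaTPage":
        return "Data NaT Page Consumption fault"
    if attr == "WB":
        return "ok"
    if instr == "fetchadd" and attr == "UCE" and uce_export_supported:
        # Exported to the platform; its effects are platform dependent.
        return "ok"
    return "Unsupported Data Reference fault"
```

Only fetchadd benefits from the UCE attribute; cmpxchg and xchg to a UCE page fall through to the Unsupported Data Reference fault like any other non-write-back attribute.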
4.4.10 Effects of Memory Attributes on Advanced/Check Loads
The ALAT behavior of advanced and check loads is dependent on the memory attribute of the page
referenced by the load. These behaviors are required; advanced and check load completers are not
hints.
All speculative pages have identical behavior with respect to the ALAT. Advanced loads to
speculative pages always allocate an ALAT entry for the register, size, and address tuple specified
by the advanced load. Speculative advanced loads allocate an ALAT entry if the speculative load is
successful (i.e., no deferred exception); if the speculative advanced load results in a deferred
exception, any matching ALAT entry is removed and no new ALAT entry is allocated. Check loads
with clear completers (ld.c.clr, ld.c.clr.acq, ldf.c.clr) remove a matching ALAT entry on
ALAT hit and do not change the state of the ALAT on ALAT miss. Check loads with no-clear
completers (ld.c.nc, ldf.c.nc) allocate an ALAT entry on ALAT miss. On ALAT hit, the ALAT