THIS DOCUMENT IS PROVIDED “AS IS” WITH NO WARRANTIES WHATSOEVER, INCLUDING ANY WARRANTY OF
MERCHANTABILITY, FITNESS FOR ANY PARTICULAR PURPOSE, OR ANY WARRANTY OTHERWISE ARISING OUT OF ANY PROPOSAL,
SPECIFICATION OR SAMPLE.
Information in this document is provided in connection with Intel® products. No license, express or implied, by estoppel or
otherwise, to any intellectual property rights is granted by this document. Except as provided in Intel's Terms and Conditions of
Sale for such products, Intel assumes no liability whatsoever, and Intel disclaims any express or implied warranty, relating to sale
and/or use of Intel products including liability or warranties relating to fitness for a particular purpose, merchantability, or
infringement of any patent, copyright or other intellectual property right. Intel products are not intended for use in medical, life
saving, or life sustaining applications.
Intel may make changes to specifications and product descriptions at any time, without notice.
Designers must not rely on the absence or characteristics of any features or instructions marked “reserved” or “undefined.” Intel
reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future
changes to them.
Intel® processors based on the Itanium architecture may contain design defects or errors known as errata which may cause the
product to deviate from published specifications. Current characterized errata are available on request.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained
by calling 1-800-548-4725, or by visiting Intel's website at http://www.intel.com.
Intel, Itanium, Pentium, VTune and MMX are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the
United States and other countries.
The Intel® Itanium® architecture is a unique combination of innovative features such
as explicit parallelism, predication, speculation and more. The architecture is designed
to be highly scalable to fill the ever increasing performance requirements of various
server and workstation market segments. The Itanium architecture features a
revolutionary 64-bit instruction set architecture (ISA) which applies a new processor
architecture technology called EPIC, or Explicitly Parallel Instruction Computing. A key
feature of the Itanium architecture is IA-32 instruction set compatibility.
The Intel® Itanium® Architecture Software Developer’s Manual provides a
comprehensive description of the programming environment, resources, and instruction
set visible to both the application and system programmer. In addition, it also describes
how programmers can take advantage of the features of the Itanium architecture to
help them optimize code.
1.1 Overview of Volume 1: Application Architecture
This volume defines the Itanium application architecture, including application level
resources, programming environment, and the IA-32 application interface. This volume
also describes optimization techniques used to generate high performance software.
1.1.1 Part 1: Application Architecture Guide
Chapter 1, “About this Manual” provides an overview of all volumes in the Intel®
Itanium® Architecture Software Developer’s Manual.
Chapter 2, “Introduction to the Intel® Itanium® Architecture” provides an overview of
the architecture.
Chapter 3, “Execution Environment” describes the Itanium register set used by
applications and the memory organization models.
Chapter 4, “Application Programming Model” gives an overview of the behavior of
Itanium application instructions (grouped into related functions).
Chapter 5, “Floating-point Programming Model” describes the Itanium floating-point
architecture (including integer multiply).
Chapter 6, “IA-32 Application Execution Model in an Intel® Itanium® System
Environment” describes the operation of IA-32 instructions within the Itanium System
Environment from the perspective of an application programmer.
1.1.2 Part 2: Optimization Guide for the Intel® Itanium® Architecture
Chapter 1, “About the Optimization Guide” gives an overview of the optimization guide.
Volume 4: About this Manual4:1
Chapter 2, “Introduction to Programming for the Intel® Itanium® Architecture”
provides an overview of the application programming environment for the Itanium
architecture.
Chapter 3, “Memory Reference” discusses features and optimizations related to control
and data speculation.
Chapter 4, “Predication, Control Flow, and Instruction Stream” describes optimization
features related to predication, control flow, and branch hints.
Chapter 5, “Software Pipelining and Loop Support” provides a detailed discussion on
optimizing loops through use of software pipelining.
Chapter 6, “Floating-point Applications” discusses current performance limitations in
floating-point applications and features that address these limitations.
1.2 Overview of Volume 2: System Architecture
This volume defines the Itanium system architecture, including system level resources
and programming state, interrupt model, and processor firmware interface. This
volume also provides a useful system programmer's guide for writing high performance
system software.
1.2.1 Part 1: System Architecture Guide
Chapter 1, “About this Manual” provides an overview of all volumes in the Intel®
Itanium® Architecture Software Developer’s Manual.
Chapter 2, “Intel® Itanium® System Environment” introduces the environment
designed to support execution of Itanium architecture-based operating systems running
IA-32 or Itanium architecture-based applications.
Chapter 3, “System State and Programming Model” describes the Itanium architectural
state which is visible only to an operating system.
Chapter 4, “Addressing and Protection” defines the resources available to the operating
system for virtual to physical address translation, virtual aliasing, physical addressing,
and memory ordering.
Chapter 5, “Interruptions” describes all interruptions that can be generated by a
processor based on the Itanium architecture.
Chapter 6, “Register Stack Engine” describes the architectural mechanism which
automatically saves and restores the stacked subset (GR32 – GR127) of the general
register file.
Chapter 7, “Debugging and Performance Monitoring” is an overview of the performance
monitoring and debugging resources that are available in the Itanium architecture.
Chapter 8, “Interruption Vector Descriptions” lists all interruption vectors.
Chapter 9 [...] and intercepts that can occur during IA-32 instruction set execution
in the Itanium System Environment.
Chapter 10, “Itanium® Architecture-based Operating System Interaction Model with
IA-32 Applications” defines the operation of IA-32 instructions within the Itanium
System Environment from the perspective of an Itanium architecture-based operating
system.
Chapter 11, “Processor Abstraction Layer” describes the firmware layer which abstracts
processor implementation-dependent features.
1.2.2 Part 2: System Programmer’s Guide
Chapter 1, “About the System Programmer’s Guide” gives an introduction to the second
section of the system architecture guide.
Chapter 2, “MP Coherence and Synchronization” describes multiprocessing
synchronization primitives and the Itanium memory ordering model.
Chapter 3, “Interruptions and Serialization” describes how the processor serializes
execution around interruptions and what state is preserved and made available to
low-level system code when interruptions are taken.
Chapter 4, “Context Management” describes how operating systems need to preserve
Itanium register contents and state. This chapter also describes system architecture
mechanisms that allow an operating system to reduce the number of registers that
need to be spilled/filled on interruptions, system calls, and context switches.
Chapter 5, “Memory Management” introduces various memory management strategies.
Chapter 6, “Runtime Support for Control and Data Speculation” describes the operating
system support that is required for control and data speculation.
Chapter 7, “Instruction Emulation and Other Fault Handlers” describes a variety of
instruction emulation handlers that Itanium architecture-based operating systems are
expected to support.
Chapter 8, “Floating-point System Software” discusses how processors based on the
Itanium architecture handle floating-point numeric exceptions and how the software
stack provides complete IEEE-754 compliance.
Chapter 9, “IA-32 Application Support” describes the support an Itanium
architecture-based operating system needs to provide to host IA-32 applications.
Chapter 10, “External Interrupt Architecture” describes the external interrupt
architecture with a focus on how external asynchronous interrupt handling can be
controlled by software.
Chapter 11, “I/O Architecture” describes the I/O architecture with a focus on platform
issues and support for the existing IA-32 I/O port space.
Chapter 12, “Performance Monitoring Support” describes the performance monitor
architecture with a focus on what kind of support is needed from Itanium
architecture-based operating systems.
Chapter 13, “Firmware Overview” introduces the firmware model, and how various
firmware layers (PAL, SAL, UEFI, ACPI) work together to enable processor and system
initialization, and operating system boot.
1.2.3 Appendices
Appendix A, “Code Examples” provides OS boot flow sample code.
1.3 Overview of Volume 3: Intel® Itanium® Instruction Set Reference
This volume is a comprehensive reference to the Itanium instruction set, including
instruction format/encoding.
Chapter 1, “About this Manual” provides an overview of all volumes in the Intel®
Itanium® Architecture Software Developer’s Manual.
Chapter 2, “Instruction Reference” provides a detailed description of all Itanium
instructions, organized in alphabetical order by assembly language mnemonic.
Chapter 3, “Pseudo-Code Functions” provides a table of pseudo-code functions which
are used to define the behavior of the Itanium instructions.
Chapter 4, “Instruction Formats” describes the encoding and format of the
instructions.
Chapter 5, “Resource and Dependency Semantics” summarizes the dependency rules
that are applicable when generating code for processors based on the Itanium
architecture.
1.4 Overview of Volume 4: IA-32 Instruction Set Reference
This volume is a comprehensive reference to the IA-32 instruction set, including
instruction format/encoding.
Chapter 1, “About this Manual” provides an overview of all volumes in the Intel®
Itanium® Architecture Software Developer’s Manual.
Chapter 2, “Base IA-32 Instruction Reference” provides a detailed description of all
base IA-32 instructions, organized in alphabetical order by assembly language
mnemonic.
Chapter 3 [...] provides a detailed description of all IA-32 Intel® MMX™ technology
instructions designed to increase performance of multimedia intensive applications,
organized in alphabetical order by assembly language mnemonic.
Chapter 4, “IA-32 SSE Instruction Reference” provides a detailed description of all
IA-32 SSE instructions designed to increase performance of multimedia intensive
applications, and is organized in alphabetical order by assembly language mnemonic.
1.5 Terminology
The following definitions are for terms related to the Itanium architecture and will be
used throughout this document:
Instruction Set Architecture (ISA) – Defines application and system level
resources. These resources include instructions and registers.
Itanium Architecture – The new ISA with 64-bit instruction capabilities, new
performance-enhancing features, and support for the IA-32 instruction set.
IA-32 Architecture – The 32-bit and 16-bit Intel architecture as described in the
Intel® 64 and IA-32 Architectures Software Developer’s Manual.
Itanium System Environment – The operating system environment that supports
the execution of both IA-32 and Itanium architecture-based code.
IA-32 System Environment – The operating system privileged environment and
resources as defined by the Intel Architecture Software Developer’s Manual. Resources
include virtual paging, control registers, debugging, performance monitoring, machine
checks, and the set of privileged instructions.
Itanium® Architecture-based Firmware – The Processor Abstraction Layer (PAL)
and System Abstraction Layer (SAL).
Processor Abstraction Layer (PAL) – The firmware layer which abstracts processor
features that are implementation dependent.
System Abstraction Layer (SAL) – The firmware layer which abstracts system
features that are implementation dependent.
1.6 Related Documents
The following documents can be downloaded from Intel’s Developer Site at
http://developer.intel.com:
• Dual-Core Update to the Intel® Itanium® 2 Processor Reference Manual for
Software Development and Optimization – Document number 308065
provides model-specific information about the dual-core Itanium processors.
• Intel® Itanium® 2 Processor Reference Manual for Software Development
and Optimization – This document (Document number 251110) describes
model-specific architectural features incorporated into the Intel® Itanium® 2
processor, the second processor based on the Itanium architecture.
• Intel® Itanium® Processor Reference Manual for Software Development –
This document (Document number 245320) describes model-specific architectural
features incorporated into the Intel® Itanium® processor, the first processor based
on the Itanium architecture.
• Intel® 64 and IA-32 Architectures Software Developer’s Manual – This set
of manuals describes the Intel 32-bit architecture. They are available from the Intel
Literature Department by calling 1-800-548-4725 and requesting Document
Numbers 243190, 243191, and 243192.
• Intel® Itanium® Software Conventions and Runtime Architecture Guide –
This document (Document number 245358) defines general information necessary
to compile, link, and execute a program on an Itanium architecture-based
operating system.
• Intel® Itanium® Processor Family System Abstraction Layer Specification –
This document (Document number 245359) specifies requirements to develop
platform firmware for Itanium architecture-based systems.
The following document can be downloaded from the Unified EFI Forum website at
http://www.uefi.org:
• Unified Extensible Firmware Interface Specification – This document defines
a new model for the interface between operating systems and platform firmware.
1.7 Revision History
March 2010 – Revision 2.3
• Added information about illegal virtualization optimization combinations and
IIPA requirements.
• Added Resource Utilization Counter and PAL_VP_INFO.
• PAL_VP_INIT and VPD.vpr changes.
• New PAL_VPS_RESUME_HANDLER parameter to indicate RSE Current
Frame Load Enable setting at the target instruction.
• PAL_VP_INIT_ENV implementation-specific configuration option.
• Minimum Virtual address increased to 54 bits.
• New PAL_MC_ERROR_INFO health indicator.
• New PAL_MC_ERROR_INJECT implementation-specific bit fields.
• MOV-to_SR.L reserved field checking.
• Added virtual machine disable.
• Added variable frequency mode additions to ACPI P-state description.
• Removed pal_proc_vector argument from PAL_VP_SAVE and
PAL_VP_RESTORE.
• Added PAL_PROC_SET_FEATURES data speculation disable.
• Added Interruption Instruction Bundle registers.
• Min-state save area size change.
• PAL_MC_DYNAMIC_STATE changes.
• PAL_PROC_SET_FEATURES data poisoning promotion changes.
• ACPI P-state clarifications.
• Synchronization requirements for virtualization opcode optimization.
• New priority hint and multi-threading hint recommendations.
August 2005 – Revision 2.2
• Allow register fields in CR.LID register to be read-only and CR.LID checking
on interruption messages by processors optional. See Vol 2, Part I, Ch 5
“Interruptions” and Section 11.2.2 PALE_RESET Exit State for details.
• Relaxed reserved and ignored fields checkings in IA-32 application registers
in Vol 1 Ch 6 and Vol 2, Part I, Ch 10.
• Introduced visibility constraints between stores and local purges to ensure
TLB consistency for UP VHPT update and local purge scenarios. See Vol 2,
Part I, Ch 4 and description of ptc.l instruction in Vol 3 for details.
• Architecture extensions for processor Power/Performance states (P-states).
See Vol 2 PAL Chapter for details.
• Introduced Unimplemented Instruction Address fault.
• Relaxed ordering constraints for VHPT walks. See Vol 2, Part I, Ch 4 and 5 for
details.
• Architecture extensions for processor virtualization.
• All instructions which must be last in an instruction group result in undefined
behavior when this rule is violated.
• Added architectural sequence that guarantees increasing ITC and PMD
values on successive reads.
• Addition of PAL_BRAND_INFO, PAL_GET_HW_POLICY,
PAL_MC_ERROR_INJECT, PAL_MEMORY_BUFFER,
PAL_SET_HW_POLICY and PAL_SHUTDOWN procedures.
• Allows IPI-redirection feature to be optional.
• Undefined behavior for 1-byte accesses to the non-architected regions in the
IPI block.
• Modified insertion behavior for TR overlaps. See Vol 2, Part I, Ch 4 for details.
• “Bus parking” feature is now optional for PAL_BUS_GET_FEATURES.
• Introduced low-power synchronization primitive using hint instruction.
• FR32-127 is now preserved in PAL calling convention.
• New return value from PAL_VM_SUMMARY procedure to indicate the
number of multiple concurrent outstanding TLB purges.
• Performance Monitor Data (PMD) registers are no longer sign-extended.
• New memory attribute transition sequence for memory on-line delete. See Vol
2, Part I, Ch 4 for details.
• Added 'shared error' (se) bit to the Processor State Parameter (PSP) in
PAL_MC_ERROR_INFO procedure.
• Clarified PMU interrupts as edge-triggered.
• Modified ‘proc_number’ parameter in PAL_LOGICAL_TO_PHYSICAL
procedure.
• Modified pal_copy_info alignment requirements.
• New bit in PAL_PROC_GET_FEATURES for variable P-state performance.
• Clarified descriptions for check_target_register and
check_target_register_sof.
• Various fixes in dependency tables in Vol 3 Ch 5.
• Clarified effect of sending IPIs to non-existent processor in Vol 2, Part I, Ch 5.
• Clarified instruction serialization requirements for interruptions in Vol 2, Part II,
Ch 3.
• Updated performance monitor context switch routine in Vol 2, Part I, Ch 7.
August 2002 – Revision 2.1
• Added Predicate Behavior of alloc Instruction Clarification (Section 4.1.2,
Part I, Volume 1; Section 2.2, Part I, Volume 3).
• Added New fc.i Instruction (Section 4.4.6.1, and 4.4.6.2, Part I, Volume 1;
Section 4.3.3, 4.4.1, 4.4.5, 4.4.6, 4.4.7, 5.5.2, and 7.1.2, Part I, Volume 2;
Section 2.5, 2.5.1, 2.5.2, 2.5.3, and 4.5.2.1, Part II, Volume 2; Section 2.2, 3,
4.1, 4.4.6.5, and 4.4.10.10, Part I, Volume 3).
• Added Interval Time Counter (ITC) Fault Clarification (Section 3.3.2, Part I,
Volume 2).
• Added Interruption Control Registers Clarification (Section 3.3.5, Part I,
Volume 2).
• Added Spontaneous NaT Generation on Speculative Load (ld.s)
(Section 5.5.5 and 11.9, Part I, Volume 2; Section 2.2 and 3, Part I, Volume 3).
• Added Performance Counter Standardization (Sections 7.2.3 and 11.6, Part I,
Volume 2).
• Added Freeze Bit Functionality in Context Switching and Interrupt Generation
Clarification (Sections 7.2.1, 7.2.2, 7.2.4.1, and 7.2.4.2, Part I, Volume 2).
• Added IA_32_Exception (Debug) IIPA Description Change (Section 9.2, Part
I, Volume 2).
• Added capability for Allowing Multiple PAL_A_SPEC and PAL_B Entries in the
Firmware Interface Table (Section 11.1.6, Part I, Volume 2).
• Added BR1 to Min-state Save Area (Sections 11.3.2.3 and 11.3.3, Part I, [...]
[...] references (Section 4.4.6).
• PAL memory accesses and restrictions clarification (Section 11.9).
• PSP validity on INITs from PAL_MC_ERROR_INFO clarification (Section [...]
Volume 3:
• IA-32 CPUID clarification (p. 5-71).
• Revised figures for extract, deposit, and alloc instructions (Section 2.2).
• RCPPS, RCPSS, RSQRTPS, and RSQRTSS clarification (Section 7.12).
• IA-32 related changes (Section 5.3).
• tak, tpa change (Section 2.2).
July 2000 – Revision 1.1
Volume 1:
• Processor Serial Number feature removed (Chapter 3).
• Clarification on exceptions to instruction dependency (Section 3.4.3).
Volume 2:
• Clarifications regarding “reserved” fields in ITIR (Chapter 3).
• Instruction and Data translation must be enabled for executing IA-32
instructions (Chapters 3, 4 and 10).
• FCR/FDR mappings, and clarification to the value of PSR.ri after an RFI
(Chapters 3 and 4).
• Clarification regarding ordering data dependency.
• Out-of-order IPI delivery is now allowed (Chapters 4 and 5).
• Content of EFLAG field changed in IIM (p. 9-24).
• PAL_CHECK and PAL_INIT calls – exit state changes (Chapter 11).
• PAL_CHECK processor state parameter changes (Chapter 11).
• PAL_BUS_GET/SET_FEATURES calls – added two new bits (Chapter 11).
• PAL_MC_ERROR_INFO call – Changes made to enhance and simplify the
call to provide more information regarding machine check (Chapter 11).
• PAL_ENTER_IA_32_Env call changes – entry parameter represents the entry
order; SAL needs to initialize all the IA-32 registers properly before making
this call (Chapter 11).
• PAL_CACHE_FLUSH – added a new cache_type argument (Chapter 11).
• PAL_SHUTDOWN – removed from list of PAL calls (Chapter 11).
• Clarified memory ordering changes (Chapter 13).
• Clarification in dependence violation table (Appendix A).
Volume 3:
• fmix instruction page figures corrected (Chapter 2).
• Clarification of “reserved” fields in ITIR (Chapters 2 and 3).
• Modified conditions for alloc/loadrs/flushrs instruction placement in bundle/
instruction group (Chapters 2 and 4).
• IA-32 JMPE instruction page typo fix (p. 5-238).
• Processor Serial Number feature removed (Chapter 5).
January 2000 – Revision 1.0
• Initial release of document.
2 Base IA-32 Instruction Reference
This section lists all IA-32 instructions and their behavior in the Itanium System
Environment and IA-32 System Environments on a processor based on the Itanium
architecture. Unless noted otherwise all IA-32 and MMX technology and SSE
instructions operate as defined in the Intel® 64 and IA-32 Architectures Software
Developer’s Manual.
This volume describes the complete IA-32 Architecture instruction set, including the
integer, floating-point, MMX technology and SSE technology, and system instructions.
The instruction descriptions are arranged in alphabetical order. For each instruction, the
forms are given for each operand combination, including the opcode, operands
required, and a description. Also given for each instruction are a description of the
instruction and its operands, an operational description, a description of the effect of
the instructions on flags in the EFLAGS register, and a summary of the exceptions that
can be generated.
For all IA-32 instructions the following relationships hold:
• Writes – Writes of any IA-32 general purpose, floating-point or SSE, MMX
technology registers by IA-32 instructions are reflected in the Itanium registers
defined to hold that IA-32 state when the IA-32 instruction set completes execution.
• Reads – Reads of any IA-32 general purpose, floating-point or SSE, MMX
technology registers by IA-32 instructions see the state of the Itanium registers
defined to hold the IA-32 state after entering the IA-32 instruction set.
• State mappings – IA-32 numeric instructions are controlled by and reflect their
status in FCW, FSW, FTW, FCS, FIP, FOP, FDS and FEA. On exit from the IA-32
instruction set, Itanium numeric status and control resources defined to hold IA-32
state reflect the results of all prior IA-32 numeric instructions in FCR, FSR, FIR and
FDR. Itanium numeric status and control resources defined to hold IA-32 state are
honored by IA-32 numeric instructions when entering the IA-32 instruction set.
2.1 Additional Intel® Itanium® Faults
The following fault behavior is defined for all IA-32 instructions in the Itanium System
Environment:
• IA-32 Faults – All IA-32 faults are performed as defined in the Intel® 64 and
IA-32 Architectures Software Developer’s Manual, unless otherwise noted.
IA-32 faults are delivered on the IA_32_Exception interruption vector.
• IA-32 GPFault – Null segments are signified by the segment descriptor register’s
P-bit being set to zero. IA-32 memory references through DSD, ESD, FSD, and GSD
with the P-bit set to zero result in an IA-32 GPFault.
• Itanium Low FP Reg Fault – If PSR.dfl is 1, execution of any IA-32 MMX
technology, SSE or floating-point instructions results in a Disabled FP Register fault
(regardless of whether FR2-31 is referenced).
• Itanium High FP Reg Fault – If PSR.dfh is 1, execution of the first target IA-32
instruction following a br.ia or rfi results in a Disabled FP Register fault
(regardless of whether FR32-127 is referenced).
• Itanium Instruction Mem Faults – The following additional Itanium memory
faults can be generated on each virtual page referenced when fetching IA-32 or
MMX technology or SSE instructions for execution:
• Alternative instruction TLB fault
• VHPT instruction fault
• Instruction TLB fault
• Instruction Page Not Present fault
• Instruction NaT Page Consumption Abort
• Instruction Key Miss fault
• Instruction Key Permission fault
• Instruction Access Rights fault
• Instruction Access Bit fault
• Itanium Data Mem Faults – The following additional Itanium memory faults can
be generated on each virtual page touched when reading or writing memory
operands from the IA-32 instruction set including MMX technology and SSE
instructions:
• Nested TLB fault
• Alternative data TLB fault
• VHPT data fault
• Data TLB fault
• Data Page Not Present fault
• Data NaT Page Consumption Abort
• Data Key Miss fault
• Data Key Permission fault
• Data Access Rights fault
• Data Dirty bit fault
• Data Access bit fault
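The PSR.dfl, PSR.dfh, and null-segment rules listed above can be read as simple predicates. The following is a minimal sketch in Python, not part of the architecture definition. The names Psr, disabled_fp_register_fault, and null_segment_gpfault are hypothetical, and the sketch says nothing about the relative priority of these faults, which this section does not specify.

```python
from dataclasses import dataclass


@dataclass
class Psr:
    """Hypothetical holder for the two PSR bits named in the list above."""
    dfl: int = 0  # PSR.dfl
    dfh: int = 0  # PSR.dfh


def disabled_fp_register_fault(psr, is_fp_mmx_or_sse, is_first_after_entry):
    """Return True if either Disabled FP Register rule above applies.

    is_fp_mmx_or_sse: the IA-32 instruction is an MMX technology, SSE,
        or floating-point instruction.
    is_first_after_entry: the instruction is the first target IA-32
        instruction following a br.ia or rfi.
    """
    # Low FP rule: applies regardless of whether FR2-31 is referenced.
    if psr.dfl == 1 and is_fp_mmx_or_sse:
        return True
    # High FP rule: applies regardless of whether FR32-127 is referenced.
    if psr.dfh == 1 and is_first_after_entry:
        return True
    return False


def null_segment_gpfault(p_bit):
    """A memory reference through a descriptor whose P-bit is zero
    (a null segment) results in an IA-32 GPFault."""
    return p_bit == 0
```

For example, with PSR.dfh set, an integer instruction that is the first one executed after br.ia still satisfies the second rule, whereas later integer instructions do not.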
2.2 Interpreting the IA-32 Instruction Reference Pages
This section describes the information contained in the various sections of the
instruction reference pages that make up the majority of this chapter. It also explains
the notational conventions and abbreviations used in these sections.
2.2.1 IA-32 Instruction Format
The following is an example of the format used for each Intel architecture instruction
description in this chapter.
CMC—Complement Carry Flag
Opcode    Instruction    Description
F5        CMC            Complement carry flag
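As a rough illustration of how such an entry can be read, the sketch below records the example row as data and models the described effect as a bit flip. It assumes the carry flag is bit 0 of EFLAGS, which is standard for IA-32 but not stated in this section. The names CMC_ENTRY, CF_MASK, and cmc are hypothetical.

```python
# The example row, recorded as plain data (field names mirror the columns).
CMC_ENTRY = {
    "opcode": bytes([0xF5]),
    "instruction": "CMC",
    "description": "Complement carry flag",
}

# Assumption: CF occupies bit 0 of the EFLAGS register.
CF_MASK = 1 << 0


def cmc(eflags):
    """Return EFLAGS with the carry flag complemented; other bits unchanged."""
    return eflags ^ CF_MASK
```

Applying cmc twice returns the original value, which is a quick sanity check on the model.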
2.2.1.1 Opcode Column
The “Opcode” column gives the complete object code produced for each form of the
instruction. When possible, the codes are given as hexadecimal bytes, in the same
order in which they appear in memory. Definitions of entries other than hexadecimal
bytes are as follows:
• /digit – A digit between 0 and 7 indicates that the ModR/M byte of the instruction
uses only the r/m (register or memory) operand. The reg field contains the digit
that provides an extension to the instruction's opcode.
• /r – Indicates that the ModR/M byte of the instruction contains both a register
operand and an r/m operand.
• cb, cw, cd, cp – A 1-byte (cb), 2-byte (cw), 4-byte (cd), or 6-byte (cp) value
following the opcode that is used to specify a code offset and possibly a new value
for the code segment register.
• ib, iw, id – A 1-byte (ib), 2-byte (iw), or 4-byte (id) immediate operand to the
instruction that follows the opcode, ModR/M bytes or scale-indexing bytes. The
opcode determines if the operand is a signed value. All words and doublewords are
given with the low-order byte first.
• +rb, +rw, +rd – A register code, from 0 through 7, added to the hexadecimal byte
given at the left of the plus sign to form a single opcode byte. The register codes
are given in Table 2-1.
• +i – A number used in floating-point instructions when one of the operands is ST(i)
from the FPU register stack. The number i (which can range from 0 to 7) is added to
the hexadecimal byte given at the left of the plus sign to form a single opcode byte.
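The /digit, /r, and immediate notations above can be sketched in a few lines of Python. This is an illustrative helper, not a decoder. The function names are hypothetical, and the bit positions of the ModR/M fields (mod in bits 7-6, reg in bits 5-3, r/m in bits 2-0) follow the standard IA-32 layout rather than anything stated in this section.

```python
def split_modrm(byte):
    """Split a ModR/M byte into its (mod, reg, rm) fields.

    For a /digit form, reg holds the opcode extension digit (0-7).
    For a /r form, reg selects the register operand.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("ModR/M must be a single byte")
    mod = (byte >> 6) & 0b11
    reg = (byte >> 3) & 0b111
    rm = byte & 0b111
    return mod, reg, rm


def read_immediate(code, offset, size, signed=False):
    """Read an ib/iw/id immediate (size 1, 2, or 4 bytes).

    Words and doublewords are stored with the low-order byte first.
    Whether the value is signed is determined by the opcode, so the
    caller passes that in.
    """
    raw = code[offset:offset + size]
    if len(raw) != size:
        raise ValueError("truncated immediate")
    return int.from_bytes(raw, "little", signed=signed)
```

For instance, split_modrm(0xC3) yields mod 3, reg 0, r/m 3, and read_immediate(bytes([0x34, 0x12]), 0, 2) yields 0x1234 because the low-order byte comes first.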
Table 2-1. Register Encodings Associated with the +rb, +rw, and +rd Nomenclature
rb        rw        rd
AL = 0    AX = 0    EAX = 0
CL = 1    CX = 1    ECX = 1
DL = 2    DX = 2    EDX = 2
BL = 3    BX = 3    EBX = 3
AH = 4    SP = 4    ESP = 4
CH = 5    BP = 5    EBP = 5
DH = 6    SI = 6    ESI = 6
BH = 7    DI = 7    EDI = 7
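The register codes in Table 2-1 lend themselves to a small lookup. The sketch below, with hypothetical names throughout, shows how a +rb, +rw, or +rd opcode byte could be formed from a base byte and a register name, and how the register could be recovered again. The base bytes used in the usage note are illustrative values supplied by the caller, not entries taken from this section.

```python
# Register names indexed by their Table 2-1 code (0 through 7).
REGISTER_CODES = {
    "rb": ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"],
    "rw": ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"],
    "rd": ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"],
}


def opcode_with_register(base, kind, register):
    """Add the register code to the base byte to form a single opcode byte."""
    code = REGISTER_CODES[kind].index(register)
    return base + code


def register_from_opcode(opcode, base, kind):
    """Recover the register name from an opcode byte and its base byte."""
    code = opcode - base
    if not 0 <= code <= 7:
        raise ValueError("opcode is outside the base+0..base+7 range")
    return REGISTER_CODES[kind][code]
```

With an illustrative base byte of 0x50 and the +rd form, opcode_with_register(0x50, "rd", "EBX") gives 0x53, and register_from_opcode(0x53, 0x50, "rd") maps it back to EBX.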
2.2.1.2 Instruction Column
The “Instruction” column gives the syntax of the instruction statement as it would
appear in an ASM386 program. The following is a list of the symbols used to represent
operands in the instruction statements:
• rel8 – A relative address in the range from 128 bytes before the end of the
instruction to 127 bytes after the end of the instruction.
• rel16 and rel32 – A relative address within the same code segment as the
instruction assembled. The rel16 symbol applies to instructions with an
operand-size attribute of 16 bits; the rel32 symbol applies to instructions with an
operand-size attribute of 32 bits.
• ptr16:16 and ptr16:32 – A far pointer, typically in a code segment different from
that of the instruction. The notation 16:16 indicates that the value of the pointer
has two parts. The value to the left of the colon is a 16-bit selector or value
destined for the code segment register. The value to the right corresponds to the
offset within the destination segment. The ptr16:16 symbol is used when the
instruction's operand-size attribute is 16 bits; the ptr16:32 symbol is used when
the operand-size attribute is 32 bits.
• r8 – One of the byte general-purpose registers AL, CL, DL, BL, AH, CH, DH, or BH.
• r16 – One of the word general-purpose registers AX, CX, DX, BX, SP, BP, SI, or DI.
• r32 – One of the doubleword general-purpose registers EAX, ECX, EDX, EBX, ESP,
EBP, ESI, or EDI.
• imm8 – An immediate byte value. The imm8 symbol is a signed number between
–128 and +127 inclusive. For instructions in which imm8 is combined with a word or
doubleword operand, the immediate value is sign-extended to form a word or
doubleword. The upper byte of the word is filled with the topmost bit of the
immediate value.
• imm16 – An immediate word value used for instructions whose operand-size
attribute is 16 bits. This is a number between –32,768 and +32,767 inclusive.
• imm32 – An immediate doubleword value used for instructions whose
operand-size attribute is 32 bits. It allows the use of a number between
–2,147,483,648 and +2,147,483,647 inclusive.
• r/m8 – A byte operand that is either the contents of a byte general-purpose
register (AL, BL, CL, DL, AH, BH, CH, and DH), or a byte from memory.
• r/m16 – A word general-purpose register or memory operand used for instructions
whose operand-size attribute is 16 bits. The word general-purpose registers are:
AX, BX, CX, DX, SP, BP, SI, and DI. The contents of memory are found at the
address provided by the effective address computation.
• r/m32 – A doubleword general-purpose register or memory operand used for
instructions whose operand-size attribute is 32 bits. The doubleword
general-purpose registers are: EAX, EBX, ECX, EDX, ESP, EBP, ESI, and EDI. The
contents of memory are found at the address provided by the effective address
computation.
• m – A 16- or 32-bit operand in memory.
• m8 – A byte operand in memory, usually expressed as a variable or array name,
but pointed to by the DS:(E)SI or ES:(E)DI registers. This nomenclature is used
only with the string instructions and the XLAT instruction.
• m16 – A word operand in memory, usually expressed as a variable or array name,
but pointed to by the DS:(E)SI or ES:(E)DI registers. This nomenclature is used
only with the string instructions.
• m32 – A doubleword operand in memory, usually expressed as a variable or array
name, but pointed to by the DS:(E)SI or ES:(E)DI registers. This nomenclature is
used only with the string instructions.
• m64 – A quadword operand in memory. This nomenclature is used only
with the CMPXCHG8B instruction.
• m16:16, m16:32 – A memory operand containing a far pointer composed of two
numbers. The number to the left of the colon corresponds to the pointer's segment
selector. The number to the right corresponds to its offset.
• m16&32, m16&16, m32&32 – A memory operand consisting of data item pairs
whose sizes are indicated on the left and the right side of the ampersand. All
memory addressing modes are allowed. The m16&16 and m32&32 operands are
used by the BOUND instruction to provide an operand containing an upper and
lower bounds for array indices. The m16&32 operand is used by LIDT and LGDT to
provide a word with which to load the limit field, and a doubleword with which to
load the base field of the corresponding GDTR and IDTR registers.
• moffs8, moffs16, moffs32 – A simple memory variable (memory offset) of type
byte, word, or doubleword used by some variants of the MOV instruction. The
actual address is given by a simple offset relative to the segment base. No ModR/M
byte is used in the instruction. The number shown with moffs indicates its size,
which is determined by the address-size attribute of the instruction.
• Sreg – A segment register. The segment register bit assignments are ES=0, CS=1,
SS=2, DS=3, FS=4, and GS=5.
• m32real, m64real, m80real – A single-, double-, and extended-real
(respectively) floating-point operand in memory.
• m16int, m32int, m64int – A word-, short-, and long-integer (respectively)
floating-point operand in memory.
• ST or ST(0) – The top element of the FPU register stack.
• ST(i) – The i-th element from the top of the FPU register stack (i = 0 through 7).
• mm – An MMX technology register. The 64-bit MMX technology registers are: MM0
through MM7.
• mm/m32 – The low order 32 bits of an MMX technology register or a 32-bit
memory operand. The 64-bit MMX technology registers are: MM0 through MM7.
The contents of memory are found at the address provided by the effective address
computation.
• mm/m64 – An MMX technology register or a 64-bit memory operand. The 64-bit
MMX technology registers are: MM0 through MM7. The contents of memory are
found at the address provided by the effective address computation.
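As a rough illustration of the sign-extension rule described for imm8 above, the following Python sketch widens an 8-bit immediate to a word or doubleword. The function name sign_extend_imm8 and its parameters are hypothetical and are not part of the instruction set definition; Python is used only for illustration.

```python
def sign_extend_imm8(imm8: int, operand_size: int) -> int:
    """Sign-extend an 8-bit immediate to operand_size bits (16 or 32).

    Hypothetical helper: the result is returned as an unsigned bit pattern.
    """
    if operand_size not in (16, 32):
        raise ValueError("operand_size must be 16 or 32")
    imm8 &= 0xFF                      # keep only the immediate byte
    if imm8 & 0x80:                   # topmost bit set: the value is negative
        imm8 -= 0x100                 # reinterpret the byte as a signed number
    return imm8 & ((1 << operand_size) - 1)


# -10 is encoded as F6H; the upper byte(s) are filled with the sign bit.
print(hex(sign_extend_imm8(0xF6, 16)))  # 0xfff6
print(hex(sign_extend_imm8(0x7F, 32)))  # 0x7f
```

Returning an unsigned bit pattern keeps the result directly comparable with the hexadecimal values used elsewhere in this chapter.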
2.2.1.3 Description Column
The “Description” column following the “Instruction” column briefly explains the various
forms of the instruction. The following “Description” and “Operation” sections contain
more details of the instruction's operation.
2.2.1.4 Description
The “Description” section describes the purpose of the instructions and the required
operands. It also discusses the effect of the instruction on flags.
2.2.2 Operation
The “Operation” section contains an algorithmic description (written in pseudo-code) of
the instruction. The pseudo-code uses a notation similar to the Algol or Pascal
language. The algorithms are composed of the following elements:
• Comments are enclosed within the symbol pairs “(*” and “*)”.
• Compound statements are enclosed in keywords, such as IF, THEN, ELSE, and FI for
an if statement, DO and OD for a do statement, or CASE... OF and ESAC for a case
statement.
• A register name implies the contents of the register. A register name enclosed in
brackets implies the contents of the location whose address is contained in that
register. For example, ES:[DI] indicates the contents of the location whose ES
segment relative address is in register DI. [SI] indicates the contents of the
address contained in register SI relative to SI’s default segment (DS) or overridden
segment.
• Parentheses around the “E” in a general-purpose register name, such as (E)SI,
indicate that an offset is read from the SI register if the current address-size
attribute is 16, or from the ESI register if the address-size attribute is 32.
• Brackets are also used for memory operands, where they mean that the contents of
the memory location is a segment-relative offset. For example, [SRC] indicates that
the contents of the source operand is a segment-relative offset.
• A ← B; indicates that the value of B is assigned to A.
• The symbols =, ≠, ≥, and ≤ are relational operators used to compare two values,
meaning equal, not equal, greater or equal, less or equal, respectively. A relational
expression such as A = B is TRUE if the value of A is equal to B; otherwise it is
FALSE.
• The expressions “<< COUNT” and “>> COUNT” indicate that the destination
operand should be shifted left or right, respectively, by the number of bits indicated
by the count operand.
The following identifiers are used in the algorithmic descriptions:
• OperandSize and AddressSize – The OperandSize identifier represents the
operand-size attribute of the instruction, which is either 16 or 32 bits. The
AddressSize identifier represents the address-size attribute, which is either 16 or
32 bits. For example, the following pseudo-code indicates that the operand-size
attribute depends on the form of the CMPS instruction used.
IF instruction = CMPSW
    THEN OperandSize ← 16;
    ELSE
        IF instruction = CMPSD
            THEN OperandSize ← 32;
        FI;
FI;
See “Operand-Size and Address-Size Attributes” in Chapter 3 of the Intel
Architecture Software Developer’s Manual, Volume 1, for general guidelines on how
these attributes are determined.
• StackAddrSize – Represents the stack address-size attribute associated with the
instruction, which has a value of 16 or 32 bits (see “Address-Size Attribute for
Stack” in Chapter 4 of the Intel Architecture Software Developer’s Manual, Volume
1).
• SRC – Represents the source operand.
• DEST – Represents the destination operand.
The following functions are used in the algorithmic descriptions:
• ZeroExtend(value) – Returns a value zero-extended to the operand-size attribute
of the instruction. For example, if the operand-size attribute is 32, zero extending a
byte value of -10 converts the byte from F6H to a doubleword value of 000000F6H.
If the value passed to the ZeroExtend function and the operand-size attribute are
the same size, ZeroExtend returns the value unaltered.
• SignExtend(value) – Returns a value sign-extended to the operand-size attribute
of the instruction. For example, if the operand-size attribute is 32, sign extending a
byte containing the value -10 converts the byte from F6H to a doubleword value of
FFFFFFF6H. If the value passed to the SignExtend function and the operand-size
attribute are the same size, SignExtend returns the value unaltered.
• SaturateSignedWordToSignedByte – Converts a signed 16-bit value to a signed
8-bit value. If the signed 16-bit value is less than -128, it is represented by the
saturated value -128 (80H); if it is greater than 127, it is represented by the
saturated value 127 (7FH).
• SaturateSignedDwordToSignedWord – Converts a signed 32-bit value to a
signed 16-bit value. If the signed 32-bit value is less than -32768, it is represented
by the saturated value
-32768 (8000H); if it is greater than 32767, it is represented by the saturated
value 32767 (7FFFH).
• SaturateSignedWordToUnsignedByte – Converts a signed 16-bit value to an
unsigned 8-bit value. If the signed 16-bit value is less than zero, it is represented
by the saturated value zero (00H); if it is greater than 255, it is represented by the
saturated value 255 (FFH).
• SaturateToSignedByte – Represents the result of an operation as a signed 8-bit
value. If the result is less than -128, it is represented by the saturated value -128
(80H); if it is greater than 127, it is represented by the saturated value 127 (7FH).
• SaturateToSignedWord – Represents the result of an operation as a signed
16-bit value. If the result is less than -32768, it is represented by the saturated
value -32768 (8000H); if it is greater than 32767, it is represented by the
saturated value 32767 (7FFFH).
• SaturateToUnsignedByte – Represents the result of an operation as an unsigned
8-bit value. If the result is less than zero, it is represented by the saturated value
zero (00H); if it is greater than 255, it is represented by the saturated value 255
(FFH).
• SaturateToUnsignedWord – Represents the result of an operation as an unsigned
16-bit value. If the result is less than zero, it is represented by the saturated value
zero (00H); if it is greater than 65535, it is represented by the saturated value
65535 (FFFFH).
• LowOrderWord(DEST * SRC) – Multiplies a word operand by a word operand and
stores the least significant word of the doubleword result in the destination
operand.
• HighOrderWord(DEST * SRC) – Multiplies a word operand by a word operand
and stores the most significant word of the doubleword result in the destination
operand.
• Push(value) – Pushes a value onto the stack. The number of bytes pushed is
determined by the operand-size attribute of the instruction.
• Pop() – Removes the value from the top of the stack and returns it. The statement
EAX ← Pop(); assigns to EAX the 32-bit value from the top of the stack. Pop will
return either a word or a doubleword depending on the operand-size attribute.
• PopRegisterStack – Marks the FPU ST(0) register as empty and increments the
FPU register stack pointer (TOP) by 1.
• Switch-Tasks – Performs a task switch.
• Bit(BitBase, BitOffset) – Returns the value of a bit within a bit string, which is a
sequence of bits in memory or a register. Bits are numbered from low-order to
high-order within registers and within memory bytes. If the base operand is a
register, the offset can be in the range 0..31. This offset addresses a bit within the
indicated register. As an example, the function Bit[EAX, 21] is illustrated in Figure 2-2.
Figure 2-2. Bit Offset for BIT[EAX,21] (bits numbered 31 through 0; BitOffset = 21)
If BitBase is a memory address, BitOffset can range from -2 GBits to 2 GBits. The
addressed bit is numbered (BitOffset MOD 8) within the byte at address (BitBase +
(BitOffset DIV 8)), where DIV is signed division with rounding towards negative infinity,
and MOD returns a non-negative number. This operation is illustrated in Figure 2-3.
Figure 2-3. Memory Bit Indexing (examples shown: BitOffset = +13 and BitOffset = -11)
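Some of the functions listed above can be sketched in Python as follows. This is an illustrative model only: the Python function names, the use of a bytearray to stand in for memory, and the range check in bit_in_memory are assumptions made for the sketch, not part of the architecture definition.

```python
def saturate(value: int, low: int, high: int) -> int:
    """Clamp value into the inclusive range [low, high]."""
    return max(low, min(high, value))


def saturate_to_signed_byte(value: int) -> int:
    return saturate(value, -128, 127)


def saturate_to_unsigned_byte(value: int) -> int:
    return saturate(value, 0, 255)


def saturate_to_signed_word(value: int) -> int:
    return saturate(value, -32768, 32767)


def saturate_to_unsigned_word(value: int) -> int:
    return saturate(value, 0, 65535)


def low_order_word(dest: int, src: int) -> int:
    """Least significant word of the doubleword product."""
    return (dest * src) & 0xFFFF


def high_order_word(dest: int, src: int) -> int:
    """Most significant word of the doubleword product."""
    return ((dest * src) >> 16) & 0xFFFF


def bit_in_memory(memory: bytearray, bit_base: int, bit_offset: int) -> int:
    """Model of Bit(BitBase, BitOffset) for a memory base operand."""
    # Python's // rounds toward negative infinity and % with a positive
    # modulus is never negative, matching DIV and MOD as defined above.
    byte_address = bit_base + (bit_offset // 8)
    bit_number = bit_offset % 8
    if not 0 <= byte_address < len(memory):
        raise IndexError("bit lies outside the modeled memory")
    return (memory[byte_address] >> bit_number) & 1


memory = bytearray(8)
memory[5] = 0x20                       # set bit 5 of the byte at BitBase + 1
print(bit_in_memory(memory, 4, 13))    # 1
print(bit_in_memory(memory, 4, -11))   # 0 (bit 5 of the byte at BitBase - 2)
```

The two calls mirror the offsets shown in Figure 2-3: an offset of +13 selects bit 5 of the byte at BitBase + 1, and an offset of -11 selects bit 5 of the byte at BitBase - 2.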
2.2.3 Flags Affected
The “Flags Affected” section lists the flags in the EFLAGS register that are affected by
the instruction. When a flag is cleared, it is equal to 0; when it is set, it is equal to 1.
The arithmetic and logical instructions usually assign values to the status flags in a
uniform manner (see Appendix A, EFLAGS Cross-Reference, in the Intel Architecture Software Developer’s Manual, Volume 1). Non-conventional assignments are described
in the “Operation” section. The values of flags listed as undefined may be changed by
the instruction in an indeterminate manner. Flags that are not listed are unchanged by
the instruction.
2.2.4 FPU Flags Affected
The floating-point instructions have an “FPU Flags Affected” section that describes how
each instruction can affect the four condition code flags of the FPU status word.
2.2.5 Protected Mode Exceptions
The “Protected Mode Exceptions” section lists the exceptions that can occur when the
instruction is executed in protected mode and the reasons for the exceptions. Each
exception is given a mnemonic that consists of a pound sign (#) followed by two letters
and an optional error code in parentheses. For example, #GP(0) denotes a general
protection exception with an error code of 0. Table 2-2 associates each two-letter
mnemonic with the corresponding interrupt vector number and exception name. See
Chapter 5, Interrupt and Exception Handling, in the Intel Architecture Software Developer’s Manual, Volume 3, for a detailed description of the exceptions.
Application programmers should consult the documentation provided with their
operating systems to determine the actions taken when exceptions occur.
2.2.6 Real-address Mode Exceptions
The “Real-Address Mode Exceptions” section lists the exceptions that can occur when
the instruction is executed in real-address mode.
Table 2-2. Exception Mnemonics, Names, and Vector Numbers
Vector No.  Mnemonic  Name                                        Source
0           #DE       Divide Error                                DIV and IDIV instructions.
1           #DB       Debug                                       Any code or data reference.
3           #BP       Breakpoint                                  INT 3 instruction.
4           #OF       Overflow                                    INTO instruction.
5           #BR       BOUND Range Exceeded                        BOUND instruction.
6           #UD       Invalid Opcode (Undefined Opcode)           UD2 instruction or reserved opcode. (a)
7           #NM       Device Not Available (No Math Coprocessor)  Floating-point or WAIT/FWAIT instruction.
8           #DF       Double Fault                                Any instruction that can generate an exception, an NMI, or an INTR.
10          #TS       Invalid TSS                                 Task switch or TSS access.
11          #NP       Segment Not Present                         Loading segment registers or accessing system segments.
12          #SS       Stack Segment Fault                         Stack operations and SS register loads.
13          #GP       General Protection                          Any memory reference and other protection checks.
14          #PF       Page Fault                                  Any memory reference.
16          #MF       Floating-point Error (Math Fault)           Floating-point or WAIT/FWAIT instruction.
17          #AC       Alignment Check                             Any data reference in memory. (b)
18          #MC       Machine Check                               Model dependent. (c)
a. The UD2 instruction was introduced in the Pentium® Pro processor.
b. This exception was introduced in the Intel® 486 processor.
c. This exception was introduced in the Pentium processor and enhanced in the Pentium Pro processor.
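The mnemonic notation described in Section 2.2.5 (a pound sign, two letters, and an optional error code in parentheses) can be decoded with a short Python sketch. The vector numbers are copied from Table 2-2; the names EXCEPTION_VECTORS and parse_exception are hypothetical and exist only for this illustration.

```python
import re

# Mnemonic-to-vector mapping copied from Table 2-2.
EXCEPTION_VECTORS = {
    "#DE": 0, "#DB": 1, "#BP": 3, "#OF": 4, "#BR": 5, "#UD": 6,
    "#NM": 7, "#DF": 8, "#TS": 10, "#NP": 11, "#SS": 12, "#GP": 13,
    "#PF": 14, "#MF": 16, "#AC": 17, "#MC": 18,
}

# A pound sign, two letters, and an optional error code in parentheses.
_MNEMONIC = re.compile(r"^(#[A-Z]{2})(?:\(([^)]*)\))?$")


def parse_exception(text: str):
    """Return (mnemonic, vector number, error code or None)."""
    match = _MNEMONIC.match(text.strip())
    if match is None or match.group(1) not in EXCEPTION_VECTORS:
        raise ValueError(f"unrecognized exception mnemonic: {text!r}")
    mnemonic, error_code = match.group(1), match.group(2)
    return mnemonic, EXCEPTION_VECTORS[mnemonic], error_code


print(parse_exception("#GP(0)"))  # ('#GP', 13, '0')
print(parse_exception("#UD"))     # ('#UD', 6, None)
```

The error code is kept as text because the exception sections may give it symbolically rather than as a number.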
2.2.7 Virtual-8086 Mode Exceptions
The “Virtual-8086 Mode Exceptions” section lists the exceptions that can occur when
the instruction is executed in virtual-8086 mode.
2.2.8 Floating-point Exceptions
The “Floating-point Exceptions” section lists additional exceptions that can occur when
a floating-point instruction is executed in any mode. All of these exception conditions
result in a floating-point error exception (#MF, vector number 16) being generated.
Table 2-3 associates each one- or two-letter mnemonic with the corresponding
exception name. See “Floating-Point Exception Conditions” in Chapter 7 of the Intel Architecture Software Developer’s Manual, Volume 1, for a detailed description of these
exceptions.
Table 2-3. Floating-point Exception Mnemonics and Names
Vector No.  Mnemonic  Name                                        Source
16          #IS       Floating-point invalid operation:
                      - Stack overflow or underflow               - FPU stack overflow or underflow
16          #IA       Floating-point invalid operation:
                      - Invalid arithmetic operation              - Invalid FPU arithmetic operation
16          #P        Floating-point inexact result (precision)   Inexact result (precision)
2.3 IA-32 Base Instruction Reference
The remainder of this chapter provides detailed descriptions of each of the Intel
architecture instructions.
AAA—ASCII Adjust After Addition
Opcode  Instruction  Description
37      AAA          ASCII adjust AL after addition
Description
Adjusts the sum of two unpacked BCD values to create an unpacked BCD result. The AL
register is the implied source and destination operand for this instruction. The AAA
instruction is only useful when it follows an ADD instruction that adds (binary addition)
two unpacked BCD values and stores a byte result in the AL register. The AAA
instruction then adjusts the contents of the AL register to contain the correct 1-digit
unpacked BCD result.
If the addition produces a decimal carry, the AH register is incremented by 1, and the
CF and AF flags are set. If there was no decimal carry, the CF and AF flags are cleared
and the AH register is unchanged. In either case, bits 4 through 7 of the AL register are
cleared to 0.
Operation
IF ((AL AND 0FH) > 9) OR (AF = 1)
    THEN
        AL ← (AL + 6);
        AH ← AH + 1;
        AF ← 1;
        CF ← 1;
    ELSE
        AF ← 0;
        CF ← 0;
FI;
AL ← AL AND 0FH;
Flags Affected
The AF and CF flags are set to 1 if the adjustment results in a decimal carry; otherwise
they are cleared to 0. The OF, SF, ZF, and PF flags are undefined.
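The AAA operation can be modeled with a short Python sketch that follows the pseudo-code step by step. The function name aaa, the tuple return value, and the masking of AL and AH to 8 bits are assumptions of the sketch; the undefined flags (OF, SF, ZF, PF) are not modeled.

```python
def aaa(al: int, ah: int, af: int):
    """Model of AAA: returns (AL, AH, AF, CF) after the adjustment."""
    if (al & 0x0F) > 9 or af == 1:
        al = (al + 6) & 0xFF      # assumed: AL wraps as an 8-bit register
        ah = (ah + 1) & 0xFF      # assumed: AH wraps as an 8-bit register
        af = cf = 1
    else:
        af = cf = 0
    al &= 0x0F                    # bits 4 through 7 of AL are cleared
    return al, ah, af, cf


# 7 + 5: ADD leaves AL = 0CH with AF = 0; AAA yields AH:AL = 1:2 (decimal 12).
print(aaa(0x0C, 0, 0))  # (2, 1, 1, 1)
# 9 + 8: ADD leaves AL = 11H with AF = 1; AAA yields AH:AL = 1:7 (decimal 17).
print(aaa(0x11, 0, 1))  # (7, 1, 1, 1)
```

The second call shows why AF is part of the test: after 9 + 8 the low nibble of AL is 1, so only the auxiliary carry reveals that a decimal carry occurred.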
AAD—ASCII Adjust AX Before Division
Description
Adjusts two unpacked BCD digits (the least-significant digit in the AL register and the
most-significant digit in the AH register) so that a division operation performed on the
result will yield a correct unpacked BCD value. The AAD instruction is only useful when
it precedes a DIV instruction that divides (binary division) the adjusted value in the AL
register by an unpacked BCD value.
The AAD instruction sets the value in the AL register to (AL + (10 * AH)), and then
clears the AH register to 00H. The value in the AX register is then equal to the binary
equivalent of the original unpacked two-digit number in registers AH and AL.
Operation
tempAL ← AL;
tempAH ← AH;
AL ← (tempAL + (tempAH * imm8)) AND FFH;
AH ← 0;
The immediate value (imm8) is taken from the second byte of the instruction, which
under normal assembly is 0AH (10 decimal). However, this immediate value can be
changed to produce a different result.
Flags Affected
The SF, ZF, and PF flags are set according to the result; the OF, AF, and CF flags are
undefined.
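A minimal Python sketch of the AAD operation follows. The function name aad and the base parameter (standing in for imm8, with the default of 10 used under normal assembly) are hypothetical; flag results are not modeled.

```python
def aad(al: int, ah: int, base: int = 10):
    """Model of AAD: returns (AL, AH) after the adjustment."""
    al = (al + ah * base) & 0xFF   # combine the two digits, keep the low byte
    ah = 0                         # AH is cleared
    return al, ah


# Unpacked BCD 3 and 5 (AH = 03H, AL = 05H) become binary 35 in AL.
al, ah = aad(0x05, 0x03)
print(al, ah)   # 35 0
print(al // 7)  # 5, the quotient a following divide by 7 would produce
```

Passing a different base shows the effect of changing the immediate byte: with base 16, the same inputs give 53 (35H).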
AAM—ASCII Adjust AX After Multiply
Description
Adjusts the result of the multiplication of two unpacked BCD values to create a pair of
unpacked BCD values. The AX register is the implied source and destination operand for
this instruction. The AAM instruction is only useful when it follows a MUL instruction
that multiplies (binary multiplication) two unpacked BCD values and stores a word
result in the AX register. The AAM instruction then adjusts the contents of the AX
register to contain the correct 2-digit unpacked BCD result.
Operation
tempAL ← AL;
AH ← tempAL / imm8;
AL ← tempAL MOD imm8;
The immediate value (imm8) is taken from the second byte of the instruction, which
under normal assembly is 0AH (10 decimal). However, this immediate value can be
changed to produce a different result.
Flags Affected
The SF, ZF, and PF flags are set according to the result. The OF, AF, and CF flags are
undefined.
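A minimal Python sketch of the AAM operation follows. The function name aam and the base parameter (standing in for imm8, with the default of 10 used under normal assembly) are hypothetical; the sketch assumes a non-zero base and does not model the flags.

```python
def aam(al: int, base: int = 10):
    """Model of AAM: returns (AL, AH) after the adjustment."""
    temp_al = al & 0xFF
    ah = temp_al // base   # most-significant unpacked digit
    al = temp_al % base    # least-significant unpacked digit
    return al, ah


# 7 * 9 = 63 (3FH) in AL after MUL; AAM splits it into AH = 6, AL = 3.
print(aam(63))  # (3, 6)
```

The pair is returned in (AL, AH) order so that it lines up with the AAD sketch's return value, which makes the two operations easy to chain in experiments.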