Intel ITANIUM ARCHITECTURE User Manual

Intel® Itanium® Architecture Software Developer’s Manual
Volume 4: IA-32 Instruction Set Reference
May 2010
Document Number: 323208
THIS DOCUMENT IS PROVIDED “AS IS” WITH NO WARRANTIES WHATSOEVER, INCLUDING ANY WARRANTY OF MERCHANTABILITY, FITNESS FOR ANY PARTICULAR PURPOSE, OR ANY WARRANTY OTHERWISE ARISING OUT OF ANY PROPOSAL, SPECIFICATION OR SAMPLE.
Information in this document is provided in connection with Intel® products. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document. Except as provided in Intel's Terms and Conditions of Sale for such products, Intel assumes no liability whatsoever, and Intel disclaims any express or implied warranty, relating to sale and/or use of Intel products including liability or warranties relating to fitness for a particular purpose, merchantability, or infringement of any patent, copyright or other intellectual property right. Intel products are not intended for use in medical, life saving, or life sustaining applications.
Intel may make changes to specifications and product descriptions at any time, without notice.
Designers must not rely on the absence or characteristics of any features or instructions marked “reserved” or “undefined.” Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.
Intel® processors based on the Itanium architecture may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or by visiting Intel's website at http://www.intel.com.
Intel, Itanium, Pentium, VTune and MMX are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
Copyright © 1999-2010, Intel Corporation
*Other names and brands may be claimed as the property of others.
Intel® Itanium® Architecture Software Developer’s Manual, Rev. 2.3 398
Contents
1 About this Manual . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:1
1.1 Overview of Volume 1: Application Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:1
1.1.1 Part 1: Application Architecture Guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:1
1.1.2 Part 2: Optimization Guide for the Intel® Itanium® Architecture . . . . . . . . 4:1
1.2 Overview of Volume 2: System Architecture. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:2
1.2.1 Part 1: System Architecture Guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:2
1.2.2 Part 2: System Programmer’s Guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:3
1.2.3 Appendices. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:4
1.3 Overview of Volume 3: Intel® Itanium® Instruction Set Reference . . . . . . . . . . . . . . 4:4
1.4 Overview of Volume 4: IA-32 Instruction Set Reference. . . . . . . . . . . . . . . . . . . . . . . 4:4
1.5 Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:5
1.6 Related Documents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:5
1.7 Revision History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:6
2 Base IA-32 Instruction Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:11
2.1 Additional Intel® Itanium® Faults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:11
2.2 Interpreting the IA-32 Instruction Reference Pages . . . . . . . . . . . . . . . . . . . . . . . . . 4:12
2.2.1 IA-32 Instruction Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:12
2.2.2 Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:15
2.2.3 Flags Affected. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:18
2.2.4 FPU Flags Affected . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:18
2.2.5 Protected Mode Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:19
2.2.6 Real-address Mode Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:19
2.2.7 Virtual-8086 Mode Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:19
2.2.8 Floating-point Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:20
2.3 IA-32 Base Instruction Reference. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:20
3 IA-32 Intel® MMX™ Technology Instruction Reference . . . . . . . . . . . . . . . . . . . . . . . . . 4:399
4 IA-32 SSE Instruction Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:463
4.1 IA-32 SSE Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:463
4.2 About the Intel® SSE Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:463
4.3 Single Instruction Multiple Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:464
4.4 New Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:464
4.5 SSE Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:465
4.6 Extended Instruction Set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:465
4.6.1 Instruction Group Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:466
4.7 IEEE Compliance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:474
4.7.1 Real Number System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:474
4.7.2 Operating on NaNs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:480
4.8 Data Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:481
4.8.1 Memory Data Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:481
4.8.2 SSE Register Data Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:481
4.9 Instruction Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:483
4.10 Instruction Prefixes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:483
4.11 Reserved Behavior and Software Compatibility . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:484
4.12 Notations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:484
4.13 SIMD Integer Instruction Set Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:562
4.14 Cacheability Control Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:575
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:583
Figures
2-2 Bit Offset for BIT[EAX,21]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:18
2-3 Memory Bit Indexing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:18
2-4 Version Information in Registers EAX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:79
3-1 Operation of the MOVD Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:401
3-2 Operation of the MOVQ Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:403
3-3 Operation of the PACKSSDW Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:405
3-4 Operation of the PACKUSWB Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:408
3-5 Operation of the PADDW Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:410
3-6 Operation of the PADDSW Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:413
3-7 Operation of the PADDUSB Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:416
3-8 Operation of the PAND Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:419
3-9 Operation of the PANDN Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:421
3-10 Operation of the PCMPEQW Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:423
3-11 Operation of the PCMPGTW Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:426
3-12 Operation of the PMADDWD Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:429
3-13 Operation of the PMULHW Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:431
3-14 Operation of the PMULLW Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:433
3-15 Operation of the POR Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:435
3-16 Operation of the PSLLW Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:437
3-17 Operation of the PSRAW Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:440
3-18 Operation of the PSRLW Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:443
3-19 Operation of the PSUBW Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:446
3-20 Operation of the PSUBSW Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:449
3-21 Operation of the PSUBUSB Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:452
3-22 High-order Unpacking and Interleaving of Bytes with the PUNPCKHBW Instruction. . . . . . 4:455
3-23 Low-order Unpacking and Interleaving of Bytes with the PUNPCKLBW Instruction . . . . . . 4:458
3-24 Operation of the PXOR Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:461
4-1 Packed Single-FP Data Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:464
4-2 SSE Register Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:465
4-3 Packed Operation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:466
4-4 Scalar Operation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:466
4-5 Packed Shuffle Operation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:468
4-6 Unpack High Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:469
4-7 Unpack Low Operation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:469
4-8 Binary Real Number System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:475
4-9 Binary Floating-point Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:476
4-10 Real Numbers and NaNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:478
4-11 Four Packed FP Data in Memory (at address 1000H) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:481
Tables
2-1 Register Encodings Associated with the +rb, +rw, and +rd Nomenclature . . . . . . . . . . 4:13
2-2 Exception Mnemonics, Names, and Vector Numbers . . . . . . . . . . . . . . . . . . . . . 4:19
2-3 Floating-point Exception Mnemonics and Names . . . . . . . . . . . . . . . . . . . . . . . 4:20
2-4 Information Returned by CPUID Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . 4:78
2-5 Feature Flags Returned in EDX Register . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:80
2-6 FPATAN Zeros and NaNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:149
2-7 FPREM Zeros and NaNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:151
2-8 FPREM1 Zeros and NaNs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:154
2-9 FSUB Zeros and NaNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:183
2-10 FSUBR Zeros and NaNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:186
2-11 FYL2X Zeros and NaNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:199
2-12 FYL2XP1 Zeros and NaNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:201
2-13 IDIV Operands. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:204
2-14 INT Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:218
2-15 LAR Descriptor Validity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:253
2-16 LEA Address and Operand Sizes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:258
2-17 Repeat Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:338
4-1 Real Number Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:476
4-2 Denormalization Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:478
4-3 Results of Operations with NAN Operands . . . . . . . . . . . . . . . . . . . . . . . . . 4:481
4-4 Precision and Range of SSE Datatype . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:482
4-5 Real Number and NaN Encodings. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:482
4-6 SSE Instruction Behavior with Prefixes . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:483
4-7 SIMD Integer Instructions – Behavior with Prefixes . . . . . . . . . . . . . . . . . . . . . 4:483
4-8 Cacheability Control Instruction Behavior with Prefixes . . . . . . . . . . . . . . . . . . . 4:483
4-9 Key to SSE Naming Convention. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:485
§

1 About this Manual

The Intel® Itanium® architecture is a unique combination of innovative features such as explicit parallelism, predication, speculation and more. The architecture is designed to be highly scalable to fill the ever-increasing performance requirements of various server and workstation market segments. The Itanium architecture features a revolutionary 64-bit instruction set architecture (ISA) which applies a new processor architecture technology called EPIC, or Explicitly Parallel Instruction Computing. A key feature of the Itanium architecture is IA-32 instruction set compatibility.
The Intel® Itanium® Architecture Software Developer’s Manual provides a comprehensive description of the programming environment, resources, and instruction set visible to both the application and system programmer. In addition, it also describes how programmers can take advantage of the features of the Itanium architecture to help them optimize code.

1.1 Overview of Volume 1: Application Architecture

This volume defines the Itanium application architecture, including application level resources, programming environment, and the IA-32 application interface. This volume also describes optimization techniques used to generate high performance software.

1.1.1 Part 1: Application Architecture Guide

Chapter 1, “About this Manual” provides an overview of all volumes in the Intel® Itanium® Architecture Software Developer’s Manual.
Chapter 2, “Introduction to the Intel® Itanium® Architecture” provides an overview of the architecture.
Chapter 3, “Execution Environment” describes the Itanium register set used by applications and the memory organization models.
Chapter 4, “Application Programming Model” gives an overview of the behavior of Itanium application instructions (grouped into related functions).
Chapter 5, “Floating-point Programming Model” describes the Itanium floating-point architecture (including integer multiply).
Chapter 6, “IA-32 Application Execution Model in an Intel® Itanium® System Environment” describes the operation of IA-32 instructions within the Itanium System Environment from the perspective of an application programmer.

1.1.2 Part 2: Optimization Guide for the Intel® Itanium® Architecture

Chapter 1, “About the Optimization Guide” gives an overview of the optimization guide.
Volume 4: About this Manual 4:1
Chapter 2, “Introduction to Programming for the Intel® Itanium® Architecture”
provides an overview of the application programming environment for the Itanium architecture.
Chapter 3, “Memory Reference” discusses features and optimizations related to control
and data speculation.
Chapter 4, “Predication, Control Flow, and Instruction Stream” describes optimization
features related to predication, control flow, and branch hints.
Chapter 5, “Software Pipelining and Loop Support” provides a detailed discussion on
optimizing loops through use of software pipelining.
Chapter 6, “Floating-point Applications” discusses current performance limitations in
floating-point applications and features that address these limitations.

1.2 Overview of Volume 2: System Architecture

This volume defines the Itanium system architecture, including system level resources and programming state, interrupt model, and processor firmware interface. This volume also provides a useful system programmer's guide for writing high performance system software.

1.2.1 Part 1: System Architecture Guide

Chapter 1, “About this Manual” provides an overview of all volumes in the Intel® Itanium® Architecture Software Developer’s Manual.
Chapter 2, “Intel® Itanium® System Environment” introduces the environment designed to support execution of Itanium architecture-based operating systems running IA-32 or Itanium architecture-based applications.
Chapter 3, “System State and Programming Model” describes the Itanium architectural state which is visible only to an operating system.
Chapter 4, “Addressing and Protection” defines the resources available to the operating system for virtual to physical address translation, virtual aliasing, physical addressing, and memory ordering.
Chapter 5, “Interruptions” describes all interruptions that can be generated by a processor based on the Itanium architecture.
Chapter 6, “Register Stack Engine” describes the architectural mechanism which automatically saves and restores the stacked subset (GR32 – GR127) of the general register file.
Chapter 7, “Debugging and Performance Monitoring” is an overview of the performance monitoring and debugging resources that are available in the Itanium architecture.
Chapter 8, “Interruption Vector Descriptions” lists all interruption vectors.
Chapter 9, “IA-32 Interruption Vector Descriptions” lists IA-32 exceptions, interrupts
and intercepts that can occur during IA-32 instruction set execution in the Itanium System Environment.
Chapter 10, “Itanium® Architecture-based Operating System Interaction Model with IA-32 Applications” defines the operation of IA-32 instructions within the Itanium System Environment from the perspective of an Itanium architecture-based operating system.
Chapter 11, “Processor Abstraction Layer” describes the firmware layer which abstracts
processor implementation-dependent features.

1.2.2 Part 2: System Programmer’s Guide

Chapter 1, “About the System Programmer’s Guide” gives an introduction to the second
section of the system architecture guide.
Chapter 2, “MP Coherence and Synchronization” describes multiprocessing
synchronization primitives and the Itanium memory ordering model.
Chapter 3, “Interruptions and Serialization” describes how the processor serializes
execution around interruptions and what state is preserved and made available to low-level system code when interruptions are taken.
Chapter 4, “Context Management” describes how operating systems need to preserve
Itanium register contents and state. This chapter also describes system architecture mechanisms that allow an operating system to reduce the number of registers that need to be spilled/filled on interruptions, system calls, and context switches.
Chapter 5, “Memory Management” introduces various memory management strategies.
Chapter 6, “Runtime Support for Control and Data Speculation” describes the operating
system support that is required for control and data speculation.
Chapter 7, “Instruction Emulation and Other Fault Handlers” describes a variety of
instruction emulation handlers that Itanium architecture-based operating systems are expected to support.
Chapter 8, “Floating-point System Software” discusses how processors based on the
Itanium architecture handle floating-point numeric exceptions and how the software stack provides complete IEEE-754 compliance.
Chapter 9, “IA-32 Application Support” describes the support an Itanium
architecture-based operating system needs to provide to host IA-32 applications.
Chapter 10, “External Interrupt Architecture” describes the external interrupt
architecture with a focus on how external asynchronous interrupt handling can be controlled by software.
Chapter 11, “I/O Architecture” describes the I/O architecture with a focus on platform
issues and support for the existing IA-32 I/O port space.
Chapter 12, “Performance Monitoring Support” describes the performance monitor
architecture with a focus on what kind of support is needed from Itanium architecture-based operating systems.
Chapter 13, “Firmware Overview” introduces the firmware model, and how various
firmware layers (PAL, SAL, UEFI, ACPI) work together to enable processor and system initialization, and operating system boot.

1.2.3 Appendices

Appendix A, “Code Examples” provides OS boot flow sample code.

1.3 Overview of Volume 3: Intel® Itanium® Instruction Set Reference

This volume is a comprehensive reference to the Itanium instruction set, including instruction format/encoding.
Chapter 1, “About this Manual” provides an overview of all volumes in the Intel® Itanium® Architecture Software Developer’s Manual.
Chapter 2, “Instruction Reference” provides a detailed description of all Itanium instructions, organized in alphabetical order by assembly language mnemonic.
Chapter 3, “Pseudo-Code Functions” provides a table of pseudo-code functions which are used to define the behavior of the Itanium instructions.
Chapter 4, “Instruction Formats” describes the encoding and format of the instructions.
Chapter 5, “Resource and Dependency Semantics” summarizes the dependency rules that are applicable when generating code for processors based on the Itanium architecture.

1.4 Overview of Volume 4: IA-32 Instruction Set Reference

This volume is a comprehensive reference to the IA-32 instruction set, including instruction format/encoding.
Chapter 1, “About this Manual” provides an overview of all volumes in the Intel® Itanium® Architecture Software Developer’s Manual.
Chapter 2, “Base IA-32 Instruction Reference” provides a detailed description of all
base IA-32 instructions, organized in alphabetical order by assembly language mnemonic.
Chapter 3, “IA-32 Intel® MMX™ Technology Instruction Reference” provides a detailed description of all IA-32 Intel® MMX™ technology instructions designed to increase performance of multimedia intensive applications, and is organized in alphabetical order by assembly language mnemonic.
Chapter 4, “IA-32 SSE Instruction Reference” provides a detailed description of all IA-32 SSE instructions designed to increase performance of multimedia intensive applications, and is organized in alphabetical order by assembly language mnemonic.

1.5 Terminology

The following definitions are for terms related to the Itanium architecture and will be used throughout this document:
Instruction Set Architecture (ISA) – Defines application and system level resources. These resources include instructions and registers.
Itanium Architecture – The new ISA with 64-bit instruction capabilities, new performance-enhancing features, and support for the IA-32 instruction set.
IA-32 Architecture – The 32-bit and 16-bit Intel architecture as described in the Intel® 64 and IA-32 Architectures Software Developer’s Manual.
Itanium System Environment – The operating system environment that supports the execution of both IA-32 and Itanium architecture-based code.
IA-32 System Environment – The operating system privileged environment and resources as defined by the Intel Architecture Software Developer’s Manual. Resources include virtual paging, control registers, debugging, performance monitoring, machine checks, and the set of privileged instructions.
Itanium® Architecture-based Firmware – The Processor Abstraction Layer (PAL) and System Abstraction Layer (SAL).
Processor Abstraction Layer (PAL) – The firmware layer which abstracts processor features that are implementation dependent.
System Abstraction Layer (SAL) – The firmware layer which abstracts system features that are implementation dependent.

1.6 Related Documents

The following documents can be downloaded at Intel’s Developer Site at http://developer.intel.com:
Dual-Core Update to the Intel® Itanium® 2 Processor Reference Manual for Software Development and Optimization – Document number 308065 provides model-specific information about the dual-core Itanium processors.
Intel® Itanium® 2 Processor Reference Manual for Software Development and Optimization – This document (Document number 251110) describes model-specific architectural features incorporated into the Intel® Itanium® 2 processor, the second processor based on the Itanium architecture.
Intel® Itanium® Processor Reference Manual for Software Development – This document (Document number 245320) describes model-specific architectural features incorporated into the Intel® Itanium® processor, the first processor based on the Itanium architecture.
Intel® 64 and IA-32 Architectures Software Developer’s Manual – This set of manuals describes the Intel 32-bit architecture. They are available from the Intel Literature Department by calling 1-800-548-4725 and requesting Document Numbers 243190, 243191 and 243192.
Intel® Itanium® Software Conventions and Runtime Architecture Guide – This document (Document number 245358) defines general information necessary to compile, link, and execute a program on an Itanium architecture-based operating system.
Intel® Itanium® Processor Family System Abstraction Layer Specification – This document (Document number 245359) specifies requirements to develop platform firmware for Itanium architecture-based systems.
The following document can be downloaded at the Unified EFI Forum website at http://www.uefi.org:
Unified Extensible Firmware Interface Specification – This document defines a new model for the interface between operating systems and platform firmware.

1.7 Revision History

Date of Revision: March 2010
Revision Number: 2.3
Description:
Added information about illegal virtualization optimization combinations and IIPA requirements.
Added Resource Utilization Counter and PAL_VP_INFO.
PAL_VP_INIT and VPD.vpr changes.
New PAL_VPS_RESUME_HANDLER parameter to indicate RSE Current Frame Load Enable setting at the target instruction.
PAL_VP_INIT_ENV implementation-specific configuration option.
Minimum Virtual address increased to 54 bits.
New PAL_MC_ERROR_INFO health indicator.
New PAL_MC_ERROR_INJECT implementation-specific bit fields.
MOV-to_SR.L reserved field checking.
Added virtual machine disable.
Added variable frequency mode additions to ACPI P-state description.
Removed pal_proc_vector argument from PAL_VP_SAVE and PAL_VP_RESTORE.
Added PAL_PROC_SET_FEATURES data speculation disable.
Added Interruption Instruction Bundle registers.
Min-state save area size change.
PAL_MC_DYNAMIC_STATE changes.
PAL_PROC_SET_FEATURES data poisoning promotion changes.
ACPI P-state clarifications.
Synchronization requirements for virtualization opcode optimization.
New priority hint and multi-threading hint recommendations.
Date of
Revision
August 2005 2.2 Allow register fields in CR.LID register to be read-only and CR.LID checking
Revision
Number
Description
on interruption messages by processors optional. See Vol 2, Part I, Ch 5 “Interruptions” and Section 11.2.2 PALE_RESET Exit State for details.
Relaxed reserved and ignored fields checkings in IA-32 application registers in Vol 1 Ch 6 and Vol 2, Part I, Ch 10.
Introduced visibility constraints between stores and local purges to ensure TLB consistency for UP VHPT update and local purge scenarios. See Vol 2, Part I, Ch 4 and description of ptc.l instruction in Vol 3 for details.
Architecture extensions for processor Power/Performance states (P-states). See Vol 2 PAL Chapter for details.
Introduced Unimplemented Instruction Address fault.
Relaxed ordering constraints for VHPT walks. See Vol 2, Part I, Ch 4 and 5 for details.
Architecture extensions for processor virtualization.
All instructions which must be last in an instruction group result in undefined behavior when this rule is violated.
Added architectural sequence that guarantees increasing ITC and PMD values on successive reads.
Addition of PAL_BRAND_INFO, PAL_GET_HW_POLICY, PAL_MC_ERROR_INJECT, PAL_MEMORY_BUFFER, PAL_SET_HW_POLICY and PAL_SHUTDOWN procedures.
Allows IPI-redirection feature to be optional.
Undefined behavior for 1-byte accesses to the non-architected regions in the IPI block.
Modified insertion behavior for TR overlaps. See Vol 2, Part I, Ch 4 for details.
“Bus parking” feature is now optional for PAL_BUS_GET_FEATURES.
Introduced low-power synchronization primitive using hint instruction.
FR32-127 is now preserved in PAL calling convention.
New return value from PAL_VM_SUMMARY procedure to indicate the number of multiple concurrent outstanding TLB purges.
Performance Monitor Data (PMD) registers are no longer sign-extended.
New memory attribute transition sequence for memory on-line delete. See Vol 2, Part I, Ch 4 for details.
Added 'shared error' (se) bit to the Processor State Parameter (PSP) in PAL_MC_ERROR_INFO procedure.
Clarified PMU interrupts as edge-triggered.
Modified ‘proc_number’ parameter in PAL_LOGICAL_TO_PHYSICAL procedure.
Modified pal_copy_info alignment requirements.
New bit in PAL_PROC_GET_FEATURES for variable P-state performance.
Clarified descriptions for check_target_register and check_target_register_sof.
Various fixes in dependency tables in Vol 3 Ch 5.
Clarified effect of sending IPIs to non-existent processor in Vol 2, Part I, Ch 5.
Clarified instruction serialization requirements for interruptions in Vol 2, Part II, Ch 3.
Updated performance monitor context switch routine in Vol 2, Part I, Ch 7.
August 2002    2.1
Added Predicate Behavior of alloc Instruction Clarification (Section 4.1.2, Part I, Volume 1; Section 2.2, Part I, Volume 3).
Added New fc.i Instruction (Section 4.4.6.1, and 4.4.6.2, Part I, Volume 1; Section 4.3.3, 4.4.1, 4.4.5, 4.4.6, 4.4.7, 5.5.2, and 7.1.2, Part I, Volume 2; Section 2.5, 2.5.1, 2.5.2, 2.5.3, and 4.5.2.1, Part II, Volume 2; Section 2.2, 3, 4.1, 4.4.6.5, and 4.4.10.10, Part I, Volume 3).
Added Interval Time Counter (ITC) Fault Clarification (Section 3.3.2, Part I, Volume 2).
Added Interruption Control Registers Clarification (Section 3.3.5, Part I, Volume 2).
Added Spontaneous NaT Generation on Speculative Load (ld.s) (Section 5.5.5 and 11.9, Part I, Volume 2; Section 2.2 and 3, Part I, Volume 3).
Added Performance Counter Standardization (Sections 7.2.3 and 11.6, Part I, Volume 2).
Added Freeze Bit Functionality in Context Switching and Interrupt Generation Clarification (Sections 7.2.1, 7.2.2, 7.2.4.1, and 7.2.4.2, Part I, Volume 2).
Added IA_32_Exception (Debug) IIPA Description Change (Section 9.2, Part I, Volume 2).
Added capability for Allowing Multiple PAL_A_SPEC and PAL_B Entries in the Firmware Interface Table (Section 11.1.6, Part I, Volume 2).
Added BR1 to Min-state Save Area (Sections 11.3.2.3 and 11.3.3, Part I, Volume 2).
Added Fault Handling Semantics for lfetch.fault Instruction (Section 2.2, Part I, Volume 3).
December 2001    2.0
Volume 1:
Faults in ld.c that hits ALAT clarification (Section 4.4.5.3.1).
IA-32 related changes (Section 6.2.5.4, Section 6.2.3, Section 6.2.4, Section 6.2.5.3).
Load instructions change (Section 4.4.1).
Volume 2:
Class pr-writers-int clarification (Table A-5).
PAL_MC_DRAIN clarification (Section 4.4.6.1).
VHPT walk and forward progress change (Section 4.1.1.2).
IA-32 IBR/DBR match clarification (Section 7.1.1).
ISR figure changes (pp. 8-5, 8-26, 8-33 and 8-36).
PAL_CACHE_FLUSH return argument change – added new status return argument (Section 11.8.3).
PAL self-test Control and PAL_A procedure requirement change – added new arguments, figures, requirements (Section 11.2).
PAL_CACHE_FLUSH clarifications (Chapter 11).
Non-speculative reference clarification (Section 4.4.6).
RID and Preferred Page Size usage clarification (Section 4.1).
VHPT read atomicity clarification (Section 4.1).
IIP and WC flush clarification (Section 4.4.5).
Revised RSE and PMC typographical errors (Section 6.4).
Revised DV table (Section A.4).
Memory attribute transitions – added new requirements (Section 4.4).
MCA for WC/UC aliasing change (Section 4.4.1).
Bus lock deprecation – changed behavior of DCR ‘lc’ bit (Section 3.3.4.1, Section 10.6.8, Section 11.8.3).
PAL_PROC_GET/SET_FEATURES changes – extend calls to allow implementation-specific feature control (Section 11.8.3).
Split PAL_A architecture changes (Section 11.1.6).
Simple barrier synchronization clarification (Section 13.4.2).
Limited speculation clarification – added hardware-generated speculative references (Section 4.4.6).
PAL memory accesses and restrictions clarification (Section 11.9).
PSP validity on INITs from PAL_MC_ERROR_INFO clarification (Section 11.8.3).
Speculation attributes clarification (Section 4.4.6).
PAL_A FIT entry, PAL_VM_TR_READ, PSP, PAL_VERSION clarifications (Sections 11.8.3 and 11.3.2.1).
TLB searching clarifications (Section 4.1).
IA-32 related changes (Section 10.3, Section 10.3.2, Section 10.3.2, Section 10.3.3.1, Section 10.10.1).
IPSR.ri and ISR.ei changes (Table 3-2, Section 3.3.5.1, Section 3.3.5.2, Section 5.5, Section 8.3, and Section 2.2).
Volume 3:
IA-32 CPUID clarification (p. 5-71).
Revised figures for extract, deposit, and alloc instructions (Section 2.2).
RCPPS, RCPSS, RSQRTPS, and RSQRTSS clarification (Section 7.12).
IA-32 related changes (Section 5.3).
tak, tpa change (Section 2.2).
July 2000    1.1
Volume 1:
Processor Serial Number feature removed (Chapter 3).
Clarification on exceptions to instruction dependency (Section 3.4.3).
Volume 2:
Clarifications regarding “reserved” fields in ITIR (Chapter 3).
Instruction and Data translation must be enabled for executing IA-32 instructions (Chapters 3, 4 and 10).
FCR/FDR mappings, and clarification to the value of PSR.ri after an RFI (Chapters 3 and 4).
Clarification regarding ordering data dependency.
Out-of-order IPI delivery is now allowed (Chapters 4 and 5).
Content of EFLAG field changed in IIM (p. 9-24).
PAL_CHECK and PAL_INIT calls – exit state changes (Chapter 11).
PAL_CHECK processor state parameter changes (Chapter 11).
PAL_BUS_GET/SET_FEATURES calls – added two new bits (Chapter 11).
PAL_MC_ERROR_INFO call – Changes made to enhance and simplify the call to provide more information regarding machine check (Chapter 11).
PAL_ENTER_IA_32_Env call changes – entry parameter represents the entry order; SAL needs to initialize all the IA-32 registers properly before making this call (Chapter 11).
PAL_CACHE_FLUSH – added a new cache_type argument (Chapter 11).
PAL_SHUTDOWN – removed from list of PAL calls (Chapter 11).
Clarified memory ordering changes (Chapter 13).
Clarification in dependence violation table (Appendix A).
Volume 3:
fmix instruction page figures corrected (Chapter 2).
Clarification of “reserved” fields in ITIR (Chapters 2 and 3).
Modified conditions for alloc/loadrs/flushrs instruction placement in bundle/instruction group (Chapters 2 and 4).
IA-32 JMPE instruction page typo fix (p. 5-238).
Processor Serial Number feature removed (Chapter 5).
January 2000    1.0
Initial release of document.

2 Base IA-32 Instruction Reference

This section lists all IA-32 instructions and their behavior in the Itanium System Environment and IA-32 System Environments on a processor based on the Itanium architecture. Unless noted otherwise, all IA-32, MMX technology, and SSE instructions operate as defined in the Intel® 64 and IA-32 Architectures Software Developer’s Manual.
This volume describes the complete IA-32 Architecture instruction set, including the integer, floating-point, MMX technology and SSE technology, and system instructions. The instruction descriptions are arranged in alphabetical order. For each instruction, the forms are given for each operand combination, including the opcode, operands required, and a description. Also given for each instruction are a description of the instruction and its operands, an operational description, a description of the effect of the instruction on flags in the EFLAGS register, and a summary of the exceptions that can be generated.
For all IA-32 instructions, the following relationships hold:
Writes – Writes of any IA-32 general purpose, floating-point or SSE, MMX technology registers by IA-32 instructions are reflected in the Itanium registers defined to hold that IA-32 state when the IA-32 instruction set completes execution.
Reads – Reads of any IA-32 general purpose, floating-point or SSE, MMX technology registers by IA-32 instructions see the state of the Itanium registers defined to hold the IA-32 state after entering the IA-32 instruction set.
State mappings – IA-32 numeric instructions are controlled by and reflect their status in FCW, FSW, FTW, FCS, FIP, FOP, FDS and FEA. On exit from the IA-32 instruction set, Itanium numeric status and control resources defined to hold IA-32 state reflect the results of all prior IA-32 numeric instructions in FCR, FSR, FIR and FDR. Itanium numeric status and control resources defined to hold IA-32 state are honored by IA-32 numeric instructions when entering the IA-32 instruction set.

2.1 Additional Intel® Itanium® Faults

The following fault behavior is defined for all IA-32 instructions in the Itanium System Environment:
IA-32 Faults – All IA-32 faults are performed as defined in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, unless otherwise noted. IA-32 faults are delivered on the IA_32_Exception interruption vector.
IA-32 GPFault – Null segments are signified by the segment descriptor register’s P-bit being set to zero. IA-32 memory references through DSD, ESD, FSD, and GSD with the P-bit set to zero result in an IA-32 GPFault.
Itanium Low FP Reg Fault – If PSR.dfl is 1, execution of any IA-32 MMX technology, SSE or floating-point instruction results in a Disabled FP Register fault (regardless of whether FR2-31 is referenced).
Itanium High FP Reg Fault – If PSR.dfh is 1, execution of the first target IA-32 instruction following a br.ia or rfi results in a Disabled FP Register fault (regardless of whether FR32-127 is referenced).
Itanium Instruction Mem Faults – The following additional Itanium memory faults can be generated on each virtual page referenced when fetching IA-32 or MMX technology or SSE instructions for execution:
• Alternative instruction TLB fault
• VHPT instruction fault
• Instruction TLB fault
• Instruction Page Not Present fault
• Instruction NaT Page Consumption Abort
• Instruction Key Miss fault
• Instruction Key Permission fault
• Instruction Access Rights fault
• Instruction Access Bit fault
Itanium Data Mem Faults – The following additional Itanium memory faults can be generated on each virtual page touched when reading or writing memory operands from the IA-32 instruction set including MMX technology and SSE instructions:
• Nested TLB fault
• Alternative data TLB fault
• VHPT data fault
• Data TLB fault
• Data Page Not Present fault
• Data NaT Page Consumption Abort
• Data Key Miss fault
• Data Key Permission fault
• Data Access Rights fault
• Data Dirty bit fault
• Data Access bit fault

2.2 Interpreting the IA-32 Instruction Reference Pages

This section describes the information contained in the various sections of the instruction reference pages that make up the majority of this chapter. It also explains the notational conventions and abbreviations used in these sections.

2.2.1 IA-32 Instruction Format

The following is an example of the format used for each Intel architecture instruction description in this chapter.
2.2.1.0.0.1 CMC—Complement Carry Flag
Opcode Instruction Description
F5 CMC Complement carry flag
2.2.1.1 Opcode Column
The “Opcode” column gives the complete object code produced for each form of the instruction. When possible, the codes are given as hexadecimal bytes, in the same order in which they appear in memory. Definitions of entries other than hexadecimal bytes are as follows:
/digit – A digit between 0 and 7 indicates that the ModR/M byte of the instruction uses only the r/m (register or memory) operand. The reg field contains the digit that provides an extension to the instruction's opcode.
/r – Indicates that the ModR/M byte of the instruction contains both a register operand and an r/m operand.
cb, cw, cd, cp – A 1-byte (cb), 2-byte (cw), 4-byte (cd), or 6-byte (cp) value following the opcode that is used to specify a code offset and possibly a new value for the code segment register.
ib, iw, id – A 1-byte (ib), 2-byte (iw), or 4-byte (id) immediate operand to the instruction that follows the opcode, ModR/M bytes or scale-indexing bytes. The opcode determines if the operand is a signed value. All words and doublewords are given with the low-order byte first.
+rb, +rw, +rd – A register code, from 0 through 7, added to the hexadecimal byte given at the left of the plus sign to form a single opcode byte. The register codes are given in Table 2-1.
+i – A number used in floating-point instructions when one of the operands is ST(i) from the FPU register stack. The number i (which can range from 0 to 7) is added to the hexadecimal byte given at the left of the plus sign to form a single opcode byte.
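The /digit and /r entries above both refer to fields of the ModR/M byte. As a minimal sketch, the Python function below splits a ModR/M byte into its mod, reg, and r/m fields, assuming the standard IA-32 layout (mod in bits 7-6, reg in bits 5-3, r/m in bits 2-0). The name split_modrm is hypothetical, and the F7 /3 (NEG r/m32) example is brought in for illustration; it is not taken from this section.

```python
def split_modrm(modrm):
    """Return (mod, reg, rm) for a single ModR/M byte.

    For a "/digit" opcode form, the reg field holds the opcode
    extension digit; for a "/r" form it names a register operand.
    """
    if not 0 <= modrm <= 0xFF:
        raise ValueError("ModR/M must be a single byte")
    mod = (modrm >> 6) & 0b11    # addressing mode
    reg = (modrm >> 3) & 0b111   # register or opcode extension (/digit)
    rm = modrm & 0b111           # register or memory operand selector
    return mod, reg, rm


# Illustrative case: NEG EAX is encoded as F7 D8, where D8 is the ModR/M byte.
mod, reg, rm = split_modrm(0xD8)
print(mod, reg, rm)
```

Here reg carries the /3 opcode extension, mod selects register-direct addressing, and r/m selects EAX (code 0 in Table 2-1).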
Table 2-1. Register Encodings Associated with the +rb, +rw, and +rd Nomenclature
rb        rw        rd
AL = 0    AX = 0    EAX = 0
CL = 1    CX = 1    ECX = 1
DL = 2    DX = 2    EDX = 2
BL = 3    BX = 3    EBX = 3
AH = 4    SP = 4    ESP = 4
CH = 5    BP = 5    EBP = 5
DH = 6    SI = 6    ESI = 6
BH = 7    DI = 7    EDI = 7
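To illustrate how the +rd register codes combine with an opcode byte, here is a small Python sketch. The helper names (RD_CODES, encode_plus_rd, encode_mov_r32_imm32) are hypothetical. The opcode forms B8+rd id (MOV r32, imm32) and 50+rd (PUSH r32) are standard IA-32 encodings used here as examples; they are not listed in this section.

```python
# Register codes for the +rd column of Table 2-1.
RD_CODES = {
    "EAX": 0, "ECX": 1, "EDX": 2, "EBX": 3,
    "ESP": 4, "EBP": 5, "ESI": 6, "EDI": 7,
}


def encode_plus_rd(base_opcode, reg):
    """Add the register code to the base opcode to form one opcode byte."""
    return bytes([base_opcode + RD_CODES[reg.upper()]])


def encode_mov_r32_imm32(reg, imm):
    """Encode MOV r32, imm32 (B8+rd id); the immediate is stored low byte first."""
    return encode_plus_rd(0xB8, reg) + (imm & 0xFFFFFFFF).to_bytes(4, "little")


print(encode_mov_r32_imm32("ECX", 0x12345678).hex())
print(encode_plus_rd(0x50, "EDI").hex())
```

The immediate bytes follow the opcode in low-order-first order, matching the rule given for the ib, iw, and id entries.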
2.2.1.2 Instruction Column
The “Instruction” column gives the syntax of the instruction statement as it would appear in an ASM386 program. The following is a list of the symbols used to represent operands in the instruction statements:
rel8 – A relative address in the range from 128 bytes before the end of the instruction to 127 bytes after the end of the instruction.
rel16 and rel32 – A relative address within the same code segment as the instruction assembled. The rel16 symbol applies to instructions with an operand-size attribute of 16 bits; the rel32 symbol applies to instructions with an operand-size attribute of 32 bits.
ptr16:16 and ptr16:32 – A far pointer, typically in a code segment different from that of the instruction. The notation 16:16 indicates that the value of the pointer has two parts. The value to the left of the colon is a 16-bit selector or value destined for the code segment register. The value to the right corresponds to the offset within the destination segment. The ptr16:16 symbol is used when the instruction's operand-size attribute is 16 bits; the ptr16:32 symbol is used when the operand-size attribute is 32 bits.
r8 – One of the byte general-purpose registers AL, CL, DL, BL, AH, CH, DH, or BH.
r16 – One of the word general-purpose registers AX, CX, DX, BX, SP, BP, SI, or DI.
r32 – One of the doubleword general-purpose registers EAX, ECX, EDX, EBX, ESP, EBP, ESI, or EDI.
imm8 – An immediate byte value. The imm8 symbol is a signed number between –128 and +127 inclusive. For instructions in which imm8 is combined with a word or doubleword operand, the immediate value is sign-extended to form a word or doubleword. The upper byte of the word is filled with the topmost bit of the immediate value.
imm16 – An immediate word value used for instructions whose operand-size attribute is 16 bits. This is a number between –32,768 and +32,767 inclusive.
imm32 – An immediate doubleword value used for instructions whose operand-size attribute is 32 bits. It allows the use of a number between +2,147,483,647 and -2,147,483,648 inclusive.
r/m8 – A byte operand that is either the contents of a byte general-purpose register (AL, BL, CL, DL, AH, BH, CH, and DH), or a byte from memory.
r/m16 – A word general-purpose register or memory operand used for instructions whose operand-size attribute is 16 bits. The word general-purpose registers are: AX, BX, CX, DX, SP, BP, SI, and DI. The contents of memory are found at the address provided by the effective address computation.
r/m32 – A doubleword general-purpose register or memory operand used for instructions whose operand-size attribute is 32 bits. The doubleword general-purpose registers are: EAX, EBX, ECX, EDX, ESP, EBP, ESI, and EDI. The contents of memory are found at the address provided by the effective address computation.
m – A 16- or 32-bit operand in memory.
m8 – A byte operand in memory, usually expressed as a variable or array name, but pointed to by the DS:(E)SI or ES:(E)DI registers. This nomenclature is used only with the string instructions and the XLAT instruction.
m16 – A word operand in memory, usually expressed as a variable or array name, but pointed to by the DS:(E)SI or ES:(E)DI registers. This nomenclature is used only with the string instructions.
m32 – A doubleword operand in memory, usually expressed as a variable or array name, but pointed to by the DS:(E)SI or ES:(E)DI registers. This nomenclature is used only with the string instructions.
m64 – A quadword operand in memory. This nomenclature is used only with the CMPXCHG8B instruction.
m16:16, m16:32 – A memory operand containing a far pointer composed of two numbers. The number to the left of the colon corresponds to the pointer's segment selector. The number to the right corresponds to its offset.
m16&32, m16&16, m32&32 – A memory operand consisting of data item pairs whose sizes are indicated on the left and the right side of the ampersand. All memory addressing modes are allowed. The m16&16 and m32&32 operands are used by the BOUND instruction to provide an operand containing the upper and lower bounds for array indices. The m16&32 operand is used by LIDT and LGDT to provide a word with which to load the limit field, and a doubleword with which to load the base field of the corresponding GDTR and IDTR registers.
moffs8, moffs16, moffs32 – A simple memory variable (memory offset) of type byte, word, or doubleword used by some variants of the MOV instruction. The actual address is given by a simple offset relative to the segment base. No ModR/M byte is used in the instruction. The number shown with moffs indicates its size, which is determined by the address-size attribute of the instruction.
Sreg – A segment register. The segment register bit assignments are ES=0, CS=1, SS=2, DS=3, FS=4, and GS=5.
m32real, m64real, m80real – A single-, double-, and extended-real (respectively) floating-point operand in memory.
m16int, m32int, m64int – A word-, short-, and long-integer (respectively) floating-point operand in memory.
ST or ST(0) – The top element of the FPU register stack.
ST(i) – The ith element from the top of the FPU register stack (i = 0 through 7).
mm – An MMX technology register. The 64-bit MMX technology registers are: MM0 through MM7.
mm/m32 – The low order 32 bits of an MMX technology register or a 32-bit memory operand. The 64-bit MMX technology registers are: MM0 through MM7. The contents of memory are found at the address provided by the effective address computation.
mm/m64 – An MMX technology register or a 64-bit memory operand. The 64-bit MMX technology registers are: MM0 through MM7. The contents of memory are found at the address provided by the effective address computation.
2.2.1.3 Description Column
The “Description” column following the “Instruction” column briefly explains the various forms of the instruction. The following “Description” and “Operation” sections contain more details of the instruction's operation.
2.2.1.4 Description
The “Description” section describes the purpose of the instructions and the required operands. It also discusses the effect of the instruction on flags.

2.2.2 Operation

The “Operation” section contains an algorithmic description (written in pseudo-code) of the instruction. The pseudo-code uses a notation similar to the Algol or Pascal language. The algorithms are composed of the following elements:
• Comments are enclosed within the symbol pairs “(*” and “*)”.
• Compound statements are enclosed in keywords, such as IF, THEN, ELSE, and FI for an if statement, DO and OD for a do statement, or CASE... OF and ESAC for a case statement.
• A register name implies the contents of the register. A register name enclosed in brackets implies the contents of the location whose address is contained in that register. For example, ES:[DI] indicates the contents of the location whose ES segment relative address is in register DI. [SI] indicates the contents of the address contained in register SI relative to SI’s default segment (DS) or overridden segment.
• Parentheses around the “E” in a general-purpose register name, such as (E)SI, indicate that an offset is read from the SI register if the current address-size attribute is 16 or is read from the ESI register if the address-size attribute is 32.
• Brackets are also used for memory operands, where they mean that the contents of the memory location is a segment-relative offset. For example, [SRC] indicates that the contents of the source operand is a segment-relative offset.
• A ← B; indicates that the value of B is assigned to A.
• The symbols =, ≠, ≥, and ≤ are relational operators used to compare two values, meaning equal, not equal, greater or equal, less or equal, respectively. A relational expression such as A = B is TRUE if the value of A is equal to B; otherwise it is FALSE.
• The expressions “<< COUNT” and “>> COUNT” indicate that the destination operand should be shifted left or right, respectively, by the number of bits indicated by the count operand.
The following identifiers are used in the algorithmic descriptions:
OperandSize and AddressSize – The OperandSize identifier represents the operand-size attribute of the instruction, which is either 16 or 32 bits. The AddressSize identifier represents the address-size attribute, which is either 16 or 32 bits. For example, the following pseudo-code indicates that the operand-size attribute depends on the form of the CMPS instruction used.
IF instruction = CMPSW
    THEN OperandSize ← 16;
    ELSE
        IF instruction = CMPSD
            THEN OperandSize ← 32;
        FI;
FI;
See “Operand-Size and Address-Size Attributes” in Chapter 3 of the Intel Architecture Software Developer’s Manual, Volume 1, for general guidelines on how these attributes are determined.
StackAddrSize – Represents the stack address-size attribute associated with the instruction, which has a value of 16 or 32 bits (see “Address-Size Attribute for Stack” in Chapter 4 of the Intel Architecture Software Developer’s Manual, Volume 1).
SRC – Represents the source operand.
DEST – Represents the destination operand.
The following functions are used in the algorithmic descriptions:
ZeroExtend(value) – Returns a value zero-extended to the operand-size attribute of the instruction. For example, if the operand-size attribute is 32, zero extending a byte value of -10 converts the byte from F6H to a doubleword value of 000000F6H. If the value passed to the ZeroExtend function and the operand-size attribute are the same size, ZeroExtend returns the value unaltered.
SignExtend(value) – Returns a value sign-extended to the operand-size attribute of the instruction. For example, if the operand-size attribute is 32, sign extending a byte containing the value -10 converts the byte from F6H to a doubleword value of FFFFFFF6H. If the value passed to the SignExtend function and the operand-size attribute are the same size, SignExtend returns the value unaltered.
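The two extension functions can be sketched in Python as follows. This is an illustrative model, not the manual's definition: the function names and the explicit from_bits and to_bits parameters are assumptions, since the pseudo-code takes the target width implicitly from the operand-size attribute.

```python
def zero_extend(value, from_bits, to_bits):
    """Zero-extend the low from_bits of value to a to_bits-wide bit pattern."""
    if to_bits < from_bits:
        raise ValueError("target width must not be smaller than source width")
    return value & ((1 << from_bits) - 1)


def sign_extend(value, from_bits, to_bits):
    """Sign-extend the low from_bits of value to a to_bits-wide bit pattern."""
    if to_bits < from_bits:
        raise ValueError("target width must not be smaller than source width")
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):   # sign bit set: value is negative
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)


# The byte F6H (-10) from the examples above, extended to a doubleword.
print(f"{zero_extend(0xF6, 8, 32):08X}")
print(f"{sign_extend(0xF6, 8, 32):08X}")
```

When from_bits equals to_bits, both functions return the value unaltered, as the definitions require.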
SaturateSignedWordToSignedByte – Converts a signed 16-bit value to a signed 8-bit value. If the signed 16-bit value is less than -128, it is represented by the saturated value -128 (80H); if it is greater than 127, it is represented by the saturated value 127 (7FH).
SaturateSignedDwordToSignedWord – Converts a signed 32-bit value to a signed 16-bit value. If the signed 32-bit value is less than -32768, it is represented by the saturated value -32768 (8000H); if it is greater than 32767, it is represented by the saturated value 32767 (7FFFH).
SaturateSignedWordToUnsignedByte – Converts a signed 16-bit value to an unsigned 8-bit value. If the signed 16-bit value is less than zero, it is represented by the saturated value zero (00H); if it is greater than 255, it is represented by the saturated value 255 (FFH).
SaturateToSignedByte – Represents the result of an operation as a signed 8-bit value. If the result is less than -128, it is represented by the saturated value -128 (80H); if it is greater than 127, it is represented by the saturated value 127 (7FH).
SaturateToSignedWord – Represents the result of an operation as a signed 16-bit value. If the result is less than -32768, it is represented by the saturated value -32768 (8000H); if it is greater than 32767, it is represented by the saturated value 32767 (7FFFH).
SaturateToUnsignedByte – Represents the result of an operation as an unsigned 8-bit value. If the result is less than zero, it is represented by the saturated value zero (00H); if it is greater than 255, it is represented by the saturated value 255 (FFH).
SaturateToUnsignedWord – Represents the result of an operation as an unsigned 16-bit value. If the result is less than zero, it is represented by the saturated value zero (00H); if it is greater than 65535, it is represented by the saturated value 65535 (FFFFH).
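All of the saturation functions above share one pattern: clamp the value to the range of the destination type. The Python sketch below models that pattern under the ranges stated in the definitions. The helper names (clamp, as_bits, and the snake_case function names) are hypothetical.

```python
def clamp(value, low, high):
    """Limit value to the inclusive range [low, high]."""
    return max(low, min(high, value))


def saturate_to_signed_byte(result):
    return clamp(result, -128, 127)


def saturate_to_signed_word(result):
    return clamp(result, -32768, 32767)


def saturate_to_unsigned_byte(result):
    return clamp(result, 0, 255)


def saturate_to_unsigned_word(result):
    return clamp(result, 0, 65535)


def as_bits(value, width):
    """Return the width-bit two's complement pattern for value."""
    return value & ((1 << width) - 1)


# A result of -200 saturates to -128, whose byte pattern is 80H.
print(f"{as_bits(saturate_to_signed_byte(-200), 8):02X}")
# A result of 300 saturates to 255 (FFH) as an unsigned byte.
print(f"{saturate_to_unsigned_byte(300):02X}")
```

The SaturateSignedWordToSignedByte style functions behave the same way; they differ only in the stated width of the input.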
LowOrderWord(DEST * SRC) – Multiplies a word operand by a word operand and stores the least significant word of the doubleword result in the destination operand.
HighOrderWord(DEST * SRC) – Multiplies a word operand by a word operand and stores the most significant word of the doubleword result in the destination operand.
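A minimal Python sketch of the two word-multiply helpers follows. It assumes the operands are passed as signed Python integers in the word range and that the doubleword product is kept in two's complement form; the function names are hypothetical.

```python
def low_order_word(dest, src):
    """Least significant 16 bits of the 32-bit product dest * src."""
    return (dest * src) & 0xFFFF


def high_order_word(dest, src):
    """Most significant 16 bits of the 32-bit product dest * src."""
    return ((dest * src) >> 16) & 0xFFFF


# 1234H * 0100H = 00123400H
print(f"{high_order_word(0x1234, 0x0100):04X} {low_order_word(0x1234, 0x0100):04X}")
# -2 * 3 = -6 = FFFFFFFAH as a doubleword
print(f"{high_order_word(-2, 3):04X} {low_order_word(-2, 3):04X}")
```

The low-order word is the same whether the operands are treated as signed or unsigned; only the high-order word depends on that choice.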
Push(value) – Pushes a value onto the stack. The number of bytes pushed is determined by the operand-size attribute of the instruction.
Pop() – Removes the value from the top of the stack and returns it. The statement EAX ← Pop(); assigns to EAX the 32-bit value from the top of the stack. Pop will return either a word or a doubleword depending on the operand-size attribute.
PopRegisterStack – Marks the FPU ST(0) register as empty and increments the FPU register stack pointer (TOP) by 1.
Switch-Tasks – Performs a task switch.
Bit(BitBase, BitOffset) – Returns the value of a bit within a bit string, which is a sequence of bits in memory or a register. Bits are numbered from low-order to high-order within registers and within memory bytes. If the base operand is a register, the offset can be in the range 0..31. This offset addresses a bit within the indicated register. As an example, the function Bit[EAX, 21] is illustrated in Figure 2-2.
Figure 2-2. Bit Offset for BIT[EAX,21]
[Figure: register EAX, bits 31 through 0, with BitOffset = 21 marked.]
If BitBase is a memory address, BitOffset can range from -2 GBits to 2 GBits. The addressed bit is numbered (BitOffset MOD 8) within the byte at address (BitBase + (BitOffset DIV 8)), where DIV is signed division with rounding towards negative infinity, and MOD returns a positive number. This operation is illustrated in Figure 2-3.
Figure 2-3. Memory Bit Indexing
[Figure: two examples, BitOffset = +13 across bytes BitBase + 1, BitBase, BitBase - 1, and BitOffset = -11 across bytes BitBase, BitBase - 1, BitBase - 2.]
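The memory bit-indexing rule can be checked with a short Python sketch. Python's // and % operators already round towards negative infinity and return a non-negative remainder for a positive divisor, which matches the DIV and MOD behavior described above. The function names and the use of a bytes object as a stand-in for memory are assumptions made for illustration.

```python
def locate_bit(bit_base, bit_offset):
    """Return (byte_address, bit_number) for a memory bit string access."""
    byte_address = bit_base + (bit_offset // 8)   # DIV, rounding towards -infinity
    bit_number = bit_offset % 8                   # MOD, always in 0..7
    return byte_address, bit_number


def read_bit(memory, bit_base, bit_offset):
    """Read one bit from a bytes-like object that models memory."""
    byte_address, bit_number = locate_bit(bit_base, bit_offset)
    if not 0 <= byte_address < len(memory):
        raise IndexError("bit lies outside the modeled memory")
    return (memory[byte_address] >> bit_number) & 1


# Offsets +13 and -11 relative to a base address of 100.
print(locate_bit(100, 13))
print(locate_bit(100, -11))
```

With a base of 100, an offset of +13 selects bit 5 of byte 101, and an offset of -11 selects bit 5 of byte 98.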

2.2.3 Flags Affected

The “Flags Affected” section lists the flags in the EFLAGS register that are affected by the instruction. When a flag is cleared, it is equal to 0; when it is set, it is equal to 1. The arithmetic and logical instructions usually assign values to the status flags in a uniform manner (see Appendix A, EFLAGS Cross-Reference, in the Intel Architecture Software Developer’s Manual, Volume 1). Non-conventional assignments are described in the “Operation” section. The values of flags listed as undefined may be changed by the instruction in an indeterminate manner. Flags that are not listed are unchanged by the instruction.

2.2.4 FPU Flags Affected

The floating-point instructions have an “FPU Flags Affected” section that describes how each instruction can affect the four condition code flags of the FPU status word.

2.2.5 Protected Mode Exceptions

The “Protected Mode Exceptions” section lists the exceptions that can occur when the instruction is executed in protected mode and the reasons for the exceptions. Each exception is given a mnemonic that consists of a pound sign (#) followed by two letters and an optional error code in parentheses. For example, #GP(0) denotes a general protection exception with an error code of 0. Table 2-2 associates each two-letter mnemonic with the corresponding interrupt vector number and exception name. See Chapter 5, Interrupt and Exception Handling, in the Intel Architecture Software Developer’s Manual, Volume 3, for a detailed description of the exceptions.
Application programmers should consult the documentation provided with their operating systems to determine the actions taken when exceptions occur.

2.2.6 Real-address Mode Exceptions

The “Real-Address Mode Exceptions” section lists the exceptions that can occur when the instruction is executed in real-address mode.
Table 2-2. Exception Mnemonics, Names, and Vector Numbers
Vector No.  Mnemonic  Name                                      Source
0           #DE       Divide Error                              DIV and IDIV instructions.
1           #DB       Debug                                     Any code or data reference.
3           #BP       Breakpoint                                INT 3 instruction.
4           #OF       Overflow                                  INTO instruction.
5           #BR       BOUND Range Exceeded                      BOUND instruction.
6           #UD       Invalid Opcode (Undefined Opcode)         UD2 instruction or reserved opcode. (a)
7           #NM       Device Not Available (No Math Coprocessor)  Floating-point or WAIT/FWAIT instruction.
8           #DF       Double Fault                              Any instruction that can generate an exception, an NMI, or an INTR.
10          #TS       Invalid TSS                               Task switch or TSS access.
11          #NP       Segment Not Present                       Loading segment registers or accessing system segments.
12          #SS       Stack Segment Fault                       Stack operations and SS register loads.
13          #GP       General Protection                        Any memory reference and other protection checks.
14          #PF       Page Fault                                Any memory reference.
16          #MF       Floating-point Error (Math Fault)         Floating-point or WAIT/FWAIT instruction.
17          #AC       Alignment Check                           Any data reference in memory. (b)
18          #MC       Machine Check                             Model dependent. (c)
a. The UD2 instruction was introduced in the Pentium® Pro processor.
b. This exception was introduced in the Intel® 486 processor.
c. This exception was introduced in the Pentium processor and enhanced in the Pentium Pro processor.

2.2.7 Virtual-8086 Mode Exceptions

The “Virtual-8086 Mode Exceptions” section lists the exceptions that can occur when the instruction is executed in virtual-8086 mode.

2.2.8 Floating-point Exceptions

The “Floating-point Exceptions” section lists additional exceptions that can occur when a floating-point instruction is executed in any mode. All of these exception conditions result in a floating-point error exception (#MF, vector number 16) being generated.
Table 2-3 associates each one- or two-letter mnemonic with the corresponding exception name. See “Floating-Point Exception Conditions” in Chapter 7 of the Intel Architecture Software Developer’s Manual, Volume 1, for a detailed description of these exceptions.
Table 2-3. Floating-point Exception Mnemonics and Names
Vector No.  Mnemonic  Name                                                             Source
16          #IS       Floating-point invalid operation: stack overflow or underflow    FPU stack overflow or underflow
16          #IA       Floating-point invalid operation: invalid arithmetic operation   Invalid FPU arithmetic operation
16          #Z        Floating-point divide-by-zero                                    FPU divide-by-zero
16          #D        Floating-point denormalized operation                            Attempting to operate on a denormal number
16          #O        Floating-point numeric overflow                                  FPU numeric overflow
16          #U        Floating-point numeric underflow                                 FPU numeric underflow
16          #P        Floating-point inexact result (precision)                        Inexact result (precision)

2.3 IA-32 Base Instruction Reference

The remainder of this chapter provides detailed descriptions of each of the Intel architecture instructions.
AAA—ASCII Adjust After Addition
Opcode Instruction Description
37 AAA ASCII adjust AL after addition
Description
Adjusts the sum of two unpacked BCD values to create an unpacked BCD result. The AL register is the implied source and destination operand for this instruction. The AAA instruction is only useful when it follows an ADD instruction that adds (binary addition) two unpacked BCD values and stores a byte result in the AL register. The AAA instruction then adjusts the contents of the AL register to contain the correct 1-digit unpacked BCD result.
If the addition produces a decimal carry, the AH register is incremented by 1, and the CF and AF flags are set. If there was no decimal carry, the CF and AF flags are cleared and the AH register is unchanged. In either case, bits 4 through 7 of the AL register are cleared to 0.
Operation
IF ((AL AND 0FH) > 9) OR (AF = 1)
THEN
AL ← (AL + 6);
AH ← AH + 1;
AF ← 1;
CF ← 1;
ELSE
AF ← 0;
CF ← 0;
FI;
AL ← AL AND 0FH;
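The Operation pseudocode above can be modeled in a few lines of Python. This is a minimal sketch for illustration only, not a description of any processor implementation: the function name aaa and its tuple return value are hypothetical, and masking AL and AH to 8 bits after the additions is an assumption the pseudocode does not state explicitly.

```python
def aaa(al, ah, af):
    """Model the AAA Operation pseudocode.

    al, ah: 8-bit register values; af: auxiliary carry flag (0 or 1)
    as left by the preceding ADD. Returns (al, ah, af, cf).
    """
    if (al & 0x0F) > 9 or af == 1:
        # Decimal carry: adjust the low digit and carry into AH.
        al = (al + 6) & 0xFF   # assumption: result wraps to 8 bits
        ah = (ah + 1) & 0xFF   # assumption: result wraps to 8 bits
        af = 1
        cf = 1
    else:
        af = 0
        cf = 0
    al &= 0x0F                 # bits 4 through 7 of AL are cleared
    return al, ah, af, cf


# 9 + 8 = 11H with AF = 1 after the ADD; AAA yields AH = 1, AL = 7 (decimal 17).
print(aaa(0x11, 0x00, 1))  # (7, 1, 1, 1)
# 3 + 4 = 07H with AF = 0; no adjustment is needed.
print(aaa(0x07, 0x00, 0))  # (7, 0, 0, 0)
```

The OF, SF, ZF, and PF flags are left out of the model because the instruction leaves them undefined.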
Flags Affected
The AF and CF flags are set to 1 if the adjustment results in a decimal carry; otherwise they are cleared to 0. The OF, SF, ZF, and PF flags are undefined.
Additional Itanium System Environment Exceptions
Itanium Reg Faults NaT Register Consumption Abort.
Exceptions (All Operating Modes)
None.
AAD—ASCII Adjust AX Before Division
Opcode Instruction Description
D5 0A AAD ASCII adjust AX before division
Description
Adjusts two unpacked BCD digits (the least-significant digit in the AL register and the most-significant digit in the AH register) so that a division operation performed on the result will yield a correct unpacked BCD value. The AAD instruction is only useful when it precedes a DIV instruction that divides (binary division) the adjusted value in the AX register by an unpacked BCD value.
The AAD instruction sets the value in the AL register to (AL + (10 * AH)), and then clears the AH register to 00H. The value in the AX register is then equal to the binary equivalent of the original unpacked two-digit number in registers AH and AL.
Operation
tempAL ← AL;
tempAH ← AH;
AL ← (tempAL + (tempAH ∗ imm8)) AND FFH;
AH ← 0;
The immediate value (imm8) is taken from the second byte of the instruction, which under normal assembly is 0AH (10 decimal). However, this immediate value can be changed to produce a different result.
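As an illustration of the operation and of the effect of the immediate byte, the following Python sketch models AAD. The function name aad and its default argument are hypothetical and chosen for this example only; the sketch does not model the SF, ZF, and PF flag updates.

```python
def aad(al, ah, imm8=0x0A):
    """Model the AAD Operation pseudocode. Returns (al, ah).

    al: least-significant unpacked BCD digit
    ah: most-significant unpacked BCD digit
    imm8: second instruction byte, 0AH under normal assembly
    """
    al = (al + ah * imm8) & 0xFF  # result is truncated to 8 bits
    ah = 0
    return al, ah


# AH = 4, AL = 2 represents decimal 42; after AAD, AX holds 002AH.
print(aad(0x02, 0x04))        # (42, 0)
# With an immediate of 8, the same digits are combined as a base-8 number.
print(aad(0x02, 0x04, 0x08))  # (34, 0)
```

With the default immediate, the value left in AL is the binary equivalent of the two-digit decimal number, which is the form a following DIV expects.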
Flags Affected
The SF, ZF, and PF flags are set according to the result; the OF, AF, and CF flags are undefined.
Additional Itanium System Environment Exceptions
Itanium Reg Faults NaT Register Consumption Abort.
Exceptions (All Operating Modes)
None.
AAM—ASCII Adjust AX After Multiply
Opcode Instruction Description
D4 0A AAM ASCII adjust AX after multiply
Description
Adjusts the result of the multiplication of two unpacked BCD values to create a pair of unpacked BCD values. The AX register is the implied source and destination operand for this instruction. The AAM instruction is only useful when it follows an MUL instruction that multiplies (binary multiplication) two unpacked BCD values and stores a word result in the AX register. The AAM instruction then adjusts the contents of the AX register to contain the correct 2-digit unpacked BCD result.
Operation
tempAL ← AL;
AH ← tempAL / imm8;
AL ← tempAL MOD imm8;
The immediate value (imm8) is taken from the second byte of the instruction, which under normal assembly is 0AH (10 decimal). However, this immediate value can be changed to produce a different result.
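The operation amounts to an integer division of AL by the immediate byte, with the quotient placed in AH and the remainder in AL. The Python sketch below models this under stated assumptions: the function name aam is hypothetical, the flag updates are not modeled, and the case imm8 = 0 is not handled (Python raises ZeroDivisionError for it).

```python
def aam(al, imm8=0x0A):
    """Model the AAM Operation pseudocode. Returns (al, ah).

    al: binary product left in AL by the preceding MUL
    imm8: second instruction byte, 0AH under normal assembly
    """
    ah, al = divmod(al, imm8)  # AH gets the quotient, AL the remainder
    return al, ah


# 7 * 9 = 63 (3FH); after AAM, AH = 6 and AL = 3.
print(aam(0x3F))        # (3, 6)
# With an immediate of 8, the product is split into base-8 digits instead.
print(aam(0x3F, 0x08))  # (7, 7)
```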
Flags Affected
The SF, ZF, and PF flags are set according to the result. The OF, AF, and CF flags are undefined.
Additional Itanium System Environment Exceptions
Itanium Reg Faults NaT Register Consumption Abort.
Exceptions (All Operating Modes)
None.