Intel ITANIUM ARCHITECTURE User Manual


Intel® Itanium® Architecture Software Developer’s Manual

Volume 4: IA-32 Instruction Set Reference

Revision 2.3

May 2010

Document Number: 323208

THIS DOCUMENT IS PROVIDED “AS IS” WITH NO WARRANTIES WHATSOEVER, INCLUDING ANY WARRANTY OF MERCHANTABILITY, FITNESS FOR ANY PARTICULAR PURPOSE, OR ANY WARRANTY OTHERWISE ARISING OUT OF ANY PROPOSAL, SPECIFICATION OR SAMPLE.

Information in this document is provided in connection with Intel products. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document. Except as provided in Intel's Terms and Conditions of Sale for such products, Intel assumes no liability whatsoever, and Intel disclaims any express or implied warranty, relating to sale and/or use of Intel products including liability or warranties relating to fitness for a particular purpose, merchantability, or infringement of any patent, copyright or other intellectual property right. Intel products are not intended for use in medical, life saving, or life sustaining applications.

Intel may make changes to specifications and product descriptions at any time, without notice.

Designers must not rely on the absence or characteristics of any features or instructions marked “reserved” or “undefined.” Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.

Processors based on the Itanium architecture may contain design defects or errors known as errata which may cause the Intel product to deviate from published specifications. Current characterized errata are available on request.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or by visiting Intel's website at http://www.intel.com.

Intel, Itanium, Pentium, VTune and MMX are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

*Other names and brands may be claimed as the property of others.

Intel® Itanium® Architecture Software Developer’s Manual, Rev. 2.3 398

Contents

1 About this Manual . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:1

1.1 Overview of Volume 1: Application Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:1

1.1.1 Part 1: Application Architecture Guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:1

1.1.2 Part 2: Optimization Guide for the Intel® Itanium® Architecture . . . . . . . . 4:1

1.2 Overview of Volume 2: System Architecture. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:2

1.2.1 Part 1: System Architecture Guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:2

1.2.2 Part 2: System Programmer’s Guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:3

1.2.3 Appendices. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:4

1.3 Overview of Volume 3: Intel® Itanium® Instruction Set Reference . . . . . . . . . . . . . . 4:4

1.4 Overview of Volume 4: IA-32 Instruction Set Reference. . . . . . . . . . . . . . . . . . . . . . . 4:4

1.5 Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:5

1.6 Related Documents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:5

1.7 Revision History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:6

2 Base IA-32 Instruction Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:11

2.1 Additional Intel® Itanium® Faults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:11

2.2 Interpreting the IA-32 Instruction Reference Pages . . . . . . . . . . . . . . . . . . . . . . . . . 4:12

2.2.1 IA-32 Instruction Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:12

2.2.2 Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:15

2.2.3 Flags Affected. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:18

2.2.4 FPU Flags Affected . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:18

2.2.5 Protected Mode Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:19

2.2.6 Real-address Mode Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:19

2.2.7 Virtual-8086 Mode Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:19

2.2.8 Floating-point Exceptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:20

2.3 IA-32 Base Instruction Reference. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:20

3 IA-32 Intel® MMX™ Technology Instruction Reference . . . . . . . . . . . . . . . . . . . . . . . . . 4:399

4 IA-32 SSE Instruction Reference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:463

4.1 IA-32 SSE Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:463

4.2 About the Intel® SSE Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:463

4.3 Single Instruction Multiple Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:464

4.4 New Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:464

4.5 SSE Registers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:465

4.6 Extended Instruction Set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:465

4.6.1 Instruction Group Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:466

4.7 IEEE Compliance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:474

4.7.1 Real Number System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:474

4.7.2 Operating on NaNs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:480

4.8 Data Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:481

4.8.1 Memory Data Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:481

4.8.2 SSE Register Data Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:481

4.9 Instruction Formats . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:483

4.10 Instruction Prefixes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:483

4.11 Reserved Behavior and Software Compatibility . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:484

4.12 Notations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:484

4.13 SIMD Integer Instruction Set Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:562

4.14 Cacheability Control Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:575

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:583


Figures

2-2 Bit Offset for BIT[EAX,21]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:18

2-3 Memory Bit Indexing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:18

2-4 Version Information in Registers EAX . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:79

3-1 Operation of the MOVD Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:401

3-2 Operation of the MOVQ Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:403

3-3 Operation of the PACKSSDW Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:405

3-4 Operation of the PACKUSWB Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:408

3-5 Operation of the PADDW Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:410

3-6 Operation of the PADDSW Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:413

3-7 Operation of the PADDUSB Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:416

3-8 Operation of the PAND Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:419

3-9 Operation of the PANDN Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:421

3-10 Operation of the PCMPEQW Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:423

3-11 Operation of the PCMPGTW Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:426

3-12 Operation of the PMADDWD Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:429

3-13 Operation of the PMULHW Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:431

3-14 Operation of the PMULLW Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:433

3-15 Operation of the POR Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:435

3-16 Operation of the PSLLW Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:437

3-17 Operation of the PSRAW Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:440

3-18 Operation of the PSRLW Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:443

3-19 Operation of the PSUBW Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:446

3-20 Operation of the PSUBSW Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:449

3-21 Operation of the PSUBUSB Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:452

3-22 High-order Unpacking and Interleaving of Bytes with the PUNPCKHBW Instruction. . . . . . 4:455

3-23 Low-order Unpacking and Interleaving of Bytes with the PUNPCKLBW Instruction . . . . . . 4:458

3-24 Operation of the PXOR Instruction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:461

4-1 Packed Single-FP Data Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:464

4-2 SSE Register Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:465

4-3 Packed Operation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:466

4-4 Scalar Operation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:466

4-5 Packed Shuffle Operation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:468

4-6 Unpack High Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:469

4-7 Unpack Low Operation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:469

4-8 Binary Real Number System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:475

4-9 Binary Floating-point Format . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:476

4-10 Real Numbers and NaNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:478

4-11 Four Packed FP Data in Memory (at address 1000H) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:481

Tables

2-1 Register Encodings Associated with the +rb, +rw, and +rd Nomenclature . . . . . . . . . . 4:13

2-2 Exception Mnemonics, Names, and Vector Numbers . . . . . . . . . . . . . . . . . . . . . 4:19

2-3 Floating-point Exception Mnemonics and Names . . . . . . . . . . . . . . . . . . . . . . . 4:20

2-4 Information Returned by CPUID Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . 4:78

2-5 Feature Flags Returned in EDX Register . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:80


2-6 FPATAN Zeros and NaNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:149

2-7 FPREM Zeros and NaNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:151

2-8 FPREM1 Zeros and NaNs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:154

2-9 FSUB Zeros and NaNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:183

2-10 FSUBR Zeros and NaNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:186

2-11 FYL2X Zeros and NaNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:199

2-12 FYL2XP1 Zeros and NaNs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:201

2-13 IDIV Operands. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:204

2-14 INT Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:218

2-15 LAR Descriptor Validity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:253

2-16 LEA Address and Operand Sizes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:258

2-17 Repeat Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:338

4-1 Real Number Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:476

4-2 Denormalization Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:478

4-3 Results of Operations with NAN Operands . . . . . . . . . . . . . . . . . . . . . . . . . 4:481

4-4 Precision and Range of SSE Datatype . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:482

4-5 Real Number and NaN Encodings. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:482

4-6 SSE Instruction Behavior with Prefixes . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:483

4-7 SIMD Integer Instructions – Behavior with Prefixes . . . . . . . . . . . . . . . . . . . . . 4:483

4-8 Cacheability Control Instruction Behavior with Prefixes . . . . . . . . . . . . . . . . . . . 4:483

4-9 Key to SSE Naming Convention. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4:485


1 About this Manual

The Intel® Itanium® architecture is a unique combination of innovative features such as explicit parallelism, predication, speculation and more. The architecture is designed to be highly scalable to fill the ever-increasing performance requirements of various server and workstation market segments. The Itanium architecture features a revolutionary 64-bit instruction set architecture (ISA) which applies a new processor architecture technology called EPIC, or Explicitly Parallel Instruction Computing. A key feature of the Itanium architecture is IA-32 instruction set compatibility.

The Intel® Itanium® Architecture Software Developer’s Manual provides a comprehensive description of the programming environment, resources, and instruction set visible to both the application and system programmer. It also describes how programmers can take advantage of the features of the Itanium architecture to help them optimize code.

1.1 Overview of Volume 1: Application Architecture

This volume defines the Itanium application architecture, including application level resources, programming environment, and the IA-32 application interface. This volume also describes optimization techniques used to generate high performance software.

1.1.1 Part 1: Application Architecture Guide

Chapter 1, “About this Manual” provides an overview of all volumes in the Intel® Itanium® Architecture Software Developer’s Manual.

Chapter 2, “Introduction to the Intel® Itanium® Architecture” provides an overview of the architecture.

Chapter 3, “Execution Environment” describes the Itanium register set used by applications and the memory organization models.

Chapter 4, “Application Programming Model” gives an overview of the behavior of Itanium application instructions (grouped into related functions).

Chapter 5, “Floating-point Programming Model” describes the Itanium floating-point architecture (including integer multiply).

Chapter 6, “IA-32 Application Execution Model in an Intel® Itanium® System Environment” describes the operation of IA-32 instructions within the Itanium System Environment from the perspective of an application programmer.

1.1.2 Part 2: Optimization Guide for the Intel® Itanium® Architecture

Chapter 1, “About the Optimization Guide” gives an overview of the optimization guide.

Volume 4: About this Manual 4:1

Chapter 2, “Introduction to Programming for the Intel® Itanium® Architecture” provides an overview of the application programming environment for the Itanium architecture.

Chapter 3, “Memory Reference” discusses features and optimizations related to control and data speculation.

Chapter 4, “Predication, Control Flow, and Instruction Stream” describes optimization features related to predication, control flow, and branch hints.

Chapter 5, “Software Pipelining and Loop Support” provides a detailed discussion on optimizing loops through use of software pipelining.

Chapter 6, “Floating-point Applications” discusses current performance limitations in floating-point applications and features that address these limitations.

1.2 Overview of Volume 2: System Architecture

This volume defines the Itanium system architecture, including system level resources and programming state, interrupt model, and processor firmware interface. This volume also provides a useful system programmer's guide for writing high performance system software.

1.2.1 Part 1: System Architecture Guide

Chapter 1, “About this Manual” provides an overview of all volumes in the Intel® Itanium® Architecture Software Developer’s Manual.

Chapter 2, “Intel® Itanium® System Environment” introduces the environment designed to support execution of Itanium architecture-based operating systems running IA-32 or Itanium architecture-based applications.

Chapter 3, “System State and Programming Model” describes the Itanium architectural state which is visible only to an operating system.

Chapter 4, “Addressing and Protection” defines the resources available to the operating system for virtual to physical address translation, virtual aliasing, physical addressing, and memory ordering.

Chapter 5, “Interruptions” describes all interruptions that can be generated by a processor based on the Itanium architecture.

Chapter 6, “Register Stack Engine” describes the architectural mechanism which automatically saves and restores the stacked subset (GR32 – GR127) of the general register file.

Chapter 7, “Debugging and Performance Monitoring” is an overview of the performance monitoring and debugging resources that are available in the Itanium architecture.

Chapter 8, “Interruption Vector Descriptions” lists all interruption vectors.


Chapter 9, “IA-32 Interruption Vector Descriptions” lists IA-32 exceptions, interrupts and intercepts that can occur during IA-32 instruction set execution in the Itanium System Environment.

Chapter 10, “Itanium® Architecture-based Operating System Interaction Model with IA-32 Applications” defines the operation of IA-32 instructions within the Itanium System Environment from the perspective of an Itanium architecture-based operating system.

Chapter 11, “Processor Abstraction Layer” describes the firmware layer which abstracts processor implementation-dependent features.

1.2.2 Part 2: System Programmer’s Guide

Chapter 1, “About the System Programmer’s Guide” gives an introduction to the second section of the system architecture guide.

Chapter 2, “MP Coherence and Synchronization” describes multiprocessing synchronization primitives and the Itanium memory ordering model.

Chapter 3, “Interruptions and Serialization” describes how the processor serializes execution around interruptions and what state is preserved and made available to low-level system code when interruptions are taken.

Chapter 4, “Context Management” describes how operating systems need to preserve Itanium register contents and state. This chapter also describes system architecture mechanisms that allow an operating system to reduce the number of registers that need to be spilled/filled on interruptions, system calls, and context switches.

Chapter 5, “Memory Management” introduces various memory management strategies.

Chapter 6, “Runtime Support for Control and Data Speculation” describes the operating system support that is required for control and data speculation.

Chapter 7, “Instruction Emulation and Other Fault Handlers” describes a variety of instruction emulation handlers that Itanium architecture-based operating systems are expected to support.

Chapter 8, “Floating-point System Software” discusses how processors based on the Itanium architecture handle floating-point numeric exceptions and how the software stack provides complete IEEE-754 compliance.

Chapter 9, “IA-32 Application Support” describes the support an Itanium architecture-based operating system needs to provide to host IA-32 applications.

Chapter 10, “External Interrupt Architecture” describes the external interrupt architecture with a focus on how external asynchronous interrupt handling can be controlled by software.

Chapter 11, “I/O Architecture” describes the I/O architecture with a focus on platform issues and support for the existing IA-32 I/O port space.


Chapter 12, “Performance Monitoring Support” describes the performance monitor architecture with a focus on what kind of support is needed from Itanium architecture-based operating systems.

Chapter 13, “Firmware Overview” introduces the firmware model, and how various firmware layers (PAL, SAL, UEFI, ACPI) work together to enable processor and system initialization, and operating system boot.

1.2.3 Appendices

Appendix A, “Code Examples” provides OS boot flow sample code.

1.3 Overview of Volume 3: Intel® Itanium® Instruction Set Reference

This volume is a comprehensive reference to the Itanium instruction set, including instruction format/encoding.

Chapter 1, “About this Manual” provides an overview of all volumes in the Intel® Itanium® Architecture Software Developer’s Manual.

Chapter 2, “Instruction Reference” provides a detailed description of all Itanium instructions, organized in alphabetical order by assembly language mnemonic.

Chapter 3, “Pseudo-Code Functions” provides a table of pseudo-code functions which are used to define the behavior of the Itanium instructions.

Chapter 4, “Instruction Formats” describes the instruction encodings and formats.

Chapter 5, “Resource and Dependency Semantics” summarizes the dependency rules that are applicable when generating code for processors based on the Itanium architecture.

1.4 Overview of Volume 4: IA-32 Instruction Set Reference

This volume is a comprehensive reference to the IA-32 instruction set, including instruction format/encoding.

Chapter 1, “About this Manual” provides an overview of all volumes in the Intel® Itanium® Architecture Software Developer’s Manual.

Chapter 2, “Base IA-32 Instruction Reference” provides a detailed description of all base IA-32 instructions, organized in alphabetical order by assembly language mnemonic.


Chapter 3, “IA-32 Intel® MMX™ Technology Instruction Reference” provides a detailed description of all IA-32 Intel® MMX™ technology instructions designed to increase performance of multimedia intensive applications, and is organized in alphabetical order by assembly language mnemonic.

Chapter 4, “IA-32 SSE Instruction Reference” provides a detailed description of all IA-32 SSE instructions designed to increase performance of multimedia intensive applications, and is organized in alphabetical order by assembly language mnemonic.

1.5 Terminology

The following definitions are for terms related to the Itanium architecture and will be used throughout this document:

Instruction Set Architecture (ISA) – Defines application and system level resources. These resources include instructions and registers.

Itanium Architecture – The new ISA with 64-bit instruction capabilities, new performance-enhancing features, and support for the IA-32 instruction set.

IA-32 Architecture – The 32-bit and 16-bit Intel architecture as described in the Intel® 64 and IA-32 Architectures Software Developer’s Manual.

Itanium System Environment – The operating system environment that supports the execution of both IA-32 and Itanium architecture-based code.

IA-32 System Environment – The operating system privileged environment and resources as defined by the Intel Architecture Software Developer’s Manual. Resources include virtual paging, control registers, debugging, performance monitoring, machine checks, and the set of privileged instructions.

Itanium® Architecture-based Firmware – The Processor Abstraction Layer (PAL) and System Abstraction Layer (SAL).

Processor Abstraction Layer (PAL) – The firmware layer which abstracts processor features that are implementation dependent.

System Abstraction Layer (SAL) – The firmware layer which abstracts system features that are implementation dependent.

1.6 Related Documents

The following documents can be downloaded at Intel’s Developer Site at http://developer.intel.com:

• Dual-Core Update to the Intel® Itanium® 2 Processor Reference Manual for Software Development and Optimization – Document number 308065 provides model-specific information about the dual-core Itanium processors.

• Intel® Itanium® 2 Processor Reference Manual for Software Development and Optimization – This document (Document number 251110) describes model-specific architectural features incorporated into the Intel® Itanium® 2 processor, the second processor based on the Itanium architecture.

• Intel® Itanium® Processor Reference Manual for Software Development – This document (Document number 245320) describes model-specific architectural features incorporated into the Intel® Itanium® processor, the first processor based on the Itanium architecture.

• Intel® 64 and IA-32 Architectures Software Developer’s Manual – This set of manuals describes the Intel 32-bit architecture. They are available from the Intel Literature Department by calling 1-800-548-4725 and requesting Document Numbers 243190, 243191 and 243192.

• Intel® Itanium® Software Conventions and Runtime Architecture Guide – This document (Document number 245358) defines general information necessary to compile, link, and execute a program on an Itanium architecture-based operating system.

• Intel® Itanium® Processor Family System Abstraction Layer Specification – This document (Document number 245359) specifies requirements to develop platform firmware for Itanium architecture-based systems.

The following document can be downloaded at the Unified EFI Forum website at http://www.uefi.org:

• Unified Extensible Firmware Interface Specification – This document defines a new model for the interface between operating systems and platform firmware.

1.7 Revision History

Date of Revision: March 2010

Revision Number: 2.3

Description:

Added information about illegal virtualization optimization combinations and IIPA requirements.

Added Resource Utilization Counter and PAL_VP_INFO.

PAL_VP_INIT and VPD.vpr changes.

New PAL_VPS_RESUME_HANDLER parameter to indicate RSE Current Frame Load Enable setting at the target instruction.

PAL_VP_INIT_ENV implementation-specific configuration option.

Minimum Virtual address increased to 54 bits.

New PAL_MC_ERROR_INFO health indicator.

New PAL_MC_ERROR_INJECT implementation-specific bit fields.

MOV-to_SR.L reserved field checking.

Added virtual machine disable.

Added variable frequency mode additions to ACPI P-state description.

Removed pal_proc_vector argument from PAL_VP_SAVE and PAL_VP_RESTORE.

Added PAL_PROC_SET_FEATURES data speculation disable.

Added Interruption Instruction Bundle registers.

Min-state save area size change.

PAL_MC_DYNAMIC_STATE changes.

PAL_PROC_SET_FEATURES data poisoning promotion changes.

ACPI P-state clarifications.

Synchronization requirements for virtualization opcode optimization.

New priority hint and multi-threading hint recommendations.


Date of Revision: August 2005

Revision Number: 2.2

Description:

Allow register fields in CR.LID register to be read-only and CR.LID checking on interruption messages by processors optional. See Vol 2, Part I, Ch 5 “Interruptions” and Section 11.2.2 PALE_RESET Exit State for details.

Relaxed reserved and ignored fields checkings in IA-32 application registers in Vol 1 Ch 6 and Vol 2, Part I, Ch 10.

Introduced visibility constraints between stores and local purges to ensure TLB consistency for UP VHPT update and local purge scenarios. See Vol 2, Part I, Ch 4 and description of ptc.l instruction in Vol 3 for details.

Architecture extensions for processor Power/Performance states (P-states). See Vol 2 PAL Chapter for details.

Introduced Unimplemented Instruction Address fault.

Relaxed ordering constraints for VHPT walks. See Vol 2, Part I, Ch 4 and 5 for details.

Architecture extensions for processor virtualization.

All instructions which must be last in an instruction group result in undefined behavior when this rule is violated.

Added architectural sequence that guarantees increasing ITC and PMD values on successive reads.

Addition of PAL_BRAND_INFO, PAL_GET_HW_POLICY, PAL_MC_ERROR_INJECT, PAL_MEMORY_BUFFER, PAL_SET_HW_POLICY and PAL_SHUTDOWN procedures.

Allows IPI-redirection feature to be optional.

Undefined behavior for 1-byte accesses to the non-architected regions in the IPI block.

Modified insertion behavior for TR overlaps. See Vol 2, Part I, Ch 4 for details.

“Bus parking” feature is now optional for PAL_BUS_GET_FEATURES.

Introduced low-power synchronization primitive using hint instruction.

FR32-127 is now preserved in PAL calling convention.

New return value from PAL_VM_SUMMARY procedure to indicate the number of multiple concurrent outstanding TLB purges.

Performance Monitor Data (PMD) registers are no longer sign-extended.

New memory attribute transition sequence for memory on-line delete. See Vol 2, Part I, Ch 4 for details.

Added 'shared error' (se) bit to the Processor State Parameter (PSP) in PAL_MC_ERROR_INFO procedure.

Clarified PMU interrupts as edge-triggered.

Modified ‘proc_number’ parameter in PAL_LOGICAL_TO_PHYSICAL procedure.

Modified pal_copy_info alignment requirements.

New bit in PAL_PROC_GET_FEATURES for variable P-state performance.

Clarified descriptions for check_target_register and check_target_register_sof.

Various fixes in dependency tables in Vol 3 Ch 5.

Clarified effect of sending IPIs to non-existent processor in Vol 2, Part I, Ch 5.

Clarified instruction serialization requirements for interruptions in Vol 2, Part II, Ch 3.

Updated performance monitor context switch routine in Vol 2, Part I, Ch 7.

Date of Revision: August 2002
Revision Number: 2.1
Description:

• Added Predicate Behavior of alloc Instruction Clarification (Section 4.1.2, Part I, Volume 1; Section 2.2, Part I, Volume 3).

• Added New fc.i Instruction (Section 4.4.6.1, and 4.4.6.2, Part I, Volume 1; Section 4.3.3, 4.4.1, 4.4.5, 4.4.6, 4.4.7, 5.5.2, and 7.1.2, Part I, Volume 2; Section 2.5, 2.5.1, 2.5.2, 2.5.3, and 4.5.2.1, Part II, Volume 2; Section 2.2, 3, 4.1, 4.4.6.5, and 4.4.10.10, Part I, Volume 3).

• Added Interval Time Counter (ITC) Fault Clarification (Section 3.3.2, Part I, Volume 2).

• Added Interruption Control Registers Clarification (Section 3.3.5, Part I, Volume 2).

• Added Spontaneous NaT Generation on Speculative Load (ld.s) (Section 5.5.5 and 11.9, Part I, Volume 2; Section 2.2 and 3, Part I, Volume 3).

• Added Performance Counter Standardization (Sections 7.2.3 and 11.6, Part I, Volume 2).

• Added Freeze Bit Functionality in Context Switching and Interrupt Generation Clarification (Sections 7.2.1, 7.2.2, 7.2.4.1, and 7.2.4.2, Part I, Volume 2).

• Added IA_32_Exception (Debug) IIPA Description Change (Section 9.2, Part I, Volume 2).

• Added capability for Allowing Multiple PAL_A_SPEC and PAL_B Entries in the Firmware Interface Table (Section 11.1.6, Part I, Volume 2).

• Added BR1 to Min-state Save Area (Sections 11.3.2.3 and 11.3.3, Part I, Volume 2).

• Added Fault Handling Semantics for lfetch.fault Instruction (Section 2.2, Part I, Volume 3).

Date of Revision: December 2001
Revision Number: 2.0
Description:

Volume 1:

• Faults in ld.c that hits ALAT clarification (Section 4.4.5.3.1).

• IA-32 related changes (Section 6.2.5.4, Section 6.2.3, Section 6.2.4, Section 6.2.5.3).

• Load instructions change (Section 4.4.1).

Volume 2:

• Class pr-writers-int clarification (Table A-5).

• PAL_MC_DRAIN clarification (Section 4.4.6.1).

• VHPT walk and forward progress change (Section 4.1.1.2).

• IA-32 IBR/DBR match clarification (Section 7.1.1).

• ISR figure changes (pp. 8-5, 8-26, 8-33 and 8-36).

• PAL_CACHE_FLUSH return argument change – added new status return argument (Section 11.8.3).

• PAL self-test Control and PAL_A procedure requirement change – added new arguments, figures, requirements (Section 11.2).

• PAL_CACHE_FLUSH clarifications (Chapter 11).

• Non-speculative reference clarification (Section 4.4.6).

• RID and Preferred Page Size usage clarification (Section 4.1).

• VHPT read atomicity clarification (Section 4.1).

• IIP and WC flush clarification (Section 4.4.5).

• Revised RSE and PMC typographical errors (Section 6.4).

• Revised DV table (Section A.4).

• Memory attribute transitions – added new requirements (Section 4.4).

• MCA for WC/UC aliasing change (Section 4.4.1).

• Bus lock deprecation – changed behavior of DCR ‘lc’ bit (Section 3.3.4.1, Section 10.6.8, Section 11.8.3).

• PAL_PROC_GET/SET_FEATURES changes – extend calls to allow implementation-specific feature control (Section 11.8.3).

• Split PAL_A architecture changes (Section 11.1.6).

• Simple barrier synchronization clarification (Section 13.4.2).

• Limited speculation clarification – added hardware-generated speculative references (Section 4.4.6).

• PAL memory accesses and restrictions clarification (Section 11.9).

• PSP validity on INITs from PAL_MC_ERROR_INFO clarification (Section 11.8.3).

• Speculation attributes clarification (Section 4.4.6).

• PAL_A FIT entry, PAL_VM_TR_READ, PSP, PAL_VERSION clarifications (Sections 11.8.3 and 11.3.2.1).

• TLB searching clarifications (Section 4.1).

• IA-32 related changes (Section 10.3, Section 10.3.2, Section 10.3.2, Section 10.3.3.1, Section 10.10.1).

• IPSR.ri and ISR.ei changes (Table 3-2, Section 3.3.5.1, Section 3.3.5.2, Section 5.5, Section 8.3, and Section 2.2).

Volume 3:

• IA-32 CPUID clarification (p. 5-71).

• Revised figures for extract, deposit, and alloc instructions (Section 2.2).

• RCPPS, RCPSS, RSQRTPS, and RSQRTSS clarification (Section 7.12).

• IA-32 related changes (Section 5.3).

• tak, tpa change (Section 2.2).

Date of Revision: July 2000
Revision Number: 1.1
Description:

Volume 1:

• Processor Serial Number feature removed (Chapter 3).

• Clarification on exceptions to instruction dependency (Section 3.4.3).

Volume 2:

• Clarifications regarding “reserved” fields in ITIR (Chapter 3).

• Instruction and Data translation must be enabled for executing IA-32 instructions (Chapters 3, 4 and 10).

• FCR/FDR mappings, and clarification to the value of PSR.ri after an RFI (Chapters 3 and 4).

• Clarification regarding ordering data dependency.

• Out-of-order IPI delivery is now allowed (Chapters 4 and 5).

• Content of EFLAG field changed in IIM (p. 9-24).

• PAL_CHECK and PAL_INIT calls – exit state changes (Chapter 11).

• PAL_CHECK processor state parameter changes (Chapter 11).

• PAL_BUS_GET/SET_FEATURES calls – added two new bits (Chapter 11).

• PAL_MC_ERROR_INFO call – Changes made to enhance and simplify the call to provide more information regarding machine check (Chapter 11).

• PAL_ENTER_IA_32_Env call changes – entry parameter represents the entry order; SAL needs to initialize all the IA-32 registers properly before making this call (Chapter 11).

• PAL_CACHE_FLUSH – added a new cache_type argument (Chapter 11).

• PAL_SHUTDOWN – removed from list of PAL calls (Chapter 11).

• Clarified memory ordering changes (Chapter 13).

• Clarification in dependence violation table (Appendix A).

Volume 3:

• fmix instruction page figures corrected (Chapter 2).

• Clarification of “reserved” fields in ITIR (Chapters 2 and 3).

• Modified conditions for alloc/loadrs/flushrs instruction placement in bundle/instruction group (Chapters 2 and 4).

• IA-32 JMPE instruction page typo fix (p. 5-238).

• Processor Serial Number feature removed (Chapter 5).

Date of Revision: January 2000
Revision Number: 1.0
Description:

• Initial release of document.

2 Base IA-32 Instruction Reference

This section lists all IA-32 instructions and their behavior in the Itanium System Environment and IA-32 System Environments on a processor based on the Itanium architecture. Unless noted otherwise, all IA-32, MMX technology, and SSE instructions operate as defined in the Intel® 64 and IA-32 Architectures Software Developer’s Manual.

This volume describes the complete IA-32 Architecture instruction set, including the integer, floating-point, MMX technology and SSE technology, and system instructions. The instruction descriptions are arranged in alphabetical order. For each instruction, the forms are given for each operand combination, including the opcode, operands required, and a description. Also given for each instruction are a description of the instruction and its operands, an operational description, a description of the effect of the instructions on flags in the EFLAGS register, and a summary of the exceptions that can be generated.

For all IA-32 instructions, the following relationships hold:

• Writes – Writes of any IA-32 general-purpose, floating-point, MMX technology, or SSE registers by IA-32 instructions are reflected in the Itanium registers defined to hold that IA-32 state when the IA-32 instruction set completes execution.

• Reads – Reads of any IA-32 general-purpose, floating-point, MMX technology, or SSE registers by IA-32 instructions see the state of the Itanium registers defined to hold the IA-32 state after entering the IA-32 instruction set.

• State mappings – IA-32 numeric instructions are controlled by and reflect their status in FCW, FSW, FTW, FCS, FIP, FOP, FDS and FEA. On exit from the IA-32 instruction set, Itanium numeric status and control resources defined to hold IA-32 state reflect the results of all prior IA-32 numeric instructions in FCR, FSR, FIR and FDR. Itanium numeric status and control resources defined to hold IA-32 state are honored by IA-32 numeric instructions when entering the IA-32 instruction set.

2.1 Additional Intel® Itanium® Faults

The following fault behavior is defined for all IA-32 instructions in the Itanium System Environment:

• IA-32 Faults – All IA-32 faults are performed as defined in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, unless otherwise noted. IA-32 faults are delivered on the IA_32_Exception interruption vector.

• IA-32 GPFault – Null segments are signified by the segment descriptor register’s P-bit being set to zero. IA-32 memory references through DSD, ESD, FSD, and GSD with the P-bit set to zero result in an IA-32 GPFault.

• Itanium Low FP Reg Fault – If PSR.dfl is 1, execution of any IA-32 MMX technology, SSE, or floating-point instruction results in a Disabled FP Register fault (regardless of whether FR2-31 is referenced).

• Itanium High FP Reg Fault – If PSR.dfh is 1, execution of the first target IA-32 instruction following a br.ia or rfi results in a Disabled FP Register fault (regardless of whether FR32-127 is referenced).

• Itanium Instruction Mem Faults – The following additional Itanium memory faults can be generated on each virtual page referenced when fetching IA-32 or MMX technology or SSE instructions for execution:

• Alternative instruction TLB fault

• VHPT instruction fault

• Instruction TLB fault

• Instruction Page Not Present fault

• Instruction NaT Page Consumption Abort

• Instruction Key Miss fault

• Instruction Key Permission fault

• Instruction Access Rights fault

• Instruction Access Bit fault

• Itanium Data Mem Faults – The following additional Itanium memory faults can be generated on each virtual page touched when reading or writing memory operands from the IA-32 instruction set including MMX technology and SSE instructions:

• Nested TLB fault

• Alternative data TLB fault

• VHPT data fault

• Data TLB fault

• Data Page Not Present fault

• Data NaT Page Consumption Abort

• Data Key Miss fault

• Data Key Permission fault

• Data Access Rights fault

• Data Dirty bit fault

• Data Access bit fault

2.2 Interpreting the IA-32 Instruction Reference Pages

This section describes the information contained in the various sections of the instruction reference pages that make up the majority of this chapter. It also explains the notational conventions and abbreviations used in these sections.

2.2.1 IA-32 Instruction Format

The following is an example of the format used for each Intel architecture instruction description in this chapter.

CMC—Complement Carry Flag

Opcode Instruction Description

F5 CMC Complement carry flag


2.2.1.1 Opcode Column

The “Opcode” column gives the complete object code produced for each form of the instruction. When possible, the codes are given as hexadecimal bytes, in the same order in which they appear in memory. Definitions of entries other than hexadecimal bytes are as follows:

• /digit – A digit between 0 and 7 indicates that the ModR/M byte of the instruction uses only the r/m (register or memory) operand. The reg field contains the digit that provides an extension to the instruction's opcode.

• /r – Indicates that the ModR/M byte of the instruction contains both a register operand and an r/m operand.

• cb, cw, cd, cp – A 1-byte (cb), 2-byte (cw), 4-byte (cd), or 6-byte (cp) value following the opcode that is used to specify a code offset and possibly a new value for the code segment register.

• ib, iw, id – A 1-byte (ib), 2-byte (iw), or 4-byte (id) immediate operand to the instruction that follows the opcode, ModR/M bytes or scale-indexing bytes. The opcode determines if the operand is a signed value. All words and doublewords are given with the low-order byte first.

• +rb, +rw, +rd – A register code, from 0 through 7, added to the hexadecimal byte given at the left of the plus sign to form a single opcode byte. The register codes are given in Table 2-1.

• +i – A number used in floating-point instructions when one of the operands is ST(i) from the FPU register stack. The number i (which can range from 0 to 7) is added to the hexadecimal byte given at the left of the plus sign to form a single opcode byte.

Table 2-1. Register Encodings Associated with the +rb, +rw, and +rd Nomenclature

rb        rw        rd
AL = 0    AX = 0    EAX = 0
CL = 1    CX = 1    ECX = 1
DL = 2    DX = 2    EDX = 2
BL = 3    BX = 3    EBX = 3
AH = 4    SP = 4    ESP = 4
CH = 5    BP = 5    EBP = 5
DH = 6    SI = 6    ESI = 6
BH = 7    DI = 7    EDI = 7
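As an illustration of the +rd nomenclature, the following minimal Python sketch forms a single opcode byte by adding a register code from Table 2-1 to a base opcode byte. The names RD_CODES and encode_plus_r are hypothetical helpers chosen for this example only. The sample base opcode is 50H, on the assumption that PUSH r32 is listed in the form 50+rd.

```python
# Register codes for the +rd nomenclature, copied from Table 2-1.
RD_CODES = {"EAX": 0, "ECX": 1, "EDX": 2, "EBX": 3,
            "ESP": 4, "EBP": 5, "ESI": 6, "EDI": 7}


def encode_plus_r(base_opcode: int, reg_code: int) -> int:
    """Add a register code (0 through 7) to the base opcode byte."""
    if not 0 <= reg_code <= 7:
        raise ValueError("register code must be in the range 0..7")
    return base_opcode + reg_code


# With a base opcode of 50H, selecting EBX (code 3) gives the byte 53H.
push_ebx = encode_plus_r(0x50, RD_CODES["EBX"])
print(f"{push_ebx:02X}H")  # prints 53H
```

The same helper applies to the +rb and +rw forms, since all three columns of Table 2-1 share the codes 0 through 7.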

2.2.1.2 Instruction Column

The “Instruction” column gives the syntax of the instruction statement as it would appear in an ASM386 program. The following is a list of the symbols used to represent operands in the instruction statements:

• rel8 – A relative address in the range from 128 bytes before the end of the instruction to 127 bytes after the end of the instruction.

• rel16 and rel32 – A relative address within the same code segment as the instruction assembled. The rel16 symbol applies to instructions with an operand-size attribute of 16 bits; the rel32 symbol applies to instructions with an operand-size attribute of 32 bits.


• ptr16:16 and ptr16:32 – A far pointer, typically in a code segment different from that of the instruction. The notation 16:16 indicates that the value of the pointer has two parts. The value to the left of the colon is a 16-bit selector or value destined for the code segment register. The value to the right corresponds to the offset within the destination segment. The ptr16:16 symbol is used when the instruction's operand-size attribute is 16 bits; the ptr16:32 symbol is used when the operand-size attribute is 32 bits.

• r8 – One of the byte general-purpose registers AL, CL, DL, BL, AH, CH, DH, or BH.

• r16 – One of the word general-purpose registers AX, CX, DX, BX, SP, BP, SI, or DI.

• r32 – One of the doubleword general-purpose registers EAX, ECX, EDX, EBX, ESP, EBP, ESI, or EDI.

• imm8 – An immediate byte value. The imm8 symbol is a signed number between –128 and +127 inclusive. For instructions in which imm8 is combined with a word or doubleword operand, the immediate value is sign-extended to form a word or doubleword. The upper byte of the word is filled with the topmost bit of the immediate value.

• imm16 – An immediate word value used for instructions whose operand-size attribute is 16 bits. This is a number between –32,768 and +32,767 inclusive.

• imm32 – An immediate doubleword value used for instructions whose operand-size attribute is 32 bits. It allows the use of a number between +2,147,483,647 and -2,147,483,648 inclusive.

• r/m8 – A byte operand that is either the contents of a byte general-purpose register (AL, BL, CL, DL, AH, BH, CH, and DH), or a byte from memory.

• r/m16 – A word general-purpose register or memory operand used for instructions whose operand-size attribute is 16 bits. The word general-purpose registers are: AX, BX, CX, DX, SP, BP, SI, and DI. The contents of memory are found at the address provided by the effective address computation.

• r/m32 – A doubleword general-purpose register or memory operand used for instructions whose operand-size attribute is 32 bits. The doubleword general-purpose registers are: EAX, EBX, ECX, EDX, ESP, EBP, ESI, and EDI. The contents of memory are found at the address provided by the effective address computation.

• m – A 16- or 32-bit operand in memory.

• m8 – A byte operand in memory, usually expressed as a variable or array name, but pointed to by the DS:(E)SI or ES:(E)DI registers. This nomenclature is used only with the string instructions and the XLAT instruction.

• m16 – A word operand in memory, usually expressed as a variable or array name, but pointed to by the DS:(E)SI or ES:(E)DI registers. This nomenclature is used only with the string instructions.

• m32 – A doubleword operand in memory, usually expressed as a variable or array name, but pointed to by the DS:(E)SI or ES:(E)DI registers. This nomenclature is used only with the string instructions.

• m64 – A quadword operand in memory. This nomenclature is used only with the CMPXCHG8B instruction.

• m16:16, m16:32 – A memory operand containing a far pointer composed of two numbers. The number to the left of the colon corresponds to the pointer's segment selector. The number to the right corresponds to its offset.

• m16&32, m16&16, m32&32 – A memory operand consisting of data item pairs whose sizes are indicated on the left and the right side of the ampersand. All memory addressing modes are allowed. The m16&16 and m32&32 operands are used by the BOUND instruction to provide an operand containing upper and lower bounds for array indices. The m16&32 operand is used by LIDT and LGDT to provide a word with which to load the limit field, and a doubleword with which to load the base field of the corresponding GDTR and IDTR registers.

• moffs8, moffs16, moffs32 – A simple memory variable (memory offset) of type byte, word, or doubleword used by some variants of the MOV instruction. The actual address is given by a simple offset relative to the segment base. No ModR/M byte is used in the instruction. The number shown with moffs indicates its size, which is determined by the address-size attribute of the instruction.

• Sreg – A segment register. The segment register bit assignments are ES=0, CS=1, SS=2, DS=3, FS=4, and GS=5.

• m32real, m64real, m80real – A single-, double-, and extended-real (respectively) floating-point operand in memory.

• m16int, m32int, m64int – A word-, short-, and long-integer (respectively) floating-point operand in memory.

• ST or ST(0) – The top element of the FPU register stack.

• ST(i) – The i-th element from the top of the FPU register stack (i = 0 through 7).

• mm – An MMX technology register. The 64-bit MMX technology registers are: MM0 through MM7.

• mm/m32 – The low order 32 bits of an MMX technology register or a 32-bit memory operand. The 64-bit MMX technology registers are: MM0 through MM7. The contents of memory are found at the address provided by the effective address computation.

• mm/m64 – An MMX technology register or a 64-bit memory operand. The 64-bit MMX technology registers are: MM0 through MM7. The contents of memory are found at the address provided by the effective address computation.
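The rel8 operand described in the list above is measured from the end of the instruction, which is easy to get wrong by one instruction length. The following Python sketch shows the arithmetic under two stated assumptions: a 32-bit offset that wraps modulo 2^32, and a two-byte short jump encoded as EB followed by the rel8 byte. The function names are hypothetical and exist only for this illustration.

```python
def sign_extend_8(byte: int) -> int:
    """Interpret an 8-bit value (0..255) as a signed number in -128..127."""
    return byte - 0x100 if byte & 0x80 else byte


def rel8_target(instr_offset: int, instr_len: int, rel8_byte: int) -> int:
    """Target = offset of the end of the instruction + signed rel8.

    The result is wrapped to 32 bits, which is an assumption of this sketch.
    """
    end_of_instr = instr_offset + instr_len
    return (end_of_instr + sign_extend_8(rel8_byte)) & 0xFFFFFFFF


# A two-byte short jump at offset 1000H with rel8 = FEH (-2): the end of the
# instruction is 1002H, so the jump targets its own first byte.
print(hex(rel8_target(0x1000, 2, 0xFE)))  # prints 0x1000
```

The extreme values 80H and 7FH reach 128 bytes before and 127 bytes after the end of the instruction, matching the range stated for rel8.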

2.2.1.3 Description Column

The “Description” column following the “Instruction” column briefly explains the various forms of the instruction. The following “Description” and “Operation” sections contain more details of the instruction's operation.

2.2.1.4 Description

The “Description” section describes the purpose of the instructions and the required operands. It also discusses the effect of the instruction on flags.

2.2.2 Operation

The “Operation” section contains an algorithmic description (written in pseudo-code) of the instruction. The pseudo-code uses a notation similar to the Algol or Pascal language. The algorithms are composed of the following elements:

• Comments are enclosed within the symbol pairs “(*” and “*)”.

• Compound statements are enclosed in keywords, such as IF, THEN, ELSE, and FI for an if statement, DO and OD for a do statement, or CASE... OF and ESAC for a case statement.

• A register name implies the contents of the register. A register name enclosed in brackets implies the contents of the location whose address is contained in that register. For example, ES:[DI] indicates the contents of the location whose ES segment relative address is in register DI. [SI] indicates the contents of the address contained in register SI relative to SI’s default segment (DS) or overridden segment.

• Parentheses around the “E” in a general-purpose register name, such as (E)SI, indicate that an offset is read from the SI register if the current address-size attribute is 16, or from the ESI register if the address-size attribute is 32.

• Brackets are also used for memory operands, where they mean that the contents of the memory location is a segment-relative offset. For example, [SRC] indicates that the contents of the source operand is a segment-relative offset.

• A ← B; indicates that the value of B is assigned to A.

• The symbols =, ≠, ≥, and ≤ are relational operators used to compare two values, meaning equal, not equal, greater or equal, less or equal, respectively. A relational expression such as A = B is TRUE if the value of A is equal to B; otherwise it is FALSE.

• The expressions “<< COUNT” and “>> COUNT” indicate that the destination operand should be shifted left or right, respectively, by the number of bits indicated by the count operand.

The following identifiers are used in the algorithmic descriptions:

• OperandSize and AddressSize – The OperandSize identifier represents the operand-size attribute of the instruction, which is either 16 or 32 bits. The AddressSize identifier represents the address-size attribute, which is either 16 or 32 bits. For example, the following pseudo-code indicates that the operand-size attribute depends on the form of the CMPS instruction used.

IF instruction = CMPSW
    THEN OperandSize ← 16;
    ELSE
        IF instruction = CMPSD
            THEN OperandSize ← 32;
        FI;
FI;

See “Operand-Size and Address-Size Attributes” in Chapter 3 of the Intel Architecture Software Developer’s Manual, Volume 1, for general guidelines on how these attributes are determined.

• StackAddrSize – Represents the stack address-size attribute associated with the instruction, which has a value of 16 or 32 bits (see “Address-Size Attribute for Stack” in Chapter 4 of the Intel Architecture Software Developer’s Manual, Volume 1).

• SRC – Represents the source operand.

• DEST – Represents the destination operand.
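To make the notation concrete, the sketch below restates two of the conventions above in Python: the CMPS pseudo-code that selects OperandSize, and the (E)SI convention that reads SI or ESI depending on the address-size attribute. The function names are hypothetical. Raising an error for other mnemonics is a choice made for this sketch only; the pseudo-code in the text simply leaves OperandSize unassigned in that case.

```python
def cmps_operand_size(mnemonic: str) -> int:
    """Mirror the pseudo-code: CMPSW selects 16 bits, CMPSD selects 32 bits."""
    if mnemonic == "CMPSW":
        return 16
    if mnemonic == "CMPSD":
        return 32
    # Not part of the pseudo-code; added so the sketch fails loudly.
    raise ValueError("only CMPSW and CMPSD are covered by this sketch")


def source_index(esi: int, address_size: int) -> int:
    """(E)SI: use SI (low 16 bits of ESI) or the full ESI, by address size."""
    if address_size == 16:
        return esi & 0xFFFF
    if address_size == 32:
        return esi & 0xFFFFFFFF
    raise ValueError("address size must be 16 or 32")


print(cmps_operand_size("CMPSD"))           # prints 32
print(hex(source_index(0x12345678, 16)))    # prints 0x5678
```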

The following functions are used in the algorithmic descriptions:

• ZeroExtend(value) – Returns a value zero-extended to the operand-size attribute of the instruction. For example, if the operand-size attribute is 32, zero extending a byte value of -10 converts the byte from F6H to a doubleword value of 000000F6H. If the value passed to the ZeroExtend function and the operand-size attribute are the same size, ZeroExtend returns the value unaltered.


• SignExtend(value) – Returns a value sign-extended to the operand-size attribute of the instruction. For example, if the operand-size attribute is 32, sign extending a byte containing the value -10 converts the byte from F6H to a doubleword value of FFFFFFF6H. If the value passed to the SignExtend function and the operand-size attribute are the same size, SignExtend returns the value unaltered.

• SaturateSignedWordToSignedByte – Converts a signed 16-bit value to a signed 8-bit value. If the signed 16-bit value is less than -128, it is represented by the saturated value -128 (80H); if it is greater than 127, it is represented by the saturated value 127 (7FH).

• SaturateSignedDwordToSignedWord – Converts a signed 32-bit value to a signed 16-bit value. If the signed 32-bit value is less than -32768, it is represented by the saturated value -32768 (8000H); if it is greater than 32767, it is represented by the saturated value 32767 (7FFFH).

• SaturateSignedWordToUnsignedByte – Converts a signed 16-bit value to an unsigned 8-bit value. If the signed 16-bit value is less than zero, it is represented by the saturated value zero (00H); if it is greater than 255, it is represented by the saturated value 255 (FFH).

• SaturateToSignedByte – Represents the result of an operation as a signed 8-bit value. If the result is less than -128, it is represented by the saturated value -128 (80H); if it is greater than 127, it is represented by the saturated value 127 (7FH).

• SaturateToSignedWord – Represents the result of an operation as a signed 16-bit value. If the result is less than -32768, it is represented by the saturated value -32768 (8000H); if it is greater than 32767, it is represented by the saturated value 32767 (7FFFH).

• SaturateToUnsignedByte – Represents the result of an operation as an unsigned 8-bit value. If the result is less than zero, it is represented by the saturated value zero (00H); if it is greater than 255, it is represented by the saturated value 255 (FFH).

• SaturateToUnsignedWord – Represents the result of an operation as an unsigned 16-bit value. If the result is less than zero, it is represented by the saturated value zero (00H); if it is greater than 65535, it is represented by the saturated value 65535 (FFFFH).

• LowOrderWord(DEST * SRC) – Multiplies a word operand by a word operand and stores the least significant word of the doubleword result in the destination operand.

• HighOrderWord(DEST * SRC) – Multiplies a word operand by a word operand and stores the most significant word of the doubleword result in the destination operand.

• Push(value) – Pushes a value onto the stack. The number of bytes pushed is determined by the operand-size attribute of the instruction.

• Pop() – Removes the value from the top of the stack and returns it. The statement EAX ← Pop(); assigns to EAX the 32-bit value from the top of the stack. Pop will return either a word or a doubleword depending on the operand-size attribute.

• PopRegisterStack – Marks the FPU ST(0) register as empty and increments the FPU register stack pointer (TOP) by 1.

• Switch-Tasks – Performs a task switch.

• Bit(BitBase, BitOffset) – Returns the value of a bit within a bit string, which is a sequence of bits in memory or a register. Bits are numbered from low-order to high-order within registers and within memory bytes. If the base operand is a register, the offset can be in the range 0..31. This offset addresses a bit within the indicated register. As an example, the function Bit[EAX, 21] is illustrated in Figure 2-2.

[Figure: the 32-bit EAX register, bits numbered 31 down to 0, with bit 21 marked; BitOffset = 21.]

Figure 2-2. Bit Offset for BIT[EAX,21]

If BitBase is a memory address, BitOffset can range from -2 GBits to 2 GBits. The addressed bit is numbered (BitOffset MOD 8) within the byte at address (BitBase + (BitOffset DIV 8)), where DIV is signed division with rounding towards negative infinity, and MOD returns a positive number. This operation is illustrated in Figure 2-3.

[Figure: two examples with bits numbered 7 down to 0 within each byte. With BitOffset = +13, the addressed bit lies in the byte at BitBase + 1; with BitOffset = -11, it lies in the byte at BitBase - 2.]

Figure 2-3. Memory Bit Indexing
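The helper functions described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the manual's definition: the function names and signatures are hypothetical, the source width is passed explicitly as src_bits, and the saturation functions are collapsed into one generic clamp. The examples reuse the values given in the text (the byte F6H, BitOffset = +13 and BitOffset = -11).

```python
def zero_extend(value: int, src_bits: int) -> int:
    """Keep the low src_bits bits; all higher bits of the result are zero."""
    return value & ((1 << src_bits) - 1)


def sign_extend(value: int, src_bits: int, operand_size: int) -> int:
    """Replicate the sign bit of a src_bits-wide value up to operand_size bits."""
    value &= (1 << src_bits) - 1
    if value & (1 << (src_bits - 1)):
        value -= 1 << src_bits          # reinterpret as a negative number
    return value & ((1 << operand_size) - 1)


def saturate(value: int, low: int, high: int) -> int:
    """Clamp value to the range [low, high], as the Saturate* functions do."""
    return max(low, min(high, value))


def bit_location(bit_base: int, bit_offset: int) -> tuple:
    """Return (byte address, bit number) for a bit string in memory.

    Python's // rounds toward negative infinity and % is non-negative for a
    positive divisor, which matches the DIV and MOD described in the text.
    """
    return bit_base + (bit_offset // 8), bit_offset % 8


print(f"{zero_extend(0xF6, 8):08X}H")        # prints 000000F6H
print(f"{sign_extend(0xF6, 8, 32):08X}H")    # prints FFFFFFF6H
print(saturate(-200, -128, 127))             # prints -128
print(saturate(300, 0, 255))                 # prints 255
print(bit_location(0x2000, 13))              # prints (8193, 5)
print(bit_location(0x2000, -11))             # prints (8190, 5)
```

Note that a language whose integer division truncates toward zero would place BitOffset = -11 in the wrong byte, which is why the text specifies rounding toward negative infinity.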

2.2.3 Flags Affected

The “Flags Affected” section lists the flags in the EFLAGS register that are affected by the instruction. When a flag is cleared, it is equal to 0; when it is set, it is equal to 1. The arithmetic and logical instructions usually assign values to the status flags in a uniform manner (see Appendix A, EFLAGS Cross-Reference, in the Intel Architecture Software Developer’s Manual, Volume 1). Non-conventional assignments are described in the “Operation” section. The values of flags listed as undefined may be changed by the instruction in an indeterminate manner. Flags that are not listed are unchanged by the instruction.

2.2.4 FPU Flags Affected

The floating-point instructions have an “FPU Flags Affected” section that describes how each instruction can affect the four condition code flags of the FPU status word.

2.2.5 Protected Mode Exceptions

The “Protected Mode Exceptions” section lists the exceptions that can occur when the instruction is executed in protected mode and the reasons for the exceptions. Each exception is given a mnemonic that consists of a pound sign (#) followed by two letters and an optional error code in parentheses. For example, #GP(0) denotes a general protection exception with an error code of 0. Table 2-2 associates each two-letter mnemonic with the corresponding interrupt vector number and exception name. See Chapter 5, Interrupt and Exception Handling, in the Intel Architecture Software Developer’s Manual, Volume 3, for a detailed description of the exceptions.

Application programmers should consult the documentation provided with their operating systems to determine the actions taken when exceptions occur.

2.2.6 Real-address Mode Exceptions

The “Real-Address Mode Exceptions” section lists the exceptions that can occur when the instruction is executed in real-address mode.

Table 2-2. Exception Mnemonics, Names, and Vector Numbers

Vector No.   Mnemonic   Name                                           Source
0            #DE        Divide Error                                   DIV and IDIV instructions.
1            #DB        Debug                                          Any code or data reference.
3            #BP        Breakpoint                                     INT 3 instruction.
4            #OF        Overflow                                       INTO instruction.
5            #BR        BOUND Range Exceeded                           BOUND instruction.
6            #UD        Invalid Opcode (Undefined Opcode)              UD2 instruction or reserved opcode. (a)
7            #NM        Device Not Available (No Math Coprocessor)     Floating-point or WAIT/FWAIT instruction.
8            #DF        Double Fault                                   Any instruction that can generate an exception, an NMI, or an INTR.
10           #TS        Invalid TSS                                    Task switch or TSS access.
11           #NP        Segment Not Present                            Loading segment registers or accessing system segments.
12           #SS        Stack Segment Fault                            Stack operations and SS register loads.
13           #GP        General Protection                             Any memory reference and other protection checks.
14           #PF        Page Fault                                     Any memory reference.
16           #MF        Floating-point Error (Math Fault)              Floating-point or WAIT/FWAIT instruction.
17           #AC        Alignment Check                                Any data reference in memory. (b)
18           #MC        Machine Check                                  Model dependent. (c)

a. The UD2 instruction was introduced in the Pentium® Pro processor.
b. This exception was introduced in the Intel® 486 processor.
c. This exception was introduced in the Pentium processor and enhanced in the Pentium Pro processor.
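The mnemonic format described in Section 2.2.5 (a pound sign, two letters, and an optional error code in parentheses) can be decoded mechanically. The Python sketch below is one possible way to do so; the names VECTORS and parse_exception are hypothetical, the vector numbers are copied from Table 2-2, and the error code is returned as text on the assumption that it may be symbolic rather than numeric.

```python
import re

# Two-letter mnemonic -> vector number, copied from Table 2-2.
VECTORS = {"DE": 0, "DB": 1, "BP": 3, "OF": 4, "BR": 5, "UD": 6, "NM": 7,
           "DF": 8, "TS": 10, "NP": 11, "SS": 12, "GP": 13, "PF": 14,
           "MF": 16, "AC": 17, "MC": 18}

# '#', two capital letters, then an optional error code in parentheses.
_MNEMONIC = re.compile(r"#([A-Z]{2})(?:\((\w+)\))?")


def parse_exception(text: str):
    """Split a mnemonic such as '#GP(0)' into (vector number, error code).

    The error code is returned as a string, or None when it is absent.
    """
    match = _MNEMONIC.fullmatch(text.strip())
    if match is None or match.group(1) not in VECTORS:
        raise ValueError(f"unrecognized exception mnemonic: {text!r}")
    return VECTORS[match.group(1)], match.group(2)


print(parse_exception("#GP(0)"))  # prints (13, '0')
print(parse_exception("#UD"))     # prints (6, None)
```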

2.2.7 Virtual-8086 Mode Exceptions

The “Virtual-8086 Mode Exceptions” section lists the exceptions that can occur when the instruction is executed in virtual-8086 mode.


2.2.8 Floating-point Exceptions

The “Floating-point Exceptions” section lists additional exceptions that can occur when a floating-point instruction is executed in any mode. All of these exception conditions result in a floating-point error exception (#MF, vector number 16) being generated.

Table 2-3 associates each one- or two-letter mnemonic with the corresponding exception name. See “Floating-Point Exception Conditions” in Chapter 7 of the Intel Architecture Software Developer’s Manual, Volume 1, for a detailed description of these exceptions.

Table 2-3. Floating-point Exception Mnemonics and Names

Vector No.   Mnemonic   Name                                               Source
16           #IS        Floating-point invalid operation:                  FPU stack overflow or underflow
                        stack overflow or underflow
16           #IA        Floating-point invalid operation:                  Invalid FPU arithmetic operation
                        invalid arithmetic operation
16           #Z         Floating-point divide-by-zero                      FPU divide-by-zero
16           #D         Floating-point denormalized operation              Attempting to operate on a denormal number
16           #O         Floating-point numeric overflow                    FPU numeric overflow
16           #U         Floating-point numeric underflow                   FPU numeric underflow
16           #P         Floating-point inexact result (precision)          Inexact result (precision)

2.3 IA-32 Base Instruction Reference

The remainder of this chapter provides detailed descriptions of each of the Intel architecture instructions.


AAA—ASCII Adjust After Addition

Opcode Instruction Description

37 AAA ASCII adjust AL after addition

Description

Adjusts the sum of two unpacked BCD values to create an unpacked BCD result. The AL register is the implied source and destination operand for this instruction. The AAA instruction is only useful when it follows an ADD instruction that adds (binary addition) two unpacked BCD values and stores a byte result in the AL register. The AAA instruction then adjusts the contents of the AL register to contain the correct 1-digit unpacked BCD result.

If the addition produces a decimal carry, the AH register is incremented by 1, and the CF and AF flags are set. If there was no decimal carry, the CF and AF flags are cleared and the AH register is unchanged. In either case, bits 4 through 7 of the AL register are cleared to 0.

Operation

IF ((AL AND 0FH) > 9) OR (AF = 1)
THEN
    AL ← (AL + 6);
    AH ← AH + 1;
    AF ← 1;
    CF ← 1;
ELSE
    AF ← 0;
    CF ← 0;
FI;
AL ← AL AND 0FH;

Flags Affected

The AF and CF flags are set to 1 if the adjustment results in a decimal carry; otherwise they are cleared to 0. The OF, SF, ZF, and PF flags are undefined.
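The adjustment can be sketched in Python as a rough illustration. The function name `aaa` and its tuple return value are assumptions made for this example and are not part of the architecture definition; the logic follows the Operation pseudocode for 8-bit AL and AH values.

```python
def aaa(al, ah, af):
    """Illustrative model of AAA; returns (AL, AH, AF, CF).

    al, ah are 8-bit register values; af is the AF flag left by the
    preceding ADD. The name and signature are hypothetical.
    """
    if (al & 0x0F) > 9 or af == 1:
        al = (al + 6) & 0xFF   # adjust the low digit
        ah = (ah + 1) & 0xFF   # propagate the decimal carry into AH
        af = cf = 1
    else:
        af = cf = 0
    return al & 0x0F, ah, af, cf   # bits 4 through 7 of AL are cleared
```

For instance, adding the unpacked digits 9 and 8 leaves 11H in AL with AF set; `aaa(0x11, 0, 1)` then yields AL = 7 and AH = 1, the unpacked form of decimal 17.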

Additional Itanium System Environment Exceptions

Itanium Reg Faults NaT Register Consumption Abort.

Exceptions (All Operating Modes)

None.

AAD—ASCII Adjust AX Before Division

Opcode Instruction Description

D5 0A AAD ASCII adjust AX before division

Description

Adjusts two unpacked BCD digits (the least-significant digit in the AL register and the most-significant digit in the AH register) so that a division operation performed on the result will yield a correct unpacked BCD value. The AAD instruction is only useful when it precedes a DIV instruction that divides (binary division) the adjusted value in the AL register by an unpacked BCD value.

The AAD instruction sets the value in the AL register to (AL + (10 * AH)), and then clears the AH register to 00H. The value in the AX register is then equal to the binary equivalent of the original unpacked two-digit number in registers AH and AL.

Operation

tempAL ← AL;
tempAH ← AH;
AL ← (tempAL + (tempAH * imm8)) AND FFH;
AH ← 0;

The immediate value (imm8) is taken from the second byte of the instruction, which under normal assembly is 0AH (10 decimal). However, this immediate value can be changed to produce a different result.
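As a minimal sketch of the operation above, the following Python function models AAD with the immediate exposed as a parameter. The name `aad` and the default argument are assumptions for illustration only.

```python
def aad(al, ah, imm8=0x0A):
    """Illustrative model of AAD; returns (AL, AH).

    imm8 defaults to 0AH, the value produced by normal assembly.
    The name and signature are hypothetical.
    """
    al = (al + ah * imm8) & 0xFF   # combine the two digits, keep 8 bits
    return al, 0                   # AH is cleared to 00H
```

With AH = 4 and AL = 5 (unpacked decimal 45), `aad(5, 4)` returns AL = 45 and AH = 0, so AX holds the binary value 45 ready for a DIV.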

Flags Affected

The SF, ZF, and PF flags are set according to the result; the OF, AF, and CF flags are undefined.

Additional Itanium System Environment Exceptions

Itanium Reg Faults NaT Register Consumption Abort.

Exceptions (All Operating Modes)

None.

AAM—ASCII Adjust AX After Multiply

Opcode Instruction Description

D4 0A AAM ASCII adjust AX after multiply

Description

Adjusts the result of the multiplication of two unpacked BCD values to create a pair of unpacked BCD values. The AX register is the implied source and destination operand for this instruction. The AAM instruction is only useful when it follows an MUL instruction that multiplies (binary multiplication) two unpacked BCD values and stores a word result in the AX register. The AAM instruction then adjusts the contents of the AX register to contain the correct 2-digit unpacked BCD result.

Operation

tempAL ← AL;
AH ← tempAL / imm8;
AL ← tempAL MOD imm8;

Flags Affected

The SF, ZF, and PF flags are set according to the result. The OF, AF, and CF flags are undefined.
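A short Python sketch of the AAM operation follows. The function name `aam` and the default divisor argument are assumptions for this example; only a nonzero imm8 is modeled.

```python
def aam(al, imm8=0x0A):
    """Illustrative model of AAM; returns (AH, AL).

    Splits the binary product in AL into two unpacked digits.
    The name and signature are hypothetical; imm8 must be nonzero here.
    """
    return al // imm8, al % imm8
```

After multiplying the unpacked digits 7 and 8, AL holds 56 (38H); `aam(56)` returns AH = 5 and AL = 6.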

Additional Itanium System Environment Exceptions

Itanium Reg Faults NaT Register Consumption Abort.

Exceptions (All Operating Modes)

None.

AAS—ASCII Adjust AL After Subtraction

Opcode Instruction Description

3F AAS ASCII adjust AL after subtraction

Description

Adjusts the result of the subtraction of two unpacked BCD values to create an unpacked BCD result. The AL register is the implied source and destination operand for this instruction. The AAS instruction is only useful when it follows a SUB instruction that subtracts (binary subtraction) one unpacked BCD value from another and stores a byte result in the AL register. The AAS instruction then adjusts the contents of the AL register to contain the correct 1-digit unpacked BCD result.

If the subtraction produces a decimal borrow, the AH register is decremented by 1, and the CF and AF flags are set. If no decimal borrow occurs, the CF and AF flags are cleared, and the AH register is unchanged. In either case, bits 4 through 7 of the AL register are cleared to 0.

Operation

IF ((AL AND 0FH) > 9) OR (AF = 1)
THEN
    AL ← AL - 6;
    AH ← AH - 1;
    AF ← 1;
    CF ← 1;
ELSE
    CF ← 0;
    AF ← 0;
FI;
AL ← AL AND 0FH;

Flags Affected

The AF and CF flags are set to 1 if there is a decimal borrow; otherwise, they are cleared to 0. The OF, SF, ZF, and PF flags are undefined.
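The following Python sketch mirrors the Operation pseudocode for AAS. The function name `aas` and the tuple it returns are assumptions made for illustration.

```python
def aas(al, ah, af):
    """Illustrative model of AAS; returns (AL, AH, AF, CF).

    al, ah are 8-bit register values; af is the AF flag left by the
    preceding SUB. The name and signature are hypothetical.
    """
    if (al & 0x0F) > 9 or af == 1:
        al = (al - 6) & 0xFF   # adjust the low digit
        ah = (ah - 1) & 0xFF   # take the decimal borrow from AH
        af = cf = 1
    else:
        af = cf = 0
    return al & 0x0F, ah, af, cf   # bits 4 through 7 of AL are cleared
```

Subtracting 8 from the unpacked value 13 (AH = 1, AL = 3) leaves FBH in AL with AF set; `aas(0xFB, 1, 1)` returns AL = 5 and AH = 0.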

Additional Itanium System Environment Exceptions

Itanium Reg Faults NaT Register Consumption Abort.

Exceptions (All Operating Modes)

None.

ADC—Add with Carry

Opcode Instruction Description

14 ib ADC AL,imm8 Add with carry imm8 to AL

15 iw ADC AX,imm16 Add with carry imm16 to AX

15 id ADC EAX,imm32 Add with carry imm32 to EAX

80 /2 ib ADC r/m8,imm8 Add with carry imm8 to r/m8

81 /2 iw ADC r/m16,imm16 Add with carry imm16 to r/m16

81 /2 id ADC r/m32,imm32 Add with CF imm32 to r/m32

83 /2 ib ADC r/m16,imm8 Add with CF sign-extended imm8 to r/m16

83 /2 ib ADC r/m32,imm8 Add with CF sign-extended imm8 into r/m32

10 /r ADC r/m8,r8 Add with carry byte register to r/m8

11 /r ADC r/m16,r16 Add with carry r16 to r/m16

11 /r ADC r/m32,r32 Add with CF r32 to r/m32

12 /r ADC r8,r/m8 Add with carry r/m8 to byte register

13 /r ADC r16,r/m16 Add with carry r/m16 to r16

13 /r ADC r32,r/m32 Add with CF r/m32 to r32

Description

Adds the destination operand (first operand), the source operand (second operand), and the carry (CF) flag and stores the result in the destination operand. The destination operand can be a register or a memory location; the source operand can be an immediate, a register, or a memory location. The state of the CF flag represents a carry from a previous addition. When an immediate value is used as an operand, it is sign-extended to the length of the destination operand format.

The ADC instruction does not distinguish between signed or unsigned operands. Instead, the processor evaluates the result for both data types and sets the OF and CF flags to indicate a carry in the signed or unsigned result, respectively. The SF flag indicates the sign of the signed result.

The ADC instruction is usually executed as part of a multibyte or multiword addition in which an ADD instruction is followed by an ADC instruction.

Operation

DEST ← DEST + SRC + CF;

Flags Affected

The OF, SF, ZF, AF, CF, and PF flags are set according to the result.
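The multiword use of ADC described above can be sketched as follows: a 64-bit addition built from a 32-bit ADD on the low halves followed by a 32-bit ADC on the high halves. The helper names `add_with_carry` and `add64` are assumptions for this example, and only the CF flag is modeled.

```python
def add_with_carry(dest, src, carry_in=0, bits=32):
    """Return (result, CF) for DEST + SRC + CF at the given width."""
    mask = (1 << bits) - 1
    total = (dest & mask) + (src & mask) + carry_in
    return total & mask, total >> bits

def add64(a, b):
    """Add two 64-bit values using 32-bit ADD then ADC (hypothetical helper)."""
    lo, cf = add_with_carry(a & 0xFFFFFFFF, b & 0xFFFFFFFF)   # ADD low halves
    hi, cf = add_with_carry(a >> 32, b >> 32, cf)             # ADC high halves
    return (hi << 32) | lo, cf
```

The carry out of the low-half ADD is consumed by the ADC, so `add64(0x1FFFFFFFF, 1)` produces 0x200000000 with CF clear.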

Additional Itanium System Environment Exceptions

Itanium Reg Faults NaT Register Consumption Abort.

Itanium Mem Faults VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault

Protected Mode Exceptions

#GP(0) If the destination is located in a nonwritable segment.

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

Real Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

Virtual 8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

ADD—Add

Opcode Instruction Description

04 ib ADD AL,imm8 Add imm8 to AL

05 iw ADD AX,imm16 Add imm16 to AX

05 id ADD EAX,imm32 Add imm32 to EAX

80 /0 ib ADD r/m8,imm8 Add imm8 to r/m8

81 /0 iw ADD r/m16,imm16 Add imm16 to r/m16

81 /0 id ADD r/m32,imm32 Add imm32 to r/m32

83 /0 ib ADD r/m16,imm8 Add sign-extended imm8 to r/m16

83 /0 ib ADD r/m32,imm8 Add sign-extended imm8 to r/m32

00 /r ADD r/m8,r8 Add r8 to r/m8

01 /r ADD r/m16,r16 Add r16 to r/m16

01 /r ADD r/m32,r32 Add r32 to r/m32

02 /r ADD r8,r/m8 Add r/m8 to r8

03 /r ADD r16,r/m16 Add r/m16 to r16

03 /r ADD r32,r/m32 Add r/m32 to r32

Description

Adds the first operand (destination operand) and the second operand (source operand) and stores the result in the destination operand. The destination operand can be a register or a memory location; the source operand can be an immediate, a register, or a memory location. When an immediate value is used as an operand, it is sign-extended to the length of the destination operand format.

The ADD instruction does not distinguish between signed or unsigned operands. Instead, the processor evaluates the result for both data types and sets the OF and CF flags to indicate a carry in the signed or unsigned result, respectively. The SF flag indicates the sign of the signed result.

Operation

DEST ← DEST + SRC;

Flags Affected

The OF, SF, ZF, AF, CF, and PF flags are set according to the result.
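To illustrate how one result is evaluated for both signed and unsigned interpretations, the sketch below computes the six status flags for an addition. The function name `add_flags` and the dictionary layout are assumptions for this example. The AF and PF rules used (carry out of bit 3, even parity of the low byte) are the usual IA-32 definitions and are not restated in this section.

```python
def add_flags(dest, src, bits=8):
    """Return (result, flags) for DEST + SRC at the given operand width."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    dest &= mask
    src &= mask
    total = dest + src
    result = total & mask
    flags = {
        "CF": int(total > mask),                                    # unsigned carry
        "OF": int(((dest ^ result) & (src ^ result) & sign) != 0),  # signed overflow
        "SF": int((result & sign) != 0),                            # sign of result
        "ZF": int(result == 0),
        "AF": int(((dest ^ src ^ result) & 0x10) != 0),             # carry out of bit 3
        "PF": int(bin(result & 0xFF).count("1") % 2 == 0),          # even parity, low byte
    }
    return result, flags
```

Adding 7FH and 01H as bytes gives 80H with OF set and CF clear (signed overflow only), while FFH plus 01H gives 00H with CF set and OF clear (unsigned carry only).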

Additional Itanium System Environment Exceptions

Itanium Reg Faults NaT Register Consumption Abort.

Itanium Mem Faults VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault

Protected Mode Exceptions

#GP(0) If the destination is located in a nonwritable segment.

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

Real Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

Virtual 8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

AND—Logical AND

Opcode Instruction Description

24 ib AND AL,imm8 AL AND imm8

25 iw AND AX,imm16 AX AND imm16

25 id AND EAX,imm32 EAX AND imm32

80 /4 ib AND r/m8,imm8 r/m8 AND imm8

81 /4 iw AND r/m16,imm16 r/m16 AND imm16

81 /4 id AND r/m32,imm32 r/m32 AND imm32

83 /4 ib AND r/m16,imm8 r/m16 AND imm8

83 /4 ib AND r/m32,imm8 r/m32 AND imm8

20 /r AND r/m8,r8 r/m8 AND r8

21 /r AND r/m16,r16 r/m16 AND r16

21 /r AND r/m32,r32 r/m32 AND r32

22 /r AND r8,r/m8 r8 AND r/m8

23 /r AND r16,r/m16 r16 AND r/m16

23 /r AND r32,r/m32 r32 AND r/m32

Description

Performs a bitwise AND operation on the destination (first) and source (second) operands and stores the result in the destination operand location. The source operand can be an immediate, a register, or a memory location; the destination operand can be a register or a memory location.

Operation

DEST ← DEST AND SRC;

Flags Affected

The OF and CF flags are cleared; the SF, ZF, and PF flags are set according to the result. The state of the AF flag is undefined.

Additional Itanium System Environment Exceptions

Itanium Reg Faults NaT Register Consumption Abort.

Itanium Mem Faults VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault

Protected Mode Exceptions

#GP(0) If the destination operand points to a nonwritable segment.

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register contains a null segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

Real Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

Virtual 8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

ARPL—Adjust RPL Field of Segment Selector

Opcode Instruction Description

63 /r ARPL r/m16,r16 Adjust RPL of r/m16 to not less than RPL of r16

Description

Compares the RPL fields of two segment selectors. The first operand (the destination operand) contains one segment selector and the second operand (source operand) contains the other. (The RPL field is located in bits 0 and 1 of each operand.) If the RPL field of the destination operand is less than the RPL field of the source operand, the ZF flag is set and the RPL field of the destination operand is increased to match that of the source operand. Otherwise, the ZF flag is cleared and no change is made to the destination operand. (The destination operand can be a word register or a memory location; the source operand must be a word register.)

The ARPL instruction is provided for use by operating-system procedures (however, it can also be used by applications). It is generally used to adjust the RPL of a segment selector that has been passed to the operating system by an application program to match the privilege level of the application program. Here the segment selector passed to the operating system is placed in the destination operand and the segment selector for the application program’s code segment is placed in the source operand. (The RPL field in the source operand represents the privilege level of the application program.) Execution of the ARPL instruction then ensures that the RPL of the segment selector received by the operating system is no lower (does not have a higher privilege) than the privilege level of the application program. (The segment selector for the application program’s code segment can be read from the procedure stack following a procedure call.)

See the Intel Architecture Software Developer’s Manual, Volume 3 for more information about the use of this instruction.

Operation

IF DEST(RPL) < SRC(RPL)
THEN
    ZF ← 1;
    DEST(RPL) ← SRC(RPL);
ELSE
    ZF ← 0;
FI;

Flags Affected

The ZF flag is set to 1 if the RPL field of the destination operand is less than that of the source operand; otherwise, it is cleared to 0.
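A minimal Python sketch of the RPL adjustment follows, treating both operands as 16-bit selectors whose RPL occupies bits 0 and 1. The function name `arpl` and its return tuple are assumptions for illustration only.

```python
def arpl(dest, src):
    """Illustrative model of ARPL; returns (DEST, ZF).

    dest and src are 16-bit segment selectors. The name and
    signature are hypothetical.
    """
    dest_rpl = dest & 0x3
    src_rpl = src & 0x3
    if dest_rpl < src_rpl:
        # Raise the destination RPL to match the source RPL.
        return (dest & 0xFFFC) | src_rpl, 1
    return dest & 0xFFFF, 0
```

A selector of 0008H (RPL 0) checked against a caller selector of 0023H (RPL 3) becomes 000BH with ZF set; a selector that already carries RPL 3 is left unchanged with ZF cleared.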

Additional Itanium System Environment Exceptions

Itanium Reg Faults NaT Register Consumption Abort.

Itanium Mem Faults VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault

Protected Mode Exceptions

#GP(0) If the destination is located in a nonwritable segment.

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

Real Address Mode Exceptions

#UD The ARPL instruction is not recognized in real address mode.

Virtual 8086 Mode Exceptions

#UD The ARPL instruction is not recognized in virtual 8086 mode.

BOUND—Check Array Index Against Bounds

Opcode Instruction Description

62 /r BOUND r16,m16&16 Check if r16 (array index) is within bounds specified by m16&16

62 /r BOUND r32,m32&32 Check if r32 (array index) is within bounds specified by m32&32

Description

Determines if the first operand (array index) is within the bounds of an array specified by the second operand (bounds operand). The array index is a signed integer located in a register. The bounds operand is a memory location that points to a pair of signed doubleword-integers (when the operand-size attribute is 32) or a pair of signed word-integers (when the operand-size attribute is 16). The first doubleword (or word) is the lower bound of the array and the second doubleword (or word) is the upper bound of the array. The array index must be greater than or equal to the lower bound and less than or equal to the upper bound plus the operand size in bytes. If the index is not within bounds, a BOUND range exceeded exception (#BR) is signaled. (When this exception is generated, the saved return instruction pointer points to the BOUND instruction.)

The bounds limit data structure (two words or doublewords containing the lower and upper limits of the array) is usually placed just before the array itself, making the limits addressable via a constant offset from the beginning of the array. Because the address of the array already will be present in a register, this practice avoids extra bus cycles to obtain the effective address of the array bounds.

Operation

IF (ArrayIndex < LowerBound OR ArrayIndex > (UpperBound + OperandSize/8))
(* Below lower bound or above upper bound *)
THEN
    #BR;
FI;

Flags Affected

None.

Additional Itanium System Environment Exceptions

Itanium Reg Faults NaT Register Consumption Abort.

Itanium Mem Faults VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault

Protected Mode Exceptions

#BR If the bounds test fails.

#UD If second operand is not a memory location.

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register contains a null segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

Real Address Mode Exceptions

#BR If the bounds test fails.

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

Virtual 8086 Mode Exceptions

#BR If the bounds test fails.

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

BSF—Bit Scan Forward

Opcode Instruction Description

0F BC BSF r16,r/m16 Bit scan forward on r/m16

0F BC BSF r32,r/m32 Bit scan forward on r/m32

Description

Searches the source operand (second operand) for the least significant set bit (1 bit). If a least significant 1 bit is found, its bit index is stored in the destination operand (first operand). The source operand can be a register or a memory location; the destination operand is a register. The bit index is an unsigned offset from bit 0 of the source operand. If the content of the source operand is 0, the content of the destination operand is undefined.

Operation

IF SRC = 0
THEN
    ZF ← 1;
    DEST is undefined;
ELSE
    ZF ← 0;
    temp ← 0;
    WHILE Bit(SRC, temp) = 0
    DO
        temp ← temp + 1;
    OD;
    DEST ← temp;
FI;

Flags Affected

The ZF flag is set to 1 if the source operand is 0; otherwise, the ZF flag is cleared. The CF, OF, SF, AF, and PF flags are undefined.
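The forward scan can be sketched in Python as shown below. The function name `bsf` is an assumption, and returning `None` for the destination when the source is 0 is only a stand-in for "undefined" in this example.

```python
def bsf(src, bits=32):
    """Illustrative model of BSF; returns (DEST, ZF).

    DEST is None when SRC is 0, standing in for an undefined value.
    The name and signature are hypothetical.
    """
    src &= (1 << bits) - 1
    if src == 0:
        return None, 1
    index = 0
    while ((src >> index) & 1) == 0:   # walk up from bit 0
        index += 1
    return index, 0
```

For example, `bsf(0b1000)` returns a bit index of 3 with ZF cleared.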

Additional Itanium System Environment Exceptions

Itanium Reg Faults NaT Register Consumption Abort.

Itanium Mem Faults VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault

Protected Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register contains a null segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

Real Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

Virtual 8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

BSR—Bit Scan Reverse

Opcode Instruction Description

0F BD BSR r16,r/m16 Bit scan reverse on r/m16

0F BD BSR r32,r/m32 Bit scan reverse on r/m32

Description

Searches the source operand (second operand) for the most significant set bit (1 bit). If a most significant 1 bit is found, its bit index is stored in the destination operand (first operand). The source operand can be a register or a memory location; the destination operand is a register. The bit index is an unsigned offset from bit 0 of the source operand. If the content of the source operand is 0, the content of the destination operand is undefined.

Operation

IF SRC = 0
THEN
    ZF ← 1;
    DEST is undefined;
ELSE
    ZF ← 0;
    temp ← OperandSize - 1;
    WHILE Bit(SRC, temp) = 0
    DO
        temp ← temp - 1;
    OD;
    DEST ← temp;
FI;

Flags Affected

The ZF flag is set to 1 if the source operand is 0; otherwise, the ZF flag is cleared. The CF, OF, SF, AF, and PF flags are undefined.
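The reverse scan can be sketched the same way, starting from the top bit of the operand. The function name `bsr` is an assumption, and `None` again stands in for an undefined destination when the source is 0.

```python
def bsr(src, bits=32):
    """Illustrative model of BSR; returns (DEST, ZF).

    DEST is None when SRC is 0, standing in for an undefined value.
    The name and signature are hypothetical.
    """
    src &= (1 << bits) - 1
    if src == 0:
        return None, 1
    index = bits - 1
    while ((src >> index) & 1) == 0:   # walk down from the top bit
        index -= 1
    return index, 0
```

For example, `bsr(0x00010000)` returns a bit index of 16 with ZF cleared.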

Additional Itanium System Environment Exceptions

Itanium Reg Faults NaT Register Consumption Abort.

Itanium Mem Faults VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault

Protected Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register contains a null segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

Real Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

Virtual 8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

BSWAP—Byte Swap

Opcode Instruction Description

0F C8+rd BSWAP r32 Reverses the byte order of a 32-bit register.

Description

Reverses the byte order of a 32-bit (destination) register: bits 0 through 7 are swapped with bits 24 through 31, and bits 8 through 15 are swapped with bits 16 through 23. This instruction is provided for converting little-endian values to big-endian format and vice versa.

To swap bytes in a word value (16-bit register), use the XCHG instruction. When the BSWAP instruction references a 16-bit register, the result is undefined.

Operation

TEMP ← DEST;
DEST(7..0) ← TEMP(31..24);
DEST(15..8) ← TEMP(23..16);
DEST(23..16) ← TEMP(15..8);
DEST(31..24) ← TEMP(7..0);
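As a brief illustration of the byte reversal, the Python sketch below converts a 32-bit value between little-endian and big-endian order. The function name `bswap32` is an assumption made for this example.

```python
def bswap32(value):
    """Reverse the byte order of a 32-bit value (hypothetical helper)."""
    value &= 0xFFFFFFFF
    # Serialize low byte first, then reinterpret with the high byte first.
    return int.from_bytes(value.to_bytes(4, "little"), "big")
```

Applying the function twice restores the original value, which matches the use of the same instruction for conversion in both directions.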

Flags Affected

None.

Additional Itanium System Environment Exceptions

Itanium Reg Faults NaT Register Consumption Abort.

Exceptions (All Operating Modes)

None.

Intel Architecture Compatibility Information

The BSWAP instruction is not supported on Intel architecture processors earlier than the Intel486™ processor family. For compatibility with this instruction, include functionally-equivalent code for execution on Intel processors earlier than the Intel486 processor family.

BT—Bit Test

Opcode Instruction Description

0F A3 BT r/m16,r16 Store selected bit in CF flag

0F A3 BT r/m32,r32 Store selected bit in CF flag

0F BA /4 ib BT r/m16,imm8 Store selected bit in CF flag

0F BA /4 ib BT r/m32,imm8 Store selected bit in CF flag

Description

Selects the bit in a bit string (specified with the first operand, called the bit base) at the bit-position designated by the bit offset operand (second operand) and stores the value of the bit in the CF flag. The bit base operand can be a register or a memory location; the bit offset operand can be a register or an immediate value. If the bit base operand specifies a register, the instruction takes the modulo 16 or 32 (depending on the register size) of the bit offset operand, allowing any bit position to be selected in a 16- or 32-bit register, respectively. If the bit base operand specifies a memory location, it represents the address of the byte in memory that contains the bit base (bit 0 of the specified byte) of the bit string. The offset operand then selects a bit position within the range -2^31 to 2^31 - 1 for a register offset and 0 to 31 for an immediate offset.

Some assemblers support immediate bit offsets larger than 31 by using the immediate bit offset field in combination with the displacement field of the memory operand. In this case, the low-order 3 or 5 bits (3 for 16-bit operands, 5 for 32-bit operands) of the immediate bit offset are stored in the immediate bit offset field, and the high-order bits are shifted and combined with the byte displacement in the addressing mode by the assembler. The processor will ignore the high order bits if they are not zero.

When accessing a bit in memory, the processor may access 4 bytes starting from the memory address for a 32-bit operand size, using the following relationship:

Effective Address + (4 * (BitOffset DIV 32))

Or, it may access 2 bytes starting from the memory address for a 16-bit operand, using this relationship:

Effective Address + (2 * (BitOffset DIV 16))

It may do so even when only a single byte needs to be accessed to reach the given bit. When using this bit addressing mechanism, software should avoid referencing areas of memory close to address space holes. In particular, it should avoid references to memory-mapped I/O registers. Instead, software should use the MOV instructions to load from or store to these addresses, and use the register form of these instructions to manipulate the data.
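The two address relationships can be sketched in Python as follows. The function name `bt_access` and its return tuple are assumptions for illustration, and the sketch assumes that DIV rounds toward negative infinity for negative register offsets, which is what Python's `//` operator does.

```python
def bt_access(effective_address, bit_offset, operand_size=32):
    """Return (address, bit_in_unit) for a memory-form bit access.

    address is where the 2- or 4-byte access may start; bit_in_unit is
    the bit position inside that unit. Hypothetical helper; DIV is
    assumed to floor for negative offsets.
    """
    unit_bytes = operand_size // 8
    address = effective_address + unit_bytes * (bit_offset // operand_size)
    bit_in_unit = bit_offset % operand_size
    return address, bit_in_unit
```

With a 32-bit operand size, bit offset 35 from address 1000H falls in the doubleword at 1004H (bit 3), even though the selected bit lives in a single byte. This is why the text advises against using the memory form near address space holes or memory-mapped I/O registers.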

Operation

CF ← Bit(BitBase, BitOffset)

Flags Affected

The CF flag contains the value of the selected bit. The OF, SF, ZF, AF, and PF flags are undefined.

Additional Itanium System Environment Exceptions

Itanium Reg Faults NaT Register Consumption Abort.

Itanium Mem Faults VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault

Protected Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register contains a null segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

Real Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

Virtual 8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

BTC—Bit Test and Complement

Opcode Instruction Description

0F BB BTC r/m16,r16 Store selected bit in CF flag and complement

0F BB BTC r/m32,r32 Store selected bit in CF flag and complement

0F BA /7 ib BTC r/m16,imm8 Store selected bit in CF flag and complement

0F BA /7 ib BTC r/m32,imm8 Store selected bit in CF flag and complement

Description

Selects the bit in a bit string (specified with the first operand, called the bit base) at the bit-position designated by the bit offset operand (second operand), stores the value of the bit in the CF flag, and complements the selected bit in the bit string. The bit base operand can be a register or a memory location; the bit offset operand can be a register or an immediate value. If the bit base operand specifies a register, the instruction takes the modulo 16 or 32 (depending on the register size) of the bit offset operand, allowing any bit position to be selected in a 16- or 32-bit register, respectively. If the bit base operand specifies a memory location, it represents the address of the byte in memory that contains the bit base (bit 0 of the specified byte) of the bit string. The offset operand then selects a bit position within the range -2^31 to 2^31 - 1 for a register offset and 0 to 31 for an immediate offset.

Some assemblers support immediate bit offsets larger than 31 by using the immediate bit offset field in combination with the displacement field of the memory operand. See “BT—Bit Test” on page 4:40 for more information on this addressing mechanism.

Operation

CF <- Bit(BitBase, BitOffset);
Bit(BitBase, BitOffset) <- NOT Bit(BitBase, BitOffset);

Flags Affected

The CF flag contains the value of the selected bit before it is complemented. The OF, SF, ZF, AF, and PF flags are undefined.
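The register form described above can be sketched in a few lines of Python. This is only an illustration of the documented behavior (offset taken modulo the register width, CF receiving the old bit value); the function name `btc_register` and its return convention are assumptions made for this example, not part of the architecture definition.

```python
def btc_register(value, bit_offset, width=32):
    """Model register-form BTC: return (new_value, cf).

    Hypothetical helper for illustration; `width` is the register size (16 or 32).
    """
    if width not in (16, 32):
        raise ValueError("width must be 16 or 32")
    pos = bit_offset % width                  # offset is taken modulo 16 or 32
    cf = (value >> pos) & 1                   # CF gets the bit before it changes
    new_value = (value ^ (1 << pos)) & ((1 << width) - 1)  # complement the bit
    return new_value, cf
```

For example, an offset of 35 applied to a 32-bit register selects bit 3, because 35 modulo 32 is 3.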

Additional Itanium System Environment Exceptions

Itanium Reg Faults NaT Register Consumption Abort.

Itanium Mem Faults VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault


Protected Mode Exceptions

#GP(0) If the destination operand points to a non-writable segment.

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register contains a null segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

Real Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

Virtual 8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.


BTR—Bit Test and Reset

Opcode Instruction Description

0F B3 BTR r/m16,r16 Store selected bit in CF flag and clear

0F B3 BTR r/m32,r32 Store selected bit in CF flag and clear

0F BA /6 ib BTR r/m16,imm8 Store selected bit in CF flag and clear

0F BA /6 ib BTR r/m32,imm8 Store selected bit in CF flag and clear

Description

Selects the bit in a bit string (specified with the first operand, called the bit base) at the bit-position designated by the bit offset operand (second operand), stores the value of the bit in the CF flag, and clears the selected bit in the bit string to 0. The bit base operand can be a register or a memory location; the bit offset operand can be a register or an immediate value. If the bit base operand specifies a register, the instruction takes the modulo 16 or 32 (depending on the register size) of the bit offset operand, allowing any bit position to be selected in a 16- or 32-bit register, respectively. If the bit base operand specifies a memory location, it represents the address of the byte in memory that contains the bit base (bit 0 of the specified byte) of the bit string. The offset operand then selects a bit position within the range -2^31 to 2^31 - 1 for a register offset and 0 to 31 for an immediate offset.

Some assemblers support immediate bit offsets larger than 31 by using the immediate bit offset field in combination with the displacement field of the memory operand. See “BT—Bit Test” on page 4:40 for more information on this addressing mechanism.

Operation

CF <- Bit(BitBase, BitOffset);
Bit(BitBase, BitOffset) <- 0;

Flags Affected

The CF flag contains the value of the selected bit before it is cleared. The OF, SF, ZF, AF, and PF flags are undefined.
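For a memory bit base with a register offset, the signed offset can reach bytes below as well as above the bit base. The sketch below models that at byte granularity, assuming memory is a Python `bytearray` and that the selected bit lives in the byte at `base + floor(offset / 8)`, bit `offset mod 8`. The helper name `btr_memory` is hypothetical, and the real processor's access width and fault checks are not modeled.

```python
def btr_memory(mem, base, bit_offset):
    """Model memory-form BTR with a register offset: clear the bit, return old value (CF).

    `mem` is a bytearray standing in for memory; `base` is the byte index of the bit base.
    """
    if not -2**31 <= bit_offset <= 2**31 - 1:
        raise ValueError("register bit offset must fit in a signed 32-bit value")
    byte_index = base + (bit_offset >> 3)   # arithmetic shift: floor division by 8
    bit = bit_offset & 7                    # bit position within that byte
    if not 0 <= byte_index < len(mem):
        raise IndexError("bit lies outside the modeled memory")
    cf = (mem[byte_index] >> bit) & 1       # CF gets the bit before it is cleared
    mem[byte_index] &= ~(1 << bit) & 0xFF   # clear the selected bit
    return cf
```

With `base = 1`, an offset of -1 selects bit 7 of the byte at index 0, which illustrates why negative register offsets are meaningful for memory operands.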

Additional Itanium System Environment Exceptions

Itanium Reg Faults NaT Register Consumption Abort.

Itanium Mem Faults VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault


Protected Mode Exceptions

#GP(0) If the destination operand points to a nonwritable segment.

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register contains a null segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

Real Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

Virtual 8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.


BTS—Bit Test and Set

Opcode Instruction Description

0F AB BTS r/m16,r16 Store selected bit in CF flag and set

0F AB BTS r/m32,r32 Store selected bit in CF flag and set

0F BA /5 ib BTS r/m16,imm8 Store selected bit in CF flag and set

0F BA /5 ib BTS r/m32,imm8 Store selected bit in CF flag and set

Description

Selects the bit in a bit string (specified with the first operand, called the bit base) at the bit-position designated by the bit offset operand (second operand), stores the value of the bit in the CF flag, and sets the selected bit in the bit string to 1. The bit base operand can be a register or a memory location; the bit offset operand can be a register or an immediate value. If the bit base operand specifies a register, the instruction takes the modulo 16 or 32 (depending on the register size) of the bit offset operand, allowing any bit position to be selected in a 16- or 32-bit register, respectively. If the bit base operand specifies a memory location, it represents the address of the byte in memory that contains the bit base (bit 0 of the specified byte) of the bit string. The offset operand then selects a bit position within the range -2^31 to 2^31 - 1 for a register offset and 0 to 31 for an immediate offset.

Some assemblers support immediate bit offsets larger than 31 by using the immediate bit offset field in combination with the displacement field of the memory operand. See “BT—Bit Test” on page 4:40 for more information on this addressing mechanism.

Operation

CF <- Bit(BitBase, BitOffset);
Bit(BitBase, BitOffset) <- 1;

Flags Affected

The CF flag contains the value of the selected bit before it is set. The OF, SF, ZF, AF, and PF flags are undefined.
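The assembler technique mentioned in the Description (folding a large immediate bit offset into the displacement) can be illustrated as follows. This sketch assumes the 32-bit operand case, in which the low 5 bits of the offset stay in the immediate field and the remaining bits are converted into a doubleword-aligned byte displacement; the BT entry is the authoritative description. Both function names are hypothetical, and memory is modeled as a little-endian `bytearray`.

```python
def split_immediate_offset(disp, bit_offset):
    """Assembler-side sketch: fold a large immediate bit offset into (new_disp, imm8).

    Assumes a 32-bit operand: low 5 bits remain in imm8, the rest move to the displacement.
    """
    if bit_offset < 0:
        raise ValueError("immediate bit offsets are unsigned in this sketch")
    imm8 = bit_offset & 31                  # low 5 bits stay in the immediate field
    dword_index = bit_offset >> 5           # remaining bits select a doubleword
    return disp + 4 * dword_index, imm8


def bts_dword(mem, addr, imm8):
    """Model BTS m32,imm8 on a little-endian bytearray: set the bit, return old value (CF)."""
    pos = imm8 % 32
    dword = int.from_bytes(mem[addr:addr + 4], "little")
    cf = (dword >> pos) & 1                 # CF gets the bit before it is set
    mem[addr:addr + 4] = (dword | (1 << pos)).to_bytes(4, "little")
    return cf
```

Under these assumptions, bit offset 70 from displacement 0 becomes displacement 8 with an immediate of 6, which addresses the same bit (byte 8, bit 6) as counting 70 bits from the bit base.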

Additional Itanium System Environment Exceptions

Itanium Reg Faults NaT Register Consumption Abort.

Itanium Mem Faults VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault


Protected Mode Exceptions

#GP(0) If the destination operand points to a nonwritable segment.

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register contains a null segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

Real Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

Virtual 8086 Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.


CALL—Call Procedure

Opcode Instruction Description

E8 cw CALL rel16 Call near, displacement relative to next instruction

E8 cd CALL rel32 Call near, displacement relative to next instruction

FF /2 CALL r/m16 Call near, r/m16 indirect

FF /2 CALL r/m32 Call near, r/m32 indirect

9A cd CALL ptr16:16 Call far, to full pointer given

9A cp CALL ptr16:32 Call far, to full pointer given

FF /3 CALL m16:16 Call far, address at r/m16

FF /3 CALL m16:32 Call far, address at r/m32

Description

Saves procedure linking information on the procedure stack and jumps to the procedure (called procedure) specified with the destination (target) operand. The target operand specifies the address of the first instruction in the called procedure. This operand can be an immediate value, a general-purpose register, or a memory location.

This instruction can be used to execute four different types of calls:

• Near call – A call to a procedure within the current code segment (the segment currently pointed to by the CS register), sometimes referred to as an intrasegment call.

• Far call – A call to a procedure located in a different segment than the current code segment, sometimes referred to as an intersegment call.

• Inter-privilege-level far call – A far call to a procedure in a segment at a different privilege level than that of the currently executing program or procedure. Results in an IA-32_Intercept(Gate) in Itanium System Environment.

• Task switch – A call to a procedure located in a different task. Results in an IA-32_Intercept(Gate) in Itanium System Environment.

The latter two call types (inter-privilege-level call and task switch) can only be executed in protected mode. See Chapter 6 in the Intel Architecture Software Developer’s Manual, Volume 3 for information on task switching with the CALL instruction.

When executing a near call, the processor pushes the value of the EIP register (which contains the address of the instruction following the CALL instruction) onto the procedure stack (for use later as a return-instruction pointer). The processor then jumps to the address specified with the target operand for the called procedure. The target operand specifies either an absolute address in the code segment (that is, an offset from the base of the code segment) or a relative offset (a signed offset relative to the current value of the instruction pointer in the EIP register, which points to the instruction following the call). An absolute address is specified directly in a register or indirectly in a memory location (r/m16 or r/m32 target-operand form). (When accessing an absolute address indirectly using the stack pointer (ESP) as a base register, the base value used is the value of the ESP before the instruction executes.) A relative offset (rel16 or rel32) is generally specified as a label in assembly code, but at the machine code level, it is encoded as a signed, 16- or 32-bit immediate value, which is added to the instruction pointer.


When executing a near call, the operand-size attribute determines the size of the target operand (16 or 32 bits) for absolute addresses. Absolute addresses are loaded directly into the EIP register. When a relative offset is specified, it is added to the value of the EIP register. If the operand-size attribute is 16, the upper two bytes of the EIP register are cleared to 0s, resulting in a maximum instruction pointer size of 16 bits. The CS register is not changed on near calls.
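The near-call behavior described above can be sketched as follows. The code models only the push of the return address and the computation of the new instruction pointer; the code-segment limit and stack-size checks are omitted. The function name `near_call`, the use of a Python list as the stack, and the `relative` flag are assumptions made for this illustration.

```python
def near_call(eip_next, stack, dest, operand_size=32, relative=True):
    """Model a near CALL: push the return address and return the new EIP.

    `eip_next` is the address of the instruction following the CALL.
    `dest` is a signed displacement (relative) or an absolute offset (indirect form).
    """
    if operand_size not in (16, 32):
        raise ValueError("operand_size must be 16 or 32")
    mask = 0xFFFFFFFF if operand_size == 32 else 0x0000FFFF
    stack.append(eip_next & mask)             # Push(EIP) or Push(IP)
    target = eip_next + dest if relative else dest
    return target & mask                      # 16-bit form clears the upper two bytes
```

With a 16-bit operand size, a call from `0xFFF0` with displacement `0x20` wraps to `0x0010`, reflecting the masking of EIP to 16 bits.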

When executing a far call, the processor pushes the current value of both the CS and EIP registers onto the procedure stack for use as a return-instruction pointer. The processor then performs a far jump to the code segment and address specified with the target operand for the called procedure. Here the target operand specifies an absolute far address either directly with a pointer (ptr16:16 or ptr16:32) or indirectly with a memory location (m16:16 or m16:32). With the pointer method, the segment and address of the called procedure is encoded in the instruction using a 4-byte (16-bit operand size) or 6-byte (32-bit operand size) far address immediate. With the indirect method, the target operand specifies a memory location that contains a 4-byte (16-bit operand size) or 6-byte (32-bit operand size) far address. The operand-size attribute determines the size of the offset (16 or 32 bits) in the far address. The far address is loaded directly into the CS and EIP registers. If the operand-size attribute is 16, the upper two bytes of the EIP register are cleared to 0s.
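As an illustration of the far-address formats, the sketch below decodes a 4-byte or 6-byte far pointer as it would sit in memory (offset in the low-order bytes, segment selector in the two high-order bytes, little-endian) and models a far call in real-address mode. The helper names are hypothetical, and limit and stack-size checks are left out.

```python
def decode_far_pointer(raw, operand_size=32):
    """Split an m16:16 (4-byte) or m16:32 (6-byte) far pointer into (selector, offset)."""
    offset_bytes = 4 if operand_size == 32 else 2
    if len(raw) != offset_bytes + 2:
        raise ValueError("far pointer has the wrong length for this operand size")
    offset = int.from_bytes(raw[:offset_bytes], "little")    # DEST[31:0] or DEST[15:0]
    selector = int.from_bytes(raw[offset_bytes:], "little")  # DEST[47:32] or DEST[31:16]
    return selector, offset


def far_call_real_mode(cs, eip_next, stack, raw, operand_size=32):
    """Model a real-address-mode far CALL: push CS then EIP/IP, return the new (CS, EIP)."""
    new_cs, new_eip = decode_far_pointer(raw, operand_size)
    mask = 0xFFFFFFFF if operand_size == 32 else 0x0000FFFF
    stack.append(cs)                   # Push(CS)
    stack.append(eip_next & mask)      # Push(EIP) or Push(IP)
    return new_cs, new_eip & mask
```

The push order (CS first, then the instruction pointer) follows the Operation section of this entry.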

Any far call from a 32-bit code segment to a 16-bit code segment should be made from the first 64 Kbytes of the 32-bit code segment, because the operand-size attribute of the instruction is set to 16, allowing only a 16-bit return address offset to be saved. Also, the call should be made using a 16-bit call gate so that 16-bit values will be pushed on the stack.

When the processor is operating in protected mode, a far call can also be used to access a code segment at a different privilege level or to switch tasks. Here, the processor uses the segment selector part of the far address to access the segment descriptor for the segment being jumped to. Depending on the value of the type and access rights information in the segment selector, the CALL instruction can perform:

• A far call to the same privilege level (described in the previous paragraph).

• A far call to a different privilege level. Results in an IA-32_Intercept(Gate) in Itanium System Environment.

• A task switch. Results in an IA-32_Intercept(Gate) in Itanium System Environment.
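The privilege checks that select among these cases are spelled out in the Operation and Protected Mode Exceptions sections of this entry. A minimal sketch of those checks, assuming the descriptor type has already been read and is passed in as a string, might look like this. The exception class, function name, and type labels are all hypothetical.

```python
class GeneralProtectionFault(Exception):
    """Stand-in for #GP(selector) in this sketch."""


def check_far_call_target(kind, dpl, cpl, rpl):
    """Apply the DPL/CPL/RPL checks for a protected-mode far CALL target.

    `kind` is one of: "conforming", "nonconforming", "call_gate", "task_gate", "tss".
    Returns `kind` if the checks pass, otherwise raises GeneralProtectionFault.
    """
    if kind == "conforming":
        if dpl > cpl:                          # conforming code: DPL must not exceed CPL
            raise GeneralProtectionFault(kind)
    elif kind == "nonconforming":
        if rpl > cpl or dpl != cpl:            # nonconforming code: DPL must equal CPL
            raise GeneralProtectionFault(kind)
    elif kind in ("call_gate", "task_gate", "tss"):
        if dpl < cpl or dpl < rpl:             # gate/TSS DPL must cover both CPL and RPL
            raise GeneralProtectionFault(kind)
    else:
        raise GeneralProtectionFault("unsupported descriptor type")
    return kind
```

In the Itanium System Environment, the gate and TSS cases go on to raise IA-32_Intercept(Gate) rather than completing the transfer; that step is not modeled here.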

When executing an inter-privilege-level far call, the code segment for the procedure being called is accessed through a call gate. The segment selector specified by the target operand identifies the call gate. In executing a call through a call gate where a change of privilege level occurs, the processor switches to the stack for the privilege level of the called procedure, pushes the current values of the CS and EIP registers and the SS and ESP values for the old stack onto the new stack, then performs a far jump to the new code segment. The new code segment is specified in the call gate descriptor; the new stack segment is specified in the TSS for the currently running task. The jump to the new code segment occurs after the stack switch. On the new stack, the processor pushes the segment selector and stack pointer for the calling procedure’s stack, a set of parameters from the calling procedure’s stack, and the segment selector and instruction pointer for the calling procedure’s code segment. (A value in the call gate descriptor determines how many parameters to copy to the new stack.)

Finally, the processor jumps to the address of the procedure being called within the new code segment. The procedure address is the offset specified by the target operand. Here again, the target operand can specify the far address of the call gate and procedure either directly with a pointer (ptr16:16 or ptr16:32) or indirectly with a memory location (m16:16 or m16:32).
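The stack switch can be sketched in two parts: locating the new stack pointer in the TSS, and building the frame pushed onto the new stack. The TSS offsets below follow the MORE-PRIVILEGE pseudocode in the Operation section (32-bit TSS: ESP at DPL * 8 + 4 and SS four bytes later; 16-bit TSS: SP at DPL * 4 + 2 and SS two bytes later). The function names, and the use of a list in push order to stand in for the new stack, are assumptions for illustration only.

```python
def tss_stack_slot(new_dpl, tss_is_32bit=True):
    """Return (stack_pointer_offset, ss_offset) within the TSS for privilege level `new_dpl`."""
    if tss_is_32bit:
        base = new_dpl * 8 + 4
        return base, base + 4
    base = new_dpl * 4 + 2
    return base, base + 2


def build_gate_frame(old_ss, old_esp, old_cs, old_eip, caller_params, gate_param_count):
    """Return the items pushed on the new stack, listed in push order.

    `caller_params` stands in for the parameters at the top of the caller's stack;
    only the number given by the call gate (masked to 5 bits) is copied.
    """
    count = gate_param_count & 0x1F            # parameter count is masked to 5 bits
    frame = [old_ss, old_esp]                  # Push(oldSS:oldESP)
    frame.extend(caller_params[:count])        # copied parameters
    frame.extend([old_cs, old_eip])            # Push(oldCS:oldEIP), the return address
    return frame
```

The return address ends up on top of the new stack, so a far return from the called procedure finds CS:EIP first, then the parameters, then the old SS:ESP.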


Executing a task switch with the CALL instruction is similar to executing a call through a call gate. Here the target operand specifies the segment selector of the task gate for the task being switched to and the address of the procedure being called in the task. The task gate in turn points to the TSS for the task, which contains the segment selectors for the task’s code and stack segments. The CALL instruction can also specify the segment selector of the TSS directly. See the Intel Architecture Software Developer’s Manual, Volume 3 for detailed information on the mechanics of a task switch.

Operation

IF near call
    THEN
        IF near relative call
            THEN
                IF the instruction pointer is not within code segment limit THEN #GP(0); FI;
                IF OperandSize = 32
                    THEN
                        IF stack not large enough for a 4-byte return address THEN #SS(0); FI;
                        Push(EIP);
                        EIP <- EIP + DEST; (* DEST is rel32 *)
                    ELSE (* OperandSize = 16 *)
                        IF stack not large enough for a 2-byte return address THEN #SS(0); FI;
                        Push(IP);
                        EIP <- (EIP + DEST) AND 0000FFFFH; (* DEST is rel16 *)
                FI;
            ELSE (* near absolute call *)
                IF the instruction pointer is not within code segment limit THEN #GP(0); FI;
                IF OperandSize = 32
                    THEN
                        IF stack not large enough for a 4-byte return address THEN #SS(0); FI;
                        Push(EIP);
                        EIP <- DEST; (* DEST is r/m32 *)
                    ELSE (* OperandSize = 16 *)
                        IF stack not large enough for a 2-byte return address THEN #SS(0); FI;
                        Push(IP);
                        EIP <- DEST AND 0000FFFFH; (* DEST is r/m16 *)
                FI;
        FI;
        IF Itanium System Environment AND PSR.tb THEN IA_32_Exception(Debug); FI;
FI;

IF far call AND (PE = 0 OR (PE = 1 AND VM = 1)) (* real address or virtual 8086 mode *)
    THEN
        IF OperandSize = 32
            THEN
                IF stack not large enough for a 6-byte return address THEN #SS(0); FI;
                IF the instruction pointer is not within code segment limit THEN #GP(0); FI;
                Push(CS); (* padded with 16 high-order bits *)
                Push(EIP);
                CS <- DEST[47:32]; (* DEST is ptr16:32 or [m16:32] *)
                EIP <- DEST[31:0]; (* DEST is ptr16:32 or [m16:32] *)
            ELSE (* OperandSize = 16 *)
                IF stack not large enough for a 4-byte return address THEN #SS(0); FI;
                IF the instruction pointer is not within code segment limit THEN #GP(0); FI;
                Push(CS);
                Push(IP);
                CS <- DEST[31:16]; (* DEST is ptr16:16 or [m16:16] *)
                EIP <- DEST[15:0]; (* DEST is ptr16:16 or [m16:16] *)
                EIP <- EIP AND 0000FFFFH; (* clear upper 16 bits *)
        FI;
        IF Itanium System Environment AND PSR.tb THEN IA_32_Exception(Debug); FI;
FI;

IF far call AND (PE = 1 AND VM = 0) (* Protected mode, not virtual 8086 mode *)
    THEN
        IF segment selector in target operand null THEN #GP(0); FI;
        IF segment selector index not within descriptor table limits
            THEN #GP(new code selector); FI;
        Read type and access rights of selected segment descriptor;
        IF segment type is not a conforming or nonconforming code segment, call gate,
            task gate, or TSS THEN #GP(segment selector); FI;
        Depending on type and access rights
            GO TO CONFORMING-CODE-SEGMENT;
            GO TO NONCONFORMING-CODE-SEGMENT;
            GO TO CALL-GATE;
            GO TO TASK-GATE;
            GO TO TASK-STATE-SEGMENT;
FI;

CONFORMING-CODE-SEGMENT:
    IF DPL > CPL THEN #GP(new code segment selector); FI;
    IF not present THEN #NP(selector); FI;
    IF OperandSize = 32
        THEN
            IF stack not large enough for a 6-byte return address THEN #SS(0); FI;
            IF the instruction pointer is not within code segment limit THEN #GP(0); FI;
            Push(CS); (* padded with 16 high-order bits *)
            Push(EIP);
            CS <- DEST(NewCodeSegmentSelector);
            (* segment descriptor information also loaded *)
            CS(RPL) <- CPL
            EIP <- DEST(offset);
        ELSE (* OperandSize = 16 *)
            IF stack not large enough for a 4-byte return address THEN #SS(0); FI;
            IF the instruction pointer is not within code segment limit THEN #GP(0); FI;
            Push(CS);
            Push(IP);
            CS <- DEST(NewCodeSegmentSelector);
            (* segment descriptor information also loaded *)
            CS(RPL) <- CPL
            EIP <- DEST(offset) AND 0000FFFFH; (* clear upper 16 bits *)
    FI;
    IF Itanium System Environment AND PSR.tb THEN IA_32_Exception(Debug);
END;

NONCONFORMING-CODE-SEGMENT:
    IF (RPL > CPL) OR (DPL != CPL) THEN #GP(new code segment selector); FI;
    IF stack not large enough for return address THEN #SS(0); FI;
    tempEIP <- DEST(offset)
    IF OperandSize = 16
        THEN
            tempEIP <- tempEIP AND 0000FFFFH; (* clear upper 16 bits *)
    FI;
    IF tempEIP outside code segment limit THEN #GP(0); FI;
    IF OperandSize = 32
        THEN
            Push(CS); (* padded with 16 high-order bits *)
            Push(EIP);
            CS <- DEST(NewCodeSegmentSelector);
            (* segment descriptor information also loaded *)
            CS(RPL) <- CPL;
            EIP <- tempEIP;
        ELSE (* OperandSize = 16 *)
            Push(CS);
            Push(IP);
            CS <- DEST(NewCodeSegmentSelector);
            (* segment descriptor information also loaded *)
            CS(RPL) <- CPL;
            EIP <- tempEIP;
    FI;
    IF Itanium System Environment AND PSR.tb THEN IA_32_Exception(Debug);
END;

CALL-GATE:
    IF call gate DPL < CPL or RPL THEN #GP(call gate selector); FI;
    IF not present THEN #NP(call gate selector); FI;
    IF Itanium System Environment THEN IA-32_Intercept(Gate,CALL);
    IF call gate code-segment selector is null THEN #GP(0); FI;
    IF call gate code-segment selector index is outside descriptor table limits
        THEN #GP(code segment selector); FI;
    Read code segment descriptor;
    IF code-segment segment descriptor does not indicate a code segment
        OR code-segment segment descriptor DPL > CPL
        THEN #GP(code segment selector); FI;
    IF code segment not present THEN #NP(new code segment selector); FI;
    IF code segment is non-conforming AND DPL < CPL
        THEN go to MORE-PRIVILEGE;
        ELSE go to SAME-PRIVILEGE;
    FI;
END;

MORE-PRIVILEGE:
    IF current TSS is 32-bit TSS
        THEN
            TSSstackAddress <- new code segment (DPL * 8) + 4
            IF (TSSstackAddress + 7) > TSS limit
                THEN #TS(current TSS selector); FI;
            newSS <- TSSstackAddress + 4;
            newESP <- stack address;
        ELSE (* TSS is 16-bit *)
            TSSstackAddress <- new code segment (DPL * 4) + 2
            IF (TSSstackAddress + 4) > TSS limit
                THEN #TS(current TSS selector); FI;
            newESP <- TSSstackAddress;
            newSS <- TSSstackAddress + 2;
    FI;
    IF stack segment selector is null THEN #TS(stack segment selector); FI;
    IF stack segment selector index is not within its descriptor table limits
        THEN #TS(SS selector); FI;
    Read code segment descriptor;
    IF stack segment selector's RPL != DPL of code segment
        OR stack segment DPL != DPL of code segment
        OR stack segment is not a writable data segment
        THEN #TS(SS selector); FI;
    IF stack segment not present THEN #SS(SS selector); FI;
    IF CallGateSize = 32
        THEN
            IF stack does not have room for parameters plus 16 bytes
                THEN #SS(SS selector); FI;
            IF CallGate(InstructionPointer) not within code segment limit THEN #GP(0); FI;
            SS <- newSS; (* segment descriptor information also loaded *)
            ESP <- newESP;
            CS:EIP <- CallGate(CS:InstructionPointer); (* segment descriptor information also loaded *)
            Push(oldSS:oldESP); (* from calling procedure *)
            temp <- parameter count from call gate, masked to 5 bits;
            Push(parameters from calling procedure’s stack, temp)
            Push(oldCS:oldEIP); (* return address to calling procedure *)
        ELSE (* CallGateSize = 16 *)
            IF stack does not have room for parameters plus 8 bytes
                THEN #SS(SS selector); FI;
            IF (CallGate(InstructionPointer) AND FFFFH) not within code segment limit
                THEN #GP(0); FI;
            SS <- newSS; (* segment descriptor information also loaded *)
            ESP <- newESP;
            CS:IP <- CallGate(CS:InstructionPointer); (* segment descriptor information also loaded *)
            Push(oldSS:oldESP); (* from calling procedure *)
            temp <- parameter count from call gate, masked to 5 bits;
            Push(parameters from calling procedure’s stack, temp)
            Push(oldCS:oldEIP); (* return address to calling procedure *)
    FI;
    CPL <- CodeSegment(DPL)
    CS(RPL) <- CPL
END;

SAME-PRIVILEGE:
    IF CallGateSize = 32
        THEN
            IF stack does not have room for 8 bytes
                THEN #SS(0); FI;
            IF EIP not within code segment limit THEN #GP(0); FI;
            CS:EIP <- CallGate(CS:EIP) (* segment descriptor information also loaded *)
            Push(oldCS:oldEIP); (* return address to calling procedure *)
        ELSE (* CallGateSize = 16 *)
            IF stack does not have room for parameters plus 4 bytes
                THEN #SS(0); FI;
            IF IP not within code segment limit THEN #GP(0); FI;
            CS:IP <- CallGate(CS:instruction pointer) (* segment descriptor information also loaded *)
            Push(oldCS:oldIP); (* return address to calling procedure *)
    FI;
    CS(RPL) <- CPL
END;

TASK-GATE:
    IF task gate DPL < CPL or RPL
        THEN #GP(task gate selector); FI;
    IF task gate not present
        THEN #NP(task gate selector); FI;
    IF Itanium System Environment THEN IA-32_Intercept(Gate,CALL);
    Read the TSS segment selector in the task-gate descriptor;
    IF TSS segment selector local/global bit is set to local
        OR index not within GDT limits
        THEN #GP(TSS selector); FI;
    Access TSS descriptor in GDT;
    IF TSS descriptor specifies that the TSS is busy (low-order 5 bits set to 00001)
        THEN #GP(TSS selector); FI;
    IF TSS not present
        THEN #NP(TSS selector); FI;
    SWITCH-TASKS (with nesting) to TSS;
    IF EIP not within code segment limit
        THEN #GP(0); FI;
END;

TASK-STATE-SEGMENT:
    IF TSS DPL < CPL or RPL
        OR TSS segment selector local/global bit is set to local
        OR TSS descriptor indicates TSS not available
        THEN #GP(TSS selector); FI;
    IF TSS is not present
        THEN #NP(TSS selector); FI;
    IF Itanium System Environment THEN IA-32_Intercept(Gate,CALL);
    SWITCH-TASKS (with nesting) to TSS
    IF EIP not within code segment limit
        THEN #GP(0);
    FI;
END;

Flags Affected

All flags are affected if a task switch occurs; no flags are affected if a task switch does not occur.

Additional Itanium System Environment Exceptions

Itanium Mem Faults VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault

IA-32_Intercept Gate Intercept for CALLs through CALL Gates, Task Gates and Task Segments

IA_32_Exception Taken Branch Debug Exception if PSR.tb is 1

Protected Mode Exceptions

#GP(0) If target offset in destination operand is beyond the new code segment limit.

If the segment selector in the destination operand is null.

If the code segment selector in the gate is null.

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.

#GP(selector) If code segment or gate or TSS selector index is outside descriptor table limits.

If the segment descriptor pointed to by the segment selector in the destination operand is not for a conforming-code segment, nonconforming-code segment, call gate, task gate, or task state segment.

If the DPL for a nonconforming-code segment is not equal to the CPL or the RPL for the segment’s segment selector is greater than the CPL.

If the DPL for a conforming-code segment is greater than the CPL.

If the DPL from a call-gate, task-gate, or TSS segment descriptor is less than the CPL or than the RPL of the call-gate, task-gate, or TSS’s segment selector.

If the segment descriptor for a segment selector from a call gate does not indicate it is a code segment.

If the segment selector from a call gate is beyond the descriptor table limits.

If the DPL for a code-segment obtained from a call gate is greater than the CPL.

If the segment selector for a TSS has its local/global bit set for local.

If a TSS segment descriptor specifies that the TSS is busy or not available.

#SS(0) If pushing the return address, parameters, or stack segment pointer onto the stack exceeds the bounds of the stack segment, when no stack switch occurs.

If a memory operand effective address is outside the SS segment limit.

#SS(selector) If pushing the return address, parameters, or stack segment pointer onto the stack exceeds the bounds of the stack segment, when a stack switch occurs.

If the SS register is being loaded as part of a stack switch and the segment pointed to is marked not present.

If stack segment does not have room for the return address, parameters, or stack segment pointer, when stack switch occurs.

#NP(selector) If a code segment, data segment, stack segment, call gate, task gate, or TSS is not present.

#TS(selector) If the new stack segment selector and ESP are beyond the end of the TSS.

If the new stack segment selector is null.

If the RPL of the new stack segment selector in the TSS is not equal to the DPL of the code segment being accessed.

If DPL of the stack segment descriptor for the new stack segment is not equal to the DPL of the code segment descriptor.

If the new stack segment is not a writable data segment.

If segment-selector index for stack segment is outside descriptor table limits.

#PF(fault-code) If a page fault occurs.

#AC(0) If an unaligned memory access occurs when the CPL is 3 and alignment checking is enabled.

Real Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the target offset is beyond the code segment limit.

Virtual 8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the target offset is beyond the code segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If an unaligned memory access occurs when alignment checking is enabled.


CBW/CWDE—Convert Byte to Word/Convert Word to Doubleword

Opcode Instruction Description

98 CBW AX <- sign-extend of AL

98 CWDE EAX <- sign-extend of AX

Description

Doubles the size of the source operand by means of sign extension. The CBW (convert byte to word) instruction copies the sign (bit 7) in the source operand into every bit in the AH register. The CWDE (convert word to doubleword) instruction copies the sign (bit 15) of the word in the AX register into the higher 16 bits of the EAX register.

The CBW and CWDE mnemonics reference the same opcode. The CBW instruction is intended for use when the operand-size attribute is 16 and the CWDE instruction for when the operand-size attribute is 32. Some assemblers may force the operand size to 16 when CBW is used and to 32 when CWDE is used. Others may treat these mnemonics as synonyms (CBW/CWDE) and use the current setting of the operand-size attribute to determine the size of values to be converted, regardless of the mnemonic used.

The CWDE instruction is different from the CWD (convert word to double) instruction. The CWD instruction uses the DX:AX register pair as a destination operand; whereas, the CWDE instruction uses the EAX register as a destination.

Operation

IF OperandSize = 16 (* instruction = CBW *)
    THEN AX <- SignExtend(AL);
    ELSE (* OperandSize = 32, instruction = CWDE *)
        EAX <- SignExtend(AX);
FI;
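The sign extension performed by both forms can be sketched as follows. EAX is modeled as a 32-bit unsigned integer, and the sketch assumes that CBW, as a 16-bit operation, leaves the upper 16 bits of EAX unchanged. The function names are hypothetical.

```python
def sign_extend(value, from_bits, to_bits):
    """Sign-extend the low `from_bits` of `value` to `to_bits` (result is unsigned)."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):                       # sign bit set: fill the upper bits
        value |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return value


def cbw(eax):
    """AX <- SignExtend(AL); the upper half of EAX is preserved in this model."""
    return (eax & 0xFFFF0000) | sign_extend(eax & 0xFF, 8, 16)


def cwde(eax):
    """EAX <- SignExtend(AX)."""
    return sign_extend(eax & 0xFFFF, 16, 32)
```

For instance, with AL = 80H the CBW result has AH = FFH, while with AX = 7FFFH the CWDE result has the upper 16 bits of EAX cleared.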

Flags Affected

None.

Additional Itanium System Environment Exceptions

Itanium Reg Faults NaT Register Consumption Abort.

Exceptions (All Operating Modes)

None.


CDQ—Convert Double to Quad

See entry for CWD/CDQ — Convert Word to Double/Convert Double to Quad.


CLC—Clear Carry Flag

Opcode Instruction Description

F8 CLC Clear CF flag

Description

Clears the CF flag in the EFLAGS register.

Operation

CF <- 0;

Flags Affected

The CF flag is cleared to 0. The OF, ZF, SF, AF, and PF flags are unaffected.

Exceptions (All Operating Modes)

None.


CLD—Clear Direction Flag

Opcode Instruction Description

FC CLD Clear DF flag

Description

Clears the DF flag in the EFLAGS register. When the DF flag is set to 0, string operations increment the index registers (ESI and/or EDI).

Operation

DF <- 0;

Flags Affected

The DF flag is cleared to 0. The CF, OF, ZF, SF, AF, and PF flags are unaffected.

Exceptions (All Operating Modes)

None.


CLI—Clear Interrupt Flag

Opcode Instruction Description

FA CLI Clear interrupt flag; interrupts disabled when interrupt flag cleared

Description

Clears the IF flag in the EFLAGS register. No other flags are affected. Clearing the IF flag causes the processor to ignore maskable external interrupts. The IF flag and the CLI and STI instructions have no effect on the generation of exceptions and NMI interrupts. In the Itanium System Environment, external interrupts are enabled for IA-32 instructions if PSR.i and (~CFLG.if or EFLAG.if) is 1 and for Itanium instructions if PSR.i is 1.

The following decision table indicates the action of the CLI instruction (bottom of the table) depending on the processor’s mode of operation and the CPL and IOPL of the currently running program or procedure (top of the table).

PE =       0    1          1      1         1
VM =       X    0          X      0         1
CPL        X    <= IOPL    X      > IOPL    X
IOPL       X    X          = 3    X         < 3
IF <- 0    Y    Y          Y      N         N
#GP(0)     N    N          N      Y         Y

Notes: X = Don't care. N = Action in column 1 not taken. Y = Action in column 1 taken.

Operation

OLD_IF <- IF;
IF PE = 0 (* Executing in real-address mode *)
  THEN
    IF <- 0;
  ELSE
    IF VM = 0 (* Executing in protected mode *)
      THEN
        IF CR4.PVI = 1
          THEN
            IF CPL = 3
              THEN
                IF IOPL < 3 THEN VIF <- 0; ELSE IF <- 0; FI;
              ELSE (* CPL < 3 *)
                IF IOPL < CPL THEN #GP(0); ELSE IF <- 0; FI;
            FI;
          ELSE (* CR4.PVI = 0 *)
            IF IOPL < CPL THEN #GP(0); ELSE IF <- 0; FI;
        FI;
      ELSE (* Executing in Virtual-8086 mode *)
        IF IOPL = 3
          THEN
            IF <- 0;
          ELSE
            IF CR4.VME = 0 THEN #GP(0); ELSE VIF <- 0; FI;
        FI;
    FI;
FI;
IF Itanium System Environment AND CFLG.ii AND IF != OLD_IF
  THEN IA-32_Intercept(System_Flag,CLI);
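As a reading aid, the mode and privilege checks in the pseudocode above can be modeled as a pure C function. This is a minimal sketch under stated assumptions: the names `cli_action` and `cli_decide` are hypothetical, the inputs stand for CR0.PE, EFLAGS.VM, CR4.PVI, CR4.VME, CPL, and IOPL, and the Itanium System Environment intercept is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Possible outcomes of CLI (hypothetical enumeration). */
typedef enum { CLI_CLEAR_IF, CLI_CLEAR_VIF, CLI_FAULT_GP0 } cli_action;

/* Decide what CLI does for a given mode and privilege state. */
cli_action cli_decide(bool pe, bool vm, bool pvi, bool vme, int cpl, int iopl)
{
    assert(cpl >= 0 && cpl <= 3 && iopl >= 0 && iopl <= 3);
    if (!pe)
        return CLI_CLEAR_IF;                      /* real-address mode */
    if (!vm) {                                    /* protected mode */
        if (pvi && cpl == 3)
            return (iopl < 3) ? CLI_CLEAR_VIF : CLI_CLEAR_IF;
        return (iopl < cpl) ? CLI_FAULT_GP0 : CLI_CLEAR_IF;
    }
    if (iopl == 3)                                /* virtual-8086 mode */
        return CLI_CLEAR_IF;
    return vme ? CLI_CLEAR_VIF : CLI_FAULT_GP0;
}
```

With CR4.PVI and CR4.VME both clear, the function reduces to the five columns of the decision table: only a CPL greater than IOPL in protected mode, or an IOPL below 3 in virtual-8086 mode, produces #GP(0).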

Flags Affected

The IF flag is cleared to 0 if the CPL is equal to or less than the IOPL; otherwise, it is not affected. The other flags in the EFLAGS register are unaffected.

Additional Itanium System Environment Exceptions

IA-32_Intercept System Flag Intercept Trap if CFLG.ii is 1 and the IF flag changes state.

Protected Mode Exceptions

#GP(0) If the CPL is greater (has less privilege) than the IOPL of the current program or procedure.

Real Address Mode Exceptions

None.

Virtual 8086 Mode Exceptions

#GP(0) If the CPL is greater (has less privilege) than the IOPL of the current program or procedure.


CLTS—Clear Task-Switched Flag in CR0

Opcode Instruction Description

0F 06 CLTS Clears TS flag in CR0

Description

Clears the task-switched (TS) flag in the CR0 register. This instruction is intended for use in operating-system procedures. It is a privileged instruction that can only be executed at a CPL of 0. It is allowed to be executed in real-address mode to allow initialization for protected mode.

The processor sets the TS flag every time a task switch occurs. The flag is used to synchronize the saving of FPU context in multitasking applications. See the description of the TS flag in the Intel Architecture Software Developer’s Manual, Volume 3 for more information about this flag.

Operation

IF Itanium System Environment THEN IA-32_Intercept(INST,CLTS);

CR0(TS) <- 0;

Flags Affected

The TS flag in CR0 register is cleared.

Additional Itanium System Environment Exceptions

IA-32_Intercept Mandatory Instruction Intercept fault.

Protected Mode Exceptions

#GP(0) If the CPL is greater than 0.

Real Address Mode Exceptions

None.

Virtual 8086 Mode Exceptions

#GP(0) If the CPL is greater than 0.


CMC—Complement Carry Flag

Opcode Instruction Description

F5 CMC Complement CF flag

Description

Complements the CF flag in the EFLAGS register.

Operation

CF <- NOT CF;

Flags Affected

The CF flag contains the complement of its original value. The OF, ZF, SF, AF, and PF flags are unaffected.

Exceptions (All Operating Modes)

None.


CMOVcc—Conditional Move

Opcode Instruction Description

0F 47 cw/cd CMOVA r16, r/m16 Move if above (CF=0 and ZF=0)

0F 47 cw/cd CMOVA r32, r/m32 Move if above (CF=0 and ZF=0)

0F 43 cw/cd CMOVAE r16, r/m16 Move if above or equal (CF=0)

0F 43 cw/cd CMOVAE r32, r/m32 Move if above or equal (CF=0)

0F 42 cw/cd CMOVB r16, r/m16 Move if below (CF=1)

0F 42 cw/cd CMOVB r32, r/m32 Move if below (CF=1)

0F 46 cw/cd CMOVBE r16, r/m16 Move if below or equal (CF=1 or ZF=1)

0F 46 cw/cd CMOVBE r32, r/m32 Move if below or equal (CF=1 or ZF=1)

0F 42 cw/cd CMOVC r16, r/m16 Move if carry (CF=1)

0F 42 cw/cd CMOVC r32, r/m32 Move if carry (CF=1)

0F 44 cw/cd CMOVE r16, r/m16 Move if equal (ZF=1)

0F 44 cw/cd CMOVE r32, r/m32 Move if equal (ZF=1)

0F 4F cw/cd CMOVG r16, r/m16 Move if greater (ZF=0 and SF=OF)

0F 4F cw/cd CMOVG r32, r/m32 Move if greater (ZF=0 and SF=OF)

0F 4D cw/cd CMOVGE r16, r/m16 Move if greater or equal (SF=OF)

0F 4D cw/cd CMOVGE r32, r/m32 Move if greater or equal (SF=OF)

0F 4C cw/cd CMOVL r16, r/m16 Move if less (SF<>OF)

0F 4C cw/cd CMOVL r32, r/m32 Move if less (SF<>OF)

0F 4E cw/cd CMOVLE r16, r/m16 Move if less or equal (ZF=1 or SF<>OF)

0F 4E cw/cd CMOVLE r32, r/m32 Move if less or equal (ZF=1 or SF<>OF)

0F 46 cw/cd CMOVNA r16, r/m16 Move if not above (CF=1 or ZF=1)

0F 46 cw/cd CMOVNA r32, r/m32 Move if not above (CF=1 or ZF=1)

0F 42 cw/cd CMOVNAE r16, r/m16 Move if not above or equal (CF=1)

0F 42 cw/cd CMOVNAE r32, r/m32 Move if not above or equal (CF=1)

0F 43 cw/cd CMOVNB r16, r/m16 Move if not below (CF=0)

0F 43 cw/cd CMOVNB r32, r/m32 Move if not below (CF=0)

0F 47 cw/cd CMOVNBE r16, r/m16 Move if not below or equal (CF=0 and ZF=0)

0F 47 cw/cd CMOVNBE r32, r/m32 Move if not below or equal (CF=0 and ZF=0)

0F 43 cw/cd CMOVNC r16, r/m16 Move if not carry (CF=0)

0F 43 cw/cd CMOVNC r32, r/m32 Move if not carry (CF=0)

0F 45 cw/cd CMOVNE r16, r/m16 Move if not equal (ZF=0)

0F 45 cw/cd CMOVNE r32, r/m32 Move if not equal (ZF=0)

0F 4E cw/cd CMOVNG r16, r/m16 Move if not greater (ZF=1 or SF<>OF)

0F 4E cw/cd CMOVNG r32, r/m32 Move if not greater (ZF=1 or SF<>OF)

0F 4C cw/cd CMOVNGE r16, r/m16 Move if not greater or equal (SF<>OF)

0F 4C cw/cd CMOVNGE r32, r/m32 Move if not greater or equal (SF<>OF)

0F 4D cw/cd CMOVNL r16, r/m16 Move if not less (SF=OF)

0F 4D cw/cd CMOVNL r32, r/m32 Move if not less (SF=OF)

0F 4F cw/cd CMOVNLE r16, r/m16 Move if not less or equal (ZF=0 and SF=OF)

0F 4F cw/cd CMOVNLE r32, r/m32 Move if not less or equal (ZF=0 and SF=OF)

0F 41 cw/cd CMOVNO r16, r/m16 Move if not overflow (OF=0)

0F 41 cw/cd CMOVNO r32, r/m32 Move if not overflow (OF=0)

0F 4B cw/cd CMOVNP r16, r/m16 Move if not parity (PF=0)

0F 4B cw/cd CMOVNP r32, r/m32 Move if not parity (PF=0)

0F 49 cw/cd CMOVNS r16, r/m16 Move if not sign (SF=0)

0F 49 cw/cd CMOVNS r32, r/m32 Move if not sign (SF=0)

0F 45 cw/cd CMOVNZ r16, r/m16 Move if not zero (ZF=0)

0F 45 cw/cd CMOVNZ r32, r/m32 Move if not zero (ZF=0)

0F 40 cw/cd CMOVO r16, r/m16 Move if overflow (OF=1)

0F 40 cw/cd CMOVO r32, r/m32 Move if overflow (OF=1)

0F 4A cw/cd CMOVP r16, r/m16 Move if parity (PF=1)

0F 4A cw/cd CMOVP r32, r/m32 Move if parity (PF=1)

0F 4A cw/cd CMOVPE r16, r/m16 Move if parity even (PF=1)

0F 4A cw/cd CMOVPE r32, r/m32 Move if parity even (PF=1)

0F 4B cw/cd CMOVPO r16, r/m16 Move if parity odd (PF=0)

0F 4B cw/cd CMOVPO r32, r/m32 Move if parity odd (PF=0)

0F 48 cw/cd CMOVS r16, r/m16 Move if sign (SF=1)

0F 48 cw/cd CMOVS r32, r/m32 Move if sign (SF=1)

0F 44 cw/cd CMOVZ r16, r/m16 Move if zero (ZF=1)

0F 44 cw/cd CMOVZ r32, r/m32 Move if zero (ZF=1)

Description

The CMOVcc instructions check the state of one or more of the status flags in the EFLAGS register (CF, OF, PF, SF, and ZF) and perform a move operation if the flags are in a specified state (or condition). A condition code (cc) is associated with each instruction to indicate the condition being tested for. If the condition is not satisfied, a move is not performed and execution continues with the instruction following the CMOVcc instruction.

If the condition is false for the memory form, some processor implementations will initiate the load (and discard the loaded data), so memory faults can still be generated. Other processor models will not initiate the load and will not generate any faults if the condition is false.

These instructions can move a 16- or 32-bit value from memory to a general-purpose register or from one general-purpose register to another. Conditional moves of 8-bit register operands are not supported.

The condition for each CMOVcc mnemonic is given in the description column of the above table. The terms “less” and “greater” are used for comparisons of signed integers and the terms “above” and “below” are used for unsigned integers.

Because a particular state of the status flags can sometimes be interpreted in two ways, two mnemonics are defined for some opcodes. For example, the CMOVA (conditional move if above) instruction and the CMOVNBE (conditional move if not below or equal) instruction are alternate mnemonics for the opcode 0F 47H.


The CMOVcc instructions are new for the Pentium Pro processor family; however, they may not be supported by all the processors in the family. Software can determine if the CMOVcc instructions are supported by checking the processor’s feature information with the CPUID instruction (see “CPUID—CPU Identification” on page 4:78).

Operation

temp <- DEST
IF condition TRUE
  THEN
    DEST <- SRC
  ELSE
    DEST <- temp
FI;
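The condition test can be illustrated with a small C model. This is a sketch, not a definition: the function names and the `FLAG_*` macros are hypothetical, the flag bits are assumed to sit at their usual EFLAGS positions, and `cc` is taken to be the low four bits of the second opcode byte (0F 40 through 0F 4F) from the opcode table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed EFLAGS bit positions of the status flags tested by CMOVcc. */
#define FLAG_CF (1u << 0)
#define FLAG_PF (1u << 2)
#define FLAG_ZF (1u << 6)
#define FLAG_SF (1u << 7)
#define FLAG_OF (1u << 11)

/* Evaluate condition code cc (0x0..0xF) against a set of flags. */
bool cmov_condition(unsigned cc, uint32_t eflags)
{
    assert(cc <= 0xFu);
    bool cf = (eflags & FLAG_CF) != 0, pf = (eflags & FLAG_PF) != 0;
    bool zf = (eflags & FLAG_ZF) != 0, sf = (eflags & FLAG_SF) != 0;
    bool of = (eflags & FLAG_OF) != 0;
    bool r;
    switch (cc >> 1) {
    case 0:  r = of; break;                  /* O  (40) / NO (41) */
    case 1:  r = cf; break;                  /* B  (42) / AE (43) */
    case 2:  r = zf; break;                  /* E  (44) / NE (45) */
    case 3:  r = cf || zf; break;            /* BE (46) / A  (47) */
    case 4:  r = sf; break;                  /* S  (48) / NS (49) */
    case 5:  r = pf; break;                  /* P  (4A) / NP (4B) */
    case 6:  r = (sf != of); break;          /* L  (4C) / GE (4D) */
    default: r = zf || (sf != of); break;    /* LE (4E) / G  (4F) */
    }
    /* Odd encodings test the negated condition. */
    return (cc & 1u) ? !r : r;
}

/* DEST receives SRC only when the condition holds; otherwise it is unchanged. */
uint32_t cmov32(unsigned cc, uint32_t eflags, uint32_t dest, uint32_t src)
{
    return cmov_condition(cc, eflags) ? src : dest;
}
```

The pairing of even and odd encodings also shows why several mnemonics share one opcode: CMOVA and CMOVNBE both map to `cc = 7`.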

Flags Affected

None.

Additional Itanium System Environment Exceptions

Itanium Reg Faults NaT Register Consumption Abort.

Itanium Mem Faults VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault

Protected Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register contains a null segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

Real Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

Virtual 8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.


CMP—Compare Two Operands

Opcode Instruction Description

3C ib CMP AL, imm8 Compare imm8 with AL

3D iw CMP AX, imm16 Compare imm16 with AX

3D id CMP EAX, imm32 Compare imm32 with EAX

80 /7 ib CMP r/m8, imm8 Compare imm8 with r/m8

81 /7 iw CMP r/m16, imm16 Compare imm16 with r/m16

81 /7 id CMP r/m32,imm32 Compare imm32 with r/m32

83 /7 ib CMP r/m16,imm8 Compare imm8 with r/m16

83 /7 ib CMP r/m32,imm8 Compare imm8 with r/m32

38 /r CMP r/m8,r8 Compare r8 with r/m8

39 /r CMP r/m16,r16 Compare r16 with r/m16

39 /r CMP r/m32,r32 Compare r32 with r/m32

3A /r CMP r8,r/m8 Compare r/m8 with r8

3B /r CMP r16,r/m16 Compare r/m16 with r16

3B /r CMP r32,r/m32 Compare r/m32 with r32

Description

Compares the first source operand with the second source operand and sets the status flags in the EFLAGS register according to the results. The comparison is performed by subtracting the second operand from the first operand and then setting the status flags in the same manner as the SUB instruction. When an immediate value is used as an operand, it is sign-extended to the length of the first operand.

The CMP instruction is typically used in conjunction with a conditional jump (Jcc), conditional move (CMOVcc), or SETcc instruction. The condition codes used by the Jcc, CMOVcc, and SETcc instructions are based on the results of a CMP instruction.

Operation

temp <- SRC1 - SignExtend(SRC2);
ModifyStatusFlags; (* Modify status flags in the same manner as the SUB instruction *)
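The effect of the subtraction on the status flags can be sketched in C. This is an illustrative model under stated assumptions: the function name `cmp_flags` and the `FLAG_*` macros are hypothetical, the flags are returned at their usual EFLAGS bit positions, and operands are assumed to be already sign-extended to the operand width `bits`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed EFLAGS bit positions of the six status flags. */
#define FLAG_CF (1u << 0)
#define FLAG_PF (1u << 2)
#define FLAG_AF (1u << 4)
#define FLAG_ZF (1u << 6)
#define FLAG_SF (1u << 7)
#define FLAG_OF (1u << 11)

/* Compute the status flags produced by comparing src1 with src2
   at an operand width of 8, 16, or 32 bits. */
uint32_t cmp_flags(uint32_t src1, uint32_t src2, int bits)
{
    assert(bits == 8 || bits == 16 || bits == 32);
    uint32_t mask = (bits == 32) ? 0xFFFFFFFFu : ((1u << bits) - 1u);
    uint32_t sign = 1u << (bits - 1);
    uint32_t a = src1 & mask, b = src2 & mask;
    uint32_t r = (a - b) & mask;            /* temp <- SRC1 - SRC2 */
    uint32_t flags = 0;

    if (a < b)                       flags |= FLAG_CF;  /* unsigned borrow */
    if (r == 0)                      flags |= FLAG_ZF;
    if (r & sign)                    flags |= FLAG_SF;
    if ((a ^ b) & (a ^ r) & sign)    flags |= FLAG_OF;  /* signed overflow */
    if ((a ^ b ^ r) & 0x10u)         flags |= FLAG_AF;  /* borrow out of bit 3 */

    /* PF is set when the low byte of the result has an even number of 1 bits. */
    uint32_t p = r & 0xFFu;
    p ^= p >> 4; p ^= p >> 2; p ^= p >> 1;
    if (!(p & 1u))                   flags |= FLAG_PF;
    return flags;
}
```

A following Jcc, CMOVcc, or SETcc instruction then tests combinations of these bits; for instance, "below" corresponds to `FLAG_CF` being set in the result.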

Flags Affected

The CF, OF, SF, ZF, AF, and PF flags are set according to the result.

Additional Itanium System Environment Exceptions

Itanium Reg Faults NaT Register Consumption Abort.

Itanium Mem Faults VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault

Protected Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register contains a null segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

Real Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

Virtual 8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.


CMPS/CMPSB/CMPSW/CMPSD—Compare String Operands

Opcode Instruction Description

A6 CMPS DS:(E)SI, ES:(E)DI Compares byte at address DS:(E)SI with byte at address ES:(E)DI and sets the status flags accordingly

A7 CMPS DS:SI, ES:DI Compares word at address DS:SI with word at address ES:DI and sets the status flags accordingly

A7 CMPS DS:ESI, ES:EDI Compares doubleword at address DS:ESI with doubleword at address ES:EDI and sets the status flags accordingly

A6 CMPSB Compares byte at address DS:(E)SI with byte at address ES:(E)DI and sets the status flags accordingly

A7 CMPSW Compares word at address DS:SI with word at address ES:DI and sets the status flags accordingly

A7 CMPSD Compares doubleword at address DS:ESI with doubleword at address ES:EDI and sets the status flags accordingly

Description

Compares the byte, word, or doubleword specified with the first source operand with the byte, word, or doubleword specified with the second source operand and sets the status flags in the EFLAGS register according to the results. The first source operand specifies the memory location at the address DS:ESI and the second source operand specifies the memory location at address ES:EDI. (When the address-size attribute is 16, the SI and DI registers are used as the source-index and destination-index registers, respectively.) The DS segment may be overridden with a segment override prefix, but the ES segment cannot be overridden.

The CMPSB, CMPSW, and CMPSD mnemonics are synonyms of the byte, word, and doubleword versions of the CMPS instructions. They are simpler to use, but provide no type or segment checking. (For the CMPS instruction, “DS:ESI” and “ES:EDI” must be explicitly specified in the instruction.)

After the comparison, the ESI and EDI registers are incremented or decremented automatically according to the setting of the DF flag in the EFLAGS register. (If the DF flag is 0, the ESI and EDI registers are incremented; if the DF flag is 1, the ESI and EDI registers are decremented.) The registers are incremented or decremented by 1 for byte operations, by 2 for word operations, or by 4 for doubleword operations.

The CMPS, CMPSB, CMPSW, and CMPSD instructions can be preceded by the REP prefix for block comparisons of ECX bytes, words, or doublewords. More often, however, these instructions will be used in a LOOP construct that takes some action based on the setting of the status flags before the next comparison is made.


Operation

temp <- SRC1 - SRC2;
SetStatusFlags(temp);
IF (byte comparison)
  THEN IF DF = 0
    THEN (E)SI <- (E)SI + 1; (E)DI <- (E)DI + 1;
    ELSE (E)SI <- (E)SI - 1; (E)DI <- (E)DI - 1;
  FI;
  ELSE IF (word comparison)
    THEN IF DF = 0
      THEN (E)SI <- (E)SI + 2; (E)DI <- (E)DI + 2;
      ELSE (E)SI <- (E)SI - 2; (E)DI <- (E)DI - 2;
    FI;
    ELSE (* doubleword comparison *)
      IF DF = 0
        THEN (E)SI <- (E)SI + 4; (E)DI <- (E)DI + 4;
        ELSE (E)SI <- (E)SI - 4; (E)DI <- (E)DI - 4;
      FI;
  FI;
FI;
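One step of the operation above can be modeled in C as follows. This is a simplified sketch: the names `cmps_result` and `cmps_step` are hypothetical, memory is treated as a flat little-endian byte array (segmentation and faults are not modeled), and only ZF and CF of the status flags are reported.

```c
#include <assert.h>
#include <stdint.h>

/* Result of one comparison step (hypothetical container type). */
typedef struct {
    uint32_t esi, edi;  /* updated index registers */
    int zf, cf;         /* two of the status flags set by the comparison */
} cmps_result;

/* Compare the element of `size` bytes (1, 2, or 4) at mem[esi] with the
   one at mem[edi], then advance or retreat both indexes according to DF. */
cmps_result cmps_step(const uint8_t *mem, uint32_t esi, uint32_t edi,
                      int size, int df)
{
    assert(size == 1 || size == 2 || size == 4);
    uint32_t src1 = 0, src2 = 0;
    for (int i = size - 1; i >= 0; i--) {      /* little-endian load */
        src1 = (src1 << 8) | mem[esi + (uint32_t)i];
        src2 = (src2 << 8) | mem[edi + (uint32_t)i];
    }
    cmps_result r;
    r.zf = (src1 == src2);                     /* SRC1 - SRC2 is zero */
    r.cf = (src1 < src2);                      /* unsigned borrow */
    uint32_t step = (uint32_t)size;
    r.esi = df ? esi - step : esi + step;
    r.edi = df ? edi - step : edi + step;
    return r;
}
```

Calling the function repeatedly while `zf` stays 1 mimics a block comparison that stops at the first mismatching element.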

Flags Affected

The CF, OF, SF, ZF, AF, and PF flags are set according to the temporary result of the comparison.

Additional Itanium System Environment Exceptions

Itanium Reg Faults NaT Register Consumption Abort.

Itanium Mem Faults VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault

Protected Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register contains a null segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

Real Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

Virtual 8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.


CMPXCHG—Compare and Exchange

Opcode Instruction Description

0F B0/r CMPXCHG r/m8,r8 Compare AL with r/m8. If equal, ZF is set and r8 is loaded into r/m8. Else, clear ZF and load r/m8 into AL.

0F B1/r CMPXCHG r/m16,r16 Compare AX with r/m16. If equal, ZF is set and r16 is loaded into r/m16. Else, clear ZF and load r/m16 into AX.

0F B1/r CMPXCHG r/m32,r32 Compare EAX with r/m32. If equal, ZF is set and r32 is loaded into r/m32. Else, clear ZF and load r/m32 into EAX.

Description

Compares the value in the AL, AX, or EAX register (depending on the size of the operand) with the first operand (destination operand). If the two values are equal, the second operand (source operand) is loaded into the destination operand. Otherwise, the destination operand is loaded into the AL, AX, or EAX register.

This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically. To simplify the interface to the processor’s bus, the destination operand receives a write cycle without regard to the result of the comparison. The destination operand is written back if the comparison fails; otherwise, the source operand is written into the destination. (The processor never produces a locked read without also producing a locked write.)

Operation

(* accumulator = AL, AX, or EAX, depending on whether *)
(* a byte, word, or doubleword comparison is being performed *)
IF Itanium System Environment AND External_Atomic_Lock_Required AND DCR.lc
  THEN IA-32_Intercept(LOCK,CMPXCHG);
IF accumulator = DEST
  THEN
    ZF <- 1
    DEST <- SRC
  ELSE
    ZF <- 0
    accumulator <- DEST
FI;
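The data movement in the pseudocode above can be sketched for the doubleword form in C. This model is illustrative only: the name `cmpxchg32` is hypothetical, the function is not atomic, and the LOCK prefix and the Itanium lock intercept are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Non-atomic model of CMPXCHG r/m32, r32.
   Returns the resulting ZF: true if the exchange took place. */
bool cmpxchg32(uint32_t *dest, uint32_t *accumulator, uint32_t src)
{
    assert(dest != NULL && accumulator != NULL);
    if (*accumulator == *dest) {
        *dest = src;              /* ZF <- 1; DEST <- SRC */
        return true;
    }
    *accumulator = *dest;         /* ZF <- 0; accumulator <- DEST */
    return false;
}
```

On failure the accumulator holds the current destination value, so a retry loop can recompute its new value and attempt the exchange again. The C11 function `atomic_compare_exchange_strong` follows a similar contract, updating its expected value when the comparison fails.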

Flags Affected

The ZF flag is set if the values in the destination operand and register AL, AX, or EAX are equal; otherwise it is cleared. The CF, PF, AF, SF, and OF flags are set according to the results of the comparison operation.


Additional Itanium System Environment Exceptions

Itanium Reg Faults NaT Register Consumption Abort.

Itanium Mem Faults VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault

IA-32_Intercept Lock Intercept: If an external atomic bus lock is required to complete this operation and DCR.lc is 1, no atomic transaction occurs; this instruction is faulted and an IA-32_Intercept(Lock) fault is generated. The software lock handler is responsible for the emulation of this instruction.

Protected Mode Exceptions

#GP(0) If the destination is located in a nonwritable segment.

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register contains a null segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

Real Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

Virtual 8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

Intel Architecture Compatibility

This instruction is not supported on Intel processors earlier than the Intel486 processors.


CMPXCHG8B—Compare and Exchange 8 Bytes

Opcode Instruction Description

0F C7 /1 m64 CMPXCHG8B m64 Compare EDX:EAX with m64. If equal, set ZF and load ECX:EBX into m64. Else, clear ZF and load m64 into EDX:EAX.

Description

Compares the 64-bit value in EDX:EAX with the operand (destination operand). If the values are equal, the 64-bit value in ECX:EBX is stored in the destination operand. Otherwise, the value in the destination operand is loaded into EDX:EAX. The destination operand is an 8-byte memory location. For the EDX:EAX and ECX:EBX register pairs, EDX and ECX contain the high-order 32 bits and EAX and EBX contain the low-order 32 bits of a 64-bit value.

Operation

IF Itanium System Environment AND External_Atomic_Lock_Required AND DCR.lc
  THEN IA-32_Intercept(LOCK,CMPXCHG);
IF (EDX:EAX = DEST)
  THEN
    ZF <- 1
    DEST <- ECX:EBX
  ELSE
    ZF <- 0
    EDX:EAX <- DEST
FI;
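The register-pair convention (EDX and ECX hold the high-order 32 bits, EAX and EBX the low-order 32 bits) can be made concrete with a short C model. This is a non-atomic sketch; the function name `cmpxchg8b` and its parameter layout are hypothetical, and the lock intercept is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Non-atomic model of CMPXCHG8B m64. Returns the resulting ZF. */
bool cmpxchg8b(uint64_t *m64, uint32_t *edx, uint32_t *eax,
               uint32_t ecx, uint32_t ebx)
{
    assert(m64 != NULL && edx != NULL && eax != NULL);
    uint64_t expected = ((uint64_t)*edx << 32) | *eax;   /* EDX:EAX */
    if (expected == *m64) {
        *m64 = ((uint64_t)ecx << 32) | ebx;              /* DEST <- ECX:EBX */
        return true;
    }
    *edx = (uint32_t)(*m64 >> 32);                       /* EDX:EAX <- DEST */
    *eax = (uint32_t)(*m64 & 0xFFFFFFFFu);
    return false;
}
```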

Flags Affected

The ZF flag is set if the destination operand and EDX:EAX are equal; otherwise it is cleared. The CF, PF, AF, SF, and OF flags are unaffected.

Additional Itanium System Environment Exceptions

Itanium Reg Faults NaT Register Consumption Abort.

Itanium Mem Faults VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault

IA-32_Intercept Lock Intercept: If an external atomic bus lock is required to complete this operation and DCR.lc is 1, no atomic transaction occurs; this instruction is faulted and an IA-32_Intercept(Lock) fault is generated. The software lock handler is responsible for the emulation of this instruction.

Protected Mode Exceptions

#UD If the destination operand is not a memory location.

#GP(0) If the destination is located in a nonwritable segment.

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register contains a null segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

Real Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

Virtual 8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

Intel Architecture Compatibility

This instruction is not supported on Intel processors earlier than the Pentium processors.


CPUID—CPU Identification

Opcode Instruction Description

0F A2 CPUID Returns processor identification and feature information in the EAX, EBX, ECX, and EDX registers, according to the input value entered initially in the EAX register.

Description

Returns processor identification and feature information in the EAX, EBX, ECX, and EDX registers. The information returned is selected by entering a value in the EAX register before the instruction is executed. Table 2-4 shows the information returned, depending on the initial value loaded into the EAX register.

The ID flag (bit 21) in the EFLAGS register indicates support for the CPUID instruction. If a software procedure can set and clear this flag, the processor executing the procedure supports the CPUID instruction.

The information returned with the CPUID instruction is divided into two groups: basic information and extended function information. Basic information is returned by entering an input value starting at 0 in the EAX register; extended function information is returned by entering an input value starting at 80000000H. When the input value in the EAX register is 0, the processor returns the highest value the CPUID instruction recognizes in the EAX register for returning basic information. Always use an EAX parameter value that is equal to or greater than zero and less than or equal to this highest EAX return value for basic information. When the input value in the EAX register is 80000000H, the processor returns the highest value the CPUID instruction recognizes in the EAX register for returning extended function information. Always use an EAX parameter value that is equal to or greater than 80000000H and less than or equal to this highest EAX return value for extended function information.

The CPUID instruction can be executed at any privilege level to serialize instruction execution. Serializing instruction execution guarantees that any modifications to flags, registers, and memory for previous instructions are completed before the next instruction is fetched and executed.

Table 2-4. Information Returned by CPUID Instruction

Initial EAX Value   Register   Information Provided about the Processor

Basic CPUID Information

0                   EAX        Maximum CPUID Input Value
                    EBX        756E6547H “Genu” (G in BL)
                    ECX        6C65746EH “ntel” (n in CL)
                    EDX        49656E69H “ineI” (i in DL)

1H                  EAX        Version Information (Type, Family, Model, and Stepping ID)
                    EBX        Bits 7-0: Brand Index (a)
                               Bits 15-8: CLFLUSH line size (Value * 8 = cache line size in bytes)
                               Bits 23-16: Number of logical processors per physical processor
                               Bits 31-24: Local APIC ID (b)
                    ECX        Reserved
                    EDX        Feature Information (see Table 2-5)

2H                  EAX        Cache and TLB Information
                    EBX        Cache and TLB Information
                    ECX        Cache and TLB Information
                    EDX        Cache and TLB Information

Extended Function CPUID Information

80000000H           EAX        Maximum Input Value for Extended Function CPUID Information
                    EBX        Reserved
                    ECX        Reserved
                    EDX        Reserved

80000001H           EAX        Extended Processor Signature and Extended Feature Bits. (Currently reserved.)
                    EBX        Reserved
                    ECX        Reserved
                    EDX        Reserved

80000002H           EAX        Processor Brand String
                    EBX        Processor Brand String Continued
                    ECX        Processor Brand String Continued
                    EDX        Processor Brand String Continued

80000003H           EAX        Processor Brand String Continued
                    EBX        Processor Brand String Continued
                    ECX        Processor Brand String Continued
                    EDX        Processor Brand String Continued

a. This field is not supported for processors based on Itanium architecture; zero (unsupported encoding) is returned.

b. This field is invalid for processors based on Itanium architecture; a reserved value is returned.

When the input value is 1, the processor returns version information in the EAX register (see Figure 2-4). The version information consists of an Intel architecture family identifier, a model identifier, a stepping ID, and a processor type.

Figure 2-4. Version Information in Registers EAX

(Figure not reproduced. Fields of EAX shown, from high-order to low-order bits: Extended Family, Extended Model, Processor Type, Family (bits 11-8), Model (bits 7-4), Stepping ID (bits 3-0).)

If the values in the family and/or model fields reach or exceed FH, the CPUID instruction will generate two additional fields in the EAX register: the extended family field and the extended model field. Here, a value of FH in either the model field or the family field indicates that the extended model or family field, respectively, is valid. Family and model numbers beyond FH range from 0FH to FFH, with the least significant hexadecimal digit always FH.
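Decoding the version information can be sketched in C as follows. This is an illustration, not a definition: the names `bit_field`, `cpuid_version`, and `decode_version` are hypothetical, and the bit positions of the Processor Type (13-12), Extended Model (19-16), and Extended Family (27-20) fields are assumptions taken from Intel's commonly published CPUID layout rather than from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Extract bits hi..lo (inclusive) of a 32-bit register value. */
uint32_t bit_field(uint32_t value, unsigned hi, unsigned lo)
{
    assert(lo <= hi && hi < 32);
    unsigned width = hi - lo + 1;
    uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (value >> lo) & mask;
}

/* Decoded fields of EAX after CPUID with input value 1. */
typedef struct {
    uint32_t stepping, model, family, type, ext_model, ext_family;
    int ext_model_valid, ext_family_valid;
} cpuid_version;

cpuid_version decode_version(uint32_t eax)
{
    cpuid_version v;
    v.stepping   = bit_field(eax, 3, 0);
    v.model      = bit_field(eax, 7, 4);
    v.family     = bit_field(eax, 11, 8);
    v.type       = bit_field(eax, 13, 12);   /* assumed position */
    v.ext_model  = bit_field(eax, 19, 16);   /* assumed position */
    v.ext_family = bit_field(eax, 27, 20);   /* assumed position */
    /* Per the text, FH in a base field marks its extended field as valid. */
    v.ext_model_valid  = (v.model == 0xFu);
    v.ext_family_valid = (v.family == 0xFu);
    return v;
}
```

The sketch only separates the fields; how the extended and base values are combined into a displayed family or model number is left to the caller.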

See AP-485, Intel Processor Identification and the CPUID Instruction (Order Number 241618) for more information on identifying Intel architecture processors.


When the input value in EAX is 1, three unrelated pieces of information are returned to the EBX register:

• Brand index (low byte of EBX) table that contains brand strings for IA-32 processors. Please refer to AP-485,

Intel

Processor Identification and the CPUID Instruction (Order Number 241618)

for information on brand indices.

Note: The Brand index field is not supported for processors based on Itanium

architecture, zero (unsupported encoding) is returned.

• CLFLUSH instruction cache line size (second byte of EBX) the size of the cache line flushed with CLFLUSH instruction in 8-byte increments. This field is valid only when the CLFSH feature flag is set.

• Local APIC ID (high byte of EBX) the local APIC on the processor during power up.

Note: The local APIC ID field is invalid for processors based on the Itanium

architecture, reserved value is returned. Software should check the feature flags to make sure they are not running on processors based on the Itanium architecture before interpreting the return value in this field.

When the EAX register contains a value of 1, the CPUID instruction (in addition to loading the processor signature in the EAX register) loads the EDX register with the feature flags. The feature flags (when a Flag = 1) indicate what features the processor supports. Ta b le 2- 5 lists the currently defined feature flag values.

– this number provides an entry into a brand string

– this number indicates

– this number is the 8-bit ID that is assigned to

A feature flag set to 1 indicates the corresponding feature is supported. Software should identify Intel as the vendor to properly interpret the feature flags.
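The EBX byte fields described above can be unpacked as in the following Python sketch. It assumes the bit positions given in Table 2-5 for the CLFSH flag (bit 19) and the Itanium architecture flag (bit 30). The function name, the dictionary keys, and the choice to return `None` for fields that should not be interpreted are illustrative assumptions.

```python
CLFSH_BIT = 19    # EDX feature flag: CLFLUSH instruction supported
ITANIUM_BIT = 30  # EDX feature flag: processor based on the Itanium architecture


def decode_ebx(ebx, edx):
    """Split the EBX value returned for CPUID input 1 into its byte fields."""
    has_clfsh = bool((edx >> CLFSH_BIT) & 1)
    on_itanium = bool((edx >> ITANIUM_BIT) & 1)
    return {
        # Low byte; zero on processors based on the Itanium architecture.
        "brand_index": ebx & 0xFF,
        # Second byte, in 8-byte units; meaningful only when CLFSH is set.
        "clflush_line_bytes": ((ebx >> 8) & 0xFF) * 8 if has_clfsh else None,
        # High byte; reserved on processors based on the Itanium architecture.
        "local_apic_id": None if on_itanium else (ebx >> 24) & 0xFF,
    }


if __name__ == "__main__":
    # Hypothetical register values, chosen only to exercise the decoder.
    print(decode_ebx(0x03000800, 1 << CLFSH_BIT))
```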

Table 2-5. Feature Flags Returned in EDX Register

Bit Mnemonic Description

0 FPU Floating Point Unit On-Chip. The processor contains an x87 FPU.

1 VME Virtual 8086 Mode Enhancements. Virtual 8086 mode enhancements, including CR4.VME for controlling the feature, CR4.PVI for protected mode virtual interrupts, software interrupt indirection, expansion of the TSS with the software indirection bitmap, and EFLAGS.VIF and EFLAGS.VIP flags.

2 DE Debugging Extensions. Support for I/O breakpoints, including CR4.DE for controlling the feature, and optional trapping of accesses to DR4 and DR5.

3 PSE Page Size Extension. Large pages of size 4 Mbyte are supported, including CR4.PSE for controlling the feature, the defined dirty bit in PDE (Page Directory Entries), optional reserved bit trapping in CR3, PDEs, and PTEs.

4 TSC Time Stamp Counter. The RDTSC instruction is supported, including CR4.TSD for controlling privilege.

5 MSR Model Specific Registers RDMSR and WRMSR Instructions. The RDMSR and WRMSR instructions are supported. Some of the MSRs are implementation dependent.

6 PAE Physical Address Extension. Physical addresses greater than 32 bits are supported: extended page table entry formats, an extra level in the page translation tables is defined, 2 Mbyte pages are supported instead of 4 Mbyte pages if PAE bit is 1. The actual number of address bits beyond 32 is not defined, and is implementation specific.

7 MCE Machine Check Exception. Exception 18 is defined for Machine Checks, including CR4.MCE for controlling the feature. This feature does not define the model-specific implementations of machine-check error logging, reporting, and processor shutdowns. Machine Check exception handlers may have to depend on processor version to do model-specific processing of the exception, or test for the presence of the Machine Check feature.

8 CX8 CMPXCHG8B Instruction. The compare-and-exchange 8 bytes (64 bits) instruction is supported (implicitly locked and atomic).

9 APIC APIC On-Chip. The processor contains an Advanced Programmable Interrupt Controller (APIC), responding to memory mapped commands in the physical address range FFFE0000H to FFFE0FFFH (by default – some processors permit the APIC to be relocated).

10 Reserved Reserved.

11 SEP SYSENTER and SYSEXIT Instructions. The SYSENTER and SYSEXIT and associated MSRs are supported.

12 MTRR Memory Type Range Registers. MTRRs are supported. The MTRRcap MSR contains feature bits that describe what memory types are supported, how many variable MTRRs are supported, and whether fixed MTRRs are supported.

13 PGE PTE Global Bit. The global bit in page directory entries (PDEs) and page table entries (PTEs) is supported, indicating TLB entries that are common to different processes and need not be flushed. The CR4.PGE bit controls this feature.

14 MCA Machine Check Architecture. The Machine Check Architecture, which provides a compatible mechanism for error reporting, is supported. The MCG_CAP MSR contains feature bits describing how many banks of error reporting MSRs are supported.

15 CMOV Conditional Move Instructions. The conditional move instruction CMOV is supported. In addition, if x87 FPU is present as indicated by the CPUID.FPU feature bit, then the FCOMI and FCMOV instructions are supported.

16 PAT Page Attribute Table. Page Attribute Table is supported. This feature augments the Memory Type Range Registers (MTRRs), allowing an operating system to specify attributes of memory on a 4K granularity through a linear address.

17 PSE-36 36-Bit Page Size Extension. Extended 4-MByte pages that are capable of addressing physical memory beyond 4 GBytes are supported. This feature indicates that the upper four bits of the physical address of the 4-MByte page are encoded by bits 13-16 of the page directory entry.

18 PSN Processor Serial Number. The processor supports the 96-bit processor identification number feature and the feature is enabled.

19 CLFSH CLFLUSH Instruction. CLFLUSH Instruction is supported.

20 NX Execute Disable Bit.

21 DS Debug Store. The processor supports the ability to write debug information into a memory resident buffer. This feature is used by the branch trace store (BTS) and precise event-based sampling (PEBS) facilities.

22 ACPI Thermal Monitor and Software Controlled Clock Facilities. The processor implements internal MSRs that allow processor temperature to be monitored and processor performance to be modulated in predefined duty cycles under software control.

23 MMX Intel MMX Technology. The processor supports the Intel MMX technology.

24 FXSR FXSAVE and FXRSTOR Instructions. The FXSAVE and FXRSTOR instructions are supported for fast save and restore of the floating point context. Presence of this bit also indicates that CR4.OSFXSR is available for an operating system to indicate that it supports the FXSAVE and FXRSTOR instructions.

25 SSE SSE. The processor supports the SSE extensions.

26 SSE2 SSE2. The processor supports the SSE2 extensions.

27 SS Self Snoop. The processor supports the management of conflicting memory types by performing a snoop of its own cache structure for transactions issued to the bus.

28 HTT Hyper-Threading Technology. The processor implements Hyper-Threading technology.

29 TM Thermal Monitor. The processor implements the thermal monitor automatic thermal control circuitry (TCC).

30 Processor based on the Intel Itanium architecture. The processor is based on the Intel Itanium architecture and is capable of executing the Intel Itanium instruction set. IA-32 application level software MUST also check with the running operating system to see if the system can also support Itanium architecture-based code before switching to the Intel Itanium instruction set.

31 PBE Pending Break Enable. The processor supports the use of the FERR#/PBE# pin when the processor is in the stop-clock state (STPCLK# is asserted) to signal the processor that an interrupt is pending and that the processor should return to normal operation to handle the interrupt. Bit 10 (PBE enable) in the IA32_MISC_ENABLE MSR enables this capability.
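A feature flag is tested by shifting EDX right by the flag's bit number and masking the lowest bit. The Python sketch below encodes the bit numbers and mnemonics from Table 2-5. Bit 10 is reserved and omitted, and bit 30 has no mnemonic in the table, so the key `ITANIUM` used here is a hypothetical label. The helper names are likewise illustrative.

```python
# Bit positions from Table 2-5 (EDX after CPUID with EAX = 1).
FEATURE_BITS = {
    "FPU": 0, "VME": 1, "DE": 2, "PSE": 3, "TSC": 4, "MSR": 5,
    "PAE": 6, "MCE": 7, "CX8": 8, "APIC": 9, "SEP": 11, "MTRR": 12,
    "PGE": 13, "MCA": 14, "CMOV": 15, "PAT": 16, "PSE-36": 17,
    "PSN": 18, "CLFSH": 19, "NX": 20, "DS": 21, "ACPI": 22,
    "MMX": 23, "FXSR": 24, "SSE": 25, "SSE2": 26, "SS": 27,
    "HTT": 28, "TM": 29,
    "ITANIUM": 30,  # hypothetical label; the table gives no mnemonic
    "PBE": 31,
}


def has_feature(edx, name):
    """Return True when the named feature flag is set in EDX."""
    return bool((edx >> FEATURE_BITS[name]) & 1)


def supported_features(edx):
    """List the mnemonics of all flags set in EDX, in bit order."""
    return [name for name in FEATURE_BITS if has_feature(edx, name)]


if __name__ == "__main__":
    # Hypothetical EDX value with the FPU, TSC, and SSE flags set.
    print(supported_features((1 << 0) | (1 << 4) | (1 << 25)))
```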

When the input value is 2, the processor returns information about the processor’s internal caches and TLBs in the EAX, EBX, ECX, and EDX registers. The encoding of these registers is as follows:

• The least-significant byte in register EAX (register AL) indicates the number of times the CPUID instruction must be executed with an input value of 2 to get a complete description of the processor’s caches and TLBs.

• The most significant bit (bit 31) of each register indicates whether the register contains valid information (set to 0) or is reserved (set to 1).

• If a register contains valid information, the information is contained in 1 byte descriptors.

Please see the processor-specific supplement for further information on how to decode the return values for the processor's internal caches and TLBs.
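The register encoding listed above can be walked as in the following Python sketch. It applies the three rules from the list: AL carries the iteration count, a register with bit 31 set is skipped as reserved, and the remaining bytes are 1-byte descriptors. Treating a zero byte as an empty (null) descriptor is an assumption not stated in this section, and the sample register values are invented for illustration.

```python
def cache_descriptors(eax, ebx, ecx, edx):
    """Return (iterations, descriptors) for one CPUID result with input 2."""
    iterations = eax & 0xFF  # AL: times CPUID must be run with input 2
    descriptors = []
    for index, reg in enumerate((eax, ebx, ecx, edx)):
        if (reg >> 31) & 1:
            continue  # bit 31 set: register is reserved, holds no descriptors
        for shift in (0, 8, 16, 24):
            if index == 0 and shift == 0:
                continue  # AL is the iteration count, not a descriptor
            byte = (reg >> shift) & 0xFF
            if byte:  # assumption: zero means "no descriptor"
                descriptors.append(byte)
    return iterations, descriptors


if __name__ == "__main__":
    # Hypothetical register values, not taken from any real processor.
    count, found = cache_descriptors(0x665B5001, 0x80000000, 0, 0x007A7000)
    print(count, [f"{d:02X}H" for d in found])
```

Decoding what each descriptor byte means still requires the processor-specific supplement.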

CPUID performs instruction serialization and a memory fence operation.


Operation

CASE (EAX) OF
    EAX = 0H:
        EAX ← Highest input value understood by CPUID;
        EBX ← Vendor identification string;
        EDX ← Vendor identification string;
        ECX ← Vendor identification string;
    BREAK;
    EAX = 1H:
        EAX[3:0] ← Stepping ID;
        EAX[7:4] ← Model;
        EAX[11:8] ← Family;
        EAX[13:12] ← Processor Type;
        EAX[15:14] ← Reserved;
        EAX[19:16] ← Extended Model;
        EAX[27:20] ← Extended Family;
        EAX[31:28] ← Reserved;
        EBX[7:0] ← Brand Index; (* Always zero for processors based on Itanium architecture *)
        EBX[15:8] ← CLFLUSH Line Size;
        EBX[23:16] ← Number of logical processors per physical processor;
        EBX[31:24] ← Initial APIC ID; (* Reserved for processors based on Itanium architecture *)
        ECX ← Reserved;
        EDX ← Feature flags;
    BREAK;
    EAX = 2H:
        EAX ← Cache and TLB information;
        EBX ← Cache and TLB information;
        ECX ← Cache and TLB information;
        EDX ← Cache and TLB information;
    BREAK;
    EAX = 80000000H:
        EAX ← Highest extended function input value understood by CPUID;
        EBX ← Reserved;
        ECX ← Reserved;
        EDX ← Reserved;
    BREAK;
    EAX = 80000001H:
        EAX ← Extended Processor Signature and Feature Bits; (* Currently Reserved *)
        EBX ← Reserved;
        ECX ← Reserved;
        EDX ← Reserved;
    BREAK;
    EAX = 80000002H:
        EAX ← Processor Name;
        EBX ← Processor Name;
        ECX ← Processor Name;
        EDX ← Processor Name;
    BREAK;
    EAX = 80000003H:
        EAX ← Processor Name;
        EBX ← Processor Name;
        ECX ← Processor Name;
        EDX ← Processor Name;
    BREAK;
    EAX = 80000004H:
        EAX ← Processor Name;
        EBX ← Processor Name;
        ECX ← Processor Name;
        EDX ← Processor Name;
    BREAK;
    DEFAULT: (* EAX > highest value recognized by CPUID *)
        EAX ← Reserved, Undefined;
        EBX ← Reserved, Undefined;
        ECX ← Reserved, Undefined;
        EDX ← Reserved, Undefined;
    BREAK;
ESAC;
memory_fence();
instruction_serialize();
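The string-valued results in the listing above are spread across several registers. The Python sketch below shows one way to reassemble them, assuming each register holds four ASCII characters with the first character in its least significant byte, and assuming the processor name is terminated by a zero byte. Neither detail is stated in this section, so both should be checked against the referenced application note. The function names are illustrative.

```python
import struct


def vendor_string(ebx, edx, ecx):
    """Join the input-0 registers in EBX, EDX, ECX order (12 characters)."""
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")


def processor_name(results):
    """Join three (EAX, EBX, ECX, EDX) tuples for inputs 80000002H-80000004H."""
    raw = b"".join(struct.pack("<IIII", *regs) for regs in results)
    # Assumption: the name ends at the first zero byte.
    return raw.split(b"\x00", 1)[0].decode("ascii").strip()


if __name__ == "__main__":
    # Register values that spell the well-known Intel vendor string.
    print(vendor_string(0x756E6547, 0x49656E69, 0x6C65746E))
```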

Flags Affected

None.

Additional Itanium System Environment Exceptions

Itanium Reg Faults NaT Register Consumption Abort.

Exceptions (All Operating Modes)

None.

Intel Architecture Compatibility

The CPUID instruction is not supported in early models of the Intel486 processor or in any Intel architecture processor earlier than the Intel486 processor. The ID flag in the EFLAGS register can be used to determine if this instruction is supported. If a procedure is able to set or clear this flag, the CPUID instruction is supported by the processor running the procedure.


CWD/CDQ—Convert Word to Doubleword/Convert Doubleword to Quadword

Opcode Instruction Description

99 CWD DX:AX ← sign-extend of AX

99 CDQ EDX:EAX ← sign-extend of EAX

Description

Doubles the size of the operand in register AX or EAX (depending on the operand size) by means of sign extension and stores the result in registers DX:AX or EDX:EAX, respectively. The CWD instruction copies the sign (bit 15) of the value in the AX register into every bit position in the DX register. The CDQ instruction copies the sign (bit 31) of the value in the EAX register into every bit position in the EDX register.

The CWD instruction can be used to produce a doubleword dividend from a word before a word division, and the CDQ instruction can be used to produce a quadword dividend from a doubleword before doubleword division.

The CWD and CDQ mnemonics reference the same opcode. The CWD instruction is intended for use when the operand-size attribute is 16 and the CDQ instruction for when the operand-size attribute is 32. Some assemblers may force the operand size to 16 when CWD is used and to 32 when CDQ is used. Others may treat these mnemonics as synonyms (CWD/CDQ) and use the current setting of the operand-size attribute to determine the size of values to be converted, regardless of the mnemonic used.
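The sign-extension behavior described above can be modeled in a few lines of Python. This is a sketch of the register semantics only, with registers passed in and returned as plain integers. The function names simply mirror the mnemonics.

```python
def cwd(ax):
    """Return (DX, AX) after CWD: bit 15 of AX is copied into every bit of DX."""
    ax &= 0xFFFF
    dx = 0xFFFF if ax & 0x8000 else 0x0000
    return dx, ax


def cdq(eax):
    """Return (EDX, EAX) after CDQ: bit 31 of EAX is copied into every bit of EDX."""
    eax &= 0xFFFFFFFF
    edx = 0xFFFFFFFF if eax & 0x80000000 else 0x00000000
    return edx, eax


if __name__ == "__main__":
    dx, ax = cwd(0x8001)  # negative 16-bit value
    print(f"DX:AX = {dx:04X}:{ax:04X}")
    edx, eax = cdq(5)     # positive 32-bit value
    print(f"EDX:EAX = {edx:08X}:{eax:08X}")
```

AX and EAX are returned unchanged; only the upper half of the doubled-width result is written.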

Operation

IF OperandSize = 16 (* CWD instruction *)
    THEN DX ← SignExtend(AX);
    ELSE (* OperandSize = 32, CDQ instruction *)
        EDX ← SignExtend(EAX);
FI;

Additional Itanium System Environment Exceptions

Itanium Reg Faults NaT Register Consumption Abort.

Flags Affected

None.

Exceptions (All Operating Modes)

None.


CWDE—Convert Word to Doubleword

See entry for CBW/CWDE—Convert Byte to Word/Convert Word to Doubleword.


DAA—Decimal Adjust AL after Addition

Opcode Instruction Description

27 DAA Decimal adjust AL after addition

Description

Adjusts the sum of two packed BCD values to create a packed BCD result. The AL register is the implied source and destination operand. The DAA instruction is only useful when it follows an ADD instruction that adds (binary addition) two 2-digit, packed BCD values and stores a byte result in the AL register. The DAA instruction then adjusts the contents of the AL register to contain the correct 2-digit, packed BCD result. If a decimal carry is detected, the CF and AF flags are set accordingly.

Operation

IF (((AL AND 0FH) > 9) OR AF = 1)
    THEN
        AL ← AL + 6;
        CF ← CF OR CarryFromLastAddition; (* CF OR carry from AL ← AL + 6 *)
        AF ← 1;
    ELSE
        AF ← 0;
FI;
IF (((AL AND F0H) > 90H) OR CF = 1)
    THEN
        AL ← AL + 60H;
        CF ← 1;
    ELSE
        CF ← 0;
FI;

Example

ADD AL, BL Before: AL=79H BL=35H EFLAGS(OSZAPC)=XXXXXX

After: AL=AEH BL=35H EFLAGS(OSZAPC)=110000

DAA Before: AL=AEH BL=35H EFLAGS(OSZAPC)=110000

After: AL=14H BL=35H EFLAGS(OSZAPC)=X00111

Flags Affected

The CF and AF flags are set if the adjustment of the value results in a decimal carry in either digit of the result (see “Operation” above). The SF, ZF, and PF flags are set according to the result. The OF flag is undefined.
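The adjustment can be traced with a small Python model that follows the Operation pseudocode for DAA step by step. It is a sketch covering only AL, CF, and AF. The function name and the tuple return value are illustrative choices.

```python
def daa(al, cf, af):
    """Apply the DAA adjustment; returns (AL, CF, AF) after the instruction."""
    if (al & 0x0F) > 9 or af:
        total = al + 6
        cf = cf or total > 0xFF  # CF OR carry out of AL + 6
        al = total & 0xFF
        af = True
    else:
        af = False
    if (al & 0xF0) > 0x90 or cf:
        al = (al + 0x60) & 0xFF
        cf = True
    else:
        cf = False
    return al, cf, af


if __name__ == "__main__":
    # 79 + 35 in packed BCD: ADD leaves AL = AEH with CF = 0 and AF = 0.
    al, cf, af = daa(0xAE, False, False)
    print(f"AL={al:02X}H CF={int(cf)} AF={int(af)}")
```

For the input AEH the model yields AL = 14H with CF and AF set, that is, the decimal sum 114 with the hundreds digit carried out in CF.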

Additional Itanium System Environment Exceptions

Itanium Reg Faults NaT Register Consumption Abort.

Exceptions (All Operating Modes)

None.


DAS—Decimal Adjust AL after Subtraction

Opcode Instruction Description

2F DAS Decimal adjust AL after subtraction

Description

Adjusts the result of the subtraction of two packed BCD values to create a packed BCD result. The AL register is the implied source and destination operand. The DAS instruction is only useful when it follows a SUB instruction that subtracts (binary subtraction) one 2-digit, packed BCD value from another and stores a byte result in the AL register. The DAS instruction then adjusts the contents of the AL register to contain the correct 2-digit, packed BCD result. If a decimal borrow is detected, the CF and AF flags are set accordingly.

Operation

IF ((AL AND 0FH) > 9) OR AF = 1
    THEN
        AL ← AL - 6;
        CF ← CF OR BorrowFromLastSubtraction; (* CF OR borrow from AL ← AL - 6 *)
        AF ← 1;
    ELSE
        AF ← 0;
FI;
IF ((AL > 9FH) OR CF = 1)
    THEN
        AL ← AL - 60H;
        CF ← 1;
    ELSE
        CF ← 0;
FI;

Example

SUB AL, BL Before: AL=35H BL=47H EFLAGS(OSZAPC)=XXXXXX

After: AL=EEH BL=47H EFLAGS(OSZAPC)=010111

DAS Before: AL=EEH BL=47H EFLAGS(OSZAPC)=010111

After: AL=88H BL=47H EFLAGS(OSZAPC)=X10111

Flags Affected

The CF and AF flags are set if the adjustment of the value results in a decimal borrow in either digit of the result (see “Operation” above). The SF, ZF, and PF flags are set according to the result. The OF flag is undefined.
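The adjustment can be traced with a small Python model that follows the Operation pseudocode for DAS step by step. It is a sketch covering only AL, CF, and AF. The function name and the tuple return value are illustrative choices.

```python
def das(al, cf, af):
    """Apply the DAS adjustment; returns (AL, CF, AF) after the instruction."""
    if (al & 0x0F) > 9 or af:
        cf = cf or al < 6  # CF OR borrow out of AL - 6
        al = (al - 6) & 0xFF
        af = True
    else:
        af = False
    if al > 0x9F or cf:
        al = (al - 0x60) & 0xFF
        cf = True
    else:
        cf = False
    return al, cf, af


if __name__ == "__main__":
    # 35 - 47 in packed BCD: SUB leaves AL = EEH with CF = 1 and AF = 1.
    al, cf, af = das(0xEE, True, True)
    print(f"AL={al:02X}H CF={int(cf)} AF={int(af)}")
```

For the input EEH the model yields AL = 88H with CF set, the packed BCD form of 35 - 47 = -12 expressed as 88 with a borrow.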

Additional Itanium System Environment Exceptions

Itanium Reg Faults NaT Register Consumption Abort.

Exceptions (All Operating Modes)

None.


DEC—Decrement by 1

Opcode Instruction Description

FE /1 DEC r/m8 Decrement r/m8 by 1

FF /1 DEC r/m16 Decrement r/m16 by 1

FF /1 DEC r/m32 Decrement r/m32 by 1

48+rw DEC r16 Decrement r16 by 1

48+rd DEC r32 Decrement r32 by 1

Description

Subtracts 1 from the destination operand, while preserving the state of the CF flag. The destination operand can be a register or a memory location. This instruction allows a loop counter to be updated without disturbing the CF flag. (Use a SUB instruction with an immediate operand of 1 to perform a decrement operation that does update the CF flag.)

Operation

DEST ← DEST - 1;

Flags Affected

The CF flag is not affected. The OF, SF, ZF, AF, and PF flags are set according to the result.
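The flag behavior can be illustrated with a Python model of DEC on an 8-bit operand. The flag formulas below follow the usual IA-32 definitions (OF on signed overflow, AF on a borrow out of bit 3, PF on even parity of the result byte). They are written out here as an assumption, since this page only states that the flags are "set according to the result". The function name is illustrative.

```python
def dec8(value, cf):
    """Model DEC r/m8; returns (result, flags) with CF passed through."""
    value &= 0xFF
    result = (value - 1) & 0xFF
    flags = {
        "CF": cf,                               # never modified by DEC
        "OF": value == 0x80,                    # 80H - 1 overflows to 7FH
        "SF": bool(result & 0x80),              # sign bit of the result
        "ZF": result == 0,
        "AF": (value & 0x0F) == 0,              # borrow out of the low nibble
        "PF": bin(result).count("1") % 2 == 0,  # even number of 1 bits
    }
    return result, flags


if __name__ == "__main__":
    # Decrementing 0 wraps to FFH, yet CF keeps its previous value.
    print(dec8(0x00, cf=False))
```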

Additional Itanium System Environment Exceptions

Itanium Reg Faults NaT Register Consumption Abort.

Itanium Mem Faults VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault

Protected Mode Exceptions

#GP(0) If the destination is located in a nonwritable segment.

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register contains a null segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

Real Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.


Virtual 8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.


DIV—Unsigned Divide

Opcode Instruction Description

F6 /6 DIV r/m8 Unsigned divide AX by r/m8; AL ← Quotient, AH ← Remainder

F7 /6 DIV r/m16 Unsigned divide DX:AX by r/m16; AX ← Quotient, DX ← Remainder

F7 /6 DIV r/m32 Unsigned divide EDX:EAX by r/m32 doubleword; EAX ← Quotient, EDX ← Remainder

Description

Divides (unsigned) the value in the AX, DX:AX, or EDX:EAX registers (dividend) by the source operand (divisor) and stores the result in the AX (AH:AL), DX:AX, or EDX:EAX registers. The source operand can be a general-purpose register or a memory location. The action of this instruction depends on the operand size, as shown in the following table:

Operand Size Dividend Divisor Quotient Remainder Maximum Quotient

Word/byte AX r/m8 AL AH 255

Doubleword/word DX:AX r/m16 AX DX 65,535

Quadword/doubleword EDX:EAX r/m32 EAX EDX 2^32 - 1

Non-integral results are truncated (chopped) towards 0. The remainder is always less than the divisor in magnitude. Overflow is indicated with the #DE (divide error) exception rather than with the CF flag.
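The quotient limits and the two #DE conditions can be modeled in Python as follows. This is a sketch in which the dividend is passed as a single integer (AX, DX:AX, or EDX:EAX already combined) and `operand_size` is the divisor width in bits. The `DivideError` class is a hypothetical stand-in for the #DE exception.

```python
class DivideError(Exception):
    """Hypothetical stand-in for the #DE (divide error) exception."""


def div(dividend, divisor, operand_size):
    """Model unsigned DIV; returns (quotient, remainder) or raises DivideError."""
    if divisor == 0:
        raise DivideError("divisor is 0")
    quotient, remainder = divmod(dividend, divisor)  # truncates toward 0
    if quotient > (1 << operand_size) - 1:
        raise DivideError("quotient too large for the destination register")
    return quotient, remainder


if __name__ == "__main__":
    # Byte divide: AX = 0123H divided by 10H gives AL = 12H, AH = 03H.
    q, r = div(0x0123, 0x10, 8)
    print(f"quotient={q:02X}H remainder={r:02X}H")
```

Because both operands are unsigned, the remainder returned by `divmod` is always smaller than the divisor, matching the statement above.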

Operation

IF SRC = 0
    THEN #DE; (* divide error *)
FI;
IF OperandSize = 8 (* word/byte operation *)
    THEN
        temp ← AX / SRC;
        IF temp > FFH
            THEN #DE; (* divide error *)
            ELSE
                AL ← temp;
                AH ← AX MOD SRC;
        FI;
    ELSE
        IF OperandSize = 16 (* doubleword/word operation *)
            THEN
                temp ← DX:AX / SRC;
                IF temp > FFFFH
                    THEN #DE; (* divide error *)
                    ELSE
                        AX ← temp;
                        DX ← DX:AX MOD SRC;
                FI;
            ELSE (* quadword/doubleword operation *)
                temp ← EDX:EAX / SRC;
                IF temp > FFFFFFFFH
                    THEN #DE; (* divide error *)
                    ELSE
                        EAX ← temp;
                        EDX ← EDX:EAX MOD SRC;
                FI;
        FI;
FI;

Flags Affected

The CF, OF, SF, ZF, AF, and PF flags are undefined.

Additional Itanium System Environment Exceptions

Itanium Reg Faults NaT Register Consumption Abort.

Itanium Mem Faults VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault

Protected Mode Exceptions

#DE If the source operand (divisor) is 0.

If the quotient is too large for the designated register.

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register contains a null segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

Real Address Mode Exceptions

#DE If the source operand (divisor) is 0.

If the quotient is too large for the designated register.

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register contains a null segment selector.

Virtual 8086 Mode Exceptions

#DE If the source operand (divisor) is 0.

If the quotient is too large for the designated register.

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.
